2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

228 Commits

Author SHA1 Message Date
Mike Pattrick
3b1882261c ofproto-dpif-mirror: Add support for pre-selection filter.
Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-07-16 00:28:23 +02:00
Ihar Hrachyshka
c62b4ac8f8 ovs-ofctl: Implement compose-packet --bare [--bad-csum].
With --bare, it will produce a bare hexified payload with no spaces or
offset indicators inserted, which is useful in tests to produce frames
to pass to e.g. `ovs-ofctl receive`.

With --bad-csum, it will produce a frame that has an invalid IP checksum
(applicable to IPv4 only because IPv6 doesn't have checksums.)

The command is now more useful in tests, where we may need to produce
hex frame payloads to compare observed frames against.

As an example of the tool use, a single test case is converted to it.
The test uses both normal --bare and --bad-csum behaviors of the
command, confirming they work as advertised.

Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-11-16 18:16:59 +01:00
Nobuhiro MIKI
349112f975 flow: Support rt_hdr in parse_ipv6_ext_hdrs().
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.

If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-03-29 21:41:28 +02:00
Ilya Maximets
e7e9973b80 dpif-netdev: Forwarding optimization for flows with a simple match.
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g ::

  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.

In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules.  "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.

Design:

Due to various implementation constraints userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for ip packets

Not all of these fields are related to packet itself.  We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing.  It also seems safe to assume that we're working
with Ethernet packets.  So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.

'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64bit integer (mark) that can be used as a
hash in hash map.  We are using only VID and CFI form the 'vlan_tci',
flows that need to match on PCP will not qualify for the optimization.
Workaround for matching on non-existence of vlan updated to match on
CFI and VID only in order to qualify for the optimization.  CFI is
always set by OVS if vlan is present in a packet, so there is no need
to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
the simple match mark.

New per-PMD flow table 'simple_match_table' introduced to store
simple match flows only.  'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to the 'simple_match_table' if the flow
meets following constraints:

  - 'recirc_id' in flow match is 0.
  - 'packet_type' in flow match is Ethernet.
  - Flow wildcards contains only minimal set of non-wildcarded fields
    (listed above).

If the number of flows for current 'in_port' in a regular 'flow_table'
equals number of flows for current 'in_port' in a 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows.  This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
Now we make the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.

Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required.  So, in this case EMC and
SMC lookups are disabled.  We may optimize this path in the future by
bypassing the dpcls lookup too.

Performance improvement of this solution on a 'simple match' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Performance results when compared to EMC:

Test setup:

             virtio-user   OVS    virtio-user
  Testpmd1  ------------>  pmd1  ------------>  Testpmd2
  (txonly)       x<------  pmd2  <------------ (mac swap)

Single stream of 64byte packets.  Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0

Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt   :     Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy :     Testpmd2 ------> pmd2 --->x   Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets.  This should be closer to the performance of a
VM-to-Phy scenario.

Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
Table below represents improvement in throughput when compared to EMC.

 +----------------+------------------------+------------------------+
 |                |    Default (-g -O2)    | "-Ofast -march=native" |
 |   Scenario     +------------+-----------+------------+-----------+
 |                |     GCC    |   Clang   |     GCC    |   Clang   |
 +----------------+------------+-----------+------------+-----------+
 | Virt-to-Virt   |    +18.9%  |   +25.5%  |    +10.8%  |   +16.7%  |
 | Virt-to-NoCopy |    +24.3%  |   +33.7%  |    +14.9%  |   +22.0%  |
 +----------------+------------+-----------+------------+-----------+

For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality.  Performance
difference for the non-simple flows is within a margin of error.

Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-07 20:32:20 +01:00
Yunjian Wang
828d9cb8d4 ovs: fix wrong quote
Remove the coma character by using ' and " character.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-07-06 15:40:04 -07:00
William Tu
3c6d05a02e userspace: Add GTP-U support.
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks.  GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).

This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00

Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2020-03-25 20:26:51 -07:00
Ilya Maximets
54e2baec09 flow: Fix crash on vlan packets with partial offloading.
parse_tcp_flags() does not care about vlan tags in a packet thus
not able to parse them.  As a result, if partial offloading is
enabled in userspace datapath vlan packets are not parsed, i.e.
has no initialized offsets.  This causes OVS crash on any attempt
to access/modify packet header fields.

For example, having the flow with following actions:
  in_port=1,ip,actions=mod_nw_src:192.168.0.7,output:IN_PORT

will lead to OVS crash on vlan packet handling:

 Process terminating with default action of signal 11 (SIGSEGV)
 Invalid read of size 4
    at 0x785657: get_16aligned_be32 (unaligned.h:249)
    by 0x785657: odp_set_ipv4 (odp-execute.c:82)
    by 0x785657: odp_execute_masked_set_action (odp-execute.c:527)
    by 0x785657: odp_execute_actions (odp-execute.c:894)
    by 0x74CDA9: dp_netdev_execute_actions (dpif-netdev.c:7355)
    by 0x74CDA9: packet_batch_per_flow_execute (dpif-netdev.c:6339)
    by 0x74CDA9: dp_netdev_input__ (dpif-netdev.c:6845)
    by 0x74DB6E: dp_netdev_input (dpif-netdev.c:6854)
    by 0x74DB6E: dp_netdev_process_rxq_port (dpif-netdev.c:4287)
    by 0x74E863: dpif_netdev_run (dpif-netdev.c:5264)
    by 0x703F57: type_run (ofproto-dpif.c:370)
    by 0x6EC8B8: ofproto_type_run (ofproto.c:1760)
    by 0x6DA52B: bridge_run__ (bridge.c:3188)
    by 0x6E083F: bridge_run (bridge.c:3252)
    by 0x1642E4: main (ovs-vswitchd.c:127)
  Address 0xc is not stack'd, malloc'd or (recently) free'd

Fix that by properly parsing vlan tags first.  Function 'parse_dl_type'
transformed for that purpose as it had no users anyway.

Added unit test for packet modification with partial offloading that
triggers above crash.

Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2019-10-25 18:06:50 +02:00
Darrell Ball
523464abb2 flow: Enhance parse_ipv6_ext_hdrs.
Acked-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-14 11:39:18 -08:00
Sriharsha Basavapatna via dev
738c785ff1 dpif-netlink: Detect Out-Of-Resource condition on a netdev
This is the first patch in the patch-set to support dynamic rebalancing
of offloaded flows.

The patch detects OOR condition on a netdev port when ENOSPC error is
returned by TC-Flower while adding a flow rule. A new structure is added
to the netdev called "netdev_hw_info", to store OOR related information
required to perform dynamic offload-rebalancing.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-10-19 11:27:45 +02:00
Han Zhou
2ccd0d2f0b bfd: Make the tp_dst masking megaflow-friendly.
When there are tunnel ports with BFD enabled, all UDP flows will have
dst port as match condition in datapath, which causes unnecessarily
high flow miss for all UDP traffic, and results in latency increase.

This patch solves the problem by masking tp_dst only for a single
bit that is enough to tell the mismatch when it is not BFD traffic.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047360.html
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-10-03 15:12:11 -07:00
Martin Xu
84ddf96ce0 bundle: add symmetric_l3 hash method for multipath
Add a symmetric_l3 hash method that uses both network destination
address and network source address.

VMware-BZ: #2112940
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-10-02 15:17:43 -07:00
Jianbo Liu
2f9366beb4 flow: Refactor some of VLAN helper functions
By default, these function are to change the first vlan vid and pcp
in the flow. Add a parameter as index for vlans if we want to handle
the second ones.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-07-25 18:15:34 +02:00
Yuanhan Liu
aab96ec4d8 dpif-netdev: retrieve flow directly from the flow mark
So that we could skip some very costly CPU operations, including but
not limiting to miniflow_extract, emc lookup, dpcls lookup, etc. Thus,
performance could be greatly improved.

A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more that
260% performance boost.

Note that though the heavy miniflow_extract is skipped, we still have
to do per packet checking, due to we have to check the tcp_flags.

Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-07-06 10:32:52 +01:00
Gavi Teitz
d63ca5329f dpctl: Properly reflect a rule's offloaded to HW state
Previously, any rule that is offloaded via a netdev, not necessarily
to the HW, would be reported as "offloaded". This patch fixes this
misalignment, and introduces the 'dp' state, as follows:

rule is in HW via TC offload  -> offloaded=yes dp:tc
rule is in not HW over TC DP  -> offloaded=no  dp:tc
rule is in not HW over OVS DP -> offloaded=no  dp:ovs

To achieve this, the flows's 'offloaded' flag was encapsulated in a new
attrs struct, which contains the offloaded state of the flow and the
DP layer the flow is handled in, and instead of setting the flow's
'offloaded' state based solely on the type of dump it was acquired
via, for netdev flows it now sends the new attrs struct to be
collected along with the rest of the flow via the netdev, allowing
it to be set per flow.

For TC offloads, the offloaded state is set based on the 'in_hw' and
'not_in_hw' flags received from the TC as part of the flower. If no
such flag was received, due to lack of kernel support, it defaults
to true.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
[simon: resolved conflict in lib/dpctl.man]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-18 09:57:37 +02:00
Jan Scheurich
6a0b0d3be8 userspace datapath: Add OVS_HASH_L4_SYMMETRIC dp_hash algorithm
This commit implements a new dp_hash algorithm OVS_HASH_L4_SYMMETRIC in
the netdev datapath. It will be used as default hash algorithm for the
dp_hash-based select groups in a subsequent commit to maintain
compatibility with the symmetry property of the current default hash
selection method.

A new dpif_backer_support field 'max_hash_alg' is introduced to reflect
the highest hash algorithm a datapath supports in the dp_hash action.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Co-authored-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-25 14:58:40 -07:00
William Tu
7dc18ae96d userspace: add erspan tunnel support.
ERSPAN is a tunneling protocol based on GRE tunnel.  The patch
add erspan tunnel support for ovs-vswitchd with userspace datapath.
Configuring erspan tunnel is similar to gre tunnel, but with
additional erspan's parameters.  Matching a flow on erspan's
metadata is also supported, see ovs-fields for more details.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-21 20:33:30 -07:00
Manohar Krishnappa Chidambaraswamy
ba07cf222a Handle gratuitous ARP requests and replies in tnl_arp_snoop()
Problem:
========
In user-space tunneling implementation, tnl_arp_snoop() snoops only ARP
*reply* packets to resolve tunnel nexthop IP addresses to MAC addresses.
Normally the ARP requests are periodically sent by the local host IP stack,
so that the ARP cache in OVS is refreshed and entries do not time out.
However, if the remote tunnel nexthop is a VRRP IP, and the gateway
periodically sends gratuitous ARP *requests* to announce itself,
tnl_arp_snoop() treats them as INVALID. Consequently, the ARP cache in OVS
expires after 10 minutes, which results in dropping of the next packet(s)
until a new ARP request is responded to.

Fix:
====
Enhance the tunnel neighbor resolution logic in OVS to not only snoop on
ARP replies but also on gratuitous ARP requests.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
From: Manohar K C <manohar.krishnappa.chidambaraswamy@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-04-10 16:19:14 -07:00
Ben Pfaff
1dc1ec247e flow, match, classifier: Add new functions for miniflow and minimatch.
The miniflow and minimatch APIs lack several of the features of the flow
and match APIs.  This commit adds a few of the missing functions.

These functions will be used for the first time in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Armando Migliaccio <armamig@gmail.com>
2018-03-31 11:32:10 -07:00
Ben Pfaff
f825fdd4ff flow: Improve type-safety of MINIFLOW_GET_TYPE.
Until mow, this macro has blindly read the passed-in type's size, but
that's unnecessarily risky.  This commit changes it to verify that the
passed-in type is the same size as the field and, on GCC and Clang, that
the types are compatible.  It also adds a version that does not check,
for the one case where (currently) we deliberately read the wrong size,
and updates a few uses to use more precise field names.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Armando Migliaccio <armamig@gmail.com>
2018-03-31 11:31:51 -07:00
Ben Pfaff
6f06837989 flow: Add some L7 payload data to most L4 protocols that accept it.
This makes traffic generated by flow_compose() look slightly more
realistic.  It requires lots of updates to tests, but at least the tests
themselves should be slightly more realistic too.

At the same time, add --l7 and --l7-len options to ofproto/trace to allow
users to specify the amount or contents of payloads that they want.

Suggested-by: Brad Cowie <brad@cowie.nz>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-01-27 08:58:31 -08:00
Yi Yang
17553f27ba nsh: add new flow key 'ttl'
IETF NSH draft added a new filed ttl in NSH header, this patch
is to add new nsh key 'ttl' for it.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-11 11:46:11 -08:00
Numan Siddique
83ee26c941 Check flow's dl_type before setting ct_orig_tuple in 'pkt_metadata_from_flow()'
Normally flow's dl_type will be a valid value. However when a packet is sent to
the controller, dl_type is not stored in the 'ofputil_packet_in_private'. When
the controller resumes (OFPRAW_NXT_RESUME) the packet, the flow's dl_type will be
0. If the flow's ct_state has valid value, then the 'pkt_metadata_from_flow'
neither sets the ct_orig_tuple from the flow nor resets it. This results in invalid
value ct_orig_tuple in the pkt_metadata.

This patch handles this situation by checking the dl_type before setting the
ct_orig_tuple. If dl_type is 0, it resets it. It also resets ct_orig_tuple if
dl_type is non zero and other than IPv4 or IPv6.

Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339868.html
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-10-25 11:19:31 -07:00
Jan Scheurich
3d2fbd70bd userspace: Add support for NSH MD1 match fields
This patch adds support for NSH packet header fields to the OVS
control plane and the userspace datapath. Initially we support the
fields of the NSH base header as defined in
https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt
and the fixed context headers specified for metadata format MD1.
The variable length MD2 format is parsed but the TLV context headers
are not yet available for matching.

The NSH fields are modelled as experimenter fields with the dedicated
experimenter class 0x005ad650 proposed for NSH in ONF. The following
fields are defined:

NXOXM code            ofctl name    Size      Comment
=====================================================================
NXOXM_NSH_FLAGS       nsh_flags       8       Bits 2-9 of 1st NSH word
(0x005ad650,1)
NXOXM_NSH_MDTYPE      nsh_mdtype      8       Bits 16-23
(0x005ad650,2)
NXOXM_NSH_NEXTPROTO   nsh_np          8       Bits 24-31
(0x005ad650,3)
NXOXM_NSH_SPI         nsh_spi         24      Bits 0-23 of 2nd NSH word
(0x005ad650,4)
NXOXM_NSH_SI          nsh_si          8       Bits 24-31
(0x005ad650,5)
NXOXM_NSH_C1          nsh_c1          32      Maskable, nsh_mdtype==1
(0x005ad650,6)
NXOXM_NSH_C2          nsh_c2          32      Maskable, nsh_mdtype==1
(0x005ad650,7)
NXOXM_NSH_C3          nsh_c3          32      Maskable, nsh_mdtype==1
(0x005ad650,8)
NXOXM_NSH_C4          nsh_c4          32      Maskable, nsh_mdtype==1
(0x005ad650,9)

Co-authored-by: Johnson Li <johnson.li@intel.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-07 11:26:09 -07:00
Andy Zhou
bc0f51765d flow: Refactor flow_compose() API.
Currently, flow_compose_size() is only supposed to be called after
flow_compose(). I find this API to be unintuitive.

Change flow_compose() API to take the 'size' argument, and
returns 'true' if the packet can be created, 'false' otherwise.

This change also improves error detection and reporting when
'size' is unreasonably small.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2017-07-27 15:22:39 -07:00
Ilya Maximets
3476ce3ad7 flow: Add flow_compose_size().
This allows to compose packets with different real lenghts from
odp flows i.e. memory will be allocated for requested packet
size and all required headers like ip->tot_len filled correctly.

Will be used in netdev-dummy to properly handle '--len' option.

Suggested-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-07-25 14:42:11 -07:00
Yi-Hung Wei
b4293a336d conntrack: Move ct_state parsing to lib/flow.c
This patch moves conntrack state parsing function from ovn-trace.c to
lib/flow.c, because it will be used by ofproto/trace unixctl command
later on. It also updates the ct_state checking logic, since we no longer
assume CS_TRACKED is enable by default.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-12 09:54:29 -07:00
Jan Scheurich
3d4b2e6eb7 userspace: Add OXM field MFF_PACKET_TYPE
Allow packet type namespace OFPHTN_ETHERTYPE as alternative pre-requisite
for matching L3 protocols (MPLS, IP, IPv6, ARP etc).

Change the meta-flow definition of packet_type field to use the new
custom format MFS_PACKET_TYPE representing "(NS,NS_TYPE)".

Parsing routine for MFS_PACKET_TYPE added to meta-flow.c. Formatting
routine for field packet_type extracted from match_format() and moved to
flow.c to be used from meta-flow.c for formatting MFS_PACKET_TYPE.

Updated the ovs-fields man page source meta-flow.xml with documentation
for packet-type-aware bridges and added documentation for field packet_type.

Added packet_type to the matching properties in tests/ofproto.at.

If dl_type is unwildcarded due to later packet modification, make sure it
is cleared again if the original packet_type was not PT_ETH.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-27 17:28:30 -04:00
Ben Pfaff
50f96b10e1 Support accepting and displaying port names in OVS tools.
Until now, most ovs-ofctl commands have not accepted names for ports, only
numbers, and have not been able to display port names either.  It's a lot
easier for users if they can use and see meaningful names instead of
arbitrary numbers.  This commit adds that support.

For backward compatibility, only interactive ovs-ofctl commands by default
display port names; to display them in scripts, use the new --names
option.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Aaron Conole <aconole@redhat.com>
2017-05-31 16:06:12 -07:00
Jan Scheurich
2482b0b0c8 userspace: Add packet_type in dp_packet and flow
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as prepration for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.

The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow verion 1.5.

The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:

enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,             /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1,       /* ns_type is an Ethertype. */
    OFPHTN_IP_PROTO = 2,        /* ns_type is a IP protocol number. */
    OFPHTN_UDP_TCP_PORT = 3,    /* ns_type is a TCP or UDP port. */
    OFPHTN_IPV4_OPTION = 4,     /* ns_type is an IPv4 option number. */
};

The lower 16 bits specify the actual type in the context of the name space.

Only name spaces 0 and 1 will be supported for now.

For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.

In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
whith the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.

The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the the dl_type field of struct flow.

This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.

New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.

Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-03 16:56:40 -07:00
Ben Pfaff
6846e91e6f flow: New function flow_clear_conntrack().
This will have a new user in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
2017-04-21 08:20:06 -07:00
Ben Pfaff
b02e6cf86a flow: New function ct_state_from_string().
This will have its first user in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
2017-04-21 08:20:06 -07:00
Eric Garver
f0fb825a37 Add support for 802.1ad (QinQ tunneling)
Flow key handling changes:
 - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
   headers.
 - Add dpif multi-VLAN capability probing. If datapath supports
   multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.

Refactor VLAN handling in dpif-xlate:
 - Introduce 'xvlan' to track VLAN stack during flow processing.
 - Input and output VLAN translation according to the xbundle type.

Push VLAN action support:
 - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
 - Support push_vlan on dot1q packets.

Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs
that can be matched. This allows us to preserve backwards compatibility.

Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN
handling

Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Co-authored-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-03-16 15:18:40 -07:00
Jarno Rajahalme
daf4d3c18d odp: Support conntrack orig tuple key.
Userspace support for datapath original direction conntrack tuple.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-03-08 17:23:15 -08:00
Jarno Rajahalme
742c0ac3c0 mpls: Fix MPLS restoration after patch port and group bucket.
This patch fixes problems with MPLS handling related to patch ports
and group buckets.

If a group bucket or a peer bridge across a patch port pushes MPLS
headers to a non-MPLS packet and outputs, the flow translation after
returning from the group bucket or patch port would undo the packet
transformations so that the processing could continue with the packet
as it was before entering the patch port.  There were two problems
with this:

1. As part of the first MPLS push on a non-MPLS packet, the flow
translation would first clear the L3/4 headers of the 'flow' to mark
those fields invalid.  Later, when committing 'flow' changes to
datapath actions before output, the necessary datapath MPLS actions
are created and the corresponding changes updated to the 'base flow'.
This was done using the same flow_push_mpls() function that clears
the L2/3 headers, so also the 'base flow' L2/3 headers were cleared.

Then, when translation returns from a patch port or group bucket, the
original 'flow' is restored, now showing no sign of the MPLS labels.
Since the 'base flow' now has the MPLS labels, following translations
know to issue MPLS POP actions before any output actions.  However, as
part of checking for changes to IP headers we test that the IP
protocol type was not changed.  But now the 'base flow's 'nw_proto'
field is zero and an assert fail crashes OVS.

This is solved by not clearing the L3/4 fields of the 'base
flow'. This allows the processing after the patch port to continue
with L3/4 fields as if no MPLS was done, after first issuing the
necessary MPLS POP actions.

2. IP header updates were done before the MPLS POP actions were
issued. This caused incorrect packet output after, e.g., group action
or patch port.  For example, with actions:

group 1234: all bucket=push_mpls,output:LOCAL

ip actions=group:1234,dec_ttl,output:LOCAL,output:LOCAL

the dec_ttl would only be executed before the last output to LOCAL,
since at the time of committing IP changes after the group action the
packet was still an MPLS packet.

This is solved by checking the dl_type of both 'flow' and 'base flow'
and issuing MPLS actions if they can transform the packet from an MPLS
packet to a non-MPLS packet.  For an IP packet the change in ttl can
then be correctly committed before the last two output actions.

Two test cases are added to prevent future regressions.

Reported-by: Thomas Morin <thomas.morin@orange.com>
Suggested-by: Takashi YAMAMOTO <yamamoto@ovn.org>
Fixes: 8bfd0fdac ("Enhance userspace support for MPLS, for up to 3 labels.")
Fixes: 1b035ef20 ("mpls: Allow l3 and l4 actions to prior to a push_mpls action")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: YAMAMOTO Takashi <yamamoto@ovn.org>
2016-12-02 18:42:21 -08:00
Bhanuprakash Bodireddy
d8a304b447 flow: Add comments to mf_get_next_in_map().
This patch adds comments to mf_get_next_in_map() to make it more
comprehensible.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-11-14 14:25:25 -08:00
Bhanuprakash Bodireddy
22ccf9c258 flow: Skip invoking expensive count_1bits() with zero input.
This patch checks if trash is non-zero and only then resets the flowmap
bit and increment the pointer by set bits as found in trash.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-11-14 14:25:25 -08:00
Daniele Di Proietto
05bb914854 tnl-neigh-cache: Unwildcard flow members before inspecting them.
tnl_neigh_snoop() is part of the translation.  During translation we
have to unwildcard all the fields we examine to make a decision.

tnl_arp_snoop() and tnl_nd_snoop() failed to unwildcard fileds in case
of failure.  The solution is to do unwildcarding before the field is
inspected.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2016-09-21 18:04:06 -07:00
Jarno Rajahalme
aff49b8c66 meta-flow: Clean up masking with prerequisities checking.
Change mf_are_prereqs_ok() take a flow_wildcards pointer, so that the
wildcards can be set at the same time as the prerequisiteis are
checked.  This makes it easier to write more obviously correct code.

Remove the functions mf_mask_field_and_prereqs() and
mf_mask_field_and_prereqs__(), and make the callers first check the
prerequisites, while supplying 'wc' to mf_are_prereqs_ok(), and if
successful, mask the bits of the field that were read or set using
mf_mask_field_masked().

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-29 16:52:03 -07:00
Daniele Di Proietto
206b60d48f flow: Introduce parse_dl_type().
The function simply returns the ethernet type of the packet (after
eventually discarding the VLAN tag).  It will be used by a following
commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2016-07-27 17:58:44 -07:00
Daniele Di Proietto
94a81e4021 flow: Export parse_ipv6_ext_hdrs().
This will be used by a future commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2016-07-27 17:58:44 -07:00
Justin Pettit
b23ada8eec Introduce 128-bit xxregs.
These are needed to handle IPv6 addresses.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-12 21:14:02 -07:00
Ben Pfaff
c17fcc0aed flow: New function is_nd().
This simplifies a few pieces of code and will acquire another user in an
upcoming commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-02 11:35:55 -07:00
Ben Pfaff
a75636c8b9 ofproto-dpif-xlate: Fix IGMP megaflow matching.
IGMP translations wasn't setting enough bits in the wildcards to ensure
different packets were handled differently.

Reported-by: "O'Reilly, Darragh" <darragh.oreilly@hpe.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-April/021036.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-05-20 13:12:18 -07:00
Ben Warren
1182038f4e Break flow.h into private and public parts
Public (struct definitions and some prototypes) go in
include/openvswitch

Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-04-14 11:03:47 -07:00
Ben Pfaff
0a96a21b6e hash: New helper functions hash_bytes32() and hash_bytes64().
All of the callers of hash_words() and hash_words64() actually find it
easier to pass in the number of bytes instead of the number of 32-bit
or 64-bit words.  These new functions allow the callers to be a little
simpler.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2016-01-20 10:02:53 -08:00
Joe Stringer
efba5ae464 ofproto-dpif: Reject partial ct_labels if unsupported.
If only half of a ct_label is present in a miniflow/minimask (eg, only
matching on one specific bit), then rule_check() would allow the flow
even if ct_label was unsupported, because it required both 64-bit fields
that comprise the ct_label to be present in the miniflow before
performing the check.

Fix this by populating the stack copy of the label directly from the
miniflow fields if available (or zero each 64-bit word if unavailable).

Suggested-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2015-12-01 15:29:00 -08:00
Jiri Benc
ffe4c74f93 tunneling: extend flow_tnl with ipv6 addresses
Note that because there's been no prerequisite on the outer protocol,
we cannot add it now. Instead, treat the ipv4 and ipv6 dst fields in the way
that either both are null, or at most one of them is non-null.

[cascardo: abstract testing either dst with flow_tnl_dst_is_set]
cascardo: using IPv4-mapped address is an exercise for the future, since this
would require special handling of MFF_TUN_SRC and MFF_TUN_DST and OpenFlow
messages.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-30 10:31:35 -08:00
Justin Pettit
f6ecf944a9 vswitchd: Allow modifying ICMP type and code.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2015-11-09 15:01:50 -08:00
Joe Stringer
9daf23484f Add connection tracking label support.
This patch adds a new 128-bit metadata field to the connection tracking
interface. When a label is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_label" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a label with
those connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:16 -07:00
Joe Stringer
8e53fe8cf7 Add connection tracking mark support.
This patch adds a new 32-bit metadata field to the connection tracking
interface. When a mark is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_mark" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a mark with those
connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00