With --bare, it will produce a bare hexified payload with no spaces or
offset indicators inserted, which is useful in tests to produce frames
to pass to e.g. `ovs-ofctl receive`.
With --bad-csum, it will produce a frame that has an invalid IP checksum
(applicable to IPv4 only because IPv6 doesn't have checksums.)
The command is now more useful in tests, where we may need to produce
hex frame payloads to compare observed frames against.
As an example of the tool use, a single test case is converted to it.
The test uses both normal --bare and --bad-csum behaviors of the
command, confirming they work as advertised.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner L4 header since that is not yet supported.
Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
Linux tap devices allows enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.
If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
nw_frag was not being printed in the flow key because it was improperly
masked and printed. Since this field is only two bits, it needs to use a
different macro to be masked. During printing, the switch statement
switched on the whole 8 bits rather than just the two that are relevant.
This caused nw_frag to often not be printed at all.
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv6 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv4 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g ::
"in_port=1,actions=2"
"in_port=2,actions=IN_PORT"
"in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
"in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.
In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules. "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.
Design:
Due to various implementation constraints userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):
- recirc_id
- in_port
- packet_type
- dl_type
- vlan_tci (CFI + VID) - in most cases
- nw_frag - for ip packets
Not all of these fields are related to packet itself. We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing. It also seems safe to assume that we're working
with Ethernet packets. So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.
'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64bit integer (mark) that can be used as a
hash in hash map. We are using only VID and CFI form the 'vlan_tci',
flows that need to match on PCP will not qualify for the optimization.
Workaround for matching on non-existence of vlan updated to match on
CFI and VID only in order to qualify for the optimization. CFI is
always set by OVS if vlan is present in a packet, so there is no need
to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside
the simple match mark.
New per-PMD flow table 'simple_match_table' introduced to store
simple match flows only. 'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to the 'simple_match_table' if the flow
meets following constraints:
- 'recirc_id' in flow match is 0.
- 'packet_type' in flow match is Ethernet.
- Flow wildcards contains only minimal set of non-wildcarded fields
(listed above).
If the number of flows for current 'in_port' in a regular 'flow_table'
equals number of flows for current 'in_port' in a 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows. This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
Now we make the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.
Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required. So, in this case EMC and
SMC lookups are disabled. We may optimize this path in the future by
bypassing the dpcls lookup too.
Performance improvement of this solution on a 'simple match' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.
Performance results when compared to EMC:
Test setup:
virtio-user OVS virtio-user
Testpmd1 ------------> pmd1 ------------> Testpmd2
(txonly) x<------ pmd2 <------------ (mac swap)
Single stream of 64byte packets. Actions:
in_port=vhost0,actions=vhost1
in_port=vhost1,actions=vhost0
Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt : Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy : Testpmd2 ------> pmd2 --->x Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets. This should be closer to the performance of a
VM-to-Phy scenario.
Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
Table below represents improvement in throughput when compared to EMC.
+----------------+------------------------+------------------------+
| | Default (-g -O2) | "-Ofast -march=native" |
| Scenario +------------+-----------+------------+-----------+
| | GCC | Clang | GCC | Clang |
+----------------+------------+-----------+------------+-----------+
| Virt-to-Virt | +18.9% | +25.5% | +10.8% | +16.7% |
| Virt-to-NoCopy | +24.3% | +33.7% | +14.9% | +22.0% |
+----------------+------------+-----------+------------+-----------+
For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality. Performance
difference for the non-simple flows is within a margin of error.
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
'dataofs' field of TCP header indicates the TCP header length. The
length should be >= 20 bytes/4 and <= TCP data length. This patch is
to test the 'dataofs' and not parse layer 4 fields when meet bad
dataofs.
This behavior is consistent with the openvswitch kernel module.
Fixes: 5a51b2cd3483 ("lib/ofpbuf: Remove 'l7' pointer.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Skipping further processing of invalid IP packets helps avoid crashes
but it does not help to figure out if the malformed packets are still
present on the network.
Add coverage counters for IPv4 and IPv6 sanity checks so that we know
there are some invalid packets.
Dump such whole packets in debug mode.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Although not required, padding can be optionally added until
the packet length is MTU bytes. A packet with extra padding
currently fails sanity checks.
Vulnerability: CVE-2020-35498
Fixes: fa8d9001a624 ("miniflow_extract: Properly handle small IP packets.")
Reported-by: Joakim Hindersson <joakim.hindersson@elastx.se>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
l3_ofs should be set all Ethernet packets, not just IPv4/IPv6 ones.
For example for ARP over VLAN tagged packets, it may cause wrong
processing like in changing the VLAN ID action. Fix it.
Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
parse_tcp_flags() does not care about vlan tags in a packet thus
not able to parse them. As a result, if partial offloading is
enabled in userspace datapath vlan packets are not parsed, i.e.
has no initialized offsets. This causes OVS crash on any attempt
to access/modify packet header fields.
For example, having the flow with following actions:
in_port=1,ip,actions=mod_nw_src:192.168.0.7,output:IN_PORT
will lead to OVS crash on vlan packet handling:
Process terminating with default action of signal 11 (SIGSEGV)
Invalid read of size 4
at 0x785657: get_16aligned_be32 (unaligned.h:249)
by 0x785657: odp_set_ipv4 (odp-execute.c:82)
by 0x785657: odp_execute_masked_set_action (odp-execute.c:527)
by 0x785657: odp_execute_actions (odp-execute.c:894)
by 0x74CDA9: dp_netdev_execute_actions (dpif-netdev.c:7355)
by 0x74CDA9: packet_batch_per_flow_execute (dpif-netdev.c:6339)
by 0x74CDA9: dp_netdev_input__ (dpif-netdev.c:6845)
by 0x74DB6E: dp_netdev_input (dpif-netdev.c:6854)
by 0x74DB6E: dp_netdev_process_rxq_port (dpif-netdev.c:4287)
by 0x74E863: dpif_netdev_run (dpif-netdev.c:5264)
by 0x703F57: type_run (ofproto-dpif.c:370)
by 0x6EC8B8: ofproto_type_run (ofproto.c:1760)
by 0x6DA52B: bridge_run__ (bridge.c:3188)
by 0x6E083F: bridge_run (bridge.c:3252)
by 0x1642E4: main (ovs-vswitchd.c:127)
Address 0xc is not stack'd, malloc'd or (recently) free'd
Fix that by properly parsing vlan tags first. Function 'parse_dl_type'
transformed for that purpose as it had no users anyway.
Added unit test for packet modification with partial offloading that
triggers above crash.
Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
OVS has no structure definition for ICMPv6 header with additional
data. More precisely, it has, but this structure named as
'icmp6_error_header' and only suitable to store error related
extended information. 'flow_compose_l4' stores additional
information in reserved bits by using system defined structure
'icmp6_hdr', which is marked as 'packed' and this leads to
build failure with gcc >= 9:
lib/flow.c:3041:34: error:
taking address of packed member of 'struct icmp6_hdr' may result
in an unaligned pointer value [-Werror=address-of-packed-member]
uint32_t *reserved = &icmp->icmp6_dataun.icmp6_un_data32[0];
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix that by renaming 'icmp6_error_header' to 'icmp6_data_header'
and allowing it to store not only errors, but any type of additional
information by analogue with 'struct icmp6_hdr'.
All the usages of 'struct icmp6_hdr' replaced with this new structure.
Removed redundant conversions between network and host representations.
Now fields are always in be.
This also, probably, makes flow_compose_l4 more robust by avoiding
possible unaligned accesses to 32 bit value.
Fixes: 9b2b84973db7 ("Support for match & set ICMPv6 reserved and options type fields")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The padding length is (packet size - ipv6 header length - ipv6 plen). This
patch fixes incorrect padding size checking in ipv6_sanity_check.
Acked-by: William Tu <u9012063@gmail.com>
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
For untagged traffic, it is unnecessary to clear vlan_hdrs as it costs 32B
memset. So the patch improves it by postponing to clear vlan_hdrs until
ethtype check. It can benefit both untagged and single-tagged traffic. From
testing, it does not impact performance of dual-tagged traffic.
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch merges two separate if-else branches for metadata connection state
into one if-else branch to improve performance. It gives an average performance
improvement of ~3% on arm platforms and ~4.5% on x86 platforms.
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Reviewed-by: Yanqin Wei <yanqin.wei@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The following, run inside the OVS sandbox, caused OVS to abort when
Address Sanitizer was used:
ovs-vsctl add-br br-int
ovs-ofctl add-flow br-int "table=0,cookie=0x1234,priority=10000,icmp,actions=drop"
ovs-ofctl --strict del-flows br-int "table=0,cookie=0x1234/-1,priority=10000"
Sample report from Address Sanitizer:
==3029==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000043260 at pc 0x7f6b09c2459b bp 0x7ffcb67e7540 sp 0x7ffcb67e6cf0
READ of size 40 at 0x603000043260 thread T0
#0 0x7f6b09c2459a (/lib/x86_64-linux-gnu/libasan.so.5+0xb859a)
#1 0x565110a748a5 in minimask_equal ../lib/flow.c:3510
#2 0x565110a9ea41 in minimatch_equal ../lib/match.c:1821
#3 0x56511091e864 in collect_rules_strict ../ofproto/ofproto.c:4516
#4 0x56511093d526 in delete_flow_start_strict ../ofproto/ofproto.c:5959
#5 0x56511093d526 in ofproto_flow_mod_start ../ofproto/ofproto.c:7949
#6 0x56511093d77b in handle_flow_mod__ ../ofproto/ofproto.c:6122
#7 0x56511093db71 in handle_flow_mod ../ofproto/ofproto.c:6099
#8 0x5651109407f6 in handle_single_part_openflow ../ofproto/ofproto.c:8406
#9 0x5651109407f6 in handle_openflow ../ofproto/ofproto.c:8587
#10 0x5651109e40da in ofconn_run ../ofproto/connmgr.c:1318
#11 0x5651109e40da in connmgr_run ../ofproto/connmgr.c:355
#12 0x56511092b129 in ofproto_run ../ofproto/ofproto.c:1826
#13 0x5651108f23cd in bridge_run__ ../vswitchd/bridge.c:2965
#14 0x565110904887 in bridge_run ../vswitchd/bridge.c:3023
#15 0x5651108e659c in main ../vswitchd/ovs-vswitchd.c:127
#16 0x7f6b093b709a in __libc_start_main ../csu/libc-start.c:308
#17 0x5651108e9009 in _start (/home/blp/nicira/ovs/_build/vswitchd/ovs-vswitchd+0x11d009)
This fixes the problem, which although largely theoretical could crop up
with odd implementations of memcmp(), perhaps ones optimized in various
"clever" ways. All in all, it seems best to avoid the theoretical problem.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
UDP source and destination ports are not used to derive the hash index
used for selecting the bucket in case of SYMMETRIC_L4 hash based select
groups. However, they are un-wildcarded in the megaflow entry match criteria.
This results in distinct megaflow entry being created for each pair of UDP
source and destination ports unnecessarily and causes significant performance
deterioration when the megaflow cache limit is reached.
This patch wildcards UDP ports when using select group with SYMMETRIC_L4
hash function.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
For a series of IP fragments, only the first packet includes the transport
header (TCP/UDP/SCTP) and the src/dst ports. By including these port
numbers in the hash, it may happen that a first fragment hashes to a
different value than subsequent packets, causing different packets from
the same flow to follow different paths. This in turn may result in
out-of-order delivery or failed reassembly. This patch excludes port
numbers from the hash calculation in case of IP fragmentation.
Signed-off-by: Jeroen van Bemmel <jeroen.van_bemmel@nokia.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The assertions make it easier to find all the places that need to be
updated when adding protocol support.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
state_s should be freed always before exit parse_ct_state
Fixes: b4293a336d8d ("conntrack: Move ct_state parsing to lib/flow.c")
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.
The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.
This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Co-authored-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com>
Signed-off-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
As per RFC 768, if the calculated UDP checksum is 0, it should be
instead set as 0xFFFF in the frame. A value of 0 in the checksum
field indicates to the receiver that no checksum was calculated
and hence it should not verify the checksum.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This is the first patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The patch detects OOR condition on a netdev port when ENOSPC error is
returned by TC-Flower while adding a flow rule. A new structure is added
to the netdev called "netdev_hw_info", to store OOR related information
required to perform dynamic offload-rebalancing.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
In the default case when nsh's md_type is not recognized by nsh parser,
uninitialized data in key->context can sneak into miniflow. This
patch fixes it.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10519
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add a symmetric_l3 hash method that uses both network destination
address and network source address.
VMware-BZ: #2112940
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When parse_ipv6_ext_hdrs__() returned false, half a 64-bit word had been
pushed into the miniflow and the second half was left uninitialized. This
commit fixes the problem.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10518
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
By default, these function are to change the first vlan vid and pcp
in the flow. Add a parameter as index for vlans if we want to handle
the second ones.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The ipv6_sanity_check() function implemented a check for IPv6 payload
length wrong: ip6_plen is the payload length but this function checked
whether it was longer than the total length of IPv6 header plus payload.
This meant that a packet with a crafted ip6_plen could result in a buffer
overread of up to the length of an IPv6 header (40 bytes).
The kernel datapath flow extraction code does not obviously have a similar
problem.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9287
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
So that we could skip some very costly CPU operations, including but
not limiting to miniflow_extract, emc lookup, dpcls lookup, etc. Thus,
performance could be greatly improved.
A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more that
260% performance boost.
Note that though the heavy miniflow_extract is skipped, we still have
to do per packet checking, due to we have to check the tcp_flags.
Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Those checks were part of the miniflow extractor. Moving them out to
act as a general helpers for packet validation.
Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit implements a new dp_hash algorithm OVS_HASH_L4_SYMMETRIC in
the netdev datapath. It will be used as default hash algorithm for the
dp_hash-based select groups in a subsequent commit to maintain
compatibility with the symmetry property of the current default hash
selection method.
A new dpif_backer_support field 'max_hash_alg' is introduced to reflect
the highest hash algorithm a datapath supports in the dp_hash action.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Co-authored-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ERSPAN is a tunneling protocol based on GRE tunnel. The patch
add erspan tunnel support for ovs-vswitchd with userspace datapath.
Configuring erspan tunnel is similar to gre tunnel, but with
additional erspan's parameters. Matching a flow on erspan's
metadata is also supported, see ovs-fields for more details.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until mow, this macro has blindly read the passed-in type's size, but
that's unnecessarily risky. This commit changes it to verify that the
passed-in type is the same size as the field and, on GCC and Clang, that
the types are compatible. It also adds a version that does not check,
for the one case where (currently) we deliberately read the wrong size,
and updates a few uses to use more precise field names.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Armando Migliaccio <armamig@gmail.com>
This makes traffic generated by flow_compose() look slightly more
realistic. It requires lots of updates to tests, but at least the tests
themselves should be slightly more realistic too.
At the same time, add --l7 and --l7-len options to ofproto/trace to allow
users to specify the amount or contents of payloads that they want.
Suggested-by: Brad Cowie <brad@cowie.nz>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Each of the cases in flow_compose_l4() separately tracked the number of
bytes of L4 data added to the packet. This commit makes the function do
that in a single place without per-protocol bookkeeping.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
IETF NSH draft added a new filed ttl in NSH header, this patch
is to add new nsh key 'ttl' for it.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch changes OVS_KEY_ATTR_NSH
to nested attribute and adds three new NSH sub attribute keys:
OVS_NSH_KEY_ATTR_BASE: for length-fixed NSH base header
OVS_NSH_KEY_ATTR_MD1: for length-fixed MD type 1 context
OVS_NSH_KEY_ATTR_MD2: for length-variable MD type 2 metadata
Its intention is to align to NSH kernel implementation.
NSH match fields, set and PUSH_NSH action all use the below
nested attribute format:
OVS_KEY_ATTR_NSH begin
OVS_NSH_KEY_ATTR_BASE
OVS_NSH_KEY_ATTR_MD1
OVS_KEY_ATTR_NSH end
or
OVS_KEY_ATTR_NSH begin
OVS_NSH_KEY_ATTR_BASE
OVS_NSH_KEY_ATTR_MD2
OVS_KEY_ATTR_NSH end
In addition, NSH encap and decap actions are renamed as push_nsh
and pop_nsh to meet action naming convention.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Found by libfuzzer.
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: 7edef47b4896 ("NSH: Minor bugfixes")
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
This commit adjusts the NSH user space implementation in OVS to
the latest wire format defined in draft-ietf-sfc-nsh-28 (November 3
2017). The NSH_MDTYPE field was reduced from 8 to 4 bits. The FLAGS
field is reduced from 8 to 2 bits. A new 6 bit TTL header field is
added. The TTL field is set to 63 at encap(nsh).
Match and set_field support for the newly introduced TTL header field
and a corresponding dec_nsh_ttl action is not yet included and will be
implemented in a future patch.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
- Fix 2 incorrect length checks
- Remove unnecessary limit of MD length to 16 bytes
- Remove incorrect comments stating MD2 was not supported
- Pad metadata in encap_nsh with zeroes if not multiple of 4 bytes
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When a packet is sent to the controller, dl_type is not stored in the
'ofputil_packet_in_private'. When the packet is resumed, the flow's
dl_type is 0 and this leads to invalid value in ct_orig_tuple in the
pkt_metadata.
This patch adds the dl_type to the metadata so that conntrack
information can be interpreted correctly when packets are resumed.
This is a change from the ordinary practice, since flow_get_metadata() is
only supposed to deal with metadata and dl_type is not metadata. It is
necessary when ct_state is involved, though, because ct_state only applies
in the case of particular Ethertypes (IPv4 and IPv6 currently), so we need
to add it as a kind of prerequisite. (This isn't ideal; maybe we didn't
think through the ct_state mechanism carefully enough.)
Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339868.html
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds support for NSH packet header fields to the OVS
control plane and the userspace datapath. Initially we support the
fields of the NSH base header as defined in
https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt
and the fixed context headers specified for metadata format MD1.
The variable length MD2 format is parsed but the TLV context headers
are not yet available for matching.
The NSH fields are modelled as experimenter fields with the dedicated
experimenter class 0x005ad650 proposed for NSH in ONF. The following
fields are defined:
NXOXM code ofctl name Size Comment
=====================================================================
NXOXM_NSH_FLAGS nsh_flags 8 Bits 2-9 of 1st NSH word
(0x005ad650,1)
NXOXM_NSH_MDTYPE nsh_mdtype 8 Bits 16-23
(0x005ad650,2)
NXOXM_NSH_NEXTPROTO nsh_np 8 Bits 24-31
(0x005ad650,3)
NXOXM_NSH_SPI nsh_spi 24 Bits 0-23 of 2nd NSH word
(0x005ad650,4)
NXOXM_NSH_SI nsh_si 8 Bits 24-31
(0x005ad650,5)
NXOXM_NSH_C1 nsh_c1 32 Maskable, nsh_mdtype==1
(0x005ad650,6)
NXOXM_NSH_C2 nsh_c2 32 Maskable, nsh_mdtype==1
(0x005ad650,7)
NXOXM_NSH_C3 nsh_c3 32 Maskable, nsh_mdtype==1
(0x005ad650,8)
NXOXM_NSH_C4 nsh_c4 32 Maskable, nsh_mdtype==1
(0x005ad650,9)
Co-authored-by: Johnson Li <johnson.li@intel.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently, flow_compose_size() is only supposed to be called after
flow_compose(). I find this API to be unintuitive.
Change flow_compose() API to take the 'size' argument, and
returns 'true' if the packet can be created, 'false' otherwise.
This change also improves error detection and reporting when
'size' is unreasonably small.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>