'tcp_flags' are not placed at the beginning of the 64-bit block, so
they have to be padded. Today it is done in a hacky way by pushing
zeroes over the last 32-bits of the arp_tha. When that was written
the miniflow_pad_from_64() didn't exist, but it's better to use it
now instead to avoid confusion.
'ct_tp_src/dst' are not actually extracted for IGMP. See the
write_ct_md() function. The pushes are there for the padding purposes,
since 'tp_dst' doesn't end on a 64-bit boundary and so we need to pad
before pushing the IGMP group. Use an explicit padding function
instead to avoid a false impression that IGMP can have non-zero
"ports" in the conntrack tuple.
This change should not change anything functionally.
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ales reported that ct_tp_{src,dst} matches are working only for the
first frag for the userspace datapath, whereas they are always working
for later frags in the case of kernel datapath.
The ipf propagates the info in packets metadata, but
miniflow_extract() has no handling for them.
Fix it by pushing the relevant fields in the miniflow.
tp_{src,dst} are not set for later frags, so fill them with padding as
ct_tp_{src,dst} are not aligned:
struct flow {
[...]
ovs_be16 tp_src; /* 656 2 */
ovs_be16 tp_dst; /* 658 2 */
ovs_be16 ct_tp_src; /* 660 2 */
ovs_be16 ct_tp_dst; /* 662 2 */
[...]
}
The patch also includes two tests to exercise the behavior.
Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-124
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rather than drop all pending Tx offloads on recirculation,
preserve inner offloads (and mark packet with outer Tx offloads)
when parsing the packet again.
Fixes: c6538b443984 ("dpif-netdev: Fix crash due to tunnel offloading on recirculation.")
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://issues.redhat.com/browse/FDP-1144
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match")
Signed-off-by: Allen Chen <allen.chen@jaguarmicro.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
The data in the buffer are aligned to 2 bytes, however
'struct in6_addr' is aligned to 4 bytes. Use the 2 bytes aligned
equivalent 'union ovs_16aligned_in6_addr' instead. This was caught
by one of the OVN tests:
lib/flow.c:1133:25: runtime error: load of misaligned address
0x51400009cc92 for type 'const struct in6_addr *', which requires
4 byte alignment
0x51400009cc92: note: pointer points here
00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00
^
0 0x8255b2 in miniflow_extract lib/flow.c:1133:25
1 0x81d921 in flow_extract lib/flow.c:671:5
2 0xa966d4 in ofp_packet_to_string lib/ofp-print.c:82:5
3 0xa76de2 in ofputil_packet_in_private_format lib/ofp-packet.c:1037:24
4 0xa99817 in ofp_print_packet_in lib/ofp-print.c:132:9
5 0xa97f46 in ofp_to_string__ lib/ofp-print.c
6 0xa97f46 in ofp_to_string lib/ofp-print.c:1264:21
7 0xc338f4 in do_send lib/vconn.c:687:19
8 0xb7f678 in try_send lib/rconn.c:1128:14
9 0xb7d725 in rconn_send__ lib/rconn.c:760:13
10 0xb7d8e7 in rconn_send_with_limit lib/rconn.c:816:17
11 0x6f70de in do_send_packet_ins ofproto/connmgr.c:1697:13
12 0x6f691f in connmgr_send_async_msg ofproto/connmgr.c:1682:9
13 0x5c2d23 in run ofproto/ofproto-dpif.c:1877:13
14 0x56737f in ofproto_run ofproto/ofproto.c:1906:13
15 0x50d4fc in bridge_run__ vswitchd/bridge.c:3287:9
16 0x50a764 in bridge_run vswitchd/bridge.c:3346:5
17 0x53eed7 in main vswitchd/ovs-vswitchd.c:130:9
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OVS can parse NSH, but can't compose. Fix that and get rid of plain
hex NSH packets in system tests as they are hard to read or modify.
Tcpdump calls modified to write actual pcaps instead of text output,
so ovs-pcap can be used while checking the results.
While at it, replacing sleeps with more robust waiting for tcpdump
to start listening.
M4 macros are better than shell variables, because we can see the
substitution result in the test log. So, using m4_define and m4_join
extensively.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch enables most of the tunnel tests in the testsuite, and adds a
large TCP transfer to a vxlan and geneve test to verify TSO
functionality. Some additional changes were required to accommodate these
changes with netdev-linux interfaces. The test for vlan over vxlan is
purposely not enabled as the traffic produced by this test gives
incorrect values in the vnet header.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For userspace datapath, this patch provides vxlan and geneve tunnel tso.
Only support userspace vxlan or geneve tunnel, meanwhile support
tunnel outter and inner csum offload. If netdev do not support offload
features, there is a software fallback.If netdev do not support vxlan
and geneve tso,packets will drop. Front-end devices can close offload
features by ethtool also.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
With --bare, it will produce a bare hexified payload with no spaces or
offset indicators inserted, which is useful in tests to produce frames
to pass to e.g. `ovs-ofctl receive`.
With --bad-csum, it will produce a frame that has an invalid IP checksum
(applicable to IPv4 only because IPv6 doesn't have checksums.)
The command is now more useful in tests, where we may need to produce
hex frame payloads to compare observed frames against.
As an example of the tool use, a single test case is converted to it.
The test uses both normal --bare and --bad-csum behaviors of the
command, confirming they work as advertised.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner L4 header since that is not yet supported.
Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
Linux tap devices allows enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.
If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
nw_frag was not being printed in the flow key because it was improperly
masked and printed. Since this field is only two bits, it needs to use a
different macro to be masked. During printing, the switch statement
switched on the whole 8 bits rather than just the two that are relevant.
This caused nw_frag to often not be printed at all.
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv6 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv4 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g ::
"in_port=1,actions=2"
"in_port=2,actions=IN_PORT"
"in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
"in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.
In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules. "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.
Design:
Due to various implementation constraints userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):
- recirc_id
- in_port
- packet_type
- dl_type
- vlan_tci (CFI + VID) - in most cases
- nw_frag - for ip packets
Not all of these fields are related to packet itself. We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing. It also seems safe to assume that we're working
with Ethernet packets. So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.
'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64bit integer (mark) that can be used as a
hash in hash map. We are using only VID and CFI form the 'vlan_tci',
flows that need to match on PCP will not qualify for the optimization.
Workaround for matching on non-existence of vlan updated to match on
CFI and VID only in order to qualify for the optimization. CFI is
always set by OVS if vlan is present in a packet, so there is no need
to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside
the simple match mark.
New per-PMD flow table 'simple_match_table' introduced to store
simple match flows only. 'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to the 'simple_match_table' if the flow
meets following constraints:
- 'recirc_id' in flow match is 0.
- 'packet_type' in flow match is Ethernet.
- Flow wildcards contains only minimal set of non-wildcarded fields
(listed above).
If the number of flows for current 'in_port' in a regular 'flow_table'
equals number of flows for current 'in_port' in a 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows. This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
Now we make the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.
Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required. So, in this case EMC and
SMC lookups are disabled. We may optimize this path in the future by
bypassing the dpcls lookup too.
Performance improvement of this solution on a 'simple match' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.
Performance results when compared to EMC:
Test setup:
virtio-user OVS virtio-user
Testpmd1 ------------> pmd1 ------------> Testpmd2
(txonly) x<------ pmd2 <------------ (mac swap)
Single stream of 64byte packets. Actions:
in_port=vhost0,actions=vhost1
in_port=vhost1,actions=vhost0
Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt : Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy : Testpmd2 ------> pmd2 --->x Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets. This should be closer to the performance of a
VM-to-Phy scenario.
Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
Table below represents improvement in throughput when compared to EMC.
+----------------+------------------------+------------------------+
| | Default (-g -O2) | "-Ofast -march=native" |
| Scenario +------------+-----------+------------+-----------+
| | GCC | Clang | GCC | Clang |
+----------------+------------+-----------+------------+-----------+
| Virt-to-Virt | +18.9% | +25.5% | +10.8% | +16.7% |
| Virt-to-NoCopy | +24.3% | +33.7% | +14.9% | +22.0% |
+----------------+------------+-----------+------------+-----------+
For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality. Performance
difference for the non-simple flows is within a margin of error.
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
'dataofs' field of TCP header indicates the TCP header length. The
length should be >= 20 bytes/4 and <= TCP data length. This patch is
to test the 'dataofs' and not parse layer 4 fields when meet bad
dataofs.
This behavior is consistent with the openvswitch kernel module.
Fixes: 5a51b2cd3483 ("lib/ofpbuf: Remove 'l7' pointer.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Skipping further processing of invalid IP packets helps avoid crashes
but it does not help to figure out if the malformed packets are still
present on the network.
Add coverage counters for IPv4 and IPv6 sanity checks so that we know
there are some invalid packets.
Dump such whole packets in debug mode.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Although not required, padding can be optionally added until
the packet length is MTU bytes. A packet with extra padding
currently fails sanity checks.
Vulnerability: CVE-2020-35498
Fixes: fa8d9001a624 ("miniflow_extract: Properly handle small IP packets.")
Reported-by: Joakim Hindersson <joakim.hindersson@elastx.se>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
l3_ofs should be set all Ethernet packets, not just IPv4/IPv6 ones.
For example for ARP over VLAN tagged packets, it may cause wrong
processing like in changing the VLAN ID action. Fix it.
Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
parse_tcp_flags() does not care about vlan tags in a packet thus
not able to parse them. As a result, if partial offloading is
enabled in userspace datapath vlan packets are not parsed, i.e.
has no initialized offsets. This causes OVS crash on any attempt
to access/modify packet header fields.
For example, having the flow with following actions:
in_port=1,ip,actions=mod_nw_src:192.168.0.7,output:IN_PORT
will lead to OVS crash on vlan packet handling:
Process terminating with default action of signal 11 (SIGSEGV)
Invalid read of size 4
at 0x785657: get_16aligned_be32 (unaligned.h:249)
by 0x785657: odp_set_ipv4 (odp-execute.c:82)
by 0x785657: odp_execute_masked_set_action (odp-execute.c:527)
by 0x785657: odp_execute_actions (odp-execute.c:894)
by 0x74CDA9: dp_netdev_execute_actions (dpif-netdev.c:7355)
by 0x74CDA9: packet_batch_per_flow_execute (dpif-netdev.c:6339)
by 0x74CDA9: dp_netdev_input__ (dpif-netdev.c:6845)
by 0x74DB6E: dp_netdev_input (dpif-netdev.c:6854)
by 0x74DB6E: dp_netdev_process_rxq_port (dpif-netdev.c:4287)
by 0x74E863: dpif_netdev_run (dpif-netdev.c:5264)
by 0x703F57: type_run (ofproto-dpif.c:370)
by 0x6EC8B8: ofproto_type_run (ofproto.c:1760)
by 0x6DA52B: bridge_run__ (bridge.c:3188)
by 0x6E083F: bridge_run (bridge.c:3252)
by 0x1642E4: main (ovs-vswitchd.c:127)
Address 0xc is not stack'd, malloc'd or (recently) free'd
Fix that by properly parsing vlan tags first. Function 'parse_dl_type'
transformed for that purpose as it had no users anyway.
Added unit test for packet modification with partial offloading that
triggers above crash.
Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
OVS has no structure definition for ICMPv6 header with additional
data. More precisely, it has, but this structure named as
'icmp6_error_header' and only suitable to store error related
extended information. 'flow_compose_l4' stores additional
information in reserved bits by using system defined structure
'icmp6_hdr', which is marked as 'packed' and this leads to
build failure with gcc >= 9:
lib/flow.c:3041:34: error:
taking address of packed member of 'struct icmp6_hdr' may result
in an unaligned pointer value [-Werror=address-of-packed-member]
uint32_t *reserved = &icmp->icmp6_dataun.icmp6_un_data32[0];
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix that by renaming 'icmp6_error_header' to 'icmp6_data_header'
and allowing it to store not only errors, but any type of additional
information by analogue with 'struct icmp6_hdr'.
All the usages of 'struct icmp6_hdr' replaced with this new structure.
Removed redundant conversions between network and host representations.
Now fields are always in be.
This also, probably, makes flow_compose_l4 more robust by avoiding
possible unaligned accesses to 32 bit value.
Fixes: 9b2b84973db7 ("Support for match & set ICMPv6 reserved and options type fields")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The padding length is (packet size - ipv6 header length - ipv6 plen). This
patch fixes incorrect padding size checking in ipv6_sanity_check.
Acked-by: William Tu <u9012063@gmail.com>
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
For untagged traffic, it is unnecessary to clear vlan_hdrs as it costs 32B
memset. So the patch improves it by postponing to clear vlan_hdrs until
ethtype check. It can benefit both untagged and single-tagged traffic. From
testing, it does not impact performance of dual-tagged traffic.
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch merges two separate if-else branches for metadata connection state
into one if-else branch to improve performance. It gives an average performance
improvement of ~3% on arm platforms and ~4.5% on x86 platforms.
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Reviewed-by: Yanqin Wei <yanqin.wei@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The following, run inside the OVS sandbox, caused OVS to abort when
Address Sanitizer was used:
ovs-vsctl add-br br-int
ovs-ofctl add-flow br-int "table=0,cookie=0x1234,priority=10000,icmp,actions=drop"
ovs-ofctl --strict del-flows br-int "table=0,cookie=0x1234/-1,priority=10000"
Sample report from Address Sanitizer:
==3029==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000043260 at pc 0x7f6b09c2459b bp 0x7ffcb67e7540 sp 0x7ffcb67e6cf0
READ of size 40 at 0x603000043260 thread T0
#0 0x7f6b09c2459a (/lib/x86_64-linux-gnu/libasan.so.5+0xb859a)
#1 0x565110a748a5 in minimask_equal ../lib/flow.c:3510
#2 0x565110a9ea41 in minimatch_equal ../lib/match.c:1821
#3 0x56511091e864 in collect_rules_strict ../ofproto/ofproto.c:4516
#4 0x56511093d526 in delete_flow_start_strict ../ofproto/ofproto.c:5959
#5 0x56511093d526 in ofproto_flow_mod_start ../ofproto/ofproto.c:7949
#6 0x56511093d77b in handle_flow_mod__ ../ofproto/ofproto.c:6122
#7 0x56511093db71 in handle_flow_mod ../ofproto/ofproto.c:6099
#8 0x5651109407f6 in handle_single_part_openflow ../ofproto/ofproto.c:8406
#9 0x5651109407f6 in handle_openflow ../ofproto/ofproto.c:8587
#10 0x5651109e40da in ofconn_run ../ofproto/connmgr.c:1318
#11 0x5651109e40da in connmgr_run ../ofproto/connmgr.c:355
#12 0x56511092b129 in ofproto_run ../ofproto/ofproto.c:1826
#13 0x5651108f23cd in bridge_run__ ../vswitchd/bridge.c:2965
#14 0x565110904887 in bridge_run ../vswitchd/bridge.c:3023
#15 0x5651108e659c in main ../vswitchd/ovs-vswitchd.c:127
#16 0x7f6b093b709a in __libc_start_main ../csu/libc-start.c:308
#17 0x5651108e9009 in _start (/home/blp/nicira/ovs/_build/vswitchd/ovs-vswitchd+0x11d009)
This fixes the problem, which although largely theoretical could crop up
with odd implementations of memcmp(), perhaps ones optimized in various
"clever" ways. All in all, it seems best to avoid the theoretical problem.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
UDP source and destination ports are not used to derive the hash index
used for selecting the bucket in case of SYMMETRIC_L4 hash based select
groups. However, they are un-wildcarded in the megaflow entry match criteria.
This results in distinct megaflow entry being created for each pair of UDP
source and destination ports unnecessarily and causes significant performance
deterioration when the megaflow cache limit is reached.
This patch wildcards UDP ports when using select group with SYMMETRIC_L4
hash function.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
For a series of IP fragments, only the first packet includes the transport
header (TCP/UDP/SCTP) and the src/dst ports. By including these port
numbers in the hash, it may happen that a first fragment hashes to a
different value than subsequent packets, causing different packets from
the same flow to follow different paths. This in turn may result in
out-of-order delivery or failed reassembly. This patch excludes port
numbers from the hash calculation in case of IP fragmentation.
Signed-off-by: Jeroen van Bemmel <jeroen.van_bemmel@nokia.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The assertions make it easier to find all the places that need to be
updated when adding protocol support.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
state_s should be freed always before exit parse_ct_state
Fixes: b4293a336d8d ("conntrack: Move ct_state parsing to lib/flow.c")
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.
The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.
This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Co-authored-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com>
Signed-off-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
As per RFC 768, if the calculated UDP checksum is 0, it should be
instead set as 0xFFFF in the frame. A value of 0 in the checksum
field indicates to the receiver that no checksum was calculated
and hence it should not verify the checksum.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This is the first patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The patch detects OOR condition on a netdev port when ENOSPC error is
returned by TC-Flower while adding a flow rule. A new structure is added
to the netdev called "netdev_hw_info", to store OOR related information
required to perform dynamic offload-rebalancing.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
In the default case when nsh's md_type is not recognized by nsh parser,
uninitialized data in key->context can sneak into miniflow. This
patch fixes it.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10519
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add a symmetric_l3 hash method that uses both network destination
address and network source address.
VMware-BZ: #2112940
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When parse_ipv6_ext_hdrs__() returned false, half a 64-bit word had been
pushed into the miniflow and the second half was left uninitialized. This
commit fixes the problem.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10518
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
By default, these function are to change the first vlan vid and pcp
in the flow. Add a parameter as index for vlans if we want to handle
the second ones.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The ipv6_sanity_check() function implemented a check for IPv6 payload
length wrong: ip6_plen is the payload length but this function checked
whether it was longer than the total length of IPv6 header plus payload.
This meant that a packet with a crafted ip6_plen could result in a buffer
overread of up to the length of an IPv6 header (40 bytes).
The kernel datapath flow extraction code does not obviously have a similar
problem.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9287
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
So that we could skip some very costly CPU operations, including but
not limiting to miniflow_extract, emc lookup, dpcls lookup, etc. Thus,
performance could be greatly improved.
A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more that
260% performance boost.
Note that though the heavy miniflow_extract is skipped, we still have
to do per packet checking, due to we have to check the tcp_flags.
Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Those checks were part of the miniflow extractor. Moving them out to
act as a general helpers for packet validation.
Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit implements a new dp_hash algorithm OVS_HASH_L4_SYMMETRIC in
the netdev datapath. It will be used as default hash algorithm for the
dp_hash-based select groups in a subsequent commit to maintain
compatibility with the symmetry property of the current default hash
selection method.
A new dpif_backer_support field 'max_hash_alg' is introduced to reflect
the highest hash algorithm a datapath supports in the dp_hash action.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Co-authored-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ERSPAN is a tunneling protocol based on GRE tunnel. The patch
add erspan tunnel support for ovs-vswitchd with userspace datapath.
Configuring erspan tunnel is similar to gre tunnel, but with
additional erspan's parameters. Matching a flow on erspan's
metadata is also supported, see ovs-fields for more details.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until mow, this macro has blindly read the passed-in type's size, but
that's unnecessarily risky. This commit changes it to verify that the
passed-in type is the same size as the field and, on GCC and Clang, that
the types are compatible. It also adds a version that does not check,
for the one case where (currently) we deliberately read the wrong size,
and updates a few uses to use more precise field names.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Armando Migliaccio <armamig@gmail.com>
This makes traffic generated by flow_compose() look slightly more
realistic. It requires lots of updates to tests, but at least the tests
themselves should be slightly more realistic too.
At the same time, add --l7 and --l7-len options to ofproto/trace to allow
users to specify the amount or contents of payloads that they want.
Suggested-by: Brad Cowie <brad@cowie.nz>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Each of the cases in flow_compose_l4() separately tracked the number of
bytes of L4 data added to the packet. This commit makes the function do
that in a single place without per-protocol bookkeeping.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>