mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

Author	SHA1	Message	Date
Ilya Maximets	40ba3fc936	netdev-native-tnl: Fix use of uninitialized RSS hash. RSS hash calculation for a packet may be skipped in some cases. One of them is the simple match optimization. The packet is not fully parsed for the simple match, so there is not enough data to calculate the full 5-tuple hash. However, when such a packet needs tunnel encapsulation, we need the RSS hash to calculate the source port for the outer UDP header. And the netdev_tnl_get_src_port() function doesn't check if the hash is valid before using it. So, such packets will likely end up with different and unpredictable source ports, potentially causing packet reordering or other issues in the network: WARNING: MemorySanitizer: use-of-uninitialized-value 0 0x10c129c in dp_packet_get_rss_hash lib/dp-packet.h:1029:5 1 0x10b264c in netdev_tnl_get_src_port lib/netdev-native-tnl.h:131:12 2 0x10b171a in netdev_tnl_push_udp_header lib/netdev-native-tnl.c:286:20 3 0xb772fe in netdev_push_header lib/netdev.c:1037:13 4 0x9673c4 in push_tnl_action lib/dpif-netdev.c:9067:11 5 0x961abe in dp_execute_cb lib/dpif-netdev.c:9226:13 6 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17 7 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5 8 0x968f3f in dp_execute_userspace_action lib/dpif-netdev.c:9093:9 9 0x962e54 in dp_execute_cb lib/dpif-netdev.c:9307:17 10 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17 11 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5 12 0x950fef in packet_batch_per_flow_execute lib/dpif-netdev.c:8271:5 13 0x8ec8db in dp_netdev_input__ lib/dpif-netdev.c:8899:9 14 0x8eb8ec in dp_netdev_input lib/dpif-netdev.c:8908:5 15 0x92d5e8 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5660:19 16 0x8ee2c4 in dpif_netdev_run lib/dpif-netdev.c:6993:25 17 0x9b442f in dpif_run lib/dpif.c:471:16 18 0x5f8e3a in type_run ofproto/ofproto-dpif.c:367:9 19 0x56c508 in ofproto_type_run ofproto/ofproto.c:1879:31 20 0x4cb388 in bridge_run__ vswitchd/bridge.c:3281:9 21 0x4c9b00 in bridge_run vswitchd/bridge.c:3346:5 22 0x526043 in main vswitchd/ovs-vswitchd.c:130:9 23 0x7f1192 in __libc_start_call_main 24 0x7f1192 in __libc_start_main@GLIBC_2.2.5 25 0x432b24 in _start (vswitchd/ovs-vswitchd+0x432b24) The issue is caught by running the 'debug_slow' test under the memory sanitizer. Another way to reproduce is by sending two packets at once through the datapath. The first one will get the same memory chunk as the upcalled packet with already calculated RSS, the second one will get the brand new memory chunk without the calculated RSS, so these two packets will have different source ports after encapsulation. The test is updated to cover this case. Fix the issue by checking if the hash is valid before using it, re-parsing and calculating if it is not. The netdev_tnl_get_src_port() function is moved to the .c file, since there is no real reason for it to be in the header. The compiler can decide on inlining it. The declaration is kept in the header, since all the other functions are declared there, even if there is no reason for that. In the future we may want to consolidate all the places where we re-calculate the RSS hash into a single function, but it's a little tricky. This is also a larger change that would be harder to backport. So, not touching that aspect for now. Re-parsing the packet eliminates the advantages of the simple match, but it was designed primarily for very simple setups that do not involve tunneling or any other complex processing, so it should not be a big problem. And simple match can still be used with tunneling when the input port provides the RSS hash. Also, checking if the hash is valid is the right thing to do anyway. Next step might be to not use simple match when there is no RSS hash and there is a tunnel push action, but it seems hard to implement, especially since we don't know the actions until we look up the flow. Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-12-02 20:21:26 +01:00
Eelco Chaudron	4a9c06ba0a	netdev-native-tnl: Fix Coverity integer overflows report. Fixed potential integer overflow in netdev_srv6_pop_header(), by making sure the packet length does at least account for the IPv6 header. Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>	2024-08-29 11:18:07 +02:00
Mike Pattrick	d7a9a9eb62	userspace: Correctly set ip offload flag in native tunneling. Coverity identified the following issue CID 425094: (#1 of 1): Unchecked return value (CHECKED_RETURN) 4. check_return: Calling dp_packet_hwol_tx_ip_csum without checking return value (as is done elsewhere 9 out of 11 times). This appears to be a true positive: the field's getter was called instead of its setter. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Reported-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-08-29 11:17:45 +02:00
Mike Pattrick	d7e77143fb	tunnel: Allow UDP zero checksum with IPv6 tunnels. This patch adopts the proposed RFC 6935 by allowing null UDP checksums even if the tunnel protocol is IPv6. This is already supported by Linux through the udp6zerocsumtx tunnel option. It is disabled by default and IPv6 tunnels are flagged as requiring a checksum, but this patch enables the user to set csum=false on IPv6 tunnels. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-12 11:54:45 +02:00
David Marchand	c39a84c131	netdev-dpdk: Refactor tunnel checksum offloading. All information required for checksum offloading can be deduced by already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and inner_l4_ofs fields. Remove DPDK specific l[2-4]_len from generic OVS code. netdev-dpdk code then fills mbuf specifics step by step: - outer_l2_len and outer_l3_len are needed for tunneling (and below features), - l2_len and l3_len are needed for IP and L4 checksum (and below features), - l4_len and tso_segsz are needed when doing TSO, Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>	2024-06-06 17:10:29 +01:00
Mike Pattrick	8359cc422e	netdev-native-tnl: Fix use of uninitialized offset on SRv6 header pop. Clang's static analyzer will complain about uninitialized value 'hlen' because we weren't properly checking the error code from a function that would have initialized the value. Instead, add a check for that return code. Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.") Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-05-28 22:18:55 +02:00
Ilya Maximets	320f7e1a40	srv6: Fix misaligned writes to segment list. The segment list in the SRv6 header is 16-bit aligned, like most other fields in packet headers. A little counter-intuitively, compilers are allowed to make alignment assumptions based on the pointer type passed to memcpy(), so they can use copy instructions that require 32-bit alignment in case of a struct in6_addr pointer. Reported by UBsan in Clang 18: lib/netdev-native-tnl.c:985:16: runtime error: store to misaligned address 0x7fd9e97351ce for type 'struct in6_addr *', which requires 4 byte alignment 0x7fd9e97351ce: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ 0 0xc1de38 in netdev_srv6_build_header lib/netdev-native-tnl.c:985:9 1 0x6e794b in tnl_port_build_header ofproto/tunnel.c:751:11 2 0x6c9c0a in native_tunnel_output ofproto/ofproto-dpif-xlate.c:3887:11 3 0x6c9c0a in compose_output_action__ ofproto/ofproto-dpif-xlate.c:4502:13 4 0x6b6646 in compose_output_action ofproto/ofproto-dpif-xlate.c:4564:5 5 0x6b6646 in xlate_output_action ofproto/ofproto-dpif-xlate.c:5517:13 6 0x68cfee in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7288:13 7 0x67fed0 in xlate_actions ofproto/ofproto-dpif-xlate.c:8314:13 8 0x6468bd in ofproto_trace__ ofproto/ofproto-dpif-trace.c:782:30 9 0x64484a in ofproto_trace ofproto/ofproto-dpif-trace.c:851:5 10 0x647469 in ofproto_unixctl_trace ofproto/ofproto-dpif-trace.c:490:9 11 0xc33771 in process_command lib/unixctl.c:310:13 12 0xc33771 in run_connection lib/unixctl.c:344:17 13 0xc33771 in unixctl_server_run lib/unixctl.c:395:21 14 0x53e6ef in main vswitchd/ovs-vswitchd.c:131:9 15 0x7f61c7 in __libc_start_call_main (/lib64/libc.so.6+0x2a1c7) 16 0x7f628a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a28a) 17 0x42ca24 in _start (vswitchd/ovs-vswitchd+0x42ca24) SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/netdev-native-tnl.c:985:16 Having misaligned pointers is also generally not allowed in C, let alone accessing memory through them. Fix that by using an appropriate ovs_16aligned_in6_addr pointer instead. Fixes: 7381fd440a88 ("odp: Add SRv6 tunnel actions.") Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.") Reviewed-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-05-22 23:00:27 +02:00
Mike Pattrick	f81d782c19	netdev-native-tnl: Mark all vxlan/geneve packets as tunneled. Previously some packets were excluded from the tunnel mark if they weren't L4. However, this causes problems with multi-encapsulated packets like ARP. Due to these flags being set, additional checks are required in the checksum modification code. Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-02-16 15:23:26 +01:00
Mike Pattrick	85bcbbed83	userspace: Enable tunnel tests with TSO. This patch enables most of the tunnel tests in the testsuite, and adds a large TCP transfer to a vxlan and geneve test to verify TSO functionality. Some additional changes were required to accommodate these changes with netdev-linux interfaces. The test for vlan over vxlan is purposely not enabled as the traffic produced by this test gives incorrect values in the vnet header. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-01-17 22:06:51 +01:00
Dexia Li	084c808729	userspace: Support VXLAN and GENEVE TSO. For the userspace datapath, this patch provides VXLAN and GENEVE tunnel TSO. Only userspace VXLAN and GENEVE tunnels are supported; tunnel outer and inner checksum offload are supported as well. If the netdev does not support the offload features, there is a software fallback. If the netdev does not support VXLAN and GENEVE TSO, packets will be dropped. Front-end devices can also disable offload features with ethtool. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com> Co-authored-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-01-17 22:06:45 +01:00
James Raphael Tiovalen	010c256caa	lib: Add non-null assertions to some return values of `dp_packet_data`. This commit adds some `ovs_assert()` checks to some return values of `dp_packet_data()` to ensure that they are not NULL and to prevent null-pointer dereferences, which might lead to unwanted crashes. We use assertions since it should be impossible for these calls to `dp_packet_data()` to return NULL. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-09-25 12:53:06 +02:00
Mike Pattrick	3337e6d91c	userspace: Enable L4 checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the L4 checksum was verified and it is OK or BAD, otherwise the stack will check when appropriate by software. If the packet comes with a good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulating a packet with that flag, set the checksum of the inner L4 header since that is not yet supported. Calculate the L4 checksum when the packet is going to be sent over a device that doesn't support the feature. Linux tap devices allow enabling L3 and L4 offload, so this patch enables the feature. However, the Linux socket interface remains disabled because the API doesn't allow enabling those two features without enabling TSO too. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:50:30 +02:00
Mike Pattrick	5d11c47d3e	userspace: Enable IP checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the IP checksum was verified and it is GOOD or BAD, otherwise the stack will check when appropriate by software. If the packet comes with a good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulating a packet with that flag, set the checksum of the inner IP header since that is not yet supported. Calculate the IP checksum when the packet is going to be sent over a device that doesn't support the feature. Linux devices don't support IP checksum offload alone, so the support is not enabled. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:49:51 +02:00
Nobuhiro MIKI	701c2dbfb8	userspace: Add new option srv6_flowlabel in SRv6 tunnel. It supports flowlabel based load balancing by controlling the flowlabel of outer IPv6 header, which is already implemented in Linux kernel as seg6_flowlabel sysctl [1]. [1]: https://docs.kernel.org/networking/seg6-sysctl.html Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 17:08:32 +02:00
Nobuhiro MIKI	f328fd4892	netdev-native-tnl: Add ipv6_label param in netdev_tnl_ip_build_header. For tunnels such as SRv6, some popular vendor appliances support IPv6 flowlabel based load balancing. In preparation for OVS to support it, this patch modifies the encapsulation to allow IPv6 flowlabel to be configured. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:41 +02:00
Nobuhiro MIKI	eb8c19ebac	netdev-native-tnl: Add ipv6_label param in netdev_tnl_push_ip_header. For tunnels such as SRv6, some popular vendor appliances support IPv6 flowlabel based load balancing. In preparation for OVS to support it, this patch modifies the encapsulation to allow IPv6 flowlabel to be configured. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:41 +02:00
Ilya Maximets	ce8828a372	netdev-vport: RCU-fy tunnel config. Tunnel config can be accessed by multiple threads at the same time and it is supposed to be protected by the netdev_vport mutex. However, many functions are getting direct access to it via netdev API without taking the mutex, creating a potential for various race conditions. Fix that by protecting the tunnel config with RCU. The whole structure is replaced on configuration changes. Individual fields are never updated and the structure itself is constant. This way it can be safely used by different threads within RCU grace period. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:35 +02:00
Ilya Maximets	be6f096fbe	netdev-vport: Fix unsafe handling of GRE sequence number. The GRE sequence number is maintained as part of the tunnel config. This triggers tunnel reconfiguration every time set_tunnel_config() is called, because memset over tunnel config will never be equal to the new config constructed from database options. And the sequence number is incremented non-atomically without holding a mutex on tunnel push, which may lead to corruption if multiple threads are sending packets to the same tunnel. Fix that by moving the sequence number to the netdev_vport structure instead and using an atomic counter. Fixes: 0ffff4975308 ("userspace: add gre sequence number support.") Fixes: 7dc18ae96d33 ("userspace: add erspan tunnel support.") Fixes: 3c6d05a02e0f ("userspace: Add GTP-U support.") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:08 +02:00
Nobuhiro MIKI	03fc1ad785	userspace: Add SRv6 tunnel support. The SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulating and decapsulating the inner packets with an IPv6 header and an extension header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in the userspace datapath. It uses the `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called the segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 22:16:04 +02:00
Yunjian Wang	46ee18f884	lib: Remove duplicated includes Remove duplicated includes. Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: William Tu <u9012063@gmail.com>	2020-07-14 06:45:44 -07:00
William Tu	3c6d05a02e	userspace: Add GTP-U support. GTP, GPRS Tunneling Protocol, is a group of IP-based communications protocols used to carry general packet radio service (GPRS) within GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling (GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used for setting up GTP-U protocol, which is an IP-in-UDP tunneling protocol. Usually GTP is used in connecting between base station for radio, Serving Gateway (S-GW), and PDN Gateway (P-GW). This patch implements GTP-U protocol for userspace datapath, supporting only required header fields and G-PDU message type. See spec in: https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00 Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784 Signed-off-by: Feng Yang <yangfengee04@gmail.com> Co-authored-by: Feng Yang <yangfengee04@gmail.com> Signed-off-by: Yi Yang <yangyi01@inspur.com> Co-authored-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>	2020-03-25 20:26:51 -07:00
William Tu	9bf871c401	trivial: Fix erspan coding style. Fix indentation and whitespace. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>	2019-12-03 16:24:25 -08:00
Ben Pfaff	f5129153e3	treewide: Remove pointless "return;" at ends of functions. Found with: git ls-files | xargs pcregrep -n -M 'return;\n*}' Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Darrell Ball <dlu998@gmail.com> Tested-by: Darrell Ball <dlu998@gmail.com>	2018-07-09 20:53:06 -07:00
Darrell Ball	8af725e8b5	netdev-native-tnl: Fix alignment for erspan index. Flagged by clang. CC: William Tu <u9012063@gmail.com> Fixes: 068794b43f0e ("erspan: Add flow-based erspan options") Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>	2018-05-24 10:13:25 -07:00
Greg Rose	068794b43f	erspan: Add flow-based erspan options The patch adds support for flow-based erspan options. The erspan_ver, erspan_idx, erspan_dir, and erspan_hwid can be set to "flow" so that their values are set by the OpenFlow rule, instead of being statically configured at port creation time. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
Greg Rose	3b10ceeed1	ip6gre: Add ip6gre vport type Add handlers for OVS_VPORT_TYPE_IP6GRE Cc: Ben Pfaff <blp@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>	2018-05-21 20:33:30 -07:00
William Tu	7dc18ae96d	userspace: add erspan tunnel support. ERSPAN is a tunneling protocol based on the GRE tunnel. The patch adds erspan tunnel support for ovs-vswitchd with the userspace datapath. Configuring an erspan tunnel is similar to configuring a gre tunnel, but with additional erspan parameters. Matching a flow on erspan's metadata is also supported, see ovs-fields for more details. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
William Tu	0ffff49753	userspace: add gre sequence number support. The patch adds support for the gre sequence number. It is disabled by default. When enabled with 'options:seq=true', the outgoing gre packet will have its sequence number incremented by one. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
William Tu	754f8acb45	netdev-native-tnl: refactor the tunnel push header. The patch adds additional 'struct netdev *' to the native tunnel's push_header() interface. This is used for later GRE sequence number support. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
William Tu	3456684ed0	userspace: return correct ipv6 header len. The ipv6 header may be followed by extension headers, but the current code simply returns the fixed 40-byte ipv6 header length. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-04-04 17:23:18 -07:00
Bhanuprakash Bodireddy	1fc2e1bd07	netdev-native-tnl: Add assertion in vxlan_pop_header. During tunnel decapsulation the below steps are performed: [1] Tunnel information is populated in packet metadata i.e packet->md->tunnel. [2] Outer header gets popped. [3] Packet is recirculated. For [1] to work, the dp_packet L3 and L4 header offsets should be valid. The offsets in the dp_packet are set as part of miniflow extraction. If offsets are accidentally reset (or) the pop header operation is performed prior to miniflow extraction, step [1] fails silently and creates issues that are harder to debug. Add the assertion to check if the offsets are valid. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-12 13:02:18 -08:00
Ben Pfaff	b2befd5bb2	sparse: Add guards to prevent FreeBSD-incompatible #include order. FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and that <netinet/in.h> be included before <arpa/inet.h>. This adds guards to the "sparse" headers to yield a warning if this order is violated. This commit also adjusts the order of many #includes to suit this requirement. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2017-12-22 12:58:02 -08:00
Jan Scheurich	478b14731c	userspace: add NSH support to vxlan-gpe tunnels Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-08-07 11:26:16 -07:00
Sugesh Chandran	7c12dfc527	tunneling: Avoid datapath-recirc by combining recirc actions at xlate. This patch set removes the recirculation of encapsulated tunnel packets if possible. It is done by computing the post tunnel actions at the time of translation. The combined nested action set is programmed in the datapath using the CLONE action. The following test results show the performance improvement offered by this optimization for tunnel encap. [Diagram: test topology with bridge br-in (ports dpdk0 and gre0) and bridge br-p1 (ports LOCAL and dpdk1)] Test result on OVS master with DPDK 16.11.2 (Without optimization): # dpdk0 RX packets : 7037641.60 / sec RX packet errors : 0 / sec RX packets dropped : 7730632.90 / sec RX rate : 402.69 MB/sec # dpdk1 TX packets : 7037641.60 / sec TX packet errors : 0 / sec TX packets dropped : 0 / sec TX rate : 657.73 MB/sec TX processing cost per TX packets in nsec : 142.09 Test result on OVS master + DPDK 16.11.2 (With optimization): # dpdk0 RX packets : 9386809.60 / sec RX packet errors : 0 / sec RX packets dropped : 5381496.40 / sec RX rate : 537.11 MB/sec # dpdk1 TX packets : 9386809.60 / sec TX packet errors : 0 / sec TX packets dropped : 0 / sec TX rate : 877.29 MB/sec TX processing cost per TX packets in nsec : 106.53 The offered performance gain is approx 30%. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-07-19 14:34:20 -07:00
Sugesh Chandran	ce8bbd37ff	tunneling: Calculate and update packet l4_offset in tunnel push. The following tunnel combine patch series avoids recirculating packets after the tunnel push. So it is necessary to populate all relevant packet metadata fields for the following combined action set. Consider the chained tunnel test case shown below, PKT-IN --> TUNNEL_PUSH --> MOD_PKT_HDR --> TUNNEL_POP In this example, the last tunnel_pop operation uses the l4_offset in the packet to validate the packets. So it must be calculated and updated in the packet before executing the action. Since there is no recirculation from now on, this calculation is done as part of tunnel_push. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-07-19 14:34:20 -07:00
Ben Pfaff	875ab13020	userspace: Handling of versatile tunnel ports In netdev_gre_build_header(), the GRE protocol and the VXLAN next_protocol are set based on the packet_type of the flow. For an Ethernet packet, it is set to ETP_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set according to the name space type. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-27 17:28:30 -04:00
Georg Schmuecking	439f39cb9b	userspace: add vxlan gpe support to vport This patch is based on the "datapath: enable vxlangpe creation in compat mode" from Yi Yang. It introduces an extension option "gpe" to the vxlan port in the netdev-dpdk datapath. A description of the vxlan gpe protocol was added to the header file lib/packets.h. In the vxlan specific methods the different packet types are introduced and handled. Added VXLAN GPE tunnel push test. Signed-off-by: Yi Yang <yi.y.yang at intel.com> Signed-off-by: Georg Schmuecking <georg.schmuecking@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 15:01:20 -07:00
Jan Scheurich	63171f047f	userspace: L3 tunnel support for GRE and LISP Add a boolean "layer3" configuration option for tunnel vports. The layer3 option defaults to false for all ports except LISP. GRE ports accept both true and false for "layer3". A tunnel vport configured with layer3=true receives L3 packets, which are then converted to Ethernet packets by pushing a dummy Ethernet header at the ingress of the OpenFlow pipeline. The Ethernet header of a packet is stripped before sending to a layer3 tunnel vport. Presently a single GRE vport cannot carry both L2 and L3 packets. But it is possible to create two GRE vports representing the same GRE tunnel, one with layer3=false, the other with layer3=true. L2 packets from the tunnel are received on the first vport, L3 packets on the second. The controller must send packets to the layer3 GRE vport to tunnel them without their Ethernet header. Unit tests have been added to check the L3 tunnel handling. LISP tunnels are not yet supported by the netdev userspace datapath. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 14:40:34 -07:00
Jan Scheurich	2482b0b0c8	userspace: Add packet_type in dp_packet and flow This commit adds a packet_type attribute to the structs dp_packet and flow to explicitly carry the type of the packet as preparation for the introduction of the so-called packet type-aware pipeline (PTAP) in OVS. The packet_type is a big-endian 32 bit integer with the encoding as specified in OpenFlow version 1.5. The upper 16 bits contain the packet type name space. Pre-defined values are defined in openflow-common.h: enum ofp_header_type_namespaces { OFPHTN_ONF = 0, /* ONF namespace. */ OFPHTN_ETHERTYPE = 1, /* ns_type is an Ethertype. */ OFPHTN_IP_PROTO = 2, /* ns_type is a IP protocol number. */ OFPHTN_UDP_TCP_PORT = 3, /* ns_type is a TCP or UDP port. */ OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */ }; The lower 16 bits specify the actual type in the context of the name space. Only name spaces 0 and 1 will be supported for now. For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet). This is the default packet_type in OVS and the only one supported so far. Packets of type (OFPHTN_ONF, 0) are called Ethernet packets. In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet. A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet with the Ethernet header (and any VLAN tags) removed to expose the L3 (or L2.5) payload of the packet. These will simply be called L3 packets. The Ethernet address fields dl_src and dl_dst in struct flow are not applicable for an L3 packet and must be zero. However, to maintain compatibility with the large code base, we have chosen to copy the Ethertype of an L3 packet into the dl_type field of struct flow. This does not mean that it will be possible to match on dl_type for L3 packets with PTAP later on. Matching must be done on packet_type instead. New dp_packets are initialized with packet_type Ethernet. Ports that receive L3 packets will have to explicitly adjust the packet_type. Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-03 16:56:40 -07:00
Sugesh Chandran	1a2bb11817	netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports. Add Rx checksum offloading feature support on DPDK physical ports. By default, Rx checksum offloading is enabled if the NIC supports it. However, checksum offloading can be turned OFF either while adding a new DPDK physical port to OVS or at runtime. Rx checksum offloading can be turned off by setting the parameter to 'false'. For example, to disable Rx checksum offloading when adding a port, 'ovs-vsctl add-port br0 dpdk0 -- \ set Interface dpdk0 type=dpdk options:rx-checksum-offload=false' OR (to disable at run time after the port has been added to OVS) 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false' Similarly, to turn ON Rx checksum offloading at run time, 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true' Tx checksum offloading support is not implemented for the following reasons. 1) Checksum offloading and vectorization are mutually exclusive in the DPDK poll mode driver. Vector packet processing is turned OFF when checksum offloading is enabled, which causes a significant performance drop at the Tx side. 2) Normally, OVS generates the checksum for tunnel packets in software at the 'tunnel push' operation, where the tunnel headers are created. However, enabling Tx checksum offloading involves, *) Mark every packet for Tx checksum offloading at 'tunnel_push' and recirculate. *) At the time of xmit, validate the same flag and instruct the NIC to do the checksum calculation. In case the NIC doesn't support Tx checksum offloading, the checksum calculation has to be done in software before sending out the packets. No significant performance improvement was noticed with Tx checksum offloading due to the overhead of additional validations + non-vector packet processing. In some test scenarios, it introduces a performance drop too. Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling decapsulation even though the SSE vector Rx function is disabled in the DPDK poll mode driver. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Acked-by: Jesse Gross <jesse@kernel.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2017-01-04 01:10:35 -08:00
Ryan Moats	ece9c2947d	Explain initialization when using csum(). The checksum method csum() requires its output location to be initialized to zero when that output location is part of the checksum. Add comments to the various places where csum is called documenting where the initialization has occurred. Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-07-24 11:47:28 -07:00
Ben Pfaff	3d75c66007	netdev-native-tnl: Fix treatment of GRE key on big-endian systems. The GRE implementation used bitwise shifts to convert an ovs_be32 to an ovs_be64 (with zero extension), but on big-endian systems these conversions are no-ops. This fixes the problem. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Gerhard Stenzel <gstenzel@linux.vnet.ibm.com>	2016-06-03 13:18:19 -07:00
Thadeu Lima de Souza Cascardo	68da36feee	netdev-vport: Update copyright headers Red Hat has contributed to the original code that has moved to netdev-native-tnl module and to code that has been kept in netdev-vport as well. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Jesse Gross <jesse@kernel.org>	2016-06-02 12:48:51 -07:00
Thadeu Lima de Souza Cascardo	aca40d4f49	netdev-vport: remove unneeded headers. Throughout the years, changes in netdev vport have removed the need for some of the headers, like shash, hmap, and many others. With the recent split of push/pop code, fewer headers are needed in each of the two modules. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Jesse Gross <jesse@kernel.org>	2016-06-02 11:50:34 -07:00
Pravin B Shelar	98c086db48	netdev-native-tnl: Fix IPv6 tos bits handling. IPv6 tunnels ignore the outer tos bits on receive and do not set them on xmit. The following patch fixes it. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-23 20:27:14 -07:00
Pravin B Shelar	4975aa3ee6	netdev-native-tnl: Introduce ip_build_header(). The native tunneling code that builds the tunnel header is spread across two different modules, which makes the code pretty hard to follow. The following patch refactors it to move all of that code into the netdev-native-tnl module. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-23 20:27:14 -07:00
YAMAMOTO Takashi	67eaddc000	netdev-native-tnl: Fix a build error on NetBSD 7.0 netinet/ip6.h is not a standalone header there. Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org> Tested-by: Jeff Feng <jianhua@us.ibm.com>	2016-05-23 13:49:39 +09:00
Pravin B Shelar	1c8f98d96a	netdev: Return number of packets from netdev_pop_header(). The current tunnel-pop API does not allow the netdev implementation to retain a packet, but STT can keep a packet from a batch of packets during TCP reassembly processing. To return the exact count of valid packets, STT needs the number-of-packets parameter to be passed by reference. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	6b241d6452	netdev-vport: Factor-out tunnel Push-pop code into separate module. It is better to move tunnel push-pop action specific functions into separate module. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
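
The note in commit ece9c2947d about zeroing the output location before calling csum() can be illustrated with a small sketch. It is written in Python rather than the C of the OVS tree, and `internet_checksum` and `fill_ipv4_checksum` are hypothetical helper names, not OVS functions. It assumes csum() is the usual RFC 1071 ones'-complement checksum and uses an IPv4 header, whose checksum field (bytes 10-11) is part of the summed data.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum (hypothetical helper, not OVS code)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fill_ipv4_checksum(header: bytearray) -> None:
    """Zero the checksum field first, then store the computed checksum."""
    header[10:12] = b"\x00\x00"  # the output location is part of the input
    struct.pack_into("!H", header, 10, internet_checksum(bytes(header)))


if __name__ == "__main__":
    # 20-byte IPv4 header with a stale value (0x1234) in the checksum field.
    hdr = bytearray.fromhex("450000730000400040111234c0a80001c0a800c7")
    fill_ipv4_checksum(hdr)
    print(hdr[10:12].hex())
    # A header carrying a correct checksum sums to zero when re-checked.
    print(internet_checksum(bytes(hdr)))
```

If the field is not zeroed first, the stale bytes are summed along with the rest of the header and the stored result no longer verifies, which is the situation the commit's comments guard against.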

49 Commits
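
Commit 3d75c66007 describes a shift-based widening of the GRE key from ovs_be32 to ovs_be64 that misbehaves on big-endian hosts. The log does not show the old code, so the following is only a Python simulation of the general pitfall, assuming the widening was a left shift by 32 bits on the value as the host CPU sees it. `widen_by_shift` and `widen_portable` are hypothetical names, and the `host_order` argument stands in for the byte order of the machine.

```python
MASK64 = (1 << 64) - 1


def widen_by_shift(key_be32: bytes, host_order: str) -> bytes:
    """Simulate widening a network-order 32-bit key with a host-side shift.

    host_order is "little" or "big" and models the CPU's byte order.
    This is an assumed reconstruction, not the code from the commit.
    """
    host_view = int.from_bytes(key_be32, host_order)  # raw bits as the CPU reads them
    widened = (host_view << 32) & MASK64
    return widened.to_bytes(8, host_order)  # bytes as they would sit in memory


def widen_portable(key_be32: bytes) -> bytes:
    """Widen by value: decode as big-endian, re-encode as 8 big-endian bytes."""
    return int.from_bytes(key_be32, "big").to_bytes(8, "big")


if __name__ == "__main__":
    key = (0x1234).to_bytes(4, "big")
    print(widen_portable(key).hex())
    print(widen_by_shift(key, "little").hex())  # matches the portable result
    print(widen_by_shift(key, "big").hex())     # key lands in the upper half
```

On a little-endian host the shift happens to produce the zero-extended network-order value, while on a big-endian host the correct widening needs no shift at all, so a fix has to convert by value rather than by bit position.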