Segments list in SRv6 header is 16-bit aligned as most of other fields
in packet headers. A little counter-intuitively, compilers are allowed
to make alignment assumptions based on the pointer type passed to
memcpy(), so they can use copy instructions that require 32-bit alignment
in case of struct in6_addr pointer. Reported by UBsan in Clang 18:
lib/netdev-native-tnl.c:985:16: runtime error: store to misaligned
address 0x7fd9e97351ce for type 'struct in6_addr *', which
requires 4 byte alignment
0x7fd9e97351ce: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
0 0xc1de38 in netdev_srv6_build_header lib/netdev-native-tnl.c:985:9
1 0x6e794b in tnl_port_build_header ofproto/tunnel.c:751:11
2 0x6c9c0a in native_tunnel_output ofproto/ofproto-dpif-xlate.c:3887:11
3 0x6c9c0a in compose_output_action__ ofproto/ofproto-dpif-xlate.c:4502:13
4 0x6b6646 in compose_output_action ofproto/ofproto-dpif-xlate.c:4564:5
5 0x6b6646 in xlate_output_action ofproto/ofproto-dpif-xlate.c:5517:13
6 0x68cfee in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7288:13
7 0x67fed0 in xlate_actions ofproto/ofproto-dpif-xlate.c:8314:13
8 0x6468bd in ofproto_trace__ ofproto/ofproto-dpif-trace.c:782:30
9 0x64484a in ofproto_trace ofproto/ofproto-dpif-trace.c:851:5
10 0x647469 in ofproto_unixctl_trace ofproto/ofproto-dpif-trace.c:490:9
11 0xc33771 in process_command lib/unixctl.c:310:13
12 0xc33771 in run_connection lib/unixctl.c:344:17
13 0xc33771 in unixctl_server_run lib/unixctl.c:395:21
14 0x53e6ef in main vswitchd/ovs-vswitchd.c:131:9
15 0x7f61c7 in __libc_start_call_main (/lib64/libc.so.6+0x2a1c7)
16 0x7f628a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a28a)
17 0x42ca24 in _start (vswitchd/ovs-vswitchd+0x42ca24)
SUMMARY: UndefinedBehaviorSanitizer:
undefined-behavior lib/netdev-native-tnl.c:985:16
Having misaligned pointers is also generally not allowed in C, let
alone accessing memory through them.
Fix that by using an appropriate ovs_16aligned_in6_addr pointer instead.
Fixes: 7381fd440a ("odp: Add SRv6 tunnel actions.")
Fixes: 03fc1ad785 ("userspace: Add SRv6 tunnel support.")
Reviewed-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously some packets were excluded from the tunnel mark if they
weren't L4. However, this causes problems with multi encapsulated
packets like arp.
Due to these flags being set, additional checks are required in checksum
modification code.
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch enables most of the tunnel tests in the testsuite, and adds a
large TCP transfer to a vxlan and geneve test to verify TSO
functionality. Some additional changes were required to accommodate these
changes with netdev-linux interfaces. The test for vlan over vxlan is
purposely not enabled as the traffic produced by this test gives
incorrect values in the vnet header.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For userspace datapath, this patch provides vxlan and geneve tunnel tso.
Only support userspace vxlan or geneve tunnel, meanwhile support
tunnel outter and inner csum offload. If netdev do not support offload
features, there is a software fallback.If netdev do not support vxlan
and geneve tso,packets will drop. Front-end devices can close offload
features by ethtool also.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds some `ovs_assert()` checks to some return values of
`dp_packet_data()` to ensure that they are not NULL and to prevent
null-pointer dereferences, which might lead to unwanted crashes. We use
assertions since it should be impossible for these calls to
`dp_packet_data()` to return NULL.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner L4 header since that is not yet supported.
Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
Linux tap devices allows enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For tunnels such as SRv6, some popular vendor appliances support
IPv6 flowlabel based load balancing. In preparation for OVS to
support it, this patch modifies the encapsulation to allow IPv6
flowlabel to be configured.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For tunnels such as SRv6, some popular vendor appliances support
IPv6 flowlabel based load balancing. In preparation for OVS to
support it, this patch modifies the encapsulation to allow IPv6
flowlabel to be configured.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tunnel config can be accessed by multiple threads at the same time and
it is supposed to be protected by the netdev_vport mutex. However,
many functions are getting direct access to it via netdev API without
taking the mutex, creating a potential for various race conditions.
Fix that by protecting the tunnel config with RCU. The whole structure
is replaced on configuration changes. Individual fields are never
updated and the structure itself is constant. This way it can be safely
used by different threads within RCU grace period.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GRE sequence number is maintained as part of the tunnel config.
This triggers tunnel reconfiguration every time set_tunnel_config()
is called, because memset over tunnel config will never be equal to
the new config constructed from database options.
And sequence number incremented non-atomically without holding a
mutex on tunnel push, that may lead to corruption if multiple
threads are sending packets to the same tunnel.
Fix that by moving sequence number to the netdev_vport structure
instead and using an atomic counter.
Fixes: 0ffff49753 ("userspace: add gre sequence number support.")
Fixes: 7dc18ae96d ("userspace: add erspan tunnel support.")
Fixes: 3c6d05a02e ("userspace: Add GTP-U support.")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
SRv6 (Segment Routing IPv6) tunnel vport is responsible
for encapsulation and decapsulation the inner packets with
IPv6 header and an extended header called SRH
(Segment Routing Header). See spec in:
https://datatracker.ietf.org/doc/html/rfc8754
This patch implements SRv6 tunneling in userspace datapath.
It uses `remote_ip` and `local_ip` options as with existing
tunnel protocols. It also adds a dedicated `srv6_segs` option
to define a sequence of routers called segment list.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The patch add supports for flow-based erspan options.
The erspan_ver, erspan_idx, erspan_dir, and erspan_hwid can be
set as "flow" so that its value is set by the openflow rule,
instead of statically configured at port creation time.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ERSPAN is a tunneling protocol based on GRE tunnel. The patch
add erspan tunnel support for ovs-vswitchd with userspace datapath.
Configuring erspan tunnel is similar to gre tunnel, but with
additional erspan's parameters. Matching a flow on erspan's
metadata is also supported, see ovs-fields for more details.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The patch adds support for gre sequence number.
Default is disable. When enable with 'options:seq=true',
the outgoing gre packet will have its sequence number
incremented by one.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The patch adds additional 'struct netdev *' to the
native tunnel's push_header() interface. This is used
for later GRE sequence number support.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The ipv6 header len might have extension header, but current
code simply returns fixed ipv6 header length 40-byte.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
During tunnel decapsulation the below steps are performed:
[1] Tunnel information is populated in packet metadata i.e packet->md->tunnel.
[2] Outer header gets popped.
[3] Packet is recirculated.
For [1] to work, the dp_packet L3 and L4 header offsets should be valid.
The offsets in the dp_packet are set as part of miniflow extraction.
If offsets are accidentally reset (or) the pop header operation is performed
prior to miniflow extraction, step [1] fails silently and creates
issues that are harder to debug. Add the assertion to check if the
offsets are valid.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and
that <netinet/in.h> be included before <arpa/inet.h>. This adds guards to
the "sparse" headers to yield a warning if this order is violated. This
commit also adjusts the order of many #includes to suit this requirement.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
The following tunnel combine patch series avoids the packets recirculation
after the tunnel push. So it is necessary to populate all relevant packet meta
data fields for the following combined action-set.
Consider a chained tunnel test case shown below,
PKT-IN --> TUNNEL_PUSH --> MOD_PKT_HDR --> TUNNEL_POP
In this eg: the last tunnel_pop operation uses the l4_offset in the packet to
validate the packets. So it must be calculated and updated in the packet before
executing the action. Since there is no recirculation now on, this calculation
is doing as part of tunnel_push.
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
In netdev_gre_build_header(), GRE protocol and VXLAN next_potocol is set based
on packet_type of flow. If it's about an Ethernet packet, it is set to
ETP_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set
according to the name space type.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch is based on the "datapath: enable vxlangpe creation in compat mode"
from Yi Yang. It introduces an extension option "gpe" to the vxlan port in the
netdev-dpdk datapath. Description of vxlan gpe protocoll was added to header
file lib/packets.h. In the vxlan specific methods the different packet are
introduced and handled.
Added VXLAN GPE tunnel push test.
Signed-off-by: Yi Yang <yi.y.yang at intel.com>
Signed-off-by: Georg Schmuecking <georg.schmuecking@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add a boolean "layer3" configuration option for tunnel vports.
The layer3 option defaults to false for all ports except LISP.
GRE ports accept both true and false for "layer3".
A tunnel vport configured with layer3=true receives L3 packets.
which are then converted to Ethernet packets by pushing a dummy
Ethernet heder at the ingress of the OpenFlow pipeline. The
Ethernet header of a packet is stripped before sending to a
layer3 tunnel vport.
Presently a single GRE vport cannot carry both L2 and L3 packets.
But it is possible to create two GRE vports representing the same
GRE tunel, one with layer3=false, the other with layer3=true.
L2 packet from the tunnel are received on the first vport, L3
packets on the second. The controller must send packets to the
layer3 GRE vport to tunnel them without their Ethernet header.
Units tests have been added to check the L3 tunnel handling.
LISP tunnels are not yet supported by the netdev userspace datapath.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as prepration for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.
The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow verion 1.5.
The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:
enum ofp_header_type_namespaces {
OFPHTN_ONF = 0, /* ONF namespace. */
OFPHTN_ETHERTYPE = 1, /* ns_type is an Ethertype. */
OFPHTN_IP_PROTO = 2, /* ns_type is a IP protocol number. */
OFPHTN_UDP_TCP_PORT = 3, /* ns_type is a TCP or UDP port. */
OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */
};
The lower 16 bits specify the actual type in the context of the name space.
Only name spaces 0 and 1 will be supported for now.
For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.
In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
whith the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.
The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the the dl_type field of struct flow.
This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.
New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add Rx checksum offloading feature support on DPDK physical ports. By default,
the Rx checksum offloading is enabled if NIC supports. However,
the checksum offloading can be turned OFF either while adding a new DPDK
physical port to OVS or at runtime.
The rx checksum offloading can be turned off by setting the parameter to
'false'. For eg: To disable the rx checksum offloading when adding a port,
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
OR (to disable at run time after port is being added to OVS)
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
Similarly to turn ON rx checksum offloading at run time,
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
The Tx checksum offloading support is not implemented due to the following
reasons.
1) Checksum offloading and vectorization are mutually exclusive in DPDK poll
mode driver. Vector packet processing is turned OFF when checksum offloading
is enabled which causes significant performance drop at Tx side.
2) Normally, OVS generates checksum for tunnel packets in software at the
'tunnel push' operation, where the tunnel headers are created. However
enabling Tx checksum offloading involves,
*) Mark every packets for tx checksum offloading at 'tunnel_push' and
recirculate.
*) At the time of xmit, validate the same flag and instruct the NIC to do the
checksum calculation. In case NIC doesnt support Tx checksum offloading,
the checksum calculation has to be done in software before sending out the
packets.
No significant performance improvement noticed with Tx checksum offloading
due to the e overhead of additional validations + non vector packet processing.
In some test scenarios, it introduces performance drop too.
Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling
decapsulation even though the SSE vector Rx function is disabled in DPDK poll
mode driver.
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
The checksum method csum() requires its output location to be
intialized to zero when that output location is part of the
checksum. Add comments to the various places where csum is
called documenting where the initialization has occurred.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The GRE implementation used bitwise shifts to convert an ovs_be32 to an
ovs_be64 (with zero extension), but on big-endian systems these conversions
are no-ops. This fixes the problem.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Gerhard Stenzel <gstenzel@linux.vnet.ibm.com>
Red Hat has contributed to the original code that has moved to netdev-native-tnl
module and to code that has been kept in netdev-vport as well.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Throughout the years, changes in netdev vport have removed the need for some of
the headers, like shash, hmap, and many others. With the recent split of
push/pop code, less headers are needed in each of the two modules.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
IPv6 tunnels ignores outer tos bits on recieve and does not
set it on xmit. Following patch fixes it.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
The native tunneling build tunnel header code is spread across
two different modules, it makes pretty hard to follow the code.
Following patch refactors the code to move all code to
netdev-ative-tnl module.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Current tunnel-pop API does not allow the netdev implementation
retain a packet but STT can keep a packet from batch of packets
during TCP reassembly processing. To return exact count of
valid packet STT need to pass this number of packet parameter
as a reference.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
It is better to move tunnel push-pop action specific functions into
separate module.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>