1. There is no reason to have mbuf-related APIs in generic code.
2. RSS/checksums are not the only offload state that should be invalidated
in case of tunnel decapsulation or sending to 'ring' ports.
To fix the two issues above, a new function 'dp_packet_reset_offload' is
introduced. To clean up and unify the code and to simplify the addition of
new offloading features to the non-DPDK version of dp_packet, an 'ol_flags'
bitmask is introduced. Additionally, code complexity in
'dp_packet_clone_with_headroom' is reduced by using already existing
generic APIs.
Unfortunately, we still need a special case for mbuf initialization inside
'dp_packet_init__()'. 'dp_packet_init_specific()' is introduced for this
purpose as a generic API for initialization of the implementation-specific
fields.
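Below is a minimal sketch of the 'ol_flags' idea, assuming a simplified
'struct dp_packet'; the flag names and layout are illustrative rather than
the exact OVS definitions:

    #include <stdint.h>

    /* Illustrative offload flags for the non-DPDK dp_packet. */
    enum dp_packet_offload_flags {
        DP_PACKET_OL_RSS_HASH_VALID = 1u << 0,  /* 'rss_hash' is valid. */
    };

    struct dp_packet {            /* Simplified for illustration. */
        uint32_t rss_hash;        /* Packet hash. */
        uint32_t ol_flags;        /* Offload flag bitmask. */
    };

    /* Invalidate all offload state, e.g. after tunnel decapsulation or
     * before sending to a 'ring' port. */
    static inline void
    dp_packet_reset_offload(struct dp_packet *p)
    {
        p->ol_flags = 0;
    }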
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This is needed in a subsequent patch and may otherwise be useful.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This new API is used in a subsequent patch and may otherwise be useful.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When enabled with DPDK, OvS deals with two types of packets: the ones
coming from the mempool and the ones locally created by OvS, which are
copied to mempool mbufs before output. In the latter case, the space is
allocated from the system, while in the former the mbufs are allocated
from a mempool, which takes care of initialising them appropriately.
In the current implementation, during the mempool's initialisation of
mbufs, dp_packet_set_allocated() is called from dp_packet_init_dpdk()
without considering that the allocated space, in the case of multi-segment
mbufs, might be greater than a single mbuf. Furthermore, given that
dp_packet_init_dpdk() is on the code path that's called upon the mempool's
initialisation, the call to dp_packet_set_allocated() is redundant, since
the mempool takes care of initialising it.
To fix this, dp_packet_set_allocated() is no longer called after
initialisation of a mempool, only in dp_packet_init__(), which is still
called by OvS when initialising locally created packets.
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains random
data. For some mbuf fields, specifically those related to multi-segment
mbufs and/or offload features, random values may cause unexpected
behaviour, should the dp_packet's contents be later copied to a DPDK mbuf.
It is critical, therefore, that these fields be initialized to 0.
This patch ensures that the following mbuf fields are initialized to
appropriate values on creation of a new dp_packet:
- ol_flags=0
- nb_segs=1
- tx_offload=0
- packet_type=0
- next=NULL
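A sketch of the corresponding initialization is below, assuming a
hypothetical helper that receives the embedded mbuf; the field names are
the DPDK 'struct rte_mbuf' fields listed above:

    #include <rte_mbuf.h>

    /* Hypothetical helper: zero the mbuf fields that might otherwise hold
     * random data from xmalloc(). */
    static void
    dp_packet_mbuf_init(struct rte_mbuf *mbuf)
    {
        mbuf->ol_flags = 0;       /* No offload flags set. */
        mbuf->nb_segs = 1;        /* Single-segment packet. */
        mbuf->tx_offload = 0;     /* No Tx offload metadata. */
        mbuf->packet_type = 0;    /* Unknown packet type. */
        mbuf->next = NULL;        /* No chained segments. */
    }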
Adapted from an idea by Michael Qiu <qiudayu@chinac.com>:
https://patchwork.ozlabs.org/patch/777570/
Co-authored-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
There is already an #ifdef DPDK_NETDEV block, so instead of adding a check
to each and every function, move them into the right block.
No functional change.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This lets us skip some very costly CPU operations, including but not
limited to miniflow_extract, EMC lookup, dpcls lookup, etc. Thus,
performance could be greatly improved.
A PHY-PHY forwarding test with 1000 megaflows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) shows more than a
260% performance boost.
Note that though the heavy miniflow_extract is skipped, we still have to
do per-packet checking, because we still have to check the tcp_flags.
Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
DPDK uses a dp-packet pool for storing received packets. The pool is
reused by the rxq_recv functions of the DPDK netdevs. The datapath is
capable of modifying the packet_type property of packets, for instance
when encapsulated L3 packets are received on a ptap GRE port. In this
case the packet_type property of struct dp_packet can be modified, and
later the same dp_packet with the modified packet_type can be reused in
the rxq_recv function, so it can contain corrupted data.
dp_packet_batch_init_cutlen() in the rxq_recv functions iterates over the
dp_packets and sets their cutlen, so I modified this function to set the
packet_type to Ethernet for the dp_packets as well. I also renamed this
function because of the added functionality.
dp_packet_batch_init_cutlen() iterates over batch->count dp_packets.
Therefore, setting batch->count = nb_rx needs to be done before the
former function is invoked. This is an additional fix.
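A rough sketch of the combined per-batch reset is below; the function name
and batch layout are illustrative and assume the helpers from OVS's
dp-packet.h and packets.h:

    /* Reset both the cutlen and the packet_type of every packet that is
     * about to be reused for receive. */
    static inline void
    dp_packet_batch_init_packet_fields(struct dp_packet_batch *batch)
    {
        size_t i;

        for (i = 0; i < batch->count; i++) {
            struct dp_packet *packet = batch->packets[i];

            dp_packet_reset_cutlen(packet);
            packet->packet_type = htonl(PT_ETH);  /* Back to Ethernet. */
        }
    }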
Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Laszlo Suru <laszlo.suru@ericsson.com>
Co-authored-by: Laszlo Suru <laszlo.suru@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
CC: Sugesh Chandran <sugesh.chandran@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
A memcpy replaces the several individual copies inside
dp_packet_clone_with_headroom().
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
DPDK uses dp-packet pools and manages the mbuf portion of
each packet. When a pool is created, partial initialization is
also done on the OVS portion (i.e. non-mbuf). Since packet
memory is reused, this is not very useful for transient
fields and is also misleading. Furthermore, some of these
transient fields are properly initialized for DPDK packets
entering OVS anyway, which is the only reasonable way to do this.
Another field, cutlen, is initialized in this manner in the pool
and is intended to be reset when cutlen is applied on sending the
packet out. However, if the cutlen context is set but the packet is
not sent out for some reason, then the packet header would be
corrupted in the memory pool. It is better to just reset the
cutlen in the packets when they are received. I did not detect a
degradation in performance; however, I would be willing to
accept some degradation, since this is the proper way to handle
this. In addition to initializing cutlen in received packets,
the other OVS transient fields are removed from the DPDK pool
initialization.
Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Reset the DPDK hwol flags in dp_packet_init__(). The new hwol bad checksum
flag is uninitialized for non-DPDK ports, and this is noticed as test
failures using netdev-dummy ports when built with the --with-dpdk flag
set. Hence, in this case, packets may be falsely marked as having a bad
checksum. The existing APIs are simplified at the same time by making them
specific to either DPDK or otherwise; they also now manage a single field.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html
Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.")
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
DPDK ports use masks while reporting Rx checksum flags. OVS should use
these masks along with the reported checksum flags while validating a good
checksum. Added two new functions to validate a bad checksum reported by a
DPDK NIC port. These two functions will be used in the following patch for
enabling Rx checksum offload in the conntrack module.
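A sketch of the kind of helpers this adds is below; the DPDK mask and flag
names are real, while the helper names and placement are illustrative:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* True if the NIC reported a bad IP checksum for this mbuf. */
    static inline bool
    rx_ip_checksum_bad(const struct rte_mbuf *mbuf)
    {
        return (mbuf->ol_flags & PKT_RX_IP_CKSUM_MASK)
               == PKT_RX_IP_CKSUM_BAD;
    }

    /* True if the NIC reported a bad L4 (TCP/UDP) checksum. */
    static inline bool
    rx_l4_checksum_bad(const struct rte_mbuf *mbuf)
    {
        return (mbuf->ol_flags & PKT_RX_L4_CKSUM_MASK)
               == PKT_RX_L4_CKSUM_BAD;
    }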
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Co-authored-by: Darrell Ball <dball@vmware.com>
Signed-off-by: Darrell Ball <dball@vmware.com>
Acked-by: Antonio Fishetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The function 'dp_packet_batch_refill_init' doesn't return anything.
It looks like this comment came from one of the intermediate versions
of the API enhancement patch. Additionally, the comment style is changed
to be consistent with other comments in the same file.
CC: Andy Zhou <azhou@ovn.org>
Fixes: 72c84bc2db23 ("dp-packet: Enhance packet batch APIs.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Without this, applying the cutlen action will not work on a copied batch.
Cutlen works for linux and dummy netdevs only because they try to apply it
per-packet inside the send function.
The cutlen action doesn't work for dpdk ports when a batch clone has
occurred, because there it is applied by 'dp_packet_batch_apply_cutlen()'.
CC: Andy Zhou <azhou@ovn.org>
Fixes: 72c84bc2db23 ("dp-packet: Enhance packet batch APIs.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as preparation for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.
The packet_type is a big-endian 32-bit integer with the encoding as
specified in OpenFlow version 1.5.
The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:
enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,           /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1,     /* ns_type is an Ethertype. */
    OFPHTN_IP_PROTO = 2,      /* ns_type is an IP protocol number. */
    OFPHTN_UDP_TCP_PORT = 3,  /* ns_type is a TCP or UDP port. */
    OFPHTN_IPV4_OPTION = 4,   /* ns_type is an IPv4 option number. */
};
The lower 16 bits specify the actual type in the context of the name space.
Only name spaces 0 and 1 will be supported for now.
For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.
In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
with the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.
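As a small illustration of how such a 32-bit packet_type value is composed
(namespace in the upper 16 bits, namespace-specific type in the lower 16
bits, stored in network byte order), consider the following hypothetical
macro; OVS defines its own helpers for this:

    #include <stdint.h>
    #include <arpa/inet.h>

    #define MAKE_PACKET_TYPE(ns, ns_type) \
        htonl(((uint32_t) (ns) << 16) | (uint16_t) (ns_type))

    /* (OFPHTN_ONF, 0): an Ethernet packet, the default.
     *     MAKE_PACKET_TYPE(0, 0)
     * (OFPHTN_ETHERTYPE, 0x0800): an L3 IPv4 packet.
     *     MAKE_PACKET_TYPE(1, 0x0800) */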
The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the dl_type field of struct flow.
This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.
New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add the appropriate function and the source file.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
One common use case of 'struct dp_packet_batch' is to process all
packets in the batch in order. Add an iterator for this use case
to simplify the logic of calling sites.
Another common use case is to drop packets in the batch, by reading
all packets, but writing back pointers of fewer packets. Add macros
to support this use case.
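A sketch of how such iteration might look at a calling site is below; the
macro names follow the OVS tree (exact signatures have varied between
versions), and 'batch', process() and keep() are stand-ins:

    struct dp_packet *packet;
    size_t i, size;

    /* In-order pass over every packet in the batch. */
    DP_PACKET_BATCH_FOR_EACH (packet, batch) {
        process(packet);
    }

    /* Drop-filter pass: read every packet, write back only the kept ones. */
    DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
        if (keep(packet)) {
            dp_packet_batch_refill(batch, packet, i);
        } else {
            dp_packet_delete(packet);
        }
    }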
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Add Rx checksum offloading feature support on DPDK physical ports. By
default, Rx checksum offloading is enabled if the NIC supports it. However,
the checksum offloading can be turned OFF either while adding a new DPDK
physical port to OVS or at runtime.
The Rx checksum offloading can be turned off by setting the parameter to
'false'. For example, to disable Rx checksum offloading when adding a port:
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
Or, to disable it at runtime after the port has been added to OVS:
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
Similarly, to turn ON Rx checksum offloading at runtime:
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
Tx checksum offloading support is not implemented for the following
reasons.
1) Checksum offloading and vectorization are mutually exclusive in the DPDK
poll mode driver. Vector packet processing is turned OFF when checksum
offloading is enabled, which causes a significant performance drop on the
Tx side.
2) Normally, OVS generates checksums for tunnel packets in software at the
'tunnel push' operation, where the tunnel headers are created. However,
enabling Tx checksum offloading involves:
*) Marking every packet for Tx checksum offloading at 'tunnel_push' and
recirculating.
*) At the time of xmit, validating the same flag and instructing the NIC to
do the checksum calculation. In case the NIC doesn't support Tx checksum
offloading, the checksum calculation has to be done in software before
sending out the packets.
No significant performance improvement was noticed with Tx checksum
offloading due to the overhead of additional validations + non-vector
packet processing. In some test scenarios it even introduces a performance
drop.
Rx checksum offloading still offers an 8-9% improvement on VxLAN tunnel
decapsulation, even though the SSE vector Rx function is disabled in the
DPDK poll mode driver.
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
There's a lot of code in netdev-dpdk which is not at all related to the
netdev interface, mostly the library initialization code.
This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'.
Also a new module 'dpdk-stub' is introduced to implement some functions
when DPDK is not available. This replaces the old 'netdev-nodpdk'
module.
Some redundant includes are removed or reorganized as a consequence.
No functional change.
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
This patch adds a new action to support packet truncation. The new action
is formatted as 'output(port=n,max_len=m)', meaning output to port n with
the packet size being MIN(original_size, m).
One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying the entire packet payload. An
example use case is below, as well as in the testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
# ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to the output action itself. The packet
sizes of the following output:1 and output:2 will be intact.
# ovs-ofctl add-flow br0 \
'actions=output(port=1,max_len=100),output:1,output:2'
- The Datapath actions show:
# Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
The DPDK datapath operates on batches of packets. To pass a batch of
packets around we use a packets array and a count. The next patch needs
to associate metadata with each batch of packets, so introduce a batch
structure to make handling the metadata easier.
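A sketch of the shape of such a batch structure is below, with
illustrative field names and a placeholder burst size:

    #include <stddef.h>
    #include <stdbool.h>

    #define MAX_BURST 32            /* Placeholder for the real burst size. */

    struct dp_packet;               /* Defined in dp-packet.h. */

    struct dp_packet_batch {
        size_t count;                          /* Packets in the batch. */
        bool trunc;                            /* Example per-batch metadata. */
        struct dp_packet *packets[MAX_BURST];  /* The packets themselves. */
    };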
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
This scratchpad can be used by any layer to keep private data.
STT will use it for TCP reassembly state.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
All code is now in include/openvswitch/list.h.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
set in 'ol_flags'. Otherwise the hash is garbage and doesn't
relate to the packet.
This fixes an issue with vhost, which, being a virtual NIC, doesn't
compute the hash.
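A sketch of the resulting check is below; the flag and field names are
DPDK's, while the helpers themselves are illustrative:

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_mbuf.h>

    /* Only trust mbuf->hash.rss when the PMD says it is valid. */
    static inline bool
    mbuf_rss_valid(const struct rte_mbuf *mbuf)
    {
        return (mbuf->ol_flags & PKT_RX_RSS_HASH) != 0;
    }

    /* Illustrative accessor: fall back to 0 (i.e. recompute elsewhere)
     * when the hash is not valid. */
    static inline uint32_t
    mbuf_get_rss_hash(const struct rte_mbuf *mbuf)
    {
        return mbuf_rss_valid(mbuf) ? mbuf->hash.rss : 0;
    }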
Reported-by: Dongjun <dongj@dtdream.com>
Suggested-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
DPDK buf_len is only 16-bit wide ('allocated' was 32-bit), but it should
be enough to store the number of allocated bytes.
This will reduce 'struct dp_packet' size.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
In 'struct ofpbuf' the 'frame' pointer was used to parse different kinds of
data (Ethernet, OpenFlow, Netlink attributes). For Ethernet packets the
'frame' pointer was supposed to have the same value as the 'data'
pointer.
Since 'struct dp_packet' is only used for Ethernet packets, there's no
need for a separate 'frame' pointer: we can use the 'data' pointer
instead.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The 'list' member is only used (by two users) in the slow path.
This commit removes it to reduce the struct size.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
We already have the 'dp_hash' embedded in the metadata. This caused
confusion in the code. With this commit it should be clear that
'rss_hash' is the packet hash used for internal purposes, while
'md.dp_hash' is part of the flow, computed during the execution of
certain actions.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
DPDK v1.8.0 makes significant changes to struct rte_mbuf, including
removal of the 'pkt' and 'data' fields. The latter, formerly a
pointer, is now calculated via an offset from the start of the
segment buffer. So now dp_packet data is also stored as an offset
from the base pointer.
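A simplified sketch of what storing the data pointer as an offset looks
like is below; the struct and helper names are illustrative, while the
real accessors live in dp-packet.h:

    #include <stdint.h>

    struct pkt {                  /* Simplified for illustration. */
        void *base;               /* Start of the allocated buffer. */
        uint16_t data_ofs;        /* Offset of packet data from 'base'. */
    };

    static inline void *
    pkt_data(const struct pkt *p)
    {
        return (char *) p->base + p->data_ofs;
    }

    static inline void
    pkt_set_data(struct pkt *p, void *data)
    {
        p->data_ofs = (char *) data - (char *) p->base;
    }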
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Currently dp-packet makes use of ofpbuf for managing packet
buffers. That complicates ofpbuf; by making dp-packet
independent of ofpbuf, both libraries can be optimized for
their own use cases.
This also avoids the mapping operation between ofpbuf and dp_packet
in datapath upcalls.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
dp_packet is a shorter and better name for the datapath packet
structure.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>