mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-23 18:37:36 +00:00

Author	SHA1	Message	Date
Yi Yang	c17f32a11c	netdev-dpdk: Fix incorrect shinfo initialization. shinfo is used to store reference counter and free callback of an external buffer, but it is stored in mbuf if the mbuf has tailroom for it. This is wrong because the mbuf (and its data) can be freed before the external buffer, for example: pkt2 = rte_pktmbuf_alloc(mp); rte_pktmbuf_attach(pkt2, pkt); rte_pktmbuf_free(pkt); After this, pkt is freed, but it still contains shinfo, which is referenced by pkt2. This sequence of operations is possible inside DPDK e.g., while performing TSO operations for 'net_tap' PMD. Fix this by always storing shinfo at the tail of external buffer. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Co-authored-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: Yi Yang <yangyi01@inspur.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-02-01 19:53:03 +01:00
Ian Stokes	252e1e5764	dpdk: Update to use DPDK v20.11. This commit adds support for DPDK v20.11, it includes the following changes. 1. travis: Remove explicit DPDK kmods configuration. 2. sparse: Fix build with 20.05 DPDK tracepoints. 3. netdev-dpdk: Remove experimental API flag. http://patchwork.ozlabs.org/project/openvswitch/list/?series=173216&state=* 4. sparse: Update to DPDK 20.05 trace point header. http://patchwork.ozlabs.org/project/openvswitch/list/?series=179604&state=* 5. sparse: Fix build with DPDK 20.08. http://patchwork.ozlabs.org/project/openvswitch/list/?series=200181&state=* 6. build: Add support for DPDK meson build. http://patchwork.ozlabs.org/project/openvswitch/list/?series=199138&state=* 7. netdev-dpdk: Remove usage of RTE_ETH_DEV_CLOSE_REMOVE flag. http://patchwork.ozlabs.org/project/openvswitch/list/?series=207850&state=* 8. netdev-dpdk: Fix build with 20.11-rc1. http://patchwork.ozlabs.org/project/openvswitch/list/?series=209006&state=* 9. sparse: Fix __ATOMIC_* redefinition errors http://patchwork.ozlabs.org/project/openvswitch/list/?series=209452&state=* 10. build: Remove DPDK make build references. http://patchwork.ozlabs.org/project/openvswitch/list/?series=216682&state=* For credit all authors of the original commits to 'dpdk-latest' with the above changes have been added as co-authors for this commit. Signed-off-by: David Marchand <david.marchand@redhat.com> Co-authored-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com> Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Eli Britstein <elibr@nvidia.com> Co-authored-by: Eli Britstein <elibr@nvidia.com> Tested-by: Harry van Haaren <harry.van.haaren@intel.com> Tested-by: Govindharajan, Hariprasad <hariprasad.govindharajan@intel.com> Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-12-16 17:44:06 +00:00
Gaetan Rivet	f4336f504b	netdev-dpdk: Add option to configure VF MAC address. In some cloud topologies, using DPDK VF representors in guest requires configuring a VF before it is assigned to the guest. A first basic option for such configuration is setting the VF MAC address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface table. This option can be used as such: $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \ options:dpdk-vf-mac=00:11:22:33:44:55 Suggested-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eli Britstein <elibr@nvidia.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Gaetan Rivet <grive@u256.net> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-11-16 17:47:11 +01:00
Ilya Maximets	f9b0107dd0	netdev-dpdk: Add ability to set MAC address. It is possible to set the MAC address of DPDK ports by calling rte_eth_dev_default_mac_addr_set(). OvS does not actually call this function for non-internal ports, but the implementation is exposed to be used in a later commit. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Gaetan Rivet <grive@u256.net>	2020-11-16 17:47:11 +01:00
Ian Stokes	86f624e486	DPDK: Remove support for vhost-user zero-copy. Support for vhost-user dequeue zero-copy was deprecated in OVS 2.14 with the aim of removing it for OVS 2.15. OVS only supports zero copy for vhost client mode, as such it will cease to function due to DPDK commit [1] Also DPDK is set to remove zero-copy functionality in DPDK 20.11 as referenced by commit [2] As such remove support from OVS. [1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client mode") [2] d21003c9dafa ("doc: announce removal of vhost zero-copy dequeue") Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com>	2020-10-05 16:05:25 +01:00
Jaime Caamaño Ruiz	db7041716b	netdev-dpdk: Don't set rx mq mode for net_virtio. Since DPDK 19.11 [1], it is not allowed to set any RX mq mode for virtio driver. [1] `13b3137f3b` Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-09-15 22:49:57 +02:00
Ian Stokes	e919fd4955	dpdk: Deprecate vhost-user dequeue zero-copy. Dequeue zero-copy is no longer supported for vhost-user client mode in DPDK due to commit [1]. In addition to this, zero-copy mode has been proposed to be marked deprecated in [2] with removal in the next DPDK LTS release. This commit deprecates support for vhost-user dequeue zero-copy in OVS with its removal expected in the next OVS release. [1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client mode") [2] http://mails.dpdk.org/archives/dev/2020-August/177236.html Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ilya Maximets <i.maximets@ovn.org>	2020-08-12 18:20:50 +01:00
Sivaprasad Tummala	71417ef011	netdev-dpdk: linear buffer check with zero-copy As of DPDK 19.11, in order to use dequeue-zero-copy in DPDK Vhost library, the application has to disable the linear buffer option. Hence dequeue-zero-copy is not supported for vhost application that requires linear buffers. An alternative DPDK based approach to disable the linear buffers within the vhost library itself was proposed in [1], however the consensus was that application should be responsible for disabling linear buffers. As such this patch disables linear buffers when zero-copy is enabled. [1] https://patches.dpdk.org/patch/67200/ Fixes: 127b6a6eea02 ("dpdk: Update to use DPDK 19.11.") Signed-off-by: Sivaprasad Tummala <Sivaprasad.Tummala@intel.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-08-12 18:19:42 +01:00
Ilya Maximets	82c9d9993d	netdev-dpdk: Remove deprecated ring port type. 'dpdkr' ring ports was deprecated in 2.13 release and was not actually used for a long time. Remove support now. More details in commit b4c5f00c339b ("netdev-dpdk: Deprecate ring ports.") Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-03-06 12:41:04 +01:00
Flavio Leitner	35b5586ba7	userspace TSO: SCTP checksum offload optional. Ideally SCTP checksum offload needs be advertised by the NIC when userspace TSO is enabled. However, very few drivers do that and it's not a widely used protocol. So, this patch enables SCTP checksum offload if available, otherwise userspace TSO can still be enabled but SCTP packets will be dropped on NICs without support. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Flavio Leitner	8c5163fe81	userspace TSO: Include UDP checksum offload. Virtio doesn't expose flags to control which protocols checksum offload needs to be enabled or disabled. This patch checks if the NIC supports UDP checksum offload and active it when TSO is enabled. Reported-by: Ilya Maximets <i.maximets@ovn.org> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Flavio Leitner	514950d37d	netdev-dpdk: vhost: disable unsupported offload features. Disable ECN and UFO since this is not supported yet. Also, disable all other features when userspace_tso is not enabled. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Ilya Maximets	1223cf123e	netdev-dpdk: Don't enable offloading on HW device if not requested. DPDK drivers has different implementations of transmit functions. Enabled offloading may cause driver to choose slower variant significantly affecting performance if userspace TSO wasn't requested. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: David Marchand <david.marchand@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-07 09:04:06 +01:00
David Marchand	3d6a6f450a	netdev-dpdk: Fix port init when lacking Tx offloads for TSO. The check on TSO capability did not ensure ip checksum, tcp checksum and TSO tx offloads were available which resulted in a port init failure (example below with a ena device): 2020-02-04T17:42:52.976Z\|00084\|dpdk\|ERR\|Ethdev port_id=0 requested Tx offloads 0x2a doesn't match Tx offloads capabilities 0xe in rte_eth_dev_configure() Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: Ravi Kerur <rkerur@gmail.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-05 17:44:36 +01:00
Flavio Leitner	29cf9c1b3b	userspace: Add TCP Segmentation Offload support Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC reducing the per packet CPU overhead. A guest using vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums. It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside virtual switch. If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation. It is recommended to check if NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Ciara Loftus <ciara.loftus.intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-17 22:27:25 +00:00
Flavio Leitner	e666e8e0f3	vhost: Disable multi-segmented buffers There is no support for multi-segmented buffers, so flag that to vhost library. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Ciara Loftus <ciara.loftus.intel.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-17 17:14:10 +00:00
Eli Britstein	2f7f9284bd	netdev-dpdk: Getter function for dpdk port id API. Add a getter function for using the dpdk port id outside the scope of netdev-dpdk.c to be used for HW offload. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-01-16 13:34:10 +01:00
Eli Britstein	63556d8586	netdev-dpdk: Introduce rte flow query function. Introduce a rte flow query function as a pre-step towards reading HW statistics of fully offloaded flows. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-01-16 13:34:10 +01:00
Eelco Chaudron	e61bdffc2a	netdev-dpdk: Add new DPDK RFC 4115 egress policer This patch adds a new policer to the DPDK datapath based on RFC 4115's Two-Rate, Three-Color marker. It's a two-level hierarchical policer which first does a color-blind marking of the traffic at the queue level, followed by a color-aware marking at the port level. At the end traffic marked as Green or Yellow is forwarded, Red is dropped. For details on how traffic is marked, see RFC 4115. This egress policer can be used to limit traffic at different rated based on the queues the traffic is in. In addition, it can also be used to prioritize certain traffic over others at a port level. For example, the following configuration will limit the traffic rate at a port level to a maximum of 2000 packets a second (64 bytes IPv4 packets). 100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess Information Rate). High priority traffic is routed to queue 10, which marks all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is marked as EIR, i.e. Yellow. ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \ --id=@myqos create qos type=trtcm-policer \ other-config:cir=52000 other-config:cbs=2048 \ other-config:eir=52000 other-config:ebs=2048 \ queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \ --id=@dpdk1Q10 create queue \ other-config:cir=41600000 other-config:cbs=2048 \ other-config:eir=0 other-config:ebs=0 -- \ --id=@dpdk1Q20 create queue \ other-config:cir=0 other-config:cbs=0 \ other-config:eir=41600000 other-config:ebs=2048 \ This configuration accomplishes that the high priority traffic has a guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also use the EIR, so a total of 2000pps at max. These additional 1000pps is shared with the low priority traffic. The low priority traffic can use at maximum 1000pps. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-15 19:17:55 +00:00
Eelco Chaudron	23c01b196f	netdev-dpdk: Add support for multi-queue QoS to the DPDK datapath This patch adds support for multi-queue QoS to the DPDK datapath. Most of the code is based on an earlier patch from a patchset sent out by zhaozhanxu. The patch was titled "[ovs-dev, v2, 1/4] netdev-dpdk.c: Support the multi-queue QoS configuration for dpdk datapath" Signed-off-by: zhaozhanxu <zhaozhanxu@163.com> Co-authored-by: zhaozhanxu <zhaozhanxu@163.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-15 19:17:04 +00:00
Aaron Conole	cefdd80a29	netdev-dpdk: Avoid undefined behavior processing devargs In "Use of library functions" in the C standard, the following statement is written to apply to all library functions: If an argument to a function has an invalid value (such as ... a null pointer ... the behavior is undefined. Later, under the "String handling" section, "Comparison functions" no exception is listed for strcmp, which means NULL is invalid. It may be possible for the smap_get to return NULL. Given the above, we must check that new_devargs is not null. The check against NULL for new_devargs later in the function is still valid. Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming") Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-13 16:35:21 +00:00
Eelco Chaudron	3d56e4ac44	netdev-dpdk: Add coverage counter to count vhost IRQs. When the dpdk vhost library executes an eventfd_write() call, i.e. waking up the guest, a new callback will be called. This patch adds the callback to count the number of interrupts sent to the VM to track the number of times interrupts where generated. This might be of interest to find out system-calls were called in the DPDK fast path. The coverage counter is called "vhost_notification" and can be read with: $ ovs-appctl coverage/read-counter vhost_notification 13238319 Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-12-19 01:03:52 +01:00
Kevin Traynor	6d77abf4f7	netdev-dpdk: Fix sw stats perf drop. Accessing the sw stats in the vhost datapath of a PVP test can incur a performance drop of ~2%. Most of the time these stats will just be getting zero added to them. By checking if there is a non-zero update first, we can avoid accessing them when they won't be updated and avoid the performance drop. Fixes: 2f862c712e52 ("netdev-dpdk: Detailed packet drop statistics.") Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-12-18 13:06:27 +01:00
Eelco Chaudron	988fd46391	netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event. Currently, OVS does not register and therefore not handle the interface reset event from the DPDK framework. This would cause a problem in cases where a VF is used as an interface, and its configuration changes. As an example in the following scenario the MAC change is not detected/acted upon until OVS is restarted without the patch applied: $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \ set Interface dpdk0 type=dpdk -- \ set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0 $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33 Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-12-18 01:23:55 +01:00
Ian Stokes	127b6a6eea	dpdk: Update to use DPDK 19.11. This commit adds support for DPDK v19.11, it includes the following changes. 1. travis: Enable compilation and linkage with dpdk 19.11. 2. sparse: Remove dpdk network headers copies. https://patchwork.ozlabs.org/patch/1185256/ 3. dpdk: Migrate to new PDUMP API. https://patchwork.ozlabs.org/patch/1192971/ 4. netdev-dpdk: Prefix network structures with rte_. https://patchwork.ozlabs.org/patch/1109733/ 5. netdev-dpdk: Update by new color definitions. https://patchwork.ozlabs.org/patch/1086089/ 6. docs: Update docs to reference 19.11. 7. docs: Add note regarding hotplug and igb_uio requirements. For credit all authors of the original commits to 'dpdk-latest' with the above changes been added as co-authors for this commmit. Signed-off-by: David Marchand <david.marchand@redhat.com> Co-authored-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Co-authored-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Co-authored-by: Ophir Munk <ophirmu@mellanox.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-12-04 20:51:57 +00:00
Ilya Maximets	b4c5f00c33	netdev-dpdk: Deprecate ring ports. 'dpdkr' a.k.a. DPDK ring ports has really poor support in OVS and not tested on a regular basis. These ports are intended to work via shared memory with another DPDK secondary process, but there are lots of limitations for using this functionality in practice. Most of them connected with running secondary DPDK application and memory layout issues. More details are available in DPDK guide: https://doc.dpdk.org/guides-18.11/prog_guide/multi_proc_support.html#multi-process-limitations Beside the functional limitations it's also hard to use this functionality correctly. User must be sure that OVS and secondary DPDK application are running on different CPU cores, which is hard because non-PMD threads could float over available CPU cores. This or any other misconfiguration will likely lead to crash of OVS. Another problem is that the user must actually build the secondary application with the same version of DPDK that was used for OVS build. Above issues are same as we have while using DPDK pdump. Beside that, current implementation in OVS is not able to free allocated rings that could lead to memory exhausting. Initially these ports was added to use with IVSHMEM for a fast zero-copy HOST<-->VM communication. However, IVSHMEM is not used anymore. IVSHMEM support was removed from DPDK in 16.11 release (instructions for IVSHMEM were removed from the OVS docs almost 3 years ago by commit 90ca71dd317f ("doc: Remove ivshmem instructions.")) and the patch for QEMU for using regular files as a device backend is no longer available. That makes DPDK ring ports barely useful in real virtualization environment. This patch adds a deprecation warnings for run-time port creation and documentation. Claiming to completely remove this functionality from OVS in one of the next releases. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com>	2019-11-28 16:35:30 +01:00
Sriram Vatala	2f862c712e	netdev-dpdk: Detailed packet drop statistics. OVS may be unable to transmit packets for multiple reasons on the userspace datapath and today there is a single counter to track packets dropped due to any of those reasons. This patch adds custom software stats for the different reasons packets may be dropped during tx/rx on the userspace datapath in OVS. - MTU drops : drops that occur due to a too large packet size - Qos drops : drops that occur due to egress/ingress QOS - Tx failures: drops as returned by the DPDK PMD send function Note that the reason for tx failures is not specified in OVS. In practice for vhost ports it is most common that tx failures are because there are not enough available descriptors, which is usually caused by misconfiguration of the guest queues and/or because the guest is not consuming packets fast enough from the queues. These counters are displayed along with other stats in "ovs-vsctl get interface <iface> statistics" command and are available for dpdk and vhostuser/vhostuserclient ports. Also the existing "tx_retries" counter for vhost ports has been renamed to "ovs_tx_retries", so that all the custom statistics that OVS accumulates itself will have the prefix "ovs_". This will prevent any custom stats names overlapping with driver/HW stats. Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-11-11 17:14:55 +01:00
Ilya Maximets	b99ab8aaaf	netdev-dpdk: Reuse vhost function for dpdk ETH custom stats. This is yet another refactoring for upcoming detailed drop stats. It allows to use single function for all the software calculated statistics in netdev-dpdk for both vhost and ETH ports. UINT64_MAX used as a marker for non-supported statistics in a same way as it's done in bridge.c for common netdev stats. Co-authored-by: Sriram Vatala <sriram.v@altencalsoftlabs.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com> Acked-by: Kevin Traynor <ktraynor@redhat.com>	2019-11-11 17:13:01 +01:00
David Marchand	9ff24b9c93	netdev-dpdk: Track vhost tx contention. Add a coverage counter to help diagnose contention on the vhost txqs. This is seen as dropped packets on the physical ports for rates that are usually handled fine by OVS. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-11-08 21:26:56 +01:00
Tomasz Konieczny	c2c84474d4	netdev-dpdk: Fix flow control not configuring. Currently OVS is unable to change flow control configuration in DPDK because new settings are being overwritten by current settings with rte_eth_dev_flow_ctrl_get(). The fix restores correct order of operations and at the same time does not trigger error on devices without flow control support when flow control not requested. Fixes: 7e1de65e8dfb ("netdev-dpdk: Fix failure to configure flow control at netdev-init.") Signed-off-by: Tomasz Konieczny <tomaszx.konieczny@intel.com> Co-authored-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-11-04 17:23:47 +01:00
Ilya Maximets	15ba075d39	netdev-dpdk: Fix Tx queue false sharing. 'tx_q' array is allocated for each DPDK netdev. 'struct dpdk_tx_queue' is 8 bytes long, so 8 tx queues are sharing the same cache line in case of 64B cacheline size. This causes 'false sharing' issue in mutliqueue case because taking the spinlock implies write to memory i.e. cache invalidation. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com>	2019-10-22 19:29:24 +02:00
Paul Chaignon	940ac2ce88	treewide: Use packet batch APIs This patch replaces direct accesses to dp_packet_batch and dp_packet internal components by the appropriate API calls. It extends commit 1270b6e52 (treewide: Wider use of packet batch APIs). This patch was generated using the following semantic patch (cf. http://coccinelle.lip6.fr). // <smpl> @ dp_packet @ struct dp_packet_batch b1; struct dp_packet_batch b2; struct dp_packet p; expression e; @@ ( - b1->packets[b1->count++] = p; + dp_packet_batch_add(b1, p); \| - b2.packets[b2.count++] = p; + dp_packet_batch_add(&b2, p); \| - p->packet_type == htonl(PT_ETH) + dp_packet_is_eth(p) \| - p->packet_type != htonl(PT_ETH) + !dp_packet_is_eth(p) \| - b1->count == 0 + dp_packet_batch_is_empty(b1) \| - !b1->count + dp_packet_batch_is_empty(b1) \| b1->count = e; \| b1->count++ \| b2.count = e; \| b2.count++ \| - b1->count + dp_packet_batch_size(b1) \| - b2.count + dp_packet_batch_size(&b2) ) // </smpl> Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-09-25 14:42:00 -07:00
Ilya Maximets	5c7ba90d81	netdev-dpdk: Refactor vhost custom stats for extensibility. vHost interfaces currently has only one custom statistic, but there might be others in the near future. This refactoring makes the code work in the same way as it done for dpdk and afxdp stats to keep the common style over the different code places and makes it easily extensible for the new stats addition. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com>	2019-08-26 14:03:37 +03:00
Ilya Maximets	18366d1651	netdev-dpdk: Fix not reporting rx_oversize_errors in stats. There is a big code duplication issue with DPDK xstats that led to missed "rx_oversize_errors" statistics. It's defined but not used. Fix that by actually using this stat along with code refactoring that will allow us to not make same mistakes in the future. Macro definitions are perfectly suitable to automate code generation in such cases and already used in a couple of places in OVS for similar purposes. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com>	2019-08-26 14:03:37 +03:00
Kevin Traynor	080f080c3b	netdev-dpdk: Enable tx-retries-max config. vhost tx retries can provide some mitigation against dropped packets due to a temporarily slow guest/limited queue size for an interface, but on the other hand when a system is fully loaded those extra cycles retrying could mean packets are dropped elsewhere. Up to now max vhost tx retries have been hardcoded, which meant no tuning and no way to disable for debugging to see if extra cycles spent retrying resulted in rx drops on some other interface. Add an option to change the max retries, with a value of 0 effectively disabling vhost tx retries. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-07-08 12:03:28 +01:00
Kevin Traynor	c161357d5d	netdev-dpdk: Add custom stat for vhost tx retries. vhost tx retries may occur, and it can be a sign that the guest is not optimally configured. Add a custom stat so a user will know if vhost tx retries are occurring and hence give a hint that guest config should be examined. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-07-08 12:02:51 +01:00
Kevin Traynor	730b34859f	netdev-dpdk: Fix additional vhost tx retry. Fix minor issue of one possible additional retry. Fixes: c6ec9d176dbf ("netdev-dpdk: Fix vHost stats.") Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-06-28 09:49:32 +01:00
David Marchand	61473a0eb2	netdev-dpdk: Reset queue number for vhost devices on vm shutdown. Rather than poll all disabled queues and waste some memory for vms that have been shutdown, we can reconfigure when receiving a destroy connection notification from the vhost library. $ while true; do ovs-appctl dpif-netdev/pmd-rxq-show \|awk ' /port: / { tot++; if ($5 == "(enabled)") { en++; } } END { print "total: " tot ", enabled: " en }' sleep 1 done total: 66, enabled: 66 total: 6, enabled: 2 This change requires a fix in the DPDK vhost library, so bump the minimal required version to 18.11.2. Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-06-27 15:28:04 +01:00
David Marchand	7235cd206e	netdev-dpdk: Avoid reconfiguration on VIRTIO_NET_F_MQ changes. At the moment, a malicious guest might negotiate VIRTIO_NET_F_MQ and !VIRTIO_NET_F_MQ in a loop which would be seen as qp_num going from 1 to n and n to 1 continuously, triggering datapath reconfigurations at each transition. Limit this by only reconfiguring on increased qp_num. The previous patch reduced the observed cost of polling disabled queues, so the only cost is memory. Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-06-26 19:15:13 +01:00
David Marchand	35c91567c8	dpif-netdev: Only poll enabled vhost queues. We currently poll all available queues based on the max queue count exchanged with the vhost peer and rely on the vhost library in DPDK to check the vring status beneath. This can lead to some overhead when we have a lot of unused queues. To enhance the situation, we can skip the disabled queues. On rxq notifications, we make use of the netdev's change_seq number so that the pmd thread main loop can cache the queue state periodically. $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 0 core_id 1: isolated : true port: dpdk0 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 2: isolated : true port: vhost1 queue-id: 0 (enabled) pmd usage: 0 % port: vhost3 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 15: isolated : true port: dpdk1 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 16: isolated : true port: vhost0 queue-id: 0 (enabled) pmd usage: 0 % port: vhost2 queue-id: 0 (enabled) pmd usage: 0 % $ while true; do ovs-appctl dpif-netdev/pmd-rxq-show \|awk ' /port: / { tot++; if ($5 == "(enabled)") { en++; } } END { print "total: " tot ", enabled: " en }' sleep 1 done total: 6, enabled: 2 total: 6, enabled: 2 ... # Started vm, virtio devices are bound to kernel driver which enables # F_MQ + all queue pairs total: 6, enabled: 2 total: 66, enabled: 66 ... # Unbound vhost0 and vhost1 from the kernel driver total: 66, enabled: 66 total: 66, enabled: 34 ... # Configured kernel bound devices to use only 1 queue pair total: 66, enabled: 34 total: 66, enabled: 19 total: 66, enabled: 4 ... # While rebooting the vm total: 66, enabled: 4 total: 66, enabled: 2 ... total: 66, enabled: 66 ... # After shutting down the vm total: 66, enabled: 66 total: 66, enabled: 2 Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-06-26 18:43:39 +01:00
Ilya Maximets	b6cabb8f8f	netdev: Split up netdev offloading to separate module. New module 'netdev-offload' created to manage different flow API implementations. All the generic and provider independent code moved there from the 'netdev' module. Flow API providers further encapsulated. The only function that was changed is 'netdev_any_oor'. Now it uses offloading related hmap instead of common 'netdev_shash'. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	5fc5c50f3d	netdev: Dynamic per-port Flow API. Current issues with Flow API: * OVS calls offloading functions regardless of successful flow API initialization. (ex. on init_flow_api failure) * Static initilaization of Flow API for a netdev_class forbids having different offloading types for different instances of netdev with the same netdev_class. (ex. different vports in 'system' and 'netdev' datapaths at the same time) Solution: * Move Flow API from the netdev_class to netdev instance. * Make Flow API dynamic, i.e. probe the APIs and choose the suitable one. Side effects: * Flow API providers localized as possible in their modules. * Now we have an ability to make runtime checks. For example, we could check if particular device supports features we need, like if dpdk device supports RSS+MARK action. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	595ce47cd6	netdev-dpdk: Print the reason of device detaching failure. Useful for debugging. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ian Stokes <ian.stokes@intel.com>	2019-05-30 11:11:41 +03:00
Liliia Butorina	30e834dcb5	netdev-dpdk: Post-copy Live Migration support for vhost-user-client. Post-copy Live Migration for vHost supported since DPDK 18.11 and QEMU 2.12. New global config option 'vhost-postcopy-support' added to control this feature. Ex.: ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true Changing this value requires restarting the daemon. It's safe to enable this knob even if QEMU doesn't support post-copy LM. Feature marked as experimental and disabled by default because it may cause PMD thread hang on destination host on page fault for the time of page downloading from the source. Feature is not compatible with 'mlockall' and 'dequeue zero-copy'. Support added only for vhost-user-client. Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com> Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2019-05-24 15:31:28 +03:00
Ilya Maximets	bb9d262357	netdev-dpdk: Allocate vhost_id dynamically. 'vhost_id' is an array of 'PATH_MAX' bytes in the middle of 'netdev_dpdk' structure. That is 4K bytes. 'vhost_id' never used on a hot path and there is no need to keep it inside the structure memory. Dynamic allocation will allow to decrease 'struct netdev_dpdk' significantly, saving 4KB per ETH port (ETH ports doesn't use 'vhost_id') and almost same value per vhost ports (real 'vhost_id's, in common case, are much shorter). We could save the pointer space by making the union with 'devargs' which is mutually exclusive with 'vhost_id'. As we're just removing the single 'PADDED_MEMBER', the total cacheline layout is not affected. Stats for 'struct netdev_dpdk': Before: /* size: 4992, cachelines: 78 / After : / size: 896, cachelines: 14 */ Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-04-10 16:54:56 +01:00
Ilya Maximets	170ef7265a	netdev-dpdk: Print netdev name for txq mapping. In case of reconfiguration while 'vhost_id' is not set yet, there will be the meaningless message like: \|netdev_dpdk\|DBG\|TX queue mapping for \|netdev_dpdk\|DBG\| 0 --> 0 It's better to print the name of the netdev which is always set. Additionally fixed possible splitting by other log messages and missing space in the queue state message. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-22 22:44:54 +00:00
Roni Bar Yanai	3d67b2d28e	netdev-dpdk: Move offloading code to a new file Hardware offloading code is moved to a new file called netdev-rte-offloads.c. The original offloading code is copied from the netdev-dpdk.c file to the new file, where future offloading code should be added as well. The copied code was refactored based on coding style. The netdev-dpdk.c file will remain unchanged as new offloading code is added. Co-authored-by: Ophir Munk <ophirmu@mellanox.com> Reviewed-by: Asaf Penso <asafp@mellanox.com> Signed-off-by: Roni Bar Yanai <roniba@mellanox.com> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-19 14:33:27 +00:00
Roni Bar Yanai	6775bdfc9b	netdev-dpdk: Expose flow creation/destruction calls Before offloading code was added to the netdev-dpdk.c file (MARK and RSS actions) the only DPDK RTE calls in use were rte_flow_create() and rte_flow_destroy(). In preparation for splitting the offloading code from the netdev-dpdk.c file to a separate file, it is required to embed these RTE calls into a global netdev-dpdk-* API so that they can be called from the new file. An example for this requirement can be seen in the handling of dev->mutex, which should be encapsulated inside netdev-dpdk class (netdev-dpdk.c file), and should be unknown to the outside callers. This commit embeds the rte_flow_create() call inside the netdev_dpdk_flow_create() API and the rte_flow_destroy() call inside the netdev_dpdk_rte_flow_destroy() API. Reviewed-by: Asaf Penso <asafp@mellanox.com> Signed-off-by: Roni Bar Yanai <roniba@mellanox.com> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Co-authored-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-19 14:12:21 +00:00
Ilya Maximets	a47e2db209	dp-packet: Refactor offloading API. 1. No reason to have mbuf related APIs in a generic code. 2. Not only RSS/checksums should be invalidated in case of tunnel decapsulation or sending to 'ring' ports. In order to fix two above issues, new function 'dp_packet_reset_offload' introduced. In order to clean up/unify the code and simplify addition of new offloading features to non-DPDK version of dp_packet, introduced 'ol_flags' bitmask. Additionally reduced code complexity in 'dp_packet_clone_with_headroom' by using already existent generic APIs. Unfortunately, we still need to have a special case for mbuf initialization inside 'dp_packet_init__()'. 'dp_packet_init_specific()' introduced for this purpose as a generic API for initialization of the implementation-specific fields. Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-13 09:51:30 +00:00
Ilya Maximets	324215c65c	netdev-dpdk: Use single struct/union for flow offload items. Having a single structure allows to simplify the code path and clear all the items at once (probably faster). This does not increase stack memory usage because all the L4 related items grouped in a union. Changes: - Memsets combined. - 'ipv4_next_proto_mask' dropped as we already know the address and able to use 'mask.ipv4.hdr.next_proto_id' directly. - Group of 'if' statements for L4 protocols turned to a 'switch'. We can do that, because we don't have semi-local variables anymore. - Eliminated 'end_proto_check' label. Not needed with 'switch'. Additionally 'rte_memcpy' replaced with simple 'memcpy' as it makes no sense to use 'rte_memcpy' for 6 bytes. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Asaf Penso <asafp@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-02-27 22:01:18 +00:00

1 2 3 4 5 ...

360 Commits