mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-23 10:28:00 +00:00

Author	SHA1	Message	Date
Ophir Munk	40e940e439	netdev-dpdk: support port representors Dpdk port representors were introduced in dpdk versions 18.xx. Prior to port representors there was a one-to-one relationship between an rte device (e.g. PCI bus) and an eth device (referenced as dpdk port id in OVS). With port representors the relationship becomes one-to-many rte device to eth devices. For example in [3] there are two devices (representors) using the same PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and "0000:08:00.0,representor=[5]". This commit handles the new one-to-many relationship. For example, when one of the device port representors in [3] is closed - the PCI bus cannot be detached until the other device port representor is closed as well. OVS remains backward compatible by supporting dpdk legacy PCI ports which do not include port representors. Dpdk port representors related commits are listed in [1]. Dpdk port representors documentation appears in [2]. A sample configuration which uses two representors ports (the output of "ovs-vsctl show" command) is shown in [3]. [1] e0cb96204b71 ("net/i40e: add support for representor ports") cf80ba6e2038 ("net/ixgbe: add support for representor ports") 26c08b979d26 ("net/mlx5: add port representor awareness") [2] https://doc.dpdk.org/guides-18.11/prog_guide/switch_representation.html [3] Bridge "ovs_br0" Port "ovs_br0" Interface "ovs_br0" type: internal Port "port-rep3" Interface "port-rep3" type: dpdk options: {dpdk-devargs="0000:08:00.0,representor=[3]"} Port "port-rep5" Interface "port-rep5" type: dpdk options: {dpdk-devargs="0000:08:00.0,representor=[5]"} ovs_version: "2.10.90" Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-01-17 23:33:36 +00:00
Ophir Munk	03f3f9c0fa	dpdk: Update to use DPDK 18.11. This commit adds support for DPDK v18.11. It includes the following changes. 1. Enable compilation and linkage with dpdk 18.11.0 The following dpdk commits which were introduced after dpdk 17.11.x require OVS updates to accommodate the dpdk changes. - ce17edde ("ethdev: introduce Rx queue offloads API") - ab3ce1e0 ("ethdev: remove old offload API") - c06ddf96 ("meter: add configuration profile") - e58638c3 ("ethdev: fix TPID handling in flow API") - cd8c7c7c ("ethdev: replace bus specific struct with generic dev") - ac8d22de ("ethdev: flatten RSS configuration in flow API") 2. Limit configured rss hash functions to only those supported by the eth device. 3. Set default RSS key in struct action_rss_data, required by OVS commit e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow") when configured with "other_config:hw-offload=true". 4. DEV_RX_OFFLOAD_CRC_STRIP has been removed from DPDK 18.11. DEV_RX_OFFLOAD_KEEP_CRC can now be used to keep the CRC. Use the correct flag and check it is supported. 5. rte_eth_dev_attach/detach have been removed from DPDK 18.11. Replace them with rte_dev_probe/remove. 6. Update docs and travis to use DPDK 18.11. This commit squashes the following commits present on the dpdk-latest branch: 7f021f902bb3 ("netdev-dpdk: Upgrade to dpdk v18.08") 270d9216f1ed ("netdev-dpdk: Set scatter based on capabilities") bef2cdc8f412 ("netdev-dpdk: Fix returning the field of malloced struct.") 73c1a65167fc ("redhat: change variable used for non-root user support") eb485f60ce44 ("dpdk: Update to use DPDK 18.11.") For credit all authors of the original commits above have been added as co-authors for this commit.
From: Ophir Munk <ophirmu@mellanox.com> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Co-authored-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Co-authored-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-12-13 14:25:46 +00:00
Tiago Lam	a32bab26e5	netdev-dpdk: Add mbuf HEADROOM after alignment. Commit dfaf00e started using the result of dpdk_buf_size() to calculate the available size on each mbuf, as opposed to using the previous MBUF_SIZE macro. However, this was calculating the mbuf size by adding up the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below. Before alignment: ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048 After alignment: ROUNDUP(MTU(1500), 1024) + 128 = 2176 This might seem insignificant, however, it might have performance implications in DPDK, where each mbuf is expected to have 2k + RTE_PKTMBUF_HEADROOM of available space. This is because not only do some NICs have coarse-grained alignments of 1k, they also take RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf when setting up their Rx requirements. Thus, only the "After alignment" case above would guarantee 2k of available room, as the "Before alignment" would report only 1920B. Some extra information can be found at: https://mails.dpdk.org/archives/dev/2018-November/119219.html Note: This has been found by Ian Stokes while going through some af_packet checks. Reported-by: Ian Stokes <ian.stokes@intel.com> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing") Signed-off-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-28 15:30:03 +00:00
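The arithmetic in a32bab26e5 can be checked with a small C sketch. The function names and the `EXAMPLE_` constants below are hypothetical and only mirror the numbers quoted in the commit message (128-byte headroom, 1024-byte alignment); this is not the actual dpdk_buf_size() code, and like the message it ignores any frame overhead on top of the MTU.

```c
#include <stdint.h>

/* Illustrative values copied from the commit message; the real ones come
 * from DPDK and OVS headers. */
#define EXAMPLE_HEADROOM 128u   /* RTE_PKTMBUF_HEADROOM in the message */
#define EXAMPLE_ALIGN    1024u  /* NETDEV_DPDK_MBUF_ALIGN in the message */

/* Round x up to the next multiple of y (y must be non-zero). */
static uint32_t
example_round_up(uint32_t x, uint32_t y)
{
    return (x + y - 1) / y * y;
}

/* "Before alignment" case: headroom is added first, then the sum is aligned. */
static uint32_t
example_size_headroom_before_align(uint32_t mtu)
{
    return example_round_up(mtu + EXAMPLE_HEADROOM, EXAMPLE_ALIGN);
}

/* "After alignment" case: the MTU is aligned first, then headroom is added. */
static uint32_t
example_size_headroom_after_align(uint32_t mtu)
{
    return example_round_up(mtu, EXAMPLE_ALIGN) + EXAMPLE_HEADROOM;
}

/* Space left for packet data once the headroom has been taken out. */
static uint32_t
example_data_room(uint32_t buf_size)
{
    return buf_size - EXAMPLE_HEADROOM;
}
```

For an MTU of 1500 the first variant leaves 1920 bytes of data room and the second leaves 2048, which matches the figures in the message.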
Eelco Chaudron	2d37de73c1	netdev-dpdk: Bring link down when NETDEV_UP is not set When the netdev link flags are changed, !NETDEV_UP, the DPDK ports are not actually going down. This is causing problems for people trying to bring down a bond member. The bond link is no longer being used to receive or transmit traffic, however, the other end keeps sending data as the link remains up. With OVS 2.6 the link was brought down, and this was changed with commit 3b1fb0779. In this commit, it's explicitly mentioned that the link down/up DPDK APIs are not called as not all PMD devices support it. However, this patch does call the appropriate DPDK APIs and ignores errors due to the PMD not supporting it. PMDs not supporting this should be fixed in DPDK upstream. I verified this patch is working correctly using the ovs-appctl netdev-dpdk/set-admin-state <port> {up|down} and ovs-ofctl mod-port <bridge> <port> {up|down} commands on an XL710 and 82599ES. Fixes: 3b1fb0779b87 ("netdev-dpdk: Don't call rte_dev_stop() in update_flags().") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-12 15:45:12 +00:00
Tiago Lam	3aaa620151	dp-packet: Fix allocated size on DPDK init. When enabled with DPDK OvS deals with two types of packets, the ones coming from the mempool and the ones locally created by OvS - which are copied to mempool mbufs before output. In the latter, the space is allocated from the system, while in the former the mbufs are allocated from a mempool, which takes care of initialising them appropriately. In the current implementation, during mempool's initialisation of mbufs, dp_packet_set_allocated() is called from dp_packet_init_dpdk() without considering that the allocated space, in the case of multi-segment mbufs, might be greater than a single mbuf. Furthermore, given that dp_packet_init_dpdk() is on the code path that's called upon mempool's initialisation, a call to dp_packet_set_allocated() is redundant, since mempool takes care of initialising it. To fix this, dp_packet_set_allocated() is no longer called after initialisation of a mempool, only in dp_packet_init__(), which is still called by OvS when initialising locally created packets. Signed-off-by: Tiago Lam <tiago.lam@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 16:29:14 +00:00
Mark Kavanagh	dfaf00e8c3	netdev-dpdk: fix mbuf sizing There are numerous factors that must be considered when calculating the size of an mbuf: - the data portion of the mbuf must be sized in accordance With Rx buffer alignment (typically 1024B). So, for example, in order to successfully receive and capture a 1500B packet, mbufs with a data portion of size 2048B must be used. - in OvS, the elements that comprise an mbuf are: * the dp packet, which includes a struct rte mbuf (704B) * RTE_PKTMBUF_HEADROOM (128B) * packet data (aligned to 1k, as previously described) * RTE_PKTMBUF_TAILROOM (typically 0) Some PMDs require that the total mbuf size (i.e. the total sum of all of the above-listed components' lengths) is cache-aligned. To satisfy this requirement, it may be necessary to round up the total mbuf size with respect to cacheline size. In doing so, it's possible that the dp_packet's data portion is inadvertently increased in size, such that it no longer adheres to Rx buffer alignment. Consequently, the following property of the mbuf no longer holds true: mbuf.data_len == mbuf.buf_len - mbuf.data_off This creates a problem in the case of multi-segment mbufs, where that assumption is assumed to be true for all but the final segment in an mbuf chain. Resolve this issue by adjusting the size of the mbuf's private data portion, as opposed to the packet data portion when aligning mbuf size to cachelines. Co-authored-by: Tiago Lam <tiago.lam@intel.com> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization") Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size") CC: Santosh Shukla <santosh.shukla@caviumnetworks.com> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Tiago Lam <tiago.lam@intel.com> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 16:27:31 +00:00
Ian Stokes	31154f9523	netdev-dpdk: Add link speed to get_status(). Report the link speed of the device in netdev_dpdk_get_status() function. Link speed is already reported as part of the netdev_get_features() function. However only link speeds defined in the OpenFlow specs are supported so speeds such as 25 Gbps etc. are not shown. The link speed for the device is available in Mbps in rte_eth_link. This commit converts the link speed for a given dpdk device to an easy to read string and reports it in get_status(). Suggested-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:17:47 +00:00
Ian Stokes	dfcb5b8ad5	netdev-dpdk: Fix netdev_dpdk_get_features(). This commit fixes netdev_dpdk_get_features() by initializing a bitmap that represents current features to zero and accounting for non defined link speed values in the OpenFlow spec. The current approach for retrieving netdev dpdk features uses a pointer allocated in the stack without being initialized. As such there is no guarantee that the bitmap will be accurate. Fix this by declaring and initializing local variable 'feature' to be used when building the bitmap, with its value then assigned to the pointer. Also account for link speeds not defined in the OpenFlow spec by defaulting to NETDEV_F_OTHER for undefined link speeds. Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.") Acked-by: Ilya Maximets <i.maximets@samsung.com> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:17:35 +00:00
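The pattern described in dfcb5b8ad5 (build the bitmap in a zero-initialized local, fall back to an "other" bit for speeds the OpenFlow spec does not define, then assign through the pointer) can be sketched in C as follows. The enum values and the function name are hypothetical stand-ins, not the OVS NETDEV_F_* definitions or the real netdev_dpdk_get_features() signature.

```c
#include <stdint.h>

/* Hypothetical feature bits standing in for the NETDEV_F_* flags. */
enum example_feature {
    EXAMPLE_F_10GB_FD = 1u << 0,
    EXAMPLE_F_40GB_FD = 1u << 1,
    EXAMPLE_F_OTHER   = 1u << 2,
};

/* Translate a link speed in Mbps into a feature bitmap.  The bitmap is
 * accumulated in a local that starts at zero, so stale data behind
 * 'current' can never leak into the result. */
static void
example_get_features(uint32_t link_speed_mbps, uint32_t *current)
{
    uint32_t feature = 0;

    switch (link_speed_mbps) {
    case 10000:
        feature |= EXAMPLE_F_10GB_FD;
        break;
    case 40000:
        feature |= EXAMPLE_F_40GB_FD;
        break;
    default:
        /* Speeds without a dedicated bit, such as 25000 Mbps. */
        feature |= EXAMPLE_F_OTHER;
        break;
    }

    *current = feature;
}
```

Assigning the whole value at the end, rather than OR-ing into `*current`, is what makes the result independent of whatever the caller's variable held beforehand.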
Ilya Maximets	9474073615	netdev-dpdk: Dump flow patterns only if debug enabled. No need to waste time for fields checking in case DBG disabled. Additionally sequence of prints replaced with single print to avoid output interrupting by other log messages. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:16:14 +00:00
Ilya Maximets	faf71e4922	netdev-dpdk: Print port name in offload API messages. This is useful for understanding which flows offloaded to which ports. Code refactored a bit to reduce number of casts. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:15:27 +00:00
Ilya Maximets	5752eae485	dpif-netdev: Fix cmap node use after free on flow disassociation. Data pointed by cmap node must not be freed while iterating. ovsrcu_postpone should be used instead. CC: Finn Christensen <fc@napatech.com> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:13:54 +00:00
Ilya Maximets	95ca79d542	netdev-dpdk: Secure flow offload API. rte API is not thread safe. We have to get netdev mutex before using it and also before using fields of netdev structure. This is important because the offload API is used from a separate thread and could be used at the same time as other netdev functions called from the main thread. CC: Finn Christensen <fc@napatech.com> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:13:40 +00:00
Ilya Maximets	c0af6425d7	netdev-dpdk: Drop offload API for vhost ports. vhost ports are not DPDK eth ports and have no rte_flow API. Stop calling this API with DPDK_ETH_PORT_ID_INVALID to avoid wasted time and errors in the log. Additionally, the DPDK_FLOW_OFFLOAD_API definition is moved to the .c file, because there is no need to expose it in the header. CC: Finn Christensen <fc@napatech.com> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-11-02 15:13:19 +00:00
Ben Pfaff	89c09c1cd1	netdev: Clean up class initialization. The macros are hard to read. This makes it a little more readable. Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-27 17:48:23 +01:00
Xu Binbin	74cd69a479	netdev-dpdk: Support the link speed of XL710 In the scenario of XL710, the link speed which stored in the table of Interface is not 40G. Because the implementation of query of link speed only support to 10G, the parameter 'current' will be a random value in the scenario of higher link speed. In this case, incorrect link speed of XL710 nic will be stored in the database. Signed-off-by: Xu Binbin <xu.binbin1@zte.com.cn> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-27 17:48:23 +01:00
Kevin Traynor	51c6a5a3c8	netdev-dpdk: Use hex for PCI vendor ID. Match the prefix and formatting. Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.") Cc: pshelar@ovn.org Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-08 22:06:21 +01:00
Sugesh Chandran	7e1de65e8d	netdev-dpdk: Fix failure to configure flow control at netdev-init. Configuring flow control at ixgbe netdev-init is throwing error in port start. For example, without this fix, a user cannot configure flow control on an ixgbe dpdk port as below, " ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \ options:dpdk-devargs=0000:05:00.1 options:rx-flow-ctrl=true " Instead, it must be configured as two different commands, " ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \ options:dpdk-devargs=0000:05:00.1 ovs-vsctl set Interface dpdk0 options:rx-flow-ctrl=true " The DPDK ixgbe driver is now validating all the 'rte_eth_fc_conf' fields before trying to configure the dpdk ethdev. Hence OVS can no longer set the 'dont care' fields to just '0' as before. This commit makes sure all the 'rte_eth_fc_conf' fields are populated with default values before the dev init. Also to avoid read error on unsupported ports, the flow control parameters are now read only when the user is trying to configure/update it. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-08 22:06:21 +01:00
Ben Pfaff	773c3cb40f	netdev-dpdk: Use ETH_ADDR_BYTES_ARGS instead of open-coding it. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-24 22:36:38 +01:00
Ben Pfaff	31a033cb71	netdev-dpdk: Fix sparse complaints. Neither of these is a real problem. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-24 22:36:29 +01:00
Ben Pfaff	2b7b5dbb07	netdev-dpdk: Fix incorrect byte order conversion in log message. uint8_t values shouldn't be passed to ntohs(). Found by sparse. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-24 22:36:21 +01:00
Ian Stokes	43307ad0e2	dpdk: Support both shared and per port mempools. This commit re-introduces the concept of shared mempools as the default memory model for DPDK devices. Per port mempools are still available but must be enabled explicitly by a user. OVS previously used a shared mempool model for ports with the same MTU and socket configuration. This was replaced by a per port mempool model to address issues flagged by users such as: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html However the per port model potentially requires an increase in memory resource requirements to support the same number of ports and configuration as the shared port model. This is considered a blocking factor for current deployments of OVS when upgrading to future OVS releases as a user may have to redimension memory for the same deployment configuration. This may not be possible for users. This commit resolves the issue by re-introducing shared mempools as the default memory behaviour in OVS DPDK but also refactors the memory configuration code to allow for per port mempools. This patch adds a new global config option, per-port-memory, that controls the enablement of per port mempools for DPDK devices. ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true This value defaults to false; to enable per port memory support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon. The mempool sweep functionality is also replaced with the sweep functionality from OVS 2.9 found in commits c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.) a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.) A new document to discuss the specifics of the memory models and example memory requirement calculations is also added. 
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Tiago Lam <tiago.lam@intel.com> Tested-by: Tiago Lam <tiago.lam@intel.com>	2018-07-06 12:46:26 +01:00
Yuanhan Liu	daf90186e2	netdev-dpdk: add debug for rte flow patterns For debug purpose. Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-06 10:32:52 +01:00
Finn Christensen	e8a2b5bf92	netdev-dpdk: implement flow offload with rte flow The basic yet the major part of this patch is to translate the "match" to rte flow patterns. And then, we create a rte flow with MARK + RSS actions. Afterwards, all packets that match the flow will have the mark id in the mbuf. The reason RSS is needed is, for most NICs, a MARK only action is not allowed. It has to be used together with some other actions, such as QUEUE, RSS, etc. However, QUEUE action can specify one queue only, which may break the rss. RSS action is likely the best we can do for now. Thus, RSS action is chosen. For any unsupported flows, such as MPLS, -1 is returned, meaning the flow offload failed and is then skipped. Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-06 10:32:52 +01:00
John Hurley	88dcf2aa82	netdev-provider: add class op to get block_id Add a new class op for netdevs to get the block_id if one exists. The block_id is used in offload ops to group multiple qdiscs together. Stub calls are made to the new class op (implementation to follow in further patches). The default block_id of 0 (no block) will be used in these cases. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:51:47 +02:00
Aaron Conole	b9a3183d3a	netdev-dpdk: Avoid warning for snprintf() call. lib/netdev-dpdk.c: In function : lib/netdev-dpdk.c:2865:49: warning: output may be truncated before the last format character [-Wformat-truncation=] snprintf(vhost_vring, 16, "vring_%d_size", i); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2018-06-15 11:26:14 -07:00
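The truncation warning quoted in b9a3183d3a can be reproduced on paper: "vring_" and "_size" take 11 characters, so a 16-byte buffer only has room for an index of up to four digits plus the terminating NUL. The C sketch below detects truncation by checking the return value of snprintf(); the helper name is hypothetical, and this is not necessarily how the commit itself silenced the warning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format "vring_<i>_size" into 'buf'.  Returns true only when the whole
 * string, including the terminating NUL, fits into 'size' bytes.
 * snprintf() returns the length the full string would have had, so a
 * result >= size means the output was truncated. */
static bool
example_format_vring_name(char *buf, size_t size, int i)
{
    int n = snprintf(buf, size, "vring_%d_size", i);

    return n >= 0 && (size_t) n < size;
}
```

With a 16-byte buffer, index 9999 still fits (15 characters plus NUL), while index 100000 needs 17 characters and is reported as truncated.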
Ian Stokes	4dd16ca0c3	netdev-dpdk: Handle ENOTSUP for rte_eth_dev_set_mtu. The function rte_eth_dev_set_mtu is not supported for all DPDK drivers. Currently if it is not supported we return an error in dpdk_eth_dev_queue_setup. There are two issues with this. (i) A device can still function even if rte_eth_dev_set_mtu is not supported albeit with the default max rx packet length. (ii) When ENOTSUP is returned it will not be caught in port_reconfigure() at the dpif-netdev layer. Port_reconfigure() checks if a netdev_reconfigure() function is supported for a given netdev and ignores EOPNOTSUPP errors as it assumes errors of this value mean there is no reconfiguration function. In this case the reconfiguration function is supported for netdev dpdk but a function called as part of the reconfigure (rte_eth_dev_set_mtu) may not be supported. As this is a corner case, this commit warns a user when rte_eth_dev_set_mtu is not supported and informs them of the default max rx packet length that will be used instead. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Co-author: Michal Weglicki <michalx.weglicki@intel.com> Tested-By: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Cian Ferriter <cian.ferriter@intel.com> Tested-by: Cian Ferriter <cian.ferriter@intel.com>	2018-06-08 17:27:56 +01:00
Michal Weglicki	e10ca8b921	netdev-dpdk: Enable HW_CRC_STRIP for virtual functions. Virtual functions such as igb_vf and i40e_vf require HW_CRC_STRIP to be explicitly enabled before configuration, otherwise device configuration will fail. This commit achieves this by adding NETDEV_RX_HW_CRC_STRIP to dpdk_hw_ol_features. When a dpdk device is added, the driver for the device is examined, if the device is a virtual function enable HW_CRC_STRIP. Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Co-Authored: Ian Stokes <ian.stokes@intel.com> Acked-by: Cian Ferriter <cian.ferriter@intel.com> Tested-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-06-08 17:27:56 +01:00
Timothy Redaelli	7bbc2e1def	netdev-dpdk: fix check for "net_nfp" driver Currently the check of "net_nfp" driver while enabling scatter compares only the first 6 bytes, but "net_nfp" is 7 bytes long. This change fixes the check by comparing the first 7 bytes. CC: Pablo Cascón <pablo.cascon@netronome.com> CC: Simon Horman <simon.horman@netronome.com> Fixes: 65a87968f4cf ("netdev-dpdk: don't enable scatter for jumbo RX support for nfp") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Pablo Cascón <pablo.cascon@netronome.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-25 09:09:50 +01:00
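The off-by-one in the "net_nfp" check from 7bbc2e1def is easy to see in isolation. In the C sketch below, both helper names and the driver string "net_nfx" are hypothetical and exist only to show what a 6-byte comparison lets through.

```c
#include <stdbool.h>
#include <string.h>

/* Compares only the first 6 bytes, i.e. "net_nf", as the old check did. */
static bool
example_is_nfp_6(const char *driver_name)
{
    return strncmp(driver_name, "net_nfp", 6) == 0;
}

/* Compares all 7 bytes of "net_nfp", as the fixed check does. */
static bool
example_is_nfp_7(const char *driver_name)
{
    return strncmp(driver_name, "net_nfp", 7) == 0;
}
```

A driver named "net_nfx" would pass the 6-byte check but fail the 7-byte one, while "net_nfp" passes both.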
Eelco Chaudron	606f665072	netdev-dpdk: Don't use PMD driver if not configured successfully When initialization of the DPDK PMD driver fails (dpdk_eth_dev_init()), the reconfigure_datapath() function will remove the port from dp_netdev, and the port is not used. Now when bridge_reconfigure() is called again, no changes to the previous failing netdev configuration are detected and therefore the ports gets added to dp_netdev and used uninitialized. This is causing exceptions... The fix has two parts to it. First in netdev-dpdk.c we remember if the DPDK port was started or not, and when calling netdev_dpdk_reconfigure() we also try re-initialization if the port was not already active. The second part of the change is in dpif-netdev.c where it makes sure netdev_reconfigure() is called if the port needs reconfiguration, as netdev_is_reconf_required() is only true until netdev_reconfigure() is called (even if it fails). Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-25 09:09:50 +01:00
Kevin Traynor	1f84a2d5b5	netdev-dpdk: Remove use of rte_mempool_ops_get_count. rte_mempool_ops_get_count is not exported by DPDK so it means it cannot be used by OVS when using DPDK as a shared library. Remove rte_mempool_ops_get_count but still use rte_mempool_full and document its behavior. Fixes: 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.") Reported-by: Timothy Redaelli <tredaelli@redhat.com> Reported-by: Markos Chandras <mchandras@suse.de> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-25 09:09:50 +01:00
Darrell Ball	7d7ded7af7	odp-execute: Rename 'may_steal' to 'should_steal'. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-23 11:36:47 -07:00
Eelco Chaudron	eaa4358119	netdev-dpdk: Fixed netdev_dpdk structure alignment Currently, the code tells us we have 4 pad bytes left in cacheline0 while actually we are 8 bytes short: struct netdev_dpdk { union { OVS_CACHE_LINE_MARKER cacheline0; /* 1 */ struct { dpdk_port_t port_id; /* 0 2 */ _Bool attached; /* 2 1 */ struct eth_addr hwaddr; /* 4 6 */ int mtu; /* 12 4 */ int socket_id; /* 16 4 */ int buf_size; /* 20 4 */ int max_packet_len; /* 24 4 */ enum dpdk_dev_type type; /* 28 4 */ enum netdev_flags flags; /* 32 4 */ char *devargs; /* 40 8 */ struct dpdk_tx_queue *tx_q; /* 48 8 */ struct rte_eth_link link; /* 56 8 */ int link_reset_cnt; /* 64 4 */ }; /* 72 */ uint8_t pad9[128]; /* 128 */ }; /* 0 128 */ /* --- cacheline 2 boundary (128 bytes) --- */ Re-located one member, link_reset_cnt, and now it's one cache line: struct netdev_dpdk { union { OVS_CACHE_LINE_MARKER cacheline0; /* 1 */ struct { dpdk_port_t port_id; /* 0 2 */ _Bool attached; /* 2 1 */ struct eth_addr hwaddr; /* 4 6 */ int mtu; /* 12 4 */ int socket_id; /* 16 4 */ int buf_size; /* 20 4 */ int max_packet_len; /* 24 4 */ enum dpdk_dev_type type; /* 28 4 */ enum netdev_flags flags; /* 32 4 */ int link_reset_cnt; /* 36 4 */ char *devargs; /* 40 8 */ struct dpdk_tx_queue *tx_q; /* 48 8 */ struct rte_eth_link link; /* 56 8 */ }; /* 64 */ uint8_t pad9[64]; /* 64 */ }; /* 0 64 */ /* --- cacheline 1 boundary (64 bytes) --- */ Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Róbert Mulik	f8b64a61bc	Configurable Link State Change (LSC) detection mode It is possible to set LSC detection mode to polling or interrupt mode for DPDK interfaces. The default is polling mode. To set interrupt mode, option dpdk-lsc-interrupt has to be set to true. For detailed description and usage see the dpdk install documentation. Signed-off-by: Robert Mulik <robert.mulik@ericsson.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Jan Scheurich	8492adc270	netdev: Add optional qfill output parameter to rxq_recv() If the caller provides a non-NULL qfill pointer and the netdev implementation supports reading the rx queue fill level, the rxq_recv() function returns the remaining number of packets in the rx queue after reception of the packet burst to the caller. If the implementation does not support this, it returns -ENOTSUP instead. Reading the remaining queue fill level should not substantially slow down the recv() operation. A first implementation is provided for ethernet and vhostuser DPDK ports in netdev-dpdk.c. This output parameter will be used in the upcoming commit for PMD performance metrics to supervise the rx queue fill level for DPDK vhostuser ports. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Pablo Cascón	65a87968f4	netdev-dpdk: don't enable scatter for jumbo RX support for nfp Currently, RX of jumbo packets fails for NICs that do not support scatter. Scatter is not strictly needed for jumbo RX support. This change fixes the issue by skipping scatter enablement only for the PMD/NIC known not to need it to support jumbo RX. Note: this change is temporary and not needed for later releases of OVS/DPDK Reported-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Kevin Traynor	91fccdad72	netdev-dpdk: Free mempool only when no in-use mbufs. DPDK mempools are freed when they are no longer needed. This can happen when a port is removed or a port's mtu is reconfigured so that a new mempool is used. It is possible that an mbuf is attempted to be returned to a freed mempool from NIC Tx queues and this can lead to a segfault. In order to prevent this, only free mempools when they are not needed and have no in-use mbufs. As this might not be possible immediately, create a free list of mempools and sweep it anytime a port tries to get a mempool. Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak") Cc: mark.b.kavanagh81@gmail.com Cc: Ilya Maximets <i.maximets@samsung.com> Reported-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-04-21 16:59:45 +01:00
Kevin Traynor	1dfebee971	netdev-dpdk: Remove 'error' from non error log. Presently, if OVS tries to setup more queues than are allowed by a specific NIC, OVS will handle this case by retrying with a lower amount of queues. Rather than reporting initial failed queue setups in the logs as ERROR, they are reported as INFO but contain the word 'error'. Unless a user has detailed knowledge of OVS-DPDK workings, this is confusing. Let's remove 'error' from the INFO log. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-03-23 11:35:34 +00:00
Ilya Maximets	fa9f4eebd3	netdev-dpdk: Fix print format for dpdk port ids. Since 17.11 release DPDK uses uint16 for port_id. Format strings for printing functions must be updated accordingly. CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-03-23 11:28:35 +00:00
Justin Pettit	e883448e3f	dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-02-28 14:53:27 -08:00
Ciara Loftus	10087cba9d	netdev-dpdk: Add support for vHost dequeue zero copy (experimental) Zero copy is disabled by default. To enable it, set the 'dq-zero-copy' option to 'true' when configuring the Interface: ovs-vsctl set Interface dpdkvhostuserclient0 options:vhost-server-path=/tmp/dpdkvhostuserclient0 options:dq-zero-copy=true When packets from a vHost device with zero copy enabled are destined for a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be set to a smaller value. 128 is recommended. This can be achieved like so: ovs-vsctl set Interface dpdkport options:n_txq_desc=128 Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to should not exceed 128. Due to this requirement, the feature is considered 'experimental'. Testing of the patch showed a ~8% improvement when switching 512B packets between vHost devices on different VMs on the same host when zero copy was enabled on the transmitting device. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-31 14:04:35 +00:00
Ilya Maximets	ac1a9bb93f	netdev-dpdk: Fix xstats leak on port destruction. CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Ilya Maximets	34eb086342	netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats(). CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Ilya Maximets	526259f22c	netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats(). CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Yuanhan Liu	5e75881868	netdev-dpdk: fix port addition for ports sharing same PCI id Some NICs have only one PCI address associated with multiple ports. This patch extends the dpdk-devargs option's format to cater for such devices. To achieve that, this patch uses a new syntax that will be adapted and implemented in future DPDK release (likely, v18.05): http://dpdk.org/ml/archives/dev/2017-December/084234.html And since it's the DPDK duty to parse the (complete and full) syntax and this patch is more likely to serve as an intermediate workaround, here I take a simpler and shorter syntax from it (note it's allowed to have only one category being provided): class=eth,mac=00:11:22:33:44:55:66 Also, old compatibility is kept. Users can still go on with using the PCI id to add a port (if that's enough for them). Meaning, this patch will not break anything. This patch is basically based on the one from Ciara: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html Cc: Loftus Ciara <ciara.loftus@intel.com> Cc: Thomas Monjalon <thomas@monjalon.net> Cc: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Ian Stokes	f6f50552a3	netdev-dpdk: Fix requested MTU size validation. This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu) in netdev_dpdk_set_mtu(), in order to determine if the total length of the L2 frame with an MTU of 'mtu' exceeds NETDEV_DPDK_MAX_PKT_LEN. When setting an MTU we first check if the requested total frame length (which includes associated L2 overhead) will exceed the maximum frame length supported in netdev_dpdk_set_mtu(). The frame length is calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC. The MTU for the device will be set at a later stage in dpdk_eth_dev_init() using rte_eth_dev_set_mtu(mtu). However when using rte_eth_dev_set_mtu(mtu) the calculation used to check that the frame does not exceed the max frame length for that device varies between DPDK device drivers. For example ixgbe driver calculates the frame length for a given MTU as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN i40e driver calculates it as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2 em driver calculates it as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE Currently it is possible to set an MTU for a netdev_dpdk device that exceeds the upper limit MTU for that devices DPDK driver. This leads to a segfault. This is because the frame length comparison as is, does not take into account the addition of the vlan tag overhead expected in the drivers. The netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent dpdk_eth_dev_init() will fail before the queues have been created for the DPDK device. This coupled with assumptions regarding reconfiguration requirements for the netdev will lead to a segfault when the rxq is polled for this device. A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when validating a requested MTU in netdev_dpdk_set_mtu(). MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following: mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN) By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS now takes into account the maximum L2 overhead that a DPDK driver could allow for in its frame size calculation. This allows OVS to flag an error rather than the DPDK driver if the frame length exceeds the max DPDK frame length. OVS can fail gracefully at this point and use the default MTU of 1500 to continue to configure the port. Note: this fix is a work around, a better approach would be if DPDK devices could report the maximum MTU value that can be requested on a per device basis. This capability however is not currently available. A downside of this patch is that the MTU upper limit will be reduced by 8 bytes for DPDK devices that do not need to account for vlan tags in the frame length driver calculations e.g. ixgbe devices upper MTU limit is reduced from the OVS point of view from 9710 to 9702. CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames") Signed-off-by: Ian Stokes <ian.stokes@intel.com> Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2018-01-26 20:49:18 +00:00
Flavio Leitner	b2e8b12f8a	netdev-dpdk: add vhost-user get_status. Expose relevant vhost-user information in status. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:12:46 +00:00
zhangliping	4c47ddde34	netdev-dpdk: fix ingress_policer leak on error path Fix memory leak by freeing the policer if rte_meter_srtcm_config fails. Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.") Signed-off-by: zhangliping <zhangliping02@baidu.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Michal Weglicki	971f4b394c	netdev: Custom statistics. - New get_custom_stats interface function is added to netdev. It allows particular netdev implementation to expose custom counters in dictionary format (counter name/counter value). - New statistics are retrieved using experimenter code and are printed as a result to ofctl dump-ports. - New counters are available for OpenFlow 1.4+. - New statistics are printed to output via ofctl only if those are present in reply message. - New statistics definition is added to include/openflow/intel-ext.h. - Custom statistics are implemented only for dpdk-physical port type. - DPDK-physical implementation uses xstats to collect statistics. Only dropped and error counters are exposed. Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-10 15:29:13 -08:00
Ilya Maximets	ad8b0b4fe7	netdev: Remove useless cutlen. Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Ilya Maximets	b30896c969	netdev: Remove unused may_steal. Not needed anymore because 'may_steal' already handled on dpif-netdev layer and always true. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00

308 Commits
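
The MTU validation change described in commit f6f50552a3 can be illustrated with a small C sketch. It is not the OVS source: the two macros follow the formulas quoted in the commit message, the overhead constants are the standard Ethernet values (14-byte header, 4-byte CRC, 4-byte VLAN tag), and the NETDEV_DPDK_MAX_PKT_LEN value of 9728 is an assumption inferred from the ixgbe figure in the message (9710 + 14 + 4). The helper names mtu_ok_old() and mtu_ok_new() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard Ethernet L2 overhead sizes, in bytes. */
#define ETHER_HDR_LEN   14
#define ETHER_CRC_LEN   4
#define VLAN_HEADER_LEN 4

/* Assumed limit, inferred from the commit message: the ixgbe upper MTU
 * of 9710 plus header and CRC gives 9728. */
#define NETDEV_DPDK_MAX_PKT_LEN 9728

/* Frame length as computed before the fix: no VLAN tags counted. */
#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)

/* Frame length as computed after the fix: room for two VLAN tags. */
#define MTU_TO_MAX_FRAME_LEN(mtu) \
    ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN))

/* Hypothetical helper: the pre-fix check. */
static inline bool
mtu_ok_old(uint32_t mtu)
{
    return MTU_TO_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}

/* Hypothetical helper: the post-fix check, which rejects any MTU that a
 * driver counting two VLAN tags would later refuse. */
static inline bool
mtu_ok_new(uint32_t mtu)
{
    return MTU_TO_MAX_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}
```

Under these assumptions an MTU of 9710 passes the old check but fails the new one, and 9702 is the largest MTU the new check accepts, which matches the 8-byte reduction the commit message describes for ixgbe devices.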