mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

Author	SHA1	Message	Date
Mike Pattrick	2276c3a2c6	userspace: Support GRE TSO. This patch extends the userspace datapaths support of tunnel tso from only supporting VxLAN and Geneve to also supporting GRE tunnels. There is also a software fallback for cases where the egress netdev does not support this feature. Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-01-17 00:20:48 +01:00
Mike Pattrick	82c1028e37	Userspace: Software fallback for UDP encapsulated TCP segmentation. When sending packets that are flagged as requiring segmentation to an interface that does not support this feature, send the packet to the TSO software fallback instead of dropping it. Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-09-11 15:36:27 +02:00
Dexia Li	084c808729	userspace: Support VXLAN and GENEVE TSO. For userspace datapath, this patch provides vxlan and geneve tunnel tso. Only userspace vxlan and geneve tunnels are supported, along with tunnel outer and inner csum offload. If the netdev does not support the offload features, there is a software fallback. If the netdev does not support vxlan and geneve tso, packets will be dropped. Front-end devices can also disable offload features via ethtool. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com> Co-authored-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-01-17 22:06:45 +01:00
Flavio Leitner	8b5fe2dc60	userspace: Add Generic Segmentation Offloading. This provides a software implementation in the case the egress netdev doesn't support segmentation in hardware. The challenge here is to guarantee packet ordering in the original batch that may be full of TSO packets. Each TSO packet can go up to ~64kB, so with segment size of 1440 that means about 44 packets for each TSO. Each batch has 32 packets, so the total batch amounts to 1408 normal packets. The segmentation estimates the total number of packets and then the total number of batches. Then allocate enough memory and finally do the work. Finally each batch is sent in order to the netdev. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-12-02 01:33:37 +01:00
Adrian Moreno	6240c0b4c8	netdev: Add netdev_get_speed() to netdev API. Currently, the netdev's speed is being calculated by taking the link's feature bits (using netdev_get_features()) and transforming them into bps. This mechanism can be both inaccurate and difficult to maintain, mainly because we currently use the feature bits supported by OpenFlow which would have to be extended to support all new feature bits of all netdev implementations while keeping the OpenFlow API intact. In order to expose the link speed accurately for all current and future hardware, add a new netdev API call that allows the implementations to provide the current and maximum link speeds in Mbps. Internally, the logic to get the maximum supported speed still relies on feature bits so it might still get out of sync in the future. However, the maximum configurable speed is not used as much as the current speed and these feature bits are not exposed through the netdev interface so it should be easier to add more. Use this new function instead of netdev_get_features() where the link speed is needed. As a consequence of this patch, link speeds of cards are properly reported (internally in OVSDB) even if not supported by OpenFlow. A test verifies this behavior using a tap device. Also, in order to avoid using the old API, this patch adds a checkpatch.py warning if the old API is used. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:32 +02:00
Mike Pattrick	3337e6d91c	userspace: Enable L4 checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the L4 checksum was verified and it is OK or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulating a packet with that flag, set the checksum of the inner L4 header since that is not yet supported. Calculate the L4 checksum when the packet is going to be sent over a device that doesn't support the feature. Linux tap devices allow enabling L3 and L4 offload, so this patch enables the feature. However, Linux socket interface remains disabled because the API doesn't allow enabling those two features without enabling TSO too. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:50:30 +02:00
Mike Pattrick	5d11c47d3e	userspace: Enable IP checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the IP checksum was verified and it is GOOD or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulating a packet with that flag, set the checksum of the inner IP header since that is not yet supported. Calculate the IP checksum when the packet is going to be sent over a device that doesn't support the feature. Linux devices don't support IP checksum offload alone, so the support is not enabled. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:49:51 +02:00
Mike Pattrick	4433cc6860	dpif-netdev: Show netdev offloading flags. This patch modifies netdev_get_status to include information about checksum offload status by port, allowing the user to gain insight into where checksum offloading is active. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 15:44:57 +02:00
Eli Britstein	76ab364ea8	netdev-offload: Set 'miss_api_supported' to be under netdev. Cited commit introduced a flag in dpif-netdev level, to optimize performance and avoid hw_miss_packet_recover() for devices with no such support. However, there is a race condition between traffic processing and assigning a 'flow_api' object to the netdev. In such case, EOPNOTSUPP is returned by netdev_hw_miss_packet_recover() in netdev-offload.c layer because 'flow_api' is not yet initialized. As a result, the flag is falsely disabled, and subsequent packets won't be recovered, though they should. In order to fix it, move the flag to be in netdev-offload layer, to avoid that race. Fixes: 6e50c1651869 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.") Signed-off-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-10-25 21:35:51 +02:00
Tao Liu	378b51c6b0	netdev: Clear auto_classified if netdev reopened with the type specified. When netdev first opened by netdev_open(..., NULL, ...), netdev_class sets to system by default, and auto_classified sets to true. If netdev reopens by netdev_open(..., "system", ...), auto_classified should be cleared. This will be used in next patch to fix lag issue. Fixes: 8c2c225e481d ("netdev: Fix netdev_open() to track and recreate classless interfaces") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-07-26 12:41:41 +02:00
Yong Xu	c2567e533f	add port-based ingress policing based packet-per-second rate-limiting OVS has support for using policing to enforce a rate limit in kilobits per second. This is configured using OVSDB. f.e. $ ovs-vsctl set interface tap0 ingress_policing_rate=1000 $ ovs-vsctl set interface tap0 ingress_policing_burst=100 This patch adds a related feature, allowing policing to enforce a rate limit in kilo-packets per second. This is also configured using OVSDB. $ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000 $ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100 The kilo-bit and kilo-packet rate limits may be used separately or in combination. Add separate action for BPS and PPS in netlink message. Revise code and change action result to pipe to allow traffic pipe into second action. This patch implements the feature for: * OVSDB (northbound API) * TC policer when used both with and without TC offload (kernel API) Signed-off-by: Yong Xu <yong.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2021-07-01 20:44:07 +02:00
Ilya Maximets	48c1ab5d74	netdev: Allow storing dpif type into netdev structure. Storing of the dpif type of the owning datapath interface will allow us to easily distinguish, for example, userspace tunneling ports from the system ones. This is required in terms of HW offloading to avoid offloading of userspace flows to kernel interfaces that don't belong to the userspace datapath but have the same dpif_port names. Acked-by: Eli Britstein <elibr@mellanox.com> Acked-by: Roni Bar Yanai <roniba@mellanox.com> Acked-by: Ophir Munk <ophirmu@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-07-08 19:07:21 +02:00
William Tu	5bfc519fee	netdev-afxdp: Add interrupt mode netdev class. The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp interrupt mode. This is similar to 'type=afxdp', except that the is_pmd field is set to false. As a result, the packet processing is handled by main thread, not pmd thread. This avoids burning the CPU to always 100% when there is no traffic. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-04-28 17:58:31 +02:00
Flavio Leitner	35b5586ba7	userspace TSO: SCTP checksum offload optional. Ideally SCTP checksum offload needs be advertised by the NIC when userspace TSO is enabled. However, very few drivers do that and it's not a widely used protocol. So, this patch enables SCTP checksum offload if available, otherwise userspace TSO can still be enabled but SCTP packets will be dropped on NICs without support. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Flavio Leitner	8c5163fe81	userspace TSO: Include UDP checksum offload. Virtio doesn't expose flags to control which protocols checksum offload needs to be enabled or disabled. This patch checks if the NIC supports UDP checksum offload and activates it when TSO is enabled. Reported-by: Ilya Maximets <i.maximets@ovn.org> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Flavio Leitner	29cf9c1b3b	userspace: Add TCP Segmentation Offload support Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC reducing the per packet CPU overhead. A guest using vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums. It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside virtual switch. If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation. It is recommended to check if NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Ciara Loftus <ciara.loftus.intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-17 22:27:25 +00:00
Yanqin Wei	3343f8d6cf	netdev: use acquire-release semantics for change_seq in netdev "rxq_enabled" of netdev is written in the vhost thread and read by pmd thread once it observes 'change_seq' is updated. This patch is to keep order on aarch64 or other weak memory model CPU to ensure 'rxq_enabled' is observed before 'change_seq'. Reviewed-by: Gavin Hu <Gavin.Hu@arm.com> Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-12-02 14:48:14 -08:00
Darrell Ball	594570ea1c	conntrack: Optimize recirculations. Cache the 'conn' context and use it when it is valid. The cached 'conn' context will get reset if it is not expected to be valid; the cost to do this is negligible. Besides being most optimal, this also handles corner cases, such as decapsulation leading to the same tuple, as in tunnel VPN cases. A negative test is added to check the resetting of the cached 'conn'. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-09-25 08:58:11 -07:00
William Tu	0de1b42596	netdev-afxdp: add new netdev type for AF_XDP. The patch introduces experimental AF_XDP support for OVS netdev. AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket type built upon the eBPF and XDP technology. It aims to have comparable performance to DPDK but cooperate better with existing kernel's networking stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program attached to the netdev, by-passing a couple of Linux kernel's subsystems. As a result, AF_XDP socket shows much better performance than AF_PACKET. For more details about AF_XDP, please see linux kernel's Documentation/networking/af_xdp.rst. Note that by default, this feature is not compiled in. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>	2019-07-19 17:42:06 +03:00
David Marchand	35c91567c8	dpif-netdev: Only poll enabled vhost queues. We currently poll all available queues based on the max queue count exchanged with the vhost peer and rely on the vhost library in DPDK to check the vring status beneath. This can lead to some overhead when we have a lot of unused queues. To enhance the situation, we can skip the disabled queues. On rxq notifications, we make use of the netdev's change_seq number so that the pmd thread main loop can cache the queue state periodically. $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 0 core_id 1: isolated : true port: dpdk0 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 2: isolated : true port: vhost1 queue-id: 0 (enabled) pmd usage: 0 % port: vhost3 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 15: isolated : true port: dpdk1 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 16: isolated : true port: vhost0 queue-id: 0 (enabled) pmd usage: 0 % port: vhost2 queue-id: 0 (enabled) pmd usage: 0 % $ while true; do ovs-appctl dpif-netdev/pmd-rxq-show | awk ' /port: / { tot++; if ($5 == "(enabled)") { en++; } } END { print "total: " tot ", enabled: " en }' sleep 1 done total: 6, enabled: 2 total: 6, enabled: 2 ... # Started vm, virtio devices are bound to kernel driver which enables # F_MQ + all queue pairs total: 6, enabled: 2 total: 66, enabled: 66 ... # Unbound vhost0 and vhost1 from the kernel driver total: 66, enabled: 66 total: 66, enabled: 34 ... # Configured kernel bound devices to use only 1 queue pair total: 66, enabled: 34 total: 66, enabled: 19 total: 66, enabled: 4 ... # While rebooting the vm total: 66, enabled: 4 total: 66, enabled: 2 ... total: 66, enabled: 66 ... # After shutting down the vm total: 66, enabled: 66 total: 66, enabled: 2 Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-06-26 18:43:39 +01:00
Ilya Maximets	4f746d526d	netdev-offload: Rename offload providers. Flow API providers renamed to be consistent with parent module 'netdev-offload' and look more like each other. '_rte_' replaced with more convenient '_dpdk_'. We'll have following structure: Common code: lib/netdev-offload-provider.h lib/netdev-offload.c lib/netdev-offload.h Providers: lib/netdev-offload-tc.c lib/netdev-offload-dpdk.c 'netdev-offload-dummy' still resides inside netdev-dummy, but it makes not much sense to move it out of there. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	b6cabb8f8f	netdev: Split up netdev offloading to separate module. New module 'netdev-offload' created to manage different flow API implementations. All the generic and provider independent code moved there from the 'netdev' module. Flow API providers further encapsulated. The only function that was changed is 'netdev_any_oor'. Now it uses offloading related hmap instead of common 'netdev_shash'. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	5fc5c50f3d	netdev: Dynamic per-port Flow API. Current issues with Flow API: * OVS calls offloading functions regardless of successful flow API initialization. (ex. on init_flow_api failure) * Static initialization of Flow API for a netdev_class forbids having different offloading types for different instances of netdev with the same netdev_class. (ex. different vports in 'system' and 'netdev' datapaths at the same time) Solution: * Move Flow API from the netdev_class to netdev instance. * Make Flow API dynamic, i.e. probe the APIs and choose the suitable one. Side effects: * Flow API providers localized as possible in their modules. * Now we have an ability to make runtime checks. For example, we could check if particular device supports features we need, like if dpdk device supports RSS+MARK action. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	f7e4685015	treewide: Clean up inclusions of netdev-dpdk header. 'netdev-dpdk.h' provides only 'netdev_dpdk_register' and 'free_dpdk_buf' which are not used in these files and should not be used. Leftovers from the already removed code. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-14 08:45:05 +00:00
Ilya Maximets	a47e2db209	dp-packet: Refactor offloading API. 1. No reason to have mbuf related APIs in a generic code. 2. Not only RSS/checksums should be invalidated in case of tunnel decapsulation or sending to 'ring' ports. In order to fix two above issues, new function 'dp_packet_reset_offload' introduced. In order to clean up/unify the code and simplify addition of new offloading features to non-DPDK version of dp_packet, introduced 'ol_flags' bitmask. Additionally reduced code complexity in 'dp_packet_clone_with_headroom' by using already existent generic APIs. Unfortunately, we still need to have a special case for mbuf initialization inside 'dp_packet_init__()'. 'dp_packet_init_specific()' introduced for this purpose as a generic API for initialization of the implementation-specific fields. Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-13 09:51:30 +00:00
Daniel Alvarez	3bdf8b620b	netdev: Add comment to allow removing a workaround in the future This patch [0] in glibc fixes an issue which is right now worked around in OVS by [1]. I'm adding a comment to indicate that from glibc 2.28 and beyond, the workaround is not needed so that we can eventually remove it. [0] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4 [1] `3434d30686` Signed-off-by: Daniel Alvarez <dalvarez@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-12-12 10:49:05 -08:00
Sriharsha Basavapatna via dev	57924fc91c	revalidator: Rebalance offloaded flows based on the pps rate This is the third patch in the patch-set to support dynamic rebalancing of offloaded flows. The dynamic rebalancing functionality is implemented in this patch. The ukeys that are not scheduled for deletion are obtained and passed as input to the rebalancing routine. The rebalancing is done in the context of revalidation leader thread, after all other revalidator threads are done with gathering rebalancing data for flows. For each netdev that is in OOR state, a list of flows - both offloaded and non-offloaded (pending) - is obtained using the ukeys. For each netdev that is in OOR state, the flows are grouped and sorted into offloaded and pending flows. The offloaded flows are sorted in descending order of pps-rate, while pending flows are sorted in ascending order of pps-rate. The rebalancing is done in two phases. In the first phase, we try to offload all pending flows and if that succeeds, the OOR state on the device is cleared. If some (or none) of the pending flows could not be offloaded, then we start replacing an offloaded flow that has a lower pps-rate than a pending flow, until there are no more pending flows with a higher rate than an offloaded flow. The flows that are replaced from the device are added into kernel datapath. A new OVS configuration parameter "offload-rebalance", is added to ovsdb. The default value of this is "false". To enable this feature, set the value of this parameter to "true", which provides packets-per-second rate based policy to dynamically offload and un-offload flows. Note: This option can be enabled only when 'hw-offload' policy is enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow offload errors (specifically ENOSPC error this feature depends on) reported by an offloaded device are suppressed by TC-Flower kernel module. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:52 +02:00
Sriharsha Basavapatna via dev	738c785ff1	dpif-netlink: Detect Out-Of-Resource condition on a netdev This is the first patch in the patch-set to support dynamic rebalancing of offloaded flows. The patch detects OOR condition on a netdev port when ENOSPC error is returned by TC-Flower while adding a flow rule. A new structure is added to the netdev called "netdev_hw_info", to store OOR related information required to perform dynamic offload-rebalancing. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:45 +02:00
Ben Pfaff	63cf14cd7a	netdev: Properly clear 'details' when iterating in NETDEV_QOS_FOR_EACH. The function comment for netdev_queue_dump_next() said that it cleared its 'detail' argument, but it didn't actually do that, which meant that details could be incorrectly carried along from one queue to the next. Reported-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2018-10-03 14:08:17 -07:00
Daniel Alvarez	3434d30686	netdev: Retry getting interfaces on inconsistent dumps from kernel This patch in glibc [0] is fixing a bug where we may be getting inconsistent dumps from the kernel when listing interfaces due to a race condition. This could happen if we try to retrieve them while interfaces are being added/removed from the system at the same time. For systems running against old glibc versions, this patch is retrying the operation up to 3 times and then proceeding by logging a warning. Note that 3 times should be enough to not delay the operation much and since it's unlikely that we hit the race condition 3 times in a row. Still, if this happened, this patch is not changing the current behavior. [0] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4 Signed-off-by: Daniel Alvarez <dalvarez@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Co-authored-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-08-15 13:38:09 -07:00
John Hurley	88dcf2aa82	netdev-provider: add class op to get block_id Add a new class op for netdevs to get the block_id if one exists. The block_id is used in offload ops to group multiple qdiscs together. Stub calls are made to the new class op (implementation to follow in further patches). The default block_id of 0 (no block) will be used in these cases. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:51:47 +02:00
Gavi Teitz	d63ca5329f	dpctl: Properly reflect a rule's offloaded to HW state Previously, any rule that is offloaded via a netdev, not necessarily to the HW, would be reported as "offloaded". This patch fixes this misalignment, and introduces the 'dp' state, as follows: rule is in HW via TC offload -> offloaded=yes dp:tc rule is in not HW over TC DP -> offloaded=no dp:tc rule is in not HW over OVS DP -> offloaded=no dp:ovs To achieve this, the flows's 'offloaded' flag was encapsulated in a new attrs struct, which contains the offloaded state of the flow and the DP layer the flow is handled in, and instead of setting the flow's 'offloaded' state based solely on the type of dump it was acquired via, for netdev flows it now sends the new attrs struct to be collected along with the rest of the flow via the netdev, allowing it to be set per flow. For TC offloads, the offloaded state is set based on the 'in_hw' and 'not_in_hw' flags received from the TC as part of the flower. If no such flag was received, due to lack of kernel support, it defaults to true. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> [simon: resolved conflict in lib/dpctl.man] Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-18 09:57:37 +02:00
William Tu	754f8acb45	netdev-native-tnl: refactor the tunnel push header. The patch adds additional 'struct netdev *' to the native tunnel's push_header() interface. This is used for later GRE sequence number support. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
Jan Scheurich	8492adc270	netdev: Add optional qfill output parameter to rxq_recv() If the caller provides a non-NULL qfill pointer and the netdev implementation supports reading the rx queue fill level, the rxq_recv() function returns the remaining number of packets in the rx queue after reception of the packet burst to the caller. If the implementation does not support this, it returns -ENOTSUP instead. Reading the remaining queue fill level should not substantially slow down the recv() operation. A first implementation is provided for ethernet and vhostuser DPDK ports in netdev-dpdk.c. This output parameter will be used in the upcoming commit for PMD performance metrics to supervise the rx queue fill level for DPDK vhostuser ports. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Darrell Ball	762ceb66b2	netdev: If MTU set fails, issue warn log. Recently, an issue was debugged that was thought to be a bond failover triggered issue. It turned out to be a vlan interface MTU set issue that had nothing to do with bonding or most other likely possibilities. Besides the effect of not setting the MTU to the desired value, this can result in increased netlink traffic and processing with associated wasted work. Let us flag a configuration issue at warn level (rather than dbg) to catch the problem early. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-04-18 11:09:58 -07:00
Ben Pfaff	d2a60e57a8	netdev: Fix typos in comment. Fixes: ee4776b8bce1 ("netdev: New function netdev_get_ip_by_name().") Suggested-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-04-17 08:33:41 -07:00
Ben Pfaff	ee4776b8bc	netdev: New function netdev_get_ip_by_name(). This is like netdev_get_in4_by_name() but accepts any IP address instead of just an IPv4 address. It will acquire its first user in an upcoming commit. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>	2018-04-16 14:53:27 -07:00
Ben Pfaff	93b7faf1c7	socket-util: Add more functions for IPv[46] sockaddr and sockaddr_storage. The existing functions for working with sockaddr_storage that contain an IPv4 or IPv6 address are useful. This commit adds more functions for working with them, as well as a parallel set of functions for struct sockaddr. This also adds an initial user for some of the new sockaddr functions in netdev.c. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>	2018-04-16 14:53:27 -07:00
Ben Pfaff	dfc77282c5	ofp-print: Move much of the printing code into message-specific files. Until now, the ofp-print code has had a lot of logic specific to individual messages. This code is better put with the other code specific to those messages, so this commit starts to migrate it. There is more work of a similar type to do, but this is a reasonable start. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2018-03-14 11:41:22 -07:00
Justin Pettit	e883448e3f	dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-02-28 14:53:27 -08:00
Michal Weglicki	971f4b394c	netdev: Custom statistics. - New get_custom_stats interface function is added to netdev. It allows particular netdev implementation to expose custom counters in dictionary format (counter name/counter value). - New statistics are retrieved using experimenter code and are printed as a result to ofctl dump-ports. - New counters are available for OpenFlow 1.4+. - New statistics are printed to output via ofctl only if those are present in reply message. - New statistics definition is added to include/openflow/intel-ext.h. - Custom statistics are implemented only for dpdk-physical port type. - DPDK-physical implementation uses xstats to collect statistics. Only dropped and error counters are exposed. Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-10 15:29:13 -08:00
Ben Pfaff	34944e81f0	Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD	2018-01-02 07:45:17 -08:00
Ben Pfaff	b2befd5bb2	sparse: Add guards to prevent FreeBSD-incompatible #include order. FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and that <netinet/in.h> be included before <arpa/inet.h>. This adds guards to the "sparse" headers to yield a warning if this order is violated. This commit also adjusts the order of many #includes to suit this requirement. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2017-12-22 12:58:02 -08:00
Ilya Maximets	b30896c969	netdev: Remove unused may_steal. Not needed anymore because 'may_steal' already handled on dpif-netdev layer and always true. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Yifeng Sun	59b1e023ae	netdev: netdev_get_etheraddr is not functioning as advertised. netdev_get_etheraddr claims to clear 'mac' on error, but it fails to do so. When looking further into both netdev_windows_get_etheraddr() and netdev_linux_get_etheraddr(), 'mac' is also not cleared. This will lead to usage of uninitialised ofputil_phy_port.hw_addr. v1 -> v2: fixed a bug in v1 found by Ben, thanks Ben. Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-30 13:36:29 -08:00
Ben Pfaff	fa54741ea5	netdev: Eliminate redundant ifindex mapping. Until now, the code for mapping ODP port number to ifindexes and vice versa has maintained two completely separate data structures, one for each direction. It was possible for the two mappings to become out of sync with each other since either one could change independently. This commit merges them into a single data structure (with two indexes), which at least means that if one is removed then the other is as well. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>	2017-11-15 10:57:49 -08:00
Ben Pfaff	8639555322	netdev: Indentation and style fixes. White space changes only. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>	2017-11-15 10:57:30 -08:00
Ben Pfaff	0d8efdc9ca	netdev: Change macro to function. There was no reason that this should have been a macro. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>	2017-11-14 10:14:18 -08:00
Ashish Varma	97459c2f01	netdev, dpif: fix the crash/assert on port delete. A crash is seen in "netdev_ports_remove" when an interface is deleted and added back in the system and when the interface is part of a bridge configuration. e.g. steps: (1) create a tap0 interface using "ip tuntap add.."; (2) add the tap0 interface to br0 using "ovs-vsctl add-port.."; (3) delete the tap0 interface from system using "ip tuntap del.."; (4) add the tap0 interface back in system using "ip tuntap add.." (this changes the ifindex of the interface); (5) delete tap0 from br0 using "ovs-vsctl del-port..". In the function "netdev_ports_insert", two hmap entries were created for mapping "portnum -> netdev" and "ifindex -> portnum". When the interface is deleted from the system, the "netdev_ports_remove" function is not getting called and the old ifindex entry is not getting cleaned up from the "ifindex_to_port" hmap. As part of the fix, added function "dpif_port_remove" which will call "netdev_ports_remove" in the path where the interface deletion from the system is detected. Also, in "netdev_ports_remove", added the code where the "ifindex_to_port_data" (ifindex -> portnum map node) is getting freed when the ifindex is not available any more. (as the interface is already deleted.) VMware-BZ: #1975788 Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-13 11:05:31 -08:00
Ilya Maximets	14f137ba1f	netdev: Remove EOPNOTSUPP related comment for netdev_send(). Since 57eebbb4c315, the caller must make sure that 'netdev' supports sending. This mentioned at the start of the comment. Fixes: 57eebbb4c315 ("dpif-netdev: Don't try to output on a device without txqs.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>	2017-11-09 14:11:30 -08:00

295 Commits