Using larger rxqs can be beneficial in highly bursty setups.
Remove the artificial limit on the number of descriptors in rxq and
txq; the device driver will limit the values in any case.
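For example, a descriptor count beyond the previous cap can now be
requested and the driver will clamp it if unsupported (interface name
illustrative):
$ ovs-vsctl set Interface dpdk0 options:n_rxq_desc=8192 \
    options:n_txq_desc=8192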
Reported-at: https://issues.redhat.com/browse/FDP-1415
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
STT and LISP tunnel types were deprecated and marked for removal in
the following commits in the OVS 3.5 release:
3b37a6154a59 ("netdev-vport: Deprecate STT tunnel port type.")
8d7ac031c03d ("netdev-vport: Deprecate LISP tunnel port type.")
The main reasons were that STT was rejected in the upstream kernel,
and LISP was never upstreamed either, so it doesn't really have a
supported implementation. Both protocols also appear to have lost
their former relevance.
Remove both now. While at it, also fix some small documentation
issues and comments.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Alin Serdean <aserdean@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovs_get_program_version() already returns the formatted program name
and version, so there is no need to format them again.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
This patch uses the new rte_vhost_driver_set_max_queue_num
API to set the maximum number of queue pairs supported by
the vhost-user port.
This is required for VDUSE, which needs the maximum number of queue
pairs to be specified at creation time. Without it, metadata for
128 queue pairs would be allocated.
To configure it, a new 'vhost-max-queue-pairs' option is
introduced.
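For example, to size a vhost-user port for at most 4 queue pairs
(port name illustrative):
$ ovs-vsctl set Interface vhost0 options:vhost-max-queue-pairs=4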
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
SSL protocol family is not actually being used or supported in OVS.
What we use is actually TLS.
Terms "SSL" and "TLS" are often used interchangeably in modern
software and refer to the same thing, which is normally just TLS.
Let's replace "SSL" with "SSL/TLS" in documentation and user-visible
messages, where it makes sense. This may make it clearer to a less
experienced user who looks for TLS support in OVS and would otherwise
not find much.
We're not changing any actual code because, for example, most of the
OpenSSL APIs use just SSL, for historical reasons, and our database
uses the "SSL" table. We may consider migrating to "TLS" naming for
user-visible configuration like command-line arguments and database
names, but that will require extra work to make sure upgrades still
work. In general, slightly clearer documentation should be enough
for now, especially since the term SSL is still widely used in the
industry.
"SSL/TLS" is chosen over "TLS/SSL" simply because our user-visible
configuration knobs are using "SSL" naming, e.g. '--ssl-cyphers'
or 'ovs-vsctl set-ssl'. So, it might be less confusing this way.
We may switch that, if we decide on re-working the user-visible
commands towards "TLS" naming, or providing both alternatives.
Some other projects did similar changes. For example, the python ssl
library is now using "TLS/SSL" in the documentation whenever possible.
Same goes for OpenSSL itself.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If prefixes are set for the flow table, ovs-vswitchd will print them
out to the log whenever something changes in the database. Since
normally prefixes will be set for every OpenFlow table, it will print
255 log messages per iteration. This is very annoying in dynamic
environments like Kubernetes, where database changes can happen
frequently, obscuring and erasing useful logs on log rotation.
These log messages are not very important. The information can be
looked up in the database and normally the values will not actually
change after initial setup. Move the log to debug level.
While at it, rate limit the warnings about misconfigured prefixes,
as they may be too numerous as well. And make the printout a little
nicer by only printing once when multiple adjacent tables have the
same prefixes configured. In most cases that reduces the amount of
logs from 255 lines to 1 per iteration with debug logging enabled.
We're now also logging the default values, since it's under debug
and will not actually add that many log lines with the new collapsed
format. This makes debug logs more accurate and useful.
An additional improvement might be to not print if nothing actually
changed, but that will require either per-table per-bridge tracking
of previous values or changing parts of the ofproto API to tell the
bridge layer if something changed or not. Doesn't seem necessary
at the moment.
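To see these messages after this change, enable debug logging, e.g.
(assuming the messages are logged from the ofproto_dpif module):
$ ovs-appctl vlog/set ofproto_dpif:dbg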
Fixes: 13751fd88c4b ("Classifier: Track address prefixes.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The STT tunnel implementation was rejected in the upstream Linux
kernel a long time ago and will probably never be there. So, the
only implementation for Linux is in the OOT kernel module shipped
with OVS 2.17, which is deprecated and will reach end of life in
Feb 2025.
In addition, modern network interface cards support various hardware
offload features with UDP tunnels, diminishing the main selling point
of STT - the ability to reuse hardware offload features meant for TCP.
Deprecate the port type now, so it can be removed once 2.17 is EoL.
There is another implementation for this tunnel type in the Windows
datapath. However, the protocol itself is considered harmful as it
may confuse stateful network hardware by pretending to be TCP (hence
the reason it was rejected in the Linux kernel). So, it is better if
we deprecate this implementation and stop supporting it as well.
The standard draft for the protocol itself is also expired and
archived with the latest update made in 2016:
https://datatracker.ietf.org/doc/draft-davie-stt/
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tunnel support has been in the upstream Linux kernel since 2013.
However, despite the FAQ saying so, I'm not aware of any actual
attempts to bring support for LISP tunnels upstream. The only
available implementation is in the OOT kernel module shipped with
OVS 2.17, which is deprecated and will reach EoL in Feb 2025.
Mark the tunnel port type as deprecated, so we can fully remove the
support once the only available implementation reaches end of life
together with OVS 2.17.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It is common to have flow tables with OpenFlow rules for both IPv4 and
IPv6 traffic at the same time. One major example is OVN.
Today only IPv4 addresses and L4 ports are prefix matched by default.
Recently it became possible to turn on the optimization for all
4 fields at the same time (nw_dst, nw_src, ipv6_dst and ipv6_src),
but that is an inconvenience for users. IPv6 configurations are
becoming more and more common in the real world, so having IPv6
tables unoptimized by default is not good.
Enable prefix tree lookups for IPv6 addresses by default in addition
to IPv4.
This change increases memory consumption slightly as well as takes a
few extra cycles per flow to create and later iterate over additional
prefix trees. However, IPv4 and IPv6 matches are mutually exclusive,
so performance overhead should be minimal, especially in comparison
with the benefits this configuration brings to IPv6 setups reducing
the number of datapath flows by default.
A new test is added to check that the defaults are working as expected.
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Today users can enable prefix matches for tun_id, tun_src, tun_dst,
ip_src, ip_dst, ipv6_src and ipv6_dst. However, they are limited to
only 3 of these enabled at the same time. This means that if our flow
table is handling both IPv4 and IPv6 traffic, we can't optimize all
the addresses, we'll either have to split IPv4 and IPv6 rules into
separate tables or sacrifice one of the fields, as we can select only
3 out of 4 fields (ip_src, ip_dst, ipv6_src and ipv6_dst).
The maximum number of tries is a little arbitrary. Increasing it
slightly increases memory usage and may take a couple of extra
processing cycles, but it should not change classification results,
so it should be reasonable.
Actually enabling more prefixes will consume more memory and reduce
the efficiency of a single flow classification, but that's a trade-off
a user can make knowing the traffic pattern and what their particular
flow table looks like. While the efficiency of a single flow
classification may go down, the overall performance of the system may
be significantly improved by having far fewer datapath flows with
wider matches.
The number of tunnels in a typical setup is not that high, so I'm not
sure it makes sense to raise the limit any higher. At the same time,
combined IPv4 + IPv6 handling is pretty common; for example, that's
the case with OVN.
Tests in ofproto-dpif.at cover IPv4 and IPv6 address classification
separately, and these fields can't overlap, so not adding any new tests.
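With the limit raised, all four address fields can be enabled at the
same time, e.g. (assuming a Flow_Table record named t0 is attached
to the bridge):
$ ovs-vsctl set Flow_Table t0 prefixes=ip_src,ip_dst,ipv6_src,ipv6_dst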
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently a bridge mirror collects all packets, and tools like
ovs-tcpdump can only apply additional filters after they have already
been duplicated by vswitchd. This can result in inefficient collection.
This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.
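A sketch of the intended configuration, assuming the pre-selection is
expressed as a match on the Mirror record via a 'filter' column added
by this patch:
$ ovs-vsctl -- --id=@m create Mirror name=m0 select_all=true \
    filter=\"tcp\" -- set Bridge br0 mirrors=@m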
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
On CentOS/RHEL, builds are based on stable branches rather than tags,
so, for debugging purposes, it's better to report the downstream
version; that makes it easier to know which commits are included in
a build.
This commit adds --with-version-suffix as a ./configure option in
order to set an OVS version suffix that is shown to the user via
ovs-vsctl -V and, therefore, also in the database, in ovs-vsctl show,
and in the other utilities.
--with-version-suffix is used in the Fedora/CentOS/RHEL spec file in
order to keep the version aligned with the downstream one.
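For example (suffix value illustrative):
$ ./configure --with-version-suffix=-1.el9
After this, the utilities report the suffixed version string, with
"-1.el9" appended to the base OVS version.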
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When the flow translation results in a datapath action list whose
last action is an "observational" action, i.e. one generated for
IPFIX, sFlow or local sampling applications, the packet is actually
going to be dropped (and observed).
In that case, add an explicit drop action so that drop statistics
remain accurate. This behavior is controlled by a configurable
boolean knob called "explicit_sampled_drops".
Combine the "optimizations" and other odp_actions "tweaks" into a single
function.
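Assuming the knob is exposed through the Open_vSwitch table's
other_config (key spelling per this series), enabling it would look
like:
$ ovs-vsctl set Open_vSwitch . other_config:explicit-sampled-drops=true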
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a new column in the Flow_Sample_Collector_Set table named
"local_group_id" which enables this feature.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Only the kernel datapath supports this action, so add a function in
dpif.c that checks for that.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by
Linux through the udp6zerocsumtx tunnel option. Zero checksums remain
disabled by default, and IPv6 tunnels are flagged as requiring a
checksum, but this patch enables the user to set csum=false on IPv6
tunnels.
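For example, a VXLAN-over-IPv6 port with zero UDP checksums
(addresses illustrative):
$ ovs-vsctl add-port br0 vx0 -- set Interface vx0 type=vxlan \
    options:remote_ip=2001:db8::2 options:csum=false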
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The main purpose of locking the memory is to ensure that OVS can keep
doing what it did before in case of increased memory pressure, e.g.,
during VM ingest / migration. Fulfilling this requirement can be
achieved without locking all the allocated memory, but only the pages
already accessed in the past (faulted in). Processing of the new
traffic involves new memory allocations. Latency on these operations
can't be guaranteed by the locking. The main difference would be
the pre-faulting of the stack memory. However, in order to revalidate
or process upcalls on the same traffic, the same amount of stack is
likely needed, so all the necessary memory will already be faulted in.
Switch 'mlockall' to MCL_ONFAULT to avoid consuming unnecessarily
large amounts of RAM on systems with high core counts. For example,
in a densely populated OVN cluster this saves about 650 MB of RAM per
node on a system with 64 cores. This equates to 320 GB of allocated
but unused RAM in a 500 node cluster.
This also makes OVS better suited by default for small systems with
a limited amount of memory.
The MCL_ONFAULT flag was introduced in Linux kernel 4.4 and wasn't
available at the time '--mlockall' was introduced, but we can use it
now. Fall back to the old way of locking in case we're running on
an older kernel.
Only locking the faulted in pages also makes locking compatible with
vhost post-copy live migration by default, because we'll no longer
pre-fault all the guest's memory. Post-copy relies on userfaultfd
to work on shared huge pages, which is only available in 4.11+ kernels.
So, technically, it should not be possible for MCL_ONFAULT to fail
while the call without it succeeds. But keep the check for now, just
in case.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Querying link status may get delayed for a nondeterministic (long)
time with mlx5 ports. This is a consequence of the mlx5 driver
calling the ethtool kernel API and getting stuck on the kernel RTNL
lock while some other operation is in progress under this lock.
One impact of a long link status query is that it is issued under the
bond lock, taken in write mode, periodically in bond_run().
In parallel, datapath threads may block requesting to read bonding
related info (like, for example, in bond_check_admissibility()).
The LSC interrupt mode is available with many DPDK drivers and is used by
default with testpmd.
It seems safe enough to switch on this feature by default in OVS.
We keep the per interface option to disable this feature in case of an
unforeseen bug.
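E.g., to opt a port out of interrupt mode:
$ ovs-vsctl set Interface dpdk0 options:dpdk-lsc-interrupt=false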
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Robin Jarry <rjarry@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Since the patch-set that included [1] there has been a policy of using
the term member for bonds, LACP, and bundle contexts. This is
consistent with the more recently adopted policy of using the inclusive
naming word list v1 [2, 3].
This patch addresses two instances where the term member should be
used in vswitch.xml. It does not address instances of alternative
wording that require code updates, which can be addressed as a
follow-up activity.
[1] 91fc374a9c5a ("Eliminate use of term "slave" in bond, LACP, and bundle contexts.")
[2] df5e5cf4318a ("Documentation: Add section on inclusive language.")
[3] https://inclusivenaming.org/word-lists/
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Extend 'pmd-sleep-max' so that individual PMD thread cores may have a
specified max sleep request value.
Existing behaviour is maintained: any PMD thread core without a value
will use the global value if set, or the default of no sleep.
To set PMD thread cores 8 and 9 to never request a load based sleep
and all other PMD thread cores to be able to request a max sleep of
50 usecs:
$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=50,8:0,9:0
To set PMD thread cores 10 and 11 to request a max sleep of 100 usecs
and all other PMD thread cores to never request a sleep:
$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=10:100,11:100
'pmd-sleep-show' is updated to show the max sleep value for each PMD
thread.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Make sure that if any zone limit is set via the DB, all zone limits
are forced to be set there as well. This is done by tracking which
datapath has zone limit protection, and it is reflected in the dpctl
commands. If the datapath is protected, the dpctl command will
return a permission error.
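For illustration, once any zone limit is managed through the
database, manually updating it via dpctl, e.g.:
$ ovs-appctl dpctl/ct-set-limits zone=5,limit=1000
is expected to fail with a permission error instead of applying the
change.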
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Propagate the CT limit present in the DB into the datapath. The
limit is currently only propagated on change and can be overwritten
by the dpctl commands.
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a limit to the CT zone DB table with ovs-vsctl helper methods.
Besides any number, the limit has two special values: 0 means
unlimited, and an empty limit leaves the value untouched in the
datapath.
This is a preparation step; the value is not yet propagated to the
datapath.
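A sketch of the intended semantics, setting the column directly on a
CT_Zone record (record reference illustrative):
$ ovs-vsctl set CT_Zone $ZONE_UUID limit=10000  # cap the zone
$ ovs-vsctl set CT_Zone $ZONE_UUID limit=0      # explicitly unlimited
$ ovs-vsctl clear CT_Zone $ZONE_UUID limit      # leave datapath as is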
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For better usability, the function pairs get_config() and
set_config() for netdevs should be symmetric: Options which are
accepted by set_config() should be returned by get_config() and the
latter should output valid options for set_config() only.
This patch moves key-value pairs which are not valid options from
get_config() to the get_status() callback. For example, get_config()
in lib/netdev-dpdk.c returned {configured,requested}_{rx,tx}_queues
previously. For requested rx queues the proper option name is n_rxq,
so requested_rx_queues has been renamed accordingly. Tx queues
cannot be changed by the user, hence requested_tx_queues has been
dropped. Both configured_{rx,tx}_queues will be returned as
n_{r,t}xq in the get_status() callback.
The netdev dpdk classes no longer share a common get_config() callback,
instead both the dpdk_class and the dpdk_vhost_client_class define
their own callbacks. The get_config() callback for dpdk_vhost_class has
been dropped because it does not have a set_config() callback.
The documentation in vswitchd/vswitch.xml for status columns as well
as tests have been updated accordingly.
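After this change, the configured queue counts can be read from the
status column instead, e.g.:
$ ovs-vsctl get Interface dpdk0 status:n_rxq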
Reported-at: https://bugzilla.redhat.com/1949855
Signed-off-by: Jakob Meng <code@jakobmeng.de>
Reviewed-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
For better usability, the function pairs get_config() and
set_config() for netdevs should be symmetric: Options which are
accepted by set_config() should be returned by get_config() and the
latter should output valid options for set_config() only. This patch
also moves key-value pairs which are not valid options from get_config()
to the get_status() callback.
The documentation in vswitchd/vswitch.xml for status columns has been
updated accordingly.
Reported-at: https://bugzilla.redhat.com/1949855
Signed-off-by: Jakob Meng <code@jakobmeng.de>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
The get_status() callback for the dpdkvhostuser(/client) netdev
classes may display userspace-tso status.
Fixes: a5669fd51c9b ("netdev-dpdk: Drop TSO in case of conflicting virtio features.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <horms@ovn.org>
Add a group for the dpdkvhostuser(/client) netdevs.
They are added as a single group as they display the same status
fields, one of which is 'mode', indicating whether the port is client
or server.
Fixes: b2e8b12f8a82 ("netdev-dpdk: add vhost-user get_status.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <horms@ovn.org>
The status options pci-vendor_id and pci-device_id for dpdk netdevs
have been replaced by bus_info. This patch updates the documentation
in vswitchd/vswitch.xml accordingly.
Fixes: a77c7796f23a ("dpdk: Update to use v22.11.1.")
Signed-off-by: Jakob Meng <code@jakobmeng.de>
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Before the cleanup option, the bridge_exit() call was fairly fast,
because it didn't include any particularly long operations. However,
with the cleanup flag, this function destroys a lot of datapath
resources, freeing a lot of memory, waiting on RCU, and talking to
the kernel. That may take a noticeable amount of time, especially
on a busy system or under profilers/sanitizers. However, the unixctl
'exit' command replies instantly without waiting for any work to
actually be done. This may cause system test failures or other
issues where scripts expect ovs-vswitchd to exit or destroy all the
datapath resources shortly after the appctl call.
Fix that by waiting for the bridge_exit() before replying to the user.
At least, all the datapath resources will actually be destroyed by
the time ovs-appctl exits.
Also move a structure from the stack to a global; it seems cleaner
this way. Since we're not replying right away and it's technically
possible for multiple clients to request exit at the same time, store
the connections in an array.
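For example, after this change:
$ ovs-appctl exit --cleanup
only returns once bridge_exit() has finished destroying the datapath
resources.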
Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, the netdev's speed is being calculated by taking the link's
feature bits (using netdev_get_features()) and transforming them into
bps.
This mechanism can be both inaccurate and difficult to maintain, mainly
because we currently use the feature bits supported by OpenFlow which
would have to be extended to support all new feature bits of all netdev
implementations while keeping the OpenFlow API intact.
In order to expose the link speed accurately for all current and future
hardware, add a new netdev API call that allows the implementations to
provide the current and maximum link speeds in Mbps.
Internally, the logic to get the maximum supported speed still relies on
feature bits so it might still get out of sync in the future. However,
the maximum configurable speed is not used as much as the current speed
and these feature bits are not exposed through the netdev interface so
it should be easier to add more.
Use this new function instead of netdev_get_features() where the link
speed is needed.
As a consequence of this patch, link speeds of cards are properly
reported (internally and in OVSDB) even if not supported by OpenFlow.
A test verifies this behavior using a tap device.
Also, in order to discourage use of the old API, this patch adds a
checkpatch.py warning if it is used.
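The current speed can then be checked via the existing Interface
column, e.g.:
$ ovs-vsctl get Interface eth0 link_speed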
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
other_config:pmd-maxsleep is a config option that allows PMD thread
cores to sleep under low or no load conditions.
Rename it to 'pmd-sleep-max' to allow a more structured name, so that
additional options or commands can follow the 'pmd-sleep-xyz' pattern.
Use of other_config:pmd-maxsleep is deprecated, to be removed in a
future release, and will now result in a warning.
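The same setting under the old and new names:
$ ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500   # deprecated
$ ovs-vsctl set Open_vSwitch . other_config:pmd-sleep-max=500  # new name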
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As per the Open vSwitch Manual ovs-vsctl(8) the Bridge IPFIX parameters
can be passed as follows:
ovs-vsctl -- set Bridge br0 ipfix=@i \
-- --id=@i create IPFIX targets=\"192.168.0.34:4739\" \
obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 \
cache_max_flows=13 \
other_config:enable-input-sampling=false \
other_config:enable-output-sampling=false
where the default values are:
enable_input_sampling: true
enable_output_sampling: true
But in the existing code these two parameters take on unexpected
values in some scenarios:
be_opts.enable_input_sampling = !smap_get_bool(&be_cfg->other_config,
                                               "enable-input-sampling", false);
be_opts.enable_output_sampling = !smap_get_bool(&be_cfg->other_config,
                                                "enable-output-sampling", false);
Here, the function smap_get_bool() is used with a negation.
This returns expected values for the default case (since the above
code negates the 'false' we get from smap_get_bool() and returns
'true'), but unexpected values for the case where the sampling value
is passed through the CLI.
For example, if we pass 'true' for other_config:enable-input-sampling
in the CLI, the above code negates the 'true' value we get from
smap_get_bool() and returns 'false'. The same applies to
enable_output_sampling.
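The fix, presumably, is to drop the negation and use 'true' as the
default value instead:
be_opts.enable_input_sampling = smap_get_bool(&be_cfg->other_config,
                                              "enable-input-sampling", true);
be_opts.enable_output_sampling = smap_get_bool(&be_cfg->other_config,
                                               "enable-output-sampling", true);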
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Sayali Naval <sanaval@cisco.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.
Use the rte_flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. Re-program the RSS redirection table to only
use the other Rx queues.
The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty, for the benefit of preventing link
flapping.
This feature must be enabled per port on specific protocols via the
rx-steering option. This option takes "rss" followed by a "+" separated
list of protocol names. It is only supported on ethernet ports. This
feature is experimental.
If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control packets. If the hardware
cannot satisfy the number of requested Rx queues, the last Rx queue will
be assigned for control plane. If only one Rx queue is available, the
rx-steering feature will be disabled. If the hardware does not support
the rte_flow matchers/actions, the rx-steering feature will be
completely disabled on the port and regular rss will be performed
instead.
It cannot be enabled when other-config:hw-offload=true as it may
conflict with the offloaded flows. Similarly, if hw-offload is enabled,
custom rx-steering will be forcibly disabled on all ports and replaced
by regular rss.
Example use:
ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
set interface phy0 options:rx-steering=rss+lacp -- \
set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
set interface phy1 options:rx-steering=rss+lacp
As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.
To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two ports bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.
+----------------------+
| DUT |
|+--------------------+|
|| br-int || in_port=patch10,actions=mod_dl_src:$patch11,
|| || mod_dl_dst:$tgen0,
|| || output:patch10
|| || in_port=patch11,actions=mod_dl_src:$patch10
|| || mod_dl_dst:$tgen0,
|| patch10 patch11 || output:patch10
|+---|-----------|----+|
| | | |
|+---|-----------|----+|
|| patch00 patch01 ||
|| tag:10 tag:20 ||
|| ||
|| br-phy || default flow, action=NORMAL
|| ||
|| bond0 || balance-slb, lacp=passive, lacp-time=fast
|| phy0 phy1 ||
|+------|-----|-------+|
+-------|-----|--------+
| |
+-------|-----|--------+
| port0 port1 | balance L3/L4, lacp=active, lacp-time=fast
| lag | mode trunk VLANs 10, 20
| |
| switch |
| |
| vlan 10 vlan 20 | mode access
| port2 port3 |
+-----|----------|-----+
| |
+-----|----------|-----+
| tgen0 tgen1 | Random traffic that is properly balanced
| | across the bond ports in both directions.
| traffic generator |
+----------------------+
Without rx-steering, the bond0 links randomly switch to "defaulted"
when one of the LACP packets sent by the switch is dropped because
the Rx queues are full and the PMD threads did not process them fast
enough. When that happens, all traffic must go through a single link,
which causes the above-line-rate traffic to be dropped.
~# ovs-appctl lacp/show-stats bond0
---- bond0 statistics ----
member: phy0:
TX PDUs: 347246
RX PDUs: 14865
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 168
Link Defaulted: 0
Carrier Status Changed: 0
member: phy1:
TX PDUs: 347245
RX PDUs: 14919
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 147
Link Defaulted: 1
Carrier Status Changed: 0
When rx-steering is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.
This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only guarantees that some
protocols have a lower chance of being dropped because the PMD cores
cannot keep up with regular traffic.
The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future but it would expose to more potential issues if users are
accidentally redirecting all traffic in the isolated queue.
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch supports flow label based load balancing by controlling
the flow label of the outer IPv6 header, which is already implemented
in the Linux kernel as the seg6_flowlabel sysctl [1].
[1]: https://docs.kernel.org/networking/seg6-sysctl.html
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The description of SRv6 was missing in vswitch.xml, which is
used to generate the man page, so this patch adds it.
Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.")
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Open vSwitch generally tries to let the underlying operating system
manage the low level details of hardware, for example DMA mapping,
bus arbitration, etc. However, when using DPDK, the underlying
operating system yields control of many of these details to userspace
for management.
In the case of some DPDK port drivers, configuring rte_flow or even
allocating resources may require access to iopl/ioperm calls, which
are guarded by the CAP_SYS_RAWIO privilege on linux systems. These
calls are dangerous, and can allow a process to completely compromise
a system. However, they are needed in the case of some userspace
driver code which manages the hardware (for example, the mlx
implementation of backend support for rte_flow).
Here, we create an opt-in flag passed on the command line to allow
this access. We need to do this before ever accessing the database,
because we want to drop all privileges as soon as possible, and we
cannot wait for a connection to the database to be established and
functional before dropping them. There may be distribution-specific
ways to do capability management as well (using, for example,
systemd), but they are not as universal to the vswitchd as a flag.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Depending on the driver implementation, it can take from 0.2 seconds
up to 2 seconds before offloaded flow statistics are updated. This is
true for both TC and rte_flow-based offloading. This causes a
problem with min-revalidate-pps, as old statistics values are used
during this period.
This fix waits for at least 2 seconds, by default, before assuming no
packets were received during this period.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add options to the IPFIX table to configure the intervals at which
statistics and template information are sent.
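A hedged sketch of the intended usage; the option names here are
assumptions for illustration, the authoritative ones are defined in
vswitchd/vswitch.xml:
$ ovs-vsctl -- set Bridge br0 ipfix=@i \
    -- --id=@i create IPFIX targets=\"192.168.0.34:4739\" \
       other_config:stats-interval=30 \
       other_config:template-interval=600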
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add support for counting upcall packets per port, both succeeded and
failed, which is a better way to see how many packets were upcalled
on each interface.
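E.g., the new counters would show up among the per-interface
statistics (exact counter names per this patch):
$ ovs-vsctl get Interface eth0 statistics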
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Today the minimum value for min-revalidate-pps is 1. This patch
allows it to be 0, meaning pps is not checked at all and revalidation
is always performed.
This is particularly useful for environments where some of the
applications with long-lived connections may have very low traffic
for certain periods but a high rate of bursts periodically. It is
desirable to keep the datapath flows instead of periodically deleting
them, to avoid bursts of packet misses to userspace.
When set to 0, there may be more datapath flows to revalidate,
resulting in higher CPU cost for revalidator threads. This is the
downside, but in certain cases it is still more desirable than packet
misses to userspace.
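For example:
$ ovs-vsctl set Open_vSwitch . other_config:min-revalidate-pps=0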
Signed-off-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Now that the timer slack for the PMD threads is reduced we can also
reduce the start/increment for PMD load based sleeping to match it.
This will further reduce initial sleep times making it more resilient
to interfaces that might be sensitive to large sleep times.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16
pkts) on a polling iteration of the PMD.
Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).
Sleep time will be increased on each iteration where the low load
conditions remain up to a total of the max sleep time which is set
by the user e.g:
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the default behaviour is unchanged from previously.
Also add new stats to pmd-perf-show to get visibility of operation
e.g.
...
- sleep iterations: 153994 ( 76.8 % of iterations)
Sleep time (us): 9159399 ( 59 us/iteration avg.)
...
Reviewed-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When Open vSwitch is restarted, the previous BFD status is kept
until ovs-vswitchd is running. And if ovs-vswitchd is under a high
workload, updating the BFD status may be deferred, which is not what
we expect.
Signed-off-by: Daniel Ding <zhihui.ding@easystack.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
100 Mbps was a fair assumption 13 years ago. These days, 10 Gbps
seems like a good value in case no information is available otherwise.
The change mainly affects QoS which is currently limited to 100 Mbps if
the user didn't specify 'max-rate' and the card doesn't report the
speed or OVS doesn't have a predefined enumeration for the speed
reported by the NIC.
Calculation of the path cost for STP/RSTP is also affected if OVS is
unable to determine the link speed.
Lower link speed adapters are typically good at reporting their
speed, so the chances of overshoot should be low. And newer
high-speed adapters, for which there is no speed enumeration or
which have some other issues, will not suffer as much.
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The count of received multicast packets has been computed internally
but not exposed to OVSDB. Fix this.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For some reason it is documented as 'rstp-port-path-cost', while
the code and some other bits of documentation use 'rstp-path-cost'.
Fixes: 9efd308e957c ("Rapid Spanning Tree Protocol (IEEE 802.1D).")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>