mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

Author	SHA1	Message	Date
James Raphael Tiovalen	e769387b42	file, monitor: Add null pointer assertions for old and new ovsdb_rows. This commit adds non-null pointer assertions in some code that performs some decisions based on old and new input ovsdb_rows. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-12 00:29:08 +02:00
James Raphael Tiovalen	e71f1a2da1	ovsdb: Assert and check return values of `ovsdb_table_schema_get_column`. This commit adds a few null pointer assertions and checks to some return values of `ovsdb_table_schema_get_column`. If a null pointer is encountered in these blocks, either the assertion will fail or the control flow will now be redirected to alternative paths which will output the appropriate error messages. A few ovsdb-rbac and ovsdb-server tests are also updated to verify the expected warning logs by adding said logs to the ALLOWLIST of the OVSDB_SERVER_SHUTDOWN statements. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-12 00:22:02 +02:00
Ilya Maximets	00782baac0	AUTHORS: Add Sayali Naval. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-12 00:14:54 +02:00
Sayali Naval	8e073791d4	bridge: Fix unexpected values for IPFIX enable-input/output-sampling. As per the Open vSwitch Manual ovs-vsctl(8) the Bridge IPFIX parameters can be passed as follows:
  ovs-vsctl -- set Bridge br0 ipfix=@i \
    -- --id=@i create IPFIX targets=\"192.168.0.34:4739\" \
    obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 \
    cache_max_flows=13 \
    other_config:enable-input-sampling=false \
    other_config:enable-output-sampling=false
where the default values are:
  enable_input_sampling: true
  enable_output_sampling: true
But in the existing code these 2 parameters take up unexpected values in some scenarios:
  be_opts.enable_input_sampling = !smap_get_bool(&be_cfg->other_config, "enable-input-sampling", false);
  be_opts.enable_output_sampling = !smap_get_bool(&be_cfg->other_config, "enable-output-sampling", false);
Here, the function smap_get_bool is being used with a negation. This returns expected values for the default case (since the above code will negate the “false” we get from the smap_get_bool function and return the value “true”) but unexpected values for the case where the sampling value is passed through the CLI. For example, if we pass "true" for other_config:enable-input-sampling in the CLI, the above code will negate the “true” value we get from the smap_get_bool function and return the value “false”. The same would be the case for enable_output_sampling. Acked-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Sayali Naval <sanaval@cisco.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-12 00:12:49 +02:00
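The negation pitfall described in 8e073791d4 can be reproduced in isolation. The sketch below is not OVS code: `struct kv` and `toy_get_bool()` are hypothetical stand-ins that only mimic the general shape of `smap_get_bool()`, and `sampling_direct()` is one possible correction assumed from the commit description, not necessarily the exact patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one other_config entry (not the OVS smap). */
struct kv {
    const char *key;
    const char *value;
};

/* Toy lookup: returns 'def' when 'key' is absent, otherwise true only if
 * the stored string is "true".  This is a simplification made for the
 * example, not the real smap_get_bool() implementation. */
static bool
toy_get_bool(const struct kv *kvs, size_t n, const char *key, bool def)
{
    assert(key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(kvs[i].key, key)) {
            return !strcmp(kvs[i].value, "true");
        }
    }
    return def;
}

/* Pattern reported as buggy: negate a lookup whose default is false.
 * The default case yields true, but an explicit "true" yields false. */
static bool
sampling_negated(const struct kv *kvs, size_t n, const char *key)
{
    return !toy_get_bool(kvs, n, key, false);
}

/* Assumed correction: drop the negation and default to true. */
static bool
sampling_direct(const struct kv *kvs, size_t n, const char *key)
{
    return toy_get_bool(kvs, n, key, true);
}
```

With no entry configured both helpers report sampling as enabled; they diverge only once a value is set explicitly, which matches the scenario the commit message describes.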
Robin Jarry	fc06ea9a18	netdev-dpdk: Add custom rx-steering configuration. Some control protocols are used to maintain link status between forwarding engines (e.g. LACP). When the system is not sized properly, the PMD threads may not be able to process all incoming traffic from the configured Rx queues. When a signaling packet of such protocols is dropped, it can cause link flapping, worsening the situation. Use the rte_flow API to redirect these protocols into a dedicated Rx queue. The assumption is made that the ratio between control protocol traffic and user data traffic is very low and thus this dedicated Rx queue will never get full. Re-program the RSS redirection table to only use the other Rx queues. The additional Rx queue will be assigned a PMD core like any other Rx queue. Polling that extra queue may introduce increased latency and a slight performance penalty at the benefit of preventing link flapping. This feature must be enabled per port on specific protocols via the rx-steering option. This option takes "rss" followed by a "+" separated list of protocol names. It is only supported on ethernet ports. This feature is experimental. If the user has already configured multiple Rx queues on the port, an additional one will be allocated for control packets. If the hardware cannot satisfy the number of requested Rx queues, the last Rx queue will be assigned for control plane. If only one Rx queue is available, the rx-steering feature will be disabled. If the hardware does not support the rte_flow matchers/actions, the rx-steering feature will be completely disabled on the port and regular rss will be performed instead. It cannot be enabled when other-config:hw-offload=true as it may conflict with the offloaded flows. Similarly, if hw-offload is enabled, custom rx-steering will be forcibly disabled on all ports and replaced by regular rss. 
Example use:
  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
    set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
    set interface phy0 options:rx-steering=rss+lacp -- \
    set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
    set interface phy1 options:rx-steering=rss+lacp
As a starting point, only one protocol is supported: LACP. Other protocols can be added in the future. NIC compatibility should be checked. To validate that this works as intended, I used a traffic generator to generate random traffic slightly above the machine capacity at line rate on a two ports bond interface. OVS is configured to receive traffic on two VLANs and pop/push them in a br-int bridge based on tags set on patch ports.
+----------------------+
|         DUT          |
|+--------------------+|
||       br-int       || in_port=patch10,actions=mod_dl_src:$patch11,
||                    ||                         mod_dl_dst:$tgen0,
||                    ||                         output:patch10
||                    || in_port=patch11,actions=mod_dl_src:$patch10
||                    ||                         mod_dl_dst:$tgen0,
|| patch10    patch11 ||                         output:patch10
|+---|-----------|----+|
|    |           |     |
|+---|-----------|----+|
|| patch00    patch01 ||
||  tag:10    tag:20  ||
||                    ||
||       br-phy       || default flow, action=NORMAL
||                    ||
||       bond0        || balance-slb, lacp=passive, lacp-time=fast
||    phy0   phy1     ||
|+------|-----|-------+|
+-------|-----|--------+
        |     |
+-------|-----|--------+
|     port0  port1     | balance L3/L4, lacp=active, lacp-time=fast
|         lag          | mode trunk VLANs 10, 20
|                      |
|        switch        |
|                      |
|  vlan 10    vlan 20  | mode access
|   port2      port3   |
+-----|----------|-----+
      |          |
+-----|----------|-----+
|   tgen0      tgen1   | Random traffic that is properly balanced
|                      | across the bond ports in both directions.
|  traffic generator   |
+----------------------+
Without rx-steering, the bond0 links are randomly switching to "defaulted" when one of the LACP packets sent by the switch is dropped because the RX queues are full and the PMD threads did not process them fast enough. When that happens, all traffic must go through a single link which causes above line rate traffic to be dropped.
  ~# ovs-appctl lacp/show-stats bond0
  ---- bond0 statistics ----
  member: phy0:
    TX PDUs: 347246
    RX PDUs: 14865
    RX Bad PDUs: 0
    RX Marker Request PDUs: 0
    Link Expired: 168
    Link Defaulted: 0
    Carrier Status Changed: 0
  member: phy1:
    TX PDUs: 347245
    RX PDUs: 14919
    RX Bad PDUs: 0
    RX Marker Request PDUs: 0
    Link Expired: 147
    Link Defaulted: 1
    Carrier Status Changed: 0
When rx-steering is enabled, no LACP packet is dropped and the bond links remain enabled at all times, maximizing the throughput. Neither the "Link Expired" nor the "Link Defaulted" counters are incremented anymore. This feature may be considered as "QoS". However, it does not work by limiting the rate of traffic explicitly. It only guarantees that some protocols have a lower chance of being dropped because the PMD cores cannot keep up with regular traffic. The choice of protocols is limited on purpose. This is not meant to be configurable by users. Some limited configurability could be considered in the future but it would expose to more potential issues if users are accidentally redirecting all traffic in the isolated queue. Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-10 15:49:44 +02:00
David Marchand	a5669fd51c	netdev-dpdk: Drop TSO in case of conflicting virtio features. At some point in OVS history, some virtio features were announced as supported (ECN and UFO virtio features). The userspace TSO code, which has been added later, does not support those features and tries to disable them. This breaks OVS upgrades: if an existing VM already negotiated such features, their lack on reconnection to an upgraded OVS triggers a vhost socket disconnection by Qemu. This results in an endless loop because Qemu then retries with the same set of virtio features. This patch proposes to try and detect those vhost socket disconnection and fallback restoring the old virtio features (and disabling TSO for this vhost port). Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-07 18:05:48 +02:00
Gavin Li	b4c7009c20	system-offloads-traffic.at: Add vxlan gbp offload test. Add a vxlan gbp offload test case: vxlan offloads with gbp extention - ping between two ports - offloads enabled ok Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	7f04588d78	netdev-tc-offloads: Probe for allowing vxlan gbp support. Kernels that do not support vxlan gbp treat a rule that has a vxlan gbp encap action or a vxlan gbp id match differently: they either reject it, or just skip the action/match and continue processing the known ones. To solve the issue, probe and disallow inserting rules with vxlan gbp action/match if the kernel does not support it. Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	a2a3f1983f	tc: Add vxlan encap action with gbp option offload. Add TC offload support for vxlan encap with gbp option. Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	256c1e5819	tc: Pass encap entirely to nl_msg_put_act_tunnel_key_set. Most of the data members of struct tc_action{ } are defined as anonymous struct in place. Instead of passing all members of an anonymous struct, which is not flexible to new members being added, expose encap as named struct and pass it entirely. Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	a4332b5e68	tc: Add vxlan gbp option flower match offload. Add TC offload support for filtering vxlan tunnels with gbp option. Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	c39d7d06f5	netlink: Add new function to add NLA_F_NESTED to nested netlink messages. Linux kernel netlink module added NLA_F_NESTED flag checking for nested netlink messages in 5.2. A nested message without the flag set will be treated as malformatted one. The check is optional and is controlled by message policy. To avoid this, add NLA_F_NESTED explicitly for all nested netlink messages with a new function nl_msg_start_nested_with_flag(). Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	31baa7781e	odp-util: Extract vxlan gbp option encoding to a function. Extract vxlan gbp option encoding to odp_encode_gbp_raw to be used in following commits. Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	8c3d5488da	odp-util: Extract vxlan gbp option decoding to a function. Extract vxlan gbp option decoding to odp_decode_gbp_raw to be used in following commits. Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Gavin Li	affb9b8183	tc: Pass tunnel entirely to tunnel option parse and put functions. Tc flower tunnel key options were encoded in nl_msg_put_flower_tunnel_opts and decoded in nl_parse_flower_tunnel_opts. Only geneve was supported. To avoid adding more arguments to the function to support more vxlan options in the future, change the function arguments to pass tunnel entirely to it instead of keep adding new arguments. Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2023-07-03 11:56:39 +02:00
Ilya Maximets	c2433bdfc0	dpif-netdev: Lockless meters. Current implementation of meters in the userspace datapath takes the meter lock for every packet batch. If more than one thread hits the flow with the same meter, they will lock each other. Replace the critical section with atomic operations to avoid interlocking. Meters themselves are RCU-protected, so it's safe to access them without holding a lock. Implementation does the following:
1. Tries to advance the 'used' timer of the meter with atomic compare+exchange if it's smaller than 'now'.
2. If the timer change succeeds, atomically update band buckets.
3. Atomically update packet statistics for a meter.
4. Go over buckets and try to atomically subtract the amount of packets or bytes, recording the highest exceeded band.
5. Atomically update band statistics and drop packets.
Bucket manipulations are implemented with atomic compare+exchange operations with extra checks, because bucket size should never exceed the maximum and it should never go below zero. Packet statistics may be momentarily inconsistent, i.e., number of packets and the number of bytes may reflect different sets of packets. But it should be eventually consistent. And the difference at any given time should be in just few packets. For the sake of reduced code complexity PKTPS meter tries to push packets through the band one by one, even though they all have the same weight. This is also more fair if more than one thread is passing packets through the same band at the same time. Trying to predict the number of packets that can pass may also cause extra atomic operations reducing the performance. This implementation shows similar performance to the previous one, but should scale better with more threads hitting the same meter.
Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Lin Huang <linhuang@ruijie.com.cn> Tested-by: Zhang YuHuang <zhangyuhuang@ruijie.com.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-01 00:35:18 +02:00
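The compare+exchange pattern described in c2433bdfc0 can be illustrated with plain C11 atomics. This is a minimal sketch under stated assumptions, not the dpif-netdev code: the function names `timer_advance()`, `bucket_refill()` and `bucket_take()` are hypothetical, a bucket is modelled as a single 64-bit token counter, and relaxed memory ordering is an assumption made for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Tries to move '*used' forward to 'now'.  Returns true, and stores the
 * elapsed time, only in the one thread whose compare-exchange succeeds;
 * that thread is then responsible for refilling the buckets. */
static bool
timer_advance(_Atomic uint64_t *used, uint64_t now, uint64_t *elapsed)
{
    uint64_t old = atomic_load_explicit(used, memory_order_relaxed);

    while (old < now) {
        /* On failure 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(used, &old, now,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            *elapsed = now - old;
            return true;
        }
    }
    return false;
}

/* Adds 'tokens' to the bucket without ever exceeding 'max'. */
static void
bucket_refill(_Atomic uint64_t *bucket, uint64_t tokens, uint64_t max)
{
    uint64_t old = atomic_load_explicit(bucket, memory_order_relaxed);
    uint64_t updated;

    do {
        assert(old <= max);
        updated = tokens >= max - old ? max : old + tokens;
    } while (!atomic_compare_exchange_weak_explicit(bucket, &old, updated,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
}

/* Subtracts 'cost' tokens if they are available.  Returns false, leaving
 * the bucket untouched, when that would take it below zero. */
static bool
bucket_take(_Atomic uint64_t *bucket, uint64_t cost)
{
    uint64_t old = atomic_load_explicit(bucket, memory_order_relaxed);

    do {
        if (old < cost) {
            return false;
        }
    } while (!atomic_compare_exchange_weak_explicit(bucket, &old, old - cost,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}
```

A caller would first try `timer_advance()`, refill on success, and then call `bucket_take()` once per packet, which mirrors the one-by-one approach the message mentions for packet-rate meters.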
Han Zhou	2ece9c9ac1	ovsdb: raft: Fix RAFT paper link. Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-30 00:03:36 +02:00
Paolo Valerio	9b4d2ad8e8	conntrack: Allow to dump userspace conntrack expectations. The patch introduces a new command, ovs-appctl dpctl/dump-conntrack-exp, that dumps the existing expectations for the userspace ct. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-29 22:20:43 +02:00
Kevin Traynor	34ace16cb8	tests: Add macro to common file. get_log_next_line_num() was defined in alb.at. As it may be useful in other test files, move to ofproto-macros.at. Suggested-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-29 22:13:55 +02:00
Dumitru Ceara	d56932aac6	checkpatch: Ignore yml files when checking line lengths. As far as I can tell they're used mostly for CI job definitions and these tend to result in long lines. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405796.html Suggested-by: Aaron Conole <aconole@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-28 12:39:31 +02:00
Eelco Chaudron	903294cde6	dpif: Add coverage counters for dpif_operate() failures. Add additional error coverage counters for dpif operation failures. This could help to quickly identify netlink problems when communicating with the OVS kernel module. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2070630 Reviewed-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-23 15:04:55 +02:00
Simon Horman	c918670302	MAINTAINERS: Add Eelco Chaudron. Eelco Chaudron was elected by the Open vSwitch committers yesterday. This formalises his status as an Open vSwitch committer. Welcome Eelco! Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-21 15:02:47 +02:00
Robin Jarry	07f6d6a0cb	Add editorconfig file. EditorConfig is a file format and collection of text editor plugins for maintaining consistent coding styles between different editors and IDEs. Initialize the file following the coding rules in Documentation/internals/contributing/coding-style.rst and add exceptions declared in build-aux/initial-tab-allowed-files. Only enforce rules for .c and .h files. Other files should use the default indenting rules from text editors. In order for this file to be taken into account (unless they use an editor with built-in EditorConfig support), developers will have to install a plugin. Notes: * All matching rules are considered. The last matching rule's properties will override the previous ones. * The max_line_length property is only supported by a limited number of EditorConfig plugins. It will be ignored if unsupported. Link: https://editorconfig.org/ Link: https://github.com/editorconfig/editorconfig-emacs Link: https://github.com/editorconfig/editorconfig-vim Link: https://github.com/editorconfig/editorconfig/wiki/EditorConfig-Properties#max_line_length Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-20 15:28:05 +02:00
Mike Pattrick	3337e6d91c	userspace: Enable L4 checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the L4 checksum was verified and it is OK or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulate a packet with that flag, set the checksum of the inner L4 header since that is not yet supported. Calculate the L4 checksum when the packet is going to be sent over a device that doesn't support the feature. Linux tap devices allows enabling L3 and L4 offload, so this patch enables the feature. However, Linux socket interface remains disabled because the API doesn't allow enabling those two features without enabling TSO too. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:50:30 +02:00
Mike Pattrick	5d11c47d3e	userspace: Enable IP checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the IP checksum was verified and it is GOOD or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulate a packet with that flag, set the checksum of the inner IP header since that is not yet supported. Calculate the IP checksum when the packet is going to be sent over a device that doesn't support the feature. Linux devices don't support IP checksum offload alone, so the support is not enabled. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:49:51 +02:00
Mike Pattrick	4433cc6860	dpif-netdev: Show netdev offloading flags. This patch modifies netdev_get_status to include information about checksum offload status by port, allowing the user to gain insight into where checksum offloading is active. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 15:44:57 +02:00
Mike Pattrick	22df63c384	Documentation: Document netdev offload. Document the implementation of netdev hardware offloading in userspace datapath. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 15:44:53 +02:00
Eelco Chaudron	e3ba0be48c	seq: Make read of the current value atomic. Make the read of the current seq->value atomic, i.e., not needing to acquire the global mutex when reading it. On 64-bit systems, this incurs no overhead, and it will avoid the mutex and potentially a system call. For incrementing the value followed by waking up the threads, we are still taking the mutex, so the current behavior is not changing. The seq_read() behavior is already defined as, "Returns seq's current sequence number (which could change immediately)". So the change should not impact the current behavior. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-12 18:48:26 +02:00
Ilya Maximets	04f854f938	fatal-signal: Don't share signal fds/handles with forked process. The signal_fds pipe and wevent are a mechanism to wake up the process after it received a signal and stored the number for the future processing. They are not intended for inter-process communication. However, in the current code, descriptors are not closed on fork(). The main scenario where we use fork() is a monitor process. Monitor doesn't actually use poll loops and doesn't wait on the descriptor. But when a child process is killed, it (child) sends a byte to itself, then it wakes up due to POLLIN on the pipe and terminates itself after processing all the callbacks. The byte stays unread. And the pipe is still open in the monitor process. When child dies, the monitor wakes up and forks again. New child inherits the same pipe that still contains unread data. This data is never read, so the child will constantly wake itself up for no reason. Interestingly enough raise(SIGSEGV) doesn't immediately kill the process. The execution continues til the end of a signal handler, so we're still able to write a byte to a pipe even in this case. Presumably because we don't have SA_NODEFER. Fix the issue by re-creating the pipe/event on fork. This way every new child will have its own notification channel and will not wake up any other processes. There was already an attempt to fix the issue, but it didn't get a follow up (see the reported-at tag). This is an alternative solution. Fixes: ff8decf1a318 ("daemon: Add support for process monitoring and restart.") Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20221019093147.2072-1-lifengqi@inspur.com/ Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-09 14:15:11 +02:00
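The idea behind 04f854f938, giving each forked child its own wake-up pipe, can be sketched with POSIX calls only. This is an illustration under assumptions rather than the fatal-signal implementation: `signal_fds`, `signal_pipe_reinit()` and `after_fork_child()` are names chosen for the example, and the Windows event handle mentioned in the message is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical self-notification pipe: [0] is the read end, [1] the
 * write end.  -1 means "not created yet". */
static int signal_fds[2] = { -1, -1 };

/* Closes any existing pipe and creates a fresh non-blocking one.
 * Returns 0 on success, -1 on failure. */
static int
signal_pipe_reinit(void)
{
    if (signal_fds[0] >= 0) {
        close(signal_fds[0]);
        close(signal_fds[1]);
        signal_fds[0] = signal_fds[1] = -1;
    }
    if (pipe(signal_fds) < 0) {
        signal_fds[0] = signal_fds[1] = -1;
        return -1;
    }
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(signal_fds[i], F_GETFL);

        if (flags < 0
            || fcntl(signal_fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Meant to run in the child right after fork(): any byte left unread in
 * the inherited pipe is discarded together with the old descriptors. */
static void
after_fork_child(void)
{
    int error = signal_pipe_reinit();

    assert(!error);
}
```

One way to hook this up would be `pthread_atfork(NULL, NULL, after_fork_child)`; whether OVS registers the handler that way or calls it explicitly after fork() is not stated in the commit message.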
Ilya Maximets	469e98e16d	ovsdb: monitor: Destroy initial change set when new columns added. Initial change set is preserved for as long as the monitor itself. However, if a new client has a condition on a column that is not one of the monitored columns, this column will be added to the monitor via ovsdb_monitor_condition_bind(). This new column, however, doesn't exist in the initial change set. That will cause ovsdb-server to malfunction or crash trying to access non-existent column during condition evaluation:
ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 4 at 0x606000006780 thread T0
  0 ovsdb_clause_evaluate ovsdb/condition.c:328:26
  1 ovsdb_condition_match_any_clause ovsdb/condition.c:441:13
  2 ovsdb_condition_empty_or_match_any ovsdb/condition.h:84:13
  3 ovsdb_monitor_row_update_type_condition ovsdb/monitor.c:892:28
  4 ovsdb_monitor_compose_row_update2 ovsdb/monitor.c:1058:12
  5 ovsdb_monitor_compose_update ovsdb/monitor.c:1172:24
  6 ovsdb_monitor_get_update ovsdb/monitor.c:1276:24
  7 ovsdb_jsonrpc_monitor_create ovsdb/jsonrpc-server.c:1505:12
  8 ovsdb_jsonrpc_session_got_request ovsdb/jsonrpc-server.c:1030:21
  9 ovsdb_jsonrpc_session_run ovsdb/jsonrpc-server.c:572:17
  10 ovsdb_jsonrpc_session_run_all ovsdb/jsonrpc-server.c:602:21
  11 ovsdb_jsonrpc_server_run ovsdb/jsonrpc-server.c:417:9
  12 main_loop ovsdb/ovsdb-server.c:222:9
  13 main ovsdb/ovsdb-server.c:500:5
  14 __libc_start_call_main
  15 __libc_start_main@GLIBC_2.2.5
  16 _start (ovsdb/ovsdb-server+0x473034)
Located 0 bytes after 64-byte region [0x606000006740,0x606000006780) allocated by thread T0 here:
  0 malloc (ovsdb/ovsdb-server+0x50dc82)
  1 xmalloc__ lib/util.c:140:15
  2 xmalloc lib/util.c:175:12
  3 clone_monitor_row_data ovsdb/monitor.c:336:12
  4 ovsdb_monitor_changes_update ovsdb/monitor.c:1384:23
  5 ovsdb_monitor_get_initial ovsdb/monitor.c:1535:21
  6 ovsdb_jsonrpc_monitor_create ovsdb/jsonrpc-server.c:1502:9
  7 ovsdb_jsonrpc_session_got_request ovsdb/jsonrpc-server.c:1030:21
  8 ovsdb_jsonrpc_session_run ovsdb/jsonrpc-server.c:572:17
  9 ovsdb_jsonrpc_session_run_all ovsdb/jsonrpc-server.c:602:21
  10 ovsdb_jsonrpc_server_run ovsdb/jsonrpc-server.c:417:9
  11 main_loop ovsdb/ovsdb-server.c:222:9
  12 main ovsdb/ovsdb-server.c:500:5
  13 __libc_start_call_main
  14 __libc_start_main@GLIBC_2.2.5
  15 _start (ovsdb/ovsdb-server+0x473034)
Fix that by destroying the initial change set every time new columns are added to the monitor. This will trigger re-generation of the change set and it will contain all the necessary columns afterwards. Fixes: 07c27226ee96 ("ovsdb: Monitor: Keep and maintain the initial change set.") Reported-by: Han Zhou <hzhou@ovn.org> Acked-by: Han Zhou <hzhou@ovn.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-09 14:11:38 +02:00
Ales Musil	759a29dc2d	backtrace: Extend the backtrace functionality. Use the backtrace functions that are provided by libc; this allows us to get a backtrace that is independent of the current memory map of the process, which in turn can be used for debugging/tracing purposes. The backtrace is not 100% accurate due to various optimizations, most notably "-fomit-frame-pointer" and LTO. As a result, the reported line in the source file might not correspond to the real line. However, it should be able to pinpoint at least the function where the backtrace was called. The implementation is determined during compilation based on available libraries. Libunwind has higher priority if both methods are available to keep the compatibility with current behavior. The backtrace is not marked as signal safe; however, the backtrace manual page gives a more detailed explanation of why that might be the case [0]. Load the "libgcc" or equivalent in advance within the "fatal_signal_init" which should ensure that subsequent calls to backtrace* do not call malloc and are signal safe. The typical backtrace will look similar to the one below:
  /lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
  /lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
  /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
  /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) [0x7fc5db563a8d]
  ovsdb-server(+0xa661) [0x562cfce2e661]
  ovsdb-server(+0x7e39) [0x562cfce2be39]
  /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
  /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
  ovsdb-server(+0x8c35) [0x562cfce2cc35]
backtrace.h elaborates on how to effectively get the line information associated with the addresses presented in the backtrace.
[0] backtrace() and backtrace_symbols_fd() don't call malloc() explicitly, but they are part of libgcc, which gets loaded dynamically when first used. Dynamic loading usually triggers a call to malloc(3). If you need certain calls to these two functions to not allocate memory (in signal handlers, for example), you need to make sure libgcc is loaded beforehand.
Reported-at: https://bugzilla.redhat.com/2177760 Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-08 20:30:42 +02:00
David Marchand	474a179aff	cpu: Fix cpuid check for some AMD processors. Some venerable AMD processors do not support querying extended features (EAX=7) with cpuid. In this case, it is not a programmatic error and the runtime check should simply return the isa is unsupported. Reported-by: Davide Repetto <red@idp.it> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2211747 Fixes: b366fa2f4947 ("dpif-netdev: Call cpuid for x86 isa availability.") Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-07 23:08:44 +02:00
Frode Nordahl	106ef21860	tc: Fix crash on malformed reply from kernel. The tc module combines the use of the `tc_transact` helper function for communication with the in-kernel tc infrastructure with assertions on the reply data by `ofpbuf_at_assert` on the received data prior to further processing. With the presence of bugs on the kernel side, we need to treat the kernel as an unreliable service provider and replace assertions on the reply from it with checks to avoid a fatal crash of OVS. For the record, the symptom of the crash is this in the log:
  EMER|include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert()
And an excerpt of the backtrace looks like this:
  ofpbuf_at_assert (offset=16, size=20) at include/openvswitch/ofpbuf.h:194
  tc_replace_flower at lib/tc.c:3223
  netdev_tc_flow_put at lib/netdev-offload-tc.c:2096
  netdev_flow_put at lib/netdev-offload.c:257
  parse_flow_put at lib/dpif-netlink.c:2297
  try_send_to_netdev at lib/dpif-netlink.c:2384
Reported-At: https://launchpad.net/bugs/2018500 Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action") Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload") Fixes: f98e418fbdb6 ("tc: Add tc flower functions") Fixes: c1c9c9c4b636 ("Implement QoS framework.") Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-07 22:46:45 +02:00
Ilya Maximets	64cdc290ef	appveyor: Silence the git clone of pthreads4w. Git by default reports progress on stderr. This doesn't fail the build, but upsets the powershell: git : Cloning into 'c:\pthreads4w-code'... At line:3 char:1 + git clone https://git.code.sf.net/p/pthreads4w/code c:\pthreads4w-cod ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : NotSpecified: (Cloning into 'c:\pthreads4w-code'...:String) [], RemoteException + FullyQualifiedErrorId : NativeCommandError Silence the git clone to avoid the warning. Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-02 10:57:54 +02:00
Robin Jarry	8bcc6d694c	netdev-dpdk: Fix warning with gcc 13. GCC now reports uninitialized warnings from function return values. ../lib/netdev-dpdk.c: In function 'netdev_dpdk_mempool_configure': ../lib/netdev-dpdk.c:964:22: warning: 'dmp' may be used uninitialized [-Wmaybe-uninitialized] 964 \| dev->dpdk_mp = dmp; \| ~~~~~~~~~~~~~^~~~~ ../lib/netdev-dpdk.c:854:21: note: 'dmp' was declared here 854 \| struct dpdk_mp dmp, next; \| ^~~ NB: this looks like a false positive, gcc 13 probably fails to see the link between reuse and dmp in dpdk_mp_get(). Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-02 10:56:57 +02:00
David Marchand	359cabbd6e	netdev-offload: Fix some typos. Caught while reviewing code. Fixes: aca2f8a8a6b6 ("netdev-offload-dpdk: Implement HW miss packet recover for vport.") Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-31 22:04:11 +02:00
Ilya Maximets	ef1da757f0	ovsdb: condition: Process condition changes incrementally. In most cases, after the condition change request, the new condition is the same as old one plus minus a few clauses. Today, ovsdb-server will evaluate every database row against all the old clauses and then against all the new clauses in order to tell if an update should be generated. For example, every time a new port is added, ovn-controller adds two new clauses to conditions for a Port_Binding table. And this condition may grow significantly in size making addition of every new port heavier on the server side. The difference between conditions is not larger and, likely, significantly smaller than old and new conditions combined. And if the row doesn't match clauses that are different between old and new conditions, that row should not be part of the update. It either matches both old and new, or it doesn't match either of them. If the row matches some clauses in the difference, then we need to perform a full match against old and new in order to tell if it should be added/removed/modified. This is necessary because different clauses may select same rows. Let's generate the condition difference and use it to avoid evaluation of all the clauses for rows not affected by the condition change. Testing shows 70% reduction in total CPU time in ovn-heater's 120-node density-light test with conditional monitoring. Average CPU usage during the test phase went down from frequent 100% spikes to just 6-8%. Note: This will not help with new connections, or re-connections, or new monitor requests after database conversion. ovsdb-server will still evaluate every database row against every clause in the condition in these cases. So, it's still important to not have too many clauses in conditions for large tables. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-31 21:31:38 +02:00
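The idea in the commit above can be illustrated with a deliberately simplified model. In this sketch a clause is just "column == value" on one integer column and a condition is an OR of such clauses; all type and function names are hypothetical and do not correspond to the ovsdb-server implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a condition is an OR of "row == values[i]" clauses. */
struct condition {
    const int *values;
    size_t n;
};

enum row_change { ROW_UNCHANGED, ROW_ADDED, ROW_REMOVED };

static bool
condition_match(const struct condition *c, int row)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->values[i] == row) {
            return true;
        }
    }
    return false;
}

/* Stores in 'out' the clauses present in exactly one of 'a' and 'b' and
 * returns how many were stored.  'out' needs room for a->n + b->n items. */
static size_t
condition_diff(const struct condition *a, const struct condition *b, int *out)
{
    size_t n = 0;

    for (size_t i = 0; i < a->n; i++) {
        if (!condition_match(b, a->values[i])) {
            out[n++] = a->values[i];
        }
    }
    for (size_t i = 0; i < b->n; i++) {
        if (!condition_match(a, b->values[i])) {
            out[n++] = b->values[i];
        }
    }
    return n;
}

/* A row that matches no clause in 'diff' matches either both conditions or
 * neither, so it is skipped cheaply.  Only rows touched by the difference
 * get the full evaluation against 'old_c' and 'new_c'. */
static enum row_change
classify_row(const struct condition *old_c, const struct condition *new_c,
             const struct condition *diff, int row)
{
    if (!condition_match(diff, row)) {
        return ROW_UNCHANGED;
    }

    bool was = condition_match(old_c, row);
    bool is = condition_match(new_c, row);

    if (was == is) {
        return ROW_UNCHANGED;   /* Another, unchanged clause selects it. */
    }
    return is ? ROW_ADDED : ROW_REMOVED;
}
```

The saving comes from the first check in `classify_row()`: when the old and new conditions share most of their clauses, the difference is small and most rows are rejected after evaluating only those few clauses.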
Ilya Maximets	d56366bfa0	tests: Check ovsdb-server logs in OVSDB tests. Many OVSDB tests are not checking the server log for warnings or errors. Some are not even using the log file. It's mostly OK as we're usually checking the user-visible behavior. But it would also be nice to detect some internal warnings if there are some. Moving the OVSDB_SERVER_SHUTDOWN macro to the common place, adding the call to check_logs into it and making OVSDB tests use this macro. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-31 21:30:10 +02:00
Russell Bryant	1335af2f55	MAINTAINERS.rst: Move several people to emeritus status The following document discusses emeritus committer status: https://docs.openvswitch.org/en/latest/internals/committer-emeritus-status/ There are several people who I would guess consider themselves emeritus committers but have not formally declared it. Those moved to emeritus status in this commit have either explicitly communicated their desire to move or have both not been active in the last year and have not yet replied to this patch. It is easy to re-add people in the future should any emeritus committer desire to become active again. Per our policies, a vote of the majority of current committers (or the list of maintainers prior to this change) is required to move a committer to emeritus status. Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ansis Atteka <ansisatteka@gmail.com> Acked-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org> Acked-by: Justin Pettit <jpettit@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Simon Horman <horms@verge.net.au> Acked-by: Thomas Graf <tgraf@tgraf.ch> Acked-by: William Tu <u9012063@gmail.com> CC: Andy Zhou <azhou@ovn.org> CC: Gurucharan Shetty <guru@ovn.org> CC: Ian Stokes <istokes@ovn.org> CC: Jarno Rajahalme <jarno@ovn.org> CC: YAMAMOTO Takashi <yamamoto@midokura.com>	2023-05-30 14:09:12 -04:00
Timothy Redaelli	e3d0e84ed3	utilities/bashcomp: Fix PS1 generation on new bash. The current implementation used to extract PS1 prompt for ovs-vsctl is broken on recent Bash releases. Starting from Bash 4.4 it's possible to use @P expansion in order to get the quoted PS1 directly. This commit makes the 2 bash completion files to use @P expansion in order to get the quoted PS1 on Bash >= 4.4. Reported-at: https://bugzilla.redhat.com/2170344 Reported-by: Martin Necas <mnecas@redhat.com> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-29 20:28:49 +02:00
David Marchand	c3e410a03a	netdev-offload-dpdk: Fix crash in debug log. The offload thread calling ufid_to_rte_flow_disassociate() may be the last one holding a reference on the netdev and physdev. So displaying information about them might trigger a crash when removing a physical port. Fixes: faf71e492263 ("netdev-dpdk: Print port name in offload API messages.") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-29 19:22:18 +02:00
Kevin Traynor	9dad8dfd1e	netdev-dpdk: Check rx/tx descriptor sizes for device. By default OVS configures 2048 descriptors for tx and rx queues on DPDK devices. It also allows the user to configure those values. If the values used are not acceptable to the device then queue setup would fail. The device exposes its max/min/alignment requirements and OVS applies some limits also. Use these to ensure an acceptable value is used for the number of descriptors on a device tx/rx. If the default or user value is not acceptable, adjust to a suitable value and log. Reported-at: https://bugzilla.redhat.com/2119876 Reviewed-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-26 19:59:52 +02:00
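One way to adjust a requested descriptor count to a device's min/max/alignment requirements, as the commit above describes, is sketched below. `struct desc_limits` and `adjust_desc_count()` are hypothetical names, and the rounding policy (round up to the alignment, fall back to rounding down if that exceeds the maximum) is an assumption rather than the policy OVS necessarily uses.

```c
#include <stdint.h>

/* Hypothetical per-device limits on the number of queue descriptors. */
struct desc_limits {
    uint16_t min;
    uint16_t max;
    uint16_t align;   /* Required multiple; 0 is treated as "no constraint". */
};

/* Returns a descriptor count acceptable under 'lim' that is as close as
 * this simple policy allows to 'requested'. */
static uint16_t
adjust_desc_count(uint32_t requested, const struct desc_limits *lim)
{
    uint32_t align = lim->align ? lim->align : 1;
    uint32_t n = requested;

    /* Clamp into the range the device accepts. */
    if (n < lim->min) {
        n = lim->min;
    }
    if (n > lim->max) {
        n = lim->max;
    }

    /* Round up to the alignment; if that overshoots the maximum, use the
     * largest aligned value that still fits. */
    n = (n + align - 1) / align * align;
    if (n > lim->max) {
        n = lim->max / align * align;
    }
    return (uint16_t) n;
}
```

A caller would compare the result with the requested value and log when they differ, so that the user can see the value was adjusted.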
Kevin Traynor	0af352b6df	netdev-dpdk: Remove requested descriptors from get_config. There is no need to display 'requested_rx/tx_descriptors' and 'configured_rx/tx_descriptors' as they will be the same. It is simpler to just have a single 'n_rxq/txq_desc' value. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-26 19:59:52 +02:00
Balazs Nemeth	59c9084105	ofproto-dpif-upcall: Don't set statistics to 0 when they jump back. The only way that stats->{n_packets,n_bytes} would decrease is due to an overflow, or if there are bugs in how statistics are handled. In the past, there were multiple issues that caused a jump backward. A workaround was in place to set the statistics to 0 in that case. When this happened while the revalidator was under heavy load, the workaround had an unintended side effect where should_revalidate returned false causing the flow to be removed because the metric it calculated was based on a bogus value. Since many of those bugs have now been identified and resolved, there is no need to set the statistics to 0. In addition, the (unlikely) overflow still needs to be handled appropriately. If an unexpected jump does happen, just log it as a warning. Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-26 19:58:36 +02:00
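The behaviour the commit above describes, logging a backward jump instead of zeroing the statistics, might look roughly like the sketch below. `struct flow_stats` and `stats_delta()` are hypothetical, and clamping the reported delta to zero on a backward jump is an assumption made for this illustration.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-flow counters. */
struct flow_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Returns how much 'cur' advanced past 'prev'.  If a counter went
 * backward, a warning is printed and that counter's delta is reported as
 * 0, but 'cur' itself is left untouched, so later decisions are not based
 * on a bogus zeroed value.  '*jumped_back' tells the caller it happened. */
static struct flow_stats
stats_delta(const struct flow_stats *prev, const struct flow_stats *cur,
            bool *jumped_back)
{
    struct flow_stats delta = { 0, 0 };

    *jumped_back = cur->n_packets < prev->n_packets
                   || cur->n_bytes < prev->n_bytes;
    if (*jumped_back) {
        fprintf(stderr,
                "warning: unexpected jump in stats: packets %"PRIu64
                " -> %"PRIu64", bytes %"PRIu64" -> %"PRIu64"\n",
                prev->n_packets, cur->n_packets,
                prev->n_bytes, cur->n_bytes);
    }

    if (cur->n_packets > prev->n_packets) {
        delta.n_packets = cur->n_packets - prev->n_packets;
    }
    if (cur->n_bytes > prev->n_bytes) {
        delta.n_bytes = cur->n_bytes - prev->n_bytes;
    }
    return delta;
}
```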
Ilya Maximets	0826de990c	stream-ssl: Disable alerts on unexpected EOF. OpenSSL 3.0 enabled alerts for unexpected EOF by default. It is supposed to alert the application whenever the connection is terminated without a proper close_notify. And that should allow applications to take actions to protect themselves from potential TLS truncation attacks. This is how it looks in the log: \|stream_ssl\|WARN\|SSL_read: error:0A000126:SSL routines::unexpected eof while reading \|jsonrpc\|WARN\|ssl:127.0.0.1:34288: receive error: Input/output error \|reconnect\|WARN\|ssl:127.0.0.1:34288: connection dropped (Input/output error) The problem is that clients based on OVS libraries do not wait for the proper termination if it didn't happen right away. It means that chances to have alerts on the server side for every single disconnection are very high. None of the high level protocols supported by OVS daemons can carry state between re-connections, e.g., there are no session cookies or anything like that. So, the TLS truncation attack is not applicable. Disable the alert to avoid unnecessary warnings in the log. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-26 14:52:54 +02:00
Frode Nordahl	d51a4ef0a6	tests: layer3-tunnels: Skip bareudp tests if not supported by kernel. The bareudp tests depend on specific kernel configuration to succeed. Skip the test if the feature is not enabled in the running kernel. Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 19:45:26 +02:00
Ilya Maximets	68d6d2777f	AUTHORS: Add yangchang. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 19:45:09 +02:00
yangchang	263fcdfdb8	ovs-fields: Modify the width of tpa and spa. Arp_spa and arp_tpa are IP addresses, their width should be 32 bits. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: yangchang <yangchang@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 19:44:44 +02:00
Nobuhiro MIKI	701c2dbfb8	userspace: Add new option srv6_flowlabel in SRv6 tunnel. It supports flowlabel based load balancing by controlling the flowlabel of outer IPv6 header, which is already implemented in Linux kernel as seg6_flowlabel sysctl [1]. [1]: https://docs.kernel.org/networking/seg6-sysctl.html Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 17:08:32 +02:00
Nobuhiro MIKI	f328fd4892	netdev-native-tnl: Add ipv6_label param in netdev_tnl_ip_build_header. For tunnels such as SRv6, some popular vendor appliances support IPv6 flowlabel based load balancing. In preparation for OVS to support it, this patch modifies the encapsulation to allow IPv6 flowlabel to be configured. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:41 +02:00

19894 Commits