It seems that on slow systems with high concurrency and CPU contention,
time/warp is not accurate enough for the ALB unit tests with the minimum
time/warp values that were used to hit a given number of events. This
results in some intermittent test failures.
As those tests are just waiting for a certain number of events to occur
and there is no functional change during that time, let's do the time/warp
again with higher values.
With this no failures are seen in several hundred runs.
Fixes: a83a406096 ("dpif-netdev: Sync PMD ALB state with user commands.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The next log line number should be updated to ensure that the
anticipated log has occurred again after more time has passed.
Fixes: a83a406096 ("dpif-netdev: Sync PMD ALB state with user commands.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The 'dataofs' field of the TCP header indicates the TCP header length
in 32-bit words. The header length should be at least 20 bytes
(dataofs >= 20 / 4 words) and must not exceed the TCP data length.
This patch tests 'dataofs' and skips parsing of layer 4 fields when a
bad dataofs is met.
This behavior is consistent with the openvswitch kernel module.
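The check described above can be sketched in Python. This is a hypothetical illustration of the validation logic, not the actual C code from the patch:

```python
def tcp_header_ok(l4_data: bytes) -> bool:
    """Return True if the TCP 'dataofs' field is sane."""
    if len(l4_data) < 20:        # shorter than the minimum TCP header
        return False
    dataofs = l4_data[12] >> 4   # header length in 32-bit words
    hdr_len = dataofs * 4
    # Header must be at least 20 bytes and must fit in the L4 data.
    return 20 <= hdr_len <= len(l4_data)
```

A packet with dataofs below 5 fails the check, and layer 4 parsing would be skipped for it.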
Fixes: 5a51b2cd34 ("lib/ofpbuf: Remove 'l7' pointer.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Without this fix, flowgen.py generates bad TCP packets:
tcpdump reports "bad hdr length 4 - too short" on the pcap
generated by flowgen.py.
This patch corrects the packet data endianness.
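As a hypothetical illustration of the class of bug involved (not the actual flowgen.py code): multi-byte header fields must be packed in network byte order, which `struct` selects with the `'!'` prefix:

```python
import struct

# Packing TCP ports in network (big-endian) byte order. Using the
# host's native order instead would corrupt the header on
# little-endian machines.
def pack_tcp_ports(src_port: int, dst_port: int) -> bytes:
    return struct.pack('!HH', src_port, dst_port)
```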
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OVS_DP_F_UNALIGNED is already set, so there is no need to set it again.
If OVS is restarting, the datapath is already created, so
dpif_netlink_dp_transact() will return EEXIST and there is no need to
probe again.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, when a user creates an OpenFlow group with multiple
buckets without specifying a selection type, the efficient dp_hash
selection method is only used if the user is creating fewer than 64
buckets. But when dp_hash is explicitly selected, up to 256 buckets
are supported.
While up to 64 buckets seems like a lot, certain OVN/OpenStack
workloads could result in the user creating more than 64 buckets, for
example, when using OVN to load balance. This patch increases the
default maximum from 64 to 256.
This change to the default limit doesn't affect how many buckets are
actually created (that is specified by the user when the group is
created), just how traffic is distributed across them.
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rows that refer to rows inserted in the current IDL run should only
be reparsed if they don't get deleted (become orphans) in the current
IDL run.
Fixes: 7b8aeadd60 ("ovsdb-idl: Re-parse backrefs of inserted rows only once.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While removing flows, the removal itself is deferred, so classifier
changes are performed from the RCU thread. This way, every deferred
removal triggers a classifier change and a reallocation of a pvector.
Freeing of the old version of a pvector is postponed. Since all this
is happening in an RCU thread, all these copies of the same pvector
will be freed only after the next grace period.
Below is the example output of the 'valgrind --tool=massif' from an OVN
deployment, where copies of that pvector took 5 GB of memory while
processing a bundled flow removal:
-------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B)
-------------------------------------------------------------------
89 176,257,987,954 5,329,763,160 5,318,171,607 11,591,553
99.78% (5,318,171,607B) (heap allocation functions) malloc/new/new[]
->98.45% (5,247,008,392B) xmalloc__ (util.c:137)
|->98.17% (5,232,137,408B) pvector_impl_dup (pvector.c:48)
||->98.16% (5,231,472,896B) pvector_remove (pvector.c:159)
|||->98.16% (5,231,472,800B) destroy_subtable (classifier.c:1558)
||||->98.16% (5,231,472,800B) classifier_remove (classifier.c:792)
|||| ->98.16% (5,231,472,800B) classifier_remove_assert (classifier.c:832)
|||| ->98.16% (5,231,472,800B) remove_rule_rcu__ (ofproto.c:2978)
|||| ->98.16% (5,231,472,800B) remove_rule_rcu (ofproto.c:2990)
|||| ->98.16% (5,231,472,800B) ovsrcu_call_postponed (ovs-rcu.c:346)
|||| ->98.16% (5,231,472,800B) ovsrcu_postpone_thread (ovs-rcu.c:362)
|||| ->98.16% (5,231,472,800B) ovsthread_wrapper
|||| ->98.16% (5,231,472,800B) start_thread
|||| ->98.16% (5,231,472,800B) clone
Fix the problem by collecting all the flows to be removed and
postponing the removal for all of them together. This way, all
removals will trigger only a single pvector re-allocation, greatly
reducing the CPU and memory usage.
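The batching idea above can be sketched with a plain list standing in for the copy-on-write pvector (a hypothetical illustration, not the C implementation):

```python
# Removing N items one by one from a copy-on-write vector makes N
# copies of the array; batching the removals makes exactly one.
def remove_all(pvector: list, to_remove: set) -> list:
    # One new copy replaces N per-removal copies of the array.
    return [item for item in pvector if item not in to_remove]
```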
Reported-by: Vladislav Odintsov <odivlad@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389538.html
Tested-by: Vladislav Odintsov <odivlad@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While processing a bundle, OVS will add all new and modified rules
to classifiers. Classifiers are using RCU-protected pvector to
store subtables. Addition of a new subtable or removal of the old
one leads to re-allocation and memory copy of the pvector array.
Old version of that array is given to RCU thread to free it later.
The problem is that the bundle is processed under the mutex without
entering the quiescent state, therefore memory cannot be freed
until the whole bundle is processed. So, if a few thousand flows
are added to the same table in a bundle, the pvector will be
re-allocated while adding each of them, and we'll end up with a few
thousand copies of the same array waiting to be freed.
In OVN deployments, there could be hundreds of thousands of flows
in the same table, leading to fast consumption of a huge amount of
memory and a lot of CPU cycles wasted on allocations and copies.
The snippet of 'valgrind --tool=massif' output below shows
ovs-vswitchd consuming 3.5 GB of RAM while processing a bundle with
65K FLOW_MODs in an OVN deployment. 3.4 GB of that memory are
copies of the same pvector.
-------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B)
-------------------------------------------------------------------
64 109,907,465,404 3,559,987,568 3,546,879,748 13,107,820
99.63% (3,546,879,748B) (heap allocation functions) malloc/new/new[]
->97.61% (3,474,750,333B) xmalloc__ (util.c:137)
|->97.61% (3,474,750,333B) xmalloc (util.c:172)
| ->96.38% (3,431,068,352B) pvector_impl_dup (pvector.c:48)
|| ->96.38% (3,431,067,840B) pvector_insert (pvector.c:138)
|| |->96.38% (3,431,067,840B) classifier_replace (classifier.c:664)
|| | ->96.38% (3,431,067,840B) classifier_insert (classifier.c:695)
|| | ->96.38% (3,431,067,840B) replace_rule_start (ofproto.c:5563)
|| | ->96.38% (3,431,067,840B) add_flow_start (ofproto.c:5179)
|| | ->96.38% (3,431,067,840B) ofproto_flow_mod_start (ofproto.c:8017)
|| | ->96.38% (3,431,067,744B) do_bundle_commit (ofproto.c:8168)
|| | |->96.38% (3,431,067,744B) handle_bundle_control (ofproto.c:8309)
|| | | ->96.38% (3,431,067,744B) handle_single_part_openflow (ofproto.c:8593)
|| | | ->96.38% (3,431,067,744B) handle_openflow (ofproto.c:8674)
|| | | ->96.38% (3,431,067,744B) ofconn_run (connmgr.c:1329)
|| | | ->96.38% (3,431,067,744B) connmgr_run (connmgr.c:356)
|| | | ->96.38% (3,431,067,744B) ofproto_run (ofproto.c:1879)
|| | | ->96.38% (3,431,067,744B) bridge_run__ (bridge.c:3251)
|| | | ->96.38% (3,431,067,744B) bridge_run (bridge.c:3310)
|| | | ->96.38% (3,431,067,744B) main (ovs-vswitchd.c:127)
Fix that by postponing the publishing of classifier updates,
so each flow modification can work with the same version of the
pvector. Unfortunately, bundled PACKET_OUT messages require all
previous changes to be published before processing, otherwise the
packet would use the wrong version of the OF tables. Publish all
changes before processing PACKET_OUT messages to avoid this issue.
Hopefully, a mix of a large number of FLOW_MOD and PACKET_OUT
messages is not a common use case.
Reported-by: Vladislav Odintsov <odivlad@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389503.html
Tested-by: Vladislav Odintsov <odivlad@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ssl_send() clones the data before sending, but if SSL_write() succeeds
on the first attempt, this is just a waste of CPU cycles.
Trying to send the original buffer instead, and only copying the
remaining data if it's not possible to send it all right away.
This should save a few cycles on every send.
Note:
It's probably possible to avoid the copy even if we can't send
everything at once, but that will likely require a major change
of the stream-ssl module in order to take into account all the
corner cases related to SSL connections. So, not trying to do that
for now.
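The fast path can be sketched as follows (a hypothetical illustration of the idea; the real code is C in stream-ssl, and `send` here stands in for SSL_write returning the number of bytes written):

```python
# Try to send the caller's buffer directly; copy only the remainder
# that could not be sent right away.
def send_with_lazy_copy(send, data: bytes):
    n = send(data)             # may send only part of the buffer
    if n == len(data):
        return None            # fast path: no copy needed at all
    return bytes(data[n:])     # copy only the unsent remainder
```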
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb_atom_string and json_string are basically the same data structure,
and ovsdb-server frequently needs to convert one to the other. We can
avoid that by using json_string from the beginning for all ovsdb
strings. This way, the conversion turns into a simple json_clone(),
i.e. an increment of a reference counter. This change gives a moderate
performance boost in some scenarios, improves code clarity and
may be useful for future development.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
With the next commit, reference counting of json objects will take a
significant part of the CPU time in ovsdb-server. Inlining these
functions reduces the cost of a function call.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The length in the Data Link Header for these packets should not include
the source and destination MACs or the length field itself.
Therefore, it should be 14 bytes less; otherwise other network
tools like Wireshark complain:
Expert Info (Error/Malformed):
Length field value goes past the end of the payload
Additionally, fixing the printing of the packet/flow configuration,
as it currently prints '%s=%s' strings without any real data.
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently the ovs-tcpdump utility creates a virtual tunnel to send
packets to. This method functions perfectly fine, however, it can
greatly impact performance of the monitored port.
It has been reported to reduce packet throughput significantly. I was
able to reproduce a reduction in throughput of up to 70 percent in some
tests with a simple setup of two hosts communicating through a single
bridge on Linux with the kernel module datapath. Another more complex
test was configured for the usermode datapath both with and without
DPDK. This test involved a data path going from a VM, through a port
into one OVS bridge, out through a network card which could be DPDK
enabled for the relevant tests, in to a different network interface,
then into a different OVS bridge, through another port, and then into
a virtual machine.
Using the dummy driver resulted in the following impact on performance
compared to no ovs-tcpdump. Due to intra-test variance and fluctuations
during the first few seconds after installing a tap, multiple samples
were taken over multiple test runs. The first few seconds' worth of
results were discarded and the remaining results were averaged.
If the dummy driver isn't present, the script falls back on the
existing tap code.
Original Script
===============
Category Impact on Throughput
Kernel datapath - 65%
Usermode (no DPDK) - 26%
DPDK ports in use - 37%
New Script
==========
Category Impact on Throughput
Kernel datapath - 5%
Usermode (no DPDK) - 16%
DPDK ports in use - 29%
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb-tool join-cluster requires a remote address, so the existing
code that tried to join a cluster without one when there was an
existing $DB_FILE would fail.
Instead, if we are trying to specifically join a cluster and there
is an existing $DB_FILE, back it up and remove the original before
continuing to join the cluster.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Flavio Fernandes <flavio@flaviof.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Many Python implementations pre-allocate space for multiple
objects in empty dicts and lists. Using a custom dict-like object
that only creates these objects when they are accessed can save
memory.
On a fairly pathological case where the DB has 1000 networks each
with 100 ports, with only 'name' fields set, this saves around
300MB of memory.
One could argue that if values are not going to change from their
defaults, then users should not be monitoring those columns, but
it's also probably good to not waste memory even if user code is
sub-optimal.
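The idea can be sketched like this (a hypothetical minimal illustration, not the actual class from the Python IDL):

```python
class LazyDict:
    """Dict-like object that defers allocating the real dict until
    the first write, so empty instances carry no dict at all."""
    __slots__ = ('_data',)

    def __init__(self):
        self._data = None  # real dict created on first write

    def __setitem__(self, key, value):
        if self._data is None:
            self._data = {}
        self._data[key] = value

    def __getitem__(self, key):
        if self._data is None:
            raise KeyError(key)
        return self._data[key]

    def __len__(self):
        return 0 if self._data is None else len(self._data)
```

With a million rows whose columns keep their defaults, the underlying dicts are simply never allocated.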
Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Flavio Fernandes <flavio@flaviof.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The match keyword "igmp" is not supported in ofp-parse, which means
that flow dumps cannot be restored. Previously a workaround was
added to ovs-save to avoid changing output in stable branches.
This patch changes the output to print the igmp match in the accepted
ofp-parse format (ip,nw_proto=2) and to print igmp_type/code as the
generic tp_src/dst. Tests are added, and NEWS is updated to reflect
this change.
The workaround in ovs-save is still included to ensure that flows
can be restored when upgrading an older ovs-vswitchd. This workaround
should be removed in later versions.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Salvatore Daniele <sdaniele@redhat.com>
Co-authored-by: Salvatore Daniele <sdaniele@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
match.c generates the keyword "igmp", which is not supported in ofp-parse.
This means that flow dumps containing 'igmp' cannot be restored.
Removing the 'igmp' keyword entirely could break existing scripts in stable
branches, so this patch creates a workaround within ovs-save by converting any
instances of "igmp" within $bridge.flows.dump into "ip,nw_proto=2", and any
instances of igmp_type/code into the generic tp_src/dst.
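The rewrite can be sketched in Python (a hypothetical illustration; the actual workaround lives in the ovs-save shell script):

```python
import re

# Rewrite matches that ofp-parse cannot read back into their
# accepted equivalents.
def fix_igmp(flow: str) -> str:
    # \b stops the bare-keyword substitution from touching
    # igmp_type/igmp_code, since '_' is a word character.
    flow = re.sub(r'\bigmp\b', 'ip,nw_proto=2', flow)
    flow = flow.replace('igmp_type', 'tp_src')
    flow = flow.replace('igmp_code', 'tp_dst')
    return flow
```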
Signed-off-by: Salvatore Daniele <sdaniele@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It is useful to also log when tnl_port_build_header() fails.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While adding new rows, ovsdb-idl re-parses all the other rows that
reference the new one. For example, current ovn-kubernetes creates
load balancers and adds the same load balancer to all logical switches
and logical routers. So, when a new load balancer is added, the rows
for all logical switches and routers are re-parsed.
During initial database connection (or re-connection with
monitor/monitor_cond or monitor_cond_since with outdated last
transaction id) the client downloads the whole content of a database.
In case of OVN, there might be already thousands of load balancers
configured. ovsdb-idl will process rows in that initial monitor reply
one-by-one. Therefore, for each load balancer row, it will re-parse
all rows for switches and routers.
Assuming that we have 120 logical switches and 30K load balancers,
processing of the initial monitor reply will take 120 (switch rows) *
30K (load balancer references in a switch row) * 30K (load balancer
rows) = 10^11 operations, which may take hours. ovn-kubernetes will
use LB groups eventually, but there are other, less obvious cases that
cannot be changed that easily.
Re-parsing doesn't change any internal structures of the IDL. It
destroys and re-creates exactly the same arcs between rows. The only
thing that changes is the application-facing array of pointers.
Since the internal structures remain intact, the suggested solution is
to postpone the re-parsing of back references until all the monitor
updates are processed. This way each row is re-parsed only once.
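The batching can be sketched as follows (a hypothetical illustration of the scheme, with `backrefs_of` and `reparse` standing in for the IDL internals):

```python
# Instead of re-parsing every referencing row after each inserted row,
# collect the affected rows across the whole monitor update and
# re-parse each of them exactly once at the end.
def process_monitor_update(new_rows, backrefs_of, reparse) -> int:
    to_reparse = set()
    for row in new_rows:
        to_reparse.update(backrefs_of(row))
    for row in to_reparse:
        reparse(row)  # each affected row is re-parsed exactly once
    return len(to_reparse)
```

With 30K inserted load balancers all referenced by the same 120 switch rows, this performs 120 re-parses instead of 120 * 30K.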
Tested in a sandbox with 120 LSs, 120 LRs and 3K LBs, where each
load balancer added to each LS and LR, by re-statring ovn-northd and
measuring the time spent in ovsdb_idl_run().
Before the change:
OVN_Southbound: ovsdb_idl_run took: 924 ms
OVN_Northbound: ovsdb_idl_run took: 825118 ms --> 13.75 minutes!
After:
OVN_Southbound: ovsdb_idl_run took: 692 ms
OVN_Northbound: ovsdb_idl_run took: 1698 ms
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Commit dc0bd12f5b removed the restriction that a tunnel endpoint must
be a bridge port. So, currently, OVS has to check whether the native
tunnel needs to be terminated regardless of the output port.
Unfortunately, there is a side effect: tnl_port_map_lookup() always
adds at least a 'dl_dst' match to the megaflow that ends up in the
corresponding datapath flow. And since tunneling works at the L3
level and is not restricted to any particular bridge, this extra
match criterion is added to every datapath flow on every bridge,
even if that bridge cannot be part of tunnel processing.
For example, if OVS has at least one tunnel configured and we're
adding a completely separate bridge with 2 ports and simple rules
to forward packets between two ports, there still will be a match on
a destination mac address:
1. <create a tunnel configuration in OVS>
2. ovs-vsctl add-br br-non-tunnel -- set bridge datapath_type=netdev
3. ovs-vsctl add-port br-non-tunnel port0
-- add-port br-non-tunnel port1
4. ovs-ofctl del-flows br-non-tunnel
5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1
6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0
# ovs-appctl ofproto/trace br-non-tunnel in_port=port0
Flow: in_port=1,vlan_tci=0x0000,
dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000
bridge("br-non-tunnel")
-----------------------
0. in_port=1, priority 32768
output:2
Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000
Datapath actions: 5 ^^^^^^^^^^^^^^^^^^^^^^^^
This increases the number of upcalls and installed datapath flows,
since a separate flow needs to be installed per destination MAC,
reducing the switching performance. This also blocks datapath
performance optimizations that are based on the simplicity of
datapath flows.
In general, in order to be a tunnel endpoint, a port has to have an IP
address. Hence, native tunnel termination should be attempted only
for such ports. This avoids the extra matches in most cases.
Fixes: dc0bd12f5b ("userspace: Enable non-bridge port as tunnel endpoint.")
Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
xlate_check_pkt_larger() sets ctx->exit to 'true' at the end
causing the translation to stop. This results in incomplete
datapath rules.
For example, for the below OF rules configured on a bridge,
table=0,in_port=1 actions=load:0x1->NXM_NX_REG1[[]],resubmit(,1),
load:0x2->NXM_NX_REG1[[]],resubmit(,1),
load:0x3->NXM_NX_REG1[[]],resubmit(,1)
table=1,in_port=1,reg1=0x1 actions=check_pkt_larger(200)->NXM_NX_REG0[[0]],
resubmit(,4)
table=1,in_port=1,reg1=0x2 actions=output:2
table=1,in_port=1,reg1=0x3 actions=output:4
table=4,in_port=1 actions=output:3
The datapath flow should be:
check_pkt_len(size=200,gt(3),le(3)),2,4
But right now it is:
check_pkt_len(size=200,gt(3),le(3))
Actions after the first resubmit(,1) in the first flow in table 0
are never applied. This patch fixes this issue.
Fixes: 5b34f8fc3b ("Add a new OVS action check_pkt_larger")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2018365
Reported-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously, when a user enabled the PMD auto load balancer with
pmd-auto-lb="true", some conditions required for a rebalance to take
place, such as the number of PMDs/RxQs, were checked.
If the configuration meant that a rebalance would not take place,
then PMD ALB was logged as 'disabled' and not run.
Later, if the PMD/RxQ configuration changed whereby a rebalance
could be effective, PMD ALB was logged as 'enabled' and would run at
the appropriate time.
This worked fine from a functional point of view, but it is
unintuitive for a user reading the logs.
e.g. with one PMD (PMD ALB would not be effective)
User enables ALB, but logs say it is disabled because it won't run.
$ ovs-vsctl set open_vSwitch . other_config:pmd-auto-lb="true"
|dpif_netdev|INFO|PMD auto load balance is disabled
No dry run takes place.
Add more PMDs (PMD ALB may be effective).
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=50
|dpif_netdev|INFO|PMD auto load balance is enabled ...
Dry run takes place.
|dpif_netdev|DBG|PMD auto load balance performing dry run.
A better approach is to simply reflect back the user enable/disable
state in the logs and deal with whether the rebalance will be effective
when needed. That is the approach taken in this patch.
To cut down on unnecessary work, some basic checks are also made before
starting a PMD ALB dry run, and debug logs can indicate this to the user.
e.g. with one PMD (PMD ALB would not be effective)
User enables ALB, and logs confirm the user has enabled it.
$ ovs-vsctl set open_vSwitch . other_config:pmd-auto-lb="true"
|dpif_netdev|INFO|PMD auto load balance is enabled...
No dry run takes place.
|dpif_netdev|DBG|PMD auto load balance nothing to do, not enough non-isolated PMDs or RxQs.
Add more PMDs (PMD ALB may be effective).
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=50
Dry run takes place.
|dpif_netdev|DBG|PMD auto load balance performing dry run.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When a PMD reload occurs, some PMD cycle measurements are reset.
In order to preserve the full cycle history of an RxQ, the RxQ
cycle measurements were not reset.
These are both used together to display the % of a PMD that an
RxQ is using in the pmd-rxq-show stat.
Resetting one and not the other can lead to some unintuitive-looking
stats while the stats settle for the new config. In some cases, it
may appear as if the RxQs are using >100% of a PMD.
This resolves when the stats settle for the new config, but
seeing RxQs apparently using >100% of a PMD may confuse a user
and lead them to think there is a bug.
To avoid this, reset the RxQ cycle measurements on PMD reload.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The admissibility check currently logs a message like this (line
wrapped in this commit log):
bond(revalidator11)|DBG|member (dpdk0): Admissibility
verdict is to drop pkt as different port is learned.active member: false,
may_enable: true enable: true LACP status:2
Fix the spaces around the period character and separate the debug info
with commas.
Prefix all log messages in this check with the bond and member names.
Display a human-readable string for the LACP status.
New logs look like:
bond(revalidator11)|DBG|bond dpdkbond0: member dpdk0: admissibility
verdict is to drop pkt as different port is learned, active member: false,
may_enable: true, enabled: true, LACP status: off
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While testing OVS on Windows flows for IP fragments, the traffic is
dropped as it may match an incorrect OVS flow. From the code, after
the IPv4 fragments are reassembled, OVS still uses the flow key of the
last IPv4 fragment to match against CT, causing a match error.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/232
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Python objects normally have a dictionary named __dict__ allocated
for handling dynamically assigned attributes. Depending on the
architecture and Python version, that empty dict may be between
64 and 280 bytes.
Seeing as Atom and Datum objects do not need dynamic attribute
support and there can be millions of rows in a database, avoiding
this allocation with __slots__ can save 100s of MBs of memory per
Idl process.
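The difference can be illustrated with a minimal sketch (hypothetical class names, not the actual Atom/Datum definitions):

```python
# With a plain class, every instance allocates a __dict__ for dynamic
# attributes; with __slots__, attribute storage is fixed and no
# per-instance dict exists.
class AtomWithDict:
    def __init__(self, value):
        self.value = value

class AtomWithSlots:
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value
```

`sys.getsizeof()` on the instances plus their (absent) dicts shows the per-object saving that adds up across millions of rows.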
Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Timothy Redaelli <tredaelli@redhat.com>
Tested-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The systemd unit file generates warnings about the PID file path,
since /var/run is a legacy path, so just use /run instead of /var/run.
/var/run has been a symlink to /run starting from RHEL 7 (and any
other distribution that uses systemd).
Reported-at: https://bugzilla.redhat.com/1952081
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds a general way of viewing and configuring datapath
cache sizes, with an implementation for the netlink interface.
The ovs-dpctl/ovs-appctl show commands will display the
current cache sizes configured:
$ ovs-dpctl show
system@ovs-system:
lookups: hit:25 missed:63 lost:0
flows: 0
masks: hit:282 total:0 hit/pkt:3.20
cache: hit:4 hit-rate:4.54%
caches:
masks-cache: size:256
port 0: ovs-system (internal)
port 1: br-int (internal)
port 2: genev_sys_6081 (geneve: packet_type=ptap)
port 3: br-ex (internal)
port 4: eth2
port 5: sw0p1 (internal)
port 6: sw0p3 (internal)
A specific cache can be configured as follows:
$ ovs-appctl dpctl/cache-set-size DP CACHE SIZE
$ ovs-dpctl cache-set-size DP CACHE SIZE
For example to disable the cache do:
$ ovs-dpctl cache-set-size system@ovs-system masks-cache 0
Setting cache size successful, new size 0.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If a user frequently changes a lot of rows in a database, the
transaction history can grow much larger than the database itself.
This wastes a lot of memory and also makes monitor_cond_since slower
than a usual monitor_cond if the transaction id is old enough, because
re-construction of the changes from the history is slower than just
creating an initial database snapshot. This is also the case if the
user deleted a lot of data, since the transaction history still holds
all of it while the database itself doesn't.
In the case of the current lb-per-service model in ovn-kubernetes,
each load-balancer is added to every logical switch/router, so such
a transaction touches more than half of the OVN_Northbound database.
And each of these transactions is added to the transaction history.
Since the transaction history depth is 100, in the worst case
scenario it will hold 100 copies of the database, increasing memory
consumption dramatically. In tests with 3000 LBs and 120 LSs, memory
goes up to 3 GB, while holding at 30 MB with transaction history
disabled in the code.
Fix that by keeping count of the number of ovsdb_atom's in the
database and not allowing the total number of atoms in the transaction
history to grow larger than this value. Counting atoms is fairly
cheap because we don't need to iterate over them, so it doesn't have
a significant performance impact. It would be ideal to measure the
size of individual atoms, but that would hurt performance.
Counting cells instead of atoms is not sufficient, because OVN
users add hundreds or thousands of atoms to a single cell, so cells
largely differ in size.
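The eviction policy can be sketched as follows (a hypothetical illustration of the idea, with the history held as a deque of per-transaction atom counts):

```python
from collections import deque

# Drop the oldest transactions once the history holds more atoms
# than the database itself currently contains.
def trim_history(history: deque, atoms_in_history: int,
                 db_atoms: int) -> int:
    """history holds (txn_id, n_atoms) tuples, oldest first.
    Returns the atom count remaining in the history."""
    while history and atoms_in_history > db_atoms:
        _, n_atoms = history.popleft()
        atoms_in_history -= n_atoms
    return atoms_in_history
```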
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
On large scale deployments with records that contain large sets, this
significantly improves client side performance as it avoids comparing
full contents of the old and new rows.
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The main idea is to not store the list of weak references in the source
row, so they don't all need to be re-checked/updated on every
modification of that source row. The point is that the source row
already knows the UUIDs of all destination rows stored in the data, so
there is not much profit in storing this information somewhere else.
If needed, the destination row can be looked up, and the reference can
be looked up in the destination row. For fast lookup, the destination
row now stores references in a hash map.
The weak reference structure now contains the table and uuid of the
source row instead of a direct pointer. This allows replacing/updating
the source row without breaking any weak references stored in
destination rows.
The structure also now contains the key-value pair of atoms that
triggered the creation of this reference. These atoms can be used to
quickly subtract removed references from a source row. During
reassessment, ovsdb now only needs to care about newly added or
removed atoms, and about atoms that got removed due to removal of the
destination rows, but those are marked for reassessment by the
destination row.
ovsdb_datum_subtract() is used to remove atoms that point to removed
or incorrect rows, so there is no need to re-sort the datum in the end.
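The reversed bookkeeping can be sketched like this (a hypothetical illustration with simplified names, not the actual ovsdb-server structures):

```python
# The destination row keeps a hash map of incoming weak references
# keyed by (source table, source row uuid), instead of the source row
# keeping a list of outgoing ones.
class Row:
    def __init__(self, uuid):
        self.uuid = uuid
        # (src_table, src_uuid) -> set of atoms that created the refs
        self.incoming_weak_refs = {}

def add_weak_ref(dst: Row, src_table: str, src_uuid: str, atom):
    dst.incoming_weak_refs.setdefault(
        (src_table, src_uuid), set()).add(atom)

def drop_refs_from(dst: Row, src_table: str, src_uuid: str):
    # The source row can be replaced without touching any of its own
    # structures; its references are found here by table and uuid.
    return dst.incoming_weak_refs.pop((src_table, src_uuid), set())
```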
Results of an OVN load-balancer benchmark that adds 3K load-balancers
to each of 120 logical switches and 120 logical routers in the OVN
sandbox with clustered Northbound database and then removes them:
Before:
%CPU CPU Time CMD
86.8 00:16:05 ovsdb-server nb1.db
44.1 00:08:11 ovsdb-server nb2.db
43.2 00:08:00 ovsdb-server nb3.db
After:
%CPU CPU Time CMD
54.9 00:02:58 ovsdb-server nb1.db
33.3 00:01:48 ovsdb-server nb2.db
32.2 00:01:44 ovsdb-server nb3.db
So, on the cluster leader the processing time dropped by 5.4x and on
followers by 4.5x. The more load-balancers, the larger the
performance difference. There is a slight increase in memory usage,
because the new reference structure is larger, but the difference is
not significant.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Added a new function to return memory usage statistics for database
objects inside the IDL, similar to what ovsdb-server reports.
The _Server database is not counted, as it should be small, hence it
isn't worth adding extra code to the ovsdb-cs module. It can be added
later if needed.
ovs-vswitchd is a user in OVS, but this API will be mostly useful for
OVN daemons.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Currently, some patches have tags wrongly written (with a space
instead of a dash), and this may prevent automatic systems or CI
from detecting them correctly.
This commit adds a check in checkpatch to be sure a tag is written
correctly with a dash and not with a space.
The tags supported by the commit are:
Acked-by, Reported-at, Reported-by, Requested-by, Reviewed-by, Submitted-at
and Suggested-by.
It's not necessary to add "Signed-off-by" since it's already checked in
checkpatch.
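A minimal sketch of the kind of check described (the function name and exact regex are illustrative, not the actual checkpatch.py code):

```python
import re

# Flag tags written with a space ("Acked by:") instead of a dash
# ("Acked-by:").  Covers the tags listed above.
_BAD_TAG = re.compile(
    r'^(Acked|Reported|Requested|Reviewed|Submitted|Suggested) (at|by):')

def has_miswritten_tag(line):
    """Return True if the line starts with a tag using a space, not a dash."""
    return _BAD_TAG.match(line) is not None
```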
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There was a typo in the function pointer check in
dpif_bond_add() before calling bond_add().
Fixes: 9df65060cf ("userspace: Avoid dp_hash recirculation for balance-tcp bond mode.")
Signed-off-by: Somnath Chatterjee <somnath.b.chatterjee@ericsson.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, pyOpenSSL is half-deprecated upstream and so it's removed on
some distributions (for example on CentOS Stream 9,
https://issues.redhat.com/browse/CS-336), but since OVS only
supports Python 3 it's possible to replace pyOpenSSL with "import ssl"
included in base Python 3.
Stream recv and send had to be split into _recv and _send, since
SSLError is a subclass of socket.error, and so it was not possible to
catch SSLWantReadError and SSLWantWriteError in recv and send of
SSLStream.
TCPstream._open cannot be used in SSLStream, since the Python ssl
module requires the SSL socket to be created before connecting it, so
SSLStream._open needs to create the socket, create the SSL socket and
then connect the SSL socket.
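The ordering constraint can be sketched as follows (illustrative only; the function name is hypothetical and the real code lives in python/ovs/stream.py, which also verifies peer certificates):

```python
import socket
import ssl

def ssl_create_unconnected(server_hostname=None):
    """Create the TCP socket and wrap it in SSL *before* connecting,
    as the Python ssl module requires for asynchronous connects."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # sketch only; real code verifies peers
    # do_handshake_on_connect=False defers the handshake, which is later
    # driven by catching SSLWantReadError/SSLWantWriteError.
    return ctx.wrap_socket(sock, server_hostname=server_hostname,
                           do_handshake_on_connect=False)
```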
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/1988429
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In an upcoming patch, pyOpenSSL will be replaced with the Python ssl
module, but in order to do an async connection with the Python ssl
module, the SSL socket must be created when the socket is created,
before the socket is connected.
So, the inet_open_active function is split into 3 parts:
- inet_create_socket_active: creates the socket and returns the family and
the socket, or (error, None) if some error needs to be returned.
- inet_connect_active: connects the socket and returns the errno (it
returns 0 if errno is EINPROGRESS or EWOULDBLOCK).
connect is replaced by connect_ex, since Python suggests using it for
asynchronous connects and it's also cleaner, since inet_connect_active
returns the errno that connect_ex already returns; moreover, due to a
Python limitation, connect cannot be used with the ssl module.
The inet_open_active function is changed to use the new functions
inet_create_socket_active and inet_connect_active.
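A condensed sketch of the resulting split (simplified to IPv4; the real helpers in python/ovs/socket_util.py also handle IPv6 and address parsing):

```python
import errno
import socket

def inet_create_socket_active(style, address):
    """Create an unconnected, non-blocking socket.
    Returns (family, socket) or (error, None)."""
    try:
        sock = socket.socket(socket.AF_INET, style)
        sock.setblocking(False)
        return socket.AF_INET, sock
    except socket.error as e:
        return e.errno, None

def inet_connect_active(sock, address):
    """Connect the socket; returns 0 on success or in-progress, else errno.
    connect_ex() returns the errno directly instead of raising."""
    error = sock.connect_ex(address)
    if error in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        return 0
    return error
```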
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While testing OVS-Windows flows for the DNAT/SNAT action, the checksum
in the TCP header is set incorrectly when TCP offload is enabled by
default. As a result, the packet will be dropped on the Windows VM when
processing a packet from a Linux VM which included a correct checksum
in the first place. On the Windows VM, the packet has gone through two
NAT actions, and the OVS Windows kernel will reset the checksum to the
pseudo checksum, losing the original correct checksum value that was
set outside.
Regarding the NAT TCP/UDP checksum reset logic, the TCP checksum should
be reset to the pseudo checksum value only in the Tx direction for the
TCP offload case. For packets from the outside, the OVS Windows kernel
does not need to reset the TCP/UDP checksum, as it is the job of the
receiving network driver to produce a correct checksum value.
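For reference, the pseudo checksum mentioned above is the one's-complement sum of just the IP pseudo header (addresses, protocol, TCP length); with offload enabled the NIC finishes the computation on Tx. A hypothetical standalone sketch of that seed value (not the OVS Windows kernel code):

```python
import socket
import struct

def tcp_pseudo_checksum(src_ip, dst_ip, tcp_len):
    """One's-complement sum of the IPv4 pseudo header for TCP (proto 6),
    not complemented: the value offload engines expect pre-seeded."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 6, tcp_len))
    total = sum(struct.unpack("!6H", data))
    while total >> 16:                      # fold carries into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total
```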
>>> Sample flow on default configuration on both Windows VM and Linux VM:
(src=192.168.252.1,dst=10.110.225.146) ->dnat/snat->
(src=169.254.169.253,dst=10.176.26.107). Without the fix, the returning
packet (src=10.176.26.107,dst=169.254.169.253) will have the correct
TCP checksum. After the reverse NAT actions, it will be changed to the
packet (src=10.110.225.146,dst=192.168.252.1), but with the incorrect
TCP checksum 0xa97a, which is the pseudo checksum. The related packet
capture is attached to the reported issue below.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/231
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Recently, the GitHub Actions CI environment has been broken due to an
incompatibility between sphinx-build and the docutils python package.
The pip3 install command will upgrade docutils to an incompatible
version.
Since we install sphinx via pip3, it will always install an appropriate
version of docutils package. By forcing the upgrade, we created a broken
situation. Remove the upgrade command and trust pip3.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since recently, actions/setup-python@v2 started to pull python 3.10.0,
which seems to be incompatible with meson 0.47.1, which we're using
to build DPDK.
This broke CI on 2.16 and master branches:
https://github.com/ovsrobot/ovs/runs/3967167374
Pinning the version to 3.9 for now to avoid the CI failure.
Dependency resolver is still not very happy, but at least it works.
We'll need to find a newer version of meson to use later and revert
this change.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Currently, the layers info propagated to ProcessDeferredActions may be
incorrect. Because of this, any subsequent usage of the layers might
result in undesired behavior. Accordingly, this patch adds the related
layers to the deferred action to make sure the layers are consistent
with the related NBL.
In the reported issue 229, we encountered a problem when decapping a
Geneve packet and doing NAT twice (via two flow tables), and found that
the TCP sequence of the HTTP packet got changed.
After debugging, we found the issue is caused by the not-updated layers
values isTcp and isUdp for the Geneve decapping case.
The related function call chain is listed below:
OvsExecuteDpIoctl->OvsActionsExecute->OvsDoExecuteActions->OvsTunnelPortRx
->OvsDoExecuteActions->nat ct action and recirc action
->OvsActionsExecute->deferred actions processing for nat and recirc action
When decapping the Geneve packet, the datapath will first set the
layers for the UDP packet. Then it will go on to do OVS flow extraction
to get the inner packet layers and process the first nat action and the
first recirc action. After that, the datapath will do deferred actions
processing in OvsActionsExecute, and it inherits the incorrect Geneve
packet layers values (isTcp 0 and isUdp 1). So in the second nat action
processing it will get the wrong TCP headers in OvsUpdateAddressAndPort
and update the related TCP checksum field value, but in this case it
will change the packet's TCP seq value.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/229
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
CT zone could be set from a field that is not included in frozen
metadata. Consider the example rules which are typically seen in
OpenStack security group rules:
priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT
module, the second one sets the zone-id from the first megaflow and
commits to CT.
The current implementation will generate a megaflow that does not use
ct_zone=5 as a match, but directly commits into CT using zone=5, as the
zone is set by an Imm, not a field.
Consider a situation in which one changes the zone id (for example to
15) in the first rule but keeps the second rule unchanged. During this
change, there is traffic hitting the two generated megaflows; the
revalidator would revalidate all megaflows, but it will not change the
second megaflow, because zone=5 is recorded in the megaflow. So the
xlate will still translate the commit action into zone=5, and new
traffic will still commit to CT with zone=5, not zone=15, resulting in
traffic drops and other issues.
Just like the OVS set-field convention, if a field X is set by Y
(Y being a variable, not an Imm), we should also mask Y as a match in
the generated megaflow. An exception is that if the zone-id is set from
a field that is included in the frozen state (i.e. regs) and this
upcall is a resume of a thawed xlate, the un-wildcarding can be
skipped, as the recirc_id is a hash of the values in these fields and
will change following changes of those fields. When the recirc_id
changes, all megaflows with the old recirc_id will become invalid
later.
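The convention can be sketched abstractly (a toy model with hypothetical names, not the actual xlate code):

```python
def translate_ct_zone(flow, wc, zone_spec):
    """Resolve the zone for a ct() action.  zone_spec is either
    ('imm', value) or ('field', name).  When the zone comes from a
    field, that field is added to the megaflow match (un-wildcarded)."""
    kind, value = zone_spec
    if kind == 'imm':
        return value                 # immediate zone: nothing to match on
    wc.add(value)                    # un-wildcard the source field
    return flow[value]
```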
Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>