mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-29 21:38:13 +00:00

Author	SHA1	Message	Date
Ilya Maximets	e4a2b01093	AUTHORS: Add Ales Musil. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-29 13:17:16 +02:00
David Marchand	fe171e4f10	dpif-netdev: Refactor AVX512 runtime checks. As described in the bugzilla below, cpu_has_isa code may be compiled with some AVX512 instructions in it, because cpu.c is built as part of the libopenvswitchavx512. This is a problem when this function (supposed to probe for AVX512 instructions availability) is invoked from generic OVS code, on older CPUs that don't support them. For the same reason, dpcls_subtable_avx512_gather_probe, dp_netdev_input_outer_avx512_probe, mfex_avx512_probe and mfex_avx512_vbmi_probe are potential runtime bombs and can't either be built as part of libopenvswitchavx512. Move cpu.c to be part of the "normal" libopenvswitch. And move other helpers in generic OVS code. Note: - dpcls_subtable_avx512_gather_probe is split in two, because it also needs to do its own magic, - while moving those helpers, prefer direct calls to cpu_has_isa and avoid cast to intermediate integer variables when a simple boolean is enough, Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.") Fixes: abb807e27dd4 ("dpif-netdev: Add command to switch dpif implementation.") Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract") Fixes: b366fa2f4947 ("dpif-netdev: Call cpuid for x86 isa availability.") Reported-at: https://bugzilla.redhat.com/2100393 Reported-by: Ales Musil <amusil@redhat.com> Co-authored-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-29 11:27:09 +02:00
Dumitru Ceara	6835d4b01e	python: Add Python bindings TODO file. For now include the IDL related TODO items as discussed at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393516.html Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:47:34 +02:00
Dumitru Ceara	a9ec4e3be3	ovsdb-server: Log database transactions for user requested tables. Add a new command, 'ovsdb-server/tlog-set DB:TABLE on\|off', which allows the user to enable/disable transaction logging for specific databases and tables. By default, logging is disabled. Once enabled, logs are generated with level INFO and are also rate limited. If used with care, this command can be useful in analyzing production deployment performance issues, allowing the user to pin point bottlenecks without the need to enable wider debug logs, e.g., jsonrpc. A command to inspect the logging state is also added: 'ovsdb-server/tlog-list'. Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:45:36 +02:00
Dumitru Ceara	c558f9f1e1	ovsdb-idl: Get per-database memory usage statistics. Clients might be connected to multiple databases (e.g., ovn-controller is connected to OVN_Southbound and Open_vSwitch databases) and the IDL memory statistics are more useful if they're not aggregated. Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:41:42 +02:00
Cian Ferriter	23ed22594d	dpif-netdev-extract-avx512: Protect GCC builtin usage. __builtin_constant_p is only available in GCC and only versions >= 4. Use the same "#if __GNUC__ >= 4" check used in other parts of OVS for this builtin. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:36:52 +02:00
Mike Pattrick	8c1c447a1d	ovs-tcpdump: Default to OVS_RUNDIR if present. Now ovs-tcpdump will check for an OVS_RUNDIR environment variable and if present, use it instead of the default RUNDIR. This is useful when used in conjunction with OVS_PAUSE_TEST while running the test suite. Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:34:58 +02:00
Harry van Haaren	751d05b474	dpcls: Add unlisted alias for subtable lookup command. This patch adds the old name "subtable-lookup-prio-get" as an unlisted command, to restore a consistency between OVS releases for testing scripts. Fixes: 738c76a503f4 ("dpcls: Change info-get function to fetch dpcls usage stats.") Suggested-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:32:50 +02:00
Yunjian Wang	cb9ae5f0fd	ovsdb: Fix memory leak on error path in ovsdb_file_read__(). Found by Coverity. Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 13:24:47 +02:00
Eelco Chaudron	299050c2d0	odp-util: Ignore unknown attributes in parse_key_and_mask_to_match(). When processing netlink messages, we should ignore unknown OVS_KEY_ATTR as we can assume if newer attributes are present, they are backward compatible. This patch also updates the existing comments to better explain what happens in the error cases. At this place in the code, we can not ignore the TOO_LITTLE/MUCH errors as some instances could have canceled processing leaving the returning match data incomplete. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2089331 Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Michael Santana <msantana@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 12:20:29 +02:00
lic121	29a2f18354	ofproto-dpif: Avoid unneccesary backer revalidation. If lldp didn't change, we are not supposed to trigger backer revalidation. Without this patch, bridge_reconfigure() always trigger udpif revalidator because of lldp. Signed-off-by: lic121 <lic121@chinatelecom.cn> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Co-authored-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 12:06:23 +02:00
lic121	509c32765a	lldp: Fix lldp memory leak. lldp_create() malloc memory for lldp->lldpd->g_hardware. lldp_unref is supposed to free the memory regardless of hw->h_flags. Signed-off-by: lic121 <lic121@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 12:06:11 +02:00
lic121	f1c51be500	ipfix: Trigger revalidation if ipfix options changes. ipfix cfg creation/deleting triggers revalidation. But this does not cover the case where ipfix options changes, which also suppose to trigger revalidation. Fixes: a9f5ee1199e1 ("ofproto-dpif: Trigger revalidation when ipfix config set.") Signed-off-by: lic121 <lic121@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-28 11:48:41 +02:00
Ilya Maximets	4e1e1e189f	conntrack: Fix incorrect bit shift while hashing nat range. 'max_port' is 16bit field, shift expands it to 'int', not unsigned int. lib/conntrack.c:2245:41: runtime error: left shift of 34568 by 16 places cannot be represented in type 'int'. 0 0xec45f4 in nat_range_hash lib/conntrack.c:2245:41 1 0xec45f4 in nat_get_unique_tuple lib/conntrack.c:2422:21 2 0xec45f4 in conn_not_found lib/conntrack.c:1035:32 3 0xeaf0a5 in process_one lib/conntrack.c:1407:20 4 0xea9390 in conntrack_execute lib/conntrack.c:1465:13 5 0x839530 in dp_execute_cb lib/dpif-netdev.c:9060:9 6 0x9909cc in odp_execute_actions lib/odp-execute.c:868:17 7 0x830946 in dp_netdev_execute_actions lib/dpif-netdev.c:9106:5 8 0x830946 in handle_packet_upcall lib/dpif-netdev.c:8294:5 9 0x82ea5e in fast_path_processing lib/dpif-netdev.c:8390:25 10 0x7ed87f in dp_netdev_input__ lib/dpif-netdev.c:8479:9 11 0x7eb5fc in dp_netdev_input lib/dpif-netdev.c:8517:5 12 0x81dada in dp_netdev_process_rxq_port lib/dpif-netdev.c:5329:19 13 0x7f0063 in dpif_netdev_run lib/dpif-netdev.c:6664:25 14 0x85f036 in dpif_run lib/dpif.c:467:16 15 0x61833a in type_run ofproto/ofproto-dpif.c:366:9 16 0x5c210e in ofproto_type_run ofproto/ofproto.c:1822:31 17 0x565db2 in bridge_run__ vswitchd/bridge.c:3245:9 18 0x562f82 in bridge_run vswitchd/bridge.c:3310:5 19 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9 20 0x7f8864c3acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) 21 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d) Fixes: 92edd073ce6c ("conntrack: Hash entire NAT data structure in nat_range_hash().") Acked-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 23:51:31 +02:00
Ilya Maximets	334d43bc03	packets: Fix misaligned write to MPLS lse. MPLS header is only 2 byte aligned, so the value has to be written in parts. Also, even though the 'struct mpls_hdr' has only one field, it's cleaner to not access that field directly. lib/packets.c:432:9: runtime error: store to misaligned address 0x61b000756382 for type 'ovs_be32' (aka 'unsigned int'), which requires 4 byte alignment 0x61b000756382: note: pointer points here 00 00 be be be be be be ff ff ff ff ff ff a6 36 77 20 ... ^ 0 0xbb30ae in add_mpls lib/packets.c:432:17 1 0x9934d2 in odp_execute_actions lib/odp-execute.c:1072:17 2 0x830946 in dp_netdev_execute_actions lib/dpif-netdev.c:9106:5 3 0x830946 in handle_packet_upcall lib/dpif-netdev.c:8294:5 4 0x82ea5e in fast_path_processing lib/dpif-netdev.c:8390:25 5 0x7ed87f in dp_netdev_input__ lib/dpif-netdev.c:8479:9 6 0x7eb5fc in dp_netdev_input lib/dpif-netdev.c:8517:5 7 0x81dada in dp_netdev_process_rxq_port lib/dpif-netdev.c:5329:19 8 0x7f0063 in dpif_netdev_run lib/dpif-netdev.c:6664:25 9 0x85f036 in dpif_run lib/dpif.c:467:16 10 0x61833a in type_run ofproto/ofproto-dpif.c:366:9 11 0x5c210e in ofproto_type_run ofproto/ofproto.c:1822:31 12 0x565db2 in bridge_run__ vswitchd/bridge.c:3245:9 13 0x562f82 in bridge_run vswitchd/bridge.c:3310:5 14 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9 15 0x7ff895c3acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) 16 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d) Fixes: 1917ace89364 ("Encap & Decap actions for MPLS packet type.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 23:51:02 +02:00
Ilya Maximets	a2d202bdee	tc: Fix misaligned access to stats and time values. Pointers to gnet_stats_basic and tcf_t are not correctly aligned, so we need to copy the data before accessing. Found by running check-offloads testsuite with UBsan: lib/tc.c:1791:50: runtime error: member access within misaligned address 0x61900005ce1c for type 'const struct gnet_stats_basic', which requires 8 byte alignment 0x61900005ce1c: note: pointer points here 14 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 ... ^ 0 0xd69044 in nl_parse_single_action lib/tc.c:1791:50 1 0xd69044 in nl_parse_flower_actions lib/tc.c:1841:19 2 0xd57612 in nl_parse_flower_options lib/tc.c:1893:12 3 0xd5468d in parse_netlink_to_tc_flower lib/tc.c:1941:12 4 0xd5a7ec in tc_replace_flower lib/tc.c:3199:19 5 0xd2baea in probe_tc_block_support lib/netdev-offload-tc.c:2226:13 6 0xd2baea in netdev_tc_init_flow_api lib/netdev-offload-tc.c:2279:9 7 0x94d726 in netdev_assign_flow_api lib/netdev-offload.c:183:14 8 0x94d726 in netdev_init_flow_api lib/netdev-offload.c:323:9 9 0x9515c7 in netdev_ports_flow_init lib/netdev-offload.c:775:8 10 0x9515c7 in netdev_set_flow_api_enabled lib/netdev-offload.c:815:13 11 0x562ec8 in bridge_run vswitchd/bridge.c:3292:9 12 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9 13 0x7fb5c583acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) 14 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d) lib/tc.c:1306:43: runtime error: member access within misaligned address 0x619000140324 for type 'const struct tcf_t', which requires 8 byte alignment 0x619000140324: note: pointer points here 24 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ... ^ 0 0xd6ee6f in nl_parse_tcf lib/tc.c:1306:43 1 0xd6423f in nl_parse_act_mirred lib/tc.c:1389:5 2 0xd6423f in nl_parse_single_action lib/tc.c:1747:15 3 0xd6423f in nl_parse_flower_actions lib/tc.c:1843:19 4 0xd57612 in nl_parse_flower_options lib/tc.c:1895:12 5 0xd5468d in parse_netlink_to_tc_flower lib/tc.c:1943:12 6 0xd5a7ec in tc_replace_flower lib/tc.c:3201:19 7 0xd28ae8 in netdev_tc_flow_put lib/netdev-offload-tc.c:1969:11 8 0x94cc97 in netdev_flow_put lib/netdev-offload.c:257:14 9 0xcba2be in parse_flow_put lib/dpif-netlink.c:2289:11 10 0xcba2be in try_send_to_netdev lib/dpif-netlink.c:2376:15 11 0xcba2be in dpif_netlink_operate lib/dpif-netlink.c:2447:23 12 0x8647ce in dpif_operate lib/dpif.c:1372:13 13 0x6b50a6 in push_dp_ops ofproto/ofproto-dpif-upcall.c:2406:5 14 0x6c9bcd in revalidate ofproto/ofproto-dpif-upcall.c:2792:13 15 0x6b79b5 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:980:9 16 0xb4d5ea in ovsthread_wrapper lib/ovs-thread.c:422:12 17 0x7ff1090081ce in start_thread (/lib64/libpthread.so.0+0x81ce) 18 0x7ff107c39dd2 in clone (/lib64/libc.so.6+0x39dd2) This patch fixes only problems reported by UBsan in current tests, there could be more issues like this not currently covered by the testsuite. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 23:50:31 +02:00
Ilya Maximets	499b9d73c3	odp-util: Fix unaligned access to tunnel id. SUMMARY: UndefinedBehaviorSanitizer: lib/odp-util.c:3436:32: runtime error: load of misaligned address 0x624000489424 for type 'const ovs_be64' (aka 'const unsigned long'), which requires 8 byte alignment 0x624000489424: note: pointer points here 0c 00 00 00 ff ff ff ff ff ff ff ff 08 00 01 00 ... ^ 0 0x9b13a2 in format_be64 lib/odp-util.c:3436:32 1 0x9b13a2 in format_odp_tun_attr lib/odp-util.c:3942:13 2 0x9b13a2 in format_odp_key_attr__ lib/odp-util.c:4221:9 3 0x9ae7a2 in odp_flow_format lib/odp-util.c:4606:17 4 0xee5037 in format_dpif_flow lib/dpctl.c:862:5 5 0xed69ed in dpctl_dump_flows lib/dpctl.c:1142:13 6 0xed32b3 in dpctl_unixctl_handler lib/dpctl.c:3035:17 7 0xc7c80b in process_command lib/unixctl.c:310:13 8 0xc7c80b in run_connection lib/unixctl.c:344:17 9 0xc7c80b in unixctl_server_run lib/unixctl.c:395:21 10 0x59a9a4 in main vswitchd/ovs-vswitchd.c:130:9 11 0x7fee2803acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) 12 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d) Tunnel id mask in the flow key is only 4 bytes aligned, so has to be accessed with appropriate unaligned read function. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 23:50:03 +02:00
Ilya Maximets	888193cecc	ofpbuf: Fix offsetting a NULL pointer in ofpbuf_reserve. ofpbuf_reserve() can be called with a zero size for a buffer with an unallocated data. It's a valid case, but we should not allow evaluation of 'NULL + 0'. SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/ofpbuf.c:469:30 in lib/ofpbuf.c:469:30: runtime error: applying zero offset to null pointer 0 0xb2f890 in ofpbuf_reserve lib/ofpbuf.c:469:30 1 0xb2f9bc in ofpbuf_new_with_headroom lib/ofpbuf.c:179:5 2 0xb2f9bc in ofpbuf_clone_data_with_headroom lib/ofpbuf.c:228:24 3 0xb2f9bc in ofpbuf_clone_with_headroom lib/ofpbuf.c:199:18 4 0xb2f8ea in ofpbuf_clone lib/ofpbuf.c:189:12 5 0x6b3c57 in ukey_set_actions ofproto/ofproto-dpif-upcall.c:1712:5 6 0x6c4315 in ukey_create__ ofproto/ofproto-dpif-upcall.c:1738:5 7 0x6beed6 in ukey_create_from_upcall ofproto/ofproto-dpif-upcall.c:1793:12 8 0x6beed6 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1284:24 9 0x6beed6 in process_upcall ofproto/ofproto-dpif-upcall.c:1456:9 10 0x6bafb6 in recv_upcalls ofproto/ofproto-dpif-upcall.c:875:17 11 0x6b70fa in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:792:13 12 0xb4d5fa in ovsthread_wrapper lib/ovs-thread.c:422:12 13 0x7fe6922081ce in start_thread (/lib64/libpthread.so.0+0x81ce) 14 0x7fe690e39dd2 in clone (/lib64/libc.so.6+0x39dd2) Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 23:49:28 +02:00
Ilya Maximets	1dbc3b9f34	drop-stats.at: Fix frequent failures of the recursion too deep test. The test doesn't wait for old flows being revalidated before sending the second packet. The packet hits old flows and doesn't increase the new drop counter as a result. Solution is to wait for revalidators to clean up old flows. This fixes frequent test failures on CirrusCI. Fixes: a13a0209750c ("userspace: Improved packet drop statistics.") Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 20:29:44 +02:00
Eelco Chaudron	d632ad0aa1	odp_util: Fix parse_key_and_mask_to_match() vlan parsing. The parse_key_and_mask_to_match() is a function to translate a netlink formatted key/mask to match structure. And should not consider any configuration setting when translating. In addition we also enforce the encap_eth_type[0] mask as it's required for the VLAN match. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-24 20:24:55 +02:00
Wilson Peng	70f81aa23a	datapath-windows: Update layers for multiple tunnels processing While testing OVS-windows for multiple IPV6 Geneve tunnels on Windows2019VM, for the broadcast/mutlicast packets, it needs to flood out via configured multiple Geneve tunnels. Then in some flow pipeline processing, it may have at least two tunnel processing in OVS_ACTION_ATTR_SET action. When processing the second tunnel setting it may need flush out the packet out via tunnel before setting new tunnel parameter in tunKey. We found in this case, after flushing out the packet out via tunnel, the related layers pointer does not update. In our test setup on Windows2019VM, it will cause BSOD which is triggered in other Windows processes. We suspect it may be related to memory overwriting. When we apply the fix in the patch, no BSOD is observed on the same VM and same packet/tunnel settting. Another thing needs to be mentioned is for multiple Geneve IPv4 tunnels, the same kind broadcase/multicast packet will not cause BSOD. Signed-off-by: Wilson Peng <pweisong@vmware.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>	2022-06-22 01:56:05 +03:00
William Tu	bca4102830	datapath-windows: Fix GRE/VxLAN/STT Tunnel RX. GRE/Vxlan/STT tunnel RX is broken due to incorrecly checking the 'tunKey->dst.si_family != AF_INET', which is actually set later after parsing the GRE header. Removing such chunk makes tunnel works. Fixes: edb2335861d6 ("datapath-windows: Add IPv6 Geneve tunnel support in Windows") Cc: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>	2022-06-22 00:53:11 +03:00
Frode Nordahl	88e3ae5d6f	ofproto-dpif-xlate: Fix internal CT state for non-recirc traffic. In some circumstances a flow may get its ct_state set without conscious intervention by the OVS user space code. Commit 355fef6f2ccbc optimizes out unnecessary ct_clear actions based on an internal struct xlate_ctx->conntracked state flag. Before this commit the xlate_ctx->conntracked state flag would be initialized to 'false' and only set during thawing for recirculation. This patch checks the flow ct_state for the non-recirc case and sets the internal conntracked state appropriately. A system traffic test is also added to avoid regression. Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath actions.") Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-07 15:49:26 +02:00
Aaron Conole	ca44218515	classifier: Adjust segment boundary to execute prerequisite processing. During flow processing, the flow wildcards are checked as a series of stages, and these stages are intended to carry dependencies in a single direction. But when the neighbor discovery processing, for example, was executed there is an incorrect dependency chain - we need fields from stage 4 to determine whether we need fields from stage 3. We can build a set of flow rules to demonstrate this: table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1) table=0,priority=0 actions=NORMAL table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2) table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2) table=1,priority=0 actions=NORMAL table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10🇩🇪ad:be:ef:10 actions=NORMAL table=2,priority=100,tcp actions=NORMAL table=2,priority=100,icmp6 actions=NORMAL table=2,priority=0 actions=NORMAL With this set of flows, any IPv6 packet that executes through this pipeline will have the corresponding nd_sll field flagged as required match for classification even if that field doesn't make sense in such a context (for example, TCP packets). When the corresponding flow is installed into the kernel datapath, this field is not reflected when the revalidator executes the dump stage (see net/openvswitch/flow_netlink.c for more details). During the sweep stage, revalidator will compare the dumped WC with a generated WC - these will mismatch because the generated WC will match on the Neighbor Discovery fields, while the datapath WC will not match on these fields. We will then invalidate the flow and as a side effect force an upcall. By redefining the boundary, we shift these fields to the l4 subtable, and cause masks to be generated matching just the requisite fields. The list of fields being shifted: struct in6_addr nd_target; struct eth_addr arp_sha; struct eth_addr arp_tha; ovs_be16 tcp_flags; ovs_be16 pad2; struct ovs_key_nsh nsh; A standout field would be tcp_flags moving from l3 subtable matches to the l4 subtable matches. This reverts a partial performance optimization in the case of stateless firewalling. The tcp_flags field might have been a good candidate to retain in the l3 segment, but it got overloaded with ICMPv6 ND matching, and therefore we can't preserve this kind of optimization. Two other approaches were considered - moving the nd_target field alone and collapsing the l3/l4 segments into a single subtable for matching. Moving any field individually introduces ABI mismatch, and doesn't completely address the problems with other neighbor discovery related fields (such as nd_sll/nd_tll). Collapsing the two subtables creates an issue with datapath flow explosion, since the l3 and l4 fields will be unwildcarded together (this can be seen with some of the existing classifier tests). A simple test is added to showcase the behavior. Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773 Reported-by: Numan Siddique <nusiddiq@redhat.com> Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Cian Ferriter <cian.ferriter@intel.com> Tested-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-07 15:17:52 +02:00
Han Ding	c0d7d630be	ovs-tcpdump: Fix error when stopping ovs-tcpdump. Sometimes we need to dump packets on more than two interfaces in a bridge at the same time. Then when we stop dumping in order, ovs-tcpdump print traceback and fail to delete mirror interface for some interface. For example: br-int has two interface tap1 and br-int. We use ovs-tcpdump dump tap1 first and dump br-int next. Then stopping tap1 ovs-tcpdump first, and stopping br-int second. When we stop ovs-tcpdump for br-int, the screen show the error like this: __main__.OVSDBException: Unable to delete Mirror m_br-int Signed-off-by: Han Ding <handing@chinatelecom.cn> Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-07 15:08:33 +02:00
wenxu	165f5fbb5e	conntrack: Limit port clash resolution attempts. In case almost or all available ports are taken, clash resolution can take a very long time, resulting in pmd lockup. This can happen when many to-be-natted hosts connect to same destination:port (e.g. a proxy) and all connections pass the same SNAT. Pick a random offset in the acceptable range, then try ever smaller number of adjacent port numbers, until either the limit is reached or a useable port was found. This results in at most 248 attempts (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset) instead of 64000+. Signed-off-by: wenxu <wenxu@chinatelecom.cn> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-07 14:50:36 +02:00
wenxu	c608ace71d	conntrack: Remove the IP iterations in nat_get_unique_l4. Removing the IP iterations, and just picking the IP address with the hash base on the least-used src-ip/dst-ip/proto triple. Signed-off-by: wenxu <wenxu@chinatelecom.cn> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-07 14:50:00 +02:00
Peng He	071b802c61	checkpatch.py: Add checks for easy-to-misuse APIs. Signed-off-by: Peng He <hepeng.0320@bytedance.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:39:56 +02:00
Peng He	805e9340d5	ofproto-dpif: Fix meter use-after-free. Add a rcu_barrier before close_dpif_backer to ensure that all meters have been freed before id_pool_destory meter's id-pool. Signed-off-by: Peng He <hepeng.0320@bytedance.com> Tested-by: David Marchand <david.marchand@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:39:56 +02:00
Peng He	c67941e974	ovs-rcu: Add ovsrcu_barrier. ovsrcu_barrier will block the current thread until all the postponed rcu job has been finished. it's like a OVS version of the Linux kernel rcu_barrier(). Signed-off-by: Peng He <hepeng.0320@bytedance.com> Co-authored-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:34:39 +02:00
Lin Huang	ba462b3589	dpif-netdev: Fix ALB 'rebalance_intvl' max hard limit. Currently the pmd-auto-lb-rebal-interval's value was not been checked properly. It maybe a negative, or too big value (>2 weeks between rebalances), which will be lead to a big unsigned value. So reset it to default if the value exceeds the max permitted as described in vswitchd.xml. Fixes: 5bf84282482a ("Adding support for PMD auto load balancing") Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:28:22 +02:00
Lin Huang	83c0a36472	dpif-netdev: Fix ALB parameters type mismatch. The ALB parameters should never be negative. So it's to use unsigned smap_get versions to get it properly, and update VLOG formatting. Fixes: 5bf84282482a ("Adding support for PMD auto load balancing") Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:28:21 +02:00
Ilya Maximets	31dfea34cd	AUTHORS: Add Michael Phelan. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:23:20 +02:00
Michael Phelan	87ef13b002	dpdk: Use DPDK 21.11.1 release. Modify ci linux build script to use the latest DPDK stable release 21.11.1. Modify Documentation to use the latest DPDK stable release 21.11.1. Update NEWS file to reflect the latest DPDK stable release 21.11.1. FAQ is updated to reflect the latest DPDK for each OVS branch. Signed-off-by: Michael Phelan <michael.phelan@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:21:46 +02:00
Cian Ferriter	cb1c640077	acinclude: Add seperate checks for AVX512 ISA. Checking for each of the required AVX512 ISA separately will allow the compiler to generate some AVX512 code where there is some support in the compiler rather than only generating all AVX512 code when all of it is supported or no AVX512 code at all. For example, in GCC 4.9 where there is just support for AVX512F, this patch will allow building the AVX512 DPIF. Another example, in GCC 5 and 6, most AVX512 code can be generated, just without AVX512VPOPCNTDQ support. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Cian Ferriter	fb85ae4340	automake.mk: Remove -mavx512dq CFLAG from AVX512 library. No instructions from the AVX512DQ ISA are used anywhere in OVS. Remove this unnecessary CFLAG. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Cian Ferriter	34a77ca704	dpif-netdev-extract: Remove unnecessary compiler targets. No instructions from the AVX512VL ISA are used. Compilation for AVX512F and AVX512 BW ISA are already enabled in lib/automake.mk for the dpif-netdev-lookup-avx512-gather.c file because it's part of the libopenvswitchavx512.la library. They don't need to be enabled at a function level. Remove these unnecessary function-level compiler target attributes. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Cian Ferriter	66c85fae3a	dpif-netdev-lookup: Fix GCC 5 warning. GCC 5 gave an incompatible pointer type warning for pkt_blocks when it's passed to _mm512_mask_i64gather_epi64(). Follow the same pattern used for tbl_blocks where the 'const uint64_t ' is cast to a 'const void ' when passed in to avx512_blocks_gather(). Fixes: 47a2a8f4138e ("dpif-netdev/dpcls-avx512: Enable 16 block processing.") Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Cian Ferriter	90cadf170f	dpif-netdev-private-extract: Fix typo VMBI -> VBMI. Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract") Fixes: aa85a25095ae ("dpif-netdev/mfex: Add more AVX512 traffic profiles") Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Dumitru Ceara	2c24daa099	raft: Don't use HMAP_FOR_EACH_SAFE when logging commands. There's no need to do that because we're not changing the hmap. Also, if DEBUG logging is disabled there's no need to iterate at all. Fixes: 5a9b53a51ec9 ("ovsdb raft: Fix duplicated transaction execution when leader failover.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:04:17 +02:00
Kevin Traynor	3ecfaf1361	dpif-netdev: Restructure rxq schedule logging. Previously logging about rxq scheduling was done in a code branch with the selection of the PMD thread core after checking that a numa was selected. By splitting out the logging from the PMD thread core selection, it can simplify the code complexity and make it more extendable for future additions. Also, minor updates to a couple of variables to improve readability and fix a log indent while working on this code block. There is no user visible change in behaviour or logs. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 22:45:51 +02:00
Kevin Traynor	37ccbd9c9d	dpif-netdev: Split function to find lowest loaded PMD thread core. This splits up the looping through each PMD thread core on a numa node with the check to compare cycles or rxqs. This is done so in future the compare could be reused with any group of PMD thread cores. There is no user visible change in behaviour. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 22:45:51 +02:00
Ilya Maximets	04e5adfedd	ovsdb: raft: Fix transaction double commit due to lost leadership. While becoming a follower, the leader aborts all the current 'in-flight' commands, so the higher layers can re-try corresponding transactions when the new leader is elected. However, most of these commands are already sent to followers as append requests, hence they will actually be committed by the majority of the cluster members, i.e. will be treated as committed by the new leader, unless there is an actual network problem between servers. However, the old leader will decline append replies, since it's not the leader anymore and commands are already completed with RAFT_CMD_LOST_LEADERSHIP status. New leader will replicate the commit index back to the old leader. Old leader will re-try the previously "failed" transaction, because "cluster error"s are temporary. If a transaction had some prerequisites that didn't allow double committing or there are other database constraints (like indexes) that will not allow a transaction to be committed twice, the server will reply to the client with a false-negative transaction result. If there are no prerequisites or additional database constraints, the server will execute the same transaction again, but as a follower. E.g. in the OVN case, this may result in creation of duplicated logical switches / routers / load balancers. I.e. resources with the same non-indexed name. That may cause issues later where ovn-nbctl will not be able to add ports to these switches / routers. Suggested solution is to not complete (abort) the commands, but allow them to be completed with the commit index update from a new leader. It the similar behavior to what we do in order to complete commands in a backward scenario when the follower becomes a leader. That scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated transaction execution when leader failover."). Code paths for leader and follower inside the raft_update_commit_index were very similar, so they were refactored into one, since we also needed an ability to complete more than one command for a follower. Failure test added to exercise scenario of a leadership transfer. Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.") Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.") Reported-at: https://bugzilla.redhat.com/2046340 Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-26 11:43:53 +02:00
Dumitru Ceara	336d7dd7cc	dynamic-string: Fix undefined behavior due to offsetting null pointer. When compiled with clang and '-fsanitize=undefined' set, running 'ovsdb-client --timestamp monitor Open_vSwitch' in a sandbox triggers the following undefined behavior (flagged by UBSan): lib/dynamic-string.c:207:38: runtime error: applying zero offset to null pointer #0 0x4ebc18 in ds_put_strftime_msec lib/dynamic-string.c:207:38 #1 0x4ebd04 in xastrftime_msec lib/dynamic-string.c:225:5 #2 0x552e6a in table_format_timestamp__ lib/table.c:226:12 #3 0x552852 in table_print_timestamp__ lib/table.c:233:27 #4 0x5506f3 in table_print_table__ lib/table.c:254:5 #5 0x550633 in table_format lib/table.c:601:9 #6 0x5524f3 in table_print lib/table.c:633:5 #7 0x44dc5e in monitor_print_table ovsdb/ovsdb-client.c:1019:5 #8 0x44c650 in monitor_print ovsdb/ovsdb-client.c:1040:13 #9 0x44ac56 in do_monitor__ ovsdb/ovsdb-client.c:1500:21 #10 0x44636e in do_monitor ovsdb/ovsdb-client.c:1575:5 #11 0x442c41 in main ovsdb/ovsdb-client.c:283:5 Reported-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-26 11:43:53 +02:00
Ilya Maximets	e8f557df33	sha1: Use implementation from openssl if available. Implementation of SHA1 in OpenSSL library is much faster and optimized for all available CPU architectures and instruction sets. OVS should use it instead of internal implementation if possible. Depending on compiler options OpenSSL's version finishes our sha1 unit tests from 3 to 12 times faster. Performance of OpenSSL's version is constant, but OVS's implementation highly depends on compiler. Interestingly, default build with '-g -O2' works faster than optimized '-march=native -Ofast'. Tests with ovsdb-server on big databases shows ~5-10% improvement of the time needed for database compaction (sha1 is only a part of this operation), depending on compiler options. We still need internal implementation, because OpenSSL can be not available on some platforms. Tests enhanced to check both versions of API. Reviewed-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-26 11:43:53 +02:00
Aaron Conole	7b3a4c2e86	Revert "odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP." This reverts commit c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.") Always forcing a slow path action can result in some over-broad flows which swallow all traffic and force them to userspace, as reported in the thread at https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387706.html and at https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387689.html Revert the ODP_FIT_TOO_LITTLE return for IGMP specifically. Additionally, remove the userspace wc mask to prevent revalidator from cycling flows. Fixes: c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.") Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-26 11:43:48 +02:00
wenxu	482abeae57	ofproto-dpif-xlate: Fix netdev native tunnel neigh discovery spa. During native tunnel encapsulation process, on tunnel neighbor cache miss OVS sends an arp/nd request. Currently, tunnel source is used as arp spa. Find the spa which has the same subnet with the nexthop of tunnel dst on egress port, if false, use the tunnel src as spa. For example: tunnel src is a vip with 10.0.0.7/32, tunnel dst is 10.0.1.7 the br-phy with address 192.168.0.7/24 and the default gateway is 192.168.0.1 So the spa of arp request for 192.168.0.1 should be 192.168.0.7 but not 10.0.0.7 Signed-off-by: wenxu <wenxu@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-25 21:11:32 +02:00
wenxu	c5bcbd58d6	ovs-router: Expose the ovs_router_get_netdev_source_address function. Rename get_src_addr to ovs_router_get_netdev_source_address and expose this function to be used in the next commit. Signed-off-by: wenxu <wenxu@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-25 21:11:32 +02:00
lic121	743b53622a	ofproto-dpif: Trigger revalidation if ct tp changes. Once ct_zone timeout policy changes, revalidator is supposed to be triggered. Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables") Signed-off-by: lic121 <lic121@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-25 21:11:32 +02:00
hxie	bb78070fc7	Carefully release NBL in Windows OvsExtSendNBLComplete always release NBLs with flag SINGL_SOURCE, this is an efficient way, which requires all the NBLs having the same source port, when cloned/fragment NBLs are released, the parent NBLs will be released as well, the problem is that a parent NBL may have a different source port from the cloned/fragment NBL, so releasing the parent NBLs with flag SINGLE_SOURCE is not corrct, see: https://github.com/microsoft/hcsshim/issues/1056 When this happens, commands 'Get-NetAdapter' and 'Get-HnsEndpoint' in the Windows node show that one net-adapter/hns-endpoint is in 'disconnected' state, meanwhile, following processes are affected, ecah of them has one thread pending on a lock: vmcompute.exe containerd.exe antrea-agent.exe To fix this issue, we may check SourcePortId in each parent NBLs before released. A simple way to reprodue this issue: 1, Enable encap mode 2, create 2 nodes, nodeA and nodeB 3, create podA with image k8s.gcr.io/e2e-test-images/agnhost:2.21 on nodeA, run 'iperf/iperf.exe -s -p 9000 -D' 4, create podB with same image on nodeB, run command 'iperf/iperf.exe -c <podA-ip> -p 9000' 5, delete podB 6, run 'Get-NetAdapter' on nodeB, you will find the leaked net adapter. Signed-off-by: Hongsheng Xie <hxie@vmware.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>	2022-05-25 21:39:13 +03:00

... 3 4 5 6 7 ...

19352 Commits