As described in the bugzilla below, cpu_has_isa code may be compiled
with some AVX512 instructions in it, because cpu.c is built as part of
the libopenvswitchavx512.
This is a problem when this function (supposed to probe for AVX512
instructions availability) is invoked from generic OVS code, on older
CPUs that don't support them.
For the same reason, dpcls_subtable_avx512_gather_probe,
dp_netdev_input_outer_avx512_probe, mfex_avx512_probe and
mfex_avx512_vbmi_probe are potential runtime bombs and can't either be
built as part of libopenvswitchavx512.
Move cpu.c to be part of the "normal" libopenvswitch.
And move other helpers in generic OVS code.
Note:
- dpcls_subtable_avx512_gather_probe is split in two, because it also
needs to do its own magic,
- while moving those helpers, prefer direct calls to cpu_has_isa and
avoid cast to intermediate integer variables when a simple boolean
is enough,
Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.")
Fixes: abb807e27dd4 ("dpif-netdev: Add command to switch dpif implementation.")
Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract")
Fixes: b366fa2f4947 ("dpif-netdev: Call cpuid for x86 isa availability.")
Reported-at: https://bugzilla.redhat.com/2100393
Reported-by: Ales Musil <amusil@redhat.com>
Co-authored-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a new command, 'ovsdb-server/tlog-set DB:TABLE on|off', which
allows the user to enable/disable transaction logging for specific
databases and tables.
By default, logging is disabled. Once enabled, logs are generated
with level INFO and are also rate limited.
If used with care, this command can be useful in analyzing production
deployment performance issues, allowing the user to pin point
bottlenecks without the need to enable wider debug logs, e.g., jsonrpc.
A command to inspect the logging state is also added:
'ovsdb-server/tlog-list'.
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Clients might be connected to multiple databases (e.g., ovn-controller
is connected to OVN_Southbound and Open_vSwitch databases) and the IDL
memory statistics are more useful if they're not aggregated.
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
__builtin_constant_p is only available in GCC and only versions >= 4.
Use the same "#if __GNUC__ >= 4" check used in other parts of OVS for
this builtin.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Now ovs-tcpdump will check for an OVS_RUNDIR environment variable and
if present, use it instead of the default RUNDIR. This is useful when
used in conjunction with OVS_PAUSE_TEST while running the test suite.
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds the old name "subtable-lookup-prio-get" as an unlisted command,
to restore a consistency between OVS releases for testing scripts.
Fixes: 738c76a503f4 ("dpcls: Change info-get function to fetch dpcls usage stats.")
Suggested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Found by Coverity.
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When processing netlink messages, we should ignore unknown OVS_KEY_ATTR
as we can assume if newer attributes are present, they are backward
compatible.
This patch also updates the existing comments to better explain what
happens in the error cases. At this place in the code, we can not
ignore the TOO_LITTLE/MUCH errors as some instances could have
canceled processing leaving the returning match data incomplete.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2089331
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If lldp didn't change, we are not supposed to trigger backer
revalidation.
Without this patch, bridge_reconfigure() always trigger udpif
revalidator because of lldp.
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
lldp_create() malloc memory for lldp->lldpd->g_hardware. lldp_unref
is supposed to free the memory regardless of hw->h_flags.
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ipfix cfg creation/deleting triggers revalidation. But this does
not cover the case where ipfix options changes, which also suppose
to trigger revalidation.
Fixes: a9f5ee1199e1 ("ofproto-dpif: Trigger revalidation when ipfix config set.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
'max_port' is 16bit field, shift expands it to 'int', not unsigned int.
lib/conntrack.c:2245:41: runtime error:
left shift of 34568 by 16 places cannot be represented in type 'int'.
0 0xec45f4 in nat_range_hash lib/conntrack.c:2245:41
1 0xec45f4 in nat_get_unique_tuple lib/conntrack.c:2422:21
2 0xec45f4 in conn_not_found lib/conntrack.c:1035:32
3 0xeaf0a5 in process_one lib/conntrack.c:1407:20
4 0xea9390 in conntrack_execute lib/conntrack.c:1465:13
5 0x839530 in dp_execute_cb lib/dpif-netdev.c:9060:9
6 0x9909cc in odp_execute_actions lib/odp-execute.c:868:17
7 0x830946 in dp_netdev_execute_actions lib/dpif-netdev.c:9106:5
8 0x830946 in handle_packet_upcall lib/dpif-netdev.c:8294:5
9 0x82ea5e in fast_path_processing lib/dpif-netdev.c:8390:25
10 0x7ed87f in dp_netdev_input__ lib/dpif-netdev.c:8479:9
11 0x7eb5fc in dp_netdev_input lib/dpif-netdev.c:8517:5
12 0x81dada in dp_netdev_process_rxq_port lib/dpif-netdev.c:5329:19
13 0x7f0063 in dpif_netdev_run lib/dpif-netdev.c:6664:25
14 0x85f036 in dpif_run lib/dpif.c:467:16
15 0x61833a in type_run ofproto/ofproto-dpif.c:366:9
16 0x5c210e in ofproto_type_run ofproto/ofproto.c:1822:31
17 0x565db2 in bridge_run__ vswitchd/bridge.c:3245:9
18 0x562f82 in bridge_run vswitchd/bridge.c:3310:5
19 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9
20 0x7f8864c3acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2)
21 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d)
Fixes: 92edd073ce6c ("conntrack: Hash entire NAT data structure in nat_range_hash().")
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
MPLS header is only 2 byte aligned, so the value has to be written
in parts. Also, even though the 'struct mpls_hdr' has only one
field, it's cleaner to not access that field directly.
lib/packets.c:432:9: runtime error:
store to misaligned address 0x61b000756382 for type 'ovs_be32'
(aka 'unsigned int'), which requires 4 byte alignment
0x61b000756382: note: pointer points here
00 00 be be be be be be ff ff ff ff ff ff a6 36 77 20 ...
^
0 0xbb30ae in add_mpls lib/packets.c:432:17
1 0x9934d2 in odp_execute_actions lib/odp-execute.c:1072:17
2 0x830946 in dp_netdev_execute_actions lib/dpif-netdev.c:9106:5
3 0x830946 in handle_packet_upcall lib/dpif-netdev.c:8294:5
4 0x82ea5e in fast_path_processing lib/dpif-netdev.c:8390:25
5 0x7ed87f in dp_netdev_input__ lib/dpif-netdev.c:8479:9
6 0x7eb5fc in dp_netdev_input lib/dpif-netdev.c:8517:5
7 0x81dada in dp_netdev_process_rxq_port lib/dpif-netdev.c:5329:19
8 0x7f0063 in dpif_netdev_run lib/dpif-netdev.c:6664:25
9 0x85f036 in dpif_run lib/dpif.c:467:16
10 0x61833a in type_run ofproto/ofproto-dpif.c:366:9
11 0x5c210e in ofproto_type_run ofproto/ofproto.c:1822:31
12 0x565db2 in bridge_run__ vswitchd/bridge.c:3245:9
13 0x562f82 in bridge_run vswitchd/bridge.c:3310:5
14 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9
15 0x7ff895c3acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2)
16 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d)
Fixes: 1917ace89364 ("Encap & Decap actions for MPLS packet type.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Pointers to gnet_stats_basic and tcf_t are not correctly aligned,
so we need to copy the data before accessing. Found by running
check-offloads testsuite with UBsan:
lib/tc.c:1791:50: runtime error:
member access within misaligned address 0x61900005ce1c for type
'const struct gnet_stats_basic', which requires 8 byte alignment
0x61900005ce1c: note: pointer points here
14 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
^
0 0xd69044 in nl_parse_single_action lib/tc.c:1791:50
1 0xd69044 in nl_parse_flower_actions lib/tc.c:1841:19
2 0xd57612 in nl_parse_flower_options lib/tc.c:1893:12
3 0xd5468d in parse_netlink_to_tc_flower lib/tc.c:1941:12
4 0xd5a7ec in tc_replace_flower lib/tc.c:3199:19
5 0xd2baea in probe_tc_block_support lib/netdev-offload-tc.c:2226:13
6 0xd2baea in netdev_tc_init_flow_api lib/netdev-offload-tc.c:2279:9
7 0x94d726 in netdev_assign_flow_api lib/netdev-offload.c:183:14
8 0x94d726 in netdev_init_flow_api lib/netdev-offload.c:323:9
9 0x9515c7 in netdev_ports_flow_init lib/netdev-offload.c:775:8
10 0x9515c7 in netdev_set_flow_api_enabled lib/netdev-offload.c:815:13
11 0x562ec8 in bridge_run vswitchd/bridge.c:3292:9
12 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9
13 0x7fb5c583acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2)
14 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d)
lib/tc.c:1306:43: runtime error:
member access within misaligned address 0x619000140324 for type
'const struct tcf_t', which requires 8 byte alignment
0x619000140324: note: pointer points here
24 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
^
0 0xd6ee6f in nl_parse_tcf lib/tc.c:1306:43
1 0xd6423f in nl_parse_act_mirred lib/tc.c:1389:5
2 0xd6423f in nl_parse_single_action lib/tc.c:1747:15
3 0xd6423f in nl_parse_flower_actions lib/tc.c:1843:19
4 0xd57612 in nl_parse_flower_options lib/tc.c:1895:12
5 0xd5468d in parse_netlink_to_tc_flower lib/tc.c:1943:12
6 0xd5a7ec in tc_replace_flower lib/tc.c:3201:19
7 0xd28ae8 in netdev_tc_flow_put lib/netdev-offload-tc.c:1969:11
8 0x94cc97 in netdev_flow_put lib/netdev-offload.c:257:14
9 0xcba2be in parse_flow_put lib/dpif-netlink.c:2289:11
10 0xcba2be in try_send_to_netdev lib/dpif-netlink.c:2376:15
11 0xcba2be in dpif_netlink_operate lib/dpif-netlink.c:2447:23
12 0x8647ce in dpif_operate lib/dpif.c:1372:13
13 0x6b50a6 in push_dp_ops ofproto/ofproto-dpif-upcall.c:2406:5
14 0x6c9bcd in revalidate ofproto/ofproto-dpif-upcall.c:2792:13
15 0x6b79b5 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:980:9
16 0xb4d5ea in ovsthread_wrapper lib/ovs-thread.c:422:12
17 0x7ff1090081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
18 0x7ff107c39dd2 in clone (/lib64/libc.so.6+0x39dd2)
This patch fixes only problems reported by UBsan in current tests,
there could be more issues like this not currently covered by the
testsuite.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
SUMMARY: UndefinedBehaviorSanitizer:
lib/odp-util.c:3436:32: runtime error:
load of misaligned address 0x624000489424 for type 'const ovs_be64'
(aka 'const unsigned long'), which requires 8 byte alignment 0x624000489424:
note: pointer points here
0c 00 00 00 ff ff ff ff ff ff ff ff 08 00 01 00 ...
^
0 0x9b13a2 in format_be64 lib/odp-util.c:3436:32
1 0x9b13a2 in format_odp_tun_attr lib/odp-util.c:3942:13
2 0x9b13a2 in format_odp_key_attr__ lib/odp-util.c:4221:9
3 0x9ae7a2 in odp_flow_format lib/odp-util.c:4606:17
4 0xee5037 in format_dpif_flow lib/dpctl.c:862:5
5 0xed69ed in dpctl_dump_flows lib/dpctl.c:1142:13
6 0xed32b3 in dpctl_unixctl_handler lib/dpctl.c:3035:17
7 0xc7c80b in process_command lib/unixctl.c:310:13
8 0xc7c80b in run_connection lib/unixctl.c:344:17
9 0xc7c80b in unixctl_server_run lib/unixctl.c:395:21
10 0x59a9a4 in main vswitchd/ovs-vswitchd.c:130:9
11 0x7fee2803acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2)
12 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d)
Tunnel id mask in the flow key is only 4 bytes aligned, so has to be
accessed with appropriate unaligned read function.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ofpbuf_reserve() can be called with a zero size for a buffer with
an unallocated data. It's a valid case, but we should not allow
evaluation of 'NULL + 0'.
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/ofpbuf.c:469:30 in
lib/ofpbuf.c:469:30: runtime error: applying zero offset to null pointer
0 0xb2f890 in ofpbuf_reserve lib/ofpbuf.c:469:30
1 0xb2f9bc in ofpbuf_new_with_headroom lib/ofpbuf.c:179:5
2 0xb2f9bc in ofpbuf_clone_data_with_headroom lib/ofpbuf.c:228:24
3 0xb2f9bc in ofpbuf_clone_with_headroom lib/ofpbuf.c:199:18
4 0xb2f8ea in ofpbuf_clone lib/ofpbuf.c:189:12
5 0x6b3c57 in ukey_set_actions ofproto/ofproto-dpif-upcall.c:1712:5
6 0x6c4315 in ukey_create__ ofproto/ofproto-dpif-upcall.c:1738:5
7 0x6beed6 in ukey_create_from_upcall ofproto/ofproto-dpif-upcall.c:1793:12
8 0x6beed6 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1284:24
9 0x6beed6 in process_upcall ofproto/ofproto-dpif-upcall.c:1456:9
10 0x6bafb6 in recv_upcalls ofproto/ofproto-dpif-upcall.c:875:17
11 0x6b70fa in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:792:13
12 0xb4d5fa in ovsthread_wrapper lib/ovs-thread.c:422:12
13 0x7fe6922081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
14 0x7fe690e39dd2 in clone (/lib64/libc.so.6+0x39dd2)
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The test doesn't wait for old flows being revalidated before sending
the second packet. The packet hits old flows and doesn't increase the
new drop counter as a result.
Solution is to wait for revalidators to clean up old flows. This fixes
frequent test failures on CirrusCI.
Fixes: a13a0209750c ("userspace: Improved packet drop statistics.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The parse_key_and_mask_to_match() is a function to translate
a netlink formatted key/mask to match structure. And should
not consider any configuration setting when translating.
In addition we also enforce the encap_eth_type[0] mask as
it's required for the VLAN match.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While testing OVS-windows for multiple IPV6 Geneve tunnels on Windows2019VM,
for the broadcast/mutlicast packets, it needs to flood out via configured
multiple Geneve tunnels. Then in some flow pipeline processing, it may have
at least two tunnel processing in OVS_ACTION_ATTR_SET action. When processing
the second tunnel setting it may need flush out the packet out via tunnel before
setting new tunnel parameter in tunKey. We found in this case, after flushing out
the packet out via tunnel, the related layers pointer does not update. In our
test setup on Windows2019VM, it will cause BSOD which is triggered in other Windows
processes. We suspect it may be related to memory overwriting. When we apply the
fix in the patch, no BSOD is observed on the same VM and same packet/tunnel settting.
Another thing needs to be mentioned is for multiple Geneve IPv4 tunnels, the same
kind broadcase/multicast packet will not cause BSOD.
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
GRE/Vxlan/STT tunnel RX is broken due to incorrecly checking the
'tunKey->dst.si_family != AF_INET', which is actually
set later after parsing the GRE header. Removing such
chunk makes tunnel works.
Fixes: edb2335861d6 ("datapath-windows: Add IPv6 Geneve tunnel support in Windows")
Cc: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
In some circumstances a flow may get its ct_state set without
conscious intervention by the OVS user space code.
Commit 355fef6f2ccbc optimizes out unnecessary ct_clear actions
based on an internal struct xlate_ctx->conntracked state flag.
Before this commit the xlate_ctx->conntracked state flag would
be initialized to 'false' and only set during thawing for
recirculation.
This patch checks the flow ct_state for the non-recirc case and
sets the internal conntracked state appropriately. A system
traffic test is also added to avoid regression.
Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath actions.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
During flow processing, the flow wildcards are checked as a series of
stages, and these stages are intended to carry dependencies in a single
direction. But when the neighbor discovery processing, for example, was
executed there is an incorrect dependency chain - we need fields from
stage 4 to determine whether we need fields from stage 3.
We can build a set of flow rules to demonstrate this:
table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
table=0,priority=0 actions=NORMAL
table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
table=1,priority=0 actions=NORMAL
table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10🇩🇪ad:be:ef:10 actions=NORMAL
table=2,priority=100,tcp actions=NORMAL
table=2,priority=100,icmp6 actions=NORMAL
table=2,priority=0 actions=NORMAL
With this set of flows, any IPv6 packet that executes through this pipeline
will have the corresponding nd_sll field flagged as required match for
classification even if that field doesn't make sense in such a context
(for example, TCP packets). When the corresponding flow is installed into
the kernel datapath, this field is not reflected when the revalidator
executes the dump stage (see net/openvswitch/flow_netlink.c for more details).
During the sweep stage, revalidator will compare the dumped WC with a
generated WC - these will mismatch because the generated WC will match on
the Neighbor Discovery fields, while the datapath WC will not match on
these fields. We will then invalidate the flow and as a side effect
force an upcall.
By redefining the boundary, we shift these fields to the l4 subtable, and
cause masks to be generated matching just the requisite fields. The list
of fields being shifted:
struct in6_addr nd_target;
struct eth_addr arp_sha;
struct eth_addr arp_tha;
ovs_be16 tcp_flags;
ovs_be16 pad2;
struct ovs_key_nsh nsh;
A standout field would be tcp_flags moving from l3 subtable matches to
the l4 subtable matches. This reverts a partial performance optimization
in the case of stateless firewalling. The tcp_flags field might have
been a good candidate to retain in the l3 segment, but it got overloaded
with ICMPv6 ND matching, and therefore we can't preserve this kind of
optimization.
Two other approaches were considered - moving the nd_target field alone
and collapsing the l3/l4 segments into a single subtable for matching.
Moving any field individually introduces ABI mismatch, and doesn't
completely address the problems with other neighbor discovery related
fields (such as nd_sll/nd_tll). Collapsing the two subtables creates
an issue with datapath flow explosion, since the l3 and l4 fields will
be unwildcarded together (this can be seen with some of the existing
classifier tests).
A simple test is added to showcase the behavior.
Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Sometimes we need to dump packets on more than two interfaces in a bridge
at the same time. Then when we stop dumping in order, ovs-tcpdump print
traceback and fail to delete mirror interface for some interface.
For example:
br-int has two interface tap1 and br-int. We use ovs-tcpdump dump tap1 first
and dump br-int next. Then stopping tap1 ovs-tcpdump first, and stopping
br-int second. When we stop ovs-tcpdump for br-int, the screen show the error
like this:
__main__.OVSDBException: Unable to delete Mirror m_br-int
Signed-off-by: Han Ding <handing@chinatelecom.cn>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In case almost or all available ports are taken, clash resolution can
take a very long time, resulting in pmd lockup.
This can happen when many to-be-natted hosts connect to same
destination:port (e.g. a proxy) and all connections pass the same SNAT.
Pick a random offset in the acceptable range, then try ever smaller
number of adjacent port numbers, until either the limit is reached or a
useable port was found. This results in at most 248 attempts
(128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
instead of 64000+.
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Removing the IP iterations, and just picking the IP address
with the hash base on the least-used src-ip/dst-ip/proto triple.
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a rcu_barrier before close_dpif_backer to ensure that
all meters have been freed before id_pool_destory meter's
id-pool.
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsrcu_barrier will block the current thread until all the postponed
rcu job has been finished. it's like a OVS version of the Linux
kernel rcu_barrier().
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently the pmd-auto-lb-rebal-interval's value was not been
checked properly.
It maybe a negative, or too big value (>2 weeks between rebalances),
which will be lead to a big unsigned value. So reset it to default
if the value exceeds the max permitted as described in vswitchd.xml.
Fixes: 5bf84282482a ("Adding support for PMD auto load balancing")
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The ALB parameters should never be negative.
So it's to use unsigned smap_get versions to get it properly, and
update VLOG formatting.
Fixes: 5bf84282482a ("Adding support for PMD auto load balancing")
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Modify ci linux build script to use the latest DPDK stable release 21.11.1.
Modify Documentation to use the latest DPDK stable release 21.11.1.
Update NEWS file to reflect the latest DPDK stable release 21.11.1.
FAQ is updated to reflect the latest DPDK for each OVS branch.
Signed-off-by: Michael Phelan <michael.phelan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Checking for each of the required AVX512 ISA separately will allow the
compiler to generate some AVX512 code where there is some support in the
compiler rather than only generating all AVX512 code when all of it is
supported or no AVX512 code at all.
For example, in GCC 4.9 where there is just support for AVX512F, this
patch will allow building the AVX512 DPIF.
Another example, in GCC 5 and 6, most AVX512 code can be generated, just
without AVX512VPOPCNTDQ support.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
No instructions from the AVX512DQ ISA are used anywhere in OVS. Remove
this unnecessary CFLAG.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
No instructions from the AVX512VL ISA are used.
Compilation for AVX512F and AVX512 BW ISA are already enabled in
lib/automake.mk for the dpif-netdev-lookup-avx512-gather.c file because
it's part of the libopenvswitchavx512.la library. They don't need to be
enabled at a function level.
Remove these unnecessary function-level compiler target attributes.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GCC 5 gave an incompatible pointer type warning for pkt_blocks when it's
passed to _mm512_mask_i64gather_epi64().
Follow the same pattern used for tbl_blocks where the 'const uint64_t *'
is cast to a 'const void *' when passed in to avx512_blocks_gather().
Fixes: 47a2a8f4138e ("dpif-netdev/dpcls-avx512: Enable 16 block processing.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There's no need to do that because we're not changing the hmap.
Also, if DEBUG logging is disabled there's no need to iterate at
all.
Fixes: 5a9b53a51ec9 ("ovsdb raft: Fix duplicated transaction execution when leader failover.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously logging about rxq scheduling was done in a code branch with
the selection of the PMD thread core after checking that a numa was
selected.
By splitting out the logging from the PMD thread core selection, it can
simplify the code complexity and make it more extendable for future
additions.
Also, minor updates to a couple of variables to improve readability and
fix a log indent while working on this code block.
There is no user visible change in behaviour or logs.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This splits up the looping through each PMD thread core on a numa node
with the check to compare cycles or rxqs.
This is done so in future the compare could be reused with any group
of PMD thread cores.
There is no user visible change in behaviour.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While becoming a follower, the leader aborts all the current
'in-flight' commands, so the higher layers can re-try corresponding
transactions when the new leader is elected. However, most of these
commands are already sent to followers as append requests, hence they
will actually be committed by the majority of the cluster members,
i.e. will be treated as committed by the new leader, unless there is
an actual network problem between servers. However, the old leader
will decline append replies, since it's not the leader anymore and
commands are already completed with RAFT_CMD_LOST_LEADERSHIP status.
New leader will replicate the commit index back to the old leader.
Old leader will re-try the previously "failed" transaction, because
"cluster error"s are temporary.
If a transaction had some prerequisites that didn't allow double
committing or there are other database constraints (like indexes) that
will not allow a transaction to be committed twice, the server will
reply to the client with a false-negative transaction result.
If there are no prerequisites or additional database constraints,
the server will execute the same transaction again, but as a follower.
E.g. in the OVN case, this may result in creation of duplicated logical
switches / routers / load balancers. I.e. resources with the same
non-indexed name. That may cause issues later where ovn-nbctl will
not be able to add ports to these switches / routers.
Suggested solution is to not complete (abort) the commands, but allow
them to be completed with the commit index update from a new leader.
It the similar behavior to what we do in order to complete commands
in a backward scenario when the follower becomes a leader. That
scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated
transaction execution when leader failover.").
Code paths for leader and follower inside the raft_update_commit_index
were very similar, so they were refactored into one, since we also
needed an ability to complete more than one command for a follower.
Failure test added to exercise scenario of a leadership transfer.
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.")
Reported-at: https://bugzilla.redhat.com/2046340
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When compiled with clang and '-fsanitize=undefined' set, running
'ovsdb-client --timestamp monitor Open_vSwitch' in a sandbox triggers
the following undefined behavior (flagged by UBSan):
lib/dynamic-string.c:207:38: runtime error: applying zero offset to null pointer
#0 0x4ebc18 in ds_put_strftime_msec lib/dynamic-string.c:207:38
#1 0x4ebd04 in xastrftime_msec lib/dynamic-string.c:225:5
#2 0x552e6a in table_format_timestamp__ lib/table.c:226:12
#3 0x552852 in table_print_timestamp__ lib/table.c:233:27
#4 0x5506f3 in table_print_table__ lib/table.c:254:5
#5 0x550633 in table_format lib/table.c:601:9
#6 0x5524f3 in table_print lib/table.c:633:5
#7 0x44dc5e in monitor_print_table ovsdb/ovsdb-client.c:1019:5
#8 0x44c650 in monitor_print ovsdb/ovsdb-client.c:1040:13
#9 0x44ac56 in do_monitor__ ovsdb/ovsdb-client.c:1500:21
#10 0x44636e in do_monitor ovsdb/ovsdb-client.c:1575:5
#11 0x442c41 in main ovsdb/ovsdb-client.c:283:5
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Implementation of SHA1 in OpenSSL library is much faster and optimized
for all available CPU architectures and instruction sets. OVS should
use it instead of internal implementation if possible.
Depending on compiler options OpenSSL's version finishes our sha1
unit tests from 3 to 12 times faster. Performance of OpenSSL's
version is constant, but OVS's implementation highly depends on
compiler. Interestingly, default build with '-g -O2' works faster
than optimized '-march=native -Ofast'.
Tests with ovsdb-server on big databases shows ~5-10% improvement of
the time needed for database compaction (sha1 is only a part of this
operation), depending on compiler options.
We still need internal implementation, because OpenSSL can be not
available on some platforms. Tests enhanced to check both versions
of API.
Reviewed-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This reverts commit c645550bb249 ("odp-util: Always report
ODP_FIT_TOO_LITTLE for IGMP.")
Always forcing a slow path action can result in some over-broad
flows which swallow all traffic and force them to userspace, as reported
in the thread at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387706.html
and at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387689.html
Revert the ODP_FIT_TOO_LITTLE return for IGMP specifically.
Additionally, remove the userspace wc mask to prevent revalidator from
cycling flows.
Fixes: c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
During native tunnel encapsulation process, on tunnel neighbor cache
miss OVS sends an arp/nd request. Currently, tunnel source is used
as arp spa.
Find the spa which has the same subnet with the nexthop of tunnel dst
on egress port, if false, use the tunnel src as spa.
For example:
tunnel src is a vip with 10.0.0.7/32, tunnel dst is 10.0.1.7
the br-phy with address 192.168.0.7/24 and the default gateway is 192.168.0.1
So the spa of arp request for 192.168.0.1 should be 192.168.0.7 but not 10.0.0.7
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rename get_src_addr to ovs_router_get_netdev_source_address and expose
this function to be used in the next commit.
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OvsExtSendNBLComplete always release NBLs with flag SINGL_SOURCE, this is an
efficient way, which requires all the NBLs having the same source port, when
cloned/fragment NBLs are released, the parent NBLs will be released as well,
the problem is that a parent NBL may have a different source port from the
cloned/fragment NBL, so releasing the parent NBLs with flag SINGLE_SOURCE
is not corrct, see:
https://github.com/microsoft/hcsshim/issues/1056
When this happens, commands 'Get-NetAdapter' and 'Get-HnsEndpoint' in the
Windows node show that one net-adapter/hns-endpoint is in 'disconnected'
state, meanwhile, following processes are affected, ecah of them has one
thread pending on a lock:
vmcompute.exe
containerd.exe
antrea-agent.exe
To fix this issue, we may check SourcePortId in each parent NBLs before
released.
A simple way to reprodue this issue:
1, Enable encap mode
2, create 2 nodes, nodeA and nodeB
3, create podA with image k8s.gcr.io/e2e-test-images/agnhost:2.21 on
nodeA, run 'iperf/iperf.exe -s -p 9000 -D'
4, create podB with same image on nodeB, run command
'iperf/iperf.exe -c <podA-ip> -p 9000'
5, delete podB
6, run 'Get-NetAdapter' on nodeB, you will find the leaked net adapter.
Signed-off-by: Hongsheng Xie <hxie@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>