The test doesn't wait for old flows being revalidated before sending
the second packet. The packet hits old flows and doesn't increase the
new drop counter as a result.
Solution is to wait for revalidators to clean up old flows. This fixes
frequent test failures on CirrusCI.
Fixes: a13a0209750c ("userspace: Improved packet drop statistics.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In some circumstances a flow may get its ct_state set without
conscious intervention by the OVS user space code.
Commit 355fef6f2ccbc optimizes out unnecessary ct_clear actions
based on an internal struct xlate_ctx->conntracked state flag.
Before this commit the xlate_ctx->conntracked state flag would
be initialized to 'false' and only set during thawing for
recirculation.
This patch checks the flow ct_state for the non-recirc case and
sets the internal conntracked state appropriately. A system
traffic test is also added to avoid regression.
Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath actions.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
During flow processing, the flow wildcards are checked as a series of
stages, and these stages are intended to carry dependencies in a single
direction. But when the neighbor discovery processing, for example, was
executed there is an incorrect dependency chain - we need fields from
stage 4 to determine whether we need fields from stage 3.
We can build a set of flow rules to demonstrate this:
table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
table=0,priority=0 actions=NORMAL
table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
table=1,priority=0 actions=NORMAL
table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10🇩🇪ad:be:ef:10 actions=NORMAL
table=2,priority=100,tcp actions=NORMAL
table=2,priority=100,icmp6 actions=NORMAL
table=2,priority=0 actions=NORMAL
With this set of flows, any IPv6 packet that executes through this pipeline
will have the corresponding nd_sll field flagged as required match for
classification even if that field doesn't make sense in such a context
(for example, TCP packets). When the corresponding flow is installed into
the kernel datapath, this field is not reflected when the revalidator
executes the dump stage (see net/openvswitch/flow_netlink.c for more details).
During the sweep stage, revalidator will compare the dumped WC with a
generated WC - these will mismatch because the generated WC will match on
the Neighbor Discovery fields, while the datapath WC will not match on
these fields. We will then invalidate the flow and as a side effect
force an upcall.
By redefining the boundary, we shift these fields to the l4 subtable, and
cause masks to be generated matching just the requisite fields. The list
of fields being shifted:
struct in6_addr nd_target;
struct eth_addr arp_sha;
struct eth_addr arp_tha;
ovs_be16 tcp_flags;
ovs_be16 pad2;
struct ovs_key_nsh nsh;
A standout field would be tcp_flags moving from l3 subtable matches to
the l4 subtable matches. This reverts a partial performance optimization
in the case of stateless firewalling. The tcp_flags field might have
been a good candidate to retain in the l3 segment, but it got overloaded
with ICMPv6 ND matching, and therefore we can't preserve this kind of
optimization.
Two other approaches were considered - moving the nd_target field alone
and collapsing the l3/l4 segments into a single subtable for matching.
Moving any field individually introduces ABI mismatch, and doesn't
completely address the problems with other neighbor discovery related
fields (such as nd_sll/nd_tll). Collapsing the two subtables creates
an issue with datapath flow explosion, since the l3 and l4 fields will
be unwildcarded together (this can be seen with some of the existing
classifier tests).
A simple test is added to showcase the behavior.
Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsrcu_barrier will block the current thread until all the postponed
rcu job has been finished. it's like a OVS version of the Linux
kernel rcu_barrier().
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently the pmd-auto-lb-rebal-interval's value was not been
checked properly.
It maybe a negative, or too big value (>2 weeks between rebalances),
which will be lead to a big unsigned value. So reset it to default
if the value exceeds the max permitted as described in vswitchd.xml.
Fixes: 5bf84282482a ("Adding support for PMD auto load balancing")
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While becoming a follower, the leader aborts all the current
'in-flight' commands, so the higher layers can re-try corresponding
transactions when the new leader is elected. However, most of these
commands are already sent to followers as append requests, hence they
will actually be committed by the majority of the cluster members,
i.e. will be treated as committed by the new leader, unless there is
an actual network problem between servers. However, the old leader
will decline append replies, since it's not the leader anymore and
commands are already completed with RAFT_CMD_LOST_LEADERSHIP status.
New leader will replicate the commit index back to the old leader.
Old leader will re-try the previously "failed" transaction, because
"cluster error"s are temporary.
If a transaction had some prerequisites that didn't allow double
committing or there are other database constraints (like indexes) that
will not allow a transaction to be committed twice, the server will
reply to the client with a false-negative transaction result.
If there are no prerequisites or additional database constraints,
the server will execute the same transaction again, but as a follower.
E.g. in the OVN case, this may result in creation of duplicated logical
switches / routers / load balancers. I.e. resources with the same
non-indexed name. That may cause issues later where ovn-nbctl will
not be able to add ports to these switches / routers.
Suggested solution is to not complete (abort) the commands, but allow
them to be completed with the commit index update from a new leader.
It the similar behavior to what we do in order to complete commands
in a backward scenario when the follower becomes a leader. That
scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated
transaction execution when leader failover.").
Code paths for leader and follower inside the raft_update_commit_index
were very similar, so they were refactored into one, since we also
needed an ability to complete more than one command for a follower.
Failure test added to exercise scenario of a leadership transfer.
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.")
Reported-at: https://bugzilla.redhat.com/2046340
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Implementation of SHA1 in OpenSSL library is much faster and optimized
for all available CPU architectures and instruction sets. OVS should
use it instead of internal implementation if possible.
Depending on compiler options OpenSSL's version finishes our sha1
unit tests from 3 to 12 times faster. Performance of OpenSSL's
version is constant, but OVS's implementation highly depends on
compiler. Interestingly, default build with '-g -O2' works faster
than optimized '-march=native -Ofast'.
Tests with ovsdb-server on big databases shows ~5-10% improvement of
the time needed for database compaction (sha1 is only a part of this
operation), depending on compiler options.
We still need internal implementation, because OpenSSL can be not
available on some platforms. Tests enhanced to check both versions
of API.
Reviewed-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This reverts commit c645550bb249 ("odp-util: Always report
ODP_FIT_TOO_LITTLE for IGMP.")
Always forcing a slow path action can result in some over-broad
flows which swallow all traffic and force them to userspace, as reported
in the thread at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387706.html
and at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387689.html
Revert the ODP_FIT_TOO_LITTLE return for IGMP specifically.
Additionally, remove the userspace wc mask to prevent revalidator from
cycling flows.
Fixes: c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Modified the dplcs info-get command output to include
the count for different dpcls implementations.
$ovs-appctl dpif-netdev/subtable-lookup-info-get
Available dpcls implementations:
autovalidator (Use count: 1, Priority: 5)
generic (Use count: 0, Priority: 1)
avx512_gather (Use count: 0, Priority: 3)
Test case to verify changes:
1061: PMD - dpcls configuration ok
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The FreeBSD CI builds keep failing because processes of tests are not
properly killed. This leaves the build hanging until it times out, and
ultimately fails.
Changes name of pidfile pid2 to 2.pid so that the
on_exit 'kill `cat *.pid`' will capture all pidfiles.
Fixes pidfile name logic in test that uses OVSDB_SERVER_SHUTDOWN_N, so
that all pidfile names match the form *.pid.
Replaces unnecessary --pidfile="`pwd`"/pid with just --pidfile, because
by default this argument creates a pidfile named <proc-name>.pid.
Removes extra [test ! -e pid || kill `cat pid`] that run upon AT_CHECK
failure, because those processes are killed with on_exit. Also adds
on_exit in tests where it was missing.
Fixes: 561205007e17 ("tests: Get rid of overly specific --pidfile and --unixctl options.")
Fixes: 0be15ad76f0f ("ovsdb-server.at: Add unit test for record/replay.")
Fixes: 7964ffe7d2bf ("ovsdb: relay: Add support for transaction forwarding.")
Fixes: e879d33e8398 ("ovsdb/jsonrpc-server: ovsdb-server closes accepted connections immediately.")
Reported-at: https://github.com/cirruslabs/cirrus-ci-docs/issues/910
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Note: There still is an UB instance when using SSE4.2 as reported here:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/390904.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For example is parsing the OVN "eth.dst[40] = 1;" action, which
triggered the following warning from UndefinedBehaviorSanitizer:
lib/meta-flow.c:3210:9:
runtime error: member access within misaligned address 0x000000de4e36
for type 'const union mf_value', which requires 8 byte alignment
0x000000de4e36: note: pointer points here
00 00 00 00 01 00 00 00 00 00 00 00 00 00 70 4e de 00 00 00 00 00
^
10 51 de 00 00 00 00 00 c0 4f
0 0x5818bc in mf_format lib/meta-flow.c:3210
1 0x5b6047 in format_SET_FIELD lib/ofp-actions.c:3342
2 0x5d68ab in ofpact_format lib/ofp-actions.c:9213
3 0x5d6ee0 in ofpacts_format lib/ofp-actions.c:9237
4 0x410922 in test_parse_actions tests/test-ovn.c:1360
To avoid this we now change the internal representation of the set_field
actions, struct ofpact_set_field, such that the mask is always stored
at a correctly aligned address, multiple of OFPACT_ALIGNTO.
We also need to adapt the "ovs-ofctl show-flows - Oversized flow" test
because now the ofpact representation of the set_field action uses more
bytes in memory (for the extra alignment). Change the test to use
dec_ttl instead.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
UB Sanitizer reports:
tests/test-hash.c:59:40:
runtime error: shift exponent 64 is too large for 64-bit type
'long unsigned int'
0 0x44c3c9 in get_range128 tests/test-hash.c:59
1 0x44cb2e in check_hash_bytes128 tests/test-hash.c:178
2 0x44d14d in test_hash_main tests/test-hash.c:282
[...]
ofproto/ofproto-dpif-xlate.c:5607:45:
runtime error: left shift of 65535 by 16 places cannot be represented
in type 'int'
0 0x53fe9f in xlate_sample_action ofproto/ofproto-dpif-xlate.c:5607
1 0x54d625 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7160
2 0x553b76 in xlate_actions ofproto/ofproto-dpif-xlate.c:7806
3 0x4fcb49 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1237
4 0x4fe02f in process_upcall ofproto/ofproto-dpif-upcall.c:1456
5 0x4fda99 in upcall_cb ofproto/ofproto-dpif-upcall.c:1358
[...]
tests/test-util.c:89:23:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
0 0x476415 in test_ctz tests/test-util.c:89
[...]
lib/dpif-netlink.c:396:33:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
0 0x571b9f in dpif_netlink_open lib/dpif-netlink.c:396
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
nx_to_ofp_flow_update_event() aborts the execution if incorrect
event is passed, so checking has to be done before conversion
in order to avoid the crash while decoding malformed flow update
message:
==397030==ERROR: AddressSanitizer: ABRT on unknown address 0x... )
0 0x7fd26688418b in raise
1 0x7fd266863858 in abort
2 0x6a6cbd in nx_to_ofp_flow_update_event lib/ofp-monitor.c:399:9
3 0x6a6cbd in ofputil_decode_flow_update lib/ofp-monitor.c:856:25
4 0x56491d in ofp_print_flow_monitor_reply lib/ofp-print.c:779:22
5 0x55f0a0 in ofp_to_string__ lib/ofp-print.c:1154:16
6 0x55f0a0 in ofp_to_string lib/ofp-print.c:1244:21
7 0x5603a5 in ofp_print lib/ofp-print.c:1288:28
Credit to OSS-Fuzz.
Additionally removed the extra 'reply' word from the error message,
since ofpraw_get_name(raw) already has one.
Fixes: c3e64047d1cc ("ofp-monitor: Support flow monitoring for OpenFlow 1.3, 1.4+.")
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47112
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Extended OpenFlow monitoring support
* OpenFlow 1.3 with ONF extensions
* OpenFlow 1.4+ as defined in OpenFlow specification 1.4+.
ONF extensions are similar to Nicira extensions except for onf_flow_monitor_request{}
where out_port is defined as 32-bit number OF(1.1) number, oxm match formats are
used in update and request messages.
Flow monitoring support in 1.4+ is slightly different from Nicira and ONF
extensions.
* More flow monitoring flags are defined.
* Monitor add/modify/delete command is introduced in flow_monitor
request message.
* Addition of out_group as part of flow_monitor request message
Description of changes:
1. Generate ofp-msgs.inc to be able to support 1.3, 1.4+ flow Monitoring messages.
include/openvswitch/ofp-msgs.h
2. Modify openflow header files with protocol specific headers.
include/openflow/openflow-1.3.h
include/openflow/openflow-1.4.h
3. Modify OvS abstraction of openflow headers. ofp-monitor.h leverages enums
from on nicira extensions for creating protocol abstraction headers. OF(1.4+)
enums are superset of nicira extensions.
include/openvswitch/ofp-monitor.h
4. Changes to these files reflect encoding and decoding of new protocol messages.
lib/ofp-monitor.c
5. Changes to modules using ofp-monitor APIs. Most of the changes here are to
migrate enums from nicira to OF 1.4+ versions.
ofproto/connmgr.c
ofproto/connmgr.h
ofproto/ofproto-provider.h
ofproto/ofproto.c
6. Extended protocol decoding tests to verify all protocol versions
FLOW_MONITOR_CANCEL
FLOW_MONITOR_PAUSED
FLOW_MONITOR_RESUMED
FLOW_MONITOR request
FLOW_MONITOR reply
tests/ofp-print.at
7. Modify flow monitoring tests to be able executed by all protocol versions.
tests/ofproto.at
7. Modified documentation highlighting the change
utilities/ovs-ofctl.8.in
NEWS
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383915.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently OVS supports flow-monitoring for OpenFlow 1.0 and Nicira Extenstions.
Any other OpenFlow versioned messages are not accepted. This change will allow
OpenFlow1.0-1.2 Flow Monitoring with Nicira extensions be accepted. Also made
sure that flow-monitoring updates, flow monitoring pause messages, resume
messages are sent in the same OpenFlow version as that of flow-monitor request.
Description of changes:
1. Generate ofp-msgs.inc to be able to support 1.0-1.2 Flow Monitoring messages.
include/openvswitch/ofp-msgs.h
2. Support vconn to accept user specified version and use it for vconn
flow-monitoring session
ofproto/ofproto.c
3. Modify APIs to use protocol as an argument to encode and decode messages
include/openvswitch/ofp-monitor.h
lib/ofp-monitor.c
ofproto/connmgr.c
ofproto/connmgr.h
ofproto/ofproto.c
4. Modified following testcases to be verified across supported OF Versions
ofproto - flow monitoring
ofproto - flow monitoring with !own
ofproto - flow monitoring with out_port
ofproto - flow monitoring pause and resume
ofproto - flow monitoring usable protocols
tests/ofproto.at
5. Updated NEWS with the support added with this commit
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-December/050820.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
At a first glance, change tracking should never be allowed for
write-only columns. However, some clients (e.g., ovn-northd) that are
mostly exclusive writers of a database, use change tracking to avoid
duplicating the IDL row records into a local cache when implementing
incremental processing.
The default behavior of the IDL is to automatically turn a write-only
column into a read-write column whenever the client enables change
tracking for that column.
For the afore mentioned clients, this becomes a performance issue.
Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
change a column's value.") explains why writes that don't change a
column's value cannot be optimized out early if the column is
read/write.
Furthermore, if there is at least one record in any table that
changed during a transaction, then *all* records that have been
written are added to the transaction, even if their values didn't
change. If there are many such rows (e.g., like in ovn-northd's
case) this incurs a significant overhead because:
a. the client has to build this large transaction
b. the transaction has to be sent over the network
c. the server needs to parse this (mostly) no-op update
We now introduce new IDL APIs allowing users to set a new monitoring
mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
atomicity constraints may be relaxed and written columns that don't
change value can be skipped from the current transaction.
We benchmarked ovn-northd performance when using this new mode
against NB and SB databases taken from ovn-kubernetes scale tests.
We noticed that when a minor change is performed to the Northbound
database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
build the Southbound transaction becomes negligible (vs ~1.5 seconds
before this change).
End-to-end ovn-kubernetes scale tests on 120-node clusters also show
significant reduction of latency to bring up pods; both average and P99
latency decreased by ~30%.
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When a packet is received over an access port that needs to be sent
over a vxlan tunnel,the access port VLAN id is used in the lookup
leading to a wrong packet being crafted and sent over the tunnel.
Clear out the flow 's VLAN field as it should not be used while
performing mac lookup for the outer tunnel and also at this point
the VLAN action related to inner flow is already committed.
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393566.html
Reported-at: https://bugzilla.redhat.com/2060552
Signed-off-by: Thilak Raj Surendra Babu <thilakraj.sb@nutanix.com>
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Co-authored-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A packet received from a tunnel port with legacy_l3 packet-type (e.g.
lisp, L3 gre, gtpu) is conceptually wrapped in a dummy Ethernet header
for processing in an OF pipeline that is not packet-type-aware. Before
transmission of the packet to another legacy_l3 tunnel port, the dummy
Ethernet header is stripped again.
In ofproto-xlate, wrapping in the dummy Ethernet header is done by
simply changing the packet_type to PT_ETH. The generation of the
push_eth datapath action is deferred until the packet's flow changes
need to be committed, for example at output to a normal port. The
deferred Ethernet encapsulation is marked in the pending_encap flag.
This patch fixes a bug in the translation of the output action to a
legacy_l3 tunnel port, where the packet_type of the flow is reverted
from PT_ETH to PT_IPV4 or PT_IPV6 (depending on the dl_type) to remove
its Ethernet header without clearing the pending_encap flag if it was
set. At the subsequent commit of the flow changes, the unexpected
combination of pending_encap == true with an PT_IPV4 or PT_IPV6
packet_type hit the OVS_NOT_REACHED() abortion clause.
The pending_encap is now cleared in this situation.
Reported-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The test relied on the flows installed by recv_upcalls() after
upcall_receive() returned ENODEV if the packet was initially
originated by packet-out with OFPP_CONTROLLER as in_port.
Since 323ae1e808e6 ("ofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.")
the test stopped working because recirculation in such scenario got
fixed and upcall_receive() no longer returns ENODEV.
Fix it by setting an invalid as "in_port" in order to similarly
trigger the same behavior.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
PMD auto load balance currently only operates when the polling PMD
thread core will not change numa after reassignment.
Add a unit test for this.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ensure that if there are no local numa PMD thread
cores available that pmd cores from all other non-local
numas will be used.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Considering the following flows:
ovs-ofctl dump-flows br0
cookie=0x0, table=0, priority=0 actions=NORMAL
and assuming a packet originated from packet-out in this way:
ovs-ofctl packet-out br0 \
"in_port=controller,packet=<UDP packet>,action=ct(table=0)"
If in_port is OFPP_NONE or OFPP_CONTROLLER, this leads to a
NULL pointer (xport) dereference in xlate_normal().
Fix it by checking the xport pointer validity while deciding whether
it is a candidate for mac learning or not.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for HINDEX_*_SAFE.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Re-write hindex's loops using multi-variable helpers.
For safe loops, use the LONG version to maintain backwards
compatibility.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Re-write cmap's loops using multi-variable helpers.
For iterators based on cmap_cursor, we just need to make sure the NODE
variable is not used after the loop, so we set it to NULL.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
HMAP_FOR_EACH_POP iterator has an additional difficulty, which is the
use of two iterator variables of different types.
In order to re-write this loop in a UB-safe manner, create a iterator
struct to be used as loop variable.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rewrite hmap's loops using multi-variable helpers.
For SAFE loops, use the LONG version of the multi-variable macros to
keep backwards compatibility.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Use multi-variable iteration helpers to rewrite non-safe loops.
There is an important behavior change compared with the previous
implementation: When the loop ends normally (i.e: not via "break;"), the
object pointer provided by the user is NULL. This is safer because it's
not guaranteed that it would end up pointing a valid address.
For pop iterator, set the variable to NULL when the loop ends.
Clang-analyzer has successfully picked the potential null-pointer
dereference on the code that triggered this change (bond.c) and nothing
else has been detected.
For _SAFE loops, use the LONG version for backwards compatibility.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
'conntrack - DNAT load balancing' test fails from time to time
because not all the group buckets are getting hit.
In short, the test creates a group with 3 buckets with the same
weight. It creates 12 TCP sessions and expects that each bucket
will be used at least once. However, there is a solid chance that
this will not happen. The probability of having at least one
empty bucket is:
C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N
Where N is the number of distinct TCP sessions. For N=12, the
probability is about 0.023, i.e. there is a 2.3% chance for a
test to fail, which is not great for CI.
Increasing the number of sessions to 50 to reduce the probability
of failure down to 4.7e-9. In my testing, the configuration with
50 TCP sessions didn't fail after 6000 runs. Should be good
enough for CI systems.
Fixes: 2c66ebe47a88 ("ofp-actions: Allow conntrack action in group buckets.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Michael Phelan <michael.phelan@intel.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OVS_WAIT_UNTIL() macro has only 2 arguments and doesn't check the
output of the command, but bonding and route tests are trying to use
it as if it was AT_CHECK macro. That makes checks in those tests
mostly useless, since they are not actually checking anything except
for command returning zero.
Introducing a new macro OVS_WAIT_UNTIL_EQUAL that will actually
perform the comparison with the desired output. Using it for
the bonding and route tests and fixing all the caught incorrect
expected outputs along the way.
Adding an explicit argument check to the OVS_WAIT_UNTIL/WHILE to
avoid the problem in the future.
Fixes: b4e50218a0f8 ("bond: Add 'primary' interface concept for active-backup mode.")
Fixes: 9e11517e6ca6 ("ovs-router: Fix flushing of local routes.")
Acked-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When the same flow is programmed in the kernel and tc, they
look different due to the way they are translated. They take
the userspace approach by always including the packet type
attribute. To make the outputs the same, show the ethernet
header when the packet type is wildcarded, and not printed.
So without the fix the kernel would show (ovs-appctl dpctl/dump-flows):
in_port(3),eth(),eth_type(0x0800),ipv4(frag=no), ..., actions:output
Where as TC would show:
in_port(3),eth_type(0x0800),ipv4(frag=no), ..., actions:output
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
AVX512 DPIF must be active in order for the MFEX AutoValidator to be executed.
If the DPIF-AVX512 is not available, the unit test is skipped, as the
scalar DPIF does not use the MFEX function-pointer based optimizations.
Fixes: 50be6715c083 ("test/sytem-dpdk: Add unit test for mfex autovalidator")
Suggested-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since 4a6a4734622e ("netlink-socket: Log extack error messages in
netlink transactions."), tests fail on older systems that don't support
NETLINK_EXT_ACK. It's not really an issue, so we can just ignore the
log.
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Depending on compiler flags and CPU architecture different hash
function are used. That impacts the order of tables and columns
in database representation making ovsdb report different columns
in the warning about ephemeral-to-persistent conversion.
Stripping out changing parts of the message to avoid the issue.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It seems Linux native tunnel configuration changed to enable
checksum by default and that causes the check-system-tso unit
test below to fail:
10: datapath - ping over vxlan tunnel FAILED (system-traffic.at:248)
That happens because userspace TSO doesn't support encapsulation
as mentioned in the current documentation. In this specific case,
udp_extract_tnl_md() checks if the checksum is correct, but since
TSO is enabled, the outer UDP header contains only the pseudo
checksum and not the full packet checksum.
Although the packet is marked correctly with UDP csum offload flag
and the code could use that to verify the pseudo csum, more work
is needed to properly translate the offloading flags from the outer
headers to the inner headers. For example, if the payload is a
TCP packet, most probably the flag DP_PACKET_OL_TX_UDP_CKSUM doesn't
make sense after decapsulating that.
This patch skips the tunnel tests when the userspace TSO is enabled.
Fixes: 29bb3093eb8b ("userspace: Enable TSO support for non-DPDK.")
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Even though relays can be scaled to the big number of servers to
handle a lot more clients, lack of transaction history may cause
significant load if clients are re-connecting.
E.g. in case of the upgrade of a large-scale OVN deployment, relays
can be taken down one by one forcing all the clients of one relay to
jump to other ones. And all these clients will download the database
from scratch from a new relay.
Since relay itself supports monitor_cond_since connection to the
main cluster, it receives the last transaction id along with each
update. Since these transaction ids are 'eid's of actual transactions,
they can be used by relay for a transaction history.
Relay may not receive all the transaction ids, because the main cluster
may combine several changes into a single monitor update. However,
all relays will, likely, receive same updates with the same transaction
ids, so the case where transaction id can not be found after
re-connection between relays should not be very common. If some id
is missing on the relay (i.e. this update was merged with some other
update and newer id was used) the client will just re-download the
database as if there was a normal transaction history miss.
OVSDB client synchronization module updated to provide the last
transaction id along with the update. Relay module updated to use
these ids as a transaction id. If ids are zero, relay decides that
the main server doesn't support transaction ids and disables the
transaction history accordingly.
Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(),
so transactions are added to the history. This can be done, because
relays has no file storage, so there is no need to write anything.
Relay tests modified to test both standalone and clustered database
as a main server. Checks added to ensure that all servers receive the
same transaction ids in monitor updates.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If the joining server re-connects while catching up (e.g. if it crashed
or connection got closed due to inactivity), the data we sent might be
lost, so the server will never reply to append request or a snapshot
installation request. At the same time, leader will decline all the
subsequent requests to join from that server with the 'in progress'
resolution. At this point the new server will never be able to join
the cluster, because it will never receive the raft log while leader
thinks that it was already sent.
This happened in practice when one of the servers got preempted for a
few seconds, so the leader closed connection due to inactivity.
Destroying the joining server if disconnection detected. This will
allow to start the joining from scratch when the server re-connects
and sends the new join request.
We can't track re-connection in the raft_conn_run(), because it's
incoming connection and the jsonrpc will not keep it alive or
try to reconnect. Next time the server re-connects it will be an
entirely new raft conn.
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Reported-at: https://bugzilla.redhat.com/2033514
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
The purpose of reconnect_deadline__() function is twofold:
1. Its result is used to tell if the state has to be changed right now
in reconnect_run().
2. Its result also used to determine when the process need to wake up
and call reconnect_run() for a next time, i.e. when the state may
need to be changed next time.
Since introduction of the 'receive-attempted' feature, the function
returns LLONG_MAX if the deadline is in the future. That works for
the first case, but doesn't for the second one, because we don't
really know when we need to call reconnect_run().
This is the problem for applications where jsonrpc connection is the
only source of wake ups, e.g. ovn-northd. When the network goes down
silently, e.g. server looses IP address due to DHCP failure, ovn-northd
will sleep in the poll loop indefinitely after being told that it
doesn't need to call reconnect_run() (deadline == LLONG_MAX).
Fixing that by actually returning the expected time if it is in the
future, so we will know when to wake up. In order to keep the
'receive-attempted' feature, returning 'now + 1' in case where the
time has already passed, but receive wasn't attempted. That will
trigger a fast wake up, so the application will be able to attempt the
receive even if there was no real events. In a correctly written
application we should not fall into this case more than once in a row.
'+ 1' ensures that we will not transition into a different state
prematurely, i.e. before the receive is actually attempted.
Fixes: 4241d652e465 ("jsonrpc: Avoid disconnecting prematurely due to long poll intervals.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some sporadic false positive may be visible for the following tests:
- conntrack - IPv6 HTTP
- conntrack - FTP over IPv6
The failures show up randomly.
The reason appears to be source address used when performing the
request using wget:
-tcp,orig=(src=fc00::1,dst=fc00::2,sport=<cleared>,dport=<cleared>),\
reply=(src=fc00::2,dst=fc00::1,sport=<cleared>,dport=<cleared>),\
protoinfo=(state=<cleared>)
+tcp,orig=(src=fe80::f0eb:f8ff:fef0:138f,dst=fc00::2,sport=<cleared>,dport=<cleared>),\
reply=(src=fc00::2,dst=fe80::f0eb:f8ff:fef0:138f,sport=<cleared>,dport=<cleared>),\
protoinfo=(state=<cleared>)
It seems that the problem can be addressed in multiple ways, but using
"nodad" seems to be safe enough to fix the issue that now, after
hundreds of attempts, is no longer present.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
turn a bunch of test ranges from 10.1.1.240-10.1.1.255 to
10.1.1.240-10.1.1.254. 10.1.1.255 is the broadcast address for
10.1.1.0/24 and can be picked to SNAT packets causing the subsequent
failure of the test.
Fixes: 9ac0aadab9f9 ("conntrack: Add support for NAT.")
Fixes: e32cd4c6292e ("conntrack: ignore port for ICMP/ICMPv6 NAT.")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some specific warning are seen on various systems
which may not be visible on others but good to add
such logs to test to avoid test-case failure.
Thw warning only effects the fuzzy tests due to
more than 1000+ flows being offloading simultanously.
Wilcarding flow count number as for different systems
under test the number could vary in the warning log.
Suggested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
We are currently requiring in_port to be a valid port number for ipfix
sampling even if the sampling is done on the output port (egress).
This restriction is not really needed and can affect pipelines that
intentionally set the in_port to OFPP_NONE during flow processing. For
instance, OVN does this, see:
cfa547821 Fix ovn-controller generated packets from getting dropped for
reject ACL action.
This patch skips ipfix sampling only if both (ingress and egress) ports
are invalid.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Added PT_MPLS_MC support in function xlate_generic_decap_action to fix
packet drops when decap action is applied on packets with packet_type
(ns=1,type=0x8848).
Fixes: 1917ace89364 ("Encap & Decap actions for MPLS packet type.")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When reconnecting, if there are condition changes already sent to the
server but not yet acked, reset the db's 'last-id', esentially clearing
the local cache after reconnect.
This is needed because the client cannot easily differentiate between
the following cases:
a. either the server already processed the requested monitor
condition change but the FSM was restarted before the
client was notified. In this case the client should
clear its local cache because it's out of sync with the
monitor view on the server side.
b. OR the server hasn't processed the requested monitor
condition change yet.
Conditions changing at the same time with a reconnection happening are
rare so the performance impact of this patch should be minimal.
Also, the tests are updated to cover the fact that we cannot control
which of the two scenarios ("a" and "b" above) are hit during the test.
Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since we're testing serialization, it also makes sense to test
the opposite operation. Should be useful in the future for
exploring possible optimizations.
CMD: $ ./tests/ovstest json-string-benchmark
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>