Recently OVS adopted a policy of using the inclusive naming word list v1
[1, 2]. And in keeping with this policy it is intended to rename the
primary development branch from master to main [3].
In order to help facilitate this change allow Appveyor to run
on the main as well as master branch. It is intended that
master branch will be removed from appveyor.yml after the primary branch
has been renamed.
Also, update the string included in artifacts from 'master' to 'main'.
[1] df5e5cf431 ("Documentation: Add section on inclusive language.")
[2] https://inclusivenaming.org/word-lists/
[3] https://mail.openvswitch.org/pipermail/ovs-dev/2024-March/412686.html
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Consider the following sequence of events:
1. Cluster with 2 nodes - A and B. A is a leader.
2. C connects to A and sends a join request.
3. A sends an append request to C. C is in CATCHUP phase for A.
4. A looses leadership to B. Sends join failure notification to C.
5. C sends append reply to A.
6. A discards append reply (not leader).
7. B looses leadership back to A.
8. C sends a new join request to A.
9. A replies with failure (already in progress).
10. GoTo step 8.
At this point A is waiting for an append reply that it already
discarded at step 6 and fails all the new attempts of C to join with
'already in progress' verdict. C stays forever in a joining state
and in a CATCHUP phase from A's perspective.
This is a similar case to a sudden disconnect from a leader fixed in
commit 999ba294fb ("ovsdb: raft: Fix inability to join the cluster
after interrupted attempt."), but since we're not disconnecting, the
servers are not getting destroyed.
Fix that by destroying all the servers that are not yet part of the
configuration after leadership is lost. This way, server C will be
able to simply re-start the joining process from scratch.
New failure test command is added in order to simulate leadership
change before we receive the append reply, so it gets discarded.
New cluster test is added to exercise this scenario.
Fixes: 1b1d2e6daa ("ovsdb: Introduce experimental support for clustered databases.")
Reported-at: https://github.com/ovn-org/ovn/issues/235
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some of the failure tests can make a single-node cluster to
loose leadership. In this case the next raft_run() will
trigger election with a pre-vote enabled. This is causing
an assertion when this server attempts to vote for itself.
Fix that by not using pre-voting if there is only one server.
A new failure test introduced in later commit triggers this
assertion every time.
Fixes: 85634fd580 ("ovsdb: raft: Support pre-vote mechanism to deal with disruptive server.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Consider the following chain of events:
1. Have a cluster with 2 members - A and B. A is a leader.
2. C connects to A, sends a request to join the cluster.
3. A catches up C, creates an update for the 'servers' list and sends
it to B and C to apply. This entry is not committed yet.
4. Before B or C can reply, A looses leadership for some reason.
5. A sends a joining failure message to C, C remains in joining state.
5. Both B and C have the new version of 'servers', so they recognize
each other as valid cluster members.
6. B initiates a vote, C (or A) replies and B becomes a new leader.
7. B has a new list of servers. B commits it. C becomes a committed
cluster member.
8. A and C receive heartbeats with a new commit index and C is now
a committed cluster member for all A, B and C.
However, at the end of this process, C is still in joining state
as it never received a successful reply for a join request, and C is
still in a COMMITTING phase for A. So, C skips some parts of the RAFT
life cycle and A will refuse to transfer leadership to C if something
happens in the future.
More interestingly, B can actually transfer leadership to C and vote
for it. A will vote for it just fine as well. After that, C becomes
a new cluster leader while still in joining state. In this state C
will not commit any changes. So, we have seemingly stable cluster
that doesn't commit any changes! E.g.:
s3
Address: unix:s3.raft
Status: joining cluster
Remotes for joining: unix:s3.raft unix:s2.raft unix:s1.raft
Role: leader
Term: 4
Leader: self
Vote: self
Last Election started 30095 ms ago, reason: leadership_transfer
Last Election won: 30093 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 2
Entries not yet applied: 6
Connections: ->s1 ->s2 <-s1 <-s2
Disconnections: 0
Servers:
s3 (60cf at unix:s3.raft) (self) next_index=7 match_index=6
s2 (46aa at unix:s2.raft) next_index=7 match_index=6 last msg 58 ms ago
s1 (28f7 at unix:s1.raft) next_index=7 match_index=6 last msg 59 ms ago
Fix the first scenario by examining server changes in committed log
entries. This way server A can transition C to a STABLE phase and
server C can find itself in the committed list of servers and move out
from a joining state. This is similar to completing commands without
receiving an explicit reply or after the role change from leader to
follower.
The second scenario with a leader in a joining state can be fixed
when the joining server becomes leader. New leader's log is getting
committed automatically and all servers transition into STABLE phase
for it, but it should also move on from a joining state, since it
leads the cluster now.
It is also possible that B transfers leadership to C before the list
of servers is marked as committed on other servers. In this case
C will commit it's own addition to the cluster configuration.
The added test usually triggers both scenarios, but it will trigger at
least one of them.
Fixes: 1b1d2e6daa ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While joining, ovsdb-server may not wake up for a duration of a join
timer, which is 1 second and is by default 3x larger than a heartbeat
timer. This is causing unnecessary warnings from the cooperative
multitasking module that thinks that we missed the heartbeat time by
a lot.
Use join timer (1000) instead while joining.
Fixes: d4a15647b9 ("ovsdb: raft: Enable cooperative multitasking.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Current implementation of the leadership transfer just shoots the
leadership in the general direction of the first stable server in the
configuration. It doesn't check if the server was active recently or
even that the connection is established. This may result in sending
leadership to a disconnected or otherwise unavailable server.
Such behavior should not cause log truncation or any other correctness
issues because the destination server would have all the append
requests queued up or the connection will be dropped by the leader.
In a worst case we will have a leader-less cluster until the next
election timer fires up. Other servers will notice the absence of the
leader and will trigger a new leader election normally.
However, the potential wait for the election timer is not good as
real-world setups may have high values configured.
Fix that by trying to transfer to servers that we know have applied
the most changes, i.e., have the highest 'match_index'. Such servers
replied to the most recent append requests, so they have highest
chances to be healthy. Choosing the random starting point in the
list of such servers so we don't transfer to the same server every
single time. This slightly improves load distribution, but, most
importantly, increases robustness of our test suite, making it cover
more cases. Also checking that the message was actually sent without
immediate failure.
If we fail to transfer to any server with the highest index, try to
just transfer to any other server that is not behind majority and then
just any other server that is connected. We did actually send them
all the updates (if the connection is open), they just didn't reply
yet for one reason or another. It should be better than leaving the
cluster without a leader.
Note that there is always a chance that transfer will fail, since
we're not waiting for it to be acknowledged (and must not wait).
In this case, normal election will be triggered after the election
timer fires up.
Fixes: 1b1d2e6daa ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Open vSwitch supports the ability to invoke a controller action by way
of a sample action with a specified meter. In the normal case, this
sample action is transparently generated during xlate processing. However,
when executing via a continuation, the logic to generate the sample
action when finishing the context freeze was missing. The result is that
the behavior when action is 'controller(pause,meter_id=1)' does not match
the behavior when action is 'controller(meter_id=1)'.
OVN and other controller solutions may rely on this metering to protect
the control path, so it is critical to preserve metering, whether we are
doing a plain old send to controller, or a continuation.
Fixes: 77ab5fd2a9 ("Implement serializing the state of packet traversal in "continuations".")
Reported-at: https://issues.redhat.com/browse/FDP-455
Tested-by: Alex Musil <amusil@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Recirculation involves re-parsing the packet from scratch and that
process is not aware of multiple header levels nor the inner/outer
offsets. So, it overwrites offsets with new ones from the outermost
headers and sets offloading flags that change their meaning when
the packet is marked for tunnel offloading.
For example:
1. TCP packet enters OVS.
2. TCP packet gets encapsulated into UDP tunnel.
3. Recirculation happens.
4. Packet is re-parsed after recirculation with miniflow_extract()
or similar function.
5. Packet is marked for UDP checksumming because we parse the
outermost set of headers. But since it is tunneled, it means
inner UDP checksumming. And that makes no sense, because the
inner packet is TCP.
This is causing packet drops due to malformed packets or even
assertions and crashes in the code that is trying to fixup checksums
for packets using incorrect metadata:
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
lib/packets.c:2061:15: runtime error:
member access within null pointer of type 'struct udp_header'
0 0xbe5221 in packet_udp_complete_csum lib/packets.c:2061:15
1 0x7e5662 in dp_packet_ol_send_prepare lib/dp-packet.c:638:9
2 0x96ef89 in netdev_send lib/netdev.c:940:9
3 0x818e94 in dp_netdev_pmd_flush_output_on_port lib/dpif-netdev.c:5577:9
4 0x817606 in dp_netdev_pmd_flush_output_packets lib/dpif-netdev.c:5618:27
5 0x81cfa5 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5677:9
6 0x7eefe4 in dpif_netdev_run lib/dpif-netdev.c:7001:25
7 0x610e87 in type_run ofproto/ofproto-dpif.c:367:9
8 0x5b9e80 in ofproto_type_run ofproto/ofproto.c:1879:31
9 0x55bbb4 in bridge_run__ vswitchd/bridge.c:3281:9
10 0x558b6b in bridge_run vswitchd/bridge.c:3346:5
11 0x591dc5 in main vswitchd/ovs-vswitchd.c:130:9
12 0x172b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89)
13 0x172c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a)
14 0x47eff4 in _start (vswitchd/ovs-vswitchd+0x47eff4)
Tests added for both IPv4 and IPv6 cases. Though IPv6 test doesn't
trigger the issue it's better to have a symmetric test.
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053014.html
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Fixing the issue of incorrect outer UDP checksum in packets sent by
E810 or X710. We disable RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM,but also
disable all the dependent offloads like
RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO and
RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO.
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Signed-off-by: Jun Wang <junwang01@cestc.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In v4.0, LibreSwan changed a default paths that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto detecting the version
of LibreSwan and then choosing the correct path.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding <qding@redhat.com>
Fixes: d6afbc00d5 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ovs-vswitchd is subscribed to all the routing changes in the
kernel. On each change, it marks the internal routing table cache as
invalid, then resets it and dumps all the routes from the kernel from
scratch. The reason for that is kernel routing updates not being
reliable in a sense that it's hard to tell which route is getting
removed or modified. Userspace application has to track the order in
which route entries are dumped from the kernel. Updates can get lost
or even duplicated and the kernel doesn't provide a good mechanism to
distinguish one route from another. To my knowledge, dumping all the
routes from a kernel after each change is the only way to keep the
cache consistent. Some more info can be found in the following never
addressed issues:
https://bugzilla.redhat.com/1337860https://bugzilla.redhat.com/1337855
It seems to be believed that NetworkManager "mostly" does incremental
updates right. But it is still not completely correct, will re-dump
the whole table in certain cases, and it takes a huge amount of very
complicated code to do the accounting and route comparisons.
Going back to ovs-vswitchd, it currently dumps routes from all the
routing tables. If it will get conflicting routes from multiple
tables, the cache will not be useful. The routing cache in userspace
is primarily used for checking the egress port for tunneled traffic
and this way also detecting link state changes for a tunnel port.
For userspace datapath it is used for actual routing of the packet
after sending to a native tunnel.
With kernel datapath we don't really have a mechanism to know which
routing table will actually be used by the kernel after encapsulation,
so our lookups on a cache may be incorrect because of this as well.
So, unless all the relevant routes are in the standard tables, the
lookup in userspace route cache is unreliable.
Luckily, most setups are not using any complicated routing in
non-standard tables that OVS has to be aware of.
It is possible, but unlikely, that standard routing tables are
completely empty while some other custom table is not, and all the OVS
tunnel traffic is directed to that table. That would be the only
scenario where dumping non-standard tables would make sense. But it
seems like this kind of setup will likely need a way to tell OVS from
which table the routes should be taken, or we'll need to dump routing
rules and keep a separate cache for each table, so we can first match
on rules and then lookup correct routes in a specific table. I'm not
sure if trying to implement all that is justified.
For now, stop considering routes from non-standard tables to avoid
mixing different tables together and also wasting CPU resources.
This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
running on a same host and in a same network namespace with OVS using
its own custom routing table.
Unfortunately, there seems to be no way to tell the kernel to send
updates only for particular tables. So, we'll still receive and parse
all of them. But they will not result in a full cache invalidation in
most cases.
Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
So, we can make use of it and dump only standard tables when we get a
relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on
the socket for filtering to work. There is no reason to not enable it
by default, if supported. It is not used outside of NETLINK_ROUTE.
Fixes: f0e167f0db ("route-table: Handle route updates more robustly.")
Fixes: ea83a2fcd0 ("lib: Show tunnel egress interface in ovsdb")
Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
After running ovs-tcpdump and inputs multiple CTRL+C, the program will
raise the following exception.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
ovsdb = OVSDB(db_sock)
File "/usr/bin/ovs-tcpdump", line 168, in __init__
OVSDB.wait_for_db_change(self._idl_conn) # Initial Sync with DB
File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
while idl.change_seqno == seq and not idl.run():
The default handler of SIGINT is default_int_handler, so it was not
registered to the signal handler. When received CTRL+C again, the program
was broken, and calling hook could not be executed completely.
Signed-off-by: Daniel Ding <danieldin186@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reproduce:
ovs-vsctl add-port br-int p0 \
-- set interface p0 type=vxlan options:remote_ip=10.10.10.1
sleep 2
ovs-vsctl --if-exists del-port p0 \
-- add-port br-int p1 \
-- set interface p1 type=vxlan options:remote_ip=10.10.10.1
ovs-vsctl: Error detected while setting up 'p1': could not add
network device p1 to ofproto (File exists).
vswitchd log:
bridge|INFO|bridge br-int: added interface p0 on port 1106
bridge|INFO|bridge br-int: deleted interface p0 on port 1106
tunnel|WARN|p1: attempting to add tunnel port with same config as
port 'p0' (::->10.10.10.1, key=0, legacy_l2, dp port=122)
ofproto|WARN|br-int: could not add port p1 (File exists)
bridge|WARN|could not add network device p1 to ofproto (File exists)
CallTrace:
bridge_reconfigure
bridge_del_ports
port_destroy
iface_destroy__
netdev_remove <------ netdev p0 removed
bridge_delete_or_reconfigure_ports
OFPROTO_PORT_FOR_EACH
ofproto_port_dump_next
port_dump_next
port_query_by_name <------ netdev_shash do not contain p0
ofproto_port_del <------ p0 do not del in ofproto
bridge_add_ports
bridge_add_ports__
iface_create
iface_do_create
ofproto_port_add <------ p1 add failed
Fixes: fe83f81df9 ("netdev: Remove netdev from global shash when the user is changing interface configuration.")
Acked-by: Han Zhou <hzhou@ovn.org>
Tested-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Tao Liu <taoliu828@163.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The new_buffer data pointer is NULL when the size of the cloned
buffer is 0. This is fine as there is no need to allocate space.
However, the cloned buffer header/msg might be the same pointer
as data. This causes undefined behavior by adding 0 to NULL pointer.
Check if the data buffer is not NULL before attempting to apply the
header/msg offset.
This was caught by OVN system test:
lib/ofpbuf.c:203:56: runtime error: applying zero offset to null pointer
0 0xa012fc in ofpbuf_clone_with_headroom /ovs/lib/ofpbuf.c:203:56
1 0x635fd4 in put_remote_port_redirect_overlay /controller/physical.c:397:40
2 0x635fd4 in consider_port_binding /controller/physical.c:1951:9
3 0x62e046 in physical_run /controller/physical.c:2447:9
4 0x601d98 in en_pflow_output_run /controller/ovn-controller.c:4690:5
5 0x707769 in engine_recompute /lib/inc-proc-eng.c:415:5
6 0x7060eb in engine_compute /lib/inc-proc-eng.c:454:17
7 0x7060eb in engine_run_node /lib/inc-proc-eng.c:503:14
8 0x7060eb in engine_run /lib/inc-proc-eng.c:528:9
9 0x5f9f26 in main /controller/ovn-controller.c
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
All supported versions of Fedora do package libbpf, so it
makes sense to enable USDT support.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
During normal operations, it is useful to understand when a particular flow
gets removed from the system. This can be useful when debugging performance
issues tied to ofproto flow changes, trying to determine deployed traffic
patterns, or while debugging dynamic systems where ports come and go.
Prior to this change, there was a lack of visibility around flow expiration.
The existing debugging infrastructure could tell us when a flow was added to
the datapath, but not when it was removed or why.
This change introduces a USDT probe at the point where the revalidator
determines that the flow should be removed. Additionally, we track the
reason for the flow eviction and provide that information as well. With
this change, we can track the complete flow lifecycle for the netlink
datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
the revalidator USDT, letting us watch as flows are added and removed from
the kernel datapath.
This change only enables this information via USDT probe, so it won't be
possible to access this information any other way (see:
Documentation/topics/usdt-probes.rst).
Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
which serves as a demonstration of how the new USDT probe might be used
going forward.
Co-authored-by: Aaron Conole <aconole@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Kevin Sprague <ksprague0711@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tunnel types are not flags, but 4-bit fields, so checking them with
a simple binary 'and' is incorrect and may produce false-positive
matches.
While the current implementation is unlikely to cause any issues today,
since both RTE_MBUF_F_TX_TUNNEL_VXLAN and RTE_MBUF_F_TX_TUNNEL_GENEVE
only have 1 bit set, it is risky to have this code and it may lead
to problems if we add support for other tunnel types in the future.
Use proper field checks instead. Also adding a warning for unexpected
tunnel types in case something goes wrong.
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
RTE_MBUF_F_TX_TCP_CKSUM is not a flag, but a 2-bit field, so checking
it with a simple binary 'and' is incorrect. For example, this check
will succeed for a packet with UDP checksum requested as well.
Fix the check to avoid wrongly initializing tso_segz and potentially
accessing UDP header via TCP structure pointer.
The IPv4 checksum flag has to be set for any L4 checksum request,
regardless of the type, so moving this check out of the TCP condition.
Fixes: 8b5fe2dc60 ("userspace: Add Generic Segmentation Offloading.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In some cases only outer offloads may be requested for a tunneled
packet. In this case there is no need to mark the type of an
inner packet. Clean these flags up to avoid potential confusion
of DPDK drivers.
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some drivers (primarily, Intel ones) do not expect any marking flags
being set if no offloads are requested. If these flags are present,
driver will fail Tx preparation or behave abnormally.
For example, ixgbe driver will refuse to process the packet with
only RTE_MBUF_F_TX_TUNNEL_GENEVE and RTE_MBUF_F_TX_OUTER_IPV4 set.
This pretty much breaks Geneve tunnels on these cards.
An extra check is added to make sure we don't have any unexpected
Tx offload flags set.
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Starting with image version 20240310.1.0, GitHub runners are using
32-bit entropy for ASLR:
$ sudo sysctl -a | grep vm.mmap.rnd
vm.mmap_rnd_bits = 32
vm.mmap_rnd_compat_bits = 16
This breaks all the asan-enabled builds, because older asan gets
confused by memory mappings and crashes with segmentation fault.
The issue is fixed in newer releases of llvm:
fb77ca05ffhttps://reviews.llvm.org/D148280
But these are not available in Ubuntu 22.04 image.
This should be fixed by GitHub, but until new images are available
reducing ASLR entropy manually to 28 bits to make builds work.
Reported-at: https://github.com/actions/runner-images/issues/9491
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously when an empty mutation was used to count the number of rows
in a table, OVSDB would iterate over all rows twice. First to perform an
RBAC check, and then to perform the no-operation.
This change adds a short circuit to mutate operations with no conditions
and an empty mutation set, returning immediately. One notable change in
functionality is not performing the RBAC check in this condition, as no
mutation actually takes place.
Reported-by: Terry Wilson <twilson@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-359
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A log message like this one:
2024-01-09T06:45:17.201Z|00071|bfd(handler2)|INFO|ovn-0af536-0: BFD state
change: down->up "Neighbor Signaled Session Down"->"Neighbor Signaled Session
Down".
can be hard to read since '->' usually represents a status change, but
in this case the diagnostic code stays constant. Update the log message to
avoid such ambiguity. The log message for the above event become:
2024-01-09T06:45:16.211Z|00026|bfd(handler3)|INFO|ovn-0af536-0: BFD state
change: (bfd.SessionState: down, bfd.LocalDiag: "Neighbor Signaled Session
Down") -> (bfd.SessionState: up, bfd.LocalDiag: "Neighbor Signaled Session
Down")
Reported-by: Alex Stupnikov <astupnik@redhat.com>
Reported-at: https://bugzilla.redhat.com/2258496
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In OpenSSL 3.2.0 (81b741f) all the "alert" error messages were updated to
replace "sslv3" with "ssl/tls".
This commit updates the "SSL db: implementation" test to support both the
pre-openssl 3.2.0 error message: "sslv3 alert certificate unknown" and the
post-openssl 3.2.0 error message: "ssl/tls alert certificate unknown".
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
On netdev datapath, when a ct element was cleaned, the cmap
could be shrinked, potentially causing some elements to be skipped
in the flush iteration.
Fixes: 967bb5c5cd ("conntrack: Add rcu support.")
Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Simon Horman <horms@ovn.org>
Recently OVS adopted a policy of using the inclusive naming word list v1
[1, 2].
This patch addresses the use of the term master in the context of a
hostname used in documentation of the kernel_delay utility. It does so
by using localhost as the hostname instead.
[1] df5e5cf ("Documentation: Add section on inclusive language.")
[2] https://inclusivenaming.org/word-lists/
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Recently OVS adopted a policy of using the inclusive naming word list v1
[1, 2].
This patch partially addresses the use of the term master in the
context of LAG devices by using the term primary instead: the local
variables master_netdev and master_name are renamed as primary_netdev
and primary_name.
Related comments are also updated.
No functional change intended.
[1] df5e5cf ("Documentation: Add section on inclusive language.")
[2] https://inclusivenaming.org/word-lists/
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Recently OVS adopted a policy of using the inclusive naming word list v1
[1, 2].
This patch partially addresses the use of the term master in the
context of LAG devices by using the term primary instead: the
is_lag_master field of struct netdev_linux is renamed is_lag_primary.
A related comment is also updated.
No functional change intended.
[1] df5e5cf ("Documentation: Add section on inclusive language.")
[2] https://inclusivenaming.org/word-lists/
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Recently OVS adopted a policy of using the inclusive naming word list v1
[1, 2].
This patch addresses the use of the term master repository by
using the term main repository instead.
This is as distinct from addressing the use of a master branch,
which remains as a follow-up task.
[1] df5e5cf ("Documentation: Add section on inclusive language.")
[2] https://inclusivenaming.org/word-lists/
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Since the patch-set that included [1] there has been a policy of using
the term member for bonds, LACP, and bundle contexts. This is
consistent with the more recently adopted policy of using the inclusive
naming word list v1 [2, 3].
This patch addresses two instances where the term member should be used
in vswitch.xml. It does not address instances of alternative wording
that require code updates, which can addressed as follow-up activity.
[1] 91fc374a9c ("Eliminate use of term "slave" in bond, LACP, and bundle contexts.")
[2] df5e5cf431 ("Documentation: Add section on inclusive language.")
[3] https://inclusivenaming.org/word-lists/
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
OpenSSL 1.0.2u is long deprecated and not available for download.
So, our CI never actually downloads it and uses whatever is in the
OpenSSL-Win64 folder provided by AppVeyor. Luckily, it happens to
be OpenSSL 1.0.2u today.
The oldest supported version of OpenSSL upstream today is 3.0.
And it is an LTS version. 3.1 and 3.2 are not LTS.
Use OpenSSL 3.0 for testing instead.
This commit does a few things to achieve that:
1. Removes the folder provided by AppVeyor. This way we will fail
the build if something goes wrong instead of silently using
OpenSSL version provided by AppVeyor.
2. Obtains the JSON description of available releases and downloads
the latest minor version of OpenSSL 3.0 64-bit. With this approach
we should not need to update the download link that frequently.
New minor releases will be picked up automatically. They should
not have any breaking changes, so should be fine to use in CI.
OpenSSL 3.0 is supported until at least Sep 2026.
The JSON file is an official file referenced on the:
https://slproweb.com/products/Win32OpenSSL.html
So, it should be safe to use.
3. Executes the downloaded installer with 'Start-Process -Wait' to
properly wait for installation to finish instead of just sleeping
for 30 seconds.
4. Caches the downloaded installer, so we're not downloading 300 MB
on each CI run as that is not nice to do. We know the hash of the
latest version, so we will re-download only when the binary changes,
i.e. on a new minor release.
For the cache to work we need to introduce the 'install' phase,
because caches are populated after 'init', but before 'install'.
Alternatively, we could have just renamed 'init' to 'install',
but I think it's a little nicer to have separate phases, and we
can also move 'windows-prepare.sh' to the install phase.
Cache is also invalidated whenever appveyor.yml changes.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OpenSSL 1.1.0 changed the library names from libeay32 and ssleay32 to
standard libssl and libcrypto. All the versions of OpenSSL that used
old names reached their official EoL, so it should be safe to just
migrate to new names. They can still be supported via premium support
option, but I don't think that is important for us.
Also, OpenSSL installers for older versions had the following folder
structure:
C:\OPENSSL-WIN64\
+---bin
+---include
| +---openssl
+---lib
| libeay32.lib
| ssleay32.lib
+---VC
libeay32MD.lib
libeay32MDd.lib
libeay32MT.lib
libeay32MTd.lib
ssleay32MD.lib
ssleay32MDd.lib
ssleay32MT.lib
ssleay32MTd.lib
With newer OpenSSL 3+ the structure is different:
C:\OPENSSL-WIN64
+---bin
+---include
| +---openssl
+---lib
+---VC
+---x64
+---MD
| libcrypto.lib
| libssl.lib
+---MDd
| libcrypto.lib
| libssl.lib
+---MT
| libcrypto.lib
| libssl.lib
+---MTd
libcrypto.lib
libssl.lib
Basically, instead of one generic library in the lib folder and a bunch
of differently named versions of it for different type of linkage, we
now have multiple instances of the library located in different folders
based on the linkage type. So, we have to provide an exact path in
order to find the library.
'lib/VC/x64/MT' was chosen in this patch since it is a way used for
building in build-aux/ccl.
MD stands for dynamic linking, MT is static, 'd' stands for debug
versions of the libraries.
While at it, fixing documentation examples to point to Win64 default
installation folder.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There is no chmod or 'mkdir -m' support on Windows, so setting file
permissions for keys and certificates doesn't actually work.
Implementing them using icacls utility instead.
ovs-pki script currently only uses 0700 and 0750 modes, so only those
(and 0600) are implemented. NTFS ACLs on Windows are fairly different
and more complex in comparison with Unix file permissions, so it's hard
to implement these functions in a generic way. The script will fail if
it will encounter an unknown mode.
0700 is implemented as a F (full access) for 'Creator Owner' with no
other permissions. 0750 has an additional RX (read+execute) for the
'Creator Group'. 0600 is implemented the same as 0700, since it
doesn't matter for this use case to have or not to have an executable
or traversal permissions managed separately from everything else and
it would be a little overly verbose to give all the permissions except
for X.
Inheritance rules are set to (OI)(CI), so the folder itself, subfolders
and files in a folder inherit those ACEs.
'umask' also doesn't work on Windows. Instead, moving the private key
output files to a temporary folder that has restricted access already
configured. The file will inherit these restricted ACEs.
It should not be necessary to set explicit permissions for these
files since moving them within the same volume should preserve ACEs.
However, it might be safer to chmod them directly as well, just in
case. Windows administrators will still have to be careful with
private keys, because file copies do not preserve permissions and
moves to different volumes do not preserve them as well. 'robocopy'
with flags to copy security should be used in these cases. We may
want to re-implement 'mv' with 'robocopy' if that becomes a problem
in the future.
There is one more place where umask is used in the script for
creation of a self-signed certificate, but it is not actually needed
there since the resulted certificate doen't need to be private, so
not changing this part for now.
Tested with running an empty 'make check' in AppVeyor and examining
permissions for files in tests/pki:
Files | Linux | Windows
---------------------+------------+--------------------------------------
controllerca | drwxr-xr-x | NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
switchca | | BUILTIN\Administrators:(I)(OI)(CI)(F)
*ca\certs | | BUILTIN\Users:(I)(OI)(CI)(RX)
*ca\crl | | BUILTIN\Users:(I)(CI)(AD)
*ca\newcerts | | BUILTIN\Users:(I)(CI)(WD)
| | APPVEYOR-VM\appveyor:(I)(F)
| | CREATOR OWNER:(I)(OI)(CI)(IO)(F)
---------------------+------------+--------------------------------------
stamp | -rw-r--r-- | NT AUTHORITY\SYSTEM:(I)(F)
test-cert.pem | | BUILTIN\Administrators:(I)(F)
test-req.pem | | BUILTIN\Users:(I)(RX)
test2-cert.pem | | APPVEYOR-VM\appveyor:(I)(F)
test2-req.pem | |
*ca\ca.cnf | |
*ca\cacert.pem | |
*ca\careq.pem | |
*ca\crlnumber | |
*ca\index.txt* | |
*ca\serial* | |
*ca\newcerts\*.pem | |
---------------------+------------+--------------------------------------
controllerca\private | drwx------ | APPVEYOR-VM\appveyor:(F)
switchca\private | | CREATOR OWNER:(OI)(CI)(IO)(F)
---------------------+------------+--------------------------------------
test-privkey.pem | -rw------- | APPVEYOR-VM\appveyor:(F)
test2-privkey.pem | |
*ca\private\cakey.pem| |
We can see that private folders and keys have only a full access from
their owners. Other files and folders have some extra inherited ACEs
from a containing folder.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
We need to know exact linking / compilation errors in order
to fix issues. We could have uploaded it as an artifact,
but it seems easier to just print it out for now.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In order to properly balance bond traffic, ofproto/bond periodically
reads usage statistics of the post-recirculation rules (which are added
to a hidden internal table).
To do that, each "struct bond_entry" (which represents a hash within a
bond) stores the last seen statistics for its rule. When a hash is moved
to another member (due to a bond rebalance or the previous member going
down), the rule is typically just modified, i.e: same match different
actions. In this case, statistics are preserved and accounting continues
to work.
However, if the rule gets completely deleted (e.g: when all bond members
go down) and then re-created, the new rule will have 0 tx_bytes but its
associated entry will still store a non-zero last-seen value.
This situation leads to an overflow of the delta calculation (computed
as [current_stats_value - last_seen_value]), which can affect traffic
as the hash will be considered to carry a lot of traffic and rebalancing
will kick in.
In order to fix this situation, reset the value of last seen statistics
on rule deletion.
Implementation notes:
Modifying pr_tx_bytes requires write-locking the global rwlock but a
lockless version of update_recirc_rules was being maintained to avoid locking
on bon_unref().
Considering the small impact of locking during bond removal, removing the
lockless version and relying on clang's thread safety analysis is preferred.
Also, folding Ilya's [1], i.e: fixing thread safety annotation in
update_recirc_rules() to require holding write-lock.
[1]
https://patchwork.ozlabs.org/project/openvswitch/patch/20240209161718.1149494-1-i.maximets@ovn.org/
Reported-at: https://github.com/openvswitch/ovs-issues/issues/319
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Trace attempts to process all the recirculations. However, if there
is a recirculation loop, i.e. if every recirculation generates another
recirculation, this process will never stop. It will grind until the
trace fills the system memory.
A simple reproducer:
make sandbox
ovs-vsctl add-br br0
ovs-vsctl add-port br0 p1
ovs-ofctl add-flow br0 "table=0,in_port=p1,ip,actions=ct(table=0)"
ovs-appctl ofproto/trace br0 in_port=p1,ip
Limit the number of recirculations trace is processing with a fairly
arbitrary number - 4096 (loosely based on the resubmit limit, but
they are not actually related).
Not adding a test for this since it's only for a trace, but also
because the test may lead to OOM event in a system if the test fails,
which is not nice.
Fixes: e6bc8e7493 ("ofproto/trace: Add support for tracing conntrack recirculation")
Reported-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
With a new runner update, GitHub Actions had a kernel update.
And it seems like something changed between kernels 6.2 and 6.5
so this test now fails very frequently.
I can reproduce the same issue on RHEL 9, and I can't reproduce
it on Ubuntu 23.04 (kernel 6.2).
The test is creating a NAT with a single address+port pair in
an attempt to simulate an address space exhaustion. It is
expected that a first connection with wget leaves a conntrack
entry in a TIME_WAIT state and the second wget should fail
as long as this entry remains, because the only available
address+port pair is already taken.
However, very frequently (not always!) the second connection
replaces the first conntrack entry with a new one and connection
succeeds. There is still only one connection in the conntrack
at any single moment in time, so there is seemingly no issue with
the NAT, but the behavior is unexpected and the test fails.
The issue is likely introduced by a new kernel feature that
allows to evict connections that are in the process of closing:
https://lore.kernel.org/netdev/20230626064749.75525-7-pablo@netfilter.org/
Disable the test in CI until we figure out how to fix it.
Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This updates links to several upstream Kernel documents.
1. Lore is now the canonical archive for the netdev mailing list
2. net-next is now maintained by the netdev team,
of which David Miller is currently a member,
rather than only by David.
Also, use HTTPS rather than HTTP.
3. The Netdev FAQ has evolved into the Netdev Maintainer Handbook.
4. The Kernel security document link was dead,
provide the current canonical location for this document instead.
1., 2. & 3. Found by inspection
4. Flagged by check-docs
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Update link to OCF Resource Agents documentation as the existing link
is broken. Also, use HTTPS.
Broken link flagged by make check-docs
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Correct spelling errors in .rst files flagged by codespell.
Also correct some minor grammar errors in nearby documentation.
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
IANAL, but I think we can extend the copyright attached
to documentation to cover the current year: we are still
actively working on the documentation.
Signed-off-by: Simon Horman <horms@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Some network cards support inner checksum offloading but not outer
checksum offloading. Currently OVS will resolve that outer checksum but
allows the network card to resolve the inner checksum, invalidating the
outer checksum in the process.
Now if we can't offload outer checksums, we don't offload inner either.
Reported-at: https://issues.redhat.com/browse/FDP-363
Fixes: 084c808729 ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Avoid unnecessary thread creation as no upcalls are generated,
resulting in idle threads waiting for process termination.
This optimization significantly reduces memory usage, cutting it
by half on a 128 CPU/thread system during testing, with the number
of threads reduced from 95 to 0.
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Local IP is taken into account only in case of IPv4 address, IPv6
source is not checked. That leads to source being ignored during the
route lookup and ultimately packets encapsulated with a source IP
found during a route lookup, which is likely the wrong one.
Even worse, after encapsulation we have a difference between the
tunnel metadata that contains a correct source IP and the generated
actions that used a wrong source IP. This means that if there are
OpenFlow rules in a bridge where packet goes after encapsulation,
we may match on rules that do not correspond to the actual packet
we have.
Add the check for IPv6 source address before the route lookup.
Tests added to check that we're actually using the configured local_ip
as a source address in the packet. Also adding the same test for IPv4,
since apparently we don't have any tests covering this functionality
for userspace tunnels.
This issue also affects the case where source address is set via
OpenFlow, e.g. 'set_filed:2001:beef::88->tun_ipv6_src', but it's just
a different way of populating the tunnel metadata that doesn't depend
on a tunnel to be native or kernel one. So, not adding extra tests
for this case for now.
Fixes: 8e4e45887e ("ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052938.html
Reported-by: Derrick Lim <derrick.lim@rakuten.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
To mimic what kernel routing subsystem does [1], add a local route
entry for every dummy IP address. This helps with OVN testing multiple
chassis on a single host and allows to run better unit tests for
userspace tunnels without adding route entries manually. This is also
the only way to add 'local' route entries that are required for testing
'local_ip' functionality with native tunnels in userspace datapath
because route lookup will reject non-local source IPs.
There seems to be no way to explicitly remove an IP address from
netdev-dummy, hence no code path to handle route entry cleanup.
The port itself can be removed, but our tests do not normally do that.
Removal can be implemented later if necessary.
[1]: http://linux-ip.net/html/routing-tables.html#routing-table-local
"If the machine has several IP addresses on one Ethernet interface,
there will be a route to each locally hosted IP in the local routing
table. This is a normal side effect of bringing up an IP address on
an interface under linux."
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It's not a system test as it runs with dummy datapath and ports
and it has nothing to do with layer 3 tunnels.
It should be with other userspace tunnel tests.
While moving also making it a little nicer visually and less error
prone by requesting port numbers for all the ports.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>