mirror of https://github.com/openvswitch/ovs synced 2025-08-30 13:58:14 +00:00
Commit Graph

19352 Commits

Author SHA1 Message Date
wenxu
545b64415d conntrack: Prefer dst port range during unique tuple search.
This commit splits the nested loop used to search the unique ports for
the reverse tuple.
It affects only the dnat action, giving more precedence to the dnat
range, similarly to the kernel dp, instead of searching through the
default ephemeral source range for each destination port.

Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 20:16:37 +01:00
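The loop split described in 545b64415d can be sketched roughly as follows. This is a minimal illustration, not the conntrack code: find_unique_ports(), tuple_in_use_fn and demo_in_use() are hypothetical names, and the "in use" check stands in for a real connection lookup. The destination (DNAT) range is exhausted first, and the source range is only searched afterwards, rather than once per destination port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate type: true if the tuple is already taken. */
typedef bool tuple_in_use_fn(uint16_t sport, uint16_t dport);

/* Demo predicate for illustration only: source port 5000 already has
 * connections to destination ports 80 and 81. */
static bool
demo_in_use(uint16_t sport, uint16_t dport)
{
    return sport == 5000 && (dport == 80 || dport == 81);
}

/* Two sequential passes instead of one nested loop.  On success the
 * chosen ports are stored in '*sport' and '*dport'. */
static bool
find_unique_ports(uint16_t *sport, uint16_t *dport,
                  uint16_t dmin, uint16_t dmax,
                  uint16_t smin, uint16_t smax,
                  tuple_in_use_fn *in_use)
{
    assert(dmin <= dmax && smin <= smax);

    /* Pass 1: walk the DNAT range, leaving the source port untouched. */
    for (uint32_t d = dmin; d <= dmax; d++) {
        if (!in_use(*sport, (uint16_t) d)) {
            *dport = (uint16_t) d;
            return true;
        }
    }

    /* Pass 2: only now fall back to the source range, with the
     * destination port left as the caller passed it in. */
    for (uint32_t s = smin; s <= smax; s++) {
        if (!in_use((uint16_t) s, *dport)) {
            *sport = (uint16_t) s;
            return true;
        }
    }
    return false;
}
```

The loop counters are 32 bits wide so that a range ending at 65535 terminates instead of wrapping around.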
wenxu
ec85f5325f conntrack: Select correct sport range for well-known origin sport.
Like the kernel datapath, the sport NAT range for a well-known origin
sport should be limited to the well-known ports.

Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 20:14:29 +01:00
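As a rough sketch of the idea in ec85f5325f, the source-port range can be chosen from the original source port. The band boundaries used here (1-511, 600-1023, 1024-65535) are an assumption based on how the Linux kernel's NAT port selection is commonly described; they do not come from the commit message. sport_range_for() and struct port_range are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: an inclusive port range in host byte order. */
struct port_range {
    uint16_t min;
    uint16_t max;
};

/* Pick the SNAT source-port range from the original source port.
 * The band boundaries are assumed, see the note above. */
static struct port_range
sport_range_for(uint16_t orig_sport)
{
    struct port_range r;

    if (orig_sport < 512) {
        r.min = 1;
        r.max = 511;
    } else if (orig_sport < 1024) {
        r.min = 600;
        r.max = 1023;
    } else {
        /* Not a well-known port: use the ephemeral range. */
        r.min = 1024;
        r.max = 65535;
    }
    assert(r.min <= r.max);
    return r;
}
```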
Mohammad Heib
10b55282a0 ipsec: StrongSwan report connection update failures to ovs logs.
Currently when the user adds an IPsec tunnel port to the
ovs bridge the ovs-monitor-ipsec script will add this tunnel
IPsec-related configuration to the appropriate file and
submit a request to start the IPsec connection for this port
and ignores the request output which can contain an error message.

This patch captures the request output and prints
the error message to the ovs logs.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 20:11:04 +01:00
Ilya Maximets
7bd08b6c16 AUTHORS: Add Mohammad Heib.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 20:09:28 +01:00
Mohammad Heib
02cff6b2d4 ipsec: Libreswan report connection failures to ovs logs.
Currently when the user adds an IPsec tunnel port to the
ovs bridge the ovs-monitor-ipsec script will submit a request
to start the IPsec connection for this port and ignores
the request output which can contain an error message.

This patch captures the request output and prints
the error message to the ovs logs.

Signed-off-by: Mohammad Heib <mheib@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 20:07:15 +01:00
Wan Junjie
9016592ca0 netdev-dpdk: Add mempool count in cmd get-mempool-info.
The rte_mempool_avail_count() and rte_mempool_in_use_count() DPDK API
can tell us the usage of the mempool.  This could be helpful when
debugging a memory leak in the mempool.

Add a line in the cmd's output:
     count: avail (118988), in use (12084)

Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Wan Junjie <wanjunjie@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 19:58:15 +01:00
Flavio Leitner
7ed60839d0 system-tso: Skip encap tests when userspace TSO is enabled.
It seems Linux native tunnel configuration changed to enable
checksum by default and that causes the check-system-tso unit
test below to fail:
 10: datapath - ping over vxlan tunnel    FAILED (system-traffic.at:248)

That happens because userspace TSO doesn't support encapsulation
as mentioned in the current documentation. In this specific case,
udp_extract_tnl_md() checks if the checksum is correct, but since
TSO is enabled, the outer UDP header contains only the pseudo
checksum and not the full packet checksum.

Although the packet is marked correctly with UDP csum offload flag
and the code could use that to verify the pseudo csum, more work
is needed to properly translate the offloading flags from the outer
headers to the inner headers.  For example, if the payload is a
TCP packet, most probably the flag DP_PACKET_OL_TX_UDP_CKSUM doesn't
make sense after decapsulating that.

This patch skips the tunnel tests when the userspace TSO is enabled.

Fixes: 29bb3093eb ("userspace: Enable TSO support for non-DPDK.")
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 19:51:35 +01:00
Paul Blakey
f34a7626cc tc: Fix stats byte count on fragmented packets.
Fragmented packets with offset=0 are defragmented by tc act_ct and are
passed to the next action (a goto action, in the OVS offload case) only
once assembled. Since stats are overwritten on each action dump,
only the stats of the last action in the tc filter action priority
list are taken: the stats of the goto action, which count
only the assembled packets. See below for an example.

Hardware updates just part of the actions (gact, ct, mirred) - those
that support stats_update() operation. Since datapath rules end
with either an output (mirred) or recirc/drop (both gact), tc rule
will at least have one action that supports it. For software packets,
the first action will have the max software packets count.
Tc dumps total packets (hw + sw) and hardware packets, so
software packets need to be calculated from these (total - hw).

To fix the above, get hardware packets and calculate software packets
for each action, take the max of each set, then combine back
to get the total packets that went through software and hardware.

Example by running ping above MTU (ping <IP> -s 2000):
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=first),
  packets:14, bytes:19544,..., actions:ct(zone=1),recirc(0x1)
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=later),
  packets:14, bytes:28392,..., actions:ct(zone=1),recirc(0x1)

Second rule should have had bytes=14*<size of 'later' frag>, but instead
it's bytes=14*<size of assembled packets - size of 'first' + 'later'
frags>.

Fixes: 576126a931 ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 16:02:45 +01:00
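The per-action arithmetic described in f34a7626cc can be sketched as below. This is only an illustration of "software = total - hw, take the max of each set, then combine"; struct act_stats and combined_packets() are hypothetical names rather than the netdev-offload-tc code, and only packet counts are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-action counters as dumped by tc. */
struct act_stats {
    uint64_t total_pkts;   /* hardware + software */
    uint64_t hw_pkts;      /* hardware only */
};

/* Derive software packets per action, take the maximum of the
 * hardware and software counts separately, then add them back up. */
static uint64_t
combined_packets(const struct act_stats *acts, size_t n)
{
    uint64_t max_hw = 0;
    uint64_t max_sw = 0;

    for (size_t i = 0; i < n; i++) {
        assert(acts[i].hw_pkts <= acts[i].total_pkts);
        uint64_t sw = acts[i].total_pkts - acts[i].hw_pkts;

        if (acts[i].hw_pkts > max_hw) {
            max_hw = acts[i].hw_pkts;
        }
        if (sw > max_sw) {
            max_sw = sw;
        }
    }
    return max_hw + max_sw;
}
```

With a ct action that saw 28 software packets (fragments) followed by a goto action that saw 14 (assembled packets), the result is 28, instead of the 14 that taking only the last action would give.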
Paul Blakey
de634e4229 compat: Add gen_stats include to define tc hw stats.
Update kernel UAPI to support dumping hardware stats
of tc filters.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 16:02:45 +01:00
Ilya Maximets
015994d37f ovsdb: row: Optimize row updates by applying diffs in-place.
ovsdb_datum_apply_diff_in_place() is much faster than the usual
ovsdb_datum_apply_diff() in most cases, because it doesn't clone or
compare unnecessary data.  Since the original destination datum is
destroyed anyway, we might use the faster function here to speed up
transaction processing.

ovsdb_row_update_columns() with xor is mainly used by relay databases.
So, this change should improve their performance.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-03 15:25:55 +01:00
Ilya Maximets
a3e97b1af1 ovsdb: relay: Add transaction history support.
Even though relays can be scaled to a large number of servers to
handle a lot more clients, lack of transaction history may cause
significant load if clients are re-connecting.
E.g. in case of the upgrade of a large-scale OVN deployment, relays
can be taken down one by one forcing all the clients of one relay to
jump to other ones.  And all these clients will download the database
from scratch from a new relay.

Since relay itself supports monitor_cond_since connection to the
main cluster, it receives the last transaction id along with each
update.  Since these transaction ids are 'eid's of actual transactions,
they can be used by relay for a transaction history.

Relay may not receive all the transaction ids, because the main cluster
may combine several changes into a single monitor update.  However,
all relays will, likely, receive same updates with the same transaction
ids, so the case where transaction id can not be found after
re-connection between relays should not be very common.  If some id
is missing on the relay (i.e. this update was merged with some other
update and newer id was used) the client will just re-download the
database as if there was a normal transaction history miss.

OVSDB client synchronization module updated to provide the last
transaction id along with the update.  Relay module updated to use
these ids as a transaction id.  If ids are zero, relay decides that
the main server doesn't support transaction ids and disables the
transaction history accordingly.

Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(),
so transactions are added to the history.  This can be done, because
relays have no file storage, so there is no need to write anything.

Relay tests modified to test both standalone and clustered database
as a main server.  Checks added to ensure that all servers receive the
same transaction ids in monitor updates.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-03 15:21:21 +01:00
Ilya Maximets
999ba294fb ovsdb: raft: Fix inability to join the cluster after interrupted attempt.
If the joining server re-connects while catching up (e.g. if it crashed
or connection got closed due to inactivity), the data we sent might be
lost, so the server will never reply to append request or a snapshot
installation request.  At the same time, leader will decline all the
subsequent requests to join from that server with the 'in progress'
resolution.  At this point the new server will never be able to join
the cluster, because it will never receive the raft log while leader
thinks that it was already sent.

This happened in practice when one of the servers got preempted for a
few seconds, so the leader closed connection due to inactivity.

Fix this by destroying the joining server if a disconnection is detected.
This allows the join to start from scratch when the server re-connects
and sends a new join request.

We can't track re-connection in the raft_conn_run(), because it's
incoming connection and the jsonrpc will not keep it alive or
try to reconnect.  Next time the server re-connects it will be an
entirely new raft conn.

Fixes: 1b1d2e6daa ("ovsdb: Introduce experimental support for clustered databases.")
Reported-at: https://bugzilla.redhat.com/2033514
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2022-02-25 14:15:12 +01:00
Ilya Maximets
6de8868d19 reconnect: Fix broken inactivity probe if there is no other reason to wake up.
The purpose of reconnect_deadline__() function is twofold:

1. Its result is used to tell if the state has to be changed right now
   in reconnect_run().
2. Its result is also used to determine when the process needs to wake up
   and call reconnect_run() the next time, i.e. when the state may
   need to be changed next.

Since introduction of the 'receive-attempted' feature, the function
returns LLONG_MAX if the deadline is in the future.  That works for
the first case, but doesn't for the second one, because we don't
really know when we need to call reconnect_run().

This is the problem for applications where jsonrpc connection is the
only source of wake ups, e.g. ovn-northd.  When the network goes down
silently, e.g. server loses IP address due to DHCP failure, ovn-northd
will sleep in the poll loop indefinitely after being told that it
doesn't need to call reconnect_run() (deadline == LLONG_MAX).

Fixing that by actually returning the expected time if it is in the
future, so we will know when to wake up.  In order to keep the
'receive-attempted' feature, returning 'now + 1' in case where the
time has already passed, but receive wasn't attempted.  That will
trigger a fast wake up, so the application will be able to attempt the
receive even if there were no real events.  In a correctly written
application we should not fall into this case more than once in a row.
'+ 1' ensures that we will not transition into a different state
prematurely, i.e. before the receive is actually attempted.

Fixes: 4241d652e4 ("jsonrpc: Avoid disconnecting prematurely due to long poll intervals.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-24 17:04:32 +01:00
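The deadline rule described in 6de8868d19 can be modelled in a few lines. This is a simplified sketch under stated assumptions: next_wakeup() is a hypothetical stand-in for the per-state logic of reconnect_deadline__(), and 'receive_attempted' condenses the receive-attempted bookkeeping into a single flag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns the time at which the caller should wake up and run the
 * state machine again.  A result <= 'now' means "act right away". */
static long long
next_wakeup(long long now, long long deadline, bool receive_attempted)
{
    assert(now < LLONG_MAX);

    if (deadline > now) {
        /* Deadline is in the future: report it, so that the poll
         * loop knows when to wake up (previously LLONG_MAX). */
        return deadline;
    }
    if (!receive_attempted) {
        /* Deadline passed, but nothing was read yet: ask for a fast
         * wake-up without triggering the state change. */
        return now + 1;
    }
    /* Deadline passed and a receive was attempted: time to act. */
    return deadline;
}
```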
Frank Guo
7aaa5b8137 datapath-windows: Fix NXM_OF_IP_TOS issue
Currently ovs-windows cannot change the TOS using NXM_OF_IP_TOS; this patch fixes it.

1. Test with the following flow:
ovs-ofctl.exe add-flow br-int "table=0,priority=300,in_port=antrea-gw0,icmp actions=mod_nw_tos:28,load:0x1->NXM_NX_REG0[0..3],resubmit(,SpoofGuard)"
2. Capture a packet trace on the destination side:
02:23:30.625049 IP (tos 0x1c, ttl 128, id 15237, offset 0, flags [none], proto ICMP (1), length 60)
    192.168.250.1 > 192.168.248.1: ICMP echo request, id 1, seq 10, length 40

Reported-at:openvswitch/ovs-issues#244
Signed-off-by: Frank Guo <frankg@vmware.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2022-02-23 20:48:27 +02:00
Eelco Chaudron
4f933301f0 Documentation: Update USDT documentation to include systemtap dependency.
Update the documentation to include details on SystemTap dependency
when enabling USDT probes.

Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 20:58:51 +01:00
Dumitru Ceara
4628be9ff8 ovsdb-idl: Fix use-after-free when destroying an IDL loop.
Transactions that are still incomplete (waiting for a reply from the
server) are kept in the IDL's 'outstanding_txns' map.  When a transaction
is destroyed, ovsdb_idl_txn_destroy() will take care of removing the
transaction from the 'outstanding_txns' map if the transaction was
incomplete but also abort it and disassemble it if needed.

Aborting the transaction first, before ovsdb_idl_txn_destroy(), may
cause a use-after-free if the transaction was outstanding; that's
because the transaction would move to state "aborted" without being
removed from the 'outstanding_txns' map.

Fixes: 53a540e531 ("ovsdb-idl: ovsdb_idl_loop_destroy must also destroy the committing txn.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 20:56:36 +01:00
Gaetan Rivet
31dc72c644 dpif-netdev: Use dp_netdev reference in offload threads.
The PMD reference taken is not actually used, it is only needed to get
the dp_netdev linked. Additionally, the taking of the PMD reference
does not protect against the disappearance of the dp_netdev,
so it is misleading.

The dp reference is protected by the way the ports are being deleted
during datapath deletion. No further offload request should be found
past a flush, so it is safe to keep this reference in the offload item.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 13:11:44 +01:00
Sriharsha Basavapatna
7d8b6ab64d dpif-netdev: Fix a race condition in deletion of offloaded flows.
In dp_netdev_pmd_remove_flow() we schedule the deletion of an
offloaded flow, if a mark has been assigned to the flow. But if
this occurs in the window in which the offload thread completes
offloading the flow and assigns a mark to the flow, then we miss
deleting the flow. This problem has been observed while adding
and deleting flows in a loop. To fix this, always enqueue flow
deletion regardless of the flow->mark being set.

Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
Co-authored-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 13:11:44 +01:00
Gaetan Rivet
a81bb674ed dpif-netdev: Move port flush after datapath reconfiguration.
Port flush and offload uninit should be moved after the datapath
has been reconfigured. That way, no other thread such as PMDs will
find this port to poll and enqueue further offload requests.

After a flush, almost no further offload request for this port should
be found in the queue.

There will still be some issued by revalidators, but they
will be caught when the offload thread fails to take a netdev ref.

This change fixes the issue of datapath reference being improperly
accessed by offload threads while it is being destroyed.

Fixes: 5b0aa55776 ("dpif-netdev: Execute flush from offload thread.")
Fixes: 62d1c28e9c ("dpif-netdev: Flush offload rules upon port deletion.")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 13:11:44 +01:00
Cian Ferriter
f92e6946d0 dpif-netdev-dpcls: Make subtable reprobe thread-safe.
The subtable search function can be used at any time by a PMD thread.
Setting the subtable search function should be done atomically to
prevent garbage data from being read.

A dpcls_subtable_lookup_reprobe() call can happen at the same time that
DPCLS subtables are being sorted. This could lead to modifications done
by the reprobe() to be lost. Prevent this from happening by locking on
pmd->flow_mutex. After this change both the reprobe function and a
subtable sort will share the flow_mutex preventing modifications by
either one from being lost.

Also remove the pvector_publish() call. The pvector is not being changed
in dpcls_subtable_lookup_reprobe(), only the data pointed to by pointers
in the vector are being changed.

Fixes: 3d018c3ea7 ("dpif-netdev: add subtable lookup prio set command.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/390757.html
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-02-15 16:32:07 +00:00
Dumitru Ceara
5f4dfcccba ci: Fix typo in variable name.
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-15 00:05:56 +01:00
Dumitru Ceara
3ffeb03fee dp-packet: Ensure packet base is always non-NULL.
UB Sanitizer report:
  lib/dp-packet.h:297:39: runtime error: applying zero offset to null pointer
      #0 0x7946f5 in dp_packet_tail ./lib/dp-packet.h:297:39
      #1 0x794331 in dp_packet_tailroom ./lib/dp-packet.h:325:49
      #2 0x7942a0 in dp_packet_prealloc_tailroom lib/dp-packet.c:297:16
      #3 0xc347cf in eth_compose lib/packets.c:1061:5
      [...]

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 23:45:36 +01:00
Dumitru Ceara
172d8bfed8 bfd: lldp: stp: Fix misaligned packet field access.
UB Sanitizer reports:
  lib/bfd.c:748:16:
        runtime error: member access within misaligned address 0x000001f0d6ea
                       for type 'struct msg', which requires 4 byte alignment
  0x000001f0d6ea: note: pointer points here
   00 20  00 00 20 40 03 18 93 f9  0a 6e 00 00 00 00 00 0f 42 40 00 0f ...
                ^
      #0 0x59008e in bfd_process_packet lib/bfd.c:748
      #1 0x52a240 in process_special ofproto/ofproto-dpif-xlate.c:3370
      #2 0x553452 in xlate_actions ofproto/ofproto-dpif-xlate.c:7766
      #3 0x4fc9e6 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1237
      #4 0x4fdecc in process_upcall ofproto/ofproto-dpif-upcall.c:1456
      #5 0x4fd936 in upcall_cb ofproto/ofproto-dpif-upcall.c:1358
      [...]
  lib/stp.c:754:15:
        runtime error: member access within misaligned address 0x000002c4ea61
        for type 'const   struct stp_bpdu_header', which requires 2 byte alignment
  0x000002c4ea61: note: pointer points here
   26 42 42  03 00 00 00 00 00 80 00  aa 66 aa 66 00 01 00 00  00 00 80 ...
                ^
      #0 0x8a2bce in stp_received_bpdu lib/stp.c:754
      #1 0x51e603 in stp_process_packet ofproto/ofproto-dpif-xlate.c:1788
      #2 0x52a96d in process_special ofproto/ofproto-dpif-xlate.c:3394
      #3 0x5534df in xlate_actions ofproto/ofproto-dpif-xlate.c:7766
      #4 0x4fcb49 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1237
      [...]
  lib/lldp/lldp.c:149:10:
        runtime error: load of misaligned address 0x7ffcc0ae72bd for type
                       'ovs_be16', which requires 2 byte alignment
  0x7ffcc0ae72bd: note: pointer points here
   8e e7 84 ad 04 00 05  46 61 73 74 45 74 68 65  72 6e 65 74 20 31 2f 35 ...
               ^
      #0 0x718d63 in lldp_tlv_end lib/lldp/lldp.c:149
      #1 0x7191de in lldp_send lib/lldp/lldp.c:184
      #2 0x484d6c in test_aa_send tests/test-aa.c:238
      [...]

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 23:35:55 +01:00
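For the misaligned accesses reported in 172d8bfed8, one generic way to read a 16-bit big-endian field from an arbitrary packet offset is to copy the bytes out instead of dereferencing a cast pointer. This is a sketch of that general technique, not necessarily what the commit does; load_be16_unaligned() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 16-bit value from 'p', which may have any
 * alignment.  memcpy() has no alignment requirement, unlike a
 * dereference of '(const uint16_t *) p'. */
static uint16_t
load_be16_unaligned(const void *p)
{
    uint8_t bytes[2];

    assert(p != NULL);
    memcpy(bytes, p, sizeof bytes);
    return (uint16_t) ((bytes[0] << 8) | bytes[1]);
}
```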
Dumitru Ceara
b9e8354d04 ovsdb-idlc: Avoid accessing member within NULL idl index cursors.
Reported by UndefinedBehaviorSanitizer:
  tests/idltest.c:3602:12:
        runtime error: member access within null pointer of type
                       'const struct idltest_simple'
      #0 0x4295af in idltest_simple_cursor_first_ge tests/idltest.c:3602
      #1 0x41c81b in test_idl_compound_index_single_column tests/test-ovsdb.c:3128
      #2 0x41e035 in do_idl_compound_index tests/test-ovsdb.c:3277
      #3 0x4cf640 in ovs_cmdl_run_command__ lib/command-line.c:247
      #4 0x4cf79f in ovs_cmdl_run_command lib/command-line.c:278
      #5 0x4072f7 in main tests/test-ovsdb.c:79
      #6 0x7fa858675b74 in __libc_start_main (/lib64/libc.so.6+0x27b74)
      #7 0x4060ed in _start (/root/ovs/tests/test-ovsdb+0x4060ed)

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 23:33:33 +01:00
Dumitru Ceara
b07c2e92e9 stopwatch: Fix buffer underflow when computing percentiles.
UB Sanitizer report:
  lib/stopwatch.c:119:22: runtime error: index 18446744073709551615 out of
                          bounds for type 'long long unsigned int [50]'
      #0 0x698358 in calc_percentile lib/stopwatch.c:119
      #1 0x69ada1 in add_sample lib/stopwatch.c:231
      #2 0x69c086 in stopwatch_end_sample_protected lib/stopwatch.c:386
      #3 0x69c522 in stopwatch_thread lib/stopwatch.c:441
      #4 0x684bae in ovsthread_wrapper lib/ovs-thread.c:383
      #5 0x7f042838b298 in start_thread (/lib64/libpthread.so.0+0x9298)
      #6 0x7f04277f2352 in clone (/lib64/libc.so.6+0x100352)

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 22:41:29 +01:00
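The index 18446744073709551615 in the report for b07c2e92e9 is what an unsigned 64-bit value looks like after 0 - 1. The sketch below shows how such a wrap-around can be guarded against in general. It assumes a "rank minus one" style of index computation, which is an illustration and not the actual stopwatch code; percentile_index() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Map a percentile (0..100) to an index into a sorted array of
 * 'n_samples' entries.  A rank of 0 must not be decremented, since
 * size_t arithmetic would wrap to SIZE_MAX. */
static size_t
percentile_index(size_t n_samples, unsigned int percentile)
{
    assert(n_samples > 0 && percentile <= 100);

    /* Round the rank up: ceil(n_samples * percentile / 100). */
    size_t rank = (n_samples * percentile + 99) / 100;

    return rank ? rank - 1 : 0;
}
```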
Dumitru Ceara
5a9bb85caf dpif-netdev: Fix misaligned access.
Remove the forced cache-line size alignment markers from
struct dp_netdev_pmd_thread and struct dp_netdev as discussed
at [0].  They don't seem to add any benefit and cause 64 byte
alignment requirements.

UB Sanitizer report:
  lib/dpif-netdev.c:6758:13:
        runtime error: member access within misaligned address 0x7f7f24d25010
        for type 'struct dp_netdev_pmd_thread', which requires 64 byte alignment
  0x7f7f24d25010: note: pointer points here
   00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ...
                ^
     #0 0x5fbfde in dp_netdev_configure_pmd lib/dpif-netdev.c:6758
     #1 0x5fbde9 in dp_netdev_set_nonpmd lib/dpif-netdev.c:6715
     #2 0x5d6fdd in create_dp_netdev lib/dpif-netdev.c:1769
     #3 0x5d72d0 in dpif_netdev_open lib/dpif-netdev.c:1807
     #4 0x61c83f in do_open lib/dpif.c:347
     [...]
  lib/dpif-netdev.c:1724:6:
        runtime error: member access within misaligned address 0x000002005eb0
        for type 'struct dp_netdev', which requires 64 byte alignment
  0x000002005eb0: note: pointer points here
   00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ...
                ^
      #0 0x5d6660 in create_dp_netdev lib/dpif-netdev.c:1724
      #1 0x5d72d0 in dpif_netdev_open lib/dpif-netdev.c:1807
      #2 0x61c846 in do_open lib/dpif.c:347
      #3 0x61ca9c in dpif_create lib/dpif.c:402
      #4 0x61cac9 in dpif_create_and_open lib/dpif.c:415
      #5 0x48f235 in open_dpif_backer ofproto/ofproto-dpif.c:776
      [...]

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2021-December/390256.html

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 22:38:42 +01:00
Dumitru Ceara
8ed26a8be3 treewide: Don't pass NULL to library functions that expect non-NULL.
It's actually undefined behavior to pass NULL to standard library
functions that manipulate arrays (e.g., qsort, memcpy, memcmp), even if
the passed number of items is 0.

UB Sanitizer reports:
  ovsdb/monitor.c:408:9: runtime error: null pointer passed as argument 1,
                                        which is declared to never be null
      #0 0x406ae1 in ovsdb_monitor_columns_sort ovsdb/monitor.c:408
      #1 0x406ae1 in ovsdb_monitor_add ovsdb/monitor.c:1683
  [...]
  lib/ovsdb-data.c:1970:5: runtime error: null pointer passed as argument 2,
                                          which is declared to never be null
      #0 0x4071c8 in ovsdb_datum_push_unsafe lib/ovsdb-data.c:1970
      #1 0x471cd0 in ovsdb_datum_apply_diff_in_place lib/ovsdb-data.c:2345
  [...]
  ofproto/ofproto-dpif-rid.c:159:17:
        runtime error: null pointer passed as argument 1,
                       which is declared to never be null
      #0 0x4df5d8 in frozen_state_equal ofproto/ofproto-dpif-rid.c:159
      #1 0x4dfd27 in recirc_find_equal ofproto/ofproto-dpif-rid.c:179
      [...]

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 22:35:42 +01:00
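The rule from 8ed26a8be3 (no NULL to memcpy() and friends, even for zero items) can be respected with a small guard. The helper below is a hypothetical sketch of that pattern, not a function taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy 'n' bytes, allowing 'dst' and 'src' to be NULL when 'n' is 0.
 * memcpy() itself is only called with valid pointers. */
static void
copy_maybe_empty(void *dst, const void *src, size_t n)
{
    if (n > 0) {
        assert(dst != NULL && src != NULL);
        memcpy(dst, src, n);
    }
}
```

The same guard applies to qsort() and memcmp(): skip the call when there are no elements.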
Paolo Valerio
989895501c system-traffic.at: Avoid sporadic failures during conntrack IPv6 HTTP/FTP tests.
Sporadic false positives may be seen in the following tests:

- conntrack - IPv6 HTTP
- conntrack - FTP over IPv6

The failures show up randomly.
The reason appears to be the source address used when performing the
request using wget:
-tcp,orig=(src=fc00::1,dst=fc00::2,sport=<cleared>,dport=<cleared>),\
    reply=(src=fc00::2,dst=fc00::1,sport=<cleared>,dport=<cleared>),\
    protoinfo=(state=<cleared>)
+tcp,orig=(src=fe80::f0eb:f8ff:fef0:138f,dst=fc00::2,sport=<cleared>,dport=<cleared>),\
    reply=(src=fc00::2,dst=fe80::f0eb:f8ff:fef0:138f,sport=<cleared>,dport=<cleared>),\
    protoinfo=(state=<cleared>)

It seems that the problem can be addressed in multiple ways, but using
"nodad" seems to be safe enough to fix the issue that now, after
hundreds of attempts, is no longer present.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 21:38:02 +01:00
Paolo Valerio
e969370d30 system-traffic.at: Do not use ranges with broadcast address.
Turn a bunch of test ranges from 10.1.1.240-10.1.1.255 into
10.1.1.240-10.1.1.254.  10.1.1.255 is the broadcast address for
10.1.1.0/24 and can be picked to SNAT packets, causing the subsequent
failure of the test.

Fixes: 9ac0aadab9 ("conntrack: Add support for NAT.")
Fixes: e32cd4c629 ("conntrack: ignore port for ICMP/ICMPv6 NAT.")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-14 20:49:39 +01:00
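The arithmetic behind e969370d30 is easy to check: the broadcast address is the network address with all host bits set. The helpers below are hypothetical and exist only to illustrate why the range ending in .255 is a problem for a /24 and the one ending in .254 is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
static uint32_t
ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t) a << 24) | ((uint32_t) b << 16)
           | ((uint32_t) c << 8) | d;
}

/* Broadcast address of the subnet that contains 'addr'. */
static uint32_t
ipv4_broadcast(uint32_t addr, unsigned int prefix_len)
{
    assert(prefix_len <= 32);

    /* A shift by 32 would be undefined, so handle /0 separately. */
    uint32_t mask = prefix_len ? UINT32_MAX << (32 - prefix_len) : 0;

    return addr | ~mask;
}

/* True if the inclusive range [lo, hi] contains the broadcast
 * address of lo's subnet. */
static bool
range_includes_broadcast(uint32_t lo, uint32_t hi, unsigned int prefix_len)
{
    uint32_t bcast = ipv4_broadcast(lo, prefix_len);

    return lo <= bcast && bcast <= hi;
}
```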
Ben Pfaff
78ff3961ca daemon-unix: Close log file in monitor process while waiting on child.
Until now, the monitor process held its log file open while it waited for
the child to exit, and then it reopened it after the child exited.  Holding
the log file open this way prevented log rotation from freeing disk space.
This commit avoids the problem by closing the log file before waiting, then
reopening it afterward.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reported-by: Antonin Bas <abas@vmware.com>
VMware-BZ: #2743409
Signed-off-by: William Tu <u9012063@gmail.com>
2022-02-14 00:40:02 -08:00
Kumar Amber
b9cf520705 system-dpdk.at: Add warning log in mfex fuzzy test.
Some specific warnings are seen on various systems but may not be
visible on others.  It is good to add such logs to the test to
avoid test-case failures.

The warning only affects the fuzzy tests, because more than
1000 flows are offloaded simultaneously.

The flow count in the warning log is wildcarded, because the number
can vary between systems under test.

Suggested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-02-11 11:42:17 +00:00
Adrian Moreno
f0a9000ca6 ofproto: Fix ipfix not always sampling on egress.
We are currently requiring in_port to be a valid port number for ipfix
sampling even if the sampling is done on the output port (egress).

This restriction is not really needed and can affect pipelines that
intentionally set the in_port to OFPP_NONE during flow processing. For
instance, OVN does this, see:

cfa547821 Fix ovn-controller generated packets from getting dropped for
reject ACL action.

This patch skips ipfix sampling only if both (ingress and egress) ports
are invalid.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-09 16:02:49 +01:00
Roi Dayan
96ad83bc7b tc: Fix incorrect TC rule for decap+encap datapath flow.
A datapath flow generated for traffic from vxlan port to another vxlan port
looks like this:

 tunnel(tun_id=0x65,src=10.10.11.3,dst=10.10.11.2,ttl=0/0,tp_dst=4789,flags(+key)),
 ...,in_port(vxlan_sys_4789),...,
 actions:set(tunnel(tun_id=0x66,src=10.10.12.2,dst=10.10.12.3,tp_dst=4789,flags(key))),
          vxlan_sys_4789

The generated TC rule has an explicit tunnel key unset action added after
the tunnel key set action, which is wrong.

filter protocol ip pref 7 flower chain 0 handle 0x1
  dst_mac fa:16:3e:2a:4e:23
  eth_type ipv4
  ip_tos 0x0/3
  enc_dst_ip 10.10.11.2
  enc_src_ip 10.10.11.3
  enc_key_id 101
  enc_dst_port 4789
  ip_flags nofrag
  not_in_hw
        action order 1: tunnel_key  set
        src_ip 10.10.12.2
        dst_ip 10.10.12.3
        key_id 102
        dst_port 4789
        nocsum pipe
         index 1 ref 1 bind 1 installed 568 sec used 0 sec
        Action statistics:
        Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: tunnel_key  unset pipe
         index 2 ref 1 bind 1 installed 568 sec used 0 sec
        Action statistics:
        Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 3: mirred (Egress Redirect to device vxlan_sys_4789) stolen
        index 1 ref 1 bind 1 installed 568 sec used 0 sec
        Action statistics:
        Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        cookie e0c82bfd504b701428b00db6b08db3b2

Fix it by also adding the the tunnel key unset action before the tunnel
key set action and not only before output port.
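
The intended ordering can be sketched as follows. This is only an illustration: the helper name and the action strings are hypothetical and are not taken from the OVS TC offload code. It assumes a single `tunnel_key unset` is emitted ahead of whichever comes first, a `tunnel_key set` or an output (`mirred`) action, for flows that match on a tunnel.

```python
def order_tc_actions(matches_tunnel, actions):
    """Return the TC action list with 'tunnel_key unset' placed before the
    first 'tunnel_key set' or output action of a tunnel-matching flow.

    Hypothetical helper for illustration only; it is not OVS code.
    """
    ordered = []
    # Flows that do not match on a tunnel have nothing to unset.
    released = not matches_tunnel
    for action in actions:
        if not released and action in ("tunnel_key set", "mirred"):
            # Drop the decapsulated tunnel metadata exactly once, before
            # new tunnel metadata is set or the packet is output.
            ordered.append("tunnel_key unset")
            released = True
        ordered.append(action)
    return ordered
```

For the decap+encap flow above this yields unset, set, mirred, instead of the set, unset, mirred order shown in the broken rule.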

Fixes: 7c53bd7839 ("tc: Move tunnel_key unset action before output ports")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-09 15:40:19 +01:00
Kevin Traynor
ab4d3bfbef netdev-dpdk: Update to use RTE_ETH namespace defines.
This patch updates OVS to use DPDK RTE_ETH namespaces.

DPDK commit 295968d17407 ("ethdev: add namespace") [0] added RTE_ETH
namespaces for ethdev enums and macros in DPDK 21.11.

As compatibility for the older names was kept and they were not officially
deprecated in DPDK 21.11, there was no impact to OVS and OVS did not have
to be updated.

In future DPDK releases the older names will be deprecated and that will
cause build warnings for OVS. They may also be removed from DPDK at some
point.

There is no immediate need to update OVS to use the new namespaces while
DPDK 21.11 is being used but at the same time there is no need to wait
until it becomes an issue either. So might as well align with the
updated names in DPDK 21.11.

[0] http://git.dpdk.org/dpdk/commit/?id=295968d1740760337e16b0d7914875c5cac52850

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-02-08 11:59:29 +00:00
Harry van Haaren
4f810deab9 dpif-netdev: fix vlan and ipv4 parsing in avx512
This commit fixes the minimum packet size for the vlan/ipv4/tcp
traffic profile, which was previously incorrectly set.

This commit also disallows any fragmented IPv4 packets from being
matched in the optimized miniflow-extract, avoiding complexity of
handling fragmented packets and using scalar fallback instead.
The DF (don't fragment) bit is now ignored, and stripped from the
resulting miniflow.
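
As a scalar illustration of the fragment check (not the AVX512 code itself, which works on vector masks), a packet counts as fragmented when the "more fragments" flag or a non-zero fragment offset is set, while the DF bit is ignored. The function name below is hypothetical; the bit layout is the standard IPv4 header layout.

```python
import struct

IP_DF = 0x4000           # "don't fragment" flag, ignored by the check
IP_MF = 0x2000           # "more fragments" flag
IP_OFFSET_MASK = 0x1FFF  # fragment offset, in 8-byte units


def is_ipv4_fragment(ipv4_header: bytes) -> bool:
    """Return True if the IPv4 header describes a fragment.

    Hypothetical helper; the 16-bit flags/offset field sits at bytes 6..7
    of the IPv4 header in network byte order.
    """
    (frag_field,) = struct.unpack_from("!H", ipv4_header, 6)
    # DF is deliberately not part of the mask.
    return (frag_field & (IP_MF | IP_OFFSET_MASK)) != 0
```

In this model a packet with only DF set is still eligible for the optimized path, while any packet with MF or a fragment offset would take the scalar fallback.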

Fixes: aa85a25095 ("dpif-netdev/mfex: Add more AVX512 traffic profiles.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-02-08 09:36:25 +00:00
Ilya Maximets
d5453008c4 ci: Install wheel before installing any other python packages.
GHA is broken due to an update to pip>=22.0.  This happens because pip now
stops backtracking packages on a build failure, making it impossible to
find a working combination of versions.

We're not able to build 'hacking', because 'wheel' is not installed
at that point in time.  Installing it separately to fix the issue,
so pip can find compatible versions of packages by backtracking.

Unfortunately, new version of backtracking leads to installation of
incompatible versions of flake8 and hacking.  Presumably because
current versions of hacking are not compatible with flake8>=4.0
and very old hacking-0.5.4 for some reason is considered suitable
while resolving dependencies.  So, we end up with flake8-4.0.1 and
hacking-0.5.4 installed.  And that doesn't work.  Limiting the version
of hacking to >=3.0 to have flake8-3.9.2 and hacking-3.0.0 installed
during backtracking.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
[i.maximets: 2 tags below carried from v1, that had no >=3.0 change]
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Aaron Conole <aconole@redhat.com>
2022-02-04 21:58:45 +01:00
Nobuhiro MIKI
f81483ad57 odp-util: Fix tunnel key attr for GTP-U.
CC: William Tu <u9012063@gmail.com>
Fixes: 3c6d05a02e ("userspace: Add GTP-U support.")
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-04 11:57:51 +01:00
Dumitru Ceara
28f36edd19 ovsdb-idl: Only process successful txn in ovsdb_idl_loop_run.
Otherwise we hide the transaction result from the user.  This may cause
problems as the user will not detect error cases.  For example, if the
server refuses a transaction with "constraint violation", the user
should be notified because the transaction might need to be retried.

For clients that process database changes incrementally (using change
tracking) this lack of failure notification creates a problem if it
occurs while no other database changes happen.  In that case:
- ovsdb_idl_loop_run() silently consumes the failure, initializes a
  new transaction.
- no other table update was received from the server so the user will
  not add anything to the new transaction.
- ovsdb_idl_loop_commit_and_wait() will "succeed" as nothing changed
  from the client's perspective.
In reality, the first transaction failed and the client wasn't given
the chance to handle the failure.

Commit 0401cf5f9e ("ovsdb idl: Try committing the pending txn in
ovsdb_idl_loop_run.") tried to optimize for the common, successful
case.  Maintain the same approach and optimize for transactions that
succeeded but fall back to the old mechanism of processing failures
within ovsdb_idl_loop_commit_and_wait() instead.
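
A toy model of this split, under stated assumptions: the class, its attribute and the status strings are hypothetical and do not reproduce the OVS IDL API. It only shows that the "run" step finishes a pending transaction when it succeeded and leaves any other result for the "commit and wait" step to report.

```python
class ToyIdlLoop:
    """Hypothetical model of the two-step handling; not the OVS IDL API."""

    def __init__(self):
        # Status of the transaction still being committed, or None.
        self.committing = None

    def run(self):
        # Fast path: only a successful transaction is consumed here.
        if self.committing == "success":
            self.committing = None
        # Any other status is left untouched on purpose.

    def commit_and_wait(self):
        # Failures surface here, so the caller can see and handle them.
        status, self.committing = self.committing, None
        return status
```

With a pending status such as "constraint violation", `run()` leaves the transaction in place and `commit_and_wait()` returns the failure to the caller instead of hiding it.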

Fixes: 0401cf5f9e ("ovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-04 11:48:31 +01:00
Ilya Maximets
97772a9b2e AUTHORS: Add Wan Junjie.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-02 19:19:06 +01:00
Wan Junjie
cbcd9ca423 ofproto-dpif-upcall: Fix n_revalidators on upcall show.
When upcall/show is used to collect upcall statistics from a grafana
collector or some agent, upcall/show can be called even during ovs
restart. Occasionally ovs will crash when the revalidator thread
is not really inited. Backtrace:

(gdb) bt
- 0 upcall_unixctl_show at ofproto/ofproto-dpif-upcall.c:2885
- 1 process_command at lib/unixctl.c:308
- 2 run_connection at lib/unixctl.c:342
- 3 unixctl_server_run at lib/unixctl.c:393
- 4 main at vswitchd/ovs-vswitchd.c:140

Fixes: e79a6c833e ("ofproto: Handle flow installation and eviction in upcall.")
Signed-off-by: Wan Junjie <wanjunjie@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-02 18:54:49 +01:00
William Tu
33027afd22 acinclude: Detect avx512 vpopcntdq compiler support.
Ubuntu Xenial 16.04 is using GCC 5.4, which does not support the
target "-mavx512vpopcntdq" and causes an error:
  lib/dpif-netdev-lookup-avx512-gather.c:356:47:
  error: attribute(target("avx512vpopcntdq")) is unknown
GCC 7+ supports vpopcntdq:
https://gcc.gnu.org/gcc-7/changes.html
The patch detects vpopcntdq and disables AVX512 when not found.

Fixes: 1e31489134 ("dpcls-avx512: Enable avx512 vector popcount instruction.")
Reported-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-02-02 14:36:55 +00:00
Maxime Coquelin
0bca7fa1a3 Documentation: Fix userspace Tx steering section.
This patch fixes the thread mode part, as the static
thread-to-txq mapping selection depends on whether the
number of queues is strictly greater than the number of
PMD threads, and not greater or equal.
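
The corrected condition can be written as a one-line predicate. The function and parameter names are hypothetical; only the strict comparison comes from the text above.

```python
def uses_static_txq_mapping(n_txq: int, n_pmd_threads: int) -> bool:
    """Return True if thread mode would pick the static thread-to-txq mapping.

    Hypothetical helper: the comparison is strict, so having exactly as
    many Tx queues as PMD threads is not enough.
    """
    return n_txq > n_pmd_threads
```

For example, 4 Tx queues with 4 PMD threads does not qualify for the static mapping, while 5 Tx queues with 4 PMD threads does.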

The section is also reworded as per Ilya's suggestion.

Fixes: c18e707b2f ("dpif-netdev: Introduce hash-based Tx packet steering mode.")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:38:19 +01:00
Maxime Coquelin
a7f52b7eb6 vswitchd.xml: Add missing tx-steering PMD option.
This patch documents PMD's other_config:tx-steering option.

Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:36:59 +01:00
Dumitru Ceara
53a540e531 ovsdb-idl: ovsdb_idl_loop_destroy must also destroy the committing txn.
Found by AddressSanitizer when running OVN tests:
  Direct leak of 64 byte(s) in 1 object(s) allocated from:
      #0 0x498fb2 in calloc (/ic/ovn-ic+0x498fb2)
      #1 0x5f681e in xcalloc__ ovs/lib/util.c:121:31
      #2 0x5f681e in xzalloc__ ovs/lib/util.c:131:12
      #3 0x5f681e in xzalloc ovs/lib/util.c:165:12
      #4 0x5e3697 in ovsdb_idl_txn_add_map_op ovs/lib/ovsdb-idl.c:4057:29
      #5 0x4d3f25 in update_isb_pb_external_ids ic/ovn-ic.c:576:5
      #6 0x4cc4cc in create_isb_pb ic/ovn-ic.c:716:5
      #7 0x4cc4cc in port_binding_run ic/ovn-ic.c:803:21
      #8 0x4cc4cc in ovn_db_run ic/ovn-ic.c:1700:5
      #9 0x4c9c1c in main ic/ovn-ic.c:1984:17
      #10 0x7f9ad9f4a0b2 in __libc_start_main

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:29:43 +01:00
Martin Varghese
712202ff7d ofproto-dpif-xlate: Fix packet drops with decap action on MPLS Multicast.
Added PT_MPLS_MC support in function xlate_generic_decap_action to fix
packet drops when decap action is applied on packets with packet_type
(ns=1,type=0x8848).
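
For reference, a sketch of how a packet_type value such as (ns=1,type=0x8848) is formed, assuming the usual encoding of a 16-bit namespace followed by a 16-bit namespace-specific type. The helper `decap_handles_mpls` is hypothetical and covers only the two MPLS types; it does not list every packet type the real function accepts.

```python
OFPHTN_ETHERTYPE = 1          # namespace 1: the type is an Ethertype
ETH_TYPE_MPLS = 0x8847        # MPLS unicast
ETH_TYPE_MPLS_MCAST = 0x8848  # MPLS multicast


def packet_type(ns: int, ns_type: int) -> int:
    """Pack namespace and type into one 32-bit packet_type value."""
    return (ns << 16) | ns_type


PT_MPLS = packet_type(OFPHTN_ETHERTYPE, ETH_TYPE_MPLS)
PT_MPLS_MC = packet_type(OFPHTN_ETHERTYPE, ETH_TYPE_MPLS_MCAST)


def decap_handles_mpls(pt: int) -> bool:
    """Hypothetical check: both MPLS unicast and multicast are accepted."""
    return pt in (PT_MPLS, PT_MPLS_MC)
```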

Fixes: 1917ace893 ("Encap & Decap actions for MPLS packet type.")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:27:06 +01:00
Martin Varghese
3ae3e86059 tests: Fix cosmetic errors in system-traffic.at.
Removed extra lines in multiple encap decap mpls actions &
encap decap mpls actions tests.

Converted title of encap decap mpls actions tests to lowercase
for consistency.

Fixes: 1917ace893 ("Encap & Decap actions for MPLS packet type.")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:27:06 +01:00
Dumitru Ceara
5202710a78 python: idl: Clear last_id on reconnect if condition changes in-flight.
When reconnecting, if there are condition changes already sent to the
server but not yet acked, reset the db's 'last-id', essentially clearing
the local cache after reconnect.

This is needed because the client cannot easily differentiate between
the following cases:
a. either the server already processed the requested monitor
   condition change but the FSM was restarted before the
   client was notified.  In this case the client should
   clear its local cache because it's out of sync with the
   monitor view on the server side.
b. OR the server hasn't processed the requested monitor
   condition change yet.
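
Since the two cases cannot be told apart, the decision reduces to a single check. The sketch below is a minimal illustration with hypothetical names; representing a cleared 'last-id' as the all-zero UUID is an assumption of this sketch.

```python
import uuid

# Assumed representation of "no known transaction id".
ZERO_ID = str(uuid.UUID(int=0))


def last_id_for_reconnect(last_id: str, condition_change_in_flight: bool) -> str:
    """Pick the 'last-id' to send when reconnecting.

    Hypothetical helper: if a condition change was sent but not acked,
    the client cannot know whether the server applied it (case a) or not
    (case b), so it gives up the cached state.
    """
    if condition_change_in_flight:
        return ZERO_ID
    return last_id
```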

Fixes: 46d44cf3be ("python: idl: Add monitor_cond_since support.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:23:47 +01:00
Dumitru Ceara
c1691cceac ovsdb-cs: Clear last_id on reconnect if condition changes in-flight.
When reconnecting, if there are condition changes already sent to the
server but not yet acked, reset the db's 'last-id', essentially clearing
the local cache after reconnect.

This is needed because the client cannot easily differentiate between
the following cases:
a. either the server already processed the requested monitor
   condition change but the FSM was restarted before the
   client was notified.  In this case the client should
   clear its local cache because it's out of sync with the
   monitor view on the server side.
b. OR the server hasn't processed the requested monitor
   condition change yet.

Conditions changing at the same time with a reconnection happening are
rare so the performance impact of this patch should be minimal.

Also, the tests are updated to cover the fact that we cannot control
which of the two scenarios ("a" and "b" above) are hit during the test.

Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:23:47 +01:00
Dumitru Ceara
718dc8fca7 python: idl: Resend requested but not acked conditions when reconnecting.
When reconnecting forget about in-flight monitor condition changes
if the user requested a newer condition already.

This matches the C implementation, in ovsdb_cs_db_sync_condition().
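
A toy model of the reconnect rule, assuming three condition slots per table: acked, requested (sent but not acked) and new (set by the user but not yet sent). The class and attribute names are hypothetical and are not the Python IDL's own.

```python
class ToyConditionState:
    """Hypothetical model of per-table monitor condition tracking."""

    def __init__(self, acked):
        self.acked = acked      # condition the server has confirmed
        self.requested = None   # sent to the server, not yet acked
        self.new = None         # set by the user, not yet sent

    def on_reconnect(self):
        if self.requested is not None and self.new is None:
            # Nothing newer is pending: queue the in-flight condition
            # so that it is sent again on the new connection.
            self.new = self.requested
        # Either way the old in-flight request is forgotten.
        self.requested = None
```

If the user already set a newer condition, the in-flight one is simply dropped; otherwise it moves back to the "new" slot and is resent.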

Fixes: 46d44cf3be ("python: idl: Add monitor_cond_since support.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:23:47 +01:00
Ilya Maximets
9632f5551f tests: Add de-serialization check to the json string benchmark.
Since we're testing serialization, it also makes sense to test
the opposite operation.  Should be useful in the future for
exploring possible optimizations.

CMD: $ ./tests/ovstest json-string-benchmark

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
2022-01-31 21:15:25 +01:00