2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-31 06:15:47 +00:00
Commit Graph

18755 Commits

Author SHA1 Message Date
Greg Rose
5c1cc74da6 rhel: Fix dual kernel rpm install for RHEL 8.4
RHEL 8.4 is the first of the RHEL 8.x kernels that has broken ABI so
it requires the same sort of fix as we did for several RHEL 7.x kernel
that needed two kernel rpms to work for all minor revisions of the
baseline kernel module.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
2021-09-03 15:58:41 -07:00
Simon Horman
944950dccd AUTHORS: Update Simon Horman
Some of Netronome's activities and people have moved over to Corigine,
including OVS development and myself.

Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-01 12:17:53 +02:00
Ilya Maximets
43e66fc276 ovsdb: monitor: Store serialized json in a json cache.
Same json from a json cache is typically sent to all the clients,
e.g., in case of OVN deployment with ovn-monitor-all=true.

There could be hundreds or thousands connected clients and ovsdb
will serialize the same json object for each of them before sending.

Serializing it once before storing into json cache to speed up
processing.

This change allows to save a lot of CPU cycles and a bit of memory
since we need to store in memory only a string and not the full json
object.

Testing with ovn-heater on 120 nodes using density-heavy scenario
shows reduction of the total CPU time used by Southbound DB processes
from 256 minutes to 147.  Duration of unreasonably long poll intervals
also reduced dramatically from 7 to 2 seconds:

           Count   Min    Max   Median    Mean   95 percentile
 -------------------------------------------------------------
  Before   1934   1012   7480   4302.5   4875.3     7034.3
  After    1909   1004   2730   1453.0   1532.5     2053.6

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-01 00:15:30 +02:00
Ilya Maximets
0de8829540 raft: Don't keep full json objects in memory if no longer needed.
Raft log entries (and raft database snapshot) contains json objects
of the data.  Follower receives append requests with data that gets
parsed and added to the raft log.  Leader receives execution requests,
parses data out of them and adds to the log.  In both cases, later
ovsdb-server reads the log with ovsdb_storage_read(), constructs
transaction and updates the database.  On followers these json objects
in common case are never used again.  Leader may use them to send
append requests or snapshot installation requests to followers.
However, all these operations (except for ovsdb_storage_read()) are
just serializing the json in order to send it over the network.

Json objects are significantly larger than their serialized string
representation.  For example, the snapshot of the database from one of
the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB as
a json object from the total 3.8 GB consumed by ovsdb-server process.

ovsdb_storage_read() for a given raft entry happens only once in a
lifetime, so after this call, we can serialize the json object, store
the string representation and free the actual json object that ovsdb
will never need again.  This can save a lot of memory and can also
save serialization time, because each raft entry for append requests
and snapshot installation requests serialized only once instead of
doing that every time such request needs to be sent.

JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
pre-serialized data into raft_header and similar json objects.

One major special case is creation of a database snapshot.
Snapshot installation request received over the network will be parsed
and read by ovsdb-server just like any other raft log entry.  However,
snapshots created locally with raft_store_snapshot() will never be
read back, because they reflect the current state of the database,
hence already applied.  For this case we can free the json object
right after writing snapshot on disk.

Tests performed with ovn-heater on 60 node density-light scenario,
where on-disk database goes up to 97 MB, shows average memory
consumption of ovsdb-server Southbound DB processes decreased by 58%
(from 602 MB to 256 MB per process) and peak memory consumption
decreased by 40% (from 1288 MB to 771 MB).

Test with 120 nodes on density-heavy scenario with 270 MB on-disk
database shows 1.5 GB memory consumption decrease as expected.
Also, total CPU time consumed by the Southbound DB process reduced
from 296 to 256 minutes.  Number of unreasonably long poll intervals
reduced from 2896 down to 1934.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-01 00:14:42 +02:00
Ilya Maximets
b0bca6f27a json: Add support for partially serialized json objects.
Introducing a new json type JSON_SERIALIZED_OBJECT.  It's not an
actual type that can be seen in a json message on a wire, but
internal type that is intended to hold a serialized version of
some other json object.  For this reason it's defined after the
JSON_N_TYPES to not confuse parsers and other parts of the code
that relies on compliance with RFC 4627.

With this JSON type internal users may construct large JSON objects,
parts of which are already serialized.  This way, while serializing
the larger object, data from JSON_SERIALIZED_OBJECT can be added
directly to the result, without additional processing.

This will be used by next commits to add pre-serialized JSON data
to the raft_header structure, that can be converted to a JSON
before writing the file transaction on disk or sending to other
servers.  Same technique can also be used to pre-serialize json_cache
for ovsdb monitors, this should allow to not perform serialization
for every client and will save some more memory.

Since serialized JSON is just a string, reusing the 'json->string'
pointer for it.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-31 22:19:46 +02:00
Ilya Maximets
c2fb5bdae6 ovs-actions: Convert man page from xml to rST.
This way it's easier to show it on a website as it will be updated
automatically along with the rest of the documentation.

Sphinx doesn't render everything perfectly, but it looks good enough
in both man and html versions.  rST is a bit easier to read and it
takes less space.

Conversion performed manually since I didn't found any good tool
that can actually make the process any faster.

Along the way I replaced versions like x.y.90 with x.y+1, because
it doesn't seem correct to me to refer non-released versions of OVS
in the docs.  Fixed a couple of small mistakes like duplicated
paragraph and reference to a different section by incorrect name.
Also removed bits of xml->nroff conversion code that is not needed
anymore.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@nvidia.com>
2021-08-31 20:56:47 +02:00
Ilya Maximets
748010ff30 json: Optimize string serialization.
Current string serialization code puts all characters one by one.
This is slow because dynamic string needs to perform length checks
on every ds_put_char() and it's also doesn't allow compiler to use
better memory copy operations, i.e. doesn't allow copying few bytes
at once.

Special symbols are rare in a typical database.  Quotes are frequent,
but not too frequent.  In databases created by ovn-kubernetes, for
example, usually there are at least 10 to 50 chars between quotes.
So, it's better to count characters that doesn't require escaping
and use fast data copy for the whole sequential block.

Testing with a synthetic benchmark (included) on my laptop shows
following performance improvement:

   Size      Q  S       Before       After       Diff
 -----------------------------------------------------
 100000      0  0 :    0.227 ms     0.142 ms   -37.4 %
 100000      2  1 :    0.277 ms     0.186 ms   -32.8 %
 100000      10 1 :    0.361 ms     0.309 ms   -14.4 %
 10000000    0  0 :   22.720 ms    12.160 ms   -46.4 %
 10000000    2  1 :   27.470 ms    19.300 ms   -29.7 %
 10000000    10 1 :   37.950 ms    31.250 ms   -17.6 %
 100000000   0  0 :  239.600 ms   126.700 ms   -47.1 %
 100000000   2  1 :  292.400 ms   188.600 ms   -35.4 %
 100000000   10 1 :  387.700 ms   321.200 ms   -17.1 %

Here Q - probability (%) for a character to be a '\"' and
S - probability (%) to be a special character ( < 32).

Testing with a closer to real world scenario shows overall decrease
of the time needed for database compaction by ~5-10 %.  And this
change also decreases CPU consumption in general, because string
serialization is used in many different places including ovsdb
monitors and raft.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2021-08-31 19:04:08 +02:00
Ilya Maximets
7847bf89ea tests: Skip netlink policy test on non-Linux platforms.
FreeBSD tests in Cirrus CI are broken and, I guess, windows tests too:

  89. library.at:258: testing netlink policy ...
  ./library.at:259: ovstest test-netlink-policy ll_addr
  --- /dev/null	2021-08-20 19:02:41.907547000 +0000
  +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/89/stderr
  @@ -0,0 +1 @@
  +ovstest: unknown command 'test-netlink-policy'; use --help for help
  ./library.at:259: exit code was 1, expected 0
  89. library.at:258: 89. netlink policy (library.at:258): FAILED

'tests/test-netlink-policy.c' is built only on Linux, test
must be skipped on all other platforms to unblock CI.

Fixes: bfee9f6c01 ("netlink: Add support for parsing link layer address.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Frode Nordahl <frode.nordahl@canonical.com>
2021-08-28 16:10:02 +02:00
Numan Siddique
7502849e95 ovsdb-idl: Add APIs to query if a table and a column is present.
This patch adds 2 new APIs in the ovsdb-idl client library
 - ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column() to
query if a table and a column is present in the IDL or not.  This
patch also adds IDL helper functions which are auto generated from
the schema which makes it easier for the clients.

These APIs are required for scenarios where the server schema is old and
missing a table or column and the client (built with a new schema
version) does a transaction with the missing table or column.  This
results in a continuous loop of transaction failures.

Related-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-28 02:59:04 +02:00
Frode Nordahl
bfee9f6c01 netlink: Add support for parsing link layer address.
Data retrieved from netlink and friends may include link layer
address.  Add type to nl_attr_type and min/max functions to allow
use of nl_policy_parse with this type of data.

While this will not be used by Open vSwitch itself at this time,
sibling and derived projects want to use the great netlink library
that OVS provides, and it is not possible to safely override the
global nl_attr_type symbol at link time.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-08-20 11:32:52 -07:00
Ilya Maximets
9ab5075bff Set release date for 2.16.0.
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-17 00:26:28 +02:00
Ilya Maximets
3fc34862bf AUTHORS: Add Vlad Buslov.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-16 21:28:17 +02:00
Vlad Buslov
6bb3f363d6 tc: Set action flags for tunnel_key release.
The commit that enabled 'no_percpu' flag for compatible actions missed the
tunnel_key release action code. Add missing call to nl_msg_put_act_flags().

Fixes: 292d5bd9bb ("tc: Set 'no_percpu' flag for compatible actions")
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Tested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-16 19:40:29 +02:00
Paolo Valerio
d08c086e56 netlink-socket: Replace error with txn->error when logging nacked transactions.
in case nl_msg_nlmsgerr returns true which basically means that the
nlmsg_type == NLMSG_ERROR, we need to log the error code, besides the
descriptive representation, stored by nl_msg_nlmsgerr instead of
"error".

Fixes: 72d32ac0b3 ("netlink-socket: Make caller provide message receive buffers.")
Suggested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-16 19:38:13 +02:00
Sriharsha Basavapatna
d9a5fb78d7 netdev-offload-dpdk: Initialize s_tnl dynamic string.
The 's_tnl' member in flow_patterns and flow_actions should be
to set to DS_EMPTY_INITIALIZER, to be consistent with dynamic string
initializations.

Also, there's a potential memory leak of flow_patterns->s_tnl.
Fix this by destroying s_tnl in free_flow_patterns().

Fixes: 507d20e77b ("netdev-offload-dpdk: Support vports flows offload.")
Fixes: be56e063d0 ("netdev-offload-dpdk: Support tunnel pop action.")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-16 19:25:43 +02:00
Sriharsha Basavapatna
0363891d10 dynamic-string: Fix a crash in ds_clone().
ds_clone() crashes while trying to clone an empty dynamic string.
It happens because it doesn't check if memory was allocated and
tries to read from the NULL pointer.  ds_init() doesn't allocate
any memory.

For example:
In netdev_offload_dpdk_flow_create() when an offload request fails,
dump_flow() is called to log a warning message. The 's_tnl' string
in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
via ds_put_format(). If it is not initialized, it crashes later in
dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.

To fix this, check if memory for the src string has been allocated,
before copying it to the dst string.

Fixes: fa44a4a3ff ("ovn-controller: Persist desired conntrack groups.")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-16 19:25:43 +02:00
Harry van Haaren
0b3a5d7add dpif-netdev: fix memory leak in dpif and mfex commands
This patch fixes a memory leak in the commands for DPIF and MFEX
get and set. In order to operate the commands require a pmd_list,
which is currently not freed after it has been used. This issue
was identified by a static analysis tool.

Fixes: 3d8f47bc ("dpif-netdev: Add command line and function pointer for miniflow extract")
Fixes: abb807e2 ("dpif-netdev: Add command to switch dpif implementation.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 15:53:47 +01:00
Harry van Haaren
01cbe1ed41 dpif-netdev: fix memory leak in dpcls subtable set command
This patch fixes a memory leak when the command
"dpif-netdev/subtable-lookup-prio-set" is run, the pmd_list
required to iterate the PMD threads was not being freed.
This issue was identified by a static analysis tool.

Fixes: 3d018c3e ("dpif-netdev: Add subtable lookup prio set command.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 15:52:55 +01:00
kumar Amber
d2ad305a64 dpif-netdev: Fix dead code in mfex command
The commit removes the dead code from the
MFEX set command as highlighted by static tool
analysis.

Fixes: a395b132b7 ("dpif-netdev: Add packet count and core id paramters for study")
Signed-off-by: kumar Amber <kumar.amber@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 15:18:12 +01:00
Cian Ferriter
8a5f055a0b docs/dpdk/bridge: Fix dpif-netdev/miniflow-parser-set formatting
The "name" parameter isn't optional so don't use brackets around it.

Fixes: 5c5c98cec2 ("docs/dpdk/bridge: Add miniflow extract section.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 15:00:15 +01:00
Cian Ferriter
96e513f557 dpif-netdev-unixctl.man: Document miniflow-parser-* CMDs
Document the "ovs-appctl dpif-netdev/miniflow-parser-get" and
"ovs-appctl dpif-netdev/miniflow-parser-set" commands in the vswitchd
manpage.

Fixes: 3d8f47bc04 ("dpif-netdev: Add command line and function pointer for miniflow extract")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 14:59:38 +01:00
Cian Ferriter
3004ebf20f dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs
Document the "ovs-appctl dpif-netdev/subtable-lookup-prio-get" and
"ovs-appctl dpif-netdev/subtable-lookup-prio-set" commands in the
vswitchd manpage.

Fixes: 9ff7cabfd7 ("dpif-netdev: add subtable-lookup-prio-get command.")
Fixes: 3d018c3ea7 ("dpif-netdev: add subtable lookup prio set command.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 14:58:57 +01:00
Harry van Haaren
c15c3df3ac dpcls: fix build on compilers without AVX512-VPOPCNT
This commit adds extra checks around the AVX-512 vpopcnt instruction
enabling, ensuring that in the function where the ISA is enabled the
compiler has also indicated its support for the ISA. This is achieved
by checking the __AVX512VPOPCNTDQ__ define, which the compiler sets if
it is capable of handling the vpopcnt instruction.

If the compiler is not capable of handling vpopcnt, we fall back to
the emulated vpopcnt implementation.

Reported-by: Ian Stokes <ian.stokes@intel.com>
Fixes: 1e31489134 ("dpcls-avx512: Enable avx512 vector popcount instruction.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-10 11:41:18 +01:00
Ilya Maximets
ccb6cc20ff AUTHORS.rst: Add Lin Huang.
Lin Huang authored sevaral patches, but wasn't added to the list of
authors for some reason.  Adding by request.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-09 21:20:49 +02:00
Ilya Maximets
765c8a774d AUTHORS.rst: Add Gaetan Rivet and Wentao Jia.
Wentao Jia provided useful bug reports for ovsdb relay mode.
Gaetan Rivet authored several patches already, but I seem to
forget to add him to the list of authors.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-05 21:47:40 +02:00
Ilya Maximets
2480d09e9f ovsdb: transaction-forward: Fix initialization of the 'sent' hmap node.
'sent_node' is initialized to all zeroes by xzalloc(), but
HMAP_NODE_NULL is not all zeroes.  hmap_node_is_null() is used
to detect if the node is valid, but it will fail and cause
segmentation fault on attempt to remove the non-existent node
from the hash map.  This can happen if client disconnected while
the transaction is not yet forwarded to the relay source:

  Program terminated with signal 11, Segmentation fault.
  0  in hmap_remove at include/openvswitch/hmap.h:293
     293         while (*bucket != node) {
  (gdb) bt
   0  hmap_remove at include/openvswitch/hmap.h:293
   1  ovsdb_txn_forward_unlist at ovsdb/transaction-forward.c:67
   2  ovsdb_txn_forward_destroy at ovsdb/transaction-forward.c:79
   3  ovsdb_trigger_destroy at ovsdb/trigger.c:70
   4  ovsdb_jsonrpc_trigger_complete at ovsdb/jsonrpc-server.c:1192
   5  ovsdb_jsonrpc_trigger_remove__  at ovsdb/jsonrpc-server.c:1204
   6  ovsdb_jsonrpc_trigger_complete_all at ovsdb/jsonrpc-server.c:1223
   7  ovsdb_jsonrpc_session_run at ovsdb/jsonrpc-server.c:546
   8  ovsdb_jsonrpc_session_run_all at ovsdb/jsonrpc-server.c:591
   9  ovsdb_jsonrpc_server_run at ovsdb/jsonrpc-server.c:406
  10  main_loop
  (gdb) print db->txn_forward_sent
  $20 = {buckets = 0x..., one = 0x0, mask = 63, n = 0}
  (gdb) print txn_fwd->sent_node
  $24 = {hash = 0, next = 0x0}

Fix that by correct initialization of the 'sent_node'.

Reported-by: Wentao Jia <wentao.jia@easystack.cn>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051354.html
Fixes: 7964ffe7d2 ("ovsdb: relay: Add support for transaction forwarding.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
2021-08-05 21:35:28 +02:00
Ilya Maximets
a3bd383e94 acinclude: Don't set AVX512-related configuration via CFLAGS.
The correct way to pass configuration options is to define them
inside the config.h.  Additionally, few long lines wrapped and
fixed the unnecessary double check for -mavx512f.

Fixes: abb807e27d ("dpif-netdev: Add command to switch dpif implementation.")
Fixes: 5324b54e60 ("dpif-netdev: Add configure to enable autovalidator at build time.")
Fixes: e90e115a01 ("dpif-netdev: implement subtable lookup validation.")
Fixes: 352b6c7116 ("dpif-lookup: add avx512 gather implementation.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
2021-08-05 11:46:33 +02:00
Eli Britstein
e21e9dcec2 dpif-netdev: Log flow modification in debug level.
Log flow modifications to help debugging.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-02 19:33:03 +02:00
Eli Britstein
6f69e0e30b dpif-netdev: Fix offloads of modified flows.
Association of a mark to a flow is done as part of its offload handling,
in the offloading thread. However, the PMD thread specifies whether an
offload request is an "add" or "modify" by the association of a mark to
the flow.
This is exposed to a race condition. A flow might be created with
actions that cannot be fully offloaded, for example flooding (before MAC
learning), and later modified to have actions that can be fully
offloaded. If the two requests are queued before the offload thread
handling, they are both marked as "add". When the offload thread handles
them, the first request is partially offloaded, and the second one is
ignored as the flow is already considered as offloaded.

Fix it by specifying add/modify of an offload request by the actual flow
state change, without relying on the mark.

Fixes: 3c7330ebf0 ("netdev-offload-dpdk: Support offload of output action.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-02 18:56:56 +02:00
Eli Britstein
0d25621e4d dpif-netdev: Fix flow modification after failure.
dp_netdev_flow_offload_main thread is asynchronous, by the cited commit.
There might be a case where there are modification requests of the same
flow submitted before handled. Then, if the first handling fails, the
rule for the flow is deleted, and the mark is freed. Then, the following
one should not be handled as a modification, but rather as an "add".

Fixes: 02bb2824e5 ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-02 18:56:56 +02:00
Eli Britstein
8917010b05 dpif-netdev: Do not flush PMD offloads on reload.
Before flushing offloads of a removed port was supported by [1], it was
necessary to flush the 'marks'. In doing so, all offloads of the PMD are
removed, include the ones that are not related to the removed port and
that are not modified following this removal. As a result such flows are
evicted from being offloaded, and won't resume offloading.

As PMD offload flush is not necessary, avoid it.

[1] 62d1c28e9c ("dpif-netdev: Flush offload rules upon port deletion.")

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-02 18:56:47 +02:00
Ilya Maximets
cd36a34f33 dpif-netdev: Fix non-atomic read of smc_enable_db.
Atomic variables must be read atomically.  And since we already have
'smc_enable_db' in the PMD context, we need to use it from there to
avoid reading atomically twice.

Also, 'smc_enable_db' is a global configuration, there is no need
to read it per-port or per-rxq.

Fixes: 9ac84a1a36 ("dpif-avx512: Add ISA implementation of dpif.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-02 18:52:44 +02:00
Rosemarie O'Riorden
de15afa50f dpdk: Stop configuring socket-limit with the value of socket-mem.
This change removes the automatic memory limit on start-up of OVS with
DPDK. As DPDK supports dynamic memory allocation, there is no
need to limit the amount of memory available, if not requested.

Currently, if socket-limit is not configured, it is set to the value of
socket-mem. With this change, the user can decide to set it or have no
memory limit.

Removed logs that announce this change and fixed documentation.

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-26 03:31:46 +02:00
Rosemarie O'Riorden
a8621f49d0 dpdk: Remove default values for socket-mem and limit.
This change removes the default values for EAL args socket-mem and
socket-limit.  As DPDK supports dynamic memory allocation, there is no
need to allocate a certain amount of memory on start-up, nor limit the
amount of memory available, if not requested.

Currently, socket-mem has a default value of 1024 when it is not
configured by the user, and socket-limit takes on the value of
socket-mem, 1024, by default.  With this change, socket-mem is not
configured by default, meaning that socket-limit is not either.
Neither, either or both options can be set.

Removed extra logs that announce this change and fixed documentation.

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-26 03:31:46 +02:00
Eli Britstein
1f7f557603 netdev-offload-dpdk: Fix vxlan vni cast-align warnings.
Reported-by: Harry Van Haaren <harry.van.haaren@intel.com>
Fixes: 4e432d6f81 ("netdev-offload-dpdk: Support tnl/push using vxlan encap attribute.")
Fixes: e098c2f966 ("netdev-dpdk-offload: Add vxlan pattern matching function.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-26 01:53:07 +02:00
Eli Britstein
e8cccd3a35 netdev-offload-dpdk: Fix IPv6 rewrite cast-align warning.
Fixes: b6207b1d27 ("netdev-offload-dpdk: Support offload of set IPv6 actions.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-26 01:53:07 +02:00
Ilya Maximets
b8b31d8410 daemon-unix: Fix leak of a fork error message.
19 bytes in 1 blocks are definitely lost in loss record 24 of 121
    at 0x4839748: malloc (vg_replace_malloc.c:306)
    by 0x483BD63: realloc (vg_replace_malloc.c:834)
    by 0x521C26: xrealloc (util.c:149)
    by 0x478F91: ds_reserve (dynamic-string.c:63)
    by 0x47928B: ds_put_format_valist (dynamic-string.c:161)
    by 0x47920A: ds_put_format (dynamic-string.c:142)
    by 0x506DE5: process_status_msg (process.c:0)
    by 0x52A6D0: fork_and_wait_for_startup (daemon-unix.c:284)
    by 0x52A54D: daemonize_start (daemon-unix.c:453)
    by 0x40EB3E: main (ovs-vswitchd.c:91)

Fixes: b925336a36 ("daemon: restart child process if it died before signaling its readiness")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@nvidia.com>
2021-07-26 01:44:48 +02:00
Dumitru Ceara
daf627f459 ovsdb-cs: Perform forced reconnects without a backoff.
The ovsdb-cs layer triggers a forced reconnect in various cases:
- when an inconsistency is detected in the data received from the
  remote server.
- when the remote server is running in clustered mode and transitioned
  to "follower", if the client is configured in "leader-only" mode.
- when explicitly requested by upper layers (e.g., by the user
  application, through the IDL layer).

In such cases it's desirable that reconnection should happen as fast as
possible, without the current exponential backoff maintained by the
underlying reconnect object.  Furthermore, since 3c2d6274bc ("raft:
Transfer leadership before creating snapshots."), leadership changes
inside the clustered database happen more often and, therefore,
"leader-only" clients need to reconnect more often too.

Forced reconnects call jsonrpc_session_force_reconnect() which will not
reset backoff.  To make sure clients reconnect as fast as possible in
the aforementioned scenarios we first call the new API,
jsonrpc_session_reset_backoff(), in ovsdb-cs, for sessions that are in
state CS_S_MONITORING (i.e., the remote is likely still alive and
functioning fine).

jsonrpc_session_reset_backoff() resets the number of backoff-free
reconnect retries to the number of remotes configured for the session,
ensuring that all remotes are retried exactly once with backoff 0.

This commit also updates the Python IDL and jsonrpc implementations.
The Python IDL wasn't tracking the IDL_S_MONITORING state explicitly,
we now do that too.  Tests were also added to make sure the IDL forced
reconnects happen without backoff.

Reported-at: https://bugzilla.redhat.com/1977264
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-23 17:29:36 +02:00
kumar Amber
69b2bdfd3f system-dpdk.at: Fix module not found error for pyhton < 3.6.
This fixes the flake8 error on pyhton version older than 3.6
as ModuleNotFoundError in not available before 3.6.

../../tests/mfex_fuzzy.py:5:8: F821 undefined name 'ModuleNotFoundError'
Makefile:5826: recipe for target 'flake8-check' failed

Since it doesn't really make any sense to catch this exception,
try-except block is just removed.  Additionally the check for
scapy replaced with the more reliable one.  Imports re-ordered,
because standard imports should go first.

Fixes: 50be6715c0 ("test/sytem-dpdk: Add unit test for mfex autovalidator")
Signed-off-by: kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-23 17:25:46 +02:00
Ilya Maximets
43b7d960af netdev-dummy: Silence the 'may be uninitialized' warning.
GCC 11 with -O1 on Feodra 34 emits a false-positive warning like this:

 lib/netdev-dummy.c: In function ‘dummy_packet_stream_run’:
 lib/netdev-dummy.c:284:16: error: ‘n’ may be used uninitialized in this
                                   function [-Werror=maybe-uninitialized]
  284 |             if (retval == n && dp_packet_size(&s->rxbuf) > 2) {
      |                ^

This breaks the build with --enable-Werror.  Initializing 'n' to
avoid the warning.

Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-23 15:40:08 +02:00
Ben Pfaff
f05d6d623e ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.
Open vSwitch supports OpenFlow "instructions", which were introduced in
OpenFlow 1.1 and act like restricted kinds of actions that can only
appear in a particular order and particular circumstances.  OVS did
not support two of these instructions, "write_metadata" and
"goto_table", properly in the case where they appeared in a flow that
needed to be frozen for continuations.

Both of these instructions had the problem that they couldn't be
properly serialized into the stream of actions, because they're not
actions.  This commit fixes that problem in freeze_unroll_actions()
by converting them into equivalent actions for serialization.

goto_table had the additional problem that it was being serialized to
the frozen stream even after it had been executed.  This was already
properly handled in do_xlate_actions() for resubmit, which is almost
equivalent to goto_table, so this commit applies the same fix to
goto_table.  (The commit removes an assertion from the goto_table
implementation, but there wasn't any real value in that assertion and
I thought the code looked cleaner without it.)

This commit adds tests that would have found these bugs.  This includes
adding a variant of each continuation test that uses OF1.3 for
monitor/resume (which is necessary to trigger these bugs) plus specific
tests for continuations with goto_table and write_metadata.  It also
improves the continuation test infrastructure to add more detail on
the problem if a test fails.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reported-by: Grayson Wu <wgrayson@vmware.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/213
Discussed-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386166.html
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-22 12:26:20 -07:00
Wilson Peng
8e808e7f14 datapath-windows:Correct checksum for DNAT action
While testing OVS-windows flows for the DNAT action, the checksum
In TCP header is set incorrectly when TCP offload is enabled by
Default. As a result, the packet will be dropped on receiver linuxVM.

>>>sample flow default configuration on both Windows VM and Linux VM
(src=40.0.1.2,dst=10.150.0.1) --dnat--> (src=40.0.1.2,dst==30.1.0.2)
Without the fix for some TCP packet(40.0.1.2->30.1.0.2 with payload
len 207) the TCP checksum will be pseduo header checksum and the value
is 0x01d6. With the fix the checksum will be 0x47ee, it could be got
the correct TCP checksum on the receiver Linux VM.

Signed-off-by: Wilson Peng<pweisong@vmware.com>
Signed-off-by: Anand Kumar<kumaranand@vmware.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2021-07-21 12:59:06 +03:00
David Marchand
9547987526 Documentation: Remove duplicate words.
This is a simple cleanup with a script of mine.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-07-19 09:33:01 -07:00
Ilya Maximets
4703bc67b7 Prepare for post-2.16.0 (2.16.90).
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 21:52:39 +02:00
Ilya Maximets
45bd6d93f1 Prepare for 2.16.0.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 21:52:23 +02:00
Ilya Maximets
298d4151f4 bond: Fix broken rebalancing after link state changes.
There are 3 constraints for moving hashes from one member to another:

  1. The load difference exceeds ~ 3% of the load of one member.
  2. The difference in load between members exceeds 100,000 bytes.
  3. Moving the hash reduces the load difference by more than 10%.

In the current implementation, if one of the members transitions to
the DOWN state, all hashes assigned to it will be moved to the other
members.  After that, if the member goes UP, it will wait for
rebalancing to get hashes.  But in case we have more than 10 equally
loaded hashes, it will never meet constraint # 3, because each hash
will handle less than 10% of the load.  The situation gets worse when
the number of flows grows and it is almost impossible to transfer any
hash when all 256 hash records are used, which is very likely when we
have few hundred/thousand flows.

As a result, if one of the members goes down and back up while traffic
flows, it will never be used to transmit packets again.  This will not
be fixed even if we completely stop the traffic and start it again,
because the first two constraints will block rebalancing in the
earlier stages, while we have low traffic volume.

Moving a single hash if the destination does not have any hashes,
as it was before commit c460a6a7bc ("ofproto/bond: simplifying the
rebalancing logic"), will not help, because a single hash is not
enough to make the difference in load less than 10% of the total load,
and this member will handle only that one hash forever.

To fix this, let's try to move multiple hashes at the same time to
meet constraint # 3.

The implementation includes sorting the "records" to be able to
collect records with a cumulative load close enough to the ideal value.

Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 21:52:23 +02:00
Mark Gray
b1e517bd2f dpif-netlink: Introduce per-cpu upcall dispatch.
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.

This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:

* On systems with a large number of vports, there is correspondingly
a large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)

This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.

In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:

a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.

Reported-at: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 20:05:03 +02:00
Mark Gray
485e3a13a6 dpif-netlink: Fix report_loss() message.
Fixes: 1579cf677f ("dpif-linux: Implement the API functions to allow multiple handler threads read upcall.")
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 20:05:03 +02:00
Mark Gray
1325debb45 ofproto: Change type of n_handlers and n_revalidators.
'n_handlers' and 'n_revalidators' are declared as type 'size_t'.
However, dpif_handlers_set() requires parameter 'n_handlers' as
type 'uint32_t'. This patch fixes this type mismatch.

Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 20:05:03 +02:00
David Marchand
3222a89d9a dpif-netdev: Report overhead busy cycles per pmd.
Users complained that per rxq pmd usage was confusing: summing those
values per pmd would never reach 100% even if increasing traffic load
beyond pmd capacity.

This is because the dpif-netdev/pmd-rxq-show command only reports "pure"
rxq cycles while some cycles are used in the pmd mainloop and adds up to
the total pmd load.

dpif-netdev/pmd-stats-show does report per pmd load usage.
This load is measured since the last dpif-netdev/pmd-stats-clear call.
On the other hand, the per rxq pmd usage reflects the pmd load on a 10s
sliding window which makes it non trivial to correlate.

Gather per pmd busy cycles with the same periodicity and report the
difference as overhead in dpif-netdev/pmd-rxq-show so that we have all
info in a single command.

Example:
$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 3:
  isolated : true
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 90 %
  overhead:  4 %
pmd thread numa_id 1 core_id 5:
  isolated : false
  port: vhost0            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost1            queue-id:  0 (enabled)   pmd usage: 93 %
  port: vhost2            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost6            queue-id:  0 (enabled)   pmd usage:  0 %
  overhead:  6 %
pmd thread numa_id 1 core_id 31:
  isolated : true
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 86 %
  overhead:  4 %
pmd thread numa_id 1 core_id 33:
  isolated : false
  port: vhost3            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost4            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost5            queue-id:  0 (enabled)   pmd usage: 92 %
  port: vhost7            queue-id:  0 (enabled)   pmd usage:  0 %
  overhead:  7 %

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 17:43:42 +01:00