RHEL 8.4 is the first of the RHEL 8.x kernels that has broken ABI, so
it requires the same sort of fix as we did for the several RHEL 7.x
kernels that needed two kernel RPMs to work for all minor revisions of
the baseline kernel module.
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
The same json from the json cache is typically sent to all clients,
e.g., in case of an OVN deployment with ovn-monitor-all=true.
There could be hundreds or thousands of connected clients, and ovsdb
will serialize the same json object for each of them before sending.
Serialize it once before storing it into the json cache to speed up
processing.
This change saves a lot of CPU cycles and a bit of memory, since we
only need to keep a string in memory and not the full json object.
Testing with ovn-heater on 120 nodes using the density-heavy scenario
shows a reduction of the total CPU time used by Southbound DB processes
from 256 to 147 minutes.  The duration of unreasonably long poll
intervals also dropped dramatically, from 7 to 2 seconds:
         Count   Min    Max   Median    Mean   95 percentile
 -----------------------------------------------------------
 Before   1934   1012   7480   4302.5   4875.3      7034.3
 After    1909   1004   2730   1453.0   1532.5      2053.6
(All times are in milliseconds.)
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Raft log entries (and the raft database snapshot) contain json objects
of the data.  A follower receives append requests with data that gets
parsed and added to the raft log.  The leader receives execution
requests, parses the data out of them and adds it to the log.  In both
cases, ovsdb-server later reads the log with ovsdb_storage_read(),
constructs a transaction and updates the database.  On followers, these
json objects are in the common case never used again.  The leader may
use them to send append requests or snapshot installation requests to
followers.  However, all these operations (except for
ovsdb_storage_read()) just serialize the json in order to send it over
the network.
Json objects are significantly larger than their serialized string
representation.  For example, the snapshot of the database from one of
the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB as
a json object, out of the total 3.8 GB consumed by the ovsdb-server
process.
ovsdb_storage_read() for a given raft entry happens only once in its
lifetime, so after this call we can serialize the json object, store
the string representation and free the actual json object that ovsdb
will never need again.  This can save a lot of memory and also saves
serialization time, because each raft entry for append requests
and snapshot installation requests is serialized only once instead of
every time such a request needs to be sent.
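The serialize-then-free pattern can be sketched as below (a hypothetical
Python model of an entry's lifecycle; the actual logic is C code around
ovsdb_storage_read(), and the names here are illustrative):

```python
import json

# Hypothetical model: after the single read, keep only the string form
# and drop the much larger parsed json object.
class RaftEntry:
    def __init__(self, parsed):
        self.json = parsed        # large parsed json object
        self.serialized = None

    def storage_read(self):
        """Called once per entry: use the object, then keep only its string."""
        obj = self.json
        self.serialized = json.dumps(obj)
        self.json = None          # drop the big object; never needed again
        return obj

    def to_wire(self):
        # append / snapshot-installation requests reuse the cached string
        return self.serialized

entry = RaftEntry({"table": {"row": "value"}})
entry.storage_read()
```

Every later send of this entry costs only a string copy, never a fresh
serialization.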
JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
pre-serialized data into raft_header and similar json objects.
One major special case is the creation of a database snapshot.
A snapshot installation request received over the network will be
parsed and read by ovsdb-server just like any other raft log entry.
However, snapshots created locally with raft_store_snapshot() will
never be read back, because they reflect the current state of the
database and hence are already applied.  For this case we can free the
json object right after writing the snapshot to disk.
Tests performed with ovn-heater on a 60-node density-light scenario,
where the on-disk database grows up to 97 MB, show that the average
memory consumption of ovsdb-server Southbound DB processes decreased
by 58% (from 602 MB to 256 MB per process) and peak memory consumption
decreased by 40% (from 1288 MB to 771 MB).
A test with 120 nodes on the density-heavy scenario with a 270 MB
on-disk database shows the expected 1.5 GB decrease in memory
consumption.  Also, the total CPU time consumed by the Southbound DB
process dropped from 296 to 256 minutes, and the number of unreasonably
long poll intervals from 2896 down to 1934.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Introduce a new json type, JSON_SERIALIZED_OBJECT.  It's not an
actual type that can be seen in a json message on the wire, but an
internal type that is intended to hold a serialized version of
some other json object.  For this reason it's defined after
JSON_N_TYPES, to not confuse parsers and other parts of the code
that rely on compliance with RFC 4627.
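The pass-through behavior during serialization can be modeled like this
(a hypothetical Python sketch; the real type is a C enum value placed
after JSON_N_TYPES in the json library, and these names are illustrative):

```python
import json

# Hypothetical model of an internal-only node type that holds
# already-serialized JSON text.
class SerializedObject:
    def __init__(self, text):
        self.text = text

def to_string(node):
    """Serialize a json tree, inlining pre-serialized parts verbatim."""
    if isinstance(node, SerializedObject):
        return node.text                     # no re-processing needed
    if isinstance(node, dict):
        items = (json.dumps(k) + ':' + to_string(v) for k, v in node.items())
        return '{' + ','.join(items) + '}'
    return json.dumps(node)

inner = SerializedObject(json.dumps({"a": 1}))   # pre-serialized part
wire = to_string({"data": inner, "term": 7})     # embedded verbatim
```

The outer serializer copies the stored text directly into the result, so a
large pre-serialized member costs no further processing.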
With this JSON type internal users may construct large JSON objects,
parts of which are already serialized. This way, while serializing
the larger object, data from JSON_SERIALIZED_OBJECT can be added
directly to the result, without additional processing.
This will be used by the next commits to add pre-serialized JSON data
to the raft_header structure, which can be converted to JSON before
writing a file transaction to disk or sending it to other servers.
The same technique can also be used to pre-serialize the json_cache
for ovsdb monitors; this should allow us to skip serialization for
every client and save some more memory.
Since serialized JSON is just a string, the 'json->string' pointer is
reused for it.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This way it's easier to show it on a website, as it will be updated
automatically along with the rest of the documentation.
Sphinx doesn't render everything perfectly, but it looks good enough
in both the man and html versions.  rST is a bit easier to read and
takes less space.
The conversion was performed manually, since I didn't find any good
tool that could actually make the process any faster.
Along the way I replaced versions like x.y.90 with x.y+1, because
it doesn't seem correct to me to refer to non-released versions of OVS
in the docs.  Fixed a couple of small mistakes, like a duplicated
paragraph and a reference to a different section by an incorrect name.
Also removed the bits of xml->nroff conversion code that are not
needed anymore.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@nvidia.com>
The current string serialization code puts all characters one by one.
This is slow, because the dynamic string needs to perform length checks
on every ds_put_char(), and it also doesn't allow the compiler to use
better memory copy operations, i.e. doesn't allow copying a few bytes
at once.
Special symbols are rare in a typical database.  Quotes are frequent,
but not too frequent.  In databases created by ovn-kubernetes, for
example, there are usually at least 10 to 50 chars between quotes.
So, it's better to count the characters that don't require escaping
and use a fast data copy for the whole sequential block.
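The run-copying idea can be sketched in Python (the real implementation is
C in the json library and uses bulk memory copies; this is an illustrative
model, not the actual code):

```python
import json

# Hypothetical model: instead of emitting one character at a time, track a
# run of characters that need no escaping and copy the whole run at once.
def escape_string(s):
    out = []
    start = 0                      # beginning of the current safe run
    for i, c in enumerate(s):
        if c == '"' or c == '\\' or ord(c) < 32:
            out.append(s[start:i])     # one bulk copy for the whole run
            out.append('\\' + {'"': '"', '\\': '\\'}.get(c, 'u%04x' % ord(c)))
            start = i + 1
    out.append(s[start:])              # trailing run
    return '"' + ''.join(out) + '"'
```

With rare special characters, almost all of the output is produced by the
bulk copies rather than per-character appends.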
Testing with a synthetic benchmark (included) on my laptop shows
following performance improvement:
        Size   Q   S       Before        After      Diff
  --------------------------------------------------------
      100000   0   0 :    0.227 ms     0.142 ms   -37.4 %
      100000   2   1 :    0.277 ms     0.186 ms   -32.8 %
      100000  10   1 :    0.361 ms     0.309 ms   -14.4 %
    10000000   0   0 :   22.720 ms    12.160 ms   -46.4 %
    10000000   2   1 :   27.470 ms    19.300 ms   -29.7 %
    10000000  10   1 :   37.950 ms    31.250 ms   -17.6 %
   100000000   0   0 :  239.600 ms   126.700 ms   -47.1 %
   100000000   2   1 :  292.400 ms   188.600 ms   -35.4 %
   100000000  10   1 :  387.700 ms   321.200 ms   -17.1 %
Here Q is the probability (%) for a character to be a '\"' and
S is the probability (%) for it to be a special character (< 32).
Testing with a scenario closer to the real world shows an overall
decrease of the time needed for database compaction by ~5-10 %.  This
change also decreases CPU consumption in general, because string
serialization is used in many different places, including ovsdb
monitors and raft.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
FreeBSD tests in Cirrus CI are broken and, I guess, Windows tests too:
89. library.at:258: testing netlink policy ...
./library.at:259: ovstest test-netlink-policy ll_addr
--- /dev/null 2021-08-20 19:02:41.907547000 +0000
+++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/89/stderr
@@ -0,0 +1 @@
+ovstest: unknown command 'test-netlink-policy'; use --help for help
./library.at:259: exit code was 1, expected 0
89. library.at:258: 89. netlink policy (library.at:258): FAILED
'tests/test-netlink-policy.c' is built only on Linux, so the test
must be skipped on all other platforms to unblock CI.
Fixes: bfee9f6c01 ("netlink: Add support for parsing link layer address.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Frode Nordahl <frode.nordahl@canonical.com>
This patch adds 2 new APIs to the ovsdb-idl client library
- ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column() - to
query whether a table or a column is present in the IDL or not.  This
patch also adds IDL helper functions, auto-generated from the schema,
which make this easier for clients.
These APIs are required for scenarios where the server schema is old
and missing a table or column, and the client (built with a new schema
version) does a transaction with the missing table or column.  This
results in a continuous loop of transaction failures.
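How a client can use such checks to avoid the failure loop can be sketched
as follows (a hypothetical Python model; the real APIs are the C functions
ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column(), and the
schema/table names below are invented):

```python
# Hypothetical model of the new presence checks.
def server_has_table(schema, table):
    return table in schema

def server_has_column(schema, table, column):
    return server_has_table(schema, table) and column in schema[table]

def build_txn(schema, table, updates):
    """Skip columns the old server doesn't know, instead of retrying a
    transaction that can never succeed."""
    if not server_has_table(schema, table):
        return None
    return {col: val for col, val in updates.items()
            if server_has_column(schema, table, col)}

old_schema = {"Logical_Switch": {"name", "ports"}}   # no 'other_config'
txn = build_txn(old_schema, "Logical_Switch",
                {"name": "sw0", "other_config": {"mcast": "true"}})
```

The client drops the unsupported column up front, so the transaction can
succeed against the older server.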
Related-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Data retrieved from netlink and friends may include a link layer
address.  Add the type to nl_attr_type, and min/max functions, to
allow the use of nl_policy_parse with this type of data.
While this will not be used by Open vSwitch itself at this time,
sibling and derived projects want to use the great netlink library
that OVS provides, and it is not possible to safely override the
global nl_attr_type symbol at link time.
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When nl_msg_nlmsgerr() returns true, which basically means that
nlmsg_type == NLMSG_ERROR, we need to log the error code stored by
nl_msg_nlmsgerr(), in addition to the descriptive representation,
instead of the generic "error".
Fixes: 72d32ac0b3 ("netlink-socket: Make caller provide message receive buffers.")
Suggested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The 's_tnl' member in flow_patterns and flow_actions should be
set to DS_EMPTY_INITIALIZER, to be consistent with other dynamic
string initializations.
Also, there's a potential memory leak of flow_patterns->s_tnl.
Fix this by destroying s_tnl in free_flow_patterns().
Fixes: 507d20e77b ("netdev-offload-dpdk: Support vports flows offload.")
Fixes: be56e063d0 ("netdev-offload-dpdk: Support tunnel pop action.")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ds_clone() crashes while trying to clone an empty dynamic string.
This happens because it doesn't check whether memory was allocated
and tries to read from a NULL pointer.  ds_init() doesn't allocate
any memory.
For example:
In netdev_offload_dpdk_flow_create(), when an offload request fails,
dump_flow() is called to log a warning message.  The 's_tnl' string
in flow_patterns is initialized in vport_to_rte_tunnel() only
conditionally, via ds_put_format().  If it is not initialized, a crash
happens later in dump_flow_attr()->ds_clone()->memcpy() while
dereferencing this string.
To fix this, check whether memory for the src string has been
allocated before copying it to the dst string.
Fixes: fa44a4a3ff ("ovn-controller: Persist desired conntrack groups.")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch fixes a memory leak in the commands for DPIF and MFEX
get and set.  To operate, the commands require a pmd_list, which is
currently not freed after it has been used.  This issue was identified
by a static analysis tool.
Fixes: 3d8f47bc ("dpif-netdev: Add command line and function pointer for miniflow extract")
Fixes: abb807e2 ("dpif-netdev: Add command to switch dpif implementation.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This patch fixes a memory leak when the
"dpif-netdev/subtable-lookup-prio-set" command is run: the pmd_list
required to iterate over the PMD threads was not being freed.
This issue was identified by a static analysis tool.
Fixes: 3d018c3e ("dpif-netdev: Add subtable lookup prio set command.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit removes dead code from the MFEX set command, as
highlighted by static analysis.
Fixes: a395b132b7 ("dpif-netdev: Add packet count and core id paramters for study")
Signed-off-by: kumar Amber <kumar.amber@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The "name" parameter isn't optional, so don't use brackets around it.
Fixes: 5c5c98cec2 ("docs/dpdk/bridge: Add miniflow extract section.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Document the "ovs-appctl dpif-netdev/miniflow-parser-get" and
"ovs-appctl dpif-netdev/miniflow-parser-set" commands in the vswitchd
manpage.
Fixes: 3d8f47bc04 ("dpif-netdev: Add command line and function pointer for miniflow extract")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit adds extra checks around enabling the AVX-512 vpopcnt
instruction, ensuring that in the function where the ISA is enabled
the compiler has also indicated its support for the ISA.  This is
achieved by checking the __AVX512VPOPCNTDQ__ define, which the compiler
sets if it is capable of handling the vpopcnt instruction.
If the compiler is not capable of handling vpopcnt, we fall back to
the emulated vpopcnt implementation.
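The fallback idea can be illustrated with the classic SWAR population
count for one 64-bit lane (a hedged Python sketch; the real fallback is
C code operating on AVX-512 vector registers):

```python
# SWAR (SIMD-within-a-register) popcount emulation for a 64-bit value,
# the kind of technique used when the vpopcnt instruction is unavailable.
MASK64 = (1 << 64) - 1

def popcount64(x):
    x = x - ((x >> 1) & 0x5555555555555555)                # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F               # byte sums
    return ((x * 0x0101010101010101) & MASK64) >> 56       # total in top byte
```

The same arithmetic can be expressed with vector shifts, masks and
multiplies, which is why it serves as a drop-in emulation of vpopcnt.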
Reported-by: Ian Stokes <ian.stokes@intel.com>
Fixes: 1e31489134 ("dpcls-avx512: Enable avx512 vector popcount instruction.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Lin Huang authored several patches, but wasn't added to the list of
authors for some reason.  Adding by request.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Wentao Jia provided useful bug reports for ovsdb relay mode.
Gaetan Rivet authored several patches already, but I seem to have
forgotten to add him to the list of authors.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
'sent_node' is initialized to all zeroes by xzalloc(), but
HMAP_NODE_NULL is not all zeroes.  hmap_node_is_null() is used
to detect whether the node is valid, but it will fail and cause a
segmentation fault on an attempt to remove the non-existent node
from the hash map.  This can happen if a client disconnects while
the transaction is not yet forwarded to the relay source:
Program terminated with signal 11, Segmentation fault.
#0  in hmap_remove at include/openvswitch/hmap.h:293
293         while (*bucket != node) {
(gdb) bt
#0  hmap_remove at include/openvswitch/hmap.h:293
#1  ovsdb_txn_forward_unlist at ovsdb/transaction-forward.c:67
#2  ovsdb_txn_forward_destroy at ovsdb/transaction-forward.c:79
#3  ovsdb_trigger_destroy at ovsdb/trigger.c:70
#4  ovsdb_jsonrpc_trigger_complete at ovsdb/jsonrpc-server.c:1192
#5  ovsdb_jsonrpc_trigger_remove__ at ovsdb/jsonrpc-server.c:1204
#6  ovsdb_jsonrpc_trigger_complete_all at ovsdb/jsonrpc-server.c:1223
#7  ovsdb_jsonrpc_session_run at ovsdb/jsonrpc-server.c:546
#8  ovsdb_jsonrpc_session_run_all at ovsdb/jsonrpc-server.c:591
#9  ovsdb_jsonrpc_server_run at ovsdb/jsonrpc-server.c:406
#10 main_loop
(gdb) print db->txn_forward_sent
$20 = {buckets = 0x..., one = 0x0, mask = 63, n = 0}
(gdb) print txn_fwd->sent_node
$24 = {hash = 0, next = 0x0}
Fix that by correctly initializing 'sent_node'.
Reported-by: Wentao Jia <wentao.jia@easystack.cn>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051354.html
Fixes: 7964ffe7d2 ("ovsdb: relay: Add support for transaction forwarding.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
The correct way to pass configuration options is to define them
inside config.h.  Additionally, a few long lines were wrapped and
the unnecessary double check for -mavx512f was fixed.
Fixes: abb807e27d ("dpif-netdev: Add command to switch dpif implementation.")
Fixes: 5324b54e60 ("dpif-netdev: Add configure to enable autovalidator at build time.")
Fixes: e90e115a01 ("dpif-netdev: implement subtable lookup validation.")
Fixes: 352b6c7116 ("dpif-lookup: add avx512 gather implementation.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Association of a mark with a flow is done as part of its offload
handling, in the offloading thread.  However, the PMD thread decides
whether an offload request is an "add" or a "modify" based on whether
a mark is already associated with the flow.
This is exposed to a race condition.  A flow might be created with
actions that cannot be fully offloaded, for example flooding (before
MAC learning), and later modified to have actions that can be fully
offloaded.  If the two requests are queued before the offload thread
handles them, they are both marked as "add".  When the offload thread
handles them, the first request is partially offloaded, and the second
one is ignored, as the flow is already considered offloaded.
Fix it by deriving add/modify of an offload request from the actual
flow state change, without relying on the mark.
Fixes: 3c7330ebf0 ("netdev-offload-dpdk: Support offload of output action.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The dp_netdev_flow_offload_main thread is asynchronous, as of the cited
commit.  There might be a case where modification requests for the same
flow are submitted before being handled.  Then, if the first handling
fails, the rule for the flow is deleted and the mark is freed.  The
following request should then be handled not as a modification, but as
an "add".
Fixes: 02bb2824e5 ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Before flushing offloads of a removed port was supported by [1], it was
necessary to flush the 'marks'.  In doing so, all offloads of the PMD
are removed, including the ones that are not related to the removed
port and that are not modified following this removal.  As a result,
such flows are evicted from being offloaded and won't resume
offloading.
As the PMD offload flush is no longer necessary, avoid it.
[1] 62d1c28e9c ("dpif-netdev: Flush offload rules upon port deletion.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Atomic variables must be read atomically.  Since we already have
'smc_enable_db' in the PMD context, use it from there to avoid reading
the atomic twice.
Also, 'smc_enable_db' is a global configuration, so there is no need
to read it per-port or per-rxq.
Fixes: 9ac84a1a36 ("dpif-avx512: Add ISA implementation of dpif.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This change removes the automatic memory limit on start-up of OVS with
DPDK.  As DPDK supports dynamic memory allocation, there is no need to
limit the amount of memory available if it is not requested.
Currently, if socket-limit is not configured, it is set to the value of
socket-mem.  With this change, the user can decide to set it or to have
no memory limit.
Removed the logs that announced this change and fixed the
documentation.
Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This change removes the default values for the EAL args socket-mem and
socket-limit.  As DPDK supports dynamic memory allocation, there is no
need to allocate a certain amount of memory on start-up, nor to limit
the amount of memory available, if not requested.
Currently, socket-mem has a default value of 1024 when it is not
configured by the user, and socket-limit takes on the value of
socket-mem, 1024, by default.  With this change, socket-mem is not
configured by default, which means that socket-limit is not either.
Neither, either or both options can be set.
Removed the extra logs that announced this change and fixed the
documentation.
Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: b6207b1d27 ("netdev-offload-dpdk: Support offload of set IPv6 actions.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
19 bytes in 1 blocks are definitely lost in loss record 24 of 121
at 0x4839748: malloc (vg_replace_malloc.c:306)
by 0x483BD63: realloc (vg_replace_malloc.c:834)
by 0x521C26: xrealloc (util.c:149)
by 0x478F91: ds_reserve (dynamic-string.c:63)
by 0x47928B: ds_put_format_valist (dynamic-string.c:161)
by 0x47920A: ds_put_format (dynamic-string.c:142)
by 0x506DE5: process_status_msg (process.c:0)
by 0x52A6D0: fork_and_wait_for_startup (daemon-unix.c:284)
by 0x52A54D: daemonize_start (daemon-unix.c:453)
by 0x40EB3E: main (ovs-vswitchd.c:91)
Fixes: b925336a36 ("daemon: restart child process if it died before signaling its readiness")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@nvidia.com>
The ovsdb-cs layer triggers a forced reconnect in various cases:
- when an inconsistency is detected in the data received from the
remote server.
- when the remote server is running in clustered mode and transitioned
to "follower", if the client is configured in "leader-only" mode.
- when explicitly requested by upper layers (e.g., by the user
application, through the IDL layer).
In such cases it's desirable that reconnection should happen as fast as
possible, without the current exponential backoff maintained by the
underlying reconnect object. Furthermore, since 3c2d6274bc ("raft:
Transfer leadership before creating snapshots."), leadership changes
inside the clustered database happen more often and, therefore,
"leader-only" clients need to reconnect more often too.
Forced reconnects call jsonrpc_session_force_reconnect(), which does
not reset the backoff.  To make sure clients reconnect as fast as
possible in
the aforementioned scenarios we first call the new API,
jsonrpc_session_reset_backoff(), in ovsdb-cs, for sessions that are in
state CS_S_MONITORING (i.e., the remote is likely still alive and
functioning fine).
jsonrpc_session_reset_backoff() resets the number of backoff-free
reconnect retries to the number of remotes configured for the session,
ensuring that all remotes are retried exactly once with backoff 0.
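The reset semantics described above can be modeled as follows (a toy
Python sketch; the real API is the C function
jsonrpc_session_reset_backoff(), and the class below is invented):

```python
# Hypothetical model: grant one backoff-free reconnect attempt per
# configured remote; after those are used, normal backoff resumes.
class Session:
    def __init__(self, remotes):
        self.remotes = remotes
        self.backoff = 8.0            # current exponential backoff, seconds
        self.free_tries = 0

    def reset_backoff(self):
        # analogue of jsonrpc_session_reset_backoff(): each configured
        # remote is retried exactly once without waiting
        self.free_tries = len(self.remotes)

    def next_delay(self):
        if self.free_tries > 0:
            self.free_tries -= 1
            return 0.0
        return self.backoff

s = Session(["db1", "db2", "db3"])
s.reset_backoff()
delays = [s.next_delay() for _ in range(4)]
```

All three remotes are tried immediately; only a fourth attempt would wait
out the accumulated backoff.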
This commit also updates the Python IDL and jsonrpc implementations.
The Python IDL wasn't tracking the IDL_S_MONITORING state explicitly;
we now do that too.  Tests were also added to make sure that forced
IDL reconnects happen without backoff.
Reported-at: https://bugzilla.redhat.com/1977264
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This fixes the flake8 error on Python versions older than 3.6,
as ModuleNotFoundError is not available before 3.6.
../../tests/mfex_fuzzy.py:5:8: F821 undefined name 'ModuleNotFoundError'
Makefile:5826: recipe for target 'flake8-check' failed
Since it doesn't really make any sense to catch this exception, the
try-except block is simply removed.  Additionally, the check for
scapy is replaced with a more reliable one.  Imports are re-ordered,
because standard imports should go first.
Fixes: 50be6715c0 ("test/sytem-dpdk: Add unit test for mfex autovalidator")
Signed-off-by: kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GCC 11 with -O1 on Fedora 34 emits a false-positive warning like this:
lib/netdev-dummy.c: In function ‘dummy_packet_stream_run’:
lib/netdev-dummy.c:284:16: error: ‘n’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
284 | if (retval == n && dp_packet_size(&s->rxbuf) > 2) {
| ^
This breaks the build with --enable-Werror.  Initialize 'n' to
avoid the warning.
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Open vSwitch supports OpenFlow "instructions", which were introduced in
OpenFlow 1.1 and act like restricted kinds of actions that can only
appear in a particular order and particular circumstances. OVS did
not support two of these instructions, "write_metadata" and
"goto_table", properly in the case where they appeared in a flow that
needed to be frozen for continuations.
Both of these instructions had the problem that they couldn't be
properly serialized into the stream of actions, because they're not
actions. This commit fixes that problem in freeze_unroll_actions()
by converting them into equivalent actions for serialization.
goto_table had the additional problem that it was being serialized to
the frozen stream even after it had been executed. This was already
properly handled in do_xlate_actions() for resubmit, which is almost
equivalent to goto_table, so this commit applies the same fix to
goto_table. (The commit removes an assertion from the goto_table
implementation, but there wasn't any real value in that assertion and
I thought the code looked cleaner without it.)
This commit adds tests that would have found these bugs. This includes
adding a variant of each continuation test that uses OF1.3 for
monitor/resume (which is necessary to trigger these bugs) plus specific
tests for continuations with goto_table and write_metadata. It also
improves the continuation test infrastructure to add more detail on
the problem if a test fails.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reported-by: Grayson Wu <wgrayson@vmware.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/213
Discussed-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386166.html
Acked-by: Ilya Maximets <i.maximets@ovn.org>
While testing OVS-Windows flows for the DNAT action, the checksum
in the TCP header is set incorrectly when TCP offload is enabled by
default.  As a result, the packet will be dropped on the receiver
Linux VM.
Sample flow, default configuration on both Windows VM and Linux VM:
(src=40.0.1.2,dst=10.150.0.1) --dnat--> (src=40.0.1.2,dst=30.1.0.2)
Without the fix, for some TCP packets (40.0.1.2->30.1.0.2 with payload
len 207) the TCP checksum will be the pseudo-header checksum, with the
value 0x01d6.  With the fix, the checksum will be 0x47ee, so the
correct TCP checksum is seen on the receiver Linux VM.
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
There are 3 constraints for moving hashes from one member to another:
1. The load difference exceeds ~ 3% of the load of one member.
2. The difference in load between members exceeds 100,000 bytes.
3. Moving the hash reduces the load difference by more than 10%.
In the current implementation, if one of the members transitions to
the DOWN state, all hashes assigned to it will be moved to the other
members. After that, if the member goes UP, it will wait for
rebalancing to get hashes.  But in case we have more than 10 equally
loaded hashes, it will never meet constraint #3, because each hash
will handle less than 10% of the load.  The situation gets worse as
the number of flows grows, and it is almost impossible to transfer any
hash when all 256 hash records are used, which is very likely when we
have a few hundred or thousand flows.
As a result, if one of the members goes down and back up while traffic
flows, it will never be used to transmit packets again. This will not
be fixed even if we completely stop the traffic and start it again,
because the first two constraints will block rebalancing in the
earlier stages, while we have low traffic volume.
Moving a single hash if the destination does not have any hashes,
as it was before commit c460a6a7bc ("ofproto/bond: simplifying the
rebalancing logic"), will not help, because a single hash is not
enough to make the difference in load less than 10% of the total load,
and this member will handle only that one hash forever.
To fix this, let's try to move multiple hashes at the same time to
meet constraint #3.
The implementation includes sorting the "records" to be able to
collect records with a cumulative load close enough to the ideal value.
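A simplified sketch of the multi-hash selection (a hypothetical Python
model that ignores constraints #1 and #2; the real C code sorts the bond
records and applies all three constraints):

```python
# Hypothetical model: walk hashes from most to least loaded and greedily
# collect a set whose cumulative load approaches half of the load
# difference 'delta' between the two members, i.e. the ideal transfer.
def pick_hashes_to_move(loads, delta):
    target = delta / 2.0
    moved, moved_load = [], 0.0
    for i, load in sorted(enumerate(loads), key=lambda e: -e[1]):
        if moved_load + load <= target:
            moved.append(i)
            moved_load += load
    return moved, moved_load

# ten equally loaded hashes on a busy member, idle peer: move half of them
moved, moved_load = pick_hashes_to_move([100] * 10, delta=1000)
```

Moving five of the ten hashes closes the whole load gap, which no single
hash could do on its own under constraint #3.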
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.
This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:
* On systems with a large number of vports, there is correspondingly
a large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)
This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.
In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:
a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.
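The dispatch described in (a)-(c) can be sketched as a toy model (Python
for illustration only; the real mechanism lives in the kernel module and
the userspace handler code, and the names below are invented):

```python
# Hypothetical model of per-cpu upcall dispatch: the socket is chosen
# from the CPU executing the upcall, so the socket count scales with
# CPUs rather than with vports.
def upcall_socket(sockets, cpu_id):
    return sockets[cpu_id % len(sockets)]

sockets = ["nl-sock-%d" % i for i in range(4)]
# packets of one flow land on one CPU (RSS), hence always the same socket
chosen = {upcall_socket(sockets, cpu) for cpu in (2, 2, 2)}
```

Since a flow's packets always hash to the same CPU, they always reach the
same socket and the same handler thread, preserving per-flow ordering and
avoiding the thundering herd.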
Reported-at: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
'n_handlers' and 'n_revalidators' are declared as type 'size_t'.
However, dpif_handlers_set() requires parameter 'n_handlers' as
type 'uint32_t'. This patch fixes this type mismatch.
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Users complained that per-rxq pmd usage was confusing: summing those
values per pmd would never reach 100%, even when increasing the traffic
load beyond pmd capacity.
This is because the dpif-netdev/pmd-rxq-show command only reports
"pure" rxq cycles, while some cycles are used in the pmd mainloop and
add to the total pmd load.
dpif-netdev/pmd-stats-show does report per-pmd load usage.
This load is measured since the last dpif-netdev/pmd-stats-clear call.
On the other hand, the per-rxq pmd usage reflects the pmd load over a
10s sliding window, which makes it non-trivial to correlate.
Gather per-pmd busy cycles with the same periodicity and report the
difference as overhead in dpif-netdev/pmd-rxq-show, so that we have
all the info in a single command.
Example:
  $ ovs-appctl dpif-netdev/pmd-rxq-show
  pmd thread numa_id 1 core_id 3:
    isolated : true
    port: dpdk0   queue-id: 0 (enabled)   pmd usage: 90 %
    overhead:  4 %
  pmd thread numa_id 1 core_id 5:
    isolated : false
    port: vhost0  queue-id: 0 (enabled)   pmd usage:  0 %
    port: vhost1  queue-id: 0 (enabled)   pmd usage: 93 %
    port: vhost2  queue-id: 0 (enabled)   pmd usage:  0 %
    port: vhost6  queue-id: 0 (enabled)   pmd usage:  0 %
    overhead:  6 %
  pmd thread numa_id 1 core_id 31:
    isolated : true
    port: dpdk1   queue-id: 0 (enabled)   pmd usage: 86 %
    overhead:  4 %
  pmd thread numa_id 1 core_id 33:
    isolated : false
    port: vhost3  queue-id: 0 (enabled)   pmd usage:  0 %
    port: vhost4  queue-id: 0 (enabled)   pmd usage:  0 %
    port: vhost5  queue-id: 0 (enabled)   pmd usage: 92 %
    port: vhost7  queue-id: 0 (enabled)   pmd usage:  0 %
    overhead:  7 %
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>