mirror of https://github.com/openvswitch/ovs synced 2025-08-22 01:51:26 +00:00

12 Commits

Author SHA1 Message Date
Ilya Maximets
1de4a08c22 json: Use functions to access json arrays.
The internal implementation of JSON arrays will be changed in future
commits.  Add access functions that users can rely on instead of
accessing the internals of 'struct json' directly, and convert all
the users.  Structure fields are intentionally renamed to make sure
that no code uses the old fields directly.

The json_array() function is removed, as it is no longer needed.
New functions are added: json_array_size(), json_array_at(),
json_array_set() and json_array_pop().  These are enough to cover
all the use cases within OVS.

The change is fairly large; however, it is a much overdue cleanup
that we need even without changing the underlying implementation.
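
For illustration, a minimal sketch of the new access pattern; the
prototypes below are assumptions based on the function names above,
not the exact OVS declarations:

#include <stddef.h>

struct json;                    /* Opaque here; see openvswitch/json.h. */

/* Assumed prototypes for the accessors named above. */
size_t json_array_size(const struct json *);
const struct json *json_array_at(const struct json *, size_t index);

static void
walk_array(const struct json *array)
{
    /* Before: code reached into the array fields of 'struct json'
     * directly.  After: only the accessors are used. */
    for (size_t i = 0; i < json_array_size(array); i++) {
        const struct json *elem = json_array_at(array, i);
        (void) elem;            /* Process the element here. */
    }
}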

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-30 16:53:56 +02:00
Dmitry Porokh
421c94ee14 ovsdb: Introduce and use specialized uuid print functions.
According to profiling data, converting UUIDs to strings is a frequent
operation in some workloads. This typically results in a call to
xasprintf(), which internally calls vsnprintf() twice, first to
calculate the required buffer size, and then to format the string.

This patch introduces specialized functions for printing UUIDs, which
both reduces code duplication and improves performance.

For example, on my laptop, 10,000,000 calls to the new
uuid_to_string() function take 1296 ms, while the same number of
xasprintf() calls using UUID_FMT take 2498 ms.
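
For illustration, a sketch of the old and new call patterns;
UUID_FMT, UUID_ARGS and xasprintf() are existing OVS helpers, while
the exact signature of uuid_to_string() is an assumption here:

#include "openvswitch/uuid.h"
#include "util.h"

/* Old way: two vsnprintf() passes inside xasprintf(). */
char *
uuid_to_string_old(const struct uuid *uuid)
{
    return xasprintf(UUID_FMT, UUID_ARGS(uuid));
}

/* New way: one specialized formatting pass (assumed signature). */
char *
uuid_to_string_new(const struct uuid *uuid)
{
    return uuid_to_string(uuid);
}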

Signed-off-by: Dmitry Porokh <dporokh@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-08 09:28:21 +02:00
Adrian Moreno
9e56549c2b hmap: use short version of safe loops if possible.
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone.  So, use the SHORT version and remove the
extra variable when possible for hmap and all its derived types.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
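
For illustration, a minimal sketch of the short form, assuming the
OVS hmap API; 'struct item' is illustrative:

#include <stdlib.h>
#include "openvswitch/hmap.h"

struct item {
    struct hmap_node node;
};

static void
drain(struct hmap *items)
{
    struct item *it;

    /* Long version: HMAP_FOR_EACH_SAFE (it, next, node, items),
     * with an extra 'struct item *next' declared by the caller.
     * Short version, selected by argument count: */
    HMAP_FOR_EACH_SAFE (it, node, items) {
        hmap_remove(items, &it->node);
        free(it);
    }
}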

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Ilya Maximets
08e9e53373 ovsdb: raft: Fix inability to read the database with DNS host names.
Clustered OVSDB allows DNS names to be used as addresses of raft
members.  However, if DNS resolution fails during the initial
database read, this causes a fatal failure and exit of the
ovsdb-server process.

Also, if the DNS name of a joining server is not resolvable for one
of the followers, this follower will reject append requests for the
new server to join until the name is successfully resolved.  This
makes the follower effectively non-functional while DNS is
unavailable.

To fix the problem, relax the address verification, allowing
validation to pass if only name resolution failed and the address is
otherwise valid.  This allows addresses to be added to the database,
so connections can be established later, when DNS is available.

Additionally, fix the missed initialization of the dns-resolve
module.  Without it, DNS requests are blocking, which causes
unexpected delays at runtime.
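
For illustration, a self-contained sketch of the relaxed check (the
helper below is hypothetical, not the OVS validation code):

#include <netdb.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

static bool
address_is_acceptable(const char *host, const char *port)
{
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM };
    struct addrinfo *res;
    int err = getaddrinfo(host, port, &hints, &res);

    if (!err) {
        freeaddrinfo(res);
        return true;    /* Resolves right now: clearly valid. */
    }
    /* Resolution failed, but the address may still be fine: accept
     * it so the connection can be established once DNS is back. */
    return err == EAI_NONAME || err == EAI_AGAIN;
}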

Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving")
Reported-at: https://bugzilla.redhat.com/2055097
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Ilya Maximets
0de8829540 raft: Don't keep full json objects in memory if no longer needed.
Raft log entries (and the raft database snapshot) contain json
objects of the data.  A follower receives append requests with data
that gets parsed and added to the raft log.  The leader receives
execution requests, parses data out of them, and adds it to the log.
In both cases, ovsdb-server later reads the log with
ovsdb_storage_read(), constructs a transaction and updates the
database.  On followers, these json objects are in the common case
never used again.  The leader may use them to send append requests
or snapshot installation requests to followers.  However, all these
operations (except for ovsdb_storage_read()) just serialize the json
in order to send it over the network.

Json objects are significantly larger than their serialized string
representation.  For example, the snapshot of the database from one
of the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB
as a json object, out of the total 3.8 GB consumed by the
ovsdb-server process.

ovsdb_storage_read() for a given raft entry happens only once in its
lifetime, so after this call we can serialize the json object, store
the string representation, and free the actual json object that
ovsdb will never need again.  This can save a lot of memory and can
also save serialization time, because each raft entry for append
requests and snapshot installation requests is serialized only once,
instead of every time such a request needs to be sent.

JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
pre-serialized data into raft_header and similar json objects.
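
For illustration, a minimal sketch of the serialize-once idea;
json_to_string() and json_destroy() are existing OVS helpers, while
the containing structure and its fields are illustrative:

#include <stdlib.h>
#include "openvswitch/json.h"

struct entry_sketch {
    struct json *data;     /* Parsed json, needed once to apply. */
    char *serialized;      /* Much smaller; kept for re-sending. */
};

static void
entry_compact(struct entry_sketch *e)
{
    if (e->data && !e->serialized) {
        /* Serialize once, then drop the large json tree. */
        e->serialized = json_to_string(e->data, 0);
        json_destroy(e->data);
        e->data = NULL;
    }
}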

One major special case is the creation of a database snapshot.
A snapshot installation request received over the network will be
parsed and read by ovsdb-server just like any other raft log entry.
However, snapshots created locally with raft_store_snapshot() will
never be read back, because they reflect the current state of the
database and hence are already applied.  In this case we can free
the json object right after writing the snapshot to disk.

Tests performed with ovn-heater on a 60-node density-light scenario,
where the on-disk database grows up to 97 MB, show that the average
memory consumption of ovsdb-server Southbound DB processes decreased
by 58% (from 602 MB to 256 MB per process) and that peak memory
consumption decreased by 40% (from 1288 MB to 771 MB).

A test with 120 nodes on the density-heavy scenario with a 270 MB
on-disk database shows the expected 1.5 GB decrease in memory
consumption.  Also, the total CPU time consumed by the Southbound
DB process dropped from 296 to 256 minutes, and the number of
unreasonably long poll intervals fell from 2896 down to 1934.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-01 00:14:42 +02:00
Ilya Maximets
83fbd2e9dc raft: Avoid having more than one snapshot in-flight.
The previous commit 8c2c503bdb0d ("raft: Avoid sending equal
snapshots.") took a "safe" approach and avoided sending only exactly
identical snapshot installation requests.  However, it doesn't make
much sense to send more than one snapshot at a time: if an obsolete
snapshot is installed, the leader will re-send the most recent one.

With this change the leader will have only one snapshot in-flight
per connection.  This reduces backlogs on raft connections in case a
new snapshot is created while an 'install_snapshot_request' is in
progress, or if the election timer is changed in that period.

Also, not tracking the exact 'install_snapshot_request' we've sent
allows the code to be simplified.
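
For illustration, a sketch of the one-in-flight rule; the structure
and field names are illustrative, not the raft implementation:

#include <stdbool.h>

struct conn_sketch {
    bool snapshot_in_flight;   /* One install request at a time. */
};

static bool
may_send_snapshot(struct conn_sketch *conn)
{
    if (conn->snapshot_in_flight) {
        return false;          /* Wait for the reply first. */
    }
    conn->snapshot_in_flight = true;
    return true;               /* Caller sends the latest snapshot. */
}

/* On install_snapshot_reply: clear the flag, and if the installed
 * snapshot is already obsolete, send the most recent one. */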

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Fixes: 8c2c503bdb0d ("raft: Avoid sending equal snapshots.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-11-03 13:01:33 +01:00
Ilya Maximets
8c2c503bdb raft: Avoid sending equal snapshots.
Snapshots are huge.  In some cases we could receive several outdated
append replies from the remote server.  This can happen at high
scale if the remote server is overloaded and not able to process all
the raft requests in time.  In reaction to each outdated append
reply we send a full database snapshot.  While the remote server is
already overloaded, those snapshots get stuck in the jsonrpc backlog
for a long time, making it grow up to a few GB.  Since the remote
server wasn't able to process incoming messages in a timely manner,
it will likely not be able to process the snapshots either, leading
to the same situation with low chances of recovery.  The remote
server will likely get stuck in the 'candidate' state, and other
servers will grow their memory consumption due to growing jsonrpc
backlogs:

jsonrpc|INFO|excessive sending backlog, jsonrpc: ssl:192.16.0.3:6644,
             num of msgs: 3795, backlog: 8838994624.

This patch tries to avoid that situation by avoiding the sending of
equal snapshot install requests.  This helps maintain reasonable
memory consumption and allows the cluster to recover at larger
scale.
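
For illustration, a sketch of the de-duplication idea; the structure
and helper below are hypothetical:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct snap_conn_sketch {
    char *last_request;        /* Serialized last install request. */
};

static bool
should_send(struct snap_conn_sketch *conn, const char *request)
{
    if (conn->last_request && !strcmp(conn->last_request, request)) {
        return false;          /* Equal request already sent. */
    }
    free(conn->last_request);
    conn->last_request = strdup(request);
    return true;
}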

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-05-28 11:46:39 +02:00
Han Zhou
a76ba8254d raft: Save and read new election timer in header snapshot.
This patch stores the latest election timer in the snapshot during
log compression, and when the server restarts it reads the value
back from the log.  Without this, any previous change to the
election timer is lost from the log, and if the server restarts it
will use the default value instead of the changed one.
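
For illustration, a sketch of the idea; the header structure here is
illustrative, not the actual raft_header layout:

#include <stdint.h>

#define ELECTION_BASE_MSEC 1000    /* Assumed default base value. */

struct raft_header_sketch {
    uint64_t snap_index;
    uint64_t election_timer;       /* Saved during log compression. */
};

static uint64_t
election_timer_on_restart(const struct raft_header_sketch *h)
{
    /* Snapshots taken before this change lack the field (0), so
     * fall back to the default in that case. */
    return h->election_timer ? h->election_timer : ELECTION_BASE_MSEC;
}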

Fixes: 8e35461419 ("ovsdb raft: Support leader election time change online.")
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-08-23 14:42:57 -07:00
Han Zhou
8e35461419 ovsdb raft: Support leader election time change online.
A new unixctl command, cluster/change-election-timer, is implemented
to change the leader election timeout base value according to the
scale needs.

The change takes effect upon consensus of the cluster, implemented
through the append-request RPC.  A new field, "election-timer", is
added to the raft log entry for this purpose.
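
For illustration, a sketch of how such a command is wired up using
the existing unixctl API; the handler body and names are
illustrative:

#include "unixctl.h"

static void
cluster_change_election_timer(struct unixctl_conn *conn, int argc,
                              const char *argv[], void *aux)
{
    /* argv[1] is the database name, argv[2] the new base timeout in
     * ms.  A real handler validates the value and appends an
     * election-timer entry to the raft log; the change then takes
     * effect once the cluster reaches consensus on that entry. */
    (void) argc; (void) argv; (void) aux;
    unixctl_command_reply(conn, "change of election timer initiated");
}

void
register_cluster_commands(void *aux)
{
    unixctl_command_register("cluster/change-election-timer",
                             "DB TIME", 2, 2,
                             cluster_change_election_timer, aux);
}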

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-08-21 11:30:08 -07:00
Numan Siddique
f31b8ae7a7 ovn-nbctl: Fix the ovn-nbctl test "LBs - daemon" which fails during rpm build
When 'make check' is called by the mock rpm build (which disables
networking), the test "ovn-nbctl: LBs - daemon" fails when it runs
the command "ovn-nbctl lb-add lb0 30.0.0.1a
192.168.10.10:80,192.168.10.20:80".  ovn-nbctl extracts the vip by
calling the socket-util function inet_parse_active(), and this
function blocks when the libunbound function ub_resolve() is called
further down.  ub_resolve() is a blocking function without a
timeout, and all the ovs/ovn utilities use it.

As reported by Timothy Redaelli, the issue can also be reproduced by
running the commands below:

$ sudo unshare -mn -- sh -c 'ip addr add dev lo 127.0.0.1 && \
  mount --bind /dev/null /etc/resolv.conf && runuser $SUDO_USER'
$ make sandbox SANDBOXFLAGS="--ovn"
$ ovn-nbctl -vsocket_util:off lb-add lb0 30.0.0.1a \
  192.168.10.10:80,192.168.10.20:80

To address this issue, this patch adds a new bool argument,
'resolve_host', to the function inet_parse_active(), so that the
host is resolved only when it is 'true'.

ovn-nbctl/ovn-northd will pass 'false' when calling this function to
parse the load balancer values.
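
For illustration, a sketch of the new call pattern;
inet_parse_active() is the real socket-util function, but its exact
signature at this point is an assumption:

#include <stdbool.h>
#include <sys/socket.h>
#include "socket-util.h"

static bool
parse_lb_vip(const char *vip, struct sockaddr_storage *ss)
{
    /* resolve_host == false: parse and validate only, never call
     * the blocking resolver, so this works without networking. */
    return inet_parse_active(vip, 0, ss, false);
}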

Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1641672
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-11-05 07:11:10 -08:00
Ben Pfaff
1bb011218d socket-util: Make inet_parse_active() and inet_parse_passive() more alike.
Until now, the default_port parameters to these functions have had
different types and different behavior.  There is a reason for this:
it makes sense to listen on a kernel-selected port, but it does not
make sense to connect to a kernel-selected port.  However, this
overlooks the possibility that a caller might want to parse a string
in the format understood by inet_parse_active() without actually
using it to connect to a remote host.  This commit makes the
behavior consistent and updates all the callers to work with the new
semantics.
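
For illustration, a sketch contrasting the two calls; the signatures
of this era are an assumption:

#include <stdbool.h>
#include <sys/socket.h>
#include "socket-util.h"

static void
parse_both(struct sockaddr_storage *ss)
{
    /* Connect-style target: 6640 is used if ":PORT" is omitted, and
     * the string can be parsed without ever connecting. */
    bool active_ok = inet_parse_active("tcp:1.2.3.4", 6640, ss);

    /* Listen-style target: the default (second argument) applies
     * only when the string omits the port; 0 would request a
     * kernel-selected port, meaningful only when listening. */
    bool passive_ok = inet_parse_passive("ptcp:6640", 0, ss);

    (void) active_ok;
    (void) passive_ok;
}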

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
2018-04-16 14:51:09 -07:00
Ben Pfaff
1b1d2e6daa ovsdb: Introduce experimental support for clustered databases.
This commit adds support for OVSDB clustering via Raft.  Please read
ovsdb(7) for information on how to set up a clustered database.  It is
simple and boils down to running "ovsdb-tool create-cluster" on one server
and "ovsdb-tool join-cluster" on each of the others and then starting
ovsdb-server in the usual way on all of them.

Once you have a clustered database, you configure ovn-controller and
ovn-northd to use it by pointing them to all of the servers, e.g. where
previously you might have said "tcp:1.2.3.4" was the database server,
now you say that it is "tcp:1.2.3.4,tcp:5.6.7.8,tcp:9.10.11.12".

This also adds support for database clustering to ovs-sandbox.

Acked-by: Justin Pettit <jpettit@ovn.org>
Tested-by: aginwala <aginwala@asu.edu>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-24 12:04:53 -07:00