The internal implementation of the JSON array will be changed in future
commits. Add access functions that users can rely on instead of
accessing the internals of 'struct json' directly, and convert all the
users. Structure fields are intentionally renamed to make sure that
no code is using the old fields directly.
The json_array() function is removed, as it is no longer needed. New
functions are added: json_array_size(), json_array_at(), json_array_set()
and json_array_pop(). These are enough to cover all the use cases
within OVS.
The change is fairly large; however, IMO, it's a much overdue cleanup
that we need even without changing the underlying implementation.
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
We'll be changing the way strings are stored, so the direct access
will not be safe anymore. Change all the users to use the proper
API as they should have been doing anyway. This also means splitting
the handling of strings and serialized objects in most cases as
they will be treated differently.
The only code outside of the json implementation for which direct access
is preserved is substitute_uuids() in test-ovsdb.c. It's an unusual
string manipulation that is only needed for testing, so it doesn't
seem worth adding a new API function. We could introduce something
like json_string_replace() if this use case appears elsewhere
in the future.
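The string/serialized-object split could be sketched as follows; the
layout and accessor bodies are simplified assumptions, not the actual
lib/json.h definitions:

```c
#include <assert.h>
#include <string.h>

/* Sketch: with JSON_STRING and JSON_SERIALIZED_OBJECT handled separately,
 * callers use the accessor matching the node's type instead of reaching
 * into the internals.  Simplified layout. */
enum json_type {
    JSON_STRING,
    JSON_SERIALIZED_OBJECT,
};

struct json {
    enum json_type type;
    char *str;            /* String value or serialized object body. */
};

const char *
json_string(const struct json *json)
{
    assert(json->type == JSON_STRING);
    return json->str;
}

const char *
json_serialized_object(const struct json *json)
{
    assert(json->type == JSON_SERIALIZED_OBJECT);
    return json->str;
}
```

The asserts make any accidental mixing of the two kinds fail loudly,
which is exactly what direct field access could not guarantee.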
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
According to profiling data, converting UUIDs to strings is a frequent
operation in some workloads. This typically results in a call to
xasprintf(), which internally calls vsnprintf() twice, first to
calculate the required buffer size, and then to format the string.
This patch introduces specialized functions for printing UUIDs, which
both reduces code duplication and improves performance.
For example, on my laptop, 10,000,000 calls to the new uuid_to_string()
function take 1296 ms, while the same number of xasprintf() calls using
UUID_FMT take 2498 ms.
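A single-pass formatting sketch that illustrates the idea, assuming the
OVS-style 'struct uuid' layout and UUID_FMT; the actual uuid_to_string()
added by this patch may format the digits differently:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Layout and format modeled on OVS's lib/uuid.h. */
struct uuid {
    uint32_t parts[4];
};

#define UUID_LEN 36
#define UUID_FMT "%08x-%04x-%04x-%04x-%04x%08x"

/* The output length is fixed, so we can allocate UUID_LEN + 1 bytes up
 * front instead of letting xasprintf() run vsnprintf() twice (once to
 * size the buffer, once to format). */
char *
uuid_to_string(const struct uuid *uuid)
{
    char *s = malloc(UUID_LEN + 1);

    if (s) {
        snprintf(s, UUID_LEN + 1, UUID_FMT,
                 uuid->parts[0],
                 uuid->parts[1] >> 16, uuid->parts[1] & 0xffff,
                 uuid->parts[2] >> 16, uuid->parts[2] & 0xffff,
                 uuid->parts[3]);
    }
    return s;
}
```

A further speedup, hinted at by the measurements above, would come from
emitting the hex digits directly instead of going through snprintf() at
all.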
Signed-off-by: Dmitry Porokh <dporokh@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Allow setting all the options for the source connection, not only the
inactivity probe interval.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Short-lived ovsdb clients, e.g. ovn-nbctl/ovn-sbctl invoked to read some
metrics from the OVN NB/SB server, don't really need to be aware of db
changes, because they exit immediately after getting the initial response
for the requested data. In such use cases, however, the clients still
send the 'set_db_change_aware' request, which results in server-side
error logs when the server tries to send out the response for the
'set_db_change_aware' request, because by that moment the client that is
supposed to receive the response has already closed the connection and
exited. E.g.:
2023-01-10T18:23:29.431Z|00007|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
2023-01-10T18:23:29.431Z|00008|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)
To avoid such problems, this patch provides an API to allow a client to
choose to not send the 'set_db_change_aware' request.
There was an earlier attempt to fix this [0], but it was not accepted
back then as discussed in the email [1]. It was also discussed in the
emails that an alternative approach is to use notification instead of
request, but that would require protocol changes and taking backward
compatibility into consideration. So this patch takes a different
approach and tries to keep the change small.
[0] http://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-git-send-email-dceara@redhat.com/
[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050919.html
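The opt-out could look roughly like this; the names below are
illustrative stand-ins, not the actual API added by the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: a client-side flag that, when cleared, suppresses the
 * 'set_db_change_aware' request on connection.  Hypothetical names. */
struct client_sketch {
    bool change_aware;    /* Defaults to true: old behavior preserved. */
    int requests_sent;
};

static void
client_set_db_change_aware(struct client_sketch *c, bool aware)
{
    c->change_aware = aware;
}

static void
client_on_connected(struct client_sketch *c)
{
    if (c->change_aware) {
        /* Long-lived clients still request change-aware mode. */
        c->requests_sent++;
    }
    /* Short-lived clients skip the request, so the server never tries
     * to reply to an already-closed connection. */
}
```

Because the flag defaults to true, existing long-lived clients keep
their behavior and only short-lived tools need to opt out.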
Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When initializing a monitor table the default monitor condition is
[True] which matches the behavior of the server (to send all rows of
that table). There's no need to include this default condition in the
initial monitor request so we can consider it implicitly acked by the
server.
This fixes the incorrect (one too large) expected condition sequence
number reported by ovsdb_idl_set_condition() when the application tries
to set a [True] condition for a new table.
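The bookkeeping change can be sketched like this (simplified state and
hypothetical names; the real per-table condition tracking in ovsdb-cs is
richer):

```c
#include <assert.h>
#include <stdbool.h>

/* Per-table condition state, assuming (as in the commit) that the
 * default [True] condition never needs to be sent and thus starts out
 * as already acked by the server. */
struct table_cond {
    bool acked_is_true;   /* Condition the server has (implicitly) acked. */
    bool pending;         /* A condition change is still outstanding. */
};

static void
table_cond_init(struct table_cond *tc)
{
    /* [True] matches the server's default behavior: implicitly acked. */
    tc->acked_is_true = true;
    tc->pending = false;
}

/* Returns the seqno at which updates matching the condition will have
 * arrived: the current seqno if nothing needs to be sent, current + 1
 * otherwise. */
static unsigned int
set_condition_true(struct table_cond *tc, unsigned int cond_seqno)
{
    if (tc->acked_is_true && !tc->pending) {
        return cond_seqno;          /* Nothing to request from the server. */
    }
    tc->pending = true;
    return cond_seqno + 1;
}
```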
Reported-by: Numan Siddique <numans@ovn.org>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using the SHORT version of the *_SAFE loops makes the code cleaner and
less error-prone. So, use the SHORT version and remove the extra
variable where possible for hmap and all its derived types.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
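The argument-count dispatch can be demonstrated on a toy singly-linked
list; the overloading technique mirrors the one described above, while
all names here are illustrative rather than the actual hmap macros:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    struct node *next;
};

/* Long version: the caller supplies the extra NEXT variable. */
#define LIST_FOR_EACH_SAFE_LONG(NODE, NEXT, HEAD)                       \
    for ((NODE) = (HEAD); (NODE) && ((NEXT) = (NODE)->next, 1);         \
         (NODE) = (NEXT))

/* Short version: the temporary is declared by the macro itself, so no
 * extra variable leaks into the enclosing scope. */
#define LIST_FOR_EACH_SAFE_SHORT(NODE, HEAD)                            \
    for (struct node *next__ = ((NODE) = (HEAD),                        \
                                (NODE) ? (NODE)->next : NULL);          \
         (NODE);                                                        \
         (NODE) = next__, next__ = (NODE) ? (NODE)->next : NULL)

/* Overload: pick LONG for 3 arguments, SHORT for 2, by letting the
 * argument list shift which name lands in the NAME slot. */
#define SAFE_MACRO_PICK(_1, _2, _3, NAME, ...) NAME
#define LIST_FOR_EACH_SAFE(...)                                         \
    SAFE_MACRO_PICK(__VA_ARGS__,                                        \
                    LIST_FOR_EACH_SAFE_LONG,                            \
                    LIST_FOR_EACH_SAFE_SHORT)(__VA_ARGS__)

/* Frees every node; safe because the next pointer is saved before the
 * current node is freed. */
static size_t
free_all(struct node *head)
{
    struct node *node;
    size_t n = 0;

    LIST_FOR_EACH_SAFE (node, head) {   /* Short form: no 'next' needed. */
        free(node);
        n++;
    }
    return n;
}
```

The same macro name still accepts the 3-argument long form, so existing
callers compile unchanged while new code can drop the extra variable.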
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Even though relays can be scaled to a large number of servers to handle
many more clients, the lack of transaction history may cause significant
load if clients are re-connecting. E.g., during an upgrade of a
large-scale OVN deployment, relays can be taken down one by one, forcing
all the clients of one relay to jump to other ones. And all these
clients will download the database from scratch from a new relay.
Since the relay itself supports monitor_cond_since connections to the
main cluster, it receives the last transaction id along with each
update. Since these transaction ids are 'eid's of actual transactions,
they can be used by the relay for a transaction history.
A relay may not receive all the transaction ids, because the main
cluster may combine several changes into a single monitor update.
However, all relays will likely receive the same updates with the same
transaction ids, so the case where a transaction id can not be found
after re-connection between relays should not be very common. If some
id is missing on the relay (i.e. the update was merged with some other
update and a newer id was used), the client will just re-download the
database as if there was a normal transaction history miss.
The OVSDB client synchronization module is updated to provide the last
transaction id along with each update. The relay module is updated to
use these ids as transaction ids. If the ids are zero, the relay
decides that the main server doesn't support transaction ids and
disables the transaction history accordingly.
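The relay-side decision can be sketched as follows (illustrative names
and a scalar stand-in for the actual 'eid' uuid):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch: non-zero transaction ids from monitor updates feed the
 * history; an all-zero id means the main server does not supply ids,
 * so history is disabled. */
struct relay_history {
    bool enabled;
    uint64_t last_txid;      /* Stand-in for a full uuid 'eid'. */
    unsigned int n_entries;
};

static void
relay_history_update(struct relay_history *h, uint64_t txid)
{
    if (!h->enabled) {
        return;
    }
    if (!txid) {
        /* Main server doesn't support transaction ids. */
        h->enabled = false;
        h->n_entries = 0;
        return;
    }
    h->last_txid = txid;
    h->n_entries++;
}
```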
ovsdb_txn_replay_commit() is used instead of
ovsdb_txn_propose_commit_block(), so transactions are added to the
history. This can be done because relays have no file storage, so there
is no need to write anything.
Relay tests are modified to test both standalone and clustered databases
as a main server. Checks are added to ensure that all servers receive
the same transaction ids in monitor updates.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When reconnecting, if there are condition changes already sent to the
server but not yet acked, reset the db's 'last-id', essentially clearing
the local cache after reconnect.
This is needed because the client cannot easily differentiate between
the following cases:
a. either the server already processed the requested monitor
condition change but the FSM was restarted before the
client was notified. In this case the client should
clear its local cache because it's out of sync with the
monitor view on the server side.
b. OR the server hasn't processed the requested monitor
condition change yet.
Condition changes coinciding with a reconnection are rare, so the
performance impact of this patch should be minimal.
Also, the tests are updated to cover the fact that we cannot control
which of the two scenarios ("a" and "b" above) is hit during the test.
Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The current code doesn't use the last id received in the monitor reply.
That may result in re-downloading the database content if the
re-connection happens after receiving the initial monitor reply, but
before receiving any other database updates.
Fixes: 1c337c43ac1c ("ovsdb-idl: Break into two layers.")
Reported-at: https://bugzilla.redhat.com/2044624
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g. to create a diff of two rows or to clone a row into a transaction.
All atoms, except for strings, contain a simple value that can be copied
efficiently, but duplicating strings every time has a significant
performance impact.
Introduce a new reference-counted structure, 'ovsdb_atom_string', that
makes it possible to avoid copying strings every time by just increasing
a reference counter.
This change increases transaction throughput in benchmarks by up to 2x
for standalone databases and 3x for clustered databases, i.e. the number
of transactions that ovsdb-server can handle per second. It also
noticeably reduces the memory consumption of ovsdb-server.
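A sketch of such a reference-counted string; the helper names and error
handling are simplified assumptions, though storing the bytes inline
with the count matches the general shape described above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cloning becomes a counter increment instead of a strdup(). */
struct ovsdb_atom_string {
    size_t n_refs;
    char string[];        /* Flexible array member holding the bytes. */
};

static struct ovsdb_atom_string *
atom_string_create(const char *s)
{
    size_t len = strlen(s);
    struct ovsdb_atom_string *as = malloc(sizeof *as + len + 1);

    as->n_refs = 1;
    memcpy(as->string, s, len + 1);
    return as;
}

static struct ovsdb_atom_string *
atom_string_clone(struct ovsdb_atom_string *as)
{
    as->n_refs++;          /* O(1) "copy", regardless of string length. */
    return as;
}

static void
atom_string_unref(struct ovsdb_atom_string *as)
{
    if (!--as->n_refs) {
        free(as);
    }
}
```

A single allocation holds both the count and the bytes, which also
helps memory consumption compared to a separate header plus strdup'd
buffer.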
The next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
The ovsdb-cs layer triggers a forced reconnect in various cases:
- when an inconsistency is detected in the data received from the
remote server.
- when the remote server is running in clustered mode and transitioned
to "follower", if the client is configured in "leader-only" mode.
- when explicitly requested by upper layers (e.g., by the user
application, through the IDL layer).
In such cases it's desirable that reconnection should happen as fast as
possible, without the current exponential backoff maintained by the
underlying reconnect object. Furthermore, since 3c2d6274bcee ("raft:
Transfer leadership before creating snapshots."), leadership changes
inside the clustered database happen more often and, therefore,
"leader-only" clients need to reconnect more often too.
Forced reconnects call jsonrpc_session_force_reconnect() which will not
reset backoff. To make sure clients reconnect as fast as possible in
the aforementioned scenarios we first call the new API,
jsonrpc_session_reset_backoff(), in ovsdb-cs, for sessions that are in
state CS_S_MONITORING (i.e., the remote is likely still alive and
functioning fine).
jsonrpc_session_reset_backoff() resets the number of backoff-free
reconnect retries to the number of remotes configured for the session,
ensuring that all remotes are retried exactly once with backoff 0.
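The reset-backoff semantics can be sketched as follows (illustrative
names; the real 'reconnect' module tracks considerably more state):

```c
#include <assert.h>

/* Sketch: grant one backoff-free retry per configured remote, so each
 * remote is tried once with no delay before exponential backoff
 * resumes. */
struct reconnect_sketch {
    unsigned int backoff;            /* Current delay in ms. */
    unsigned int backoff_free_tries; /* Retries allowed with zero delay. */
};

static void
session_reset_backoff(struct reconnect_sketch *r, unsigned int n_remotes)
{
    r->backoff_free_tries = n_remotes;
    r->backoff = 0;
}

static unsigned int
next_retry_delay(struct reconnect_sketch *r, unsigned int min_backoff)
{
    if (r->backoff_free_tries) {
        r->backoff_free_tries--;
        return 0;                    /* Immediate reconnect. */
    }
    /* Exponential backoff once the free tries are used up. */
    r->backoff = r->backoff ? r->backoff * 2 : min_backoff;
    return r->backoff;
}
```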
This commit also updates the Python IDL and jsonrpc implementations.
The Python IDL wasn't tracking the IDL_S_MONITORING state explicitly;
we now do that too. Tests were also added to make sure the IDL forced
reconnects happen without backoff.
Reported-at: https://bugzilla.redhat.com/1977264
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Clients need to re-connect from a relay that has no connection with the
database source. Also, a relay acts similarly to a follower in the
clustered model from the consistency point of view, so it's not suitable
for leader-only connections.
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If a new database server is added to the cluster, or if one of the
database servers changes its IP address or port, then you need to
update the list of remotes for the client. For example, if a new
OVN_Southbound database server is added, you need to update the
ovn-remote for ovn-controller.
However, in the current implementation, the ovsdb-cs module always
closes the current connection and creates a new one. This can lead
to a storm of re-connections if all ovn-controllers are updated
simultaneously. They can also start re-downloading the database
content, creating even more load on the database servers.
Correct this by keeping an existing connection if its remote is still in
the list of remotes after the update.
The 'reconnect' module will report connection state updates, but that
is OK since no real re-connection happened and we only updated the
state of a new 'reconnect' instance.
If required, re-connection can be forced after the update of remotes
with ovsdb_cs_force_reconnect().
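The core check can be sketched as a simple membership test (hypothetical
helper name; the actual ovsdb-cs code compares against its jsonrpc
session state):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch: when the remote list changes, keep the current connection if
 * its target is still in the new list, instead of unconditionally
 * reconnecting. */
static bool
keep_current_connection(const char *current,
                        const char *const new_remotes[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(current, new_remotes[i])) {
            return true;   /* Connection is reused: no reconnect storm. */
        }
    }
    return false;          /* Current remote removed: must reconnect. */
}
```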
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In ovsdb_cs_db_set_condition(), take into account all pending condition
changes for all tables when computing the db->cond_seqno at which the
monitor is expected to be updated.
In the following scenario, with two tables, A and B, the old code
performed the following steps:
1. Initial db->cond_seqno = X.
2. Client changes condition for table A:
- A->new_cond gets set
- expected cond seqno returned to the client: X + 1
3. ovsdb-cs sends the monitor_cond_change for table A
- A->req_cond <- A->new_cond
4. Client changes condition for table B:
- B->new_cond gets set
- expected cond seqno returned to the client: X + 1
- however, because the condition change at step 3 is still not replied
to, table B's monitor_cond_change request is not sent yet.
5. ovsdb-cs receives the reply for the condition change at step 3:
- db->cond_seqno <- X + 1
6. ovsdb-cs sends the monitor_cond_change for table B
7. ovsdb-cs receives the reply for the condition change at step 6:
- db->cond_seqno <- X + 2
The client was incorrectly informed that it will have all relevant
updates for table B at seqno X + 1 while actually that happens later, at
seqno X + 2.
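The corrected computation can be sketched like this (simplified state
and hypothetical names): the expected seqno must count every pending
batch of condition changes across all tables, not assume a single
outstanding request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct table_state {
    bool req_cond;   /* A condition change is in flight (unreplied). */
    bool new_cond;   /* A condition change is queued but not yet sent. */
};

static unsigned int
expected_cond_seqno(unsigned int cond_seqno,
                    const struct table_state tables[], size_t n)
{
    bool any_in_flight = false, any_unsent = false;

    for (size_t i = 0; i < n; i++) {
        any_in_flight |= tables[i].req_cond;
        any_unsent |= tables[i].new_cond;
    }
    /* In-flight changes are acked by one reply; unsent ones will need a
     * second monitor_cond_change request and thus a second ack. */
    return cond_seqno + (any_in_flight ? 1 : 0) + (any_unsent ? 1 : 0);
}
```

Replaying the scenario above: with table A's change in flight and table
B's change still unsent, the client is now told X + 2, matching when its
updates actually arrive.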
Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb_cs_send_transaction() returns a pointer to the same 'request_id'
object that is used internally. This leads to a situation where a
transaction in the idl and the CS module share the same 'request_id'
object. However, the CS module can destroy this transaction id at any
time, e.g. if the connection state changed, while the idl transaction
might still be around and the application might still use it.
Found by running 'make check-ovsdb-cluster' with AddressSanitizer:
==79922==ERROR: AddressSanitizer: heap-use-after-free on address
0x604000167a98 at pc 0x000000626acf bp 0x7ffcdb38a4c0 sp 0x7ffcdb38a4b8
READ of size 8 at 0x604000167a98 thread T0
#0 0x626ace in json_destroy lib/json.c:354:18
#1 0x56d1ab in ovsdb_idl_txn_destroy lib/ovsdb-idl.c:2528:5
#2 0x53a908 in do_vsctl utilities/ovs-vsctl.c:3008:5
#3 0x539251 in main utilities/ovs-vsctl.c:203:17
#4 0x7f7f7e376081 in __libc_start_main (/lib64/libc.so.6+0x27081)
#5 0x461fed in _start (utilities/ovs-vsctl+0x461fed)
0x604000167a98 is located 8 bytes inside of 40-byte
region [0x604000167a90,0x604000167ab8)
freed by thread T0 here:
#0 0x503ac7 in free (utilities/ovs-vsctl+0x503ac7)
#1 0x626aae in json_destroy lib/json.c:378:9
#2 0x6adfa2 in ovsdb_cs_run lib/ovsdb-cs.c:625:13
#3 0x567731 in ovsdb_idl_run lib/ovsdb-idl.c:394:5
#4 0x56fed1 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3187:9
#5 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
#6 0x539251 in main utilities/ovs-vsctl.c:203:17
#7 0x7f7f7e376081 in __libc_start_main
previously allocated by thread T0 here:
#0 0x503dcf in malloc (utilities/ovs-vsctl+0x503dcf)
#1 0x594656 in xmalloc lib/util.c:138:15
#2 0x626431 in json_create lib/json.c:1451:25
#3 0x626972 in json_integer_create lib/json.c:263:25
#4 0x62da0f in jsonrpc_create_id lib/jsonrpc.c:563:12
#5 0x62d9a8 in jsonrpc_create_request lib/jsonrpc.c:570:23
#6 0x6af3a6 in ovsdb_cs_send_transaction lib/ovsdb-cs.c:1357:35
#7 0x56e3d5 in ovsdb_idl_txn_commit lib/ovsdb-idl.c:3147:27
#8 0x56fea9 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3186:22
#9 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
#10 0x539251 in main utilities/ovs-vsctl.c:203:17
#11 0x7f7f7e376081 in __libc_start_main
Fixes: 1c337c43ac1c ("ovsdb-idl: Break into two layers.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This change breaks the IDL into two layers: the IDL proper, whose
interface to its client is unchanged, and a low-level library called
the OVSDB "client synchronization" (CS) library. There are two
reasons for this change. First, the IDL is big and complicated and
I think that this change factors out some of that complication into
a simpler lower layer. Second, the OVN northd implementation based
on DDlog can benefit from the client synchronization library even
though it would actually be made increasingly complicated by the IDL.
Signed-off-by: Ben Pfaff <blp@ovn.org>
This new module has a single direct user now. In the future, it
will also be used by OVN.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>