ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g. to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contain a simple value that can be
copied efficiently, but duplicating strings every time has a
significant performance impact.
Introduce a new reference-counted structure, 'ovsdb_atom_string',
that avoids copying strings on every clone by simply incrementing
a reference counter.
This change increases transaction throughput in benchmarks, i.e.
the number of transactions that ovsdb-server can handle per second,
by up to 2x for standalone databases and 3x for clustered databases.
It also noticeably reduces memory consumption of ovsdb-server.
The next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.
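The idea can be sketched as a minimal reference-counted string; the field and function names below are illustrative, not the actual 'ovsdb_atom_string' definition from ovsdb-server:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a reference-counted string atom. */
struct refstr {
    unsigned int n_refs;  /* Number of owners of this string. */
    char s[];             /* Null-terminated string data. */
};

static struct refstr *
refstr_create(const char *src)
{
    size_t len = strlen(src) + 1;
    struct refstr *rs = malloc(sizeof *rs + len);
    rs->n_refs = 1;
    memcpy(rs->s, src, len);
    return rs;
}

/* "Cloning" an atom now just bumps the counter instead of
 * duplicating the string bytes. */
static struct refstr *
refstr_clone(struct refstr *rs)
{
    rs->n_refs++;
    return rs;
}

static void
refstr_unref(struct refstr *rs)
{
    if (rs && !--rs->n_refs) {
        free(rs);
    }
}
```

With this shape, creating a row diff or cloning a row into a transaction costs one counter increment per string atom instead of a malloc+memcpy.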
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
The ovsdb-cs layer triggers a forced reconnect in various cases:
- when an inconsistency is detected in the data received from the
remote server.
- when the remote server is running in clustered mode and transitioned
to "follower", if the client is configured in "leader-only" mode.
- when explicitly requested by upper layers (e.g., by the user
application, through the IDL layer).
In such cases it's desirable that reconnection should happen as fast as
possible, without the current exponential backoff maintained by the
underlying reconnect object. Furthermore, since 3c2d6274bcee ("raft:
Transfer leadership before creating snapshots."), leadership changes
inside the clustered database happen more often and, therefore,
"leader-only" clients need to reconnect more often too.
Forced reconnects call jsonrpc_session_force_reconnect(), which does
not reset the backoff. To make sure clients reconnect as fast as possible in
the aforementioned scenarios we first call the new API,
jsonrpc_session_reset_backoff(), in ovsdb-cs, for sessions that are in
state CS_S_MONITORING (i.e., the remote is likely still alive and
functioning fine).
jsonrpc_session_reset_backoff() resets the number of backoff-free
reconnect retries to the number of remotes configured for the session,
ensuring that all remotes are retried exactly once with backoff 0.
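The mechanism can be sketched as follows; the structure and function names here are simplified stand-ins, not the actual jsonrpc/reconnect API:

```c
#include <assert.h>

/* A session remembers how many backoff-free connection attempts
 * remain.  Resetting that count to the number of remotes lets each
 * remote be retried exactly once with zero backoff before normal
 * exponential backoff resumes. */
struct session {
    unsigned int n_remotes;          /* Remotes configured for the session. */
    unsigned int backoff_free_tries; /* Attempts allowed with backoff 0. */
    unsigned int backoff_ms;         /* Current backoff between attempts. */
};

static void
session_reset_backoff(struct session *s)
{
    s->backoff_free_tries = s->n_remotes;
    s->backoff_ms = 0;
}

static unsigned int
session_next_backoff(struct session *s, unsigned int max_backoff_ms)
{
    if (s->backoff_free_tries) {
        s->backoff_free_tries--;
        return 0;                    /* Retry immediately. */
    }
    /* Standard exponential backoff otherwise. */
    s->backoff_ms = s->backoff_ms ? s->backoff_ms * 2 : 1000;
    if (s->backoff_ms > max_backoff_ms) {
        s->backoff_ms = max_backoff_ms;
    }
    return s->backoff_ms;
}
```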
This commit also updates the Python IDL and jsonrpc implementations.
The Python IDL wasn't tracking the IDL_S_MONITORING state explicitly;
we now do that too. Tests were also added to make sure forced IDL
reconnects happen without backoff.
Reported-at: https://bugzilla.redhat.com/1977264
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Clients need to re-connect from a relay that has lost its connection
with the database source. Also, from the consistency point of view,
a relay acts similarly to a follower in a clustered model, so it's not
suitable for leader-only connections.
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If a new database server is added to the cluster, or if one of the
database servers changes its IP address or port, then the list of
remotes for the client needs to be updated. For example, if a new
OVN_Southbound database server is added, you need to update the
ovn-remote for ovn-controller.
However, in the current implementation, the ovsdb-cs module always
closes the current connection and creates a new one. This can lead
to a storm of re-connections if all ovn-controllers are updated
simultaneously. They may also start re-downloading the database
content, creating even more load on the database servers.
Correct this by preserving the existing connection if its remote is
still in the list of remotes after the update.
The 'reconnect' module will report connection state updates, but that
is OK since no real re-connection happened and we only updated the
state of the new 'reconnect' instance.
If required, re-connection can be forced after the update of remotes
with ovsdb_cs_force_reconnect().
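The core check can be sketched as below; the helper name is hypothetical and the real ovsdb-cs code operates on its own session structures rather than plain strings:

```c
#include <string.h>

/* Returns nonzero if the remote we are currently connected to is
 * still present in the updated list, in which case the session can
 * be kept instead of being torn down and re-established. */
static int
remote_still_listed(const char *current, const char **remotes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(remotes[i], current)) {
            return 1;   /* Keep the existing connection. */
        }
    }
    return 0;           /* Current remote was removed: must reconnect. */
}
```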
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In ovsdb_cs_db_set_condition(), take into account all pending condition
changes for all tables when computing the db->cond_seqno at which the
monitor is expected to be updated.
In the following scenario, with two tables, A and B, the old code
performed the following steps:
1. Initial db->cond_seqno = X.
2. Client changes condition for table A:
- A->new_cond gets set
- expected cond seqno returned to the client: X + 1
3. ovsdb-cs sends the monitor_cond_change for table A
- A->req_cond <- A->new_cond
4. Client changes condition for table B:
- B->new_cond gets set
- expected cond seqno returned to the client: X + 1
- however, because the condition change from step 3 has not yet been
replied to, table B's monitor_cond_change request is not sent yet.
5. ovsdb-cs receives the reply for the condition change at step 3:
- db->cond_seqno <- X + 1
6. ovsdb-cs sends the monitor_cond_change for table B
7. ovsdb-cs receives the reply for the condition change at step 6:
- db->cond_seqno <- X + 2
The client was incorrectly informed that it will have all relevant
updates for table B at seqno X + 1 while actually that happens later, at
seqno X + 2.
Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb_cs_send_transaction() returns a pointer to the same
'request_id' object that is used internally. This leads to a
situation where the transaction in the IDL and in the CS module
share the same 'request_id' object. However, the CS module may
destroy this transaction id at any time, e.g. if the connection
state changed, while the IDL transaction might still be around
and still in use by the application.
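The fix pattern can be sketched as below; the types here are simplified stand-ins for the json objects involved, and the point is that the caller receives a deep copy it owns, not the internal pointer:

```c
#include <stdlib.h>

/* Simplified stand-in for a json request id. */
struct id {
    long value;
};

static struct id *internal_id;  /* Owned by the "CS" layer, which may
                                 * free it at any time. */

static struct id *
send_transaction(long value)
{
    internal_id = malloc(sizeof *internal_id);
    internal_id->value = value;

    /* Return a clone, not 'internal_id' itself, so the caller's
     * lifetime is independent of the CS layer's. */
    struct id *copy = malloc(sizeof *copy);
    *copy = *internal_id;
    return copy;
}
```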
Found by running 'make check-ovsdb-cluster' with AddressSanitizer:
==79922==ERROR: AddressSanitizer: heap-use-after-free on address
0x604000167a98 at pc 0x000000626acf bp 0x7ffcdb38a4c0 sp 0x7ffcdb38a4b8
READ of size 8 at 0x604000167a98 thread T0
#0 0x626ace in json_destroy lib/json.c:354:18
#1 0x56d1ab in ovsdb_idl_txn_destroy lib/ovsdb-idl.c:2528:5
#2 0x53a908 in do_vsctl utilities/ovs-vsctl.c:3008:5
#3 0x539251 in main utilities/ovs-vsctl.c:203:17
#4 0x7f7f7e376081 in __libc_start_main (/lib64/libc.so.6+0x27081)
#5 0x461fed in _start (utilities/ovs-vsctl+0x461fed)
0x604000167a98 is located 8 bytes inside of 40-byte
region [0x604000167a90,0x604000167ab8)
freed by thread T0 here:
#0 0x503ac7 in free (utilities/ovs-vsctl+0x503ac7)
#1 0x626aae in json_destroy lib/json.c:378:9
#2 0x6adfa2 in ovsdb_cs_run lib/ovsdb-cs.c:625:13
#3 0x567731 in ovsdb_idl_run lib/ovsdb-idl.c:394:5
#4 0x56fed1 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3187:9
#5 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
#6 0x539251 in main utilities/ovs-vsctl.c:203:17
#7 0x7f7f7e376081 in __libc_start_main
previously allocated by thread T0 here:
#0 0x503dcf in malloc (utilities/ovs-vsctl+0x503dcf)
#1 0x594656 in xmalloc lib/util.c:138:15
#2 0x626431 in json_create lib/json.c:1451:25
#3 0x626972 in json_integer_create lib/json.c:263:25
#4 0x62da0f in jsonrpc_create_id lib/jsonrpc.c:563:12
#5 0x62d9a8 in jsonrpc_create_request lib/jsonrpc.c:570:23
#6 0x6af3a6 in ovsdb_cs_send_transaction lib/ovsdb-cs.c:1357:35
#7 0x56e3d5 in ovsdb_idl_txn_commit lib/ovsdb-idl.c:3147:27
#8 0x56fea9 in ovsdb_idl_txn_commit_block lib/ovsdb-idl.c:3186:22
#9 0x53a4df in do_vsctl utilities/ovs-vsctl.c:2898:14
#10 0x539251 in main utilities/ovs-vsctl.c:203:17
#11 0x7f7f7e376081 in __libc_start_main
Fixes: 1c337c43ac1c ("ovsdb-idl: Break into two layers.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This change breaks the IDL into two layers: the IDL proper, whose
interface to its client is unchanged, and a low-level library called
the OVSDB "client synchronization" (CS) library. There are two
reasons for this change. First, the IDL is big and complicated and
I think that this change factors out some of that complication into
a simpler lower layer. Second, the OVN northd implementation based
on DDlog can benefit from the client synchronization library, even
though the full IDL would actually make it more complicated.
Signed-off-by: Ben Pfaff <blp@ovn.org>
This new module has a single direct user now. In the future, it
will also be used by OVN.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>