ovsdb-cs: Perform forced reconnects without a backoff.

The ovsdb-cs layer triggers a forced reconnect in various cases: - when an inconsistency is detected in the data received from the remote server. - when the remote server is running in clustered mode and transitioned to "follower", if the client is configured in "leader-only" mode. - when explicitly requested by upper layers (e.g., by the user application, through the IDL layer). In such cases it's desirable that reconnection should happen as fast as possible, without the current exponential backoff maintained by the underlying reconnect object. Furthermore, since 3c2d6274bc ("raft: Transfer leadership before creating snapshots."), leadership changes inside the clustered database happen more often and, therefore, "leader-only" clients need to reconnect more often too. Forced reconnects call jsonrpc_session_force_reconnect() which will not reset backoff. To make sure clients reconnect as fast as possible in the aforementioned scenarios we first call the new API, jsonrpc_session_reset_backoff(), in ovsdb-cs, for sessions that are in state CS_S_MONITORING (i.e., the remote is likely still alive and functioning fine). jsonrpc_session_reset_backoff() resets the number of backoff-free reconnect retries to the number of remotes configured for the session, ensuring that all remotes are retried exactly once with backoff 0. This commit also updates the Python IDL and jsonrpc implementations. The Python IDL wasn't tracking the IDL_S_MONITORING state explicitly, we now do that too. Tests were also added to make sure the IDL forced reconnects happen without backoff. Reported-at: https://bugzilla.redhat.com/1977264 Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-09-01 14:55:18 +00:00 · 2021-07-21 14:51:00 +02:00
parent 69b2bdfd3f
commit daf627f459
6 changed files with 123 additions and 4 deletions
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -102,6 +102,7 @@ class Idl(object):
    IDL_S_SERVER_MONITOR_REQUESTED = 2
    IDL_S_DATA_MONITOR_REQUESTED = 3
    IDL_S_DATA_MONITOR_COND_REQUESTED = 4
+    IDL_S_MONITORING = 5

    def __init__(self, remote, schema_helper, probe_interval=None,
                 leader_only=True):
@@ -295,6 +296,7 @@ class Idl(object):
                    else:
                        assert self.state == self.IDL_S_DATA_MONITOR_REQUESTED
                        self.__parse_update(msg.result, OVSDB_UPDATE)
+                    self.state = self.IDL_S_MONITORING

                except error.Error as e:
                    vlog.err("%s: parse error in received schema: %s"
@@ -442,6 +444,15 @@ class Idl(object):
    def force_reconnect(self):
        """Forces the IDL to drop its connection to the database and reconnect.
        In the meantime, the contents of the IDL will not change."""
+        if self.state == self.IDL_S_MONITORING:
+            # The IDL was in MONITORING state, so we either had data
+            # inconsistency on this server, or it stopped being the cluster
+            # leader, or the user requested to re-connect.  Avoiding backoff
+            # in these cases, as we need to re-connect as soon as possible.
+            # Connections that are not in MONITORING state should have their
+            # backoff to avoid constant flood of re-connection attempts in
+            # case there is no suitable database server.
+            self._session.reset_backoff()
        self._session.force_reconnect()

    def session_name(self):
--- a/python/ovs/jsonrpc.py
+++ b/python/ovs/jsonrpc.py
@@ -612,5 +612,18 @@ class Session(object):
    def force_reconnect(self):
        self.reconnect.force_reconnect(ovs.timeval.msec())

+    def reset_backoff(self):
+        """ Resets the reconnect backoff by allowing as many free tries as the
+        number of configured remotes.  This is to be used by upper layers
+        before calling force_reconnect() if backoff is undesirable."""
+        free_tries = len(self.remotes)
+
+        if self.is_connected():
+            # The extra free try will be consumed when the current remote
+            # is disconnected.
+            free_tries += 1
+
+        self.reconnect.set_backoff_free_tries(free_tries)
+
    def get_num_of_remotes(self):
        return len(self.remotes)