mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

docs: Document manual cluster recovery procedure.

Remove the notion of cluster/leave --force since it was never
implemented. Instead of these instructions, document how a broken
cluster can be re-initialized with the old database contents.

Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ihar Hrachyshka 2024-04-26 16:54:48 +00:00 committed by Ilya Maximets
parent 139b564dbd
commit 01a0fff361
2 changed files with 37 additions and 9 deletions

@@ -315,16 +315,11 @@ The above methods for adding and removing servers only work for healthy
 clusters, that is, for clusters with no more failures than their maximum
 tolerance. For example, in a 3-server cluster, the failure of 2 servers
 prevents servers joining or leaving the cluster (as well as database access).
 To prevent data loss or inconsistency, the preferred solution to this problem
 is to bring up enough of the failed servers to make the cluster healthy again,
 then if necessary remove any remaining failed servers and add new ones. If
-this cannot be done, though, use ``ovs-appctl`` to invoke ``cluster/leave
---force`` on a running server. This command forces the server to which it is
-directed to leave its cluster and form a new single-node cluster that contains
-only itself. The data in the new cluster may be inconsistent with the former
-cluster: transactions not yet replicated to the server will be lost, and
-transactions not yet applied to the cluster may be committed. Afterward, any
-servers in its former cluster will regard the server to have failed.
+this is not an option, see the next section for `Manual cluster recovery`_.
 Once a server leaves a cluster, it may never rejoin it. Instead, create a new
 server and join it to the cluster.
@@ -362,6 +357,40 @@ Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
 ones when they work with schemas for clustered databases. Future versions of
 OVSDB might add support for this feature.
+Manual cluster recovery
+~~~~~~~~~~~~~~~~~~~~~~~
+.. important::
+   The procedure below will result in ``cid`` and ``sid`` change. A *new*
+   cluster will be initialized.
+To recover a clustered database after a failure:
+1. Stop *all* old cluster ``ovsdb-server`` instances before proceeding.
+2. Pick one of the old members which will serve as a bootstrap member of the
+   to-be-recovered cluster.
+3. Convert its database file to the standalone format using ``ovsdb-tool
+   cluster-to-standalone``.
+4. Backup the standalone database file.
+5. Create a new single-node cluster with ``ovsdb-tool create-cluster``
+   using the previously saved standalone database file, then start
+   ``ovsdb-server``.
+6. Once the single-node cluster is up and running and serves the restored data,
+   new members should be created and added to the cluster, as usual, with
+   ``ovsdb-tool join-cluster``.
+.. note::
+   The data in the new cluster may be inconsistent with the former cluster:
+   transactions not yet replicated to the server chosen in step 2 will be lost,
+   and transactions not yet applied to the cluster may be committed.
 Upgrading from version 2.14 and earlier to 2.15 and later
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
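
The recovery steps documented in the hunk above can be sketched as a dry-run shell plan. This is only an illustration under stated assumptions: the file paths, the database name ``OVN_Southbound`` and the ``tcp:`` addresses are hypothetical, the helper functions ``recovery_plan`` and ``join_plan`` are not part of OVS, and both functions print the ``ovsdb-tool`` commands instead of running them. Stopping and starting ``ovsdb-server`` (steps 1 and 5) is left out because it depends on how the service is managed on each host.

```shell
#!/bin/sh
# Dry-run sketch of the manual recovery procedure: every function only
# prints the command it would run. All paths, names and addresses used
# below are hypothetical examples, not values taken from the commit.

# Steps 3-5, to be run on the member picked in step 2 after *all*
# ovsdb-server instances of the old cluster have been stopped (step 1).
recovery_plan() {
    old_db=$1       # clustered database file of the bootstrap member
    standalone=$2   # standalone database file to be created
    new_db=$3       # file for the new cluster (assumed not to exist yet)
    local_addr=$4   # cluster address of this member

    # Step 3: convert the clustered database to the standalone format.
    echo "ovsdb-tool cluster-to-standalone $standalone $old_db"
    # Step 4: keep a backup copy of the standalone file.
    echo "cp $standalone $standalone.bak"
    # Step 5: create a new single-node cluster from the standalone file.
    echo "ovsdb-tool create-cluster $new_db $standalone $local_addr"
}

# Step 6, to be run on each new member once the single-node cluster is
# up and serving the restored data.
join_plan() {
    db=$1           # database file for the new member
    name=$2         # schema (database) name
    local_addr=$3   # cluster address of the new member
    remote_addr=$4  # cluster address of an existing member

    echo "ovsdb-tool join-cluster $db $name $local_addr $remote_addr"
}

recovery_plan /var/lib/ovn/ovnsb_db.db /tmp/ovnsb_standalone.db \
    /var/lib/ovn/ovnsb_new.db tcp:10.0.0.1:6644
join_plan /var/lib/ovn/ovnsb_db.db OVN_Southbound \
    tcp:10.0.0.2:6644 tcp:10.0.0.1:6644
```

Printing the plan first makes it possible to review the file names before anything is overwritten. Because the new cluster gets a new ``cid`` and ``sid``, the old clustered database files on the other members are not reused.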

@@ -461,8 +461,7 @@ This does not result in a three server cluster that lacks quorum.
 .
 .IP "\fBcluster/kick \fIdb server\fR"
 Start graceful removal of \fIserver\fR from \fIdb\fR's cluster, like
-\fBcluster/leave\fR (without \fB\-\-force\fR) except that it can
-remove any server, not just this one.
+\fBcluster/leave\fR, except that it can remove any server, not just this one.
 .IP
 \fIserver\fR may be a server ID, as printed by \fBcluster/sid\fR, or
 the server's local network address as passed to \fBovsdb-tool\fR's