2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00
ovs/ovsdb/relay.c

434 lines
12 KiB
C
Raw Normal View History

ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
/*
* Copyright (c) 2021, Red Hat, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <config.h>
#include "relay.h"
#include "coverage.h"
#include "jsonrpc.h"
#include "openvswitch/hmap.h"
#include "openvswitch/json.h"
#include "openvswitch/list.h"
#include "openvswitch/poll-loop.h"
#include "openvswitch/shash.h"
#include "openvswitch/vlog.h"
#include "ovsdb.h"
#include "ovsdb-cs.h"
#include "ovsdb-error.h"
#include "row.h"
#include "table.h"
#include "timeval.h"
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
#include "transaction.h"
ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server need to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that includes database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once transaction reply received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-15 19:05:40 +02:00
#include "transaction-forward.h"
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
#include "util.h"
VLOG_DEFINE_THIS_MODULE(relay);
static struct shash relay_dbs = SHASH_INITIALIZER(&relay_dbs);
struct relay_ctx {
struct ovsdb *db;
struct ovsdb_cs *cs;
/* Schema updates. */
struct ovsdb_schema *new_schema;
schema_change_callback schema_change_cb;
void *schema_change_aux;
long long int last_connected;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
};
#define RELAY_MAX_RECONNECTION_MS 30000
/* Reports if the database is connected to the relay source and functional,
* i.e. it actively monitors the source and is able to forward transactions. */
bool
ovsdb_relay_is_connected(struct ovsdb *db)
{
struct relay_ctx *ctx = shash_find_data(&relay_dbs, db->name);
if (!ctx || !ovsdb_cs_is_alive(ctx->cs)) {
return false;
}
if (ovsdb_cs_may_send_transaction(ctx->cs)) {
return true;
}
/* Trying to avoid connection state flapping by delaying report for
* upper layer and giving ovsdb-cs some time to reconnect. */
if (time_msec() - ctx->last_connected < RELAY_MAX_RECONNECTION_MS) {
return true;
}
return false;
}
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
static struct json *
ovsdb_relay_compose_monitor_request(const struct json *schema_json, void *ctx_)
{
struct json *monitor_request = json_object_create();
struct relay_ctx *ctx = ctx_;
struct ovsdb_schema *schema;
struct ovsdb *db = ctx->db;
struct ovsdb_error *error;
error = ovsdb_schema_from_json(schema_json, &schema);
if (error) {
char *msg = ovsdb_error_to_string_free(error);
VLOG_WARN("%s: Failed to parse db schema: %s", db->name, msg);
free(msg);
/* There is nothing we can really do here. */
return monitor_request;
}
const struct shash_node *node;
SHASH_FOR_EACH (node, &schema->tables) {
struct json *monitor_request_array = json_array_create_empty();
struct ovsdb_table_schema *table = node->data;
json_array_add(monitor_request_array, json_object_create());
json_object_put(monitor_request, table->name, monitor_request_array);
}
if (!db->schema || !ovsdb_schema_equal(schema, db->schema)) {
VLOG_DBG("database %s schema changed.", db->name);
if (ctx->new_schema) {
ovsdb_schema_destroy(ctx->new_schema);
}
/* We will update the schema later when we will receive actual data
* from the mointor in order to avoid sitting with an empty database
* until the monitor reply. */
ctx->new_schema = schema;
} else {
ovsdb_schema_destroy(schema);
}
return monitor_request;
}
static struct ovsdb_cs_ops relay_cs_ops = {
.compose_monitor_requests = ovsdb_relay_compose_monitor_request,
};
void
ovsdb_relay_add_db(struct ovsdb *db, const char *remote,
schema_change_callback schema_change_cb,
void *schema_change_aux,
const struct jsonrpc_session_options *options)
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
{
struct relay_ctx *ctx;
if (!db || !remote) {
return;
}
ctx = shash_find_data(&relay_dbs, db->name);
if (ctx) {
ovsdb_cs_set_remote(ctx->cs, remote, true);
ovsdb_cs_set_jsonrpc_options(ctx->cs, options);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
VLOG_DBG("%s: relay source set to '%s'", db->name, remote);
return;
}
db->is_relay = true;
ctx = xzalloc(sizeof *ctx);
ctx->schema_change_cb = schema_change_cb;
ctx->schema_change_aux = schema_change_aux;
ctx->db = db;
ctx->cs = ovsdb_cs_create(db->name, 3, &relay_cs_ops, ctx);
ctx->last_connected = 0;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
shash_add(&relay_dbs, db->name, ctx);
ovsdb_cs_set_leader_only(ctx->cs, false);
ovsdb_cs_set_remote(ctx->cs, remote, true);
ovsdb_cs_set_jsonrpc_options(ctx->cs, options);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
VLOG_DBG("added database: %s, %s", db->name, remote);
}
/* Updates the probe interval for all relay connections to the specified
* value. */
void
ovsdb_relay_set_probe_interval(int probe_interval)
{
struct shash_node *node;
SHASH_FOR_EACH (node, &relay_dbs) {
struct relay_ctx *ctx = node->data;
ovsdb_cs_set_probe_interval(ctx->cs, probe_interval);
}
}
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
void
ovsdb_relay_del_db(struct ovsdb *db)
{
struct relay_ctx *ctx;
if (!db) {
return;
}
ctx = shash_find_and_delete(&relay_dbs, db->name);
if (!ctx) {
VLOG_WARN("Failed to remove relay database %s: not found.", db->name);
return;
}
VLOG_DBG("removed database: %s", db->name);
db->is_relay = false;
ovsdb_cs_destroy(ctx->cs);
free(ctx);
}
static struct ovsdb_error *
ovsdb_relay_process_row_update(struct ovsdb_table *table,
const struct ovsdb_cs_row_update *ru,
struct ovsdb_txn *txn)
{
const struct uuid *uuid = &ru->row_uuid;
struct ovsdb_error * error = NULL;
/* XXX: ovsdb-cs module returns shash which was previously part of a json
* structure and we need json row format in order to use ovsdb_row*
* functions. Creating a json object out of shash. */
struct json *json_row = json_object_create();
struct shash *obj = json_row->object;
json_row->object = CONST_CAST(struct shash *, ru->columns);
switch (ru->type) {
case OVSDB_CS_ROW_DELETE:
error = ovsdb_table_execute_delete(txn, uuid, table);
break;
case OVSDB_CS_ROW_INSERT:
error = ovsdb_table_execute_insert(txn, uuid, table, json_row);
break;
case OVSDB_CS_ROW_UPDATE:
error = ovsdb_table_execute_update(txn, uuid, table, json_row, false);
break;
case OVSDB_CS_ROW_XOR:
error = ovsdb_table_execute_update(txn, uuid, table, json_row, true);
break;
default:
OVS_NOT_REACHED();
}
json_row->object = obj;
json_destroy(json_row);
return error;
}
static struct ovsdb_error *
ovsdb_relay_parse_update__(struct ovsdb *db,
ovsdb: relay: Add transaction history support. Even though relays can be scaled to the big number of servers to handle a lot more clients, lack of transaction history may cause significant load if clients are re-connecting. E.g. in case of the upgrade of a large-scale OVN deployment, relays can be taken down one by one forcing all the clients of one relay to jump to other ones. And all these clients will download the database from scratch from a new relay. Since relay itself supports monitor_cond_since connection to the main cluster, it receives the last transaction id along with each update. Since these transaction ids are 'eid's of actual transactions, they can be used by relay for a transaction history. Relay may not receive all the transaction ids, because the main cluster may combine several changes into a single monitor update. However, all relays will, likely, receive same updates with the same transaction ids, so the case where transaction id can not be found after re-connection between relays should not be very common. If some id is missing on the relay (i.e. this update was merged with some other update and newer id was used) the client will just re-download the database as if there was a normal transaction history miss. OVSDB client synchronization module updated to provide the last transaction id along with the update. Relay module updated to use these ids as a transaction id. If ids are zero, relay decides that the main server doesn't support transaction ids and disables the transaction history accordingly. Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(), so transactions are added to the history. This can be done, because relays has no file storage, so there is no need to write anything. Relay tests modified to test both standalone and clustered database as a main server. Checks added to ensure that all servers receive the same transaction ids in monitor updates. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-19 15:09:39 +01:00
const struct ovsdb_cs_db_update *du,
const struct uuid *last_id)
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
{
struct ovsdb_error *error = NULL;
struct ovsdb_txn *txn;
txn = ovsdb_txn_create(db);
for (size_t i = 0; i < du->n; i++) {
const struct ovsdb_cs_table_update *tu = &du->table_updates[i];
struct ovsdb_table *table = ovsdb_get_table(db, tu->table_name);
if (!table) {
error = ovsdb_error("unknown table", "unknown table %s",
tu->table_name);
goto exit;
}
for (size_t j = 0; j < tu->n; j++) {
const struct ovsdb_cs_row_update *ru = &tu->row_updates[j];
error = ovsdb_relay_process_row_update(table, ru, txn);
if (error) {
goto exit;
}
}
}
exit:
if (error) {
ovsdb_txn_abort(txn);
return error;
} else {
ovsdb: relay: Add transaction history support. Even though relays can be scaled to the big number of servers to handle a lot more clients, lack of transaction history may cause significant load if clients are re-connecting. E.g. in case of the upgrade of a large-scale OVN deployment, relays can be taken down one by one forcing all the clients of one relay to jump to other ones. And all these clients will download the database from scratch from a new relay. Since relay itself supports monitor_cond_since connection to the main cluster, it receives the last transaction id along with each update. Since these transaction ids are 'eid's of actual transactions, they can be used by relay for a transaction history. Relay may not receive all the transaction ids, because the main cluster may combine several changes into a single monitor update. However, all relays will, likely, receive same updates with the same transaction ids, so the case where transaction id can not be found after re-connection between relays should not be very common. If some id is missing on the relay (i.e. this update was merged with some other update and newer id was used) the client will just re-download the database as if there was a normal transaction history miss. OVSDB client synchronization module updated to provide the last transaction id along with the update. Relay module updated to use these ids as a transaction id. If ids are zero, relay decides that the main server doesn't support transaction ids and disables the transaction history accordingly. Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(), so transactions are added to the history. This can be done, because relays has no file storage, so there is no need to write anything. Relay tests modified to test both standalone and clustered database as a main server. Checks added to ensure that all servers receive the same transaction ids in monitor updates. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-19 15:09:39 +01:00
if (uuid_is_zero(last_id)) {
/* The relay source doesn't support unique transaction ids,
* disabling transaction history for relay. */
ovsdb_txn_history_destroy(db);
ovsdb_txn_history_init(db, false);
} else {
ovsdb_txn_set_txnid(last_id, txn);
}
/* Commit transaction.
* There is no storage, so ovsdb_txn_replay_commit() can be used. */
error = ovsdb_txn_replay_commit(txn);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
}
return error;
}
static struct ovsdb_error *
ovsdb_relay_clear(struct ovsdb *db)
{
struct ovsdb_txn *txn = ovsdb_txn_create(db);
struct shash_node *table_node;
ovsdb: relay: Add transaction history support. Even though relays can be scaled to the big number of servers to handle a lot more clients, lack of transaction history may cause significant load if clients are re-connecting. E.g. in case of the upgrade of a large-scale OVN deployment, relays can be taken down one by one forcing all the clients of one relay to jump to other ones. And all these clients will download the database from scratch from a new relay. Since relay itself supports monitor_cond_since connection to the main cluster, it receives the last transaction id along with each update. Since these transaction ids are 'eid's of actual transactions, they can be used by relay for a transaction history. Relay may not receive all the transaction ids, because the main cluster may combine several changes into a single monitor update. However, all relays will, likely, receive same updates with the same transaction ids, so the case where transaction id can not be found after re-connection between relays should not be very common. If some id is missing on the relay (i.e. this update was merged with some other update and newer id was used) the client will just re-download the database as if there was a normal transaction history miss. OVSDB client synchronization module updated to provide the last transaction id along with the update. Relay module updated to use these ids as a transaction id. If ids are zero, relay decides that the main server doesn't support transaction ids and disables the transaction history accordingly. Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(), so transactions are added to the history. This can be done, because relays has no file storage, so there is no need to write anything. Relay tests modified to test both standalone and clustered database as a main server. Checks added to ensure that all servers receive the same transaction ids in monitor updates. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-19 15:09:39 +01:00
struct ovsdb_error *error;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
SHASH_FOR_EACH (table_node, &db->tables) {
struct ovsdb_table *table = table_node->data;
struct ovsdb_row *row;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
HMAP_FOR_EACH_SAFE (row, hmap_node, &table->rows) {
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
ovsdb_txn_row_delete(txn, row);
}
}
ovsdb: relay: Add transaction history support. Even though relays can be scaled to the big number of servers to handle a lot more clients, lack of transaction history may cause significant load if clients are re-connecting. E.g. in case of the upgrade of a large-scale OVN deployment, relays can be taken down one by one forcing all the clients of one relay to jump to other ones. And all these clients will download the database from scratch from a new relay. Since relay itself supports monitor_cond_since connection to the main cluster, it receives the last transaction id along with each update. Since these transaction ids are 'eid's of actual transactions, they can be used by relay for a transaction history. Relay may not receive all the transaction ids, because the main cluster may combine several changes into a single monitor update. However, all relays will, likely, receive same updates with the same transaction ids, so the case where transaction id can not be found after re-connection between relays should not be very common. If some id is missing on the relay (i.e. this update was merged with some other update and newer id was used) the client will just re-download the database as if there was a normal transaction history miss. OVSDB client synchronization module updated to provide the last transaction id along with the update. Relay module updated to use these ids as a transaction id. If ids are zero, relay decides that the main server doesn't support transaction ids and disables the transaction history accordingly. Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(), so transactions are added to the history. This can be done, because relays has no file storage, so there is no need to write anything. Relay tests modified to test both standalone and clustered database as a main server. Checks added to ensure that all servers receive the same transaction ids in monitor updates. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-19 15:09:39 +01:00
/* There is no storage, so ovsdb_txn_replay_commit() can be used. */
error = ovsdb_txn_replay_commit(txn);
/* Clearing the transaction history, and re-enabling it. */
ovsdb_txn_history_destroy(db);
ovsdb_txn_history_init(db, true);
return error;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
}
static void
ovsdb_relay_parse_update(struct relay_ctx *ctx,
const struct ovsdb_cs_update_event *update)
{
struct ovsdb_error *error = NULL;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
if (!ctx->db) {
return;
}
if (update->monitor_reply && ctx->new_schema) {
/* There was a schema change. Updating a database with a new schema
* before processing monitor reply with the new data. */
error = ctx->schema_change_cb(ctx->db, ctx->new_schema, &UUID_ZERO,
false, ctx->schema_change_aux);
if (error) {
/* Should never happen, but handle this case anyway. */
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
char *s = ovsdb_error_to_string_free(error);
VLOG_ERR_RL(&rl, "%s", s);
free(s);
ovsdb_cs_flag_inconsistency(ctx->cs);
return;
}
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
ovsdb_schema_destroy(ctx->new_schema);
ctx->new_schema = NULL;
}
struct ovsdb_cs_db_update *du;
error = ovsdb_cs_parse_db_update(update->table_updates,
update->version, &du);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
if (!error) {
if (update->clear) {
error = ovsdb_relay_clear(ctx->db);
}
if (!error) {
ovsdb: relay: Add transaction history support. Even though relays can be scaled to the big number of servers to handle a lot more clients, lack of transaction history may cause significant load if clients are re-connecting. E.g. in case of the upgrade of a large-scale OVN deployment, relays can be taken down one by one forcing all the clients of one relay to jump to other ones. And all these clients will download the database from scratch from a new relay. Since relay itself supports monitor_cond_since connection to the main cluster, it receives the last transaction id along with each update. Since these transaction ids are 'eid's of actual transactions, they can be used by relay for a transaction history. Relay may not receive all the transaction ids, because the main cluster may combine several changes into a single monitor update. However, all relays will, likely, receive same updates with the same transaction ids, so the case where transaction id can not be found after re-connection between relays should not be very common. If some id is missing on the relay (i.e. this update was merged with some other update and newer id was used) the client will just re-download the database as if there was a normal transaction history miss. OVSDB client synchronization module updated to provide the last transaction id along with the update. Relay module updated to use these ids as a transaction id. If ids are zero, relay decides that the main server doesn't support transaction ids and disables the transaction history accordingly. Using ovsdb_txn_replay_commit() instead of ovsdb_txn_propose_commit_block(), so transactions are added to the history. This can be done, because relays has no file storage, so there is no need to write anything. Relay tests modified to test both standalone and clustered database as a main server. Checks added to ensure that all servers receive the same transaction ids in monitor updates. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-19 15:09:39 +01:00
error = ovsdb_relay_parse_update__(ctx->db, du, &update->last_id);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
}
}
ovsdb_cs_db_update_destroy(du);
if (error) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
if (!VLOG_DROP_WARN(&rl)) {
char *s = ovsdb_error_to_string(error);
VLOG_WARN_RL(&rl, "%s", s);
free(s);
}
/* Something bad happened. Try to recover. */
if (!strcmp(ovsdb_error_get_tag(error), "consistency violation")) {
ovsdb_cs_flag_inconsistency(ctx->cs);
} else {
ovsdb_cs_force_reconnect(ctx->cs);
}
ovsdb_error_destroy(error);
}
}
void
ovsdb_relay_run(void)
{
struct shash_node *node;
SHASH_FOR_EACH (node, &relay_dbs) {
struct relay_ctx *ctx = node->data;
struct ovs_list events;
ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server need to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that includes database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once transaction reply received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-15 19:05:40 +02:00
ovsdb_txn_forward_run(ctx->db, ctx->cs);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
ovsdb_cs_run(ctx->cs, &events);
if (ovsdb_cs_may_send_transaction(ctx->cs)) {
ctx->last_connected = time_msec();
}
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
struct ovsdb_cs_event *event;
LIST_FOR_EACH_POP (event, list_node, &events) {
if (!ctx->db) {
ovsdb_cs_event_destroy(event);
continue;
}
switch (event->type) {
case OVSDB_CS_EVENT_TYPE_RECONNECT:
ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server need to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that includes database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once transaction reply received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-15 19:05:40 +02:00
/* Cancelling all the transactions that were already sent but
* not replied yet as they might be lost. */
ovsdb_txn_forward_cancel_all(ctx->db, true);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
break;
case OVSDB_CS_EVENT_TYPE_UPDATE:
ovsdb_relay_parse_update(ctx, &event->update);
break;
case OVSDB_CS_EVENT_TYPE_TXN_REPLY:
ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server need to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that includes database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once transaction reply received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-15 19:05:40 +02:00
ovsdb_txn_forward_complete(ctx->db, event->txn_reply);
break;
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
case OVSDB_CS_EVENT_TYPE_LOCKED:
ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server need to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that includes database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once transaction reply received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-15 19:05:40 +02:00
VLOG_WARN("%s: Unexpected LOCKED event.", ctx->db->name);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
break;
}
ovsdb_cs_event_destroy(event);
}
ovsdb-server: Fix excessive memory usage on DB open. During initial read of a database file all the file transactions are added to the transaction history. The history run with the history size checks is only executed after the whole file is processed. If, for some reason, the file contains way too many transactions, this behavior may result in excessive memory consumption up to hundreds of GBs. For example, here is a log entry about memory usage after reading a file with 100K+ OVN NbDB transactions: |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0 raft-log:123309 txn-history:123307 txn-history-atoms:1647022868 In this particular case ovsdb-server allocated 95 GB of RAM in order to accommodate 1.6 billion ovsdb atoms in the history, while only 3 million atoms are in the actual database. Fix that by running history size checks after applying each file transaction. This way the memory usage while reading the database from the example stays at about 1 GB mark. History size checks are cheap in comparison with transaction replay, so the additional calls do not reduce performance. We could've just moved the history run into ovsdb_txn_replay_commit(), but it seems more organic to call it externally, since we have init() and destroy() functions called externally as well. Since the history run will be executed shortly after reading the database and actual memory consumption peak is not always logged, there seem to be no reliable way to unit test for the issue without adding extra testing infrastructure into the code. Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.") Reported-at: https://bugzilla.redhat.com/2228464 Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-08-02 15:45:32 +02:00
ovsdb_txn_history_run(ctx->db);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
}
}
void
ovsdb_relay_wait(void)
{
struct shash_node *node;
SHASH_FOR_EACH (node, &relay_dbs) {
struct relay_ctx *ctx = node->data;
ovsdb_cs_wait(ctx->cs);
ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server need to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that includes database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once transaction reply received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-15 19:05:40 +02:00
ovsdb_txn_forward_wait(ctx->db, ctx->cs);
ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-01 23:27:36 +02:00
}
}