ovsdb-server allows the OVSDB clients to specify the uuid for
the row inserts [1]. Both the C IDL client library and Python
IDL are missing this feature. This patch adds this support.
In C IDL, for each schema table, a new function is generated -
<schema_table>insert_persistent_uuid(txn, uuid) which can
be used the clients to persist the uuid.
ovs-vsctl and other derivatives of ctl now supports the same
in the generic 'create' command with the option "--id=<UUID>".
In Python IDL, the uuid to persist can be specified in
the Transaction.insert() function.
[1] - a529e3cd1f("ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.:)
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
While adding new rows ovsdb-idl re-parses all the other rows that
references this new one. For example, current ovn-kubernetes creates
load balancers and adds the same load balancer to all logical switches
and logical routers. So, then a new load balancer is added, rows for
all logical switches and routers re-parsed.
During initial database connection (or re-connection with
monitor/monitor_cond or monitor_cond_since with outdated last
transaction id) the client downloads the whole content of a database.
In case of OVN, there might be already thousands of load balancers
configured. ovsdb-idl will process rows in that initial monitor reply
one-by-one. Therefore, for each load balancer row, it will re-parse
all rows for switches and routers.
Assuming that we have 120 Logical Switches and 30K load balancers.
Processing of the initial monitor reply will take 120 (switch rows) *
30K (load balancer references in a switch row) * 30K (load balancer
rows) = 10^11 operations, which may take hours. ovn-kubernetes will
use LB groups eventually, but there are other less obvious cases that
cannot be changed that easily.
Re-parsing doesn't change any internal structures of the IDL. It
destroys and re-creates exactly same arcs between rows. The only
thing that changes is the application-facing array of pointers.
Since internal structures remains intact, suggested solution is to
postpone the re-parsing of back references until all the monitor
updates processed. This way we can re-parse each row only once.
Tested in a sandbox with 120 LSs, 120 LRs and 3K LBs, where each
load balancer added to each LS and LR, by re-statring ovn-northd and
measuring the time spent in ovsdb_idl_run().
Before the change:
OVN_Southbound: ovsdb_idl_run took: 924 ms
OVN_Northbound: ovsdb_idl_run took: 825118 ms --> 13.75 minutes!
After:
OVN_Southbound: ovsdb_idl_run took: 692 ms
OVN_Northbound: ovsdb_idl_run took: 1698 ms
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds 2 new APIs in the ovsdb-idl client library
- ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column() to
query if a table and a column is present in the IDL or not. This
patch also adds IDL helper functions which are auto generated from
the schema which makes it easier for the clients.
These APIs are required for scenarios where the server schema is old and
missing a table or column and the client (built with a new schema
version) does a transaction with the missing table or column. This
results in a continuous loop of transaction failures.
Related-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This change breaks the IDL into two layers: the IDL proper, whose
interface to its client is unchanged, and a low-level library called
the OVSDB "client synchronization" (CS) library. There are two
reasons for this change. First, the IDL is big and complicated and
I think that this change factors out some of that complication into
a simpler lower layer. Second, the OVN northd implementation based
on DDlog can benefit from the client synchronization library even
though it would actually be made increasingly complicated by the IDL.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
(i.e., "monitor_cond_since" method) with the initial monitor condition
MC1.
Assuming the following two transactions are executed on the
ovsdb-server:
TXN1: "insert record R1 in table T1"
TXN2: "insert record R2 in table T2"
If the client's monitor condition MC1 for table T2 matches R2 then the
client will receive the following update3 message:
method="update3", "insert record R2 in table T2", last-txn-id=TXN2
At this point, if the presence of the new record R2 in the IDL triggers
the client to update its monitor condition to MC2 and add a clause for
table T1 which matches R1, a monitor_cond_change message is sent to the
server:
method="monitor_cond_change", "clauses from MC2"
In normal operation the ovsdb-server will reply with a new update3
message of the form:
method="update3", "insert record R1 in table T1", last-txn-id=TXN2
However, if the connection drops in the meantime, this last update might
get lost.
It might happen that during the reconnect a new transaction happens
that modifies the original record R1:
TXN3: "modify record R1 in table T1"
When the client reconnects, it will try to perform a fast resync by
sending:
method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2
Because TXN2 is still in the ovsdb-server transaction history, the
server replies with the changes from the most recent transactions only,
i.e., TXN3:
result="true", last-txbb-id=TXN3, "modify record R1 in table T1"
This causes the IDL on the client in to end up in an inconsistent
state because it has never seen the update that created R1.
Such a scenario is described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22
To avoid this issue, the IDL will now maintain (up to) 3 different
types of conditions for each DB table:
- new_cond: condition that has been set by the IDL client but has
not yet been sent to the server through monitor_cond_change.
- req_cond: condition that has been sent to the server but the reply
acknowledging the change hasn't been received yet.
- ack_cond: condition that has been acknowledged by the server.
Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
disconnect):
- if there is a known last_id txn-id the code ensures that new_cond
will contain the most recent condition set by the IDL client
(either req_cond if there was a request in flight, or new_cond
if the IDL client set a condition while the IDL was disconnected)
- if there is no known last_id txn-id the code ensures that ack_cond will
contain the most recent conditions set by the IDL client regardless
whether they were acked by the server or not.
When monitor_cond_since/monitor_cond requests are sent they will
always include ack_cond and if new_cond is not NULL a follow up
monitor_cond_change will be generated afterwards.
On the other hand ovsdb_idl_db_set_condition() will always modify new_cond.
This ensures that updates of type "insert" that happened before the last
transaction known by the IDL but didn't match old monitor conditions are
sent upon reconnect if the monitor condition has changed to include them
in the meantime.
Fixes: 403a6a0cb0 ("ovsdb-idl: Fast resync from server when connection reset.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OVSDB IDL can track changes, but for deleted rows, the data is
destroyed and only uuid is tracked. In some cases we need to
check the data of the deleted rows. This patch preserves data
for deleted rows until track clear is called.
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The design of the compound index feature in the C OVSDB IDL was unusual.
Indexes were generally referenced only by name rather than by pointer, and
could be obtained only from the top-level ovsdb_idl object. To iterate or
otherwise search an index required explicitly creating a special
ovsdb_idl_cursor object, which at least seemed somewhat heavy-weight given
that it required a string lookup in a table of indexes.
This commit redesigns the compound index interface. It discards the use of
names for indexes, instead having clients pass in a pointer to the index
object itself. It simplifies how indexes are created, gets rid of the need
for explicit cursor objects, and updates all of the users to the new
interface.
The underlying reason for this commit is to make it easier in
ovn-controller to keep track of the dependencies for a given function, by
making the indexes explicit arguments to any function that needs to use
them.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou8@ebay.com>
Until now, a given ovsdb-idl instances has only monitored a single
database. In an upcoming commit, it will grow to also monitor a second
database that represents the state of the database server itself. Much of
the work is the same for both databases, so this commit breaks the common
code and data out into new data structures and functions.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
A synthetic column is one that is not present in the actual database but
instead calculated by code in the client based on columns in the row. This
can be useful to avoid repeatedly calculating the same function of a row.
Signed-off-by: Ben Pfaff <blp@ovn.org>
By verifying that singleton tables (that is, tables that should have exactly
one row) are empty when they emit transactions that insert into them,
ovs-vsctl and similar tools tolerate initialization races, where more than one
client at a time tries to initialize a singleton table.
The upshot is that if you create a database and then run multiple ovs-vsctl
(etc.) commands against it in parallel (without first initializing it
serially), then without this patch sometimes you will sometimes get failures
but this patch avoids them.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
This patch adds 'extern "C"' in a couple of header files so that
they can be compiled with C++ compilers.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Now that the 'new' datum is named 'new_datum', be more consistent by
renaming 'old' to 'old_datum' to match.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
In C++, 'new' is a keyword. If this is used as the name for a field,
then C++ compilers can get confused about the context and fail to
compile references to such fields. Rename the field to 'new_datum' to
avoid this issue.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
In C++, 'mutable' is a keyword. If this is used as the name for a field,
then C++ compilers can get confused about the context and fail to
compile references to such fields. Rename the field to 'is_mutable' to
avoid this issue.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
In C++, 'class' is a keyword. If this is used as the name for a field,
then C++ compilers can get confused about the context and fail to
compile references to such fields. Rename the field to 'class_' to
avoid this issue.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
This patch adds support for the creation of multicolumn indexes
in the C IDL to enable for efficient search and retrieval of database
rows by key.
Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
Co-authored-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Most users of OVSDB react to whatever is currently in their view of the
database, as opposed to keeping track of changes and reacting to those
changes individually. The interface to conditional monitoring was
different, in that it expected the client to say what to add or remove from
monitoring instead of what to monitor. This seemed reasonable at the time,
but in practice it turns out that the usual approach actually works better,
because the condition is generally a function of the data visible in the
database. This commit changes the approach.
This commit also changes the meaning of an empty condition for a table.
Previously, an empty condition meant to replicate every row. Now, an empty
condition means to replicate no rows. This is more convenient for code
that gradually constructs conditions, because it does not need special
cases for replicating nothing.
This commit also changes the internal implementation of conditions from
linked lists to arrays. I just couldn't see an advantage to using linked
lists.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Liran Schour <lirans@il.ibm.com>
The 'tc' member of struct ovsdb_idl_condition was written but never read,
so remove it.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mickey Spiegel <mickeys.dev@gmail.com>
This function doesn't modify its 'dst_table' parameter, so it might as well
be marked const.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
I wrote this code and if I have to rediscover how it works, it's time to
improve the commnts.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
The conditional replication code had hardly any comments. This adds some.
This commit also fixes a number of style problems, factors out some code
into a helper function, and moves some struct declarations from a public
header, that were not used by client code, into more private locations.
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patchset mimics the changes introduced in
f199df26 (ovsdb-idl: Add partial map updates functionality.)
010fe7ae (ovsdb-idlc.in: Autogenerate partial map updates functions.)
7251075c (tests: Add test for partial map updates.)
b1048e6a (ovsdb-idl: Fix issues detected in Partial Map Update feature)
but for columns that store sets of values rather than key-value
pairs. These columns will now be able to use the OVSDB mutate
operation to transmit deltas on the wire rather than use
verify/update and transmit wait/update operations on the wire.
Side effect of modifying the comments in the partial map update
tests.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add to IDL API that allows the user to add and remove clauses on a table's condition
iteratively. IDL maintain tables condition and send monitor_cond_change to the server
upon condition change.
Add tests for conditional monitoring to IDL.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In the current implementation, every time an element of either a map or set
column has to be modified, the entire content of the column is sent to the
server to be updated. This is not a major problem if the information contained
in the column for the corresponding row is small, but there are cases where
these columns can have a significant amount of elements per row, or these
values are updated frequently, therefore the cost of the modifications becomes
high in terms of time and bandwidth.
In this solution, the ovsdb-idl code is modified to use the RFC 7047 'mutate'
operation, to allow sending partial modifications on map columns to the server.
The functionality is exposed to clients in the vswitch idl. This was
implemented through map operations.
A map operation is defined as an insertion, update or deletion of a key-value
pair inside a map. The idea is to minimize the amount of map operations
that are send to the OVSDB server when a transaction is committed.
In order to keep track of the requested map operations, structs map_op and
map_op_list were defined with accompanying functions to manipulate them. These
functions make sure that only one operation is send to the server for each
key-value that wants to be modified, so multiple operation on a key value are
collapsed into a single operation.
As an example, if a client using the IDL updates several times the value for
the same key, the functions will ensure that only the last value is send to
the server, instead of multiple updates. Or, if the client inserts a key-value,
and later on deletes the key before committing the transaction, then both
actions cancel out and no map operation is send for that key.
To keep track of the desired map operations on each transaction, a list of map
operations (struct map_op_list) is created for every column on the row on which
a map operation is performed. When a new map operation is requested on the same
column, the corresponding map_op_list is checked to verify if a previous
operations was performed on the same key, on the same transaction. If there is
no previous operation, then the new operation is just added into the list. But
if there was a previous operation on the same key, then the previous operation
is collapsed with the new operation into a single operation that preserves the
final result if both operations were to be performed sequentially. This design
keep a small memory footprint during transactions.
When a transaction is committed, the map operations lists are checked and
all map operations that belong to the same map are grouped together into a
single JSON RPC "mutate" operation, in which each map_op is transformed into
the necessary "insert" or "delete" mutators. Then the "mutate" operation is
added to the operations that will be send to the server.
Once the transaction is finished, all map operation lists are cleared and
deleted, so the next transaction starts with a clean board for map operations.
Using different structures and logic to handle map operations, instead of
trying to force the current structures (like 'old' and 'new' datums in the row)
to handle then, ensures that map operations won't mess up with the current
logic to generate JSON messages for other operations, avoids duplicating the
whole map for just a few changes, and is faster for insert and delete
operations, because there is no need to maintain the invariants in the 'new'
datum.
Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com>
Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
[blp@ovn.org made style changes and factored out error checking]
Signed-off-by: Ben Pfaff <blp@ovn.org>
Recent IDL change tracking patches allow quick traversal of changed
rows. This patch adds additional support to track changed columns.
It allows an IDL client to efficiently check if a specific column
of a row was updated by IDL.
Signed-off-by: Shad Ansari <shad.ansar@hpe.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The new comment reflects with more clarity what ovsdb_idl_add_table() does.
Previous comment could be misunderstood, leading to believe that this function
replicates all columns on IDL. Hopefully this fix clarifies that columns are
not replicated, just minimal data for reference integrity is replicated.
A comment in ovsdb_idl_table_class is also modified to better reflect this
behaviour.
Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ovsdb-idl notifies a client that something changed; it does not track
which table, row changed in what way (insert, modify or delete).
As a result, a client has to scan or reconfigure the entire idl after
ovsdb_idl_run(). This is presumably fine for typical ovs schemas where
tables are relatively small. In use-cases where ovsdb is used with
schemas that can have very large tables, the current ovsdb-idl
notification mechanism does not appear to scale - clients need to do a
lot of processing to determine the exact change delta.
This change adds support for:
- Table and row based change sequence numbers to record the
most recent IDL change sequence numbers associated with insert,
modify or delete update on that table or row.
- Change tracking of specific columns. This ensures that changed
rows (inserted, modified, deleted) that have tracked columns, are
tracked by IDL. The client can directly access the changed rows
with get_first, get_next operations without the need to scan the
entire table.
The tracking functionality is not enabled by default and needs to
be turned on per-column by the client after ovsdb_idl_create()
and before ovsdb_idl_run().
/* Example Usage */
idl = ovsdb_idl_create(...);
/* Track specific columns */
ovsdb_idl_track_add_column(idl, column);
/* Or, track all columns */
ovsdb_idl_track_add_all(idl);
for (;;) {
ovsdb_idl_run(idl);
seqno = ovsdb_idl_get_seqno(idl);
/* Process only the changed rows in Table FOO */
FOO_FOR_EACH_TRACKED(row, idl) {
/* Determine the type of change from the row seqnos */
if (foo_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)
>= seqno)) {
printf("row deleted\n");
} else if (foo_row_get_seqno(row, OVSDB_IDL_CHANGE_MODIFY)
>= seqno))
printf("row modified\n");
} else if (foo_row_get_seqno(row, OVSDB_IDL_CHANGE_INSERT)
>= seqno))
printf("row inserted\n");
}
}
/* All changes processed - clear the change track */
ovsdb_idl_track_clear(idl);
}
Signed-off-by: Shad Ansari <shad.ansari@hp.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
struct list is a common name and can't be used in public headers.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OVSDB has always had the ability to mark a column as "immutable", so that
its value cannot be changed in a given row after that row is initially
inserted. However, we discovered recently that ovsdb-server has never
enforced this constraint. This commit implements enforcement.
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
String to string maps are used all over the Open vSwitch database.
Before this patch, they were implemented in the idl as parallel
string arrays. This strategy has proven a bit cumbersome. With
this patch, string to string maps are implemented using the smap
library.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, by default the IDL replicated all tables and all columns in the
database, and a few functions made it possible to avoid replicating
selected columns. This commit adds a mode in which nothing is replicated
by default and the client code is responsible for specifying each column
and table that it is interested in. The following commit adds a user for
this mode.
ovs-vswitchd has no need to replicate some parts of the database. In
particular, it doesn't need to replicate the bits that it never reads,
such as the external_ids column in the Open_vSwitch table. This saves
some memory, CPU time, and bandwidth to the database.
Another type of column that benefits from special treatment is "write-only
columns", that is, those that ovs-vswitchd writes and keeps up-to-date but
never expects another client to write, such as the cur_cfg column in the
Open_vSwitch table. If the IDL reports that the database has changed when
ovs-vswitchd updates such a column, then ovs-vswitchd reconfigures itself
for no reason, wasting CPU time. This commit also adds support for such
columns.
This also adds protocol compatibility to the database itself and to
ovsdb-client. It doesn't actually add multiple database support to
ovsdb-server, since we don't really need that yet.
ovs-vsctl wants to use these functions directly, so make them available
through the ovsdb-idl public header instead of only through the private
one.
Also, change the prototypes to make them usable without casts.
The IDL is intended to allow clients easier access to data in the database
by providing an extra layer of abstraction. However, ovs-vsctl needs to
also provide generic access to database tables, rows, and columns, and
until now the IDL has not allowed this. In particular, there was no way
to modify the value of a database column by providing a "struct
ovsdb_datum" with the new value and then have that reflected in the IDL
structs, although the other direction was possible.
This commit fixes that problem, which requires a bit of refactoring of the
IDL layer. It also exposes the interface for iterating through table
records to clients directly, by moving it from the "private" IDL header to
the public one.