Most users of OVSDB react to whatever is currently in their view of the
database, as opposed to keeping track of changes and reacting to those
changes individually. The interface to conditional monitoring was
different, in that it expected the client to say what to add or remove from
monitoring instead of what to monitor. This seemed reasonable at the time,
but in practice it turns out that the usual approach actually works better,
because the condition is generally a function of the data visible in the
database. This commit changes the approach.
This commit also changes the meaning of an empty condition for a table.
Previously, an empty condition meant to replicate every row. Now, an empty
condition means to replicate no rows. This is more convenient for code
that gradually constructs conditions, because it does not need special
cases for replicating nothing.
This commit also changes the internal implementation of conditions from
linked lists to arrays. I just couldn't see an advantage to using linked
lists.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Liran Schour <lirans@il.ibm.com>
Some upcoming tests will add extra trickiness to the IDL internal graph.
This worries me, because the IDL doesn't have any checks for its graph
consistency. This commit adds some.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
The conditional replication code had hardly any comments. This adds some.
This commit also fixes a number of style problems, factors out some code
into a helper function, and moves some struct declarations from a public
header, that were not used by client code, into more private locations.
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patchset mimics the changes introduced in
f199df26 (ovsdb-idl: Add partial map updates functionality.)
010fe7ae (ovsdb-idlc.in: Autogenerate partial map updates functions.)
7251075c (tests: Add test for partial map updates.)
b1048e6a (ovsdb-idl: Fix issues detected in Partial Map Update feature)
but for columns that store sets of values rather than key-value
pairs. These columns will now be able to use the OVSDB mutate
operation to transmit deltas on the wire rather than use
verify/update and transmit wait/update operations on the wire.
Side effect of modifying the comments in the partial map update
tests.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
It's slow to add --wait to every ovn-nbctl command; only the last command
needs it. But it's sometimes inconvenient to add it to the last command
if it's in a loop, etc. This makes it possible to separately wait for
the OVN southbound or hypervisors to catch up to the northbound.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Until now, there has been no reliable for the CMS (or ovn-nbctl, or
anything else) to detect when changes made to the northbound configuration
have been passed through to the southbound database or to the hypervisors.
This commit adds this feature to the system, by adding sequence numbers
to the northbound and southbound databases and adding code in ovn-nbctl,
ovn-northd, and ovn-controller to keep those sequence numbers up-to-date.
The biggest user-visible change from this commit is new a new option
--wait to ovn-nbctl. With --wait=sb, ovn-nbctl now waits for ovn-northd
to update the southbound database; with --wait=hv, it waits for the
changes to make their way to Open vSwitch on every hypervisor.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
Add to IDL API that allows the user to add and remove clauses on a table's condition
iteratively. IDL maintain tables condition and send monitor_cond_change to the server
upon condition change.
Add tests for conditional monitoring to IDL.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
IDL uses now a uuid to specify a monitoring session that is being
sent to the server on "monitor_cond" request.
This uuid will be used to issue ongoing "monitor_cond_change" requests
for this monitoring session.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In the current implementation, every time an element of either a map or set
column has to be modified, the entire content of the column is sent to the
server to be updated. This is not a major problem if the information contained
in the column for the corresponding row is small, but there are cases where
these columns can have a significant amount of elements per row, or these
values are updated frequently, therefore the cost of the modifications becomes
high in terms of time and bandwidth.
In this solution, the ovsdb-idl code is modified to use the RFC 7047 'mutate'
operation, to allow sending partial modifications on map columns to the server.
The functionality is exposed to clients in the vswitch idl. This was
implemented through map operations.
A map operation is defined as an insertion, update or deletion of a key-value
pair inside a map. The idea is to minimize the amount of map operations
that are send to the OVSDB server when a transaction is committed.
In order to keep track of the requested map operations, structs map_op and
map_op_list were defined with accompanying functions to manipulate them. These
functions make sure that only one operation is send to the server for each
key-value that wants to be modified, so multiple operation on a key value are
collapsed into a single operation.
As an example, if a client using the IDL updates several times the value for
the same key, the functions will ensure that only the last value is send to
the server, instead of multiple updates. Or, if the client inserts a key-value,
and later on deletes the key before committing the transaction, then both
actions cancel out and no map operation is send for that key.
To keep track of the desired map operations on each transaction, a list of map
operations (struct map_op_list) is created for every column on the row on which
a map operation is performed. When a new map operation is requested on the same
column, the corresponding map_op_list is checked to verify if a previous
operations was performed on the same key, on the same transaction. If there is
no previous operation, then the new operation is just added into the list. But
if there was a previous operation on the same key, then the previous operation
is collapsed with the new operation into a single operation that preserves the
final result if both operations were to be performed sequentially. This design
keep a small memory footprint during transactions.
When a transaction is committed, the map operations lists are checked and
all map operations that belong to the same map are grouped together into a
single JSON RPC "mutate" operation, in which each map_op is transformed into
the necessary "insert" or "delete" mutators. Then the "mutate" operation is
added to the operations that will be send to the server.
Once the transaction is finished, all map operation lists are cleared and
deleted, so the next transaction starts with a clean board for map operations.
Using different structures and logic to handle map operations, instead of
trying to force the current structures (like 'old' and 'new' datums in the row)
to handle then, ensures that map operations won't mess up with the current
logic to generate JSON messages for other operations, avoids duplicating the
whole map for just a few changes, and is faster for insert and delete
operations, because there is no need to maintain the invariants in the 'new'
datum.
Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com>
Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
[blp@ovn.org made style changes and factored out error checking]
Signed-off-by: Ben Pfaff <blp@ovn.org>
Change tracking is a bit different from what someone with
"classic" database experience might expect, so let's add
the knowledged gained from the experience of making change
tracking work for incremental processing.
Signed-off-by: RYAN D. MOATS <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Allows for auto detection and reconnect if the ovn-remote needs
to change. Ovn-controller test case updated to include testing
this code.
Signed-off-by: RYAN D. MOATS <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add a external-id 'ovn-remote-probe-interval' for setting the activity probe
interval of the json session from ovn-controller to the OVN southbound database.
Signed-off-by: Huang Lei <lhuang8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Recent IDL change tracking patches allow quick traversal of changed
rows. This patch adds additional support to track changed columns.
It allows an IDL client to efficiently check if a specific column
of a row was updated by IDL.
Signed-off-by: Shad Ansari <shad.ansar@hpe.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ovsdb-idl notifies a client that something changed; it does not track
which table, row changed in what way (insert, modify or delete).
As a result, a client has to scan or reconfigure the entire idl after
ovsdb_idl_run(). This is presumably fine for typical ovs schemas where
tables are relatively small. In use-cases where ovsdb is used with
schemas that can have very large tables, the current ovsdb-idl
notification mechanism does not appear to scale - clients need to do a
lot of processing to determine the exact change delta.
This change adds support for:
- Table and row based change sequence numbers to record the
most recent IDL change sequence numbers associated with insert,
modify or delete update on that table or row.
- Change tracking of specific columns. This ensures that changed
rows (inserted, modified, deleted) that have tracked columns, are
tracked by IDL. The client can directly access the changed rows
with get_first, get_next operations without the need to scan the
entire table.
The tracking functionality is not enabled by default and needs to
be turned on per-column by the client after ovsdb_idl_create()
and before ovsdb_idl_run().
/* Example Usage */
idl = ovsdb_idl_create(...);
/* Track specific columns */
ovsdb_idl_track_add_column(idl, column);
/* Or, track all columns */
ovsdb_idl_track_add_all(idl);
for (;;) {
ovsdb_idl_run(idl);
seqno = ovsdb_idl_get_seqno(idl);
/* Process only the changed rows in Table FOO */
FOO_FOR_EACH_TRACKED(row, idl) {
/* Determine the type of change from the row seqnos */
if (foo_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)
>= seqno)) {
printf("row deleted\n");
} else if (foo_row_get_seqno(row, OVSDB_IDL_CHANGE_MODIFY)
>= seqno))
printf("row modified\n");
} else if (foo_row_get_seqno(row, OVSDB_IDL_CHANGE_INSERT)
>= seqno))
printf("row inserted\n");
}
}
/* All changes processed - clear the change track */
ovsdb_idl_track_clear(idl);
}
Signed-off-by: Shad Ansari <shad.ansari@hp.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The same function is defined in both ovn-controller.c and
ovn-controller-vtep.c, so worth librarizing.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
idl-loop is needed in implementing other controller (i.e., vtep controller).
So, this commit moves the logic into ovsdb-idl library module.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
We've had this question a couple of times so we might as well document it.
Requested-by: Saurabh Shrivastava (सौरभ श्रीवास्तव) <saurabh@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The following macros are renamed to avoid conflicts with other headers:
* WARN_UNUSED_RESULT to OVS_WARN_UNUSED_RESULT
* PRINTF_FORMAT to OVS_PRINTF_FORMAT
* NO_RETURN to OVS_NO_RETURN
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OVSDB has the concept of "immutable" columns, which are columns whose
values are fixed once a row is inserted. Until now, ovs-vsctl has not
allowed these columns to be modified at all. However, this is a little too
strict, because these columns can be set to any value at the time that the
row is inserted. This commit relaxes the ovs-vsctl requirement, then, to
allow an immutable column's value to be modified if its row has been
inserted within this transaction.
Requested-by: Mukesh Hira <mhira@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
If ovs-vsctl has to wait for ovs-vswitchd to reconfigure itself
according to the new database, then sometimes ovs-vsctl could
end up stuck in the event loop if OVSDB connection was dropped
while ovs-vsctl was still running.
This patch fixes this problem by letting ovs-vsctl to reconnect
to the OVSDB, if it has to wait cur_cfg field to be updated.
Issue: 1191997
Reported-by: Spiro Kourtessis <spiro@nicira.com>
Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Until now, ovs-vsctl has kept trying to the database server until it
succeeded or the timeout expired (if one was specified with --timeout).
This meant that if ovsdb-server wasn't running, then ovs-vsctl would hang.
The result was that almost every ovs-vsctl invocation in scripts specified
a timeout on the off-chance that the database server might not be running.
But it's difficult to choose a good timeout. A timeout that is too short
can cause spurious failures. A timeout that is too long causes long delays
if the server really isn't running.
This commit should alleviate this problem. It changes ovs-vsctl's behavior
so that, if it fails to connect to the server, it exits unsuccessfully.
This makes --timeout obsolete for the purpose of avoiding a hang if the
database server isn't running. (--timeout is still useful to avoid a hang
if ovsdb-server is running but ovs-vswitchd is not, for ovs-vsctl commands
that modify the database. --no-wait also avoids that issue.)
Bug #2393.
Bug #15594.
Reported-by: Jeff Merrick <jmerrick@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
For 1000 tunnels with CFM enabled, this reduces CPU use from
about 36% to about 30%.
Bug #15171.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
ovs-vswitchd should only write to write-only columns. Furthermore,
writing to a column which is not write-only can cause serious
performance degradations. This patch causes ovs-vswitchd to log
and reject writes to read-write columns.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Originally the IDL transaction state machine had a return value
TXN_TRY_AGAIN to signal the client to wait for a change in the database and
then retry its transaction. However, this logic was incomplete, because
it was possible for the database to change before the reply to the
transaction RPC was received, in which case the client would wait for a
further change. Commit 4fdfe5ccf84c (ovsdb-idl: Prevent occasional hang
when multiple database clients race.) fixed the problem by breaking
TXN_TRY_AGAIN into two status codes, TXN_AGAIN_WAIT that meant to wait for
a further change and TXN_AGAIN_NOW that meant that a change had already
occurred so try again immediately.
This is correct enough, but it is more complicated than necessary. It is
simpler and just as correct to use a single "try again" status that
requires the client to wait for a change relative to the database contents
*before* the transaction was committed. This commit makes that change.
It also changes ovsdb_idl_run()'s return type from bool to void because
its return type is hardly useful anymore.
Signed-off-by: Ben Pfaff <blp@nicira.com>
When a client of the IDL tries to commit a read-modify-write transaction
but the database has changed in the meantime, the IDL tells its client to
wait for the IDL to change and then try the transaction again by returning
TXN_TRY_AGAIN. The "wait for the IDL to change" part is important because
there's no point in retrying the transaction before the IDL has received
the database updates (the transaction would fail in the same way all over
again).
However, the logic was incomplete: the database update can be received
*before* the reply to the transaction RPC (I think that in the current
ovsdb-server implementation this will always happen, in fact). When this
happens, the right thing to do is to retry the transaction immediately;
if we wait, then we're waiting for an additional change to the database
that may never come, causing an indefinite hang.
This commit therefore breaks the "try again" IDL commit status code
into two, one that means "try again immediately" and another that means
"wait for a change then try again". When an update is processed after a
transaction is committed but before the reply is received, the "try again
now" tells the IDL client not to wait for another database change before
retrying its transaction.
Bug #5980.
Reported-by: Ram Jothikumar <rjothikumar@nicira.com>
Reproduced-by: Alex Yip <alex@nicira.com>
Once in a while someone reports a problem caused by running multiple
ovs-vswitchd processes at the same time. This fixes the problem by
requiring ovs-vswitchd to obtain a database lock before taking any actions.
The state machine didn't have a proper state for "not yet committed or
aborted", which meant that destroying an ovsdb_idl_txn without committing
or aborting it caused a segfault. This fixes the problem by adding a new
state TXN_UNCOMMITTED to the state machine.
This is related to commit 79554078d "ovsdb-idl: Fix bad logic in
ovsdb_idl_txn_commit() state transitions", which fixed a related bug.
Bug #2438.
Until now, ovs-vswitchd has been unable to configure IP addresses and
routes for bridges whose Bridge records lack a Port and an Interface
record for the bridge's local port (e.g. OFPP_LOCAL, the port with the
same name as the bridge itself). When such a bridge was reconfigured,
ovs-vswitchd would output a log message that worried people.
This commit fixes the internal limitation that led to the message being
printed.
Bug #5385.
Until now, by default the IDL replicated all tables and all columns in the
database, and a few functions made it possible to avoid replicating
selected columns. This commit adds a mode in which nothing is replicated
by default and the client code is responsible for specifying each column
and table that it is interested in. The following commit adds a user for
this mode.
ovs-vswitchd has no need to replicate some parts of the database. In
particular, it doesn't need to replicate the bits that it never reads,
such as the external_ids column in the Open_vSwitch table. This saves
some memory, CPU time, and bandwidth to the database.
Another type of column that benefits from special treatment is "write-only
columns", that is, those that ovs-vswitchd writes and keeps up-to-date but
never expects another client to write, such as the cur_cfg column in the
Open_vSwitch table. If the IDL reports that the database has changed when
ovs-vswitchd updates such a column, then ovs-vswitchd reconfigures itself
for no reason, wasting CPU time. This commit also adds support for such
columns.
The existing ovsdb_idl_txn_read() was somewhat difficult and expensive to
use, because it always made a copy of the data in the column. This was
necessary at the time it was introduced, because there was no way for it
to return a "default" value for columns that had not yet been populated
without allocating data and hence requiring the caller to free it.
Now that ovsdb_datum_default() exists, this is no longer required. This
commit introduces a pair of new functions, ovsdb_idl_read() and
ovsdb_idl_get(), that return a pointer to existing data and do not do any
copying. It also transitions all of ovsdb_idl_txn_read()'s callers to
the new interfaces.
This makes it easy to create a bunch of records that are all related to
each other in a single ovs-vsctl invocation. It adds an example to the
ovs-vsctl manpage.
Before OVSDB was adopted in the vswitch, bridge ioctls were synchronous.
That is, an operation that, say, creates a new bridge was guaranteed to
have completed before brcompatd returned a success result to the kernel.
When OVSDB was adopted, however, we failed to maintain this property.
Instead, bridge creation (etc.) only happened some time after the return
value was passed back to the kernel. This causes a race condition against
software that creates or deletes bridges or ports and expects that the
operation is completed synchronously.
This commit restores the synchronous behavior.
Bug #2443.
The ovs-vsctl "create" command, and perhaps other commands, should print
the UUID of the newly created database row, but until now the IDL has not
provided a way to find that out. This commit adds the ability.
ovs-vsctl wants to use these functions directly, so make them available
through the ovsdb-idl public header instead of only through the private
one.
Also, change the prototypes to make them usable without casts.
The IDL is intended to allow clients easier access to data in the database
by providing an extra layer of abstraction. However, ovs-vsctl needs to
also provide generic access to database tables, rows, and columns, and
until now the IDL has not allowed this. In particular, there was no way
to modify the value of a database column by providing a "struct
ovsdb_datum" with the new value and then have that reflected in the IDL
structs, although the other direction was possible.
This commit fixes that problem, which requires a bit of refactoring of the
IDL layer. It also exposes the interface for iterating through table
records to clients directly, by moving it from the "private" IDL header to
the public one.
Until now the ovsdb-based vswitch has provided no way to know when it has
finished applying the configuration from the database. This commit
introduces a way:
* The client who wants to wait increments the "next_cfg" column of the
Open_vSwitch record.
* When ovs-vswitchd finishes reconfiguring, it sets the value of the
"cur_cfg" column to that of the "next_cfg" column.
* The client waits until the "cur_cfg" column is at least as great as
the value it set into "next_cfg".
This allows us to drop the 5-second sleep in interface-reconfigure.
The idea here is that transaction comments get copied to the ovsdb-server's
transaction log, which can then make it clear later why a particular change
was made to the database, to ease debugging.