Deciding what delete operations to issue on garbage-collected tables has
been a bit of a difficult issue for ovs-vsctl. When garbage collection was
introduced in commit c5f341a "ovsdb: Implement garbage collection",
ovs-vsctl did not issue any deletions for these tables at all. As a side
effect, ovs-vsctl did not notice that records were going to be deleted.
That meant that when multiple commands were issued in one ovs-vsctl run,
ovs-vsctl could get confused by apparent duplicate records that did not
in fact exist. Commit 28a14bf "ovs-vsctl: Back out garbage collection
changes" fixed the problem by putting all of the explicit deletions back
into ovs-vsctl.
However, adding these explicit deletions had the price that it then became
(again) impossible to use ovs-vsctl commands to delete duplicates, for
example to use "ovs-vsctl del-br" to delete a bridge that points to the
same Port records that some other Bridge record also does. This commit
makes that possible again, by implementing a compromise:
* Internally, ovs-vsctl deletes the records that it believes should be
deleted.
* ovsdb-idl suppresses the deletions when it makes the RPC call into
the database server.
Bug #5358.
Reported-by: Henrik Amren <henrik@nicira.com>
The existing ovsdb_idl_txn_commit() drops any writes that don't change a
column's value from what was last reported by the database. But this isn't
a valid optimization, because it breaks the atomicity of transactions.
Suppose columns A and B initially have values 1 and 2. Client 1 writes
value 1 to both columns in one transaction. Client 2 writes value 2 to
both columns in another transaction. The only possible valid results for
any serial ordering of transactions are 1,1 or 2,2. But if both clients
drop writes to columns that they have not modified, then 2,1 also becomes
possible (because client 1 just writes to B and client 2 just writes to A).
However, for write-only columns we can optimize this out because the IDL
can assume it is the only client writing to a column.
Found by inspection.
A JSONRPC_REPLY message always have a nonnull 'id' member, as ensured by
jsonrpc_msg_is_valid(). Checking for NULL here confused Coverity into
believing that the call to ovsdb_idl_txn_process_reply() just below could
cause a null pointer dereference, since ovsdb_idl_txn_process_reply() uses
the 'id' member without checking it for null.
Coverity #10713.
The IDL can only verify prerequisites for columns that it is monitoring,
but it didn't check for that. This assertion (which is the same as one in
ovsdb_idl_txn_write()) should alert us to such problems.
This would have found the problem fixed by the previous commit.
Until now, by default the IDL replicated all tables and all columns in the
database, and a few functions made it possible to avoid replicating
selected columns. This commit adds a mode in which nothing is replicated
by default and the client code is responsible for specifying each column
and table that it is interested in. The following commit adds a user for
this mode.
All of these changes avoid using the same name for two local variables
within a same function. None of them are actual bugs as far as I can tell,
but any of them could be confusing to the casual reader.
The one in lib/ovsdb-idl.c is particularly brilliant: inner and outer
loops both using (different) variables named 'i'.
Found with GCC -Wshadow.
Not replicating unneeded columns has some value in avoiding CPU time and
bandwidth to the database. In ovs-vswitchd, setting cur_cfg as write-only
also have great value in avoiding extra reconfiguration steps. When
ovs-vsctl is used in its default mode this essentially avoids half of the
reconfigurations that ovs-vswitchd currently does. What happens now is:
1. ovs-vsctl updates the database and increments next_cfg.
2. ovs-vswitchd notices the change to the database, reconfigures
itself, then increments cur_cfg to match next_cfg.
3. The database sends the change to cur_cfg back to ovs-vswitchd.
4. ovs-vswitchd reconfigures itself a second time.
By not replicating cur_cfg we avoid step 3 and save a whole reconfiguration
step.
Also, now that the database contains interface statistics, this avoids
reconfiguring every time that statistics are updated.
ovs-vswitchd has no need to replicate some parts of the database. In
particular, it doesn't need to replicate the bits that it never reads,
such as the external_ids column in the Open_vSwitch table. This saves
some memory, CPU time, and bandwidth to the database.
Another type of column that benefits from special treatment is "write-only
columns", that is, those that ovs-vswitchd writes and keeps up-to-date but
never expects another client to write, such as the cur_cfg column in the
Open_vSwitch table. If the IDL reports that the database has changed when
ovs-vswitchd updates such a column, then ovs-vswitchd reconfigures itself
for no reason, wasting CPU time. This commit also adds support for such
columns.
Adding a macro to define the vlog module in use adds a level of
indirection, which makes it easier to change how the vlog module must be
defined. A followup commit needs to do that, so getting these widespread
changes out of the way first should make that commit easier to review.
The existing ovsdb_idl_txn_read() was somewhat difficult and expensive to
use, because it always made a copy of the data in the column. This was
necessary at the time it was introduced, because there was no way for it
to return a "default" value for columns that had not yet been populated
without allocating data and hence requiring the caller to free it.
Now that ovsdb_datum_default() exists, this is no longer required. This
commit introduces a pair of new functions, ovsdb_idl_read() and
ovsdb_idl_get(), that return a pointer to existing data and do not do any
copying. It also transitions all of ovsdb_idl_txn_read()'s callers to
the new interfaces.
Commit cde3f1 "ovsdb-idl: Drop unnecessary allocation from
ovsdb_idl_txn_insert()." does lazy allocation of row->written
on the assumption that ovsdb_idl_txn_write() will handle it.
However, this isn't the case for empty rows created by something
like "ovs-vsctl init" so add a check before reading the bitfield.
This makes it easy to create a bunch of records that are all related to
each other in a single ovs-vsctl invocation. It adds an example to the
ovs-vsctl manpage.
This patch fixed the following compile warning:
lib/ovsdb-idl.c: In function 'ovsdb_idl_txn_process_inc_reply':
lib/ovsdb-idl.c:1524: warning: format '%u' expects type 'unsigned int', but argument 5 has type 'size_t'
lib/ovsdb-idl.c:1538: warning: format '%ld' expects type 'long int', but argument 5 has type 'long long int'
lib/ovsdb-idl.c:1550: warning: format '%u' expects type 'unsigned int', but argument 5 has type 'size_t'
lib/ovsdb-idl.c: In function 'ovsdb_idl_txn_process_insert_reply':
lib/ovsdb-idl.c:1579: warning: format '%u' expects type 'unsigned int', but argument 5 has type 'size_t'
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
The fatal-signal library notices and records fatal signals (e.g. SIGTERM)
and terminates the process on the next trip through poll_block(). But
some special utilities do not always invoke poll_block() promptly, e.g.
"ovs-ofctl monitor" does not call poll_block() as long as OpenFlow messages
are available. But these special cases seem like they are all likely to
call into functions that themselves block (those with "_block" in their
names). So make a new rule that such functions should always call
fatal_signal_run(), either directly or through poll_block(). This commit
implements and documents that rule.
Bug #2625.
Before OVSDB was adopted in the vswitch, bridge ioctls were synchronous.
That is, an operation that, say, creates a new bridge was guaranteed to
have completed before brcompatd returned a success result to the kernel.
When OVSDB was adopted, however, we failed to maintain this property.
Instead, bridge creation (etc.) only happened some time after the return
value was passed back to the kernel. This causes a race condition against
software that creates or deletes bridges or ports and expects that the
operation is completed synchronously.
This commit restores the synchronous behavior.
Bug #2443.
When a transaction is in progress, newly inserted rows have NULL 'old'
values. These rows are not orphans, so ovsdb_idl_row_is_orphan() should
not treat them as such.
I do not believe that this changes behavior at all, because I have not been
able to find a case where ovsdb_idl_row_is_orphan() is called while a
transaction is in progress. It is a code cleanup.
The IDL was returning rows that had existed in the database and were
deleted by the current transaction (that is, row->old && !row->new).
This commit fixes the problem.
The condition used by next_real_row() was just blatantly wrong and
illogical. The correct condition is row->new != NULL. The old condition
only got one case wrong (the one mentioned above), even though it didn't
make much sense.
This fixes an ovs-vsctl call that assert-failed in a "set" command that
iterated through a table from which a previous ovs-vsctl command (in the
same invocation) had deleted a row.
If sending the transaction fails (jsonrpc_session_send() returns 0),
then we need to transition to TXN_TRY_AGAIN. (Transitioning to
TXN_INCOMPLETE is actually a no-op, because at this point in the code
we are guaranteed to be in that state already.)
Leaving the transaction in TXN_INCOMPLETE causes a segfault later in
ovsdb_idl_txn_destroy() when it calls hmap_remove() on the transaction's
txn_node.
This bug reveals a hole in the ovsdb_idl_txn state machine: destroying
a transaction without committing it or aborting it will cause the same
problem. This problem is *not* fixed by this patch: it really should be
handled by adding a new state TXN_UNCOMMITTED that indicates that the
transaction is not yet committed or aborted. That's too much for this
patch, and doesn't really matter for OVS at the moment since none of its
code paths destroy a transaction without committing or aborting it.
Bug #2435.
This also adds protocol compatibility to the database itself and to
ovsdb-client. It doesn't actually add multiple database support to
ovsdb-server, since we don't really need that yet.
The ovs-vsctl "create" command, and perhaps other commands, should print
the UUID of the newly created database row, but until now the IDL has not
provided a way to find that out. This commit adds the ability.
ovs-vsctl wants to use these functions directly, so make them available
through the ovsdb-idl public header instead of only through the private
one.
Also, change the prototypes to make them usable without casts.
The IDL is intended to allow clients easier access to data in the database
by providing an extra layer of abstraction. However, ovs-vsctl needs to
also provide generic access to database tables, rows, and columns, and
until now the IDL has not allowed this. In particular, there was no way
to modify the value of a database column by providing a "struct
ovsdb_datum" with the new value and then have that reflected in the IDL
structs, although the other direction was possible.
This commit fixes that problem, which requires a bit of refactoring of the
IDL layer. It also exposes the interface for iterating through table
records to clients directly, by moving it from the "private" IDL header to
the public one.
If ovsdb_idl_txn_destroy() is called to destroy a transaction before its
reply has been received from the database server, then until now we would
drop the connection to the database when the reply actually arrived,
because we would have no record of that transaction ID any longer.
Notably, ovs-vswitchd does this: it "fires and forgets" database
transactions. (Really, it should not do that, but that's a bigger commit.)
This commit fixes the problem by not dropping the database connection in
such a case.
This fixes an observed problem such that sometimes ovs-vsctl took a long
time to complete, which was because ovs-vswitchd was dropping its
connection to the database and backing off.
Until now the ovsdb-based vswitch has provided no way to know when it has
finished applying the configuration from the database. This commit
introduces a way:
* The client who wants to wait increments the "next_cfg" column of the
Open_vSwitch record.
* When ovs-vswitchd finishes reconfiguring, it sets the value of the
"cur_cfg" column to that of the "next_cfg" column.
* The client waits until the "cur_cfg" column is at least as great as
the value it set into "next_cfg".
This allows us to drop the 5-second sleep in interface-reconfigure.
The idea here is that transaction comments get copied to the ovsdb-server's
transaction log, which can then make it clear later why a particular change
was made to the database, to ease debugging.
If the transaction modified a row and then deleted it, the IDL would
instead mistakenly leave the row entirely untouched.
This commit fixes this bug. It needs a regression test, but this commit
does not add one.
When the IDL was used to insert a row, but all of the new row's columns
were left at the default values, then the IDL would not insert the row at
all.
When the IDL was used to delete one or more rows, and the transaction did
not include any update or insertion operations, the transaction was dropped
entirely.
This commit fixes these two bugs. It needs a regression test, but this
commit does not add one.