ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
it's linear in terms of number of comparisons. And it also clones
all the atoms along the way. In most cases size of a diff is much
smaller than the size of the original datum, this allows to perform
the same operation in-place with only O(diff->n * log2(old->n))
comparisons and O(old->n + diff->n) memory copies with memcpy.
Using this function while applying diffs read from the storage gives
a significant performance boost and allows to execute much more
transactions per second.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Current algorithm for ovsdb_datum_subtract looks like this:
for-each atom in a:
if atom in b:
swap(atom, <last atom in 'a'>)
destroy(atom)
quicksort(a)
Complexity:
Na * log2(Nb) + (Na - Nb) * log2(Na - Nb)
Search Comparisons for quicksort
It's not optimal, especially because Nb << Na in a vast majority of
cases.
Reversing the search phase to look up atoms from 'b' in 'a', and
closing gaps from deleted elements in 'a' by plain memory copy to
avoid quicksort.
Resulted complexity:
Nb * log2(Na) + (Na - Nb)
Search Memory copies
Subtraction is heavily used while executing database transactions.
For example, to remove one port from a logical switch in OVN.
Complexity of such operation if original logical switch had 100 ports
goes down from
100 * log2(1) = 100 comparisons for search and
99 * log2(99) = 656 comparisons for quicksort
------------------------------
756 comparisons in total
to only
1 * log2(100) = 7 comparisons for search
+ memory copy of 99 * sizeof (union ovsdb_atom) bytes.
We could use memmove to close the gaps after removing atoms, but
it will lead to 2 memory copies inside the call, while we can perform
only one to the temporary 'result' and swap pointers.
Performance in cases, where sizes of 'a' and 'b' are comparable,
should not change. Cases with Nb >> Na should not happen in practice.
All in all, this change allows ovsdb-server to perform several times
more transactions, that removes elements from sets, per second.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Current algorithm of ovsdb_datum_union looks like this:
for-each atom in b:
if not bin_search(a, atom):
push(a, clone(atom))
quicksort(a)
So, the complexity looks like this:
Nb * log2(Na) + Nb + (Na + Nb) * log2(Na + Nb)
Comparisons clones Comparisons for quicksort
for search
ovsdb_datum_union() is heavily used in database transactions while
new element is added to a set. For example, if new logical switch
port is added to a logical switch in OVN. This is a very common
use case where CMS adds one new port to an existing switch that
already has, let's say, 100 ports. For this case ovsdb-server will
have to perform:
1 * log2(100) + 1 clone + 101 * log2(101)
Comparisons Comparisons for
for search quicksort.
~7 1 ~707
Roughly 714 comparisons of atoms and 1 clone.
Since binary search can give us position, where new atom should go
(it's the 'low' index after the search completion) for free, the
logic can be re-worked like this:
copied = 0
for-each atom in b:
desired_position = bin_search(a, atom)
push(result, a[ copied : desired_position - 1 ])
copied = desired_position
push(result, clone(atom))
push(result, a[ copied : Na ])
swap(a, result)
Complexity of this schema:
Nb * log2(Na) + Nb + Na
Comparisons clones memory copy on push
for search
'swap' is just a swap of a few pointers. 'push' is not a 'clone',
but a simple memory copy of 'union ovsdb_atom'.
In general, this schema substitutes complexity of a quicksort
with complexity of a memory copy of Na atom structures, where we're
not even copying strings that these atoms are pointing to.
Complexity in the example above goes down from 714 comparisons
to 7 comparisons and memcpy of 100 * sizeof (union ovsdb_atom) bytes.
General complexity of a memory copy should always be lower than
complexity of a quicksort, especially because these copies usually
performed in bulk, so this new schema should work faster for any input.
All in all, this change allows to execute several times more
transactions per second for transactions that adds new entries to sets.
Alternatively, union can be implemented as a linear merge of two
sorted arrays, but this will result in O(Na) comparisons, which
is more than Nb * log2(Na) in common case, since Na is usually
far bigger than Nb. Linear merge will also mean per-atom memory
copies instead of copying in bulk.
'replace' functionality of ovsdb_datum_union() had no users, so it
just removed. But it can easily be added back if needed in the future.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Currently, even if one reference added to the set of strong references
or removed from it, ovsdb-server will walk through the whole set and
re-count references to other rows. These referenced rows will also be
added to the transaction in order to re-count their references.
For example, every time Logical Switch Port added to a Logical Switch,
OVN Northbound database server will walk through all ports of this
Logical Switch, clone their rows, and re-count references. This is
not very efficient. Instead, it can only increase reference counters
for added references and reduce for removed ones. In many cases this
will be only one row affected in the Logical_Switch_Port table.
Introducing new function that generates a diff of two datum objects,
but stores added and removed atoms separately, so they can be used
to increase or decrease row reference counters accordingly.
This change allows to perform several times more transactions that
adds or removes strong references to/from sets per second, because
ovsdb-server no longer clones and re-counts rows that are irrelevant
to current transaction.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When 'datum.values' or 'datum.keys' is NULL, some code path calling
into ovsdb_idl_txn_write__ triggers NULL deref. An example is below:
ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
struct ovsdb_datum datum;
union ovsdb_atom key;
datum.n = 1;
datum.keys = &key;
key.integer = cur_cfg;
// 1. assign_zero: Assigning: datum.values = NULL.
datum.values = NULL;
// CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
// 2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
// which dereferences null datum.values.
ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}
And with the following calls:
ovsdb_idl_txn_write_clone
ovsdb_idl_txn_write__
6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
datum->values
ovsdb_datum_destroy
And another possible NULL deref is at ovsdb_datum_equals(). Fix the
two by adding additional checks.
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
No need to use quotes for strings like "br0".
Keeping UUIDs always in quotes to avoid different treatment of those
that starts with digits and those that starts with letters.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
"must have exactly one member" is much better than "must have 1 to 1
members".
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
The call to ovsdb_datum_diff() initializes 'new', so it's not necessary to
also do it in ovsdb_datum_apply_diff().
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Several OVS structs contain embedded named unions, like this:
struct {
...
union {
...
} u;
};
C11 standardized a feature that many compilers already implemented
anyway, where an embedded union may be unnamed, like this:
struct {
...
union {
...
};
};
This is more convenient because it allows the programmer to omit "u."
in many places. OVS already used this feature in several places. This
commit embraces it in several others.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Tested-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
With this change, "ovsdb-client convert" can be used to convert a database
from one schema to another without taking the database offline.
This can be useful to minimize downtime for a database during a software
upgrade.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
The ovs_assert() macro always evaluates its argument, even when NDEBUG is
defined so that failure is ignored. This behavior wasn't documented, and
thus a lot of code didn't rely on it. This commit documents the behavior
and simplifies bits of code that heretofore didn't rely on it.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
The ovsdb-client "dump" command is a fairly low-level tool that can be
used, among other purposes, to debug the OVSDB protocol. It's better if
it just prints what the server sends without being too judgmental about it.
Thus, we might as well ignore constraints for the purpose of dumping
tables.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
This allows slight code simplifications across the tree.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
The function prototypes in ovsdb-data.h already have these, but it seems
more complete to have the annotation on the definitions too.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Adding / removing a range of integers to a column accepting a set of
integers requires enumarating all of the integers. This patch simplifies
it by introducing 'range' concept to the database commands. Two integers
separated by a hyphen represent an inclusive range.
The patch adds positive and negative tests for the new syntax.
The patch was tested by 'make check'. Covarage was tested by
'make check-lcov'.
Signed-off-by: Lukasz Rzasik <lukasz.rzasik@gmail.com>
Suggested-by: <my_ovs_discuss@yahoo.com>
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
There's no reason to have three copies of this code for every smap-type
column.
The code wasn't a perfect match for ovsdb_datum_from_smap(), so this commit
also changes ovsdb_datum_from_smap() to better suit it. It only had one
caller and the new design is adequate for that caller.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When an OVSDB column change its value, it is more efficient to only
send what has changed, rather than sending the entire new copy.
This is analogous to software programmer send patches rather than
the entire source file.
For columns store a single element, the "diff" datum is the same
as the "new" datum.
For columns that store set or map, it is only necessary to send the
information about the elements changed (including addition or removal).
The "diff" for those types are all elements that are changed.
Those APIs are mainly used for implementing a new OVSDB server
"update2" JSON-RPC notification, which encodes modifications
of a column with the contents of those "diff"s. Later patch implements
the "update2" notification.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Added ovsdb_transient_datum_from_json() to avoid size check for
the diff datum that is transient in nature.
Suppose a datum contains set, and the max number of elements is 2.
If we are changing from set that contains [A, B], to a set contains
[C, D], the diff datum will contains 4 elements [A, B, C, D].
Thus diff datum should not be constrained by the size limit. However
the datum after diff is applied should not violate the size limit.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The following macros are renamed to avoid conflicts with other headers:
* WARN_UNUSED_RESULT to OVS_WARN_UNUSED_RESULT
* PRINTF_FORMAT to OVS_PRINTF_FORMAT
* NO_RETURN to OVS_NO_RETURN
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Also, add some clarifications relative to RFC 7047 to ovsdb-server(1).
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
This reverts commit 0ea7bec76d804a2c4efccd3dbdaa3e30cf536a5c.
Connections that queue up too much data, because they are monitoring a
table that is changing quickly and failing to keep up with the updates,
cause problems with buffer management. Since commit 60533a405b2e
(jsonrpc-server: Disconnect connections that queue too much data.),
ovsdb-server has dealt with them by disconnecting the connection and
letting them start up again with a fresh copy of the database. However,
this is not ideal because of situations where disconnection happens
repeatedly. For example:
- A manager toggles a column back and forth between two or more values
quickly (in which case the data transmitted over the monitoring
connections always increases quickly, without bound).
- A manager repeatedly extends the contents of some column in some row
(in which case the data transmitted over the monitoring connection
grows with O(n**2) in the length of the string).
A better way to deal with this problem is to combine updates when they are
sent to the monitoring connection, if that connection is not keeping up.
In both the above cases, this reduces the data that must be sent to a
manageable amount. An upcoming patch implements this new way. This commit
reverts part of the previous solution that disconnects backlogged
connections, since it is no longer useful.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Casts are sometimes necessary. One common reason that they are necessary
is for discarding a "const" qualifier. However, this can impede
maintenance: if the type of the expression being cast changes, then the
presence of the cast can hide a necessary change in the code that does the
cast. Using CONST_CAST, instead of a bare cast, makes these changes
visible.
Inspired by my own work elsewhere:
http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/cast.h#n80
Signed-off-by: Ben Pfaff <blp@nicira.com>
"smap" is now the appropriate data structure for a string-to-string map.
Also changes ovsdb_datum_from_shash() into ovsdb_datum_from_smap() since
system-stats related code was the only client.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OVS has two Python tests that have always failed, for reasons not
understood, since they were added to the tree. This commit fixes them.
One problem was that Python was assuming that stdout was encoded in ASCII.
Apparently the only way to "fix" this at runtime is to set PYTHONIOENCODING
to utf_8 in the environment, so this change does that.
Second, it appears that Python really doesn't like to print invalid UTF-8,
so this avoids doing that in python/ovs/json.py, instead just printing
the hexadecimal values of the invalid bytes. For consistency, it makes
the same change to the C version.
Third, the C version of test-ovsdb doesn't check UTF-8 for consistency, it
just sends it blindly to the OVSDB server, but Python does check it and so
it bails out earlier. This commit changes the Python version of the
"no invalid UTF-8 sequences in strings" to allow for the slight difference
in output that occurs for that reason.
Finally, test-ovsdb.py needs to convert error messages to Unicode
explicitly before printing them in the "parse-atoms" function. I don't
really understand why, but now it works.
ovs-vsctl will, in upcoming commits, want to more closely examine its
ovsdb_symbol_table structures. This could be done by providing a more
complete API, but it doesn't seem worth it to me. This commit instead goes
the other way, exposing the internals to clients. This commit also
eliminates the ovsdb_symbol_table_find_uncreated() function, which
ovs-vsctl can now implement itself.
The name 'created' better reflects the actual meaning of this member: in
both ovsdb and ovs-vsctl, it is true if a row has been created with the
symbol's UUID and false otherwise.
The "uuid-name" that creates symbols must be an <id> but we weren't
verifying the same constraint on the "named-uuid"s that refer to symbols,
which was a bit confusing in writing transactions by hand. This commit
fixes the inconsistency and updates the SPECS file to clarify that a
named-uuid string has to be an <id>.
These new functions are more forgiving than the corresponding functions
without "_unique". The goal is to be more tolerant of data provided by
IDL clients, which will happen in a followup patch.
The wait-until command to be added in an upcoming commit needs to support
!=, <, >, <=, and >= operators in addition to =, so this commit adds that
infrastructure.