Currently ovsdb-server is using shallow copies of some JSON objects
by keeping a reference counter. JSON string objects are also used
directly as ovsdb atoms in database rows to avoid extra copies.
Taking this approach one step further, ovsdb_datum objects can also
be mostly deduplicated by postponing the copy until it is actually
needed. The datum object itself contains a type and 2 pointers to
data arrays. By adding one more pointer to a reference counter,
we can create a shallow copy of the datum by simply copying the type
and pointers and increasing the reference counter.
Before modifying the datum, a special function needs to be called
to perform an actual copy of the object, a.k.a. unshare it.
Most of the datum modifications are performed inside the special
functions in ovsdb-data.c, so they are not very hard to track.
A few places like ovsdb-server.c and column mutations access
and change the data directly, so a few extra unshare() calls
have to be added there.
This change doesn't affect the maximum memory consumption too much,
because most of the copies are short-lived. However, not actually
performing these copies saves up to 40% of CPU time on operations
with large sets.
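The lazy-copy scheme described above can be sketched roughly as follows.
This is a minimal illustration, not the actual ovsdb-data.c code: the
'lazy_datum' structure, its int-only data array and all function names
are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified datum: one array of ints and a shared counter. */
struct lazy_datum {
    size_t n;
    int *keys;             /* Data array, possibly shared between copies. */
    unsigned int *refcnt;  /* Number of datums pointing at 'keys'. */
};

static void
lazy_datum_init(struct lazy_datum *d, const int *keys, size_t n)
{
    d->n = n;
    d->keys = malloc(n ? n * sizeof *keys : 1);
    d->refcnt = malloc(sizeof *d->refcnt);
    assert(d->keys && d->refcnt);
    if (n) {
        memcpy(d->keys, keys, n * sizeof *keys);
    }
    *d->refcnt = 1;
}

/* Shallow copy: copies the pointers and bumps the reference counter. */
static void
lazy_datum_clone(struct lazy_datum *dst, const struct lazy_datum *src)
{
    *dst = *src;
    ++*dst->refcnt;
}

/* Must be called before modifying 'd': performs the postponed copy if the
 * data is still shared with another datum. */
static void
lazy_datum_unshare(struct lazy_datum *d)
{
    if (*d->refcnt == 1) {
        return;
    }

    struct lazy_datum copy;
    lazy_datum_init(&copy, d->keys, d->n);
    --*d->refcnt;
    *d = copy;
}

/* Drops one reference and frees the data when the last one is gone. */
static void
lazy_datum_destroy(struct lazy_datum *d)
{
    if (--*d->refcnt == 0) {
        free(d->keys);
        free(d->refcnt);
    }
}
```

A caller that forgets lazy_datum_unshare() before writing would silently
modify every shallow copy, which is why direct data accesses need the
extra calls.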
Reported-at: https://bugzilla.redhat.com/2069089
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This is undefined behavior and was reported by UB Sanitizer:
lib/meta-flow.c:3445:16: runtime error:
member access within null pointer of type 'struct vl_mf_field'
0 0x6aad0f in mf_get_vl_mff lib/meta-flow.c:3445
1 0x6d96d7 in mf_from_oxm_header lib/nx-match.c:260
2 0x6d9e2e in nx_pull_header__ lib/nx-match.c:341
3 0x6daafa in nx_pull_header lib/nx-match.c:488
4 0x6abcb6 in mf_vl_mff_nx_pull_header lib/meta-flow.c:3605
5 0x73b9be in decode_NXAST_RAW_REG_MOVE lib/ofp-actions.c:2652
6 0x764ccd in ofpact_decode lib/ofp-actions.inc2:4681
[...]
lib/sset.c:315:12: runtime error: applying zero offset to null pointer
0 0xcc2e6a in sset_at_position lib/sset.c:315:12
1 0x5734b3 in port_dump_next ofproto/ofproto-dpif.c:4083:20
[...]
lib/ovsdb-data.c:2194:56:
runtime error: applying zero offset to null pointer
0 0x5e9530 in ovsdb_datum_added_removed lib/ovsdb-data.c:2194:56
1 0x4d6258 in update_row_ref_count ovsdb/transaction.c:335:17
2 0x4c360b in for_each_txn_row ovsdb/transaction.c:1572:33
[...]
lib/ofpbuf.c:440:30:
runtime error: applying zero offset to null pointer
0 0x75066d in ofpbuf_push_uninit lib/ofpbuf.c:440
1 0x46ac8a in ovnacts_parse lib/actions.c:4190
2 0x46ad91 in ovnacts_parse_string lib/actions.c:4208
3 0x4106d1 in test_parse_actions tests/test-ovn.c:1324
[...]
lib/ofp-actions.c:3205:22:
runtime error: applying non-zero offset 2 to null pointer
0 0x6e1641 in set_field_split_str lib/ofp-actions.c:3205:22
[...]
lib/tnl-ports.c:74:12:
runtime error: applying zero offset to null pointer
0 0xceffe7 in tnl_port_cast lib/tnl-ports.c:74:12
1 0xcf14c3 in map_insert lib/tnl-ports.c:116:13
[...]
ofproto/ofproto.c:8905:16:
runtime error: applying zero offset to null pointer
0 0x556795 in eviction_group_hash_rule ofproto/ofproto.c:8905:16
1 0x503f8d in eviction_group_add_rule ofproto/ofproto.c:9022:42
[...]
Also, it's valid to have an empty ofpact list and we should be able to
try to iterate through it.
UB Sanitizer report:
include/openvswitch/ofp-actions.h:222:12:
runtime error: applying zero offset to null pointer
0 0x665d69 in ofpact_end ./include/openvswitch/ofp-actions.h:222:12
1 0x66b2cf in ofpacts_put_openflow_actions lib/ofp-actions.c:8861:5
2 0x6ffdd1 in ofputil_encode_flow_mod lib/ofp-flow.c:447:9
[...]
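One common way to avoid this class of report is to skip the pointer
arithmetic entirely when the base pointer is NULL. The sketch below
assumes a hypothetical 'struct item' array; it is not the actual
ofpact_end() fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type, used only for this illustration. */
struct item {
    int value;
};

/* Returns the end of an array of 'n' items.  'base' may be NULL only if
 * 'n' is zero; in that case no arithmetic is done on the null pointer. */
static inline const struct item *
items_end(const struct item *base, size_t n)
{
    assert(base || !n);
    return base ? base + n : NULL;
}

/* Iterating an empty (NULL) list is valid and simply visits nothing. */
static int
items_sum(const struct item *base, size_t n)
{
    int sum = 0;

    for (const struct item *p = base; p != items_end(base, n); p++) {
        sum += p->value;
    }
    return sum;
}
```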
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb_atom_string and json_string are basically the same data structure
and ovsdb-server frequently needs to convert one to another. We can
avoid that by using json_string from the beginning for all ovsdb
strings. So, the conversion turns into a simple json_clone(), i.e. an
increment of a reference counter. This change gives a moderate
performance boost in some scenarios, improves code clarity and
may be useful for future development.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g. to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contain a simple value that can be
copied efficiently, but duplicating strings every time has a
significant performance impact.
Introduce a new reference-counted structure 'ovsdb_atom_string'
that avoids copying strings every time and just increases a
reference counter instead.
This change increases transaction throughput in benchmarks, i.e. the
number of transactions that ovsdb-server can handle per second, by
up to 2x for standalone databases and 3x for clustered databases.
It also noticeably reduces memory consumption of ovsdb-server.
The next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.
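The idea of a reference-counted string can be sketched as below. The
layout and the 'rc_string' names are assumptions made for illustration;
the real 'ovsdb_atom_string' may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string. */
struct rc_string {
    char *string;
    size_t n_refs;
};

static struct rc_string *
rc_string_create(const char *s)
{
    struct rc_string *rs = malloc(sizeof *rs);
    size_t size = strlen(s) + 1;

    assert(rs);
    rs->string = malloc(size);
    assert(rs->string);
    memcpy(rs->string, s, size);
    rs->n_refs = 1;
    return rs;
}

/* "Cloning" only increases the counter; the bytes are never copied. */
static struct rc_string *
rc_string_clone(struct rc_string *rs)
{
    rs->n_refs++;
    return rs;
}

/* Frees the string once the last reference is dropped. */
static void
rc_string_unref(struct rc_string *rs)
{
    if (rs && --rs->n_refs == 0) {
        free(rs->string);
        free(rs);
    }
}
```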
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
it's linear in terms of the number of comparisons. It also clones
all the atoms along the way. In most cases the size of a diff is much
smaller than the size of the original datum, which allows performing
the same operation in-place with only O(diff->n * log2(old->n))
comparisons and O(old->n + diff->n) memory copies with memcpy.
Using this function while applying diffs read from the storage gives
a significant performance boost and allows executing many more
transactions per second.
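A rough sketch of the approach for a sorted set of ints follows: one
binary search per diff element, with the untouched runs of the old set
moved by memcpy. As a simplification it builds a fresh array and swaps
it in rather than editing the old one in place, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index of the first element in sorted 'a' that is >= 'key'. */
static size_t
lower_bound(const int *a, size_t n, int key)
{
    size_t low = 0, high = n;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (a[mid] < key) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    return low;
}

/* Applies sorted 'diff' to the sorted, malloc()ed set '*setp' of '*np' ints:
 * diff elements already in the set are removed, the others are added. */
static void
apply_diff(int **setp, size_t *np, const int *diff, size_t n_diff)
{
    int *old = *setp;
    size_t n_old = *np, copied = 0, n = 0;
    int *res = malloc((n_old + n_diff + 1) * sizeof *res);

    assert(res);
    for (size_t i = 0; i < n_diff; i++) {
        size_t pos = lower_bound(old, n_old, diff[i]);

        if (pos > copied) {
            /* Bulk-copy the unchanged run that precedes 'pos'. */
            memcpy(&res[n], &old[copied], (pos - copied) * sizeof *res);
            n += pos - copied;
        }
        if (pos < n_old && old[pos] == diff[i]) {
            copied = pos + 1;       /* Present in the set: drop it. */
        } else {
            res[n++] = diff[i];     /* Absent from the set: insert it. */
            copied = pos;
        }
    }
    if (n_old > copied) {
        memcpy(&res[n], &old[copied], (n_old - copied) * sizeof *res);
        n += n_old - copied;
    }
    free(old);
    *setp = res;
    *np = n;
}
```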
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Current algorithm of ovsdb_datum_union looks like this:

  for-each atom in b:
      if not bin_search(a, atom):
          push(a, clone(atom))
  quicksort(a)

So, the complexity looks like this:

  Nb * log2(Na)   +    Nb     +   (Na + Nb) * log2(Na + Nb)
  Comparisons        clones       Comparisons for quicksort
  for search

ovsdb_datum_union() is heavily used in database transactions when a
new element is added to a set, for example, when a new logical switch
port is added to a logical switch in OVN. This is a very common
use case where the CMS adds one new port to an existing switch that
already has, let's say, 100 ports. For this case ovsdb-server will
have to perform:

  1 * log2(100)  +  1 clone  +  101 * log2(101)
  Comparisons                   Comparisons for
  for search                    quicksort.
      ~7             1              ~707

Roughly 714 comparisons of atoms and 1 clone.

Since binary search can give us the position where the new atom should
go (it's the 'low' index after the search completion) for free, the
logic can be re-worked like this:

  copied = 0
  for-each atom in b:
      desired_position = bin_search(a, atom)
      push(result, a[ copied : desired_position - 1 ])
      copied = desired_position
      push(result, clone(atom))
  push(result, a[ copied : Na ])
  swap(a, result)

Complexity of this schema:

  Nb * log2(Na)   +    Nb     +          Na
  Comparisons        clones       memory copy on push
  for search

'swap' is just a swap of a few pointers. 'push' is not a 'clone',
but a simple memory copy of 'union ovsdb_atom'.
In general, this schema substitutes complexity of a quicksort
with complexity of a memory copy of Na atom structures, where we're
not even copying strings that these atoms are pointing to.
Complexity in the example above goes down from 714 comparisons
to 7 comparisons and memcpy of 100 * sizeof (union ovsdb_atom) bytes.
General complexity of a memory copy should always be lower than
complexity of a quicksort, especially because these copies are usually
performed in bulk, so this new schema should work faster for any input.
All in all, this change allows executing several times more
transactions per second for transactions that add new entries to sets.
Alternatively, union can be implemented as a linear merge of two
sorted arrays, but this will result in O(Na) comparisons, which
is more than Nb * log2(Na) in the common case, since Na is usually
far bigger than Nb. Linear merge will also mean per-atom memory
copies instead of copying in bulk.
'replace' functionality of ovsdb_datum_union() had no users, so it
is just removed. But it can easily be added back if needed in the future.
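The re-worked union can be sketched in C for plain sorted int arrays as
follows. This is only an illustration of the scheme above with
hypothetical names; the real code works on 'union ovsdb_atom' and also
clones the inserted atoms.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index of the first element in sorted 'a' that is >= 'key', i.e. the
 * 'low' index after the binary search completes. */
static size_t
lower_bound(const int *a, size_t n, int key)
{
    size_t low = 0, high = n;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (a[mid] < key) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    return low;
}

/* Merges the sorted set 'b' into the sorted, malloc()ed set '*ap'. */
static void
sorted_union(int **ap, size_t *nap, const int *b, size_t nb)
{
    int *a = *ap;
    size_t na = *nap, copied = 0, n = 0;
    int *res = malloc((na + nb + 1) * sizeof *res);

    assert(res);
    for (size_t i = 0; i < nb; i++) {
        size_t pos = lower_bound(a, na, b[i]);

        if (pos < na && a[pos] == b[i]) {
            continue;               /* Already in 'a'. */
        }
        if (pos > copied) {
            /* Bulk copy of a[copied .. pos - 1]. */
            memcpy(&res[n], &a[copied], (pos - copied) * sizeof *res);
            n += pos - copied;
        }
        copied = pos;
        res[n++] = b[i];
    }
    if (na > copied) {
        memcpy(&res[n], &a[copied], (na - copied) * sizeof *res);
        n += na - copied;
    }
    free(a);                        /* The 'swap' step. */
    *ap = res;
    *nap = n;
}
```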
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Currently, even if only one reference is added to the set of strong
references or removed from it, ovsdb-server will walk through the whole
set and re-count references to other rows. These referenced rows will
also be added to the transaction in order to re-count their references.
For example, every time a Logical Switch Port is added to a Logical
Switch, the OVN Northbound database server will walk through all ports
of this Logical Switch, clone their rows, and re-count references. This
is not very efficient. Instead, it can increase reference counters only
for added references and decrease them for removed ones. In many cases
only one row in the Logical_Switch_Port table will be affected.
Introduce a new function that generates a diff of two datum objects,
but stores added and removed atoms separately, so they can be used
to increase or decrease row reference counters accordingly.
This change allows performing several times more transactions per second
that add or remove strong references to/from sets, because
ovsdb-server no longer clones and re-counts rows that are irrelevant
to the current transaction.
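Splitting a diff into added and removed elements can be sketched as a
single walk over two sorted sets. The function below is a hypothetical
illustration on ints, not the actual ovsdb-data.c implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Compares sorted sets 'old' and 'cur'.  Elements only in 'cur' go to
 * 'added', elements only in 'old' go to 'removed'.  'added' needs room for
 * 'n_cur' elements and 'removed' for 'n_old' elements. */
static void
added_removed(const int *old, size_t n_old, const int *cur, size_t n_cur,
              int *added, size_t *n_added, int *removed, size_t *n_removed)
{
    size_t oi = 0, ci = 0;

    assert(added && removed && n_added && n_removed);
    *n_added = *n_removed = 0;
    while (oi < n_old && ci < n_cur) {
        if (old[oi] < cur[ci]) {
            removed[(*n_removed)++] = old[oi++];
        } else if (old[oi] > cur[ci]) {
            added[(*n_added)++] = cur[ci++];
        } else {
            oi++;                   /* Unchanged: no counter to touch. */
            ci++;
        }
    }
    while (oi < n_old) {
        removed[(*n_removed)++] = old[oi++];
    }
    while (ci < n_cur) {
        added[(*n_added)++] = cur[ci++];
    }
}
```

A caller would then increment the reference counter of each row in
'added' and decrement it for each row in 'removed', leaving all other
referenced rows alone.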
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
With this change, "ovsdb-client convert" can be used to convert a database
from one schema to another without taking the database offline.
This can be useful to minimize downtime for a database during a software
upgrade.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
The ovsdb-client "dump" command is a fairly low-level tool that can be
used, among other purposes, to debug the OVSDB protocol. It's better if
it just prints what the server sends without being too judgmental about it.
Thus, we might as well ignore constraints for the purpose of dumping
tables.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
This patch adds 'extern "C"' in a couple of header files so that
they can be compiled with C++ compilers.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Now that the 'new' datum is named 'new_datum', be more consistent by
renaming 'old' to 'old_datum' to match.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
In C++, 'new' is a keyword. If this is used as the name for a field,
then C++ compilers can get confused about the context and fail to
compile references to such fields. Rename the field to 'new_datum' to
avoid this issue.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Adding / removing a range of integers to a column accepting a set of
integers requires enumerating all of the integers. This patch simplifies
it by introducing a 'range' concept to the database commands. Two integers
separated by a hyphen represent an inclusive range.
The patch adds positive and negative tests for the new syntax.
The patch was tested by 'make check'. Coverage was tested by
'make check-lcov'.
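Parsing such a range could look roughly like the sketch below. The
helper name and its exact rules (a single integer counts as a
one-element range, a reversed range is rejected) are assumptions for
illustration, not the parser added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses "N" or "LO-HI" into an inclusive range.
 * Returns false on malformed input or if LO is greater than HI. */
static bool
parse_int_range(const char *s, long long *lo, long long *hi)
{
    char *end;

    assert(s && lo && hi);
    errno = 0;
    *lo = strtoll(s, &end, 10);
    if (end == s || errno) {
        return false;
    }
    if (*end == '\0') {
        *hi = *lo;                  /* Single integer. */
        return true;
    }
    if (*end != '-') {
        return false;
    }

    const char *second = end + 1;   /* Text after the separating hyphen. */
    *hi = strtoll(second, &end, 10);
    if (end == second || *end != '\0' || errno) {
        return false;
    }
    return *lo <= *hi;
}
```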
Signed-off-by: Lukasz Rzasik <lukasz.rzasik@gmail.com>
Suggested-by: <my_ovs_discuss@yahoo.com>
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Most users of OVSDB react to whatever is currently in their view of the
database, as opposed to keeping track of changes and reacting to those
changes individually. The interface to conditional monitoring was
different, in that it expected the client to say what to add or remove from
monitoring instead of what to monitor. This seemed reasonable at the time,
but in practice it turns out that the usual approach actually works better,
because the condition is generally a function of the data visible in the
database. This commit changes the approach.
This commit also changes the meaning of an empty condition for a table.
Previously, an empty condition meant to replicate every row. Now, an empty
condition means to replicate no rows. This is more convenient for code
that gradually constructs conditions, because it does not need special
cases for replicating nothing.
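The new meaning of an empty condition falls out naturally if a row is
replicated whenever any clause matches. The sketch below assumes that
OR semantics and uses a hypothetical equality-only clause stored in a
growable array; it is not the actual IDL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical clause: true for rows whose column value equals 'value'. */
struct clause {
    int value;
};

/* Clauses are kept in a growable array rather than a linked list. */
struct condition {
    struct clause *clauses;
    size_t n_clauses;
    size_t allocated;
};

static void
condition_add_clause(struct condition *cnd, int value)
{
    if (cnd->n_clauses >= cnd->allocated) {
        cnd->allocated = cnd->allocated ? 2 * cnd->allocated : 4;
        cnd->clauses = realloc(cnd->clauses,
                               cnd->allocated * sizeof *cnd->clauses);
        assert(cnd->clauses);
    }
    cnd->clauses[cnd->n_clauses++].value = value;
}

/* A row is replicated if any clause matches, so a condition with no
 * clauses replicates no rows without any special case. */
static bool
condition_matches(const struct condition *cnd, int row_value)
{
    for (size_t i = 0; i < cnd->n_clauses; i++) {
        if (cnd->clauses[i].value == row_value) {
            return true;
        }
    }
    return false;
}
```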
This commit also changes the internal implementation of conditions from
linked lists to arrays. I just couldn't see an advantage to using linked
lists.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Liran Schour <lirans@il.ibm.com>
There's no reason to have three copies of this code for every smap-type
column.
The code wasn't a perfect match for ovsdb_datum_from_smap(), so this commit
also changes ovsdb_datum_from_smap() to better suit it. It only had one
caller and the new design is adequate for that caller.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When an OVSDB column changes its value, it is more efficient to only
send what has changed, rather than sending the entire new copy.
This is analogous to a programmer sending patches rather than
the entire source file.
For columns that store a single element, the "diff" datum is the same
as the "new" datum.
For columns that store a set or map, it is only necessary to send the
information about the elements changed (including addition or removal).
The "diff" for those types is all elements that are changed.
Those APIs are mainly used for implementing a new OVSDB server
"update2" JSON-RPC notification, which encodes modifications
of a column with the contents of those "diff"s. A later patch implements
the "update2" notification.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Added ovsdb_transient_datum_from_json() to avoid the size check for
the diff datum, which is transient in nature.
Suppose a datum contains a set, and the max number of elements is 2.
If we are changing from a set that contains [A, B] to a set that contains
[C, D], the diff datum will contain 4 elements [A, B, C, D].
Thus the diff datum should not be constrained by the size limit. However,
the datum after the diff is applied should not violate the size limit.
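The example can be checked with a small sketch that computes a set diff
as the elements present in exactly one of the two sets. The function is
hypothetical and works on sorted arrays of chars only.

```c
#include <assert.h>
#include <stddef.h>

/* Writes into 'diff' every element found in exactly one of the sorted sets
 * 'old' and 'cur' and returns how many were written.  'diff' needs room for
 * n_old + n_cur elements, which may exceed the column's size limit. */
static size_t
set_diff(const char *old, size_t n_old, const char *cur, size_t n_cur,
         char *diff)
{
    size_t oi = 0, ci = 0, n = 0;

    assert(diff);
    while (oi < n_old && ci < n_cur) {
        if (old[oi] < cur[ci]) {
            diff[n++] = old[oi++];      /* Removed element. */
        } else if (old[oi] > cur[ci]) {
            diff[n++] = cur[ci++];      /* Added element. */
        } else {
            oi++;                       /* Unchanged element. */
            ci++;
        }
    }
    while (oi < n_old) {
        diff[n++] = old[oi++];
    }
    while (ci < n_cur) {
        diff[n++] = cur[ci++];
    }
    return n;
}
```

With a limit of 2 elements, going from [A, B] to [C, D] yields a
4-element diff, while both the old and the new set stay within the
limit.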
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The following macros are renamed to avoid conflicts with other headers:
* WARN_UNUSED_RESULT to OVS_WARN_UNUSED_RESULT
* PRINTF_FORMAT to OVS_PRINTF_FORMAT
* NO_RETURN to OVS_NO_RETURN
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This reverts commit 0ea7bec76d804a2c4efccd3dbdaa3e30cf536a5c.
Connections that queue up too much data, because they are monitoring a
table that is changing quickly and failing to keep up with the updates,
cause problems with buffer management. Since commit 60533a405b2e
(jsonrpc-server: Disconnect connections that queue too much data.),
ovsdb-server has dealt with them by disconnecting the connection and
letting them start up again with a fresh copy of the database. However,
this is not ideal because of situations where disconnection happens
repeatedly. For example:
- A manager toggles a column back and forth between two or more values
quickly (in which case the data transmitted over the monitoring
connections always increases quickly, without bound).
- A manager repeatedly extends the contents of some column in some row
(in which case the data transmitted over the monitoring connection
grows with O(n**2) in the length of the string).
A better way to deal with this problem is to combine updates when they are
sent to the monitoring connection, if that connection is not keeping up.
In both the above cases, this reduces the data that must be sent to a
manageable amount. An upcoming patch implements this new way. This commit
reverts part of the previous solution that disconnects backlogged
connections, since it is no longer useful.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
"smap" is now the appropriate data structure for a string-to-string map.
Also changes ovsdb_datum_from_shash() into ovsdb_datum_from_smap() since
system-stats related code was the only client.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ovs-vsctl will, in upcoming commits, want to more closely examine its
ovsdb_symbol_table structures. This could be done by providing a more
complete API, but it doesn't seem worth it to me. This commit instead goes
the other way, exposing the internals to clients. This commit also
eliminates the ovsdb_symbol_table_find_uncreated() function, which
ovs-vsctl can now implement itself.
The name 'created' better reflects the actual meaning of this member: in
both ovsdb and ovs-vsctl, it is true if a row has been created with the
symbol's UUID and false otherwise.
These new functions are more forgiving than the corresponding functions
without "_unique". The goal is to be more tolerant of data provided by
IDL clients, which will happen in a followup patch.
This makes it easy to create a bunch of records that are all related to
each other in a single ovs-vsctl invocation. It adds an example to the
ovs-vsctl manpage.
Some of the uses for the formerly supported regular expression constraints
were simply to limit values to those in a set of allowed values.
This commit adds support for that kind of simple enumeration constraint.
The upcoming "remove" command for ovs-vsctl wants to try parsing an
argument two different ways. This doesn't work if a parse error always
aborts immediately. This commit fixes the problem, by making a parsing
failure pass up an error for higher layers to deal with instead of aborting
immediately.
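The pattern of passing a parse error up instead of aborting can be
sketched as below. The function names and the error text are
hypothetical; the sketch only shows a parser that returns an error
string so the caller can fall back to a second interpretation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns NULL on success, otherwise a malloc()ed error message that the
 * caller must free.  It never aborts on bad input. */
static char *
parse_integer(const char *s, long *value)
{
    char *end;

    errno = 0;
    *value = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno) {
        const char *fmt = "\"%s\" is not a valid integer";
        size_t size = strlen(fmt) + strlen(s) + 1;
        char *error = malloc(size);

        assert(error);
        snprintf(error, size, fmt, s);
        return error;
    }
    return NULL;
}

/* Tries to read 's' as an integer.  On failure the error is discarded and
 * false is returned, so the caller can try parsing 's' another way. */
static bool
arg_is_integer(const char *s, long *value)
{
    char *error = parse_integer(s, value);

    if (error) {
        free(error);
        return false;
    }
    return true;
}
```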
This commit should have no user-visible effect.
These functions provide an alternative to JSON parsing and formatting that
is more human-friendly (and shorter).
These will be used in an upcoming commit to enhance ovs-vsctl.
This commit refactors the functions for working with "struct ovsdb_datum",
adding and exposing some more operations for ovs-vsctl to use in an
upcoming commit.
When a new record is inserted into a database, ovsdb logs the values of all
of the fields in the record. However, often new records have many columns
that contain default values. There is no need to log those values, so this
commit causes them to be omitted.
As a side effect, this also makes "ovsdb-tool show-log --more --more"
output easier to read, because record insertions print less noise. (Adding
--more --more to this command makes it print changes to database records.
The --more option will be introduced in an upcoming commit.)