2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00

233 Commits

Author SHA1 Message Date
Ilya Maximets
1de4a08c22 json: Use functions to access json arrays.
Internal implementation of JSON array will be changed in the future
commits.  Add access functions that users can rely on instead of
accessing the internals of 'struct json' directly and convert all the
users.  Structure fields are intentionally renamed to make sure that
no code is using the old fields directly.

json_array() function is removed, as not needed anymore.  Added new
functions:  json_array_size(), json_array_at(), json_array_set()
and json_array_pop().  These are enough to cover all the use cases
within OVS.

The change is fairly large, however, IMO, it's a much overdue cleanup
that we need even without changing the underlying implementation.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-30 16:53:56 +02:00
Ilya Maximets
6c48b29f52 json: Always use the json_string() method to access the strings.
We'll be changing the way strings are stored, so the direct access
will not be safe anymore.  Change all the users to use the proper
API as they should have been doing anyway.  This also means splitting
the handling of strings and serialized objects in most cases as
they will be treated differently.

The only code outside of json implementation for which direct access
is preserved is substitute_uuids() in test-ovsdb.c.  It's an unusual
string manipulation that is only needed for the testing, so doesn't
seem worthy adding a new API function.  We could introduce something
like json_string_replace() if this use case will appear somewhere
else in the future.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-30 16:53:56 +02:00
Dmitry Porokh
421c94ee14 ovsdb: Introduce and use specialized uuid print functions.
According to profiling data, converting UUIDs to strings is a frequent
operation in some workloads. This typically results in a call to
xasprintf(), which internally calls vsnprintf() twice, first to
calculate the required buffer size, and then to format the string.

This patch introduces specialized functions for printing UUIDs, which
both reduces code duplication and improves performance.

For example, on my laptop, 10,000,000 calls to the new uuid_to_string()
function takes 1296 ms, while the same number of xasprintf() calls using
UUID_FMT take 2498 ms.

Signed-off-by: Dmitry Porokh <dporokh@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-08 09:28:21 +02:00
Ilya Maximets
e6844c6460 ovsdb-idl: Fix use of uninitialized datum for graph consistency check.
Columns in 'new_datum' may not be initialized if they were not written,
i.e., when the column in not in the 'written' bitmap.  In this case the
actual content should be read from the 'old_datum' instead.  If the old
one is also not available, then the default should be used.

 WARNING: MemorySanitizer: use-of-uninitialized-value
  0 0x78d27f in ovsdb_idl_check_consistency lib/ovsdb-idl.c:732:21
  1 0x4ee12a in idl_set tests/test-ovsdb.c:2586:9
  2 0x4d7c4b in do_idl tests/test-ovsdb.c:2868:18
  3 0x6c5704 in ovs_cmdl_run_command__ lib/command-line.c:247:17
  4 0x6c4d28 in ovs_cmdl_run_command lib/command-line.c:278:5
  5 0x4c39bf in main tests/test-ovsdb.c:80:5
  6 0x7f912a02958f in __libc_start_call_main
  7 0x7f912a02963f in __libc_start_main@GLIBC_2.2.5
  8 0x432b54 in _start (tests/test-ovsdb+0x432b54)

Use the ovsdb_idl_read() helper to read the actual correct data
during the consistency check.

Alternative might be to iterate over the 'written' bitmap and only
check those columns, but it seems like that will reduce the intended
scope of the check, since 'new_datum' may exist while the 'written'
bitmap doesn't, e.g., when 'new_datum == old_datum'.

Fixes: 11990a5274f7 ("ovsdb-idl: Check internal graph in OVSDB tests.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-11-29 18:37:50 +01:00
Xavier Simonart
c98759a4f3 ovsdb-idl: Fix IDL memory leak.
In the following case, we could see multiple leaks detected for memory allocated
by ovsdb_idl_txn_add_map_op: insert a row in a table, set a key and delete the row
(all within the same transaction).

For instance:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
    0 0x4e60a7 in calloc (./tests/test-ovsdb+0x4e60a7)
    1 0x5f9b32 in xcalloc__ ./lib/util.c:125:31
    2 0x5f9b60 in xzalloc__ ./lib/util.c:135:12
    3 0x5f9c25 in xzalloc ./ovs/lib/util.c:169:12
    4 0x5d4899 in ovsdb_idl_txn_add_map_op ./lib/ovsdb-idl.c:4175:29
    5 0x5d4758 in ovsdb_idl_txn_write_partial_map ./lib/ovsdb-idl.c:4332:5
    6 0x53cbe8 in idltest_simple2_update_smap_setkey ./tests/idltest.c:4701:5
    7 0x526fe2 in do_idl_partial_update_map_column ./tests/test-ovsdb.c:3027:5
    8 0x59d99c in ovs_cmdl_run_command__ ./lib/command-line.c:247:17
    9 0x59d79a in ovs_cmdl_run_command ./lib/command-line.c:278:5
    10 0x51d458 in main ./tests/test-ovsdb.c:80:5
    11 0x7f0a20a79b74 in __libc_start_main (/lib64/libc.so.6+0x27b74)

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2024-09-11 15:41:06 +02:00
Mike Pattrick
4102674b3e ovsdb-idl: Preserve change_seqno when deleting rows.
In the case of a weak reference, clearing all change_seqno's can delete
useful information. Instead of clearing all seqno's when removing
track_node, only clear those values in cases including row insertion,
and row deleting if no dst_arcs remain.

Fixes: 95689f166818 ("ovsdb-idl: Preserve references for deleted rows.")
Reported-at: https://issues.redhat.com/browse/FDP-193
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Simon Horman <horms@ovn.org>
2024-01-04 13:26:20 +00:00
Han Zhou
8833e7c8ed ovsdb-idl: Provide API to disable set_db_change_aware request.
For ovsdb clients that are short-lived, e.g. when using
ovn-nbctl/ovn-sbctl to read some metrics from the OVN NB/SB server, they
don't really need to be aware of db changes, because they exit
immediately after getting the initial response for the requested data.
In such use cases, however, the clients still send 'set_db_change_aware'
request, which results in server side error logs when the server tries
to send out the response for the 'set_db_change_aware' request, because
at the moment the client that is supposed to receive the request has
already closed the connection and exited. E.g.:

2023-01-10T18:23:29.431Z|00007|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
2023-01-10T18:23:29.431Z|00008|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)

To avoid such problems, this patch provides an API to allow a client to
choose to not send the 'set_db_change_aware' request.

There was an earlier attempt to fix this [0], but it was not accepted
back then as discussed in the email [1]. It was also discussed in the
emails that an alternative approach is to use notification instead of
request, but that would require protocol changes and taking backward
compatibility into consideration. So this patch takes a different
approach and tries to keep the change small.

[0] http://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-git-send-email-dceara@redhat.com/

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050919.html

Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 20:14:10 +01:00
Numan Siddique
55b9507e68 ovsdb-idl: Add the support to specify the uuid for row insert.
ovsdb-server allows the OVSDB clients to specify the uuid for
the row inserts [1].  Both the C IDL client library and Python
IDL are missing this feature.  This patch adds this support.

In C IDL, for each schema table, a new function is generated -
<schema_table>insert_persistent_uuid(txn, uuid) which can
be used the clients to persist the uuid.

ovs-vsctl and other derivatives of ctl now supports the same
in the generic 'create' command with the option "--id=<UUID>".

In Python IDL, the uuid to persist can be specified in
the Transaction.insert() function.

[1] - a529e3cd1f("ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.:)

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 15:15:57 +01:00
Xavier Simonart
02f31a1262 ovsdb-idl: Preserve references for rows deleted in same IDL run as their insertion.
Considering two DB rows, 'a' from table A and 'b' from table B (with
column 'ref_a' a reference to table A):
a = {A._uuid=<U1>}
b = {B._uuid=<U2>, B.ref_a=<U1>}

Assuming both records are inserted in the IDL client's in-memory view
of the database, if row 'b' is also deleted in the same run, it should
generate the following tracked changes:

- for table A:
  - inserted records: a = {A._uuid=<U1>}
- for table B:
  - inserted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
  - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}

Before this patch, inserted and deleted records in table B
would (in some cases [0]) be b = {B._uuid=<U2>, B.ref_a=[]}.
Having B.ref_a=[] would violate the integrity of the database from
client perspective.

test-ovsdb has also been updated to show that one row can be
both inserted and deleted within one IDL run.

[0] In ovn-controller the fact that the reference is NULL caused a
    crash in the following case, when both commands were handled by
    ovn-controller within the same loop:
    $ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- \
        lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
    $ ovn-nbctl lsp-del sw0-port1

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126450
Fixes: 91e1ff5dde39 ("ovsdb-idl: Don't reparse orphaned rows.")
Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-09-27 00:44:35 +02:00
Frode Nordahl
0443c1557e Fix spelling error exposed in binaries.
As reported by Debian lintian.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-14 15:24:07 +02:00
Ilya Maximets
485ac63d10 ovsdb: Add lazy-copy support for ovsdb_datum objects.
Currently ovsdb-server is using shallow copies of some JSON objects
by keeping a reference counter.  JSON string objects are also used
directly as ovsdb atoms in database rows to avoid extra copies.

Taking this approach one step further ovsdb_datum objects can also
be mostly deduplicated by postponing the copy until it actually
needed.  datum object itself contains a type and 2 pointers to
data arrays.  Adding a one more pointer to a reference counter
we may create a shallow copy of the datum by simply copying type
and pointers and increasing the reference counter.

Before modifying the datum, special function needs to be called
to perform an actual copy of the object, a.k.a. unshare it.
Most of the datum modifications are performed inside the special
functions in ovsdb-data.c, so that is not very hard to track.
A few places like ovsdb-server.c and column mutations are accessing
and changing the data directly, so a few extra unshare() calls
has to be added there.

This change doesn't affect the maximum memory consumption too much,
because most of the copies are short-living.  However, not actually
performing these copies saves up to 40% of CPU time on operations
with large sets.

Reported-at: https://bugzilla.redhat.com/2069089
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-13 20:33:07 +02:00
Dumitru Ceara
c558f9f1e1 ovsdb-idl: Get per-database memory usage statistics.
Clients might be connected to multiple databases (e.g., ovn-controller
is connected to OVN_Southbound and Open_vSwitch databases) and the IDL
memory statistics are more useful if they're not aggregated.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-06-28 13:41:42 +02:00
Dumitru Ceara
d94cd0d3ee ovsdb-idl: Support write-only-changed IDL monitor mode.
At a first glance, change tracking should never be allowed for
write-only columns.  However, some clients (e.g., ovn-northd) that are
mostly exclusive writers of a database, use change tracking to avoid
duplicating the IDL row records into a local cache when implementing
incremental processing.

The default behavior of the IDL is to automatically turn a write-only
column into a read-write column whenever the client enables change
tracking for that column.

For the afore mentioned clients, this becomes a performance issue.
Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
change a column's value.") explains why writes that don't change a
column's value cannot be optimized out early if the column is
read/write.

Furthermore, if there is at least one record in any table that
changed during a transaction, then *all* records that have been
written are added to the transaction, even if their values didn't
change.  If there are many such rows (e.g., like in ovn-northd's
case) this incurs a significant overhead because:
a. the client has to build this large transaction
b. the transaction has to be sent over the network
c. the server needs to parse this (mostly) no-op update

We now introduce new IDL APIs allowing users to set a new monitoring
mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
atomicity constraints may be relaxed and written columns that don't
change value can be skipped from the current transaction.

We benchmarked ovn-northd performance when using this new mode
against NB and SB databases taken from ovn-kubernetes scale tests.
We noticed that when a minor change is performed to the Northbound
database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
build the Southbound transaction becomes negligible (vs ~1.5 seconds
before this change).

End-to-end ovn-kubernetes scale tests on 120-node clusters also show
significant reduction of latency to bring up pods; both average and P99
latency decreased by ~30%.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-28 16:57:43 +02:00
Adrian Moreno
9e56549c2b hmap: use short version of safe loops if possible.
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
e9bf5bffb0 list: use short version of safe loops if possible.
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Dumitru Ceara
4628be9ff8 ovsdb-idl: Fix use-after-free when destroying an IDL loop.
Transactions that are still incomplete (waiting for a reply from the
server) are kept in the IDL's 'outstanding_txns' map.  When a transaction
is destroyed, ovsdb_idl_txn_destroy() will take care of removing the
transaction from the 'outstanding_txns' map if the transaction was
incomplete but also abort it and disassemble it if needed.

Aborting the transaction first, before ovsdb_idl_txn_destroy(), may
cause an use-after-free if the transaction was outstanding; that's
because the transaction would move to state "aborted" without being
removed from the 'outstanding_txns' map.

Fixes: 53a540e5311c ("ovsdb-idl: ovsdb_idl_loop_destroy must also destroy the committing txn.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 20:56:36 +01:00
Dumitru Ceara
28f36edd19 ovsdb-idl: Only process successful txn in ovsdb_idl_loop_run.
Otherwise we hide the transaction result from the user.  This may cause
problems as the user will not detect error cases.  For example, if the
server refuses a transaction with "constraint violation", the user
should be notified because the transaction might need to be retried.

For clients that process database changes incrementally (using change
tracking) this lack of failure notification creates a problem if it
occurs while no other database changes happen.  In that case:
- ovsdb_idl_loop_run() silently consumes the failure, initializes a
  new transaction.
- no other table update was received from the server so the user will
  not add anything to the new transaction.
- ovsdb_idl_loop_commit_and_wait() will "succeed" as nothing changed
  from the client's perspective.
In reality, the first transaction failed and the client wasn't given
the chance to handle the failure.

Commit 0401cf5f9e06 ("ovsdb idl: Try committing the pending txn in
ovsdb_idl_loop_run.") tried to optimize for the common, successful
case.  Maintain the same approach and optimize for transactions that
succeeded but fall back to the old mechanism of processing failures
within ovsdb_idl_loop_commit_and_wait() instead.

Fixes: 0401cf5f9e06 ("ovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-04 11:48:31 +01:00
Dumitru Ceara
53a540e531 ovsdb-idl: ovsdb_idl_loop_destroy must also destroy the committing txn.
Found by AddressSanitizer when running OVN tests:
  Direct leak of 64 byte(s) in 1 object(s) allocated from:
      #0 0x498fb2 in calloc (/ic/ovn-ic+0x498fb2)
      #1 0x5f681e in xcalloc__ ovs/lib/util.c:121:31
      #2 0x5f681e in xzalloc__ ovs/lib/util.c:131:12
      #3 0x5f681e in xzalloc ovs/lib/util.c:165:12
      #4 0x5e3697 in ovsdb_idl_txn_add_map_op ovs/lib/ovsdb-idl.c:4057:29
      #5 0x4d3f25 in update_isb_pb_external_ids ic/ovn-ic.c:576:5
      #6 0x4cc4cc in create_isb_pb ic/ovn-ic.c:716:5
      #7 0x4cc4cc in port_binding_run ic/ovn-ic.c:803:21
      #8 0x4cc4cc in ovn_db_run ic/ovn-ic.c:1700:5
      #9 0x4c9c1c in main ic/ovn-ic.c:1984:17
      #10 0x7f9ad9f4a0b2 in __libc_start_main

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:29:43 +01:00
Dumitru Ceara
91e1ff5dde ovsdb-idl: Don't reparse orphaned rows.
Rows that refer to rows that were inserted in the current IDL run should
only be reparsed if they don't get deleted (become orphan) in the current
IDL run.

Fixes: 7b8aeadd60c8 ("ovsdb-idl: Re-parse backrefs of inserted rows only once.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-30 13:44:47 +01:00
Ilya Maximets
7b8aeadd60 ovsdb-idl: Re-parse backrefs of inserted rows only once.
While adding new rows ovsdb-idl re-parses all the other rows that
references this new one.  For example, current ovn-kubernetes creates
load balancers and adds the same load balancer to all logical switches
and logical routers.  So, then a new load balancer is added, rows for
all logical switches and routers re-parsed.

During initial database connection (or re-connection with
monitor/monitor_cond or monitor_cond_since with outdated last
transaction id) the client downloads the whole content of a database.
In case of OVN, there might be already thousands of load balancers
configured.  ovsdb-idl will process rows in that initial monitor reply
one-by-one.  Therefore, for each load balancer row, it will re-parse
all rows for switches and routers.

Assuming that we have 120 Logical Switches and 30K load balancers.
Processing of the initial monitor reply will take 120 (switch rows) *
30K (load balancer references in a switch row) * 30K (load balancer
rows) = 10^11 operations, which may take hours.  ovn-kubernetes will
use LB groups eventually, but there are other less obvious cases that
cannot be changed that easily.

Re-parsing doesn't change any internal structures of the IDL.  It
destroys and re-creates exactly same arcs between rows.  The only
thing that changes is the application-facing array of pointers.

Since internal structures remains intact, suggested solution is to
postpone the re-parsing of back references until all the monitor
updates processed.  This way we can re-parse each row only once.

Tested in a sandbox with 120 LSs, 120 LRs and 3K LBs, where each
load balancer added to each LS and LR, by re-statring ovn-northd and
measuring the time spent in ovsdb_idl_run().

Before the change:

  OVN_Southbound: ovsdb_idl_run took: 924 ms
  OVN_Northbound: ovsdb_idl_run took: 825118 ms  --> 13.75 minutes!

After:

  OVN_Southbound: ovsdb_idl_run took: 692 ms
  OVN_Northbound: ovsdb_idl_run took: 1698 ms

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-19 17:25:30 +01:00
Dumitru Ceara
1bdda7b6d5 ovsdb-idl: Use functions to apply diff in place.
On large scale deployments with records that contain large sets, this
significantly improves client side performance as it avoids comparing
full contents of the old and new rows.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-05 00:06:21 +01:00
Ilya Maximets
066741d9c5 ovsdb-idl: Add memory report function.
Added new function to return memory usage statistics for database
objects inside IDL.  Statistics similar to what ovsdb-server reports.
Not counting _Server database as it should be small, hence doesn't
worth adding extra code to the ovsdb-cs module.  Can be added later
if needed.

ovs-vswitchd is a user in OVS, but this API will be mostly useful for
OVN daemons.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2021-11-04 23:13:13 +01:00
Ilya Maximets
429b114c5a ovsdb-data: Deduplicate string atoms.
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g. to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contains a simple value that could be
copied in efficient way, but duplicating strings every time has a
significant performance impact.

Introducing a new reference-counted structure 'ovsdb_atom_string'
that allows to not copy strings every time, but just increase a
reference counter.

This change allows to increase transaction throughput in benchmarks
up to 2x for standalone databases and 3x for clustered databases, i.e.
number of transactions that ovsdb-server can handle per second.
It also noticeably reduces memory consumption of ovsdb-server.

Next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
2021-09-24 15:53:46 +02:00
Ilya Maximets
51946d2227 ovsdb-data: Optimize union of sets.
Current algorithm of ovsdb_datum_union looks like this:

  for-each atom in b:
      if not bin_search(a, atom):
          push(a, clone(atom))
  quicksort(a)

So, the complexity looks like this:

   Nb * log2(Na)   +    Nb     +   (Na + Nb) * log2(Na + Nb)
   Comparisons        clones       Comparisons for quicksort
   for search

ovsdb_datum_union() is heavily used in database transactions while
new element is added to a set.  For example, if new logical switch
port is added to a logical switch in OVN.  This is a very common
use case where CMS adds one new port to an existing switch that
already has, let's say, 100 ports.  For this case ovsdb-server will
have to perform:

   1 * log2(100)  + 1 clone + 101 * log2(101)
   Comparisons                Comparisons for
   for search                   quicksort.
       ~7           1            ~707
   Roughly 714 comparisons of atoms and 1 clone.

Since binary search can give us position, where new atom should go
(it's the 'low' index after the search completion) for free, the
logic can be re-worked like this:

  copied = 0
  for-each atom in b:
      desired_position = bin_search(a, atom)
      push(result, a[ copied : desired_position - 1 ])
      copied = desired_position
      push(result, clone(atom))
  push(result, a[ copied : Na ])
  swap(a, result)

Complexity of this schema:

   Nb * log2(Na)   +    Nb     +         Na
   Comparisons        clones       memory copy on push
   for search

'swap' is just a swap of a few pointers.  'push' is not a 'clone',
but a simple memory copy of 'union ovsdb_atom'.

In general, this schema substitutes complexity of a quicksort
with complexity of a memory copy of Na atom structures, where we're
not even copying strings that these atoms are pointing to.

Complexity in the example above goes down from 714 comparisons
to 7 comparisons and memcpy of 100 * sizeof (union ovsdb_atom) bytes.

General complexity of a memory copy should always be lower than
complexity of a quicksort, especially because these copies usually
performed in bulk, so this new schema should work faster for any input.

All in all, this change allows to execute several times more
transactions per second for transactions that adds new entries to sets.

Alternatively, union can be implemented as a linear merge of two
sorted arrays, but this will result in O(Na) comparisons, which
is more than Nb * log2(Na) in common case, since Na is usually
far bigger than Nb.  Linear merge will also mean per-atom memory
copies instead of copying in bulk.

'replace' functionality of ovsdb_datum_union() had no users, so it
just removed.  But it can easily be added back if needed in the future.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
2021-09-24 14:55:54 +02:00
Numan Siddique
7502849e95 ovsdb-idl: Add APIs to query if a table and a column is present.
This patch adds 2 new APIs in the ovsdb-idl client library
 - ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column() to
query if a table and a column is present in the IDL or not.  This
patch also adds IDL helper functions which are auto generated from
the schema which makes it easier for the clients.

These APIs are required for scenarios where the server schema is old and
missing a table or column and the client (built with a new schema
version) does a transaction with the missing table or column.  This
results in a continuous loop of transaction failures.

Related-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-28 02:59:04 +02:00
Ilya Maximets
04f8881f57 ovsdb-idl: Fix the database update signaling if it has never been connected.
The symptom of this issue is that OVS bridge looses its IP address on
restart.

Simple reproducer:
 0. start ovsdb-server and ovs-vswitchd
 1. ovs-vsctl add-br br0
 2. ifconfig br0 10.0.0.1 up
 3. ovs-appctl -t ovs-vswitchd exit
 4. start ovs-vswitchd back.

After step #3 ovs-vswitchd is down, but br0 interface exists and
has configured IP address.  After step #4 there is no IP address
on the port br0.

What happened:
1. ovsdb-cs connects to the database via ovsdb-idl and requests
   database lock.
   --> get_schema for _Server database
   --> lock request

2. ovsdb-cs receives schema for the _Server database.  And sends
   monitor request.
   <-- schema for _Server
   --> monitor_cond for _Server

3. ovsdb-cs receives lock reply.
   <-- locked
   At this point ovsdb-cs generates OVSDB_CS_EVENT_TYPE_LOCKED
   event and passes it to ovsdb-idl.  ovsdb-idl increases change_seqno.

4. ovsdb_idl_has_ever_connected() is 'true' now, because change_seqno
   is not zero.

5. ovs-vswitchd decides that it has connection with database and
   all the initial data, therefore initiates configuration of bridges.
   bridge_run():ovsdb_idl_has_ever_connected() --> true

6. Since monitor request for the Open_vSwitch database is not even
   sent yet, the database is empty.  This leads to removal of all the
   ports and all other resources.

7. When data finally received, ovs-vswitchd re-creates bridges and
   ports, but IP addresses can not be restored.

While splitting out ovsdb-cs from ovsdb-idl one part of the logic
was lost.  Particularly, before the split, ovsdb-idl updated
change_seqno only in MONITORING state.

Restoring the logic by updating the change_seqno only if may send
transaction, i.e. lock is ours and ovsdb-cs is in the MONITORING
state.  This matches with the main purpose of increasing change_seqno
at this point, i.e. to force the client to re-try the transaction.
With this change ovsdb_idl_has_ever_connected() remains 'false'
until the first monitor reply with the actual data received.

This issue was reported several times during the last couple of weeks.

Reported-at: https://bugzilla.redhat.com/1968445
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383512.html
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-June/051222.html
Fixes: 1c337c43ac1c ("ovsdb-idl: Break into two layers.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2021-06-11 01:11:57 +02:00
Dumitru Ceara
ac85cdb38c ovsdb-idl: Mark arc sources as updated when destination is deleted.
Considering two DB rows, 'a' from table A and 'b' from table B (with
column 'ref_a' a reference to table A):
a = {A._uuid=<U1>}
b = {B._uuid=<U2>, B.ref_a=<U1>}

When the IDL client processes an update that deletes row 'a', row 'b'
is also marked as 'updated' if change tracking is enabled for table B.

Fixes: 102781cc02c6 ("ovsdb-idl: Track changes for table references.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-01 14:15:49 +02:00
Dumitru Ceara
95689f1668 ovsdb-idl: Preserve references for deleted rows.
Considering two DB rows, 'a' from table A and 'b' from table B (with
column 'ref_a' a reference to table A):
a = {A._uuid=<U1>}
b = {B._uuid=<U2>, B.ref_a=<U1>}

Assuming both records are present in the IDL client's in-memory view of
the database, depending whether row 'b' is also deleted in the same
transaction or not, deletion of row 'a' should generate the following
tracked changes:

1. only row 'a' is deleted:
- for table A:
  - deleted records: a = {A._uuid=<U1>}
- for table B:
  - updated records: b = {B._uuid=<U2>, B.ref_a=[]}

2. row 'a' and row 'b' are deleted in the same update:
- for table A:
  - deleted records: a = {A._uuid=<U1>}
- for table B:
  - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}

To ensure this, we now delay reparsing row backrefs for deleted rows
until all updates in the current run have been processed.

Without this change, in scenario 2 above, the tracked changes for table
B would be:
- deleted records: b = {B._uuid=<U2>, B.ref_a=[]}

In particular, for strong references, row 'a' can never be deleted in
a transaction that happens strictly before row 'b' is deleted.  In some
cases [0] both rows are deleted in the same transaction and having
B.ref_a=[] would violate the integrity of the database from client
perspective.  This would force the client to always validate that
strong reference fields are non-NULL.  This is not really an option
because the information in the original reference is required for
incrementally processing the record deletion.

[0] with ovn-monitor-all=true, the following command triggers a crash
    in ovn-controller because a strong reference field becomes NULL:
    $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp 00:00:00:00:00:01 1.0.0.1/24
    $ ovn-nbctl lr-del r

Reported-at: https://bugzilla.redhat.com/1932642
Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-01 14:15:49 +02:00
Dumitru Ceara
4c0d093b17 ovsdb-idl.at: Make test outputs more predictable.
IDL tests need predictable output from test-ovsdb.

This used to be done by first sorting the output of test-ovsdb and then
applying uuidfilt to predictably translate UUIDs.  This was not
reliable enough in case test-ovsdb processes two or more insert/delete
operations in the same iteration because the order of lines in the
output depends on the automatically generated UUID values.

To fix this we change the way test-ovsdb and test-ovsdb.py generate
outputs and prepend the table name and tracking information before
printing the contents of a row.

All existing ovsdb-idl.at and ovsdb-cluster.at tests are updated to
expect the new output format.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-01 13:53:20 +02:00
Ben Pfaff
1c337c43ac ovsdb-idl: Break into two layers.
This change breaks the IDL into two layers: the IDL proper, whose
interface to its client is unchanged, and a low-level library called
the OVSDB "client synchronization" (CS) library.  There are two
reasons for this change.  First, the IDL is big and complicated and
I think that this change factors out some of that complication into
a simpler lower layer.  Second, the OVN northd implementation based
on DDlog can benefit from the client synchronization library even
though it would actually be made increasingly complicated by the IDL.

Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-01-21 15:33:56 -08:00
Ben Pfaff
a5c067a8b9 ovsdb-cs: New module that factors out code from ovsdb-idl.
This new module has a single direct user now.  In the future, it
will also be used by OVN.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2021-01-21 15:09:05 -08:00
Ben Pfaff
2653155874 ovsdb-idl: Add comment.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-18 18:32:17 -08:00
Ben Pfaff
de914f4ee5 ovsdb-idl: Fix memory leak sending messages without a session.
When there's no open session, we still have to free the messages that
we make but cannot send.

I'm not confident that these fix actual bugs, because it seems possible
that these code paths can only be hit when the session is nonnull.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-18 18:32:07 -08:00
Ben Pfaff
75439c4bdc ovsdb-idl: Avoid redundant clearing and parsing of received data.
ovsdb_idl_db_parse_monitor_reply() clears the IDL and parses the
received data.  There's no need to do it again afterward.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-18 18:31:22 -08:00
Dumitru Ceara
97dbef6de9 ovsdb-idl: Fix expected condition seqno when changes are pending.
Commit 17f22fe46142 tried to address this but only covered some of the
cases.

The correct way to report the expected seqno is to take into account if
there already is a condition change that was requested to the server but
not acked yet.  In that case, the new condition change request will be
sent only after the already requested one is acked.  That is, expected
condition seqno when conditions are up to date is db->cond_seqno + 2 in
this case.

Fixes: 17f22fe46142 ("ovsdb-idl: Return correct seqno from ovsdb_idl_db_set_condition().")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-04 20:00:36 +01:00
Dumitru Ceara
91a6a45802 ovsdb-idl: Fix use-after-free when deleting orphaned rows.
It's possible that the IDL client processes multiple jsonrpc updates
in a single ovsdb_idl_run().

Considering the following updates processed in a single IDL run:
1. Update row R1 from table A while R1 is also referenced by row R2 from
   table B:
   - this adds R1 to table A's track_list.
2. Delete row R1 from table A while R1 is also referenced by row R2 from
   table B:
   - because row R2 still refers to row R1, this will create an orphan
     R1.
   - at this point R1 is still in table A's hmap.

When the IDL client calls ovsdb_idl_track_clear() after it has finished
processing the tracked changes, row R1 gets freed leaving a dangling
pointer in table A's hmap.

To fix this we don't free rows in ovsdb_idl_track_clear() if they are
orphan and still referenced by other rows, i.e., the row's 'dst_arcs'
list is not empty.  Later, when all arc sources (e.g., R2) are
deleted, the orphan R1 will be cleaned up as well.

The only exception is when the whole contents of the IDL are flushed,
in ovsdb_idl_db_clear(), in which case it's safe to free all rows.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 932104f483ef ("ovsdb-idl: Add support for change tracking.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-03 18:04:03 +01:00
Dumitru Ceara
08e130abb1 ovsdb-idl: Fix memleak when deleting orphan rows.
Pure IDL orphan rows, i.e., for which no "insert" operation was seen,
which are part of tables with change tracking enabled should also be
freed when the table track_list is flushed.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-03 18:04:03 +01:00
Dumitru Ceara
2407099c9f ovsdb-idl: Fix memleak when reinserting tracked orphan rows.
Considering the following updates processed by an IDL client:
1. Delete row R1 from table A while R1 is also referenced by row R2 from
   table B:
   - because row R2 still refers to row R1, this will create an orphan
     R1 but also sets row->tracked_old_datum to report to the IDL client
     that the row has been deleted.
2. Insert row R1 to table A.
   - because orphan R1 already existed in the IDL, it will be reused.
   - R1 still has row->tracked_old_datum set (and may also be on the
     table->track_list).
3. Delete row R2 from table B and row R1 from table A.
   - row->tracked_old_datum is set again but the previous
     tracked_old_datum was never freed.

IDL clients use the deleted old_datum values so when multiple delete
operations are received for a row, always track the first one as that
will match the contents of the row the IDL client knew about.

Running the newly added test case with valgrind, without the fix,
produces the following report:

==23113== 327 (240 direct, 87 indirect) bytes in 1 blocks are definitely lost in loss record 43 of 43
==23113==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==23113==    by 0x476761: xmalloc (util.c:138)
==23113==    by 0x45D8B3: ovsdb_idl_insert_row (ovsdb-idl.c:3431)
==23113==    by 0x45B7F9: ovsdb_idl_process_update2 (ovsdb-idl.c:2670)
==23113==    by 0x45AFCF: ovsdb_idl_db_parse_update__ (ovsdb-idl.c:2479)
==23113==    by 0x45B262: ovsdb_idl_db_parse_update (ovsdb-idl.c:2542)
==23113==    by 0x45ABBE: ovsdb_idl_db_parse_update_rpc (ovsdb-idl.c:2358)
==23113==    by 0x4576DD: ovsdb_idl_process_msg (ovsdb-idl.c:865)
==23113==    by 0x457973: ovsdb_idl_run (ovsdb-idl.c:944)
==23113==    by 0x40B7B9: do_idl (test-ovsdb.c:2523)
==23113==    by 0x44425D: ovs_cmdl_run_command__ (command-line.c:247)
==23113==    by 0x44430E: ovs_cmdl_run_command (command-line.c:278)
==23113==    by 0x404BA6: main (test-ovsdb.c:76)

Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-03 18:04:03 +01:00
Ilya Maximets
f0d23f6795 ovsdb-idl: Fix iteration over tracked rows with no actual data.
When idl removes orphan rows, those rows are inserted into the
'track_list'.  This allows iterators such as *_FOR_EACH_TRACKED () to
return orphan rows that never had any data to the IDL user.  In this
case, it is difficult for the user to understand whether it is a row
with no data (there was no "insert" / "modify" for this row) or it is
a row with zero data (columns were cleared by DB transaction).

The main problem with this condition is that rows without data will
have NULL pointers instead of references that should be there according
to the database schema.  For example, ovn-controller might crash:

 ERROR: AddressSanitizer: SEGV on unknown address 0x000000000100
       (pc 0x00000055e9b2 bp 0x7ffef6180880 sp 0x7ffef6180860 T0)
 The signal is caused by a READ memory access.
 Hint: address points to the zero page.
    #0 0x55e9b1 in handle_deleted_lport /controller/binding.c
    #1 0x55e903 in handle_deleted_vif_lport /controller/binding.c:2072:5
    #2 0x55e059 in binding_handle_port_binding_changes /controller/binding.c:2155:23
    #3 0x5a6395 in runtime_data_sb_port_binding_handler /controller/ovn-controller.c:1454:10
    #4 0x5e15b3 in engine_compute /lib/inc-proc-eng.c:306:18
    #5 0x5e0faf in engine_run_node /lib/inc-proc-eng.c:352:14
    #6 0x5e0e04 in engine_run /lib/inc-proc-eng.c:377:9
    #7 0x5a03de in main /controller/ovn-controller.c
    #8 0x7f4fd9c991a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    #9 0x483f0d in _start (/controller/ovn-controller+0x483f0d)

It doesn't make much sense to return non-real rows to the user, so it's
best to exclude them from iteration.

Test included.  Without the fix, provided test will print empty orphan
rows that was never received by idl as tracked changes.

Fixes: 932104f483ef ("ovsdb-idl: Add support for change tracking.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2020-11-27 14:31:27 +01:00
Dumitru Ceara
17f22fe461 ovsdb-idl: Return correct seqno from ovsdb_idl_db_set_condition().
If an IDL client sets the same monitor condition twice, the expected
seqno when the IDL contents are updated should be the same for both
calls.

In the following scenario:
1. Client calls ovsdb_idl_db_set_condition(db, table, cond1)
2. ovsdb_idl sends monitor_cond_change(cond1) but the server doesn't yet
   reply.
3. Client calls ovsdb_idl_db_set_condition(db, table, cond1)

At step 3 the returned expected seqno should be the same as at step 1.
Similarly, if step 2 is skipped, i.e., the client calls sets
the condition twice in the same iteration, then both
ovsdb_idl_db_set_condition() calls should return the same value.

Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-11-16 17:47:12 +01:00
Mark Gray
12eb2f67df ovsdb-idl: Fix *_is_new() IDL functions.
Currently all functions of the type *_is_new() always return
'false'. This patch resolves this issue by using the
'OVSDB_IDL_CHANGE_INSERT' 'change_seqno' instead of the
'OVSDB_IDL_CHANGE_MODIFY' 'change_seqno' to determine if a row
is new and by resetting the 'OVSDB_IDL_CHANGE_INSERT'
'change_seqno' on clear.

Further to this, the code is also updated to match the following
behaviour:

When a row is inserted, the 'OVSDB_IDL_CHANGE_INSERT'
'change_seqno' is updated to match the new database
change_seqno. The 'OVSDB_IDL_CHANGE_MODIFY' 'change_seqno'
is not set for inserted rows (only for updated rows).

At the end of a run, ovsdb_idl_db_track_clear() should be
called to clear all tracking information, this includes
resetting all row 'change_seqno' to zero. This will ensure
that subsequent runs will not see a previously 'new' row.

add_tracked_change_for_references() is updated to only
track rows that reference the current row.

Also, update unit tests in order to test the *_is_new(),
*_is_delete() functions.

Suggested-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/1883562
Fixes: ca545a787ac0 ("ovsdb-idl.c: Increase seqno for change-tracking of table references.")
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-11-16 17:47:11 +01:00
Ben Pfaff
3630ab86f4 ovsdb-idl: Add comment with program name to ovsdb_idl_loop transactions.
This can make it easier to see what daemon is committing transactions.
Sometimes, in OVN especially, it can be hard to guess.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2020-11-02 15:15:19 -08:00
Han Zhou
d08a602b35 Revert "ovsdb-idl: Fix NULL deref reported by Coverity."
This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
The commit causes a regression in OVN scale test. ovn-northd's CPU
more than doubled for the test scenario: create and bind 12k ports.
Below are some perf data of ovn-northd when running command:
  ovn-nbctl --wait=sb sync

Before reverting this commit:
-   92.42%     0.62%  ovn-northd  ovn-northd          [.] main
   - 91.80% main
      + 68.93% ovn_db_run (inlined)
      + 22.45% ovsdb_idl_loop_commit_and_wait

After reverting this commit:
-   92.84%     0.60%  ovn-northd  ovn-northd          [.] main
   - 92.24% main
      + 92.03% ovn_db_run (inlined)

Reverting this commit avoided 22.45% of the CPU caused by
ovsdb_idl_loop_commit_and_wait().

The commit changed the logic of ovsdb_idl_txn_write__() by adding
the check "datum->keys && datum->values" before discarding unchanged
data in a transaction. However, it is normal for OVSDB clients (
such as ovn-northd) to try to set columns with same empty data
as it is before the transaction. IDL would discard these changes
and avoid sending big transactions to server (which would end up as
no-op on server side). In the ovn scale test scenario mentioned above,
each iteration of ovn-northd would send a transaction to server that
includes all rows of the huge Port_Binding table, which caused the
significant CPU increase of ovn-northd (and also the OVN SB DB server),
resulted in longer end to end latency of OVN configuration changes.

For the original problem the commit 68bc6f88 was trying to fix, it
doesn't seem to be a real problem. The NULL deref reported by
Coverity may be addressed in a future patch using a different approach,
if necessary.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-08-12 20:31:50 +02:00
Dumitru Ceara
9ed69557e5 ovsdb-idl: Force IDL retry when missing updates encountered.
Adds a generic recovery mechanism which triggers an IDL retry with fast
resync disabled in case the IDL has detected that it ended up in an
inconsistent state due to other bugs in the ovsdb-server/ovsdb-idl
implementation.

Additionally, this commit also:
- bumps IDL semantic error logs to level ERR to make them more
  visible.
- triggers an IDL retry in cases when the IDL client used to try to
  recover (i.e., trying to add an existing row, trying to remove a non
  existent row).

Fixes: db2b5757328c ("lib: add monitor2 support in ovsdb-idl.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-07 12:28:21 +02:00
Numan Siddique
0401cf5f9e ovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.
The function ovsdb_idl_loop_run(), after calling ovsdb_idl_run(),
returns a transaction object (of type 'struct ovsdb_idl_txn').
The returned transaction object can be NULL if there is a pending
transaction (loop->committing_txn) in the idl loop object.

Normally the clients of idl library, first call ovsdb_idl_loop_run(),
then do their own processing and create any idl transactions during
this processing and then finally call ovsdb_idl_loop_commit_and_wait().

If ovsdb_idl_loop_run() returns NULL transaction object, then much
of the processing done by the client gets wasted as in the case
of ovn-controller.

The client (in this case ovn-controller), can skip the processing
and instead call ovsdb_idl_loop_commit_and_wait() if the transaction
oject is NULL. But ovn-controller uses IDL tracking and it may
loose the tracked changes in that run.

This patch tries to improve this scenario, by checking if the
pending transaction can be committed in the ovsdb_idl_loop_run()
itself and if the pending transaction is cleared (because of the
response messages from ovsdb-server due to a transaction message
in the previous run), ovsdb_idl_loop_run() can return a valid
transaction object.

CC: Han Zhou <hzhou@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-06-16 14:06:50 -07:00
Dumitru Ceara
ae25f8c8ff ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.
Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
(i.e., "monitor_cond_since" method) with the initial monitor condition
MC1.

Assuming the following two transactions are executed on the
ovsdb-server:
TXN1: "insert record R1 in table T1"
TXN2: "insert record R2 in table T2"

If the client's monitor condition MC1 for table T2 matches R2 then the
client will receive the following update3 message:
method="update3", "insert record R2 in table T2", last-txn-id=TXN2

At this point, if the presence of the new record R2 in the IDL triggers
the client to update its monitor condition to MC2 and add a clause for
table T1 which matches R1, a monitor_cond_change message is sent to the
server:
method="monitor_cond_change", "clauses from MC2"

In normal operation the ovsdb-server will reply with a new update3
message of the form:
method="update3", "insert record R1 in table T1", last-txn-id=TXN2

However, if the connection drops in the meantime, this last update might
get lost.

It might happen that during the reconnect a new transaction happens
that modifies the original record R1:
TXN3: "modify record R1 in table T1"

When the client reconnects, it will try to perform a fast resync by
sending:
method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2

Because TXN2 is still in the ovsdb-server transaction history, the
server replies with the changes from the most recent transactions only,
i.e., TXN3:
result="true", last-txbb-id=TXN3, "modify record R1 in table T1"

This causes the IDL on the client in to end up in an inconsistent
state because it has never seen the update that created R1.

Such a scenario is described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22

To avoid this issue, the IDL will now maintain (up to) 3 different
types of conditions for each DB table:
- new_cond: condition that has been set by the IDL client but has
  not yet been sent to the server through monitor_cond_change.
- req_cond: condition that has been sent to the server but the reply
  acknowledging the change hasn't been received yet.
- ack_cond: condition that has been acknowledged by the server.

Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
disconnect):
- if there is a known last_id txn-id the code ensures that new_cond
  will contain the most recent condition set by the IDL client
  (either req_cond if there was a request in flight, or new_cond
  if the IDL client set a condition while the IDL was disconnected)
- if there is no known last_id txn-id the code ensures that ack_cond will
  contain the most recent conditions set by the IDL client regardless
  whether they were acked by the server or not.

When monitor_cond_since/monitor_cond requests are sent they will
always include ack_cond and if new_cond is not NULL a follow up
monitor_cond_change will be generated afterwards.

On the other hand ovsdb_idl_db_set_condition() will always modify new_cond.

This ensures that updates of type "insert" that happened before the last
transaction known by the IDL but didn't match old monitor conditions are
sent upon reconnect if the monitor condition has changed to include them
in the meantime.

Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-06-15 14:07:29 +02:00
Mark Michelson
89b522aee3 ovsdb-idl: Add function to reset min_index.
If an administrator removes all of the databases in a cluster from
disk, then ovsdb IDL clients will have a problem. The databases will all
reset their stored indexes to 0, so The IDL client's min_index will be
higher than the indexes of all databases in the cluster. This results in
the client constantly connecting to databases, detecting the data as
"stale", and then attempting to connect to another.

This function provides a way to reset the IDL to an initial state with
min_index of 0. This way, the client will not wrongly detect the
database data as stale and will recover properly.

Notice that this function is not actually used anywhere in this patch.
This will be used by OVN, though, since OVN is the primary user of
clustered OVSDB.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-05-28 18:36:48 +02:00
William Tu
68bc6f88a3 ovsdb-idl: Fix NULL deref reported by Coverity.
When 'datum.values' or 'datum.keys' is NULL, some code path calling
into ovsdb_idl_txn_write__ triggers NULL deref.

An example:
ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
    struct ovsdb_datum datum;
    union ovsdb_atom key;

    datum.n = 1;
    datum.keys = &key;

    key.integer = cur_cfg;
//  1. assign_zero: Assigning: datum.values = NULL.
    datum.values = NULL;
//  CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
//  2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
//     which dereferences null datum.values.
    ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}

And with the following calls:
ovsdb_idl_txn_write_clone
  ovsdb_idl_txn_write__
    6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
       datum->values
    ovsdb_datum_destroy

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-05-20 08:49:27 -07:00
Dumitru Ceara
2b7e536fa5 Revert "ovsdb-idl: Avoid sending redundant conditional monitoring updates"
This reverts commit 5351980b047f4dd40be7a59a1e4b910df21eca0a.

If the ovsdb-server reply to "monitor_cond_since" requests has
"found" == false then ovsdb_idl_db_parse_monitor_reply() calls
ovsdb_idl_db_clear() which iterates through all tables and
unconditionally sets table->cond_changed to false.

However, if the client had already set a new condition for some of the
tables, this new condition request will never be sent to ovsdb-server
until the condition is reset to a different value. This is due to the
check in ovsdb_idl_db_set_condition().

One way to replicate the issue is described in the bugzilla reporting
the bug, when ovn-controller is configured to use "ovn-monitor-all":
https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6

Commit 5351980b047f tried to optimize sending redundant conditional
monitoring updates but the chances that this scenario happens with the
latest code is quite low since commit 403a6a0cb003 ("ovsdb-idl: Fast
resync from server when connection reset.") changed the behavior of
ovsdb_idl_db_parse_monitor_reply() to avoid calling ovsdb_idl_db_clear()
in most cases.

Reported-by: Dan Williams <dcbw@redhat.com>
Reported-at: https://bugzilla.redhat.com/1808125
CC: Andy Zhou <azhou@ovn.org>
Fixes: 5351980b047f ("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-03-26 22:27:45 +01:00
Han Zhou
ca367fa5f8 ovsdb-idl.c: Allows retry even when using a single remote.
When clustered mode is used, the client needs to retry connecting
to new servers when certain failures happen. Today it is allowed to
retry new connection only if multiple remotes are used, which prevents
using LB VIP with clustered nodes. This patch makes sure the retry
logic works when using LB VIP: although same IP is used for retrying,
the LB can actually redirect the connection to a new node.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-08-21 11:30:03 -07:00