The purpose of reconnect_deadline__() function is twofold:
1. Its result is used to tell if the state has to be changed right now
in reconnect_run().
2. Its result also used to determine when the process need to wake up
and call reconnect_run() for a next time, i.e. when the state may
need to be changed next time.
Since introduction of the 'receive-attempted' feature, the function
returns LLONG_MAX if the deadline is in the future. That works for
the first case, but doesn't for the second one, because we don't
really know when we need to call reconnect_run().
This is the problem for applications where jsonrpc connection is the
only source of wake ups, e.g. ovn-northd. When the network goes down
silently, e.g. server looses IP address due to DHCP failure, ovn-northd
will sleep in the poll loop indefinitely after being told that it
doesn't need to call reconnect_run() (deadline == LLONG_MAX).
Fixing that by actually returning the expected time if it is in the
future, so we will know when to wake up. In order to keep the
'receive-attempted' feature, returning 'now + 1' in case where the
time has already passed, but receive wasn't attempted. That will
trigger a fast wake up, so the application will be able to attempt the
receive even if there was no real events. In a correctly written
application we should not fall into this case more than once in a row.
'+ 1' ensures that we will not transition into a different state
prematurely, i.e. before the receive is actually attempted.
Fixes: 4241d652e4 ("jsonrpc: Avoid disconnecting prematurely due to long poll intervals.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This follows up on commit 4241d652e4 ("jsonrpc: Avoid disconnecting
prematurely due to long poll intervals."), which implemented the same
thing in C.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Requested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
The Python IDL implementation supports ovsdb cluster connections.
This patch is a follow up to commit 31e434fc98, it adds the option of
connecting to the leader (the default) in the Raft-based cluster. It mimics
the exisiting C IDL support for clusters introduced in commit 1b1d2e6daa.
The _Server database schema is first requested, then a monitor of the
Database table in the _Server Database. Method __check_server_db verifies
the eligibility of the server. If the attempt to obtain a monitor of the
_Server database fails and a cluster id was not provided this implementation
proceeds to request the data monitor. If a cluster id was provided via the
set_cluster_id method then the connection is aborted and a connection to a
different node is instead attempted, until a valid cluster node is found.
Thus, when supplied, cluster id is interpreted as the intention to only
allow connections to a clustered database. If not supplied, connections to
standalone nodes, or nodes that do not have the _Server database are
allowed. change_seqno is not incremented in the case of Database table
updates.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ted Elhourani <ted.elhourani@nutanix.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This is aimed at an upcoming database clustering implementation, where it's
desirable to try all of the cluster members quickly before backing off to
retry them again in sequence.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
https://review.openstack.org/#/c/432906/
flake8-import-order adds 3 new flake8 warnings:
I100: Your import statements are in the wrong order.
I101: The names in your from import are in the wrong order.
I201: Missing newline between sections or imports.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Comparing None to an integer worked in Python 2, but fails in Python 3.
Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Receiving data is not the only reasonable way to verify that a connection
is up. For example, on a TCP connection, receiving an acknowledgment that
the remote side has accepted data that we sent is also a reasonable means.
Therefore, this commit generalizes the naming.
Also, similarly for the Python implementation: Reconnect.received() becomes
Reconnect.activity().
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The python reconnect library attempted to send a probe every 0
milliseconds instead of disabling probing when the probe_interval
was zero.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The ovs_error() and ovs_fatal() helper functions are useful enough
to be ported to Python. A user will be added in a future commit.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
This patch does minor style cleanups to the code in the python and
tests directory. There's other code floating around that could use
similar treatment, but updating it is not convenient at the moment.
Commit 5eda645e36 (ovsdb-server: Report time since last connect and disconnect
for each manager.) used a conditional expression in reconnect.py. That syntax
is only supported in Python 2.5 and later. XenServer 5.6 is based on RHEL 5,
which uses Python 2.4.3, so various OVS scripts on XenServer fail with Python
tracebacks.
Reported-by: Cedric Hobbs <cedric@nicira.com>
Only the time connected (if connected) or disconnected (if disconnected) is
currently reported for each manager. Change to reporting both in seconds since
the last connect and disconnect events respectively. An empty value indicates
no previous connection or disconnection.
This can help diagnose certain connectivity problems, e.g. flapping.
Requested-by: Peter Balland <peter@nicira.com>
Bug #4833.
reconnect uses the same connection state names as rconn with the exception of
the above. This commit makes their states identical, which should reduce
confusion for people debugging connection problems.
Commit a4613b01ab (ovsdb: Change the way connection duration time is reported
in Manager table.), pushed earlier today, requires this commit, so OVSDB has
been unbuildable from then to now.
These initial bindings pass a few hundred of the corresponding tests
for C implementations of various bits of the Open vSwitch library API.
The poorest part of them is actually the Python IDL interface in
ovs.db.idl, which has not received enough attention yet. It appears
to work, but it doesn't yet support writes (transactions) and it is
difficult to use. I hope to improve it as it becomes clear what
semantics Python applications actually want from an IDL.