2012-09-14T05:38:26Z|00001|jsonrpc|WARN|tcp:127.0.0.1:6634: receive error: Con
ovsdb-client: transaction failed (Connection reset by peer)
NOTE: This occurs intermittently depending on how ovsdb-server runs.
Running ovsdb-client on a remote machine increases the possibility.
This is because ovsdb-server closes newly accepted tcp connection.
The following changesets caused it. struct jsonrpc_session::dscp isn't set
based on listening socket's dscp value.
- ovsdb-server creates passive connection and listens on it.
- ovsdb-server accepts connection by ovsdb_jsonrpc_server_run().
The accepted socket inherits from the listening sockets.
ovsdb_jsonrpc_server_run() creates json session, but leaves dscp of
struct jsonrpc_session zero.
- On calling reconfigure_from_db(), it resets dscp value to
all jsonrpc sessions. Eventually jsonrpc_session_set_dscp() is called.
Then jsonrpc_session_force_reconnect() closes the connection.
With this patch,
- struct jsonrpc_session::dscp is correctly set based on
listening sockets dscp value.
- dscp of listening socket is changed dynamically by setsockopt.
This leaves a window where accepted socket may have old dscp.
But it is ignored for now because it would complicates codes
too much.
The related change sets:
- 0442efd9b1a88d923b56eab6b72b6be8231a49f7
Reapplying the dscp changes: No need to restart DB/OVS on changing
dscp value.
- 59efa47adf3234ec51541405726d033173851285
Revert DSCP update changes.
- b2e18db292cd4962af3248f11e9f17e6eaf9c033
No need to restart DB / OVS on changing dscp value.
- f125905cdd3dc0339ad968c0a70128807884b400
Allow configuring DSCP on controller and manager connections.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Until now, the jsonrpc code has only counted receiving a full JSON-RPC
messages as activity. This could theoretically time out, then, while a
very long message is in transit or if a slow link is involved. This commit
changes this code to count receiving any part of a message as activity.
This isn't a problem for OpenFlow connections because OpenFlow messages are
at most 64 kB in size.
This problem hasn't actually been observed in practice.
Bug #12789.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, the jsonrpc module has used messages received from the
remote peer as the sole means to determine that the JSON-RPC
connection is up. This could in theory interact badly with a
remote peer that stops reading and processing messages from the
receive queue when there is a backlog in the send queue for a
given connection (ovsdb-server is an example of a program that
behaves this way). This commit fixes the problem by expanding
the definition of "activity" to include successfully sending
JSON-RPC data that was previously queued.
The above change is exactly analogous to the similar change
made to the rconn library in commit 133f2dc95454 (rconn: Treat
draining a message from the send queue as activity.).
Bug #12789.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Receiving data is not the only reasonable way to verify that a connection
is up. For example, on a TCP connection, receiving an acknowledgment that
the remote side has accepted data that we sent is also a reasonable means.
Therefore, this commit generalizes the naming.
Also, similarly for the Python implementation: Reconnect.received() becomes
Reconnect.activity().
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch reapplies the changes that were reverted with the commit 59efa47
(Revert DSCP update changes.). It also addresses the problem introduced by
the original commits, cd8fca2 ((jsonrpc: Correctly setting the dscp value
before reconnect.) and b2e18d (No need to restart DB / OVS on changing
dscp value.), that caused numerous unit test failures on some systems (as
diagnosed by valgrind).
With this change there is no need to restart the DB or OVS on configuring a
different value for the manager or controller connection respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed and a new socket is created and connection is established with the new
configured dscp value.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
This reverts commit cd8fca2ba0a7d036da069a4484d501bdc7a6f611 (jsonrpc:
Correctly setting the dscp value before reconnect.) and commit
b2e18db292cd4962af3248f11e9f17e6eaf9c033 (No need to restart DB / OVS on
changing dscp value.), which on some systems causes numerous unit test
failures that valgrind diagnoses as:
Conditional jump or move depends on uninitialised value(s)
at 0x805F63F: jsonrpc_session_set_dscp (jsonrpc.c:1061)
by 0x804F45D: ovsdb_jsonrpc_server_set_remotes (jsonrpc-server.c:417)
by 0x804B775: reconfigure_from_db (ovsdb-server.c:656)
by 0x804C231: main (ovsdb-server.c:159)
Signed-off-by: Ben Pfaff <blp@nicira.com>
In commit b2e18d(No need to restart DB / OVS on changing dscp value.), the
dscp value was wrongly set after the reconnect.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Reported-by: Ravi Kerur <rkerur@gmail.com>
With this change there is no need to restart the DB or OVS on configuring a
different value for the manager or controller connection respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed and a new socket is created and connection is established with the new
configured dscp value.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
jsonrpc_recv() could take an unbounded amount of CPU time as long as data
kept arriving, preventing other work from taking place. This limits the
amount of work to processing at most 25 kB of received data and then
yielding to the caller.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The DSCP bits of a connection have nothing to do with the
reconnection state machine. This pulls them up into jsonrpc which
needs them to properly establish connections.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
There isn't a lot of value in sending inactivity probes on unix
sockets. This patch changes the default to disable them.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The changes allow the user to specify a separate dscp value for the
controller connection and the manager connection. The value will take
effect on resetting the connections. If no value is specified a default
value of 192 is chosen for each of the connections.
Feature #10074
Requested-by: Rajiv Ramanathan <rramanathan@nicira.com>
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
If a server returned an error in response to a request,
jsonrpc_transact_block() would ignore it. This patch changes the
behavior and updates its callers to gracefully handle the
possibility.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
This function is an implementation detail. The JSONRPC unit test used it,
but not for any good reason, so this commit changes the test to avoid
using it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, when the poll_loop module's log level was turned up to "debug",
it would log a backtrace of the call stack for the event that caused poll()
to wake up in poll_block(). This was pretty useful from time to time to
find out why ovs-vswitchd was using more CPU than expected, because we
could find out what was causing it to wake up.
But there were some issues. One is simply that the backtrace was printed
as a series of hexadecimal numbers, so GDB or another debugger was needed
to translate it into human-readable format. Compiler optimizations meant
that even the human-readable backtrace wasn't, in my experience, as helpful
as it could have been. And, of course, one needed to have the binary to
interpret the backtrace. When the backtrace couldn't be interpreted or
wasn't meaningful, there was essentially nothing to fall back on.
This commit changes the way that "debug" logging for poll_block() wakeups
works. Instead of logging a backtrace, it logs the source code file name
and line number of the call to a poll_loop function, using __FILE__ and
__LINE__. This is by itself much more meaningful than a sequence of
hexadecimal numbers, since no additional interpretation is necessary. It
can be useful even if the Open vSwitch version is only approximately known.
In addition to the file and line, this commit adds, for wakeups caused by
file descriptors, information about the file descriptor itself: what kind
of file it is (regular file, directory, socket, etc.), the name of the file
(on Linux only), and the local and remote endpoints for socket file
descriptors.
Here are a few examples of the new output format:
932-ms timeout at ../ofproto/in-band.c:507
[POLLIN] on fd 20 (192.168.0.20:35388<->192.168.0.3:6633) at ../lib/stream-fd.c:149
[POLLIN] on fd 7 (FIFO pipe:[48049]) at ../lib/fatal-signal.c:168
This commit makes the status of manager connections visible via the Manager
table in the database. Two new columns have been created for this purpose:
'is_connected' and 'status'. The former is a boolean flag, and the latter is a
string-string map which may contain the keys "last_error", "state", and
"time_in_state".
Requested-by: Keith Amidon <keith@nicira.com>
Reviewed by: Ben Pfaff.
Feature #3692.
Many OVS functions return 0, EOF, or errno. There are several places in the
codebase where a return value is converted to a string. All must decide whether
the return value is set, and if it is, whether it is an errno value, EOF, or
otherwise invalid. This commit consolidates that code.
Reviewed by Ben Pfaff.
ovs_queue doesn't seem very useful; it's just a singly-linked list. It's
more generally useful to use a general-purpose "struct list" for lists of
packets, so this commit adds such a member to "struct ofpbuf" and shifts
the existing users to use it.
I'm retaining the "managers" column in the Open_vSwitch table for now, but
I hope that applications transition to using "manager_options" eventually
so that we could drop it.
CC: Andrew Lambeth <wal@nicira.com>
CC: Jeremy Stribling <strib@nicira.com>
Adding a macro to define the vlog module in use adds a level of
indirection, which makes it easier to change how the vlog module must be
defined. A followup commit needs to do that, so getting these widespread
changes out of the way first should make that commit easier to review.
In the ovsdb-server log there are fairly continuous messages like these:
Apr 26 11:27:55|15254|reconnect|INFO|unix:/tmp/stream-unix.31734.0: connected
Apr 26 11:27:55|15255|reconnect|INFO|unix:/tmp/stream-unix.31734.0: connection dropped
Apr 26 11:28:00|15256|reconnect|INFO|unix:/tmp/stream-unix.31810.0: connecting...
Apr 26 11:28:00|15257|reconnect|INFO|unix:/tmp/stream-unix.31810.0: connected
Apr 26 11:28:00|15258|reconnect|INFO|unix:/tmp/stream-unix.31810.0: connection dropped
These just indicate that ovs-vsctl is connecting to ovsdb-server from,
for example, the "vif" script. But there's no need to log all that detail;
it's simply not useful. This commit suppresses it.
Bug #2715.
Always passing 0 to reconnect_disconnected() means that it uses a generic
log message ("connection dropped"). By passing the error code, as done by
this commit, reconnect_disconnected() can log a more specific message.
Sometimes, when a user asks me to help debug a problem, it turns out that
an SSL connection was being made on a TCP port, or vice versa, or that an
OpenFlow connection was being made on a JSON-RPC port, or vice versa, and
so on. This commit adds log messages that diagnose this kind of problem,
e.g. "tcp:127.0.0.1:6633: received JSON-RPC data on OpenFlow channel".
The fatal-signal library notices and records fatal signals (e.g. SIGTERM)
and terminates the process on the next trip through poll_block(). But
some special utilities do not always invoke poll_block() promptly, e.g.
"ovs-ofctl monitor" does not call poll_block() as long as OpenFlow messages
are available. But these special cases seem like they are all likely to
call into functions that themselves block (those with "_block" in their
names). So make a new rule that such functions should always call
fatal_signal_run(), either directly or through poll_block(). This commit
implements and documents that rule.
Bug #2625.
This is unlikely to occur very often in practice, because s->stream
usually gets stuffed into s->rpc before long, but it is still a good idea
to fix it.
Some upcoming code wants to serialize JSON into a "struct ds" dynamic
string buffer, so expose an interface to do this.
This commit doesn't change much, but it renames some functions internal
to json.c to make the naming more consistent.
Also, make jsonrpc_log_msg() use this new function, since it is a more
straightforward way to do what it wants.
jsonrpc_session_connect() indirectly called reconnect_disconnected(), which
told the reconnect object that the connection had failed, before it told it
that the connection attempt had started. When the connection didn't
complete immediately, this caused the connection to time out immediately,
without any backoff.
Reported by Jeremy Stribling.
SSL, which will be added in an upcoming commit, requires some background
processing, which is best done in a "run" function in our architecture.
This commit adds stream_run() and stream_run_wait() and calls to them from
the places where they will be required.