The vconn code is a relative fossil as OVS code goes. It was written
before we had really figured how code should fit together. Part of that
history is that it used poll_fd_callback() to register callbacks without
the assistance of other code. That isn't how the rest of OVS works now;
this code is the only remaining user of that function.
To make it more like the rest of the system, this code gets rid of the use
of poll_fd_callback(). It also adds vconn_run() and vconn_run_wait()
functions and calls to them from the places where they are now required.
When the switch is configured to connect to a controller that accepts
connections, waits a few seconds, and then disconnects without setting up
flows, currently this causes "fail-open" to flush the flow table and
stop setting up new flows during the connection duration. This is OK if
it happens once, but it can easily happen every 8 seconds with typical
backoff settings, and that isn't so great.
This commit changes fail-open to only flush the flow table once the switch
appears to have been admitted by the controller, which prevents these
frequent network interruptions.
Thanks to Jesse Gross for especially valuable feedback.
QA notes: Behavior in fail-open and especially behavior with a controller
that rejects the switch after it connects needs to be re-tested. The
ovs-controller --mute switch added by this commit is one simple way to
create such a controller.
CC: Peter Balland <peter@nicira.com>
Bug #1695. Bug #2055.
In-band control needs to know the IP and port of the controller, so that
it can set up the correct flows to talk to that controller. Until now,
the rconn code has only made this available when a connection was actually
in progress. This means that, say, ARP packets will not be allowed through
when the rconn backs off. The same is true of packets sent by switches
that access the controller through this one.
This commit makes the rconn cache the remote IP and port and local IP
across connection attempts, improving the situation. In particular, it
reduces the overall amount of time that it takes to connect in my own
simple test case from over 10 seconds to about 2 seconds.
This was a somewhat difficult merge since there was a fair amount of
superficially divergent development on the two branches, especially in the
datapath.
This has been build-tested against XenServer 5.5.0 and XenServer 5.7.0
build 15122. It has been booted and connected to XenCenter on 5.5.0.
The merge revealed a couple of outstanding bugs, which will be fixed on
citrix and then merged back into master.
Users at Nicira have commented that a maximum reconnection time of 15
seconds, which was the default, is too long. This commit cuts it to 8
seconds, on the theory that an administrator is willing to wait that long
before deciding that a change that should restore connectivity did not
work.
Previously, rconn and vconn only allowed users to find out about the
remote IP address. This set of changes allows users to retrieve the
remote port, local IP, and local port used for the connection.