This was a somewhat difficult merge since there was a fair amount of
superficially divergent development on the two branches, especially in the
datapath.
This has been build-tested against XenServer 5.5.0 and XenServer 5.7.0
build 15122. It has been booted and connected to XenCenter on 5.5.0.
The merge revealed a couple of outstanding bugs, which will be fixed on
citrix and then merged back into master.
Citrix QA scripts expect that "brctl show" shows a bond interface for each
bond that is added to a bridge. The only way to do that without modifying
brctl itself is to create an actual network device by that name, so this
commit adds a new bonding configuration key that causes an internal
device by the name of the bond to be created.
This feature is also necessary, but not sufficient, to allow XenCenter to
accurately show the link status and statistics of bridges (bug #1363).
This new configuration key is intentionally undocumented, because I don't
want anyone to use it.
Bug NIC-19.
Originally, the function dp_enumerate() initialized the 'all_dps'
argument. This is inconsistent with most other functions that take an
svec argument, which only clear its contents. Further, a caller that
reused an svec without destroying it first could leak memory.
With this change, the caller is expected to first call svec_init() on
the argument.
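The convention can be illustrated with a minimal sketch. The strvec type and the enumerate_example() function below are hypothetical, simplified stand-ins written for this note; they are not the real svec or dp_enumerate() code. The callee only clears the vector, so the caller must initialize it first and can then reuse it without leaking.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a string vector (not the real svec). */
struct strvec {
    char **names;
    size_t n;
    size_t allocated;
};

/* Must be called by the owner before the vector is passed to anyone. */
static void
strvec_init(struct strvec *v)
{
    v->names = NULL;
    v->n = 0;
    v->allocated = 0;
}

/* Frees the stored strings but keeps the vector usable. */
static void
strvec_clear(struct strvec *v)
{
    size_t i;

    for (i = 0; i < v->n; i++) {
        free(v->names[i]);
    }
    v->n = 0;
}

/* Frees everything and returns the vector to its initial state. */
static void
strvec_destroy(struct strvec *v)
{
    strvec_clear(v);
    free(v->names);
    strvec_init(v);
}

static void
strvec_add(struct strvec *v, const char *s)
{
    char *copy;

    if (v->n >= v->allocated) {
        size_t new_alloc = v->allocated ? 2 * v->allocated : 4;
        char **p = realloc(v->names, new_alloc * sizeof *p);
        if (!p) {
            abort();
        }
        v->names = p;
        v->allocated = new_alloc;
    }
    copy = malloc(strlen(s) + 1);
    if (!copy) {
        abort();
    }
    strcpy(copy, s);
    v->names[v->n++] = copy;
}

/* The callee only clears 'all'; it assumes the caller initialized it.
 * The names added here are made-up sample data. */
static void
enumerate_example(struct strvec *all)
{
    strvec_clear(all);
    strvec_add(all, "dp0");
    strvec_add(all, "dp1");
}
```

Had the callee re-initialized the vector instead of clearing it, a second call on the same vector would have dropped the pointers from the first call without freeing them.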
When a port's MAC is explicitly specified in the config file, we did not
initialize 'iface' and therefore later we could dereference a wild pointer.
This commit fixes the problem.
Justin reported that adding an internal port to a bridge caused
ovs-vswitchd to log a pair of warnings. This commit suppresses those
warnings, which were harmless.
For consistency, it's best if every netdev function takes a netdev instead
of a device name. The netdev_nodev_*() functions have always been a bit
ugly.
The netdev_nodev_*() functions have always been a bit of a kluge. It's
better to keep a network device open than to open it every time that it is
needed. This commit converts all of the users of these functions to use
the corresponding functions that take a "struct netdev *" instead.
Two different pieces of code in vswitchd were both iterating over all
the interfaces in a bridge and deleting some of them, then deleting any
ports that ended up with no interfaces because of this. This commit
factors this operation out into a helper function.
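A rough sketch of such a helper follows. The ex_bridge and ex_port types, the fixed-size arrays, and the predicate callback are all assumptions made for illustration; the real vswitchd data structures differ.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_IFACES 8
#define EX_MAX_PORTS 8

/* Hypothetical, simplified port: a list of integer interface ids. */
struct ex_port {
    int ifaces[EX_MAX_IFACES];
    size_t n_ifaces;
};

/* Hypothetical, simplified bridge: a list of ports. */
struct ex_bridge {
    struct ex_port ports[EX_MAX_PORTS];
    size_t n_ports;
};

/* Deletes every interface for which 'should_delete' returns true, then
 * deletes any port left with no interfaces.  Returns the number of ports
 * deleted.  Port order is not preserved. */
static size_t
ex_prune_ifaces(struct ex_bridge *br, bool (*should_delete)(int iface))
{
    size_t n_deleted = 0;
    size_t i = 0;

    while (i < br->n_ports) {
        struct ex_port *port = &br->ports[i];
        size_t kept = 0;
        size_t j;

        /* Compact the surviving interfaces to the front of the array. */
        for (j = 0; j < port->n_ifaces; j++) {
            if (!should_delete(port->ifaces[j])) {
                port->ifaces[kept++] = port->ifaces[j];
            }
        }
        port->n_ifaces = kept;

        if (!kept) {
            /* Move the last port into this slot and re-examine the slot. */
            br->ports[i] = br->ports[--br->n_ports];
            n_deleted++;
        } else {
            i++;
        }
    }
    return n_deleted;
}

/* Example predicate: treat negative ids as interfaces to delete. */
static bool
ex_is_negative(int iface)
{
    return iface < 0;
}
```

Each caller then supplies only its own deletion criterion, and the cleanup of empty ports lives in one place.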
When there is the possibility of multiple classes of netdevs,
netdev_add_router() needs to know which of these to use, so it needs a
"struct netdev *" parameter.
Until now ovs-vswitchd has created the files in /proc/net/bonding, but not
updated them, because there was little need. But the Citrix QA tests check
that the list of bond hashes in that file is kept up-to-date, so we need
to update them whenever the bond hashes (or other data in the file) change.
This commit does that.
Bug NIC-16.
The Citrix QA scripts require the bond hashes and their assigned devices
to be noted in /proc/net/bonding. We weren't doing that, so this commit
adds them.
Bug NIC-16.
Previously, the only way to query the flow table was to run "ovs-ofctl
dump-flows". This returned most flows, but not those marked hidden by
secchan. Hidden flows are set up by mechanisms such as in-band control,
since they must not be modified by users of the controller. However,
when debugging problems on the switch, it is often useful to see what
the flow table is actually doing. The new "bridge/dump-flows" command
added to ovs-appctl shows all flows being used by the OpenFlow stack.
Before now, the default probe interval (the idle time after which an echo
request is sent on an OpenFlow connection) was set to 15 seconds. The
fail-open timeout is 3 times the probe interval, so this meant that it
took 45 seconds for a switch to fail open.
Users at Nicira have commented that this is too long. They don't like the
idea that the network will be down for most of a minute before it begins to
recover. So this commit changes the default probe interval to 5 seconds,
hence the fail-open timeout to 15 seconds.
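The arithmetic can be checked with a one-line sketch. The function name is hypothetical; only the factor of 3 comes from the description above.

```c
/* Fail-open timeout as described above: three times the probe interval. */
static int
ex_fail_open_timeout_secs(int probe_interval_secs)
{
    return 3 * probe_interval_secs;
}
```

With the old 15-second default this gives 45 seconds, and with the new 5-second default it gives 15 seconds.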
Users at Nicira have commented that a maximum reconnection time of 15
seconds, which was the default, is too long. This commit cuts it to 8
seconds, on the theory that an administrator is willing to wait that long
before deciding that a change that should restore connectivity did not
work.
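To see what the cap means in practice, here is a small sketch that assumes the reconnection delay doubles after each failed attempt, starting at 1 second. The doubling schedule and the function name are assumptions for illustration; this message specifies only the maximum.

```c
/* Returns the next reconnection delay, in seconds, given the current one.
 * Assumes a doubling backoff that starts at 1 second and is capped at
 * 'max_backoff_secs'. */
static int
ex_next_backoff_secs(int current_secs, int max_backoff_secs)
{
    int next = current_secs > 0 ? 2 * current_secs : 1;

    return next > max_backoff_secs ? max_backoff_secs : next;
}
```

Under that assumption, a cap of 8 yields delays of 1, 2, 4, 8, 8, ... seconds, whereas a cap of 15 yields 1, 2, 4, 8, 15, 15, ... seconds.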
The xapi database for PIFs specifies the MAC address that should be used
for bonds, but interface-reconfigure didn't honor it and ovs-vswitchd
didn't have a way to configure it anyhow. This commit fixes both problems.
Bug #1645.
The "fdb/show" unixctl command was showing vswitch-internal port indexes,
which cannot be meaningfully interpreted by software outside vswitchd.
Also, they potentially change every time the vswitchd configuration file
changes. This commit changes it to show datapath port indexes instead,
which are both more meaningful and more stable.
To implement "brctl showmacs" for bridge compatibility, brcompatd needs to
be able to extract the MAC learning table from ovs-vswitchd. This commit
provides a way to do so, which may also be directly useful to switch
administrators.
Some network devices take a few seconds to detect carrier. When such a
device was brought up and then immediately added to a bridge, the bond
code would start out with that slave considered down and apply the full
updelay to it before enabling it. With the 31-second
updelay set by XenServer, this is excessive: we end up having no
connectivity at all for 31 seconds even though there is no reason for it.
This commit makes the bond code disregard the updelay when an interface
comes up on a bond that has no enabled interfaces, and updates the
documentation to match.
Part of bug #1566.
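The rule can be sketched as a small function. The name and parameters are hypothetical and the real bond code tracks considerably more state; this only shows the decision described above.

```c
#include <stddef.h>

/* Returns the delay, in milliseconds, to apply before enabling a slave
 * whose link has just come up.  If the bond currently has no enabled
 * slaves, waiting gains nothing, so the updelay is skipped. */
static int
ex_effective_updelay_ms(size_t n_enabled_slaves, int updelay_ms)
{
    return n_enabled_slaves == 0 ? 0 : updelay_ms;
}
```

With XenServer's 31-second updelay, the first slave to come up on an otherwise dead bond is enabled immediately, while later slaves still wait the full 31 seconds.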
At startup, the vswitch needs to delete datapaths that are not configured
by the administrator. Until now this was done by knowing the possible
names of Linux datapaths. This commit cleans up by allowing each
datapath class to enumerate its existing datapaths and their names.
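One way to picture this is a per-class callback, sketched below. The ex_dp_class structure, its enumerate member, and the sample names are all hypothetical and do not reproduce the actual dpif interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical datapath class: each implementation reports its own
 * existing datapaths instead of the switch guessing their names. */
struct ex_dp_class {
    const char *type;
    /* Stores up to 'max' names of existing datapaths in 'names' and
     * returns the number stored. */
    size_t (*enumerate)(const char **names, size_t max);
};

static size_t
ex_linux_enumerate(const char **names, size_t max)
{
    static const char *const existing[] = { "dp0", "dp1" }; /* Made-up data. */
    size_t n = sizeof existing / sizeof existing[0];
    size_t i;

    if (n > max) {
        n = max;
    }
    for (i = 0; i < n; i++) {
        names[i] = existing[i];
    }
    return n;
}

static const struct ex_dp_class ex_classes[] = {
    { "example-linux", ex_linux_enumerate },
};

static bool
ex_is_configured(const char *name, const char *const *configured, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!strcmp(name, configured[i])) {
            return true;
        }
    }
    return false;
}

/* Counts existing datapaths, across all classes, that the administrator
 * did not configure.  A real switch would delete them rather than count. */
static size_t
ex_count_unconfigured(const char *const *configured, size_t n_configured)
{
    size_t n_classes = sizeof ex_classes / sizeof ex_classes[0];
    size_t stale = 0;
    size_t c, i;

    for (c = 0; c < n_classes; c++) {
        const char *names[16];
        size_t n = ex_classes[c].enumerate(names, 16);

        for (i = 0; i < n; i++) {
            if (!ex_is_configured(names[i], configured, n_configured)) {
                stale++;
            }
        }
    }
    return stale;
}
```

The startup code then needs no knowledge of how any particular class names its datapaths.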
Commit f4b96c92c "vswitch: Disallow bridges named "dpN" or "nl:N"" disabled
naming bridges "dpN" because the vswitchd code made the bad assumption that
the bridge's local port has the same name as the bridge, which was not
true (at the time) for bridges named dpN. Now that assumption has been
eliminated, so this commit eliminates the restriction too.
This change is also a cleanup in that it eliminates one form of the
vswitch's dependence on specifics of the dpif implementation.
Soon we will allow for multiple datapath implementations. By allowing
the datapath to choose the port numbers, we possibly simplify some datapath
implementations, and the datapath's clients don't have to guess (or to
check) what port numbers are free, so this seems like a better way to go.
dpif_id() is often used in error messages, e.g. "dp%u: screwed up". But
soon we will be generalizing the concept of a datapath, so it is better
to have a function that returns a full name, e.g. "%s: screwed up".
Accordingly, this commit replaces dpif_id() by a new function dpif_name()
that does so.
The 'minor' member of struct dpif is used for two different purposes:
for printing in log messages and for encapsulating in NetFlow messages.
The needs in each case are different, so we should break up these uses.
This commit does half of that, by introducing a new function to retrieve
NetFlow ids and using it where appropriate.
With multiple kinds of datapaths, code should not just use
"dp%u" along with dpif_minor() to print a datapath name, because not all
datapaths can sensibly be named that way. We want to use a function
with a name like dpif_get_name() to retrieve a datapath name for printing
to the user, in which case the existing dpif_get_name() function would be
confusing. So rename the existing one to something more explicit.
When vSwitch does discovery, it is supposed to update resolv.conf by
default. The way configuration parameters were being read, it would
disable this update by default.
In vSwitch, the minimum probe interval is supposed to be 5 seconds, but
that was not enforced. If no interval was specified in the config file,
a value of 0 was being used, which would cause probes to never be sent
and the rconn not to move out of its ACTIVE state.
Possible fix to Bug #1466.
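One plausible way to enforce the minimum is to clamp whatever value the configuration yields, as sketched below. The function and macro names are hypothetical, and the actual change may handle the unset case differently.

```c
#define EX_MIN_PROBE_INTERVAL_SECS 5

/* Returns the probe interval to use.  An unset value, read as 0, and any
 * value below the minimum are raised to the minimum so that probes are
 * always sent. */
static int
ex_probe_interval_secs(int configured_secs)
{
    return configured_secs < EX_MIN_PROBE_INTERVAL_SECS
           ? EX_MIN_PROBE_INTERVAL_SECS
           : configured_secs;
}
```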
When a user moved from one controller to another, we did not remove the
cacert, which prevented the switch from connecting to the new controller.
To ease confusion, we now delete the cacert when the user changes or
removes the controller in xsconsole.
Note: This commit has a minor security issue, since we do not remove
trust for the old certificate until the switch is restarted. In
general, users should only be connected to trusted servers, so the
impact should be low. Fixing this would require larger changes to the
vconn-ssl code, which we don't want to do so late in the release cycle.
Bug #1457.
The SHA-1 library that we used until now was taken from RFC 3174. That
library has no clearly free license statement, only a license on the text
of the RFC. This commit replaces this library with a modified version of
the code from the Apache Portable Runtime library from apr.apache.org,
which is licensed under the Apache 2.0 license, the same as the rest of
Open vSwitch.
The 'packet' argument to process_flow() is allowed to be null, but some of
the code was assuming that it was always non-null, which caused a segfault
while revalidating ARP flows.
Bug #1394.
For as long as bonding has been implemented, the vswitch has refused to learn
from multicast packets that arrive on a bond slave if it has already
learned any other port for that source MAC, because it is likely that we
sent the packet out ourselves and are only now receiving a copy of it on
our active slave.
This is entirely correct, but it does not go far enough. In fact, the
bridge needs to entirely drop such packets. Otherwise, a host whose MAC
is assigned to a slave other than the active slave will receive a second
copy of multicast packets that it sends out the bond, and other ports
will receive two copies of every multicast packet sent by such a host.
This commit implements this new policy, which simplifies the code at the
same time.
Bug #1387.
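The new policy amounts to a single predicate, sketched here under stated assumptions: "multicast" is taken to refer to the packet's destination address, and the function name, parameters, and EX_NO_PORT constant are hypothetical rather than taken from the bridge code.

```c
#include <stdbool.h>

#define EX_NO_PORT (-1)

/* Returns true if a received packet should be dropped as a likely
 * reflection of a packet that we sent out ourselves.
 *
 * 'learned_port' is the port already learned for the packet's source MAC,
 * or EX_NO_PORT if that MAC has not been learned. */
static bool
ex_drop_reflected_multicast(bool dst_is_multicast, bool in_port_is_bond,
                            int in_port, int learned_port)
{
    return dst_is_multicast
           && in_port_is_bond
           && learned_port != EX_NO_PORT
           && learned_port != in_port;
}
```

Previously the same condition only suppressed learning; dropping the packet outright subsumes that case, which is why the code gets simpler.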
When a bond slave goes down, all of the MACs that were on it are migrated
to another slave, but this is not apparent to the switch that the bond is
connected to until each MAC sends out a packet. This causes incoming
traffic for a given MAC to be dropped until the MAC sends out a packet.
This is not usually a problem, because traffic is not ordinarily one-way,
and we can't avoid losing some packets in some cases, but we can do a
little better by sending out a gratuitous learning packet on the new slave
as soon as we know about it, and that is what this commit implements.
Bug #1290.
Whether a bond slave is enabled should be based on whether the device's
PHY sees carrier, not based on whether the device is configured up or down.
(Note that a device that is configured down will always see "no carrier".)
Otherwise a device that is up but has no carrier will initially be enabled,
which does not make sense.
This has no effect on interfaces that are not bond slaves, because the
"enabled" setting is used only by bond slaves.
Bug #1247.