This removes over 1000 lines of code from bridge.c and will make it
easier to moving the bonding implementation into ofproto as part of
future development.
The code that enables and disables bond slaves was a bit of a mess:
* Disabling a slave could recursively enable a different slave.
* Processing a flow could enable a slave.
This commit gets rid of both of those properties, which made it difficult
to reason about the code paths along which slaves would be enabled and
disabled.
Bug #5121.
There's no reason that I can see to maintain this information in struct
port and struct iface. It's redundant, since the lacp implementation
maintains the same information.
In each of the cases converted here, an shash was used simply to maintain
a set of strings, with the shash_nodes' 'data' values set to NULL. This
commit converts them to use sset instead.
The test ofproto_has_primary_controller() is meaningless, since OFPP_NORMAL
can cause the MAC learning table and port bonding to be in use even when
there is a controller.
I see that this bug has been here since early 2009, when the OFPP_NORMAL
feature was introduced in the bridge. (Obviously it's not a severe
problem.)
Before this patch, CFM would report unexpected remote maintenance
points in the database. This commit no longer exposes this
information.
Information about precisely why a link is faulty is more interesting
to a system administrator debugging a problem than a controller
which will generally only care about whether or not a link is
faulty. For simplicity sake, this commit removes this information
from the database where it was somewhat awkwardly placed. In the
future it may be valuable to report the information through
ovs-appctl commands for debugging purposes.
It doesn't really make sense for the CFM code to be composing
packets. Its caller is better placed to compose the appropriate
L2 header. This commit pulls that logic out of the CFM library.
This commit generalizes compose_lacp_packet() into new
compose_packet() function. This new function will be used to send
CCM messages in future patches.
Port destruction could cause dangling lacp objects to live in the
lacp module's 'all_lacps' list. This could cause bogus output for
the lacp/show appctl command.
Bug #5088.
In my opinion, this makes the code more obviously correct in many places,
because there are generally fewer variables. One must generally keep two
variables in sync for iterating through an array: the array index and the
contents of the array at that index. For iterating through an hmap, only
the map element is necessary.
A linked list would also be a reasonable choice for the bridge's collection
of ports. I chose to use an hmap because we already had an index by name
and it seemed OK to use only one index. I decided not to keep the shash
because they are less convenient for iteration than an hmap.
Adding and removing ports is fairly common in a virtual environment,
because it happens whenever a VM boots or shuts down. It is best to
avoid flushing the whole MAC learning table when that happens, because
that means that, briefly, every packet will get flooded, wasting CPU cycles
and network bandwidth.
This commit breaks flushing the MAC table out of bridge_flush(). Instead,
each caller is now responsible for flushing the MAC table if it is
necessary. In a few cases, no flushing was necessary, so those callers
were not modified. In the case of removing a port or modifying its VLAN
assignments, it is necessary to expire all of the MAC learning entries
associated with that port, so this commit does that. Finally, some
operations do require a MAC learning flush but they are rare enough that
in my opinion it's not worth taking care to avoid a MAC table flush.
Bug #891.
In an upcoming commit I want to store a pointer in MAC learning entries
in the bridge, instead of an integer port number. The MAC learning library
has other clients, and the others do not gracefully fit this new model, so
in fact the data will have to become a union. However, this does not fit
well with the current mac_learning API, since mac_learning_learn()
currently initializes and compares the data. It seems better to break up
the API so that only the client has to know the data's format and how to
initialize it or compare it. This commit makes this possible.
This commit doesn't change the type of the data stored in a MAC learning
entry yet.
As a side effect this commit has the benefit that clients that don't need
gratuitous ARP locking don't have to specify any policy for it at all.
In my opinion, this makes the code more obviously correct in many places,
because there are generally fewer variables. One must generally keep two
variables in sync for iterating through an array: the array index and the
contents of the array at that index. For iterating through a list, only
the list element is necessary.
This is a first step toward changing the array of ifaces in struct port
to a more suitable data structure.
As a side effect this fixes a bonding problem that I noticed via code
inspection. Before this commit, each bond_entry specified an interface
via index. If an iface was deleted, however, this shifted all of the
iface indexes, and the code didn't compensate for that. This commit fixes
the problem by using pointers to ifaces instead, which don't shift around.
Until now, if a given MAC ever transmitted, then it would always show up
in bond information output. There's no benefit to that if the MAC has
gone away permanently. This commit causes them to be deleted when their
load has gone to 0. This takes a fairly long time: if a MAC has sent, say,
one million bytes and then stops transmitting entirely, then it will take
about 20 rebalancing intervals (200 seconds) before it decays to 0 and
gets deleted.
Bug #2157.
The iface_tag name and comment implied that it was really just a copy of
the 'tag' member of struct iface, but in fact it has a completely different
purpose: it represents the binding of a bond_entry to a particular iface.
It is invalidated when the bond_entry has to be redirected to a different
iface, not when the iface itself changes. I hope that this commit helps
to clarify.
This makes the code easier to understand.
As a historical note, the "bridge" code was originally written in an
almighty hurry, and so some design decisions were made on the basis of
being unlikely to cause serious bugs instead of on the basis of being
easy to understand. That's why there are so many array indexes sprinkled
around the bridge data structures, and so much range checking of their
values, when it would be better to just have pointers that can be followed
directly. I figured that getting the wrong index would at least do
something half-reasonable in most cases, whereas dereferencing a freed
pointer was likely to segfault sooner or later and cause immediate failure.
But now I think it's time to improve the code. The code is mature enough
now that we should be able to thoroughly understand the data lifetime
issues.
This code was checking a name from cfg->interfaces[i] but using the result
to decide whether to delete port->ifaces[i]. The two arrays aren't
necessarily in the same order so this code could delete the wrong iface.
Found by inspection. This is probably not a common problem in practice
because I imagine that deleting an interface from a bond, without deleting
the bond itself, is a rare operation. The problem would also be
intermittent--after a few switch reconfiguration cycles I imagine that
the problem would resolve.
This takes one step toward the larger goal of getting rid of all of the
hooks from ofproto back into vswitchd, by eliminating one of the reasons
that they are required.
The callers of ofproto_send_packet() actually just want to send a packet
out on a port, possibly tagged with a VLAN, but the interface forced them
to compose a set of OpenFlow actions, which made it harder to use than
necessary. This commit specializes the interface for the purposes that
the callers really wanted, making it easier to use.
handle_miss_upcall() can now take advantage of this function, too.
A port can be configured to use miimon reporting as the criterion for
enabling or disabling an interface, but in some cases (such as for reading
the initial link status) the code was reading the carrier status instead.
This commit fixes the problem.
This changes the meaning of the link_status column in the Interface table.
I don't think that the old meaning was useful to the controller in the
case of a bond configured for miimon monitoring, because the controller
could not use it to detect which interfaces the bond considered to be up
or down.
Bonding requires lacp attached status to make decisions about which
links are enabled. However, LACP does not require any bonding
related information (other than which links are aggregated) to
perform. Having them reversed causes some subtle bugs in the
bonding code's responsiveness to LACP changes.
Creates new compose_lacp_pdu() from the old compose_lacp_packet()
function. This will allow a LACP PDU to be created without
necessarily knowing the Ethernet Source Address required for
generating the packet. Future patches rely on this functionality.
The monitoring logic and bonding code are unrelated. This commit
pulls the monitoring logic out. As a result all interfaces, not
just those participating in bonds, are monitored. This will be
required to run LACP on non-bonded ports.
Also removes the miimon flag from the port structure.
I've never heard of anyone actually using controller discovery.
It adds a great deal of code to the source tree, and a little
bit of complication to ofproto, so this commit removes it.
The source of truth for QoS statistics on a given interface is tc.
Unfortunately, output from tc can be a little bit confusing and
does not follow the same data model as OVS. This commit adds a
"qos/show" appctl command which gives friendlier output.