Before, it was possible for records in the configuration database to
disappear, so all of the ovsrec pointers inside bridge structures had
comments cautioning against their use except during reconfiguration. But
now that the bridge has direct control over when ovsdb_idl_run() is called,
it can ensure that bridge_reconfigure() is always called immediately
whenever the IDL data structures change. That means that we can use the
ovsrec pointers at any time after the reconfiguration process
initializes them, not just during reconfiguration.
Until now, the ovs-vswitchd main loop has managed the connection to the
database. This has worked adequately, but upcoming patches will tie
the bridge code more tightly to the database, which means that the bridge
needs more control over interaction with the database connection and thus
that it is better for the bridge to handle that connection itself. This
commit makes the latter change, moving the database interaction from the
ovs-vswitchd main loop into bridge.c.
ovs-vswitchd doesn't declare its QoS capabilities in the database yet,
so the controller has to know what they are. We can add that later.
The linux-htb QoS class has been tested to the extent that I can see that
it sets up the queues I expect when I run "tc qdisc show" and "tc class
show". I haven't tested that the effects on flows are what we expect them
to be. I am sure that there will be problems in that area that we will
have to fix.
OpenFlow 1.0 datapath IDs are 64 bits long, so the "datapath_id" column
should have 16 hex digits. The documentation had this right, but the
code didn't implement it correctly.
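
As a sketch of the intended formatting (the helper name format_dpid() is hypothetical and is not the function used in the tree), a 64-bit datapath ID can be rendered as exactly 16 zero-padded hex digits like this:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DPID_HEX_LEN 16         /* 64 bits at 4 bits per hex digit. */

/* Hypothetical helper: writes 'dpid' into 'buf' as exactly DPID_HEX_LEN
 * lowercase hex digits, zero-padded on the left.  'size' must leave room for
 * the digits plus the terminating null byte. */
static void
format_dpid(uint64_t dpid, char *buf, size_t size)
{
    assert(size > DPID_HEX_LEN);
    snprintf(buf, size, "%016" PRIx64, dpid);
    assert(strlen(buf) == DPID_HEX_LEN);
}
```

The explicit "016" width is what keeps short IDs from being written with fewer than 16 digits.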
Reported-by: Arthur van Kleef <arthur.vankleef@os3.nl>
Before the transition from configuration file to OVSDB, it was possible to
override the defaults for OpenFlow management listeners and for OpenFlow
controller connection snooping. The former can now effectively be done by
configuring a controller through the database. Overriding the latter is
not very useful (no one has complained that it cannot be done any longer).
So this commit deletes the commented-out code.
In certain cases we require the ability to provide stats that are
added to the values collected by the kernel (currently only used
by bond fake devices). Internal devices previously implemented
this directly, but now that their stats are handled by the vport
layer the functionality has been moved there. This removes the
userspace code to set the stats and replaces it with a mechanism
to access the equivalent functionality in the vport layer.
Normally we filter out packets received on a bond if we have
learned the source MAC as belonging to another port to avoid packets
sent on one slave and reflected back on another. The exception to
this is gratuitous ARPs because they indicate that the host
has moved to another port. However, this can result in an additional
problem on the switch that the host moved to if the gratuitous ARP is
reflected back on a bond slave. In this case, we incorrectly relearn
the slave as the source of the MAC address. To solve this, we lock the
learning entry for 5 seconds after receiving a gratuitous ARP against
further updates caused by gratuitous ARPs on bond slaves.
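
The locking rule can be sketched roughly as follows. This is an illustration under assumptions, not the code in the tree: the struct, the function names, and the choice that only a gratuitous ARP seen on a non-bond port starts the lock are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define GRAT_ARP_LOCK_SECS 5    /* Lock duration named in the text above. */

/* Hypothetical learning entry for one MAC address. */
struct mac_entry_sketch {
    int port;                   /* Port on which the MAC was last learned. */
    time_t lock_expires;        /* Bond gratuitous ARPs are ignored before
                                 * this time. */
};

/* Returns true if a gratuitous ARP received at 'now' may update 'e'.  Only
 * gratuitous ARPs that arrive on a bond are subject to the lock. */
static bool
grat_arp_may_update(const struct mac_entry_sketch *e, bool in_port_is_bond,
                    time_t now)
{
    assert(e != NULL);
    return !(in_port_is_bond && now < e->lock_expires);
}

/* Applies a gratuitous ARP received on 'in_port' to 'e'.  A gratuitous ARP
 * on a non-bond port (assumed to mean that the host really moved) relearns
 * the entry and starts the lock. */
static void
grat_arp_learn(struct mac_entry_sketch *e, int in_port, bool in_port_is_bond,
               time_t now)
{
    if (!grat_arp_may_update(e, in_port_is_bond, now)) {
        return;                 /* Probably a reflection: keep the entry. */
    }
    e->port = in_port;
    if (!in_port_is_bond) {
        e->lock_expires = now + GRAT_ARP_LOCK_SECS;
    }
}
```

With this shape, a reflected gratuitous ARP that comes back in on a bond slave within the lock window leaves the learned port unchanged.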
Bug #2516
Reported-by: Ian Campbell <ian.campbell@citrix.com>
The most recent revision of the netdev library added may_create
and may_open flags to explicitly state the intent of the caller as
to whether the device should already be in use. This was simply
a sanity check for users of the netdev library and the configuration.
At this point the netdev library and its users are well behaved and
should no longer need to be checked. Additional checks for incorrect
configuration have also been added, which means that the netdev library
is no longer the primary line of defense.
These flags themselves create problems because it is not always
easy for a library to know what the state of devices should be.
This is particularly a problem for ovs-openflowd, which expects
ports to be added by ovs-dpctl. Fixing this requires either checks
so permissive as to be useless or ugly hacks to get around them.
Since the checks are no longer needed, just remove them.
This commit restores the previous behavior of ovs-openflowd to
not require that ports be specified on the command line or
cleaned up after use.
Bug #2652
CC: Natasha Gude <natasha@nicira.com>
CC: Jean Tourrilhes <jt@hpl.hp.com>
CC: 蒲彦 <yan.p.bjtu@gmail.com>
vswitchd has long used a gratuitous ARP reply as an indication that a VM
has migrated, because traditional xen.org Linux DomUs send such packets out
when they complete migration. Relatively recently, however, we realized
that upstream Linux does not do this. Ian Campbell tracked this down to
two separate issues:
1. A bug prevented gratuitous ARPs from being sent.
2. When this was fixed, the gratuitous ARPs that were sent were
requests, not replies, although kernel documentation said that
replies were to be sent.
Ian submitted patches to fix both bugs. #1 is in process of revision for
acceptance. #2 was rejected: according to Dave Miller, the documentation
is wrong, not the implementation, because ARP replies would unnecessarily
fill up the ARP tables of devices on the network.
OVS has not until now treated gratuitous ARP requests specially, only
replies. Now that Linux will be using ARP requests to indicate migration,
OVS should also treat them as such. This commit does so.
See http://marc.info/?l=linux-netdev&m=127367215620212&w=2 for Ian's
original patch and http://marc.info/?l=linux-netdev&m=127468303701361&w=2
for Dave Miller's response.
CC: Ian Campbell <Ian.Campbell@citrix.com>
NIC-74.
When creating an interface we need to check whether it is internal.
However, the function iface_is_internal() does a lookup on the
interface name but we haven't added it to the hash table yet. This
adds the interface to the table early on in iface_create.
NIC-78
Profiling with qprof showed that bitmap_set_multiple() and bitmap_equal()
were eating up quite a bit of CPU time during bridge reconfiguration (up
to about 10% of total runtime). This is completely avoidable in the common
case where a port trunks all VLANs, where we don't really need a bitmap at
all. This commit implements that optimization.
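
One way to picture the optimization is to let a null bitmap pointer stand for "trunks all VLANs", so that the common case never allocates, fills, or compares a bitmap. The names below (port_sketch, port_trunks_vlan) are hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define N_VLANS 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical port: 'trunks' is a bitmap of N_VLANS bits, or NULL if the
 * port trunks every VLAN. */
struct port_sketch {
    const unsigned long *trunks;
};

/* Returns true if 'port' carries VLAN 'vid'. */
static bool
port_trunks_vlan(const struct port_sketch *port, unsigned int vid)
{
    assert(port != NULL);
    if (vid >= N_VLANS) {
        return false;
    }
    if (!port->trunks) {
        return true;            /* Common case: no bitmap work at all. */
    }
    return (port->trunks[vid / BITS_PER_WORD] >> (vid % BITS_PER_WORD)) & 1;
}
```

Comparing two ports that both trunk all VLANs then reduces to comparing two null pointers.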
Before this commit and the preceding one, with 1000 interfaces strcmp()
took 36% and port_lookup() took 8% of total runtime when reconfiguring
bridges. With these two commits the percentage is reduced to 3% and 0%,
respectively.
Before this commit and the following one, with 1000 interfaces strcmp()
took 36% and port_lookup() took 8% of total runtime when reconfiguring
bridges. With these two commits the percentage is reduced to 3% and 0%,
respectively.
Previously we would keep interfaces around that couldn't be opened
because they might be internal interfaces that are created later.
However, this leads to a race condition if the interface appears
after we try to create it and fail, since some operations may then
succeed. Instead, give up on the interface immediately if it can't
be opened and isn't internal (which we control and so won't have
this issue).
Bug #2737
ovsdb-server must be able to connect to the OVSDB managers over in-band
control (because the manager may be what configures the OpenFlow
controllers). This commit enables that.
With this commit, Open vSwitch permits a bridge to have any number of
OpenFlow controllers. When multiple controllers are configured, Open
vSwitch connects to all of them simultaneously. Details of configuration
are in the vswitch schema documentation.
OpenFlow 1.0 does not specify how multiple controllers coordinate in
interacting with a single switch, so more than one controller should be
specified only if the controllers are themselves designed to coordinate
with each other.
An upcoming commit will provide a simple means for coordination between
multiple controllers.
Feature #2495.
Many ofproto settings are controller-related. Upcoming commits will add
to ofproto the ability to support multiple controllers, so it is important
to be able to refer to controller settings as a group. Hence, this commit
bundles them into a new "struct ofproto_controller".
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel). Also
add an action which allows the tunnel ID to be set for outgoing
packets. At this point there aren't any tunnel implementations so
these fields don't have any effect.
The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID. ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentionally undocumented to avoid it being used without
a full understanding of the consequences.
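
The overloading can be sketched as simple bit manipulation on the 64-bit cookie. The helper names are hypothetical, and the sketch assumes the cookie is already in host byte order and that the caller's own cookie value fits in the low 32 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 'cookie' with 'tun_id' stored in its high 32
 * bits.  The caller's own cookie value must fit in the low 32 bits. */
static uint64_t
cookie_set_tun_id(uint64_t cookie, uint32_t tun_id)
{
    assert(cookie <= UINT32_MAX);
    return ((uint64_t) tun_id << 32) | cookie;
}

/* Hypothetical helper: extracts the tunnel ID from the high 32 bits of
 * 'cookie'.  A result of 0 means "not received on a tunnel". */
static uint32_t
cookie_get_tun_id(uint64_t cookie)
{
    return (uint32_t) (cookie >> 32);
}
```

The cost of this encoding is that only the low 32 bits of the cookie stay available to the controller, which is one of the consequences a user of the command needs to understand.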
Various kinds of flows are inadmissible and must be dropped. Most notably,
OVS drops packets received on a bond whose source addresses are ones that OVS
has already learned on a different port. As the comment says:
/* Drop all packets for which we have learned a different input
* port, because we probably sent the packet on one slave and got
* it back on the other. Broadcast ARP replies are an exception
* to this rule: the host has moved to another switch. */
As an important side effect of dropping these packets, OVS does not use
them for MAC learning when it sets up the corresponding flows.
However, OVS also periodically scans the datapath flow table and uses
information about flow activity to update its learning tables. (Otherwise,
learning table entries could expire because no new flows were being set up,
even though active flows existed.) This process, implemented in
bridge_account_flow_ofhook_cb(), did not check for admissibility, so
packets received on a bond could be used for learning even though another
port had already been learned.
This commit fixes the problem by making bridge_account_flow_ofhook_cb()
check for admissibility.
QA notes: Reproducing this problem requires some care and some luck. One
way is to have two VMs with network interfaces on a single bonded network.
Both bonded interfaces must be up (otherwise packets sent out on one slave
will never be received on the other). The problem will also not occur if
the physical switch that the bond slaves are plugged into has learned the
MAC address of the VMs involved (because the physical switch will then,
again, drop the packets without sending them back in on the other slave).
Finally, there needs to be some luck in timing and perhaps with the OVS
internal hash function also.
(One way to reproduce it reliably is to plug a pair of Ethernet ports into
each other with a cable, without an intermediate switch, and then use that
pair of ports as a bond. Then every packet sent out on one will
immediately be received on the other, triggering the problem fairly often.
If this doesn't work at first, try changing the Ethernet address used on
one side or the other.)
To verify that the problem being observed is the one fixed by this commit,
turn on bridge debugging with "ovs-appctl vlog/set bridge:file" and look
in the ovs-vswitchd log file for "bridge xapi2: learned that
00:01:02:03:04:06 is on port bond0 in VLAN 0", where 00:01:02:03:04:06 is
a VM's Ethernet address and bond0 is the name of the bond.
Testing: I ran the "loopback bond" test above with and without this commit,
twice each in case I was just lucky.
CC: Henrik Amren <henrik@nicira.com>
Bug #2366.
Bug NIC-64.
Bug NIC-69.
The next commit will need to make the same tests as the first part of
process_flow(), so this commit breaks that out into a new function
is_admissible().
Should have no externally visible effect.
The implementation of bridge_update_desc() is empty, which causes a
compiler warning for the argument. Mark the argument unused until we
get a chance to fix the function's implementation.
Until now, the stream_ssl functions for configuring private keys,
certificates, and CA certificates have always called into OpenSSL to read
a file. This commit instead makes them do that only if the file name
changed (or it has been 60 seconds since we last tried, in case someone
installed the file behind our backs).
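
The "reread only if the name changed or 60 seconds have passed" rule can be sketched as below. The struct and function names are hypothetical, the fixed-size name buffer is a simplification, and the sketch decides only whether to read; the actual OpenSSL call is left to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define REREAD_INTERVAL_SECS 60

/* Hypothetical per-file state. */
struct file_cache_sketch {
    char file_name[256];        /* Name last read, or "" if none yet. */
    time_t last_read;           /* When we last tried to read it. */
};

/* Returns true if the caller should read 'file_name' now, and records the
 * attempt.  Returns false if the same file was tried less than
 * REREAD_INTERVAL_SECS ago.  (A name too long for the buffer never matches,
 * so it is simply reread on every call.) */
static bool
file_needs_read(struct file_cache_sketch *c, const char *file_name,
                time_t now)
{
    assert(c != NULL && file_name != NULL);
    if (!strcmp(c->file_name, file_name)
        && difftime(now, c->last_read) < REREAD_INTERVAL_SECS) {
        return false;
    }
    snprintf(c->file_name, sizeof c->file_name, "%s", file_name);
    c->last_read = now;
    return true;
}
```

A caller would invoke this every time the configuration is applied and call into OpenSSL only when it returns true.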
This allows us to factor some code out of vswitchd. In an upcoming commit
we will want to do essentially the same thing from ovsdb-server, so this
avoids code redundancy.
A port mirror seems sufficiently disconnected from the ports that it
mirrors that it seems counterproductive to forbid removing a port if
it is mirrored. This commit therefore changes the references from
Mirror to Port from strong references to weak references, so that
removing a port automatically removes references to it from the Mirror
table.
Since this could cause the port and VLAN selection for the Mirror to
become empty, which would make the mirror select all packets, at the same
time this commit adds a new column "select_all" to Mirror, to explicitly
allow selecting all packets.
ovs-vswitchd has a concept of an "internal" port, which is created
on-demand. The netdev library doesn't understand an "internal" device
type, so we map it to a "system" one.
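
The mapping itself is tiny. A minimal sketch, with a hypothetical function name and assuming that every other type string is passed through unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the netdev type to use for an interface whose
 * configured type is 'iface_type'.  Only "internal" is translated. */
static const char *
netdev_type_for_iface(const char *iface_type)
{
    assert(iface_type != NULL);
    return !strcmp(iface_type, "internal") ? "system" : iface_type;
}
```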
Bug #2431
Replace inline OpenFlow descriptions with #define. Also, start work to
support setting them dynamically.
(This was originally working with the config file version of vswitchd,
but needs to be updated to work with the config db.)
The 'ip' variable in this inner "if" statement shadows a variable with
the same name in the enclosing block. The variable in the inner block
is never initialized.
Found by Clang (http://clang-analyzer.llvm.org).
In places where a random Ethernet address needs to be generated we
are inconsistent about setting an OUI. This sets an OUI everywhere
to allow the source of packets to be easily identified.
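
A sketch of the idea: fix the first three bytes of the address to the OUI and randomize only the last three. The function name is hypothetical, the OUI is taken as a parameter because the message does not name it, and rand() stands in for whatever random source the tree actually uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ADDR_LEN 6
#define OUI_LEN 3

/* Hypothetical helper: fills 'ea' with 'oui' in the first three bytes and
 * random values in the remaining three, so that generated addresses can be
 * recognized by their prefix. */
static void
eth_addr_random_with_oui(uint8_t ea[ETH_ADDR_LEN], const uint8_t oui[OUI_LEN])
{
    size_t i;

    assert(ea != NULL && oui != NULL);
    for (i = 0; i < OUI_LEN; i++) {
        ea[i] = oui[i];
    }
    for (i = OUI_LEN; i < ETH_ADDR_LEN; i++) {
        ea[i] = rand() & 0xff;  /* Placeholder random source. */
    }
}
```

Routing every caller through one helper like this is what makes the OUI consistent across the code base.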