This reverts commit 0858ad2b9d2f594f99bb11fc9b331ce8b1de953d.
Although that commit partially fixed a performance regression
relative to the Linux bridge, it introduced other problems that
were not apparent in the performance testing (e.g.
WARN_ON_ONCE(skb->destructor) now triggers in dp_process_received_packet()).
The fix is not completely obvious, so reverting now to ensure stability
while investigating the problem.
Before commit 72ca14c1 "datapath: Fix race against workqueue in dp_dev and
simplify code," the dp_dev network device had a device queue, and we would
orphan packets before sticking them on the queue. This screwed up socket
accounting a bit, but the effect was limited to the device queue length.
Now, after that commit, the dp_dev device has no device queue, but it still
orphans packets. This screws up socket accounting a *lot*, because the
effect is now unlimited, since there is no queue to limit it.
The solution is to not orphan packets at all. There is little need for it
now since packet transmission now happens immediately, not in a workqueue
whose execution may be delayed.
This should fix bug #1519, which tests "netperf -t UDP_STREAM" performance,
finding that an unrealistically high number of UDP packets could be sent
but that none at all were received. The send rate is due to the orphaning,
the receive rate presumably because at least one out of approx. 65535/1500
= 44 fragments per full packet were dropped in each case.
Previously, there was no way to induce coverage information to be
displayed; it would only print when the system noticed unusual delays
between polling intervals. Now, production of coverage logs can be
forced with "coverage/log" command in ovs-appctl. Coverage counters may
be reset with "coverage/clear".
Before we create the local port, we should allocate and assign the table.
Otherwise packets sent on the local port before we do so will cause an
OOPS.
This is a theoretical race that has not been observed in practice.
The dp_dev_destroy() function failed to cancel the xmit_queue work, which
allowed it to run after the device had been destroyed, accessing freed
memory. However, simply canceling the work with cancel_work_sync() would
be insufficient, since other packets could get queued while the work
function was running. Stopping the queue with netif_tx_disable() doesn't
help, because the final action in dp_dev_do_xmit() is to re-enable the TX
queue.
This issue led me to re-examine why the dp_dev needs to use a work_struct
at all. This was implemented in commit 71f13ed0b "Send of0 packets from
workqueue, to avoid recursive locking of ofN device" due to a complaint
from lockdep about recursive locking.
However, there's no actual reason that we need any locking around
dp_dev_xmit(). Until now, it has accepted the standard locking provided
by the network stack. But looking at the other software devices (veth,
loopback), those use NETIF_F_LLTX, which disables this locking, and
presumably do so for this very reason. In fact, the lwn article at
http://lwn.net/Articles/121566/ hints that NETIF_F_LLTX, which is otherwise
discouraged in the kernel, is acceptable for "certain types of software
device."
So this commit switches to using NETIF_F_LLTX for dp_dev and gets rid
of the work_struct.
In the process, I noticed that veth and loopback also take advantage of
a network device destruction "hook" using the net_device "destructor"
member. Using this we can automatically get called on network device
destruction at the point where rtnl_unlock() is called. This allows us
to stop stringing the dp_devs that are being destroyed onto a list so
that we can free them, and thus simplifies the code along all the paths
that call dp_dev_destroy().
This commit gets rid of a call to synchronize_rcu() (disguised as a call
to synchronize_net(), which is a macro that expands to synchronize_rcu()),
so it probably speeds up deleting ports, too.
When we create a datapath we do this:
1. Create local port.
2. Call add_dp hook.
3. Allow userspace to add more ports.
When we deleted a datapath we were doing this:
1. Call del_dp hook
2. Delete all the ports.
Unfortunately step 1 destroys dp->ifobj, then dp_del_port on any port other
than the local port in step 2 tries to reference dp->ifobj through a call
to sysfs_remove_link().
This commit fixes the problem by changing datapath deletion to mirror
creation:
1. Delete all the ports but the local port.
2. Call dp_del hook.
3. Delete local port.
Commit 010082639 "datapath: Add sysfs support for all (otherwise supported)
Linux versions" makes this problem obvious on a 2.6.25+ kernel configured
with slab debugging, because on such kernels the ifobj is a pointer to a
slab object that is freed by the del_dp hook function (when brcompat_mod
is loaded). This bug may be just as present on older kernels, but there
the ifobj is part of struct datapath, not a pointer, and thus it is much
harder to trigger.
Bug #1465.
The in-band control code assumed that we would connect to the controller
over the datapath's "local" port. If the switch had multiple datapaths,
and the controller was only available through one of them, the other
datapaths would fail to connect. This is addressed by the following
changes:
- The controller's MAC address is looked up through the connection's
interface instead of the datapath's local one.
- We allow ARP traffic to be sent by the connection's interface.
- We allow ARP traffic to be sent to the controller's MAC address.
This is necessary if the controller is running locally (e.g. in a VM).
Bug #1466
In some cases we need to be able to locate a device by its IPv4 address.
There doesn't seem to be an easy way to do this on Linux, so we iterate
through all devices until it's found. The caller can provide a hint as
to the device, so subsequent checks can be quicker.
This checkin also adds nodev versions of functions to lookup ARP entries
and the IPv4 address of a device.
It is often useful in debugging to know the ports and IP addresses being
used in an OpenFlow connection. This commit adds the remote and local
IP addresses and ports used for a connection to the "status" information
returned by ovs-ofctl.
Feature #1484.
Previously, rconn and vconn only allowed users to find out about the
remote IP address. This set of changes allows users to retrieve the
remote port, local IP, and local port used for the connection.
Since it's not always easy to determine the management id, this adds the
ability to see it (and the datapath id) with the "ovs-ofctl status"
command.
Feature #1533
At startup, the vswitch needs to delete datapaths that are not configured
by the administrator. Until now this was done by knowing the possible
names of Linux datapaths. This commit cleans up by allowing each
datapath class to enumerate its existing datapaths and their names.
The userspace tools were allowing the name of any internal port to be used
to identify a datapath. This, however, makes it hard to enumerate all the
names by which a datapath can be known, and it was never documented or
intentional behavior, so this commit disables it.
Commit f4b96c92c "vswitch: Disallow bridges named "dpN" or "nl:N"" disabled
naming bridges "dpN" because the vswitchd code made the bad assumption that
the bridge's local port has the same name as the bridge, which was not
true (at the time) for bridges named dpN. Now that assumption has been
eliminated, so this commit eliminates the restriction too.
This change is also a cleanup in that it eliminates one form of the
vswitch's dependence on specifics of the dpif implementation.
The dpif and netdev code has had various ways to check for changes to
dpifs and netdevs over the course of Open vSwitch development. All of
these have been thus far fairly specific to the Linux implementation. This
commit is the start of a more general API for watching for such changes.
The dpif-related parts seem fairly mature and so they are documented,
the netdev parts will probably need to change somewhat and so they are
not documented yet.
This commit initially introduces only a single datapath implementation,
which is the same as the original one, but it paves the way for
additional implementations, such as the upcoming userspace datapath.
Until now, ODP_PORT_LIST has reported the number of ports actually copied
out. It's better for the caller, however, if it reports the number of
ports that were available to be copied out.
Soon we will allow for multiple datapath implementations. By allowing
the datapath to choose the port numbers, we possibly simplify some datapath
implementations, and the datapath's clients don't have to guess (or to
check) what port numbers are free, so this seems like a better way to go.
This code checked that the number of actions in a flow query did not
exceed the maximum number of actions that are allowed on a given flow.
But this check is unnecessary, since the code will never copy out any
more actions than actually exist in a flow. It would be a shame to refuse
a flow query simply on the basis that the caller allocated more memory
than necessary, so eliminate the check.
This function is easier for callers to use if they do not have to guess
how many ports are in the group. Since it's not performance critical at
all, introduce these easier semantics.
One potential user of dpif_port_query_by_name() actually open-coded the
function because it didn't want logging on failure. There was only one
actual caller of the function, which didn't want logging on failure either.
So, clean up by reducing the failure log level to DBG and making the
open-coded version an actual caller.
dpif_id() is often used in error messages, e.g. "dp%u: screwed up". But
soon we will be generalizing the concept of a datapath, so it is better
to have a function that returns a full name, e.g. "%s: screwed up".
Accordingly, this commit replaces dpif_id() by a new function dpif_name()
that does so.
The 'minor' member of struct dpif is used for two different purposes:
for printing in log messages and for encapsulating in NetFlow messages.
The needs in each case are different, so we should break up these uses.
This commit does half of that, by introducing a new function to retrieve
NetFlow ids and using it where appropriate.
With multiple kinds of datapaths, code should not just use
"dp%u" along with dpif_minor() to print a datapath name, because not all
datapaths can sensibly be named that way. We want to use a function
with a name like dpif_get_name() to retrieve a datapath name for printing
to the user, in which case the existing dpif_get_name() function would be
confusing. So rename the existing one to something more explicit.
When vSwitch does discovery, it is supposed to update resolv.conf by
default. The way configuration parameters were being read, it would
disable this update by default.