These steps always happen together, in lockstep, so we might as well combine them.
This expands the region over which the vport_lock is held. I didn't
carefully verify that this was OK.
This also eliminates the synchronize_rcu() call from the destruction of tunnel
vports, since it did not appear to me that they needed it.
It should be possible to eliminate the synchronize_rcu() from the netdev,
patch, and internal_dev vports, but this commit does not do that.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
After the previous commit, which changed the datapath to always create and
attach a vport at the same time, and to always detach and delete a vport
at the same time, there is no longer any real distinction between a dp_port
and a vport. This commit, therefore, merges the two together to simplify
code. It might even improve performance, although I have not checked.
I wasn't sure at first whether the merged structure should be "struct
dp_port" or "struct vport". I went with the latter since the "v" prefix
sounds cool.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath. Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath. Similarly, a vport
could be detached and deleted separately.
After some study, I think I understand why this distinction exists. It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them. However, changing it to create them before it tries to
open them is not difficult, so this commit does this.
The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.
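A minimal sketch of the single-step interface, assuming a hypothetical fixed-size port table: adding a port allocates the vport and gives it a port number in one call, and deleting a port detaches and frees it in one call, so no vport ever exists without a datapath. MAX_PORTS, struct datapath as written here, dp_add_port(), and dp_del_port() are hypothetical simplifications. The real datapath frees a vport only after an RCU grace period; this single-threaded sketch frees it immediately.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PORTS 8             /* hypothetical port-table size */

struct vport {
    int port_no;                /* assigned at creation; never "detached" */
};

struct datapath {
    struct vport *ports[MAX_PORTS];
};

/* Create a vport and attach it to 'dp' in a single step.
 * Returns the new port number, or -1 if no slot or no memory. */
static int dp_add_port(struct datapath *dp)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (!dp->ports[i]) {
            struct vport *vport = malloc(sizeof *vport);
            if (!vport)
                return -1;
            vport->port_no = i;
            dp->ports[i] = vport;
            return i;
        }
    }
    return -1;
}

/* Detach and delete in a single step. Real code would defer the free
 * until after an RCU grace period; this sketch is single-threaded. */
static void dp_del_port(struct datapath *dp, int port_no)
{
    assert(port_no >= 0 && port_no < MAX_PORTS);
    free(dp->ports[port_no]);
    dp->ports[port_no] = NULL;
}
```

Because creation and attachment cannot be separated, callers never have to handle a half-constructed port that exists but has no port number.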
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
'struct net_device' is refcounted and can stick around for quite a
while if someone is still holding a reference to it. However, we
free the vport that it is attached to in the next RCU grace period
after detach. This commit therefore sets the vport pointer to NULL
on detach and adds the appropriate checks.
We currently acquire dp_mutex when we are notified that the MTU
of a device attached to the datapath has changed, so that we can
set the internal devices to the minimum MTU. However, holding
dp_mutex there is not required, because we already hold the RTNL
lock, and taking it causes a deadlock, so don't do it.
Specifically, the issue is that dp_mutex is acquired twice: once in
dp_device_event() before calling set_internal_devs_mtu(), and then
again in internal_dev_change_mtu() when the MTU is actually changed
(since the MTU can also be set directly). Since dp_mutex is not a
recursive mutex, the second acquisition deadlocks.
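The double acquisition can be illustrated with a userspace analogy. This is not kernel code: it substitutes a POSIX error-checking mutex for the kernel's dp_mutex, so that the second lock from the same thread returns EDEADLK instead of hanging. The function name lock_twice_same_thread() is hypothetical.

```c
#define _XOPEN_SOURCE 700       /* expose PTHREAD_MUTEX_ERRORCHECK under strict -std */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a non-recursive mutex twice from the same thread, mimicking
 * dp_device_event() followed by internal_dev_change_mtu().
 * Returns the error code from the second lock attempt. */
static int lock_twice_same_thread(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t dp_mutex;
    int err;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK reports a relock as EDEADLK rather than blocking
     * forever, as a PTHREAD_MUTEX_NORMAL mutex would. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&dp_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    err = pthread_mutex_lock(&dp_mutex);    /* first acquisition */
    assert(err == 0);
    err = pthread_mutex_lock(&dp_mutex);    /* second acquisition */

    pthread_mutex_unlock(&dp_mutex);        /* release the first lock */
    pthread_mutex_destroy(&dp_mutex);
    return err;
}
```

Calling lock_twice_same_thread() should return EDEADLK. The fix avoids the first acquisition entirely and relies on the RTNL lock that the notifier path already holds.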
Currently the datapath directly accesses devices through their
Linux functions. Obviously this doesn't work for virtual devices
that are not backed by an actual Linux device. This commit creates a
new virtual port layer that handles all interaction with devices.
The existing support for Linux devices is then implemented on top
of this layer as two device types. It splits out dp_dev and renames
it to internal_dev. There were several places where datapath devices
had to be handled in a special manner, and this cleans that up by
putting all the special casing in a single location.
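A rough sketch of the idea behind a virtual port layer: each port type supplies a table of operations, and the datapath calls only through that table, so per-type special cases live with the port type rather than throughout the datapath. All names below (struct vport_ops, vport_ops_lookup(), vport_send(), and the "netdev" and "internal" type strings) are hypothetical simplifications, not the interface this commit actually introduces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vport;

/* Hypothetical per-type operations table. */
struct vport_ops {
    const char *type;
    int (*send)(struct vport *vport, const void *packet, size_t len);
};

struct vport {
    const struct vport_ops *ops;
    size_t tx_bytes;            /* bytes handed to this port */
};

/* Stand-in for a port backed by an existing Linux network device. */
static int netdev_send(struct vport *vport, const void *packet, size_t len)
{
    (void)packet;               /* real code would transmit via the device */
    vport->tx_bytes += len;
    return 0;
}

/* Stand-in for the datapath's internal device (formerly dp_dev). */
static int internal_dev_send(struct vport *vport, const void *packet, size_t len)
{
    (void)packet;               /* real code would deliver to the local stack */
    vport->tx_bytes += len;
    return 0;
}

static const struct vport_ops netdev_vport_ops = { "netdev", netdev_send };
static const struct vport_ops internal_vport_ops = { "internal", internal_dev_send };

static const struct vport_ops *const vport_ops_list[] = {
    &netdev_vport_ops,
    &internal_vport_ops,
};

/* Find the operations for a port type by name; NULL if unknown. */
static const struct vport_ops *vport_ops_lookup(const char *type)
{
    for (size_t i = 0; i < sizeof vport_ops_list / sizeof vport_ops_list[0]; i++)
        if (strcmp(vport_ops_list[i]->type, type) == 0)
            return vport_ops_list[i];
    return NULL;
}

/* The datapath's only way to transmit: dispatch through the ops table. */
static int vport_send(struct vport *vport, const void *packet, size_t len)
{
    assert(vport->ops != NULL);
    return vport->ops->send(vport, packet, len);
}
```

Adding a port type that is not backed by a Linux device then means adding one more ops table, without touching the datapath's transmit path.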
The MTU of the local port should be no larger than the minimum of
the MTUs of the ports attached to the bridge; otherwise, packets may
be dropped. We already prevent changes to the MTU that would violate
this constraint, but until now we have not proactively set the MTU.
This change makes everything consistent and matches the behavior of
the bridge.
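As a minimal sketch of the rule above, the following hypothetical helper computes the MTU to assign to the local port: the minimum over the attached ports. The function name min_port_mtu() and the caller-supplied fallback for a bridge with no other ports are assumptions, not taken from the datapath code.

```c
#include <assert.h>
#include <stddef.h>

/* Return the smallest MTU among 'n_ports' attached ports, or
 * 'fallback' if there are none. The result is the largest MTU the
 * local port can use without packets being dropped. */
static int min_port_mtu(const int *mtus, size_t n_ports, int fallback)
{
    int min = fallback;

    for (size_t i = 0; i < n_ports; i++) {
        assert(mtus[i] > 0);
        if (i == 0 || mtus[i] < min)
            min = mtus[i];
    }
    return min;
}
```

For ports with MTUs 1500, 9000, and 1400, the local port would be set to 1400. Recomputing this whenever an attached port's MTU changes, rather than only rejecting violating changes, is what keeps the constraint satisfied proactively.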
Commit c874dc6d6b "secchan: Fix behavior when a network device is renamed."
fixed a crash in the datapath when network devices within a datapath were
renamed. However, this missed the case where the device that was renamed
was a datapath's internal port: these devices have their br_port members
set to NULL, so we have to determine that they belong to a datapath another
way. This commit does so.
This commit also changes the initialization order in dp_dev_create().
Otherwise, dp_device_event() will dereference a null pointer when it is called via
register_netdevice(), because the newly created device is a datapath device
but its members are not yet initialized.
We create symlinks from /sys/class/net/<bridgename>/brif/<devname> to
/sys/class/net/<devname>/brport, but until now we have never updated the
links when network devices are renamed. This commit fixes this problem.
(Only the <devname> in /sys/class/net/<bridgename>/brif/<devname> needs to
be updated. Symlinks within sysfs have stable targets; that is, no matter
how the object that a sysfs symlink points to moves around, the link is
still maintained correctly.)
The dp_dev_destroy() function failed to cancel the xmit_queue work, which
allowed it to run after the device had been destroyed, accessing freed
memory. However, simply canceling the work with cancel_work_sync() would
be insufficient, since other packets could get queued while the work
function was running. Stopping the queue with netif_tx_disable() doesn't
help, because the final action in dp_dev_do_xmit() is to re-enable the TX
queue.
This issue led me to re-examine why the dp_dev needs to use a work_struct
at all. This was implemented in commit 71f13ed0b "Send of0 packets from
workqueue, to avoid recursive locking of ofN device" due to a complaint
from lockdep about recursive locking.
However, there's no actual reason that we need any locking around
dp_dev_xmit(). Until now, it has accepted the standard locking provided
by the network stack. But looking at the other software devices (veth,
loopback), those use NETIF_F_LLTX, which disables this locking, and
presumably do so for this very reason. In fact, the LWN article at
http://lwn.net/Articles/121566/ hints that NETIF_F_LLTX, which is otherwise
discouraged in the kernel, is acceptable for "certain types of software
device."
So this commit switches to using NETIF_F_LLTX for dp_dev and gets rid
of the work_struct.
In the process, I noticed that veth and loopback also take advantage of
a network device destruction "hook" using the net_device "destructor"
member. Using it, we get called automatically on network device
destruction, at the point where rtnl_unlock() is called. This allows us
to stop stringing the dp_devs that are being destroyed onto a list so
that we can free them, and thus simplifies the code along all the paths
that call dp_dev_destroy().
This commit gets rid of a call to synchronize_rcu() (disguised as a call
to synchronize_net(), which is a macro that expands to synchronize_rcu()),
so it probably speeds up deleting ports, too.