mirror of https://github.com/openvswitch/ovs synced 2025-10-17 14:28:02 +00:00
Commit Graph

116 Commits

Author SHA1 Message Date
Pravin B Shelar
95b1d73a4a datapath: Increase maximum number of datapath ports.
Use a hash table to store the ports of a datapath. Allow 64K ports per switch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #2462
2012-02-16 17:12:36 -08:00
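The idea in 95b1d73a4a (ports kept in a hash table instead of a fixed array) can be sketched in userspace C. This is only an illustration: `struct port`, `struct port_table`, the function names, and the bucket count of 1024 are assumptions, not the datapath's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_HASH_BUCKETS 1024u  /* assumed; must be a power of two */
#define MAX_PORTS 65536u         /* "64K ports per switch" */

struct port {
    uint32_t port_no;
    struct port *next;           /* next port in the same bucket */
};

struct port_table {
    struct port *buckets[PORT_HASH_BUCKETS];
};

/* Map a port number to its bucket by masking the low bits. */
static struct port **port_bucket(struct port_table *t, uint32_t port_no)
{
    return &t->buckets[port_no & (PORT_HASH_BUCKETS - 1)];
}

/* Return the port with this number, or NULL if none is registered. */
struct port *port_table_lookup(struct port_table *t, uint32_t port_no)
{
    for (struct port *p = *port_bucket(t, port_no); p; p = p->next)
        if (p->port_no == port_no)
            return p;
    return NULL;
}

/* Return 0 on success, -1 if the number is out of range or taken. */
int port_table_insert(struct port_table *t, struct port *p)
{
    struct port **head;

    assert(t && p);
    if (p->port_no >= MAX_PORTS || port_table_lookup(t, p->port_no))
        return -1;
    head = port_bucket(t, p->port_no);
    p->next = *head;
    *head = p;
    return 0;
}
```

Memory use then scales with the number of ports actually present rather than with the 64K maximum.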
Pravin B Shelar
2a4999f3f3 datapath: Add support for namespace.
This patch adds support for Linux network namespaces. We can now
have an independent OVS instance in each net-ns.
Namespace support requires a 2.6.32 or newer kernel, as the per-net-ns
genl-sock is not available in earlier kernels.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7821
2012-01-30 06:56:54 -08:00
Dan Carpenter
9697e4337f datapath: small potential memory leak in ovs_vport_alloc()
We're unlikely to hit this leak, but the static checkers complain if we
don't take care of it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-12-06 11:08:25 -08:00
Jesse Gross
850b6b3b9f datapath: Scope global symbols with ovs_ prefix.
OVS has quite a few global symbols that should be scoped with a
prefix to prevent collisions with other modules in the kernel.

Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-22 11:13:35 -08:00
Jesse Gross
821cb9fac7 datapath: Use u64_stats_sync for datapath and vport stats.
We currently use a seqcount to prevent reading partial 64-bit stats
on 32-bit CPUs.  u64_stats_sync uses the same logic but elides it on
64-bit and uniprocessor machines.  This improves performance (primarily
on non-x86 architectures) at the cost of not guaranteeing that packet
and byte counts were necessarily read together.

Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-21 10:25:19 -08:00
Jesse Gross
a9a29d22d8 datapath: Reformat copyright messages.
Many of our kernel copyright messages make reference to code being
copied from the Linux kernel, which is a bit odd for code in the
kernel.  This changes them to use the standard GNU GPL boilerplate
instead.  It does not change the actual license, which continues to
be GPLv2.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-16 13:55:49 -08:00
Pravin B Shelar
58d01ad97d datapath: Fix vport tx_packets count.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-08 11:16:24 -08:00
Jesse Gross
16b82e84fa datapath: Slim down the vport interface.
Many of the functions in vport.c are simply pass-throughs to their
underlying vport implementation and, of these, many are used only
for bridge compatibility code.  This allows users of these functions
to directly call through the ops structure, reducing boilerplate code
and keeping more of the compatibility code together.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-07 18:24:35 -08:00
Pravin B Shelar
6455100f38 datapath: Fix coding style issues.
Most of the issues were reported by checkpatch.pl.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7771
2011-11-07 15:53:01 -08:00
Pravin B Shelar
e9141eec24 datapath: Remove RT kernel support.
This patch removes RT kernel support, which allows us to clean up
the loop detection.
Along with this, BH is now disabled while running execute_actions()
for packets from userspace.
As a result we can simplify the stats code, as the entire send and receive
path runs in BH context on all supported platforms.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7621
2011-10-06 21:52:39 -07:00
Pravin Shelar
6ff686f2bc sFlow: Genericize/simplify kernel sFlow implementation
This patch adds a sampling action which takes a probability and a set
of actions as arguments. When the probability is hit, the actions are
executed for the given packet.
The USERSPACE action's userdata (u64) is used to store struct
user_action_cookie as a cookie. The CONTROLLER action is fixed accordingly.

Now we can remove the sFlow code from the kernel and implement sFlow
generically as a SAMPLE action. sFlow is defined as a SAMPLE action with a
probability (the sFlow sampling rate) and a USERSPACE action as argument.
The USERSPACE action's data is used as a cookie. sFlow uses this cookie to
store the output-port, the number of output ports, and the vlan-id. The
sample-pool is calculated using vport stats.

Signed-off-by: Pravin Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-09-28 10:43:07 -07:00
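The SAMPLE action from 6ff686f2bc (a probability plus nested actions) can be sketched as follows. Encoding the probability as a fraction of `UINT32_MAX` is an assumption here, as are all type and function names; the random value is passed in so the sketch stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct packet {
    unsigned hits;               /* how many actions touched this packet */
};

typedef void (*action_fn)(struct packet *);

/* Example nested action: just record that it ran. */
void action_count(struct packet *pkt)
{
    pkt->hits++;
}

/* probability is a fraction of UINT32_MAX (assumed encoding);
 * random must be uniformly distributed over the 32-bit range. */
bool should_sample(uint32_t probability, uint32_t random)
{
    return probability == UINT32_MAX || random < probability;
}

/* Run the nested actions only when the probability is hit.
 * Returns the number of actions executed. */
size_t sample_action(struct packet *pkt, uint32_t probability,
                     uint32_t random, const action_fn *actions, size_t n)
{
    assert(pkt && (actions || n == 0));
    if (!should_sample(probability, random))
        return 0;
    for (size_t i = 0; i < n; i++)
        actions[i](pkt);
    return n;
}
```

In this model sFlow would be a `sample_action` whose only nested action hands the packet and its cookie to userspace.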
Jesse Gross
b063d9f06e datapath: Use unicast Netlink sockets for upcalls.
Currently we publish several multicast groups for upcalls and let
userspace sockets subscribe to them.  The benefit of this is mostly
that userspace is the one doing the subscription - the actual
multicast capability is not currently used and probably wouldn't be
even if we moved to a multiprocess model.  Despite the convenience,
multicast sockets have a number of disadvantages, primarily that
we only have a limited number of them so there could be collisions.
In addition, unicast sockets give additional flexibility to userspace
by allowing every object to potentially have a different socket
chosen by userspace for upcalls.  Finally, any future optimizations
for upcalls to reduce copying will likely not be compatible with
multicast anyways so disallowing it potentially simplifies things.

We also never unregistered the multicast groups registered for upcalls
and leaked them on module unload.  As a side effect, this solves that
problem.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-09-23 15:27:48 -07:00
Pravin Shelar
f613a0d72c datapath: Always use generic stats for devices (vports)
Currently OVS uses device stats for Linux devices and counts them
itself in other situations. This leads to overlap with hardware stats,
inconsistencies, etc. It's much better to just always count the packets
flowing through the switch and let userspace do any merging that it wants.

This patch removes the vport->get_stats() interface. vport-stat is changed
to use the new `struct ovs_vport_stat` rather than rtnl_link_stats64.
The definition of rtnl_link_stats64 is removed from OVS.  dpif_port->stat is
also removed, as aggregate stats are only available at the netdev layer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-15 19:36:17 -07:00
Justin Pettit
9197df76b4 Set MTU in userspace rather than kernel.
Currently the kernel automatically sets the MTU of any internal
interfaces to the minimum of all attached interfaces because the Linux
bridge does this.  Userspace can do this with more knowledge and
flexibility.

Feature #7323

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-15 16:27:15 -07:00
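The rule that 9197df76b4 moves to userspace (internal interface MTU = minimum MTU of the attached interfaces) is simple enough to show directly. The function name and the 1500-byte fallback for an empty list are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_MTU 1500   /* assumed fallback when nothing is attached */

/* Return the smallest MTU among the attached interfaces. */
int min_attached_mtu(const int *mtus, size_t n)
{
    int min;

    assert(mtus || n == 0);
    if (n == 0)
        return DEFAULT_MTU;
    min = mtus[0];
    for (size_t i = 1; i < n; i++)
        if (mtus[i] < min)
            min = mtus[i];
    return min;
}
```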
Pravin Shelar
3544358aa5 datapath: Improve kernel hash table
Currently OVS uses its own hashing implementation for hash tables,
which has some problems, e.g. the error case in the deletion code.
This patch replaces it with an hlist-based hash table, which is
consistent with other kernel hash tables. As Jesse suggested, flex-array
is used for allocating hash buckets, so that we can have a large
hash table without large contiguous kernel memory.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-09 19:09:47 -07:00
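The point of using flex-array in 3544358aa5 is that a large bucket array is built from several small allocations instead of one contiguous block. A rough userspace analogue, with hypothetical names and an assumed chunk size of 256 buckets (this is not the kernel flex_array API), might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKETS_PER_PART 256u    /* assumed size of each small allocation */

struct bucket {
    void *first;                 /* head of this bucket's chain */
};

struct bucket_array {
    size_t n_buckets;
    size_t n_parts;
    struct bucket **parts;       /* each part is allocated separately */
};

void bucket_array_free(struct bucket_array *ba)
{
    if (!ba)
        return;
    if (ba->parts)
        for (size_t i = 0; i < ba->n_parts; i++)
            free(ba->parts[i]);
    free(ba->parts);
    free(ba);
}

/* Allocate n_buckets zeroed buckets in parts; NULL on failure. */
struct bucket_array *bucket_array_alloc(size_t n_buckets)
{
    struct bucket_array *ba;

    if (n_buckets == 0 || !(ba = malloc(sizeof *ba)))
        return NULL;
    ba->n_buckets = n_buckets;
    ba->n_parts = (n_buckets + BUCKETS_PER_PART - 1) / BUCKETS_PER_PART;
    ba->parts = calloc(ba->n_parts, sizeof *ba->parts);
    if (!ba->parts) {
        free(ba);
        return NULL;
    }
    for (size_t i = 0; i < ba->n_parts; i++) {
        ba->parts[i] = calloc(BUCKETS_PER_PART, sizeof **ba->parts);
        if (!ba->parts[i]) {
            bucket_array_free(ba);
            return NULL;
        }
    }
    return ba;
}

/* Return bucket number 'index', or NULL if it is out of range. */
struct bucket *bucket_at(struct bucket_array *ba, size_t index)
{
    assert(ba);
    if (index >= ba->n_buckets)
        return NULL;
    return &ba->parts[index / BUCKETS_PER_PART][index % BUCKETS_PER_PART];
}
```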
Pravin Shelar
ff8d7a5e81 Strip down vport interface : iflink
Remove iflink from the vport interface. iflink is not used anywhere in
OVS, so there is no need to have iflink as a vport attribute.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-08 15:18:42 -07:00
Justin Pettit
24b019f808 datapath: Disable LRO from userspace instead of the kernel.
Whenever a port is added to the datapath, LRO is automatically disabled.
In the future, we may want to enable LRO in some circumstances, so have
userspace disable LRO through the ethtool ioctls.

As part of this change, the MTU and LRO checks are moved to
netdev-vport's send(), which is where they're actually needed.

Feature #6810

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-08-28 21:30:58 -07:00
Justin Pettit
df2c07f433 datapath: Use "OVS_*" as opposed to "ODP_*" for user<->kernel interactions.
The prefix "ODP_*" is not overly descriptive in the context of the
larger Linux tree.  This commit changes the prefix to "OVS_*" for the
userspace to kernel interactions.  The userspace libraries still use
"ODP_" in many of their interfaces since it is more descriptive in the
OVS oeuvre.

Feature #6904

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-08-19 22:48:23 -07:00
Justin Pettit
91dbd46c69 datapath: Correct comment for vport_add().
The comment describing vport_add() incorrectly stated that the function
added the vport to the datapath.  It is the responsibility of the caller
to do that.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2011-08-18 17:18:09 -07:00
Ben Pfaff
f915f1a8ca datapath: Consider tunnels to have no MTU, fixing jumbo frame support.
Until now, tunnel vports have had a specific MTU, in the same way that
ordinary network devices have an MTU, but treating them this way does not
always make sense.  For example, consider a datapath that has three ports:
the local port, a GRE tunnel to another host, and a physical port.  If
the physical port is configured with a jumbo MTU, it should be possible to
send jumbo packets across the tunnel: the tunnel can do fragmentation or
the physical port traversed by the tunnel might have a jumbo MTU.

However, until now, tunnels always had a 1500-byte MTU by default.  It
could be adjusted using ODP_VPORT_MTU_SET, but nothing actually did this.
One alternative would be to make ovs-vswitchd able to set the vport's MTU.
This commit, however, takes a different approach, of dropping the concept
of MTU entirely for tunnel vports.  This also solves the problem described
above, without making any additional work for anyone.

I tested that, without this change, I could not send 1600-byte "pings"
between two machines whose NICs had 2000-byte MTUs that were connected to
vswitches that were in turn connected over GRE tunnels with the default
1500-byte MTU.  With this change, it worked OK, regardless of the MTU of
the network traversed by the GRE tunnel.

This patch also makes "patch" ports MTU-less.

It might make sense to remove vport_set_mtu() and the associated callback
now, since ordinary network devices are the only vports that support it
now.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Suggested-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #3728.
2011-02-04 09:46:26 -08:00
Ben Pfaff
f0fef76062 datapath: Convert ODP_VPORT_* to use AF_NETLINK socket layer.
This commit calls genl_lock(), which wasn't exported before Linux 2.6.35,
and thus doesn't support earlier kernels.  That problem will
be fixed once the whole userspace interface transitions to Generic
Netlink a few commits from now.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 15:34:27 -08:00
Ben Pfaff
ed099e921e datapath: Adopt Generic Netlink-compatible locking.
The kernel Generic Netlink layer always holds a mutex (genl_lock) when it
invokes callbacks, so that means that there is no point in having
per-datapath mutexes or a separate vport lock.  This commit removes them.

This commit breaks support for Linux before 2.6.35 because it calls
genl_lock(), which wasn't exported before that version.  That problem will
be fixed once the whole userspace interface transitions to Generic
Netlink a few commits from now.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:42 -08:00
Ben Pfaff
057dd6d279 datapath: Eliminate vport_mutex by protecting vport table with RCU.
The vport_mutex really only protects the vport dev_table, which isn't very
much.  By getting rid of it we take one step toward simplifying the vswitch
locking, which will necessarily have to be based mainly around the Generic
Netlink genl_mutex once we switch to Generic Netlink.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:41 -08:00
Ben Pfaff
c19e653509 datapath: Change userspace vport interface to use Netlink attributes.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other.  To do this in full generality, it must be possible to add new
features to the kernel vport layer without changing userspace software.
The customary way to do this in the Linux networking stack is to use
Netlink and in particular Netlink attributes.  This commit adopts that
model for the vport layer.  It does not yet actually start using the
Netlink socket layer, which will come later.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:37 -08:00
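The decoupling argument in c19e653509 rests on attributes being self-describing type-length-value records, so a reader can skip types it does not know. The sketch below imitates that layout in plain C. The header struct, the 4-byte alignment macro, and the function names are stand-ins defined here for illustration, not the kernel's Netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct attr_hdr {
    uint16_t len;                /* header plus payload, without padding */
    uint16_t type;
};

#define ATTR_HDRLEN   sizeof(struct attr_hdr)
#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Append one attribute at 'off'; return the new offset, or 0 if no room. */
size_t attr_put(uint8_t *buf, size_t size, size_t off,
                uint16_t type, const void *data, uint16_t data_len)
{
    struct attr_hdr h;
    size_t total;

    assert(buf && (data || data_len == 0));
    if (data_len > UINT16_MAX - ATTR_HDRLEN)
        return 0;
    total = ATTR_ALIGN(ATTR_HDRLEN + data_len);
    if (off > size || total > size - off)
        return 0;
    h.len = (uint16_t)(ATTR_HDRLEN + data_len);
    h.type = type;
    memset(buf + off, 0, total);             /* zero the padding too */
    memcpy(buf + off, &h, sizeof h);
    if (data_len)
        memcpy(buf + off + ATTR_HDRLEN, data, data_len);
    return off + total;
}

/* Find the first attribute of 'type', skipping all others.
 * Returns its payload and stores the payload length, or NULL. */
const void *attr_find(const uint8_t *buf, size_t len,
                      uint16_t type, size_t *payload_len)
{
    size_t off = 0;

    while (off <= len && len - off >= ATTR_HDRLEN) {
        struct attr_hdr h;

        memcpy(&h, buf + off, sizeof h);
        if (h.len < ATTR_HDRLEN || h.len > len - off)
            break;                           /* malformed attribute */
        if (h.type == type) {
            *payload_len = h.len - ATTR_HDRLEN;
            return buf + off + ATTR_HDRLEN;
        }
        off += ATTR_ALIGN(h.len);            /* unknown type: skip it */
    }
    return NULL;
}
```

A reader built this way keeps working when a newer sender adds attribute types it has never seen.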
Ben Pfaff
c283069c71 datapath: Change vport type from string to integer enumeration.
I plan to make the vport type part of the standard header stuck on each
Netlink message related to a vport.  As such, it is more convenient to use
an integer than a string.  In addition, by being fundamentally different
from strings, using an integer may reduce the confusion we've had in the
past over the differences in userspace and kernel names for network device
and vport types.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:37 -08:00
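As a small illustration of c283069c71, an integer enumeration fits in a fixed-size message header, while names stay available through a lookup table. The enumerators and strings below are hypothetical and chosen for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum vport_type {
    VPORT_TYPE_UNSPEC = 0,       /* also used for "unknown name" */
    VPORT_TYPE_NETDEV,
    VPORT_TYPE_INTERNAL,
    VPORT_TYPE_PATCH,
    VPORT_TYPE_GRE,
    VPORT_TYPE_COUNT
};

static const char *const vport_type_names[VPORT_TYPE_COUNT] = {
    [VPORT_TYPE_NETDEV]   = "netdev",
    [VPORT_TYPE_INTERNAL] = "internal",
    [VPORT_TYPE_PATCH]    = "patch",
    [VPORT_TYPE_GRE]      = "gre",
};

/* Return the name for a type, or NULL for an invalid or unspecified type. */
const char *vport_type_name(enum vport_type type)
{
    int t = (int)type;

    if (t <= VPORT_TYPE_UNSPEC || t >= VPORT_TYPE_COUNT)
        return NULL;
    return vport_type_names[t];
}

/* Return the type for a name, or VPORT_TYPE_UNSPEC if it is unknown. */
enum vport_type vport_type_from_name(const char *name)
{
    assert(name);
    for (int t = VPORT_TYPE_UNSPEC + 1; t < VPORT_TYPE_COUNT; t++)
        if (strcmp(vport_type_names[t], name) == 0)
            return (enum vport_type)t;
    return VPORT_TYPE_UNSPEC;
}
```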
Ben Pfaff
3517d8bf80 datapath: Remove unused ->set_stats() function from vport_ops.
No vport implements this function and, as far as I can tell, no vport has
ever implemented it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-22 20:07:20 -08:00
Ben Pfaff
a80a9fbd06 datapath: Remove vport_del_all() because it is now a no-op.
vport_del_all() was created when vports could exist without being attached
to any datapath.  Now, a vport is always attached to a datapath.  This
function was only called on module unload, but the module can't be unloaded
if any datapath exists, so it won't ever have any work to do, and we might
as well delete it entirely.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Suggested-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-20 16:34:15 -08:00
Ben Pfaff
bc22fc3e66 datapath: Clean up code in vport_get_stats().
This should not change behavior.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-05 20:47:03 -08:00
Ben Pfaff
265d5020ad datapath: Fix vport_get_stats() in !VPORT_F_GEN_STATS case.
When VPORT_F_GEN_STATS was not set, vport_get_stats() would always return
an error (either an error returned by ->get_stats(), otherwise
-EOPNOTSUPP).  This fixes the problem.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-05 20:46:42 -08:00
Jesse Gross
0971d127d0 datapath: Fix double counting of packet stats for Linux devices.
The kernel augments stats for Linux devices that only provide 32-bit stats
with its own internal 64-bit counters.  When doing this it takes the error
stats from the device but uses the packet and byte values from its local
counters.  However, we were also taking the packet and byte counts from
the device, leading to double counting.

Problem introduced by commit ec61a01cd8
'datapath: Use "struct rtnl_link_stats64" instead of "struct odp_vport_stats".'.

Bug #4327

Reported-by: Krishna Miriyala <krishna@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-01-05 14:17:09 -08:00
Jesse Gross
1cf7d1b80f datapath: Report ifindex of 0 if vport doesn't have one.
If a vport is a virtual device then it doesn't have a system ifindex.
We currently return the ifindex of the bridge device in this situation
but that's somewhat misleading, so this replaces it with 0.  Nothing
actually reads the ifindex for devices other than the bridge device,
so this doesn't have a functional change.

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-30 09:35:48 -08:00
Jesse Gross
6b02c20141 datapath: Use RCU dereference in vport_get_ifindex().
If we don't have an ifindex for a device (because it is a virtual
port), we fall back to using the ifindex of the local port.
However, we weren't properly dereferencing the vport from the ports
array, so this adds that.  This isn't a real problem though, because
the local port always exists and never changes as long as the
datapath exists.

Found with sparse.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-29 10:42:10 -08:00
Justin Pettit
dd851cbbcc datapath: Return vport configuration when queried.
Additional configuration is passed down to the kernel in the "config"
array of an odp_port when a vport is created.  This information is not
returned when a vport is queried, though.  This information is useful
for debugging, since it may be used to distinguish ports based on
additional data, such as the peer in tunnels.  In a forthcoming patch, it
will be essential to distinguish between plain GRE and GRE over IPsec.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-28 14:30:35 -08:00
Ben Pfaff
7237e4f4b6 datapath: Merge vport "attach" into "create" and "detach" into "destroy".
These steps are sequentially in lockstep, so we might as well combine them.

This expands the region over which the vport_lock is held.  I didn't
carefully verify that this was OK.

This also eliminates the synchronize_rcu() call from destruction of tunnel
vports, since they didn't appear to me to need it.

It should be possible to eliminate the synchronize_rcu() from the netdev,
patch, and internal_dev vports, but this commit does not do that.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-03 15:44:51 -08:00
Ben Pfaff
e779d8d90d datapath: Merge "struct dp_port" into "struct vport".
After the previous commit, which changed the datapath to always create and
attach a vport at the same time, and to always detach and delete a vport
at the same time, there is no longer any real distinction between a dp_port
and a vport.  This commit, therefore, merges the two together to simplify
code.  It might even improve performance, although I have not checked.

I wasn't sure at first whether the merged structure should be "struct
dp_port" or "struct vport".  I went with the latter since the "v" prefix
sounds cool.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-03 14:43:38 -08:00
Ben Pfaff
c3827f619a datapath: Make adding and attaching a vport a single step.
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath.  Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath.  Similarly, a vport
could be detached and deleted separately.

After some study, I think I understand why this distinction exists.  It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them.  However, changing it to create them before it tries to
open them is not difficult, so this commit does this.

The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-03 14:41:38 -08:00
Ben Pfaff
94903c9898 datapath: Encapsulate parameters for new vports in new struct vport_parms.
Upcoming commits will keep needing to pass more information to the vport
'create' member function.  It's annoying to have to modify a dozen pieces
of code every time just to do this, so this commit encapsulates all of the
parameters in a new struct and passes that instead.

Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-03 11:33:41 -08:00
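The refactoring in 94903c9898 follows a common C pattern: bundle the arguments into one struct so that adding a field later does not change every callback signature. The field names and the `example_create` callback below are assumptions for illustration; the commit message does not list the actual members of struct vport_parms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vport_parms {
    const char *name;
    int type;
    unsigned port_no;
    const void *options;         /* type-specific configuration, may be NULL */
};

struct vport {
    char name[16];
    int type;
    unsigned port_no;
};

/* A 'create' callback takes one struct instead of a growing argument list.
 * Returns 0 on success, -1 if the name does not fit. */
int example_create(struct vport *vport, const struct vport_parms *parms)
{
    size_t len;

    assert(vport && parms && parms->name);
    len = strlen(parms->name);
    if (len >= sizeof vport->name)
        return -1;
    memcpy(vport->name, parms->name, len + 1);
    vport->type = parms->type;
    vport->port_no = parms->port_no;
    return 0;
}
```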
Jesse Gross
b279fccf5b datapath: Constify ops structures.
vport_ops, tunnel_ops, and ethtool_ops should not change at runtime.
Therefore, mark them as const to keep them out of the hotpath and to
prevent them from getting trampled.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-02 17:10:16 -08:00
Jesse Gross
6d1d631e10 vport: Remove unused error types.
We currently track rx_over_errors, rx_crc_errors, rx_frame_errors,
and collisions but never increment these counters.  It seems likely
that we will never use them since they are primarily hardware errors
and we pull hardware stats directly from the NIC.  This removes those
counters, saving 32 bytes per port.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-02 17:10:15 -08:00
Ben Pfaff
ec61a01cd8 datapath: Use "struct rtnl_link_stats64" instead of "struct odp_vport_stats".
Linux 2.6.35 added struct rtnl_link_stats64, which as a set of 64-bit
network device counters is what the OVS datapath needs.  We might as well
use it instead of our own.

This commit moves the if_link.h compat header from datapath/ into the
top-level include/ directory so that it is visible both to kernel and
userspace code.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-11-09 13:48:57 -08:00
Ben Pfaff
f493a3fc14 datapath: Use struct assignment in place of memcpy() for copying stats.
We might as well take advantage of type safety when we can get it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-11-09 12:45:24 -08:00
Jesse Gross
3976f6d57b datapath: Enable usage of cached flows.
An upcoming commit will add support for supplying cached flows for
packets entering the datapath.  This adds the code in the datapath
itself to recognize these cached flows and use them instead of
extracting the flow fields and doing a lookup.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
2010-09-22 13:43:01 -07:00
Joe Perches
dfffaef1eb treewide: Use pr_fmt and pr_<level>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Simon Horman <horms@verge.net.au>
[Jesse: Added missing pr_fmt in vport-gre.c and dp_sysfs_dp.c]
Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-30 13:23:08 -07:00
Jesse Gross
e90b1cf9ce datapath: Add support for CAPWAP UDP transport.
Add support for the transport portion of the CAPWAP protocol as
an alternative to GRE for L2 over L3 tunneling.  This is not
full support for the CAPWAP protocol.  CAPWAP covers management
of wireless access points and describes a control protocol for
setting those devices up.  It also describes a data plane protocol
that allows packets to be tunneled to a controller for inspection.
This data plane protocol is the only component covered by this
commit.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-24 16:58:00 -04:00
Jesse Gross
38c6ecbc8d datapath: Avoid reporting half updated statistics.
We enforce mutual exclusion when updating statistics by disabling
bottom halves and only writing to per-CPU state.  However, reading
requires looking at the statistics for foreign CPUs, which could be
in the process of updating them since there isn't a lock.  This means
we could get garbage values for 64-bit values on 32-bit machines or
byte counts that don't correspond to packet counts, etc.

This commit introduces a sequence lock for statistics values to avoid
this problem.  Getting a write lock is very cheap - it only requires
incrementing a counter plus a memory barrier (which is compiled away
on x86) to acquire or release the lock and will never block.  On
read we spin until the sequence number hasn't changed in the middle
of the operation, indicating that we have a consistent set of
values.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-20 19:44:46 -07:00
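The sequence lock described in 38c6ecbc8d can be approximated in userspace with C11 atomics. This is a sketch under assumptions: a single writer per counter set, hypothetical names, and counters declared atomic with relaxed ordering so the example is free of data races under the C11 memory model (the kernel code uses plain fields and its own barriers).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct port_stats {
    atomic_uint seq;             /* odd while an update is in progress */
    _Atomic uint64_t packets;
    _Atomic uint64_t bytes;
};

/* Writer: never blocks; bumps the sequence before and after the update. */
void stats_update(struct port_stats *s, uint64_t len)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->packets,
        atomic_load_explicit(&s->packets, memory_order_relaxed) + 1,
        memory_order_relaxed);
    atomic_store_explicit(&s->bytes,
        atomic_load_explicit(&s->bytes, memory_order_relaxed) + len,
        memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader: retry until the sequence is even and unchanged across the read. */
void stats_read(struct port_stats *s, uint64_t *packets, uint64_t *bytes)
{
    unsigned before, after;

    assert(s && packets && bytes);
    do {
        before = atomic_load_explicit(&s->seq, memory_order_acquire);
        *packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
        *bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((before & 1u) || before != after);
}
```

The reader pays for consistency by retrying, which keeps the packet-processing path (the writer) cheap.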
Ben Pfaff
55574bb0d2 datapath: Detect and suppress flows that are implicated in loops.
In-kernel loops need to be suppressed; otherwise, they cause high CPU
consumption, even to the point that the machine becomes unusable.  Ideally
these flows should never be added to the Open vSwitch flow table, but it
is fairly easy for a buggy controller to create them given the menagerie
of tunnels, patches, etc. that OVS makes available.

Commit ecbb6953b "datapath: Add loop checking" did the initial work
toward suppressing loops, by dropping packets that recursed more than 5
times.  This at least prevented the kernel stack from overflowing and
thereby OOPSing the machine.  But even with this commit, it is still
possible to waste a lot of CPU time due to loops.  The problem is not
limited to 5 recursive calls per packet: any packet can be sent to
multiple destinations, which in turn can themselves be sent to multiple
destinations, and so on.  We have actually seen in practice a case where
each packet was, apparently, sent to at least 2 destinations per hop, so
that each packet actually consumed CPU time for 2**5 == 32 packets,
possibly more.

This commit takes loop suppression a step further, by clearing the actions
of flows that are implicated in loops.  Thus, after the first packet in
such a flow, later packets for either the "root" flow or for flows that
it ends up looping through are simply discarded, saving a huge amount of
CPU time.

This version of the commit just clears the actions from the flows that are
part of the loop.  Probably, there should be some additional action to tell
ovs-vswitchd that a loop has been detected, so that it can in turn inform
the controller one way or another.

My test case was this:

ovs-controller -H --max-idle=permanent punix:/tmp/controller
ovs-vsctl -- \
    set-controller br0 unix:/tmp/controller -- \
    add-port br0 patch00 -- \
    add-port br0 patch01 -- \
    add-port br0 patch10 -- \
    add-port br0 patch11 -- \
    add-port br0 patch20 -- \
    add-port br0 patch21 -- \
    add-port br0 patch30 -- \
    add-port br0 patch31 -- \
    set Interface patch00 type=patch options:peer=patch01 -- \
    set Interface patch01 type=patch options:peer=patch00 -- \
    set Interface patch10 type=patch options:peer=patch11 -- \
    set Interface patch11 type=patch options:peer=patch10 -- \
    set Interface patch20 type=patch options:peer=patch21 -- \
    set Interface patch21 type=patch options:peer=patch20 -- \
    set Interface patch30 type=patch options:peer=patch31 -- \
    set Interface patch31 type=patch options:peer=patch30

followed by sending a single "ping" packet from an attached Ethernet
port into the bridge.  After this, without this commit the vswitch
userspace and kernel consume 50-75% of the machine's CPU (in my KVM
test setup on a single physical host); with this commit, some CPU is
consumed initially but it converges on 0% quickly.

A more challenging test sends a series of packets in multiple flows;
I used "hping3" with its default options.  Without this commit, the
vswitch consumes 100% of the machine's CPU, most of which is in the
kernel.  With this commit, the vswitch consumes "only" 33-50% CPU,
most of which is in userspace, so the machine is more responsive.

A refinement on this commit would be to pass the loop counter down to
userspace as part of the odp_msg struct and then back up as part of
the ODP_EXECUTE command arguments.  This would, presumably, reduce
the CPU requirements, since it would allow loop detection to happen
earlier, during initial setup of flows, instead of just on the second
and subsequent packets of flows.
2010-08-03 14:40:29 -07:00
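The suppression scheme in 55574bb0d2 (a recursion limit of 5, then clearing the actions of every flow on the looping path) can be modelled in a few lines. This is a simplified single-threaded sketch: one global counter stands in for per-CPU state, and `struct flow`, `process_packet`, and the `work` counter are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOOPS 5              /* recursion limit from the commit message */

struct flow {
    struct flow *out[2];         /* flows reached by this flow's outputs */
    size_t n_actions;            /* 0 means "drop" */
};

/* Per-CPU in a kernel; a single global is enough for this sketch. */
static struct {
    int count;
    bool looping;
} loop_state;

/* Process one packet through 'flow'; *work counts processing steps. */
void process_packet(struct flow *flow, unsigned long *work)
{
    assert(flow && work);
    ++*work;
    if (++loop_state.count > MAX_LOOPS)
        loop_state.looping = true;

    for (size_t i = 0; i < flow->n_actions && !loop_state.looping; i++)
        process_packet(flow->out[i], work);

    /* While unwinding a detected loop, clear each implicated flow. */
    if (loop_state.looping)
        flow->n_actions = 0;
    if (--loop_state.count == 0)
        loop_state.looping = false;
}
```

With a flow that outputs to itself, the first packet costs six steps in this model and clears the flow's actions; every later packet costs one step and is dropped.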
Jesse Gross
cc98976af5 vport: Make dp_port->vport always valid.
When we detached a vport we would assign NULL to dp_port->vport
before calling synchronize_rcu().  However, since vports have a
longer lifetime than dp_ports there were no checks before
dereferencing dp_port->vport.  This changes the behavior to
match the assumption by not assigning NULL during detach.  This
avoids a potential NULL pointer dereference in do_output() among
other places.
2010-07-30 13:42:26 -07:00
Jesse Gross
f9764f6e91 vport: Use DEFINE_PER_CPU instead of dynamically allocating loop counter.
DEFINE_PER_CPU is simpler and faster than alloc_percpu() so use it
for the loop counter, which is already statically defined.
2010-07-15 15:09:08 -07:00
Jesse Gross
fceb2a5bb2 datapath: Put return type on same line as arguments for functions.
In some places we would put the return type on the same line as
the rest of the function definition and other places we wouldn't.
Reformat everything to match kernel style.
2010-07-15 15:09:08 -07:00
Jesse Gross
10d3515aa4 vport: Use EBUSY to represent already attached device.
We currently use EEXIST to represent both a device that is already
attached and for GRE devices that are the same as another one.
Instead use EBUSY for already attached devices to disambiguate the
two situations.
2010-07-15 15:09:08 -07:00
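The distinction drawn in 10d3515aa4 lets a caller tell the two failures apart from the return value alone. A minimal sketch, assuming hypothetical helpers and a made-up tunnel "key" as the property that makes two GRE devices "the same":

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
    bool attached;
};

/* Attaching a device that is already attached is reported as -EBUSY. */
int device_attach(struct device *dev)
{
    assert(dev);
    if (dev->attached)
        return -EBUSY;
    dev->attached = true;
    return 0;
}

/* Creating a tunnel that duplicates an existing one is reported as -EEXIST. */
int tunnel_check_unique(const unsigned *keys, size_t n, unsigned key)
{
    assert(keys || n == 0);
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key)
            return -EEXIST;
    return 0;
}
```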