2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

124 Commits

Author SHA1 Message Date
Ben Pfaff
c2b565b54e dpif: Factor 'type' and 'error' out of individual dpif_op members.
I'd like to change ->dpif_flow_put() and ->dpif_execute() in the dpif
provider to take the structures of the same names as parameters, instead of
passing them discrete parameters, because this seems like a more sensible
way to do things internally than to have two different ways to pass the
parameters.  It might even simplify code slightly.  But ->flow_put() and
->execute() wouldn't want the 'type' (because it's implied by the function
being called) or 'error' (because it would be the same as the return
value).  Although of course they could just ignore those members, it seems
slightly cleaner to omit them entirely, as this change allows.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-16 13:35:21 -08:00
Ben Pfaff
a12b3eadc6 dpif: Simplify the "listen mask" concept.
At one point in the past, there were three separate queues between the
kernel module and OVS userspace, each of which corresponded to a Netlink
socket (or, before that, to a character device).  It made sense to allow
each of these to be enabled or disabled separately, hence the "listen mask"
concept in the dpif layer.

These days, the concept is much less clear-cut.  Queuing is no longer on
the basis of different classes of packets but instead striped across a
collection of sockets based on input port.  It doesn't really make sense
to enable receiving packets on the basis of the kind of packet anymore.
Accordingly, this commit simplifies the "listen_mask" to just a bool that
either enables or disables receiving packets.

It could be useful to enable or disable receiving packets on a per-vport
basis, but the rest of the code isn't ready to make use of that so this
commit doesn't generalize this much.

Based on this discussion on ovs-dev:
http://openvswitch.org/pipermail/dev/2011-October/012044.html

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-12 17:09:22 -08:00
Ben Pfaff
733c8d13d7 dpif-linux: Avoid valgrind warning in epoll_ctl() call.
Valgrind points out correctly that there are uninitialized bytes in the
'event' structure.  That's OK, but it doesn't hurt to suppress the warning
by zeroing all of the bytes.

This doesn't fix a real bug.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2011-12-12 14:47:10 -08:00
Ben Pfaff
50f80534f2 dpif-linux: Use "epoll" instead of poll().
epoll appears to be much more efficient than poll() at least for
static file descriptor sets.  I can't otherwise explain why this
patch increases netperf CRR performance by 20% above the previous
commit, which is also about a 19% overall improvement versus
the baseline from before the poll_fd_woke() optimization was
removed.
2011-11-28 09:31:07 -08:00
Ben Pfaff
8522ba0996 dpif-linux: Use poll() internally in dpif_linux_recv().
Using poll() internally in dpif_linux_recv(), instead of relying
on the results of the main loop poll() call, brings netperf CRR
performance back within 1% of par versus the code base before the
poll_fd_woke() optimizations were introduced.  It also increases
the ovs-benchmark results by about 5% versus that baseline, too.

My theory is that this is because the main loop takes long enough
that a significant number of packets can arrive during the main
loop itself, so this reduces the time before OVS gets to those
packets.
2011-11-28 09:29:18 -08:00
Ben Pfaff
f6d1465cec dpif-linux: Remove poll_fd_woke() optimization from dpif_linux_recv().
This optimization on its own provided about 37% benefit against a
load of a single netperf CRR test, but at the same time it penalized
ovs-benchmark by about 11%.  We can get back the CRR performance
loss, and more, other ways, so the first step is to revert this
patch, temporarily accepting the performance loss.
2011-11-28 09:19:08 -08:00
Ben Pfaff
f7df9823a0 netlink: New macro NL_POLICY_FOR. 2011-11-11 14:07:17 -08:00
Pravin B Shelar
abff858b5a datapath: Convert kernel priority actions into match/set.
Following patch adds skb-priority to flow key. So userspace will know
what was priority when packet arrived and we can remove the pop/reset
priority action. It's no longer necessary to have a special action for
pop that is based on the kernel remembering original skb->priority.
Userspace can just emit a set priority action with the original value.

Since the priority field is a match field with just a normal set action,
we can convert it into the new model for actions that are based on
matches.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7715
2011-11-01 10:13:16 -07:00
Jesse Gross
69685a8882 datapath: Define constants for versions of GENL families.
Currently we hard code the versions of our GENL families to 1 but it's
nicer to have symbolic constants.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-10-23 11:24:19 -07:00
Ben Pfaff
7257b535ab Implement new fragment handling policy.
Until now, OVS has handled IP fragments more awkwardly than necessary.  It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present.  This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.

This commit improves the situation significantly.  It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset.  This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers.  In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.

This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG.  Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments".  It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
2011-10-21 15:07:36 -07:00
Ben Pfaff
6bc6002490 dpif: New function dpif_operate() and dpif-linux implementation.
This will be used in an upcoming commit.
2011-10-14 14:08:44 -07:00
Ben Pfaff
30b44744a1 dpif-linux: Only ask datapath to echo back results when they will be used.
A fair number of datapath flow operations optionally report back results
to the requester based on whether NLM_F_ECHO is set in the request.  When
userspace isn't going to use those results anyway, it wastes memory to
store them and a system call to retrieve them.

This commit omits the NLM_F_ECHO bit in cases where the caller isn't going
to use the results.

(NLM_F_ECHO has no effect on operations whose entire purpose is to retrieve
data, e.g. "get" and "dump" operations, so we need not bother to set it
for those.)

This improves "ovs-benchmark rate" results in my testing by about 4%.
2011-10-14 14:08:44 -07:00
Ben Pfaff
09ded0ad48 datapath-protocol: Rename enums for consistency.
Most of the enum tags in this file are lowercased versions of the uppercase
enum prefixes (or slightly less abbreviated versions, e.g. "dp" becomes
"datapath").  This commit fixes up the others for consistency.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-10-12 16:27:09 -07:00
Ben Pfaff
ea36840fa4 datapath: Require explicit upcall_pid for new datapaths and vports.
This increases consistency with the OVS_ACTION_ATTR_USERSPACE action, which
also requires an explicit pid.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-10-12 16:27:08 -07:00
Ben Pfaff
98403001ec datapath: Move Netlink PID for userspace actions from flows to actions.
Commit b063d9f06 "datapath: Use unicast Netlink sockets for upcalls" that
switched from multicast to unicast Netlink for sending upcalls added a
Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions
within the flow as target.

This commit drops this per-flow PID in favor of a per-action PID, because
that is more flexible.  It does not yet make use of this additional
flexibility, so behavior should not change.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
2011-10-12 16:27:00 -07:00
Ben Pfaff
0e70cdcb8d dpif-linux: Use get_32aligned_u64() in an appropriate place. 2011-10-12 16:23:16 -07:00
Ben Pfaff
a24a65747a dpif-linux: Don't reset kernel upcall_pids unintentionally.
Commit b063d9f0 "datapath: Use unicast Netlink sockets for upcalls" that
introduced an 'upcall_pid' member into struct dpif_linux_vport, struct
dpif_linux_dp, and struct dpif_linux_flow neglected to do so only if the
member was nonzero.  This caused every datapath, vport, and flow operation
to supply an upcall_pid.  In particular, the netdev_set_config() called at
startup when a vport already existed caused the upcall_pid for that vport
to be reset to 0, which in turn caused all packets received on the vport to
be dropped instead of forwarded to ovs-vswitchd.

Reported-by: Shih-Hao Li <shli@nicira.com>
Bug #7714.
2011-10-07 19:53:23 -07:00
Pravin B Shelar
33b14e70e2 datapath: Strip down vport interface - ifIndex.
Following patch removes ifIndex attribute of vport which is not
used in userspace.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7114
2011-10-05 19:06:29 -07:00
Ben Pfaff
a8d9304d12 dpif: Avoid use of "struct ovs_dp_stats" in platform-independent modules.
Over time we wish to reduce the number of datapath-protocol.h definitions
used directly outside of Linux-specific code.  This commit removes use of
"struct ovs_dp_stats" from platform-independent code.

Bug #7559.
2011-10-05 11:18:13 -07:00
Pravin Shelar
6ff686f2bc sFlow: Genericize/simplify kernel sFlow implementation
Following patch adds sampling action which takes probability and set
of actions as arguments. When probability is hit, actions are executed for
given packet.
USERSPACE action's userdata (u64) is used to store struct
user_action_cookie as cookie. CONTROLLER action is fixed accordingly.

Now we can remove sFlow code from kernel and implement sFlow generically
as SAMPLE action. sFlow is defined as SAMPLE Action with probability (sFlow
sampling rate) and USERSPACE action as argument. USERSPACE action's data
is used as cookie. sFlow uses this cookie to store output-port, number of
output ports and vlan-id. sample-pool is calculated by using vport
stats.

Signed-off-by: Pravin Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-09-28 10:43:07 -07:00
Jesse Gross
17411ecf2b dpif-linux: Prevent a single port from monopolizing upcalls.
Currently it is possible for a client on a single port to generate
a huge number of packets that miss in the kernel flow table and
monopolize the userspace/kernel communication path.  This
effectively DoS's the machine because no new flow setups can take
place.  This adds some additional fairness by separating each upcall
type for each object in the datapath onto a separate socket, each
with its own queue.  Userspace then reads round-robin from each
socket so other flow setups can still succeed.

Since the number of objects can potentially be large, we don't always
have a unique socket for each.  Instead, we create 16 sockets and
spread the load around them in a round robin fashion.  It's theoretically
possible to do better than this with some kind of active load balancing
scheme but this seems like a good place to start.

Feature #6485
2011-09-23 15:27:49 -07:00
Jesse Gross
b063d9f06e datapath: Use unicast Netlink sockets for upcalls.
Currently we publish several multicast groups for upcalls and let
userspace sockets subscribe to them.  The benefit of this is mostly
that userspace is the one doing the subscription - the actual
multicast capability is not currently used and probably wouldn't be
even if we moved to a multiprocess model.  Despite the convenience,
multicast sockets have a number of disadvantages, primarily that
we only have a limited number of them so there could be collisions.
In addition, unicast sockets give additional flexibility to userspace
by allowing every object to potentially have a different socket
chosen by userspace for upcalls.  Finally, any future optimizations
for upcalls to reduce copying will likely not be compatible with
multicast anyways so disallowing it potentially simplifies things.

We also never unregistered the multicast groups registered for upcalls
and leaked them on module unload.  As a side effect, this solves that
problem.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-09-23 15:27:48 -07:00
Ethan Jackson
213a13ed77 dpif-linux: Handle nl_lookup_genl_mcgroup() failures.
The nl_lookup_genl_mcgroup() function can fail on older kernels
which do not support the required netlink interface.  Before this
patch, dpif-linux would refuse to create a datapath when this
happened.  With this patch, it attempts to use a workaround.  If
the workaround fails it simply disables the affected features
without completely disabling the dpif.
2011-09-16 11:22:30 -07:00
Ethan Jackson
8f4a4df5e3 dpif-linux: Open dpif despite notifier failures.
Before this patch, if dpif-linux failed to register a notifier it
would give up opening the datapath entirely.  This seems draconian
as a dpif can still perform the majority of its intended
functionality without vport notifications.
2011-09-16 11:22:30 -07:00
Ethan Jackson
2ee6545f2b notifiers: Create and destroy nln_notifiers.
This patch changes the interface of netlink-notifier and
rtnetlink-link.  Now nln_notifiers are allocated and destroyed by
the module instead of passed in by callers.  This allows the
definition of nln_notifier to be hidden, and generally cleans up
the code.
2011-09-16 11:22:30 -07:00
Ethan Jackson
18a2378164 notifiers: Rename run and wait functions.
It makes more sense to call nln_notifier_run() and
nln_notifier_wait() simply nln_run() and nln_wait() since they
don't operate on notifiers but the entire nln object.  This patch
changes the nln and the rtnetlink-link modules to the new
convention.
2011-09-16 11:22:30 -07:00
Pravin Shelar
f613a0d72c datapath: Always use generic stats for devices (vports)
Currently ovs is using device stats for Linux devices and count them
itself in other situations. This leads to overlap with hardware stats,
inconsistencies, etc. It's much better to just always count the packets
flowing through the switch and let userspace do any merging that it wants.

Following patch removes vport->get_stats() interface. vport-stat is changed
to use new `struct ovs_vport_stat` rather than rtnl_link_stats64.
Definitions of rtnl_link_stats64 is removed from OVS.  dipf_port->stat is also
removed as aggregate stats are only available at netdev layer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-15 19:36:17 -07:00
Pravin Shelar
9b02078077 datapath: Strip down vport interface : OVS_VPORT_ATTR_MTU
There is no need to have vport attribute MTU (OVS_VPORT_ATTR_MTU) as
linux net-dev-ioctl can be used to get/set MTU for linux device.
Following patch removes OVS_VPORT_ATTR_MTU from datapath protocol.

This patch also adds netdev_set_mtu interface. So that MTU adjustments
can be done from OVS userspace. get_mtu() interface is also changed, now
get_mtu() returns EOPNOTSUPP rather than returning 0 and setting *pmtu
to INT_MAX in case there is no MTU attribute for given device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-12 17:12:52 -07:00
Pravin Shelar
ff8d7a5e81 Strip down vport interface : iflink
Remove iflink from vport interface. iflink is not used anywhere in
OVS. So there is not need to have iflink as vport attribute.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-08 15:18:42 -07:00
Ethan Jackson
c7178a0b3e dpif-linux: Stop listening for RTNL notifications.
Currently dpif-linux listens for vport change events using
rtnetlink notifications.  This patch switches to the ovs genl
notification system.

Feature #6809.
2011-09-01 17:22:00 -07:00
Ethan Jackson
0a811051ff netlink-notifier: Rename rtnetlink code.
This patch renames the rtnetlink module's code to "nln" for
"netlink notifier".  Callers are now required to pass in the
netlink protocol to he newly renamed nln_create() function.
2011-09-01 17:18:52 -07:00
Ethan Jackson
45c8d3a189 lib: Rename rtnetlink.[ch] files.
The only rtnetlink specific functionality contained in the
rtnetlink module is the use of the NETLINK_ROUTE protocol.  This
can easily be passed in by callers.

In preparation for generalization, this patch renames
rtnetlink.[ch] to netlink-notifier.[ch].  Future patches will
complete the transition.
2011-09-01 17:18:51 -07:00
Justin Pettit
24b019f808 datapath: Disable LRO from userspace instead of the kernel.
Whenever a port is added to the datapath, LRO is automatically disabled.
In the future, we may want to enable LRO in some circumstances, so have
userspace disable LRO through the ethtool ioctls.

As part of this change, the MTU and LRO checks are moved to
netdev-vport's send(), which is where they're actually needed.

Feature #6810

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-08-28 21:30:58 -07:00
Ethan Jackson
d8abf60c72 dpif-linux: Call rtnetlink_notifier_run() as required.
I don't think this actually fixes a bug, as netdev-linux calls this
function. However, it seems stylistically more correct.
2011-08-22 13:06:04 -07:00
Justin Pettit
df2c07f433 datapath: Use "OVS_*" as opposed to "ODP_*" for user<->kernel interactions.
The prefix "ODP_*" is not overly descriptive in the context of the
larger Linux tree.  This commit changes the prefix to "OVS_*" for the
userpace to kernel interactions.  The userspace libraries still use
"ODP_" in many of their interfaces since it is more descriptive in the
OVS oeuvre.

Feature #6904

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-08-19 22:48:23 -07:00
Ben Pfaff
d3c110b648 dpif-linux: Fix memory and file descriptor leak in dpif_linux_close().
Found with valgrind.
2011-06-07 13:20:35 -07:00
Ethan Jackson
eb8b28e7da dpif-linux: Avoid duplicate code in dpif_linux_vport_send().
dpif_linux_vport_send() had duplicated most of the code in
dpif_linux_execute() in order to execute output actions in the
kernel.  This forces developers to remember to change both
functions whenever the kernel interface changes.  In particular,
commit 80e5eed9 "datapath: Get packet metadata from userspace in
odp_packet_cmd_execute()." broke netdev_linux_vport_send().  This
commit reorganizes the code and fixes the regression.

Bug #5818.
2011-06-06 11:13:22 -07:00
Ben Pfaff
80e5eed9c2 datapath: Get packet metadata from userspace in odp_packet_cmd_execute().
Until now, the tun_id and in_port have been lost when a packet is sent from
the kernel to userspace and then back to the kernel.  I didn't think that
this was a problem, but recent behavior made me look closer and see that
it makes a difference if sFlow is turned on or if an
ODP_ATTR_ACTION_CONTROLLER action is present.  We could possibly kluge
around those, but for future-proofing it seems better to pass the packet
metadata from userspace to the kernel.  That is what this commit does.

This commit introduces a user-kernel protocol break.  We could avoid that,
if it is desirable, by making ODP_PACKET_ATTR_KEY optional for
ODP_PACKET_CMD_EXECUTE commands.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-06-01 13:39:51 -07:00
Ben Pfaff
b2fda3effc Merge 'next' into 'master'.
I know already that this breaks the statsfixes that were implemented by the
following commits:

827ab71c97f "ofproto: Datapath statistics accounted twice."
6f1435fc8f7 "ofproto: Resubmit statistics improperly account during..."

These were already broken in a previous merge.  I will work on a fix.
2011-05-18 14:01:13 -07:00
Ben Pfaff
d3d8f1f7e5 Add missing "static" keywords.
Found by sparse.
2011-05-16 13:40:47 -07:00
Ben Pfaff
0079481775 Merge 'master' into 'next'. 2011-05-12 12:05:42 -07:00
Ben Pfaff
640e1b2077 dpif: Improve abstraction by making 'run' and 'wait' functions per-dpif.
Until now, the dp_run() and dp_wait() functions had to be called at the top
level of the program because they applied to every open dpif.  By replacing
them by functions that take a specific dpif as an argument, we can call
them only from ofproto, which is currently the correct layer to deal with
dpifs.
2011-05-11 12:26:07 -07:00
Ben Pfaff
032aa6a354 ovs-dpctl: Add -s option to print packet and byte counters. 2011-05-02 09:33:12 -07:00
Ethan Jackson
8522b38386 dpif-linux: Recycle leaked ports.
When ports are deleted from the datapath they need to be added to
an LRU list maintained in dpif-linux so they may be reallocated.
When using vswitchd to delete the ports this happens automatically.
However, if a port is deleted directly from the datapath it is
never reclaimed by dpif-linux.  If this happens often, eventually
no ports will be available for allocation and dpif-linux will fall
back to using the old, kernel implemented, allocation strategy.

This commit fixes the problem by automatically reclaiming ports
missing from the datapath whenever the list of ports in the
datapath is dumped.

Bug #2140.
2011-04-29 16:06:31 -07:00
Ben Pfaff
141d9ce465 dpif-linux: Avoid logging error on ENOENT in dpif_linux_is_internal_device().
ENOENT can be returned if the kernel module isn't loaded.  If that's the
case then we've already logged that and there's no point in logging it
again.
2011-04-11 10:46:39 -07:00
Ben Pfaff
42bb6c72b5 dpif-linux: Avoid segfault on netdev_get_stats() without kernel module.
netdev_linux_get_stats() calls into netdev_vport_get_stats(), which in
turn attempts a transaction on genl_sock.  If the kernel module isn't
loaded, then genl_sock won't be there, and in any case there's nothing that
guarantees that it's been initialized yet.

This fixes the problem by ensuring that dpif_linux was initialized properly
before attempting a transaction on genl_sock.

Reported-by: Aaron Rosen <arosen@clemson.edu>
2011-04-11 10:46:32 -07:00
Ethan Jackson
773cd53821 dpif-linux: Choose port numbers more prudently.
Before this patch the kernel chose the lowest available number for
newly created datapath ports.  This patch moves the port number
choosing responsibility to user space, and implements a least
recently used port number queue in an attempt to avoid reuse.

Bug #2140.
2011-04-05 20:40:27 -07:00
Ben Pfaff
7feba1acdd netdev-vport: Implement 'send' function.
The new implementation of the bonding code expects to be able to send
packets on netdevs using netdev_send().  This implements it.
2011-04-01 15:52:19 -07:00
Ben Pfaff
d0c23a1a57 dpif: Use sset instead of svec in dpif interface. 2011-03-31 16:42:01 -07:00
Ben Pfaff
b3c01ed330 Convert shash users that don't use the 'data' value to sset instead.
In each of the cases converted here, an shash was used simply to maintain
a set of strings, with the shash_nodes' 'data' values set to NULL.  This
commit converts them to use sset instead.
2011-03-31 16:42:01 -07:00