I'd like to change ->dpif_flow_put() and ->dpif_execute() in the dpif
provider to take the structures of the same names as parameters, instead of
passing them discrete parameters, because this seems like a more sensible
way to do things internally than to have two different ways to pass the
parameters. It might even simplify code slightly. But ->flow_put() and
->execute() wouldn't want the 'type' (because it's implied by the function
being called) or 'error' (because it would be the same as the return
value). Although of course they could just ignore those members, it seems
slightly cleaner to omit them entirely, as this change allows.
Signed-off-by: Ben Pfaff <blp@nicira.com>
At one point in the past, there were three separate queues between the
kernel module and OVS userspace, each of which corresponded to a Netlink
socket (or, before that, to a character device). It made sense to allow
each of these to be enabled or disabled separately, hence the "listen mask"
concept in the dpif layer.
These days, the concept is much less clear-cut. Queuing is no longer on
the basis of different classes of packets but instead striped across a
collection of sockets based on input port. It doesn't really make sense
to enable receiving packets on the basis of the kind of packet anymore.
Accordingly, this commit simplifies the "listen_mask" to just a bool that
either enables or disables receiving packets.
It could be useful to enable or disable receiving packets on a per-vport
basis, but the rest of the code isn't ready to make use of that so this
commit doesn't generalize this much.
Based on this discussion on ovs-dev:
http://openvswitch.org/pipermail/dev/2011-October/012044.html
Signed-off-by: Ben Pfaff <blp@nicira.com>
The unit tests did not allow users to run them as root because
ovs-vswitchd would destroy all of the existing 'system' datapaths.
This patch prevents ovs-vswitchd from registering 'system'
datapaths when running unit tests preventing the issue.
Until now, OVS has handled IP fragments more awkwardly than necessary. It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present. This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.
This commit improves the situation significantly. It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset. This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers. In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.
This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments". It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
Commit b063d9f06 "datapath: Use unicast Netlink sockets for upcalls" that
switched from multicast to unicast Netlink for sending upcalls added a
Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions
within the flow as target.
This commit drops this per-flow PID in favor of a per-action PID, because
that is more flexible. It does not yet make use of this additional
flexibility, so behavior should not change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
Over time we wish to reduce the number of datapath-protocol.h definitions
used directly outside of Linux-specific code. This commit removes use of
"struct ovs_dp_stats" from platform-independent code.
Bug #7559.
The "flow" module is concerned only with OpenFlow flows these days. It
shouldn't have anything to do with ODP or dpifs. However, it included
dpif.h just to implement flow_extract_stats(). This function is a better
fit for dpif.c, so this commit moves it there and removes the dpif.h
#include from flow.h and flow.c
This commit also removes a few more dpif.h #includes that weren't needed.
Following patch adds sampling action which takes probability and set
of actions as arguments. When probability is hit, actions are executed for
given packet.
USERSPACE action's userdata (u64) is used to store struct
user_action_cookie as cookie. CONTROLLER action is fixed accordingly.
Now we can remove sFlow code from kernel and implement sFlow generically
as SAMPLE action. sFlow is defined as SAMPLE Action with probability (sFlow
sampling rate) and USERSPACE action as argument. USERSPACE action's data
is used as cookie. sFlow uses this cookie to store output-port, number of
output ports and vlan-id. sample-pool is calculated by using vport
stats.
Signed-off-by: Pravin Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently ovs is using device stats for Linux devices and count them
itself in other situations. This leads to overlap with hardware stats,
inconsistencies, etc. It's much better to just always count the packets
flowing through the switch and let userspace do any merging that it wants.
Following patch removes vport->get_stats() interface. vport-stat is changed
to use new `struct ovs_vport_stat` rather than rtnl_link_stats64.
Definitions of rtnl_link_stats64 is removed from OVS. dipf_port->stat is also
removed as aggregate stats are only available at netdev layer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The prefix "ODP_*" is not overly descriptive in the context of the
larger Linux tree. This commit changes the prefix to "OVS_*" for the
userpace to kernel interactions. The userspace libraries still use
"ODP_" in many of their interfaces since it is more descriptive in the
OVS oeuvre.
Feature #6904
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
ODP_ACTION_ATTR_CONTROLLER in the kernel actually sends packets to
userspace, not the controller. To make it generic rename this action
to ODP_ACTION_ATTR_USERSPACE.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The kernel now provides the entire flow key for a packet sent up to
userspace, but dpif_recv() would only log the in_port. This change makes
userspace log the entire flow key.
This would have made a bug that I recently looked at a bit easier to
investigate.
Until now, the tun_id and in_port have been lost when a packet is sent from
the kernel to userspace and then back to the kernel. I didn't think that
this was a problem, but recent behavior made me look closer and see that
it makes a difference if sFlow is turned on or if an
ODP_ATTR_ACTION_CONTROLLER action is present. We could possibly kluge
around those, but for future-proofing it seems better to pass the packet
metadata from userspace to the kernel. That is what this commit does.
This commit introduces a user-kernel protocol break. We could avoid that,
if it is desirable, by making ODP_PACKET_ATTR_KEY optional for
ODP_PACKET_CMD_EXECUTE commands.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This eliminates the final reference from bridge.c directly into the dpif
layer, which will make it easier to change the implementation of ofproto
to support other lower layers.
Until now, the dp_run() and dp_wait() functions had to be called at the top
level of the program because they applied to every open dpif. By replacing
them by functions that take a specific dpif as an argument, we can call
them only from ofproto, which is currently the correct layer to deal with
dpifs.
Jesse suggested this naming scheme, so I'm adjusting existing names to
fit it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
None of the remaining dpif implementations have more than one name per
dpif, so there's no need for this function anymore.
Suggested-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This commit calls genl_lock() and thus doesn't support Linux before
2.6.35, which wasn't exported before that version. That problem will
be fixed once the whole userspace interface transitions to Generic
Netlink a few commits from now.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Nothing was productively using the 'flags' member of odp_flow, so this
commit removes it.
ODPFF_ZERO_TCP_FLAGS isn't used at all (as of the previous commit).
ODPFF_EOF has been replaced by a special case of the 'key_len' member.
This will go away, too, once AF_NETLINK starts being used.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Following this commit, the ODPPF_* constants are only used in
Linux-specific parts of OVS userspace code. This allows the actual Linux
datapath interface to evolve more freely.
Reviewed by Justin Pettit.
Following this commit, "struct odp_flow_stats" is only used in
Linux-specific parts of OVS userspace code. This allows the actual Linux
datapath interface to evolve more freely.
Reviewed by Justin Pettit.
Following this commit, "struct odp_flow" and related data structures are
only used in Linux-specific parts of OVS userspace code. This allows the
actual Linux datapath interface to evolve more freely.
Reviewed by Justin Pettit.
This brings the code closer to what the Netlink interface will need to
implement.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
As with n_flows, n_ports was used regularly by userspace to determine how
much memory to allocate when listing ports, but it is no longer needed for
that. max_ports, on the other hand, is necessary but it is also a fixed
value for the kernel datapath right now and if we expand it we can also
come up with a way to report the expanded value.
The remaining members of odp_stats are actually real statistics that I
intend to keep.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This queue information will be available through the kernel socket layer
once we move over to Netlink socket as transports, so we might as well get
rid of the redundancy.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Following this commit, "struct odp_port" is only used in Linux-specific
parts of OVS userspace code. This allows the actual Linux datapath
interface to evolve more freely.
Reviewed by Justin Pettit.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to add new
features to the kernel vport layer without changing userspace software. In
turn, that means that the odp_port structure must become variable-length.
This does not, however, fit in well with the ODP_PORT_LIST ioctl in its
current form, because that would require userspace to know how much space
to allocate for each port in advance, or to allocate as much space as
could possibly be needed. Neither choice is very attractive.
This commit prepares for a different solution, by replacing ODP_PORT_LIST
by a new ioctl ODP_VPORT_DUMP that retrieves information about a single
vport from the datapath on each call. It is much cleaner to allocate the
maximum amount of space for a single vport than to do so for possibly a
large number of vports.
It would be faster to retrieve a number of vports in batch instead of just
one at a time, but that will naturally happen later when the kernel
datapath interface is changed to use Netlink, so this patch does not bother
with it.
The Netlink version won't need to take the starting port number from
userspace, since Netlink sockets can keep track of that state as part
of their "dump" feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
This commit takes one step in that direction by making the kernel report
its idea of the flow that a packet belongs to whenever it passes a packet
up to userspace. This means that userspace can intelligently figure out
what to do:
- If userspace's notion of the flow for the packet matches the kernel's,
then nothing special is necessary.
- If the kernel has a more specific notion for the flow than userspace,
for example if the kernel decoded IPv6 headers but userspace stopped
at the Ethernet type (because it does not understand IPv6), then again
nothing special is necessary: userspace can still set up the flow in
the usual way.
- If userspace has a more specific notion for the flow than the kernel,
for example if userspace decoded an IPv6 header but the kernel
stopped at the Ethernet type, then userspace can forward the packet
manually, without setting up a flow in the kernel. (This case is
bad from a performance point of view, but at least it is correct.)
This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, although userspace does now
have enough information to do that intelligently. This will have to wait
for later commits.
This commit is bigger than it would otherwise be because it is rolled
together with changing "struct odp_msg" to a sequence of Netlink
attributes. The alternative, to do each of those changes in a separate
patch, seemed like overkill because it meant that either we would have to
introduce and then kill off Netlink attributes for in_port and tun_id, if
Netlink conversion went first, or shove yet another variable-length header
into the stuff already after odp_msg, if adding the flow key to odp_msg
went first.
This commit will slow down performance of checksumming packets sent up to
userspace. I'm not entirely pleased with how I did it. I considered a
couple of alternatives, but none of them seemed that much better.
Suggestions welcome. Not changing anything wasn't an option,
unfortunately. At any rate some slowdown will become unavoidable when OVS
actually starts using Netlink instead of just Netlink framing.
(Actually, I thought of one option where we could avoid that: make
userspace do the checksum instead, by passing csum_start and csum_offset as
part of what goes to userspace. But that's not perfect either.)
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
In turn, that means that flow keys must become variable-length. This does
not, however, fit in well with the ODP_FLOW_LIST ioctl in its current form,
because that would require userspace to know how much space to allocate
for each flow's key in advance, or to allocate as much space as could
possibly be needed. Neither choice is very attractive.
This commit prepares for a different solution, by replacing ODP_FLOW_LIST
by a new ioctl ODP_FLOW_DUMP that retrieves a single flow from the datapath
on each call. It is much cleaner to allocate the maximum amount of space
for a single flow key than to do so for possibly a very large number of
flow keys.
As a side effect, this patch also fixes a race condition that sometimes
made "ovs-dpctl dump-flows" print an error: previously, flows were listed
and then their actions were retrieved, which left a window in which
ovs-vswitchd could delete the flow. Now dumping a flow and its actions is
a single step, closing that window.
Dumping all of the flows in a datapath is no longer an atomic step, so now
it is possible to miss some flows or see a single flow twice during
iteration, if the flow table is modified by another process. It doesn't
look like this should be a problem for ovs-vswitchd.
It would be faster to retrieve a number of flows in batch instead of just
one at a time, but that will naturally happen later when the kernel
datapath interface is changed to use Netlink, so this patch does not bother
with it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
We have a need to identify tunnels with keys longer than 32 bits. This
commit adds basic datapath and OpenFlow support for such keys. It doesn't
actually add any tunnel protocols that support 64-bit keys, so this is not
very useful yet.
The 'arg' member of struct odp_msg had to be expanded to 64-bits also,
because it sometimes contains a tunnel ID. This member also contains the
argument passed to ODPAT_CONTROLLER, so I expanded that action's argument
to 64 bits also so that it can use the full width of the expanded 'arg'.
Userspace doesn't take advantage of the new space though (it was only
using 16 bits anyhow).
This commit has been tested only to the extent that it doesn't disrupt
basic Open vSwitch operation. I have not tested it with tunnel traffic.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Feature #3976.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath. Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath. Similarly, a vport
could be detached and deleted separately.
After some study, I think I understand why this distinction exists. It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them. However, changing it to create them before it tries to
open them is not difficult, so this commit does this.
The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The "port group" concept seems like a good one, but it has not been
used very much in userspace so far, so before we commit ourselves to
a frozen API that we must maintain forever, remove it. We can always
add it back in later as a new kind of vport.
Signed-off-by: Ben Pfaff <blp@nicira.com>
When an OpenFlow flow says to send packets to the controller, until now
ofproto has executed that using dpif_execute(), which passes the packet up
to the kernel. The kernel queues the packet into its "action" queue, and
then later ofproto pulls the packet back down from the kernel and sends it
to the controller.
However, this is unnecessary. Open vSwitch can just recognize in advance
that it will get the packet back and handle it directly, skipping the round
trip. This commit implements this optimization.
This generally affects only the first packet in a flow, since generally the
rest come directly down from the kernel. It only optimizes the "easy" case
where the first action in a flow is to send the packet to the controller,
since this seems to be the common case in the flows that I'm looking at
now.
When the QoS code was integrated, I didn't yet know how to abstract the
translation from a queue ID in an OpenFlow OFPAT_ENQUEUE action into a
priority value for an ODP ODPAT_SET_PRIORITY action. This commit is a
first attempt that works OK for Linux, so far. It's possible that in fact
this translation needs the 'netdev' as an argument too, but it's not needed
yet.
This brings over some features that were added to the netdev interface,
most notably the separation between the name and the type. In addition
to being cleaner, this also avoids problems where it is expected that
the local port has the same name as the datapath.
This function combines what dpif_create() and dpif_open() do. It allows
us to factor a tiny amount of code out of the vswitch, but more importantly
this function is also useful in the following commit.
At startup, the vswitch needs to delete datapaths that are not configured
by the administrator. Until now this was done by knowing the possible
names of Linux datapaths. This commit cleans up by allowing each
datapath class to enumerate its existing datapaths and their names.