Following this commit, "struct odp_flow" and related data structures are
only used in Linux-specific parts of OVS userspace code. This allows the
actual Linux datapath interface to evolve more freely.
Reviewed by Justin Pettit.
This brings the code closer to what the Netlink interface will need to
implement.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
As with n_flows, n_ports was used regularly by userspace to determine how
much memory to allocate when listing ports, but it is no longer needed for
that. max_ports, on the other hand, is necessary but it is also a fixed
value for the kernel datapath right now and if we expand it we can also
come up with a way to report the expanded value.
The remaining members of odp_stats are actually real statistics that I
intend to keep.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This queue information will be available through the kernel socket layer
once we move over to Netlink socket as transports, so we might as well get
rid of the redundancy.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Userspace used to use the n_flows information here to decide how much
memory needed to be allocated to list flows, but that isn't necessary any
longer now that listing flows uses an iterator abstraction. The
cur_capacity and max_capacity members are just curiosities and don't
provide much information; if the implementation ever changes away from
the current hash table implementation then they could become meaningless
anyhow.
But more than anything, these aren't really the kind of statistics that
networking people usually care about.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Following this commit, "struct odp_port" is only used in Linux-specific
parts of OVS userspace code. This allows the actual Linux datapath
interface to evolve more freely.
Reviewed by Justin Pettit.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to add new
features to the kernel vport layer without changing userspace software. In
turn, that means that the odp_port structure must become variable-length.
This does not, however, fit in well with the ODP_PORT_LIST ioctl in its
current form, because that would require userspace to know how much space
to allocate for each port in advance, or to allocate as much space as
could possibly be needed. Neither choice is very attractive.
This commit prepares for a different solution, by replacing ODP_PORT_LIST
by a new ioctl ODP_VPORT_DUMP that retrieves information about a single
vport from the datapath on each call. It is much cleaner to allocate the
maximum amount of space for a single vport than to do so for possibly a
large number of vports.
It would be faster to retrieve a number of vports in batch instead of just
one at a time, but that will naturally happen later when the kernel
datapath interface is changed to use Netlink, so this patch does not bother
with it.
The Netlink version won't need to take the starting port number from
userspace, since Netlink sockets can keep track of that state as part
of their "dump" feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
This commit takes one step in that direction by making the kernel report
its idea of the flow that a packet belongs to whenever it passes a packet
up to userspace. This means that userspace can intelligently figure out
what to do:
- If userspace's notion of the flow for the packet matches the kernel's,
then nothing special is necessary.
- If the kernel has a more specific notion for the flow than userspace,
for example if the kernel decoded IPv6 headers but userspace stopped
at the Ethernet type (because it does not understand IPv6), then again
nothing special is necessary: userspace can still set up the flow in
the usual way.
- If userspace has a more specific notion for the flow than the kernel,
for example if userspace decoded an IPv6 header but the kernel
stopped at the Ethernet type, then userspace can forward the packet
manually, without setting up a flow in the kernel. (This case is
bad from a performance point of view, but at least it is correct.)
This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, although userspace does now
have enough information to do that intelligently. This will have to wait
for later commits.
This commit is bigger than it would otherwise be because it is rolled
together with changing "struct odp_msg" to a sequence of Netlink
attributes. The alternative, to do each of those changes in a separate
patch, seemed like overkill because it meant that either we would have to
introduce and then kill off Netlink attributes for in_port and tun_id, if
Netlink conversion went first, or shove yet another variable-length header
into the stuff already after odp_msg, if adding the flow key to odp_msg
went first.
This commit will slow down performance of checksumming packets sent up to
userspace. I'm not entirely pleased with how I did it. I considered a
couple of alternatives, but none of them seemed that much better.
Suggestions welcome. Not changing anything wasn't an option,
unfortunately. At any rate some slowdown will become unavoidable when OVS
actually starts using Netlink instead of just Netlink framing.
(Actually, I thought of one option where we could avoid that: make
userspace do the checksum instead, by passing csum_start and csum_offset as
part of what goes to userspace. But that's not perfect either.)
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
In turn, that means that flow keys must become variable-length. This
commit makes that change using Netlink attribute sequences.
This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, because userspace doesn't yet
have enough information to do that intelligently. Upcoming commits will
fix that.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
In turn, that means that flow keys must become variable-length. This does
not, however, fit in well with the ODP_FLOW_LIST ioctl in its current form,
because that would require userspace to know how much space to allocate
for each flow's key in advance, or to allocate as much space as could
possibly be needed. Neither choice is very attractive.
This commit prepares for a different solution, by replacing ODP_FLOW_LIST
by a new ioctl ODP_FLOW_DUMP that retrieves a single flow from the datapath
on each call. It is much cleaner to allocate the maximum amount of space
for a single flow key than to do so for possibly a very large number of
flow keys.
As a side effect, this patch also fixes a race condition that sometimes
made "ovs-dpctl dump-flows" print an error: previously, flows were listed
and then their actions were retrieved, which left a window in which
ovs-vswitchd could delete the flow. Now dumping a flow and its actions is
a single step, closing that window.
Dumping all of the flows in a datapath is no longer an atomic step, so now
it is possible to miss some flows or see a single flow twice during
iteration, if the flow table is modified by another process. It doesn't
look like this should be a problem for ovs-vswitchd.
It would be faster to retrieve a number of flows in batch instead of just
one at a time, but that will naturally happen later when the kernel
datapath interface is changed to use Netlink, so this patch does not bother
with it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This is a long-standing bug--it was present in version 1.0 too.
Reported-by: Gaetano Catalli <gaetano.catalli@gmail.com>
Solution by Jesse Gross <jesse@nicira.com>
Currently the type of the datapath action length is mixture of
size_t and unsigned int. However, size_t is really defined as an
unsigned long, which causes the build to fail on 64-bit platforms.
This consistently uses size_t.
We have a need to identify tunnels with keys longer than 32 bits. This
commit adds basic datapath and OpenFlow support for such keys. It doesn't
actually add any tunnel protocols that support 64-bit keys, so this is not
very useful yet.
The 'arg' member of struct odp_msg had to be expanded to 64-bits also,
because it sometimes contains a tunnel ID. This member also contains the
argument passed to ODPAT_CONTROLLER, so I expanded that action's argument
to 64 bits also so that it can use the full width of the expanded 'arg'.
Userspace doesn't take advantage of the new space though (it was only
using 16 bits anyhow).
This commit has been tested only to the extent that it doesn't disrupt
basic Open vSwitch operation. I have not tested it with tunnel traffic.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Feature #3976.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
ovs_queue doesn't seem very useful; it's just a singly-linked list. It's
more generally useful to use a general-purpose "struct list" for lists of
packets, so this commit adds such a member to "struct ofpbuf" and shifts
the existing users to use it.
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath. Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath. Similarly, a vport
could be detached and deleted separately.
After some study, I think I understand why this distinction exists. It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them. However, changing it to create them before it tries to
open them is not difficult, so this commit does this.
The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This makes it easier for dpif_provider implementations to share code but
distinguish the class actually in use, because comparing a pointer is
easier than comparing a string.
There's no need to have a mask in this action, because both parts of the
TCI are part of the flow structure.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This allows eliminating padding from odp_flow_key, although actually doing
that is postponed until the next commit.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
When userspace and the kernel were using the same structure for flows,
flow_t was a useful way to indicate that a structure was really a userspace
flow instead of a kernel one, but now it's better to just write "struct
flow" for consistency, since OVS doesn't use typedefs for structs
elsewhere.
Acked-by: Jesse Gross <jesse@nicira.com>
The "struct odp_flow_key" used in the kernel datapath is conceptually
separate from the "flow_t" used in userspace, but until now we have
used the latter as a typedef for the former for convenience. This commit
separates them. This makes it possible in upcoming commits to change
them independently.
This is cross-ported from the "wdp" branch, which has had it for months.
The "port group" concept seems like a good one, but it has not been
used very much in userspace so far, so before we commit ourselves to
a frozen API that we must maintain forever, remove it. We can always
add it back in later as a new kind of vport.
Signed-off-by: Ben Pfaff <blp@nicira.com>
All of these changes avoid using the same name for two local variables
within a same function. None of them are actual bugs as far as I can tell,
but any of them could be confusing to the casual reader.
The one in lib/ovsdb-idl.c is particularly brilliant: inner and outer
loops both using (different) variables named 'i'.
Found with GCC -Wshadow.
Some of the flow actions that modify skbuff data did not check that the
skbuff was long enough before doing so. This commit fixes that problem.
Previously, the strategy for avoiding this was to only indicate the layer-3
nw_proto field in the flow if the corresponding layer-4 header was fully
present, so that if, for example, nw_proto was IPPROTO_TCP, this meant
that a TCP header was present. The original motivation for this patch was
to add corresponding code to only indicate a layer-2 dl_type if the
corresponding layer-3 header was fully present. But I'm now convinced that
this approach is conceptually wrong, because the meaning of a layer-N
header should not be affected by the meaning of a layer-(N+1) header.
This commit switches to a new approach. Now, when a header is missing, its
fields in the flow are simply zeroed and have no effect on the "type" field
for the outer header. Responsibility for ensuring that a header is fully
present is now shifted to the actions that wish to modify that header.
Signed-off-by: Ben Pfaff <blp@nicira.com>
"ARP spoofing" is when a host claims an incorrect association between an
IP address and a MAC address for deceptive purposes. OpenFlow by itself
can prevent a host from sending out ARP replies from an incorrect MAC
address in the Ethernet L2 header, but it cannot control the MAC addresses
inside the ARP L3 packet. This commit adds a new action that can be used
to drop these spoofed packets.
CC: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
It looks to me like the current dpif-netdev implementation doesn't handle
the case where a packet comes in without a VLAN and then is subjected to
multiple ODPAT_SET_VLAN_* operations. dp_netdev_modify_vlan_tci() just
checks the flow key each time to see whether there's a VLAN, but it doesn't
update the flow key to note that there is now a VLAN.
One fix would be to update the flow key, but it's "const" these days.
Instead, add a check for whether the Ethernet type is ETH_TYPE_VLAN,
which should be equivalent.
Actions that modify packets need to tolerate packets that are too small.
Most of the actions already implicitly do this check, since they check for
appropriate values in the flow key that would only be there if the
corresponding data was present. But actions to modify the Ethernet header
didn't have a guarantee that the packet was at least 14 bytes long, and
actions to modify the VLAN didn't have such a guarantee either, so this
adds appropriate checks.
Problem found by code inspection.
Originally, the datapath didn't care about IP TOS at all. Then, to support
NetFlow, we made it keep track of the last-seen IP TOS value on a per-flow
basis. Then, to support OpenFlow 1.0, we added a nw_tos field to
odp_flow_key. We don't need both methods, so this commit drops the
NetFlow-specific tracking.
This introduces a small kernel ABI break: upgrading the kernel module
without upgrading the OVS userspace will mean that NetFlow records will
all show an IP TOS value of 0. I don't consider that to be a serious
problem.
Adding a macro to define the vlog module in use adds a level of
indirection, which makes it easier to change how the vlog module must be
defined. A followup commit needs to do that, so getting these widespread
changes out of the way first should make that commit easier to review.
When the QoS code was integrated, I didn't yet know how to abstract the
translation from a queue ID in an OpenFlow OFPAT_ENQUEUE action into a
priority value for an ODP ODPAT_SET_PRIORITY action. This commit is a
first attempt that works OK for Linux, so far. It's possible that in fact
this translation needs the 'netdev' as an argument too, but it's not needed
yet.
Currently the flow key is updated to match an action that is applied
to a packet but these field are never looked at again. Not only is
this a waste of time it also makes optimizations involving caching
the flow key more difficult.
Most of the timekeeping needs of OVS are simply to measure intervals,
which means that it is sensitive to changes in the clock. This commit
replaces the existing clocks with monotonic timers. An additional set
of wall clock timers are added and used in locations that need absolute
time.
Bug #1858
The most recent revision of the netdev library added may_create
and may_open flags to explicitly state the intent of the caller as
to whether the device should already be in use. This was simply
a sanity check for users of the netdev library and the configuration.
At this point the netdev library and its users are well behaved and
should no longer need to be checked. Additional checks have also
been added for incorrect configuration that mean the netdev library
is no longer the primary line of defense.
These flags themselves create problems because it is not always
easy for a library to know what the state of devices should be.
This is particularly a problem for ovs-openflowd, which expects
ports to be added by ovs-dpctl. Fixing this either requires that
the checks are so permissive to be useless or ugly hacks to get
around them. Since they are no longer needed, just remove the
checks.
This commit restores the previous behavior of ovs-openflowd to
not require that ports be specified on the command line or
cleaned up after use.
Bug #2652
CC: Natasha Gude <natasha@nicira.com>
CC: Jean Tourrilhes <jt@hpl.hp.com>
CC: 蒲彦 <yan.p.bjtu@gmail.com>
When a dpif passes an odp_msg down to ofproto, and ofproto transforms it
into an ofp_packet_in to send to the controller, until now this always
involved a full copy of the packet inside ofproto. This commit eliminates
this copy by ensuring that there is always enough headroom in the ofpbuf
that holds the odp_msg to replace it by an ofp_packet_in in-place.
From Jean Tourrilhes <jt@hpl.hp.com>, with some revisions.
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel). Also
add an action which allows the tunnel ID to be set for outgoing
packets. At this point there aren't any tunnel implementations so
these fields don't have any effect.
The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID. ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentially undocumented to avoid it being used without
a full understanding of the consequences.
After executing an action that changes a packet sometimes we update
the flow key and sometimes we don't. This is potentially problematic
because we sometimes use the key for checks later on. This consistently
maintains the key.