The dpif provider 'operate' call duplicates all of the features available
from the 'flow_put', 'flow_del', and 'execute' calls, yielding redundant
code in providers that support both mechanisms. This change drops the
latter calls in favor of making every dpif provider support 'operate'.
The result is code that is overall less duplicative.
It might make sense to do the same with flow_get but so far 'operate'
doesn't support flow_get.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Open vSwitch userspace uses special types to indicate that a particular
object may not be naturally aligned. Netlink is one source of such
problems: in Netlink, 64-bit integers are often aligned only on 32-bit
boundaries. This commit changes the odp-netlink.h that is transformed from
<linux/openvswitch.h> to use these types to make it harder to accidentally
access a misaligned 64-bit member.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Typically, kernel datapath threads send upcalls to userspace where
handler threads process the upcalls. For TAP and DPDK devices, the
datapath threads operate in userspace, so there is no need for
separate handler threads.
This patch allows userspace datapath threads to directly call the
ofproto upcall functions, eliminating the need for handler threads
for datapaths of type 'netdev'.
Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
When dpif_linux_refresh_channels() refreshes the set of channels when
the number of handlers changes, it destroys all the dpif's channels and
sets dpif->uc_array_size to 0. If the port dump later in the function
turns up no ports (which generally indicates a bug), then no channels will
be allocated and thus dpif->uc_array_size will remain 0 and 'channels' will
be null in each handler. This is self-consistent, at least, but
dpif_linux_port_get_pid__() was still willing in this situation to
try to access element 0 of the set of channels, dereferencing a null
pointer.
This fixes the problem.
I encountered this while looking at a bug that I had introduced during
development that caused the port dump to always be empty. It would be
difficult to encounter in normal use.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Change the interface to allow implementations to pass back a buffer, and
allow callers to specify which of actions, mask, and stats they wish to
receive. This will be used in the next commit.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit fixes a race between port deletion and flow miss handling.
More specifically, a port could be removed by main thread while
the handler thread is handling the flow miss from it. If the flow
requires slow path action, the handler thread will try querying a pid
from port's socket. Since the port has been deleted, the query will
cause a dereference of NULL socket pointer.
This commit makes the handler thread recheck the socket pointer before
dereferencing it.
VMware-BZ: 1251981
Reported-by: Pratap Reddy <preddy@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
This adds support for Geneve - Generic Network Virtualization
Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-00
The kernel implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.
Userspace currently implements only support for basic version of
Geneve. It can work with the base header (including the VNI) and
is capable of parsing options but does not currently support any
particular option definitions. Over time, the intention is to
allow options to be matched through OpenFlow without requiring
explicit support in OVS userspace.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Since dpif_netdev_enumerate() is used for "netdev" and "dummy" class, it
incorrectly lists dpif-netdevs as "dummy" and vice versa.
This patches address the issue by changing the dpif-provider interface: a
dpif_class parameter is passed to the 'enumerate' call to match the right class.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit a6ce4b9d25 (ofproto-dpif-upcall: Avoid use-after-free in
revalidate() corner case.) showed that it is somewhat tricky to correctly
use the existing dpif flow dumping interface to obtain batches of flows.
One has to be careful about calling dpif_flow_dump_next_may_destroy_keys()
before going on to the next flow.
A better interface is possible, one that is naturally oriented toward
retrieving batches when that is a useful optimization. This commit
replaces the dpif interface by such a design, and updates both the
implementations and the callers to adopt it.
This is a fairly large change, but I think that the code in
ofproto-dpif-upcall is easier to understand after the change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Based on the policy of 'OVS_VPORT_ATTR_UPCALL_PID', if upcall should
not be sent to userspace (i.e. the number of handler threads is zero),
the netlink message for creating vport should be an one-element array of
value 0. However, dpif_linux_port_add__() fails to obey it and generates
zero-payload netlink message which causes the netlink transaction failing
with ERANGE error.
This is particularly bad when the 'flow-restore-wait' is set during upgrade,
since number of handler threads is not set in dpif-linux module and ovs is
not able to add port in datapath until the 'flow-restore-wait' is disabled.
Connection may lose due to this bug.
This bug was introduced by commit 1579cf677f (dpif-linux: Implement the
API functions to allow multiple handler threads read upcall.).
This commit fixes the bug by fixing the dpif_linux_port_add__() to generate
the correct netlink message when the number of handler threads is not set.
Bug #1239914.
Bug #1240598.
Bug #1240626.
Reported-by: Gurucharan Shetty <gshetty@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit reformats the dpif-linux module so that all internal
static functions take 'struct dpif_linux *' as input argument.
This will allow the adding of thread-safety annotations.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit changes the API in 'dpif-provider.h' to allow multiple
handler threads call dpif_recv() simultaneously.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This new function allows callers to determine whether previously
returned keys will be modified or reallocated on the next call to
dpif_flow_dump_next(). This will be used in a future commit to allow
batched flow deletion by revalidator threads.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch makes it the caller's responsibility to initialize a
per-thread 'state' object and pass it down to the dpif_flow_dump_next()
implementation. The implementation can expect to be called from multiple
threads with the same 'iter' and different 'state' objects.
When flow_dump_next() returns non-zero, the implementation must ensure
that subsequent calls with the same arguments also return non-zero.
Subsequent calls with the same 'iter' and different 'state' may return
zero, but should make progress towards returning non-zero.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch separates the structures for thread-local flow dump state
("state") from the shared flow dump state ("iter") in dpif-linux and
dpif-netdev. Future patches will make use of this to allow multiple
threads to dump flows from the same flow dump operation.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch makes all of the users of 'struct nl_dump' allocate their own
buffers to pass down to nl_dump_next(). This paves the way for allowing
multithreaded flow dumping.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When creating tap ports in dpif-linux, the "tap" type is treated the
same as "system", and the type is discarded. When dumping datapath
port types, this would cause "tap" type to be reported as a "system"
type.
Each time we see a port of the wrong type in bridge_reconfigure(), we
remove it and add a port with the correct configuration. This would
always occur for tap ports, causing deletion and re-creation of all tap
ports each time the bridge was reconfigured.
This patch makes dpif-linux use netdev to look up port types if the
datapath reports that they are of type OVS_VPORT_TYPE_NETDEV.
Bug #1196289.
Reported-by: James Schmidt <jschmidt@vmware.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The 'dpif_' parameter of dpif_linux_flow_dump_next() was marked as
OVS_UNUSED, even though it's passed down to dpif_linux_flow_get__().
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This helps reduce confusion about when a flow is a flow and when it is
just metadata.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Following commit (''netlink: Do not enforce alignment of last Netlink
attribute''), signal the ability to receive unaligned Netlink messages
to the datapath to enable utilization of zerocopy optimizations.
Opening a datapath is now done by issueing a OVS_DP_CMD_SET in order
to overwrite previously set user features.
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
We already set all the fields of the upcall, so memsetting right before
is unnecessary.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds check of sock pointer in dpif_linux_port_get_pid().
If the pointer is NULL, do not call nl_sock_pid().
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit 222837 (dpif-linux: Factor out port dumping helper functions.)
introduced a bug by making dpif_linux_port_dump_next__() return 'bool'
instead of 'int' as defined in dpif-provider.h. This bug causes ovs-
vswitchd failure with SEGFAULT when processing slow-path packet.
This commit fixes the bug by following the dpif-provider specification.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit da546e0 (dpif: Allow execute to modify the packet.) introduced
a bug by subtracting the zero-value ofpbuf size by "sizeof(struct
nlattr)" and assigning the result back to the ofpbuf size. This bug
causes the ovs-assert failure in facet_push_stats().
This commit fixes the bug by assigning the right value to the ofpbuf
size.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Allowing the packet to be modified by execution allows less data
copying for userspace action execution. Some users of the
dpif_execute already expect that the packet may be modified. This
patch makes this behavior uniform and makes the userspace datapath and
the execution helpers modify the packet as it is being executed.
Userspace action now steals the packet if given permission, as the
packet is normally not needed after it. The only exception is the
sample action, and this is accounted for my keeping track of any
actions that could be following the userspace action.
The packet in dpif_upcall is changed from a pointer to a struct,
allowing the packet to be honest about it's headroom. After this
change the packet can safely be pushed on over the precarious 4 byte
limit earlier allowed by the netlink data preceding the packet.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When ovs-vswitchd deletes a port with dpif_linux_port_del(), that function
uses del_channel() to delete the corresponding channel, including closing
its Netlink socket fd. However, if the vport gets removed by some other
process (e.g. "ip link delete" for veths) then this function never gets
called and thus the channel never gets deleted.
This commit fixes the problem by resynchronizing channels with the
datapath whenever a vport gets deleted.
Bug #16784.
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, the code in dpif_linux_recv_set() has assumed it is starting
from a clean slate with no channels at all. This commit generalizes it to
compare the existing set of channels against the ones that should exist
given the vports that the kernel has, adding any new ones that are missing
and removing any that should not exist. A followup commit will make good
use of this functionality.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Both the old (wrong) and new (correct) enums have the same value, so this
doesn't fix a visible bug.
Reported-by: ankur dwivedi <ankurengg2003@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The declaration of 'get_max_ports()' to return odp_port_t adds
unwanted complexity to coding. This commit changes it back to
return uint32_t type.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
If a notification is bigger than 4 kB (I doubt it one would be), then the
lack of ofpbuf_uninit() would cause a memory leak.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds annotations for thread safety check. And the
check can be conducted by using -Wthread-safety flag in clang.
Co-authored-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This seems to be the last remaining thread-unsafe code in dpif-linux.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This disentangles "struct nl_dump" from "struct nl_sock", clearing the way
to make the use of either one thread-safe in an obviously correct manner.
Signed-off-by: Ben Pfaff <blp@nicira.com>