2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00

32 Commits

Author SHA1 Message Date
Flavio Leitner
cf114a7fce netlink linux: enable listening to all nsids
Internal ports may be moved to another network namespace
and when that happens, the vswitch stops receiving netlink
notifications.

This patch enables the vswitch to listen to all network
namespaces that have a nsid assigned into the network
namespace where the socket has been opened.

It requires kernel 4.2 or newer.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:36 -07:00
Flavio Leitner
a86bd14ec9 netlink: provide network namespace id from a msg.
The netlink notification's ancillary data contains the network
namespace id (netnsid) needed to identify the device correctly.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:31 -07:00
Bhanuprakash Bodireddy
0a0b5a7282 netlink-socket: Reorder elements in nl_dump structure.
By reordering the elements in nl_dump structure, pad bytes can be
reduced there by saving a cache line.

Before: structure size:72, holes:1, sum padbytes:4, cachelines:2
After: structure size:64, holes:0, sum padbytes:0, cachelines:1

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-10-17 18:35:18 -07:00
Ben Warren
64c967795b Move lib/ofpbuf.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 13:10:18 -07:00
Alin Serdean
9667de98d6 nl_sock_fd is not used under MSVC
Ifdef out nl_sock_fd to make users aware it is not used.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
2015-09-28 08:11:22 -07:00
Nithin Raju
36791e2178 netlink-socket: Add packet subscribe functionality on Windows.
In this patch, we add support in userspace for packet subscribe API
similar to the join/leave MC group API that is used for port events.
The kernel code has already been commited.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-10-23 13:08:28 -07:00
Alin Serdean
e214f75182 netlink-socket: Allow compiling on MSVC even without HAVE_NETLINK.
Bypass the error compilation when compiling under MSVC.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-07-29 09:32:55 -07:00
Ben Pfaff
022ad2b9ce netlink-socket: Add conceptual documentation.
Based on a conversation with the VMware Hyper-V team earlier today.

This commit also changes a couple of functions that were only used with
netlink-socket.c into static functions.  I couldn't think of a reason for
code outside that file to use them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-07-29 08:59:40 -07:00
Ben Pfaff
93295354df netlink-socket: Simplify multithreaded dumping to match Linux reality.
Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread
races.) introduced a simple workaround for Linux kernel races in Netlink
dumps.  However, the code remained more complicated than needed.  This
commit simplifies it.

The main reason for complication in the code was 'status_seq' in nl_dump.
This member was there to allow a thread to wait for some other thread to
refill the socket buffer with another dump message (although we did not
understand the reason at the time it was introduced).  Now that we know
that Netlink dumps properly need to be serialized to work in existing
Linux kernels, there's no additional value in having 'status_seq',
because serialized recvmsg() calls always refill the socket buffer
properly.

This commit updates nl_msg_next() to clear its buffer argument on error.
This is a more convenient interface for the new version of the Netlink
dump code.  nl_msg_next() doesn't have any other callers.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-07-16 09:39:49 -07:00
Ben Pfaff
0791315e4d netlink-socket: Work around kernel Netlink dump thread races.
The Linux kernel Netlink implementation has two races that cause problems
for processes that attempt to dump a table in a multithreaded manner.

The first race is in the structure of the kernel netlink_recv() function.
This function pulls a message from the socket queue and, if there is none,
reports EAGAIN:
	skb = skb_recv_datagram(sk, flags, noblock, &err);
	if (skb == NULL)
		goto out;
Only if a message is successfully read from the socket queue does the
function, toward the end, try to queue up a new message to be dumped:
	if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
		ret = netlink_dump(sk);
		if (ret) {
			sk->sk_err = ret;
			sk->sk_error_report(sk);
		}
	}
This means that if thread A reads a message from a dump, then thread B
attempts to read one before A queues up the next, B will get EAGAIN.  This
means that, following EAGAIN, B needs to wait until A returns to userspace
before it tries to read the socket again.  nl_dump_next() already does
this, using 'dump->status_seq' (although the need for it has never been
explained clearly, to my knowledge).

The second race is more serious.  Suppose thread X and thread Y both
simultaneously attempt to queue up a new message to be dumped, using the
call to netlink_dump() quoted above.  netlink_dump() begins with:
	mutex_lock(nlk->cb_mutex);

	cb = nlk->cb;
	if (cb == NULL) {
		err = -EINVAL;
		goto errout_skb;
	}
Suppose that X gets cb_mutex first and finds that the dump is complete.  It
will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to
indicate that no dump is in progress and release the mutex:
	nlk->cb = NULL;
	mutex_unlock(nlk->cb_mutex);
When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and
return -EINVAL as quoted above.  netlink_recv() stuffs -EINVAL in sk_err,
but that error is not reported immediately; instead, it is saved for the
next read from the socket.  Since Open vSwitch maintains a pool of Netlink
sockets, that next failure can crop up pretty much anywhere.  One of the
worst places for it to crop up is in the execution of a later transaction
(e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink
transactions as idempotent and will re-execute them when socket errors
occur.  For a transaction that sends a packet, this causes packet
duplication, which we actually observed in practice.  (ENOBUFS should
actually cause transactions to be re-executed in many cases, but EINVAL
should not; this is a separate bug in the userspace netlink code.)

VMware-BZ: #1283188
Reported-and-tested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2014-07-10 16:48:16 -07:00
Joe Stringer
0672776e58 netlink: Make nl_dump_next() thread-safe.
This patch modifies 'struct nl_dump' and nl_dump_next() to allow
multiple threads to share the same nl_dump. These changes are targeted
around synchronizing dump status between multiple callers, and
allowing callers to fully process their existing buffers before
determining whether to stop fetching flows.

The 'status' field of 'struct nl_dump' becomes atomic, so that multiple
threads may check and/or update it to communicate when there is an error
or the netlink dump is finished. The low bit holds whether the final
message was seen, while the higher bits hold an errno value.

nl_dump_next() will now read all messages from the given buffer before
checking the shared error status and attempting to fetch more. Multiple
threads may call this with the same nl_dump, but must provide
independent buffers. As previously, the final dump status can be
determined by calling nl_dump_done() from a single thread.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-27 14:26:51 -08:00
Joe Stringer
d57695d77e netlink: Remove buffer from 'struct nl_dump'.
This patch makes all of the users of 'struct nl_dump' allocate their own
buffers to pass down to nl_dump_next(). This paves the way for allowing
multithreaded flow dumping.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-27 14:14:56 -08:00
Joe Stringer
9c8ad495ec netlink: Rename 'dump->seq' to 'dump->nl_seq'
An upcoming patch will introduce another, completely unrelated seq to
'struct nl_dump'. Giving this one a better name should reduce confusion.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-23 11:31:06 -08:00
Pravin B Shelar
b3dcb73cc5 datapath: Cleanup netlink compat code.
Patch removes genl, netlink, rtnl compat code and dpif-linux
fallback-id compat code.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2013-09-06 09:51:43 -07:00
Ben Pfaff
0bd01224dc netlink-socket: Make thread-safe.
The uses of vlog in this module are not thread-safe, because vlog itself
is not yet thread-safe.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-07-18 13:18:47 -07:00
Ben Pfaff
a88b4e0412 netlink-socket: Simplify use of transactions and dumps.
This disentangles "struct nl_dump" from "struct nl_sock", clearing the way
to make the use of either one thread-safe in an obviously correct manner.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-07-18 13:18:46 -07:00
Ben Pfaff
ff459dd649 ovs-brcompatd: Fix sending replies to kernel requests.
Commit 7d7447 (netlink: Postpone choosing sequence numbers until send
time.) broke ovs-brcompatd because it prevented userspace replies to
kernel requests from using the correct sequence numbers.  This commit fixes
it.

Atzm Watanabe found the root cause and provided an alternative patch to
avoid the problem.

Reported-by: André Ruß <andre.russ@hybris.com>
Reported-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Tested-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-07-05 08:41:03 -07:00
Raju Subramanian
e0edde6fee Global replace of Nicira Networks.
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.

Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-05-02 17:08:02 -07:00
Ben Pfaff
72d32ac0b3 netlink-socket: Make caller provide message receive buffers.
Typically an nl_sock client can stack-allocate the buffer for receiving
a Netlink message, which provides a performance boost.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-04-18 20:28:48 -07:00
Ben Pfaff
8843668ae5 netlink-socket: Remove unnecessary #include.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-04-18 20:28:47 -07:00
Ben Pfaff
8522ba0996 dpif-linux: Use poll() internally in dpif_linux_recv().
Using poll() internally in dpif_linux_recv(), instead of relying
on the results of the main loop poll() call, brings netperf CRR
performance back within 1% of par versus the code base before the
poll_fd_woke() optimizations were introduced.  It also increases
the ovs-benchmark results by about 5% versus that baseline, too.

My theory is that this is because the main loop takes long enough
that a significant number of packets can arrive during the main
loop itself, so this reduces the time before OVS gets to those
packets.
2011-11-28 09:29:18 -08:00
Ben Pfaff
3907401ce6 Revert "poll-loop: Enable checking whether a FD caused a wakeup."
This reverts commit 1e276d1a10539a8cd97d2ad63c073a9a43f0f1ef.
The poll_fd_woke() and nl_sock_woke() function added in that commit are
no longer used, so there is no reason to keep them in the tree.
2011-11-28 09:19:48 -08:00
Ben Pfaff
cc75061af7 netlink-socket: New function nl_sock_transact_multiple().
This will be used in an upcoming commit.
2011-10-14 14:08:44 -07:00
Jesse Gross
1e276d1a10 poll-loop: Enable checking whether a FD caused a wakeup.
Each time we run through the poll loop, we check all file descriptors
that we were waiting on to see if there is data available.  However,
this requires a system call and poll already provides information on
which FDs caused the wakeup so it is inefficient as the number of
active FDs grows.  This provides a way to check whether a given FD
has data.
2011-09-23 15:27:49 -07:00
Jesse Gross
50802adb0e netlink: Expose method to get Netlink pid of a socket.
In the future, the kernel will use unicast messages instead of
multicast to send upcalls.  As a result, we need to be able to
tell it where to direct the traffic.  This adds a function to expose
the Netlink pid of a socket so it can be included in messages to the
kernel.
2011-09-23 15:27:48 -07:00
Ethan Jackson
213a13ed77 dpif-linux: Handle nl_lookup_genl_mcgroup() failures.
The nl_lookup_genl_mcgroup() function can fail on older kernels
which do not support the required netlink interface.  Before this
patch, dpif-linux would refuse to create a datapath when this
happened.  With this patch, it attempts to use a workaround.  If
the workaround fails it simply disables the affected features
without completely disabling the dpif.
2011-09-16 11:22:30 -07:00
Ethan Jackson
e408762f5d netlink-socket: New function nl_lookup_genl_mcgroup(). 2011-09-01 17:18:52 -07:00
Ben Pfaff
5e9fd01b6e netlink-socket: Remove unused nl_sock_sendv() function.
This function hasn't been used for ages.
2011-07-27 14:50:23 -07:00
Ben Pfaff
c6eab56db4 netlink-socket: Make dumping and doing transactions on same nl_sock safe.
It's not safe to use a single Netlink fd to do multiple operations in an
synchronous way.  Some of the limitations are fundamental; for example, the
kernel only supports a single "dump" operation at a time.  Others are
limitations imposed by the OVS coding style; for example, our Netlink
library is not callback based, so nothing can be done about incoming
messages that can't be handled immediately.  Regardless, in OVS multicast
groups, transactions, and dumps cannot coexist on a single nl_sock.

This is only mildly irritating at the moment, but it will become much worse
later on, when dpif-linux shifts to using Netlink dumps for listing various
kinds of datapath entities.  When that happens, a dump will be in progress
in situations where the dpif-linux client might want to do other
operations.  For example, it is reasonable for the client to list flows
and, in the middle, look up information on vports mentioned in those flows.
It might be possible to simply ban and avoid such nested operations--I have
not even audited the source tree to find out whether we do anything like
that already--but that seems like an unnecessary cramp on our coding style.
Furthermore, it's difficult to explain and justify without understanding
the implementation.

This patch takes another approach, by improving the Netlink socket library
to avoid artificial constraints.  When an operation, or a dump, or joining
a multicast group would cause a problem, this patch makes the library
transparently create a separate Netlink socket.  This solves the problem
without putting any onerous restrictions on use.

This commit also slightly simplifies netdev_vport_reset_names().  It had
been written to destroy the dump object before the Netlink socket that it
used, but this is no longer necessary and doing it in the opposite order
saved a few lines of code.

Reviewed by Ethan Jackson <ethan@nicira.com>.
2011-01-27 09:26:06 -08:00
Ben Pfaff
6b7c12fdc1 netlink-socket: New function for draining the receive buffer.
This will be used in an upcoming patch.

Reviewed by Justin Pettit.
2011-01-27 09:26:05 -08:00
Ben Pfaff
cceb11f5b1 netlink-socket: Add functions for joining and leaving multicast groups.
When this library was originally implemented, support for Linux 2.4 was
important.  The Netlink implementation in Linux only added support for
joining and leaving multicast groups after a socket is bound as of Linux
2.6.14, so the library did not support it either.  But the current version
of Open vSwitch targets Linux 2.6.18 and over, so it's fine to add this
support now, and this commit does so.

This will be used more extensively in upcoming commits.

Reviewed by Justin Pettit.
2011-01-27 09:26:05 -08:00
Ben Pfaff
2fe27d5ad2 netlink: Split into generic and Linux-specific parts.
The parts of the netlink module that are related to sockets are
Linux-specific, since only Linux has AF_NETLINK sockets.  The rest can be
built anywhere.  This commit breaks them into two modules, and builds the
generic one on all platforms.

Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-10 11:13:27 -08:00