This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
I think these were leftovers from the removal of %z for MSVC that happened
some time ago.
VMware-BZ: 1265762
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Currently physical ports stats are collected from kernel datapath.
However, those counter do not reflect actual wire packet counters
when GSO, TSO or GRO are enabled by the NIC. In the meantime, the
stats collected form routing stack does. While both stats are valid,
Reporting kernel netdev stats for packet counts and byte counts make
it easier to correlate those numbers with external measurements.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit can be seen as a partial revert of commit
da4a619179d (netdev: Globally track port status changes)
by adding the 'change_seq' to 'struct netdev'.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Preparation for multi queue netdev IO. There are no functional changes
in this patch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK netdev need to access ofpbuf while sending buffer. Following
patch changes netdev_send accordingly.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK can receive multiple packets but current netdev API does
not allow that. Following patch allows dpif-netdev receive batch
of packet in a rx_recv() call for any netdev port. This will be
used by dpdk-netdev.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
netdev-linux does not depend on linux kernel version. So
remove header file include.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch makes all of the users of 'struct nl_dump' allocate their own
buffers to pass down to nl_dump_next(). This paves the way for allowing
multithreaded flow dumping.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit 73c85181d (netdev-linux: Read packet auxdata to obtain vlan_tid)
added #include <linux/if_packet.h> to this file, to get the definition
of PACKET_AUXDATA and some other definitions, but on RHEL 6.1 this
provoked compiler errors:
In file included from /usr/include/linux/rtnetlink.h:5,
from lib/netdev-linux.c:34:
/usr/include/linux/netlink.h:34: error: expected specifier-qualifier-list
before 'sa_family_t'
Since the old #includes worked everywhere, and this file already defined
its own versions of most of the new macros that it needed, this commit just
reverts the old #includes and adds the one macro definition it didn't
already have.
(RHEL 6.1 isn't necessarily the only platform where this is a problem, but
it's the first one for which we noticed the problem.)
This switches the definition of sockaddr_ll used from the Linux one, which
uses __be16 for sll_protocol, to the glibc one, which uses plain "unsigned
short int". This makes sparse complain (rightly), so this commit also
adds a sparse-specific header that uses ovs_be16 to prevent the warning.
Signed-off-by: Ben Pfaff <blp@nicira.com>
If VLAN acceleration is used when the kernel receives a packet
then the outer-most VLAN tag will not be present in the packet
when it is received by netdev-linux. Rather, it will be present
in auxdata.
This patch uses recvmsg() instead of recv() to read auxdata for
each packet and if the vlan_tid is set then it is added to the packet.
Adding the vlan_tid makes use of headroom available
in the buffer parameter of rx_recv.
Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Update the netdev_class so that struct ofpbuf * is passed to rx_recv()
to provide both the data and size of the data to read a packet into.
On success, update struct ofpbuf size inside netdev_class rx_recv
implementation and return 0. This moves logic from the caller.
On error a positive error code is returned, whereas previously
a negative error code was returned. This is a more common convention.
This patch should not have any behavioural changes.
This patch is in preparation for the netdev-linux variant of rx_recv()
making use of headroom in the struct ofpbuf * parameter to push a VLAN tag
obtained from auxdata.
Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
There's no need to obtain the ifindex, because RTM_GETLINK is happy to take
the interface name. There's no need to do a full nl_policy_parse(),
because we only need a single attribute.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The OVS kernel module requires 2.6.32 or later, so there's no reason for
userspace to support older kernels. This commit removes the special
fallback code for retrieving Linux netdev stats in pre-2.6.19 kernels,
which should no longer be useful.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Previously, we tracked status changes for ofports on a per-device basis.
Each time in the main thread's loop, we would inspect every ofport
to determine whether the status had changed for corresponding devices.
This patch replaces the per-netdev change_seq with a global 'struct seq'
which tracks status change for all ports. In the average case where
ports are not constantly going up or down, this allows us to check the
sequence once per main loop and not poll any ports. In the worst case,
execution is expected to be similar to how it is currently.
In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces average CPU usage of the main thread from about 40% to
about 35%.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When dealing with a large number of ports, one of the performance
bottlenecks is that we loop through all netdevs in the main loop. Miimon
is a contributor to this, checking all devices even if it has never been
enabled.
This patch introduces a counter for the number of netdevs with miimon
configured. If this is 0, then we skip miimon_run() and miimon_wait().
In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces CPU usage from about 50% to about 45%.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We have a call chain like this:
iface_configure_qos() calls
netdev_dump_queues(), which calls
netdev_linux_dump_queues(), which calls back through 'cb' to
qos_unixctl_show_cb(), which calls
netdev_delete_queue(), which calls
netdev_linux_delete_queue().
Both netdev_dump_queues() and netdev_linux_delete_queue() take the same
mutex in the same netdev, which deadlocks.
This commit fixes the problem by getting rid of the callback.
netdev_linux_dump_queue_stats() would benefit from the same treatment but
it's less urgent because I don't see any callbacks from that function that
call back into a netdev function.
Bug #19319.
Reported-by: Scott Hendricks <shendricks@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
htb_parse_qdisc_details__(), which is called with the netdev mutex, called
netdev_get_mtu(), which tried to reacquire the mutex and thus deadlocked.
This commit fixes the problem and similar problems in
htb_parse_class_details__() and hfsc_parse_qdisc_details__().
Bug #19180.
Reported-by: Dhaval Badiani <dbadiani@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
netdev_linux_set_etheraddr() would attempt to recursively acquire
netdev->mutex via netdev_linux_update_flags() for tap devices.
Reported-by: ZhengLingyun <konghuarukhr@163.com>
Tested-by: ZhengLingyun <konghuarukhr@163.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The rtnetlink_link asynchronous netlink notifications seem somewhat
troublesome in a threaded environment. It seems more straightforward
to have netdev-linux fend for itself.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is the same lifecycle used in the ofproto provider interface.
Compared to the previous netdev provider interface, it has the
advantage that the netdev top layer can control when any given
netdev becomes visible to the outside world.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The only uses of 'af_inet_sock', in both drivers, were ioctls, so it seemed
like a good abstraction to write a function that just does such an ioctl,
and to factor out shared code into socket-util.
Signed-off-by: Ben Pfaff <blp@nicira.com>
CC: Ed Maste <emaste@freebsd.org>
This API change is necessary for thread safety, to be added in an upcoming
commit. Otherwise, the client would not be able to safely use the returned
netdev because it could already have been destroyed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This API change is necessary for thread safety, to be added in an upcoming
commit. Otherwise, the client would not be able to actually use any of
the returned netdevs because they could already have been destroyed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The data items returned by netdev_get_devices() are "struct netdev *"s.
The code fixed up by this commit used them as "struct netdev_linux *",
which happens to work because struct netdev happens to be at offset 0 in
each struct but it's better to do a proper cast in case someday
struct netdev gets moved to a nonzero offset.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit fixes the warning issued by 'clang' when pointer is casted
to one with greater alignment.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This disentangles "struct nl_dump" from "struct nl_sock", clearing the way
to make the use of either one thread-safe in an obviously correct manner.
Signed-off-by: Ben Pfaff <blp@nicira.com>
I am using a userspace OVS, there is a constant ERR message in log:
"dpif_netdev|ERR|error receiving data from ovs-netdev: Message too long"
Check the code and find there is a comparison between ssize_t and size_t type in
netdev_rx_linux_recv(). It makes the negative "retval" greater than the "size".
Change the sequence of comparison to avoid negative return value cast to
a positive and become greater then a non-negative value.
Signed-off-by: ZhengLingyun <konghuarukhr@163.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit 796223f5 (netdev: Add new "struct netdev_rx" for capturing packets
from a netdev) refactored send and receive into separate netdevs. As a
result, send and receive now use different socket descriptors (except for tap
interfaces which are treated specially). An unintended side effect was that
all sent packets are looped back and received, which had previously been
avoided as the kernel specifically prevents this from happening on a single
socket descriptor.
To resolve the situation, a socket filter is added to the receive socket
so that it only accepts inbound packets.
Simon Horman co-discovered and initially reported this issue.
Signed-off-by: Murphy McCauley <murphy.mccauley@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Simon Horman <horms@verge.net.au>
Reviewed-by: Simon Horman <horms@verge.net.au>
netdev_turn_flags_off() does nothing if the flags that one turns off are
already off.
Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The distinction between struct netdev_dev and struct netdev has always
been confusing. Now that previous commits have eliminated all interesting
state from struct netdev, this commit deletes it and renames struct
netdev_dev to take its place. Now the situation makes much more sense and
I won't have to continue making embarrassed explanations in the future.
Good riddance.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Separating packet capture from "struct netdev" means that there is no
remaining per-"struct netdev" state, which will allow us to get rid of
"struct netdev_dev" (by renaming it "struct netdev").
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>