Commit 73c85181d (netdev-linux: Read packet auxdata to obtain vlan_tid)
added #include <linux/if_packet.h> to this file, to get the definition
of PACKET_AUXDATA and some other definitions, but on RHEL 6.1 this
provoked compiler errors:
In file included from /usr/include/linux/rtnetlink.h:5,
from lib/netdev-linux.c:34:
/usr/include/linux/netlink.h:34: error: expected specifier-qualifier-list
before 'sa_family_t'
Since the old #includes worked everywhere, and this file already defined
its own versions of most of the new macros that it needed, this commit just
reverts the old #includes and adds the one macro definition it didn't
already have.
(RHEL 6.1 isn't necessarily the only platform where this is a problem, but
it's the first one for which we noticed the problem.)
This switches the definition of sockaddr_ll used from the Linux one, which
uses __be16 for sll_protocol, to the glibc one, which uses plain "unsigned
short int". This makes sparse complain (rightly), so this commit also
adds a sparse-specific header that uses ovs_be16 to prevent the warning.
Signed-off-by: Ben Pfaff <blp@nicira.com>
If VLAN acceleration is used when the kernel receives a packet
then the outer-most VLAN tag will not be present in the packet
when it is received by netdev-linux. Rather, it will be present
in auxdata.
This patch uses recvmsg() instead of recv() to read auxdata for
each packet and if the vlan_tid is set then it is added to the packet.
Adding the vlan_tid makes use of headroom available
in the buffer parameter of rx_recv.
Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Update the netdev_class so that struct ofpbuf * is passed to rx_recv()
to provide both the data and size of the data to read a packet into.
On success, update struct ofpbuf size inside netdev_class rx_recv
implementation and return 0. This moves logic from the caller.
On error a positive error code is returned, whereas previously
a negative error code was returned. This is a more common convention.
This patch should not have any behavioural changes.
This patch is in preparation for the netdev-linux variant of rx_recv()
making use of headroom in the struct ofpbuf * parameter to push a VLAN tag
obtained from auxdata.
Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
There's no need to obtain the ifindex, because RTM_GETLINK is happy to take
the interface name. There's no need to do a full nl_policy_parse(),
because we only need a single attribute.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The OVS kernel module requires 2.6.32 or later, so there's no reason for
userspace to support older kernels. This commit removes the special
fallback code for retrieving Linux netdev stats in pre-2.6.19 kernels,
which should no longer be useful.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Previously, we tracked status changes for ofports on a per-device basis.
Each time in the main thread's loop, we would inspect every ofport
to determine whether the status had changed for corresponding devices.
This patch replaces the per-netdev change_seq with a global 'struct seq'
which tracks status change for all ports. In the average case where
ports are not constantly going up or down, this allows us to check the
sequence once per main loop and not poll any ports. In the worst case,
execution is expected to be similar to how it is currently.
In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces average CPU usage of the main thread from about 40% to
about 35%.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When dealing with a large number of ports, one of the performance
bottlenecks is that we loop through all netdevs in the main loop. Miimon
is a contributor to this, checking all devices even if it has never been
enabled.
This patch introduces a counter for the number of netdevs with miimon
configured. If this is 0, then we skip miimon_run() and miimon_wait().
In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces CPU usage from about 50% to about 45%.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We have a call chain like this:
iface_configure_qos() calls
netdev_dump_queues(), which calls
netdev_linux_dump_queues(), which calls back through 'cb' to
qos_unixctl_show_cb(), which calls
netdev_delete_queue(), which calls
netdev_linux_delete_queue().
Both netdev_dump_queues() and netdev_linux_delete_queue() take the same
mutex in the same netdev, which deadlocks.
This commit fixes the problem by getting rid of the callback.
netdev_linux_dump_queue_stats() would benefit from the same treatment but
it's less urgent because I don't see any callbacks from that function that
call back into a netdev function.
Bug #19319.
Reported-by: Scott Hendricks <shendricks@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
htb_parse_qdisc_details__(), which is called with the netdev mutex, called
netdev_get_mtu(), which tried to reacquire the mutex and thus deadlocked.
This commit fixes the problem and similar problems in
htb_parse_class_details__() and hfsc_parse_qdisc_details__().
Bug #19180.
Reported-by: Dhaval Badiani <dbadiani@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
netdev_linux_set_etheraddr() would attempt to recursively acquire
netdev->mutex via netdev_linux_update_flags() for tap devices.
Reported-by: ZhengLingyun <konghuarukhr@163.com>
Tested-by: ZhengLingyun <konghuarukhr@163.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The rtnetlink_link asynchronous netlink notifications seem somewhat
troublesome in a threaded environment. It seems more straightforward
to have netdev-linux fend for itself.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is the same lifecycle used in the ofproto provider interface.
Compared to the previous netdev provider interface, it has the
advantage that the netdev top layer can control when any given
netdev becomes visible to the outside world.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The only uses of 'af_inet_sock', in both drivers, were ioctls, so it seemed
like a good abstraction to write a function that just does such an ioctl,
and to factor out shared code into socket-util.
Signed-off-by: Ben Pfaff <blp@nicira.com>
CC: Ed Maste <emaste@freebsd.org>
This API change is necessary for thread safety, to be added in an upcoming
commit. Otherwise, the client would not be able to safely use the returned
netdev because it could already have been destroyed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This API change is necessary for thread safety, to be added in an upcoming
commit. Otherwise, the client would not be able to actually use any of
the returned netdevs because they could already have been destroyed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The data items returned by netdev_get_devices() are "struct netdev *"s.
The code fixed up by this commit used them as "struct netdev_linux *",
which happens to work because struct netdev happens to be at offset 0 in
each struct but it's better to do a proper cast in case someday
struct netdev gets moved to a nonzero offset.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit fixes the warning issued by 'clang' when pointer is casted
to one with greater alignment.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This disentangles "struct nl_dump" from "struct nl_sock", clearing the way
to make the use of either one thread-safe in an obviously correct manner.
Signed-off-by: Ben Pfaff <blp@nicira.com>
I am using a userspace OVS, there is a constant ERR message in log:
"dpif_netdev|ERR|error receiving data from ovs-netdev: Message too long"
Check the code and find there is a comparison between ssize_t and size_t type in
netdev_rx_linux_recv(). It makes the negative "retval" greater than the "size".
Change the sequence of comparison to avoid negative return value cast to
a positive and become greater then a non-negative value.
Signed-off-by: ZhengLingyun <konghuarukhr@163.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit 796223f5 (netdev: Add new "struct netdev_rx" for capturing packets
from a netdev) refactored send and receive into separate netdevs. As a
result, send and receive now use different socket descriptors (except for tap
interfaces which are treated specially). An unintended side effect was that
all sent packets are looped back and received, which had previously been
avoided as the kernel specifically prevents this from happening on a single
socket descriptor.
To resolve the situation, a socket filter is added to the receive socket
so that it only accepts inbound packets.
Simon Horman co-discovered and initially reported this issue.
Signed-off-by: Murphy McCauley <murphy.mccauley@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Simon Horman <horms@verge.net.au>
Reviewed-by: Simon Horman <horms@verge.net.au>
netdev_turn_flags_off() does nothing if the flags that one turns off are
already off.
Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The distinction between struct netdev_dev and struct netdev has always
been confusing. Now that previous commits have eliminated all interesting
state from struct netdev, this commit deletes it and renames struct
netdev_dev to take its place. Now the situation makes much more sense and
I won't have to continue making embarrassed explanations in the future.
Good riddance.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Separating packet capture from "struct netdev" means that there is no
remaining per-"struct netdev" state, which will allow us to get rid of
"struct netdev_dev" (by renaming it "struct netdev").
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This gets rid of the only per-instance data in "struct netdev", which
will make it possible to merge "struct netdev_dev" into "struct netdev" in
a later commit.
Ed Maste wrote the netdev-bsd changes in this commit.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Ed Maste <emaste@freebsd.org>
Signed-off-by: Ed Maste <emaste@freebsd.org>
Tested-by: Ed Maste <emaste@freebsd.org>
It's unlikely to fail but checking it can't hurt.
Found by Coverity.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This patch removes the final bit of linux specific code which
prevents building netdev-vport everywhere. With this, other
platforms automatically get access to patch ports, and (if their
datapath supports it), flow based tunneling.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The current method of calculating the ingress policer rate
can lead to inaccuracy if ingress_policing_rate is set to
a smallish values because the rate is divided by 8 first
which causes rounding errors.
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Future patches will need to know the details of a netdev's tunnel
configuration from outside the netdev library.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The only user of netdev_set_stats() is bonding (for updating the
fake interface). This interface is never a vport, so it seems
quite a bit cleaner to keep the relevant code in the netdev-linux
library where it's needed, instead of in netdev-vport, where it
adds needless complexity.
Signed-off-by: Ethan Jackson <ethan@nicira.com>