2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

282 Commits

Author SHA1 Message Date
Ben Pfaff
337c9b9927 netdev-linux: Add support for 64-bit network device stats.
Reported-by: Andrey Korolyov <andrey@xdel.ru>
Tested-by: Andrey Korolyov <andrey@xdel.ru>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-10-29 14:16:29 -07:00
Alin Gabriel Serdean
93451a0a81 dpif-linux: Rename dpif-netlink; change to compile with MSVC.
The patch contains the necessary modifications to compile and also to run
under MSVC.

Added the files to the build system and also changed dpif_linux to be under
a more generic name dpif_windows.

Added a TODO under the windows part in case we want to implement another
counterpart for epoll functions.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-09-18 14:59:59 -07:00
Alex Wang
5496878cbf netdev: Add function for configuring tx and rx queues.
This commit adds a new API to the 'struct netdev_class' which
allows user to configure the number of tx queues and rx queues
of 'netdev'.  Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.

Currently, only netdev-dpdk module implements this function.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:48 -07:00
Pravin B Shelar
2f9dd77fcd ofproto: Do not update stats on fake bond interface.
There are couple of reasons to remove this support:
*   This is used in very old OVS use-case. It is much better
    to read stats directly from OVS.
*   Forthcoming commit will remove support for setting stats
    for vport. The stats update depends on stats-set.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-09-15 10:08:56 -07:00
Alex Wang
f00fa8cbad netdev: Add n_txq to 'struct netdev'.
This commit adds new variable n_txq to 'struct netdev' for recording
the number of tx queues.  Correspondingly, the send_*() functions are
extended to accept queue id as input argument.

All 'netdev-*' implementation will ignore the queue id since having
multiple tx queues is not supported.  Upcomping patches will start
using it and create multiple tx queues for dpdk netdev.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 11:30:58 -07:00
Alex Wang
7dec44fe1c netdev: Add function for getting the numa node id of netdev.
This commit adds a new API to the 'struct netdev_class' which
allows user to query the numa node id the 'netdev' is on.

Currently, only netdev-dpdk module implements this function.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 11:30:58 -07:00
Daniele Di Proietto
61a2647e15 packet-dpif: Add dpif_packet_{get, set}_hash()
These function are used to stored the packet hash. 'netdev-dpdk'
automatically set this value to the RSS hash returned by the
NIC. Other 'netdev's set it to 0 (which is an invalid hash
value), so that callers can compute the hash on their own.

If DPDK support is enabled, struct dpif_packet's member
'dp_hash' is removed and 'pkt.hash.rss' from DPDK mbuf is used

This commit also configure DPDK devices to compute RSS hash
for UDP and IPv6 packets

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-29 16:32:21 -07:00
Thomas Graf
1aca400ce9 netdev-linux: Cast policer rate to uint64_t to avoid overflow
tc_fill_rate() takes a 64bit int, casting kbits_rate from int
to uint64_t avoids a possible overflow when translating from
kbits to bytes.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:47:19 -07:00
Jarno Rajahalme
812c272cde lib/netdev-linux: Use atomic_count for 'miimon_cnt'.
'miimon_cnt' and the actual device miimon configuration is only
loosely coupled, so we can use the relaxed atomic_count for it.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:34:53 -07:00
Ben Pfaff
6a54dedc66 odp-netlink.h: Use 32-bit aligned 64-bit types.
Open vSwitch userspace uses special types to indicate that a particular
object may not be naturally aligned.  Netlink is one source of such
problems: in Netlink, 64-bit integers are often aligned only on 32-bit
boundaries.  This commit changes the odp-netlink.h that is transformed from
<linux/openvswitch.h> to use these types to make it harder to accidentally
access a misaligned 64-bit member.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2014-08-04 11:11:44 -07:00
Daniele Di Proietto
f4fd623c4c netdev: netdev_send accepts multiple packets
The netdev_send function has been modified to accept multiple packets, to
allow netdev providers to amortize locking and queuing costs.
This is especially true for netdev-dpdk.

Later commits exploit the new API.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-23 14:41:13 -07:00
Daniele Di Proietto
910885540a dpif-netdev: use dpif_packet structure for packets
This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-23 14:41:12 -07:00
Ben Pfaff
e5e4b47cc2 Fix log message weird suffixes.
I think these were leftovers from the removal of %z for MSVC that happened
some time ago.

VMware-BZ: 1265762
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
2014-06-11 09:14:54 -07:00
Andy Zhou
04c881eb64 netdev-linux: favor netlink stats for physical ports
Currently physical ports stats are collected from kernel datapath.
However, those counter do not reflect actual wire packet counters
when GSO, TSO or GRO are enabled by the NIC. In the meantime, the
stats collected form routing stack does. While both stats are valid,
Reporting kernel netdev stats for packet counts and byte counts make
it easier to correlate those numbers with external measurements.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-05-02 18:10:46 -07:00
Alex Wang
3e912ffcbb netdev: Add 'change_seq' back to netdev.
This commit can be seen as a partial revert of commit
da4a619179d (netdev: Globally track port status changes)
by adding the 'change_seq' to 'struct netdev'.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-04-10 12:55:28 -07:00
Pravin Shelar
1f317cb5c2 ofpbuf: Introduce access api for base, data and size.
These functions will be used by later patches.  Following patch
does not change functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-03-30 06:18:43 -07:00
Pravin
f77917408a netdev: Rename netdev_rx to netdev_rxq
Preparation for multi queue netdev IO.  There are no functional changes
in this patch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
40d26f04b2 netdev: Send ofpbuf directly to netdev.
DPDK netdev need to access ofpbuf while sending buffer. Following
patch changes netdev_send accordingly.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
df1e5a3bc7 netdev: Extend rx_recv to pass multiple packets.
DPDK can receive multiple packets but current netdev API does
not allow that.  Following patch allows dpif-netdev receive batch
of packet in a rx_recv() call for any netdev port.  This will be
used by dpdk-netdev.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-03-21 11:48:28 -07:00
Pravin Shelar
f35c2d60cd netdev-linux: Remove unnecessary header.
netdev-linux does not depend on linux kernel version. So
remove header file include.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-07 04:03:53 -08:00
Joe Stringer
d57695d77e netlink: Remove buffer from 'struct nl_dump'.
This patch makes all of the users of 'struct nl_dump' allocate their own
buffers to pass down to nl_dump_next(). This paves the way for allowing
multithreaded flow dumping.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-27 14:14:56 -08:00
Ben Pfaff
55bc98d6cb netdev-linux: Fix build break on RHEL 6.1.
Commit 73c85181d (netdev-linux: Read packet auxdata to obtain vlan_tid)
added #include <linux/if_packet.h> to this file, to get the definition
of PACKET_AUXDATA and some other definitions, but on RHEL 6.1 this
provoked compiler errors:

In file included from /usr/include/linux/rtnetlink.h:5,
    from lib/netdev-linux.c:34:
/usr/include/linux/netlink.h:34: error: expected specifier-qualifier-list
    before 'sa_family_t'

Since the old #includes worked everywhere, and this file already defined
its own versions of most of the new macros that it needed, this commit just
reverts the old #includes and adds the one macro definition it didn't
already have.

(RHEL 6.1 isn't necessarily the only platform where this is a problem, but
it's the first one for which we noticed the problem.)

This switches the definition of sockaddr_ll used from the Linux one, which
uses __be16 for sll_protocol, to the glibc one, which uses plain "unsigned
short int".  This makes sparse complain (rightly), so this commit also
adds a sparse-specific header that uses ovs_be16 to prevent the warning.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-16 23:40:43 -08:00
Simon Horman
b73c85181d netdev-linux: Read packet auxdata to obtain vlan_tid
If VLAN acceleration is used when the kernel receives a packet
then the outer-most VLAN tag will not be present in the packet
when it is received by netdev-linux. Rather, it will be present
in auxdata.

This patch uses recvmsg() instead of recv() to read auxdata for
each packet and if the vlan_tid is set then it is added to the packet.

Adding the vlan_tid makes use of headroom available
in the buffer parameter of rx_recv.

Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-16 15:09:14 -08:00
Simon Horman
bfd3367b9e netdev_class: Pass a struct ofpbuf * to rx_recv()
Update the netdev_class so that struct ofpbuf * is passed to rx_recv()
to provide both the data and size of the data to read a packet into.

On success, update struct ofpbuf size inside netdev_class rx_recv
implementation and return 0. This moves logic from the caller.
On error a positive error code is returned, whereas previously
a negative error code was returned. This is a more common convention.

This patch should not have any behavioural changes.

This patch is in preparation for the netdev-linux variant of rx_recv()
making use of headroom in the struct ofpbuf * parameter to push a VLAN tag
obtained from auxdata.

Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-16 14:37:43 -08:00
Ben Pfaff
13a24df8b9 netdev-linux: Simplify get_stats_via_netlink().
There's no need to obtain the ifindex, because RTM_GETLINK is happy to take
the interface name.  There's no need to do a full nl_policy_parse(),
because we only need a single attribute.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-12-30 14:04:21 -08:00
Ben Pfaff
35eef899b3 netdev-linux: Drop support for pre-2.6.19 kernels.
The OVS kernel module requires 2.6.32 or later, so there's no reason for
userspace to support older kernels.  This commit removes the special
fallback code for retrieving Linux netdev stats in pre-2.6.19 kernels,
which should no longer be useful.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-12-30 14:04:21 -08:00
Joe Stringer
da4a619179 netdev: Globally track port status changes
Previously, we tracked status changes for ofports on a per-device basis.
Each time in the main thread's loop, we would inspect every ofport
to determine whether the status had changed for corresponding devices.

This patch replaces the per-netdev change_seq with a global 'struct seq'
which tracks status change for all ports. In the average case where
ports are not constantly going up or down, this allows us to check the
sequence once per main loop and not poll any ports. In the worst case,
execution is expected to be similar to how it is currently.

In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces average CPU usage of the main thread from about 40% to
about 35%.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2013-12-12 15:03:19 -08:00
Alin Serdean
34582733d9 Avoid printf type modifiers not supported by MSVC C runtime library.
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers.  This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary.  It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.

Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-11-25 23:38:59 -08:00
Ben Pfaff
c2c28dfd68 Switch from sscanf() to ovs_scan() throughout the tree.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-11-15 08:55:02 -08:00
Joe Stringer
19c8e9c11b netdev-linux: Skip miimon execution when disabled
When dealing with a large number of ports, one of the performance
bottlenecks is that we loop through all netdevs in the main loop. Miimon
is a contributor to this, checking all devices even if it has never been
enabled.

This patch introduces a counter for the number of netdevs with miimon
configured. If this is 0, then we skip miimon_run() and miimon_wait().
In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces CPU usage from about 50% to about 45%.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-11-02 20:53:10 -07:00
Alexandru Copot
7ba19d412a netdev: update IFF_LOOPBACK flag for linux and bsd devices
Signed-off-by: Alexandru Copot <alex.mihai.c@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-09-07 09:33:51 -07:00
Ben Pfaff
89454bf477 netdev: Fix deadlock when netdev_dump_queues() callback calls into netdev.
We have a call chain like this:

    iface_configure_qos() calls
        netdev_dump_queues(), which calls
            netdev_linux_dump_queues(), which calls back through 'cb' to
                qos_unixctl_show_cb(), which calls
                    netdev_delete_queue(), which calls
                        netdev_linux_delete_queue().

Both netdev_dump_queues() and netdev_linux_delete_queue() take the same
mutex in the same netdev, which deadlocks.

This commit fixes the problem by getting rid of the callback.

netdev_linux_dump_queue_stats() would benefit from the same treatment but
it's less urgent because I don't see any callbacks from that function that
call back into a netdev function.

Bug #19319.
Reported-by: Scott Hendricks <shendricks@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-27 21:50:40 -07:00
Ben Pfaff
834d6cafe4 Use "error-checking" mutexes in place of other kinds wherever possible.
We've seen a number of deadlocks in the tree since thread safety was
introduced.  So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively.  When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.

POSIX "error-checking" mutexes check for this specific problem (and
others).  This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes.  This ought to help find problems more
quickly in the future.

There might be performance advantages to other kinds of mutexes in some
cases.  However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases.  Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2013-08-20 13:40:02 -07:00
Ben Pfaff
73371c09f9 netdev-linux: Fix self-deadlocks in traffic control code.
htb_parse_qdisc_details__(), which is called with the netdev mutex, called
netdev_get_mtu(), which tried to reacquire the mutex and thus deadlocked.
This commit fixes the problem and similar problems in
htb_parse_class_details__() and hfsc_parse_qdisc_details__().

Bug #19180.
Reported-by: Dhaval Badiani <dbadiani@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-16 14:25:16 -07:00
Ben Pfaff
4f9f3f2135 netdev-linux: Avoid deadlock in netdev_linux_update_flags() for taps.
netdev_linux_set_etheraddr() would attempt to recursively acquire
netdev->mutex via netdev_linux_update_flags() for tap devices.

Reported-by: ZhengLingyun <konghuarukhr@163.com>
Tested-by: ZhengLingyun <konghuarukhr@163.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-10 20:42:50 -07:00
Ben Pfaff
38e0065b1f netdev-linux: Fix netdev leak in corner case.
Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-10 09:02:56 -07:00
Ben Pfaff
863838160e netdev: Make netdev access thread-safe.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-09 21:34:02 -07:00
Ben Pfaff
cee8733832 netdev-linux: Use dedicated netlink notification socket.
The rtnetlink_link asynchronous netlink notifications seem somewhat
troublesome in a threaded environment.  It seems more straightforward
to have netdev-linux fend for itself.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-09 21:29:03 -07:00
Ben Pfaff
9dc63482bb netdev: Adopt four-step alloc/construct/destruct/dealloc lifecycle.
This is the same lifecycle used in the ofproto provider interface.
Compared to the previous netdev provider interface, it has the
advantage that the netdev top layer can control when any given
netdev becomes visible to the outside world.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-09 21:21:38 -07:00
Ben Pfaff
259e0b1ad1 netdev-linux, netdev-bsd: Make access to AF_INET socket thread-safe.
The only uses of 'af_inet_sock', in both drivers, were ioctls, so it seemed
like a good abstraction to write a function that just does such an ioctl,
and to factor out shared code into socket-util.

Signed-off-by: Ben Pfaff <blp@nicira.com>
CC: Ed Maste <emaste@freebsd.org>
2013-08-09 21:14:23 -07:00
Ben Pfaff
991e5fae57 netdev: Make netdev_from_name() take a reference to its returned netdev.
This API change is necessary for thread safety, to be added in an upcoming
commit.  Otherwise, the client would not be able to safely use the returned
netdev because it could already have been destroyed.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-07 23:52:43 -07:00
Ben Pfaff
2f980d7417 netdev: Make netdev_get_devices() take a reference to each netdev.
This API change is necessary for thread safety, to be added in an upcoming
commit.  Otherwise, the client would not be able to actually use any of
the returned netdevs because they could already have been destroyed.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-07 23:52:42 -07:00
Ben Pfaff
c22762302c netdev-linux: Move variable declaration inward in netdev_linux_cache_cb().
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-07 23:40:32 -07:00
Ben Pfaff
887ed8b26f netdev-linux: Remove useless member 'peer', which was always zero.
Always, correct a comment on netdev_linux_get_features().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-07 23:40:32 -07:00
Ben Pfaff
1755ec4004 netdev-linux: Remove unused struct netdev_linux member.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-07 23:40:32 -07:00
Ben Pfaff
d0d08f8aa5 netdev-linux: Remove pointless layers of indirection for tap devices.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2013-08-07 23:40:31 -07:00
Ben Pfaff
15aee11695 netdev: Minor formatting improvements.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-02 12:24:19 -07:00
Ben Pfaff
96172faa34 netdev-linux: Don't assume 'struct netdev' has offset 0.
The data items returned by netdev_get_devices() are "struct netdev *"s.
The code fixed up by this commit used them as "struct netdev_linux *",
which happens to work because struct netdev happens to be at offset 0 in
each struct but it's better to do a proper cast in case someday
struct netdev gets moved to a nonzero offset.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-02 12:24:18 -07:00
Ben Pfaff
2e5ae318d5 netdev-linux: Initialize change_seq for tap devices too.
change_seq is supposed to always be nonzero but tap devices got this wrong.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-02 12:24:18 -07:00
Ben Pfaff
f61d8d2931 netdev-linux: Fix fd leak on error path.
Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-08-02 12:24:18 -07:00