Commit 5a0340 (dpif-netdev: Create multiple tx/rx queues when
adding dpdk interface.) introduced a bug which causes the function
netdev_dpdk_set_multiq() never resetting the tx queues. This bug
could cause pmd thread accessing unassigned memory, resulting in
segfault.
This commit fixes the bug.
Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Since dpdk_do_tx_copy() will be called by both pmd and
non-pmd thread, it should take the queue id as input.
The current ovs always uses NON_PMD_THREAD_TX_QUEUE
as queue id, which causes unprotected multi-access
to the same queue.
This commit fixes the issue by passing the queue id
to dpdk_do_tx_copy().
Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
This code handles an event notification subscription for a user mode thread
which joins an MC group. The event wait handler queues an IRP which is
completed upon change in a port state.
Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Acked-by: Samuel Ghinet <sghinet@cloudbasesolutions.com>
The patch contains the necessary modifications to compile and also to run
under MSVC.
Added the files to the build system and also changed dpif_linux to be under
a more generic name dpif_windows.
Added a TODO under the windows part in case we want to implement another
counterpart for epoll functions.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Frequently we've run into controller bugs which result in hundreds of
thousands, or even millions of rules being installed in an OpenFlow
table. This isn't something trouble-shooters naturally think of to
check for, so it's nice to have a low rate warning message to hint at
the potential problem.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This optimization is done to avoid calling count_1bits(), which, if
the popcnt istruction is not available might is slow. popcnt may not
be available because:
- We are running on old hardware
- (more likely) We're using a generic build (i.e. packaged OVS from a
distro), not tuned for the specific CPU
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
netdev_flow_key is a miniflow with the following constraints:
1) It is used only inside dpif-netdev.c.
2) It always has inline values.
3) It contains only miniflows created by miniflow_extract().
Therefore, by using these new functions instead of the miniflow_*
ones, we get the following (performance related) benefits:
- Because of (1) the functions can be inlined.
- Because of (2) and (3) the netdev_flow_key can be treated as POD.
Specifically, because of (3), we can do comparisons with memcmp,
since if the map is different the miniflow must be different.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This optimization should give a small performance benefit to the userspace
datapath.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Added the optional flag in policy structure. This would allow
caller to avoid checks for mandatory attributes if parsing
succeeds.
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Added minor fix for allowing support for variable lenghth attributes in
parsing policy.
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
They have slightly different support characteristics, so it's nice to
easily switch between them for testing.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
They may or may not make a difference, but there's no reason not to
support passing them.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
In this change:
1. we refactor the code that fills up information about the DP into
a seprate function.
2. use the netlink set APIs to fill up the netlink attributes.
3. we define a OVS_DP_STATS to be a typedef of 'struct ovs_dp_stats'
in keeping with the Windows kernel naming conventions.
4. In the absence of netlink set API, I had put in an ASSERT earlier
that the output buffer should be limited to 512 bytes. This is not
true anymore. The netlink set API checks for bounds of the buffer.
Hence removed the ASSERT.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
These warnings were introduced by
commit 7d031d7e51
("ofproto-dpif-xlate: Work around Linux netdev_max_backlog limit.")
and found by --enable-Werror build on NetBSD.
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
To prevent warnings such as "Not all control paths return a value",
we should define NO_RETURN for MSVC.
Currently for gcc, we add NO_RETURN at the end of function declaration.
But for MSVC, "__declspec(noreturn)" is needed at the beginning of function
declaration. So this commit moves NO_RETURN to the beginning of the function
declaration as it works with gcc and clang too.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
MSVC complains about a void function returning a value if there is a
statement of the form - 'return foo()' even if foo() has a void return
type.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
MSVC does not support c11 style atomics for the C compiler.
Windows has different InterLocked* functions for different data
sizes. ovs-atomic-msvc.h maps the api in ovs-atomic.h (which is similar
to c11 atomics) to the available atomic functions in Windows. In some
cases, this causes compiler warnings about mismatched data sizes because
the generated code has 'if else' conditions on different data sizes and
proper casting is not possible.
In current OVS code base, we get one compiler warning through ovs-rcu.h
which says "‘void *’ differs in levels of indirection from LONGLONG."
This comes from the following in ovs-atomic-msvc.h for atomic_read64():
*(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);
when *DST is a void pointer (because InterLockedOr64 returns LONGLONG).
But this code path is only every hit for 64 bit data. So it should be safe to
disable the warning. (Any real bugs in api calls would hopefully be caught
while compiling on Linux using gcc/clang).
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
dpdk_eth_dev_init() must be called with dpdk_mutex. However,
netdev_dpdk_set_multiq() fails to follow this rule. This commit
fixes this breach.
Found by clang.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
With the separation of tx queue and rx queue configuration
in netdev-dpdk module, the netdev_dpdk_get_config() can no
longer report 'n_rxq' as tx queue configuration.
This commit fixes the above issue.
Reported-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
With this commit, ovs by default will create one pmd thread
for each numa node and pin the pmd thread to available cpu
core on the numa node.
NON_PMD_CORE_ID (currently 0) is used to reserve a particular
cpu core for the I/O of all non-pmd threads. No pmd thread
can be pinned to this reserved core.
As side-effects of this commit:
- pmd thread will not be created, if there is no dpdk interface
from the corresponding numa node added to ovs.
- the exact-match cache for non-pmd threads is removed from
'struct dp_netdev'. Instead, all non-pmd threads will use
the exact-match cache defined in the 'struct dp_netdev_pmd_thread'
for NON_PMD_CORE_ID.
- the rx packet processing functions are refactored to use
'struct dp_netdev_pmd_thread' as input.
- the 'netdev_send()' function will be called with the proper
queue id.
- both pmd and non-pmd threads can call the dpif_netdev_execute().
so, use a per-thread key to help recognize the calling thread.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The previous commit makes OVS create one tx queue for each
cpu core, each pmd thread will use a separate tx queue.
Also, tx of non-pmd threads on dpdk interface is all through
'NON_PMD_THREAD_TX_QUEUE', protected by the 'nonpmd_mempool_mutex'.
Therefore, the spinlock is no longer needed. And this commit
removes it from 'struct dpdk_tx_queue'.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Previous commit makes OVS create one tx queue for each cpu
core. An upcoming patch will allow multiple pmd threads be
created and pinned to cpu cores. So each pmd thread will use
the tx queue corresponding to its core id.
Moreover, the pmd threads running on different numa node than
the dpdk interface (called non-local pmd thread) will not
handle the rx of the interface. Consequently, there need to
be a way to flush the tx queues of the non-local pmd threads.
To address the queue flushing issue, this commit introduces a
new flag 'flush_tx' in the 'struct dpdk_tx_queue' which is
set if the queue is to be used by a non-local pmd thread.
Then, when enqueueing the tx pkts, if the flag is set, the tx
queue will always be flushed immediately after the enqueue.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Before this commit, ovs creates one tx and one rx queue for
each dpdk interface and uses only one poll thread for handling
I/O of all dpdk interfaces. An upcoming patch will allow multiple
poll threads be created. As a preparation, this commit changes
the dpif-netdev to create multiple tx/rx queues when the dpdk
interface is added.
Specifically, the number of rx queues will still be one per-dpdk
interface for this commit. But upcoming work will allow user
create multiple rx queues. The number of tx queues will be the
number of cpu cores on the machine. Although not all the tx queues
will be used, each poll thread will have its own queue for
transmission on the dpdk interface.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to configure the number of tx queues and rx queues
of 'netdev'. Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
There are couple of reasons to remove this support:
* This is used in very old OVS use-case. It is much better
to read stats directly from OVS.
* Forthcoming commit will remove support for setting stats
for vport. The stats update depends on stats-set.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Current autoconfig detection logic for HAVE_PER_CPU_PTR is not robust.
Depends on linux kernel version, the definition can be in either
linux/percpu.h or asm/percpu.h
Turns out it is simpler and safer to handle missing percpu.h
definitions in linux/percpu.h rather than asm/percpu.h. With this
change, there is no need for the autoconfig detection logic above.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
--enable-dummy was useless anyway for ovsdb-server. Now it is an error to pass
it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
The datapath max_idle value determines how long to wait before deleting
an idle datapath flow when operating below the flow_limit. This patch
increases the max_idle to 10 seconds, which allows datapath flows to be
remain cached even if they are used less consistently, and provides a
small improvement in the supported number of flows when operating around
the flow_limit.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Openvswitch defines u64_stats_sync as ->sync rather than ->syncp,
so fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.
The following Coccinelle semantic patch was used for making the change:
@@
expression e;
@@
- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
|| ...
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.
The following Coccinelle semantic patch was used:
@@
@@
(
if(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
|
while(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit adds new variable n_txq to 'struct netdev' for recording
the number of tx queues. Correspondingly, the send_*() functions are
extended to accept queue id as input argument.
All 'netdev-*' implementation will ignore the queue id since having
multiple tx queues is not supported. Upcomping patches will start
using it and create multiple tx queues for dpdk netdev.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to query the numa node id the 'netdev' is on.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
On current master, the per-thread callback event set is flushed
when ovsrcu_quiesce_start() is called or when the callback
event set is full. For threads that only call 'ovsrcu_quiesce()'
to indicate quiescient state, their callback event set will not
be flushed for execution until the set is full. And this could
take a very long time.
Theoretically, this should not be an issue, since rcu postponed
callback events should only free the old version of objects.
However, current ovs does not follow this rule, and some callback
events include other activities like unregistering the netdev
from global name-netdev map. The delay of unregistering the netdev
(by threads that only calls ovsrcu_quiesce()) will prevent the
recreate of same netdev indefinitely.
As a short-term workaround, this commit makes every call to
ovsrcu_quiesce() flush the callback event set. In the long run,
there will be a refactor of the use of ovs-rcu module, in which all
callback events only free the old version of objects.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit b2a0daa5bd (debian: Don't recreate bridges during manual restart.)
added a check on $RUNLEVEL to only create bridges and ports when the
system starts up. This fix does not work with systemd.
This commit uses a different approach to solve the same problem.
Reported-at: https://bugs.debian.org/686518
Reported-by: Philipp S. Schmidt <phils@in-panik.de>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Tested-by: Philipp S. Schmidt <phils@in-panik.de>