There isn't any significant downside to making cmap iteration "safe" all
the time, so this drops the _SAFE variant.
Similar changes to CMAP_CURSOR_FOR_EACH and CMAP_CURSOR_FOR_EACH_CONTINUE.
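A minimal sketch of what "safe" iteration means, using a hypothetical
singly linked list instead of the real cmap. The names 'struct node' and
NODE_FOR_EACH are illustrative assumptions, not OVS identifiers. The macro
loads the next pointer before the loop body runs, so the body may free the
current node:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; not the real cmap_node. */
struct node {
    struct node *next;
    int value;
};

/* Always-"safe" iteration: NEXT is loaded before the body runs, so the
 * body may unlink and free NODE without breaking the loop. */
#define NODE_FOR_EACH(NODE, NEXT, HEAD)                         \
    for ((NODE) = (HEAD);                                       \
         (NODE) != NULL && ((NEXT) = (NODE)->next, 1);          \
         (NODE) = (NEXT))

/* Pushes a new node with VALUE onto the front of the list at *HEADP. */
static void
list_push(struct node **headp, int value)
{
    struct node *n = malloc(sizeof *n);

    assert(n != NULL);
    n->value = value;
    n->next = *headp;
    *headp = n;
}

/* Frees every node while iterating and returns the sum of the values. */
static int
list_sum_and_free(struct node **headp)
{
    struct node *node, *next;
    int sum = 0;

    NODE_FOR_EACH (node, next, *headp) {
        sum += node->value;
        free(node);             /* Safe: 'next' was already loaded. */
    }
    *headp = NULL;
    return sum;
}
```

The cost of the always-safe form in this sketch is one extra pointer
variable per loop.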
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Typically, kernel datapath threads send upcalls to userspace where
handler threads process the upcalls. For TAP and DPDK devices, the
datapath threads operate in userspace, so there is no need for
separate handler threads.
This patch allows userspace datapath threads to directly call the
ofproto upcall functions, eliminating the need for handler threads
for datapaths of type 'netdev'.
Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Commit db73f7166a6 (netdev-dpdk: Fix race condition with DPDK mempools in
non pmd threads) switched to a new way of setting up 'upcall->packet', but
only initialized two of the fields in the packet. This could cause
core dumps and other strange behavior. In particular it caused failures in
several unit tests on XenServer.
This commit fixes the problem by initializing the entire ofpbuf.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
struct 'miniflow' already contains MINI_N_INLINE values, therefore
we can save a few bytes in netdev_flow_key.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
CMAP_FOR_EACH and CLS_FOR_EACH and their variants tried to use void ** as
a "pointer to any kind of pointer". That is a violation of the aliasing
rules in ISO C which technically yields undefined behavior. With GCC 4.1,
it causes both warnings and actual misbehavior. One option would be to add
-fno-strict-aliasing to the compiler flags, but that would only help with
GCC; who knows whether this can be worked around with other compilers.
Instead, this commit rewrites the iterators to avoid disallowed pointer
aliasing.
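A minimal sketch of the aliasing problem and one conforming alternative.
The names table_first() and TABLE_FIRST are hypothetical and do not
reflect how the real iterators were rewritten. The idea is to return a
plain 'void *' and let the caller assign it to its own typed pointer:

```c
#include <assert.h>
#include <stddef.h>

struct item {
    int id;
};

/* Problematic pattern (shown only as a comment): storing through a
 * 'void **' that really points to a 'struct item *' accesses the object
 * through an lvalue of an incompatible type, which violates the ISO C
 * aliasing rules:
 *
 *     static void set_any(void **pp, void *value) { *pp = value; }
 *     struct item *it;
 *     set_any((void **) &it, value);
 */

/* Conforming alternative: return a 'void *'.  It converts implicitly to
 * any object pointer type, so no pointer-to-pointer cast is needed. */
static void *
table_first(void *const *slots, size_t n_slots)
{
    for (size_t i = 0; i < n_slots; i++) {
        if (slots[i]) {
            return slots[i];
        }
    }
    return NULL;
}

/* Assigns the first non-null slot to the typed pointer VAR. */
#define TABLE_FIRST(VAR, SLOTS, N_SLOTS) \
    ((VAR) = table_first(SLOTS, N_SLOTS))
```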
VMware-BZ: #1287651
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.
This commit resolves the issue with the following changes:
- Every non pmd thread has the same lcore_id (0, for management reasons), which
is not shared with any pmd thread (lcore_ids for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
threads: therefore this commit partially reverts 143859ec63d45e. Now packets
are copied for upcall processing. We can remove the extra memcpy by
processing upcalls in the pmd thread itself.
With the introduction of the extra locking, the packet throughput will be lower
in the following cases:
- When using internal (tap) devices with DPDK devices on the same datapath.
Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
can be avoided by handling the upcalls directly in pmd threads (a change that
has already been proposed by Ryan Wilson)
Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
mempools
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The userspace datapath returns RCU-protected actions from flow_get() and
flow_dump_next(). This doesn't cause any trouble for current users of
these functions, but it imposes additional constraints on their use.
This patch makes the dpif documentation more explicit about how the
results of these functions can be used.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Change the interface to allow implementations to pass back a buffer, and
allow callers to specify which of actions, mask, and stats they wish to
receive. This will be used in the next commit.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Now that all the relevant classifier structures use RCU and internal
mutual exclusion for modifications, we can remove the fat-rwlock and
thus make the classifier lookups lockless.
As the readers are operating concurrently with the writers, a
concurrent reader may or may not see a new rule being added by a
writer, depending on how the concurrent events overlap with each
other. Overall, this is no different from the former locked behavior,
but there the visibility of the new rule only depended on the timing
of the locking functions.
A new rule is first added to the segment indices, so the readers may
find the rule in the indices before the rule is visible in the
subtables 'rules' map. This may result in us losing the opportunity
to quit lookups earlier, resulting in sub-optimal wildcarding. This
will be fixed by forthcoming revalidation always scheduled after flow
table changes.
Similar behavior may happen due to us removing the overlapping rule
(if any) from the indices only after the corresponding new rule has
been added.
The subtable's max priority is updated only after a rule is inserted
to the maps, so the concurrent readers may not see the rule, as the
updated priority ordered subtable list will only be visible after the
subtable's max priority is updated.
Similarly, the classifier's partitions are updated by the caller after
the rule is inserted to the maps, so the readers may keep skipping the
subtable until they see the updated partitions.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
After a quick analysis, in most cases the access to refcounted objects
is clearly protected either with an explicit lock/mutex, or RCU. There
are only a few places where I left a call to ovs_refcount_unref().
Upon closer analysis it may well be that those could also use the
relaxed form.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This requires less locking and makes introducing lockless classifier
lookups possible.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The new name "packet_batch" is a bit more straightforward.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This commit fixes memory leaks in dp_execute_cb() in two cases:
- when the output port cannot be found
- when the recirculation depth is exceeded
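A minimal sketch of the pattern behind this kind of leak, with
hypothetical names (execute_output(), MAX_RECIRC_DEPTH, port_exists()).
It is not the actual dp_execute_cb() code. When the callee owns the
packet, every exit path has to free it, including both failure cases:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_RECIRC_DEPTH 5      /* Hypothetical limit. */

struct packet {
    int out_port;
};

static int n_freed;             /* Counts frees, for illustration only. */

static void
packet_free(struct packet *p)
{
    free(p);
    n_freed++;
}

/* Returns true if PORT exists.  Stand-in for a real port lookup. */
static bool
port_exists(int port)
{
    return port >= 0 && port < 4;
}

/* Consumes PACKET when MAY_STEAL is true.  Every path must therefore
 * free the packet, not only the successful one. */
static bool
execute_output(struct packet *packet, int recirc_depth, bool may_steal)
{
    bool ok = recirc_depth < MAX_RECIRC_DEPTH
              && port_exists(packet->out_port);

    /* A real implementation would transmit here when 'ok' is true. */
    if (may_steal) {
        packet_free(packet);    /* Runs on success and on both failures. */
    }
    return ok;
}
```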
Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
miniflow_destroy() needs to be called after using miniflow_init().
Otherwise, if the miniflow mallocs data, then a memory leak may
occur.
Found by inspection.
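A minimal sketch of the init/destroy pairing, using a hypothetical
'struct mini_sketch' rather than the real miniflow. Small value sets live
inline; larger ones are heap-allocated and leak unless the destroy
function is called:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_N_INLINE 2       /* Hypothetical inline capacity. */

/* Hypothetical stand-in for a miniflow. */
struct mini_sketch {
    uint32_t inline_values[SKETCH_N_INLINE];
    uint32_t *values;           /* Points at inline_values or heap memory. */
    size_t n;
};

static int n_live_allocs;       /* Outstanding heap blocks, for illustration. */

static void
mini_sketch_init(struct mini_sketch *m, const uint32_t *src, size_t n)
{
    if (n <= SKETCH_N_INLINE) {
        m->values = m->inline_values;
    } else {
        m->values = malloc(n * sizeof *m->values);
        assert(m->values != NULL);
        n_live_allocs++;
    }
    memcpy(m->values, src, n * sizeof *m->values);
    m->n = n;
}

/* Must be paired with every mini_sketch_init() call; frees the heap
 * block if one was allocated. */
static void
mini_sketch_destroy(struct mini_sketch *m)
{
    if (m->values != m->inline_values) {
        free(m->values);
        n_live_allocs--;
    }
}
```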
Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Previously, flows were retrieved one by one when dumping flows for
datapaths of type 'netdev'. This increased contention for the dump's
mutex, negatively affecting revalidator performance.
This patch retrieves batches of flows when dumping flows for datapaths
of type 'netdev'.
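A minimal sketch of why batching reduces contention, assuming a
hypothetical 'struct flow_dump' whose state is guarded by one mutex. It
is not the real dpif-netdev dump code. Each call takes the mutex once and
copies up to max_flows entries, instead of locking once per flow:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct flow_dump {
    pthread_mutex_t mutex;
    const int *flows;       /* Hypothetical flow table. */
    size_t n_flows;
    size_t pos;             /* Next flow to dump. */
    unsigned int n_locks;   /* Lock acquisitions, for illustration. */
};

static void
flow_dump_init(struct flow_dump *dump, const int *flows, size_t n_flows)
{
    pthread_mutex_init(&dump->mutex, NULL);
    dump->flows = flows;
    dump->n_flows = n_flows;
    dump->pos = 0;
    dump->n_locks = 0;
}

/* Copies up to MAX_FLOWS flows into OUT under a single lock acquisition
 * and returns how many were copied (0 when the dump is complete). */
static size_t
flow_dump_next(struct flow_dump *dump, int *out, size_t max_flows)
{
    size_t n = 0;

    pthread_mutex_lock(&dump->mutex);
    dump->n_locks++;
    while (n < max_flows && dump->pos < dump->n_flows) {
        out[n++] = dump->flows[dump->pos++];
    }
    pthread_mutex_unlock(&dump->mutex);
    return n;
}
```

In this sketch, dumping 10 flows in batches of 4 takes the mutex 4 times
(including the final empty call) rather than 11 times with one flow per
call.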
Signed-off-by: Ryan Wilson <wryan@nicira.com>
[blp@nicira.com relaxed max_flows restriction]
Signed-off-by: Ben Pfaff <blp@nicira.com>
In dp_netdev_input() we never fully covered the case where handler queues are
not there.
With this change we increment the stat counter and free the packet.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This change in dpif-netdev allows faster packet processing for devices which
implement batching (netdev-dpdk currently).
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The netdev_send function has been modified to accept multiple packets, to
allow netdev providers to amortize locking and queuing costs.
This is especially true for netdev-dpdk.
Later commits exploit the new API.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Since dpif_netdev_enumerate() is used for "netdev" and "dummy" class, it
incorrectly lists dpif-netdevs as "dummy" and vice versa.
This patch addresses the issue by changing the dpif-provider interface: a
dpif_class parameter is passed to the 'enumerate' call to match the right class.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This further eases porting existing hmap code to use cmap instead.
The iterator variants taking an explicit cursor are retained (renamed)
as they are needed when iteration is to be continued from the last
iterated node.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Commit 87400a3d4cc4a (dpif-netdev: Fix use-after-free in port_unref().)
fixed one use-after-free in the common case of port_unref(). However,
there was another, similar case: if port->netdev has no rxqs, then
the netdev_close() causes port->netdev to be destroyed and thus the
following call to netdev_n_rxq() accesses freed memory. This commit fixes
the problem.
Found by valgrind.
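A minimal sketch of one way to avoid this class of bug, with hypothetical
types ('struct dev_sketch', 'struct port_sketch'). It is not the actual
port_unref() fix. Anything needed from the device is copied out before
the reference is dropped, because the close may free the device:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted device; not the real struct netdev. */
struct dev_sketch {
    int ref_cnt;
    int n_rxq;
};

struct port_sketch {
    struct dev_sketch *dev;
};

static int n_destroyed;         /* Counts destroyed devices, for illustration. */

static struct dev_sketch *
dev_open(int n_rxq)
{
    struct dev_sketch *dev = malloc(sizeof *dev);

    assert(dev != NULL);
    dev->ref_cnt = 1;
    dev->n_rxq = n_rxq;
    return dev;
}

/* Drops one reference and frees DEV when the last one goes away. */
static void
dev_close(struct dev_sketch *dev)
{
    if (--dev->ref_cnt == 0) {
        free(dev);
        n_destroyed++;
    }
}

/* Releases the port's device and returns how many rx queues it had.
 * The count is copied out BEFORE dev_close(); reading dev->n_rxq
 * afterwards would be a use-after-free. */
static int
port_release(struct port_sketch *port)
{
    int n_rxq = port->dev->n_rxq;

    dev_close(port->dev);
    port->dev = NULL;
    return n_rxq;
}
```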
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
When a bridge of datapath type 'netdev' receives a packet, it
copies the packet from the NIC to a buffer in userspace.
Currently, when making an upcall, the packet is again copied
to the upcall's buffer. However, this extra copy is not
necessary when the datapath exists in userspace as the upcall
can directly access the packet data.
This patch eliminates this extra copy of the packet data in
most cases. In cases where the packet may still be used later
by callers of dp_netdev_execute_actions, making a copy of the
packet data is still necessary.
This patch also adds a dpdk_buf field to 'struct ofpbuf' when
using DPDK. This field holds a pointer to the allocated DPDK
buffer in the rte_mempool. Thus, an upcall packet ofpbuf
allocated on the stack can now share data and free memory of
a rte_mempool allocated ofpbuf.
Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The test added in this commit would have caught the bug fixed by commit
96be8de595150 (bridge: When ports disappear from a datapath, add them
back.). With that commit reverted, the new test fails.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
When the last rxq is closed (which releases the rxq's internal reference
to its netdev) the next call to netdev_n_rxq() accesses freed memory.
Found by valgrind.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Commit a6ce4b9d251 (ofproto-dpif-upcall: Avoid use-after-free in
revalidate() corner case.) showed that it is somewhat tricky to correctly
use the existing dpif flow dumping interface to obtain batches of flows.
One has to be careful about calling dpif_flow_dump_next_may_destroy_keys()
before going on to the next flow.
A better interface is possible, one that is naturally oriented toward
retrieving batches when that is a useful optimization. This commit
replaces the dpif interface by such a design, and updates both the
implementations and the callers to adopt it.
This is a fairly large change, but I think that the code in
ofproto-dpif-upcall is easier to understand after the change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The userspace and kernel datapaths previously differed on their
treatment of the recirc_id and dp_hash fields when sending upcalls.
While the kernel datapath would always serialise these fields, the
userspace would not. When using the userspace datapath, this would cause
a mismatch between the odp flow key in an upcall compared to the one
that is serialised upon flow_dump.
This patch brings the userspace datapath behaviour back in line with the
kernel datapath by always serialising recirc_id and dp_hash to odp.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Simplify code and update comments after commit 61e7deb1
("dpif-netdev: Use RCU to protect data.").
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
This allows use of miniflows that have all of their values inline.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Thread names are occasionally very useful for debugging, but from time to
time we've forgotten to set one. This commit adds the new thread's name
as a parameter to the function to start a thread, to make that mistake
impossible. This also simplifies code, since two function calls become
only one.
This makes a few other changes to the thread creation function:
* Since it is no longer a direct wrapper around a pthread function,
rename it to avoid giving that impression.
* Remove 'pthread_attr_t *' param that every caller supplied as NULL.
* Change 'pthread *' parameter into a return value, for convenience.
The system-stats code hadn't set a thread name, so this fixes that issue.
This patch is a prerequisite for making RCU report the name of a thread
that is blocking RCU synchronization, because the easiest way to do that is
for ovsrcu_quiesce_end() to record the current thread's name.
ovsrcu_quiesce_end() is called before the thread function is called, so it
won't get a name set within the thread function itself. Setting the thread
name earlier, as in this patch, avoids the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
Rename hash_bias to hash_basis to make it consistent with similar
usages.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Currently netlink flow (and mask) recirc_id attribute is only
serialized when the recirc_id value is non-zero. For this logic
to work correctly, the interpretation of the missing recirc_id
depends on whether the datapath supports recirculation.
This patch removes the ambiguity of the meaning of a missing recirc_id
attribute in a netlink message. When recirc_id is non-zero, or when
it is not a wildcard match, both key and mask attributes are
serialized. On the other hand, when recirc_id is zero, and being
wildcarded, they are not serialized. A missing recirc_id key and
mask attribute thus should always be interpreted as wildcard,
same as other flow fields.
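The rule above can be sketched as two small predicates. The function
names are hypothetical, and the sketch assumes a mask of zero means "fully
wildcarded":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Writer side: serialize the recirc_id key and mask attributes unless
 * the key is zero AND the field is fully wildcarded. */
static bool
should_serialize_recirc_id(uint32_t key, uint32_t mask)
{
    return key != 0 || mask != 0;
}

/* Reader side: a missing attribute always means "wildcarded", with no
 * dependence on whether the datapath supports recirculation. */
static bool
recirc_id_is_wildcarded(bool attr_present, uint32_t mask)
{
    return !attr_present || mask == 0;
}
```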
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Use miniflow as a flow key in the userspace datapath classifier. The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Currently the recirculation action can optionally compute a hash. This
patch adds a hash action that is independent of the recirc action, which
no longer computes a hash. For a megaflow bond with recirc, the output
to a bond port action will look like:
hash(hash_l4(0)), recirc(<recirc_id>)
A recirculation application that does not depend on the hash value
can simply use the recirc action alone.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
If the actions executed during recirculation changed metadata fields,
then any actions after the recirculation returns would see those new
values. Now, all metadata are saved and restored across a recirculation.
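A minimal sketch of the save-and-restore idea, with a hypothetical
metadata struct and callback type. The real field set and call structure
in OVS may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of per-packet metadata. */
struct md_sketch {
    uint32_t recirc_id;
    uint32_t dp_hash;
    uint32_t skb_mark;
};

/* Runs ACTIONS as a recirculation pass with RECIRC_ID, then restores
 * the caller's metadata so later actions see the original values. */
static void
recirculate(struct md_sketch *md, uint32_t recirc_id,
            void (*actions)(struct md_sketch *))
{
    struct md_sketch saved = *md;   /* Save all metadata. */

    md->recirc_id = recirc_id;
    actions(md);                    /* May modify any metadata field. */
    *md = saved;                    /* Restore all metadata. */
}

/* Example recirculated actions that overwrite a metadata field. */
static void
clobber_mark(struct md_sketch *md)
{
    md->skb_mark = 0xdead;
}
```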
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Infrastructure to enable megaflow support for bond ports using
recirculation. This patch adds the following features:
* Generate RECIRC action when bond can benefit from recirculation.
* Populate post recirculation rules in a hidden table. Currently table 254.
* Use post recirculation rules for bond rebalancing.
* A recirculation implementation in dpif-netdev.
The goal of this patch is to be able to megaflow bond outputs and
thus greatly improve performance. However, this patch does not
actually improve the megaflow generation. It is left for a later commit.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
One case in the dpif_netdev_mask_from_nlattrs() function accidentally
wildcarded only a 16-bit subset of the mask's odp_port. On little-endian
machines this subset was the lower bits, which happened to work out OK,
but on big-endian machines this subset was the upper bits, which doesn't
work and causes a test failure. (The problem was actually visible in the
test expected results on little-endian machines, but we had not noticed.)
This commit unwildcards the whole field, fixing the problem, and updates
the test expected results to match.
This fixes the failure of test 732 seen here:
https://buildd.debian.org/status/fetch.php?pkg=openvswitch&arch=sparc&ver=2.1.0%2Bgit20140325-1&stamp=1396438624
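A minimal sketch of why setting only 16 bits of a 32-bit field gives
different results depending on byte order. The function names are
hypothetical and this is not the dpif_netdev_mask_from_nlattrs() code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Buggy pattern: writes an all-ones 16-bit value into the first two
 * bytes of a 32-bit field.  Those bytes are the low half of the value
 * on little-endian hosts and the high half on big-endian hosts. */
static uint32_t
mask_first_16_bits(void)
{
    uint32_t port_mask = 0;
    uint16_t ones = UINT16_MAX;

    memcpy(&port_mask, &ones, sizeof ones);
    return port_mask;
}

/* Fixed pattern: set the whole 32-bit field, independent of byte order. */
static uint32_t
mask_whole_field(void)
{
    return UINT32_MAX;
}
```

A port number below 65536 still matches correctly against the
little-endian result, which is consistent with the bug going unnoticed
there.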
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
sparse emits the following warning:
lib/dpif-netdev.c:1755:15: warning: Initializer entry defined twice
lib/dpif-netdev.c:1755:15: also defined here
due to a bug in sparse, which doesn't like inline functions that
expand a #define within them. This commit removes inline to make
sparse happy.
Signed-off-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Wrap long lines, fix whitespaces, and fix a typo in a comment.
No functional changes are intended.
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
New netdev types like DPDK can support multi-queue I/O. The following
patch adds support for this.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>