Thread names are occasionally very useful for debugging, but from time to
time we've forgotten to set one. This commit adds the new thread's name
as a parameter to the function to start a thread, to make that mistake
impossible. This also simplifies code, since two function calls become
only one.
This makes a few other changes to the thread creation function:
* Since it is no longer a direct wrapper around a pthread function,
rename it to avoid giving that impression.
* Remove 'pthread_attr_t *' param that every caller supplied as NULL.
* Change 'pthread *' parameter into a return value, for convenience.
The system-stats code hadn't set a thread name, so this fixes that issue.
This patch is a prerequisite for making RCU report the name of a thread
that is blocking RCU synchronization, because the easiest way to do that is
for ovsrcu_quiesce_end() to record the current thread's name.
ovsrcu_quiesce_end() is called before the thread function is called, so it
won't get a name set within the thread function itself. Setting the thread
name earlier, as in this patch, avoids the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
Rename hash_bias to hash_basis to make it consistent with similar
usages.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Currently netlink flow (and mask) recirc_id attribute is only
serialized when the recirc_id value is non-zero. For this logic
to work correctly, the interpretation of the missing recirc_id
depends on whether the datapath supports recirculation.
This patch remove the ambiguity of the meaning of missing recirc_id
attribute in netlink message. When recirc_id is non-zero, or when
it is not a wildcard match, both key and mask attributes are
serialized. On the other hand, when recirc_id is zero, and being
wildcarded, they are not serialized. A missing recirc_id key and
mask attribute thus should always be interpreted as wildcard,
same as other flow fields.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Use miniflow as a flow key in the userspace datapath classifier. The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Currently recirculation action can optionally compute hash. This patch
adds a hash action that is independent of the recirc action, which
no longer computes hash. For megaflow bond with recirc, the output
to a bond port action will look like:
hash(hash_l4(0)), recirc(<recirc_id>)
Obviously, when a recirculation application that does not depend on
hash value can just use the recirc action alone.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Pravin B Shelar <pshelar@nicira.com
If the actions executed during recirculation changed metadata fields,
then any actions after the recirculation returns would see those new
values. Now, all metadata are saved and restored across a recirculation.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Infrastructure to enable megaflow support for bond ports using
recirculation. This patch adds the following features:
* Generate RECIRC action when bond can benefit from recirculation.
* Populate post recirculation rules in a hidden table. Currently table 254.
* Uses post recirculation rules for bond rebalancing
* A recirculation implementation in dpif-netdev.
The goal of this patch is to be able to megaflow bond outputs and
thus greatly improve performance. However, this patch does not
actually improve the megaflow generation. It is left for a later commit.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
One case in the dpif_netdev_mask_from_nlattrs() function accidentally
wildcarded only a 16-bit subset of the mask's odp_port. On little-endian
machines this subset was the lower bits, which happened to work out OK,
but on big-endian machines this subset was the upper bits, which doesn't
work and causes a test failure. (The problem was actually visible in the
test expected results on little-endian machines, but we had not noticed.)
This commit unwildcards the whole field, fixing the problem, and updates
the test expected results to match.
This fixes the failure of test 732 seen here:
https://buildd.debian.org/status/fetch.php?pkg=openvswitch&arch=sparc&ver=2.1.0%2Bgit20140325-1&stamp=1396438624
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
sparse emits the following warning:
lib/dpif-netdev.c:1755:15: warning: Initializer entry defined twice
lib/dpif-netdev.c:1755:15: also defined here
due to a bug in sparse which doesn't like inlined functions which
expands a #define within it. This commit removes inline to make
sparse happy.
Signed-off-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Wrap long lines, fix whitespaces, and fix a typo in a comment.
No functional changes are intended.
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
new netdev type like DPDK can support multi-queue IO. Following
patch Adds support for same.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Preparation for multi queue netdev IO. There are no functional changes
in this patch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
This patch adds PMD type netdev for netdevice with poll-mode
drivers. Since there is no way to get signal on a packet recv
from these devices we need to poll them in busy loop. So minimize
system call overhead this patch uses dpif-thread exclusively
for PMD devices and rest of devices which needs system calls to
do IO are moved to dpif-netdev-run().
PMD device like DPDK work in userspace so there is no system call
overhead for them.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK Poll mode thread need to keep ref to dpif-port.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK netdev need to access ofpbuf while sending buffer. Following
patch changes netdev_send accordingly.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK can receive multiple packets but current netdev API does
not allow that. Following patch allows dpif-netdev receive batch
of packet in a rx_recv() call for any netdev port. This will be
used by dpdk-netdev.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
threads read upcall.
This commit implements the API functions to allow multiple handler
threads read upcall.
Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION. So, both misses will be put to the same
queue. The decision is based on the fact that a lot has changed
since the age when flow setup rate is most treasured and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.
Thusly the current implementation will be put in testing and
investigation for better balancing solution will continue if
there is an issue.
Also note, the introduction and use of flow_hash_5tuple() will
put missed ICMP packets from same source but with different
type/code to different handler queues. This may cause reordering
of these packets. For now, we do not count this as a problem.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit changes the API in 'dpif-provider.h' to allow multiple
handler threads call dpif_recv() simultaneously.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
TCP flags are already extracted from the flow, no need to parse them
again.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The flow that created the netdev_flow might have wildcarded TCP flags,
or it may not be a TCP flow at all. Fix this by using the freshly
extracted flow key to parse TCP flags.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This should scale better than a single mutex, though still not
ideally.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This allows clients to do more than just increment a counter. The
following commit will make the first use of that feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
It is better to explicitly initialize the dp->destroy than to rely
on xzalloc().
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
This new function allows callers to determine whether previously
returned keys will be modified or reallocated on the next call to
dpif_flow_dump_next(). This will be used in a future commit to allow
batched flow deletion by revalidator threads.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch makes it the caller's responsibility to initialize a
per-thread 'state' object and pass it down to the dpif_flow_dump_next()
implementation. The implementation can expect to be called from multiple
threads with the same 'iter' and different 'state' objects.
When flow_dump_next() returns non-zero, the implementation must ensure
that subsequent calls with the same arguments also return non-zero.
Subsequent calls with the same 'iter' and different 'state' may return
zero, but should make progress towards returning non-zero.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch separates the structures for thread-local flow dump state
("state") from the shared flow dump state ("iter") in dpif-linux and
dpif-netdev. Future patches will make use of this to allow multiple
threads to dump flows from the same flow dump operation.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In dpif_netdev_flow_del() and dp_netdev_port_input(), the
referenced 'netdev_flow' is not un-referenced. This causes
the leak of the struct's memory.
This commit fixes the above issue by calling dp_netdev_flow_unref()
after using the reference.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit makes dp_netdev_flow_unref() and dp_netdev_actions_unref()
invoke the ovs_refcount_destroy() before freeing the corresponding
pointer.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
Jarno Rajahalme reported up to 40% performance gain on netperf TCP_CRR with
an earlier version of this patch in combination with a kernel NUMA patch,
together with a reduction in variance:
http://openvswitch.org/pipermail/dev/2014-January/035867.html
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This is a first step in making thread safety more granular in dpif-netdev,
to allow for multithreaded forwarding.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is analogous to the split between rule and rule_actions in
ofproto. As there, it will allow retaining a reference to a rule's
actions, while processing them, without having to retain a reference
to the rule itself.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Normally all the ports have the same mtu anyhow, so there is little
advantage in keeping track of the maximum mtu on a per-bridge basis. In
upcoming commits, tracking mtu will require more locking and present
even less advantage (because the packet buffer will become per-thread, so
that reallocating once per thread becomes essentially a null cost).
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The goal is to make it easy to divide the ports into groups for handling
by threads. It seems easy enough to do that by hash value, and a little
harder otherwise.
This commit has the side effect of raising the maximum number of ports from
256 to UINT32_MAX-1. That is why some tests need to be updated:
previously, internally generated port names like "ovs_vxlan_4341" were
ignored because 4341 is bigger than the previous limit of 256.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
ovsthread_counter is an abstract interface that could be implemented
different ways. The initial implementation is simple but less than
optimally efficient.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This helps reduce confusion about when a flow is a flow and when it is
just metadata.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Use one callback instead of many, helps in adding new functionality
later on.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The skb_priority, pkt_mark and tunl parameters of dp_netdev_port_input()
are always passed as 0, 0 and NULL respectively. So rather than
passing these values to dp_netdev_port_input() just use them directly.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>