mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

344 Commits

Author SHA1 Message Date
Ben Pfaff
8ba0a5227f ovs-thread: Make caller provide thread name when creating a thread.
Thread names are occasionally very useful for debugging, but from time to
time we've forgotten to set one.  This commit adds the new thread's name
as a parameter to the function to start a thread, to make that mistake
impossible.  This also simplifies code, since two function calls become
only one.

This makes a few other changes to the thread creation function:

    * Since it is no longer a direct wrapper around a pthread function,
      rename it to avoid giving that impression.

    * Remove 'pthread_attr_t *' param that every caller supplied as NULL.

    * Change 'pthread *' parameter into a return value, for convenience.

The system-stats code hadn't set a thread name, so this fixes that issue.

This patch is a prerequisite for making RCU report the name of a thread
that is blocking RCU synchronization, because the easiest way to do that is
for ovsrcu_quiesce_end() to record the current thread's name.
ovsrcu_quiesce_end() is called before the thread function is called, so it
won't get a name set within the thread function itself.  Setting the thread
name earlier, as in this patch, avoids the problem.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2014-04-28 15:25:49 -07:00
Andy Zhou
62ac1f20e9 openvswitch.h: rename hash action definition
Rename hash_bias to hash_basis to make it consistent with similar
usages.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2014-04-20 22:28:01 -07:00
Andy Zhou
fbfe01de0d odp-util: Always generate key/mask pair in netlink for recirc_id
Currently netlink flow (and mask) recirc_id attribute is only
serialized when the recirc_id value is non-zero. For this logic
to work correctly, the interpretation of the missing recirc_id
depends on whether the datapath supports recirculation.

This patch removes the ambiguity in the meaning of a missing recirc_id
attribute in a netlink message.  When recirc_id is non-zero, or when
it is not a wildcard match, both key and mask attributes are
serialized.  On the other hand, when recirc_id is zero and
wildcarded, they are not serialized.  A missing recirc_id key and
mask attribute should thus always be interpreted as a wildcard,
the same as other flow fields.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-04-20 22:27:55 -07:00
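
The serialization rule described in commit fbfe01de0d can be sketched as a single predicate. This is a minimal illustration, not the odp-util code: the function name is hypothetical, and it assumes that an all-zero mask means "fully wildcarded".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of OVS.  Assumes a mask of 0 means the
 * field is fully wildcarded.  Returns true when both the key and the mask
 * attribute for recirc_id should be written to the netlink message. */
bool should_serialize_recirc_id(uint32_t recirc_id, uint32_t mask)
{
    /* Serialize when the value is non-zero or when the match is not a
     * wildcard; omit only for a zero value that is also wildcarded. */
    return recirc_id != 0 || mask != 0;
}
```

Under this rule a reader of the message can treat a missing attribute as a wildcard without knowing whether the datapath supports recirculation.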
Jarno Rajahalme
4f15074492 dpif-netdev: Use miniflow as a flow key.
Use miniflow as a flow key in the userspace datapath classifier.  The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
2014-04-18 08:39:44 -07:00
Andy Zhou
347bf289b3 dpif-netdev: Move hash function out of the recirc action, into its own action
Currently recirculation action can optionally compute hash. This patch
adds a hash action that is independent of the recirc action, which
no longer computes hash.  For megaflow bond with recirc, the output
to a bond port action will look like:

    hash(hash_l4(0)), recirc(<recirc_id>)

A recirculation application that does not depend on the hash value
can simply use the recirc action alone.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-04-16 15:30:42 -07:00
Andy Zhou
cd527139bb dpif-netdev: Use existing flow for computing dp hash
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-04-09 10:59:48 -07:00
Andy Zhou
4347b9b38e dpif-netdev: preserve packet metadata fields across recirculation
If the actions executed during recirculation changed metadata fields,
then any actions after the recirculation returns would see those new
values. Now, all metadata are saved and restored across a recirculation.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-04-09 10:59:48 -07:00
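
The save-and-restore behavior described in commit 4347b9b38e can be sketched as follows. This is only an illustration under stated assumptions: the struct, its two fields, and both function names are hypothetical stand-ins, not the dpif-netdev types.

```c
#include <stdint.h>

/* Hypothetical stand-in for per-packet metadata; the real structure in OVS
 * has more fields. */
struct pkt_md_sketch {
    uint32_t skb_priority;
    uint32_t pkt_mark;
};

/* Type of a function that runs the recirculated actions and may modify
 * the metadata while doing so. */
typedef void recirc_fn(struct pkt_md_sketch *md);

/* Runs 'run' on 'md', then restores every metadata field so that actions
 * executed after the recirculation see the original values. */
void recirculate_preserving_md(struct pkt_md_sketch *md, recirc_fn *run)
{
    struct pkt_md_sketch saved = *md;   /* Save all metadata. */

    run(md);
    *md = saved;                        /* Restore all metadata. */
}

/* Example of recirculated actions that overwrite the metadata. */
void clobber_metadata(struct pkt_md_sketch *md)
{
    md->skb_priority = 0;
    md->pkt_mark = 0xdeadbeef;
}
```

Copying the whole structure, rather than individual fields, means that fields added later are preserved without further changes.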
Andy Zhou
adcf00ba35 ofproto/bond: Implement bond megaflow using recirculation
Infrastructure to enable megaflow support for bond ports using
recirculation. This patch adds the following features:
* Generate RECIRC action when bond can benefit from recirculation.
* Populate post recirculation rules in a hidden table. Currently table 254.
* Uses post recirculation rules for bond rebalancing
* A recirculation implementation in dpif-netdev.

The goal of this patch is to be able to megaflow bond outputs and
thus greatly improve performance. However, this patch does not
actually improve the megaflow generation. It is left for a later commit.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-04-07 19:55:30 -07:00
Ben Pfaff
f3f750e5ae dpif-netdev: Unwildcard entire odp_port in dpif_netdev_mask_from_nlattrs().
One case in the dpif_netdev_mask_from_nlattrs() function accidentally
wildcarded only a 16-bit subset of the mask's odp_port.  On little-endian
machines this subset was the lower bits, which happened to work out OK,
but on big-endian machines this subset was the upper bits, which doesn't
work and causes a test failure.  (The problem was actually visible in the
test expected results on little-endian machines, but we had not noticed.)

This commit unwildcards the whole field, fixing the problem, and updates
the test expected results to match.

This fixes the failure of test 732 seen here:
https://buildd.debian.org/status/fetch.php?pkg=openvswitch&arch=sparc&ver=2.1.0%2Bgit20140325-1&stamp=1396438624

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-04-05 10:27:05 -07:00
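
The endianness effect described in commit f3f750e5ae can be reproduced in isolation. The sketch below is not the dpif-netdev code; all function names are hypothetical. It writes a 16-bit all-ones value at the start of a 32-bit field, which touches the low-order bits on a little-endian host and the high-order bits on a big-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sets only the first two bytes, in memory order, of
 * a 32-bit mask to all-ones, as a 16-bit store into the field would. */
uint32_t set_first_16_bits(void)
{
    uint32_t mask = 0;
    uint16_t ones = UINT16_MAX;

    memcpy(&mask, &ones, sizeof ones);
    return mask;
}

/* The shape of the fix: set the entire 32-bit field. */
uint32_t set_all_32_bits(void)
{
    return UINT32_MAX;
}

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, sizeof first_byte);
    return first_byte == 1;
}
```

On a little-endian host the partial store yields 0x0000ffff, which still covers small port numbers; on a big-endian host it yields 0xffff0000, which does not.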
Pravin Shelar
1f317cb5c2 ofpbuf: Introduce access api for base, data and size.
These functions will be used by later patches.  This patch
does not change functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-03-30 06:18:43 -07:00
Pritesh Kothari
5794e276b4 sparse: workaround for a bug in sparse.
sparse emits the following warning:
lib/dpif-netdev.c:1755:15: warning: Initializer entry defined twice
lib/dpif-netdev.c:1755:15:   also defined here
due to a bug in sparse, which mishandles an inline function that
expands a #define within it. This commit removes 'inline' to make
sparse happy.

Signed-off-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-03-28 14:40:07 -07:00
YAMAMOTO Takashi
9b516652a1 recirculation: Some cosmetic fixes
Wrap long lines, fix whitespaces, and fix a typo in a comment.
No functional changes are intended.

Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andy Zhou <azhou@nicira.com>
2014-03-28 13:14:18 -07:00
Andy Zhou
572f732ab0 dpif-netdev: user space datapath recirculation
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-25 13:24:39 -07:00
Pravin
8617affff4 netdev-dpdk: Use multiple core for dpdk IO.
DPDK needs _lcore_id to be set in order to use multiple cores.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
55c955bd8a netdev: Add support multiqueue recv.
New netdev types such as DPDK can support multi-queue IO.  This
patch adds support for it.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
f77917408a netdev: Rename netdev_rx to netdev_rxq
Preparation for multi queue netdev IO.  There are no functional changes
in this patch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
e4cfed38b1 dpif-netdev: Add poll-mode-device thread.
This patch adds a PMD netdev type for network devices with poll-mode
drivers.  Since there is no way to get a signal on packet receive
from these devices, we need to poll them in a busy loop.  To minimize
system call overhead, this patch uses dpif-thread exclusively
for PMD devices, and the rest of the devices, which need system calls
to do IO, are moved to dpif-netdev-run().
PMD devices like DPDK work in userspace, so there is no system call
overhead for them.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
b284085e55 dpif-netdev: Add ref-counting for port.
The DPDK poll-mode thread needs to keep a reference to the dpif-port.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
40d26f04b2 netdev: Send ofpbuf directly to netdev.
The DPDK netdev needs to access the ofpbuf while sending a buffer.  This
patch changes netdev_send accordingly.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-03-21 11:48:28 -07:00
Pravin
df1e5a3bc7 netdev: Extend rx_recv to pass multiple packets.
DPDK can receive multiple packets at once, but the current netdev API
does not allow that.  This patch allows dpif-netdev to receive a batch
of packets in one rx_recv() call for any netdev port.  This will be
used by dpdk-netdev.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-03-21 11:48:28 -07:00
Alex Wang
63be20bee2 dpif-netdev: Implement the API functions to allow multiple handler
threads read upcall.

This commit implements the API functions that allow multiple handler
threads to read upcalls.

Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION, so both kinds of upcall are put in the same
queue.  The decision is based on the fact that a lot has changed
since the days when flow setup rate was most treasured, and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.

The current implementation will therefore go into testing, and
investigation of a better balancing solution will continue if
there is an issue.

Also note that the introduction and use of flow_hash_5tuple() will
put missed ICMP packets from the same source but with different
type/code into different handler queues.  This may cause reordering
of these packets.  For now, we do not count this as a problem.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-20 10:27:20 -07:00
Alex Wang
1954e6bbcb dpif: Change dpif API to allow multiple handler threads read upcall.
This commit changes the API in 'dpif-provider.h' to allow multiple
handler threads to call dpif_recv() simultaneously.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-20 10:27:10 -07:00
Jarno Rajahalme
e0eecb1ca1 lib: Use tcp_flags from flow.
TCP flags are already extracted from the flow, no need to parse them
again.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-03-19 16:13:32 -07:00
Jarno Rajahalme
855dd13c9a dpif-netdev: Use packet key to parse TCP flags.
The flow that created the netdev_flow might have wildcarded TCP flags,
or it may not be a TCP flow at all.  Fix this by using the freshly
extracted flow key to parse TCP flags.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-03-19 16:13:32 -07:00
Ben Pfaff
61e7deb143 dpif-netdev: Use RCU to protect data.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-19 07:48:43 -07:00
Ben Pfaff
679ba04cab dpif-netdev: Use ovsthread_stats for flow stats.
This should scale better than a single mutex, though still not
ideally.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-19 07:48:42 -07:00
Ben Pfaff
51852a57a0 ovs-thread: Replace ovsthread_counter by more general ovsthread_stats.
This allows clients to do more than just increment a counter.  The
following commit will make the first use of that feature.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-19 07:47:12 -07:00
Andy Zhou
1a65ba8544 dpif-netdev: init atomic flag dp->destroyed
It is better to explicitly initialize dp->destroyed than to rely
on xzalloc().

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-18 00:40:10 -07:00
Ben Pfaff
8917f72cbb ovs-atomic: Delete atomic, atomic_flag, ovs_refcount destroy functions.
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-13 12:45:47 -07:00
Andy Zhou
b5e7e61a99 lib: simplify flow_extract() API
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-02-28 16:29:37 -08:00
Joe Stringer
bdeadfdd95 dpif: New function flow_dump_next_may_destroy_keys().
This new function allows callers to determine whether previously
returned keys will be modified or reallocated on the next call to
dpif_flow_dump_next(). This will be used in a future commit to allow
batched flow deletion by revalidator threads.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-27 14:39:21 -08:00
Joe Stringer
d2ad7ef178 dpif: Make dpif_flow_dump_next() thread-safe.
This patch makes it the caller's responsibility to initialize a
per-thread 'state' object and pass it down to the dpif_flow_dump_next()
implementation. The implementation can expect to be called from multiple
threads with the same 'iter' and different 'state' objects.

When flow_dump_next() returns non-zero, the implementation must ensure
that subsequent calls with the same arguments also return non-zero.
Subsequent calls with the same 'iter' and different 'state' may return
zero, but should make progress towards returning non-zero.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-27 14:30:25 -08:00
Joe Stringer
e723fd32d5 dpif: Separate local and shared flow dump state.
This patch separates the structures for thread-local flow dump state
("state") from the shared flow dump state ("iter") in dpif-linux and
dpif-netdev. Future patches will make use of this to allow multiple
threads to dump flows from the same flow dump operation.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-27 14:27:32 -08:00
Alex Wang
71c24bb0f8 dpif-netdev: Fix memory leak.
In dpif_netdev_flow_del() and dp_netdev_port_input(), the
referenced 'netdev_flow' is not un-referenced.  This causes
the leak of the struct's memory.

This commit fixes the above issue by calling dp_netdev_flow_unref()
after using the reference.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-02-21 14:07:46 -08:00
Alex Wang
3754832be4 dpif-netdev: Call ovs_refcount_destroy() before free().
This commit makes dp_netdev_flow_unref() and dp_netdev_actions_unref()
invoke ovs_refcount_destroy() before freeing the corresponding
pointer.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-02-21 14:05:07 -08:00
Ben Pfaff
8bfd0fdace Enhance userspace support for MPLS, for up to 3 labels.
This commit makes the userspace support for MPLS more complete.  Now
up to 3 labels are supported.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-02-04 10:41:30 -08:00
Ben Pfaff
80e448834d dpif-netdev: Make a log message more detailed.
This would have helped me track down a bug I was hunting just now.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-02-04 08:11:45 -08:00
Ben Pfaff
06f8162043 classifier: Use fat_rwlock instead of ovs_rwlock.
Jarno Rajahalme reported up to 40% performance gain on netperf TCP_CRR with
an earlier version of this patch in combination with a kernel NUMA patch,
together with a reduction in variance:
    http://openvswitch.org/pipermail/dev/2014-January/035867.html

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2014-01-14 14:45:10 -08:00
Ben Pfaff
6c3eee823e dpif-netdev: Use separate threads for forwarding.
For now, we use exactly two threads.  Presumably at some point we will want
to make this configurable.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:32 -08:00
Ben Pfaff
8a4e3a858a dpif-netdev: Make thread-safety much more granular.
This will allow for parallelism in multithreaded forwarding in an upcoming
commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:32 -08:00
Ben Pfaff
f5126b5727 dpif-netdev: Introduce new mutex to protect queues.
This is a first step in making thread safety more granular in dpif-netdev,
to allow for multithreaded forwarding.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:31 -08:00
Ben Pfaff
a84cb64a9e dpif-netdev: Break actions out into new struct dp_netdev_actions.
This is analogous to the split between rule and rule_actions in
ofproto.  As there, it will allow retaining a reference to a rule's
actions, while processing them, without having to retain a reference
to the rule itself.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:31 -08:00
Ben Pfaff
6a8267c5b7 dpif-netdev: Take advantage of ovs_refcount for dp_netdev.
By making "destroyed" own a reference, we can treat dp_netdev's ref_cnt
like any other in Open vSwitch.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:31 -08:00
Ben Pfaff
5c8d2fcad0 dpif-netdev: Remove max_mtu tracking.
Normally all the ports have the same mtu anyhow, so there is little
advantage in keeping track of the maximum mtu on a per-bridge basis.  In
upcoming commits, tracking mtu will require more locking and present
even less advantage (because the packet buffer will become per-thread, so
that reallocating once per thread becomes essentially a null cost).

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2014-01-08 17:11:14 -08:00
Ben Pfaff
ff073a71f9 dpif-netdev: Use hmap instead of list+array for tracking ports.
The goal is to make it easy to divide the ports into groups for handling
by threads.  It seems easy enough to do that by hash value, and a little
harder otherwise.

This commit has the side effect of raising the maximum number of ports from
256 to UINT32_MAX-1.  That is why some tests need to be updated:
previously, internally generated port names like "ovs_vxlan_4341" were
ignored because 4341 is bigger than the previous limit of 256.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2014-01-08 17:11:09 -08:00
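
The idea in commit ff073a71f9 of dividing ports into groups by hash value can be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, and the mixing step is an arbitrary multiplicative hash chosen for the example, since the commit message does not say which hash OVS uses.

```c
#include <stdint.h>

/* Arbitrary multiplicative mixing step, used here only for illustration. */
static uint32_t hash_port_no(uint32_t port_no)
{
    return port_no * 2654435761u;
}

/* Hypothetical helper: maps a port number to one of 'n_threads' groups.
 * 'n_threads' must be nonzero.  The same port always maps to the same
 * group, so each port is handled by exactly one thread. */
uint32_t port_to_thread(uint32_t port_no, uint32_t n_threads)
{
    return hash_port_no(port_no) % n_threads;
}
```

Because the grouping depends only on the hash, it works for any port number, including values such as 4341 that exceeded the earlier limit of 256.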
Ben Pfaff
ed27e010b9 dpif-netdev: Use new "ovsthread_counter" to track dp statistics.
ovsthread_counter is an abstract interface that could be implemented
different ways.  The initial implementation is simple but less than
optimally efficient.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:10:32 -08:00
Ben Pfaff
9e5026938c dpif: Remove unused 'get_max_ports' from provider interface.
Nothing ever called this function.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2014-01-08 17:10:31 -08:00
Jarno Rajahalme
758c456df5 dpif: Use explicit packet metadata.
This helps reduce confusion about when a flow is a flow and when it is
just metadata.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2013-12-30 16:52:43 -08:00
Jarno Rajahalme
09f9da0bca odp-execute: Consolidate callbacks.
Use one callback instead of many, which helps in adding new
functionality later on.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2013-12-30 15:58:58 -08:00
Simon Horman
77790ca7b1 dpif-netdev: Remove unnecessary parameters from dp_netdev_port_input()
The skb_priority, pkt_mark and tunl parameters of dp_netdev_port_input()
are always passed as 0, 0 and NULL respectively. So rather than
passing these values to dp_netdev_port_input() just use them directly.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-12-17 16:31:34 -08:00