sparse emits the following warning:
lib/dpif-netdev.c:1755:15: warning: Initializer entry defined twice
lib/dpif-netdev.c:1755:15: also defined here
due to a bug in sparse which doesn't like inlined functions which
expands a #define within it. This commit removes inline to make
sparse happy.
Signed-off-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Wrap long lines, fix whitespaces, and fix a typo in a comment.
No functional changes are intended.
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
new netdev type like DPDK can support multi-queue IO. Following
patch Adds support for same.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Preparation for multi queue netdev IO. There are no functional changes
in this patch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
This patch adds PMD type netdev for netdevice with poll-mode
drivers. Since there is no way to get signal on a packet recv
from these devices we need to poll them in busy loop. So minimize
system call overhead this patch uses dpif-thread exclusively
for PMD devices and rest of devices which needs system calls to
do IO are moved to dpif-netdev-run().
PMD device like DPDK work in userspace so there is no system call
overhead for them.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK Poll mode thread need to keep ref to dpif-port.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK netdev need to access ofpbuf while sending buffer. Following
patch changes netdev_send accordingly.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
DPDK can receive multiple packets but current netdev API does
not allow that. Following patch allows dpif-netdev receive batch
of packet in a rx_recv() call for any netdev port. This will be
used by dpdk-netdev.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
threads read upcall.
This commit implements the API functions to allow multiple handler
threads read upcall.
Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION. So, both misses will be put to the same
queue. The decision is based on the fact that a lot has changed
since the age when flow setup rate is most treasured and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.
Thusly the current implementation will be put in testing and
investigation for better balancing solution will continue if
there is an issue.
Also note, the introduction and use of flow_hash_5tuple() will
put missed ICMP packets from same source but with different
type/code to different handler queues. This may cause reordering
of these packets. For now, we do not count this as a problem.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit changes the API in 'dpif-provider.h' to allow multiple
handler threads call dpif_recv() simultaneously.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
TCP flags are already extracted from the flow, no need to parse them
again.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The flow that created the netdev_flow might have wildcarded TCP flags,
or it may not be a TCP flow at all. Fix this by using the freshly
extracted flow key to parse TCP flags.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This should scale better than a single mutex, though still not
ideally.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This allows clients to do more than just increment a counter. The
following commit will make the first use of that feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
It is better to explicitly initialize the dp->destroy than to rely
on xzalloc().
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
This new function allows callers to determine whether previously
returned keys will be modified or reallocated on the next call to
dpif_flow_dump_next(). This will be used in a future commit to allow
batched flow deletion by revalidator threads.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch makes it the caller's responsibility to initialize a
per-thread 'state' object and pass it down to the dpif_flow_dump_next()
implementation. The implementation can expect to be called from multiple
threads with the same 'iter' and different 'state' objects.
When flow_dump_next() returns non-zero, the implementation must ensure
that subsequent calls with the same arguments also return non-zero.
Subsequent calls with the same 'iter' and different 'state' may return
zero, but should make progress towards returning non-zero.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch separates the structures for thread-local flow dump state
("state") from the shared flow dump state ("iter") in dpif-linux and
dpif-netdev. Future patches will make use of this to allow multiple
threads to dump flows from the same flow dump operation.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In dpif_netdev_flow_del() and dp_netdev_port_input(), the
referenced 'netdev_flow' is not un-referenced. This causes
the leak of the struct's memory.
This commit fixes the above issue by calling dp_netdev_flow_unref()
after using the reference.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit makes dp_netdev_flow_unref() and dp_netdev_actions_unref()
invoke the ovs_refcount_destroy() before freeing the corresponding
pointer.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
Jarno Rajahalme reported up to 40% performance gain on netperf TCP_CRR with
an earlier version of this patch in combination with a kernel NUMA patch,
together with a reduction in variance:
http://openvswitch.org/pipermail/dev/2014-January/035867.html
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This is a first step in making thread safety more granular in dpif-netdev,
to allow for multithreaded forwarding.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is analogous to the split between rule and rule_actions in
ofproto. As there, it will allow retaining a reference to a rule's
actions, while processing them, without having to retain a reference
to the rule itself.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Normally all the ports have the same mtu anyhow, so there is little
advantage in keeping track of the maximum mtu on a per-bridge basis. In
upcoming commits, tracking mtu will require more locking and present
even less advantage (because the packet buffer will become per-thread, so
that reallocating once per thread becomes essentially a null cost).
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The goal is to make it easy to divide the ports into groups for handling
by threads. It seems easy enough to do that by hash value, and a little
harder otherwise.
This commit has the side effect of raising the maximum number of ports from
256 to UINT32_MAX-1. That is why some tests need to be updated:
previously, internally generated port names like "ovs_vxlan_4341" were
ignored because 4341 is bigger than the previous limit of 256.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
ovsthread_counter is an abstract interface that could be implemented
different ways. The initial implementation is simple but less than
optimally efficient.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This helps reduce confusion about when a flow is a flow and when it is
just metadata.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Use one callback instead of many, helps in adding new functionality
later on.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The skb_priority, pkt_mark and tunl parameters of dp_netdev_port_input()
are always passed as 0, 0 and NULL respectively. So rather than
passing these values to dp_netdev_port_input() just use them directly.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The command ovs-dpctl can wrongly output the masks even if the
datapath does not implement mega flows. In this case the output
will be similar to the following:
system@ovs-system:
lookups: hit:14 missed:41 lost:0
flows: 0
masks: hit:18446744073709551615 total:4294967295
hit/pkt:335395346794719104.00
port 0: ovs-system (internal)
port 1: gre_system (gre: df_default=false, ttl=0)
port 2: ots-br0 (internal)
port 3: int0 (internal)
port 4: vnet0
port 5: vnet1
The problem depends on the fact that n_masks stats is stored as a
uint32 in the struct ovs_dp_megaflow_stats and as a uint64 in the
struct dpif_dp_stats. UINT32_MAX instead of UINT64_MAX should be
used to detect if the datapath supports megaflows or not.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Allowing the packet to be modified by execution allows less data
copying for userspace action execution. Some users of the
dpif_execute already expect that the packet may be modified. This
patch makes this behavior uniform and makes the userspace datapath and
the execution helpers modify the packet as it is being executed.
Userspace action now steals the packet if given permission, as the
packet is normally not needed after it. The only exception is the
sample action, and this is accounted for my keeping track of any
actions that could be following the userspace action.
The packet in dpif_upcall is changed from a pointer to a struct,
allowing the packet to be honest about it's headroom. After this
change the packet can safely be pushed on over the precarious 4 byte
limit earlier allowed by the netlink data preceding the packet.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Normally OVS userspace supplies a mask along with a flow key for each
new data path flow that should be created. OVS also provides an
option to disable the kernel wildcarding, in which case the flows are
created without a mask. When kernel wildcarding is disabled, the
datapath should use exact match, i.e. not wildcard any bits in the
flow key. Currently, what happens with the userspace datapath instead
is that a datapath flow with mostly empty mask is created (i.e., most
fields are wildcarded), as the current code does not examine the given
mask key length to find out that the mask key is actually empty. This
results in the same datapath flow matching on packets of multiple
different flows, wrong actions being processed, and stats being
incorrect.
This patch refactors userspace datapath code to explicitly initialize
a suitable exact match mask when a flow put without a mask is
executed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Userspace action needs the original flow key. This also
matches the kernel datapath behavior.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Extract the flow key from the packet instead of the execute->key.
This reflects how the kernel datapath behaves.
Also use ofpbuf_clone_with_headroom() instead of open coding the same.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Instead of an exact match flow table, we introduce a classifier.
This enables mega-flows in userspace datapath.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
[blp@nicira.com tweaked flow lookup]
Signed-off-by: Ben Pfaff <blp@nicira.com>
'struct dp_netdev_flow' is currently being instantiated as 'flow'.
An upcoming commit introduces a classifier to dpif-netdev
which uses 'struct flow' at a few places and that can cause
confusion while reading code.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>