This commit introduces some functions to add/remove rxqs from pmd
threads without reloading them. They will be used by next commits.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Instead of performing every operation inside do_port_add() it seems
clearer to introduce port_create(), since we already have
port_destroy().
No functional change.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
The datapath code expects the RSS hash to always be initialized. This
is enforced by checking in emc_processing() that the hash is valid, and
eventually by computing a new one.
Unfortunately, there is another entry point to the datapath,
dpif_netdev_execute(). A packet generated by OVS (BFD frame,
packet-out from controller) doesn't have a valid RSS hash and so is
allowed to enter the datapath with an uninitialized hash value.
This commit recomputes the hash (if not valid) in dpif_netdev_execute().
The only place where we would use an invalid hash is netdev-vport, in
push_udp_header(). This caused an uninitialized memory read, and a
random value to be assigned to the outer tunnel header source port.
Reported-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Large segment support need to use this refactored function to
send individual segments.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Once datapath support large packets, we need to segment packet before
sending it to upcall. Refactoring this code make it bit cleaner.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
in case of error from netdev_push_header() batch of packets was not
freed. Following patch fixes this issue.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
The tunnel header pop action can leak batch of packet
in case of error. Following patch fixex the error code path.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
DPDK datapath operate on batch of packets. To pass the batch of
packets around we use packets array and count. Next patch needs
to associate meta-data with each batch of packets. So Introducing
a batch structure to make handling the metadata easier.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Next patch introduces new structure named packet_batch. So
I am renaming it to packet_batch_per_flow.
This does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Current tunnel-pop API does not allow the netdev implementation
retain a packet but STT can keep a packet from batch of packets
during TCP reassembly processing. To return exact count of
valid packet STT need to pass this number of packet parameter
as a reference.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.)
introduced the ovs_u128_and() function. It directly takes ovs_u128
values as arguments instead of pointers to them. As this is a bit more
direct way to deal with 128-bit values, modify the other utility
functions to do the same.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
After removing a flow from the dpcls classifier there might still be
readers who have access to the flow, until the next grace period.
Setting flow->cr.mask to NULL can cause concurrent readers to crash,
so this commit avoids doing it.
The crash can be reproduced, for example, by invoking an operation
that cause datapath flows to be deleted (such as `ovs-appctl
upcall/enable-megaflows`) while traffic is running.
I think the assignment was intended just as a safety measure to catch
race conditions, and it should be safe to remove.
Here's a stack trace of a possible crash:
Program terminated with signal SIGSEGV, Segmentation fault.
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
4156 if (OVS_UNLIKELY((value & *maskp++) != *keyp++)) {
(gdb) bt
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
rules=0x7f3afa3f2e40, cnt=<optimized out>) at ../lib/dpif-netdev.c:4225
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=cnt@entry=32, keys=keys@entry=0x7f3afa3f6428,
batches=batches@entry=0x7f3afa3f4118,
n_batches=n_batches@entry=0x7f3afa3fa3b0)
at ../lib/dpif-netdev.c:3483
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=<optimized out>, md_is_valid=md_is_valid@entry=false,
port_no=<optimized out>) at ../lib/dpif-netdev.c:3625
cnt=<optimized out>, packets=0x7f3afa3fa420, pmd=0x7f3afa3fc010) at
../lib/dpif-netdev.c:3642
rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at
../lib/dpif-netdev.c:2574
../lib/dpif-netdev.c:2693
../lib/ovs-thread.c:340
pthread_create.c:312
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Fixes: 361d808dd9e4("flow: Split miniflow's map.")
CC: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Only the main thread will delete ports after pausing every other
thread. There's no need to keep count.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
It is only used in the testsuite and it can be replaced by a dpctl
command.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This will ease deleting a port with no open rxqs.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This fixes multiple error path mistakes in do_add_port, none of which
has been a problem in practice so far. This change will make it easier
for a following commit to return in case of error.
Also, this removes an unneeded special case for tunnel ports.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Since PMD threads are placed on the NUMA node of the port regardless
of a possible pmd-cpu-mask setting, this can lead to a somewhat
confusing "out of unpinned cores" message - there might be plenty
of available cores in the mask but they cannot be used if the port
is on different NUMA node than the cores. Report the NUMA node
number to help diagnosing the issue.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1295952
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This attempts to prevent namespace collisions with other list libraries
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
All code is now in include/openvswitch/list.h.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While adding of pmd interface with multiple queues several queues
may be assigned to one thread and this thread will be reloaded
one time for each added queue.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This command can be used to check the port/rxq assignment to
pmd threads. For each pmd thread of the datapath shows list
of queue-ids with port names.
Additionally log message from pmd_thread_main() extended with
queue-id, and type of this message changed from INFO to DBG.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.
Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.
Example:
ovs-vsctl set Interface dpdk0 options:n_rxq=8
Old 'other_config:n-dpdk-rxqs' deleted.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The 'key' pointer must point at the first unused element in the key
array.
Fixes: b89c678b7a26 ("dpif-netdev: optmizing emc_processing()")
CC: Andy Zhou <azhou@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
It is ok to iterate a cmap with CMAP_FOR_EACH and remove elements with
cmap_remove(), but having quiescent states inside the loop might create
problems, since some of the postponed cleanup done inside the cmap might
be executed, freeing the memory that the iterator is using.
We had several of these errors in dpif-netdev, because when we rearrange
ports or threads we often need to wait on a condition variable (which
implies a quiescent state).
This problem caused iterations to skip elements or to list them twice,
resulting in the main thread waiting on a condition without anyone else
to signal.
Fix these cases by moving the possible quiescent states outside
CMAP_FOR_EACH loops.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
When a group of packets arrives from a port, we loop through them to
initialize metadata and then we loop through them again to extract the
flow and perform the exact match classification.
This commit combines the two loops into one, and initializes packet->md
in emc_processing() to improve performance.
Since emc_processing() might also be called after recirculation (in
which case the metadata is already valid), an extra parameter is added
to support both cases.
This commits also implements simple prefetching of packet metadata,
to further improve performance.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Chandran, Sugesh <sugesh.chandran@intel.com>
Commit d262ac2c60ce1da7b477737f70e8efd38b32502d introduced a slight
performance drop for the fast path, where every packets hits the
emc cache. This patch removes that performance drop by only reloading
the key pointer on emc cache miss.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
For the machines I have access to, Reloading the same pointer from
memory seems to inhibit complier optimization somewhat.
In emc_processing(), using a single packet pointer, instead reloading
it from memory with packets[i], improves performance by 0.3 Mpps (tested
with 10G NIC pushing 64 byte packets, with the base line of 12.2 Mpps).
Besides improving performance, this patch should also improves code
readability.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Before this commit, emc_processing() copied a netdev_flow_key if there was
no exact-match cache (EMC) hit. This commit eliminates the copy by
constructing the netdev_flow_key in the place it would be copied.
Found by inspection.
Shahbaz (CCed) reports that this reduces the cost of an EMC miss by 72
cycles in his test case in which the EMC is disabled. Presumably this
is similarly valuable in cases where the EMC merely has few hits.
For the original version of this patch, which was against a slightly
earlier version of OVS, Daniele reported that:
- With EMC disabled, this increases throughput from 4.8 Mpps to 5.4
Mpps.
- With EMC enabled, this decreases throughput from 12.4 to 12.0 Mpps.
CC: Muhammad Shahbaz <mshahbaz@cs.princeton.edu>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
emc_processing() moves all the missed packets towards the beginning of
packet array; matched packets are queued up into flow queues. Since the
remaining of the packet array is not used anymore, don't bother swap
packet pointers to save cycles and simplify logic.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Current logic counts dropped packet as cache hit which is not
correct. This patch removes dropped packet to improve accuracy.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently tx_qid is equal to pmd->core_id. This leads to unexpected
behavior if pmd-cpu-mask different from '/(0*)(1|3|7)?(f*)/',
e.g. if core_ids are not sequential, or doesn't start from 0, or both.
Example:
starting 2 pmd threads with 1 port, 2 rxqs per port,
pmd-cpu-mask = 00000014 and let dev->real_n_txq = 2
It that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and
txq_needs_locking = true (if device hasn't ovs_numa_get_n_cores()+1
queues).
In that case, after truncating in netdev_dpdk_send__():
'qid = qid % dev->real_n_txq;'
pmd_1: qid = 2 % 2 = 0
pmd_2: qid = 4 % 2 = 0
So, both threads will call dpdk_queue_pkts() with same qid = 0.
This is unexpected behavior if there is 2 tx queues in device.
Queue #1 will not be used and both threads will lock queue #0
on each send.
Fix that by using sequential tx_qids.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Current rx queue management model is buggy and will not work properly
without additional barriers and other syncronization between PMD
threads and main thread.
Known BUGS of current model:
* While reloading, two PMD threads, one already reloaded and
one not yet reloaded, can poll same queue of the same port.
This behavior may lead to dpdk driver failure, because they
are not thread-safe.
* Same bug as fixed in commit e4e74c3a2b
("dpif-netdev: Purge all ukeys when reconfigure pmd.") but
reproduced while only reconfiguring of pmd threads without
restarting, because addition may change the sequence of
other ports, which is important in time of reconfiguration.
Introducing the new model, where distribution of queues made by main
thread with minimal synchronizations and without data races between
pmd threads. Also, this model should work faster, because only
needed threads will be interrupted for reconfiguraition and total
computational complexity of reconfiguration is less.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
When handling an upcall with the userspace datapath, it's currently
possible for a flow from a packet with no tunnel options to come back
with matches on the options. If that happens, dpif-netdev will
attempt to translate the wildcards provided by ofproto into the format
used by dpif. The translation requires use of the original wildcards
from the flow, which since they didn't exist, is uninitalized memory.
Matching on fields which don't actually exist is itself a bug. However,
this can occur when we attempt to set a tunnel option on the packet -
ofproto generates a match on the field in the original packet. This is
being fixed separately.
In other situations where we have a match on an unexpected field, we
simply ignore it. This happens with tunnel options with the kernel
datapath, non-tunnel fields that don't exist in the packet, and even
with Geneve where we do have some options but not the particular one
that was matched on. This brings the same behavior for this case and
avoids the possibility of accessing uninitialized memory.
Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This function will flush the connection tracking tables of a specific
datapath.
It simply calls a function pointer in the dpif_class. No dpif
currently implements the required interface.
The next commits will provide an implementation in dpif-netlink.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
These function can be used to dump conntrack entries from a datapath.
They simply call a function pointer in the dpif_class. No dpif currently
implements the interface.
The next commits will provide an implementation in dpif-netlink.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
This patch renames the command name related with geneve-map to a more
generic name as following:
add-geneve-map -> add-tlv-map
del-geneve-map -> del-tlv-map
dump-geneve-map -> dump-tlv-map
It also renames the Geneve_table to tlv_table.
By doing this renaming, the NSH variable context header (the same TLV
format as Geneve) or other protocol can reuse the field tun_metadata<N>
in the future.
Signed-off-by: Mengke Liu <mengke.liu@intel.com>
Signed-off-by: Ricky Li <ricky.li@intel.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
In the ODP context an empty mask netlink attribute usually means that
the flow should be an exact match.
odp_flow_key_to_mask{,_udpif}() instead return a struct flow_wildcards
with matches only on recirc_id and vlan_tci.
A more appropriate behavior is to handle a missing (zero length) netlink
mask specially (like we do in userspace and Linux datapath) and create
an exact match flow_wildcards from the original flow.
This fixes a bug in revalidate_ukey(): every flow created with
megaflows disabled would be revalidated away, because the mask would
seem too generic. (Another possible fix would be to handle the special
case of a missing mask in revalidate_ukey(), but this seems a more
generic solution).
Since we don't distinguish between IPv4 and IPv6 lookups, consolidate ARP
and ND cache into neighbor cache. Other references to ARP related to the
ARP cache but that are not really about ARP have been renamed as well.
tnl_arp_lookup is kept for lookups using IPv4 instead of IPv4-mapped
addresses, but that is going to be removed in a later patch.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This allows --enable-dummy=system with a userland-only build.
It's useful for testsuite.
Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
Acked-by: Ben Pfaff <blp@ovn.org>