To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The PMD thread needs to keep processing RX queues in order
to achieve maximum throughput. It also needs to sweep emc
cache and quiesce which use seq_mutex. That mutex can
eventually block the PMD thread causing latency spikes and
affecting the throughput.
Since there is no requirement for running those tasks at a
specific time, this patch extend seq API to allow tentative
locking instead.
Reported-by: Karl Rister <krister@redhat.com>
Co-authored-by: Karl Rister <krister@redhat.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, there are few inconsistencies in ways to configure number of
queues for netdev device:
* dpif-netdev can't know about exact number of queues
allocated inside netdev.
This leads to constant mapping of queue-ids to 'real' ones.
* We are able to configure 'n_rxq' for vhost-user devices, but
there is only one sane number of rx queues which must be used
and configured manually (number of queues that allocated
in QEMU).
This patch disables configuration of 'n_rxq' for DPDK vHost devices.
Configuration of rx and tx queues now automatically applied from
connected virtio device. Standard reconfiguration mechanism was used to
apply this changes.
Also, now 'n_txq' and 'n_rxq' are always the real numbers of queues
in the device.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The patch adds a new action to support packet truncation. The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).
One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload. Example
use case is below as well as shown in the testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
# ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to output action itself. The following
packet size of output:1 and output:2 will be intact.
# ovs-ofctl add-flow br0 \
'actions=output(port=1,max_len=100),output:1,output:2'
- The Datapath actions shows:
# Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
When debug logging is enabled, dpif-netdev can print each flow as it is
installed, which it currently does using OpenFlow match formatting. Compared
to ODP formatting, there generally isn't too much difference since the
fields are largely the same but it is inconsistent with other logging in
dpif-netdev as well as the analogous functions that deal with the kernel.
However, in some cases there is a difference between the two formats, such
as in the cases of input port or tunnel metadata. For input port, datapath
format helped detect that the generated masks were incorrect. As for tunnels,
at the moment, it's possible to convert between the two formats on demand as
we have a global metadata table. In the future, though this won't be possible
as the metadata table becomes per-bridge which the datapath won't have access
to.
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When calling odp_flow_key_from_flow (or _mask), the in_port included
as part of the flow is ignored and must be explicitly passed as a
separate parameter. This is because the assumption was that the flow's
version would often be in OFP format, rather than ODP.
However, at this point all flows that are ready for serialization in
netlink format already have their in_port properly set to ODP format.
As a result, every caller needs to explicitly initialize the extra
paramter to the value that is in the flow. This switches to just use
the value in the flow to simply things and avoid the possibility of
forgetting to initialize the extra parameter.
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This commit moves the code that sets the pmd threads affinity from
netdev-dpdk to ovs-numa. There's one small part left in netdev-dpdk, to
set the lcore_id.
Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when
starting a pmd thread.
This change will allow having a dummy implementation of the set affinity
call, for testing purposes.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Currently 'dpctl/flow-get' doesn't work for flows installed by
PMD threads.
Fix that by implementing search across all PMD threads. Will be returned
flow from first PMD thread with match.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.
When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().
The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().
This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Errors returned by netdev_set_multiq() and netdev_rxq_open() weren't
handled properly in reconfigure_pmd_threads(). In case of error now we
remove the port from the datapath.
Also, part of the code is moved in a new function, port_reconfigure().
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
netdev objects are hard to use with RCU, because it's not possible to
split removal and reclamation. Postponing the removal means that the
port is not removed and cannot be readded immediately. Waiting for
reclamation means introducing a quiescent state, and that may introduce
subtle bugs, due to the RCU model we use in userspace.
This commit changes the port container from cmap to hmap. 'port_mutex'
must be held by readers and writers. This shouldn't have performance
impact, as readers in the fast path use a thread local cache.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
A future commit will stop using RCU for 'dp->ports' and use a mutex for
reading/writing them. To avoid taking a mutex in dp_execute_cb(), which
is called in the fast path, this commit introduces a pmd thread local
cache of ports.
The downside is that every port add/remove now needs to synchronize with
every pmd thread.
Among the advantages, keeping a per thread port mapping could allow
greater control over the txq assigment.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
The pmds and the main threads are synchronized using a condition
variable. The main thread writes a new configuration, then it waits on
the condition variable. A pmd thread reads the new configuration, then
it calls signal() on the condition variable. To make sure that the pmds
and the main thread have a consistent view, each signal() should be
backed by a wait().
Currently the first signal() doesn't have a corresponding wait(). If
the pmd thread takes a long time to start and the signal() is received
by a later wait, the threads will have an inconsistent view.
The commit fixes the problem by removing the first signal() from the
pmd thread.
This is hardly a problem on current master, because the main thread
will call the first wait() a long time after the creation of a pmd
thread. It becomes a problem with the next commits.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
This commit introduces some functions to add/remove rxqs from pmd
threads without reloading them. They will be used by next commits.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Instead of performing every operation inside do_port_add() it seems
clearer to introduce port_create(), since we already have
port_destroy().
No functional change.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
The datapath code expects the RSS hash to always be initialized. This
is enforced by checking in emc_processing() that the hash is valid, and
eventually by computing a new one.
Unfortunately, there is another entry point to the datapath,
dpif_netdev_execute(). A packet generated by OVS (BFD frame,
packet-out from controller) doesn't have a valid RSS hash and so is
allowed to enter the datapath with an uninitialized hash value.
This commit recomputes the hash (if not valid) in dpif_netdev_execute().
The only place where we would use an invalid hash is netdev-vport, in
push_udp_header(). This caused an uninitialized memory read, and a
random value to be assigned to the outer tunnel header source port.
Reported-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Large segment support need to use this refactored function to
send individual segments.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Once datapath support large packets, we need to segment packet before
sending it to upcall. Refactoring this code make it bit cleaner.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
in case of error from netdev_push_header() batch of packets was not
freed. Following patch fixes this issue.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
The tunnel header pop action can leak batch of packet
in case of error. Following patch fixex the error code path.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
DPDK datapath operate on batch of packets. To pass the batch of
packets around we use packets array and count. Next patch needs
to associate meta-data with each batch of packets. So Introducing
a batch structure to make handling the metadata easier.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Next patch introduces new structure named packet_batch. So
I am renaming it to packet_batch_per_flow.
This does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Current tunnel-pop API does not allow the netdev implementation
retain a packet but STT can keep a packet from batch of packets
during TCP reassembly processing. To return exact count of
valid packet STT need to pass this number of packet parameter
as a reference.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.)
introduced the ovs_u128_and() function. It directly takes ovs_u128
values as arguments instead of pointers to them. As this is a bit more
direct way to deal with 128-bit values, modify the other utility
functions to do the same.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
After removing a flow from the dpcls classifier there might still be
readers who have access to the flow, until the next grace period.
Setting flow->cr.mask to NULL can cause concurrent readers to crash,
so this commit avoids doing it.
The crash can be reproduced, for example, by invoking an operation
that cause datapath flows to be deleted (such as `ovs-appctl
upcall/enable-megaflows`) while traffic is running.
I think the assignment was intended just as a safety measure to catch
race conditions, and it should be safe to remove.
Here's a stack trace of a possible crash:
Program terminated with signal SIGSEGV, Segmentation fault.
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
4156 if (OVS_UNLIKELY((value & *maskp++) != *keyp++)) {
(gdb) bt
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
rules=0x7f3afa3f2e40, cnt=<optimized out>) at ../lib/dpif-netdev.c:4225
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=cnt@entry=32, keys=keys@entry=0x7f3afa3f6428,
batches=batches@entry=0x7f3afa3f4118,
n_batches=n_batches@entry=0x7f3afa3fa3b0)
at ../lib/dpif-netdev.c:3483
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=<optimized out>, md_is_valid=md_is_valid@entry=false,
port_no=<optimized out>) at ../lib/dpif-netdev.c:3625
cnt=<optimized out>, packets=0x7f3afa3fa420, pmd=0x7f3afa3fc010) at
../lib/dpif-netdev.c:3642
rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at
../lib/dpif-netdev.c:2574
../lib/dpif-netdev.c:2693
../lib/ovs-thread.c:340
pthread_create.c:312
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Fixes: 361d808dd9e4("flow: Split miniflow's map.")
CC: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Only the main thread will delete ports after pausing every other
thread. There's no need to keep count.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
It is only used in the testsuite and it can be replaced by a dpctl
command.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This will ease deleting a port with no open rxqs.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This fixes multiple error path mistakes in do_add_port, none of which
has been a problem in practice so far. This change will make it easier
for a following commit to return in case of error.
Also, this removes an unneeded special case for tunnel ports.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Since PMD threads are placed on the NUMA node of the port regardless
of a possible pmd-cpu-mask setting, this can lead to a somewhat
confusing "out of unpinned cores" message - there might be plenty
of available cores in the mask but they cannot be used if the port
is on different NUMA node than the cores. Report the NUMA node
number to help diagnosing the issue.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1295952
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This attempts to prevent namespace collisions with other list libraries
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
All code is now in include/openvswitch/list.h.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While adding of pmd interface with multiple queues several queues
may be assigned to one thread and this thread will be reloaded
one time for each added queue.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This command can be used to check the port/rxq assignment to
pmd threads. For each pmd thread of the datapath shows list
of queue-ids with port names.
Additionally log message from pmd_thread_main() extended with
queue-id, and type of this message changed from INFO to DBG.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.
Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.
Example:
ovs-vsctl set Interface dpdk0 options:n_rxq=8
Old 'other_config:n-dpdk-rxqs' deleted.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The 'key' pointer must point at the first unused element in the key
array.
Fixes: b89c678b7a26 ("dpif-netdev: optmizing emc_processing()")
CC: Andy Zhou <azhou@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
It is ok to iterate a cmap with CMAP_FOR_EACH and remove elements with
cmap_remove(), but having quiescent states inside the loop might create
problems, since some of the postponed cleanup done inside the cmap might
be executed, freeing the memory that the iterator is using.
We had several of these errors in dpif-netdev, because when we rearrange
ports or threads we often need to wait on a condition variable (which
implies a quiescent state).
This problem caused iterations to skip elements or to list them twice,
resulting in the main thread waiting on a condition without anyone else
to signal.
Fix these cases by moving the possible quiescent states outside
CMAP_FOR_EACH loops.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
When a group of packets arrives from a port, we loop through them to
initialize metadata and then we loop through them again to extract the
flow and perform the exact match classification.
This commit combines the two loops into one, and initializes packet->md
in emc_processing() to improve performance.
Since emc_processing() might also be called after recirculation (in
which case the metadata is already valid), an extra parameter is added
to support both cases.
This commits also implements simple prefetching of packet metadata,
to further improve performance.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Chandran, Sugesh <sugesh.chandran@intel.com>