This command can be used to check the port/rxq assignment to
pmd threads. For each pmd thread of the datapath shows list
of queue-ids with port names.
Additionally log message from pmd_thread_main() extended with
queue-id, and type of this message changed from INFO to DBG.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.
Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.
Example:
ovs-vsctl set Interface dpdk0 options:n_rxq=8
Old 'other_config:n-dpdk-rxqs' deleted.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The 'key' pointer must point at the first unused element in the key
array.
Fixes: b89c678b7a26 ("dpif-netdev: optmizing emc_processing()")
CC: Andy Zhou <azhou@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
It is ok to iterate a cmap with CMAP_FOR_EACH and remove elements with
cmap_remove(), but having quiescent states inside the loop might create
problems, since some of the postponed cleanup done inside the cmap might
be executed, freeing the memory that the iterator is using.
We had several of these errors in dpif-netdev, because when we rearrange
ports or threads we often need to wait on a condition variable (which
implies a quiescent state).
This problem caused iterations to skip elements or to list them twice,
resulting in the main thread waiting on a condition without anyone else
to signal.
Fix these cases by moving the possible quiescent states outside
CMAP_FOR_EACH loops.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
When a group of packets arrives from a port, we loop through them to
initialize metadata and then we loop through them again to extract the
flow and perform the exact match classification.
This commit combines the two loops into one, and initializes packet->md
in emc_processing() to improve performance.
Since emc_processing() might also be called after recirculation (in
which case the metadata is already valid), an extra parameter is added
to support both cases.
This commits also implements simple prefetching of packet metadata,
to further improve performance.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Chandran, Sugesh <sugesh.chandran@intel.com>
Commit d262ac2c60ce1da7b477737f70e8efd38b32502d introduced a slight
performance drop for the fast path, where every packets hits the
emc cache. This patch removes that performance drop by only reloading
the key pointer on emc cache miss.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
For the machines I have access to, Reloading the same pointer from
memory seems to inhibit complier optimization somewhat.
In emc_processing(), using a single packet pointer, instead reloading
it from memory with packets[i], improves performance by 0.3 Mpps (tested
with 10G NIC pushing 64 byte packets, with the base line of 12.2 Mpps).
Besides improving performance, this patch should also improves code
readability.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Before this commit, emc_processing() copied a netdev_flow_key if there was
no exact-match cache (EMC) hit. This commit eliminates the copy by
constructing the netdev_flow_key in the place it would be copied.
Found by inspection.
Shahbaz (CCed) reports that this reduces the cost of an EMC miss by 72
cycles in his test case in which the EMC is disabled. Presumably this
is similarly valuable in cases where the EMC merely has few hits.
For the original version of this patch, which was against a slightly
earlier version of OVS, Daniele reported that:
- With EMC disabled, this increases throughput from 4.8 Mpps to 5.4
Mpps.
- With EMC enabled, this decreases throughput from 12.4 to 12.0 Mpps.
CC: Muhammad Shahbaz <mshahbaz@cs.princeton.edu>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
emc_processing() moves all the missed packets towards the beginning of
packet array; matched packets are queued up into flow queues. Since the
remaining of the packet array is not used anymore, don't bother swap
packet pointers to save cycles and simplify logic.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Current logic counts dropped packet as cache hit which is not
correct. This patch removes dropped packet to improve accuracy.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently tx_qid is equal to pmd->core_id. This leads to unexpected
behavior if pmd-cpu-mask different from '/(0*)(1|3|7)?(f*)/',
e.g. if core_ids are not sequential, or doesn't start from 0, or both.
Example:
starting 2 pmd threads with 1 port, 2 rxqs per port,
pmd-cpu-mask = 00000014 and let dev->real_n_txq = 2
It that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and
txq_needs_locking = true (if device hasn't ovs_numa_get_n_cores()+1
queues).
In that case, after truncating in netdev_dpdk_send__():
'qid = qid % dev->real_n_txq;'
pmd_1: qid = 2 % 2 = 0
pmd_2: qid = 4 % 2 = 0
So, both threads will call dpdk_queue_pkts() with same qid = 0.
This is unexpected behavior if there is 2 tx queues in device.
Queue #1 will not be used and both threads will lock queue #0
on each send.
Fix that by using sequential tx_qids.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Current rx queue management model is buggy and will not work properly
without additional barriers and other syncronization between PMD
threads and main thread.
Known BUGS of current model:
* While reloading, two PMD threads, one already reloaded and
one not yet reloaded, can poll same queue of the same port.
This behavior may lead to dpdk driver failure, because they
are not thread-safe.
* Same bug as fixed in commit e4e74c3a2b
("dpif-netdev: Purge all ukeys when reconfigure pmd.") but
reproduced while only reconfiguring of pmd threads without
restarting, because addition may change the sequence of
other ports, which is important in time of reconfiguration.
Introducing the new model, where distribution of queues made by main
thread with minimal synchronizations and without data races between
pmd threads. Also, this model should work faster, because only
needed threads will be interrupted for reconfiguraition and total
computational complexity of reconfiguration is less.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
When handling an upcall with the userspace datapath, it's currently
possible for a flow from a packet with no tunnel options to come back
with matches on the options. If that happens, dpif-netdev will
attempt to translate the wildcards provided by ofproto into the format
used by dpif. The translation requires use of the original wildcards
from the flow, which since they didn't exist, is uninitalized memory.
Matching on fields which don't actually exist is itself a bug. However,
this can occur when we attempt to set a tunnel option on the packet -
ofproto generates a match on the field in the original packet. This is
being fixed separately.
In other situations where we have a match on an unexpected field, we
simply ignore it. This happens with tunnel options with the kernel
datapath, non-tunnel fields that don't exist in the packet, and even
with Geneve where we do have some options but not the particular one
that was matched on. This brings the same behavior for this case and
avoids the possibility of accessing uninitialized memory.
Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This function will flush the connection tracking tables of a specific
datapath.
It simply calls a function pointer in the dpif_class. No dpif
currently implements the required interface.
The next commits will provide an implementation in dpif-netlink.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
These function can be used to dump conntrack entries from a datapath.
They simply call a function pointer in the dpif_class. No dpif currently
implements the interface.
The next commits will provide an implementation in dpif-netlink.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
This patch renames the command name related with geneve-map to a more
generic name as following:
add-geneve-map -> add-tlv-map
del-geneve-map -> del-tlv-map
dump-geneve-map -> dump-tlv-map
It also renames the Geneve_table to tlv_table.
By doing this renaming, the NSH variable context header (the same TLV
format as Geneve) or other protocol can reuse the field tun_metadata<N>
in the future.
Signed-off-by: Mengke Liu <mengke.liu@intel.com>
Signed-off-by: Ricky Li <ricky.li@intel.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
In the ODP context an empty mask netlink attribute usually means that
the flow should be an exact match.
odp_flow_key_to_mask{,_udpif}() instead return a struct flow_wildcards
with matches only on recirc_id and vlan_tci.
A more appropriate behavior is to handle a missing (zero length) netlink
mask specially (like we do in userspace and Linux datapath) and create
an exact match flow_wildcards from the original flow.
This fixes a bug in revalidate_ukey(): every flow created with
megaflows disabled would be revalidated away, because the mask would
seem too generic. (Another possible fix would be to handle the special
case of a missing mask in revalidate_ukey(), but this seems a more
generic solution).
Since we don't distinguish between IPv4 and IPv6 lookups, consolidate ARP
and ND cache into neighbor cache. Other references to ARP related to the
ARP cache but that are not really about ARP have been renamed as well.
tnl_arp_lookup is kept for lookups using IPv4 instead of IPv4-mapped
addresses, but that is going to be removed in a later patch.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This allows --enable-dummy=system with a userland-only build.
It's useful for testsuite.
Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com>
Acked-by: Ben Pfaff <blp@ovn.org>
This patch adds a new 128-bit metadata field to the connection tracking
interface. When a label is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_label" field in the flow.
For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a label with
those connections:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new 32-bit metadata field to the connection tracking
interface. When a mark is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_mark" field in the flow.
For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a mark with those
connections:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new action and fields to OVS that allow connection
tracking to be performed. This support works in conjunction with the
Linux kernel support merged into the Linux-4.3 development cycle.
Packets have two possible states with respect to connection tracking:
Untracked packets have not previously passed through the connection
tracker, while tracked packets have previously been through the
connection tracker. For OpenFlow pipeline processing, untracked packets
can become tracked, and they will remain tracked until the end of the
pipeline. Tracked packets cannot become untracked.
Connections can be unknown, uncommitted, or committed. Packets which are
untracked have unknown connection state. To know the connection state,
the packet must become tracked. Uncommitted connections have no
connection state stored about them, so it is only possible for the
connection tracker to identify whether they are a new connection or
whether they are invalid. Committed connections have connection state
stored beyond the lifetime of the packet, which allows later packets in
the same connection to be identified as part of the same established
connection, or related to an existing connection - for instance ICMP
error responses.
The new 'ct' action transitions the packet from "untracked" to
"tracked" by sending this flow through the connection tracker.
The following parameters are supported initally:
- "commit": When commit is executed, the connection moves from
uncommitted state to committed state. This signals that information
about the connection should be stored beyond the lifetime of the
packet within the pipeline. This allows future packets in the same
connection to be recognized as part of the same "established" (est)
connection, as well as identifying packets in the reply (rpl)
direction, or packets related to an existing connection (rel).
- "zone=[u16|NXM]": Perform connection tracking in the zone specified.
Each zone is an independent connection tracking context. When the
"commit" parameter is used, the connection will only be committed in
the specified zone, and not in other zones. This is 0 by default.
- "table=NUMBER": Fork pipeline processing in two. The original instance
of the packet will continue processing the current actions list as an
untracked packet. An additional instance of the packet will be sent to
the connection tracker, which will be re-injected into the OpenFlow
pipeline to resume processing in the specified table, with the
ct_state and other ct match fields set. If the table is not specified,
then the packet is submitted to the connection tracker, but the
pipeline does not fork and the ct match fields are not populated. It
is strongly recommended to specify a table later than the current
table to prevent loops.
When the "table" option is used, the packet that continues processing in
the specified table will have the ct_state populated. The ct_state may
have any of the following flags set:
- Tracked (trk): Connection tracking has occurred.
- Reply (rpl): The flow is in the reply direction.
- Invalid (inv): The connection tracker couldn't identify the connection.
- New (new): This is the beginning of a new connection.
- Established (est): This is part of an already existing connection.
- Related (rel): This connection is related to an existing connection.
For more information, consult the ovs-ofctl(8) man pages.
Below is a simple example flow table to allow outbound TCP traffic from
port 1 and drop traffic from port 2 that was not initiated by port 1:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
table=1,in_port=2,ct_state=+trk+est,tcp,action=1
table=1,in_port=2,ct_state=+trk+new,tcp,action=drop
Based on original design by Justin Pettit, contributions from Thomas
Graf and Daniele Di Proietto.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The Netlink encoding of datapath flow keys cannot express wildcarding
the presence of a VLAN tag. Instead, a missing VLAN tag is interpreted
as exact match on the fact that there is no VLAN. This makes reading
datapath flow dumps confusing, since for everything else, a missing
key value means that the corresponding key was wildcarded.
Unless we refactor a lot of code that translates between Netlink and
struct flow representations, we have to do the same in the userspace
datapath. This makes at least the flow install logs show that the
vlan_tci field is matched to zero. However, the datapath flow dumps
remain as they were before, as they are performed using the netlink
format.
Add a test to verify that packet with a vlan will not match a rule
that may seem wildcarding the presence of the vlan tag. Applying this
test without the userspace datapath modification showed that the
userspace datapath failed to create a new datapath flow for the VLAN
packet before this patch.
Reported-by: Tony van der Peet <tony.vanderpeet@gmail.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
set in 'ol_flags'. Otherwise the hash is garbage and doesn't
relate to the packet.
This fixes an issue with vhost, which, being a virtual NIC, doesn't
compute the hash.
Reported-by: Dongjun <dongj@dtdream.com>
Suggested-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently tnl-port table wildcard destination ip and mac addresses
for given tunnel packet. That could result accepting tunnel
packets destined for other hosts. Following patch adds
support for matching for ip and mac address.
IP address upates to tnl-port table are piggybacked on
ovs-router updates.
Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When dpdk configuration changes, all pmd threads are recreated
and rx queues of each port are reloaded. After this process,
rx queue could be mapped to a different pmd thread other than
the one before reconfiguration. However, this is totally
transparent to ofproto layer modules. So, if the ofproto-dpif-upcall
module still holds ukeys generated before pmd thread recreation,
this old ukey will collide with the ukey for the new upcalls
from same traffic flow, causing flow installation failure.
To fix the bug, this commit adds a new call-back function
in dpif layer for notifying upper layer the purging of datapath
(e.g. pmd thread deletion in dpif-netdev). So, the
ofproto-dpif-upcall module can react properly with deleting
the ukeys and with collecting flows' last stats.
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Alex Wang <ee07b291@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Struct miniflow is now sometimes used just as a map. Define a new
struct flowmap for that purpose. The flowmap is defined as an array of
maps, and it is automatically sized according to the size of struct
flow, so it will be easier to maintain in the future.
It would have been tempting to use the existing struct bitmap for this
purpose. The main reason this is not feasible at the moment is that
some flowmap algorithms are simpler when it can be assumed that no
struct flow member requires more bits than can fit to a single map
unit. The tunnel member already requires more than 32 bits, so the map
unit needs to be 64 bits wide.
Performance critical algorithms enumerate the flowmap array units
explicitly, as it is easier for the compiler to optimize, compared to
the normal iterator. Without this optimization a classifier lookup
without wildcard masks would be about 25% slower.
With this more general (and maintainable) algorithm the classifier
lookups are about 5% slower, when the struct flow actually becomes big
enough to require a second map. This negates the performance gained
in the "Pre-compute stage masks" patch earlier in the series.
Requested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
For performance-critical threads like pmd threads, we currently make them
never call coverage_clear() to avoid contention over the global mutex
'coverage_mutex'. So, even though pmd thread still keeps updating their
thread-local coverage count, the count is never attributed to the global
total. But it is useful to have them available.
This commit makes this happen by implementing a non-contending version
of the clear function, coverage_try_clear(). The function will use
the ovs_mutex_trylock() and return immediately if the mutex cannot
be acquired. Since threads like pmd thread are always busy-looping,
the lock will eventually be acquired.
Requested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com
The kernel implementation of Geneve options stores the TLV option
data in the flow exactly as received, without any further parsing.
This is then translated to known options for the purposes of matching
on flow setup (which will then install a datapath flow in the form
the kernel is expecting).
The userspace implementation behaves a little bit differently - it
looks up known options as each packet is received. The reason for this
is there is a much tighter coupling between datapath and flow translation
and the representation is generally expected to be the same. This works
but it incurs work on a per-packet basis that could be done per-flow
instead.
This introduces a small translation step for Geneve packets between
datapath and flow lookup for the userspace datapath in order to
allow the same kind of processing that the kernel does. A side effect
of this is that unknown options are now shown when flows dumped via
ovs-appctl dpif/dump-flows, similar to the kernel.
There is a second benefit to this as well: for some operations it is
preferable to keep the options exactly as they were received on the wire,
which this enables. One example is that for packets that are executed from
ofproto-dpif-upcall to the datapath, this avoids the translation of
Geneve metadata. Since this conversion is potentially lossy (for unknown
options), keeping everything in the same format removes the possibility
of dropping options if the packet comes back up to userspace and the
Geneve option translation table has changed. To help with these types of
operations, most functions can understand both formats of data and seamlessly
do the right thing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
If ofproto-dpif installs a flow into the userspace datapath that doesn't
include a mask, we need to synthesize an exact match one. This is currently
done using the metaflow infrastructure, iterating over each field and
setting it to all ones.
There is a conceptual mismatch here because metaflow is operating on
OpenFlow fields, not datapath ones. Even though they are generally very
similar, there are subtle differences, which is why it is necessary to
fix up the input port mask.
With Geneve options, the mapping is much more complicated and so the
situation is worse. The first issue is that the metaflow to flow
mapping can change over time, so we would need to do more revalidation
to track this. In addition, an upcoming patch will completely disconnect
the option format between ofproto-dpif and dpif-netdev, so the values
written by metaflow don't make sense at all.
When megaflows are turned off, ofproto-dpif internally generates masks
using flow_wildcards_init_for_packet(). Since that's the same as what
we want to do here, we can just use that instead of metaflow.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Currently pmd threads select queues in pmd_load_queues() according to
get_n_pmd_threads_on_numa(). This behavior leads to race between pmds,
beacause dp_netdev_set_pmds_on_numa() starts them one by one and
current number of threads changes incrementally.
As a result we may have the following situation with 2 pmd threads:
* dp_netdev_set_pmds_on_numa()
* pmd12 thread started. Currently only 1 pmd thread exists.
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
* pmd14 thread started. 2 pmd threads exists.
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'
We have:
core 1 --> port 1, port 2
core 2 --> port 2
Fix this by starting pmd threads only after all of them have
been configured.
Cc: Daniele Di Proietto <diproiettod at vmware.com>
Cc: Dyasly Sergey <s.dyasly at samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Use two maps in miniflow to allow for expansion of struct flow past
512 bytes. We now have one map for tunnel related fields, and another
for the rest of the packet metadata and actual packet header fields.
This split has the benefit that for non-tunneled packets the overhead
should be minimal.
Some miniflow utilities now exist in two variants, new ones operating
over all the data, and the old ones operating only on a single 64-bit
map at a time. The old ones require doubling of code but should
execute faster, so those are used in the datapath and classifier's
lookup path.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When no mask is passed in, dpif_netdev_mask_from_nlattrs() builds a
mask from all possible fields whose prerequisites are met. OpenFlow
metadata is not relevant for a datapath flow, so we skip registers and
metadata. Do this for XREGs as well.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
MSVC does not like zero sized arrays in structs. Hence, remove the
'values' member from struct miniflow and add back the getters
miniflow_values() and miniflow_get_values().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Now that performance critical code already inlines miniflows and
minimasks, we can simplify struct miniflow by always dynamically
allocating miniflows and minimasks to the correct size. This changes
the struct minimatch to always contain pointers to its miniflow and
minimask.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Datapath support for some flow key fields is used inside ofproto-dpif as
well as odp-util. Share these fields using the same structure.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The addition of Geneve options to packet metadata significantly
expanded its size. It was reported that this can decrease performance
for DPDK ports by up to 25% since we need to initialize the whole
structure on each packet receive.
It is not really necessary to zero out the entire structure because
miniflow_extract() only copies the tunnel metadata when particular
fields indicate that it is valid. Therefore, as long as we zero out
these fields when the metadata is initialized and ensure that the
rest of the structure is correctly set in the presence of a tunnel,
we can avoid touching the tunnel fields on packet reception.
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When using multiple PMDs and numerous ports, a performance gain
may be achieved in some use cases by pinning a PMD/port to a
particular (set of) core(s).
This patch provides a summary of the switch's port/core affinities
each time that the status of the switch's ports is modified.
Based on this information, a user may determine what affinity
modifications are required to optimise performance for their
particular use case.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Wojciech Andralojc <wojciechx.andralojc@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Sometimes we need to look at flow fields to understand how to parse
an attribute. However, masks don't have this information - just the
mask on the field. We already use the translated flow structure for
this purpose but this isn't always enough since sometimes we actually
need the raw netlink information. Fortunately, that is also readily
available so this passes it down from the appropriate callers.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently the functions to set, clear, and iterate over bitmaps
only operate over 32 bit values. If we convert them to handle
64 bit bitmaps, they can be used in more places.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Serializing between userspace flows and netlink attributes currently
requires several additional parameters besides the flows themselves.
This will continue to grow in the future as well. This converts
the function arguments to a parameters struct, which makes the code
easier to read and allowing irrelevant arguments to be omitted.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Until now there have been two variants for --enable-dummy:
* --enable-dummy: This adds support for "dummy" dpif and netdev.
* --enable-dummy=override: In addition, this replaces *every* existing
dpif and netdev by the dummy type.
The latter is useful for testing but it defeats the possibility of using
the userspace native tunneling implementation (because all the tunnel
netdevs get replaced by dummy netdevs). Thus, this commit adds a third
variant:
* --enable-dummy=system: This replaces the "system" dpif and netdev
by dummies but leaves the others untouched.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
When --enable-dummy=system or --enable-dummy=override is in use, dpifs
other than "dummy" are actually dummy dpifs, so use a more reliable test.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
It appears that miniflow_extract() in emc_processing() spends a lot of
cycles waiting for the packet's data to be read.
Prefetching the next packet's data while parsing removes this delay.
For a single flow pipeline the throughput improves by ~10%. With a
more realistic pipeline the change has a much smaller effect (~0.5%
improvement)
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This function doesn't need to be exported in the public OVS headers, and
it had an inconsistent name compared to uuid_equals(). Rename and move.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Non pmd threads have a core_id == UINT32_MAX, while queue ids used by
netdevs range from 0 to the number of CPUs. Therefore core ids cannot
be used directly to select a queue.
This commit introduces a simple mapping to fix the problem: pmd threads
continue using queues 0 to N (where N is the number of CPUs in the
system), while non pmd threads use queue N+1.
Fixes: d5c199ea7ff7 ("netdev-dpdk: Properly support non pmd threads.")
Reported-by: 차은호 <eunho.cha@atto-research.com
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Mark D. Gray <mark.d.gray@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We used to reserve DPDK lcore 0 for non pmd operations, making it
difficult to use core 0 for packet processing.
DPDK 2.0 properly support non EAL threads with lcore LCORE_ID_ANY.
Using non EAL threads for non pmd threads, we do not need to reserve
any core for non pmd operations
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>