2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

576 Commits

Author SHA1 Message Date
Alex Wang
e4e74c3a2b dpif-netdev: Purge all ukeys when reconfigure pmd.
When dpdk configuration changes, all pmd threads are recreated
and rx queues of each port are reloaded.  After this process,
rx queue could be mapped to a different pmd thread other than
the one before reconfiguration.  However, this is totally
transparent to ofproto layer modules.  So, if the ofproto-dpif-upcall
module still holds ukeys generated before pmd thread recreation,
this old ukey will collide with the ukey for the new upcalls
from same traffic flow, causing flow installation failure.

To fix the bug, this commit adds a new call-back function
in dpif layer for notifying upper layer the purging of datapath
(e.g. pmd thread deletion in dpif-netdev).  So, the
ofproto-dpif-upcall module can react properly with deleting
the ukeys and with collecting flows' last stats.

Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Alex Wang <ee07b291@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2015-09-02 05:57:59 +00:00
Jarno Rajahalme
5fcff47b0b flow: Add struct flowmap.
Struct miniflow is now sometimes used just as a map.  Define a new
struct flowmap for that purpose.  The flowmap is defined as an array of
maps, and it is automatically sized according to the size of struct
flow, so it will be easier to maintain in the future.

It would have been tempting to use the existing struct bitmap for this
purpose. The main reason this is not feasible at the moment is that
some flowmap algorithms are simpler when it can be assumed that no
struct flow member requires more bits than can fit to a single map
unit. The tunnel member already requires more than 32 bits, so the map
unit needs to be 64 bits wide.

Performance critical algorithms enumerate the flowmap array units
explicitly, as it is easier for the compiler to optimize, compared to
the normal iterator.  Without this optimization a classifier lookup
without wildcard masks would be about 25% slower.

With this more general (and maintainable) algorithm the classifier
lookups are about 5% slower, when the struct flow actually becomes big
enough to require a second map.  This negates the performance gained
in the "Pre-compute stage masks" patch earlier in the series.

Requested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-08-26 15:37:22 -07:00
Alex Wang
fbe0962b28 coverage: Add coverage_try_clear() for performance-critical threads.
For performance-critical threads like pmd threads, we currently make them
never call coverage_clear() to avoid contention over the global mutex
'coverage_mutex'.  So, even though pmd thread still keeps updating their
thread-local coverage count, the count is never attributed to the global
total.  But it is useful to have them available.

This commit makes this happen by implementing a non-contending version
of the clear function, coverage_try_clear().  The function will use
the ovs_mutex_trylock() and return immediately if the mutex cannot
be acquired.  Since threads like pmd thread are always busy-looping,
the lock will eventually be acquired.

Requested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com
2015-08-25 16:21:36 -07:00
Jesse Gross
6728d578f6 dpif-netdev: Translate Geneve options per-flow, not per-packet.
The kernel implementation of Geneve options stores the TLV option
data in the flow exactly as received, without any further parsing.
This is then translated to known options for the purposes of matching
on flow setup (which will then install a datapath flow in the form
the kernel is expecting).

The userspace implementation behaves a little bit differently - it
looks up known options as each packet is received. The reason for this
is there is a much tighter coupling between datapath and flow translation
and the representation is generally expected to be the same. This works
but it incurs work on a per-packet basis that could be done per-flow
instead.

This introduces a small translation step for Geneve packets between
datapath and flow lookup for the userspace datapath in order to
allow the same kind of processing that the kernel does. A side effect
of this is that unknown options are now shown when flows dumped via
ovs-appctl dpif/dump-flows, similar to the kernel.

There is a second benefit to this as well: for some operations it is
preferable to keep the options exactly as they were received on the wire,
which this enables. One example is that for packets that are executed from
ofproto-dpif-upcall to the datapath, this avoids the translation of
Geneve metadata. Since this conversion is potentially lossy (for unknown
options), keeping everything in the same format removes the possibility
of dropping options if the packet comes back up to userspace and the
Geneve option translation table has changed. To help with these types of
operations, most functions can understand both formats of data and seamlessly
do the right thing.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-05 20:26:48 -07:00
Jesse Gross
9f861c9182 dpif-netdev: Don't use metaflow to operate on userspace datapath fields.
If ofproto-dpif installs a flow into the userspace datapath that doesn't
include a mask, we need to synthesize an exact match one. This is currently
done using the metaflow infrastructure, iterating over each field and
setting it to all ones.

There is a conceptual mismatch here because metaflow is operating on
OpenFlow fields, not datapath ones. Even though they are generally very
similar, there are subtle differences, which is why it is necessary to
fix up the input port mask.

With Geneve options, the mapping is much more complicated and so the
situation is worse. The first issue is that the metaflow to flow
mapping can change over time, so we would need to do more revalidation
to track this. In addition, an upcoming patch will completely disconnect
the option format between ofproto-dpif and dpif-netdev, so the values
written by metaflow don't make sense at all.

When megaflows are turned off, ofproto-dpif internally generates masks
using flow_wildcards_init_for_packet(). Since that's the same as what
we want to do here, we can just use that instead of metaflow.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-05 20:25:40 -07:00
Ilya Maximets
2aca813cf4 dpif-netdev: fix race for queues between pmd threads.
Currently pmd threads select queues in pmd_load_queues() according to
get_n_pmd_threads_on_numa(). This behavior leads to race between pmds,
beacause dp_netdev_set_pmds_on_numa() starts them one by one and
current number of threads changes incrementally.

As a result we may have the following situation with 2 pmd threads:

* dp_netdev_set_pmds_on_numa()
* pmd12 thread started. Currently only 1 pmd thread exists.
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
* pmd14 thread started. 2 pmd threads exists.
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'

We have:
core 1 --> port 1, port 2
core 2 --> port 2

Fix this by starting pmd threads only after all of them have
been configured.

Cc: Daniele Di Proietto <diproiettod at vmware.com>
Cc: Dyasly Sergey <s.dyasly at samsung.com>

Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>

Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
2015-08-04 12:36:45 -07:00
Jarno Rajahalme
361d808dd9 flow: Split miniflow's map.
Use two maps in miniflow to allow for expansion of struct flow past
512 bytes.  We now have one map for tunnel related fields, and another
for the rest of the packet metadata and actual packet header fields.
This split has the benefit that for non-tunneled packets the overhead
should be minimal.

Some miniflow utilities now exist in two variants, new ones operating
over all the data, and the old ones operating only on a single 64-bit
map at a time.  The old ones require doubling of code but should
execute faster, so those are used in the datapath and classifier's
lookup path.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-17 15:18:43 -07:00
Jarno Rajahalme
8cd27cfd1e dpif-netdev: Skip also xregs when building a mask.
When no mask is passed in, dpif_netdev_mask_from_nlattrs() builds a
mask from all possible fields whose prerequisites are met.  OpenFlow
metadata is not relevant for a datapath flow, so we skip registers and
metadata.  Do this for XREGs as well.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-17 15:18:43 -07:00
Jarno Rajahalme
09b0fa9c55 flow: Make compile with MSVC.
MSVC does not like zero sized arrays in structs.  Hence, remove the
'values' member from struct miniflow and add back the getters
miniflow_values() and miniflow_get_values().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-16 17:42:24 -07:00
Jarno Rajahalme
8fd4792403 flow: Always inline miniflows.
Now that performance critical code already inlines miniflows and
minimasks, we can simplify struct miniflow by always dynamically
allocating miniflows and minimasks to the correct size.  This changes
the struct minimatch to always contain pointers to its miniflow and
minimask.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-15 13:17:10 -07:00
Joe Stringer
2494ccd78f odp-util: Share fields between odp and dpif_backer.
Datapath support for some flow key fields is used inside ofproto-dpif as
well as odp-util. Share these fields using the same structure.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2015-07-06 10:17:37 -07:00
Jesse Gross
35303d715b tunnels: Don't initialize unnecessary packet metadata.
The addition of Geneve options to packet metadata significantly
expanded its size. It was reported that this can decrease performance
for DPDK ports by up to 25% since we need to initialize the whole
structure on each packet receive.

It is not really necessary to zero out the entire structure because
miniflow_extract() only copies the tunnel metadata when particular
fields indicate that it is valid. Therefore, as long as we zero out
these fields when the metadata is initialized and ensure that the
rest of the structure is correctly set in the presence of a tunnel,
we can avoid touching the tunnel fields on packet reception.

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-01 15:24:04 -07:00
Mark Kavanagh
7dd671f08e dpif-netdev: log port/core affinity
When using multiple PMDs and numerous ports, a performance gain
may be achieved in some use cases by pinning a PMD/port to a
particular (set of) core(s).

This patch provides a summary of the switch's port/core affinities
each time that the status of the switch's ports is modified.
Based on this information, a user may determine what affinity
modifications are required to optimise performance for their
particular use case.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Wojciech Andralojc <wojciechx.andralojc@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-06-25 11:21:38 -07:00
Jesse Gross
ec1f6f327e odp-util: Pass down flow netlink attributes when translating masks.
Sometimes we need to look at flow fields to understand how to parse
an attribute. However, masks don't have this information - just the
mask on the field. We already use the translated flow structure for
this purpose but this isn't always enough since sometimes we actually
need the raw netlink information. Fortunately, that is also readily
available so this passes it down from the appropriate callers.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-06-25 11:08:58 -07:00
Jesse Gross
3ee6026aba bitmap: Convert single bitmap functions to 64-bit.
Currently the functions to set, clear, and iterate over bitmaps
only operate over 32 bit values. If we convert them to handle
64 bit bitmaps, they can be used in more places.

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-06-25 11:08:31 -07:00
Justin Pettit
2d34dbd9e1 Merge remote-tracking branch 'origin/master' into ovn4 2015-06-18 22:02:55 -07:00
Jesse Gross
5262eea1b8 odp-util: Convert flow serialization parameters to a struct.
Serializing between userspace flows and netlink attributes currently
requires several additional parameters besides the flows themselves.
This will continue to grow in the future as well. This converts
the function arguments to a parameters struct, which makes the code
easier to read and allowing irrelevant arguments to be omitted.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
2015-06-18 16:42:48 -07:00
Ben Pfaff
8420c7ad4e dummy: Introduce new --enable-dummy=system option.
Until now there have been two variants for --enable-dummy:

    * --enable-dummy: This adds support for "dummy" dpif and netdev.

    * --enable-dummy=override: In addition, this replaces *every* existing
      dpif and netdev by the dummy type.

The latter is useful for testing but it defeats the possibility of using
the userspace native tunneling implementation (because all the tunnel
netdevs get replaced by dummy netdevs).  Thus, this commit adds a third
variant:

    * --enable-dummy=system: This replaces the "system" dpif and netdev
      by dummies but leaves the others untouched.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2015-06-16 08:21:38 -07:00
Ben Pfaff
c4ea752900 dpif: Generalize test for dummy dpifs beyond the name.
When --enable-dummy=system or --enable-dummy=override is in use, dpifs
other than "dummy" are actually dummy dpifs, so use a more reliable test.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2015-06-16 08:21:28 -07:00
Daniele Di Proietto
72a5e2b8fc dpif-netdev: Prefetch next packet before miniflow_extract().
It appears that miniflow_extract() in emc_processing() spends a lot of
cycles waiting for the packet's data to be read.

Prefetching the next packet's data while parsing removes this delay.
For a single flow pipeline the throughput improves by ~10%.  With a
more realistic pipeline the change has a much smaller effect (~0.5%
improvement)

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-06-15 15:03:50 -07:00
Joe Stringer
bdd7ecf5bf types: Rename and move ovs_u128_equal().
This function doesn't need to be exported in the public OVS headers, and
it had an inconsistent name compared to uuid_equals(). Rename and move.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-06-09 18:20:02 -07:00
Daniele Di Proietto
3bcc10c070 dpif-netdev: Fix non-pmd thread queue id.
Non pmd threads have a core_id == UINT32_MAX, while queue ids used by
netdevs range from 0 to the number of CPUs.  Therefore core ids cannot
be used directly to select a queue.

This commit introduces a simple mapping to fix the problem: pmd threads
continue using queues 0 to N (where N is the number of CPUs in the
system), while non pmd threads use queue N+1.

Fixes: d5c199ea7ff7 ("netdev-dpdk: Properly support non pmd threads.")

Reported-by: 차은호 <eunho.cha@atto-research.com
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Mark D. Gray <mark.d.gray@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-06-03 15:34:58 -07:00
Daniele Di Proietto
d5c199ea7f netdev-dpdk: Properly support non pmd threads.
We used to reserve DPDK lcore 0 for non pmd operations, making it
difficult to use core 0 for packet processing.
DPDK 2.0 properly support non EAL threads with lcore LCORE_ID_ANY.

Using non EAL threads for non pmd threads, we do not need to reserve
any core for non pmd operations

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-05-22 11:28:19 -07:00
Daniele Di Proietto
bd5131ba76 ovs-numa: Change 'core_id' to unsigned.
DPDK lcore_id is unsigned.  We need to support big values like
LCORE_ID_ANY (=UINT32_MAX).  Therefore I am changing the type everywhere
in OVS.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-05-22 11:28:19 -07:00
Daniele Di Proietto
048963aa85 dpif-netdev: Reset RSS hash when recirculating.
Having the same RSS hash after recirculation can cause unnecessary
collisions in the exact match cache.  A simple solution is to rehash it
with the recirculation depth if it is non-zero.

Suggested-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-05-21 14:00:24 -07:00
Ethan Jackson
603f2ce04d dpif-netdev: Clear flow batches before execute.
When executing actions, it's possible a recirculation will occur
causing dp_netdev_input() to be called multiple times.  If the batch
pointers embedded in dp_netdev_flow aren't cleared, it's possible
packets after the recirculation will be reinserted into a batch
associated with the original lookup.  This could be very bad.

This patch fixes the problem by zeroing out flow batch pointers before
calling packet_batch_execute().  This probably has a slightly negative
performance impact, though I haven't tried it.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2015-05-21 13:49:46 -07:00
Ciara Loftus
fc82e877ef dpif-netdev: Increase the number of EMC entries
Prior to this commit, the number of possible entries in the Exact
Match Cache stood at 1024 per thread exacting to 0.18Mb. A typical
server system will have 2.5Mb cache per core meaning a larger EMC will
comfortably fit in. This patch increases the number of entries to 8192
per thread (1.4Mb) which in turn yields improved throughput when
processing multiple flows of traffic.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-05-21 13:46:43 -07:00
Ethan Jackson
cd159f1a82 dpdk: Ditch MAX_PKT_BURST macro.
The MAX_PKT_BURST and NETDEV_MAX_RX_BATCH macros had a confusing
relationship.  They basically purport to do the same thing, making it
unclear which is the source of truth.

Furthermore, while NETDEV_MAX_RX_BATCH was 256, MAX_PKT_BURST was 32,
meaning we never process a batch larger than 32 packets further adding
to the confusion.

This patch resolves the issue by removing MAX_PKT_BURST completely,
and shrinking the new NETDEV_MAX_BURST macro to only 32.  This should
have no change in the execution path except shrinking a couple of
structs and memory allocations (can't hurt).

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2015-05-19 14:47:00 -07:00
Daniele Di Proietto
8aaa125dab dpif-netdev: Share emc and fast path output batches.
Until now the exact match cache processing was able to handle only four
megaflows.  The rest of the packets was passed to the megaflow
classifier.

The limit was arbitraly set to four also because the algorithm used to
group packets in output batches didn't perform well with a lot of
megaflows.

After changing the algorithm and after some performance testing it seems
much better just to share the same output batches between the exact
match cache and the megaflow classifier.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-05-18 15:14:02 -07:00
Daniele Di Proietto
11e5cf1f90 dpif-netdev: Store batch pointer in dp_netdev_flow.
The userspace datapath

1. receives a batch of packets.
2. finds a 'netdev_flow' (megaflow) for each packet.
3. groups the packets in output batches based on the 'netdev_flow'.

Until now the grouping (2) was done using a simple algorithm with a
O(N^2) runtime, where N is the number of distinct megaflows of the packets
in the incoming batch.  This could quickly become a bottleneck, even with
a small number of megaflows.

With this commit the datapath simply stores in the 'netdev_flow' (the
megaflow) a pointer to the output batch, if one has been created for the
current input batch.  The pointer will be cleared when the output batch
is sent.

In a simple phy2phy test with 128 megaflows the throughput is more than
doubled.

The reason that stopped us from doing this change was that the
'netdev_flow' memory was shared between multiple threads: this is no
longer the case with the per-thread classifier.

Also, this commit reorders struct dp_netdev_flow to group toghether the
members used in the fastpath.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-05-18 15:14:02 -07:00
Daniele Di Proietto
efa2bcbb35 dpif-netdev: Store pkt_metadata structure in dp_netdev_port.
Initializing a struct pkt_metadata for every packet can be surprisingly
expensive.  It's much faster to keep a copy for each port and copying it
on each packet.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-05-18 15:14:02 -07:00
Daniele Di Proietto
28e2fa027d dpif-netdev: Batch packets when recirculating.
Now that we have per packet metadata, there's no need to split packet
batches when recirculating.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-20 12:56:29 -07:00
Daniele Di Proietto
2bc1bbd27d dp-packet: Rename 'dp_hash' in 'rss_hash'.
We already have the 'dp_hash' embedded in the metadata.  This caused
confusion in the code.  With this commit it should be clear that
'rss_hash' is the packet hash used for internal purposes, while
'md.dp_hash' is part of the flow, computed during the execution of
certain actions.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-20 12:49:41 -07:00
Daniele Di Proietto
11bfdaddf2 dpif-netdev: Cache time_msec() calls for each received batch.
Calling time_msec() (which calls clock_gettime()) too often might be
expensive.  With this commit OVS makes only one call per received
batch and caches the result.

Suggested-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-20 12:49:41 -07:00
Daniele Di Proietto
9ff55ae284 dpif-netdev: Store actions data and size contiguously.
As stated by the comment above the structure, the 'action' pointer does not
change during the 'dp_netdev_actions' lifetime: we might as well embed
the pointed memory into the structure.

The commit also updates the description of dp_netdev_actions_create().

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-20 12:49:41 -07:00
Ben Pfaff
17050610ec dpif-netdev: Reject adding duplicate ports.
Otherwise it is at least very confusing.

Found during testing.  An upcoming commit adds a test.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2015-04-16 08:13:10 -07:00
Daniele Di Proietto
6553d06bd1 dpif-netdev: Add dpif-netdev/pmd-stats-* appctl commands.
These commands can be used to get packets and cycles counters on a pmd
thread basis.  They're useful to get a clearer picture about the
performance of the userspace datapath.

They export these pieces of information:

- A (per-thread) view of the caches hit rate. Hits in the exact match
  cache are reported separately from hits in the masked classifier
- A rough cycles count. This will allow to estimate the load of OVS and
  the polling overhead.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-14 12:31:30 -07:00
Daniele Di Proietto
c8973eb634 dpif-provider: Add class init function.
This init function is called when the dpif class is registered. It will
be used by following commits

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-14 12:30:11 -07:00
Daniele Di Proietto
55e3ca97d1 dpif-netdev: Add simple per pmd-thread cycles counters.
The counters use x86 TSC if available (currently only with DPDK). They
will be exposed by subsequents commits

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-14 12:28:50 -07:00
Daniele Di Proietto
abcf3ef4c3 dpif-netdev: Count exact match cache hits.
We used to count exact match cache hits and masked classifier hits
together. This commit splits the DP_STAT_HIT counter into two.
This change will be used by future commits.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-09 15:00:52 -07:00
Daniele Di Proietto
eb94da30ae dpif-netdev: Make datapath and flow stats atomic.
A read operation from a non atomic shared value (without external
locking) can return incorrect values.  Using the atomic semantics
prevents this from happening.

However:
* No memory barriers are used.  We don't need that kind of consistency
  for statistics (we use relaxed operations).
* The updates are not atomic, just the loads and stores.  This is ok
  because there's a single writer.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-09 15:00:52 -07:00
Daniele Di Proietto
60fc3b7ba4 dpif-netdev: Group statistics updates in the slow path.
Since statistics updates might require locking (in future commits)
grouping them will reduce the locking overhead.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-09 15:00:52 -07:00
Daniele Di Proietto
97447f55a9 dpif-netdev: Remove support for DPIF_FP_ZERO_STATS flag
Since flow statistics are thread local and updated without any lock,
it is not correct to do a memset from another thread.

This commit simply removes the support for the flag.  It is not needed
by ofproto-dpif, it is only exposed by dpctl commands.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-02 17:55:17 -07:00
Daniele Di Proietto
7ad20cbd96 dpif-netdev: Account for and free lost packets.
Packets for which an upcall has failed (lost packets) must be deleted.
We also need to count them as MISS and LOST.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-03-30 13:17:41 -07:00
Pravin B Shelar
6fd6ed71cb ofpbuf: Simplify ofpbuf API.
ofpbuf was complicated due to its wide usage across all
layers of OVS, Now we have introduced independent dp_packet
which can be used for datapath packet, we can simplify ofpbuf.
Following patch removes DPDK mbuf and access API of ofpbuf
members.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-03 13:37:39 -08:00
Pravin B Shelar
cf62fa4c70 dp-packet: Remove ofpbuf dependency.
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-03 13:37:37 -08:00
Pravin B Shelar
e14deea0bd dpif_packet: Rename to dp_packet
dp_packet is short and better name for datapath packet
structure.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-03-03 13:37:34 -08:00
Ethan Jackson
4c75aaabb1 dpif-netdev: Fix rare flow add race condition.
Before this patch, dp_netdev_flow_add() inserted newly minted flows in
the "flow_table" cmap before inserting them into the per core "dpcls"
classifier.  Since dpcls_insert() initializes 'flow->cr.mask', there's
a brief window where the flow is accessible from the cmap, but has a
bogus mask value.

In my testing, under rare instances (i.e. once every 20 minutes with a
very specific flow table and traffic pattern), revalidators core dump
when they call dpif_netdev_flow_dump_next(), which accesses this bogus
mask value from dp_netdev_flow_to_dpif_flow().

By inserting into the per core classifier before the cmap, all the
values are guaranteed to be initialized during flow dumps.  With this
patch, I can no longer reproduce the crash.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-01-07 18:15:13 -08:00
Jarno Rajahalme
d70e8c28f9 miniflow: Use 64-bit data.
So far the compressed flow data in struct miniflow has been in 32-bit
words with a 63-bit map, allowing for a maximum size of struct flow of
252 bytes.  With the forthcoming Geneve options this is not sufficient
any more.

This patch solves the problem by changing the miniflow data to 64-bit
words, doubling the flow max size to 504 bytes.  Since the word size
is doubled, there is some loss in compression efficiency.  To counter
this some of the flow fields have been reordered to keep related
fields together (e.g., the source and destination IP addresses share
the same 64-bit word).

This change should speed up flow data processing on 64-bit CPUs, which
may help counterbalance the impact of making the struct flow bigger in
the future.

Classifier lookup stage boundaries are also changed to 64-bit
alignment, as the current algorithm depends on each miniflow word to
not be split between ranges.  This has resulted in new padding (part
of the 'mpls_lse' field).

The 'dp_hash' field is also moved to packet metadata to eliminate
otherwise needed padding there.  This allows the L4 to fit into one
64-bit word, and also makes matches on 'dp_hash' more efficient as
misses can be found already on stage 1.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-01-06 14:47:30 -08:00
Jarno Rajahalme
aae7c34f04 hash: Add hash_add64().
Add support for adding 64-bit words to hashes.  This will be used by
subsequent patches.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-01-06 14:47:30 -08:00