2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

328 Commits

Author SHA1 Message Date
Jarno Rajahalme
d70e8c28f9 miniflow: Use 64-bit data.
So far the compressed flow data in struct miniflow has been in 32-bit
words with a 63-bit map, allowing for a maximum size of struct flow of
252 bytes.  With the forthcoming Geneve options this is not sufficient
any more.

This patch solves the problem by changing the miniflow data to 64-bit
words, doubling the flow max size to 504 bytes.  Since the word size
is doubled, there is some loss in compression efficiency.  To counter
this some of the flow fields have been reordered to keep related
fields together (e.g., the source and destination IP addresses share
the same 64-bit word).

This change should speed up flow data processing on 64-bit CPUs, which
may help counterbalance the impact of making the struct flow bigger in
the future.

Classifier lookup stage boundaries are also changed to 64-bit
alignment, as the current algorithm depends on each miniflow word to
not be split between ranges.  This has resulted in new padding (part
of the 'mpls_lse' field).

The 'dp_hash' field is also moved to packet metadata to eliminate
otherwise needed padding there.  This allows the L4 to fit into one
64-bit word, and also makes matches on 'dp_hash' more efficient as
misses can be found already on stage 1.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-01-06 14:47:30 -08:00
Jarno Rajahalme
aae7c34f04 hash: Add hash_add64().
Add support for adding 64-bit words to hashes.  This will be used by
subsequent patches.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-01-06 14:47:30 -08:00
Alex Wang
1c1e46ed84 dpif-netdev: Add per-pmd flow-table/classifier.
This commit changes the per dpif-netdev datapath flow-table/
classifier to per pmd-thread.  As direct benefit, datapath
and flow statistics no longer need to be protected by mutex
or be declared as per-thread variable, since they are only
written by the owning pmd thread.

As side effects, the flow-dump output of userspace datapath
can contain overlapping flows.  To reduce confusion, the dump
from different pmd thread will be separated by a title line.
In addition, the flow operations via 'ovs-appctl dpctl/*'
are modified so that if the given flow in_port corresponds
to a dpdk interface, the operation will be conducted to all
pmd threads recv from that interface (expect for flow-get
which will always be applied to non-pmd threads).

Signed-off-by: Alex Wang <alexw@nicira.com>
Tested-by: Mark D. Gray <mark.d.gray@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-12-30 11:47:30 -08:00
Alex Wang
b19befaef2 dpif-netdev: Add function to get pmd using core id.
This commit adds the function dp_netdev_get_pmd() which allows
users to get 'struct dp_netdev_pmd_thread' based on the core id.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-12-30 11:47:18 -08:00
Joe Stringer
8e1ffd757d dpif: Shift ufid support checking up to dpif_backer.
Previously, the dpif layer was responsible for determining datapath
support for UFIDs, which resulted in all ovs-dpctl utilities
inserting/deleting flows from the datapath each time they are run.
Shift this responsibility up to the dpif_backer.

There are two users of this functionality: Revalidators check for UFID
support to request a terser dump using UFIDs, and dpif-netlink uses this
to request flow_del operations to only return the UFID/stats. The latter
case was previously hidden from revalidators, but this change makes them
aware of it, and reuses the same "udpif->enable_ufid" flag for reducing
overhead of both flow dump and flow delete.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-12-19 12:57:41 -08:00
Thomas Graf
e6211adce4 lib: Move vlog.h to <openvswitch/vlog.h>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:19 +01:00
Joe Stringer
64bb477f05 dpif: Minimize memory copy for revalidation.
One of the limiting factors on the number of flows that can be supported
in the datapath is the overhead of assembling flow dump messages in the
datapath. This patch modifies the dpif to allow revalidators to skip
dumping the key, mask and actions from the datapath, by making use of
the unique flow identifiers introduced in earlier patches.

For each flow dump, the dpif user specifies whether to skip these
attributes, allowing the common case to only dump a pair of 128-bit ID
and flow stats. With datapath support, this increases the number of
flows that a revalidator can handle per second by 50% or more. Support
in dpif-netdev and dpif-netlink is added in this patch; kernel support
is left for future patches.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-02 14:55:55 -08:00
Joe Stringer
70e5ed6f39 dpif: Index flows using unique identifiers.
This patch modifies the dpif interface to allow flows to be manipulated
using a 128-bit identifier. This allows revalidator threads to perform
datapath operations faster, as they do not need to serialise the entire
flow key for operations like flow_get and flow_delete. In conjunction
with a future patch to simplify the dump interface, this provides a
significant performance benefit for revalidation.

When handlers assemble flow_put operations, they specify a unique
identifier (UFID) for each flow as it is passed down to the datapath to
be stored with the flow. The UFID is currently provided to handlers
by the dpif during upcall processing.

When revalidators assemble flow_get or flow_del operations, they may
specify the UFID for the flow along with the key. The dpif will decide
whether to send only the UFID to the datapath, or both the UFID and flow
key. The former is preferred for newer datapaths that support UFID,
while the latter is used for backwards compatibility.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-02 14:10:23 -08:00
Alex Wang
433330a8bf dpif-netdev: Fix a race.
On current master, the 'struct dp_netdev_port' is destroyed
immediately when the ref count reaches 0.  However, non-pmd
threads calling the dpif_netdev_execute() for sending packets
could hold pointer to 'port' that is not ref-counted.  Thusly
those threads could possibly access freed memory when the port
is deleted.

To fix this bug, this commit makes non-pmd threads acquiring
the 'port_mutex' before doing the actual execution in
dpif_netdev_execute().

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-12-01 10:53:19 -08:00
Joe Stringer
7af12bd7c8 dpif: Generate flow_hash for revalidators in dpif.
This patch shifts the responsibility for determining the hash for a flow
from the revalidation logic down to the dpif layer. This assists in
handling backward-compatibility for revalidation with the upcoming
unique flow identifier "UFID" patches.

A 128-bit UFID was selected to minimize the likelihood of hash conflicts.
Handler threads will not install a flow that has an identical UFID as
another flow, to prevent misattribution of stats and to ensure that the
correct flow key cache is used for revalidation.

For datapaths that do not support UFID, which is currently all
datapaths, the dpif will generate the UFID and pass it up during upcall
and flow_dump. This is generated based on the datapath flow key.

Later patches will add support for datapaths to store and interpret this
UFID, in which case the dpif has a responsibility to pass it through
transparently.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-25 14:12:25 -08:00
Pravin B Shelar
53e1d6f1ef dpif-netdev: Remove redundant hash action handling.
odp_execute_actions() already handles hash execution part.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-11-21 15:51:42 -08:00
Alex Wang
67ad54cbc8 dpif-netdev: Garbage collect the exact match cache periodically.
On current master, the exact match cache entry can keep reference to
'struct dp_netdev_flow' even after the flow is removed from the flow
table.  This means the free of allocated memory of the flow is delayed
until the exact match cache entry is cleared or replaced.

If the allocated memory is ahead of chunks of freed memory on heap,
the delay will prevent the reclaim of those freed chunks, causing
falsely high memory utilization.

To fix the issue, this commit makes the owning thread conduct periodic
garbage collection on the exact match cache and clear dead entries.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

---
PATCH -> V2:
- Adopt Jarno's suggestion and conduct slow sweep to avoid introducing
  jitter.
2014-11-21 08:06:41 -08:00
Jarno Rajahalme
802f84ffd7 classifier: Defer pvector publication.
This patch adds a new functions classifier_defer() and
classifier_publish(), which control when the classifier modifications
are made available to lookups.  By default, all modifications are made
available to lookups immediately.  Modifications made after a
classifier_defer() call MAY be 'deferred' for later 'publication'.  A
call to classifier_publish() will both publish any deferred
modifications, and cause subsequent changes to to be published
immediately.

Currently any deferring is limited to the visibility of the subtable
vector changes.  pvector now processes modifications mostly in a
working copy, which needs to be explicitly published with
pvector_publish().  pvector_publish() sorts the working copy and
removes gaps before publishing it.

This change helps avoiding O(n**2) memory behavior in corner cases,
where large number of rules with different masks are inserted or
deleted.

VMware-BZ: #1322017
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-14 16:00:46 -08:00
Alex Wang
accf86266a dpif-netdev: Allow direct destroy of 'struct dp_netdev_port'.
Before this commit, when 'struct dp_netdev_port' is deleted from
'dpif-netdev' datapath, if there is pmd thread, the pmd thread
will release the last reference to the port and ovs-rcu postpone
the destroy.  However, the delayed close of object like 'struct
netdev' could cause failure in immediate re-add or reconfigure of
the same device.

To fix the above issue, this commit uses condition variable and
makes the main thread wait for pmd thread to release the reference
when deleting port.  Then, the main thread can directly destroy the
port.

Reported-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-12 15:59:12 -08:00
Alex Wang
f7d636527b dpif-netdev: Move 'struct dp_netdev_port' initialization before use.
There is a portion of the 'struct dp_netdev_port' initialization
that is placed after the reload of pmd threads.  This means in
theory, there could be a race where pmd threads access half-
initialized struct.  Although such race has not been seen, it
makes sense to fully initialize the struct before use.

Found by code inspection.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-12 15:59:11 -08:00
Pravin B Shelar
a36de779d7 openvswitch: Userspace tunneling.
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.

Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-12 15:08:33 -08:00
Andy Zhou
b5cbbcf656 bridge: Store datapath version into ovsdb
OVS userspace are backward compatible with older Linux kernel modules.
However, not having the most up-to-date datapath kernel modules can
some times lead to user confusion. Storing the datapath version in
OVSDB allows management software to check and optionally provide
notifications to users.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-05 15:27:38 -08:00
David Verbeiren
7251515ea9 netdev-dpdk: Fix DPDK rings broken by multi queue
DPDK rings don't need one queue per PMD thread and don't support multiple
queues (set_multiq function is undefined). To fix operation with DPDK rings,
this patch ignores EOPNOTSUPP error on netdev_set_multiq() and provides, for
DPDK rings, a netdev send() function that ignores the provided queue id
(= PMD thread core id).

Suggested-by: Maryam Tahhan <maryam.tahhan@intel.com>
Signed-off-by: David Verbeiren <david.verbeiren@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-04 09:21:27 -08:00
Jarno Rajahalme
caeb4906d4 lib/dpif-netdev: Fix EMC lookup.
Patch 0de8783a9 (lib/dpif-netdev: Integrate megaflow classifier.)
broke exact match cache lookup, but it went undetected since there are
no separate stats for EMC.

This patch fixes the problem by changing the struct netdev_flow_key
'len' member to cover only the 'mf' member, not the whole
netdev_flow_key, and ignoring the 'len' field in
netdev_flow_key_equal.  Comparison is still accurate, as the miniflow
'map' field encodes the length in the number of 1-bits, and the map is
included in the comparison.

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-10-17 17:03:13 -07:00
Jarno Rajahalme
0de8783a9d lib/dpif-netdev: Integrate megaflow classifier.
Megaflow inserts and removals are simplified:

- No need for classifier internal mutex, as dpif-netdev already has a
  'flow_mutex'.
- Number of memory allocations/frees can be halved.
- Lookup code path can rely on netdev_flow_key always having inline data.

This will also be easier to simplify further when moving to per-thread
megaflow classifiers in the future.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2014-10-17 09:37:11 -07:00
Pravin B Shelar
41ccaa249c netdev-dpif: Add metadata to dpif-packet.
Today dpif-netdev has single metadat for given batch, since one
batch belongs to one port, but soon packets fro single tunnel ports
can belong to different ports, so we need to have per packet metadata.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-09 14:12:11 -07:00
Daniele Di Proietto
154374a72b dpif-netdev: reduce netdev_flow_key size
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-07 11:23:37 -07:00
Jarno Rajahalme
52a524eb20 lib/cmap: cmap_find_batch().
Batching the cmap find improves the memory behavior with large cmaps
and can make searches twice as fast:

$ tests/ovstest test-cmap benchmark 2000000 8 0.1 16
Benchmarking with n=2000000, 8 threads, 0.10% mutations, batch size 16:
cmap insert:    533 ms
cmap iterate:    57 ms
batch search:   146 ms
cmap destroy:   233 ms

cmap insert:    552 ms
cmap iterate:    56 ms
cmap search:    299 ms
cmap destroy:   229 ms

hmap insert:    222 ms
hmap iterate:   198 ms
hmap search:   2061 ms
hmap destroy:   209 ms

Batch size 1 has small performance penalty, but all other batch sizes
are faster than non-batched cmap_find().  The batch size 16 was
experimentally found better than 8 or 32, so now
classifier_lookup_miniflow_batch() performs the cmap find operations
in batches of 16.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-10-06 15:33:47 -07:00
Jarno Rajahalme
55847abee8 lib/cmap: split up cmap_find().
This makes the following patch easier and cleans up the code.

Explicit "inline" keywords seem necessary to prevent performance
regression on cmap_find() with GCC 4.7 -O2.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-10-06 15:33:46 -07:00
Daniele Di Proietto
8bd89cdc06 dpif-netdev: Destroy pmd_thread cmap at exit
Found by valgrind

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-03 15:04:15 -07:00
Daniele Di Proietto
88ace79b3e dpif-netdev: fix dp_netdev_free()
dp_netdev_free() must free 'dp->upcall_rwlock', but when upcalls are
disabled (if the datapath is being freed upcalls should be disabled)
'dp->upcall_rwlock' is taken and freeing it causes an assertion to
fail.

This commit takes makes sure that the upcalls are disabled and
releases 'dp->upcall_rwlock' before freeing it. A simple testcase is
added to detect the failure.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-03 15:04:15 -07:00
Daniele Di Proietto
ac8c20812b dpif-netdev: Fix (packet) memory leaks in the slow path.
If a packet didn't match a rule in the fast path classifier its memory was
never freed. The issue was particularly clear with DPDK devices because it was
not possible to process more than ~250000 DPDK mbufs in the slow path.

This commit fixes the problem by:
* calling dpif_packet_delete() if the upcalls are disabled
* passing may_steal==true to dp_netdev_execute_actions() during normal upcall
  processing

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-19 16:50:17 -07:00
Alex Wang
f2eee18911 dpif-netdev: Allow multi-rx-queue, multi-pmd-thread configuration.
This commits adds the multithreading functionality to OVS dpdk
module.  Users are able to create multiple pmd threads and set
their cpu affinity via specifying the cpu mask string similar
to the EAL '-c COREMASK' option.

Also, the number of rx queues for each dpdk interface is made
configurable to help distribution of rx packets among multiple
pmd threads.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-19 15:59:36 -07:00
Daniele Di Proietto
0023a1cb45 dpif-netdev: Store miniflow length in exact match cache
This optimization is done to avoid calling count_1bits(), which, if
the popcnt istruction is not available might is slow. popcnt may not
be available because:

- We are running on old hardware
- (more likely) We're using a generic build (i.e. packaged OVS from a
  distro), not tuned for the specific CPU

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-09-17 15:15:09 -07:00
Daniele Di Proietto
79df317f0b dpif-netdev: Introduce netdev_flow_key_* functions
netdev_flow_key is a miniflow with the following constraints:

1) It is used only inside dpif-netdev.c.
2) It always has inline values.
3) It contains only miniflows created by miniflow_extract().

Therefore, by using these new functions instead of the miniflow_*
ones, we get the following (performance related) benefits:

- Because of (1) the functions can be inlined.
- Because of (2) and (3) the netdev_flow_key can be treated as POD.
  Specifically, because of (3), we can do comparisons with memcmp,
  since if the map is different the miniflow must be different.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-09-17 15:09:45 -07:00
Alex Wang
65f13b50c5 dpif-netdev: Create multiple pmd threads by default.
With this commit, ovs by default will create one pmd thread
for each numa node and pin the pmd thread to available cpu
core on the numa node.

NON_PMD_CORE_ID (currently 0) is used to reserve a particular
cpu core for the I/O of all non-pmd threads.  No pmd thread
can be pinned to this reserved core.

As side-effects of this commit:

-  pmd thread will not be created, if there is no dpdk interface
   from the corresponding numa node added to ovs.

- the exact-match cache for non-pmd threads is removed from
  'struct dp_netdev'.  Instead, all non-pmd threads will use
  the exact-match cache defined in the 'struct dp_netdev_pmd_thread'
  for NON_PMD_CORE_ID.

- the rx packet processing functions are refactored to use
  'struct dp_netdev_pmd_thread' as input.

- the 'netdev_send()' function will be called with the proper
  queue id.

- both pmd and non-pmd threads can call the dpif_netdev_execute().
  so, use a per-thread key to help recognize the calling thread.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:49 -07:00
Alex Wang
5a03406477 dpif-netdev: Create multiple tx/rx queues when adding dpdk interface.
Before this commit, ovs creates one tx and one rx queue for
each dpdk interface and uses only one poll thread for handling
I/O of all dpdk interfaces.  An upcoming patch will allow multiple
poll threads be created.  As a preparation, this commit changes
the dpif-netdev to create multiple tx/rx queues when the dpdk
interface is added.

Specifically, the number of rx queues will still be one per-dpdk
interface for this commit.  But upcoming work will allow user
create multiple rx queues.  The number of tx queues will be the
number of cpu cores on the machine.  Although not all the tx queues
will be used, each poll thread will have its own queue for
transmission on the dpdk interface.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:48 -07:00
Alex Wang
f00fa8cbad netdev: Add n_txq to 'struct netdev'.
This commit adds new variable n_txq to 'struct netdev' for recording
the number of tx queues.  Correspondingly, the send_*() functions are
extended to accept queue id as input argument.

All 'netdev-*' implementation will ignore the queue id since having
multiple tx queues is not supported.  Upcomping patches will start
using it and create multiple tx queues for dpdk netdev.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 11:30:58 -07:00
Alex Wang
74467d5c88 unixctl: Make command description all lowercase.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-09-11 13:49:13 -07:00
Jarno Rajahalme
3c33f0ffe7 lib/dpif-netdev: Make emc_mutex recursive.
dpif_netdev_execute may be called while doing upcall processing.
Since the context of the input port is not tracked upto this point, we
use the shared dp->emc_cache for packet execution, where the emc_cache
is needed for recirculation.

While recursive mutexes can make thread safety analysis hard, for now
we change emc_mutex to be recursive.  Forthcoming new unit tests will
fail with the current non-recursive mutex.  Later improvements may
remove the need for this recursion.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-08 15:33:00 -07:00
Jarno Rajahalme
6d670e7f0d lib/odp: Masked set action execution and printing.
Add a new action type OVS_ACTION_ATTR_SET_MASKED, and support for
parsing, printing, and committing them.

Masked set actions add a mask, immediately following the netlink
attribute data, within the netlink attribute itself.  Thus the key
attribute size for a masked set action is exactly double of the
non-masked set action.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-09-08 14:57:08 -07:00
Alex Wang
a1fdee136b dpif-netdev: Introduce port_try_ref() to prevent a race.
When pmd thread interates through all ports for queue loading,
the main thread may unreference and 'rcu-free' a port before
pmd thread take new reference of it.  This could cause pmd
thread fail the reference and access freed memory later.

This commit fixes this race by introducing port_try_ref()
which uses ovs_refcount_try_ref_rcu().  And the pmd thread
will only load the port's queue, if port_try_ref() returns
true.

Found by inspection.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-03 13:29:03 -07:00
Alin Serdean
1a0d583102 dpif-netdev: Avoid variable length array on MSVC.
MSVC does not like variable length array either.

This patch treats the following error:

lib/dpif-netdev.c(2272) : error C2057: expected constant expression
lib/dpif-netdev.c(2272) : error C2466: cannot allocate an array of constant size 0
lib/dpif-netdev.c(2272) : error C2133: 'batches' : unknown size
lib/dpif-netdev.c(2273) : error C2057: expected constant expression
lib/dpif-netdev.c(2273) : error C2466: cannot allocate an array of constant size 0
lib/dpif-netdev.c(2273) : error C2133: 'mfs' : unknown size
lib/dpif-netdev.c(2274) : error C2057: expected constant expression
lib/dpif-netdev.c(2274) : error C2466: cannot allocate an array of constant size 0
lib/dpif-netdev.c(2274) : error C2133: 'rules' : unknown size
lib/dpif-netdev.c(2363) : warning C4034: sizeof returns 0
lib/dpif-netdev.c(2381) : error C2057: expected constant expression
lib/dpif-netdev.c(2381) : error C2466: cannot allocate an array of constant size 0
lib/dpif-netdev.c(2381) : error C2133: 'keys' : unknown size
make[2]: *** [lib/dpif-netdev.lo] Error 1

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-09-02 08:10:24 -07:00
Daniele Di Proietto
9bbf1c3dc0 dpif-netdev: Exact match cache
Since lookups in the classifier can be pretty expensive,
we introduce this (thread local) cache which simply
compares the miniflows of the packets

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-29 16:32:22 -07:00
Daniele Di Proietto
61a2647e15 packet-dpif: Add dpif_packet_{get, set}_hash()
These function are used to stored the packet hash. 'netdev-dpdk'
automatically set this value to the RSS hash returned by the
NIC. Other 'netdev's set it to 0 (which is an invalid hash
value), so that callers can compute the hash on their own.

If DPDK support is enabled, struct dpif_packet's member
'dp_hash' is removed and 'pkt.hash.rss' from DPDK mbuf is used

This commit also configure DPDK devices to compute RSS hash
for UDP and IPv6 packets

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-29 16:32:21 -07:00
Jarno Rajahalme
91a96379e5 lib: Use shorter form of relaxed atomic access.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:34:52 -07:00
Thomas Graf
16bea12c92 dpif-netdev: Fix leaked port, port->rxq, port->type in error path
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
[blp@nicira.com added free of port->type]
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-08-26 11:27:00 -07:00
Jarno Rajahalme
84067a4c1a lib/dpif-netdev: Clean-up pmd thread signaling.
It could be possible that the thread misses a signal when it reads the
change_seq again after reload.  Also, the counter has no dependent
data, so the memory model for the atomic read can be relaxed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-15 16:40:14 -07:00
Ethan Jackson
623540e461 dpif-netdev: Streamline miss handling.
This patch avoids the relatively inefficient miss handling processes
dictated by the dpif process, by calling into ofproto-dpif directly
through a callback.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-14 13:47:04 -07:00
Joe Stringer
6fe09f8c44 dpif: Support flow_get in dpif_operate().
This cleans up the dpif interface to make it more consistent with the
other dpif operations, and allows flows to be fetched in batches.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-14 10:18:41 +12:00
Daniele Di Proietto
ed79f89a30 dpif-netdev: Reintroduce ref_cnt for dp_netdev_flow
struct dp_netdev_flow used to have a reference counter.
It has been replaced by RCU. Unfortunately RCU is not
enough if we plan to hold a reference to the dp_netdev_flow
for a long time. So this commit reintroduces reference
counting for struct dp_netdev_flow

Subsequent commits make use of it.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-12 17:38:48 -07:00
Ben Pfaff
1a0c894a26 dpif-provider: Get rid of redundant operations.
The dpif provider 'operate' call duplicates all of the features available
from the 'flow_put', 'flow_del', and 'execute' calls, yielding redundant
code in providers that support both mechanisms.  This change drops the
latter calls in favor of making every dpif provider support 'operate'.
The result is code that is overall less duplicative.

It might make sense to do the same with flow_get but so far 'operate'
doesn't support flow_get.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-08-12 09:59:42 -07:00
Ethan Jackson
ae2ceebd67 dpif-netdev: Avoid useless flow copy in dp_netdev_flow_add().
This patch gives dp_netdev_flow_add() a match with which it can
initialize the classifier rule.  This prevents it from needing to copy
a flow and flow_wildcards into the match first.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 14:13:20 -07:00
Ethan Jackson
645b893423 style: Replace TODO with XXX.
In accordance with CodingStyle.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 14:13:20 -07:00
Ben Pfaff
6bc3bb829c cmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.
There isn't any significant downside to making cmap iteration "safe" all
the time, so this drops the _SAFE variant.

Similar changes to CMAP_CURSOR_FOR_EACH and CMAP_CURSOR_FOR_EACH_CONTINUE.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-07-29 09:02:23 -07:00