PMD threads use pvectors but do not need the overhead of the
concurrent version. Expose the non-concurrent version for
that use.
Note that struct pvector is renamed as struct cpvector (for concurrent
priority vector), and the former struct pvector_impl is now struct
pvector.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Allow clients to use the whole priority range. Note that this changes
the semantics of PVECTOR_FOR_EACH_PRIORITY so that the iteration still
continues for entries at the given priority.
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Use the new ccmap type instead of cmap for staged lookup indices to
fix the problem with slow removal of rules with large number of
duplicates. This was problematic especially when many rules shared
the same match in packet metadata (e.g., a port number, but nothing
else), causing a large number of duplicates to be inserted into the
staged lookup index. ccmap only keeps the count of inserted (hash)
values, so duplicates do not add any performance penalty.
Reported-by: Alok Kumar Maurya <alok-kumar.maurya@hpe.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
This optimization applied when a staged lookup index would narrow down
to a single rule, which happens sometimes is simple test cases, but
presumably less often in more populated flow tables. The result of
this optimization allowed a bit more general megaflows, but the bit
patterns produced were sometimes cryptic. Finally, a later fix to a
more important performance problem does not allow for this
optimization any more, so remove it now.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The only vlog line was a left over from debugging.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The test for figuring out if the last index had the same fields as the
actual rules map as broken, resulting into keeping an unnecessary
index around.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Addition of table versioning exposed struct cls_rule member
'cls_match' to RCU readers and made it possible for 'cls_match' become
NULL while being accessed by an RCU reader, but we failed to check for
this condition. This may have resulted in NULL pointer dereference
and ovs-vswitchd crash.
Fix this by making the 'cls_match' member an RCU pointer and checking
the value whenever it potentially read by an RCU reader. In these
instances we use ovsrcu_get(), whereas functions accessible only by
the exclusive writers use ovsrcu_get_protected() and do not need to
check the result.
VMware-BZ: 1643642
Fixes: 2b7b1427 ("classifier: Support table versioning")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
This commit also adds several #include directives in source files in
order to make the 'ofp-util.h' move possible
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Classifier partitions allowed skipping subtables when if was known
from the flow's metadata field that the subtable cannot possibly
match. This functionality was later implemented in a more general
fashion by staged lookup, where the first stage also covers the
metadata field, among the rest of the non-packet fields in the struct
flow. While in theory skipping a subtable on the basis of the
metadata field alone could produce more effective wildcards, on the
basis of our testsuite coverage it does not seem to be the case, as
removing the partitioning feature did not result in any test failures.
Removing the partitioning feature makes classifier lookups roughly 20%
faster when a wildcard mask is not needed, and roughly 10% faster when
a wildcard mask is needed, as tested with the test-classifier
benchmark with one lookup thread.
Found by profiling with 'perf'.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Struct miniflow is now sometimes used just as a map. Define a new
struct flowmap for that purpose. The flowmap is defined as an array of
maps, and it is automatically sized according to the size of struct
flow, so it will be easier to maintain in the future.
It would have been tempting to use the existing struct bitmap for this
purpose. The main reason this is not feasible at the moment is that
some flowmap algorithms are simpler when it can be assumed that no
struct flow member requires more bits than can fit to a single map
unit. The tunnel member already requires more than 32 bits, so the map
unit needs to be 64 bits wide.
Performance critical algorithms enumerate the flowmap array units
explicitly, as it is easier for the compiler to optimize, compared to
the normal iterator. Without this optimization a classifier lookup
without wildcard masks would be about 25% slower.
With this more general (and maintainable) algorithm the classifier
lookups are about 5% slower, when the struct flow actually becomes big
enough to require a second map. This negates the performance gained
in the "Pre-compute stage masks" patch earlier in the series.
Requested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This makes stage mask computation happen only when a subtable is
inserted and allows simplification of the main lookup function.
Classifier benchmark shows that this speeds up the classification
(with wildcards) about 5%.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
mf_value has grown bigger than needed for storing the biggest
supported prefix (IPv6 address length). Define a new type to be used
instead of mf_value.
This makes classifier lookups a bit faster.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Use two maps in miniflow to allow for expansion of struct flow past
512 bytes. We now have one map for tunnel related fields, and another
for the rest of the packet metadata and actual packet header fields.
This split has the benefit that for non-tunneled packets the overhead
should be minimal.
Some miniflow utilities now exist in two variants, new ones operating
over all the data, and the old ones operating only on a single 64-bit
map at a time. The old ones require doubling of code but should
execute faster, so those are used in the datapath and classifier's
lookup path.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
MSVC does not like zero sized arrays in structs. Hence, remove the
'values' member from struct miniflow and add back the getters
miniflow_values() and miniflow_get_values().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
miniflow_clone() and minimask_clone() are no longer used, remove them
from the API.
Now that miniflow data is always inlined, it makes sense to rename
miniflow_clone_inline() miniflow_clone().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Now that performance critical code already inlines miniflows and
minimasks, we can simplify struct miniflow by always dynamically
allocating miniflows and minimasks to the correct size. This changes
the struct minimatch to always contain pointers to its miniflow and
minimask.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Now that struct cls_match has 'add_version' the 'version' in cls_match
was largely redundant. Remove 'version' from struct cls_rule, and add
it to function prototypes that need it. This makes versioning more
explicit (or less indirect) in the API.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
After all, there are some cases in which both the insertion version
and removal version of a rule need to be considered. This makes the
cls_match a bit bigger, but makes classifier versioning much simpler
to understand.
Also, avoid using type larger than int in an enum, as it is not
portable C.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Postponed 'next' member poisoning was based on the faulty assumption
that postponed functions would be called in the order they were
postponed. This assumption holds only for the functions postponed by
any single thread. When functions are postponed by different
threads, there are no guarantees of the order in which the functions
may be called, or timing between those calls after the next grace
period has passed.
Given this, the postponed poisoning could have executed after
postponed destruction of the object containing the rculist element.
This bug was revealed after the memory leaks on rule deletion were
recently fixed.
This patch removes the postponed 'next' member poisoning and adds
documentation describing the ordering limitations in OVS RCU.
Alex Wang dug out the root cause of the resulting crashes, thanks!
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
Each rule is now added or deleted in a specific tables version. Flow
tables are versioned with a monotonically increasing 64-bit integer,
where positive values are valid version numbers.
Rule modifications are implemented as an insertion of a new rule and a
deletion of the old rule, both taking place in the same tables
version. Since concurrent lookups may use different versions, both
the old and new rule must be available for lookups at the same time.
The ofproto provider interface is changed to accomodate the above. As
rule's actions need not be modified any more, we no longer need
'rule_premodify_actions', nor 'rule_modify_actions'. 'rule_insert'
now takes a pointer to the old rule and adds a flag that tells whether
the old stats should be forwarded to the new rule or not (this
replaces the 'reset_counters' flag of the now removed
'rule_modify_actions').
Versioning all flow table changes has the side effect of making
learned flows visible for future lookups only. I.e., the upcall that
executes the learn action, will not see the newly learned action in
it's classifier lookups. Only upcalls that start executing after the
new flow was added will match on it.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The traversal of the list of identical rules from the lookup threads
is fragile if the list head is removed during the list traversal.
This patch simplifies the implementation of that list by making the
list NULL terminated, singly linked RCU-protected list. By having the
NULL at the end there is no longer a possiblity of missing the point
when the list wraps around. This is significant when there can be
multiple elements with the same priority in the list.
This change also decreases the size of the struct cls_match back
pre-'visibility' attribute size.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch allows classifier rules to become visible and invisible in
specific versions. A 'version' is defined as a positive monotonically
increasing integer, which never wraps around.
The new 'visibility' attribute replaces the prior 'to_be_removed' and
'visible' attributes.
When versioning is not used, the 'version' parameter should be passed
as 'CLS_MIN_VERSION' when creating rules, and 'CLS_MAX_VERSION' when
looking up flows.
This feature enables the support for atomic OpenFlow bundles without
significant performance penalty on 64-bit systems. There is a
performance decrease in 32-bit systems due to 64-bit atomics used.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OpenFlow 1.4 bundles are easier to implement when it is possible to
mark a rule as 'to_be_removed' and then insert a new, identical rule
with the same priority.
All but one out of the identical rules must be marked as
'to_be_removed', and the one rule that is not 'to_be_removed' must
have been inserted last.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This makes it possible to tentatively add flows to the classifier
without the datapath seeing them.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
A "conjunctive match" allows higher-level matches in the flow table, such
as set membership matches, without causing a cross-product explosion for
multidimensional matches. Please refer to the documentation that this
commit adds to ovs-ofctl(8) for a better explanation, including an example.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
An upcoming commit will make classifier_lookup() sometimes modify its
'flow' argument temporarily during the lookup.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
---
v2: New patch.
v2.1: Rebase.
v3: Rebase.
So far the compressed flow data in struct miniflow has been in 32-bit
words with a 63-bit map, allowing for a maximum size of struct flow of
252 bytes. With the forthcoming Geneve options this is not sufficient
any more.
This patch solves the problem by changing the miniflow data to 64-bit
words, doubling the flow max size to 504 bytes. Since the word size
is doubled, there is some loss in compression efficiency. To counter
this some of the flow fields have been reordered to keep related
fields together (e.g., the source and destination IP addresses share
the same 64-bit word).
This change should speed up flow data processing on 64-bit CPUs, which
may help counterbalance the impact of making the struct flow bigger in
the future.
Classifier lookup stage boundaries are also changed to 64-bit
alignment, as the current algorithm depends on each miniflow word to
not be split between ranges. This has resulted in new padding (part
of the 'mpls_lse' field).
The 'dp_hash' field is also moved to packet metadata to eliminate
otherwise needed padding there. This allows the L4 to fit into one
64-bit word, and also makes matches on 'dp_hash' more efficient as
misses can be found already on stage 1.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Change a test so that the result will be the same in both
little-endian and big-endian systems by editing the test case so that
only one bit differs.
Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Use MAP_FOR_EACH_INDEX to improve readability when there is no
apparent cost for it.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new functions classifier_defer() and
classifier_publish(), which control when the classifier modifications
are made available to lookups. By default, all modifications are made
available to lookups immediately. Modifications made after a
classifier_defer() call MAY be 'deferred' for later 'publication'. A
call to classifier_publish() will both publish any deferred
modifications, and cause subsequent changes to to be published
immediately.
Currently any deferring is limited to the visibility of the subtable
vector changes. pvector now processes modifications mostly in a
working copy, which needs to be explicitly published with
pvector_publish(). pvector_publish() sorts the working copy and
removes gaps before publishing it.
This change helps avoiding O(n**2) memory behavior in corner cases,
where large number of rules with different masks are inserted or
deleted.
VMware-BZ: #1322017
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
classifier_remove() was recently changed to take a const struct
cls_rule *. Make the corresponding change to classifier_replace() and
classifier_insert(). This simplifies existing calling sites in
ofproto.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Almost all classifier users already exclude concurrent modifications,
or are single-threaded, hence the classifier internal mutex can be
removed. Due to this change, ovs-router.c and tnl-ports.c need new
mutexes, which are added.
As noted by Ben in review, ovs_router_flush() should also free the
entries it removes from the classifier. It now calls
ovsrcu_postpone() to that effect.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Previously, accurate iteration required writers to be excluded during
iteration. This patch adds an rculist to struct cls_subtable, and a
corresponding list node to struct cls_rule, which makes iteration more
straightforward, and allows the iterators to remain ignorant of the
internals of the cls_match. This new list allows iteration of rules
in the classifier by traversing the RCU-friendly subtables vector, and
the rculist of rules in each subtable.
Classifier modifications may be performed concurrently, but whether or
not the concurrent iterator sees those changes depends on the timing
of change. More specifically, an concurrent iterator:
- May or may not see a rule that is being inserted or removed.
- Will see either the new or the old version of a rule that is replaced.
- Will see all the other rules (that are not being modified).
Finally, The subtable's rculist also allows to make
classifier_rule_overlaps() lockless, which this patch also does.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
There is no point in adding duplicate information into prefix tries.
Also, since the lower-priority duplicate rules are not visible to
lookups, they do not need to be in staged lookup indices directly
either (the head rule is).
Finally, now that cmap operations return the number of elements in the
cmap, subtable's 'n_rules' member is not needed any more.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
insert_rule() only had one caller and this makes the code easier to
understand.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Shifting a 32-bit entity by 32 bits is undefined behavior. As we have 3
cases where we may hit this, it is a time to introduce a helper for
this.
VMware-BZ: #1355026
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Add asserts to make sure the containers within are already empty.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Some struct cls_match and cls_subtable fields were already documented
of being const. Make them const and use CONST_CAST where appropriate
to initialize them.
This will help catch future errors modifying those fields after
initialization.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
struct cls_match 'list' member was recently changed to an rculist.
This allows classifier_find_rule_exactly() to be made lockless.
Since subtable's 'max_priority' member would still require a lock, we
no longer check it before calling find_equal(). This adds a hash
table lookup in cases where the subtable may already be known to not
contain any rule of the target priority.
classifier_find_rule_exactly() is never called on the fastpath, so
this should not be significant.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Returning const struct cls_rule pointers from the classifier API helps
callers to remember that they should not modify the rules returned.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The list of identical, but lower priority rules is not currently used
in classifier lookup. A later patch introducing conjunctive matches
needs to access the list during lookups, so we must make the list RCU.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OpenFlow has priorities in the 16-bit unsigned range, from 0 to 65535.
In the classifier, it is sometimes useful to be able to have values below
and above this range. With the 'unsigned int' type used for priorities
until now, there were no values below the range, so some code worked
around it by converting priorities to 64-bit signed integers. This didn't
seem so great to me given that a plain 'int' also had the needed range.
This commit therefore changes the type used for priorities to int.
The interesting parts of this change are in pvector.h and classifier.c,
where one can see the elimination of the use of int64_t.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
tests/test-classifier.c used to include lib/classifier.c to gain
access to the internal data structures and some utility functions.
This was confusing, so this patch splits the relevant groups of
classifier internal definations to a new file
(lib/classifier-private.h), which is included by both lib/classifier.c
and tests/test-classifier.c. Other use of the new file is
discouraged.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Megaflow inserts and removals are simplified:
- No need for classifier internal mutex, as dpif-netdev already has a
'flow_mutex'.
- Number of memory allocations/frees can be halved.
- Lookup code path can rely on netdev_flow_key always having inline data.
This will also be easier to simplify further when moving to per-thread
megaflow classifiers in the future.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
This reverts commit d2064437e2bf91859a0a50fba30dcabba668a811, which
fails clang thread satefy analysis.
A more complete patch will be introduced later.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>