Add an internal mutex to struct cls_classifier, and reorganize
classifier internal structures according to the user of each field,
marking the fields that need to be protected by the mutex. This makes
locking requirements easier to track, and may make lookup more memory
efficient.
After this patch there is some double locking, as callers are taking
the fat-rwlock, and we take the mutex internally. A following patch
will remove the classifier fat-rwlock, removing the (double) locking
overhead.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Hide the cursor from the classifier iteration users and move locking to
the iterators. This will make following RCU changes simpler, as the call
sites of the iterators need not be changed at that point.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Keep an internal representation of a rule separate from the one
embedded into user's structs. This allows for further memory
optimization in the classifier.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
It is better not to expose definitions not needed by users.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Support struct miniflow as a key for datapath flow lookup.
The new classifier interface classifier_lookup_miniflow_first() takes
a miniflow as a key and stops at the first match with no regard to
flow prioritites. This works only if the classifier has no
conflicting rules (as is the case with the userspace datapath
classifier).
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Jarno Rajahalme reported up to 40% performance gain on netperf TCP_CRR with
an earlier version of this patch in combination with a kernel NUMA patch,
together with a reduction in variance:
http://openvswitch.org/pipermail/dev/2014-January/035867.html
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Add a prefix tree (trie) structure for tracking the used address
space, enabling skipping classifier tables containing longer masks
than necessary for an address field value in a packet header being
classified. This enables less unwildcarding for datapath flows in
parts of the address space without host routes.
Trie lookup is interwoven to the staged lookup, so that a trie is
searched only when the configured trie field becomes relevant
for the lookup. The trie lookup results are retained so that each
trie is checked at most once for each classifier lookup.
This implementation tracks the number of rules at each address prefix
for the whole classifier. More aggressive table skipping would be
possible by maintaining lists of tables that have prefixes at the
lengths encountered on tree traversal, or by maintaining separate
tries for subsets of rules separated by metadata fields.
Prefix tracking is configured via OVSDB. A new column "prefixes" is
added to the database table "Flow_Table". "prefixes" is a set of
string values listing the field names for which prefix lookup should
be used.
As of now, the fields for which prefix lookup can be enabled are:
- tun_id, tun_src, tun_dst
- nw_src, nw_dst (or aliases ip_src and ip_dst)
- ipv6_src, ipv6_dst
There is a maximum number of fields that can be enabled for any one
flow table. Currently this limit is 3.
Examples:
ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
--id=@N1 create Flow_Table name=table0
ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
--id=@N1 create Flow_Table name=table1
ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
ovs-vsctl set Flow_Table table1 prefixes=[]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
The naming of the classifier table has been a source of confusion,
since each OpenFlow table is implemented as a classifier, which
consists of multiple (sub)tables. This name change hopefully makes
classifier related discussion a bit less confusing.
For consistency, relevant field names as well as the function and
variable names have been renamed in similar fashion.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
It's easy to add two tags together, but it's hard to subtract them. The
new "tag_tracker" data structure provides a solution.
Signed-off-by: Ben Pfaff <blp@nicira.com>
We have a controller that puts many rules with different metadata values
into the flow table, where metadata is used (by "resubmit"s) to distinguish
stages in a pipeline. Thus, any given flow only needs to be hashed into
classifier "cls_table"s that contain a match for the flow's metadata value.
This commit optimizes the classifier lookup by (probabilistically) skipping
the "cls_table"s that can't possibly match.
(The "metadata" referred to here is the OpenFlow 1.1+ "metadata" field,
which is a 64-bit field similar in purpose to the "registers" defined by
Open vSwitch.)
Previous versions of this patch, with earlier versions of the controller in
question, improved flow setup performance by about 19%.
Bug #14282.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This makes 'ofproto_mutex' protect the flow table well enough that threads
other than the main one can realistically modify flows.
I need to look at the interface between ofproto and connmgr: I think that
there might need to be some locking there too.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit makes macro function "ASSIGN_CONTAINER()" evaluates
to "(void)0". This is to avoid the 'clang' warning: "expression
result unused", since most of time, the final evaluated value
is not used.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
It seems that 'clang' compiler applies strict protection on pointer
dereference. And it causes unexpected execution in macro functions
like "HMAP_FOR_EACH()" and unit test failures. This commit fixes
this issue and pass all unit tests.
Co-authored-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
A future commit will want to know what bits were significant during the
classifier lookup.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Co-authored-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
[blp@nicira.com: this along with Jarno's previous patch to the
classifier give me a combined 15% boost in "ovs-benchmark rate"
with a complicated flow table involving multiple resubmits]
Signed-off-by: Ben Pfaff <blp@nicira.com>
A cls_rule is 324 bytes on i386 now. The cost of a flow table lookup is
currently proportional to this size, which is going to continue to grow.
However, the required cost of a flow table lookup, with the classifier that
we currently use, is only proportional to the number of bits that a rule
actually matches. This commit implements that optimization by replacing
the match inside "struct cls_rule" by a sparse representation.
This reduces struct cls_rule to 100 bytes on i386.
There is still some headroom for further optimization following this
commit:
- I suspect that adding an 'n' member to struct miniflow would make
miniflow operations faster, since popcount() has some cost.
- It's probably possible to replace the "struct minimatch" in cls_rule
by just a "struct miniflow", since the cls_rule's cls_table has a
copy of the minimask.
- Some of the miniflow operations aren't well-optimized.
Signed-off-by: Ben Pfaff <blp@nicira.com>
When cls_cursor_init() is given a NULL target, it can skip an expensive
step comparing the rule against the target for every table and every rule
in the classifier. collect_rule_loose() and other callers could take
advantage of this optimization, except that they actually pass in a rule
that matches everything instead of a NULL rule (e.g. for "ovs-ofctl
dump-flows <bridge>" without specifying a matching rule).
This optimizes that case.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, "struct cls_rule" didn't own any data outside its own memory
block. An upcoming commit will make "struct cls_rule" sometimes own blocks
of memory, so it needs "destroy" and to a lesser extent "clone" functions.
This commit adds these in advance, even though they are mostly no-ops, to
make it possible to separately review the memory management.
Signed-off-by: Ben Pfaff <blp@nicira.com>
OpenFlow 1.0 and 1.2 have notions of VLAN that are different
enough to warrant separate "meta-flow" fields, which this commit
adds.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
[blp@nicira.com added NEWS, updated a few overlooked meta-flow bits]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Arbitrary ethernet mask support is one step on the way to support for OpenFlow
1.1+. This patch set seeks to add this capability without breaking current
protocol support.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
[blp@nicira.com made some updates, see
http://openvswitch.org/pipermail/dev/2012-May/017585.html]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Most flow tables have some kind of "catchall" rule that matches every
packet. For this table, the cost of copying, zeroing, and hashing the
input flow is significant. This patch avoids these costs.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The meta-flow code enforces IPv4/IPv6 masks, so there's no reason to do
it again in the classifier. This allows a number of functions to be
removed, since the only callers were in this classifier code.
Most of the members in structures referring to network elements indicate
the layer (e.g., "tl_", "nw_", "tp_"). The "frag" and "tos" members
didn't, so this commit add them.
Add support matching the IPv4 TTL and IPv6 hop limit fields. This
commit also adds support for modifying the IPv4 TTL. Modifying the IPv6
hop limit isn't currently supported, since we don't support modifying
IPv6 headers.
We will likely want to change the user-space interface, since basic
matching and setting the TTL are not generally useful. We will probably
want the ability to match on extraordinary events (such as TTL of 0 or 1)
and a decrement action.
Feature #8024
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Until now, OVS has handled IP fragments more awkwardly than necessary. It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present. This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.
This commit improves the situation significantly. It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset. This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers. In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.
This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments". It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
Without this commit, every NXAST_LEARN action that adds a flow causes every
facet to be revalidated. With this commit, as long as the "Usage Advice"
in the large comment on struct nx_action_learn in nicira-ext.h is followed,
this no longer happens.
The other cls_rule_*() functions that take IPv6 addresses take a pointer
to an in6_addr, so cls_rule_set_nd_target() should as well for consistency.
Possibly this is more efficient also, although I guess it doesn't really
make much of a difference either way.