Add new macro MINIFLOW_MAP(FIELD) that returns the map covering the
given struct flow field.
Change the miniflow accessors to macros so that they can take the
field name directly.
Use these to add ipv6 support to miniflow_hash_5tuple().
Add ipv6 support to flow_hash_5tuple() as well so that these two
functions continue to return the same hash value for the corresponding
flows.
Also, simplify miniflow_get_metadata().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Apart from STP, EVB extension of LLDP as well as IEEE 802.1QBG use the
Nearest Customer Bridge (NCB) DMAC which has a value of 0180.c200.0000.
STP can be distinguished by Ethertype from these protocols.
Signed-off-by: Padmanabhan Krishnan <kprad1@yahoo.com>
[blp@nicira.com rewrote the details of the patch]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Padmanabhan Krishnan <kprad1@yahoo.com>
Use miniflow as a flow key in the userspace datapath classifier. The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Support struct miniflow as a key for datapath flow lookup.
The new classifier interface classifier_lookup_miniflow_first() takes
a miniflow as a key and stops at the first match with no regard to
flow prioritites. This works only if the classifier has no
conflicting rules (as is the case with the userspace datapath
classifier).
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() usinge these. These will be used in a
later patch.
Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies. Similarly, some inline
functions using struct flow are now in lib/flow.h. IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow. This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.
miniflow_extract() takes a miniflow as an input/output parameter. On
input the buffer for values to be extracted must be properly
initialized. On output the map contains ones for all the fields that
have been extracted.
Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.
Some explicit "inline" keywords are necessary for GCC to optimize this
properly. Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Since the dp_hash will often be a hash of the 5 tuple, it makes sense
to put it with the L4 header so it hits in the last classifier lookup
stage.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
threads read upcall.
This commit implements the API functions to allow multiple handler
threads read upcall.
Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION. So, both misses will be put to the same
queue. The decision is based on the fact that a lot has changed
since the age when flow setup rate is most treasured and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.
Thusly the current implementation will be put in testing and
investigation for better balancing solution will continue if
there is an issue.
Also note, the introduction and use of flow_hash_5tuple() will
put missed ICMP packets from same source but with different
type/code to different handler queues. This may cause reordering
of these packets. For now, we do not count this as a problem.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
We allow zero 'values' in a miniflow for it to have the same map
as the corresponding minimask. Minimasks themselves never have
zero data values, though. Document this and optimize the code
accordingly.
v2:
- Made miniflow_get_map_in_range() to return data offset instead of
a pointer via the last parameter.
- Simplified minimatch_hash_in_range() by removing pointer arithmetic.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new function flow_unildcard_tp_ports() which doesn't
unwildcard the upper half of tp_src and tp_dst with ICMP packets.
Unfortunately, this matters in future patches when we compare masks
carefully to determine if flows should be evicted from the datapath.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit bcd2633a5be(ofproto-dpif: Store relevant fields for wildcarding
in facet) implements flow revalidation by comparing the newly looked
up flow mask with that of the existing facet.
The non-packet fields, such as register masks, are always cleared
by xlate_actions in the masks stored within facets, but they are not
cleared in the newly looked up flow masks, causing otherwise valid
flows to be declared as invalid flows and be removed from the datapath.
This patch provides a fix. I was able to verify the fix on a system set
up by Ying Chen where the bug can be reproduced.
Bug #21680
Reported by: Ying Chen <yingchen@vmware.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Allow TCP flags match specification with symbolic flag names. TCP
flags are optionally specified as a string of flag names, each
preceded by '+' when the flag must be one, or '-' when the flag must
be zero. Any flags not explicitly included are wildcarded. The
existing hex syntax is still allowed, and is used in flow dumps when
all the flags are matched.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This patch gets rid of the need for having explicit padding in struct
flow as new fields are being added. flow_wildcards_init_exact(), which
used to set bits in both compiler generated and explicit padding, is
removed. match_wc_init() is now used instead, which generates the mask
based on a given flow, setting bits only in fields which make sense.
Places where random bits were placed in struct flow have been changed to
only set random bits on fields that are significant in the given context.
This avoids setting padding bits.
- lib/flow:
- Properly initialize struct flow also in places we used to zero out
padding before.
- Add flow_random_hash_fields() used for testing.
- Remove flow_wildcards_init_exact() to avoid initializing
masks where compiler generated padding has bits set.
- lib/match.c match_wc_init(): Wildcard transport layer fields for later
fragments, remove match_init_exact(), which used
flow_wildcards_init_exact().
- tests/test-flows.c: use match_wc_init() instead of match_init_exact()
- tests/flowgen.pl: generate more accurate packets and flows when
fragmenting, mark unavailable fields as wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Rather than tracking the MPLS depth as a field in the
flow, which is an entirely poor place for it, just track
the delta to the MPLS depth during translation.
This logic was developed while implementing recirculation
and intended to be used to detect when recirculation should
occur. This variant of the patch uses the logic to determine
if processing of actions should stop due to an MPLS
action which cannot be translated (without recirculation).
A side-effect of this patch is that it resolves a bug
whereby ovs-vswitchd will abort due to to an assertion
on eth_type_mpls(ctx->xin->flow.dl_type) in compose_mpls_pop_action(()
if the actions of a flow include pop_mpls twice without
a push_mpls in between.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We have a controller that puts many rules with different metadata values
into the flow table, where metadata is used (by "resubmit"s) to distinguish
stages in a pipeline. Thus, any given flow only needs to be hashed into
classifier "cls_table"s that contain a match for the flow's metadata value.
This commit optimizes the classifier lookup by (probabilistically) skipping
the "cls_table"s that can't possibly match.
(The "metadata" referred to here is the OpenFlow 1.1+ "metadata" field,
which is a 64-bit field similar in purpose to the "registers" defined by
Open vSwitch.)
Previous versions of this patch, with earlier versions of the controller in
question, improved flow setup performance by about 19%.
Bug #14282.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The Linux kernel datapath enables matching and setting the skb mark
but this functionality is currently used only internally by
ovs-vswitchd. This exposes it through NXM to enable external
controllers to interact with other kernel subsystems. Although this
is simply exporting the skb mark, the intention is that this is a
platform independent mechanism to access some system metadata and
therefore may have different implementations on various systems.
Bug #17855
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The skb_mark field is currently only available with the Linux datapath
and is only used internally. However, it is desirable to expose this
through OpenFlow and when it is exposed ideally it would not be system-
specific. In preparation for this, skb_mark is rename to pkt_mark in
internal data structures for consistency.
This does not rename the Linux interfaces because doing so would break
the API. It would not necessarily be desirable to do anyways since in
Linux-specific code it is clearer to use the actual name rather than a
generic one. This can lead to confusion in some places, however, because
we do not always strictly separate generic and platform dependent code
(one example is actions). This seems inevitable though at this point if
the lower and upper layers have different names (as they must given the
above requirements).
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
When determining the fields to un-wildcard, we need to be careful
about only un-wildcarding fields that are relevant. Also, we
didn't properly handle IPv6 addresses.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
These functions are used so often, that having an easy to read
helper is worth it.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Until now, datapath ports and openflow ports were both represented by
unsigned integers of various sizes. With implicit conversions, etc., it is
easy to mix them up and use one where the other is expected. This commit
creates two typedefs, ofp_port_t and odp_port_t. Both of these two types
are marked by "__attribute__((bitwise))" so that sparse can be used to
detect any misuse.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Dynamically determines the flow fields that were relevant in
processing flows based on the OpenFlow flow table and switch
configuration. The immediate use for this functionality is to
cache action translations for similar flows in facets. This yields
a roughly 80% improvement in flow set up rates for a complicated
flow table.
More importantly, these wildcards will be used to determine what to
wildcard for the forthcoming kernel wildcard (megaflow) patches
that will allow wildcarding in the kernel, which will provide
significant flow set up improvements.
The approach to tracking fields and caching action translations in
facets was based on an impressive prototype by Ethan Jackson.
Co-authored-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Rename the function flow_wildcards_combine() to flow_wildcards_and().
Add new flow_wildcards_or() and flow_hash_in_wildcards() functions.
These will be useful in a future patch.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
This function will be useful in a future commit.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Co-authored-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Adds tun_src and tun_dst match and set capabilities via new NXM fields
NXM_NX_TUN_IPV4_SRC and NXM_NX_TUN_IPV4_DST. This allows management of
large number of tunnels via the flow tables, without requiring the tunnels
to be pre-configured.
Flow-based tunnels can be configured with options remote_ip=flow and
local_ip=flow. local_ip=flow requires remote_ip=flow. When set, the
tunnel remote IP address and/or local IP address is set from the flow,
instead of the tunnel configuration.
Example:
$ ovs-vsctl add-port br0 gre -- set Interface gre ofport_request=1 type=gre options:remote_ip=flow options:key=flow
$ ovs-ofctl add-flow br0 "in_port=LOCAL actions=set_tunnel:1,set_field:192.168.0.1->tun_dst,output:1"
$ ovs-ofctl add-flow br0 "in_port=1 tun_src=192.168.0.1 tun_id=1 actions=LOCAL"
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
There were plans to use this in conjunction with inner/outer flows,
however that plan has been changed in favour of using recirculation.
This leaves us with the current usage.
encal_dl_type is currently only used to allow decoding of packets used in
the test suite. However, this is a bit of a fudge and the packets may be
provided as hexadecimal instead.
Also remove comments from parse_l2_5_onward() relating to MPLS which are
not in keeping with the commenting throughout the rest of the function.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
It was planned to use this code to allow further processing of packets, a
second pass done when constructing a flow. Instead it is now planned to
use recirculation to address the problems that secondary processing aimed
to resolve. As a result there are no longer plans to use
flow_extract_l3_onwards() and it seems prudent to remove it.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Split the L3 and above portion of flow_extract() out into
flow_extract_l3_onwards() and call flow_extract_l3_onwards()
from flow_extract().
This is to allow re-extraction of l3 and higher information using
flow->encap_dl_type which may be set using information contained
in actions.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
My previous 72e8bf28bb38e8816435c64859fb350215b6a9e6 (datapath:
add skb mark matching and set action) commit broke 32-bit builds.
This patch assures that size of struct flow is equal on both
32-bit and 64-bit architectures so that build asserts would
not fire anymore.
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
This patch adds support for skb mark matching and set action.
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
The current code has a simple mapping between datapath and OpenFlow port
numbers (the port numbers were the same other than OFPP_LOCAL which maps
to datapath port 0). Since the translation was know at compile time,
this allowed different layers to easily translate between the two, so
the translation often occurred late.
A future commit will break this simple mapping, so this commit draws a
line between where datapath and OpenFlow port numbers are used. The
ofproto-dpif layer will be responsible for the translations. Callers
above will use OpenFlow port numbers. Providers below will use
datapath port numbers.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
The new struct flow_tnl contains an extra four bytes of padding on
64-bit machines but we currently assert that the total struct flow
is a fixed size. The size difference isn't actually a problem
because both are multiples of 4 and the build assertion is only
intended to remind people to update FLOW_WC_SEQ when new fields are
added. This changes the assertion to fix just the non-tunnel field
size.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Soon the kernel will begin supplying the information about the outer
IP header for tunneled packets and userspace will need to be able to
track it as part of the flow. For the time being this is only used
internally by OVS and not exposed outwards to OpenFlow. As a result,
this threads the information throughout userspace but simply stores
the existing tun_id in it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
A cls_rule is 324 bytes on i386 now. The cost of a flow table lookup is
currently proportional to this size, which is going to continue to grow.
However, the required cost of a flow table lookup, with the classifier that
we currently use, is only proportional to the number of bits that a rule
actually matches. This commit implements that optimization by replacing
the match inside "struct cls_rule" by a sparse representation.
This reduces struct cls_rule to 100 bytes on i386.
There is still some headroom for further optimization following this
commit:
- I suspect that adding an 'n' member to struct miniflow would make
miniflow operations faster, since popcount() has some cost.
- It's probably possible to replace the "struct minimatch" in cls_rule
by just a "struct miniflow", since the cls_rule's cls_table has a
copy of the minimask.
- Some of the miniflow operations aren't well-optimized.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Now that "struct flow" and "struct flow_wildcards" have the same simple
and uniform structure, it's easy to handle common operations by just
iterating over the bits inside them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
It's only used in a not-very-useful assertion in some test code. In
general, exact-match flows make very little sense anymore, and they're
basically on their way out.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Since we know these bytes are always 0 in both structures, we can use
faster functions that only work with full words.
Signed-off-by: Ben Pfaff <blp@nicira.com>