Ethernet headers are 14 bytes long, so when the beginning of such a header
is 32-bit aligned, the following data is misaligned. The usual trick to
fix that is to start the Ethernet header on an odd-numbered 16-bit
boundary. That trick works OK for Open vSwitch, but there are two
problems:
- OVS doesn't use that trick everywhere. Maybe it should, but it's
difficult to make sure that it does consistently because the CPUs
most commonly used with OVS don't care about misalignment, so we
only find problems when porting.
- Some protocols (GRE, VXLAN) don't use that trick, so in such a case
one can properly align the inner or outer L3/L4/L7 but not both. (OVS
userspace doesn't directly deal with such protocols yet, so this is
just future-proofing.)
- OpenFlow uses the alignment trick in a few places but not all of them.
This commit starts the adoption of what I hope will be a more robust way
to avoid misalignment problems and the resulting bus errors on RISC
architectures. Instead of trying to ensure that 32-bit quantities are
always aligned, we always read them as if they were misaligned. To ensure
that they are read this way, we change their types from 32-bit types to
pairs of 16-bit types. (I don't know of any protocols that offset the
next header by an odd number of bytes, so a 16-bit alignment assumption
seems OK.)
The same would be necessary for 64-bit types in protocol headers, but we
don't yet have any protocol definitions with 64-bit types.
IPv6 protocol headers need the same treatment, but for those we rely on
structs provided by system headers, so I'll leave them for an upcoming
patch.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The Linux kernel datapath enables matching and setting the skb mark
but this functionality is currently used only internally by
ovs-vswitchd. This exposes it through NXM to enable external
controllers to interact with other kernel subsystems. Although this
is simply exporting the skb mark, the intention is that this is a
platform independent mechanism to access some system metadata and
therefore may have different implementations on various systems.
Bug #17855
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The skb_mark field is currently only available with the Linux datapath
and is only used internally. However, it is desirable to expose this
through OpenFlow and when it is exposed ideally it would not be system-
specific. In preparation for this, skb_mark is rename to pkt_mark in
internal data structures for consistency.
This does not rename the Linux interfaces because doing so would break
the API. It would not necessarily be desirable to do anyways since in
Linux-specific code it is clearer to use the actual name rather than a
generic one. This can lead to confusion in some places, however, because
we do not always strictly separate generic and platform dependent code
(one example is actions). This seems inevitable though at this point if
the lower and upper layers have different names (as they must given the
above requirements).
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
When determinining what fields to un-wildcard for the symmetric L4 hash,
don't include the IPv6 address fields if the packet isn't IPv6.
Reported-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
When determining the fields to un-wildcard, we need to be careful
about only un-wildcarding fields that are relevant. Also, we
didn't properly handle IPv6 addresses.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We always look at the fragment status and often look at other L3
headers when processing the packet, so just un-wildcard the
Ethertype.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Until now, datapath ports and openflow ports were both represented by
unsigned integers of various sizes. With implicit conversions, etc., it is
easy to mix them up and use one where the other is expected. This commit
creates two typedefs, ofp_port_t and odp_port_t. Both of these two types
are marked by "__attribute__((bitwise))" so that sparse can be used to
detect any misuse.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
A number of use-cases weren't handled properly when determining what can
be wildcarded for megaflows. This commit both catches additional fields
that cannot be wildcarded and loosens a few other cases.
Bug #17979
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Dynamically determines the flow fields that were relevant in
processing flows based on the OpenFlow flow table and switch
configuration. The immediate use for this functionality is to
cache action translations for similar flows in facets. This yields
a roughly 80% improvement in flow set up rates for a complicated
flow table.
More importantly, these wildcards will be used to determine what to
wildcard for the forthcoming kernel wildcard (megaflow) patches
that will allow wildcarding in the kernel, which will provide
significant flow set up improvements.
The approach to tracking fields and caching action translations in
facets was based on an impressive prototype by Ethan Jackson.
Co-authored-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Rename the function flow_wildcards_combine() to flow_wildcards_and().
Add new flow_wildcards_or() and flow_hash_in_wildcards() functions.
These will be useful in a future patch.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
This function will be useful in a future commit.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Co-authored-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Adds tun_src and tun_dst match and set capabilities via new NXM fields
NXM_NX_TUN_IPV4_SRC and NXM_NX_TUN_IPV4_DST. This allows management of
large number of tunnels via the flow tables, without requiring the tunnels
to be pre-configured.
Flow-based tunnels can be configured with options remote_ip=flow and
local_ip=flow. local_ip=flow requires remote_ip=flow. When set, the
tunnel remote IP address and/or local IP address is set from the flow,
instead of the tunnel configuration.
Example:
$ ovs-vsctl add-port br0 gre -- set Interface gre ofport_request=1 type=gre options:remote_ip=flow options:key=flow
$ ovs-ofctl add-flow br0 "in_port=LOCAL actions=set_tunnel:1,set_field:192.168.0.1->tun_dst,output:1"
$ ovs-ofctl add-flow br0 "in_port=1 tun_src=192.168.0.1 tun_id=1 actions=LOCAL"
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
IPv6 fragmented packet (except first fragment) will not be handled
correctly. When extracting packet at parse_ipv6(), although nw_frag
should have both of FLOW_NW_FRAG_ANY and FLOW_NW_FRAG_LATER for
later fragment, only FLOW_NW_FRAG_LATER is set.
Signed-off-by: Takashi Kawaguchi <kawaguchi-takashi@mxd.nes.nec.co.jp>
Signed-off-by: Ken Ajiro <ajiro@mxw.nes.nec.co.jp>
Signed-off-by: Jesse Gross <jesse@nicira.com>
There were plans to use this in conjunction with inner/outer flows,
however that plan has been changed in favour of using recirculation.
This leaves us with the current usage.
encal_dl_type is currently only used to allow decoding of packets used in
the test suite. However, this is a bit of a fudge and the packets may be
provided as hexadecimal instead.
Also remove comments from parse_l2_5_onward() relating to MPLS which are
not in keeping with the commenting throughout the rest of the function.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
It was planned to use this code to allow further processing of packets, a
second pass done when constructing a flow. Instead it is now planned to
use recirculation to address the problems that secondary processing aimed
to resolve. As a result there are no longer plans to use
flow_extract_l3_onwards() and it seems prudent to remove it.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This adds support for the OpenFlow 1.1+ dec_mpls_ttl action.
And also adds an NX dec_mpls_ttl action.
The handling of the TTL modification is entirely handled in userspace.
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Before this patch, if an LLC/SNAP packet with OUI 00:00:00 had an ethertype
less than 1536 the flow key given to userspace in the upcall would contain the
invalid ethertype (for example, 3). If userspace attempted to insert a kernel
flow for this key it would be rejected by ovs_flow_from_nlattrs.
This patch allows OVS to pass the OFTest pktact.DirectBadLlcPackets.
Signed-off-by: Rich Lane <rlane@bigswitch.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
murmurhash is faster than Jenkins and slightly higher quality, so switch to
it for hashing words.
The best timings I got for hashing for data lengths of the following
numbers of 32-bit words, in seconds per 1,000,000,000 hashes, were:
words murmurhash Jenkins hash
----- ---------- ------------
1 8.4 10.4
2 10.3 10.3
3 11.2 10.7
4 12.6 18.0
5 13.9 18.3
6 15.2 18.7
In other words, murmurhash outperforms Jenkins for all input lengths other
than exactly 3 32-bit words (12 bytes). (It's understandable that Jenkins
would have a best case at 12 bytes, because Jenkins works in 12-byte
chunks.) Even in the case where Jenkins is faster, it's only by 5%. On
average within this data set, murmurhash is 15% faster, and for 4-word
input it is 30% faster.
We retain Jenkins for flow_hash_symmetric_l4() and flow_hash_fields(),
which are cases where the hash value is exposed externally.
This commit appears to improve "ovs-benchmark rate" results slightly by
a few hundred connections per second (under 1%), when used with an NVP
controller.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
murmurhash includes an xor with the number of bytes hashed in its finishing
step. Until now, we've only had murmurhash for full words, but an upcoming
commit adds murmurhash for bytes, so the finishing function will need to
take a count of bytes instead.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Split the L3 and above portion of flow_extract() out into
flow_extract_l3_onwards() and call flow_extract_l3_onwards()
from flow_extract().
This is to allow re-extraction of l3 and higher information using
flow->encap_dl_type which may be set using information contained
in actions.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The flow_format() function was incorrectly passing skb_priority
to the match_format() function. match_format() function instead
expects rule priority.
This issue was introduced with aa6c9932f2937fa9a2140ec1737668eb9105b0b5
(Change logging format for flows to that accepted by ofproto/trace).
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
My previous 72e8bf28bb38e8816435c64859fb350215b6a9e6 (datapath:
add skb mark matching and set action) commit broke 32-bit builds.
This patch assures that size of struct flow is equal on both
32-bit and 64-bit architectures so that build asserts would
not fire anymore.
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
This patch adds support for skb mark matching and set action.
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
With this commit, OVS will match the data in the RARP packets having
ethertype 0x8035, in the same way as the data in the ARP packets.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
The current code has a simple mapping between datapath and OpenFlow port
numbers (the port numbers were the same other than OFPP_LOCAL which maps
to datapath port 0). Since the translation was know at compile time,
this allowed different layers to easily translate between the two, so
the translation often occurred late.
A future commit will break this simple mapping, so this commit draws a
line between where datapath and OpenFlow port numbers are used. The
ofproto-dpif layer will be responsible for the translations. Callers
above will use OpenFlow port numbers. Providers below will use
datapath port numbers.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Thanks to Ben Pfaff for immediately pinpointing the likely location of
the issue.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
With this commit, the datapath will process the ARP header for
RARP packets. It also fixes a bug whereby if the ARP opcode is
something other than ARP request or reply, the key_len is not
adjusted to include ARP info.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
flow_format() logs packets contents. However, the format used is not
the format accepted by ofproto/trace. Hence it becomes difficult to
trace the packets using the debugs printed. With this commit, the
logging of the packet contents is done in a format that is accepted
by ofproto/trace. This will make debugging easier.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Commit 296e07ace0f (flow: Extend struct flow to contain tunnel outer
header.) changed the tunnel ID parameter of flow_extract() from an integer
passed by value to a structure passed by pointer. Before flow_extract()
reads the tunnel ID, it zeros the entire flow parameter. This means that,
if a caller passes the address of the tunnel member of the flow as the
tunnel ID, then flow_extract() zeros the tunnel data before it reads and
copies the tunnel data (that it just zeroed). The result is that the
tunnel data is ignored.
This commit fixes the problem by making the caller that did this use a
separate flow structure instead of trying to be clever.
Bug #13461.
CC: Pankaj Thakkar <thakkar@nicira.com>
Reported-by: Michael Hu <mhu@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Soon the kernel will begin supplying the information about the outer
IP header for tunneled packets and userspace will need to be able to
track it as part of the flow. For the time being this is only used
internally by OVS and not exposed outwards to OpenFlow. As a result,
this threads the information throughout userspace but simply stores
the existing tun_id in it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
A cls_rule is 324 bytes on i386 now. The cost of a flow table lookup is
currently proportional to this size, which is going to continue to grow.
However, the required cost of a flow table lookup, with the classifier that
we currently use, is only proportional to the number of bits that a rule
actually matches. This commit implements that optimization by replacing
the match inside "struct cls_rule" by a sparse representation.
This reduces struct cls_rule to 100 bytes on i386.
There is still some headroom for further optimization following this
commit:
- I suspect that adding an 'n' member to struct miniflow would make
miniflow operations faster, since popcount() has some cost.
- It's probably possible to replace the "struct minimatch" in cls_rule
by just a "struct miniflow", since the cls_rule's cls_table has a
copy of the minimask.
- Some of the miniflow operations aren't well-optimized.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Now that "struct flow" and "struct flow_wildcards" have the same simple
and uniform structure, it's easy to handle common operations by just
iterating over the bits inside them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
It's only used in a not-very-useful assertion in some test code. In
general, exact-match flows make very little sense anymore, and they're
basically on their way out.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Since we know these bytes are always 0 in both structures, we can use
faster functions that only work with full words.
Signed-off-by: Ben Pfaff <blp@nicira.com>
NXM and OpenFlow 1.2+ allow including the values of arbitrary flow metadata
in "packet-in" messages. Open vSwitch has until now always included all
the values of the metadata fields that it implements in NXT_PACKET_IN
messages.
However, this has at least two disadvantages:
- Most of the metadata fields tend to be zero most of the time, which
wastes space in the message.
- It means that controllers must be very liberal about accepting
fields that they know nothing about in packet-in messages, since any
switch upgrade could cause new fields to appear even if the
controller does nothing to give them nonzero values. (Controllers
have to be prepared to tolerate unknown fields in any case, but this
property makes unknown fields more likely to appear than otherwise.)
This commit changes Open vSwitch so that metadata fields whose values are
zero are not reported in packet-ins, fixing both problems. (This is
explicitly allowed by OpenFlow 1.2+.)
This commit mainly fixes a sort of internal conceptual dissonance centering
around struct flow_metadata. This structure is supposed to report the
metadata for a given flow. If you look at a flow, it has particular
metadata values; it doesn't have masks, and the idea of a mask for a
particular flow doesn't really make sense. However, struct flow_metadata
did have masks. This led to internal confusion; one can see this in, for
example, the following code removed by this commit in ofproto-dpif.c to
handle misses in the OpenFlow flow table:
/* Registers aren't meaningful on a miss. */
memset(pin.fmd.reg_masks, 0, sizeof pin.fmd.reg_masks);
What this code was really trying to say is that on a flow miss, the
registers are zero, so they shouldn't be included in the packet-in message.
It did manage to omit the registers, by marking them as "wild", but it is
conceptually more correct to simply omit them because they are zero (and
that's one effect of this commit).
Bug #12968.
Reported-by: Igor Ganichev <iganichev@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>