Several encapsulation formats have the concept of an 'OAM' bit
which typically is used with networking tracing tools to
distinguish test packets from real traffic. OVS already internally
has support for this, however, it doesn't do anything with it
and it also isn't exposed for controllers to use. This enables
support through OpenFlow.
There are several other tunnel flags which are consumed internally
by OVS. It's not clear that it makes sense to use them externally
so this does not expose those flags - although it should be easy
to do so if necessary in the future.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The current support for Geneve in OVS is exactly equivalent to VXLAN:
it is possible to set and match on the VNI but not on any options
contained in the header. This patch enables the use of options.
The goal for Geneve support is not to add support for any particular option
but to allow end users or controllers to specify what they would like to
match. That is, the full range of Geneve's capabilities should be exposed
without modifying the code (the one exception being options that require
per-packet computation in the fast path).
The main issue with supporting Geneve options is how to integrate the
fields into the existing OpenFlow pipeline. All existing operations
are referred to by their NXM/OXM field name - matches, action generation,
arithmetic operations (i.e. tranfer to a register). However, the Geneve
option space is exactly the same as the OXM space, so a direct mapping
is not feasible. Instead, we create a pool of 64 NXMs that are then
dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped,
these fields become first-class citizens in the OpenFlow pipeline.
An example of how to use Geneve options:
ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
ovs-ofctl add-flow br0 in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1
This will add a 4 bytes option (filled will all 1's) to all packets
coming from the LOCAL port and then send then out to port 1.
A limitation of this patch is that although the option table is specified
for a particular switch over OpenFlow, it is currently global to all
switches. This will be addressed in a future patch.
Based on work originally done by Madhu Challa. Ben Pfaff also significantly
improved the comments.
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When reading in hex strings that form NXM fields, we don't need to
enforce size constraints if the fields are variable length.
Instead, we can set the header size based on the string length.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
It is technically correct to send the entire maximum length of
a field when it is variable length. However, it is awkward to
do so and not what one would naively expect. Since receivers will
internally zero-extend fields, we can do the opposite and trim
off leading zeros. This results in encodings that are generally
sensible without specific knowledge of what is being transmitted.
(Of course, other implementations, such as controllers, may know
exactly the expected length of the field and are free to encode
it that way even if it has leading zeros.)
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently when an NXM field is encoded, the caller must specify
the length of the data being provided. However, this data is
always placed into a field of standard length. In order to
support variable length options, the length field must also
alter the size in the header. The previous implementation
already required callers to pass in the exact (fixed) size of
the field or it would not work properly, so there is no danger
that this will change the behavior for non-variable length
fields.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This adds support for receiving variable length fields encoded in
NXM/OXM and mapping them into OVS internal structures. In order
for this to make sense, we need to define some semantics:
There are three lengths that matter in this process: the maximum
size of the field (represented as the existing mf->n_bytes), the
size of the field in the incoming NXM (given by the length in the
NXM header), and the currently configured length of the field
(defined by the consumer of the field and outside the scope of
this patch).
Fields are modeled as being their maximum length and have the
characteristics expected by exsiting code (i.e. exact match fields
have masks that are all 1's for the whole field, etc.). Incoming
NXMs are stored in the field in the least significant bits. If
the NXM length is larger than the field, is is truncated, if it
is smaller it is zero-extended. When the field is consumed, the
component that needs data picks the configured length out of the
generated field.
In most cases, the configured and NXM lengths will be equal and
these edge cases do not matter. However, since we cannot easily
enforce that the lengths match (and might not even know what the
right length is, such as in the case of a string parsed by
ovs-ofctl), these semantics should provide deterministic results
that are easy to understand and not require most existing code
to be aware of variable length fields.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently we treat the entire NXM/OXM header, including length,
as an ID to define a field. However, this does not allow for
multiple lengths of a particular field.
If a field has been marked as variable, we should ignore the length
when looking up the field and only use the class and field. We
continue to use the length for non-variable fields to ensure that
we don't accept something that can never match.
Signed-off-by: Jesse Gross <jesse@nicira.com>
NXM/OXM headers as represented in this file are 64-bit long and the low
bits are essentially constant (almost always 0) so using hash_int(),
which takes an uint32_t, is going to be a useless hash function. This
commit fixes the problem.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Include NTR selection method experimenter group property in
in group mod request and group desc reply.
NTR selection method
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is in preparation for supporting group mod and desc reply
messages with an NTR selection method group experimenter property.
Currently decoding always fails as it only allows properties for known
selection methods and no selection methods are known yet. A subsequent
patch will propose a hash selection method.
NTR selection method
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Prevent a peer patch port bridge from popping data off or pushing data
to the stack of the first bridge.
Found by inspection.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
ofpbuf was complicated due to its wide usage across all
layers of OVS, Now we have introduced independent dp_packet
which can be used for datapath packet, we can simplify ofpbuf.
Following patch removes DPDK mbuf and access API of ofpbuf
members.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Introduces two new NXMs to represent VXLAN-GBP [0] fields.
actions=load:0x10->NXM_NX_TUN_GBP_ID[],NORMAL
tun_gbp_id=0x10,actions=drop
This enables existing VXLAN tunnels to carry security label
information such as a SELinux context to other network peers.
The values are carried to/from the datapath using the attribute
OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS.
[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy-00
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
A "conjunctive match" allows higher-level matches in the flow table, such
as set membership matches, without causing a cross-product explosion for
multidimensional matches. Please refer to the documentation that this
commit adds to ovs-ofctl(8) for a better explanation, including an example.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
So far the compressed flow data in struct miniflow has been in 32-bit
words with a 63-bit map, allowing for a maximum size of struct flow of
252 bytes. With the forthcoming Geneve options this is not sufficient
any more.
This patch solves the problem by changing the miniflow data to 64-bit
words, doubling the flow max size to 504 bytes. Since the word size
is doubled, there is some loss in compression efficiency. To counter
this some of the flow fields have been reordered to keep related
fields together (e.g., the source and destination IP addresses share
the same 64-bit word).
This change should speed up flow data processing on 64-bit CPUs, which
may help counterbalance the impact of making the struct flow bigger in
the future.
Classifier lookup stage boundaries are also changed to 64-bit
alignment, as the current algorithm depends on each miniflow word to
not be split between ranges. This has resulted in new padding (part
of the 'mpls_lse' field).
The 'dp_hash' field is also moved to packet metadata to eliminate
otherwise needed padding there. This allows the L4 to fit into one
64-bit word, and also makes matches on 'dp_hash' more efficient as
misses can be found already on stage 1.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
struct list is a common name and can't be used in public headers.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The following macros are renamed to avoid conflicts with other headers:
* WARN_UNUSED_RESULT to OVS_WARN_UNUSED_RESULT
* PRINTF_FORMAT to OVS_PRINTF_FORMAT
* NO_RETURN to OVS_NO_RETURN
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Packets with 'LATER' fragment do not have a transport header, so it is
not possible to either match on or set transport ports on such
packets. Matching is prevented by augmenting mf_are_prereqs_ok() with
a nw_frag 'LATER' bit check. Setting the transport headers on such
packets is prevented in three ways:
1. Flows with an explicit match on nw_frag, where the LATER bit is 1:
existing calls to the modified mf_are_prereqs_ok() prohibit using
transport header fields (port numbers) in OXM/NXM actions
(set_field, move). SET_TP_* actions need a new check on the LATER
bit.
2. Flows that wildcard the nw_frag LATER bit: At flow translation
time, add calls to mf_are_prereqs_ok() to make sure that we do not
use transport ports in flows that do not have them.
3. At action execution time, do not set transport ports, if the packet
does not have a full transport header. This ensures that we never
call the packet_set functions, that require a valid transport
header, with packets that do not have them. For example, if the
flow was created with a IPv6 first fragment that had the full TCP
header, but the next packet's first fragment is missing them.
3 alone would suffice for correct behavior, but 1 and 2 seem like a
right thing to do, anyway.
Currently, if we are setting port numbers, we will also match them,
due to us tracking the set fields with the same flow_wildcards as the
matched fields. Hence, if the incoming port number was not zero, the
flow would not match any packets with missing or truncated transport
headers. However, relying on no packets having zero port numbers
would not be very robust. Also, we may separate the tracking of set
and matched fields in the future, which would allow some flows that
blindly set port numbers to not match on them at all.
For TCP in case 3 we use ofpbuf_get_tcp_payload() that requires the
whole (potentially variable size) TCP header to be present. However,
when parsing a flow, we only require the fixed size portion of the TCP
header to be present, which would be enough to set the port numbers
and fix the TCP checksum.
Finally, we add tests testing the new behavior.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This field allows a flow table to match on the output port currently in the
action set.
ONF-JIRA: EXT-233
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
actset_output, to be added in an upcoming commit, has one OXM assignment
in OpenFlow 1.3 and another one in OpenFlow 1.5. This commit allows both
of them to be supported in appropriate OpenFlow versions.
This feature is difficult to test on its own, so the same commit that adds
actset_output support also tests this feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
OpenFlow 1.2+ defines a means for vendors to define vendor-specific OXM
fields, called "experimenter OXM". These OXM fields are expressed with a
64-bit OXM header instead of the 32-bit header used for standard OXM (and
NXM). Until now, OVS has not implemented experimenter OXM, and indeed we
have had little need to do so because of a pair of special 32-bit OXM classes
grandfathered to OVS as part of the OpenFlow 1.2 standardization process.
However, I want to prototype a feature for OpenFlow 1.5 that uses an
experimenter OXM as part of the prototype, so to do this OVS needs to
support experimenter OXM. This commit adds that support.
Most of this commit is a fairly straightforward change: it extends the type
used for OXM/NXM from 32 to 64 bits and adds code to encode and decode the
longer headers when necessary. Some other changes are necessary because
experimenter OXMs have a funny idea of the division between "header" and
"body": the extra 32 bits for experimenter OXMs are counted as part of the body
rather than the header according to the OpenFlow standard (even though this
does not entirely make sense), so arithmetic in various places has to be
adjusted, which is the reason for the new functions nxm_experimenter_len(),
nxm_payload_len(), and nxm_header_len().
Another change that calls for explanation is the new function mf_nxm_header()
that has been split from mf_oxm_header(). This function is used in actions
where the space for an NXM or OXM header is fixed so that there is no room
for a 64-bit experimenter type. An upcoming commit will add new variations
of these actions that can support experimenter OXM.
Testing experimenter OXM is tricky because I do not know of any in
widespread use. Two ONF proposals use experimenter OXMs: EXT-256 and
EXT-233. EXT-256 is not suitable to implement for testing because its use
of experimenter OXM is wrong and will be changed. EXT-233 is not suitable
to implement for testing because it requires adding a new field to struct
flow and I am not yet convinced that that field and the feature that it
supports is worth having in Open vSwitch. Thus, this commit assigns an
experimenter OXM code point to an existing OVS field that is currently
restricted from use by controllers, "dp_hash", and uses that for testing.
Because controllers cannot use it, this leaves future versions of OVS free
to drop the support for the experimenter OXM for this field without causing
backward compatibility problems.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
OXM renamed the 'vendor' field from NXM to the 'class', and uses the term
"experimenter", which OVS usually renders as "vendor" for historical
reasons, as part of the extended 64-bit OXMs. To reduce confusion, this
commit adopts the OXM terminology for class.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
OpenFlow 1.5 (draft) extends the OFPAT_SET_FIELD action originally
introduced in OpenFlow 1.2 so that it can set not just entire fields but
any subset of bits within a field as well. This commit adds support for
that feature when OpenFlow 1.5 is used.
With this feature, OFPAT_SET_FIELD becomes a superset of NXAST_REG_LOAD.
Thus, this commit merges the implementations of the two actions into a
single ofpact_set_field.
ONF-JIRA: EXT-314
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
This improves the general abstraction of OXM/NXM by eliminating direct
knowledge of it from the meta-flow code and other places.
Some function renaming might be called for; for example, mf_oxm_header()
may not be the best name now that the function is implemented within
nx-match. However, these renamings would make this commit larger and
harder to review, so I'm postponing them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
NXM/OXM are only supposed to put 1-bits in a value if the corresponding bit
in the mask is a 1-bit, but in the case of cookie matching, e.g.
ovs-ofctl del-flows br0 cookie=0x3/0x1
ovs-ofctl would encode a bad OXM. This fixes the problem.
(The test "ofproto - del flows based on cookie mask" in the OVS testsuite
tickles this bug.)
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
dp_hash and recirc_id are specific to OVS, but that doesn't mean that we
shouldn't encode them into flow matches when OXM is used in OpenFlow 1.2
and later.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Following this change, only meta-flow.c uses any explicit NXM_* or OXM_*
constants. An upcoming commit will actually remove the definitions of
these constants, hiding them behind a functional interface, for better
abstraction.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
A log message has warned that this was going to happen for some time, and
newer OpenFlow versions require this behavior.
The test updates fix this behavior in our testsuite.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Commit 79fe0f4611 (meta-flow: Add 64-bit registers.) added support for
the OpenFlow 1.5 (draft) standardized registers, but neglected to cause
them to be serialized when Open vSwitch composes flow matches. This meant
that they were always sent to a controller as pairs of Nicira extension
registers. This commit fixes the problem.
Found by inspection.
ONF-JIRA: EXT-244
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Until now, knowledge about OpenFlow has been somewhat scattered around the
tree. Some of it is in ofp-actions, some of it is in ofp-util, some in
separate files for individual actions, and most of the wire format
declarations are in include/openflow. This commit centralizes all of that
in ofp-actions.
Encoding and decoding OpenFlow actions was previously broken up by OpenFlow
version. This was OK with only OpenFlow 1.0 and 1.1, but each additional
version added a new wrapper around the existing ones, which started to
become hard to understand. This commit merges all of the processing for
the different versions, to the extent that they are similar, making the
version differences clearer.
Previously, ofp-actions contained OpenFlow encoding and decoding, plus
ofpact formatting, but OpenFlow parsing was separated into ofp-parse, which
seems an odd division. This commit moves the parsing code into ofp-actions
with the rest of the code.
Before this commit, the four main bits of code associated with a particular
ofpact--OpenFlow encoding and decoding, ofpact formatting and parsing--were
all found far away from each other. This often made it hard to see what
was going on for a particular ofpact, since you had to search around to
many different pieces of code. This commit reorganizes so that all of the
code for a given ofpact is in a single place.
As a code refactoring, this commit has little visible behavioral change.
The update to ofproto-dpif.at illustrates one minor bug fix as a side
effect: a flow that was added with the action "dec_ttl" (a standard
OpenFlow action) was previously formatted as "dec_ttl(0)" (using a Nicira
extension to specifically direct packets bounced to the controller because
of too-low TTL), but after this commit it is correctly formatted as
"dec_ttl".
The other visible effect is to drop support for the Nicira extension
dec_ttl action in OpenFlow 1.1 and later in favor of the equivalent
standard action. It seems unlikely that anyone was really using the
Nicira extension in OF1.1 or later.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Until now, some fields have been handled in the caller, and the caller has
been responsible for distinguishing ICMPv4 from ICMPv6. This
implementation seems to make the code a little easier to understand.
Signed-off-by: Ben Pfaff <blp@nicira.com>
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow. This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.
miniflow_extract() takes a miniflow as an input/output parameter. On
input the buffer for values to be extracted must be properly
initialized. On output the map contains ones for all the fields that
have been extracted.
Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.
Some explicit "inline" keywords are necessary for GCC to optimize this
properly. Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit
systems, or from 52 to 36 bytes on 32-bit systems (counting in the
'l7' removal from an earlier patch). This may help contribute to
cache efficiency, and will speed up initializing, copying and
manipulating ofpbufs. This is potentially important for the DPDK
datapath, but the rest of the code base may also see a little benefit.
Changes are:
- Remove 'l7' pointer (previous patch).
- Use offsets instead of layer pointers for l2_5, l3, and l4 using
'l2' as basis. Usually 'data' is the same as 'l2', but this is not
always the case (e.g., when parsing or constructing a packet), so it
can not be easily used as the offset basis. Also, packet parsing is
faster if we do not need to maintain the offsets each time we pull
data from the ofpbuf.
- Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for
largest possible messages/packets.
- Use packed enum for 'source'.
- Rearrange to avoid unnecessary padding.
- Remove 'private_p', which was used only in two cases, both of which
had the invariant ('l2' == 'data'), so we can temporarily use 'l2'
as a private pointer.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
PAD_SIZE(x,y) is a little shorter and may have a more obvious meaning
than ROUND_UP(x,y) - x.
I intend to add more users in an upcoming comment.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>