Packets with 'LATER' fragment do not have a transport header, so it is
not possible to either match on or set transport ports on such
packets. Matching is prevented by augmenting mf_are_prereqs_ok() with
a nw_frag 'LATER' bit check. Setting the transport headers on such
packets is prevented in three ways:
1. Flows with an explicit match on nw_frag, where the LATER bit is 1:
existing calls to the modified mf_are_prereqs_ok() prohibit using
transport header fields (port numbers) in OXM/NXM actions
(set_field, move). SET_TP_* actions need a new check on the LATER
bit.
2. Flows that wildcard the nw_frag LATER bit: At flow translation
time, add calls to mf_are_prereqs_ok() to make sure that we do not
use transport ports in flows that do not have them.
3. At action execution time, do not set transport ports, if the packet
does not have a full transport header. This ensures that we never
call the packet_set functions, that require a valid transport
header, with packets that do not have them. For example, if the
flow was created with a IPv6 first fragment that had the full TCP
header, but the next packet's first fragment is missing them.
3 alone would suffice for correct behavior, but 1 and 2 seem like a
right thing to do, anyway.
Currently, if we are setting port numbers, we will also match them,
due to us tracking the set fields with the same flow_wildcards as the
matched fields. Hence, if the incoming port number was not zero, the
flow would not match any packets with missing or truncated transport
headers. However, relying on no packets having zero port numbers
would not be very robust. Also, we may separate the tracking of set
and matched fields in the future, which would allow some flows that
blindly set port numbers to not match on them at all.
For TCP in case 3 we use ofpbuf_get_tcp_payload() that requires the
whole (potentially variable size) TCP header to be present. However,
when parsing a flow, we only require the fixed size portion of the TCP
header to be present, which would be enough to set the port numbers
and fix the TCP checksum.
Finally, we add tests testing the new behavior.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This field allows a flow table to match on the output port currently in the
action set.
ONF-JIRA: EXT-233
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
OpenFlow 1.5 (draft) extends the OFPAT_SET_FIELD action originally
introduced in OpenFlow 1.2 so that it can set not just entire fields but
any subset of bits within a field as well. This commit adds support for
that feature when OpenFlow 1.5 is used.
With this feature, OFPAT_SET_FIELD becomes a superset of NXAST_REG_LOAD.
Thus, this commit merges the implementations of the two actions into a
single ofpact_set_field.
ONF-JIRA: EXT-314
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
This improves the general abstraction of OXM/NXM by eliminating direct
knowledge of it from the meta-flow code and other places.
Some function renaming might be called for; for example, mf_oxm_header()
may not be the best name now that the function is implemented within
nx-match. However, these renamings would make this commit larger and
harder to review, so I'm postponing them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
This is a first step toward improving the abstraction of OXM and NXM in the
tree. As an immediate improvement, this commit removes all of the
definitions of the OXM and NXM constants from the top-level header files,
because they are no longer used anywhere.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Should index the first lse for all parts of the lse (label, TC, BOS).
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
is_all_zeros() and is_all_ones() operate on bytes, but just like with
memset, it is easier to use if the first argument is a void *.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
These 64-bit registers are intended to conform with the OpenFlow 1.5
draft specification.
EXT-244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
At the time that Open vSwitch implemented registers, there was a high cost
to adding additional fields, so I wrote the code so that the number of
registers could be reduced at compile time. Now, fields are cheaper
(though not free) and in the meantime I have never heard of anyone reducing
the number of registers. Since I intend to add more code that would
require awkward "#if"s like this, I think that this is a good time to
simplify it by requiring FLOW_N_REGS to be fixed. This commit does that.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Commit 8bfd0fda mistakenly introduced duplicate "break;" statements to
MFF_MPLS_BOS handling. This patch removes them.
Found by inspection.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
ICMPv4 and ICMPv6 have 8-bit "type" and "code" fields. struct flow
uses the low 8 bits of the 16-bit tp_src and tp_dst members to
represent these fields. The datapath interface, on the other hand,
represents them with just 8 bits each. This means that if the high 8
bits of the masks for these fields somehow become set (meaning to
match on the nonexistent "high bits" of these fields) during
translation, then they will get chopped off by a round trip through
the datapath, and revalidation will spot that as an inconsistency and
delete the flow. This commit avoids the problem by making sure that
only the low 8 bits of either field can be unwildcarded for ICMP.
This seems like the minimal fix for this problem, appropriate for
backporting to earlier branches. The root of the issue is that these high
bits can get set in the match at all. I have some leads on that, but they
require more invasive changes elsewhere.
Bug #23320.
Reported-by: Krishna Miriyala <miriyalak@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Add a prefix tree (trie) structure for tracking the used address
space, enabling skipping classifier tables containing longer masks
than necessary for an address field value in a packet header being
classified. This enables less unwildcarding for datapath flows in
parts of the address space without host routes.
Trie lookup is interwoven to the staged lookup, so that a trie is
searched only when the configured trie field becomes relevant
for the lookup. The trie lookup results are retained so that each
trie is checked at most once for each classifier lookup.
This implementation tracks the number of rules at each address prefix
for the whole classifier. More aggressive table skipping would be
possible by maintaining lists of tables that have prefixes at the
lengths encountered on tree traversal, or by maintaining separate
tries for subsets of rules separated by metadata fields.
Prefix tracking is configured via OVSDB. A new column "prefixes" is
added to the database table "Flow_Table". "prefixes" is a set of
string values listing the field names for which prefix lookup should
be used.
As of now, the fields for which prefix lookup can be enabled are:
- tun_id, tun_src, tun_dst
- nw_src, nw_dst (or aliases ip_src and ip_dst)
- ipv6_src, ipv6_dst
There is a maximum number of fields that can be enabled for any one
flow table. Currently this limit is 3.
Examples:
ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
--id=@N1 create Flow_Table name=table0
ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
--id=@N1 create Flow_Table name=table1
ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
ovs-vsctl set Flow_Table table1 prefixes=[]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Allow TCP flags match specification with symbolic flag names. TCP
flags are optionally specified as a string of flag names, each
preceded by '+' when the flag must be one, or '-' when the flag must
be zero. Any flags not explicitly included are wildcarded. The
existing hex syntax is still allowed, and is used in flow dumps when
all the flags are matched.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Adds OXM inspired aliases for match fields that don't have them
already ("ip_proto", "ip_ecn", "ip_dscp", and "tunnel_id").
"ip_dscp" replaces the earlier undocumented "nw_tos_shifted",
and takes the DSCP value (0-63), which is then shifted
appropriately when applied to an IP packet.
The number of bits for this field is fixed from 8 to 6.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
mf_from_id accesses a static table, so the compiler should be able to
completely optimize it away.
Also use OVS_PACKED_ENUM to waste less space.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Currently if a mask is supplied for an unmaskable match then NOT_REACHED()
is called. The effect of this for a user calling ovs-vsctl with a match
that includes a mask which is not permitted is to politely inform them of
the error of their ways by calling abort and segfaulting.
This patch takes an alternate approach to return no protocols which has the
has the effect when that ovs-vsctl is called with a match that includes a
mask which is not permitted an error message of the following form is
displayed.
ovs-ofctl: none of the usable flow formats (none) is among the allowed flow formats (OpenFlow10,NXM)
This patch also updates the ovs-ofctl test to test matches with masks
where possible.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
With mega-flows, many flows in the kernel datapath are wildcarded.
For someone that is debugging a system and wants to find a particular
flow and its actions, it is a little hard to zero-in on the flow
because some fields are wildcarded.
With the filter='$filter' option, we can now filter on the o/p
of 'ovs-dpctl dump-flows'.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Sets mask bits for the given field and its prerequisite fields.
Needed for unwildcarding the proper bits from datapath masks.
Removed old prototype for mf_force_prereqs().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This support is added through the userspace slow path, because we don't
judge that this is important enough to require permanent support in the
Linux kernel ABI.
Bug #19259.
CC: Teemu Koponen <koponen@nicira.com>
CC: Pankaj Thakkar <thakkar@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Keep track of usable protocols while parsing actions and matches,
rather than checking for them afterwards. This fixes silently discarded
meter and goto table instructions when not explicitly specifying the
protocol to use.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OFPERR_OFPBAC_BAD_ARGUMENT is not as specific as the errors provided by
OpenFlow 1.2 and later.
Some of these errors needed Nicira extension numbers for use with OpenFlow
1.0 and 1.1, so this commit also adds those.
Some of these errors had poor explanations likely to confuse users, so this
commits improves them.
Some of the errors had the wrong names, so this commit fixes them.
Reported-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jean Tourrilhes <jt@hpl.hp.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
The Linux kernel datapath enables matching and setting the skb mark
but this functionality is currently used only internally by
ovs-vswitchd. This exposes it through NXM to enable external
controllers to interact with other kernel subsystems. Although this
is simply exporting the skb mark, the intention is that this is a
platform independent mechanism to access some system metadata and
therefore may have different implementations on various systems.
Bug #17855
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The skb_mark field is currently only available with the Linux datapath
and is only used internally. However, it is desirable to expose this
through OpenFlow and when it is exposed ideally it would not be system-
specific. In preparation for this, skb_mark is rename to pkt_mark in
internal data structures for consistency.
This does not rename the Linux interfaces because doing so would break
the API. It would not necessarily be desirable to do anyways since in
Linux-specific code it is clearer to use the actual name rather than a
generic one. This can lead to confusion in some places, however, because
we do not always strictly separate generic and platform dependent code
(one example is actions). This seems inevitable though at this point if
the lower and upper layers have different names (as they must given the
above requirements).
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Otherwise, input with invalid trailing data was accepted, such as input
that had 7 colon-separated segments instead of 6.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Until now, failure to parse a flow in the ofp-parse module has caused the
program to abort immediately with a fatal error. This makes it hard to
use these functions from any long-lived program. This commit fixes the
problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, datapath ports and openflow ports were both represented by
unsigned integers of various sizes. With implicit conversions, etc., it is
easy to mix them up and use one where the other is expected. This commit
creates two typedefs, ofp_port_t and odp_port_t. Both of these two types
are marked by "__attribute__((bitwise))" so that sparse can be used to
detect any misuse.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This makes life easier for a few callers, and it agrees with my usual
preference that a function should fill in its output parameters whether it
succeeds or not.
CC: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This helps get rid of one special case in nx_pull_raw() and allows
loading of 32-bit values from/to OXM_OF_IN_PORT in NXAST_LEARN actions.
Previously the 16-bit limit acted the same on both NXM_OF_IN_PORT and
OXM_OF_IN_PORT, even though OF1.1+ controllers would expect OXM_OF_IN_PORT
to be 32 bits wide.
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ofputil_port_from_string() does all the work already.
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Adds tun_src and tun_dst match and set capabilities via new NXM fields
NXM_NX_TUN_IPV4_SRC and NXM_NX_TUN_IPV4_DST. This allows management of
large number of tunnels via the flow tables, without requiring the tunnels
to be pre-configured.
Flow-based tunnels can be configured with options remote_ip=flow and
local_ip=flow. local_ip=flow requires remote_ip=flow. When set, the
tunnel remote IP address and/or local IP address is set from the flow,
instead of the tunnel configuration.
Example:
$ ovs-vsctl add-port br0 gre -- set Interface gre ofport_request=1 type=gre options:remote_ip=flow options:key=flow
$ ovs-ofctl add-flow br0 "in_port=LOCAL actions=set_tunnel:1,set_field:192.168.0.1->tun_dst,output:1"
$ ovs-ofctl add-flow br0 "in_port=1 tun_src=192.168.0.1 tun_id=1 actions=LOCAL"
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OpenFlow says that an "output" action to a flow's input port is ordinarily
dropped, unless the flow explicitly outputs to OFPP_IN_PORT. We've
occasionally been asked to implement some way to avoid this behavior in
cases where it is not easily known in advance whether a given port is the
input port (so that OFPP_IN_PORT is not easy to use).
This commit implements such a feature. With this commit, one may write:
actions=load:0->NXM_OF_IN_PORT[],output:123
which will output to port 123 regardless of whether it is the input port.
If the input port is important, then one may save and restore it on the
stack:
actions=push:NXM_OF_IN_PORT[],load:0->NXM_OF_IN_PORT[],output:123,
pop:NXM_OF_IN_PORT[]
(Sometimes I am asked whether "resubmit" changes the in_port and would
therefore interact badly with this feature. It does not. "resubmit" only
(optionally) changes the in_port used for the resubmit's flow table lookup.
It does not otherwise have any effect on in_port.)
Bug #14091.
CC: Jarno Rajahalme <jarno.rajahalme@nsn.com>
CC: Ronghua Zhang <rzhang@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
NXM puts the DSCP value in bits 2-7 of NXM_OF_IP_TOS.
OXM puts the DSCP value in bits 0-6 of OXM_OF_IP_DSCP.
Before this commit, Open vSwitch incorrectly implemented OXM_OF_IP_DSCP
with the same format as NXM_OF_IP_TOS. This commit fixes the problem and
adds a test (previously missing but I don't know why).
Reported-by: Hiroshi Miyata <miyahiro.dazu@gmail.com>
Tested-by: Hiroshi Miyata <miyahiro.dazu@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>