Note that because there's been no prerequisite on the outer protocol,
we cannot add it now. Instead, treat the ipv4 and ipv6 dst fields in the way
that either both are null, or at most one of them is non-null.
[cascardo: abstract testing either dst with flow_tnl_dst_is_set]
cascardo: using IPv4-mapped address is an exercise for the future, since this
would require special handling of MFF_TUN_SRC and MFF_TUN_DST and OpenFlow
messages.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds a new 128-bit metadata field to the connection tracking
interface. When a label is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_label" field in the flow.
For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a label with
those connections:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new 32-bit metadata field to the connection tracking
interface. When a mark is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_mark" field in the flow.
For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a mark with those
connections:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new action and fields to OVS that allow connection
tracking to be performed. This support works in conjunction with the
Linux kernel support merged into the Linux-4.3 development cycle.
Packets have two possible states with respect to connection tracking:
Untracked packets have not previously passed through the connection
tracker, while tracked packets have previously been through the
connection tracker. For OpenFlow pipeline processing, untracked packets
can become tracked, and they will remain tracked until the end of the
pipeline. Tracked packets cannot become untracked.
Connections can be unknown, uncommitted, or committed. Packets which are
untracked have unknown connection state. To know the connection state,
the packet must become tracked. Uncommitted connections have no
connection state stored about them, so it is only possible for the
connection tracker to identify whether they are a new connection or
whether they are invalid. Committed connections have connection state
stored beyond the lifetime of the packet, which allows later packets in
the same connection to be identified as part of the same established
connection, or related to an existing connection - for instance ICMP
error responses.
The new 'ct' action transitions the packet from "untracked" to
"tracked" by sending this flow through the connection tracker.
The following parameters are supported initally:
- "commit": When commit is executed, the connection moves from
uncommitted state to committed state. This signals that information
about the connection should be stored beyond the lifetime of the
packet within the pipeline. This allows future packets in the same
connection to be recognized as part of the same "established" (est)
connection, as well as identifying packets in the reply (rpl)
direction, or packets related to an existing connection (rel).
- "zone=[u16|NXM]": Perform connection tracking in the zone specified.
Each zone is an independent connection tracking context. When the
"commit" parameter is used, the connection will only be committed in
the specified zone, and not in other zones. This is 0 by default.
- "table=NUMBER": Fork pipeline processing in two. The original instance
of the packet will continue processing the current actions list as an
untracked packet. An additional instance of the packet will be sent to
the connection tracker, which will be re-injected into the OpenFlow
pipeline to resume processing in the specified table, with the
ct_state and other ct match fields set. If the table is not specified,
then the packet is submitted to the connection tracker, but the
pipeline does not fork and the ct match fields are not populated. It
is strongly recommended to specify a table later than the current
table to prevent loops.
When the "table" option is used, the packet that continues processing in
the specified table will have the ct_state populated. The ct_state may
have any of the following flags set:
- Tracked (trk): Connection tracking has occurred.
- Reply (rpl): The flow is in the reply direction.
- Invalid (inv): The connection tracker couldn't identify the connection.
- New (new): This is the beginning of a new connection.
- Established (est): This is part of an already existing connection.
- Related (rel): This connection is related to an existing connection.
For more information, consult the ovs-ofctl(8) man pages.
Below is a simple example flow table to allow outbound TCP traffic from
port 1 and drop traffic from port 2 that was not initiated by port 1:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
table=1,in_port=2,ct_state=+trk+est,tcp,action=1
table=1,in_port=2,ct_state=+trk+new,tcp,action=drop
Based on original design by Justin Pettit, contributions from Thomas
Graf and Daniele Di Proietto.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Allocate the miniflow and minimask in struct minimatch at once, so
that they are consecutive in memory. This halves the number of
allocations, and allows smaller minimatches to share the same cache
line.
After this a minimatch has one heap allocation for all it's data.
Previously it had either none (when data was small enough to fit in
struct miniflow's inline buffer), or two (when the inline buffer was
insufficient). Hopefully always having one performs almost the same
as none or two, in average.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Now that performance critical code already inlines miniflows and
minimasks, we can simplify struct miniflow by always dynamically
allocating miniflows and minimasks to the correct size. This changes
the struct minimatch to always contain pointers to its miniflow and
minimask.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The current support for Geneve in OVS is exactly equivalent to VXLAN:
it is possible to set and match on the VNI but not on any options
contained in the header. This patch enables the use of options.
The goal for Geneve support is not to add support for any particular option
but to allow end users or controllers to specify what they would like to
match. That is, the full range of Geneve's capabilities should be exposed
without modifying the code (the one exception being options that require
per-packet computation in the fast path).
The main issue with supporting Geneve options is how to integrate the
fields into the existing OpenFlow pipeline. All existing operations
are referred to by their NXM/OXM field name - matches, action generation,
arithmetic operations (i.e. tranfer to a register). However, the Geneve
option space is exactly the same as the OXM space, so a direct mapping
is not feasible. Instead, we create a pool of 64 NXMs that are then
dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped,
these fields become first-class citizens in the OpenFlow pipeline.
An example of how to use Geneve options:
ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
ovs-ofctl add-flow br0 in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1
This will add a 4 bytes option (filled will all 1's) to all packets
coming from the LOCAL port and then send then out to port 1.
A limitation of this patch is that although the option table is specified
for a particular switch over OpenFlow, it is currently global to all
switches. This will be addressed in a future patch.
Based on work originally done by Madhu Challa. Ben Pfaff also significantly
improved the comments.
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
GCC 4.6.1 complained about the match structure not being properly
initialzed when using MATCH_CATCHALL_INITIALIZER macro.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
Introduces two new NXMs to represent VXLAN-GBP [0] fields.
actions=load:0x10->NXM_NX_TUN_GBP_ID[],NORMAL
tun_gbp_id=0x10,actions=drop
This enables existing VXLAN tunnels to carry security label
information such as a SELinux context to other network peers.
The values are carried to/from the datapath using the attribute
OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS.
[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy-00
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
A "conjunctive match" allows higher-level matches in the flow table, such
as set membership matches, without causing a cross-product explosion for
multidimensional matches. Please refer to the documentation that this
commit adds to ovs-ofctl(8) for a better explanation, including an example.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This field allows a flow table to match on the output port currently in the
action set.
ONF-JIRA: EXT-233
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
OpenFlow has priorities in the 16-bit unsigned range, from 0 to 65535.
In the classifier, it is sometimes useful to be able to have values below
and above this range. With the 'unsigned int' type used for priorities
until now, there were no values below the range, so some code worked
around it by converting priorities to 64-bit signed integers. This didn't
seem so great to me given that a plain 'int' also had the needed range.
This commit therefore changes the type used for priorities to int.
The interesting parts of this change are in pvector.h and classifier.c,
where one can see the elimination of the use of int64_t.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
ETH_ADDR_LEN is defined in lib/packets.h, valued 6.
Use this macro instead of magic number 6 to represent the length
of eth mac address.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
These 64-bit registers are intended to conform with the OpenFlow 1.5
draft specification.
EXT-244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This helps about 1% in TCP_CRR performance test. However, this also
helps by clearly showing the classifier_lookup() cost in perf reports
as one item.
This also cleans up the flow/match APIs from functionality only used
by the classifier, making is more straightforward to evolve them
later.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Infrastructure to enable megaflow support for bond ports using
recirculation. This patch adds the following features:
* Generate RECIRC action when bond can benefit from recirculation.
* Populate post recirculation rules in a hidden table. Currently table 254.
* Uses post recirculation rules for bond rebalancing
* A recirculation implementation in dpif-netdev.
The goal of this patch is to be able to megaflow bond outputs and
thus greatly improve performance. However, this patch does not
actually improve the megaflow generation. It is left for a later commit.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This patch gets rid of the need for having explicit padding in struct
flow as new fields are being added. flow_wildcards_init_exact(), which
used to set bits in both compiler generated and explicit padding, is
removed. match_wc_init() is now used instead, which generates the mask
based on a given flow, setting bits only in fields which make sense.
Places where random bits were placed in struct flow have been changed to
only set random bits on fields that are significant in the given context.
This avoids setting padding bits.
- lib/flow:
- Properly initialize struct flow also in places we used to zero out
padding before.
- Add flow_random_hash_fields() used for testing.
- Remove flow_wildcards_init_exact() to avoid initializing
masks where compiler generated padding has bits set.
- lib/match.c match_wc_init(): Wildcard transport layer fields for later
fragments, remove match_init_exact(), which used
flow_wildcards_init_exact().
- tests/test-flows.c: use match_wc_init() instead of match_init_exact()
- tests/flowgen.pl: generate more accurate packets and flows when
fragmenting, mark unavailable fields as wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The Linux kernel datapath enables matching and setting the skb mark
but this functionality is currently used only internally by
ovs-vswitchd. This exposes it through NXM to enable external
controllers to interact with other kernel subsystems. Although this
is simply exporting the skb mark, the intention is that this is a
platform independent mechanism to access some system metadata and
therefore may have different implementations on various systems.
Bug #17855
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The skb_mark field is currently only available with the Linux datapath
and is only used internally. However, it is desirable to expose this
through OpenFlow and when it is exposed ideally it would not be system-
specific. In preparation for this, skb_mark is rename to pkt_mark in
internal data structures for consistency.
This does not rename the Linux interfaces because doing so would break
the API. It would not necessarily be desirable to do anyways since in
Linux-specific code it is clearer to use the actual name rather than a
generic one. This can lead to confusion in some places, however, because
we do not always strictly separate generic and platform dependent code
(one example is actions). This seems inevitable though at this point if
the lower and upper layers have different names (as they must given the
above requirements).
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Until now, datapath ports and openflow ports were both represented by
unsigned integers of various sizes. With implicit conversions, etc., it is
easy to mix them up and use one where the other is expected. This commit
creates two typedefs, ofp_port_t and odp_port_t. Both of these two types
are marked by "__attribute__((bitwise))" so that sparse can be used to
detect any misuse.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch adds logging support for skb_mark and skb_priority.
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
flow_format() logs packets contents. However, the format used is not
the format accepted by ofproto/trace. Hence it becomes difficult to
trace the packets using the debugs printed. With this commit, the
logging of the packet contents is done in a format that is accepted
by ofproto/trace. This will make debugging easier.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
A cls_rule is 324 bytes on i386 now. The cost of a flow table lookup is
currently proportional to this size, which is going to continue to grow.
However, the required cost of a flow table lookup, with the classifier that
we currently use, is only proportional to the number of bits that a rule
actually matches. This commit implements that optimization by replacing
the match inside "struct cls_rule" by a sparse representation.
This reduces struct cls_rule to 100 bytes on i386.
There is still some headroom for further optimization following this
commit:
- I suspect that adding an 'n' member to struct miniflow would make
miniflow operations faster, since popcount() has some cost.
- It's probably possible to replace the "struct minimatch" in cls_rule
by just a "struct miniflow", since the cls_rule's cls_table has a
copy of the minimask.
- Some of the miniflow operations aren't well-optimized.
Signed-off-by: Ben Pfaff <blp@nicira.com>