2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

328 Commits

Author SHA1 Message Date
Eric Garver
f0fb825a37 Add support for 802.1ad (QinQ tunneling)
Flow key handling changes:
 - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
   headers.
 - Add dpif multi-VLAN capability probing. If datapath supports
   multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.

Refactor VLAN handling in dpif-xlate:
 - Introduce 'xvlan' to track VLAN stack during flow processing.
 - Input and output VLAN translation according to the xbundle type.

Push VLAN action support:
 - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
 - Support push_vlan on dot1q packets.

Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs
that can be matched. This allows us to preserve backwards compatibility.

Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN
handling

Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Co-authored-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-03-16 15:18:40 -07:00
Jarno Rajahalme
daf4d3c18d odp: Support conntrack orig tuple key.
Userspace support for datapath original direction conntrack tuple.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-03-08 17:23:15 -08:00
Jarno Rajahalme
2a28ccc8f5 flow: Make room after ct_state.
'ct_state' currently only needs 8 bits, so we can make room for a new
CT field introduced in the next patch.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-03-08 17:22:56 -08:00
Justin Pettit
3e24a8947b flow: Fix small typo in comment describing flow_compose().
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-01-05 13:51:16 -08:00
Jarno Rajahalme
742c0ac3c0 mpls: Fix MPLS restoration after patch port and group bucket.
This patch fixes problems with MPLS handling related to patch ports
and group buckets.

If a group bucket or a peer bridge across a patch port pushes MPLS
headers to a non-MPLS packet and outputs, the flow translation after
returning from the group bucket or patch port would undo the packet
transformations so that the processing could continue with the packet
as it was before entering the patch port.  There were two problems
with this:

1. As part of the first MPLS push on a non-MPLS packet, the flow
translation would first clear the L3/4 headers of the 'flow' to mark
those fields invalid.  Later, when committing 'flow' changes to
datapath actions before output, the necessary datapath MPLS actions
are created and the corresponding changes updated to the 'base flow'.
This was done using the same flow_push_mpls() function that clears
the L2/3 headers, so also the 'base flow' L2/3 headers were cleared.

Then, when translation returns from a patch port or group bucket, the
original 'flow' is restored, now showing no sign of the MPLS labels.
Since the 'base flow' now has the MPLS labels, following translations
know to issue MPLS POP actions before any output actions.  However, as
part of checking for changes to IP headers we test that the IP
protocol type was not changed.  But now the 'base flow's 'nw_proto'
field is zero and an assert fail crashes OVS.

This is solved by not clearing the L3/4 fields of the 'base
flow'. This allows the processing after the patch port to continue
with L3/4 fields as if no MPLS was done, after first issuing the
necessary MPLS POP actions.

2. IP header updates were done before the MPLS POP actions were
issued. This caused incorrect packet output after, e.g., group action
or patch port.  For example, with actions:

group 1234: all bucket=push_mpls,output:LOCAL

ip actions=group:1234,dec_ttl,output:LOCAL,output:LOCAL

the dec_ttl would only be executed before the last output to LOCAL,
since at the time of committing IP changes after the group action the
packet was still an MPLS packet.

This is solved by checking the dl_type of both 'flow' and 'base flow'
and issuing MPLS actions if they can transform the packet from an MPLS
packet to a non-MPLS packet.  For an IP packet the change in ttl can
then be correctly committed before the last two output actions.

Two test cases are added to prevent future regressions.

Reported-by: Thomas Morin <thomas.morin@orange.com>
Suggested-by: Takashi YAMAMOTO <yamamoto@ovn.org>
Fixes: 8bfd0fdac ("Enhance userspace support for MPLS, for up to 3 labels.")
Fixes: 1b035ef20 ("mpls: Allow l3 and l4 actions to prior to a push_mpls action")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: YAMAMOTO Takashi <yamamoto@ovn.org>
2016-12-02 18:42:21 -08:00
Jesse Gross
8d8ab6c2d5 tun-metadata: Manage tunnel TLV mapping table on a per-bridge basis.
When using tunnel TLVs (at the moment, this means Geneve options), a
controller must first map the class and type onto an appropriate OXM
field so that it can be used in OVS flow operations. This table is
managed using OpenFlow extensions.

The original code that added support for TLVs made the mapping table
global as a simplification. However, this is not really logically
correct as the OpenFlow management commands are operating on a per-bridge
basis. This removes the original limitation to make the table per-bridge.

One nice result of this change is that it is generally clearer whether
the tunnel metadata is in datapath or OpenFlow format. Rather than
allowing ad-hoc format changes and trying to handle both formats in the
tunnel metadata functions, the format is more clearly separated by function.
Datapaths (both kernel and userspace) use datapath format and it is not
changed during the upcall process. At the beginning of action translation,
tunnel metadata is converted to OpenFlow format and flows and wildcards
are translated back at the end of the process.

As an additional benefit, this change improves performance in some flow
setup situations by keeping the tunnel metadata in the original packet
format in more cases. This helps when copies need to be made as the amount
of data touched is only what is present in the packet rather than the
maximum amount of metadata supported.

Co-authored-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-09-19 09:52:22 -07:00
Daniele Di Proietto
e839d01eeb flow: Generate checksum and udp_len in flow_compose().
This is useful to test the connection tracker, which performs checksum
and udp length verification.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
2016-07-27 18:52:13 -07:00
Daniele Di Proietto
206b60d48f flow: Introduce parse_dl_type().
The function simply returns the ethernet type of the packet (after
eventually discarding the VLAN tag).  It will be used by a following
commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2016-07-27 17:58:44 -07:00
Daniele Di Proietto
94a81e4021 flow: Export parse_ipv6_ext_hdrs().
This will be used by a future commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2016-07-27 17:58:44 -07:00
Ben Pfaff
5b2cdc3fbb flow: Verify that tot_len >= ip_len in miniflow_extract().
miniflow_extract() uses the following quantities when it examines an IPv4
header:

    size, the number of bytes from the start of the IPv4 header onward
    ip_len, the number of bytes in the IPv4 header (from the IHL field)
    tot_len, same as size but taken from IPv4 header Total Length field

Until now, the code in miniflow_extract() verified these invariants:

    size >= 20 (minimum IP header length)
    ip_len >= 20 (ditto)
    ip_len <= size (to avoid reading past end of packet)
    tot_len <= size (ditto)
    size - tot_len <= 255 (because this is stored in a 1-byte variable
			   internally and wouldn't normally be big)

It failed to verify the following, which is not implied by the conjunction
of the above:

    ip_len <= tot_len (e.g. that the IP header fits in the packet)

This means that the code was willing to read past the end of an IP
packet's declared length, up to the actual end of the packet including any
L2 padding.  For example, given:

    size = 44
    ip_len = 44
    tot_len = 40

miniflow_extract() would successfully verify all the constraints, then:

    * Trim off 4 bytes of tail padding (size - tot_len), reducing size to
      40 to match tot_len.
    * Pull 44 (ip_len) bytes of IP header, even though there are only 40
      bytes left.  This causes 'size' to wrap around to SIZE_MAX-4.

Given an IP protocol that OVS understands (such as TCP or UDP), this
integer wraparound could cause OVS to read past the end of the packet.
In turn, this could cause OVS to extract invalid port numbers, TCP flags,
or ICMPv4 or ICMPv6 or IGMP type and code from arbitrary heap data
past the end of a packet.

This bug has common hallmarks of a security vulnerability, but we do not
know of a way to exploit this bug to cause an Open vSwitch crash, or to
extract sensitive data from Open vSwitch address space to an attacker's
benefit.

We do not have a specific example, but it is reasonable to suspect that
this bug could allow an attacker in some circumstances to bypass ACLs
implemented via Open vSwitch flow tables.  However, any IP packet that
triggers this bug is invalid and should be rejected in an early stage of a
receiver's IP stack.  For the same reason, any IP packet that triggers this
bug will also be dropped by any IP router, so an attacker would have to
share the same L2 segment as the victim.  In conjunction with an IP stack
that has a similar bug, of course, this could cause some damage, but we do
not know of an IP stack with such a bug; neither Linux nor the OVS
userspace tunnel implementation appear to have such a bug.

Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Reported-by: Kashyap Thimmaraju <Kashyap.Thimmaraju@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2016-07-26 10:07:17 -07:00
Ryan Moats
ece9c2947d Explain initialization when using csum()
The checksum method csum() requires its output location to be
intialized to zero when that output location is part of the
checksum.  Add comments to the various places where csum is
called documenting where the initialization has occurred.

Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-24 11:47:28 -07:00
Terry Wilson
ee89ea7b47 json: Move from lib to include/openvswitch.
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.

Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-22 17:09:17 -07:00
Justin Pettit
b23ada8eec Introduce 128-bit xxregs.
These are needed to handle IPv6 addresses.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-12 21:14:02 -07:00
Justin Pettit
847b8b027a Increase number of registers to 16.
With eight 32-bit registers, we can only store two IPv6 addresses, which is
pretty tight.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-12 21:14:02 -07:00
Daniele Di Proietto
362ad4ba31 flow: Fix uninitialized reads in [mini]flow_hash_5tuple().
Almost every caller expects [mini]flow_hash_5tuple() to be able to deal
with all kinds of flows, not only TCP and UDP.

Currently, when dealing with non L4 flows, the function may access
uninitialized memory.  This commit changes it to return prematurely with
a partial hash value instead of reading uninitialized memory.

Found by valgrind.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-05-20 11:08:08 -07:00
Justin Pettit
2ff8484bbf util: Pass 128-bit arguments directly instead of using pointers.
Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.)
introduced the ovs_u128_and() function.  It directly takes ovs_u128
values as arguments instead of pointers to them.  As this is a bit more
direct way to deal with 128-bit values, modify the other utility
functions to do the same.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2016-05-08 09:26:19 -07:00
Daniele Di Proietto
8ecb3068da flow: Fix flow_wc_map() for ICMPv6/IGMP type and code.
flow_wc_map() should include 'tp_src' and 'tp_dst' for ICMPv6 and IGMP
packets, since they're used for type and code.

This caused installed flows in the userspace datapath to always have
ICMPv6 code and type wildcarded (there are no other users of this
function).

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2016-05-02 13:56:36 -07:00
Ben Warren
e29747e440 Move lib/match.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-04-14 13:46:49 -07:00
Ben Warren
3e8a2ad145 Move lib/dynamic-string.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-19 10:02:12 -07:00
Quentin Monnet
0deec6d261 match: Color output of match conditions for ovs-ofctl dump-flows.
Add color output for flow match conditions for ovs-ofctl dump-flows
command utility, by inserting color markers in the functions responsible
for printing those match condictions.

Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-18 14:02:48 -07:00
Simon Horman
1589ee5ae9 flow: add miniflow_pad_from_64
Provide leading padding to allow pushing a value to a miniflow where
the value is not aligned to 64 bytes and no value has already been
pushed to the same word.

This will be used by a follow-up patch to allow layer 3 packet - that is
packets without an ethernet header - to be represented in flows.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2016-02-24 14:13:50 +09:00
Simon Horman
1dcf9ac7ee flow: add miniflow_push_uint8
The motivation is to allow pushing single bytes in
a manner to that already used for 16, 32 and 64 bit integers.

This will be used by a follow-up patch to allow layer 3 packet -
that is packets without an ethernet header - to be represented in flows.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2016-02-24 14:13:50 +09:00
Simon Horman
e807a1a63d flow: fix compilation of MINIFLOW_ASSERT
Often MINIFLOW_ASSERT is a no-op and compilation of code that uses
it is optimised out. This patch fixes compilation errors that occur
when that is not the case:

* FLOWMAP_MAX does not exist. Use MAP_MAP instead.
* FLOWMAP_IS_SET does not exist. Use flowmap_is_set instead.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2016-01-11 17:35:18 -08:00
Simon Horman
06f41fc484 flow: Pass last field to miniflow_pad_to_64().
Make miniflow_pad_to_64() a little more robust with regards to updates to
struct flow by passing the last field, whose end should be considered for
padding, rather than the next field, whose start should be considered.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-12-22 22:36:05 -08:00
Jiri Benc
ffe4c74f93 tunneling: extend flow_tnl with ipv6 addresses
Note that because there's been no prerequisite on the outer protocol,
we cannot add it now. Instead, treat the ipv4 and ipv6 dst fields in the way
that either both are null, or at most one of them is non-null.

[cascardo: abstract testing either dst with flow_tnl_dst_is_set]
cascardo: using IPv4-mapped address is an exercise for the future, since this
would require special handling of MFF_TUN_SRC and MFF_TUN_DST and OpenFlow
messages.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-30 10:31:35 -08:00
Jarno Rajahalme
9ac0aadab9 conntrack: Add support for NAT.
Extend OVS conntrack interface to cover NAT.  New nested NAT action
may be included with a CT action.  A bare NAT action only mangles
existing connections.  If a NAT action with src or dst range attribute
is included, new (non-committed) connections are mangled according to
the NAT attributes.

This work extends on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2015-11-25 16:04:59 -08:00
Joe Stringer
9daf23484f Add connection tracking label support.
This patch adds a new 128-bit metadata field to the connection tracking
interface. When a label is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_label" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a label with
those connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:16 -07:00
Joe Stringer
8e53fe8cf7 Add connection tracking mark support.
This patch adds a new 32-bit metadata field to the connection tracking
interface. When a mark is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_mark" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a mark with those
connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00
Joe Stringer
07659514c3 Add support for connection tracking.
This patch adds a new action and fields to OVS that allow connection
tracking to be performed. This support works in conjunction with the
Linux kernel support merged into the Linux-4.3 development cycle.

Packets have two possible states with respect to connection tracking:
Untracked packets have not previously passed through the connection
tracker, while tracked packets have previously been through the
connection tracker. For OpenFlow pipeline processing, untracked packets
can become tracked, and they will remain tracked until the end of the
pipeline. Tracked packets cannot become untracked.

Connections can be unknown, uncommitted, or committed. Packets which are
untracked have unknown connection state. To know the connection state,
the packet must become tracked. Uncommitted connections have no
connection state stored about them, so it is only possible for the
connection tracker to identify whether they are a new connection or
whether they are invalid. Committed connections have connection state
stored beyond the lifetime of the packet, which allows later packets in
the same connection to be identified as part of the same established
connection, or related to an existing connection - for instance ICMP
error responses.

The new 'ct' action transitions the packet from "untracked" to
"tracked" by sending this flow through the connection tracker.
The following parameters are supported initally:

- "commit": When commit is executed, the connection moves from
  uncommitted state to committed state. This signals that information
  about the connection should be stored beyond the lifetime of the
  packet within the pipeline. This allows future packets in the same
  connection to be recognized as part of the same "established" (est)
  connection, as well as identifying packets in the reply (rpl)
  direction, or packets related to an existing connection (rel).
- "zone=[u16|NXM]": Perform connection tracking in the zone specified.
  Each zone is an independent connection tracking context. When the
  "commit" parameter is used, the connection will only be committed in
  the specified zone, and not in other zones. This is 0 by default.
- "table=NUMBER": Fork pipeline processing in two. The original instance
  of the packet will continue processing the current actions list as an
  untracked packet. An additional instance of the packet will be sent to
  the connection tracker, which will be re-injected into the OpenFlow
  pipeline to resume processing in the specified table, with the
  ct_state and other ct match fields set. If the table is not specified,
  then the packet is submitted to the connection tracker, but the
  pipeline does not fork and the ct match fields are not populated. It
  is strongly recommended to specify a table later than the current
  table to prevent loops.

When the "table" option is used, the packet that continues processing in
the specified table will have the ct_state populated. The ct_state may
have any of the following flags set:

- Tracked (trk): Connection tracking has occurred.
- Reply (rpl): The flow is in the reply direction.
- Invalid (inv): The connection tracker couldn't identify the connection.
- New (new): This is the beginning of a new connection.
- Established (est): This is part of an already existing connection.
- Related (rel): This connection is related to an existing connection.

For more information, consult the ovs-ofctl(8) man pages.

Below is a simple example flow table to allow outbound TCP traffic from
port 1 and drop traffic from port 2 that was not initiated by port 1:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
    table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
    table=1,in_port=2,ct_state=+trk+est,tcp,action=1
    table=1,in_port=2,ct_state=+trk+new,tcp,action=drop

Based on original design by Justin Pettit, contributions from Thomas
Graf and Daniele Di Proietto.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00
Ben Pfaff
aba5d6dd52 flow: Fix MSVC compile errors.
This fixes some MSVC build errors introduced by commit 74ff3298c
(userspace: Define and use struct eth_addr.)

MSVC doesn't like the change in 'const' between function declaration and
definition: it reports "formal parameter 2 different from declaration" for
each of the functions in flow.h corrected by this (commit.  I think it's
technically wrong about that, standards-wise.)

MSVC doesn't like an empty-brace initializer.  (I think it's technically
right about that, standards-wise.)

This commit attempts to fix both problems, but I have not tested it with
MSVC.

CC: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Tested-by: Nithin Raju <nithin@vmware.com>
2015-08-31 12:56:09 -07:00
Jarno Rajahalme
74ff3298c8 userspace: Define and use struct eth_addr.
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace.  The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.

"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.

struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned.  All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.

As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.

This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.

This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes.  However, I think this
might be a nice code readability improvement by itself.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-28 14:55:11 -07:00
Jarno Rajahalme
5fcff47b0b flow: Add struct flowmap.
Struct miniflow is now sometimes used just as a map.  Define a new
struct flowmap for that purpose.  The flowmap is defined as an array of
maps, and it is automatically sized according to the size of struct
flow, so it will be easier to maintain in the future.

It would have been tempting to use the existing struct bitmap for this
purpose. The main reason this is not feasible at the moment is that
some flowmap algorithms are simpler when it can be assumed that no
struct flow member requires more bits than can fit to a single map
unit. The tunnel member already requires more than 32 bits, so the map
unit needs to be 64 bits wide.

Performance critical algorithms enumerate the flowmap array units
explicitly, as it is easier for the compiler to optimize, compared to
the normal iterator.  Without this optimization a classifier lookup
without wildcard masks would be about 25% slower.

With this more general (and maintainable) algorithm the classifier
lookups are about 5% slower, when the struct flow actually becomes big
enough to require a second map.  This negates the performance gained
in the "Pre-compute stage masks" patch earlier in the series.

Requested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-08-26 15:37:22 -07:00
Jarno Rajahalme
fa2fdbf8d0 classifier: Pre-compute stage masks.
This makes stage mask computation happen only when a subtable is
inserted and allows simplification of the main lookup function.

Classifier benchmark shows that this speeds up the classification
(with wildcards) about 5%.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-08-26 14:41:02 -07:00
Simon Horman
cc5dba2f32 flow: Ignore invalid ICMPv6 fields when parsing packets
There is a miss-match between the handling of invalid ICMPv6 fields in the
implementations of parse_icmpv6() in user-space and in the kernel datapath.

This patch addresses that by modifying the user-space implementation to
match that of the kernel datapath; processing is terminated without
rather than with an error and partial information is cleared.

With these changes the user-space implementation of parse_icmpv6()
never returns an error. Accordingly the return type and caller have been
updated.

The original motivation for this is to allow matching the ICMPv6 type and
code of packets with invalid neighbour discovery options although only the
change around the '(!opt_len || opt_len > *sizep)' conditional is necessary
to achieve that goal.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-08-13 14:47:37 -07:00
Jarno Rajahalme
c2581ccf64 flow: Avoid compile errors.
GCC (4.7) sees too wide shifts when there are none, refactor to
circumvent the false error.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2015-08-12 16:01:41 -07:00
Jesse Gross
6728d578f6 dpif-netdev: Translate Geneve options per-flow, not per-packet.
The kernel implementation of Geneve options stores the TLV option
data in the flow exactly as received, without any further parsing.
This is then translated to known options for the purposes of matching
on flow setup (which will then install a datapath flow in the form
the kernel is expecting).

The userspace implementation behaves a little bit differently - it
looks up known options as each packet is received. The reason for this
is there is a much tighter coupling between datapath and flow translation
and the representation is generally expected to be the same. This works
but it incurs work on a per-packet basis that could be done per-flow
instead.

This introduces a small translation step for Geneve packets between
datapath and flow lookup for the userspace datapath in order to
allow the same kind of processing that the kernel does. A side effect
of this is that unknown options are now shown when flows dumped via
ovs-appctl dpif/dump-flows, similar to the kernel.

There is a second benefit to this as well: for some operations it is
preferable to keep the options exactly as they were received on the wire,
which this enables. One example is that for packets that are executed from
ofproto-dpif-upcall to the datapath, this avoids the translation of
Geneve metadata. Since this conversion is potentially lossy (for unknown
options), keeping everything in the same format removes the possibility
of dropping options if the packet comes back up to userspace and the
Geneve option translation table has changed. To help with these types of
operations, most functions can understand both formats of data and seamlessly
do the right thing.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-05 20:26:48 -07:00
Jarno Rajahalme
361d808dd9 flow: Split miniflow's map.
Use two maps in miniflow to allow for expansion of struct flow past
512 bytes.  We now have one map for tunnel related fields, and another
for the rest of the packet metadata and actual packet header fields.
This split has the benefit that for non-tunneled packets the overhead
should be minimal.

Some miniflow utilities now exist in two variants, new ones operating
over all the data, and the old ones operating only on a single 64-bit
map at a time.  The old ones require doubling of code but should
execute faster, so those are used in the datapath and classifier's
lookup path.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-17 15:18:43 -07:00
Jarno Rajahalme
09b0fa9c55 flow: Make compile with MSVC.
MSVC does not like zero sized arrays in structs.  Hence, remove the
'values' member from struct miniflow and add back the getters
miniflow_values() and miniflow_get_values().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-16 17:42:24 -07:00
Jesse Gross
b666962be3 tunneling: Allow matching and setting tunnel 'OAM' flag.
Several encapsulation formats have the concept of an 'OAM' bit
which typically is used with networking tracing tools to
distinguish test packets from real traffic. OVS already internally
has support for this, however, it doesn't do anything with it
and it also isn't exposed for controllers to use. This enables
support through OpenFlow.

There are several other tunnel flags which are consumed internally
by OVS. It's not clear that it makes sense to use them externally
so this does not expose those flags - although it should be easy
to do so if necessary in the future.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-15 20:33:41 -07:00
Jesse Gross
8e4c1621e9 flow: Factor out flag parsing and formatting routines.
There are several implementations of functions that parse/format
flags and their binary representation. This factors them out into
common routines. In addition to reducing code, it also makes things
more consistent across different parts of OVS.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-07-15 20:24:04 -07:00
Jarno Rajahalme
a851eb9492 flow: Eliminate miniflow_clone() and minimask_clone().
miniflow_clone() and minimask_clone() are no longer used, remove them
from the API.

Now that miniflow data is always inlined, it makes sense to rename
miniflow_clone_inline() miniflow_clone().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-15 13:17:10 -07:00
Jarno Rajahalme
ceb3bd6731 match: Single malloc minimatch.
Allocate the miniflow and minimask in struct minimatch at once, so
that they are consecutive in memory.  This halves the number of
allocations, and allows smaller minimatches to share the same cache
line.

After this a minimatch has one heap allocation for all it's data.
Previously it had either none (when data was small enough to fit in
struct miniflow's inline buffer), or two (when the inline buffer was
insufficient).  Hopefully always having one performs almost the same
as none or two, in average.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-15 13:17:10 -07:00
Jarno Rajahalme
8fd4792403 flow: Always inline miniflows.
Now that performance critical code already inlines miniflows and
minimasks, we can simplify struct miniflow by always dynamically
allocating miniflows and minimasks to the correct size.  This changes
the struct minimatch to always contain pointers to its miniflow and
minimask.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-15 13:17:10 -07:00
Jeroen van Bemmel
4249b5470e hash: Add symmetric L3/L4 hash functions for multipath, bundle hashing.
Signed-off-by: Jeroen van Bemmel <jvb127@gmail.com>
[blp@nicira.com made code style fixes, expanded documentation]
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-07-08 08:33:27 -07:00
Jesse Gross
9ad11dbe4c pkt-metadata: Avoid introducing overhead for userspace tunnels.
The addition of Geneve metadata requires a large amount of additional
space to handle the maximum set of options. In most cases, this is
not a big deal since it is only temporary storage on the stack or
can be automatically stripped out for miniflows. However, userspace
tunnels need to deal with this on a per-packet basis, so we should
avoid introducing additional overhead if possible. Two small changes
are aimed at this:

 * Move struct flow_tnl to the end of the packet metadata. Since
   the Geneve metadata is already at the end of flow_tnl and pkt_metadata
   is at the end of struct dp_packet, this avoids putting a large
   amount metadata (which might be empty) in hot cache lines.

 * Only push the new metadata into a miniflow if any options are present
   during miniflow_extract(). This does not necessarily provide the
   most fine-grained flow generation but it is a quick check and
   the userspace implementation of Geneve does not currently support
   options anyways.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-06-25 11:08:58 -07:00
Jesse Gross
9558d2a548 tunnel: Geneve TLV handling support for OpenFlow.
The current support for Geneve in OVS is exactly equivalent to VXLAN:
it is possible to set and match on the VNI but not on any options
contained in the header. This patch enables the use of options.

The goal for Geneve support is not to add support for any particular option
but to allow end users or controllers to specify what they would like to
match. That is, the full range of Geneve's capabilities should be exposed
without modifying the code (the one exception being options that require
per-packet computation in the fast path).

The main issue with supporting Geneve options is how to integrate the
fields into the existing OpenFlow pipeline. All existing operations
are referred to by their NXM/OXM field name - matches, action generation,
arithmetic operations (i.e. tranfer to a register). However, the Geneve
option space is exactly the same as the OXM space, so a direct mapping
is not feasible. Instead, we create a pool of 64 NXMs that are then
dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped,
these fields become first-class citizens in the OpenFlow pipeline.

An example of how to use Geneve options:
ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
ovs-ofctl add-flow br0 in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1

This will add a 4 bytes option (filled will all 1's) to all packets
coming from the LOCAL port and then send then out to port 1.

A limitation of this patch is that although the option table is specified
for a particular switch over OpenFlow, it is currently global to all
switches. This will be addressed in a future patch.

Based on work originally done by Madhu Challa. Ben Pfaff also significantly
improved the comments.

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-06-25 11:08:58 -07:00
Ben Pfaff
268eca112d flow: Make assertions about offsets within struct flow easier to follow.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-06-15 14:44:37 -07:00
Ben Pfaff
4c0e587ce4 flow: Add 'const' qualifiers in flow extraction.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2015-06-08 15:04:30 -07:00
Jesse Gross
50dcbd8ed4 ofp-util: Convert flow_metadata to match structure.
We have a special flow_metadata structure to represent the parts
of a packet that aren't carried in the payload itself. This is
used in the case where we need to send the packet as a Packet In
to an OpenFlow controller. This is a subset of the more general
struct flow.

In practice, almost all operations we do on this structure involve
converting it to or from a match or have code that is the same as
a match. Serialization to NXM and back is done as a match. There
is special flow_metadata formatting code that is almost identical
to match formatting.

The uses for struct flow_metadata aren't performance critical
when it comes to memory, so we can save quite a bit of code by
just using a match structure directly instead. In addition, as
metadata increases and becomes more complex (Geneve options require
some special handling beyond just additional fields), using the
match structure means we only have to do this work in one place.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-06-08 10:17:58 -07:00
Daniele Di Proietto
82eb5b0aba dp-packet: Remove 'frame' member.
In 'struct ofpbuf' the 'frame' pointer was used to parse different kinds of
data (Ethernet, OpenFlow, Netlink attributes).  For Ethernet packets the
'frame' pointer was supposed to have the same value as the 'data'
pointer.

Since 'struct dp_packet' is only used for Ethernet packets, there's no
need for a separate 'frame' pointer: we can use the 'data' pointer
instead.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-05-18 15:14:02 -07:00