Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() usinge these. These will be used in a
later patch.
Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies. Similarly, some inline
functions using struct flow are now in lib/flow.h. IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow. This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.
miniflow_extract() takes a miniflow as an input/output parameter. On
input the buffer for values to be extracted must be properly
initialized. On output the map contains ones for all the fields that
have been extracted.
Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.
Some explicit "inline" keywords are necessary for GCC to optimize this
properly. Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 03fbdf8d9c80a (lib/flow: Retain ODPP_NONE on flow_extract())
replaced packet metadata initialization function by a macro.
Visual studio does not like nested structure initialization that
is done in that macro.
This commit replaces the macro by a function.
CC: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
We used to map ODPP_NONE to port number 0, which is wrong, as
ODPP_NONE is a valid value of the flow's in_port.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
Make set_ethertype() static as it is not used outside of packet.c
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ICMPv4 and ICMPv6 have 8-bit "type" and "code" fields. struct flow
uses the low 8 bits of the 16-bit tp_src and tp_dst members to
represent these fields. The datapath interface, on the other hand,
represents them with just 8 bits each. This means that if the high 8
bits of the masks for these fields somehow become set (meaning to
match on the nonexistent "high bits" of these fields) during
translation, then they will get chopped off by a round trip through
the datapath, and revalidation will spot that as an inconsistency and
delete the flow. This commit avoids the problem by making sure that
only the low 8 bits of either field can be unwildcarded for ICMP.
This seems like the minimal fix for this problem, appropriate for
backporting to earlier branches. The root of the issue is that these high
bits can get set in the match at all. I have some leads on that, but they
require more invasive changes elsewhere.
Bug #23320.
Reported-by: Krishna Miriyala <miriyalak@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This is in preparation for pushing vlan tags
using the TPID provided by the kernel via auxdata.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This helps reduce confusion about when a flow is a flow and when it is
just metadata.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Allow TCP flags match specification with symbolic flag names. TCP
flags are optionally specified as a string of flag names, each
preceded by '+' when the flag must be one, or '-' when the flag must
be zero. Any flags not explicitly included are wildcarded. The
existing hex syntax is still allowed, and is used in flow dumps when
all the flags are matched.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
eth_mpls_depth() has been unused as of 1ac7c9bdb2b6fdcb ("ofproto-dpif: Use
execute_actions to execute controller actions").
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethernet headers are 14 bytes long, so when the beginning of such a header
is 32-bit aligned, the following data is misaligned. The usual trick to
fix that is to start the Ethernet header on an odd-numbered 16-bit
boundary. That trick works OK for Open vSwitch, but there are two
problems:
- OVS doesn't use that trick everywhere. Maybe it should, but it's
difficult to make sure that it does consistently because the CPUs
most commonly used with OVS don't care about misalignment, so we
only find problems when porting.
- Some protocols (GRE, VXLAN) don't use that trick, so in such a case
one can properly align the inner or outer L3/L4/L7 but not both. (OVS
userspace doesn't directly deal with such protocols yet, so this is
just future-proofing.)
- OpenFlow uses the alignment trick in a few places but not all of them.
This commit starts the adoption of what I hope will be a more robust way
to avoid misalignment problems and the resulting bus errors on RISC
architectures. Instead of trying to ensure that 32-bit quantities are
always aligned, we always read them as if they were misaligned. To ensure
that they are read this way, we change their types from 32-bit types to
pairs of 16-bit types. (I don't know of any protocols that offset the
next header by an odd number of bytes, so a 16-bit alignment assumption
seems OK.)
The same would be necessary for 64-bit types in protocol headers, but we
don't yet have any protocol definitions with 64-bit types.
IPv6 protocol headers need the same treatment, but for those we rely on
structs provided by system headers, so I'll leave them for an upcoming
patch.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The current situation is that whenever any packet enters the
userspace, bfd_should_process_flow() looks at the UDP destination
port to figure out whether that is a BFD packet. This means that
UDP destination port cannot be wildcarded for all the other flows
too.
To optimize BFD for megaflows, we introduce a new
'bfd:bfd_dst_mac' field in the database. Whenever this field is set
by a controller, it is assumed that all the BFD packets to/from
this interface will have the destination mac address set as the one
specified in the bfd:bfd_dst_mac field. If this field is set, we
first look at the destination mac address of a packet and if it
does not match the mac address set in bfd:bfd_dst_mac, we do not
process that packet as bfd. If the field does match, we go ahead
and look at the UDP destination port too.
Also, change the default BFD destination mac address to
"00:23:20:00:00:01".
Feature #18850.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Added support to allow mega flow specified and displayed. ovs-dpctl tool
is mainly used as debugging tool.
This patch also implements the low level user space routines to send
and receive mega flow netlink messages. Those netlink suppor
routines are required for forthcoming user space mega flow patches.
Added a unit test to test parsing and display of mega flows.
Ethan contributed the ovs-dpctl mega flow output function.
Co-authored-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This adds support for the OpenFlow 1.1+ dec_mpls_ttl action.
And also adds an NX dec_mpls_ttl action.
The handling of the TTL modification is entirely handled in userspace.
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Use the innermost dl_type when decoding L3 and L4 data from a packet.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
An ovs_be32 is a more obvious way to represent an IP address than a
pointer to one. It is also more type-safe, especially since "sparse" is
able to check that the argument is in network byte order.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Follwoing patch fixes sparse error:
lib/packets.c:643:1: error: symbol 'packet_set_ipv6' redeclared
with different type (originally declared at lib/packets.h:493)
- incompatible argument 6 (different base types)
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This implementes push_vlan with 802.1Q.
NOTE: 802.1AD (QinQ) is not supported. It requires another effort.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Upcoming tunnel code will be able to handle ECN encapsulation/
decapsulation in userspace. This adds the necessary constants for ECN
manipulation.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Rarp packets had their own header definition in the packets
library. This doesn't make sense because they have the same packet
format as arps.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
We need these for OpenFlow 1.1 ofp_match support even if we don't support
MPLS matching (which we don't, yet).
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
SUSv3 doesn't require IPPROTO_SCTP so some systems might not provide it.
IPPROTO_SCTP isn't used in the tree yet so this doesn't fix a real bug.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The name compose_rarp() more clearly describes what it's doing now.
Requested-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Traditionally Open vSwitch had used 802.2 SNAP packets to update
upstream switch learning tables when necessary. This approach had
advantages in that debugging information could be embedded in the
packet helping hapless admins figure out what's going on. However,
since both qemu and VMware use RARP for this purpose, it seems
appropriate to fall in line with the defacto standard.
Requested-by: Ben Basler <bbasler@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Open vSwitch refuses to mirror certain destination addresses in
addition to those classified by eth_addr_is_reserved(). Looking
through the uses of eth_addr_is_reserved(), one finds that no
callers should be using the additional addresses which mirroring
drops. This patch folds the additional addresses dropped in the
mirroring code, into the more general eth_addr_is_reserverd()
function.
This patch also changes the implementation in a way that is
slightly less efficient, but much easier to read and extend int he
future.
Bug #11755.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
It turns out that eth_addr_equal_except() computed the exact
opposite of what it purported to. It returned true if the two
arguments where *not* equal. This is extremely confusing, so this
patch changes it.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
With OpenFlow 1.1 requiring arbitrary ethernet match support, it simplifies
other code if we have some extra helper functions. This patch adds
eth_mask_is_exact(mask), eth_addr_bitand(src, mask, dst),
eth_addr_equal_except(a, b, mask) and eth_format_masked(eth, mask, output).
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit pulls code used to modify L3 and L4 header fields
from dp_netdev into the packet library. An additional user will
be added in a future commit.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Current userspace considers an ICMP header to be 4 bytes consisting
of the type, code, and checksum. The kernel considers it to be 8
bytes because it also counts the two data fields that contain
type-specific information (and are always present). Since flow
extract will zero out headers that are not completely present this
means that an ICMP packet that has a header of 5-7 bytes will be
interpreted differently by userspace and kernel. This fixes the
problem by adopting the kernel's version of the ICMP header in
userspace.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Something like this, on two separate vswitches, works to try it out:
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
ovs-vsctl \
-- add-port br0 gre0 \
-- set interface gre0 type=gre options:remote_ip=224.0.0.1
Runtime tested on Linux 3.0, build tested on Linux 2.6.18, both i386.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>