Introduces two new NXMs to represent VXLAN-GBP [0] fields.
actions=load:0x10->NXM_NX_TUN_GBP_ID[],NORMAL
tun_gbp_id=0x10,actions=drop
This enables existing VXLAN tunnels to carry security label
information such as a SELinux context to other network peers.
The values are carried to/from the datapath using the attribute
OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS.
[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy-00
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Visual studio supports zero-size array within a struct or union,
but has to be the last element. GCC does not have this restriction.
icmp headers got included inside 'struct ovs_nd_msg' through
commit e60e935b1f (Implement set-field for IPv6 ND fields (nd_target,
nd_sll,and nd_tll). This causes compilation error while using MSVC.
Since icmp[6]_data in the icmp[6]_header is not used anywhere, just remove
them.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds set-field operations for nd_target, nd_sll, and nd_tll
fields, with and without masks, using Nicira extensions and OpenFlow 1.2
protocol.
Signed-off-by: Randall A Sharo <randall.sharo at navy.mil>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.
Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The macro LITTLE_ENDIAN is a constant, not a test for endianness,
so it doesn't actually tell us anything about the machine.
WORDS_BIGENDIAN is both correct and defined by configure so it is
more portable.
Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
ETH_ADDR_LEN is defined in lib/packets.h, valued 6.
Use this macro instead of magic number 6 to represent the length
of eth mac address.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is the v5 from June 12th, 2014, rebased to OVS master, further
changes in following patches.
Signed-off by: Daniele Venturino <daniele.venturino@m3s.it>
Signed-off by: Martino Fornasa <mf@fornasa.it>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Daniele Venturino <daniele.venturino@m3s.it>
The system defined ICMPv6 header doesn't have sparse annotation,
so this adds a definition so that endianness can be checked.
Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Extend IPFIX exporter to export tunnel headers when both input and output
of the port.
Add three other_config options in IPFIX table: enable-input-sampling,
enable-output-sampling and enable-tunnel-sampling, to control whether
sampling tunnel info, on which direction (input or output).
Insert sampling action before output action and the output tunnel port
is sent to datapath in the sampling action.
Make datapath collect output tunnel info and send it back to userpace
in upcall message with a new additional optional attribute.
Add a tunnel ports map to make the tunnel port lookup faster in sampling
upcalls in IPFIX exporter. Make the IPFIX exporter generate IPFIX template
sets with enterprise elements for the tunnel info, save the tunnel info
in IPFIX cache entries, and send IPFIX DATA with tunnel info.
Add flowDirection element in IPFIX templates.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This adds support for Geneve - Generic Network Virtualization
Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-00
The kernel implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.
Userspace currently implements only support for basic version of
Geneve. It can work with the base header (including the VNI) and
is capable of parsing options but does not currently support any
particular option definitions. Over time, the intention is to
allow options to be matched through OpenFlow without requiring
explicit support in OVS userspace.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
RFC4541- IGMP and MLD Snooping Switches Considerations
recommends to have different actions for local and
non-local multicast traffic.
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Add basic header structure and definitions into packet.h.
Acked-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() usinge these. These will be used in a
later patch.
Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies. Similarly, some inline
functions using struct flow are now in lib/flow.h. IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow. This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.
miniflow_extract() takes a miniflow as an input/output parameter. On
input the buffer for values to be extracted must be properly
initialized. On output the map contains ones for all the fields that
have been extracted.
Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.
Some explicit "inline" keywords are necessary for GCC to optimize this
properly. Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 03fbdf8d9c80a (lib/flow: Retain ODPP_NONE on flow_extract())
replaced packet metadata initialization function by a macro.
Visual studio does not like nested structure initialization that
is done in that macro.
This commit replaces the macro by a function.
CC: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
We used to map ODPP_NONE to port number 0, which is wrong, as
ODPP_NONE is a valid value of the flow's in_port.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
Make set_ethertype() static as it is not used outside of packet.c
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ICMPv4 and ICMPv6 have 8-bit "type" and "code" fields. struct flow
uses the low 8 bits of the 16-bit tp_src and tp_dst members to
represent these fields. The datapath interface, on the other hand,
represents them with just 8 bits each. This means that if the high 8
bits of the masks for these fields somehow become set (meaning to
match on the nonexistent "high bits" of these fields) during
translation, then they will get chopped off by a round trip through
the datapath, and revalidation will spot that as an inconsistency and
delete the flow. This commit avoids the problem by making sure that
only the low 8 bits of either field can be unwildcarded for ICMP.
This seems like the minimal fix for this problem, appropriate for
backporting to earlier branches. The root of the issue is that these high
bits can get set in the match at all. I have some leads on that, but they
require more invasive changes elsewhere.
Bug #23320.
Reported-by: Krishna Miriyala <miriyalak@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This is in preparation for pushing vlan tags
using the TPID provided by the kernel via auxdata.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This helps reduce confusion about when a flow is a flow and when it is
just metadata.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Allow TCP flags match specification with symbolic flag names. TCP
flags are optionally specified as a string of flag names, each
preceded by '+' when the flag must be one, or '-' when the flag must
be zero. Any flags not explicitly included are wildcarded. The
existing hex syntax is still allowed, and is used in flow dumps when
all the flags are matched.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
eth_mpls_depth() has been unused as of 1ac7c9bdb2b6fdcb ("ofproto-dpif: Use
execute_actions to execute controller actions").
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethernet headers are 14 bytes long, so when the beginning of such a header
is 32-bit aligned, the following data is misaligned. The usual trick to
fix that is to start the Ethernet header on an odd-numbered 16-bit
boundary. That trick works OK for Open vSwitch, but there are two
problems:
- OVS doesn't use that trick everywhere. Maybe it should, but it's
difficult to make sure that it does consistently because the CPUs
most commonly used with OVS don't care about misalignment, so we
only find problems when porting.
- Some protocols (GRE, VXLAN) don't use that trick, so in such a case
one can properly align the inner or outer L3/L4/L7 but not both. (OVS
userspace doesn't directly deal with such protocols yet, so this is
just future-proofing.)
- OpenFlow uses the alignment trick in a few places but not all of them.
This commit starts the adoption of what I hope will be a more robust way
to avoid misalignment problems and the resulting bus errors on RISC
architectures. Instead of trying to ensure that 32-bit quantities are
always aligned, we always read them as if they were misaligned. To ensure
that they are read this way, we change their types from 32-bit types to
pairs of 16-bit types. (I don't know of any protocols that offset the
next header by an odd number of bytes, so a 16-bit alignment assumption
seems OK.)
The same would be necessary for 64-bit types in protocol headers, but we
don't yet have any protocol definitions with 64-bit types.
IPv6 protocol headers need the same treatment, but for those we rely on
structs provided by system headers, so I'll leave them for an upcoming
patch.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The current situation is that whenever any packet enters the
userspace, bfd_should_process_flow() looks at the UDP destination
port to figure out whether that is a BFD packet. This means that
UDP destination port cannot be wildcarded for all the other flows
too.
To optimize BFD for megaflows, we introduce a new
'bfd:bfd_dst_mac' field in the database. Whenever this field is set
by a controller, it is assumed that all the BFD packets to/from
this interface will have the destination mac address set as the one
specified in the bfd:bfd_dst_mac field. If this field is set, we
first look at the destination mac address of a packet and if it
does not match the mac address set in bfd:bfd_dst_mac, we do not
process that packet as bfd. If the field does match, we go ahead
and look at the UDP destination port too.
Also, change the default BFD destination mac address to
"00:23:20:00:00:01".
Feature #18850.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Added support to allow mega flow specified and displayed. ovs-dpctl tool
is mainly used as debugging tool.
This patch also implements the low level user space routines to send
and receive mega flow netlink messages. Those netlink suppor
routines are required for forthcoming user space mega flow patches.
Added a unit test to test parsing and display of mega flows.
Ethan contributed the ovs-dpctl mega flow output function.
Co-authored-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This adds support for the OpenFlow 1.1+ dec_mpls_ttl action.
And also adds an NX dec_mpls_ttl action.
The handling of the TTL modification is entirely handled in userspace.
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Use the innermost dl_type when decoding L3 and L4 data from a packet.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
An ovs_be32 is a more obvious way to represent an IP address than a
pointer to one. It is also more type-safe, especially since "sparse" is
able to check that the argument is in network byte order.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Follwoing patch fixes sparse error:
lib/packets.c:643:1: error: symbol 'packet_set_ipv6' redeclared
with different type (originally declared at lib/packets.h:493)
- incompatible argument 6 (different base types)
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This implementes push_vlan with 802.1Q.
NOTE: 802.1AD (QinQ) is not supported. It requires another effort.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Upcoming tunnel code will be able to handle ECN encapsulation/
decapsulation in userspace. This adds the necessary constants for ECN
manipulation.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Rarp packets had their own header definition in the packets
library. This doesn't make sense because they have the same packet
format as arps.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
We need these for OpenFlow 1.1 ofp_match support even if we don't support
MPLS matching (which we don't, yet).
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
SUSv3 doesn't require IPPROTO_SCTP so some systems might not provide it.
IPPROTO_SCTP isn't used in the tree yet so this doesn't fix a real bug.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The name compose_rarp() more clearly describes what it's doing now.
Requested-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Traditionally Open vSwitch had used 802.2 SNAP packets to update
upstream switch learning tables when necessary. This approach had
advantages in that debugging information could be embedded in the
packet helping hapless admins figure out what's going on. However,
since both qemu and VMware use RARP for this purpose, it seems
appropriate to fall in line with the defacto standard.
Requested-by: Ben Basler <bbasler@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Open vSwitch refuses to mirror certain destination addresses in
addition to those classified by eth_addr_is_reserved(). Looking
through the uses of eth_addr_is_reserved(), one finds that no
callers should be using the additional addresses which mirroring
drops. This patch folds the additional addresses dropped in the
mirroring code, into the more general eth_addr_is_reserverd()
function.
This patch also changes the implementation in a way that is
slightly less efficient, but much easier to read and extend int he
future.
Bug #11755.
Signed-off-by: Ethan Jackson <ethan@nicira.com>