The ARP headers have been acceptable as NXAST_REG_MOVE destinations since
commit f6c8a6b163 (Add software switch support for modifying ARP headers
in OpenFlow.)
Reported-by: Anupam Chanda <achanda@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
When a controller changes its role to MASTER, the others are marked
as SLAVE. This patch makes it possible to notify the controllers
of this change.
Signed-off-by: Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Open vSwitch has never implemented this request and reply, even though they
have been in OpenFlow since version 1.0. This commit adds an
implementation.
Signed-off: Venkitachalam Gopalakrishnan <gops@vmware.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
OpenFlow 1.3 uses all-1-bits in a packet_in to indicate that the packet_in
was not generated by a flow, but Open vSwitch incorrectly used 0. This
fixes the problem.
For consistency, this commit also changes NXT_PACKET_IN to use all-1-bits
for this case, event though NXT_PACKET_IN was previously defined to use
zero. This doesn't appear to make a difference for the NVP controller; if
it causes a problem for some other controller then I will revert that part
of the change.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Collect mega flow mask stats. ovs-dpctl show command can be used to
display them.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
union ofp_action cannot remain in the OF 1.0 header as it is expanded
to include actions from later versions. Also, it is not part of the
protocol interface and will be easier to update where it is actually
used.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The OVS code has always made a distinction between the unencrypted (TCP)
and SSL port numbers for the OpenFlow and OVSDB protocols. The default
port numbers for both protocols has changed, and there continues to be
no distinction between the unencrypted and SSL versions. This
commit removes the distinction in port numbers. A future patch will
recognize the change in default port number.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 3245502404 (OXM: Allow masking of IPv6 Flow Label) made the
IPv6 flow label field fully maskable but did not update the comment to say
so.
EXT-101.
CC: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
The translation into OpenFlow 1.2 didn't trim off the OpenFlow 1.3 specific
bits. This fixes the problem.
It would probably be wise to introduce an ofputil_table_stats structure.
Using ofp12_table_stats is somewhat confusing.
Reported-by: Torbjorn Tornkvist <kruskakli@gmail.com>
Tested-by: Torbjorn Tornkvist <kruskakli@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OFPTC11_TABLE_MISS_MASK is not a valid configuration, but to indicate
there are only 2 bits being used for table miss configuration. Move
it out of the enum definition.
Reported-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Added infrastructure to support Openflow OFPT_TABLE_MOD message. This patch
does not include the flexible table miss handling code that is necessary to
support the semantics specified in OFPT_TABLE_MOD messages.
Current flow miss behavior continues to conform to Openflow 1.0. Future
commits to add more flexible table miss support are needed to fully support
OPFT_TABLE_MOD for Openflow-1.1+.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Fix typo in enum ofp13_flow_mod_flags comment caused probably
by a copy/paste error.
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethernet headers are 14 bytes long, so when the beginning of such a header
is 32-bit aligned, the following data is misaligned. The usual trick to
fix that is to start the Ethernet header on an odd-numbered 16-bit
boundary. That trick works OK for Open vSwitch, but there are two
problems:
- OVS doesn't use that trick everywhere. Maybe it should, but it's
difficult to make sure that it does consistently because the CPUs
most commonly used with OVS don't care about misalignment, so we
only find problems when porting.
- Some protocols (GRE, VXLAN) don't use that trick, so in such a case
one can properly align the inner or outer L3/L4/L7 but not both. (OVS
userspace doesn't directly deal with such protocols yet, so this is
just future-proofing.)
- OpenFlow uses the alignment trick in a few places but not all of them.
This commit starts the adoption of what I hope will be a more robust way
to avoid misalignment problems and the resulting bus errors on RISC
architectures. Instead of trying to ensure that 32-bit quantities are
always aligned, we always read them as if they were misaligned. To ensure
that they are read this way, we change their types from 32-bit types to
pairs of 16-bit types. (I don't know of any protocols that offset the
next header by an odd number of bytes, so a 16-bit alignment assumption
seems OK.)
The same would be necessary for 64-bit types in protocol headers, but we
don't yet have any protocol definitions with 64-bit types.
IPv6 protocol headers need the same treatment, but for those we rely on
structs provided by system headers, so I'll leave them for an upcoming
patch.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.
Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The Linux kernel datapath enables matching and setting the skb mark
but this functionality is currently used only internally by
ovs-vswitchd. This exposes it through NXM to enable external
controllers to interact with other kernel subsystems. Although this
is simply exporting the skb mark, the intention is that this is a
platform independent mechanism to access some system metadata and
therefore may have different implementations on various systems.
Bug #17855
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The Clang support for thread-safety annotations is much more effective
than "sparse" support. I found that I was unable to make the annotations
warning-free under sparse.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit adds annotations for thread safety check. And the
check can be conducted by using -Wthread-safety flag in clang.
Co-authored-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
With the latest version of sparse, the ATOMIC_VAR_INIT macro
generates the following warning. This patch suppresses it.
warning: Using plain integer as NULL pointer
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Swap places of OFPRR_METER_DELETE and OFPRR_EVICTION in enumeration to be
compatible with OpenFlow 1.4.
Prior to OpenFlow 1.4 OFPRR_EVICTION was a Nicira specific flow removal reason
code. OpenFlow 1.3 added support for meters, which require dependent flow
removal when meters are deleted. The reason code for this is also added in
OpenFlow 1.4, but OFPRR_METER_DELETE now has the value OVS previously had for
OFPRR_EVICTION.
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The only tricky part here is that I'm throwing in annotations to allow
"sparse" to report unbalanced locking.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
OpenFlow 1.2 standardized experimenter error codes in a way different from
the Nicira extension. This commit implements the OpenFlow 1.2+ version.
This commit also makes it easy to add error codes for new experimenter IDs
by adding new *_VENDOR_ID definitions to openflow-common.h.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, datapath ports and openflow ports were both represented by
unsigned integers of various sizes. With implicit conversions, etc., it is
easy to mix them up and use one where the other is expected. This commit
creates two typedefs, ofp_port_t and odp_port_t. Both of these two types
are marked by "__attribute__((bitwise))" so that sparse can be used to
detect any misuse.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Add wildcarded flow support in kernel datapath.
Wildcarded flow can improve OVS flow set up performance by avoid sending
matching new flows to the user space program. The exact performance boost
will largely dependent on wildcarded flow hit rate.
In case all new flows hits wildcard flows, the flow set up rate is
within 5% of that of linux bridge module.
Pravin has made significant contributions to this patch. Including API
clean ups and bug fixes.
Co-authored-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
[jesse: Additional documentation, fix memory leak, and improve validation.]
Signed-off-by: Jesse Gross <jesse@nicira.com>
Commit 796223f5 (netdev: Add new "struct netdev_rx" for capturing packets
from a netdev) refactored send and receive into separate netdevs. As a
result, send and receive now use different socket descriptors (except for tap
interfaces which are treated specially). An unintended side effect was that
all sent packets are looped back and received, which had previously been
avoided as the kernel specifically prevents this from happening on a single
socket descriptor.
To resolve the situation, a socket filter is added to the receive socket
so that it only accepts inbound packets.
Simon Horman co-discovered and initially reported this issue.
Signed-off-by: Murphy McCauley <murphy.mccauley@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Simon Horman <horms@verge.net.au>
Reviewed-by: Simon Horman <horms@verge.net.au>
Adds tun_src and tun_dst match and set capabilities via new NXM fields
NXM_NX_TUN_IPV4_SRC and NXM_NX_TUN_IPV4_DST. This allows management of
large number of tunnels via the flow tables, without requiring the tunnels
to be pre-configured.
Flow-based tunnels can be configured with options remote_ip=flow and
local_ip=flow. local_ip=flow requires remote_ip=flow. When set, the
tunnel remote IP address and/or local IP address is set from the flow,
instead of the tunnel configuration.
Example:
$ ovs-vsctl add-port br0 gre -- set Interface gre ofport_request=1 type=gre options:remote_ip=flow options:key=flow
$ ovs-ofctl add-flow br0 "in_port=LOCAL actions=set_tunnel:1,set_field:192.168.0.1->tun_dst,output:1"
$ ovs-ofctl add-flow br0 "in_port=1 tun_src=192.168.0.1 tun_id=1 actions=LOCAL"
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OpenFlow says that an "output" action to a flow's input port is ordinarily
dropped, unless the flow explicitly outputs to OFPP_IN_PORT. We've
occasionally been asked to implement some way to avoid this behavior in
cases where it is not easily known in advance whether a given port is the
input port (so that OFPP_IN_PORT is not easy to use).
This commit implements such a feature. With this commit, one may write:
actions=load:0->NXM_OF_IN_PORT[],output:123
which will output to port 123 regardless of whether it is the input port.
If the input port is important, then one may save and restore it on the
stack:
actions=push:NXM_OF_IN_PORT[],load:0->NXM_OF_IN_PORT[],output:123,
pop:NXM_OF_IN_PORT[]
(Sometimes I am asked whether "resubmit" changes the in_port and would
therefore interact badly with this feature. It does not. "resubmit" only
(optionally) changes the in_port used for the resubmit's flow table lookup.
It does not otherwise have any effect on in_port.)
Bug #14091.
CC: Jarno Rajahalme <jarno.rajahalme@nsn.com>
CC: Ronghua Zhang <rzhang@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Note that OVS_KEY_ATTR_MPLS may be an array of ovs_key_mpls
and that the acceptable length may be restricted by the implementation.
Currently the user-space datapath and proposed kernel datapath
implementation restrict the length to a single element.
Also update the mpls_top_lse name of the element of struct ovs_key_mpls,
as it is an array of LSEs and thus not necessarily just the top LSE.
As requested by Jesse Gross
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Define a new NXAST_SAMPLE OpenFlow vendor action and the corresponding
OFPACT_SAMPLE OVS action, to do per-flow packet sampling, translated
into a new SAMPLE "flow_sample" dp action.
Make the userspace action's userdata size vary depending on the union
member used. Add a new "flow_sample" upcall to do per-flow packet
sampling. Add a new "ipfix" upcall to do per-bridge packet sampling
to IPFIX collectors.
Extend the OVSDB schema to support configuring IPFIX collector sets.
Add support for configuring multiple IPFIX collectors for per-flow
packet sampling. Add support for configuring per-bridge IPFIX
sampling.
Automatically generate standard IPFIX entity definitions from the IANA
specs. Send one IPFIX data record message for every packet sampled by
an OpenFlow sample action or received by a bridge configured with
IPFIX sampling, and periodically send IPFIX template set messages.
Signed-off-by: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Patch ports have been completely moved to userspace at this point
but one part of the interface remained. It's no longer used by
either userspace or kernel so this deletes it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>