The conntrack/ipfrag backport was previously not entirely consistent in
its include for versions 3.9 and 3.10. The intention was to build it for
all kernels 3.10 and newer, so fix the version checks.
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
Upstream: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.
Upstream: 182e304 "openvswitch: Allow matching on conntrack mark"
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.
Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.
When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.
IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.
IP frag handling contributed by Andy Zhou.
Based on original design by Justin Pettit.
Upstream: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This will allow the ovs-conntrack code to reuse these macros.
Upstream: be26b9a "openvswitch: Move MASKED* macros to datapath.h"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Following patch adds support for lwtunnel to OVS datapath.
With this change OVS datapath detect lwtunnel support and
make use of new APIs if available. On older kernel where the
support is not there the backported tunnel modules are used.
These backported tunnel devices acts as lwtunnel devices.
I tried to keep backported module same as upstream for easier
bug-fix backport. Since STT and LISP are not upstream OVS
always needs to use respective modules from tunnel compat layer.
To make it work on kernel 4.3 I have converted STT and LISP
modules to lwtunnel API model.
lwtunnel make use of skb-dst to pass tunnel information to the
tunnel module. On older kernel this is not possible. So the in
case of old kernel metadata ref is stored in OVS_CB and direct
call to tunnel transmit function is made by respective tunnel
vport modules. Similarly on receive side tunnel recv directly
call netdev-vport-receive to pass the skb to OVS.
Major backported components include:
Geneve, GRE, VXLAN, ip_tunnel, udp-tunnels GRO.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
When sampling rate is 1, the sampling probability is UINT32_MAX. The packet
should be sampled even the prandom32() generate the number of UINT32_MAX.
And none packet need be sampled when the probability is 0.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: e05176a3283 ("openvswitch: Make 100 percents packets
sampled when sampling rate is 1.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
openvswitch modifies the L4 checksum of a packet when modifying
the ip address. When an IP packet is fragmented only the first
fragment contains an L4 header and checksum. Prior to this change
openvswitch would modify all fragments, modifying application data
in non-first fragments, causing checksum failures in the
reassembled packet.
Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 3576fd794b3 ("openvswitch: Fix L4 checksum handling when
dealing with IP fragments").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.
This Directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.
Adding path information enhances visibility into complex virtual
networks.
Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
OVS kernel module support for masked set actions in already upstream
in Linux (commit 83d2b9ba1abca241df44a502b6da950a25856b5b). This
patch adds the same for the OVS tree kernel module.
The existing set action sets many fields at once. When only a subset
of the IP header fields, for example, should be modified, all the IP
fields need to be exact matched so that the other field values can be
copied to the set action. A masked set action allows modification of
an arbitrary subset of the supported header bits without requiring the
rest to be matched.
Masked set action is now supported for all writeable key types, except
for the tunnel key. The set tunnel action is an exception as any
input tunnel info is cleared before action processing starts, so there
is no tunnel info to mask.
The kernel module converts all (non-tunnel) set actions to masked set
actions. This makes action processing more uniform, and results in
less branching and duplicating the action processing code. When
returning actions to userspace, the conversion is inverted. We use a
kernel internal action code to be able to tell the userspace provided
and converted masked set actions apart.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Upstream commit:
net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: df8a39d ("net: rename vlan_tx_* helpers since "tx" is misleading there")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
So it can be used from out of openvswitch code.
Did couple of cosmetic changes on the way, namely variable naming and
adding support for 8021AD proto.
Note on backwards compatability:
Unlike the upstream version, the backport of skb_vlan_push() does not
support translating a hardware accelerated 8021AD tag to software.
This is not a problem though as it preserves existing behaviour.
Upstream: 93515d53 ("net: move vlan pop/push functions into common code")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
note that skb_make_writable already exists in net/netfilter/core.c
but does something slightly different.
Upstream: e219512 ("net: move make_writable helper into common code")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
__vlan_put_tag() was renamed to vlan_insert_tag_set_proto() with
the argument list kept intact.
Upstream: 62749e ("vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Kernel datapath code has diverged from upstream code. This
makes porting patches between these two code bases harder
than it needs to be. Following patch fixes this by fixing
coding style issues on this branch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.
It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
ipv6_find_hdr() already fixed in newer upstram kernel by Ansis, we
can start using this API safely.
This patch also backports fix (ipv6: ipv6_find_hdr restore prev
functionality) to compat ipv6_find_hdr().
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
OVS keeps pointer to packet key in skb->cb, but the packet key is
store on stack. This could make code bit tricky. So it is better to
get rid of the pointer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
When there are multiple vlan headers present in a received frame, the
first one is put into vlan_tci and protocol is set to ETH_P_8021Q.
Anything in the skb beyond the VLAN TPID may be still non-linear,
including the inner TCI and ethertype. While ovs_flow_extract takes
care of IP and IPv6 headers, it does nothing with ETH_P_8021Q. Later,
if OVS_ACTION_ATTR_POP_VLAN is executed, __pop_vlan_tci pulls the
next vlan header into vlan_tci.
This leads to two things:
1. Part of the resulting ethernet header is in the non-linear part of
the skb. When eth_type_trans is called later as the result of
OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also,
__pop_vlan_tci is in fact accessing random data when it reads
past the TPID.
2. network_header points into the ethernet header instead of behind it.
mac_len is set to a wrong value (10), too.
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
I have dropped second change. Since it assumes inner mac header is of
ETH_HLEN len which is not always true.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Since kernel stack is limited in size, it is not wise to using
recursive function with large stack frames.
This patch provides an alternative implementation of recirc action
without using recursion.
A per CPU fixed sized, 'deferred action FIFO', is used to store either
recirc or sample actions encountered during execution of an action
list. Not executing recirc or sample action in place, but rather execute
them laster as 'deferred actions' avoids recursion.
Deferred actions are only executed after all other actions has been
executed, including the ones triggered by loopback from the kernel
network stack.
The size of the private FIFO, currently set to 20, limits the number
of total 'deferred actions' any one packet can accumulate.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Future patches will change the recirc action implementation to not
using recursion. The stack depth detection is no longer necessary.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The current sample() function implementation is more complicated
than necessary in handling single user space action optimization
and skb reference counting. There is no functional changes.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The checksum of ICMPv6 packets uses the IP pseudoheader as part of
the calculation, unlike ICMP in IPv4. This was not implemented,
which means that modifying the IP addresses of an ICMPv6 packet
would cause the checksum to no longer be correct as the psuedoheader
did not match.
Reported-by: Neal Shrader <icosahedral@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
If recirc action is the last action of a action list, the SKB triggers
the recirc will be freed twice. This patch fixes this bug.
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
struct ovs_skb_cb is full on kernels < 3.11 due to compatibility code.
This patch removes the 'flow' member in order to make room for data
needed by layer 3 flow/port support that will be added in an upcoming
patch. The 'flow' memeber was chosen for removal because it's only used
in ovs_execute_actions().
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Extend IPFIX exporter to export tunnel headers when both input and output
of the port.
Add three other_config options in IPFIX table: enable-input-sampling,
enable-output-sampling and enable-tunnel-sampling, to control whether
sampling tunnel info, on which direction (input or output).
Insert sampling action before output action and the output tunnel port
is sent to datapath in the sampling action.
Make datapath collect output tunnel info and send it back to userpace
in upcall message with a new additional optional attribute.
Add a tunnel ports map to make the tunnel port lookup faster in sampling
upcalls in IPFIX exporter. Make the IPFIX exporter generate IPFIX template
sets with enterprise elements for the tunnel info, save the tunnel info
in IPFIX cache entries, and send IPFIX DATA with tunnel info.
Add flowDirection element in IPFIX templates.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
When flow key becomes invalid due to push or pop actions, current
implementation leaves it as invalid, only rebuild the flow key used
for recirculation.
This works, but is less efficient in case of multiple recirc
actions. Each recirc action will have to re-extract
its own flow keys.
This patch update the original flow key as soon as the first recirc
action is encountered, avoiding expensive flow extract call for any
future recirc actions as long as the flow key remains valid.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The ‘recalculate_csum’ is almost always ‘true’. It is false only if
the ipv6 nexthdr is an extension header, and a routing header is
found. For the majority of ipv6 packets this would not be the case,
so this can be marked as 'likely'.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
OVS need to flow key for flow lookup in recic action. OVS
does key extract in recic action. Most of cases we could
use OVS_CB packet key directly and can avoid packet flow key
extract. SET action we can update flow-key along with packet
to keep it consistent. But there are some action like MPLS
pop which forces OVS to do flow-extract. In such cases we
can mark flow key as invalid so that subsequent recirc
action can do full flow extract.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Currently tun_info is used for passing tunnel information
on ingress and egress path, this cause confusion. Following
patch removes its use on ingress path make it egress only parameter.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Recirc action needs to extract flow key from packet, it uses tun_info
from OVS_CB for setting tunnel meta data in flow key. But tun_info
can be overwritten by tunnel send action. This would result in wrong
flow key for the recirculation.
Following patch copies flow-key meta data from OVS_CB packet key
itself thus avoids this bug.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
OVS flow extract is called on packet receive or packet
execute code path. Following patch defines separate API
for extracting flow-key in packet execute code path.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB. This also simplifies upcall handling
for gso packet.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.
Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Fix a bug where skb_clone() NULL check is missing in sample action
implementation.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Refactoring recirc action implementation.
The main change is to fix a bug where the NULL check after skb clone()
call is missing. The fix is to return -ENOMEM whenever skb_clone()
failed to create a clone.
Also rearranged adjacent code to improve readability.
Reported-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
On packet recv OVS CB is initialized in multiple function.
Following patch moves all these initialization to
ovs_vport_receive().
This patch also save a check in execute actions.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets is the same. However,
as additional information is added this is not necessarily desirable,
as in the case of pointers.
This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The sample action is rather generic, allowing arbitrary actions to be
executed based on a probability. However its use, within the Open vSwitch
code-base is limited: only a single user-space action is ever nested.
A consequence of the current implementation of sample actions is that
depending on weather the sample action executed (due to its probability)
any side-effects of nested actions may or may not be present before
executing subsequent actions. This has the potential to complicate
verification of valid actions by the (kernel) datapath. And indeed adding
support for push and pop MPLS actions inside sample actions is one case
where such case.
In order to allow all supported actions to be continue to be nested inside
sample actions without the potential need for complex verification code
this patch changes the implementation of the sample action in the kernel
datapath so that sample actions are more like a function call and any side
effects of nested actions are not present when executing subsequent
actions.
With the above in mind the motivation for this change is twofold:
* To contain side-effects the sample action in the hope of making it
easier to deal with in the future and;
* To avoid some rather complex verification code introduced in the MPLS
datapath patch.
Some notes about the implementation:
* This patch silently changes the behaviour of sample actions whose nested
actions have side-effects. There are no known users of such sample
actions.
* sample() does not clone the skb for the only known use-case of the sample
action: a single nested userspace action. In such a case a clone is not
needed as the userspace action has no side effects.
Given that there are no known users of other nested actions and in order
to avoid the complexity of predicting if other sequences of actions have
side-effects in such cases the skb is cloned.
* As sample() provides a cloned skb in the unlikely case where there are
nested actions other than a single userspace action it is no longer
necessary to clone the skb in do_execute_actions() when executing a
recirculation action just because the keep_skb parameter is set: this
parameter was only set when processing the nested actions of a sample
action. Moreover it is possible to remove the keep_skb parameter of
do_execute_actions entirely.
* As sample() provides either a cloned skb or one that has had a
reference taken (using keep_skb) to do_execute_actions()
the original skb passed to sample() is never consumed. Thus the
caller of sample() (also do_execute_actions()) can use its generic
error handling to free the skb on error.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This patch attempts to ensure that skb(s) are always freed (once)
if if an error occurs in execute_recirc(). It does so in two ways:
1. Ensure that execute_recirc() always consimes skb passed to it.
Specifically, free the skb if the call to ovs_flow_extract() fails.
2. Return from the recirc case in execute_recirc() whenever
the skb has not been cloned immediately above.
This is only the case if the action is both the last action and the
keep_skb parameter of execute_recirc is not true. As it is the last
action and the skb is consumed one way or another by execute_recirc() it
is correct to return here. In particular this avoids the possibility of
the skb, which has been consumed by execute_recirc() from being freed.
Conversely if this is not the case then the skb has been cloned
and the clone has been consumed by execute_recirc().
This leads to three sub-cases:
* If execute_recirc() returned an error code then the original skb
will be freed by the error handling code below the case statement in
do_execute_actions().
* If this is not the last action then action processing will continue,
using the original skb.
* If this is the last action then it must also be the case that keep_skb
is true (otherwise the skb would not have been cloned). Thus
do_execute_actions() will return without freeing the original skb.
Signed-off-by: Simon Horman <horms@verge.net.au>
[jesse: use kfree_skb instead of consume_skb on error path]
Signed-off-by: Jesse Gross <jesse@nicira.com>
Current datapath limits the number of times same packet can loop
through action execution to avoid blowing out the kernel stack.
Recirculation also adds to action execution count, but does not use
the same amount of stack compare to other services, such as IPsec.
This patch introduces the concept of stack cost. Recirculation has a
stack cost of 1 while other services have stack cost of 4. Datapath
packet process can accommodate packets that need both services and
recirculation as long as the total stack cost does not exceed the max
stack cost. Packets exceed the limit will be treated as looped packets
and dropped.
The max stack cost is set to allow up to 4 regular services, plus up
to 3 recirculation. The behavior of packets do not recirculate does
not change.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>