The code that converts netlink attributes to a flow match always
stored TCP and UDP ports in the IPv4 structure. This commit
properly puts TCP and UDP traffic into appropriate IPv4 and IPv6
structures.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
When executing packets sent from userspace, the majority of the
flow information is extracted from the packet itself and a small
amount of metadata supplied by userspace is added. However, when
adding this metadata, the extracted flow information is currently
being cleared.
This manifests in a problem when executing actions as elements of key are
used when verifying some actions. For example a dec_ttl action verifies the
proto of the flow. An example of a flow that fails as a result of this
problem is:
ovs-ofctl add-flow br0 "ip actions=dec_ttl,normal"
This is a regression added by "datapath: Mega flow implementation",
a1c564be1e.
CC: Andy Zhou <azhou@nicira.com>
Reported-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
When flow table is copied, the mask list from the old table
is not properly copied into the new table. The corrupted mask
list in the new table will lead to kernel crash. This patch
fixes this bug.
Bug #18110
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Following patch restructures ovs tunneling and gre vport
implementation to make ovs tunneling more in sync with
upstream kernel tunneling. Doing this tunneling code is
simplified as most of protocol processing on send and
recv is pushed to kernel tunneling. For external ovs
module the code is moved to kernel compatibility code.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
When parsing flow Netlink messages we currently have arrays to hold the
attribute pointers for both values and masks. This results in a large
stack, which some compilers warn about. It's not actually necessary
to have both arrays at the same time, so we can collapse this to a
single array.
Reported-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Add wildcarded flow support in kernel datapath.
Wildcarded flow can improve OVS flow set up performance by avoid sending
matching new flows to the user space program. The exact performance boost
will largely dependent on wildcarded flow hit rate.
In case all new flows hits wildcard flows, the flow set up rate is
within 5% of that of linux bridge module.
Pravin has made significant contributions to this patch. Including API
clean ups and bug fixes.
Co-authored-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
[jesse: Additional documentation, fix memory leak, and improve validation.]
Signed-off-by: Jesse Gross <jesse@nicira.com>
As suggested by Jesse in the comment for patch "gre: Restructure
tunneling", following patch keeps skb->csum correct across ovs.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Since userspace flow based tunneling code is checked in, the kernel
port based tunneling code can be removed.
Patch removes following components:
- tunnel ports hash table and moved tunnel ports list to individual
vports.
- Cleaned per tnl-port config.
- OVS_KEY_ATTR_TUN_ID action is removed.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #15078
This reverts commit 82b0d75509.
This patch introduced bug by calling vfree() from interrupt context.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The switch to flow based tunneling increased the size of each output
action in the flow action list. In extreme cases, this can result
in the action list exceeding the maximum buffer size.
This doubles the maximum buffer size to compensate for the increase
in action size. In the common case, most allocations will be
less than a page and those uses kmalloc. Therefore, for the majority
of situations, this will have no impact.
Bug #15203
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Before this patch, if an LLC/SNAP packet with OUI 00:00:00 had an ethertype
less than 1536 the flow key given to userspace in the upcall would contain the
invalid ethertype (for example, 3). If userspace attempted to insert a kernel
flow for this key it would be rejected by ovs_flow_from_nlattrs.
This patch allows OVS to pass the OFTest pktact.DirectBadLlcPackets.
Signed-off-by: Rich Lane <rlane@bigswitch.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Following patch breaks down single ipv4_tunnel netlink attribute into
individual member attributes. It will help when we extend tunneling
parameters in future.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #14611
When a packet is received on a tunnel the pad member is currently
left uninitialized. This didn't previously cause problems because
userspace didn't interprete the IPV4_TUNNEL attribute and blindly
copied back the uninitialized data. However, now that userspace
knows how to serialize this attribute it was zeroing it out, which
prevented flows that had been previously installed from being
deleted. In addition to zeroing out the padding on packet reception,
it also does the same thing on flow setup since we should be ignoring
the value.
Reported-by: Anand Krishnamurthy <krishnamurt4@wisc.edu>
Reported-by: Saul St. John <sstjohn@cs.wisc.edu>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds support for skb mark matching and set action.
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Tunneling metadata is important enough to move out of struct phy
and handled on its own. This makes it somewhat easier to tell
how well the other structures are packed and also the name shorter.
This also drops the union since it's not needed quite yet. We
can introduce it back when we have support for IPv6 tunneling.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
The names for the flags used by flow based tunneling are pretty long.
This shortens them a little by removing the word FLOW, which is a
distinction that won't be meaningful in the near future.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Tunnel ports now always include full outer IP information, even if
userspace can't understand it. Since our flows our exact match this
information must also be provided when setting up flows. Since flows
with only OVS_KEY_ATTR_TUN_ID keys don't contain all of this information
they can never be hit and we should just reject them at setup time.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
During development it was preferable to keep OVS_KEY_ATTR_IPV4_TUNNEL
in the non-upstream range of identifiers to avoid conflicts or
compatibility issues as it evolved. However, since the intention is
to get it upstream, it makes sense to move it down now to avoid issues
with compatibility when upgrading.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Header caching previously required the ability to maintain the lifetime
of flows across RCU boundaries. However, now that header caching is
gone we can simplfy the code and make it match the upstream version.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
With this commit, OVS will match the data in the RARP packets having
ethertype 0x8035, in the same way as the data in the ARP packets.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
With this commit, the datapath will process the ARP header for
RARP packets. It also fixes a bug whereby if the ARP opcode is
something other than ARP request or reply, the key_len is not
adjusted to include ARP info.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Following patch adds start offset for sw_flow-key, so that we can
skip tunneling information in key for non-tunnel flows.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This is a first pass at providing a tun_key which can be
used as the basis for flow-based tunnelling. The
tun_key includes and replaces the tun_id in both struct
ovs_skb_cb and struct sw_tun_key.
This patch allows all existing tun_id behaviour to still work. Existing
users of tun_id are redirected to tun_key->tun_id to retain compatibility.
However, when the userspace code is updated to make use of the new
tun_key, the old behaviour will be deprecated and removed.
NOTE: With these changes, the tunneling code no longer assumes input and
output keys are symmetric. If they are not, PMTUD needs to be disabled
for tunneling to work.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
[jesse: Additional transformations for code not upstream.]
Signed-off-by: Jesse Gross <jesse@nicira.com>
We currently check that a packet is IPv4 and TCP before fetching the
TCP flags. This enables fetching from IPv6 packets as well.
Bug #10194
Reported-by: Michael Mao <mmao@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
When collecting TCP flags we check that the IP header indicates that
a TCP header is present but not that the packet is actually long
enough to contain the header. This adds a check to prevent reading
off the end of the packet.
In practice, this is only likely to result in reading of bad data and
not a crash due to the presence of struct skb_shared_info at the end
of the packet.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Use hash table to store ports of datapath. Allow 64K ports per switch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #2462
Following patch introduces a timer based event to rehash flow-hash
table. It makes finding collisions difficult to for an attacker.
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
We currently have a version of ipv6_skip_exthdr() which is
identical to the main one with the addition of fragment reporting.
We can propose our version for upstream and then use it directly
without duplication.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OVS has quite a few global symbols that should be scoped with a
prefix to prevent collisions with other modules in the kernel.
Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Some overzealous marking of pointers as __rcu caused sparse to flag
errors.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Many of our kernel copyright messages make reference to code being
copied from the Linux kernel, which is a bit odd for code in the
kernel. This changes them to use the standard GNU GPL boilerplate
instead. It does not change the actual license, which continues to
be GPLv2.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Without this, every VLAN packet goes to userspace because VLAN flows
cannot be set up.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
In the future it is likely that our vlan support will expand to
include multiply tagged packets. When this happens, we would
ideally like for it to be consistent with our current tagging.
Currently, if we receive a packet with a partial VLAN tag we will
automatically drop it in the kernel, which is unique among the
protocols we support. The only other reason to drop a packet is
a memory allocation error. For a doubly tagged packet, we will
parse the first tag and indicate that another tag was present but
do not drop if the second tag is incorrect as we do not parse it.
This changes the behavior of the vlan parser to match other protocols
and also deeper tags by indicating the presence of a broken tag with
the 802.1Q EtherType but no vlan information. This shifts the policy
decision to userspace on whether to drop broken tags and allows us to
uniformly add new levels of tag parsing.
Although additional levels of control are provided to userspace, this
maintains the current behavior of dropping packets with a broken
tag when using the NORMAL action because that is the correct behavior
for an 802.1Q-aware switch. The userspace flow parser actually
already had the new behavior so this corrects an inconsistency.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
When the datapath was converted to use Netlink attributes for describing
flow keys, I had a vague idea of how it could be smoothly extensible, but
I didn't actually implement extensibility or carefully think it through.
This commit adds a document that describes how flow keys can be extended
in a compatible fashion and adapts the existing interface to match what
it says.
This commit doesn't actually implement extensibility. I already have a
separate patch series out for that. This patch series borrows from that
one heavily, but the extensibility series will need to be reworked
somewhat once this one is in.
This commit is only lightly tested because I don't have a good test setup
for VLANs.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This is more conventional use of Netlink.
For upstreaming, 'u64 attrs' can be changed to u32 and the uses of 1ULL
can be changed to 1.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This seems clearer to me. It should not cause any behavioral change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The function flow_metadata_from_nlattrs() is very restrictive
about the ordering and type of metadata attributes that it receives.
This patch will change flow_metadata_from_nlattrs() behavior by
ignoring attributes that it does not understand and allowing them
to be passed in arbitrary order.
Issue #8167
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
IPv6 uses the term "traffic class" for what IPv4 calls
"type-of-service". This commit renames the the "ipv6_tos" field to
"ipv6_tclass" in the "ovs-key_ipv6" struct to be more consistent with
the IPv6 terminology.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Add support matching the IPv4 TTL and IPv6 hop limit fields. This
commit also adds support for modifying the IPv4 TTL. Modifying the IPv6
hop limit isn't currently supported, since we don't support modifying
IPv6 headers.
We will likely want to change the user-space interface, since basic
matching and setting the TTL are not generally useful. We will probably
want the ability to match on extraordinary events (such as TTL of 0 or 1)
and a decrement action.
Feature #8024
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>