2
0
mirror of https://github.com/openvswitch/ovs synced 2025-10-15 14:17:18 +00:00
Commit Graph

295 Commits

Author SHA1 Message Date
Pravin B Shelar
19c64e8691 datapath: stt: Fix device list management.
STT receive can accept packet on device which is not UP state.
Following patch fixes this issue by introducing another list
of devices which contains only devices in up state. This list can
be used for searching stt devices on packet receive.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-21 14:59:04 -08:00
Pravin B Shelar
15a0ca65f3 datapath: stt: Fix error handling in stt_start().
The bug was reported by Joe Stringer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-20 13:20:20 -08:00
Pravin B Shelar
4e00c9858e datapath: stt: Do not access stt_dev socket in lookup.
STT device is added to the device list at device create time. and
the dev socket is initialized when dev is UP. So avoid accessing
stt socket while searching a device.

---8<---
IP: [<ffffffffc0e731fd>] nf_ip_hook+0xfd/0x180 [openvswitch]
Oops: 0000 [#1] PREEMPT SMP
Hardware name: VMware, Inc. VMware Virtual Platform/440BX
RIP: 0010:[<ffffffffc0e731fd>]  [<ffffffffc0e731fd>] nf_ip_hook+0xfd/0x180 [openvswitch]
RSP: 0018:ffff88043fd03cd0  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff8801008e2200 RCX: 0000000000000034
RDX: 0000000000000110 RSI: ffff8801008e2200 RDI: ffff8801533a3880
RBP: ffff88043fd03d00 R08: ffffffff90646d10 R09: ffff880164b27000
R10: 0000000000000003 R11: ffff880155eb9dd8 R12: 0000000000000028
R13: ffff8802283dc580 R14: 00000000000076b4 R15: ffff880013b20000
FS:  00007ff5ba73b700(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 000000037ff96000 CR4: 00000000000007e0
Stack:
 ffff8801533a3890 ffff88043fd03d80 ffffffff90646d10 0000000000000000
 ffff880164b27000 ffff8801008e2200 ffff88043fd03d48 ffffffff9064050a
 ffffffff90d0f930 ffffffffc0e7ef80 0000000000000001 ffff8801008e2200
Call Trace:
 <IRQ>
 [<ffffffff9064050a>] nf_iterate+0x9a/0xb0
 [<ffffffff9064059c>] nf_hook_slow+0x7c/0x120
 [<ffffffff906470f3>] ip_local_deliver+0x73/0x80
 [<ffffffff90646a3d>] ip_rcv_finish+0x7d/0x350
 [<ffffffff90647398>] ip_rcv+0x298/0x3d0
 [<ffffffff9060fc56>] __netif_receive_skb_core+0x696/0x880
 [<ffffffff9060fe58>] __netif_receive_skb+0x18/0x60
 [<ffffffff90610b3e>] process_backlog+0xae/0x180
 [<ffffffff906102c2>] net_rx_action+0x152/0x270
 [<ffffffff9006d625>] __do_softirq+0xf5/0x320
 [<ffffffff9071d15c>] do_softirq_own_stack+0x1c/0x30

Reported-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Joe Stringer <joe@ovn.org>
2015-12-20 13:20:15 -08:00
Joe Stringer
f2e11497cd compat: Always use own __ipv6_select_ident().
If the ip fragmentation backport is enabled, we should always use our
own {,__}ipv6_select_ident(). This fixes the following issue on some
v3.19 kernels:

datapath/linux/ip6_output.c:93:12: error: conflicting types for
‘__ipv6_select_ident’
 static u32 __ipv6_select_ident(struct net *net, u32 hashrnd,

Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
2015-12-18 17:27:11 -08:00
Pravin B Shelar
8ab1170b06 datapath: stt: Use RCU API to update stt-dev list.
Following crash was reported for STT tunnel. I am not able to reproduce
it, But the usage of wrong list manipulation API is likely culprit.

---8<---
IP: [<ffffffffc0e731fd>] nf_ip_hook+0xfd/0x180 [openvswitch]
Oops: 0000 [#1] PREEMPT SMP
Hardware name: VMware, Inc. VMware Virtual Platform/440BX
RIP: 0010:[<ffffffffc0e731fd>]  [<ffffffffc0e731fd>] nf_ip_hook+0xfd/0x180 [openvswitch]
RSP: 0018:ffff88043fd03cd0  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff8801008e2200 RCX: 0000000000000034
RDX: 0000000000000110 RSI: ffff8801008e2200 RDI: ffff8801533a3880
RBP: ffff88043fd03d00 R08: ffffffff90646d10 R09: ffff880164b27000
R10: 0000000000000003 R11: ffff880155eb9dd8 R12: 0000000000000028
R13: ffff8802283dc580 R14: 00000000000076b4 R15: ffff880013b20000
FS:  00007ff5ba73b700(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 000000037ff96000 CR4: 00000000000007e0
Stack:
 ffff8801533a3890 ffff88043fd03d80 ffffffff90646d10 0000000000000000
 ffff880164b27000 ffff8801008e2200 ffff88043fd03d48 ffffffff9064050a
 ffffffff90d0f930 ffffffffc0e7ef80 0000000000000001 ffff8801008e2200
Call Trace:
 <IRQ>
 [<ffffffff90646d10>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff9064050a>] nf_iterate+0x9a/0xb0
 [<ffffffff90646d10>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff9064059c>] nf_hook_slow+0x7c/0x120
 [<ffffffff90646d10>] ? ip_rcv_finish+0x350/0x350
 [<ffffffff906470f3>] ip_local_deliver+0x73/0x80
 [<ffffffff90646a3d>] ip_rcv_finish+0x7d/0x350
 [<ffffffff90647398>] ip_rcv+0x298/0x3d0
 [<ffffffff9060fc56>] __netif_receive_skb_core+0x696/0x880
 [<ffffffff9060fe58>] __netif_receive_skb+0x18/0x60
 [<ffffffff90610b3e>] process_backlog+0xae/0x180
 [<ffffffff906102c2>] net_rx_action+0x152/0x270
 [<ffffffff9006d625>] __do_softirq+0xf5/0x320
 [<ffffffff9071d15c>] do_softirq_own_stack+0x1c/0x30

Reported-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-17 15:33:10 -08:00
Pravin B Shelar
7842e2571f datapath: compat: Block upstream ip_tunnels functions.
Since upstream and compat ip_tunnel structures are not same, we can not
use exported upstream functions.
Following patch blocks definitions which used ip_tunnel internal
structure. Function which do not depend on these structures are
allows by explicitly by defining it in the header files. e.g.
iptunnel_handle_offloads(), iptunnel_pull_header(). etc.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-11 13:49:47 -08:00
Pravin B Shelar
0346941968 datapath: define compat ip_tunnel_get_link_net()
Same as ip_tunnel_get_iflink(), function ip_tunnel_get_link_net()
also depends on ip_tunnel structure. So this patch defines
compat implementation for same.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-11 13:49:47 -08:00
Pravin B Shelar
00d662ba54 datapath: define compat ip_tunnel_get_iflink()
ip_tunnel_get_iflink() depends on ip_tunnel structure. But OVS
compat layer defines its own ip_tunnel structure which is not
compatible with all upstream kernel versions. Therefore we
can no use such function.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-11 13:49:47 -08:00
Pravin B Shelar
c182a3676e datapath: Backport: skbuff: Fix skb checksum partial check.
This bug fix is not required for OVS use cases. But is it
nice to keep function consistent with upstream implementation.

Upstream commit:

    Earlier patch 6ae459bda tried to detect void ckecksum partial
    skb by comparing pull length to checksum offset. But it does
    not work for all cases since checksum-offset depends on
    updates to skb->data.

    Following patch fixes it by validating checksum start offset
    after skb-data pointer is updated. Negative value of checksum
    offset start means there is no need to checksum.

    Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
    Reported-by: Andrew Vagin <avagin@odin.com>
    Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: 31b33dfb0a1 ("skbuff: Fix skb checksum partial check");
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-10 20:34:43 -08:00
Pravin B Shelar
68fc345151 datapath: Fix STT packet receive handling.
STT reassembly can generate list of packets. But it was
handled as a single skb. Following patch fixes it.

Fixes: e23775f20 ("datapath: Add support for lwtunnel").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Joe Stringer <joe@ovn.org>
2015-12-10 20:34:00 -08:00
Joe Stringer
b41e3e99d4 datapath: Define nf_connlabels_{put,get}.
Previously this was only done when connlabels were enabled in the kernel
config, even if the functions didn't exist. Fix the compile error.

Fixes: d70a6ff5d40d ("datapath: Define nf_connlabels_{put,get}.")
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-10 16:13:54 -08:00
Pravin B Shelar
839c723379 datapath: Backport: vxlan: interpret IP headers for ECN correctly
Upstream commit:
    When looking for outer IP header, use the actual socket address family, not
    the address family of the default destination which is not set for metadata
    based interfaces (and doesn't have to match the address family of the
    received packet even if it was set).

    Fix also the misleading comment.

    Signed-off-by: Jiri Benc <jbenc@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: ce212d0f6f5 ("vxlan: interpret IP headers for ECN correctly")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-08 09:48:31 -08:00
Pravin B Shelar
9e066fe189 datapath: Backport: vxlan: fix incorrect RCO bit in VXLAN header
Upstream commit:
    Commit 3511494ce2f3d ("vxlan: Group Policy extension") changed definition of
    VXLAN_HF_RCO from 0x00200000 to BIT(24). This is obviously incorrect. It's
    also in violation with the RFC draft.

    Fixes: 3511494ce2f3d ("vxlan: Group Policy extension")
    Cc: Thomas Graf <tgraf@suug.ch>
    Cc: Tom Herbert <therbert@google.com>
    Signed-off-by: Jiri Benc <jbenc@redhat.com>
    Acked-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: c5fb8caaf91 ("vxlan: fix incorrect RCO bit in VXLAN header")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-08 09:48:27 -08:00
Joe Stringer
c05e20946d datapath: Backport conntrack fixes.
Backport the following fixes for conntrack from upstream.

9723e6abc70a openswitch: fix typo CONFIG_NF_CONNTRACK_LABEL
0d5cdef8d5dd openvswitch: Fix conntrack compilation without mark.
982b52700482 openvswitch: Fix mask generation for nested attributes.
cc5706056baa openvswitch: Fix IPv6 exthdr handling with ct helpers.
33db4125ec74 openvswitch: Rename LABEL->LABELS
b8f2257069f1 openvswitch: Fix skb leak in ovs_fragment()
ec0d043d05e6 openvswitch: Ensure flow is valid before executing ct
6f225952461b openvswitch: Reject ct_state unsupported bits
fbccce5965a5 openvswitch: Extend ct_state match field to 32 bits
ab38a7b5a449 openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT
9e384715e9e7 openvswitch: Reject ct_state masks for unknown bits
4f0909ee3d8e openvswitch: Mark connections new when not confirmed.
e754ec69ab69 openvswitch: Serialize nested ct actions if provided
74c16618137f openvswitch: Fix double-free on ip_defrag() errors
6f5cadee44d8 openvswitch: Fix skb leak using IPv6 defrag

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:25 -08:00
Joe Stringer
a94ebc3999 datapath: Add conntrack action
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

Upstream: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:25 -08:00
Joe Stringer
73b09aff14 compat: Backport IPv6 reassembly.
Backport IPv6 fragment reassembly from upstream commits in the Linux 4.3
development tree.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:24 -08:00
Joe Stringer
2d3ef70b74 compat: Backport IPv6 fragmentation.
IPv6 fragmentation functionality is not exported by most kernels, so
backport this code from the upstream 4.3 development tree.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:24 -08:00
Joe Stringer
595e069a06 compat: Backport IPv4 reassembly.
Backport IPv4 reassembly from the upstream commit caaecdd3d3f8 ("inet:
frags: remove INET_FRAG_EVICTED and use list_evictor for the test").

This is necessary because kernels prior to upstream commit d6b915e29f4a
("ip_fragment: don't forward defragmented DF packet") would not always
track the maximum received unit size during ip_defrag(). Without the
MRU, refragmentation cannot occur so reassembled packets are dropped.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Joe Stringer
213e1f54b4 compat: Wrap IPv4 fragmentation.
Most kernels provide some form of ip fragmentation. However, until
recently many of them would always send ICMP responses for over_MTU
packets, even when operating in bridge mode. Backport the check to
ensure this doesn't occur.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Joe Stringer
cfda4537ec compat: Backport ip_skb_dst_mtu().
>From upstream f87c10a8aa1e ("ipv4: introduce ip_dst_mtu_maybe_forward
and protect forwarding path against pmtu spoofing")

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Joe Stringer
3f506f0761 compat: Backport dev_recursion_level().
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Joe Stringer
f000c8793d compat: Backport prandom_u32_max().
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Joe Stringer
0d03e51ca9 compat: Backport 'dst' functions.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Joe Stringer
b8cce81fa9 compat: Backport nf_connlabels_{get, put}().
This is a partial backport of Linux commit 86ca02e77408
"netfilter: connlabels: Export setting connlabel length".

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:15 -08:00
Joe Stringer
057772cf24 compat: Backport nf_ct_tmpl_alloc().
Loosely based upon Linux commit 0838aa7fcfcd "netfilter: fix netns
dependencies with conntrack templates" and commit 5e8018fc6142
"netfilter: nf_conntrack: add efficient mark to zone mapping".

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:15 -08:00
Joe Stringer
420c9726b6 compat: Backport conntrack zones headers.
Loosely based upon Linux commit 308ac9143ee2 "netfilter: nf_conntrack:
push zone object into functions".

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:15 -08:00
Pravin B Shelar
e23775f20e datapath: Add support for lwtunnel
Following patch adds support for lwtunnel to OVS datapath.
With this change OVS datapath detect lwtunnel support and
make use of new APIs if available. On older kernel where the
support is not there the backported tunnel modules are used.
These backported tunnel devices acts as lwtunnel devices.
I tried to keep backported module same as upstream for easier
bug-fix backport. Since STT and LISP are not upstream OVS
always needs to use respective modules from tunnel compat layer.
To make it work on kernel 4.3 I have converted STT and LISP
modules to lwtunnel API model.

lwtunnel make use of skb-dst to pass tunnel information to the
tunnel module. On older kernel this is not possible. So the in
case of old kernel metadata ref is stored in OVS_CB and direct
call to tunnel transmit function is made by respective tunnel
vport modules. Similarly on receive side tunnel recv directly
call netdev-vport-receive to pass the skb to OVS.

Major backported components include:
Geneve, GRE, VXLAN, ip_tunnel, udp-tunnels GRO.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-03 16:30:21 -08:00
Jiri Benc
ffe4c74f93 tunneling: extend flow_tnl with ipv6 addresses
Note that because there's been no prerequisite on the outer protocol,
we cannot add it now. Instead, treat the ipv4 and ipv6 dst fields in the way
that either both are null, or at most one of them is non-null.

[cascardo: abstract testing either dst with flow_tnl_dst_is_set]
cascardo: using IPv4-mapped address is an exercise for the future, since this
would require special handling of MFF_TUN_SRC and MFF_TUN_DST and OpenFlow
messages.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-30 10:31:35 -08:00
Jarno Rajahalme
9ac0aadab9 conntrack: Add support for NAT.
Extend OVS conntrack interface to cover NAT.  New nested NAT action
may be included with a CT action.  A bare NAT action only mangles
existing connections.  If a NAT action with src or dst range attribute
is included, new (non-committed) connections are mangled according to
the NAT attributes.

This work extends on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2015-11-25 16:04:59 -08:00
Zang MingJie
1889999fd9 datapath: Fix vxlan udp csum of gso packet
Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
2015-11-23 11:18:11 -08:00
Joe Stringer
edc3afe2f1 compat: Explicitly include net/ip.h in net/udp.h.
The inet_get_local_port_range() function is defined as a 3-parameter
version in the backported net/ip.h, however some versions of RHEL7
kernel use the 2-parameter version in their net/udp.h header. We need to
make sure that our net/ip.h is first included, then undef our overriding
3-parameter version, include the system net/udp.h, then redefine our
overriding 3-parameter version so that it may be used inside OVS code.

This header needs to include net/ip.h here as some files may not include
it prior to net/udp.h, in which case the logic we have to define the
right version while including the system net/udp.h will not work.

Specifically this fixes issues on kernel 3.10.0-229.7.2.el7.x86_64
(perhaps earlier as well; some later versions make this unnecessary).

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-11-23 10:59:16 -08:00
Andy Zhou
27130224cd dpif-netlink: Allow MRU packet attribute.
User space now may receive re-assembled IP fragments. The user space
netlink handler can now accept packets with the new OVS_PACKET_ATTR_MRU
attribute. This allows the kernel to assemble fragmented packets for the
duration of OpenFlow processing, then re-fragment at output time. Most
notably this occurs for packets that are sent through the connection
tracker.

Note that the MRU attribute is not exported at the OpenFlow layer. As
such, if packets are reassembled by conntrack and subsequently sent to
the controller, then OVS has no way to re-serialize the packets to their
original size.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:16 -07:00
Joe Stringer
d787ad39b8 Add support for connection tracking helper/ALGs.
This patch adds support for specifying a "helper" or ALG to assist
connection tracking for protocols that consist of multiple streams.
Initially, only support for FTP is included.

Below is an example set of flows to allow FTP control connections from
port 1->2 to establish active data connections in the reverse direction:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(alg=ftp,commit),2
    table=0,in_port=2,tcp,ct_state=-trk,action=ct(table=1)
    table=1,in_port=2,tcp,ct_state=+trk+est,action=1
    table=1,in_port=2,tcp,ct_state=+trk+rel,action=ct(commit),1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:16 -07:00
Joe Stringer
9daf23484f Add connection tracking label support.
This patch adds a new 128-bit metadata field to the connection tracking
interface. When a label is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_label" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a label with
those connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:16 -07:00
Joe Stringer
8e53fe8cf7 Add connection tracking mark support.
This patch adds a new 32-bit metadata field to the connection tracking
interface. When a mark is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_mark" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a mark with those
connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00
Joe Stringer
07659514c3 Add support for connection tracking.
This patch adds a new action and fields to OVS that allow connection
tracking to be performed. This support works in conjunction with the
Linux kernel support merged into the Linux-4.3 development cycle.

Packets have two possible states with respect to connection tracking:
Untracked packets have not previously passed through the connection
tracker, while tracked packets have previously been through the
connection tracker. For OpenFlow pipeline processing, untracked packets
can become tracked, and they will remain tracked until the end of the
pipeline. Tracked packets cannot become untracked.

Connections can be unknown, uncommitted, or committed. Packets which are
untracked have unknown connection state. To know the connection state,
the packet must become tracked. Uncommitted connections have no
connection state stored about them, so it is only possible for the
connection tracker to identify whether they are a new connection or
whether they are invalid. Committed connections have connection state
stored beyond the lifetime of the packet, which allows later packets in
the same connection to be identified as part of the same established
connection, or related to an existing connection - for instance ICMP
error responses.

The new 'ct' action transitions the packet from "untracked" to
"tracked" by sending this flow through the connection tracker.
The following parameters are supported initally:

- "commit": When commit is executed, the connection moves from
  uncommitted state to committed state. This signals that information
  about the connection should be stored beyond the lifetime of the
  packet within the pipeline. This allows future packets in the same
  connection to be recognized as part of the same "established" (est)
  connection, as well as identifying packets in the reply (rpl)
  direction, or packets related to an existing connection (rel).
- "zone=[u16|NXM]": Perform connection tracking in the zone specified.
  Each zone is an independent connection tracking context. When the
  "commit" parameter is used, the connection will only be committed in
  the specified zone, and not in other zones. This is 0 by default.
- "table=NUMBER": Fork pipeline processing in two. The original instance
  of the packet will continue processing the current actions list as an
  untracked packet. An additional instance of the packet will be sent to
  the connection tracker, which will be re-injected into the OpenFlow
  pipeline to resume processing in the specified table, with the
  ct_state and other ct match fields set. If the table is not specified,
  then the packet is submitted to the connection tracker, but the
  pipeline does not fork and the ct match fields are not populated. It
  is strongly recommended to specify a table later than the current
  table to prevent loops.

When the "table" option is used, the packet that continues processing in
the specified table will have the ct_state populated. The ct_state may
have any of the following flags set:

- Tracked (trk): Connection tracking has occurred.
- Reply (rpl): The flow is in the reply direction.
- Invalid (inv): The connection tracker couldn't identify the connection.
- New (new): This is the beginning of a new connection.
- Established (est): This is part of an already existing connection.
- Related (rel): This connection is related to an existing connection.

For more information, consult the ovs-ofctl(8) man pages.

Below is a simple example flow table to allow outbound TCP traffic from
port 1 and drop traffic from port 2 that was not initiated by port 1:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
    table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
    table=1,in_port=2,ct_state=+trk+est,tcp,action=1
    table=1,in_port=2,ct_state=+trk+new,tcp,action=drop

Based on original design by Justin Pettit, contributions from Thomas
Graf and Daniele Di Proietto.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00
Pravin B Shelar
ccb75a285b datapath: Fix compilation on kernel 2.6.32
Fixes following compilation error:

CC [M]  /home/travis/build/openvswitch/ovs/datapath/linux/actions.o

In file included from
/home/travis/build/openvswitch/ovs/datapath/linux/actions.c:21:0:

/home/travis/build/openvswitch/ovs/datapath/linux/compat/include/linux/skbuff.h:
In function ‘rpl_skb_postpull_rcsum’:

/home/travis/build/openvswitch/ovs/datapath/linux/compat/include/linux/skbuff.h:384:4:
error: implicit declaration of function ‘skb_checksum_start_offset’
[-Werror=implicit-function-declaration]

cc1: some warnings being treated as errors

Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2015-10-09 14:23:21 -07:00
Pravin B Shelar
a2cf752478 datapath: Backport "skbuff: Fix skb checksum flag on skb pull"
Upstream commit:

    VXLAN device can receive skb with checksum partial. But the checksum
    offset could be in outer header which is pulled on receive. This results
    in negative checksum offset for the skb. Such skb can cause the assert
    failure in skb_checksum_help(). Following patch fixes the bug by setting
    checksum-none while pulling outer header.

    Following is the kernel panic msg from old kernel hitting the bug.

    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:1906!
    RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
    Call Trace:
    <IRQ>
    [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
    [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
    [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
    [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
    [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
    [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
    [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
    [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
    [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
    [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
    [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
    [<ffffffff81550128>] ip_local_deliver+0x88/0x90
    [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
    [<ffffffff81550365>] ip_rcv+0x235/0x300
    [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
    [<ffffffff8151c360>] netif_receive_skb+0x80/0x90
    [<ffffffff81459935>] virtnet_poll+0x555/0x6f0
    [<ffffffff8151cd04>] net_rx_action+0x134/0x290
    [<ffffffff810683d8>] __do_softirq+0xa8/0x210
    [<ffffffff8162fe6c>] call_softirq+0x1c/0x30
    [<ffffffff810161a5>] do_softirq+0x65/0xa0
    [<ffffffff810687be>] irq_exit+0x8e/0xb0
    [<ffffffff81630733>] do_IRQ+0x63/0xe0
    [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e

    Reported-by: Anupam Chanda <achanda@vmware.com>
    Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
    Acked-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: 6ae459bdaae ("skbuff: Fix skb checksum flag on skb pull")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-09-25 19:50:02 -07:00
Pravin B Shelar
fdce83a304 datapath: Add support for 4.2 kernel. 2015-09-23 19:41:19 -07:00
Pravin B Shelar
ec96e66376 datapath: Fix compilation on kernel 3.18
Fixes following compilation error:
In file included from ovs/datapath/linux/actions.c:30: ovs/datapath/linux/compat/include/linux/if_vlan.h:65:
error: redefinition of ‘__vlan_hwaccel_push_inside’ include/linux/if_vlan.h:353: note: previous definition of
‘__vlan_hwaccel_push_inside’ was here ovs/datapath/linux/compat/include/linux/if_vlan.h:83:

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-09-22 14:21:25 -07:00
Joe Stringer
c0cddcec39 datapath: Add support for 4.1 kernel.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-09-18 13:27:24 -07:00
Jiri Benc
dd693f9bf2 datapath: Use netlink ipv4 API to handle the ipv4 addr attributes.
upstream: ("netlink: implement nla_put_in_addr and nla_put_in6_addr")
upstream: ("netlink: implement nla_get_in_addr and nla_get_in6_addr")
IP addresses are often stored in netlink attributes. Add generic functions
to do that.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-09-14 10:44:57 -07:00
Jason Kölker
554daf066b datapath: Add net/ip6_checksum.h to stt.c
`csum_ipv6_magic` is an asm inline on most platforms. However if it is
not defined (like on ppc64le) including <net/ip6_checksum.h> will fall
back to the c implementation by wrapping it in an
`#ifndef _HAVE_ARCH_IPV6_CSUM`.

Signed-off-by: Jason Kölker <jason@koelker.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-09-02 15:46:50 -07:00
Flavio Leitner
572e54faff datapath: check for rx handler register
Red Hat Enterprise Linux 6 has backported the netdev RX
handler facility so use the netdev_rx_handler_register as
an indicator.

The handler prototype changed between 2.6.36 and 2.6.39
since there could be backports in any stage, don't look
at the kernel version, but at the prototype.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-08-31 12:48:16 -07:00
Flavio Leitner
21a719d658 datapath: check for el6 kernels for per_cpu
The OVS hook has been backported so it doesn't work to
decide per_cpu work arounds.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-08-28 09:47:54 -07:00
Flavio Leitner
920fc09189 datapath: check for backported ip_is_fragment
Red Hat Enterprise Linux 6 has backported it from upstream,
so check for ip_is_fragment instead of kernel version.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-08-28 09:45:34 -07:00
Flavio Leitner
46d69d187e datapath: check for backported proto_ports_offset
Red Hat Enterprise Linux 6 has backported it from upstream,
so check for proto_ports_offset instead of kernel version.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-08-28 09:44:48 -07:00
Pravin B Shelar
99e7b07740 tunneling: Remove gre64 tunnel support.
GRE64 was introduced to extend gre key from 32-bit to 64-bit using
gre-key and sequence number field. But GRE64 is not standard
protocol. There are not many users of this protocol. Therefore we
have decided to remove it.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-08-20 13:01:58 -07:00
Joe Stringer
0d32b0ece3 datapath: Backport eth_proto_is_802_3().
Backport of upstream commit 2c7a88c252bf3381958cf716f31b6b2e0f2f3fa7:
"etherdev: Fix sparse error, make test usable by other functions"

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-07-30 16:42:07 -07:00
Neil McKee
0e469d3b38 datapath: Include datapath actions with sampled-packet upcall to userspace.
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.

This Directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.

Adding path information enhances visibility into complex virtual
networks.

Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-07-17 13:23:39 -07:00