mirror of https://github.com/openvswitch/ovs synced 2025-10-15 14:17:18 +00:00
482 Commits

Author SHA1 Message Date
Pravin B Shelar
a6a8674dcd datapath: backport: openvswitch: allow management from inside user namespaces
Upstream commit:
    commit 4a92602aa1cd5bbaeedbd9536ff992f7d26fe9d1
    Author: Tycho Andersen <tycho.andersen@canonical.com>

    openvswitch: allow management from inside user namespaces

    Operations with the GENL_ADMIN_PERM flag fail permissions checks because
    this flag means we call netlink_capable, which uses the init user ns.

    Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
    which should be allowed inside a user namespace.

    The motivation for this is to be able to run openvswitch in unprivileged
    containers. I've tested this and it seems to work, but I really have no
    idea about the security consequences of this patch, so thoughts would be
    much appreciated.

    v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
    v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
        massive one
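
The two separate checks described in the v3 note can be modelled in
userspace roughly as follows. This is only a sketch: the flag values,
struct caller and check_perm() are illustrative names invented here, not
the kernel's genetlink code, and the two booleans stand in for the
capability checks against the init user namespace and the caller's own
user namespace.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values; assumptions for this sketch only. */
#define ADMIN_PERM     0x01u
#define UNS_ADMIN_PERM 0x10u

/* Hypothetical stand-in for the result of the two capability checks. */
struct caller {
    bool capable_in_init_ns; /* CAP_NET_ADMIN in the init user namespace */
    bool capable_in_own_ns;  /* CAP_NET_ADMIN in the caller's user namespace */
};

/* One "if" per flag, as the v3 note describes. Returns 0 or -EPERM. */
static int check_perm(unsigned int op_flags, const struct caller *c)
{
    if ((op_flags & ADMIN_PERM) && !c->capable_in_init_ns)
        return -EPERM;

    if ((op_flags & UNS_ADMIN_PERM) && !c->capable_in_own_ns)
        return -EPERM;

    return 0;
}
```

With this model, a caller that is privileged only inside its own user
namespace is refused for an ADMIN_PERM operation but allowed for a
UNS_ADMIN_PERM one.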

    Reported-by: James Page <james.page@canonical.com>
    Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
    CC: Eric Biederman <ebiederm@xmission.com>
    CC: Pravin Shelar <pshelar@ovn.org>
    CC: Justin Pettit <jpettit@ovn.org>
    CC: "David S. Miller" <davem@davemloft.net>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-07-17 10:25:09 -07:00
Pravin B Shelar
aad7cb91ef datapath: compat: Refactor egress tunnel info
Upstream, tunnel egress info is retrieved using ndo_fill_metadata_dst.
Since older kernels do not have it, we need to keep the vport operation
that does the same on those kernels.
This patch merges the two operations into one to avoid code
duplication.
This commit backports fc4099f1 ("openvswitch:
Fix egress tunnel info.")

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-07-08 19:27:48 -07:00
William Tu
039fb36c96 datapath:backport: openvswitch: Add packet len info to upcall.
Upstream commit:
    commit b95e5928fcc76d156352570858abdea7b2628efd
    Author: William Tu <u9012063@gmail.com>
    Date:   Mon Jun 20 07:26:17 2016 -0700

    The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.")
    introduces packet truncation before sending to userspace upcall receiver.
    This patch passes up the skb->len before truncation so that the upcall
    receiver knows the original packet size. Potentially this will be used
    by sFlow, where OVS translates sFlow config header=N to a sample action,
    truncating packet to N byte in kernel datapath. Thus, only N bytes instead
    of full-packet size is copied from kernel to userspace, saving the
    kernel-to-userspace bandwidth.

    Signed-off-by: William Tu <u9012063@gmail.com>
    Cc: Pravin Shelar <pshelar@nicira.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140135299
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 16:13:11 -07:00
William Tu
4c7804f14b datapath:backport: openvswitch: Add packet truncation support.
Upstream commit:
    commit f2a4d086ed4c588d32fe9b7aa67fead7280e7bf1
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Jun 10 11:49:33 2016 -0700

    openvswitch: Add packet truncation support.

    The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
    truncate packets. A 'max_len' is added for setting up the maximum
    packet size, and a 'cutlen' field is to record the number of bytes
    to trim the packet when the packet is outputting to a port, or when
    the packet is sent to userspace.
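
The relationship between 'max_len' and 'cutlen' can be sketched as plain
arithmetic. The helper names below are made up for illustration and this
is not the datapath code; it only assumes that 'cutlen' is the number of
bytes by which the packet exceeds 'max_len'.

```c
#include <stdint.h>

/* Bytes to trim so that the packet is at most max_len bytes long. */
static uint32_t trunc_cutlen(uint32_t pkt_len, uint32_t max_len)
{
    return pkt_len > max_len ? pkt_len - max_len : 0;
}

/* Length actually output or sent to userspace after trimming. */
static uint32_t trunc_output_len(uint32_t pkt_len, uint32_t max_len)
{
    return pkt_len - trunc_cutlen(pkt_len, max_len);
}
```

Packets shorter than 'max_len' get a 'cutlen' of zero and pass through
unchanged.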

    Signed-off-by: William Tu <u9012063@gmail.com>
    Cc: Pravin Shelar <pshelar@nicira.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 09:17:00 -07:00
Pravin B Shelar
8063e09587 datapath: Drop support for kernel older than 3.10
Currently the OVS out-of-tree datapath supports a large number of kernel
versions, from 2.6.32 to 4.3, plus various distribution-specific
kernels. But at this point major features are only available on more
recent kernels.  For example, stateful services are only available
starting in kernel 3.10 and STT is available starting with 3.5.

Since these features are becoming essential to many OVS deployments,
and the effort of maintaining the backports is high, we have decided
to drop support for older kernels. This patch drops support
for kernels older than 3.10.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-03-14 09:53:51 -07:00
Joe Stringer
038e34abaa datapath: Allow matching on conntrack label
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.

Upstream: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:25 -08:00
Joe Stringer
a94ebc3999 datapath: Add conntrack action
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
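
As a rough illustration of how the ct_state bits combine, the sketch
below tests a state byte against the flag values listed in this message.
The CS_F_* names, ct_state_has() and ct_state_allows() are hypothetical
and exist only for this example, and the policy shown is an assumption,
not something the datapath enforces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the list in this commit message. */
enum {
    CS_F_NEW         = 0x01,
    CS_F_ESTABLISHED = 0x02,
    CS_F_RELATED     = 0x04,
    CS_F_INVALID     = 0x20,
    CS_F_REPLY_DIR   = 0x40,
    CS_F_TRACKED     = 0x80,
};

/* True if every bit in 'want' is set in 'ct_state'. */
static bool ct_state_has(uint8_t ct_state, uint8_t want)
{
    return (ct_state & want) == want;
}

/* Example policy: accept only tracked, valid packets that belong to
 * an established or related connection. */
static bool ct_state_allows(uint8_t ct_state)
{
    if (!ct_state_has(ct_state, CS_F_TRACKED))
        return false;
    if (ct_state & CS_F_INVALID)
        return false;
    return (ct_state & (CS_F_ESTABLISHED | CS_F_RELATED)) != 0;
}
```

A packet that has not yet been through the CT action has no TRACKED bit
set, so a policy like this one rejects it regardless of its other bits.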

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

Upstream: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:25 -08:00
Joe Stringer
c3bb15b38a datapath: Serialize acts with original netlink len
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However, the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.

Upstream: 8e2fed1 "openvswitch: Serialize acts with original netlink len"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:17:25 -08:00
Joe Stringer
595e069a06 compat: Backport IPv4 reassembly.
Backport IPv4 reassembly from the upstream commit caaecdd3d3f8 ("inet:
frags: remove INET_FRAG_EVICTED and use list_evictor for the test").

This is necessary because kernels prior to upstream commit d6b915e29f4a
("ip_fragment: don't forward defragmented DF packet") would not always
track the maximum received unit size during ip_defrag(). Without the
MRU, refragmentation cannot occur so reassembled packets are dropped.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-12-03 17:08:16 -08:00
Pravin B Shelar
e23775f20e datapath: Add support for lwtunnel
This patch adds lwtunnel support to the OVS datapath.
With this change the OVS datapath detects lwtunnel support and
makes use of the new APIs if available. On older kernels where the
support is not there, the backported tunnel modules are used.
These backported tunnel devices act as lwtunnel devices.
I tried to keep the backported modules the same as upstream for easier
bug-fix backports. Since STT and LISP are not upstream, OVS
always needs to use the respective modules from the tunnel compat layer.
To make it work on kernel 4.3 I have converted the STT and LISP
modules to the lwtunnel API model.

lwtunnel makes use of skb-dst to pass tunnel information to the
tunnel module. On older kernels this is not possible, so in the
case of an old kernel the metadata ref is stored in OVS_CB and a direct
call to the tunnel transmit function is made by the respective tunnel
vport modules. Similarly, on the receive side the tunnel recv directly
calls netdev-vport-receive to pass the skb to OVS.

Major backported components include:
Geneve, GRE, VXLAN, ip_tunnel, udp-tunnels GRO.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2015-12-03 16:30:21 -08:00
Jesse Gross
ad4adec2a3 datapath: Backport "openvswitch: Zero flows on allocation."
Upstream commit:
    openvswitch: Zero flows on allocation.

    When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targeting per-packet flow operations.
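
The leak described above can be reproduced in miniature. In this sketch,
KEY_LEN and mask_key() are invented for illustration and the key is just
a byte array; it is not the datapath's flow key code. With the range
optimization, bytes outside the masked range keep whatever the
destination buffer already held. Masking the full key forces them to
zero.

```c
#include <stdbool.h>
#include <stddef.h>

#define KEY_LEN 16 /* illustrative key size */

/* Writes src & mask into dst. When 'full' is false only bytes in
 * [start, end) are written and the rest of dst is left untouched,
 * which mirrors the optimization. When 'full' is true the whole key
 * is masked, so bytes with a zero mask become zero. */
static void mask_key(unsigned char *dst, const unsigned char *src,
                     const unsigned char *mask,
                     size_t start, size_t end, bool full)
{
    if (full) {
        start = 0;
        end = KEY_LEN;
    }

    for (size_t i = start; i < end; i++)
        dst[i] = src[i] & mask[i];
}
```

If dst starts out holding stale data, the partial variant would serialize
that stale data back to userspace, while the full variant would not.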

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross <jesse@nicira.com>
    Acked-by: Pravin B Shelar <pshelar@nicira.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: ae5f2fb1 ("openvswitch: Zero flows on allocation.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-09-23 19:49:27 -07:00
Joe Stringer
c0cddcec39 datapath: Add support for 4.1 kernel.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-09-18 13:27:24 -07:00
Pravin B Shelar
18fd3a521b datapath: Revert "datapath: Constify netlink structs."
This reverts commit 2023bdcfc4.
This commit is causing segfaults when genl compat code is in use.

The compat code updates genl_multicast_group and genl_family type objects.
Therefore these cannot be const.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2015-08-05 10:44:04 -07:00
Alexander Duyck
935fc58209 datapath: Use eth_proto_is_802_3.
Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto).
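
For readers unfamiliar with the check being replaced, here is a
userspace sketch of the open-coded comparison. PROTO_802_3_MIN and
proto_at_least_802_3_min() are names made up for this example. The
value 0x0600 matches the kernel's ETH_P_802_3_MIN, and the kernel helper
itself may be implemented differently.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as ETH_P_802_3_MIN; renamed to keep the sketch standalone. */
#define PROTO_802_3_MIN 0x0600

/* 'proto' is in network byte order, as it appears in the frame.
 * Values at or above the threshold are EtherTypes; smaller values
 * are 802.3 length fields. */
static bool proto_at_least_802_3_min(uint16_t proto)
{
    return ntohs(proto) >= PROTO_802_3_MIN;
}
```

Wrapping the comparison in a single helper keeps the byte-order handling
in one place instead of repeating it at every call site.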

Backport of upstream commit 6713fc9b8fa33444aa000f0f31076f6a859ccb34:
"openvswitch: Use eth_proto_is_802_3"

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-07-30 16:42:07 -07:00
Joe Stringer
2023bdcfc4 datapath: Constify netlink structs.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-07-30 16:42:07 -07:00
Joe Stringer
a0fb56c1b2 datapath: Whitespace fixes.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-07-30 16:42:07 -07:00
Neil McKee
0e469d3b38 datapath: Include datapath actions with sampled-packet upcall to userspace.
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.

This directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.

Adding path information enhances visibility into complex virtual
networks.

Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-07-17 13:23:39 -07:00
Jesse Gross
26bfaeaa96 datapath: Stop using __DATE__ and __TIME__ in startup string.
An increasing number of distributions ship with GCC 4.9 (including
Fedora and Ubuntu) that has -Werror=date-time. This causes kernel
compilation to fail because the builds are not exactly reproducible.

This simply removes the use of those constants, which was already
done for the upstream Linux version of the module. It retains the
version string, however, which should provide the same information
in most cases.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-04-27 15:09:28 -07:00
Thomas Graf
5a38795f5e datapath: Turn vports with dependencies into separate modules
Upstream commit:
    The internal and netdev vport remain part of openvswitch.ko. Encap
    vports including vxlan, gre, and geneve can be built as separate
    modules and are loaded on demand. Modules can be unloaded after use.
    Datapath ports keep a reference to the vport module during their
    lifetime.

    Allows to remove the error prone maintenance of the global list
    vport_ops_list.

    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Also folds in the follow-up commits 9ba559d9ca3, which turned the non-GPL
symbol exports into GPL exports, and fa2d8ff4e35, which fixes a module
reference release bug.

Exports various backwards compat functions linked into the main
openvswitch module as GPL symbols to ensure vport modules can use them.

Some fiddling with the Makefile was needed to work around the fact
that Makefile variables can't contain '-' characters needed to define
'vport-xxx' module sources. Also, Kbuild complains heavily if a
$(module)-y = $(module).o is defined which is actually backed with a .c
file of the same name. Therefore, a new $(build_multi_modules) variable
is defined which lists all modules which consist of more than one source
file.

Upstream: 62b9c8d0372 ("ovs: Turn vports with dependencies into separate modules")
Upstream: 9ba559d9ca3 ("openvswitch: Export symbols as GPL symbols.")
Upstream: fa2d8ff4e35 ("openvswitch: Return vport module ref before destruction")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-04-04 08:24:44 +02:00
Joe Stringer
bc619e29df datapath: Add support for unique flow IDs.
Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.

This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.

All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 10:58:37 -08:00
Joe Stringer
db7f223827 datapath: Refactor ovs_nla_fill_match().
Refactor the ovs_nla_fill_match() function into separate netlink
serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
parameter - all callers need to nest the flow, and callers have better
knowledge about whether it is serializing a mask or not.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27 10:58:36 -08:00
Pravin B Shelar
cabd55169e datapath: Fix net exit.
Open vSwitch allows moving an internal vport to a different namespace
while it is still connected to the bridge. But when the namespace is deleted,
OVS does not detach these vports, which results in a dangling
pointer to the netdevice and causes the kernel panic below.
This issue is fixed by detaching all ovs ports from the deleted
namespace at net-exit.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa0aadaa5>] ovs_vport_locate+0x35/0x80 [openvswitch]
Oops: 0000 [#1] SMP
Call Trace:
 [<ffffffffa0aa6391>] lookup_vport+0x21/0xd0 [openvswitch]
 [<ffffffffa0aa65f9>] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
 [<ffffffff8167e07c>] genl_family_rcv_msg+0x1bc/0x3e0
 [<ffffffff8167e319>] genl_rcv_msg+0x79/0xc0
 [<ffffffff8167d919>] netlink_rcv_skb+0xb9/0xe0
 [<ffffffff8167deac>] genl_rcv+0x2c/0x40
 [<ffffffff8167cffd>] netlink_unicast+0x12d/0x1c0
 [<ffffffff8167d3da>] netlink_sendmsg+0x34a/0x6b0
 [<ffffffff8162e140>] sock_sendmsg+0xa0/0xe0
 [<ffffffff8162e5e8>] ___sys_sendmsg+0x408/0x420
 [<ffffffff8162f541>] __sys_sendmsg+0x51/0x90
 [<ffffffff8162f592>] SyS_sendmsg+0x12/0x20
 [<ffffffff81764ee9>] system_call_fastpath+0x12/0x17

Reported-by: Assaf Muller <amuller@redhat.com>
Fixes: 46df7b81454 ("openvswitch: Add support for network namespaces.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: 7b4577a9da ("openvswitch: Fix net exit").
Acked-by: Andy Zhou <azhou@nicira.com>
2015-02-20 15:28:12 -08:00
Thomas Graf
23b48dc182 datapath: Account for "netlink: make nlmsg_end() and genlmsg_end() void"
genlmsg_end() no longer returns an error value. Not a problem as it
never returned an error code anyway.

Upstream: 053c09 ("netlink: make nlmsg_end() and genlmsg_end() void")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-02-03 22:31:20 +01:00
Thomas Graf
6233a1bdf1 datapath: Account for "genetlink: pass only network namespace to genl_has_listeners()"
Upstream commit:
    genetlink: pass only network namespace to genl_has_listeners()

    There's no point to force the caller to know about the internal
    genl_sock to use inside struct net, just have them pass the network
    namespace. This doesn't really change code generation since it's
    an inline, but makes the caller less magic - there's never any
    reason to pass another socket.

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: f8403a2 ("genetlink: pass only network namespace to genl_has_listeners()")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-02-03 22:31:20 +01:00
Thomas Graf
efd8a18e8d datapath: Account for "rename vlan_tx_* helpers since "tx" is misleading there"
Upstream commit:
    net: rename vlan_tx_* helpers since "tx" is misleading there

    The same macros are used for rx as well. So rename it.

    Signed-off-by: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: df8a39d ("net: rename vlan_tx_* helpers since "tx" is misleading there")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-02-03 21:55:38 +01:00
Thomas Graf
2e460098bf dpif: Use separate OVS_PACKET_ATTR_PROBE for packet messages
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining binary compatibility with existing OVS binaries.
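
The bounds problem can be sketched with a generic attribute table. All
names and numeric values below are assumptions for illustration only:
the table has max + 1 slots, so an attribute type above max must be
rejected, and raising max to the probe attribute's value makes that type
acceptable without renumbering anything.

```c
#include <stddef.h>

/* Hypothetical values chosen for this sketch. */
#define OLD_PACKET_ATTR_MAX 5
#define PROBE_ATTR          8
#define NEW_PACKET_ATTR_MAX PROBE_ATTR

/* Stores 'value' in table[type] only if type is within [1, max].
 * 'table' must have max + 1 slots. Returns 0 on success, -1 if the
 * type is out of range and was rejected. */
static int attr_store(const void **table, int max, int type,
                      const void *value)
{
    if (type <= 0 || type > max)
        return -1;

    table[type] = value;
    return 0;
}
```

Code that indexes the table by attribute type without such a check would
read or write past the end of the array when handed the larger value.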

Fixes: 9233ce ("datapath: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2015-01-15 00:17:31 +01:00
Thomas Graf
5282e284ac datapath: introduce rtnl ops stub
This stub now allows userspace to see IFLA_INFO_KIND for ovs master and
IFLA_INFO_SLAVE_KIND for slave.

Upstream: 5b9e7e16 ("openvswitch: introduce rtnl ops stub")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-01-07 12:55:49 +01:00
Thomas Graf
1f649f1c8d datapath: Account for rename to vlan_insert_tag_set_proto()
__vlan_put_tag() was renamed to vlan_insert_tag_set_proto() with
the argument list kept intact.

Upstream: 62749e ("vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-01-07 12:55:49 +01:00
Pravin B Shelar
7d16c8478e datapath: fix coding style.
Kernel datapath code has diverged from upstream code.  This
makes porting patches between these two code bases harder
than it needs to be. Following patch fixes this by fixing
coding style issues on this branch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09 20:03:33 -08:00
Pravin B Shelar
d637497c0d datapath: Convert dp rcu read operation to locked operations
dp read operations depend on ovs_dp_cmd_fill_info(). This API
needs to look up the vport to find the dp name, but the vport lookup can
fail. Therefore, to keep the vport reference alive we need to
take the ovs lock.

Found by code inspection.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-11-04 13:52:45 -08:00
Pravin B Shelar
af465b67a9 datapath: Fix comment style.
Use netdev comment style.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-10-23 19:09:23 -07:00
Pravin B Shelar
46051cf8ad datapath: Replace __force type cast with rcu_dereference_raw().
The rcu_dereference_raw() API is a cleaner way of accessing an RCU pointer
when no locking is required.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-10-23 19:09:23 -07:00
Pravin B Shelar
d1da76691a datapath: net: make skb_gso_segment error handling more robust
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erroneously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
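
The difference between the two tests can be shown with a userspace
imitation of the kernel's error-pointer convention. The lower-case
helpers below are written for this sketch and are not the kernel macros.
They assume, as the kernel does, that the top 4095 addresses encode
negative errno values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encodes a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True only for encoded errors. NULL is NOT an error here. */
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for encoded errors and for NULL. */
static inline bool is_err_or_null(const void *ptr)
{
    return ptr == NULL || is_err(ptr);
}
```

A caller that checks only is_err() treats a NULL return as a valid
segment list and dereferences it, while is_err_or_null() catches all
three cases listed above.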

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-10-21 14:15:27 -07:00
Pravin B Shelar
705e9260d5 datapath: Add support for RHEL-7 / CentOS-7 kernel.
This patch is mostly related to the tunnel API, where the RHEL 7
kernel APIs are not in sync with the newer Linux kernel APIs. So
extra checks are required on the API parameters.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
2014-10-03 15:38:53 -07:00
Jarno Rajahalme
9233cef706 datapath: Add support for OVS_FLOW_ATTR_PROBE.
This new flag is useful for suppressing error logging while probing
for datapath features using flow commands.  For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-10-03 13:31:07 -07:00
Thomas Graf
f1f60b8583 datapath: Constify various function arguments
Help produce better optimized code.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-23 14:47:58 -07:00
Pravin B Shelar
b2a23c4ea7 datapath: Restore OVS_CB after skb_segment.
OVS needs to segment a large skb before sending it to userspace
for miss packet handling, but skb_gso_segment uses
skb->cb. This corrupted OVS_CB, which results in the following panic.

[  735.419921] BUG: unable to handle kernel paging request at 00000014000001b2
[  735.423168] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  735.445097] RIP: 0010:[<ffffffffa05df0d7>]  [<ffffffffa05df0d7>] ovs_nla_put_flow+0x37/0x7c0 [openvswitch]
[  735.468858] Call Trace:
[  735.470384]  [<ffffffffa05d7ec2>] queue_userspace_packet+0x332/0x4d0 [openvswitch]
[  735.471741]  [<ffffffffa05d8155>] queue_gso_packets+0xf5/0x180 [openvswitch]
[  735.481862]  [<ffffffffa05da9f5>] ovs_dp_upcall+0x65/0x70 [openvswitch]
[  735.483031]  [<ffffffffa05dab81>] ovs_dp_process_packet+0x181/0x1b0 [openvswitch]
[  735.484391]  [<ffffffffa05e2f55>] ovs_vport_receive+0x65/0x90 [openvswitch]
[  735.492638]  [<ffffffffa05e5738>] internal_dev_xmit+0x68/0x110 [openvswitch]
[  735.495334]  [<ffffffff81588eb6>] dev_hard_start_xmit+0x2e6/0x8b0
[  735.496503]  [<ffffffff81589847>] __dev_queue_xmit+0x3c7/0x920
[  735.499827]  [<ffffffff81589db0>] dev_queue_xmit+0x10/0x20
[  735.500798]  [<ffffffff815d3b60>] ip_finish_output+0x6a0/0x950
[  735.502818]  [<ffffffff815d55f8>] ip_output+0x68/0x110
[  735.503835]  [<ffffffff815d4979>] ip_local_out+0x29/0x90
[  735.504801]  [<ffffffff815d4e46>] ip_queue_xmit+0x1d6/0x640
[  735.507015]  [<ffffffff815ee0d7>] tcp_transmit_skb+0x477/0xac0
[  735.508260]  [<ffffffff815ee856>] tcp_write_xmit+0x136/0xba0
[  735.510829]  [<ffffffff815ef56e>] __tcp_push_pending_frames+0x2e/0xc0
[  735.512296]  [<ffffffff815e0593>] tcp_sendmsg+0xa63/0xd50
[  735.513526]  [<ffffffff81612c2c>] inet_sendmsg+0x10c/0x220
[  735.516025]  [<ffffffff81566b8c>] sock_sendmsg+0x9c/0xe0
[  735.518066]  [<ffffffff81566d41>] SYSC_sendto+0x121/0x1c0
[  735.521398]  [<ffffffff8156801e>] SyS_sendto+0xe/0x10
[  735.522473]  [<ffffffff816df5e9>] system_call_fastpath+0x16/0x1b

Reported-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-23 14:47:55 -07:00
Thomas Graf
4f67b12af0 datapath: Fix double free when ovs_nla_copy_actions() fails
ovs_nla_copy_actions() already frees the allocated actions buffers,
ovs_flow_cmd_new() will free it a second time when jumping to
err_kfree_acts.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-23 14:47:53 -07:00
Pravin B Shelar
e74d48171e datapath: Remove pkt_key from OVS_CB.
OVS keeps a pointer to the packet key in skb->cb, but the packet key is
stored on the stack. This makes the code a bit tricky, so it is better to
get rid of the pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-20 19:45:56 -07:00
Samuel Gauthier
114fce23a7 datapath: restore OVS_FLOW_CMD_NEW notifications
Since commit fb5d1e9e127a ("openvswitch: Build flow cmd netlink reply only if needed."),
the new flows are not notified to the listeners of OVS_FLOW_MCGROUP.

This commit fixes the problem by using the genl function, i.e.
genl_has_listeners(), instead of netlink_has_listeners().

Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-20 19:45:51 -07:00
Pravin B Shelar
c981665a3d datapath: Remove support to set vport stats.
This was required for old compatibility code which updated stats
on the fake bond interface. Now vswitchd has dropped it. This
support was always deprecated, so it is finally being removed.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-20 19:36:33 -07:00
WANG Cong
b81deb156d openvswitch: rename ->sync to ->syncp
Openvswitch defines u64_stats_sync as ->sync rather than ->syncp,
so fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 14:29:00 -07:00
WANG Cong
08fb1bbdcc datapath: introduce netdev_alloc_pcpu_stats() for drivers
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 14:29:00 -07:00
Himangi Saraogi
a6ddcc9a1b datapath: Use IS_ERR_OR_NULL
This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.

The following Coccinelle semantic patch was used for making the change:

@@
expression e;
@@

- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
 || ...

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 14:28:59 -07:00
Jean Sacren
49f24f6dc6 datapath: fix duplicate #include headers
The headers net/genetlink.h and linux/genetlink.h were each
included twice, so delete the duplicates.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 14:28:59 -07:00
Pravin B Shelar
2c622e5aa9 datapath: Remove unused dp parameter.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-08 14:22:56 -07:00
Pravin B Shelar
800711c3c2 datapath: Set packet egress_tun_info.
Packet execute sets egress_tun_info in skb->cb rather than in
packet->cb. Here skb is the netlink message skb, so this corrupts
the netlink state stored in skb->cb (NETLINK_CB), which
results in the following deadlock in the netlink code.

=============================================
[ INFO: possible recursive locking detected ]
3.2.62 #2
---------------------------------------------
handler55/22851 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20

but task is already holding lock:
 (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(genl_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by handler55/22851:
 #0:  (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20

stack backtrace:
Pid: 22851, comm: handler55 Tainted: G           O 3.2.62 #2
Call Trace:
 [<ffffffff81097bb2>] print_deadlock_bug+0xf2/0x100
 [<ffffffff81099b99>] validate_chain+0x579/0x860
 [<ffffffff8109a17c>] __lock_acquire+0x2fc/0x4f0
 [<ffffffff8109aab0>] lock_acquire+0xa0/0x180
 [<ffffffff81519070>] __mutex_lock_common+0x60/0x420
 [<ffffffff8151959a>] mutex_lock_nested+0x4a/0x60
 [<ffffffff81471ad7>] genl_lock+0x17/0x20
 [<ffffffff81471af6>] genl_rcv+0x16/0x40
 [<ffffffff8146ff72>] netlink_unicast+0x2f2/0x310
 [<ffffffff81470159>] netlink_ack+0x109/0x1f0
 [<ffffffff8147030b>] netlink_rcv_skb+0xcb/0xd0
 [<ffffffff81471b05>] genl_rcv+0x25/0x40
 [<ffffffff8146ff72>] netlink_unicast+0x2f2/0x310
 [<ffffffff8147134c>] netlink_sendmsg+0x28c/0x3d0
 [<ffffffff8143375f>] sock_sendmsg+0xef/0x120
 [<ffffffff81435766>] ___sys_sendmsg+0x416/0x430
 [<ffffffff81435949>] __sys_sendmsg+0x49/0x90
 [<ffffffff814359a9>] sys_sendmsg+0x19/0x20
 [<ffffffff8152432b>] system_call_fastpath+0x16/0x1b

Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2014-09-08 10:43:36 -07:00
Li RongQing
a7d607c58d datapath: distinguish between the dropped and consumed skb
Distinguish between dropped and consumed skbs instead of assuming
that the skb is always consumed.

Cc: Thomas Graf <tgraf@noironetworks.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-08 10:43:34 -07:00
Andy Zhou
2c8c4fb7ff datapath: Implement recirc action without recursion
Since the kernel stack is limited in size, it is not wise to use
recursive functions with large stack frames.

This patch provides an alternative implementation of recirc action
without using recursion.

A per-CPU, fixed-size 'deferred action FIFO' is used to store either
recirc or sample actions encountered during execution of an action
list. Executing recirc or sample actions later as 'deferred actions',
rather than in place, avoids recursion.

Deferred actions are only executed after all other actions have been
executed, including the ones triggered by loopback from the kernel
network stack.

The size of the private FIFO, currently set to 20, limits the number
of total 'deferred actions' any one packet can accumulate.
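
The scheme described above can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the datapath code: the struct layout and the names `fifo_put()`, `fifo_get()`, `execute_actions()` and `process_packet()` are hypothetical, and a "recirc action" is modelled as a plain counter of remaining passes. Only the capacity of 20 is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFERRED_FIFO_SIZE 20 /* capacity stated in the commit message */

/* One deferred recirc request (hypothetical layout). */
struct deferred_action {
    int packet_id;
    int remaining; /* recirculations still requested for this packet */
};

/* Slots are not reused until the FIFO is reset, so the capacity bounds
 * the total number of deferred actions one packet can accumulate. */
struct action_fifo {
    size_t head; /* next slot to write */
    size_t tail; /* next slot to read */
    struct deferred_action slots[DEFERRED_FIFO_SIZE];
};

static void fifo_init(struct action_fifo *f)
{
    f->head = 0;
    f->tail = 0;
}

static bool fifo_put(struct action_fifo *f, int packet_id, int remaining)
{
    assert(f->head <= DEFERRED_FIFO_SIZE);
    if (f->head == DEFERRED_FIFO_SIZE)
        return false; /* FIFO full */
    f->slots[f->head].packet_id = packet_id;
    f->slots[f->head].remaining = remaining;
    f->head++;
    return true;
}

static bool fifo_get(struct action_fifo *f, struct deferred_action *out)
{
    if (f->tail == f->head)
        return false; /* FIFO empty */
    *out = f->slots[f->tail++];
    return true;
}

/* Run one pass over a packet's action list. A recirc action is queued
 * instead of being executed in place, so this function never recurses. */
static bool execute_actions(struct action_fifo *f, int packet_id, int remaining)
{
    if (remaining > 0)
        return fifo_put(f, packet_id, remaining - 1);
    return true;
}

/* Returns the number of passes executed, or -1 if the FIFO overflowed. */
static int process_packet(int packet_id, int recirc_count)
{
    struct action_fifo f;
    struct deferred_action da;
    int passes = 1;

    fifo_init(&f);
    if (!execute_actions(&f, packet_id, recirc_count))
        return -1;

    /* Drain deferred actions only after the current list has finished. */
    while (fifo_get(&f, &da)) {
        passes++;
        if (!execute_actions(&f, da.packet_id, da.remaining))
            return -1;
    }
    return passes;
}
```

In this model a packet that asks for 3 recirculations is processed in 4 iterative passes, and one that asks for more than 20 overflows the FIFO and is reported as an error, which corresponds to the limit the commit message describes.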

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-05 17:34:40 -07:00
Andy Zhou
60759b2b9d datapath: Remove recirc stack depth limit check
Future patches will change the recirc action implementation to avoid
recursion, so the stack depth detection is no longer necessary.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-05 17:25:39 -07:00