Upstream commit:
commit 4a92602aa1cd5bbaeedbd9536ff992f7d26fe9d1
Author: Tycho Andersen <tycho.andersen@canonical.com>
openvswitch: allow management from inside user namespaces
Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.
Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.
The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.
v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
massive one
Reported-by: James Page <james.page@canonical.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Pravin Shelar <pshelar@ovn.org>
CC: Justin Pettit <jpettit@ovn.org>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
upstream tunnel egress info is retrieved using ndo_fill_metadata_dst.
Since we do not have it on older kernel we need to keep vport operation
to do same on these kernels.
Following patch try to merge these to operations into one to avoid code
duplication.
This commit backports fc4099f1 ("openvswitch:
Fix egress tunnel info.")
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
commit b95e5928fcc76d156352570858abdea7b2628efd
Author: William Tu <u9012063@gmail.com>
Date: Mon Jun 20 07:26:17 2016 -0700
The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.")
introduces packet truncation before sending to userspace upcall receiver.
This patch passes up the skb->len before truncation so that the upcall
receiver knows the original packet size. Potentially this will be used
by sFlow, where OVS translates sFlow config header=N to a sample action,
truncating packet to N byte in kernel datapath. Thus, only N bytes instead
of full-packet size is copied from kernel to userspace, saving the
kernel-to-userspace bandwidth.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140135299
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Upstream commit:
commit f2a4d086ed4c588d32fe9b7aa67fead7280e7bf1
Author: William Tu <u9012063@gmail.com>
Date: Fri Jun 10 11:49:33 2016 -0700
openvswitch: Add packet truncation support.
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
truncate packets. A 'max_len' is added for setting up the maximum
packet size, and a 'cutlen' field is to record the number of bytes
to trim the packet when the packet is outputting to a port, or when
the packet is sent to userspace.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Currently OVS out of tree datapath supports a large number of kernel
versions. From 2.6.32 to 4.3 and various distribution-specific
kernels. But at this point major features are only available on more
recent kernels. For example, stateful services are only available
starting in kernel 3.10 and STT is available on starting with 3.5.
Since these features are becoming essential to many OVS deployments,
and the effort of maintaining the backports is high. We have decided
to drop support for older kernel. Following patch drops supports
for kernel older than 3.10.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
Upstream: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.
Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.
When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.
IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.
IP frag handling contributed by Andy Zhou.
Based on original design by Justin Pettit.
Upstream: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However,the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.
Upstream: 8e2fed1 "openvswitch: Serialize acts with original netlink len"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Backport IPv4 reassembly from the upstream commit caaecdd3d3f8 ("inet:
frags: remove INET_FRAG_EVICTED and use list_evictor for the test").
This is necessary because kernels prior to upstream commit d6b915e29f4a
("ip_fragment: don't forward defragmented DF packet") would not always
track the maximum received unit size during ip_defrag(). Without the
MRU, refragmentation cannot occur so reassembled packets are dropped.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Following patch adds support for lwtunnel to OVS datapath.
With this change OVS datapath detect lwtunnel support and
make use of new APIs if available. On older kernel where the
support is not there the backported tunnel modules are used.
These backported tunnel devices acts as lwtunnel devices.
I tried to keep backported module same as upstream for easier
bug-fix backport. Since STT and LISP are not upstream OVS
always needs to use respective modules from tunnel compat layer.
To make it work on kernel 4.3 I have converted STT and LISP
modules to lwtunnel API model.
lwtunnel make use of skb-dst to pass tunnel information to the
tunnel module. On older kernel this is not possible. So the in
case of old kernel metadata ref is stored in OVS_CB and direct
call to tunnel transmit function is made by respective tunnel
vport modules. Similarly on receive side tunnel recv directly
call netdev-vport-receive to pass the skb to OVS.
Major backported components include:
Geneve, GRE, VXLAN, ip_tunnel, udp-tunnels GRO.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
openvswitch: Zero flows on allocation.
When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.
While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.
In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.
This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targetting per-packet flow operations.
Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: ae5f2fb1 ("openvswitch: Zero flows on allocation.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This reverts commit 2023bdcfc4.
This commit is causing segfaults when genl compat code is in use.
Compat code update genl_multicast_group and genl_family type objects.
Therefore these can not be const.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto).
Backport of upstream commit 6713fc9b8fa33444aa000f0f31076f6a859ccb34:
"openvswitch: Use eth_proto_is_802_3"
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.
This Directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.
Adding path information enhances visibility into complex virtual
networks.
Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
An increasing number of distributions ship with GCC 4.9 (including
Fedora and Ubuntu) that has -Werror=date-time. This causes kernel
compilation to fail because the builds are not exactly reproducible.
This simply removes the use of those constants, which was already
done for the upstream Linux version of the module. It retains the
version string, however, which should provide the same information
in most cases.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Upstream commit:
The internal and netdev vport remain part of openvswitch.ko. Encap
vports including vxlan, gre, and geneve can be built as separate
modules and are loaded on demand. Modules can be unloaded after use.
Datapath ports keep a reference to the vport module during their
lifetime.
Allows to remove the error prone maintenance of the global list
vport_ops_list.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also folds in the follow-up commits 9ba559d9ca3 to turned the non-GPL
symbol exports to GPL exports, and fa2d8ff4e35 which fixes a module
reference release bug.
Exports various backwards compat functions linked into the main
openvswitch module as GPL symbols to ensure vport modules can use them.
Some fiddling with the Makefile was needed to work around the fact
that Makefile variables can't contain '-' characters needed to define
'vport-xxx' module sources. Also, Kbuild complains heavily if a
$(module)-y = $(module).o is defined which is actually backed with a .c
file of the same name. Therefore, a new $(build_multi_modules) variable
is defined which lists all module which consist of more than one source
file.
Upstream: 62b9c8d0372 ("ovs: Turn vports with dependencies into separate modules")
Upstream: 9ba559d9ca3 ("openvswitch: Export symbols as GPL symbols.")
Upstream: fa2d8ff4e35 ("openvswitch: Return vport module ref before destruction")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.
This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.
All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the ovs_nla_fill_match() function into separate netlink
serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
parameter - all callers need to nest the flow, and callers have better
knowledge about whether it is serializing a mask or not.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Open vSwitch allows moving internal vport to different namespace
while still connected to the bridge. But when namespace deleted
OVS does not detach these vports, that results in dangling
pointer to netdevice which causes kernel panic as follows.
This issue is fixed by detaching all ovs ports from the deleted
namespace at net-exit.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa0aadaa5>] ovs_vport_locate+0x35/0x80 [openvswitch]
Oops: 0000 [#1] SMP
Call Trace:
[<ffffffffa0aa6391>] lookup_vport+0x21/0xd0 [openvswitch]
[<ffffffffa0aa65f9>] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
[<ffffffff8167e07c>] genl_family_rcv_msg+0x1bc/0x3e0
[<ffffffff8167e319>] genl_rcv_msg+0x79/0xc0
[<ffffffff8167d919>] netlink_rcv_skb+0xb9/0xe0
[<ffffffff8167deac>] genl_rcv+0x2c/0x40
[<ffffffff8167cffd>] netlink_unicast+0x12d/0x1c0
[<ffffffff8167d3da>] netlink_sendmsg+0x34a/0x6b0
[<ffffffff8162e140>] sock_sendmsg+0xa0/0xe0
[<ffffffff8162e5e8>] ___sys_sendmsg+0x408/0x420
[<ffffffff8162f541>] __sys_sendmsg+0x51/0x90
[<ffffffff8162f592>] SyS_sendmsg+0x12/0x20
[<ffffffff81764ee9>] system_call_fastpath+0x12/0x17
Reported-by: Assaf Muller <amuller@redhat.com>
Fixes: 46df7b81454("openvswitch: Add support for network namespaces.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 7b4577a9da ("openvswitch: Fix net exit").
Acked-by: Andy Zhou <azhou@nicira.com>
genlmsg_end() no longer returns an error value. Not a problem as it
never returned an error code anyway.
Upstream: 053c09 ("netlink: make nlmsg_end() and genlmsg_end() void")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Upstream commit:
genetlink: pass only network namespace to genl_has_listeners()
There's no point to force the caller to know about the internal
genl_sock to use inside struct net, just have them pass the network
namespace. This doesn't really change code generation since it's
an inline, but makes the caller less magic - there's never any
reason to pass another socket.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: f8403a2 ("genetlink: pass only network namespace to genl_has_listeners()")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Upstream commit:
net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: df8a39d ("net: rename vlan_tx_* helpers since "tx" is misleading there")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.
Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining binary compatibility with existing OVS binaries.
Fixes: 9233ce ("datapath: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This stub now allows userspace to see IFLA_INFO_KIND for ovs master and
IFLA_INFO_SLAVE_KIND for slave.
Upstream: 5b9e7e16 ("openvswitch: introduce rtnl ops stub")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
__vlan_put_tag() was renamed to vlan_insert_tag_set_proto() with
the argument list kept intact.
Upstream: 62749e ("vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Kernel datapath code has diverged from upstream code. This
makes porting patches between these two code bases harder
than it needs to be. Following patch fixes this by fixing
coding style issues on this branch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.
Found by code inspection.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
rcu_dereference_raw() api is cleaner way of accessing RCU pointer
when no locking is required.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.
It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This patch mostly is related to tunnel API where RHEL 7
kernel API are not in-sync with newer linux kernel API. So
extra checks are required to check for parameters of API.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
This new flag is useful for suppressing error logging while probing
for datapath features using flow commands. For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
ovs_nla_copy_actions() already frees the allocated actions buffers,
ovs_flow_cmd_new() will free it a second time when jumping to
err_kfree_acts.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
OVS keeps pointer to packet key in skb->cb, but the packet key is
store on stack. This could make code bit tricky. So it is better to
get rid of the pointer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Since commit fb5d1e9e127a ("openvswitch: Build flow cmd netlink reply only if needed."),
the new flows are not notified to the listeners of OVS_FLOW_MCGROUP.
This commit fixes the problem by using the genl function, ie
genl_has_listerners() instead of netlink_has_listeners().
Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This was required for old compatibility code which update stats
on fake bond interface. Now vswitchd has dropped it. This
support was always deprecated, so finally removing it.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Openvswitch defines u64_stats_sync as ->sync rather than ->syncp,
so fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.
The following Coccinelle semantic patch was used for making the change:
@@
expression e;
@@
- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
|| ...
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
packet execute is setting egress_tun_info in skb->cb, rather
than packet->cb. skb is netlink msg skb. This causes corruption
in netlink skb state stored in skb->cb (NETLINK_CB) which
results in following deadlock in netlink code.
=============================================
[ INFO: possible recursive locking detected ]
3.2.62 #2
---------------------------------------------
handler55/22851 is trying to acquire lock:
(genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20
but task is already holding lock:
(genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(genl_mutex);
lock(genl_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by handler55/22851:
#0: (genl_mutex){+.+.+.}, at: [<ffffffff81471ad7>] genl_lock+0x17/0x20
stack backtrace:
Pid: 22851, comm: handler55 Tainted: G O 3.2.62 #2
Call Trace:
[<ffffffff81097bb2>] print_deadlock_bug+0xf2/0x100
[<ffffffff81099b99>] validate_chain+0x579/0x860
[<ffffffff8109a17c>] __lock_acquire+0x2fc/0x4f0
[<ffffffff8109aab0>] lock_acquire+0xa0/0x180
[<ffffffff81519070>] __mutex_lock_common+0x60/0x420
[<ffffffff8151959a>] mutex_lock_nested+0x4a/0x60
[<ffffffff81471ad7>] genl_lock+0x17/0x20
[<ffffffff81471af6>] genl_rcv+0x16/0x40
[<ffffffff8146ff72>] netlink_unicast+0x2f2/0x310
[<ffffffff81470159>] netlink_ack+0x109/0x1f0
[<ffffffff8147030b>] netlink_rcv_skb+0xcb/0xd0
[<ffffffff81471b05>] genl_rcv+0x25/0x40
[<ffffffff8146ff72>] netlink_unicast+0x2f2/0x310
[<ffffffff8147134c>] netlink_sendmsg+0x28c/0x3d0
[<ffffffff8143375f>] sock_sendmsg+0xef/0x120
[<ffffffff81435766>] ___sys_sendmsg+0x416/0x430
[<ffffffff81435949>] __sys_sendmsg+0x49/0x90
[<ffffffff814359a9>] sys_sendmsg+0x19/0x20
[<ffffffff8152432b>] system_call_fastpath+0x16/0x1b
Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Since kernel stack is limited in size, it is not wise to using
recursive function with large stack frames.
This patch provides an alternative implementation of recirc action
without using recursion.
A per CPU fixed sized, 'deferred action FIFO', is used to store either
recirc or sample actions encountered during execution of an action
list. Not executing recirc or sample action in place, but rather execute
them laster as 'deferred actions' avoids recursion.
Deferred actions are only executed after all other actions has been
executed, including the ones triggered by loopback from the kernel
network stack.
The size of the private FIFO, currently set to 20, limits the number
of total 'deferred actions' any one packet can accumulate.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Future patches will change the recirc action implementation to not
using recursion. The stack depth detection is no longer necessary.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>