2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

405 Commits

Author SHA1 Message Date
Darrell Ball
db1dcb235f datapath: Fix builds on older kernels.
On older kernels, for example 3.19, the function rt6_get_cookie() is
not available and used with ipv6 config enabled;  it was introduced in
4.2.  Put back the replacement function if it does not exist.
Add a 3.19 version to travis.

CC: Yifeng Sun <pkusunyifeng@gmail.com>
Fixes: bf61b8b1c1db ("datapath: Add support for kernel 4.16.x & 4.17.x.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-08-30 14:09:35 -07:00
Yifeng Sun
bf61b8b1c1 datapath: Add support for kernel 4.16.x & 4.17.x
Add support for kernel version up to 4.17.x. On Travis, build passed
for all kernel versions. And no new test fails are introduced by this
patch.

Cleaned up file datapath/linux/compat/include/net/ip6_fib.h which
has no effect to kernel module but brings complexity to porting.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
2018-08-24 10:54:00 -07:00
Yi-Hung Wei
6660a9597a datapath: compat: Introduce static key support
Static keys allow the inclusion of seldom used features in
performance-sensitive fast-path kernel code, via a GCC feature and a
code patching technique. For more information:
    * https://www.kernel.org/doc/Documentation/static-keys.txt

Since upstream ovs kernel module now uses some static key API that was
introduced in v4.3 kernel, we shall backport them to the compat module
for older kernel supprots.

This backport is based on upstream net-next commit 11276d5306b8
("locking/static_keys: Add a new static_key interface").

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2018-08-17 09:30:36 -07:00
Yi-Hung Wei
744964326f datapath: compat: Backports nf_conncount
This patch backports the nf_conncount backend that counts the number
of connections matching an arbitrary key.  The following patch will
use the feature to support connection tracking zone limit in ovs
kernel datapath.

This backport is based on an upstream net-next upstream commits.
5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
34848d5c896e ("netfilter: nf_conncount: Split insert and traversal")
2ba39118c10a ("netfilter: nf_conncount: Move locking into count_tree()")
976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
cb2b36f5a97d ("netfilter: nf_conncount: Switch to plain list")
2a406e8ac7c3 ("netfilter: nf_conncount: Early exit for garbage collection")
b36e4523d4d5 ("netfilter: nf_conncount: fix garbage collection confirm race")
21ba8847f857 ("netfilter: nf_conncount: Fix garbage collection with zones")
5e5cbc7b23ea ("netfilter: nf_conncount: expose connection list interface")
35d8deb80c30 ("netfilter: conncount: Support count only use case")
6aec208786c2 ("netfilter: Refactor nf_conncount")
d384e65f1e75 ("netfilter: return booleans instead of integers")
625c556118f3 ("netfilter: connlimit: split xt_connlimit into front and backend")

The upstream nf_conncount has a couple of export functions while
this patch only export the ones that ovs kernel module needs.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2018-08-17 09:30:33 -07:00
Yi-Hung Wei
179fccce34 compat: Backport nf_ct_netns_{get, put}()
This patch backports nf_ct_netns_get/put() in order to support a feature
in the follow up patch.

nf_ct_netns_{get,put} were first introduced in upstream net-next commit
ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put") in kernel
v4.10, and then updated in commmit 7e35ec0e8044 ("netfilter: conntrack:
move nf_ct_netns_{get,put}() to core") in kernel v4.15.  We need to
invoke nf_ct_netns_get/put() when the underlying nf_conntrack_l3proto
supports net_ns_{get,put}().

Therefore, there are 3 cases that we need to consider.
1) Before nf_ct_{get,put}() is introduced.
    We just mock nf_ct_nets_{get,put}() and do nothing.

2) After 1) and before v4.15
    Backports based on commit 7e35ec0e8044 .

3) Staring from v4.15
    Use the upstream version.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2018-08-17 09:30:12 -07:00
Yifeng Sun
7f63d8302e porting: Add fixes to support kernel 4.15.x
This patch enables OVS kernel module to run on kernel 4.15.x.
Two conntrack-related tests failed:
 - conntrack - multiple zones, local
 - conntrack - multi-stage pipeline, local
This might be due to conntrack policy changes for packets coming
from local ports on kernel 4.15. More survey will be done later.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Co-authored-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Gregory Rose <gvrose8192@gmail.com>
Reviewed-by: Gregory Rose <gvrose8192@gmail.com>
2018-08-16 16:24:07 -07:00
Greg Rose
44a629b5b8 compat: Substitute more dependable define
The compat layer ip_tunnel_get_stats64 function was checking for the
Linux kernel version to determine if the return was void or a pointer.
This is not very reliable and caused compile warnings on SLES 12 SP3.
In acinclude.m4 create a more reliable method of determining when to
use a void return vs. a pointer return.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-08-13 17:42:22 -07:00
wenxu
e2e11c890e datapath: support upstream ndo_udp_tunnel_add in net_device_ops
It makes datapath can support both ndo_add_udp_tunnel_port and
ndo_add_vxlan/geneve_port. The newer kernels don't support vxlan/geneve
specific NDO's anymore

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
2018-08-07 15:06:32 -07:00
Or Gerlitz
dd83253e11 lib/tc: Support matching on ip tunnel tos and ttl
Support matching on tos and ttl of ip tunnels
for the TC data-path.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-08-01 11:32:54 +02:00
Or Gerlitz
4b12e45435 lib/tc: Support setting tos and ttl for TC IP tunnels
Allow to set the tos and ttl for TC tunnels.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-08-01 11:32:54 +02:00
Yi-Hung Wei
fdec3c17ac datapath: NAT support for shifted portmap ranges
This patch backports the following upstream commit from net-next, and
defines HAVE_NF_NAT_RANGE2 to determine whether to use
'struct nf_nat_range2'.

Upstream commit:
    commit 2eb0f624b709e78ec8e2f4c3412947703db99301
    Author: Thierry Du Tre <thierry@dtsystems.be>
    Date:   Wed Apr 4 15:38:22 2018 +0200

    netfilter: add NAT support for shifted portmap ranges

    This is a patch proposal to support shifted ranges in portmaps.  (i.e. tcp/udp
    incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)

    Currently DNAT only works for single port or identical port ranges.  (i.e.
    ports 5000-5100 on WAN interface redirected to a LAN host while original
    destination port is not altered) When different port ranges are configured,
    either 'random' mode should be used, or else all incoming connections are
    mapped onto the first port in the redirect range. (in described example
    WAN:5000-5100 will all be mapped to 192.168.1.5:2000)

    This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
    which uses a base port value to calculate an offset with the destination port
    present in the incoming stream. That offset is then applied as index within the
    redirect port range (index modulo rangewidth to handle range overflow).

    In described example the base port would be 5000. An incoming stream with
    destination port 5004 would result in an offset value 4 which means that the
    NAT'ed stream will be using destination port 2004.

    Other possibilities include deterministic mapping of larger or multiple ranges
    to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
    51xx)

    This patch does not change any current behavior. It just adds new NAT proto
    range functionality which must be selected via the specific flag when intended
    to use.

    A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
    which makes this functionality immediately available.

    Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2018-07-30 11:08:10 -07:00
Yi-Hung Wei
3d10a0c851 datapath: Introduce net_rwsem and remove rtnl_lock()
This patch backports the following two upstream commits and
add a new symbol HAVE_NET_RWSEM in acinclude.m4 to determine
whether to use new introduced rw_semaphore, net_rwsem.

Upstream commit:
    commit f0b07bb151b098d291fd1fd71ef7a2df56fb124a
    Author: Kirill Tkhai <ktkhai@virtuozzo.com>
    Date:   Thu Mar 29 19:20:32 2018 +0300

    net: Introduce net_rwsem to protect net_namespace_list

    rtnl_lock() is used everywhere, and contention is very high.
    When someone wants to iterate over alive net namespaces,
    he/she has no a possibility to do that without exclusive lock.
    But the exclusive rtnl_lock() in such places is overkill,
    and it just increases the contention. Yes, there is already
    for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
    and this can't be sleepable. Also, sometimes it may be need
    really prevent net_namespace_list growth, so for_each_net_rcu()
    is not fit there.

    This patch introduces new rw_semaphore, which will be used
    instead of rtnl_mutex to protect net_namespace_list. It is
    sleepable and allows not-exclusive iterations over net
    namespaces list. It allows to stop using rtnl_lock()
    in several places (what is made in next patches) and makes
    less the time, we keep rtnl_mutex. Here we just add new lock,
    while the explanation of we can remove rtnl_lock() there are
    in next patches.

    Fine grained locks generally are better, then one big lock,
    so let's do that with net_namespace_list, while the situation
    allows that.

    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream commit:
    commit ec9c780925c57588637e1dbd8650d294107311c0
    Author: Kirill Tkhai <ktkhai@virtuozzo.com>
    Date:   Thu Mar 29 19:21:09 2018 +0300

    ovs: Remove rtnl_lock() from ovs_exit_net()

    Here we iterate for_each_net() and removes
    vport from alive net to the exiting net.

    ovs_net::dps are protected by ovs_mutex(),
    and the others, who change it (ovs_dp_cmd_new(),
    __dp_destroy()) also take it.
    The same with datapath::ports list.

    So, we remove rtnl_lock() here.

    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2018-07-30 11:07:52 -07:00
Jianbo Liu
f9885dc594 Add support to offload QinQ double VLAN headers match
Currently the inner VLAN header is ignored when using the TC data-path.
As TC flower supports QinQ, now we can offload the rules to match on both
outer and inner VLAN headers.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-07-25 18:15:52 +02:00
Greg Rose
10f242363d compat: Add skb_checksum_simple_complete()
A recent patch to gre.c added a call to skb_checksum_simple_complete()
which is not present in kernels before 3.16.  Fix up the compatability
layer to allow compile on older kernels that do not have it.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-06-04 12:52:22 -07:00
Greg Rose
436d36db62 compat: Fixups for newer kernels
A recent patch series added support for ERSPAN but left some problems
remaining for kernel releases from 4.10 to 4.14.  This patch
addresses those problems.

Of note is that the old cisco gre compat layer code is gone for good.

Also, several compat defines in acinclude.m4 were looking for keys
in .c source files - this does not work on distros without source
code.  A more reliable key was already defined so we use that instead.

We have pared support for the Linux kernel releases in .travis.yml
to reflect that 4.15 is no longer in the LTS list.  With this patch
the Out of Tree OVS datapath kernel modules can build on kernels
up to 4.14.47.  Support for kernels up to 4.16.x will be added
later.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-31 18:58:30 -07:00
Greg Rose
e1ededf45f rhel: Enable ERSPAN features for RHEL 7.x
Enable ERSPAN on RHEL 7.x

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
2018-05-21 20:33:30 -07:00
William Tu
8e53509cc6 gre: introduce native tunnel support for ERSPAN
Upstream commit:
    commit 84e54fe0a5eaed696dee4019c396f8396f5a908b
    Author: William Tu <u9012063@gmail.com>
    Date:   Tue Aug 22 09:40:28 2017 -0700

    gre: introduce native tunnel support for ERSPAN

    The patch adds ERSPAN type II tunnel support.  The implementation
    is based on the draft at [1].  One of the purposes is for Linux
    box to be able to receive ERSPAN monitoring traffic sent from
    the Cisco switch, by creating a ERSPAN tunnel device.
    In addition, the patch also adds ERSPAN TX, so Linux virtual
    switch can redirect monitored traffic to the ERSPAN tunnel device.
    The traffic will be encapsulated into ERSPAN and sent out.

    The implementation reuses tunnel key as ERSPAN session ID, and
    field 'erspan' as ERSPAN Index fields:
    ./ip link add dev ers11 type erspan seq key 100 erspan 123 \
    			local 172.16.1.200 remote 172.16.1.100

    To use the above device as ERSPAN receiver, configure
    Nexus 5000 switch as below:

    monitor session 100 type erspan-source
      erspan-id 123
      vrf default
      destination ip 172.16.1.200
      source interface Ethernet1/11 both
      source interface Ethernet1/12 both
      no shut
    monitor erspan origin ip-address 172.16.1.100 global

    [1] https://tools.ietf.org/html/draft-foschiano-erspan-01
    [2] iproute2 patch: http://marc.info/?l=linux-netdev&m=150306086924951&w=2
    [3] test script: http://marc.info/?l=linux-netdev&m=150231021807304&w=2

    Signed-off-by: William Tu <u9012063@gmail.com>
    Signed-off-by: Meenakshi Vohra <mvohra@vmware.com>
    Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
    Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This commit also backports heavily from upstream gre, ip_gre and
ip_tunnel modules to support the necessary erspan ip gre
infrastructure as well as implementing a variety of compatability
layer changes for same support.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
2018-05-21 20:17:15 -07:00
Yi-Hung Wei
39ca338374 datapath: compat: Fix build on RHEL 7.5
1) OVS datapath compat modules breaks on RHEL 7.5, because it moves
ndo_change_mtu function pointer from 'struct net_device_ops' to
'struct net_device_ops_extended'.

2) RHEL 7.5 introduces the MTU range checking as mentioned in
6c0bf091 ("datapath: use core MTU range checking in core net infra").
However, the max_mtu field is defined in 'struct net_device_extended'
but not in 'struct net_device' as upstream kernel.

This patch defines a new symbol HAVE_RHEL7_MAX_MTU that determines
the previous 2 conditions, and fixes the backport issue.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
2018-05-14 18:36:18 -07:00
Greg Rose
138df3e563 compat: Fix upstream 4.4.119 kernel
The Linux 4.4.119 kernel (and perhaps others) from kernel.org
backports some dst_cache code that breaks the openvswitch kernel
due to a duplicated name "dst_cache_destroy".  For most cases the
"USE_UPSTREAM_TUNNEL" covers this but in this case the dst_cache
feature needs to be separated out.

Add the necessary compatibility detection layer in acinclude.m4 and
then fixup the source files so that if the built-in kernel includes
dst_cache support then exclude our own compatibility code.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-05-08 17:25:37 -07:00
Roi Dayan
83e866067e netdev-tc-offloads: Add support for IP fragmentation
Add support for frag no, first and later.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-03-21 09:59:29 +01:00
Greg Rose
b3e647cae4 compat: Fix RHEL 7 compile
frag_percpu_counter_batch is a variable, not a define, so checking if
it is defined is an error and causes warning messages during compile
on RHEL 7 (or other 3.10 based) builds.  Use a compat #define from
acinclude.m4 instead.

Fixes: 64d8cb7295 ("compat:inet_frag.h: Check for frag_percpu_counter_batch")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-27 19:45:19 -08:00
Greg Rose
bba78707b0 acinclude: Enable building for Linux kernel 4.15
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-16 00:41:22 -08:00
Greg Rose
05522eaa8a acinclude.m4: Enable Linux 4.14
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-12 22:39:19 -08:00
Andy Zhou
1cb5703965 datapath: Add meter infrastructure
Upstream commit:
    commit 96fbc13d7e770b542d2d1fcf700d0baadc6e8063
    Author: Andy Zhou <azhou@ovn.org>
    Date:   Fri Nov 10 12:09:42 2017 -0800

    openvswitch: Add meter infrastructure

    OVS kernel datapath so far does not support Openflow meter action.
    This is the first stab at adding kernel datapath meter support.
    This implementation supports only drop band type.

    Signed-off-by: Andy Zhou <azhou@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Added a compat layer fixup for nla_parse.
Added another compat fixup for ktime_get_ns.

Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-12 22:39:19 -08:00
Jiri Benc
b147f2e919 datapath: reliable interface indentification in port dumps
Upstream commit:
    commit 9354d452034273a50a4fd703bea31e5d6b1fc20b
    Author: Jiri Benc <jbenc@redhat.com>
    Date:   Thu Nov 2 17:04:37 2017 -0200

    openvswitch: reliable interface indentification in port dumps

    This patch allows reliable identification of netdevice interfaces connected
    to openvswitch bridges. In particular, user space queries the netdev
    interfaces belonging to the ports for statistics, up/down state, etc.
    Datapath dump needs to provide enough information for the user space to be
    able to do that.

    Currently, only interface names are returned. This is not sufficient, as
    openvswitch allows its ports to be in different name spaces and the
    interface name is valid only in its name space. What is needed and generally
    used in other netlink APIs, is the pair ifindex+netnsid.

    The solution is addition of the ifindex+netnsid pair (or only ifindex if in
    the same name space) to vport get/dump operation.

    On request side, ideally the ifindex+netnsid pair could be used to
    get/set/del the corresponding vport. This is not implemented by this patch
    and can be added later if needed.

    Signed-off-by: Jiri Benc <jbenc@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Added compat fixup for peernet2id.

Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-12 22:39:19 -08:00
Arnd Bergmann
6deeb55a3e datapath: use ktime_get_ts64() instead of ktime_get_ts()
Upstream commit:
    commit 311af51dcb5629f04976a8e451673f77e3301041
    Author: Arnd Bergmann <arnd@arndb.de>
    Date:   Mon Nov 27 12:41:38 2017 +0100

    openvswitch: use ktime_get_ts64() instead of ktime_get_ts()

    timespec is deprecated because of the y2038 overflow, so let's convert
    this one to ktime_get_ts64(). The code is already safe even on 32-bit
    architectures, since it uses monotonic times. On 64-bit architectures,
    nothing changes, while on 32-bit architectures this avoids one
    type conversion.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Additional compatability check for ktime_get_ts64() exists or not.
If not, then just continue using ktime_get_ts(). I added a new
compatability header file "timekeeping.h".

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-12 00:15:13 -08:00
Greg Rose
a61fbfa459 compat: Fix compiler headers
Since Linux kernel upstream commit d15155824c50
("linux/compiler.h: Split into compiler.h and compiler_types.h") this
error check for the gcc compiler header is no longer valid.  Remove
so that openvswitch builds for linux kernels 4.14.8 and since.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-12 00:15:13 -08:00
Greg Rose
36d3520b5f datapath: Fix netdev_master_upper_dev_link for 4.14
An extended netlink ack has been added for 4.14 - add compat layer
changes so that it compiles for all kernels up to and including
4.14.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-02-12 00:15:13 -08:00
Yi Yang
96b82f6d5f datapath: enable NSH support
Upstream commit:
  commit b2d0f5d5dc53532e6f07bc546a476a55ebdfe0f3
  Author: Yi Yang <yi.y.yang@intel.com>
  Date:   Tue Nov 7 21:07:02 2017 +0800

    openvswitch: enable NSH support

    OVS master and 2.8 branch has merged NSH userspace
    patch series, this patch is to enable NSH support
    in kernel data path in order that OVS can support
    NSH in compat mode by porting this.

    Signed-off-by: Yi Yang <yi.y.yang@intel.com>
    Acked-by: Jiri Benc <jbenc@redhat.com>
    Acked-by: Eric Garver <e@erig.me>
    Acked-by: Pravin Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2018-02-07 10:45:58 -08:00
Eric Garver
c080263427 acinclude: check for IP_CT_UNTRACKED
IP_CT_UNTRACKED is fairly new, but used by the kernel datapath ct_clear
action.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2018-01-22 19:24:18 -08:00
Ben Pfaff
d4042a708f configure: New --enable-sparse option to enable sparse checking by default.
Until now, "make" called sparse to do checking only if C=1 was passed on
the command line.  It was easy for developers to forget to specify that.
This commit adds another option: specifying --enable-sparse on the
configure command line enables sparse checking by default.  (It can still
be disabled with C=0.)

Requested-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2018-01-12 10:46:50 -08:00
Ben Pfaff
df26508605 configure: Make --enable-Werror turn sparse warnings into errors.
Until now, when "sparse" reported a warning, it didn't fail the build for
that file, even when the project was configured with --enable-Werror, which
made it easy to miss warnings.  This commit fixes the problem.

Reported-by: "Stokes, Ian" <ian.stokes@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
2018-01-11 15:07:25 -08:00
Paul Blakey
e5b1657e6f compat: Add act_pedit compatibility for old kernels
Added compatibility for action pedit.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-11-16 08:03:05 -08:00
William Tu
f90e29c922 acinclude: Fix SKB_GSO_UDP check.
The HAVE_SKB_GSO_UDP checks whether skbuff.h defines SKB_GSO_UDP.
However, it falsely returns yes because grep matches SKB_GSO_UDP_TUNNEL.
Thus, add space character '[:space:]' before and after it.

Fixes: ad283644f0e4 ("acinclude: Check for SKB_GSO_UDP")
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-01 13:03:10 -07:00
Greg Rose
1465deac90 acinclude: Add support for Linux 4.13
Add configuration support for the just released 4.13 Linux kernel.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-09-22 12:32:54 -07:00
Greg Rose
a522069322 acinclude: Check for existence of nf_hook_ops member "list".
The "list" member of the nf_hook_ops structure is removed in Linux
kernel release 4.13.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-09-22 12:32:54 -07:00
Greg Rose
6b51641a51 acinclude: Check for extended netlink ack presence
RTNL ops validate and newlink now include the extended netlink
ack feature.  Check for it and set HAVE_EXT_ACK_IN_RTNL_LINKOPS
if found.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-09-22 12:32:54 -07:00
Greg Rose
e0b0150ce5 acinclude: Add compat define for DST_NOCACHE
DST_NOCACHE is removed in the 4.13 Linux kernel - add check for it
and if found set HAVE_DST_NOCACHE.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-09-22 12:32:54 -07:00
Greg Rose
ad283644f0 acinclude: Check for SKB_GSO_UDP
Removed in kernel 4.13

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-09-22 12:32:51 -07:00
Greg Rose
35fc063864 acinclude: Add missing define
The final line of a conditional search for the nf_conntrack_helper_put
function does not actually define HAVE_NF_CONNTRACK_HELPER_PUT used
in datapath/linux/compat/include/net/netfilter/nf_conntrack_helper.h.

Fixes: ac8e3c6d14d2 ("datapath: introduce nf_conntrack_helper_put function")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-09-21 10:14:11 -07:00
Yi-Hung Wei
b7ab73163c datapath: compat: Fix build on RHEL 7.4
RHEL 7.4 introduces netdev_master_upper_dev_link_rh() that breaks the
backport of OVS kernel module on RHEL 7.4. This patch fixes that issue.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-08-23 16:26:43 -07:00
Paul Blakey
2517aa5a6a compat: Update tc compatibility header
Update to include up to flower ttl matching.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-08-11 11:31:20 -07:00
Christian Ehrhardt
96195c09cd acinclude: Also support pkg-config for configuring dpdk.
If available use dpdk pkg-config info of libdpdk to set the right
include paths.
That for example, allows packagers to provide non default include
paths in a common way (pkg-config).

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Suggested-by: Luca Boccassi <luca.boccassi@gmail.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-07 11:42:14 -07:00
Greg Rose
b03d058723 acinclude.m4: Support Linux kernel 4.12
Allow datapath kernel modules to be configured and built for kernels up
to 4.12.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-24 11:25:39 -07:00
Joe Stringer
aadd6ae9e8 compat: net: store port/representator id in metadata_dst.
Upstream commit:
    commit 3fcece12bc1b6dcdf0986f2cd9e8f63b1f9b6aa0
    Author: Jakub Kicinski <jakub.kicinski@netronome.com>
    Date: Fri Jun 23 22:11:58 2017 +0200

    net: store port/representator id in metadata_dst

    Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
    representators and control messages over single set of hardware queues.
    Control messages and muxed traffic may need ordered delivery.

    Those requirements make it hard to comfortably use TC infrastructure today
    unless we have a way of attaching metadata to skbs at the upper device.
    Because single set of queues is used for many netdevs stopping TC/sched
    queues of all of them reliably is impossible and lower device has to
    retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
    the fastpath.

    This patch attempts to enable port/representative devs to attach metadata
    to skbs which carry port id.  This way representatives can be queueless and
    all queuing can be performed at the lower netdev in the usual way.

    Traffic arriving on the port/representative interfaces will be have
    metadata attached and will subsequently be queued to the lower device for
    transmission.  The lower device should recognize the metadata and translate
    it to HW specific format which is most likely either a special header
    inserted before the network headers or descriptor/metadata fields.

    Metadata is associated with the lower device by storing the netdev pointer
    along with port id so that if TC decides to redirect or mirror the new
    netdev will not try to interpret it.

    This is mostly for SR-IOV devices since switches don't have lower netdevs
    today.

    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Upstream: 3fcece12bc1b ("net: store port/representator id in metadata_dst")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
2017-07-24 11:25:34 -07:00
Greg Rose
143656435c datapath: get rid of redundant vxlan_dev.flags
Upstream commit:
    commit dc5321d79697db1b610c25fa4fad1aec7533ea3e
    Author: Matthias Schiffer <mschiffer@universe-factory.net>
    Date:   Mon Jun 19 10:03:56 2017 +0200

    vxlan: get rid of redundant vxlan_dev.flags

    There is no good reason to keep the flags twice in vxlan_dev and
    vxlan_config.

    Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Applied using HAVE_VXLAN_DEV_CFG compatibility flag defined in
acinclude.m4.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-24 11:25:34 -07:00
Joe Stringer
0ace0a260f compat: convert many more places to skb_put_zero().
Upstream commit:
    commit de77b966ce8adcb4c58d50e2f087320d5479812a
    Author: Johannes Berg <johannes.berg@intel.com>
    Date: Fri Jun 16 14:29:19 2017 +0200

    networking: convert many more places to skb_put_zero()

    There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

        @@
        identifier p, p2;
        expression len;
        expression skb;
        type t, t2;
        @@
        (
        -p = skb_put(skb, len);
        +p = skb_put_zero(skb, len);
        |
        -p = (t)skb_put(skb, len);
        +p = skb_put_zero(skb, len);
        )
        ... when != p
        (
        p2 = (t2)p;
        -memset(p2, 0, len);
        |
        -memset(p, 0, len);
        )

        @@
        type t, t2;
        identifier p, p2;
        expression skb;
        @@
        t *p;
        ...
        (
        -p = skb_put(skb, sizeof(t));
        +p = skb_put_zero(skb, sizeof(t));
        |
        -p = (t *)skb_put(skb, sizeof(t));
        +p = skb_put_zero(skb, sizeof(t));
        )
        ... when != p
        (
        p2 = (t2)p;
        -memset(p2, 0, sizeof(*p));
        |
        -memset(p, 0, sizeof(*p));
        )

        @@
        expression skb, len;
        @@
        -memset(skb_put(skb, len), 0, len);
        +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Use e45a79da863c ("skbuff/mac80211: introduce and use skb_put_zero()")
as the basis for the backported function.

Upstream: de77b966ce8a ("networking: convert many more places to skb_put_zero()")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
2017-07-24 11:25:10 -07:00
Greg Rose
7f15e8dd25 datapath: Fix inconsistent teardown and release of private netdev state.
Upstream commit:
    commit cf124db566e6b036b8bcbe8decbed740bdfac8c6
    Author: David S. Miller <davem@davemloft.net>
    Date:   Mon May 8 12:52:56 2017 -0400

    net: Fix inconsistent teardown and release of private netdev state.

    Network devices can allocate reasources and private memory using
    netdev_ops->ndo_init().  However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally does also a free_netdev().

    This creates a problem for the logic in register_netdevice().
    Because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller <davem@davemloft.net>

Applied the portion of the commit applicable to openvswitch.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-24 11:24:32 -07:00
Joe Stringer
a0c9fedc4f datapath: more accurate checksumming in queue_userspace_packet()
Upstream commit:
    commit 7529390d08f07fbf9b0174c5a87600b5caa1a8e8
    Author: Davide Caratti <dcaratti@redhat.com>
    Date:   Thu May 18 15:44:42 2017 +0200

    openvswitch: more accurate checksumming in queue_userspace_packet()

    if skb carries an SCTP packet and ip_summed is CHECKSUM_PARTIAL, it needs
    CRC32c in place of Internet Checksum: use skb_csum_hwoffload_help to avoid
    corrupting such packets while queueing them towards userspace.

    Signed-off-by: Davide Caratti <dcaratti@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-24 11:24:32 -07:00
Greg Rose
ac8e3c6d14 datapath: introduce nf_conntrack_helper_put function
Upstream commit:
    commit d91fc59cd77c719f33eda65c194ad8f95a055190
    Author: Liping Zhang <zlpnobody@gmail.com>
    Date:   Sun May 7 22:01:55 2017 +0800

    netfilter: introduce nf_conntrack_helper_put helper function

    And convert module_put invocation to nf_conntrack_helper_put, this is
    prepared for the followup patch, which will add a refcnt for cthelper,
    so we can reject the deleting request when cthelper is in use.

    Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Applied with additional use of HAVE_NF_CONNTRACK_HELPER_PUT compatibility
flag defined in acinclude.m4.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-24 11:24:32 -07:00