GNU (at least) printf interprets -I as an option, but we want to print it
literally, so use %s.
CC: YAMAMOTO Takashi <yamamoto@ovn.org>
Fixes: 27d41afaa446 ("acinclude.m4: Avoid echo -n")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
-n option for echo is not portable. Use printf instead.
This fixes OSX build on travis-ci.
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org>
OVS uses extensively clang annotations for thread safety
checks. The ctags tool can't parse them, so they are not
included in the tag file.
This patch improves the configure script to generate a list
of identifiers from the header compiler.h to be ignored by
ctags.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
The check for rte_config.h in acinclude.m4 used AC_CHECK_FILE, but this
macro is intended to check for a file on the host system, not the build
system, which means that it fails unconditionally in a cross-compilation
environment. However, the intended check here is for a header file,
which is part of the build system. To check for part of the build system,
we can just use "test", so this commit makes that change.
Reported-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/329994.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
flake8 checking is useful. Until now, it always failed the build for any
flake8 errors. This is too aggressive, for the same reason that always
failing the build for any compiler warnings is too aggressive: compilers
change over time and asynchronously from OVS itself. Thus, if we release
some version of OVS today, even if it's flake8-clean today, it might not
be flake8-clean tomorrow, even with the same settings. We don't want to
have to track flake8 warnings on every release branch.
Thus, this adopts the same policy for compiler warnings: always report
them, but only fail the build if --enable-Werror was configured. Usually
just developers use that configure option, and they're prepared to deal
with the fallout.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
The attribute __ro_after_init was introduced in Linux kernel 4.5. If
a data structure is given this attribute then after the driver module
loads the memory page where the data resides will be marked read only.
The compat code in cache.h always defines __ro_after_init if it is not
already defined so that it can be used as an attribute for the datapath
genl_family structure definitions. If __ro_after_init is defined then
it is used "as-is" where it will apply the read only attribute after
driver initialization.
This is incorrect usage for the Generic Netlink genl_family structure
definitions prior to Linux kernel 4.10. The genl_family structure
in those kernels includes a list header member that will be written
to when the generic netlink family is unregistered. This will cause
a subsequent page fault and kernel panic because at this time the
genl_family structure data has been marked read only in the page
descriptor.
A new compat macro is introduced in acinclude.m4 to detect when the
genl_family structure has the family_list list header as a member.
In this case HAVE_GENL_FAMILY_LIST is defined and if __ro_after_init
is also defined then it is undefined and redefined as empty. This
will prevent the genl_family data structure from being marked read
only in kernels 4.5 through 4.9 and thus prevent the page fault when
the generic netlink families in datapath.c are unregistered.
[Committer notes]
* Rolled a short explanation comment into the code.
Fixes: ba63fe260bd5 ("datapath: Allow compile against current net-next.")
CC: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Added compatibility headers for actions vlan and tunnel key.
Do not use compat code when compiling kernel datapath
there is no need for it as TC compatibility is not provided there.
In other words, the compat code is only used when compiling user-space
code against old kernel headers.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Upstream commits cc41c84b7e7f ("netfilter: kill the fake untracked
conntrack objects") and ab8bc7ed864b ("netfilter: remove
nf_ct_is_untracked") removed the 'untracked' conntrack objects and
functions. The latter commit removes the usage of nf_ct_is_untracked()
from OVS. However, older kernels still have a representation of
'untracked' CT objects so the code needs to remain until the kernel
support is bumped to Linux 4.12 or newer. Introduce a macro to detect
this symbol and wrap these lines in the macro check.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
This patch backports the following two upstream commits to fix MPLS GSO in
ovs datapath. Starting from upstream commit 48d2ab609b6b ("net: mpls: Fixups
for GSO"), the mpls_gso kernel module relies on the fact that
skb_network_header() points to the mpls header and skb_inner_network_header()
points to the L3 header so that it can derive the length of mpls header
correctly, and the upstream commit updates how ovs datapath marks the skb
header when push and pop mpls. However, the old mpls_gso kernel module
assumes that the skb_network_header() points to the L3 header, and the old
mpls_gso kernel module will misbehave if the ovs datapath marks the
skb_network_header() in the new way since it will treat mpls header as the L3
header.
Because of the functional signature of mpls_gso_segment() does not change,
this backport patch uses the new mpls_hdr() to determine if the kernel that
ovs datapath is compiled with has the new or legacy mpls_gso kernel module.
It has been tested on kernel 4.4 and 4.9.
Upstream commit:
commit 48d2ab609b6bbecb7698487c8579bc40de9d6dfa
Author: David Ahern <dsa@cumulusnetworks.com>
Date: Wed Aug 24 20:10:44 2016 -0700
net: mpls: Fixups for GSO
As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:
1. the inner protocol is not set so the gso segment functions for inner
protocol layers are not getting run, and
2 MPLS labels for packets that use the "native" (non-OVS) MPLS code
are not properly accounted for in mpls_gso_segment.
The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.
This patch sets the inner skb protocol addressing item 1 above and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.
>From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb.
Afterward the inner protocol segmentation is done the skb protocol
is set to mpls for each segment and the network and mac headers
restored.
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 85de4a2101acb85c3b1dde465e84596ccca99f2c
Author: Jiri Benc <jbenc@redhat.com>
Date: Fri Sep 30 19:08:07 2016 +0200
openvswitch: use mpls_hdr
skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
instead.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Use the acinclude.m4 configuration file to check for the net parameter
that was added to the ipv4 and ipv6 frags init functions in the 4.10
Linux kernel to check whether DEFRAG_ENABLE_TAKES_NET should be set and
then check for that at compile time.
This is an alternative solution patch for the issue reported by Raymond
Burkholder and the patch submitted by Guoshuai Li.
[Committer notes]
Squash in "acinclude.m4: Add check for struct net parameter" which
provides the HAVE_DEFRAG_ENABLE_TAKES_NET.
Reported-by: Raymond Burkholder <ray@oneunified.net>
CC: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Upstream commit:
commit fceb6435e85298f747fee938415057af837f5a8a
Author: Johannes Berg <johannes.berg@intel.com>
Date: Wed Apr 12 14:34:07 2017 +0200
netlink: pass extended ACK struct to parsing functions
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Upstream commit:
ipv6: orphan skbs in reassembly unit
Andrey reported a use-after-free in IPv6 stack.
Issue here is that we free the socket while it still has skb
in TX path and in some queues.
It happens here because IPv6 reassembly unit messes skb->truesize,
breaking skb_set_owner_w() badly.
We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag:
Always orphan skbs inside ip_defrag()")
Acked-by: Joe Stringer <joe@ovn.org>
==================================================================
BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
Read of size 8 at addr ffff880062da0060 by task a.out/4140
page:ffffea00018b6800 count:1 mapcount:0 mapping: (null)
index:0x0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
page dumped because: kasan: bad access detected
CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15
dump_stack+0x292/0x398 lib/dump_stack.c:51
describe_address mm/kasan/report.c:262
kasan_report_error+0x121/0x560 mm/kasan/report.c:370
kasan_report mm/kasan/report.c:392
__asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
sock_flag ./arch/x86/include/asm/bitops.h:324
sock_wfree+0x118/0x120 net/core/sock.c:1631
skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put ./include/net/inet_frag.h:133
nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn ./include/linux/netfilter.h:102
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook ./include/linux/netfilter.h:212
__ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613
rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x620 net/socket.c:848
new_sync_write fs/read_write.c:499
__vfs_write+0x483/0x760 fs/read_write.c:512
vfs_write+0x187/0x530 fs/read_write.c:560
SYSC_write fs/read_write.c:607
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
RIP: 0033:0x7ff26e6f5b79
RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
The buggy address belongs to the object at ffff880062da0000
which belongs to the cache RAWv6 of size 1504
The buggy address ffff880062da0060 is located 96 bytes inside
of 1504-byte region [ffff880062da0000, ffff880062da05e0)
Freed by task 4113:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
slab_free_hook mm/slub.c:1352
slab_free_freelist_hook mm/slub.c:1374
slab_free mm/slub.c:2951
kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
sk_prot_free net/core/sock.c:1377
__sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sk_free+0x23/0x30 net/core/sock.c:1479
sock_put ./include/net/sock.h:1638
sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
sock_release+0x8d/0x1e0 net/socket.c:599
sock_close+0x16/0x20 net/socket.c:1063
__fput+0x332/0x7f0 fs/file_table.c:208
____fput+0x15/0x20 fs/file_table.c:244
task_work_run+0x19b/0x270 kernel/task_work.c:116
exit_task_work ./include/linux/task_work.h:21
do_exit+0x186b/0x2800 kernel/exit.c:839
do_group_exit+0x149/0x420 kernel/exit.c:943
SYSC_exit_group kernel/exit.c:954
SyS_exit_group+0x1d/0x20 kernel/exit.c:952
entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
Allocated by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
slab_post_alloc_hook mm/slab.h:432
slab_alloc_node mm/slub.c:2708
slab_alloc mm/slub.c:2716
kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
sk_alloc+0x105/0x1010 net/core/sock.c:1396
inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
__sock_create+0x4f6/0x880 net/socket.c:1199
sock_create net/socket.c:1239
SYSC_socket net/socket.c:1269
SyS_socket+0xf9/0x230 net/socket.c:1249
entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
Memory state around the buggy address:
ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a bugfix, and will be progressively backported to earlier
kernels. If it is backported to any kernel 4.5 through 4.10, then users
use that updated kernel with the OVS kernel module prior to this patch, it
could cause a crash. The compat code here resolves such issues.
Upstream: 48cac18ecf1d ("ipv6: orphan skbs in reassembly unit")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Change the Linux kernel tests in OVS configuration.
While the backports may still be a little behind, it is useful to be
able to test the OVS tree kernel module with the upstream net-next
kernel.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Upstream commit:
commit f330a7fdbe1611104622faff7e614a246a7d20f0
Author: Florian Westphal <fw@strlen.de>
Date: Thu Aug 25 15:33:31 2016 +0200
netfilter: conntrack: get rid of conntrack timer
With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.
Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de657f8, 'timers: Switch to
a non-cascading wheel').
Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.
During conntrack lookup, even before doing tuple comparision, check
the timeout value and evict the entry in case it is too old.
The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.
Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.
This is the standard/expected way when conntrack entries are destroyed.
Followup patches will introduce garbage colliction via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream commit f330a7fdbe16 ("netfilter: conntrack: get rid of
conntrack timer") changes the way nf_ct_delete() is called. Prior to
commit the call pattern was like this:
if (del_timer(&ct->timeout))
nf_ct_delete(ct, ...);
After this change nf_ct_delete() is called directly:
nf_ct_delete(ct, ...);
This patch provides a replacement implementation for nf_ct_delete()
that first calls the del_timer(). This replacement is only used if
the struct nf_conn has member 'timeout' of type 'struct timer_list'.
The following patch introduces the first caller to nf_ct_delete() in
the OVS kernel module.
Linux <3.12 does not have nf_ct_delete() at all, so we inline it if it
does not exist. The inlined code is from 3.11 death_by_timeout(),
which in later versions simply calls nf_ct_delete().
Upstream commit 02982c27ba1e1bd9f9d4747214e19ca83aa88d0e introduced
nf_ct_delete() in Linux 3.12. This commit has the original code that
is being inlined here.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Upstream commit:
commit c74454fadd5ea6fc866ffe2c417a0dba56b2bf1c
Author: Florian Westphal <fw@strlen.de>
Date: Mon Jan 23 18:21:57 2017 +0100
netfilter: add and use nf_ct_set helper
Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
OVS in-tree datapath compiles against Linux 4.10 kernel, so allow it.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Open vSwitch has documented a requirement for GNU Make for a long time, yet
it had vestiges catering to other make implementations. This removes
those.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
Upstream commit:
commit 5108bbaddc37c1c8583f0cf2562d7d3463cd12cb
Author: Jiri Benc <jbenc@redhat.com>
Date: Thu Nov 10 16:28:21 2016 +0100
openvswitch: add processing of L3 packets
Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).
Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Upstream commit:
commit 61e84623ace35ce48975e8f90bbbac7557c43d61
Author: Jarod Wilson <jarod@redhat.com>
Date: Fri Oct 7 22:04:33 2016 -0400
net: centralize net_device min/max MTU checking
While looking into an MTU issue with sfc, I started noticing that almost
every NIC driver with an ndo_change_mtu function implemented almost
exactly the same range checks, and in many cases, that was the only
practical thing their ndo_change_mtu function was doing. Quite a few
drivers have either 68, 64, 60 or 46 as their minimum MTU value checked,
and then various sizes from 1500 to 65535 for their maximum MTU value. We
can remove a whole lot of redundant code here if we simple store min_mtu
and max_mtu in net_device, and check against those in net/core/dev.c's
dev_set_mtu().
In theory, there should be zero functional change with this patch, it just
puts the infrastructure in place. Subsequent patches will attempt to start
using said infrastructure, with theoretically zero change in
functionality.
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 91572088e3fdbf4fe31cf397926d8b890fdb3237
Author: Jarod Wilson <jarod@redhat.com>
Date: Thu Oct 20 13:55:20 2016 -0400
net: use core MTU range checking in core net infra
...
openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
is the largest possible size supported
...
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Upstream commit:
commit 425df17ce3a26d98f76e2b6b0af2acf4aeb0b026
Author: Jarno Rajahalme <jarno@ovn.org>
Date: Tue Feb 14 21:16:28 2017 -0800
openvswitch: Set internal device max mtu to ETH_MAX_MTU.
Commit 91572088e3fd ("net: use core MTU range checking in core net
infra") changed the openvswitch internal device to use the core net
infra for controlling the MTU range, but failed to actually set the
max_mtu as described in the commit message, which now defaults to
ETH_DATA_LEN.
This patch fixes this by setting max_mtu to ETH_MAX_MTU after
ether_setup() call.
Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This backport detects the new max_mtu field in the struct netdevice
and uses the upstream code if it exists, and local backport code if
not. The latter case is amended with bounds checks with new upstream
macros ETH_MIN_MTU and ETH_MAX_MTU and the corresponding error
messages from the upstream commit.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Joe Stringer <joe@ovn.org>
Upstream commit:
commit fe19c4f971a55cea3be442d8032a5f6021702791
Author: Eric Garver <e@erig.me>
Date: Wed Sep 7 12:56:58 2016 -0400
This is to simplify using double tagged vlans. This function allows all
valid vlan ethertypes to be checked in a single function call.
Also replace some instances that check for both ETH_P_8021Q and
ETH_P_8021AD.
Patch based on one originally by Thomas F Herbert.
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
Upstream commit:
commit f5a7fb88e1f82542ca14ba93a1d4fa35471c60ca
Author: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Date: Fri Mar 27 14:31:11 2015 +0900
vlan: Introduce helper functions to check if skb is tagged
Separate the two checks for single vlan and multiple vlans in
netif_skb_features(). This allows us to move the check for multiple
vlans to another function later.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
RHEL 7.3 provides upstream tunnel but it does not support name_assign_type
attribute in net-device. This patch fixes the build problem by backporting
functions with name_assign_type, and using proper flags in acinclude.m4 to
invoke backport functions.
Tested on RHEL 7.3 with kernel 3.10.0-514.el7.x86_64
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
GCC 6.1 warns that -Wformat-security has no effect without -Wformat, so
this commit fixes the problem.
The change to _OVS_CHECK_CC_OPTION is needed so that the cache variable
name doesn't end up with a space in it, which obviously doesn't work.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
AC_LANG_PROGRAM(,) uses a program like this:
int main() { return 0; }
but that triggers warnings for -Wstrict-prototypes and for
-Wold-style-definition, since this definition of main() lacks a prototype
and is therefore old-style. This meant that -Wstrict-prototypes and
-Wold-style-definition weren't being turned on for new-enough GCC. This
commit fixes the problem by changing the program that is test-compiled to:
int x;
which doesn't make any compilers mad, as far as I know.
I recently upgraded to GCC 6.1 and just now noticed the issue, so I think
that GCC somewhere between version 4.9 and version 6.1 must have started
warning about main() when it's declared this way.
Also, fix a few functions that lacked prototypes.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
This patch allows openvswitch kernel module in the OVS tree to be
compiled against the current net-next Linux kernel. The changes are
due to these upstream commits:
56989f6d856 ("genetlink: mark families as __ro_after_init")
489111e5c25 ("genetlink: statically initialize families")
a07ea4d9941 ("genetlink: no longer support using static family IDs")
struct genl_family initialization is changed be completely static and
to include the new (in Linux 4.6) __ro_after_init attribute. Compat
code defines it as an empty macro if not defined already.
GENL_ID_GENERATE is no longer defined, but since it was defined as 0,
it is safe to drop it from all initializers also on older Linux
versions. A compiletime_assert is added to make sure this is true
whenever GENL_ID_GENERATE is defined.
Tested with current Linux net-next (4.9) and 3.16.
It should be noted that there are still a number of fixes and new
features in upstream net-next that are yet to be backported.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Completely unrelated, but annoying. Let's fix it up.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Russell Bryant <russell@ovn.org>
Port upstream change in conntrack labels extension. Add a new
configure macro HAVE_NF_CONN_LABELS_WITH_WORDS to detect the old
definition. Unfortunately there is no conntrack API to hide the
difference, so the this makes conntrack.c deviate from upstream source
a bit.
Upstream commit:
commit 23014011ba4209a086931ff402eac1c41abbe456
Author: Florian Westphal <fw@strlen.de>
Date: Thu Jul 21 12:51:16 2016 +0200
netfilter: conntrack: support a fixed size of 128 distinct labels
The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.
We track size of each label storage area in the 'words' member.
But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.
As most arches are 64bit we need to allocate 24 bytes in this case:
struct nf_conn_labels {
u8 words; /* 0 1 */
/* XXX 7 bytes hole, try to pack */
long unsigned bits[2]; /* 8 24 */
Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.
We still only allocate the extension if its needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
If NUMA information can't be derived from a vHost User device, only
print an error if the VHOST_NUMA option is enabled in DPDK. Otherwise
'fail' silently.
Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This basically backport commit:
commit 179bc67f69b6cb53ad68cfdec5a917c2a2248355
Author: Edward Cree <ecree@solarflare.com>
Date: Thu Feb 11 20:48:04 2016 +0000
net: local checksum offload for encapsulation
The arithmetic properties of the ones-complement checksum mean that a
correctly checksummed inner packet, including its checksum, has a ones
complement sum depending only on whatever value was used to initialise
the checksum field before checksumming (in the case of TCP and UDP,
this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
packed data not covered by the inner checksum, and the initial value of
the inner checksum field.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
This commit removes the 'dpdkvhostcuse' port type from the userspace
datapath. vhost-cuse ports are quickly becoming obsolete as the
vhost-user port type begins to support a greater feature-set thanks to
the addition of things like vhost-user multiqueue and potential
upcoming features like vhost-user client-mode and vhost-user reconnect.
The feature is also expected to be removed from DPDK soon.
One potential drawback of the removal of this support is that a
userspace vHost port type is not available in OVS for use with older
versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old
this should however be a low impact change.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
This commit provides the ability to 'listen' on DPDK ports and save
packets to a pcap file with a DPDK app that uses the librte_pdump
library. One such app is the 'pdump' app that can be found in the DPDK
'app' directory. Instructions on how to use this can be found in
INSTALL.DPDK-ADVANCED.md
Pdump capability in OVS with DPDK will only be initialised if the
CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are
set in DPDK. libpcap is required if the above configuration is used.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Older kernel skb_scrub_packet() has bug which resets skb mark for
all packet. It is fixed during 3.18 release where it is reset
only for packets crossing namespace. So OVS is forced to use
compat skb_scrub_packet() on older kernel.
This is related to upstream bug fix commit ca7c7b9059e3
("skbuff: Do not scrub skb mark within the same name space").
VMware-BZ: #1710701
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Remove mutual exclusion between udp-gro registration and geneve receive port
registration.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Prior to this patch, OVS with DPDK required the libnuma packages to
build. This patch removes this dependency, making it only a requirement
when the CONFIG_RTE_LIBRTE_VHOST_NUMA option is detected as enabled in
the DPDK build.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
OVS turns on tunnel GSO for statically for kernel older than 3.18.
Some distributions kernel could backport tunnel GSO. To make use
of device offload on such kernel detect the support at configure
stage.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
In kernels <=3.16 there is an LRU for managing fragment queues for IPv4
and IPv6. Because the backport code comes from more recent upstream
versions of Linux, this LRU management was missing from ip_frag_queue()
and nf_ct_frag6_queue().
Fixes: 595e069a0634 ("compat: Backport IPv4 reassembly.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
The core fragmentation handling logic is exported on all supported
kernels, so it's not necessary to backport the latest version of this.
This greatly simplifies the code due to inconsistencies between the old
per-lookup garbage collection and the newer workqueue based garbage
collection.
As a result of simplifying and removing unnecessary backport code, a few
bugs are fixed for corner cases such as when some fragments remain in
the fragment cache when openvswitch is unloaded.
Some backported ip functions need a little extra logic than what is seen
on the latest code due to this, for instance on kernels <3.17:
* Call inet_frag_evictor() before defrag
* Limit hashsize in ip{,6}_fragment logic
The pernet init/exit logic also differs a little from upstream. Upstream
ipv[46]_defrag logic initializes the various pernet fragment parameters
and its own global fragments cache. In the OVS backport, the pernet
parameters are shared while the fragments cache is separate. The
backport relies upon upstream pernet initialization to perform the
shared setup, and performs no pernet initialization of its own. When it
comes to pernet exit however, the backport must ensure that all
OVS-specific fragment state is cleared, while the shared state remains
untouched so that the regular ipv[46] logic may do its own cleanup. In
practice this means that OVS must have its own divergent implementation
of inet_frags_exit_net().
Fixes the following crash:
Call Trace:
<IRQ>
[<ffffffff810744f6>] ? call_timer_fn+0x36/0x100
[<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0
[<ffffffff8106cccc>] __do_softirq+0xec/0x2c0
[<ffffffff8106d215>] irq_exit+0x105/0x110
[<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80
<EOI>
[<ffffffff8104f596>] ? native_safe_halt+0x6/0x10
[<ffffffff8101cb2f>] default_idle+0x1f/0xc0
[<ffffffff8101d406>] arch_cpu_idle+0x26/0x30
[<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290
[<ffffffff810415ed>] start_secondary+0x21d/0x2d0
Code: Bad RIP value.
RIP [<ffffffffa0177480>] 0xffffffffa0177480
RSP <ffff88003f703e78>
CR2: ffffffffa0177480
---[ end trace eb98ca80ba07bd9c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Most of patch iron out USE_UPSTREAM_TUNNEL case where datapath
directly use upstream tunneling modules.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Amitabha Biswas <abiswas@us.ibm.com>
In upstream linux kernel networking stack udp_set_csum() is called
with only udp header applied but in case of compat layer it can
be called with IP header. So following patch take the offset into
account.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Most of changes are related to ip-fragment API and genetlink
API changes.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
API changes are related commit:
openvswitch: Revert: "Enable memory mapped Netlink i/o"
revert commit 795449d8b846 ("openvswitch: Enable memory mapped Netlink i/o").
Following the mmaped netlink removal this code can be removed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
commit f35423c137b0e64155f52c166db1d13834a551f2
Author: Fabian Frederick <fabf@skynet.be>
openvswitch: use PTR_ERR_OR_ZERO
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
commit 3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7
Author: Paolo Abeni <pabeni@redhat.com>
ovs: propagate per dp max headroom to all vports
This patch implements bookkeeping support to compute the maximum
headroom for all the devices in each datapath. When said value
changes, the underlying devs are notified via the
ndo_set_rx_headroom method.
This also increases the internal vports xmit performance.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
commit 66c7a5ee1a6b7c69d41dfd68d207fdd54efba56a
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
ovs: align nlattr properly when needed
I also fix commit 8b32ab9e6ef1: use nla_total_size_64bit() for
OVS_FLOW_ATTR_USED in ovs_flow_cmd_msg_size().
Fixes: 8b32ab9e6ef1 ("ovs: use nla_put_u64_64bit()")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
OVS has GSO compat functionality which needs inner offset
of the packet to segment a packet. older kernel did not
include these offsets in skb, therefore these were stored
in OVS_GSO_CB. Now OVS has dropped support for these
old kernel, So none of the supported kernel needs this
comapt code. Following patch removes it.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>