The patch adds support for using need_wakeup flag in AF_XDP rings.
A new option, use-need-wakeup, is added. When this option is used,
it means that OVS has to explicitly wake up the kernel RX, using poll()
syscall and wake up TX, using sendto() syscall. This feature improves
the performance by avoiding unnecessary sendto syscalls for TX.
For RX, instead of kernel always busy-spinning on fille queue, OVS wakes
up the kernel RX processing when fill queue is replenished.
The need_wakeup feature is merged into Linux kernel bpf-next tee with commit
77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
OVS enables it by default, if libbpf supports it. If users enable it but
runs in an older version of libbpf, then the need_wakeup feature has no effect,
and a warning message is logged.
For virtual interface, it's better set use-need-wakeup=false, since
the virtual device's AF_XDP xmit is synchronous: the sendto syscall
enters kernel and process the TX packet on tx queue directly.
On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
to physical port improves from 6.1Mpps to 7.3Mpps.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There is a race window between getting the time and getting the meter
lock. This could lead to situation where the thread with larger
current time (this thread called time_{um}sec() later than others)
will acquire meter lock first and update meter->used to the large
value. Next threads will try to calculate time delta by subtracting
the large meter->used from their lower time getting the negative value
which will be converted to a big unsigned delta.
Fix that by assuming that all these threads received packets in the
same time in this case, i.e. dropping negative delta to 0.
CC: Jarno Rajahalme <jarno@ovn.org>
Fixes: 4b27db644a ("dpif-netdev: Simple DROP meter implementation.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363126.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Mixing of RSS hash with recirculation depth is useful for flow lookup
because same packet after recirculation should match with different
datapath rule. Setting of the mixed value back to the packet is
completely unnecessary because recirculation depth is different on
each recirculation, i.e. we will have different packet hash for
flow lookup anyway.
This should fix the issue that packets from the same flow could be
directed to different buckets based on a dp_hash or different ports of
a balanced bonding in case they were recirculated different number of
times (e.g. due to conntrack rules).
With this change, the original RSS hash will remain the same making
it possible to calculate equal dp_hash values for such packets.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363127.html
Fixes: 048963aa85 ("dpif-netdev: Reset RSS hash when recirculating.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
This function allows an environment variable to be included in
command-line parsing. It will receive its first user in an
upcoming commit.
Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Memory leak happens while converting existing database into new
database according to the specified schema (ovsdb-client convert
new-schema). Memory leak was detected by valgrind while executing
functional test "schema conversion online - clustered"
==16202== 96 bytes in 6 blocks are definitely lost in loss record 326 of 399
==16202== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16202== by 0x44A5D4: xmalloc (util.c:138)
==16202== by 0x4377A6: alloc_default_atoms (ovsdb-data.c:315)
==16202== by 0x437F18: ovsdb_datum_init_default (ovsdb-data.c:918)
==16202== by 0x413D82: ovsdb_row_create (row.c:59)
==16202== by 0x40AA53: ovsdb_convert_table (file.c:220)
==16202== by 0x40AA53: ovsdb_convert (file.c:275)
==16202== by 0x416BE1: ovsdb_trigger_try (trigger.c:255)
==16202== by 0x40D29E: ovsdb_jsonrpc_trigger_create (jsonrpc-server.c:1119)
==16202== by 0x40D29E: ovsdb_jsonrpc_session_got_request (jsonrpc-server.c:986)
==16202== by 0x40D29E: ovsdb_jsonrpc_session_run (jsonrpc-server.c:556)
==16202== by 0x40D29E: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==16202== by 0x40D29E: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==16202== by 0x40682E: main_loop (ovsdb-server.c:209)
==16202== by 0x40682E: main (ovsdb-server.c:460)
The problem was in ovsdb_datum_convert() function, which overrides
pointers to datum memory allocated in ovsdb_row_create() function.
Fix was done by freeing this memory before ovsdb_datum_convert()
is called.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since Python 3 is now mandatory, Extra Packages for Enterprise Linux
(EPEL) repository is needed in order to build OVS on RHEL7.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
parse_tcp_flags() does not care about vlan tags in a packet thus
not able to parse them. As a result, if partial offloading is
enabled in userspace datapath vlan packets are not parsed, i.e.
has no initialized offsets. This causes OVS crash on any attempt
to access/modify packet header fields.
For example, having the flow with following actions:
in_port=1,ip,actions=mod_nw_src:192.168.0.7,output:IN_PORT
will lead to OVS crash on vlan packet handling:
Process terminating with default action of signal 11 (SIGSEGV)
Invalid read of size 4
at 0x785657: get_16aligned_be32 (unaligned.h:249)
by 0x785657: odp_set_ipv4 (odp-execute.c:82)
by 0x785657: odp_execute_masked_set_action (odp-execute.c:527)
by 0x785657: odp_execute_actions (odp-execute.c:894)
by 0x74CDA9: dp_netdev_execute_actions (dpif-netdev.c:7355)
by 0x74CDA9: packet_batch_per_flow_execute (dpif-netdev.c:6339)
by 0x74CDA9: dp_netdev_input__ (dpif-netdev.c:6845)
by 0x74DB6E: dp_netdev_input (dpif-netdev.c:6854)
by 0x74DB6E: dp_netdev_process_rxq_port (dpif-netdev.c:4287)
by 0x74E863: dpif_netdev_run (dpif-netdev.c:5264)
by 0x703F57: type_run (ofproto-dpif.c:370)
by 0x6EC8B8: ofproto_type_run (ofproto.c:1760)
by 0x6DA52B: bridge_run__ (bridge.c:3188)
by 0x6E083F: bridge_run (bridge.c:3252)
by 0x1642E4: main (ovs-vswitchd.c:127)
Address 0xc is not stack'd, malloc'd or (recently) free'd
Fix that by properly parsing vlan tags first. Function 'parse_dl_type'
transformed for that purpose as it had no users anyway.
Added unit test for packet modification with partial offloading that
triggers above crash.
Fixes: aab96ec4d8 ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
This patch provides essential fixes for OVS to support
RHEL7.7's new kernel.
make rpm-fedora-kmod \
RPMBUILD_OPT='-D "kversion 3.10.0-1062.1.2.el7.x86_64"'
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Presently, replication is not allowed if there is a schema version mismatch between
the schema returned by the active ovsdb-server and the local db schema. This is
causing failures in OVN DB HA deployments during uprades.
In the case of OpenStack tripleo deployment with OVN, OVN DB ovsdb-servers are
deployed on a multi node controller cluster in active/standby mode. During
minor updates or major upgrades, the cluster is updated one at a time. If
a node A is running active OVN DB ovsdb-servers and when it is updated, another
node B becomes active. After the update when OVN DB ovsdb-servers in A are started,
these ovsdb-servers fail to replicate from the active if there is a schema
version mismatch.
This patch addresses this issue by allowing replication even if there is a
schema version mismatch only if all the active db schema tables and its colums are
present in the local db schema.
This should not result in any data loss.
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Issue:
When LLDP is enabled on a port, a structure to hold LLDP related state
is created and that structure has a reference to the port. The ofproto
monitor thread accesses the LLDP structure to periodically send packets
over the associated port. When the port is deleted, the LLDP structure
is not cleaned up and it continues to refer to the deleted port.
When the monitor thread attempts to access the deleted port OVS crashes.
Crash can happen with bridge delete and bond delete also.
Fix:
Remove all references to the LLDP structure and free it when
the port is deleted.
Signed-off-by: Surya Rudra <rudrasurya.r@altencalsoftlabs.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The DPDK library allows OVS fast access to packet I/O in userspace. It
is not a datapath. This commit avoids using that term.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Some users would find it useful to know the particular OVS version that
introduced a feature to the OVS tree kernel module or to the OVS
userspace (DPDK) datapath implementation. This patch updates the FAQ
to include that information.
This information is primarily gleaned from the top-level NEWS file.
For most of these, I did not verify them by looking carefully through
the history, so some of them may be inaccurate, although a few people
made corrections in review.
Requested-by: Jianjun Shen <shenj@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
It is helpful in reporting main thread that is yet to enable bond slave,
but link state was brought up by lacp thread and capture this desync
between ovs threads for debugging.
Fixes: a8448cb170 ("lacp: Avoid packet drop on LACP bond after link up")
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The tcpundump tool expects all packets to be a length which aligns to
exactly a 4-nibble boundary. This means packets like DNS requests will be
stripped before being correctly processed. Fix this by allowing at least
two nibbles (or one byte) alignment.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Running 'ovs-tcpundump -V' will cause ovs-tcpundump to start processing on
stdin. Instead, print the version and exit.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
'tx_q' array is allocated for each DPDK netdev. 'struct dpdk_tx_queue'
is 8 bytes long, so 8 tx queues are sharing the same cache line in
case of 64B cacheline size. This causes 'false sharing' issue in
mutliqueue case because taking the spinlock implies write to memory
i.e. cache invalidation.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
We can't easily update the kernel on TravisCI to run system tests
with AF_XDP, but we could run build tests with libbpf and headers
from newer kernels.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
A previous patch removed python2 support from ovs. So we can remove
this condition and make python3 mandatory for builds. Without this
patch, make rpm-fedora on centos 7 fails unless we pass
RPMBUILD_OPT="--with build_python3".
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Google's oss-fuzz builder bots were complaining that miniflow_target is
too slow to fuzz in that some tests take longer than a second to
complete. This patch fixes this by replacing the random flow generation
within the harness to a more simpler scenario.
Signed-off-by: Bhargava Shastry <bshas3@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Upstream commit:
commit 248d45f1e1934f7849fbdc35ef1e57151cf063eb
Author: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Fri Oct 4 09:26:44 2019 -0700
openvswitch: Allow attaching helper in later commit
This patch allows to attach conntrack helper to a confirmed conntrack
entry. Currently, we can only attach alg helper to a conntrack entry
when it is in the unconfirmed state. This patch enables an use case
that we can firstly commit a conntrack entry after it passed some
initial conditions. After that the processing pipeline will further
check a couple of packets to determine if the connection belongs to
a particular application, and attach alg helper to the connection
in a later stage.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Backports the following upstream commit with some backward compatibility
change.
commit f319ca6557c10a711facc4dd60197470796d3ec1
Author: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed May 8 08:52:32 2019 +0200
openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)
Commit 4806e975729f99c7 ("netfilter: replace NF_NAT_NEEDED with
IS_ENABLED(CONFIG_NF_NAT)") removed CONFIG_NF_NAT_NEEDED, but a new user
popped up afterwards.
Fixes: fec9c271b8f1bde1 ("openvswitch: load and reference the NAT helper.")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
upstream commit:
commit ca96534630e2edfd73121c487c957b17eca3b7d7
Author: Colin Ian King <colin.king@canonical.com>
Date: Wed May 1 14:41:58 2019 +0100
openvswitch: check for null pointer return from nla_nest_start_noflag
The call to nla_nest_start_noflag can return null in the unlikely
event that nla_put returns -EMSGSIZE. Check for this condition to
avoid a null pointer dereference on pointer nla_reply.
Addresses-Coverity: ("Dereference null return value")
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit backports the following upstream commit, and two functions
in nf_conntrack_helper.h.
Upstream commit:
commit fec9c271b8f1bde1086be5aa415cdb586e0dc800
Author: Flavio Leitner <fbl@redhat.com>
Date: Wed Apr 17 11:46:17 2019 -0300
openvswitch: load and reference the NAT helper.
This improves the original commit 17c357efe5ec ("openvswitch: load
NAT helper") where it unconditionally tries to load the module for
every flow using NAT, so not efficient when loading multiple flows.
It also doesn't hold any references to the NAT module while the
flow is active.
This change fixes those problems. It will try to load the module
only if it's not present. It grabs a reference to the NAT module
and holds it while the flow is active. Finally, an error message
shows up if either actions above fails.
Fixes: 17c357efe5ec ("openvswitch: load NAT helper")
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch backports the following upstream commit within the
openvswitch kernel module with some checks so that it also works
in the older kernel.
Upstream commit:
commit ef6243acb4782df587a4d7d6c310fa5b5d82684b
Author: Johannes Berg <johannes.berg@intel.com>
Date: Fri Apr 26 14:07:31 2019 +0200
genetlink: optionally validate strictly/dumps
Add options to strictly validate messages and dump messages,
sometimes perhaps validating dump messages non-strictly may
be required, so add an option for that as well.
Since none of this can really be applied to existing commands,
set the options everwhere using the following spatch:
@@
identifier ops;
expression X;
@@
struct genl_ops ops[] = {
...,
{
.cmd = X,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
...
},
...
};
For new commands one should just not copy the .validate 'opt-out'
flags and thus get strict validation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch backports the openvswitch changes and update the compat layer
for the following upstream patch.
commit ae0be8de9a53cda3505865c11826d8ff0640237c
Author: Michal Kubecek <mkubecek@suse.cz>
Date: Fri Apr 26 11:13:06 2019 +0200
netlink: make nla_nest_start() add NLA_F_NESTED flag
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.
Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().
Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:
@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)
@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Starting from the following upstream commit, NF_NAT_NEEDED is replaced
by IS_ENABLED(CONFIG_NF_NAT) in the upstream kernel. This patch makes
some changes so that our in tree ovs kernel module is compatible to
both old and new kernels.
Upstream commit:
commit 4806e975729f99c7908d1688a143f1e16d464e6c
Author: Florian Westphal <fw@strlen.de>
Date: Wed Mar 27 09:22:26 2019 +0100
netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)
NF_NAT_NEEDED is true whenever nat support for either ipv4 or ipv6 is
enabled. Now that the af-specific nat configuration switches have been
removed, IS_ENABLED(CONFIG_NF_NAT) has the same effect.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
upstream patch:
commit fa7e428c6b7ed3281610511a2b2ec716d9894be8
Author: Flavio Leitner <fbl@sysclose.org>
Date: Mon Mar 25 15:58:31 2019 -0300
openvswitch: add seqadj extension when NAT is used.
When the conntrack is initialized, there is no helper attached
yet so the nat info initialization (nf_nat_setup_info) skips
adding the seqadj ext.
A helper is attached later when the conntrack is not confirmed
but is going to be committed. In this case, if NAT is needed then
adds the seqadj ext as well.
Fixes: 16ec3d4fbb96 ("openvswitch: Fix cached ct with helper.")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The following two upstream commits merge nf_nat_ipv4 and nf_nat_ipv6
into nf_nat core, and move some header files around. To handle
these modifications, this patch detects the upstream changes, uses
the header files and config symbols properly.
Ideally, we should replace CONFIG_NF_NAT_IPV4 and CONFIG_NF_NAT_IPV6 with
CONFIG_NF_NAT and CONFIG_IPV6. In order to keep backward compatibility,
we keep the checking of CONFIG_NF_NAT_IPV4/6 as is for the old kernel,
and replace them with marco for the new kernel.
upstream commits:
3bf195ae6037 ("netfilter: nat: merge nf_nat_ipv4,6 into nat core")
d2c5c103b133 ("netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
After upstream net-next commit 303e0c558959 ("netfilter: conntrack:
avoid unneeded nf_conntrack_l4proto lookups") nf_ct_invert_tuplepr()
is no longer available in the kernel.
Ideally, we should be in sync with upstream kernel by calling
nf_ct_invert_tuple() directly in conntrack.c. However,
nf_ct_invert_tuple() has different function signature in older kernel,
and it would be hard to replace that in the compat layer. Thus, we
use rpl_nf_ct_invert_tuple() in conntrack.c and maintain compatibility
in the compat layer so that ovs kernel module runs smoothly in both
new and old kernel.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
upstream commit:
commit a277d516de5f498c91d91189717ef7e01102ad27
Author: Arnd Bergmann <arnd@arndb.de>
Date: Fri Nov 2 16:36:55 2018 +0100
openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS
When CONFIG_CC_OPTIMIZE_FOR_DEBUGGING is enabled, the compiler
fails to optimize out a dead code path, which leads to a link failure:
net/openvswitch/conntrack.o: In function `ovs_ct_set_labels':
conntrack.c:(.text+0x2e60): undefined reference to `nf_connlabels_replace'
In this configuration, we can take a shortcut, and completely
remove the contrack label code. This may also help the regular
optimization.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Upstream commmit:
commit 895b5c9f206eb7d25dc1360a8ccfc5958895eb89
Author: Florian Westphal <fw@strlen.de>
Date: Sun Sep 29 20:54:03 2019 +0200
netfilter: drop bridge nf reset from nf_reset
commit 174e23810cd31
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions. The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.
Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.
This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().
In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.
I am submitting this for "net", because we're still early in the release
cycle. The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Added some compat layer fixups for nf_reset_ct. This is just a portion
of the upstream commit that applies to openvswitch.
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Upstream commit:
commit aef833c58d321f09ae4ce4467723542842ba9faf
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri Jul 19 18:20:13 2019 +0200
net: openvswitch: rename flow_stats to sw_flow_stats
There is a flow_stats structure defined in include/net/flow_offload.h
and a follow up patch adds #include <net/flow_offload.h> to
net/sch_generic.h.
This breaks compilation since OVS codebase includes net/sock.h which
pulls in linux/filter.h which includes net/sch_generic.h.
In file included from ./include/net/sch_generic.h:18:0,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:59,
from ./include/linux/tcp.h:19,
from net/openvswitch/datapath.c:24
This definition takes precedence on OVS since it is placed in the
networking core, so rename flow_stats in OVS to sw_flow_stats since
this structure is contained in sw_flow.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Upstream commit:
commit 0e141f757b2c78c983df893e9993313e2dc21e38
Author: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Fri Sep 27 14:58:20 2019 +0800
erspan: remove the incorrect mtu limit for erspan
erspan driver calls ether_setup(), after commit 61e84623ace3
("net: centralize net_device min/max MTU checking"), the range
of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
It causes the dev mtu of the erspan device to not be greater
than 1500, this limit value is not correct for ipgre tap device.
Tested:
Before patch:
# ip link set erspan0 mtu 1600
Error: mtu greater than device maximum.
After patch:
# ip link set erspan0 mtu 1600
# ip -d link show erspan0
21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Upstream commit:
commit ea8564c865299815095bebeb4b25bef474218e4c
Author: Li RongQing <lirongqing@baidu.com>
Date: Tue Sep 24 19:11:52 2019 +0800
openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
userspace openvswitch patch "(dpif-linux: Implement the API
functions to allow multiple handler threads read upcall)"
changes its type from U32 to UNSPEC, but leave the kernel
unchanged
and after kernel 6e237d099fac "(netlink: Relax attr validation
for fixed length types)", this bug is exposed by the below
warning
[ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: beb1c69a3 ("datapath: Allow each vport to have an array of 'port_id's.")
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Upstream commit:
commit 260637903f47f20c5918bb5c1eea52b2a28ea863
Author: Arnd Bergmann <arnd@arndb.de>
Date: Mon Jul 22 17:00:01 2019 +0200
ovs: datapath: hide clang frame-overflow warnings
Some functions in the datapath code are factored out so that each
one has a stack frame smaller than 1024 bytes with gcc. However,
when compiling with clang, the functions are inlined more aggressively
and combined again so we get
net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
Marking both get_flow_actions() and ovs_nla_init_match_and_action()
as 'noinline_for_stack' gives us the same behavior that we see with
gcc, and no warning. Note that this does not mean we actually use
less stack, as the functions call each other, and we still get
three copies of the large 'struct sw_flow_key' type on the stack.
The comment tells us that this was previously considered safe,
presumably since the netlink parsing functions are called with
a known backchain that does not also use a lot of stack space.
Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
But net sched api has a limit of 4K message size which is not enough
for 32 actions when echo flag is set.
After a lot of testing, we find that 16 actions is a reasonable number.
So in this commit, we introduced a new define to limit the max actions.
Fixes: 0c70132cd288("tc: Make the actions order consistent")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
In commit 057772cf24 the function is missing the rpl_ prefix
and the define that replaces the original function should come
after the function definition.
Fixes: 057772cf24 ("compat: Backport nf_ct_tmpl_alloc().")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
It is possible that user install libunwind but not libunwind-devel,
and it will run into a compilation error. So we need to check the
existence of the library and the header file.
Fixes: e2ed6fbeb1 ("fatal-signal: Catch SIGSEGV and print backtrace.")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
'OVS_CTL_TIMEOUT' environment variable is exported in tests/atlocal.in
and controls timeouts for all OVS utilities in testsuite.
There should be no manual tweaks for each single command.
This helps with running tests under valgrind where commands could
take really long time as you only need to change 'OVS_CTL_TIMEOUT'
in a single place.
Few manual timeouts were left in places where they make sense.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Building ovn/ovs container breaks while configure:
checking for Python 3 (version 3.4 or later)... no
configure: error: Python 3.4 or later is required but not found in
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin,
please install it or set to point to it
As per ovs commit 1ca0323e7c
Require Python 3 and remove support for Python 2.
Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This workaround only applied to kernels earlier than 2.6.37, but OVS
only supports 3.10 and later.
As the original author of this code, I won't miss it.
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Usually a plural name refers to an array, but 'socks' and 'socksp' were
only single objects, so this changes their names to 'sock' and 'sockp'.
Usually a 'p' suffix means that a variable is an output argument, but
that was only true in one place here, so this changes the names of the
other variables to plain 'sock'.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Remove static qualifier from compose_ipv6 definition and export it to
OVN. compose_ipv6 will be used in order to add IPv6 prefix delegation
support to OVN
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Valgrind reports:
20 bytes in 1 blocks are definitely lost in loss record 94 of 353
by 0x532594: xmalloc (util.c:138)
by 0x553EAD: nl_sock_create (netlink-socket.c:146)
by 0x54331D: create_nl_sock (dpif-netlink.c:255)
by 0x54331D: dpif_netlink_port_add__ (dpif-netlink.c:756)
by 0x5435F6: dpif_netlink_port_add_compat (dpif-netlink.c:876)
by 0x5435F6: dpif_netlink_port_add (dpif-netlink.c:922)
by 0x47EC1D: dpif_port_add (dpif.c:584)
by 0x42B35F: port_add (ofproto-dpif.c:3721)
by 0x41E64A: ofproto_port_add (ofproto.c:2032)
by 0x40B3FE: iface_do_create (bridge.c:1817)
by 0x40B3FE: iface_create (bridge.c:1855)
by 0x40B3FE: bridge_add_ports__ (bridge.c:943)
by 0x40D14A: bridge_add_ports (bridge.c:959)
by 0x40D14A: bridge_reconfigure (bridge.c:673)
by 0x410D75: bridge_run (bridge.c:3050)
by 0x407614: main (ovs-vswitchd.c:127)
This leak is because when vport_add_channel() returns 0, it is expected
to take the ownership of 'socksp'. This patch fixes this issue.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The commit [1] force drops all connections when the db read/write status changes.
Prior to the commit [1], when there was read/write status change, the existing
jsonrpc sessions with 'db_change_aware' set to true, were not updated with the
changed 'read_only' value. If the db status was changed to 'standby', the existing
clients could still write to the db.
In the case of pacemaker OVN HA, OVN OCF script 'start' action starts the
ovsdb-servers in read-only state and later, it sets to read-write in the
'promote' action. We have observed that if some ovn-controllers connect to
the SB ovsdb-server (in read-only state) just before the 'promote' action,
the connection is not reset all the times and these ovn-controllers remain connected
to the SB ovsdb-server in read-only state all the time. Even though
the commit [1] calls 'ovsdb_jsonrpc_server_reconnect()' with 'forced' flag
set to true when the db read/write status changes, somehow the FSM misses resetting
the connections of these ovn-controllers.
I think this needs to be addressed in the FSM. This patch doesn't address
this FSM issue. Instead it changes the behavior of 'ovsdb_jsonrpc_server_set_read_only()'
by setting the 'read_only' flag of all the jsonrpc sessions instead of forcefully
resetting the connection.
I think there is no need to reset the connection. In large scale production
deployements with OVN, this results in unnecessary waste of CPU cycles as ovn-controllers
will have to connect twice - once during 'start' action and again during 'promote'.
[1] - 2a9679e3b2c6("ovsdb-server: drop all connections on read/write status change")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>