This patch backports the following upstream commit from net-next, and
defines HAVE_NF_NAT_RANGE2 to determine whether to use
'struct nf_nat_range2'.
Upstream commit:
commit 2eb0f624b709e78ec8e2f4c3412947703db99301
Author: Thierry Du Tre <thierry@dtsystems.be>
Date: Wed Apr 4 15:38:22 2018 +0200
netfilter: add NAT support for shifted portmap ranges
This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)
Currently DNAT only works for single port or identical port ranges. (i.e.
ports 5000-5100 on WAN interface redirected to a LAN host while original
destination port is not altered) When different port ranges are configured,
either 'random' mode should be used, or else all incoming connections are
mapped onto the first port in the redirect range. (in described example
WAN:5000-5100 will all be mapped to 192.168.1.5:2000)
This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
which uses a base port value to calculate an offset with the destination port
present in the incoming stream. That offset is then applied as index within the
redirect port range (index modulo rangewidth to handle range overflow).
In described example the base port would be 5000. An incoming stream with
destination port 5004 would result in an offset value 4 which means that the
NAT'ed stream will be using destination port 2004.
Other possibilities include deterministic mapping of larger or multiple ranges
to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
51xx)
This patch does not change any current behavior. It just adds new NAT proto
range functionality which must be selected via the specific flag when intended
to use.
A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
which makes this functionality immediately available.
Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
This patch backports the following two upstream commits and
add a new symbol HAVE_NET_RWSEM in acinclude.m4 to determine
whether to use new introduced rw_semaphore, net_rwsem.
Upstream commit:
commit f0b07bb151b098d291fd1fd71ef7a2df56fb124a
Author: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Thu Mar 29 19:20:32 2018 +0300
net: Introduce net_rwsem to protect net_namespace_list
rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
he/she has no a possibility to do that without exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
and this can't be sleepable. Also, sometimes it may be need
really prevent net_namespace_list growth, so for_each_net_rcu()
is not fit there.
This patch introduces new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows not-exclusive iterations over net
namespaces list. It allows to stop using rtnl_lock()
in several places (what is made in next patches) and makes
less the time, we keep rtnl_mutex. Here we just add new lock,
while the explanation of we can remove rtnl_lock() there are
in next patches.
Fine grained locks generally are better, then one big lock,
so let's do that with net_namespace_list, while the situation
allows that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit ec9c780925c57588637e1dbd8650d294107311c0
Author: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Thu Mar 29 19:21:09 2018 +0300
ovs: Remove rtnl_lock() from ovs_exit_net()
Here we iterate for_each_net() and removes
vport from alive net to the exiting net.
ovs_net::dps are protected by ovs_mutex(),
and the others, who change it (ovs_dp_cmd_new(),
__dp_destroy()) also take it.
The same with datapath::ports list.
So, we remove rtnl_lock() here.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Upstream commit:
commit ddc502dfed600bff0b61d899f70d95b76223fdfc
Author: zhangliping <zhangliping02@baidu.com>
Date: Fri Mar 9 10:08:50 2018 +0800
openvswitch: meter: fix the incorrect calculation of max delta_t
Max delat_t should be the full_bucket/rate instead of the full_bucket.
Also report EINVAL if the rate is zero.
Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: zhangliping <zhangliping02@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Newer selinux base policies now split out 'map' actions, as well as
adding more explicit checks for hugetlbfs objects. Where previously these
weren't required, recent changes have flagged the allocation of hugepages
and subsequent clearing. This means that the hugepage storage information
for the DPDK .rte_config, and clearing actions copying from /dev/zero will
trigger selinux denials.
This commit allows openvswitch to have more permissions for the hugetlbfs
allocation and use.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
When for some reason the built-in kernel ip6_gre module is loaded that
would prevent the openvswitch kernel driver from loading. Even when
the built-in kernel ip6_gre module is loaded we can still perform
port mirroring via Tx. Adjust the error handling and detect when
the ip6_gre kernel module is loaded and in that case still enable
IPv6 GRE/ERSPAN Tx.
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
The RHEL 7 kernels expect the secret timer interval to be initialized
before calling the inet_frags_init() function. By not initializing it
the inet_frags_secret_rebuild() function was running on every tick
rather than on the expected interval. This caused occasional panics
from page faults when inet_frags_secret_rebuild() would try to rearm a
timer from the openvswitch kernel module which had just been removed.
Also remove the prior, and now unnecessary, work around.
VMware BZ 2094203
Fixes: 595e069a ("compat: Backport IPv4 reassembly.")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Added new types to the flow dump filter, and allowed multiple filter
types to be passed at once, as a comma separated list. The new types
added are:
* tc - specifies flows handled by the tc dp
* non-offloaded - specifies flows not offloaded to the HW
* all - specifies flows of all types
The type list is now fully parsed by the dpctl, and a new struct was
added to dpif which enables dpctl to define which types of dumps to
provide, rather than passing the type string and having dpif parse it.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently the inner VLAN header is ignored when using the TC data-path.
As TC flower supports QinQ, now we can offload the rules to match on both
outer and inner VLAN headers.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
By default, these function are to change the first vlan vid and pcp
in the flow. Add a parameter as index for vlans if we want to handle
the second ones.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently we only support 802.1q, so we can offload push action without
specifying any vlan type. Kernel will push 802.1q ethertype by default.
But to support QinQ, we need to tell what ethertype is in push action as
it could be 802.1ad.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This commit renames HAVE_PYTHON to HAVE_PYTHON2 and PYTHON to PYTHON2
and adds HAVE_PYTHON and PYTHON with a different semantics:
- If PYTHON environment variable is set, use it as PYTHON
- If a python2 interpreter is available, PYTHON became the python2 interpreter
- If a python3 interpreter is available, PYTHON became the python3 interpreter
PYTHON is only used to run the python scripts needed by the build system
NOTE:
Since currently most of the utilities and bugtool doesn't support Python3,
they're installed only if python2 is available. This will be fixed in later
commits.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Opening a file with 'rw' in Python3 returns an error, moreover using 'rw' in
Python2 is wrong too since it opens the file using O_RDONLY and not by using
O_RDWR.
This commit fixes it by using the low-level os.open function with O_RDWR
as suggested by the Linux kernel (tuntap.txt) documentation.
This commit fixes also some usual bytes vs string incompatibilities.
Tested on Python 2.7.15 and Python 3.6.5
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Added parenthesis after print and use "as" instead of "," in except.
This commit fixes also a couple of flake8 warnings:
utilities/ovs-tcpundump:23:1: E302 expected 2 blank lines, found 1
utilities/ovs-tcpundump:35:1: E305 expected 2 blank lines after class or
function definition, found 1
Tested on Python 2.7.15 and Python 3.6.5
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
A common issue is users pairing the incorrect version of OVS to DPDK
when working outside of the build tree.
To avoid this, this commit updates the OVS DPDK documentation to explicitly
flag that users should consult the OVS to DPDK release mapping in FAQ if
working outside of the OVS build tree.
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
This patch adds a signature match cache (SMC) after exact match
cache (EMC). The difference between SMC and EMC is SMC only stores
a signature of a flow thus it is much more memory efficient. With
same memory space, EMC can store 8k flows while SMC can store 1M
flows. It is generally beneficial to turn on SMC but turn off EMC
when traffic flow count is much larger than EMC size.
SMC cache will map a signature to an dp_netdev_flow index in
flow_table. Thus, we add two new APIs in cmap for lookup key by
index and lookup index by key.
For now, SMC is an experimental feature that it is turned off by
default. One can turn it on using ovsdb options.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Preparatory work for getting rid of ctl_fatal() in command parser.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Make ovn-nbctl act as a unixctl server if we were asked to detach. This
turns ovn-nbctl into a long-lived process that acts a proxy for
interacting with NB DB. The main difference to regular mode of ovn-nbctl
is that in the daemon mode, a local copy of database contents has to be
obtained only once.
Just two unixctl commands are supported 'run' and 'exit'. The former can
be used to run any ovn-nbctl command or a batch of them as so:
ovs-appctl -t ovn-nbctl run [OPTIONS] COMMAND [-- [OPTIONS] COMMAND] ...
Running commands that have not yet been converted to not use ctl_fatal()
will result in death of the daemon process. However, --monitor option
can be used to keep the daemon running.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Provide a handler for options that change how the main loop behaves.
This will allow code reuse for option parsing in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This will allow us to direct oneline-formatted output to other sinks
than stdout if needed. Preparatory work for daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Extend the main loop and the command runner so that the caller can
specify a timeout for poll_block(). This will allow us to break out of
the main loop when waiting on IDL, like in the blocked '--wait=sb/hv
sync' case.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Instead of terminating the process, return the error to the caller.
This will allow us to reuse the prerequisites runner in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Let the caller handle the errors instead of reporting it and
terminating. Prepare for reusing the main loop in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Instead of terminating the process, return the error to the caller.
This will allow us to reuse the main loop in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Let the caller decide how to handle the error. Prepare for using the
parser in ovn-nbctl daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Let the caller handle the error. Needed for ovn-nbctl daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reset the global state, if transaction succeeded. Otherwise nbctl_exit()
callback will try to clean up on any fatal error.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Introduce an output parameter for the flag that signals need to retry
running the command. This leaves the return value for error reporting.
Preparatory work for reusing the main loop in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Destroy IDL resources in the routine where we allocated them.
Preparatory work for reusing the main loop in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Destroy commands in the same routine where they were allocated.
Preparatory work for reusing the main loop in daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Split out a routine for the main ovn-nbctl loop.
Preparatory work for introducing daemon mode.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If IDL was created with monitoring and alerts turned on by default for
all columns, then there is no harm in allowing the API users to ask
again for monitoring and alerts to be enabled for any given column.
This allows us to run prerequisites handlers for db-ctl and ovn-nbctl
commands once the IDL has already ran once.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Having a constant in addition to the constant expression for the default
table style allows us to reset 'struct table_style' variables to default
style.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>