If the database server sent an error reply to a monitor_cond request, and
the error was not a JSON string, then passing the error to json_string()
caused an assertion failure.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
This change documents the IDL state machine, adds other comments,
and fixes a spelling error in a comment.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
An address sanity check is done on icmp error packets to
check that the icmp error payload makes sense w.r.t. the
packet itself.
The sanity check was partially incorrect since it tried
to verify the source address of the error packet against the
original destination, which does not makes since the error
can be generated by any intermediate node.
Reported-by: wangzhike <wangzhike@jd.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341609.html
Fixes: a489b1685 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: wangzhike <wangzhike@jd.com>
Co-authored-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Presently, alg processing is enabled by default to better exercise code.
This is similar to kernels before 4.7 as well. The recommended default
behavior in the newer kernels is to only process algs if a helper is
supplied in a conntrack rule. The behavior is changed to match the
later kernels.
A test is extended to check that the control connection is still
created in such a case.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Algs can use variable control port numbers for servers.
The main use case is a kind of feeble security measure; the
thinking being by some is that it obscures the alg traffic.
It is really not very effective, but the kernel has this
capability. This patch mimics the capability.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Upcoming requirements for new algs make it desirable to split out
alg helpers more cleanly.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
"sparse" warns when odp_port_t is used directly in an inequality
comparison. This avoids the warning.
CC: Kevin Traynor <ktraynor@redhat.com>
Fixes: a130f1a89b ("dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
These options have existed for a while, but were not expressed in the
help information. Inform the user that these options exist, and give
some basic help.
Reported-by: Saravanan KR <skramaja@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Markos Chandras <mchandras@suse.de>
To delete a conntrack entry specified by 5-tuple pass an additional
conntrack 5-tuple parameter to flush-conntrack.
Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Until now, compose_output_action__() has asserted that a packet output to
a patch port is not to be truncated. This commit changes this to an error
that will be included in trace output, for two reasons. First, this sounds
like only a minor problem to me which doesn't warrant killing the process.
Second, it will be easier to track down the actual problem (if any) if we
can get a trace instead of a segfault.
Reported-by: Kevin Lin <kevin@kelda.io>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045832.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
xlate_output_action() must tell some of the functions it calls whether the
packet is being truncated. Until now, it has inferred that based on
whether its max_len argument is nonzero.
Unfortunately, max_len conflates two different purposes. Historically it
was used only to limit the number of bytes of packets sent to an OpenFlow
controller in packet_in messages. When packet truncation was introduced,
it was then also used to specify the truncation length. This meant that,
for example, when xlate_output_reg_action() called into
xlate_output_action() passing along for max_len an OpenFlow controller byte
limit (which ovs-ofctl by default sets to 65535), xlate_output_action()
interpreted that as a truncation request and told the functions it called
that the packet was being truncated, which in the worst case led to
assertion failures.
This commit disentangles these two meaning of max_len, separating them into
two separate parameters, and updates the callers.
Reported-by: Kevin Lin <kevin@kelda.io>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045841.html
Tested-by: Kevin Lin <kevin@kelda.io>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The error message in question is about the monitor session ID but it
actually reports the JSON-RPC request ID instead, which is surprising.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Until now, the ovn-controller-vtep, ovn-nbctl, and ovn-sbctl tests have
ignored "Broken pipe" and "Connection reset" messages. The same rationale
that applies to them also applies to ovs-vsctl and other utilities. It
seems easier to just always ignore them.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
At least on Linux, when process A connects to process B over a Unix
domain socket, unless process A bound its socket to a name before
it made the connection, process B gets an empty peer name. Until
now, OVS has just reported the name of the connection as "unix".
This is not meaningful, of course. I do not know of a good general
solution to this problem, but this commit attempts a step in the
right direction by at least giving each connection of this kind a
number: "unix#1", "unix#2", and so on. That way, in log messages
one can at least see which messages are related to a particular
connection.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
When a trigger executes, it can make changes to the database that fulfill
the conditions for some other trigger to execute. ovsdb-server implements
this properly, but the code in test-ovsdb for testing triggers outside
ovsdb-server did not. This fixes the problem.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
This reverts commit a807c15796.
Padding and aligning of dp_netdev_pmd_thread structure members
is useless, broken in a several ways and only greatly degrades
maintainability and extensibility of the structure.
Issues:
1. It's not working because all the instances of struct
dp_netdev_pmd_thread allocated only by usual malloc. All the
memory is not aligned to cachelines -> structure almost never
starts at aligned memory address. This means that any further
paddings and alignments inside the structure are completely
useless. Fo example:
Breakpoint 1, pmd_thread_main
(gdb) p pmd
$49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
(gdb) p &pmd->cacheline1
$51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
(gdb) p &pmd->cacheline0
$52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
(gdb) p &pmd->flow_cache
$53 = (struct emc_cache *) 0x1b1afe0
All of the above addresses shifted from cacheline start by 32B.
Can we fix it properly? NO.
OVS currently doesn't have appropriate API to allocate aligned
memory. The best candidate is 'xmalloc_cacheline()' but it
clearly states that "The memory returned will not be at the
start of a cache line, though, so don't assume such alignment".
And also, this function will never return aligned memory on
Windows or MacOS.
2. CACHE_LINE_SIZE is not constant. Different architectures have
different cache line sizes, but the code assumes that
CACHE_LINE_SIZE is always equal to 64 bytes. All the structure
members are grouped by 64 bytes and padded to CACHE_LINE_SIZE.
This leads to a huge holes in a structures if CACHE_LINE_SIZE
differs from 64. This is opposite to portability. If I want
good performance of cmap I need to have CACHE_LINE_SIZE equal
to the real cache line size, but I will have huge holes in the
structures. If you'll take a look to struct rte_mbuf from DPDK
you'll see that it uses 2 defines: RTE_CACHE_LINE_SIZE and
RTE_CACHE_LINE_MIN_SIZE to avoid holes in mbuf structure.
3. Sizes of system/libc defined types are not constant for all the
systems. For example, sizeof(pthread_mutex_t) == 48 on my
ARMv8 machine, but only 40 on x86. The difference could be
much bigger on Windows or MacOS systems. But the code assumes
that sizeof(struct ovs_mutex) is always 48 bytes. This may lead
to broken alignment/big holes in case of padding/wrong comments
about amount of free pad bytes.
4. Sizes of the many fileds in structure depends on defines like
DP_N_STATS, PMD_N_CYCLES, EM_FLOW_HASH_ENTRIES and so on.
Any change in these defines or any change in any structure
contained by thread should lead to the not so simple
refactoring of the whole dp_netdev_pmd_thread structure. This
greatly reduces maintainability and complicates development of
a new features.
5. There is no reason to align flow_cache member because it's
too big and we usually access random entries by single thread
only.
So, the padding/alignment only creates some visibility of performance
optimization but does nothing useful in reality. It only complicates
maintenance and adds huge holes for non-x86 architectures and non-Linux
systems. Performance improvement stated in a original commit message
should be random and not valuable. I see no performance difference.
Most of the above issues are also true for some other padded/aligned
structures like 'struct netdev_dpdk'. They will be treated separately.
CC: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
CC: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.
This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
in (and should remain so), for the aforementioned older QEMU
verions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.
This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Valgrind complains in test 1019 (dpctl - add-if set-if del-if):
4,850,896 (4,850,240 direct, 656 indirect) bytes in 1 blocks are
definitely lost in loss record 364 of 364
by 0x517062: xcalloc (util.c:103)
by 0x46CBBC: dp_netdev_set_nonpmd (dpif-netdev.c:4498)
by 0x46CBBC: create_dp_netdev (dpif-netdev.c:1299)
by 0x46CBBC: dpif_netdev_open (dpif-netdev.c:1337)
by 0x472CB0: do_open (dpif.c:350)
by 0x472E6F: dpif_create (dpif.c:404)
by 0x472E6F: dpif_create_and_open (dpif.c:417)
by 0x430EBC: open_dpif_backer (ofproto-dpif.c:727)
by 0x430EBC: construct (ofproto-dpif.c:1411)
by 0x41B714: ofproto_create (ofproto.c:539)
by 0x40C84E: bridge_reconfigure (bridge.c:647)
by 0x4104C5: bridge_run (bridge.c:2998)
by 0x406FA4: main (ovs-vswitchd.c:119)
The reference count wasn't released at this earlier return.
This fix passes the test 'make check'.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
compare_rxq_cycles sums the latest cycles from each queue for
comparison with each other. While each comparison correctly
gets the latest cycles, the cycles could change between calls
to compare_rxq_cycle. In order to use consistent values through
each call of compare_rxq_cycles, sum the cycles before qsort is
called.
Requested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This function is used for comparison between queues
as part of the sort. It does not do the sort itself.
As such, give it a more appropriate name.
Suggested-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Billy O'Mahony
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
rxq_cycle_sort is used to compare rx queues by their measured number
of cycles. In the event that they are equal, 0 could be returned.
However, it is observed that returning 0 results in a different sort
order on Windows/Linux. This is ok in practice but it causes a unit
test failure for
"1007: PMD - pmd-cpu-mask/distribution of rx queues" when running
on different OS's.
In order to have a consistent sort result across multiple OS's,
introduce a tiebreaker of port/queue.
Fixes: 655856ef39 ("dpif-netdev: Change rxq_scheduling to use rxq processing cycles.")
Reported-by: Alin Gabriel Serdean <aserdean@ovn.org>
Tested-by: Alin Gabriel Serdean <aserdean@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The call to rte_eth_dev_count() was added as workaround
for rte_eth_dev_get_port_by_name() not handling cases
when there was no DPDK ports.
In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name()
does handle this case (DPDK commit f9ae888b1e19).
rte_eth_dev_count() is no longer needed so remove it.
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
It'll be nice to document current naming convention for variables of
the following types used in netdev-dpdk:
* netdev
* netdev_dpdk
* netdev_rxq
* netdev_rxq_dpdk
to be sure that we will not return to chaos which was before
commit d46285a220 ("netdev-dpdk: Consistent variable naming.").
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Function 'netdev_dpdk_set_admin_state()' was missed while fixing
variables naming according to the following convention:
'struct netdev':'netdev'
'struct netdev_dpdk':'dev'
'struct netdev_rxq':'rxq'
'struct netdev_rxq_dpdk':'rx'
Fixes: d46285a220 ("netdev-dpdk: Consistent variable naming.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokess <ian.stokes@intel.com>
The function prototypes in ovsdb-data.h already have these, but it seems
more complete to have the annotation on the definitions too.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
The zone Netlink attribute is supposed to be in network-byte order, but
the Windows code for deleting conntrack entries was treating it as
host-byte order.
Found by inspection.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
With this patch, "flush-conntrack" in ovs-dpctl and ovs-appctl accept
a conntrack 5-tuple to delete the conntrack entry specified by the 5-tuple.
For example, user can use the following command to flush a conntrack entry
in zone 5.
$ ovs-dpctl flush-conntrack zone=5 \
'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'
$ ovs-appctl dpctl/flush-conntrack zone=5 \
'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'
VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple, and provides the implementation in dpif-netlink.
The implementation of dpif-netlink in the linux datapath utilizes the
NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in
nf_conntrack. Future patches will add support for the userspace and
Windows datapaths.
VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
"test -d x || mkdir x" has a race when invoked in parallel: it is possible
for two processes to both see that 'x' does not exist and both try to
create it, and if that happens then one of them will fail. This avoids
the problem.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Pacemaker Resource agent periodically calls the OVN OCF's "monitor" action
periodically to check the status. But the OVN OCF script doesn't add the
action "monitor" for the role "Master" because of which the pacemaker
resource agent do not call the "monitor" action at all for the master.
In case OVN db servers exit for some reason this totally gets undetected
and one of the standby node is not promoted to master.
This patch adds the monitor action for "Master" role. Also the monitor
action do not check for the status of the ovn-northd (if manage_northd is yes).
This patch also checks for the status of the ovn-northd in the monitor action
for the "Master" role. If any of the ovsdb-server or ovn-northd is not running,
monitor action will return OCF_NOT_RUNNING and this will cause the pacemaker
to restart the OVN OCF resource.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1512568
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Russell Bryant <russell@ovn.org>
Signed-off-by: Russell Bryant <russell@ovn.org>
The external_ids column is missing from the NAT and
Logical_Router_Static_Route tables.
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Daniel Alvarez <dalvarez@redhat.com>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
Found by libfuzzer.
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
This patch adds 'extern "C"' in a couple of header files so that
they can be compiled with C++ compilers.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Instead of listing all the dependencies, use the RPM group
'Development Tools' and the builddep tool to find specific
ones.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>