Linux kernel netlink module added NLA_F_NESTED flag checking for nested
netlink messages in 5.2. A nested message without the flag set will be
treated as malformatted one. The check is optional and is controlled by
message policy. To avoid this, add NLA_F_NESTED explicitly for all
nested netlink messages with a new function
nl_msg_start_nested_with_flag().
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
When OVS sees a tunnel push with a nested list next, it will not
clone the packet, as a clone is not needed. However, a clone action will
still be created with the tunnel push encapsulated inside. There is no
need to create the clone action in this case, as extra parsing will need
to be performed, which is less efficient.
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
During a netlink transaction, in case of replies of type NLMSG_ERROR,
the current behavior includes the translation of the error number
received into a string that describes the error code.
Netlink replies may carry a more descriptive error message, and
although it is possible to read those messages using the existing perf
tracepoint, it is more convenient to retrieve them directly from ovs.
This patch extends nl_msg_nlmsgerr() so that it retrieves the message
that later, if present, will be used by nl_sock_transact_multiple__()
in place of the generic descriptive form of the error number. This is
particularly useful with tc that makes use of such kind of mechanism.
As an example, with this patch applied, the following generic message:
ovs|00239|netlink_socket|DBG|received NAK error=0 (Operation not supported)
becomes:
ovs|00239|netlink_socket|DBG|received NAK error=0 - Conntrack isn't enabled
The layout has been slightly modified to avoid nested parentheses.
Suggested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Data retrieved from netlink and friends may include link layer
address. Add type to nl_attr_type and min/max functions to allow
use of nl_policy_parse with this type of data.
While this will not be used by Open vSwitch itself at this time,
sibling and derived projects want to use the great netlink library
that OVS provides, and it is not possible to safely override the
global nl_attr_type symbol at link time.
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Length of nested attributes must be checked before storing to the
header. If current length exceeds the maximum value parsing should
fail, otherwise the length value will be truncated leading to
corrupted netlink message and out-of-bound memory accesses:
ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6310002cc838
at pc 0x000000575470 bp 0x7ffc6c322d60 sp 0x7ffc6c322d58
READ of size 1 at 0x6310002cc838 thread T0
SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
#0 0x57546f in format_generic_odp_key lib/odp-util.c:2738:39
#1 0x559e70 in check_attr_len lib/odp-util.c:3572:13
#2 0x56581a in format_odp_key_attr lib/odp-util.c:4392:9
#3 0x5563b9 in format_odp_action lib/odp-util.c:1192:9
#4 0x555d75 in format_odp_actions lib/odp-util.c:1279:13
...
Fix that by checking the length of nested netlink attributes before
updating 'nla_len' inside the header. Additionally introduced
assertion inside nl_msg_end_nested() to catch this kind of issues
before actual overflow happened.
Credit to OSS-Fuzz.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20003
Fixes: 65da723b40a5 ("odp-util: Format tunnel attributes directly from netlink.")
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Use the helpers in appropriate places. In most cases, this fixes a
misaligned reference, since ovs_be128 and ovs_u128 require 8-byte alignment
but Netlink only guarantees 4-byte.
Found by GCC -fsanitize=undefined.
Reported-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Eliminate a number of instances of undefined behavior related to
passing NULL in parameters having "nonnull" annotations.
Found with gcc's undefined behavior sanitizer.
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since there is no data to copy nl_msg_put_unspec_uninit() may be used
directly, rather than via nl_msg_put_unspec().
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In nl_msg_push_flag(), the 3rd NULL parameter causing 'memcpy()'
with NULL source pointer in nl_msg_push_unspec().
Found by Clang.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds support for specifying a "helper" or ALG to assist
connection tracking for protocols that consist of multiple streams.
Initially, only support for FTP is included.
Below is an example set of flows to allow FTP control connections from
port 1->2 to establish active data connections in the reverse direction:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(alg=ftp,commit),2
table=0,in_port=2,tcp,ct_state=-trk,action=ct(table=1)
table=1,in_port=2,tcp,ct_state=+trk+est,action=1
table=1,in_port=2,tcp,ct_state=+trk+rel,action=ct(commit),1
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
[cascardo: add NL_A_IPV6, used in next patch]
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ofpbuf was complicated due to its wide usage across all
layers of OVS, Now we have introduced independent dp_packet
which can be used for datapath packet, we can simplify ofpbuf.
Following patch removes DPDK mbuf and access API of ofpbuf
members.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Until now the Netlink code has considered an attribute to exceed the
maximum length if the *padded* size of the attribute exceeds 65535 bytes.
For example, an attribute with a 65529-byte payload, together with 4-byte
header and 3 bytes of padding, takes up 65536 bytes and therefore the
existing code rejected it.
However, the restriction on Netlink attribute sizes is to ensure that the
length fits in the 16-bit nla_len field. This field includes the 4-byte
header but not the padding, so a 65529-byte payload is acceptable because,
with the header but not the padding, it comes to only 65533 bytes.
Thus, this commit relaxes the restriction on Netlink attribute sizes by
omitting padding from size checks. It also changes one piece of code that
inlined a size check to use the central function nl_attr_oversized().
This change should fix an assertion failure when OVS userspace passes a
maximum-size (65529+ byte) packet back to the kernel.
Reported-by: Shuping Cui <scui@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread
races.) introduced a simple workaround for Linux kernel races in Netlink
dumps. However, the code remained more complicated than needed. This
commit simplifies it.
The main reason for complication in the code was 'status_seq' in nl_dump.
This member was there to allow a thread to wait for some other thread to
refill the socket buffer with another dump message (although we did not
understand the reason at the time it was introduced). Now that we know
that Netlink dumps properly need to be serialized to work in existing
Linux kernels, there's no additional value in having 'status_seq',
because serialized recvmsg() calls always refill the socket buffer
properly.
This commit updates nl_msg_next() to clear its buffer argument on error.
This is a more convenient interface for the new version of the Netlink
dump code. nl_msg_next() doesn't have any other callers.
Signed-off-by: Ben Pfaff <blp@nicira.com>
I think these were leftovers from the removal of %z for MSVC that happened
some time ago.
VMware-BZ: 1265762
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit
systems, or from 52 to 36 bytes on 32-bit systems (counting in the
'l7' removal from an earlier patch). This may help contribute to
cache efficiency, and will speed up initializing, copying and
manipulating ofpbufs. This is potentially important for the DPDK
datapath, but the rest of the code base may also see a little benefit.
Changes are:
- Remove 'l7' pointer (previous patch).
- Use offsets instead of layer pointers for l2_5, l3, and l4 using
'l2' as basis. Usually 'data' is the same as 'l2', but this is not
always the case (e.g., when parsing or constructing a packet), so it
can not be easily used as the offset basis. Also, packet parsing is
faster if we do not need to maintain the offsets each time we pull
data from the ofpbuf.
- Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for
largest possible messages/packets.
- Use packed enum for 'source'.
- Rearrange to avoid unnecessary padding.
- Remove 'private_p', which was used only in two cases, both of which
had the invariant ('l2' == 'data'), so we can temporarily use 'l2'
as a private pointer.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
PAD_SIZE(x,y) is a little shorter and may have a more obvious meaning
than ROUND_UP(x,y) - x.
I intend to add more users in an upcoming comment.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This function already had a few potential users, which this commit
converts. An upcoming commit adds more users.
Signed-off-by: Ben Pfaff <blp@nicira.com>
If we allow oversize datapath actions to make it out of translation, then
we will assert-fail later when we try to put those actions into a Netlink
attribute.
Bug #19277.
Reported-by: Paul ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
This commit fixes the warning issued by 'clang' when pointer is casted
to one with greater alignment.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, datapath ports and openflow ports were both represented by
unsigned integers of various sizes. With implicit conversions, etc., it is
easy to mix them up and use one where the other is expected. This commit
creates two typedefs, ofp_port_t and odp_port_t. Both of these two types
are marked by "__attribute__((bitwise))" so that sparse can be used to
detect any misuse.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Choosing sequence numbers at time of creating a packet means that
nl_sock_transact_multiple() has to search for the sequence number
of a reply, because the sequence numbers of the requests aren't
necessarily sequential. This commit makes it possible to avoid
the search, by deferring choice of sequence numbers until the
time that we send the packets. It doesn't actually modify
nl_sock_transact_multiple(), which will happen in a later commit.
Previously, I was concerned about a theoretical race condition
described in a comment in the old versino of this code:
This implementation uses sequence numbers that are unique
process-wide, to avoid a hypothetical race: send request, close
socket, open new socket that reuses the old socket's PID value,
send request on new socket, receive reply from kernel to old
socket but with same PID and sequence number. (This race could be
avoided other ways, e.g. by preventing PIDs from being quickly
reused).
However, I no longer believe that this can be a real problem,
because Netlink operates synchronously. The reply to a request
will always arrive before the socket can be closed and a new
socket opened with the old socket's PID.
Signed-off-by: Ben Pfaff <blp@nicira.com>
These are really just copies of the corresponding "put" functions. An
upcoming commit will introduce a user of nl_msg_push_u32(). I thought I
might as well create all of these while I was at it.
Many of our functions pass around a pointer to Netlink attributes
and a length. This exposes the version of nl_attr_find that takes
that format so it can be used by callers outside the Netlink library.
These functions are useful in the occasional case where a piece of code
only cares about one or a few attributes, probably knows that the format
is correct, and doesn't want to go to the trouble of doing a full parse.
Upcoming commits will add a user.
Reviewed by Justin Pettit.
Linux since v2.6.24 has a couple of couple of bits at the top of
nla_type that one is apparently supposed to ignore. This commit
starts doing that in Open vSwitch userspace.
Acked-by: Jesse Gross <jesse@nicira.com>
These _be<N> functions are completely equivalent to the corresponding
_u<N> functions, but the names help to make their purpose clear.
Acked-by: Jesse Gross <jesse@nicira.com>
The parts of the netlink module that are related to sockets are
Linux-specific, since only Linux has AF_NETLINK sockets. The rest can be
built anywhere. This commit breaks them into two modules, and builds the
generic one on all platforms.
Acked-by: Jesse Gross <jesse@nicira.com>
Stress options allow developers testing Open vSwitch to trigger behavior
that otherwise would occur only in corner cases. Developers and testers
can thereby more easily discover bugs that would otherwise manifest only
rarely or nondeterministically. Stress options may cause surprising
behavior even when they do not actually reveal bugs, so they should only be
enabled as part of testing Open vSwitch.
This commit implements the framework and adds a few example stress options.
This commit started from code written by Andrew Lambeth.
Suggested-by: Henrik Amren <henrik@nicira.com>
CC: Andrew Lambeth <wal@nicira.com>
Until now, the collection of coverage counters supported by a given OVS
program was not specific to that program. That means that, for example,
even though ovs-dpctl does not have anything to do with mac_learning, it
still has a coverage counter for it. This is confusing, at best.
This commit fixes the problem on some systems, in particular on ones that
use GCC and the GNU linker. It uses the feature of the GNU linker
described in its manual as:
If an orphaned section's name is representable as a C identifier then
the linker will automatically see PROVIDE two symbols: __start_SECNAME
and __end_SECNAME, where SECNAME is the name of the section. These
indicate the start address and end address of the orphaned section
respectively.
Systems that don't support these features retain the earlier behavior.
This commit also fixes the annoyance that files that include coverage
counters must be listed on COVERAGE_FILES in lib/automake.mk.
This commit also fixes the annoyance that modifying any source file that
includes a coverage counter caused all programs that link against
libopenvswitch.a to relink, even programs that the source file was not
linked into. For example, modifying ofproto/ofproto.c (which includes
coverage counters) caused tests/test-aes128 to relink, even though
test-aes128 does not link again ofproto.o.
All of these changes avoid using the same name for two local variables
within a same function. None of them are actual bugs as far as I can tell,
but any of them could be confusing to the casual reader.
The one in lib/ovsdb-idl.c is particularly brilliant: inner and outer
loops both using (different) variables named 'i'.
Found with GCC -Wshadow.