It's possible to install an OpenFlow flow that matches on udp source and
destination ports without matching on fragments. If the subtable where
such flow stays is visited during translation of a later fragment, the
generated mask will have incorrect prerequisited for the datapath and it
would be revalidated away at the first chance.
This commit fixes it by adjusting the mask for later fragments after
translation.
Other prerequisites of the mask are also prerequisites in OpenFlow, but
not the ip fragment bit, that's why we need a special case here.
For completeness, this commits also fixes a related problem in bfd,
where we check the udp destination port without checking if the frame is
an ip fragment. It's not really necessary to address this separately,
given the adjustment that we perform.
VMware-BZ: #1651589
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
The checksum method csum() requires its output location to be
intialized to zero when that output location is part of the
checksum. Add comments to the various places where csum is
called documenting where the initialization has occurred.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Some tunnel protocols, such as Geneve, have a bit in the tunnel
header to indicate that it is an OAM packet. This means that the
packet should be processed as a tunnel control frame and not be
passed onto connected links.
When BFD is used inside of a tunnel it is often used in this control
capacity, so this adds an option to enable marking the outer header
when the output port is a tunnel that supports the OAM concept. It is
also possible to use tunnels as point-to-point links that are simply
carrying BFD as payload, so this is not always turned on.
Conceptually, this may also apply to other types of packets locally
generated by the switch, most obviously CFM. However, BFD seems to
be most commonly used for this type of tunnel monitoring application
so this only adds the option to BFD for the time being to avoid
unnecessarily adding configuration knobs that might never get used.
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Most vlog calls are for the log module owned by the translation unit being
compiled, but this module was referenced indirectly through a pointer
variable. That seems silly, so this commit changes the code so that the
local vlog module is referred to directly, as &this_module.
We could get rid of the global variables for vlog modules entirely, but
I like getting linker errors when there's a duplicate module name.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
This saves some code and improves clarity, in my opinion.
Some of these changes just change an inet_pton() call into a similar
ip_parse() or ipv6_parse() call. In those cases the benefit is better
type safety, since inet_pton()'s output parameter is type "void *".
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
RFC 5880 specified bfd.RemoteSessionState as one of the state
variables. In OVS implementation, this value is exported to OVSDB's
BFD status column of the interface table, as one of the map elements,
with the key of 'remote_state'.
It can be surprising when the 'remote_state' map element disappears
when BFD is in the 'DOWN' state, but otherwise always exported.
Change to always exporting it, to make it more predictable for
applications that monitors the BFD status column.
While at it, make the same change to 'remote_diagnostic', so that it
is also always exported to OVSDB for consistency.
VMWare-BZ: 1535979
Reported-by: Mihir Gangar <gangarm@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Some commands of ovs-appctl were lazily registered when first
bridg or bfd was added. Before that, calling these commands raised a
error("xxx is not a valid command"). The problem commangs included
"bfd/...", "upcall/...","dpif/...","fdb/..." and so on.
Fix this by moving the register into the "bridge_init" and
"bridge_init_ofproto". All commands are registered at the moment
ovs-vswitchd starts.
Signed-off-by: Huanle Han <hanxueluo@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
I found the BFD transmit interval was lowerbounded by the default value
without warning, although documentation does not consider a lowerbound.
Testing has been performed with transmit and receive intervals as low as 1
ms, and although CPU overhead was effected (especially with multiple BFD
sessions such as 6 and higher), it worked well.
Signed-off-by: Niels van Adrichem <n.l.m.vanadrichem@tudelft.nl>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The atomics here do not synchronize the state of any other variables,
so we can use atomic_count and relaxed atomics.
bfd_should_process_flow() is rearranged to set the megaflow mask bits
only if necessary, and to avoid the atomic operation on non-BFD
packets.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds options for configuring the MAC addresses
in BFD state machine. Therein, the "bfd_local_src_mac" and
"bfd_local_dst_mac" configure the MAC address for transmitted
BFD packets. The "bfd_remote_dst_mac" configure the matching
of MAC address on received BFD packets.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit flips the default value of bfd ip source and destination,
so that they match the default value of ip destination and source
of vtep schema.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
After a quick analysis, in most cases the access to refcounted objects
is clearly protected either with an explicit lock/mutex, or RCU. there
are only a few places where I left a call to ovs_refcount_unref().
Upon closer analysis it may well be that those could also use the
relaxed form.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Various recent commits have introduced build failures on FreeBSD. This
patch fixes them.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 88bf179aa3 (bfd/cfm: Check status change before update
status to database.) used a boolean flag to trigger bfd status
update. However, the flag is not set on bfd creation and deletion.
To prevent any stale status in database, this commit makes bfd module
always set the flag on bfd creation and deletion.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit adds a requirement that bfd session must receive at least
one bfd control packet every 100 * bfd->cfg_min_rx amount of time in
forwarding_if_rx mode. Otherwise, even if the data packets are received
on the monitored interface, the bfd->forwarding is still false.
Since the datapath flow is not purged when the userspace Open Vswitch
crashes, data packet can still be forwarded through the tunnel and
fool the remote BFD session in forwarding_if_rx mode. Thus, this commit
can prevent the remote BFD session from falsely declaring tunnel liveness
in this situation.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit adds boolean flag in bfd/cfm module for checking
status change. If there is no status change, the current
update to OVS database will skip the bfd/cfm session.
In the experiment with 5K bfd sessions, when one session is
flapping at rate of every 0.3 second, this patch reduces the
cpu utilization of the ovs-vswitchd thread from 13 to 6.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Now that we don't need to parse TCP flags from the packet after
extraction, we usually do not need the 'l7' pointer any more. When
needed, ofpbuf_get_tcp|udp|sctp|icmp_payload() or ofpbuf_get_l4_size()
can be used instead.
Removal of 'l7' was requested by Pravin for the DPDK datapath work, as
it simplifies packet parsing a bit.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Windows does not have inet_aton(), but does have a inet_pton().
inet_aton() is not defined in POSIX. But inet_pton() is.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds two new options, bfd_src_ip and bfd_dst_ip
respectively, which allows user to configure the source and
destination IP address of bfd control packet. If the user
specified address cannot be parsed, the default address
will be used.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Commit ccc09689 (bfd: Implement Bidirectional Forwarding Detection.)
set the bfd local diagnostic to "Concatenated Path Down" in response
to the set of cpath_down only when the current local diagnostic is
"None". However, since the bfd local diagnostic is not reset when
the bfd state is restored from last erroneous state, the set of
cpath_down will not update the local diagnostic in that case.
This commit fixes the bug by always checking for local diagnostic
change when cpath_down is set or reset.
Bug #22625
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This is a thin wrapper around an atomic_uint. It is useful anyhow because
each ovs_refcount_ref() or ovs_refcount_unref() call saves a few lines of
code.
This commit also changes all the potential direct users over to use the new
data structure.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The bfd module did not previously change the global connectivity_seq
when the remote state changed, which means that such state changes may
not be propagated to the database. This is particularly bad if this is
the last state transition to happen in an otherwise stable environment.
This patch checks for transitions in remote state, and ensures that the
main thread will update the database when these happen.
Bug #22136.
Co-authored-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In the case where we have not yet sent a control packet for a bfd
connection, and we receive a control packet from the remote host,
bfd->next_tx is updated to an unusual value. This causes the logging to
incorrectly report that there has been long delays (in the order of
weeks) since the last bfd transmission time.
This patch only modifies bfd->next_tx in this case if we are not
expecting to immediately send a control packet. This should mean that
bfd->next_tx is either 0 (immediate tx) or in the order of time_msec().
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Previously, as part of ofproto-dpif run() processing, we would loop
through all ports and poll for changes to carrier, bfd, cfm and lacp
status. This information is used to determine whether bundles may be
enabled, and to perform revalidation when needed.
This patch makes the bfd, cfm, lacp and stp modules aware of the new
global connectivity_seq, notifying on changes in port status. We can
then use connectivity_seq to check if anything has changed before
looping through all ports in ofproto-dpif. In a test environment of 5000
internal ports and 50 tunnel ports with bfd, this reduces average CPU
usage of the main thread from about 35% to about 25%.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Currently, we update the forwarding flag in bfd_set_state() and in
bfd_forwarding_if_rx_update() if bfd_forwarding_if_rx is enabled.
However, these are not the exact places where the forwarding flag
needs to be updated. The exact places are in the bfd_process_packet()
where bfd status are changed based on received control packet, and in
the flow_push_stats() and compose_output_action__() where the
rx_packet counter is updated.
This commit changes the update of forwarding flag to the places
mentioned above.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The MSVC C library printf() implementation does not support the 'z', 't',
'j', or 'hh' format specifiers. This commit changes the Open vSwitch code
to avoid those format specifiers, switching to standard macros from
<inttypes.h> where available and inventing new macros resembling them
where necessary. It also updates CodingStyle to specify the macros' use
and adds a Makefile rule to report violations.
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new key "flap_count" to "bfd_status" to count
the number of bfd "forwarding" flag flaps. A flap is considered
as a change of the "forwarding" flag value.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a forwarding flag to "struct bfd". This flag
is for indicating the interface's capability of packet I/O.
Also, this flag makes it possible to count the number of interface
state flapping. bfd_forwarding__() will update this flag at
each invocation.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new function "bfd_wake_time()" that returns the
next wakeup time associated with the "struct bfd".
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Ethernet headers are 14 bytes long, so when the beginning of such a header
is 32-bit aligned, the following data is misaligned. The usual trick to
fix that is to start the Ethernet header on an odd-numbered 16-bit
boundary. That trick works OK for Open vSwitch, but there are two
problems:
- OVS doesn't use that trick everywhere. Maybe it should, but it's
difficult to make sure that it does consistently because the CPUs
most commonly used with OVS don't care about misalignment, so we
only find problems when porting.
- Some protocols (GRE, VXLAN) don't use that trick, so in such a case
one can properly align the inner or outer L3/L4/L7 but not both. (OVS
userspace doesn't directly deal with such protocols yet, so this is
just future-proofing.)
- OpenFlow uses the alignment trick in a few places but not all of them.
This commit starts the adoption of what I hope will be a more robust way
to avoid misalignment problems and the resulting bus errors on RISC
architectures. Instead of trying to ensure that 32-bit quantities are
always aligned, we always read them as if they were misaligned. To ensure
that they are read this way, we change their types from 32-bit types to
pairs of 16-bit types. (I don't know of any protocols that offset the
next header by an odd number of bytes, so a 16-bit alignment assumption
seems OK.)
The same would be necessary for 64-bit types in protocol headers, but we
don't yet have any protocol definitions with 64-bit types.
IPv6 protocol headers need the same treatment, but for those we rely on
structs provided by system headers, so I'll leave them for an upcoming
patch.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new boolean option "forwarding_if_rx" to bfd.
When forwarding_if_rx is true the interface will be considered
capable of packet I/O as long as there is packet received at
interface. This is important in that when link becomes temporarily
conjested, consecutive BFD control packets can be lost. And the
forwarding_if_rx can prevent link failover by detecting non-control
packets received at interface.
Signed-off-by: Alex Wang <alexw@nicira.com>
C11 says that atomic_load() requires a non-const argument, and Clang
enforces that. This fixes warnings with FreeBSD <stdatomic.h> that uses
the Clang extensions.
Reported-by: Ed Maste <emaste@freebsd.org>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ed Maste <emaste@freebsd.org>
When there is no incoming data traffic at the interface for a period,
BFD decay allows the bfd session to increase the min_rx. This is
helpful in that some interfaces may usually be idle for a long time.
And cpu consumption can be reduced by processing fewer bfd control
packets.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Currently, when there are multiple bfd configuration changes,
the bfd_poll() will only update one change at a time with the
other side. This commit moves the call to bfd_poll() at the
end of configuration processing function, so that bfd_poll()
will update all configuration changes together.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>