The new CFM "demand mode" (named after BFD's demand mode) uses
data traffic to indicate interface liveness. It's helpful on
heavily congested networks where CCMs may be dropped.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Before this patch, when a tunnel is configured with key=flow, CFM
didn't verify that incoming packets had the appropriate key of
zero. This could cause the CFM module to consume packets which
weren't actually intended for it.
Bug #13542.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The cfm_get_opup() function's result doesn't make sense when CFM is
not configured in extended mode. This patch makes it report -1 in
this case. Future patches will rely on this behavior.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
When CFM is first configured, it detects no remote endpoints, and
thus sets RDI on its CCMs. This can cause the receiver of these
CCMs to think there is a problem when really things are simply
initializing. This patch fixes the issue by not setting the RDI
bit in CCMs until at least one fault interval has passed.
Bug #12610.
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
This patch makes two improvements to CFM logging which should
make debugging connectivity problems a bit more intuitive. First,
when a remote_mp disappears, the length of time since its last CCM
reception is logged. Second, the "CFM fault status changed"
message is reformatted in a more intuitive way. Instead of
prefixing additions and deletions with pluses and minuses, the full
old fault status and new fault status are logged.
Requested-by: Ben Basler <bbasler@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Found by valgrind:
Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised
byte(s)
at 0x42D3021: sendmsg (in /lib/libc-2.5.so)
by 0x80E4D23: nl_sock_transact (netlink-socket.c:670)
by 0x80D9086: dpif_linux_execute__ (dpif-linux.c:872)
by 0x807D6AE: dpif_execute__ (dpif.c:957)
by 0x807D6FE: dpif_execute (dpif.c:987)
by 0x805DED9: send_packet (ofproto-dpif.c:4727)
by 0x805F8E1: port_run_fast (ofproto-dpif.c:2441)
by 0x8065CF6: run_fast (ofproto-dpif.c:926)
by 0x805674F: ofproto_run_fast (ofproto.c:1148)
by 0x804C957: bridge_run_fast (bridge.c:1980)
by 0x8053F49: main (ovs-vswitchd.c:123)
Address 0xbea0895c is on thread 1's stack
Bug #11797.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit 2b540ecb (Added handling of previously ignored cfm faults.)
made the CFM code trigger a fault when a packet is received with an
out of order sequence number. This means that if even one CFM
probe is dropped, a fault will be triggered because the next
received probe's sequence would be two greater than the last. This
conflicts with the 802.1ag requirement that 3.5 dropped probes
trigger a fault.
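
To illustrate the difference, a time-based check tolerates a single
lost probe. The sketch below assumes the fault window is 3.5 times
the CCM interval; the function names are hypothetical and this is
not the code in cfm.c.

```c
#include <stdbool.h>

/* Fault window: 3.5 CCM intervals, computed in integer arithmetic. */
static long long
sketch_fault_interval_ms(int ccm_interval_ms)
{
    return ccm_interval_ms * 7LL / 2;
}

/* Returns true only if no CCM has arrived within the fault window.
 * A gap in sequence numbers alone never trips this check. */
static bool
sketch_rx_timed_out(long long last_rx_ms, long long now_ms,
                    int ccm_interval_ms)
{
    return now_ms - last_rx_ms > sketch_fault_interval_ms(ccm_interval_ms);
}
```

With a 1000 ms interval, one dropped probe means the next CCM arrives
about 2000 ms after the previous one, well inside the 3500 ms window.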
Signed-off-by: Ethan Jackson <ethan@nicira.com>
We've recently seen problems where OVS can get delayed sending CCM
probes by several seconds. This can cause tunnels to flap, and
generally wreak havoc. It's easy to detect when this is happening,
so, at a minimum, a warning should be helpful to those debugging
problems.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
When debugging CFM, it's useful to know exactly when each fault
interval starts in relation to other CFM events.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, fault status changes just log the new status. This means that
the administrator has to find two consecutive status change messages to
see what actually changed.
This commit changes the log message format to prefix new faults with '+'
and faults that disappeared with '-'. Existing faults that are still
present are not prefixed.
This also simplifies the code a little by making ds_put_cfm_fault()
put spaces before fault names instead of after.
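
The format described above can be sketched like this. The fault
names and bit values are a hypothetical subset chosen for
illustration, and the function is not the real ds_put_cfm_fault().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fault bits; bit i corresponds to sketch_fault_names[i]. */
enum {
    SKETCH_FAULT_RECV = 1 << 0,
    SKETCH_FAULT_RDI = 1 << 1,
    SKETCH_FAULT_MAID = 1 << 2
};

static const char *sketch_fault_names[] = { "recv", "rdi", "maid" };

/* Writes the change from 'old_faults' to 'new_faults' into 'buf'.
 * New faults get '+', faults that disappeared get '-', and faults
 * present in both are unprefixed.  Each name is preceded by a space. */
static void
sketch_format_fault_change(unsigned int old_faults, unsigned int new_faults,
                           char *buf, size_t size)
{
    size_t n_names = sizeof sketch_fault_names / sizeof sketch_fault_names[0];
    size_t used = 0;
    size_t i;

    if (!size) {
        return;
    }
    buf[0] = '\0';
    for (i = 0; i < n_names; i++) {
        unsigned int bit = 1u << i;
        const char *prefix;
        int n;

        if (new_faults & bit) {
            prefix = old_faults & bit ? "" : "+";
        } else if (old_faults & bit) {
            prefix = "-";
        } else {
            continue;           /* Fault absent before and after. */
        }
        n = snprintf(buf + used, size - used, " %s%s",
                     prefix, sketch_fault_names[i]);
        if (n < 0 || (size_t) n >= size - used) {
            break;              /* Output truncated; stop here. */
        }
        used += n;
    }
}
```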
Signed-off-by: Ben Pfaff <blp@nicira.com>
Previously, CFM frames that were out of sequence or contained an
invalid cfm_interval were still processed. With this patch, such
frames are ignored.
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
The changes display the cfm_health of an interface. The cfm_health
is an exponential weighted moving average of the health of all
remote_mpids. The value can vary from 0 to 100, 100 being very healthy
and 0 being unhealthy.
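
As a rough illustration of an exponentially weighted moving average
on a 0 to 100 scale, consider the sketch below. The 3:1 weighting
and the meaning of 'sample' (the percentage of expected CCMs seen in
the latest interval) are assumptions made for this example, and the
function name is hypothetical.

```c
/* Folds a new sample (0..100) into the running health value.
 * Assumed weighting: 3/4 old value, 1/4 new sample. */
static int
sketch_update_health(int old_health, int sample)
{
    int health = (old_health * 3 + sample) / 4;

    if (health < 0) {
        health = 0;
    } else if (health > 100) {
        health = 100;
    }
    return health;
}
```

Because older samples decay geometrically, a single bad interval
lowers the value only partly, while sustained loss drives it toward 0.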
Feature #10363
Requested-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
CCM PDUs may take a different path through the network depending on
the VLAN tag they carry. In order to exercise these paths, it
may be advantageous to use a random VLAN tag.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The unixctl library had used the vde2 management protocol since the
early days of Open vSwitch. As Open vSwitch has matured, several
Python daemons have been added to the code base which would benefit
from a unixctl implementation. Instead of implementing the old
unixctl protocol in Python, this patch changes unixctl to use JSON
RPC for which we already have an implementation in both Python and
C. Future patches will need to implement a unixctl library in
Python on top of JSON RPC.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The cfm_fault column of the database is the logical OR of a number
of reasons that CFM can be in a faulted state. A controller may
want to have more specific information in which case it can look at
the cfm_fault_status column which this patch adds.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
While debugging some issues today it became clear that it would be
useful to log when the CFM fault status changes and when packets
are lost. The CFM module logs pretty aggressively when in debug
mode, but this can be chatty and most systems don't operate under
this logging level for extended periods of time. This patch logs
when CCMs are received which indicate reordering or packet loss and
when the CFM fault status changed.
Requested-by: Jacob Cherkas <jcherkas@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
The protocol used by ovs-appctl has a long-standing bug that there
is no way to distinguish "ovs-appctl a b c" from "ovs-appctl 'a b c'".
This isn't a big deal because none of the current commands really
want to accept arguments that include spaces, but it's kind of a silly
limitation.
At the same time, the internal API is awkward because every user is
stuck doing its own argument parsing, which is no fun.
This commit fixes both problems, by adding shell-like quoting to the
protocol and modifying the internal API from one that passes a string
to one that passes in an array of pre-parsed strings. Command
implementations may now specify how many arguments they expect. This
simplifies some command implementations significantly.
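
The argument-count check that this enables can be sketched as
follows. The struct and function are hypothetical stand-ins for
illustration, not the unixctl API itself.

```c
#include <stdbool.h>

/* Hypothetical command descriptor: each command declares how many
 * arguments it accepts, so the dispatcher can reject bad calls
 * before the command's own code runs. */
struct sketch_command {
    const char *name;
    int min_args;
    int max_args;
};

/* 'argc' counts the pre-parsed strings, including the command name
 * in argv[0], so the number of arguments is argc - 1. */
static bool
sketch_args_ok(const struct sketch_command *cmd, int argc)
{
    int n_args = argc - 1;

    return n_args >= cmd->min_args && n_args <= cmd->max_args;
}
```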
Signed-off-by: Ben Pfaff <blp@nicira.com>
In some cases, a controller may want to take an interface down for
forwarding purposes without completely deconfiguring CFM and
thereby losing all connectivity monitoring. The new 'cfm_opstate'
setting is a way to achieve this behavior.
Wireshark complained that Open vSwitch-generated CFM messages were
malformed. Upon looking at the standard, I spotted that Open vSwitch
failed to include the final, required "End TLV" byte with value 0.
This commit adds the End TLV byte to generated CCMs but still accepts
the truncated messages for backward compatibility.
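
A sketch of the send and receive sides of this change follows. The
74-byte body length and all names here are assumptions made for
illustration, not values taken from cfm.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CCM_BODY_LEN 74                     /* Assumed, no End TLV. */
#define SKETCH_CCM_LEN (SKETCH_CCM_BODY_LEN + 1)   /* Body plus End TLV. */

/* Copies 'body' into 'out' and appends the End TLV byte (value 0).
 * Returns the total length, or 0 if 'out' is too small. */
static size_t
sketch_finish_ccm(const uint8_t body[SKETCH_CCM_BODY_LEN],
                  uint8_t *out, size_t out_size)
{
    if (out_size < SKETCH_CCM_LEN) {
        return 0;
    }
    memcpy(out, body, SKETCH_CCM_BODY_LEN);
    out[SKETCH_CCM_BODY_LEN] = 0;
    return SKETCH_CCM_LEN;
}

/* Receive side: still accept messages that lack the End TLV byte,
 * for compatibility with older senders. */
static bool
sketch_ccm_len_ok(size_t len)
{
    return len >= SKETCH_CCM_BODY_LEN;
}
```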
When no remote MPIDs were found, the output would print an extra newline.
If multiple remote MPIDs were found, the lines would run together. This
commit cleans things up a bit by just printing each item on its own line
without any blank lines.
A controller may want to know which MPIDs are reachable from an
interface configured with CFM. This patch regularly writes this
information to the database.
Bug #7014.
802.1ag only allows for MPIDs in the range [1, 8191]. This is
restrictive enough to make assignment of MPIDs to instances of OVS
awkward. This patch allows eight byte MPIDs when running in
extended mode.
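
The two ranges can be illustrated with a small validity check. The
function name is hypothetical, and treating 0 as invalid in extended
mode is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if 'mpid' is usable in the given mode.  Standard
 * 802.1ag mode allows [1, 8191]; extended mode allows the full
 * eight-byte range (0 is assumed to remain invalid). */
static bool
sketch_mpid_valid(uint64_t mpid, bool extended)
{
    if (extended) {
        return mpid >= 1;
    }
    return mpid >= 1 && mpid <= 8191;
}
```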
Bug #7014.
The standard CFM protocol only allows a handful of transmission
rates. This is particularly problematic if you want to support a
transmission rate slower than 100 ms and faster than 1000 ms.
This patch allows arbitrary transmission rates (between 1 ms and
65535 ms). It does this by commandeering parts of a reserved
"zero" field in the CCM message. This breaks wire compatibility
with standard 802.1ag implementations, and thus is only supported
in extended mode.
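
For illustration, a 1 to 65535 ms interval fits in two bytes of such
a field. The sketch below assumes network byte order and uses
hypothetical function names; the actual field layout is defined in
cfm.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores 'interval_ms' in a two-byte field, most significant byte
 * first.  Returns false if the value is outside [1, 65535]. */
static bool
sketch_put_interval(int interval_ms, uint8_t field[2])
{
    if (interval_ms < 1 || interval_ms > 65535) {
        return false;
    }
    field[0] = (uint8_t) (interval_ms >> 8);
    field[1] = (uint8_t) (interval_ms & 0xff);
    return true;
}

/* Reads the interval back from the two-byte field. */
static int
sketch_get_interval(const uint8_t field[2])
{
    return (field[0] << 8) | field[1];
}
```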
Bug #7014.
The new extended mode introduced in this patch will be used for
features which break wire compatibility with 802.1ag compliant
implementations.
Bug #7014.
According to the 802.1ag specification, users should be able to
configure the CFM module with a list of remote endpoints with which
the local endpoint should have connectivity. Commit 93b8df3853
"cfm: Remove Maintenance_Point and Monitor tables." changed the
behavior so that only one remote endpoint could be specified. This
commit takes it further, by disallowing specification of any
remote endpoints.
Due to this change, the semantics of the fault flag are slightly
different. Before, a fault was triggered if any of the configured
remote endpoints were unreachable (or with RDI), or if any
unconfigured remote endpoints were reachable. Now a fault is
triggered if no remote endpoints are reachable at all, or if
reachable endpoints have set their RDI.
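
The new semantics can be sketched as a single predicate. The struct
and function are hypothetical simplifications, not the data
structures used in cfm.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one learned remote endpoint. */
struct sketch_remote_mp {
    bool reachable; /* CCM seen within the fault interval. */
    bool rdi;       /* Last CCM from this endpoint had RDI set. */
};

/* Fault if no remote endpoint is reachable at all, or if any
 * reachable endpoint has set its RDI. */
static bool
sketch_cfm_fault(const struct sketch_remote_mp *rmps, size_t n)
{
    bool any_reachable = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (rmps[i].reachable) {
            any_reachable = true;
            if (rmps[i].rdi) {
                return true;
            }
        }
    }
    return !any_reachable;
}
```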
Bug #7014.
Commit 84c5d450 "cfm: No longer allow configuration of ma_name and
md_name." changed the default MA and MD name for no reason. This
could add needless complexity to situations where OVS instances
built before and after this commit need to speak CFM to each other.
This commit reverts to the old values.
According to the 802.1ag specification, a maintenance point should
set the Remote Defect Indicator (RDI) bit on CCMs when it has
failed to receive a CCM from any of its configured remote
maintenance points within the required fault interval. This allows
unidirectional faults to be flagged by both ends of a CFM monitored
tunnel.
This patch makes a stylistic improvement by removing CFM protocol
information from cfm.h. In the process it changes
cfm_compose_ccm() to populate an ofpbuf instead of a struct ccm.
In an effort to make CFM easier to understand and configure, this
patch removes the Maintenance_Point and Monitor tables from the
database. As a consequence, users will only be able to configure
one remote maintenance point. Furthermore, before this patch each
remote maintenance point maintained its own separate fault flag in
the database. This flag is no longer reported; users will need to
infer the fault status from the global CFM fault flag.
These settings added complexity to the database and CFM module
interface with negligible benefit. This patch removes them in such
a way that they can easily be re-added in the (unlikely) event that
we need them in the future.