Before this (and the previous) patch, every call to cfm_configure()
set the fault_timer to expired. Thus, the next call to cfm_run()
would notice a lack of CCM reception and trigger a faulted status.
This is a bug in and of itself, but normally it would not be a big
deal because cfm_configure() should only be called infrequently
(when the database changes). However, due to an unrelated bug,
cfm_configure() was getting called approximately once per second.
As a result, all monitors showed faults all of the time.
This patch fixes the problem by not expiring the timer in
cfm_configure(). Instead, it resets the timer to a full
fault_interval, so the remote endpoint gets the usual amount of time
to miss heartbeats before a fault is declared.
Bug #5244.
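The difference between expiring the timer and granting a full
fault_interval can be sketched as follows. This is a minimal
illustration, not the code in lib/cfm.c: struct cfm_sketch, its
fields, and all three functions are hypothetical names, and time is
modeled as a plain millisecond counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CFM state; names are illustrative. */
struct cfm_sketch {
    long long fault_interval_ms;  /* How long CCMs may be missed. */
    long long fault_deadline_ms;  /* Time at which the fault timer expires. */
};

/* Old behavior: configuring leaves the timer already expired. */
void
configure_expiring(struct cfm_sketch *cfm, long long now_ms)
{
    cfm->fault_deadline_ms = now_ms;
}

/* New behavior: configuring grants a full fault_interval. */
void
configure_with_grace(struct cfm_sketch *cfm, long long now_ms)
{
    cfm->fault_deadline_ms = now_ms + cfm->fault_interval_ms;
}

/* Returns true if the fault timer has expired at 'now_ms'. */
bool
fault_timer_expired(const struct cfm_sketch *cfm, long long now_ms)
{
    return now_ms >= cfm->fault_deadline_ms;
}
```

With the old behavior, a run immediately after configuration sees an
expired timer and reports a fault. With the new behavior, the fault is
reported only once fault_interval_ms has elapsed without the timer
being refreshed.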
Calling cfm_configure() often could cause timers to be reset,
resulting in unexpected behavior. This commit only applies updates
when the CFM configuration has actually changed.
Bug #5244.
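One way to update only when the configuration has changed is to compare
the new settings against the stored ones and return early when they
match. The sketch below assumes a hypothetical settings struct with
two fields; the real set of CFM settings is not listed in this
message, and timer_resets exists only to make the effect observable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of CFM settings, for illustration only. */
struct cfm_settings {
    uint16_t mpid;
    int interval_ms;
};

struct cfm_state {
    struct cfm_settings settings;
    int timer_resets;             /* Counts how often timers were reset. */
};

/* Applies 'new_settings' only if they differ from the current ones.
 * Returns true if anything changed (and timers were therefore reset). */
bool
cfm_apply_settings(struct cfm_state *cfm,
                   const struct cfm_settings *new_settings)
{
    if (cfm->settings.mpid == new_settings->mpid
        && cfm->settings.interval_ms == new_settings->interval_ms) {
        return false;             /* Nothing changed: leave timers alone. */
    }

    cfm->settings = *new_settings;
    cfm->timer_resets++;
    return true;
}
```

Repeated calls with identical settings then become no-ops, so a caller
that reconfigures once per second no longer disturbs the timers.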
Ben pointed out that an attacker could cause OVS to use infinite
memory by sending a series of CCMs with different MAIDs. Each
message would cause a remote_maid to be allocated and stored for
several seconds.
Since commit 1c2e2d2fc8 (cfm: Don't report unexpected remote
endpoints) no longer reports unexpected remote MAIDs and MPs in the
database, the only reason to keep track of this information is for
debugging purposes. In my judgment, it provides negligible useful
debugging information at the expense of significantly increased
code complexity. This commit rips it out entirely.
The specification says that a fault should be signaled when 3.5 *
ccm_interval milliseconds have passed. This commit respects that
requirement, possibly increasing the responsiveness of fault
detection slightly.
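The 3.5 * ccm_interval rule can be computed without floating point by
multiplying by 7 and dividing by 2. The helper below is a sketch; the
function name is hypothetical and is not taken from lib/cfm.c.

```c
/* Returns the fault interval, in milliseconds, for a given CCM
 * transmission interval: 3.5 times the interval, computed in integer
 * arithmetic.  Odd products are truncated toward zero. */
long long
fault_interval_ms(long long ccm_interval_ms)
{
    return ccm_interval_ms * 7 / 2;
}
```

For example, a 1000 ms CCM interval yields a 3500 ms fault interval.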
Before this patch, CFM would report unexpected remote maintenance
points in the database. This commit no longer exposes this
information.
Information about precisely why a link is faulty is more interesting
to a system administrator debugging a problem than to a controller,
which will generally only care about whether or not a link is
faulty. For simplicity's sake, this commit removes this information
from the database, where it was somewhat awkwardly placed. In the
future it may be valuable to report the information through
ovs-appctl commands for debugging purposes.
Commit af5739857a (cfm: Immediately signal a fault upon receiving
an unexpected MPID.) caused the CFM library to immediately signal a
fault upon reception of an unexpected remote MPID. This commit
does the same for unexpected MAIDs and for remote maintenance points
with invalid CCM intervals.
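Signaling the fault at reception time, rather than on the next run,
might look roughly like the sketch below. All names here are
hypothetical, the 48-byte MAID length is an assumption, and "invalid
CCM interval" is assumed to mean an interval that differs from the
locally configured one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAID_LEN 48        /* Assumed MAID length, in bytes. */

/* Hypothetical parsed CCM. */
struct ccm_sketch {
    uint8_t maid[SKETCH_MAID_LEN];
    uint16_t mpid;
    int interval_ms;
};

/* Hypothetical local expectations plus the fault flag. */
struct cfm_expect {
    uint8_t maid[SKETCH_MAID_LEN];
    uint16_t remote_mpid;
    int interval_ms;
    bool fault;
};

/* Flags a fault as soon as a CCM with an unexpected MAID, an
 * unexpected MPID, or a mismatched interval is received. */
void
process_ccm(struct cfm_expect *cfm, const struct ccm_sketch *ccm)
{
    if (memcmp(cfm->maid, ccm->maid, SKETCH_MAID_LEN)
        || ccm->mpid != cfm->remote_mpid
        || ccm->interval_ms != cfm->interval_ms) {
        cfm->fault = true;        /* Signal immediately. */
    }
}
```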
It doesn't really make sense for the CFM code to be composing
packets. Its caller is better placed to compose the appropriate
L2 header. This commit pulls that logic out of the CFM library.
An unexpected MPID is always a fault, but the CFM code didn't signal the
fault until the next time cfm_run() was called. In one experiment I
saw a visible lag in the database (although I wasn't able to reproduce it
again within a few tries).
On 32-bit platforms GCC warns:
../lib/cfm.c: In function 'compose_ccm':
../lib/cfm.c:130: warning: integer constant is too large for 'long' type
../lib/cfm.c: In function 'cfm_should_process_flow':
../lib/cfm.c:375: warning: integer constant is too large for 'long' type
This fixes the problem by using the UINT64_C macro from <inttypes.h> to
write the 64-bit constants.
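As an illustration of the fix, the sketch below wraps a 48-bit
constant in UINT64_C so that it has a 64-bit type on every platform.
The constant and the function name are illustrative assumptions; the
actual expressions at lib/cfm.c:130 and lib/cfm.c:375 are not shown
in this message.

```c
#include <inttypes.h>
#include <stdint.h>

/* An unsuffixed constant this large does not fit in a 32-bit 'long',
 * which is what GCC warns about.  UINT64_C appends the right suffix
 * for the platform.  The value here is an illustrative 48-bit
 * Ethernet address, not necessarily the one used in lib/cfm.c. */
#define CCM_DST_SKETCH UINT64_C(0x0180c2000030)

/* Returns nonzero if 'eth_dst' equals the illustrative address. */
int
is_ccm_dst(uint64_t eth_dst)
{
    return eth_dst == CCM_DST_SKETCH;
}
```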
This commit implements a subset of the 802.1ag specification for
Connectivity Fault Management (CFM) using Continuity Check Messages
(CCM). When CFM is configured on an interface, CCMs are broadcast
at regular intervals to detect missing or unexpected connectivity.