This patch moves the cfm/show unixctl command from the bridge to
the CFM module. This is more in line with how LACP does it, and
will make future patches easier to implement.
According to the 802.1ag specification, reception of a CCM from an
unexpected source should trigger a fault. This patch causes the CFM
module to simply warn instead. The reasons for this change are
outlined below.
- Faults can cause controllers to make potentially expensive
changes to the network topology.
- Faults can be maliciously triggered by crafting invalid CCMs.
- With this patch, cfm->fault and rmp->fault are only updated in
cfm_run() making the code easier to debug and reason about.
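The warn-instead-of-fault behavior can be sketched roughly as
follows. This is a simplified illustration, not the code in
lib/cfm.c: the example_cfm structure, its fields, and the
example_* functions are hypothetical names invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified CFM state (not the real struct cfm). */
struct example_cfm {
    const uint16_t *expected_mpids; /* MPIDs we expect CCMs from. */
    size_t n_expected;
    bool fault;                     /* Never written on the receive path. */
    unsigned int n_warnings;        /* Count of warnings emitted. */
};

/* Returns true if 'mpid' is one of the configured remote MPIDs. */
static bool
example_mpid_expected(const struct example_cfm *cfm, uint16_t mpid)
{
    size_t i;

    for (i = 0; i < cfm->n_expected; i++) {
        if (cfm->expected_mpids[i] == mpid) {
            return true;
        }
    }
    return false;
}

/* Handles a received CCM.  An unexpected source only produces a
 * warning; the fault flag is left untouched. */
static void
example_process_ccm(struct example_cfm *cfm, uint16_t src_mpid)
{
    if (!example_mpid_expected(cfm, src_mpid)) {
        cfm->n_warnings++;
        fprintf(stderr, "warning: CCM from unexpected MPID %u\n",
                (unsigned int) src_mpid);
        return;
    }
    /* Expected source: a real implementation would record the
     * receive time for this remote MP here. */
}
```

Because the receive path never writes the fault flag in this sketch,
a crafted CCM can at worst produce log output.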
According to the 802.1ag specification, when a CCM is received
which advertises a misconfigured transmission interval, a fault
should be triggered. This patch goes against the spec by simply
warning when this happens. This is done for several reasons.
- Faults can cause controllers to make potentially expensive
changes in the network topology.
- Faults can be maliciously triggered by crafting invalid CCMs.
- Reducing the number of places in the code where rmp->fault and
cfm->fault are changed makes the code easier to debug and
reason about.
If the last receive time for a remote MP was before the last fault
check, the CFM code would not declare a fault. This is, of course,
exactly the wrong response.
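The corrected check might look something like the sketch below. The
example_rmp structure and the function name are assumptions made for
illustration; the real code tracks more state than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified remote MP state. */
struct example_rmp {
    int64_t last_rx_ms;     /* Time the most recent CCM was received. */
};

/* Returns true if the remote MP should be considered faulted, that
 * is, if no CCM has arrived since the previous fault check. */
static inline bool
example_rmp_faulted(const struct example_rmp *rmp, int64_t last_check_ms)
{
    return rmp->last_rx_ms < last_check_ms;
}
```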
Bug #5303.
Before this (and the previous) patch, whenever cfm_configure was
called it would set the fault_timer to expired. Thus, the next
call to cfm_run would notice a lack of CCM reception and trigger a
faulted status. This is a bug in and of itself, but normally would
not be a big deal because cfm_configure should only be called
infrequently (when the database changes). However, due to an
unrelated bug, cfm_configure() was getting called approximately once
per second. This resulted in all monitors showing faults all of
the time.
This patch fixes the problem by not expiring the timer at
cfm_configure(). Instead it gives it the appropriate
fault_interval amount of time to miss heartbeats.
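A minimal sketch of the idea, using a hypothetical deadline-based
timer rather than the timer helpers OVS actually uses (all example_*
names are invented for this illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer that expires at an absolute time. */
struct example_timer {
    int64_t deadline_ms;
};

/* Arms 'timer' to expire 'duration_ms' after 'now_ms'. */
static inline void
example_timer_set(struct example_timer *timer, int64_t now_ms,
                  int64_t duration_ms)
{
    timer->deadline_ms = now_ms + duration_ms;
}

/* Returns true once 'now_ms' has reached the deadline. */
static inline bool
example_timer_expired(const struct example_timer *timer, int64_t now_ms)
{
    return now_ms >= timer->deadline_ms;
}

/* On configuration, give the remote end a full fault interval to
 * send heartbeats instead of starting with an expired timer. */
static inline void
example_configure(struct example_timer *fault_timer, int64_t now_ms,
                  int64_t fault_interval_ms)
{
    example_timer_set(fault_timer, now_ms, fault_interval_ms);
}
```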
Bug #5244.
Calling cfm_configure() too often could cause timers to be reset,
resulting in unexpected behavior. This commit only updates the
CFM configuration when it has actually changed.
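One way to express "only update on change" is sketched below. The
configuration fields and the timer_resets counter are hypothetical
stand-ins chosen for the example, not the real contents of struct cfm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the configurable CFM settings. */
struct example_cfm_config {
    uint16_t mpid;
    int interval_ms;
};

struct example_cfm {
    struct example_cfm_config cfg;
    int timer_resets;       /* Stand-in for actually resetting timers. */
};

/* Applies 'new_cfg' only if it differs from the current settings.
 * Returns true if anything changed (and timers were reset). */
static bool
example_cfm_configure(struct example_cfm *cfm,
                      const struct example_cfm_config *new_cfg)
{
    if (cfm->cfg.mpid == new_cfg->mpid
        && cfm->cfg.interval_ms == new_cfg->interval_ms) {
        return false;       /* Nothing changed: leave timers alone. */
    }

    cfm->cfg = *new_cfg;
    cfm->timer_resets++;
    return true;
}
```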
Bug #5244.
Ben pointed out that an attacker could cause OVS to consume unbounded
memory by sending a series of CCMs with different MAIDs. Each
message would cause a remote_maid to be allocated and stored for
several seconds.
Since commit 1c2e2d2fc8 (cfm: Don't report unexpected remote
endpoints), unexpected remote MAIDs and MPs are no longer reported in
the database, so the only reason to keep track of this information is for
debugging purposes. In my judgment, it provides negligible useful
debugging information at the expense of significantly increased
code complexity. This commit rips it out entirely.
The specification says that a fault should be signaled when 3.5 *
ccm_interval milliseconds have passed without reception of a CCM.
This commit respects that
requirement, possibly increasing the responsiveness of fault
detection slightly.
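The arithmetic is simple enough to show directly. The function name
below is hypothetical; the integer form (multiply by 7, divide by 2)
is just one way to compute 3.5 times the interval without floating
point.

```c
#include <stdint.h>

/* Returns the fault interval, in milliseconds, for a given CCM
 * interval: 3.5 * ccm_interval_ms, rounded down. */
static inline int64_t
example_fault_interval_ms(int64_t ccm_interval_ms)
{
    return (ccm_interval_ms * 7) / 2;
}
```

For example, a 1000 ms CCM interval gives a 3500 ms fault interval.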
Before this patch, CFM would report unexpected remote maintenance
points in the database. This commit no longer exposes this
information.
Information about precisely why a link is faulty is more interesting
to a system administrator debugging a problem than to a controller,
which will generally only care about whether or not a link is
faulty. For simplicity's sake, this commit removes this information
from the database, where it was somewhat awkwardly placed. In the
future it may be valuable to report the information through
ovs-appctl commands for debugging purposes.
Commit af5739857a (cfm: Immediately signal a fault upon receiving
an unexpected MPID.) caused the CFM library to immediately signal a
fault upon reception of an unexpected remote MPID. This commit
does the same for unexpected MAIDs and for remote maintenance points
with invalid CCM intervals.
It doesn't really make sense for the CFM code to be composing
packets. Its caller is better placed to compose the appropriate
L2 header. This commit pulls that logic out of the CFM library.
An unexpected MPID is always a fault, but the CFM code didn't signal the
fault until the next time cfm_run() was called. In one experiment I
saw a visible lag in the database (although I wasn't able to reproduce it
again within a few tries).
On 32-bit platforms GCC warns:
../lib/cfm.c: In function 'compose_ccm':
../lib/cfm.c:130: warning: integer constant is too large for 'long' type
../lib/cfm.c: In function 'cfm_should_process_flow':
../lib/cfm.c:375: warning: integer constant is too large for 'long' type
This fixes the problem by using the UINT64_C macro from <inttypes.h> to
write a 64-bit constant.
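As a small illustration of the technique (the constant and function
below are made up for this example and are not the ones in
lib/cfm.c):

```c
#include <stdint.h>

/* Hypothetical 48-bit value.  Without UINT64_C, a bare literal this
 * large can draw the "too large for 'long' type" warning on 32-bit
 * platforms. */
#define EXAMPLE_VALUE UINT64_C(0x123456789abc)

/* Returns the low 48 bits of 'x'. */
static inline uint64_t
example_low_48_bits(uint64_t x)
{
    return x & UINT64_C(0xffffffffffff);
}
```

UINT64_C appends whatever suffix the platform needs to give the
literal a 64-bit unsigned type, so the same source compiles cleanly
on both 32-bit and 64-bit targets.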
This commit implements a subset of the 802.1ag specification for
Connectivity Fault Management (CFM) using Continuity Check Messages
(CCM). When CFM is configured on an interface, CCMs are broadcast
at regular intervals to detect missing or unexpected connectivity.