In glibc, "timer_t" is a "void *" that appears to point into malloc()'d
memory. By throwing it away entirely, we leak it, which makes valgrind
complain. We really don't ever care to use the timer object again, but
we can't destroy it without stopping the periodic timer. So make it
static to avoid a warning from Valgrind.
Most of the timekeeping needs of OVS are simply to measure intervals,
which means that it is sensitive to changes in the clock. This commit
replaces the existing clocks with monotonic timers. An additional set
of wall clock timers are added and used in locations that need absolute
time.
Bug #1858
This code triggers when a trip through the process's main loop takes much
longer than expected. The code for calculating the expected time rounds
down to a maximum of 10000 ms to avoid overflow. But there is no reason
that the correct time should not be displayed in the log message, and
furthermore displaying the correct time may help tracking down the
underlying issue, since it lets the administrator find out exactly when
the trip through the main loop started. So this commit displays the exact
time without rounding down.
Without removing SA_RESTART from the SIGALRM handler, the fcntl call will
never return, even after the signal handler is invoked and returns.
We haven't seen a problem in practice, at least not recently, but that's
probably just luck combined with not holding the configuration file lock
for very long.
Open vSwitch uses an interval timer signal to tell it that its cached idea
of the current time has expired. However, this didn't work in a daemon
detached from the foreground session (invoked with --detach) because a
child created with fork() does not inherit the parent's interval timer and
we did not re-set it after calling fork().
This commit fixes the problem by setting the interval timer back up after
calling fork() from daemonize().
This fix is based on code inspection (which was then verified to be correct
through testing). It may not fix any actual problems in practice, because
time_refresh() is called every time through the poll loop, and the poll
loop typically runs more quickly than the periodic timer fires (1 ms or so
average in ovs-vswitchd, vs. 100 ms timer interval).
By default, many OVS processes keep track of their time through a poll
loop. If it takes an unusually long time (measured as some distance
from the mean), the processes will log stats it has been keeping about
coverage. It was doing this at level WARN.
On Xen systems, syslog messages written at level INFO and higher are
written to /var/log/messages synchronously. This would mean that there
would be dire messages that it took a few dozen milliseconds to go
through the loop, meanwhile, it would take up to 6(!) seconds writing
those. Meanwhile, the process would do no other processing, which could
be quite serious in the case of a process such as ovs-vswitchd.
This problem was somewhat masked because the time used by this logging
was not used in the calculations for determining how long it was taking
to get through the loop.
This commit lowers the default log level for those coverage messages to
INFO. On Xen systems, it raises the default level at which messages are
written to syslog to WARN.
Diagnosed and fixed with the help of Ian Campbell.
Previously, there was no way to induce coverage information to be
displayed; it would only print when the system noticed unusual delays
between polling intervals. Now, production of coverage logs can be
forced with "coverage/log" command in ovs-appctl. Coverage counters may
be reset with "coverage/clear".