get_null_fd() is only called from daemon.c.
It does not need thread safety features anymore as
it is called either through daemonize_start() or
indirectly through daemonize_complete() once.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The following code does not add any users yet.
The visioned workflow that this piece of code should work with is:
* Create a windows service through a startup script with
a tool like 'sc'
ex: sc create ovsdb-server binpath=
"C:\openvswitch\usr\sbin\ovsdb-server.exe -vconsole:off
-vsyslog:off -vfile:info --remote=ptcp:6632:127.0.0.1 --log-file
--service-monitor --service"
* Start the service from the startup script.
ex: sc start ovsdb-server
* Terminate the service during shutdown process.
ex: sc stop ovsdb-server
* Abrupt termination will restart the service.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Some functions are unused and some functions can be
declared as static.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit removes the CACHE_TIME scheme from timeval module. This
is for eliminating the lock contention over the read/write lock of
the cached time. To get the time, the thread now will directly do
the system call 'clock_gettime()'.
As a side effect, timer can only be warpped after timer is stopped
by 'appctl time/stop' command.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Currently we are creating the worker process after creation of the pidfile.
This means that the responsibility of deleting the pidfile after process
termination rests with the worker process.
When we restart openvswitch using the startup scripts, we SIGTERM the main
process and once it is cleaned up, we start ovs-vswitchd again. This results
in a race condition. The new ovs-vswitchd will create a pidfile because it is
unlocked. But, if the old worker process exits after the start of new
ovs-vswitchd, it will simply delete the pidfile underneath the new ovs-vswitchd.
This will eventually result in multiple ovs-vswitchd daemons.
This patch gives the responsibility of deleting the pidfile to the main
process.
Bug #16669.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
To keep control+C and other signals in the initiating session from killing
the monitor process, we need to put the monitor process into its own
session. However, until this point, we've only done that for the daemon
processes that the monitor started, which means that control+C would kill
the monitor but not the daemons that it launched.
I don't know of a benefit to putting the monitor and daemon processes in
different sessions, as opposed to one new session for both of them, so
this change does the latter.
daemonize_post_detach() is called from one additional context where we'd
want to be in a new session, the worker_start() function, but that function
is documented as to be called after daemonize_start(), in which case we
will (after this commit) already have called setsid(), so no additional
change is required there.
Bug #14280.
Reported-by: Gordon Good <ggood@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
make_pidfile() depends on the link() system call to atomically
create pidfiles when multiple daemons are started concurrently.
However, this system call isn't available on ESX so an alternative
strategy is necessary. Fortunately, the approach this patch takes
is cleaner than the original code.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
This will be more useful later when we introduces "worker" subprocesses.
I don't have any current plans to introduce threading, but I can't
think of a disadvantage to wording this in a general manner.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
On one machine, "/etc/init.d/openvswitch-switch start" failed to start
with:
ovs-vswitchd: fork child failed to signal startup (Success)
Starting ovs-vswitchd ... failed!
"strace" revealed that the fork child was actually segfaulting, but the
message output didn't indicate that in any way. This commit fixes the
log message (but not the segfault itself).
Reported-by: Michael Hu <mhu@nicira.com>
Bug #8457.
Until now, "emer" has effectively been "off" because no messages were ever
logged at "emer" level. Justin points out that it is useful to use "emer"
for messages that indicate a fatal error. This commit makes that change
and adds a new "off" level to really turn off all logging to a facility.
This message will appear repeatedly when ovs-vswitchd is running, if there
is any stale pidfile in /var/run/openvswitch, because ovs-vswitchd reads
all of the pidfiles in that directory periodically to update statistics.
Until now, if two copies of one OVS daemon started up at the same time,
then due to races in pidfile creation it was possible for both of them to
start successfully, instead of just one. This was made worse when a
previous copy of the daemon had died abruptly, leaving a stale pidfile.
This commit implements a new pidfile creation and removal protocol that I
believe closes these races. Now, a pidfile is asserted with "link" instead
of "rename", which prevents the race on creation, and a stale pidfile may
only be deleted by a process after it has taken a lock on it.
This may solve mysterious problems seen occasionally on vswitch restart.
I'm still puzzled by these problems, however, because I don't see anything
in our tests cases that would actually cause two copies of a daemon to
start at the same time, which as far as I can see is a necessary
precondition for the problem.
Until now, it has been the responsibility of an individual daemon to call
die_if_already_running() at an appropriate time. A long time ago, this
had to happen *before* daemonizing, because once the process daemonized
itself there was no way to report failure to the process that originally
started the daemon. With the introduction of daemonize_start(), this is
now possible, but we haven't been taking advantage of it.
Therefore, this commit integrates the die_if_already_running() call into
daemonize_start() and deletes the calls to it from individual daemons.
Otherwise when we add support for saving and restoring configuration
of internal devices around kernel module unload and reload, there's
no easy way for the "restore" code to tell when all the interfaces
should be set up and ready for configuration.
The version of valgrind I have in my test VMs doesn't know what F_GETLK
does, so it complains that l_pid is uninitialized even though fcntl sets
it. Initializing it ourselves before calling the function avoids a series
of false-positive warnings about use of uninitialized data.
This makes it possible to run tests that need access to installation
directories, such as the rundir, without having access to the actual
installation directories (/var/run is generally not world-writable), by
setting environment variables. This is not a good way to do things in
general--usually it would be better to choose the correct directories
at configure time--so for now this is undocumented.
When the monitored child is killed with SIGTERM, the monitoring process
currently logs a message like "1 crashes: pid 12345 died, killed by
signal 15 (Terminated), exiting". This counts the SIGTERM as a crash, even
though it's intentional.
This commit changes the log message to omit the "%d crashes" part on normal
termination.
Opening a file descriptor and then closing it always discards any locks
held on the underlying file, even if the file is still open as another file
descriptor. This meant that calling read_pidfile() on the process's own
pidfile would discard the lock and make other OVS processes think that the
process had died. This commit fixes the problem.
If a process daemonizes itself, then it should be possible to control that
process's log levels with "ovs-appctl vlog/set" and related commands. The
vlog_init() function registers those commands. But vlog_init() doesn't
normally get called until the first log message is issued. This can take a
while, especially for ovs-controller, where I first noticed the problem.
This commit fixes the problem by calling vlog_init() from
daemonize_start(), which always gets called as a process daemonizes.
Adding a macro to define the vlog module in use adds a level of
indirection, which makes it easier to change how the vlog module must be
defined. A followup commit needs to do that, so getting these widespread
changes out of the way first should make that commit easier to review.
If a monitored daemon dies quickly at startup, the system can waste a lot
of CPU time continually restarting it. This commit prevents a given
daemon from restarting more than once every 10 seconds.
If the monitored daemon dumps core frequently, then this can quickly
exhaust the host's disk space. This commit limits core dumps to at most
one per monitored session (typically, once per boot).
This leaked a small amount of memory each time a daemon process was
created. It is only important if a daemon is otherwise very buggy.
Found with valgrind.
When --monitor is used, administrators sometimes become confused about the
presence of two copies of each process. This commit attempts to clarify
the situation by making the monitoring process change its process name, as
seen in /proc/$pid/cmdline and in "ps", to clearly indicate what is going
on.
CC: Dan Wendlandt <dan@nicira.com>