With taskmgr running on top of netmgr, the ordering of how the tasks and
netmgr shutdown interacts was wrong as previously isc_taskmgr_destroy()
was waiting until all tasks were properly shutdown and detached. This
responsibility was moved to netmgr, so we now need to do the following:
1. shutdown all the tasks - this schedules all shutdown events onto
the netmgr queue
2. shutdown the netmgr - this also makes sure all the tasks and
events are properly executed
3. Shutdown the taskmgr - this now waits for all the tasks to finish
running before returning
4. Shutdown the netmgr - this call waits for all the netmgr netievents
to finish before returning
This solves the race when the taskmgr object would be destroyed before
all the tasks were finished running in the netmgr loops.
Previously, netmgr, taskmgr, timermgr and socketmgr all had their own
isc_<*>mgr_create() and isc_<*>mgr_destroy() functions. The new
isc_managers_create() and isc_managers_destroy() fold all four into a
single function and makes sure the objects are created and destroy in
correct order.
Especially now, when taskmgr runs on top of netmgr, the correct order is
important and when the code was duplicated at many places it's easy to
make mistake.
The former isc_<*>mgr_create() and isc_<*>mgr_destroy() functions were
made private and a single call to isc_managers_create() and
isc_managers_destroy() is required at the program startup / shutdown.
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers. While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:
* The taskmgr has no concept of universal privileged mode - rather the
tasks are either privileged or unprivileged (normal). The privileged
tasks are run as a first thing when the netmgr is unpaused. There
are now four different queues in in the netmgr:
1. priority queue - netievent on the priority queue are run even when
the taskmgr enter exclusive mode and netmgr is paused. This is
needed to properly start listening on the interfaces, free
resources and resume.
2. privileged task queue - only privileged tasks are queued here and
this is the first queue that gets processed when network manager
is unpaused using isc_nm_resume(). All netmgr workers need to
clean the privileged task queue before they all proceed normal
operation. Both task queues are processed when the workers are
finished.
3. task queue - only (traditional) task are scheduled here and this
queue along with privileged task queues are process when the
netmgr workers are finishing. This is needed to process the task
shutdown events.
4. normal queue - this is the queue with netmgr events, e.g. reading,
sending, callbacks and pretty much everything is processed here.
* The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
object.
* The isc_nm_destroy() function now waits for indefinite time, but it
will print out the active objects when in tracing mode
(-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
made a little bit more asynchronous and it might take longer time to
shutdown all the active networking connections.
* Previously, the isc_nm_stoplistening() was a synchronous operation.
This has been changed and the isc_nm_stoplistening() just schedules
the child sockets to stop listening and exits. This was needed to
prevent a deadlock as the the (traditional) tasks are now executed on
the netmgr threads.
* The socket selection logic in isc__nm_udp_send() was flawed, but
fortunatelly, it was broken, so we never hit the problem where we
created uvreq_t on a socket from nmhandle_t, but then a different
socket could be picked up and then we were trying to run the send
callback on a socket that had different threadid than currently
running.
The isc_mem API now crashes on memory allocation failure, and this is
the next commit in series to cleanup the code that could fail before,
but cannot fail now, e.g. isc_result_t return type has been changed to
void for the isc_log API functions that could only return ISC_R_SUCCESS.
The isc_buffer_allocate() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_buffer_allocate() now returns void.
When a task manager is created, we can now specify an `isc_nm`
object to associate with it; thereafter when the task manager is
placed into exclusive mode, the network manager will be paused.
All unit tests define the UNIT_TESTING macro, which causes <cmocka.h> to
replace malloc(), calloc(), realloc(), and free() with its own functions
tracking memory allocations. In order for this not to break
compilation, the system header declaring the prototypes for these
standard functions must be included before <cmocka.h>.
Normally, these prototypes are only present in <stdlib.h>, so we make
sure it is included before <cmocka.h>. However, musl libc also defines
the prototypes for calloc() and free() in <sched.h>, which is included
by <pthread.h>, which is included e.g. by <isc/mutex.h>. Thus, unit
tests including "dnstest.h" (which includes <isc/mem.h>, which includes
<isc/mutex.h>) after <cmocka.h> will not compile with musl libc as for
these programs, <sched.h> will be included after <cmocka.h>.
Always including <cmocka.h> after all other header files is not a
feasible solution as that causes the mock assertion macros defined in
<isc/util.h> to mangle the contents of <cmocka.h>, thus breaking
compilation. We cannot really use the __noreturn__ or analyzer_noreturn
attributes with cmocka assertion functions because they do return if the
tested condition is true. The problem is that what BIND unit tests do
is incompatible with Clang Static Analyzer's assumptions: since we use
cmocka, our custom assertion handlers are present in a shared library
(i.e. it is the cmocka library that checks the assertion condition, not
a macro in unit test code). Redefining cmocka's assertion macros in
<isc/util.h> is an ugly hack to overcome that problem - unfortunately,
this is the only way we can think of to make Clang Static Analyzer
properly process unit test code. Giving up on Clang Static Analyzer
being able to properly process unit test code is not a satisfactory
solution.
Undefining _GNU_SOURCE for unit test code could work around the problem
(musl libc's <sched.h> only defines the prototypes for calloc() and
free() when _GNU_SOURCE is defined), but doing that could introduce
discrepancies for unit tests including entire *.c files, so it is also
not a good solution.
All in all, including <sched.h> before <cmocka.h> for all affected unit
tests seems to be the most benign way of working around this musl libc
quirk. While quite an ugly solution, it achieves our goals here, which
are to keep the benefit of proper static analysis of unit test code and
to fix compilation against musl libc.
The common construct seen in the BIND 9 source is func(isc_mem_t *mctx, ...).
Unfortunately, the dnstest.{h,c} has been using mctx as a global symbol, which
in turn generated a lot of errors when update.c got included in update_test.c.
As a rule of thumb, we should avoid naming global symbols with generic names
(like mctx) and we should prefix them with "namespace" (like dt_mctx).
The CHECK() macro has been defined both in dnstest.h and update.c
files. This has created a conflict between macro definitions when
including both of the files in update_test.c. While the CHECK() macro
is convenient for the tests, it has been really used in just two
files, so the MR moves them into those respective .c files.
The three functions has been modeled after the arc4random family of
functions, and they will always return random bytes.
The isc_random family of functions internally use these CSPRNG (if available):
1. getrandom() libc call (might be available on Linux and Solaris)
2. SYS_getrandom syscall (might be available on Linux, detected at runtime)
3. arc4random(), arc4random_buf() and arc4random_uniform() (available on BSDs and Mac OS X)
4. crypto library function:
4a. RAND_bytes in case OpenSSL
4b. pkcs_C_GenerateRandom() in case PKCS#11 library
Implement dns_test_difffromchanges(), a function which enables preparing
a dns_diff_t structure from a mostly-textual representation of zone
database changes to be applied. This will improve readability of test
case definitions by allowing contents of a dns_diff_t structure, passed
e.g. to update_sigs(), to be represented in a human-friendly manner.
The dns_test_makezone() helper function always assigns the created zone
to some view, which is not always necessary and complicates cleanup of
non-managed zones as they are required not to be assigned to any view.
Rework dns_test_makezone() in order to make it easier to use in unit
tests operating on non-managed zones. Use dns_name_fromstring() instead
of dns_name_fromtext() to simplify code. Do not use the CHECK() macro
and add comments to make code flow simpler to follow. Use
dns_test_makeview() instead of dns_view_create().
Adjust existing unit tests using this function so that they still pass.
Replace dns_fixedname_init() calls followed by dns_fixedname_name()
calls with calls to dns_fixedname_initname() where it is possible
without affecting current behavior and/or performance.
This patch was mostly prepared using Coccinelle and the following
semantic patch:
@@
expression fixedname, name;
@@
- dns_fixedname_init(&fixedname);
...
- name = dns_fixedname_name(&fixedname);
+ name = dns_fixedname_initname(&fixedname);
The resulting set of changes was then manually reviewed to exclude false
positives and apply minor tweaks.
It is likely that more occurrences of this pattern can be refactored in
an identical way. This commit only takes care of the low-hanging fruit.