The task exclusive mode stops all processing (tasks and networking IO)
except the designated exclusive task events. This has impact on the
operation of the server. Add log messages indicating when we start the
exclusive mode, and when we end exclusive task mode.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
While doing code review, it was found that the taskmgr->exiting is set
under taskmgr->lock, but accessed under taskmgr->excl_lock in the
isc_task_beginexclusive().
Additionally, before the change that moved running the tasks to the
netmgr, the task_ready() subrouting of isc_task_detach() would lock
mgr->lock, requiring the mgr->excl to be protected mgr->excl_lock
to prevent deadlock in the code. After !4918 has been merged, this is
no longer true, and we can remove taskmgr->excl_lock and use
taskmgr->lock in its stead.
Solve both issues by removing the taskmgr->excl_lock and exclusively use
taskmgr->lock to protect both taskmgr->excl and taskmgr->exiting which
now doesn't need to be atomic_bool, because it's always accessed from
within the locked section.
The isc_taskmgr_excltask() would return ISC_R_NOTFOUND either when the
exclusive task was not set (yet) or when the taskmgr is shutting down
and the exclusive task has been already cleared.
Distinguish between the two states and return ISC_R_SHUTTINGDOWN when
the taskmgr is being shut down instead of ISC_R_NOTFOUND.
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers. This is just the final
cleanup commit.
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
Since a client object is bound to a netmgr handle, each client
will always be processed by the same netmgr worker, so we can
simplify the code by binding client->task to the same thread as
the client. Since ns__client_request() now runs in the same event
loop as client->task events, is no longer necessary to pause the
task manager before launching them.
Also removed some functions in isc_task that were not used.
We were clearing the pointer to taskmgr as soon as isc_taskmgr_destroy()
would be called and before all tasks were finished. Unfortunately, some
tasks would use global named_g_taskmgr objects from inside the events
and this would cause either a data race or NULL pointer dereference.
This commit fixes the data race by moving the destruction of the
referenced pointer to the time after all tasks are finished.
Network manager events that require interlock (pause, resume, listen)
are now always executed in the same worker thread, mgr->workers[0],
to prevent races.
"stoplistening" events no longer require interlock.
The netmgr listening, stoplistening, pausing and resuming functions
now use barriers for synchronization, which makes the code much simpler.
isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier
on platforms where that works, and pthread_barrier where it doesn't
(including TSAN builds).
all zone loading tasks have the privileged flag, but we only want
them to run as privileged tasks when the server is being initialized;
if we privilege them the rest of the time, the server may hang for a
long time after a reload/reconfig. so now we call isc_taskmgr_setmode()
to turn privileged execution mode on or off in the task manager.
isc_task_privileged() returns true if the task's privilege flag is
set *and* the taskmgr is in privileged execution mode. this is used
to determine in which netmgr event queue the task should be run.
There was a theoretical possibility of clogging up the queue processing
with an endless loop where currently processing netievent would schedule
new netievent that would get processed immediately. This wasn't such a
problem when only netmgr netievents were processed, but with the
addition of the tasks, there are at least two situation where this could
happen:
1. In lib/dns/zone.c:setnsec3param() the task would get re-enqueued
when the zone was not yet fully loaded.
2. Tasks have internal quantum for maximum number of isc_events to be
processed, when the task quantum is reached, the task would get
rescheduled and then immediately processed by the netmgr queue
processing.
As the isc_queue doesn't have a mechanism to atomically move the queue,
this commit adds a mechanism to quantize the queue, so enqueueing new
netievents will never stop processing other uv_loop_t events.
The default quantum size is 128.
Since the queue used in the network manager allows items to be enqueued
more than once, tasks are now reference-counted around task_ready()
and task_run(). task_ready() now has a public API wrapper,
isc_task_ready(), that the netmgr can use to reschedule processing
of a task if the quantum has been reached.
Incidental changes: Cleaned up some unused fields left in isc_task_t
and isc_taskmgr_t after the last refactoring, and changed atomic
flags to atomic_bools for easier manipulation.
With taskmgr running on top of netmgr, the ordering of how the tasks and
netmgr shutdown interacts was wrong as previously isc_taskmgr_destroy()
was waiting until all tasks were properly shutdown and detached. This
responsibility was moved to netmgr, so we now need to do the following:
1. shutdown all the tasks - this schedules all shutdown events onto
the netmgr queue
2. shutdown the netmgr - this also makes sure all the tasks and
events are properly executed
3. Shutdown the taskmgr - this now waits for all the tasks to finish
running before returning
4. Shutdown the netmgr - this call waits for all the netmgr netievents
to finish before returning
This solves the race when the taskmgr object would be destroyed before
all the tasks were finished running in the netmgr loops.
Previously, netmgr, taskmgr, timermgr and socketmgr all had their own
isc_<*>mgr_create() and isc_<*>mgr_destroy() functions. The new
isc_managers_create() and isc_managers_destroy() fold all four into a
single function and makes sure the objects are created and destroy in
correct order.
Especially now, when taskmgr runs on top of netmgr, the correct order is
important and when the code was duplicated at many places it's easy to
make mistake.
The former isc_<*>mgr_create() and isc_<*>mgr_destroy() functions were
made private and a single call to isc_managers_create() and
isc_managers_destroy() is required at the program startup / shutdown.
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers. While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:
* The taskmgr has no concept of universal privileged mode - rather the
tasks are either privileged or unprivileged (normal). The privileged
tasks are run as a first thing when the netmgr is unpaused. There
are now four different queues in in the netmgr:
1. priority queue - netievent on the priority queue are run even when
the taskmgr enter exclusive mode and netmgr is paused. This is
needed to properly start listening on the interfaces, free
resources and resume.
2. privileged task queue - only privileged tasks are queued here and
this is the first queue that gets processed when network manager
is unpaused using isc_nm_resume(). All netmgr workers need to
clean the privileged task queue before they all proceed normal
operation. Both task queues are processed when the workers are
finished.
3. task queue - only (traditional) task are scheduled here and this
queue along with privileged task queues are process when the
netmgr workers are finishing. This is needed to process the task
shutdown events.
4. normal queue - this is the queue with netmgr events, e.g. reading,
sending, callbacks and pretty much everything is processed here.
* The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
object.
* The isc_nm_destroy() function now waits for indefinite time, but it
will print out the active objects when in tracing mode
(-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
made a little bit more asynchronous and it might take longer time to
shutdown all the active networking connections.
* Previously, the isc_nm_stoplistening() was a synchronous operation.
This has been changed and the isc_nm_stoplistening() just schedules
the child sockets to stop listening and exits. This was needed to
prevent a deadlock as the the (traditional) tasks are now executed on
the netmgr threads.
* The socket selection logic in isc__nm_udp_send() was flawed, but
fortunatelly, it was broken, so we never hit the problem where we
created uvreq_t on a socket from nmhandle_t, but then a different
socket could be picked up and then we were trying to run the send
callback on a socket that had different threadid than currently
running.
Since all the libraries are internal now, just cleanup the ISCAPI remnants
in isc_socket, isc_task and isc_timer APIs. This means, there's one less
layer as following changes have been done:
* struct isc_socket and struct isc_socketmgr have been removed
* struct isc__socket and struct isc__socketmgr have been renamed
to struct isc_socket and struct isc_socketmgr
* struct isc_task and struct isc_taskmgr have been removed
* struct isc__task and struct isc__taskmgr have been renamed
to struct isc_task and struct isc_taskmgr
* struct isc_timer and struct isc_timermgr have been removed
* struct isc__timer and struct isc__timermgr have been renamed
to struct isc_timer and struct isc_timermgr
* All the associated code that dealt with typing isc_<foo>
to isc__<foo> and back has been removed.
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant, that would create the mgr on top of existing appctx. This was
no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
directly without any extra code.
This commit just cleans up the extra function.
The pthread_self(), thrd_current() or GetCurrentThreadId() could
actually be a pointer, so we should rather convert the value into
uintptr_t instead of unsigned long.
isc_task_pause/unpause were inherently thread-unsafe - a task
could be paused only once by one thread, if the task was running
while we paused it it led to races. Fix it by making sure that
the task will pause if requested to, and by using a 'pause reference
counter' to count task pause requests - a task will be unpaused
iff all threads unpause it.
Don't remove from queue when pausing task - we lock the queue lock
(expensive), while it's unlikely that the task will be running -
and we'll remove it anyway in dispatcher
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
To reproduce the race - create a task, send two events to it, first one
must take some time. Then, from the outside, pause(), unpause() and detach()
the task.
When the long-running event is processed by the task it is in
task_state_running state. When we called pause() the state changed to
task_state_paused, on unpause we checked that there are events in the task
queue, changed the state to task_state_ready and enqueued the task on the
workers readyq. We then detach the task.
The dispatch() is done with processing the event, it processes the second
event in the queue, and then shuts down the task and frees it (as it's not
referenced anymore). Dispatcher then takes the, already freed, task from
the queue where it was wrongly put, causing an use-after free and,
subsequently, either an assertion failure or a segmentation fault.
The probability of this happening is very slim, yet it might happen under a
very high load, more probably on a recursive resolver than on an
authoritative.
The fix introduces a new 'task_state_pausing' state - to which tasks
are moved if they're being paused while still running. They are moved
to task_state_paused state when dispatcher is done with them, and
if we unpause a task in paused state it's moved back to task_state_running
and not requeued.
This allows a task to be temporary disabled so that objects won't be
processed simultaneously by libuv events and isc_task events. When a
task is paused, currently running events may complete, but no further
event will added to the run queue will be executed until the task is
unpaused.
When a task manager is created, we can now specify an `isc_nm`
object to associate with it; thereafter when the task manager is
placed into exclusive mode, the network manager will be paused.
This is a replacement for the existing isc_socket and isc_socketmgr
implementation. It uses libuv for asynchronous network communication;
"networker" objects will be distributed across worker threads reading
incoming packets and sending them for processing.
UDP listener sockets automatically create an array of "child" sockets
so each worker can listen separately.
TCP sockets are shared amongst worker threads.
A TCPDNS socket is a wrapper around a TCP socket, which handles the
the two-byte length field at the beginning of DNS messages over TCP.
(Other wrapper socket types can be implemented in the future to handle
DNS over TLS, DNS over HTTPS, etc.)
isc_event_allocate() calls isc_mem_get() to allocate the event structure. As
isc_mem_get() cannot fail softly (e.g. it never returns NULL), the
isc_event_allocate() cannot return NULL, hence we remove the (ret == NULL)
handling blocks using the semantic patch from the previous commit.
Previously isc_thread_join() would return ISC_R_UNEXPECTED on a failure to
create new thread. All such occurences were caught and wrapped into assert
function at higher level. The function was simplified to assert directly in the
isc_thread_join() function and all caller level assertions were removed.
Previously isc_thread_create() would return ISC_R_UNEXPECTED on a failure to
create new thread. All such occurences were caught and wrapped into assert
function at higher level. The function was simplified to assert directly in the
isc_thread_create() function and all caller level assertions were removed.
The json-c have previously leaked into the global namespace leading
to forced -I<include_path> for every compilation unit using isc/xml.h
header. This MR fixes the usage making the caller object opaque.
The libxml2 have previously leaked into the global namespace leading
to forced -I<include_path> for every compilation unit using isc/xml.h
header. This MR fixes the usage making the caller object opaque.