The C17 standard deprecated ATOMIC_VAR_INIT() macro (see [1]). Follow
the suite and remove the ATOMIC_VAR_INIT() usage in favor of simple
assignment of the value as this is what all supported stdatomic.h
implementations do anyway:
* MacOSX.plaform: #define ATOMIC_VAR_INIT(__v) {__v}
* Gcc stdatomic.h: #define ATOMIC_VAR_INIT(VALUE) (VALUE)
1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1138r0.pdf
For the reference, the _cancel_lookup() function iterates through
the lookup's queries list and detaches them. In the ideal scenario,
that should be the last reference and the query will be destroyed
after that, but it is also possible that we are still expecting a
callback, which also holds a reference (for example, _cancel_lookup()
could have been called from recv_done(), when send_done() was still
not executed).
The start_udp() and start_tcp() functions are currently designed in
slightly different ways: start_udp() creates a new query attachment
`connectquery`, to be called in the callback function, while
start_tcp() does not, which is a bug, but is hidden by the fact
that when the query is being erroneously destroyed prematurely (before
_cancel_lookup() is called) in the result of that, it also gets
de-listed from the lookup's queries' list, so _cancel_lookup() doesn't
even try to detach it.
For better understanding, here's an illustration of the query's
references count changes, and from where it was changed:
UDP
---
1. _new_query() -> refcount = 1 (initial)
2. start_udp() -> refcount = 2 (lookup->current_query)
3. start_udp() -> refcount = 3 (connectquery)
4. udp_ready() -> refcount = 4 (readquery)
5. udp_ready() -> refcount = 5 (sendquery)
6. udp_ready() -> refcount = 4 (lookup->current_query)
7. udp_ready() -> refcount = 3 (connectquery)
8. send_done() -> refcount = 2 (sendquery)
9. recv_done() -> refcount = 1 (readquery)
10. _cancel_lookup() -> refcount = 0 (initial)
11. the query gets destroyed and removed from `lookup->q`
TCP, fortunate scenario
-----------------------
1. _new_query() -> refcount = 1 (initial)
2. start_tcp() -> refcount = 2 (lookup->current_query)
3. launch_next_query() -> refcount = 3 (readquery)
4. launch_next_query() -> refcount = 4 (sendquery)
5. tcp_connected() -> refcount = 3 (lookup->current_query)
6. tcp_connected() -> refcount = 2 (bug, there was no connectquery)
7. send_done() -> refcount = 1 (sendquery)
8. recv_done() -> refcount = 0 (readquery)
9. the query gets prematurely destroyed and removed from `lookup->q`
10. _cancel_lookup() -> the query is not in `lookup->q`
TCP, unfortunate scenario, revealing the bug
--------------------------------------------
1. _new_query() -> refcount = 1 (initial)
2. start_tcp() -> refcount = 2 (lookup->current_query)
3. launch_next_query() -> refcount = 3 (readquery)
4. launch_next_query() -> refcount = 4 (sendquery)
5. tcp_connected() -> refcount = 3 (lookup->current_query)
6. tcp_connected() -> refcount = 2 (bug, there was no connectquery)
7. recv_done() -> refcount = 1 (readquery)
8. _cancel_lookup() -> refcount = 0 (the query was in `lookup->q`)
9. we hit an assertion here when trying to destroy the query, because
sendhandle is not detached (which is done by send_done()).
10. send_done() -> this never happens
This commit does the following:
1. Add a `connectquery` attachment in start_tcp(), like done in
start_udp().
2. Add missing _cancel_lookup() calls for error scenarios, which
were possibly missing because before fixing the bug, calling
_cancel_lookup() and then calling query_detach() would cause
an assertion.
3. Log a debug message and call isc_nm_cancelread(query->readhandle)
for every query in the lookup from inside the _cancel_lookup()
function, like it is done in _cancel_all().
4. Add a `canceled` property for the query which becomes `true` when
the lookup (and subsequently, its queries) are canceled.
5. Use the `canceled` property in the network manager callbacks to
know that the query was canceled, and act like `eresult` was equal
to `ISC_R_CANCELED`.
There was a missing UNLOCK_LOOKUP in the recv_done() callback when
the operation had been canceled. That omission could result in a
deadlock situation.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
Disable IDN2_USE_STD3_ASCII_RULES to the libidn2 conversion because it
broke encoding some non-letter but valid domain names like _tcp or *.
This reverts commit ef8aa91740592a78c9162f3f7109167f2c9297a5.
This commit adds an isc_nm_socket_type() function which can be used to
obtain a handle's socket type.
This change obsoletes isc_nm_is_tlsdns_handle() and
isc_nm_is_http_handle(). However, it was decided to keep the latter as
we eventually might end up supporting multiple HTTP versions.
libidn2 defaults to UseSTD3ASCIIRules=false. That allows arbitrary ASCII
characters to show up in the toASCII output, including space and
underscore. Enable IDN2_USE_STD3_ASCII_RULES to the libidn2 conversion
to disallow additional characters from the conversion (see Validity
Criteria[1]).
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.
Additionally, add couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
This commit makes dig fail with error in case a zone transfer is
attempted over a connections where ALPN was not negotiated. All other
request types will work fine.
The native PKCS#11 support has been removed in favour of better
maintained, more performance and easier to use OpenSSL PKCS#11 engine
from the OpenSC project.
This commit replaces ad-hoc code for DoH connect URI construction with
isc_nm_http_makeuri(), making it handle IPv6 adresses properly (among
other things).
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
This commit adds two new autoconf options `--enable-doh` (enabled by
default) and `--with-libnghttp2` (mandatory when DoH is enabled).
When DoH support is disabled the library is not linked-in and support
for http(s) protocol is disabled in the netmgr, named and dig.
Previously, we would set the locale on a global level and that could
possibly lead to different behaviour in underlying functions. In this
commit, we change to code to use the system locale only when calling the
libidn2 functions and reset the locale back to "POSIX" when exiting the
libidn2 code.
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
dns_name_copy() has been replaced nearly everywhere with
dns_name_copynf(). this commit changes the last two uses of
the original function. afterward, we can remove the old
dns_name_copy() implementation, and replace it with _copynf().
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
This workarounds couple of races where the current_lookup would be
already detached during shutting down the dig, but still processing the
pending reads.
The start_udp() function didn't properly attach to the query and thus
a callback with ISC_R_CANCELED would end with wrong accounting on the
query object.
Usually, this doesn't happen because underlying libuv API
uv_udp_connect() is synchronous, but isc_nm_udpconnect() could return
ISC_R_CANCELED in case it's called while the netmgr is shutting down.
Previously, netmgr, taskmgr, timermgr and socketmgr all had their own
isc_<*>mgr_create() and isc_<*>mgr_destroy() functions. The new
isc_managers_create() and isc_managers_destroy() fold all four into a
single function and makes sure the objects are created and destroy in
correct order.
Especially now, when taskmgr runs on top of netmgr, the correct order is
important and when the code was duplicated at many places it's easy to
make mistake.
The former isc_<*>mgr_create() and isc_<*>mgr_destroy() functions were
made private and a single call to isc_managers_create() and
isc_managers_destroy() is required at the program startup / shutdown.
_query_detach function was incorrectly unliking the query object from
the lookup->q query list, this made it impossible to follow a query
lookup failure with the next one in the list (possibly using a separate
resolver), as the link to the next query in the list was dissolved.
Fix by unliking the node only when the query object is about to be
destroyed, i.e. there is no more references to the object.
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers. While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:
* The taskmgr has no concept of universal privileged mode - rather the
tasks are either privileged or unprivileged (normal). The privileged
tasks are run as a first thing when the netmgr is unpaused. There
are now four different queues in in the netmgr:
1. priority queue - netievent on the priority queue are run even when
the taskmgr enter exclusive mode and netmgr is paused. This is
needed to properly start listening on the interfaces, free
resources and resume.
2. privileged task queue - only privileged tasks are queued here and
this is the first queue that gets processed when network manager
is unpaused using isc_nm_resume(). All netmgr workers need to
clean the privileged task queue before they all proceed normal
operation. Both task queues are processed when the workers are
finished.
3. task queue - only (traditional) task are scheduled here and this
queue along with privileged task queues are process when the
netmgr workers are finishing. This is needed to process the task
shutdown events.
4. normal queue - this is the queue with netmgr events, e.g. reading,
sending, callbacks and pretty much everything is processed here.
* The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
object.
* The isc_nm_destroy() function now waits for indefinite time, but it
will print out the active objects when in tracing mode
(-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
made a little bit more asynchronous and it might take longer time to
shutdown all the active networking connections.
* Previously, the isc_nm_stoplistening() was a synchronous operation.
This has been changed and the isc_nm_stoplistening() just schedules
the child sockets to stop listening and exits. This was needed to
prevent a deadlock as the the (traditional) tasks are now executed on
the netmgr threads.
* The socket selection logic in isc__nm_udp_send() was flawed, but
fortunatelly, it was broken, so we never hit the problem where we
created uvreq_t on a socket from nmhandle_t, but then a different
socket could be picked up and then we were trying to run the send
callback on a socket that had different threadid than currently
running.
dig previously ran isc_nm_udpconnect() three times before giving
up, to work around a freebsd bug that caused connect() to return
a spurious transient EADDRINUSE. this commit moves the retry code
into the network manager itself, so that isc_nm_udpconnect() no
longer needs to return a result code.
Before this commit, a premature EOF (connection closed) on tcp queries
was causing dig to automatically attempt to send the query again, even
if +tries=1 or +retries=0 was provided on command line.
This commit fix the problem by taking into account the no. of retries
specified by the user when processing a premature EOF on tcp
connections.
The TIME_NOW macro calls isc_time_now which uses CLOCK_REALTIME_COARSE
for getting the current time. This is perfectly fine for millisecond,
however when the user request microsecond resolutiuon, they are going
to get very inaccurate results. This is especially true on a server
class machine where the clock ticks may be set to 100HZ.
This changes dig to use the new TIME_NOW_HIRES macro that uses the
CLOCK_MONOTONIC_RAW that is more expensive, but gets the *actual*
current time rather than the at the last kernel time tick.
On 24-core machine, the tests would crash because we would run out of
the hazard pointers. We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.
Note: This is just a band-aid and needs a proper fix.
* Following the example set in 634bdfb16d8, the tlsdns netmgr
module now uses libuv and SSL primitives directly, rather than
opening a TLS socket which opens a TCP socket, as the previous
model was difficult to debug. Closes#2335.
* Remove the netmgr tls layer (we will have to re-add it for DoH)
* Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc
library; move the OpenSSL initialization/deinitialization from dstapi
needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}()
* Add couple of new shims needed for OpenSSL 1.0.x
* When LibreSSL is used, require at least version 2.7.0 that
has the best OpenSSL 1.1.x compatibility and auto init/deinit
* Enforce OpenSSL 1.1.x usage on Windows
* Added a TLSDNS unit test and implemented a simple TLSDNS echo
server and client.
This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested. It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.
NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.
The changes that are included in this commit are listed here
(extensively, but not exclusively):
* The netmgr_test unit test was split into individual tests (udp_test,
tcp_test, tcpdns_test and newly added tcp_quota_test)
* The udp_test and tcp_test has been extended to allow programatic
failures from the libuv API. Unfortunately, we can't use cmocka
mock() and will_return(), so we emulate the behaviour with #define and
including the netmgr/{udp,tcp}.c source file directly.
* The netievents that we put on the nm queue have variable number of
members, out of these the isc_nmsocket_t and isc_nmhandle_t always
needs to be attached before enqueueing the netievent_<foo> and
detached after we have called the isc_nm_async_<foo> to ensure that
the socket (handle) doesn't disappear between scheduling the event and
actually executing the event.
* Cancelling the in-flight TCP connection using libuv requires to call
uv_close() on the original uv_tcp_t handle which just breaks too many
assumptions we have in the netmgr code. Instead of using uv_timer for
TCP connection timeouts, we use platform specific socket option.
* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
waiting for socket to either end up with error (that path was fine) or
to be listening or connected using condition variable and mutex.
Several things could happen:
0. everything is ok
1. the waiting thread would miss the SIGNAL() - because the enqueued
event would be processed faster than we could start WAIT()ing.
In case the operation would end up with error, it would be ok, as
the error variable would be unchanged.
2. the waiting thread miss the sock->{connected,listening} = `true`
would be set to `false` in the tcp_{listen,connect}close_cb() as
the connection would be so short lived that the socket would be
closed before we could even start WAIT()ing
* The tcpdns has been converted to using libuv directly. Previously,
the tcpdns protocol used tcp protocol from netmgr, this proved to be
very complicated to understand, fix and make changes to. The new
tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
Closes: #2194, #2283, #2318, #2266, #2034, #1920
* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
pass accepted TCP sockets between netthreads, but instead (similar to
UDP) uses per netthread uv_loop listener. This greatly reduces the
complexity as the socket is always run in the associated nm and uv
loops, and we are also not touching the libuv internals.
There's an unfortunate side effect though, the new code requires
support for load-balanced sockets from the operating system for both
UDP and TCP (see #2137). If the operating system doesn't support the
load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
on FreeBSD 12+), the number of netthreads is limited to 1.
* The netmgr has now two debugging #ifdefs:
1. Already existing NETMGR_TRACE prints any dangling nmsockets and
nmhandles before triggering assertion failure. This options would
reduce performance when enabled, but in theory, it could be enabled
on low-performance systems.
2. New NETMGR_TRACE_VERBOSE option has been added that enables
extensive netmgr logging that allows the software engineer to
precisely track any attach/detach operations on the nmsockets and
nmhandles. This is not suitable for any kind of production
machine, only for debugging.
* The tlsdns netmgr protocol has been split from the tcpdns and it still
uses the old method of stacking the netmgr boxes on top of each other.
We will have to refactor the tlsdns netmgr protocol to use the same
approach - build the stack using only libuv and openssl.
* Limit but not assert the tcp buffer size in tcp_alloc_cb
Closes: #2061
The recv_done() callback had many exit paths with different conditions,
and every path had it's own set of destructors. The refactored code now
has unified exit path with descriptive goto labels matching the intent:
- cancel_lookup
- next_lookup
- detach_query
- keep_query
The only exception to the rule is check_for_more_data() path, where the
part of the query gets reused, so the query->readhandle and query gets
detached on it's own, and by going to the keep_query, we are just
skipping calling the destructors again.
FreeBSD sometimes returns spurious errors in UDP connect() attempts,
so we try a few times before giving up. However, each failed attempt
triggers a call to udp_ready() in dighost.c, and that was causing
the query object to be detached prematurely.
Sometimes, the dig_lookup_t could be destroyed before the final
send_done() callback was be called, leading to dereferencing an
already freed dig_lookup_t object. By making the dig_lookup_t
reference counted, we are ensuring that it won't be freed until
the last reference (from dig_query_t .lookup) is released.
because dig now uses the netmgr, printing of response messages
happens in a different thread than setup. the IDN output filtering
procedure, which set using dns_name_settotextfilter(), is stored as
thread-local data, and so if it's set during setup, it won't be
accessible when printing. we now set it immediately before printing,
in the same thread, and clear it immedately afterward.
The network manager does not support returning UDP datagrams to
clients from unexpected sources; it is therefore not possible for
dig to accept them. The "+[no]unexpected" option has therefore
been removed from the dig command and its documentation.
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.