Because dns_resolver_createfetch() locks the view, it was necessary
to unlock the zone in zone_refreshkeys() before calling it in order
to maintain the lock order, and relock afterward. this permitted a race
with dns_zone_synckeyzone().
This commit moves the call to dns_resolver_createfetch() into a separate
function which is called asynchronously after the zone has been
unlocked.
The keyfetch object now attaches to the zone to ensure that
it won't be shut down before the asynchronous call completes.
This necessitated refactoring dns_zone_detach() so it always runs
unlocked. For managed zones it now schedules zone_shutdown() to
run asynchronously, and for unmanaged zones, it requires the last
dns_zone_detach() to be run without loopmgr running.
when more than one event was scheduled in the isc_aysnc queue,
they were executed in reverse order. we need to pull events
off the back of queue instead the front, so that uv_loop will
run them in the right order.
note that isc_job_run() has the same behavior, because it calls
uv_idle_start() directly. in that case we just document it so
it'll be less surprising in the future.
Instead of fixing a Coverity complaint (and other style nits),
delete it because it needs input data that can't be generated
with the tools that ship with BIND.
Currently the 'duration_test' unit test checks only the
cfg_obj_asduration() function.
Extend the test so it checks also the reverse operation using the
cfg_print_duration() function, which is used in named-checkconf.
The `render` benchmark loads some binary DNS message dumps and
repeatedly passes them to `dns_message_render`.
The `compress` benchmark loads a list of domain names and packs them
into 4KiB chunks using `dns_name_towire`.
Check that names are correctly added and deleted in the compression
context. Use many names with differing numerical prefixes to make it
relatively easy to identify and debug problems.
All we need for compression is a very small hash set of compression
offsets, because most of the information we need (the previously added
names) can be found in the message using the compression offsets.
This change combines dns_compress_find() and dns_compress_add() into
one function dns_compress_name() that both finds any existing suffix,
and adds any new prefix to the table. The old split led to performance
problems caused by duplicate names in the compression context.
Compression contexts are now either small or large, which the caller
chooses depending on the expected size of the message. There is no
dynamic resizing.
There is a behaviour change: compression now acts on all the labels in
each name, instead of just the last few.
A small benchmark suggests this is about 2x faster.
sizeof(dns_name_t) did not change but the boolean attributes are now
separated as one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, plus we
got rid of manual bit manipulation code.
In ac4cc8443dddc8e900188b4beae54c7ca222094c, the ISC_R_CONNREFUSED was
removed in connect_read_cb, but it can actually happen in the udp_test:
[ RUN ] udp_recv_send
connect_read_cb(0x7f2c2801a270, connection refused, (nil))
The ISC_R_SHUTTINGDOWN should be handled the same as ISC_R_CANCELED in
the udp__send_cb(), as we might be sending the data while the
loopmgr/netmgr shutdown has been initiated.
In rare circumstances, the UDP port for the listening socket and the UDP
port for the connecting socket might be the same. Because we use the
"reuse" port socket option, this isn't caught when binding the socket,
and thus the connected client socket could send a datagram to itself,
completely bypassing the server. This doesn't happen under normal
operation mode because `named` is listening on a privileged port (53),
and even if not, it doesn't usually talk to itself as the tests do.
Pick an arbitrary port for listening (9153-9156) that is outside the
ephemeral port range for the network manager related unit tests (except
the `doh_test).
Since we are testing UDP on the localhost and the same interface, the
UDP datagrams can't get lost. Change the connect read callback, so it
starts reading again on the timeout instead of just getting stuck, and
fail when any other result codes than ISC_R_SUCCESS and ISC_R_TIMEDOUT
are received because we don't expect them to happen in these simple
tests.
This commit removes broken remnants of unit test slowdown logic, which
caused unit test hangs on platforms susceptible to "too many open
files" error, notably OpenBSD.
There was inconsistency in which error codes would get accepted and
ignored in the network manager unit test callbacks. Add following
results, so we just detach the handle instead of causing assertion
failure:
* ISC_R_SHUTTINGDOWN - when the network manager is shutting down
* ISC_R_CANCELED - the socket has been shut down
* ISC_R_EOF - the (TCP) communication has ended on the other side
* ISC_R_CONNECTIONRESET - the TCP connection was reset
This should fix some of the spurious unit test failures.
If sending took too long the isc_nm_read() could timeout twice, leading
to extra 'cread' counter in the udp_cancel_read test. Increase the
cread counter only on ISC_R_EOF (canceled read) and deal with the
multiple ISC_R_TIMEOUTS gracefully.
The dns_view implements weak and strong reference counting. When strong
reference counting reaches zero, the adb, ntatable and resolver objects
are shut down and detached.
In dns_zone and dns_nta the dns_view was weakly attached, but the
view->resolver reference was accessed directly leading to dereferencing
the NULL pointer.
Add dns_view_getresolver() method which attaches to view->resolver
object under the lock (if it still exists) ensuring the dns_resolver
will be kept referenced until not needed.
Previously, the isc_mem_get_aligned() and friends took alignment size as
one of the arguments. Replace the specific function with more generic
extended variant that now accepts ISC_MEM_ALIGN(alignment) for aligned
allocations and ISC_MEM_ZERO for allocations that zeroes
the (re-)allocated memory before returning the pointer to the caller.
Coverity is optimistic that we might do thousands of hashes in less
than a microsecond.
/tests/bench/siphash.c: 54 in main()
48 count++;
49 }
50
51 isc_time_now_hires(&finish);
52
53 us = isc_time_microdiff(&finish, &start);
>>> CID 358309: Integer handling issues (DIVIDE_BY_ZERO)
>>> In expression "count * 1000UL / us", division by expression "us" which may be zero has undefined behavior.
54 printf("%f us wide-lower len %3zu, %7llu kh/s (%llx)\n",
55 (double)us / 1000000.0, len,
56 (unsigned long long)(count * 1000 / us),
57 (unsigned long long)sum);
58 }
59
Formerly, the isc_hash32() would have to change the key in a local copy
to make it case insensitive. Change the isc_siphash24() and
isc_halfsiphash24() functions to lowercase the input directly when
reading it from the memory and converting the uint8_t * array to
64-bit (respectively 32-bit numbers).
dohpath is specfied in draft-ietf-add-svcb-dns and has a value
of 7. It must be a relative path (start with a /), be encoded
as UTF8 and contain the variable dns ({?dns}).
Previously, the isc_mem_debugging would be single global variable that
would affect the behavior of the memory context whenever it would be
changed which could be after some allocation were already done.
Change the memory debugging options to be local to the memory context
and immutable, so all allocations within the same memory context are
treated the same.
By bumping the minimum libuv version to 1.34.0, it allows us to remove
all libuv shims we ever had and makes the code much cleaner. The
up-to-date libuv is available in all distributions supported by BIND
9.19+ either natively or as a backport.
After the loopmgr work has been merged, we can now cleanup the TCP and
TLS protocols a little bit, because there are stronger guarantees that
the sockets will be kept on the respective loops/threads. We only need
asynchronous call for listening sockets (start, stop) and reading from
the TCP (because the isc_nm_read() might be called from read callback
again.
This commit does the following changes (they are intertwined together):
1. Cleanup most of the asynchronous events in the TCP code, and add
comments for the events that needs to be kept asynchronous.
2. Remove isc_nm_resumeread() from the netmgr API, and replace
isc_nm_resumeread() calls with existing isc_nm_read() calls.
3. Remove isc_nm_pauseread() from the netmgr API, and replace
isc_nm_pauseread() calls with a new isc_nm_read_stop() call.
4. Disable the isc_nm_cancelread() for the streaming protocols, only the
datagram-like protocols can use isc_nm_cancelread().
5. Add isc_nmhandle_close() that can be used to shutdown the socket
earlier than after the last detach. Formerly, the socket would be
closed only after all reading and sending would be finished and the
last reference would be detached. The new isc_nmhandle_close() can
be used to close the underlying socket earlier, so all the other
asynchronous calls would call their respective callbacks immediately.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
This change prepares ground for sending DNS requests using DoT,
which, in particular, will be used for forwarding dynamic updates
to TLS-enabled primaries.
Instead of checking if we need to re-seed for every isc_random call,
seed the random number generator in the libisc global initializer
and the per-thread initializer.
In the udp_shutdown_read unit test, delay the isc_loopmgr_shutdown() to
the send callback, and in the udp_cancel_read test wait for a single
timed out test, then read again, send an UDP packet and cancel the read
from the send callback.
The network manager UDP code was misinterpreting when the libuv called
the udp_recv_cb with nrecv == 0 and addr == NULL -> this doesn't really
mean that the "stream" has ended, but the libuv indicates that the
receive buffer can be freed. This could lead to assertion failure in
the code that calls isc_nm_read() from the network manager read callback
due to the extra spurious callbacks.
Properly handle the extra callback calls from the libuv in the client
read callback, and refactor the UDP isc_nm_read() implementation to be
synchronous, so no datagram is lost between the time that we stop the
reading from the UDP socket and we restart it again in the asychronous
udpread event.
Add a unit test that tests the isc_nm_read() call from the read
callback to receive two datagrams.
When converting a string to lower case, the compiler is able to
autovectorize nicely, so a nice simple implementation is also very
fast, comparable to memcpy().
Comparisons are more difficult for the compiler, so we convert eight
bytes at a time using "SIMD within a register" tricks. Experiments
indicate it's best to stick to simple loops for shorter strings and
the remainder of long strings.
The netmgr_test unit test has been subdivided into tcp_test,
tcpdns_test, tls_test, tlsdns_test, and udp_test components.
These have been updated to use the new loopmgr.
Previously:
* applications were using isc_app as the base unit for running the
application and signal handling.
* networking was handled in the netmgr layer, which would start a
number of threads, each with a uv_loop event loop.
* task/event handling was done in the isc_task unit, which used
netmgr event loops to run the isc_event calls.
In this refactoring:
* the network manager now uses isc_loop instead of maintaining its
own worker threads and event loops.
* the taskmgr that manages isc_task instances now also uses isc_loopmgr,
and every isc_task runs on a specific isc_loop bound to the specific
thread.
* applications have been updated as necessary to use the new API.
* new ISC_LOOP_TEST macros have been added to enable unit tests to
run isc_loop event loops. unit tests have been updated to use this
where needed.
* isc_timer was rewritten using the uv_timer, and isc_timermgr_t was
completely removed; isc_timer objects are now directly created on the
isc_loop event loops.
* the isc_timer API has been simplified. the "inactive" timer type has
been removed; timers are now stopped by calling isc_timer_stop()
instead of resetting to inactive.
* isc_manager now creates a loop manager rather than a timer manager.
* modules and applications using isc_timer have been updated to use the
new API.
This commit introduces new APIs for applications and signal handling,
intended to replace isc_app for applications built on top of libisc.
* isc_app will be replaced with isc_loopmgr, which handles the
starting and stopping of applications. In isc_loopmgr, the main
thread is not blocked, but is part of the working thread set.
The loop manager will start a number of threads, each with a
uv_loop event loop running. Setup and teardown functions can be
assigned which will run when the loop starts and stops, and
jobs can be scheduled to run in the meantime. When
isc_loopmgr_shutdown() is run from any the loops, all loops
will shut down and the application can terminate.
* signal handling will now be handled with a separate isc_signal unit.
isc_loopmgr only handles SIGTERM and SIGINT for application
termination, but the application may install additional signal
handlers, such as SIGHUP as a signal to reload configuration.
* new job running primitives, isc_job and isc_async, have been added.
Both units schedule callbacks (specifying a callback function and
argument) on an event loop. The difference is that isc_job unit is
unlocked and not thread-safe, so it can be used to efficiently
run jobs in the same thread, while isc_async is thread-safe and
uses locking, so it can be used to pass jobs from one thread to
another.
* isc_tid will be used to track the thread ID in isc_loop worker
threads.
* unit tests have been added for the new APIs.
Clean up dns_rdatalist_tordataset() and dns_rdatalist_fromrdataset()
functions by making them return void, because they cannot fail.
Clean up other functions that subsequently cannot fail.
Instead of returning error values from isc_rwlock_*(), isc_mutex_*(),
and isc_condition_*() macros/functions and subsequently carrying out
runtime assertion checks on the return values in the calling code,
trigger assertion failures directly in those macros/functions whenever
any pthread function returns an error, as there is no point in
continuing execution in such a case anyway.
This commit removes an assertion from the unit test which cannot be
guaranteed.
According to the test, exactly one client send must succeed. However,
it cannot really be guaranteed, as do not start to read data in the
accept callback on the server nor attach to the accepted handle. Thus,
we can expect the connection to be closed soon after we have returned
from the callback.
Interestingly enough, the test would pass just fine on TCP because:
a) there are fewer layers involved and thus there is less processing;
b) it is possible for the data to be sent and end up in an internal OS
socket buffer without being touched by an application's code on the
server. In such a case the client's write callback still would be
called successfully;
There is a chance for the test to succeed over TLS as well (as it
happily did before), but as the code has been changed to close unused
connections as soon as possible, the chance is far slimmer now.
What can be guaranteed is:
* cconnects == 1 (number client connections equals 1);
* saccepts == 1 (number of accepted connections equals 1).
The name compression unit test is expanded to check that the compressed
form matches the expected wire pattern.
Record owner names are compressed differently to rdata names by
calling dns_name_towire2 instead of dns_name_towire so check that
owner names are compressed correctly as well.
We do this by adding callbacks for when a node is added or deleted
from the keytable. dns_keytable_add and dns_keytable_delete where
extended to take a callback. dns_keytable_deletekey does not remove
the node so it was not extended.
Similarly to how different code paths reused common client handle
pointers and fetch references despite being logically unrelated, they
also reuse client->recursionquota, the field in which a reference to the
recursion quota is stored. This unnecessarily forces all code using
that field to be aware of the fact that it is overloaded by different
features.
Overloading client->recursionquota also causes inconsistent behavior.
For example, if prefetch code triggers recursion and then delegation
handling code also triggers recursion, only one of these code paths will
be able to attach to the recursion quota, but both recursions will be
started anyway. In other words, each code path only checks whether the
recursion quota has not been exceeded if the quota has not yet been
attached to by another code path. This behavior theoretically allows
the configured recursion quota to be slightly exceeded; while it is not
expected to be a real-world operational issue, it is still confusing and
should therefore be fixed.
Extend the structures comprising the 'recursions' array with a new field
holding a pointer to the recursion quota that a given recursion process
attached to. Update all code paths using client->recursionquota so that
they use the appropriate slot in the 'recursions' array. Drop the
'recursionquota' field from ns_client_t.
Async hooks are the last feature using the client->fetchhandle and
client->query.fetch pointers. Update ns_query_hookasync() and
query_hookresume() so that they use a dedicated slot in the 'recursions'
array. Note that async hooks are still not expected to initiate
recursion if one was already started by a prior ns_query_recurse() call,
so the REQUIRE assertion in ns_query_hookasync() needs to check the
RECTYPE_NORMAL slot rather than the RECTYPE_HOOK one.