Somewhere in the move from netmgr/uv-compat.h to uv.c, the
uv_os_getenv() implementation was lost in the process. Restore the
implementation, so we can support Debian stretch for couple more months.
Add a #if to make it clear that struct xrdata->order is only used
in DNS_RDATASET_FIXED mode.
Re-order some variable declarations to merge two #if blocks into one.
As we are going to use libuv outside of the netmgr, we need the shims to
be readily available for the rest of the codebase.
Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to
uv.c, and as a rule of thumb, the users of libuv should include
<isc/uv.h> instead of <uv.h> directly.
Additionally, merge netmgr/uverr2result.c into uv.c and rename the
single function from isc__nm_uverr2result() to isc_uverr2result().
Move the netmgr socket related functions from netmgr/netmgr.c and
netmgr/uv-compat.c to netmgr/socket.c, so they are all present all in
the same place. Adjust the names of couple interal functions
accordingly.
These checks have been redundant since the `rbtdb64` implementation
was removed in 2018 (commit 784087390ae8). It isn't possible to create
a zone that uses `database "rbt64"` now that the `rbt64` database
implementation has been removed, so the checks will always fail.
Sometimes the compiler is unable to see that the `empty` variable was
initialized by the call to is_empty(), which can cause a build
failure; I encountered this with CFLAGS=-Os. So get rid of it and use
the result from `is_empty()` instead.
The rbtnode->rpz flag was left behind when rbt and rpz were disentangled
by CHANGES #4576. Removing it makes the comment above correct again.
This reduces the flags so they fit in a 32 bit word again. On 64
bit systems there is still padding so it doesn't change the size
of an rbtnode. On 32 bit systems it reduces an rbtnode by 4 bytes.
Coverity detected issues:
- var_decl: Declaring variable "diff" without initializer.
- uninit_use_in_call: Using uninitialized value "diff.tuples.head" when
calling "dns_diff_clear".
SIG and RRSIG records for private algorithms are supposed to contain
the name / OID of the algorithm used to generate them at the start
of the signature field.
The DNS catalog zones draft version 5 document requires that catalog
zones consumers must reset the member zone's internal zone state when
its unique label changes (either within the same catalog zone or
during change of ownership performed using the "coo" property).
BIND already behaves like that, and, in fact, doesn't support keeping
the zone state during change of ownership even if the unique label
has been kept the same, because BIND always removes the member zone
and adds it back during unique label renaming or change of ownership.
Document the described behavior and add a log message to inform when
unique label renaming occurs.
Add a system test case with unique label renaming.
We check the `rdclass` to be of type IN in `dns_catz_update_process()`
function, and all the other static functions where similar checks exist
are called after (and in the result of) that function being called,
so they are effectively redundant.
The DNS catalog zones draft version 5 document describes various
situations when a catalog zones must be considered as "broken" and
not be processed.
Implement those checks in catz.c and add corresponding system tests.
Add DNS extended errors 3 (Stale Answer) and 19 (Stale NXDOMAIN Answer)
to responses. Add extra text with the reason why the stale answer was
returned.
To test, we need to change the configuration such that for the first
set of tests the stale-refresh-time window does not interfer with the
expected extended errors.
In zone.c, the "me" strings were defined for functions that could be
traced with "ENTER" macro.
Use the __func__ that's defined by the compiler and is less prone to
copy&paste errors.
The shutdown test sends 'rdnc status' commands in parallel with
'rndc stop' A new rndc connection arriving will reference the ACL
environment to see whether the client is allowed to connect.
Commit c0995bc380 added a mutex lock to ns_interfacemgr_getaclenv(),
but if the new connection arrives while the interfaces are being
purged during shutdown, that lock is already being held. If the
the connection event slips in ahead of one of the netmgr's "stop
listening" events on a worker thread, a deadlock can occur.
The fix is not to hold the interfacemgr lock while shutting down
interfaces; only while actually traversing the interface list to
identify interfaces needing shutdown.
previously fctx_done() detached the fctx but did not clear the pointer
passed into it from the caller. in some conditions, when rctx_done()
was reached while waiting for a validator to complete, fctx_done()
could be called twice on the same fetch, causing a double detach.
fctx_done() now clears the fctx pointer, to reduce the chances of
such mistakes.
There is a possibility for `udp_recv()` to be called with `eresult`
being `ISC_R_SUCCESS`, but nevertheless with already deactivated `resp`,
which can happen when the request has been canceled in the meantime.
This commit ensures that write callbacks are getting called only after
the data has been sent via the network.
Without this fix, a situation could appear when a write callback could
get called before the actual encrypted data would have been sent to
the network. Instead, it would get called right after it would have
been passed to the OpenSSL (i.e. encrypted).
Most likely, the issue does not reveal itself often because the
callback call was asynchronous, so in most cases it should have been
called after the data has been sent, but that was not guaranteed by
the code logic.
Also, this commit removes one memory allocation (netievent) from a hot
path, as there is no need to call this callback asynchronously
anymore.
In dns_adb_cancelfind(), we need to release the find lock and
then acquire the bucket and find locks in that order, for
consistency with locking hierarchy elsehwere. Previously we
were only acquiring the bucket lock.
Also rewrote the function for better readability.
The connect()ed UDP socket provides feedback on a variety of ICMP
errors (eg port unreachable) which bind can then use to decide what to
do with errors (report them to the client, try again with a different
nameserver etc). However, Linux's implementation does not report what
it considers "transient" conditions, which is defined as Destination
host Unreachable, Destination network unreachable, Source Route Failed
and Message Too Big.
Explicitly enable IP_RECVERR / IPV6_RECVERR (via libuv uv_udp_bind()
flag) to learn about ICMP destination network/host unreachable.
When we compile with libuv that has some capabilities via flags passed
to f.e. uv_udp_listen() or uv_udp_bind(), the call with such flags would
fail with invalid arguments when older libuv version is linked at the
runtime that doesn't understand the flag that was available at the
compile time.
Enforce minimal libuv version when flags have been available at the
compile time, but are not available at the runtime. This check is less
strict than enforcing the runtime libuv version to be same or higher
than compile time libuv version.
The interfacemgr and the .route was being detached while the network
manager had pending read from the socket. Instead of detaching from the
socket, we need to cancel the read which in turn will detach the route
socket and the associated interfacemgr.
Instead of checking if we need to re-seed for every isc_random call,
seed the random number generator in the libisc global initializer
and the per-thread initializer.
It used to require two 32-bit integer divisions to get a random number
less than some limit. Now we use Daniel Lemire's "nearly-divisionless"
algorithm for unbiased bounded random numbers, which requires one
64-bit integer multiply in the usual case, and one 32-bit integer
division in rare slow cases. Even the slow cases are faster than
before; there are also fewer branches.
I think this algorithm is exceptionally beautiful. It also has more
clever tricks than lines of code, so I have done my best to explain
how it works.
The dns__adb_attach() had an assertion failure that prevented to attach
to dns_adb if the dns_adb was shutting down. There was a race between
checking for .exiting in dns_adb_createfind and creating new_adbfind() -
other thread could have set the .exiting to true between the check.
Remove the assertion failure and allow attaching to dns_adb even while
shutting down. The process of dns_adb shutting down would be noticed
only a moments later when any other callback is called.
The rctx_chaseds() function calls dns_resolver_createfetch(), passing
fctx->task as the target task to run resume_dslookup() from. This
breaks task-based serialization of events as fctx->task is the task that
the dns_resolver_createfetch() caller wants to receive its fetch
completion event in; meanwhile, intermediate fetches started by the
resolver itself (e.g. related to QNAME minimization) must use
res->buckets[bucketnum].task instead. This discrepancy may cause
trouble if the resume_dslookup() callback happens to be run concurrently
with e.g. fctx_doshutdown().
Fix by passing the correct task to dns_resolver_createfetch() in
rctx_chaseds().
BIND 9 plugins are installed using Automake's pkglib_LTLIBRARIES stanza,
which causes the relevant shared objects to be placed in the
$(libdir)/@PACKAGE@/ directory, where @PACKAGE@ is expanded to the
lowercase form of the first argument passed to AC_INIT(), i.e. "bind".
Meanwhile, NAMED_PLUGINDIR - the preprocessor macro that the
ns_plugin_expandpath() function uses for determining the absolute path
to a plugin for which only a filename has been provided (rather than a
path) - is set to $(libdir)/named. This discrepancy breaks loading
plugins using just their filenames. Fix the issue (and also prevent it
from reoccurring) by setting NAMED_PLUGINDIR to $(pkglibdir).
Since version 5.0.0, decay-based purging is the only available dirty
page cleanup mechanism in jemalloc. It relies on so-called tickers,
which are simple data structures used for ensuring that certain actions
are taken "once every N times". Ticker data (state) is stored in a
thread-specific data structure called tsd in jemalloc parlance. Ticks
are triggered when extents are allocated and deallocated. Once every
1000 ticks, jemalloc attempts to release some of the dirty pages hanging
around (if any). This allows memory use to be kept in check over time.
This dirty page cleanup mechanism has a quirk. If the first
allocator-related action for a given thread is a free(), a
minimally-initialized tsd is set up which does not include ticker data.
When that thread subsequently calls *alloc(), the tsd transitions to its
nominal state, but due to a certain flag being set during minimal tsd
initialization, ticker data remains unallocated. This prevents
decay-based dirty page purging from working, effectively enabling memory
exhaustion over time. [1]
The quirk described above has been addressed (by moving ticker state to
a different structure) in jemalloc's development branch [2], but not in
any numbered jemalloc version released to date (the latest one being
5.2.1 as of this writing).
Work around the problem by ensuring that every thread spawned by
isc_thread_create() starts with a malloc() call. Avoid immediately
calling free() for the dummy allocation to prevent an optimizing
compiler from stripping away the malloc() + free() pair altogether.
An alternative implementation of this workaround was considered that
used a pair of isc_mem_create() + isc_mem_destroy() calls instead of
malloc() + free(), enabling the change to be fully contained within
isc__trampoline_run() (i.e. to not touch struct isc__trampoline), as the
compiler is not allowed to strip away arbitrary function calls.
However, that solution was eventually dismissed as it triggered
ThreadSanitizer reports when tools like dig, nsupdate, or rndc exited
abruptly without waiting for all worker threads to finish their work.
[1] https://github.com/jemalloc/jemalloc/issues/2251
[2] c259323ab3
dns_message_findname and dns_message_sectiontotext incorrectly accepted
DNS_SECTION_ANY. If DNS_SECTION_ANY was passed the section array could
be incorrectly accessed at (-1).
dns_message_pseudosectiontotext and dns_message_pseudosectiontoyaml
incorrectly accepted DNS_PSEUDOSECTION_ANY. These functions are
designed to process a single section.