The "glue-cache" option was marked as deprecated by commit
5ae33351f286feb25a965bf3c9e6b122ab495342 (first released in BIND 9.17.6,
back in October 2020), so now obsolete that option, removing all code
and documentation related to it.
Note: this causes the glue cache feature to be permanently enabled, not
disabled.
The only values that the isc_quota_attach() function (called from
check_recursionquota() via recursionquotatype_attach_soft()) can
currently return are: ISC_R_SUCCESS, ISC_R_SOFTQUOTA, and ISC_R_QUOTA.
Instead of just propagating any other (unexpected) error up the call
stack, assert immediately, so that if the isc_quota_* API gets updated
in the future to return values currently matching the "default"
statement, check_recursionquota() can be promptly updated to handle such
new return values as desired.
Reduce code duplication in check_recursionquota() by extracting its
parts responsible for logging to a separate helper function.
Remove result text from the "no more recursive clients" message because
it always says "quota reached" (as the relevant branch is only evaluated
when 'result' is set to ISC_R_QUOTA) and therefore brings no additional
value.
When the client->recursionquota pointer was overloaded by different
features, each of those features had to be aware of that fact and handle
any updates of that pointer gracefully. Example: prefetch code
initiates recursion, attaching to client->recursionquota, then query
processing restarts due to a CNAME being encountered, then that CNAME is
not found in the cache, so another recursion is triggered, but
client->recursionquota is already attached to; even though it is not
CNAME chaining code that attached to that pointer, that code still has
to handle such a situation gracefully.
However, all features that can initiate recursion have now been updated
to use separate slots in the 'recursions' array, so keeping the old
checks in place means masking future programming errors that could
otherwise be caught - and should be caught because each feature needs to
properly maintain its own quota reference.
Remove outdated recursion quota pointer checks to enable the assertions
in isc_quota_*() functions to detect programming errors in code paths
that can start recursion. Remove an outdated comment to prevent
confusion.
Some functions fail to detach from the recursion quota if an error
occurs while initiating recursion. This causes the recursive client
counter to be off. Add missing recursionquota_detach() calls, reworking
cleanup code where appropriate.
Similarly to how different code paths reused common client handle
pointers and fetch references despite being logically unrelated, they
also reuse client->recursionquota, the field in which a reference to the
recursion quota is stored. This unnecessarily forces all code using
that field to be aware of the fact that it is overloaded by different
features.
Overloading client->recursionquota also causes inconsistent behavior.
For example, if prefetch code triggers recursion and then delegation
handling code also triggers recursion, only one of these code paths will
be able to attach to the recursion quota, but both recursions will be
started anyway. In other words, each code path only checks whether the
recursion quota has not been exceeded if the quota has not yet been
attached to by another code path. This behavior theoretically allows
the configured recursion quota to be slightly exceeded; while it is not
expected to be a real-world operational issue, it is still confusing and
should therefore be fixed.
Extend the structures comprising the 'recursions' array with a new field
holding a pointer to the recursion quota that a given recursion process
attached to. Update all code paths using client->recursionquota so that
they use the appropriate slot in the 'recursions' array. Drop the
'recursionquota' field from ns_client_t.
Previously, multiple code paths reused client->query.fetch, so it was
enough for ns_query_cancel() to issue a single call to
dns_resolver_cancelfetch() with that fetch as an argument. Now, since
each slot in the 'recursions' array can hold a reference to a separate
resolver fetch, ns_query_cancel() needs to handle all of them, so that
all recursion callbacks get a chance to clean up the associated
resources when a query is canceled.
Async hooks are the last feature using the client->fetchhandle and
client->query.fetch pointers. Update ns_query_hookasync() and
query_hookresume() so that they use a dedicated slot in the 'recursions'
array. Note that async hooks are still not expected to initiate
recursion if one was already started by a prior ns_query_recurse() call,
so the REQUIRE assertion in ns_query_hookasync() needs to check the
RECTYPE_NORMAL slot rather than the RECTYPE_HOOK one.
With prefetch and RPZ code updated to use separate slots in the
'recursions' array, the code responsible for starting recursion in
ns_query_recurse() and resuming query handling in fetch_callback()
should follow suit, so that it does not need to explicitly cooperate
with other code paths that may initiate recursion.
Replace:
- client->fetchhandle with HANDLE_RECTYPE_NORMAL(client)
- client->query.fetch with FETCH_RECTYPE_NORMAL(client)
Also update other functions using client->fetchhandle and
client->query.fetch (ns_query_cancel(), query_usestale()) so that those
two fields can shortly be dropped altogether.
Both prefetch code and RPZ code ignore recursion results (caching the
response notwithstanding). RPZ code has been (ab)using that fact since
commit 08e36aa5a5c7697a839f83831fccf8fb3f792848 by employing
prefetch_done() as the fetch completion callback. This is only
seemingly a simplification as it makes the code harder to follow ("why
is prefetch code used for handling RPZ-triggered recursion?").
Turn prefetch_done() into a new function whose name clearly conveys its
purpose. Add a parameter to its prototype in order to allow callers to
specify which slot in the 'recursions' array it should use. Reintroduce
prefetch_done() as a wrapper for that function. Add rpzfetch_done(), an
RPZ-exclusive wrapper for that function (using a distinct recursion
type).
Since each slot in the 'recursions' array needs to be initialized before
getting cleaned up when recursion completes, rework fetch_and_forget()
so that it takes recursion type rather than extra fetch options as the
last parameter and make it use the requested slot in the 'recursions'
array rather than a fixed slot (RECTYPE_PREFETCH) for all callers. This
makes fetch_and_forget() a logical complement of cleanup_after_fetch().
Collectively, these changes make prefetch and RPZ code logically
separate (except for reusing client->recursionquota, which will be
refactored later).
Replace:
- client->prefetchhandle with HANDLE_RECTYPE_PREFETCH(client)
- client->query.prefetch with FETCH_RECTYPE_PREFETCH(client)
This is preparatory work for separating prefetch code from RPZ code.
Initialize client->query using a compound literal in order to make the
ns_query_init() function shorter and more readable. This also prevents
the need to explicitly initialize any newly added fields in the future.
query_prefetch() and query_rpzfetch() contain a lot of duplicated code.
Extract the common bits into a separate function whose name clearly
suggests its purpose.
Add a set of new helper functions for attaching to the recursion quota
in order to reduce code duplication and to ensure that the recursive
clients counter is always adjusted properly. Since some callers
(query_prefetch(), query_rpzfetch()) treat exceeding the soft quota as
an error while others (check_recursionquota()) do not, also add two
wrapper functions whose names help convey their purpose, in order to
improve code readability.
Add a new helper function for detaching from the recursion quota in
order to reduce code duplication and to ensure that detaching from that
quota is always accompanied by decreasing the recursive clients counter.
Commit 9ffb4a7ba11fae64a6ce2dd6390cd334372b7ab7 causes Clang Static
Analyzer to flag a potential NULL dereference in query_nxdomain():
query.c:9394:26: warning: Dereference of null pointer [core.NullDereference]
if (!qctx->nxrewrite || qctx->rpz_st->m.rpz->addsoa) {
^~~~~~~~~~~~~~~~~~~
1 warning generated.
The warning above is for qctx->rpz_st potentially being a NULL pointer
when query_nxdomain() is called from query_resume(). This is a false
positive because none of the database lookup result codes currently
causing query_nxdomain() to be called (DNS_R_EMPTYWILD, DNS_R_NXDOMAIN)
can be returned by a database lookup following a recursive resolution
attempt. Add a NULL check nevertheless in order to future-proof the
code and silence Clang Static Analyzer.
ns_client_getnamebuf() cannot fail (i.e. return NULL) since commit
e31cc1eeb436095490c7caa120de148df82ecd6c. Remove redundant NULL checks
performed on the pointer returned by ns_client_getnamebuf().
ns_client_newname() cannot fail (i.e. return NULL) since commit
2ce0de699528c8d505adfde37a916b1742e5562f (though it was only made more
apparent by commit 33ba0057a7c44d4e5d63f7f55e1823279e996a19). Remove
redundant NULL checks performed on the pointer returned by
ns_client_newname().
ns_client_newrdataset() cannot fail (i.e. return NULL) since commit
efb385ecdcfd3213b3bb739a3dcb9e431690e559 (though it was only made more
apparent by commit 33ba0057a7c44d4e5d63f7f55e1823279e996a19). Remove
redundant NULL checks performed on the pointer returned by
ns_client_newrdataset().
The dns_message_gettempname(), dns_message_gettemprdata(),
dns_message_gettemprdataset(), and dns_message_gettemprdatalist() always
succeeds because the memory allocation cannot fail now. Change the API
to return void and cleanup all the use of aforementioned functions.
The ns_statscounter_recursclients counter was previously only
incremented or decremented if client->recursionquota was non-NULL.
This was harmless, because that value should always be non-NULL if
recursion is enabled, but it made the code slightly confusing.
Add DNS extended errors 3 (Stale Answer) and 19 (Stale NXDOMAIN Answer)
to responses. Add extra text with the reason why the stale answer was
returned.
To test, we need to change the configuration such that for the first
set of tests the stale-refresh-time window does not interfer with the
expected extended errors.
In order to modify the .localhost and .localnets members of the
dns_aclenv, all other processing on the netmgr loops needed to be
stopped using the task exclusive mode. Add the isc_rwlock to the
dns_aclenv, so any modifications to the .localhost and .localnets can be
done under the write lock.
In couple places, we have missed INSIST(0) or ISC_UNREACHABLE()
replacement on some branches with UNREACHABLE(). Replace all
ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().
The way the ns_client_t .shuttingdown member was practically dead code.
The .shuttingdown would be set to true only in ns__client_put() function
meaning that we have detached from all ns_client_t .*handles and the
ns_client_t object being freed:
client->magic = 0;
client->shuttingdown = true;
[...]
isc_mem_put(manager->ctx, client, sizeof(*client))
Meanwhile the ns_client_t object is accessed like this:
isc_nmhandle_detach(&client->fetchhandle);
client->query.attributes &= ~NS_QUERYATTR_RECURSING;
client->state = NS_CLIENTSTATE_WORKING;
qctx_init(client, &devent, 0, &qctx);
client_shuttingdown = ns_client_shuttingdown(client);
if (fetch_canceled || fetch_answered || client_shuttingdown) {
[...]
}
Even if the isc_nmhandle_detach(...) was the last handle detach, it
would mean that immediatelly, after calling the isc_nmhandle_detach(),
we would be causing use-after-free, because the ns_client_t is
immediatelly destroyed after setting .shuttingdown to true.
The similar code in the query_hookresume() already noticed this:
/*
* This event is running under a client task, so it's safe to detach
* the fetch handle. And it should be done before resuming query
* processing below, since that may trigger another recursion or
* asynchronous hook event.
*/
The ns_client_t is always attached to ns_clientmgr_t which has
associated memory context, server context, task and threadid. Use those
directly from the ns_clientmgr_t instead of attaching it to an extra
copy in ns_client_t to make the ns_client_t more sleek and lean.
Additionally, remove some stray ns_client_t struct members that were not
used anywhere.
Historically, the inline keyword was a strong suggestion to the compiler
that it should inline the function marked inline. As compilers became
better at optimising, this functionality has receded, and using inline
as a suggestion to inline a function is obsolete. The compiler will
happily ignore it and inline something else entirely if it finds that's
a better optimisation.
Therefore, remove all the occurences of the inline keyword with static
functions inside single compilation unit and leave the decision whether
to inline a function or not entirely on the compiler
NOTE: We keep the usage the inline keyword when the purpose is to change
the linkage behaviour.
Previously, the unreachable code paths would have to be tagged with:
INSIST(0);
ISC_UNREACHABLE();
There was also older parts of the code that used comment annotation:
/* NOTREACHED */
Unify the handling of unreachable code paths to just use:
UNREACHABLE();
The UNREACHABLE() macro now asserts when reached and also uses
__builtin_unreachable(); when such builtin is available in the compiler.
Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which
is explicit version of the /* FALLTHROUGH */ comment we are currently
using.
Add and apply FALLTHROUGH macro that uses the attribute if available,
but does nothing on older compilers.
In one case (lib/dns/zone.c), using the macro revealed that we were
using the /* FALLTHROUGH */ comment in wrong place, remove that comment.
Commit aab691d51266f552a7923db32686fb9398b1d255 did not fix all possible
scenarios in which the ns_statscounter_recursclients counter underflows.
The solution implemented therein can be ineffective e.g. when CNAME
chaining happens with prefetching enabled.
Here is an example recursive resolution scenario in which the
ns_statscounter_recursclients counter can underflow with the current
logic in effect:
1. Query processing starts, the answer is not found in the cache, so
recursion is started. The NS_CLIENTATTR_RECURSING attribute is set.
ns_statscounter_recursclients is incremented (Δ = +1).
2. Recursion completes, returning a CNAME. client->recursionquota is
non-NULL, so the NS_CLIENTATTR_RECURSING attribute remains set.
ns_statscounter_recursclients is decremented (Δ = 0).
3. Query processing restarts.
4. The current QNAME (the target of the CNAME from step 2) is found in
the cache, with a TTL low enough to trigger a prefetch.
5. query_prefetch() attaches to client->recursionquota.
ns_statscounter_recursclients is not incremented because
query_prefetch() does not do that (Δ = 0).
6. Query processing restarts.
7. The current QNAME (the target of the CNAME from step 4) is not found
in the cache, so recursion is started. client->recursionquota is
already attached to (since step 5) and the NS_CLIENTATTR_RECURSING
attribute is set (since step 1), so ns_statscounter_recursclients is
not incremented (Δ = 0).
8. The prefetch from step 5 completes. client->recursionquota is
detached from in prefetch_done(). ns_statscounter_recursclients is
not decremented because prefetch_done() does not do that (Δ = 0).
9. Recursion for the current QNAME completes. client->recursionquota
is already detached from, i.e. set to NULL (since step 8), and the
NS_CLIENTATTR_RECURSING attribute is set (since step 1), so
ns_statscounter_recursclients is decremented (Δ = -1).
Another possible scenario is that after step 7, recursion for the target
of the CNAME from step 4 completes before the prefetch for the CNAME
itself. fetch_callback() then notices that client->recursionquota is
non-NULL and decrements ns_statscounter_recursclients, even though
client->recursionquota was attached to by query_prefetch() and therefore
not accompanied by an incrementation of ns_statscounter_recursclients.
The net result is also an underflow.
Instead of trying to properly handle all possible orderings of events
set into motion by normal recursion and prefetch-triggered recursion,
adjust ns_statscounter_recursclients whenever the recursive clients
quota is successfully attached to or detached from. Remove the
NS_CLIENTATTR_RECURSING attribute altogether as its only purpose is made
obsolete by this change.
this brings DNS_CLIENTINFO_VERSION into line with the subscription
branch so that fixes applied to clientinfo processing can also be
applied to the main branch without diverging.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
The old code rejected NSEC that proved the wildcard name existed
(exists). The new code rejects NSEC that prove that the wildcard
name exists and that the type exists (exists && data) but accept
NSEC that prove the wildcard name exists.
query_synthnxdomain (renamed query_synthnxdomainnodata) already
took the NSEC records and added the correct records to the message
body for NXDOMAIN or NODATA responses with the above change. The
only additional change needed was to ensure the correct RCODE is
set.
dns_nsec_noexistnodata now checks that RRSIG and NSEC are
present in the type map. Both types should be present in
a correctly constructed NSEC record. This check is in
addition to similar checks in resolver.c and validator.c.
This commit adds an isc_nm_socket_type() function which can be used to
obtain a handle's socket type.
This change obsoletes isc_nm_is_tlsdns_handle() and
isc_nm_is_http_handle(). However, it was decided to keep the latter as
we eventually might end up supporting multiple HTTP versions.
Change 5756 (GL #2854) introduced build errors when using
'configure --disable-doh'. To fix this, isc_nm_is_http_handle() is
now defined in all builds, not just builds that have DoH enabled.
Missing code comments were added both for that function and for
isc_nm_is_tlsdns_handle().
The __builtin_expect() can be used to provide the compiler with branch
prediction information. The Gcc manual says[1] on the subject:
In general, you should prefer to use actual profile feedback for
this (-fprofile-arcs), as programmers are notoriously bad at
predicting how their programs actually perform.
Stop using __builtin_expect() and ISC_LIKELY() and ISC_UNLIKELY() macros
to provide the branch prediction information as the performance testing
shows that named performs better when the __builtin_expect() is not
being used.
1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.
Additionally, add couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
This commit makes BIND verify that zone transfers are allowed to be
done over the underlying connection. Currently, it makes sense only
for DoT, but the code is deliberately made to be protocol-agnostic.
- The `timeout_action` parameter to dns_dispatch_addresponse() been
replaced with a netmgr callback that is called when a dispatch read
times out. this callback may optionally reset the read timer and
resume reading.
- Added a function to convert isc_interval to milliseconds; this is used
to translate fctx->interval into a value that can be passed to
dns_dispatch_addresponse() as the timeout.
- Note that netmgr timeouts are accurate to the millisecond, so code to
check whether a timeout has been reached cannot rely on microsecond
accuracy.
- If serve-stale is configured, then a timeout received by the resolver
may trigger it to return stale data, and then resume waiting for the
read timeout. this is no longer based on a separate stale timer.
- The code for canceling requests in request.c has been altered so that
it can run asynchronously.
- TCP timeout events apply to the dispatch, which may be shared by
multiple queries. since in the event of a timeout we have no query ID
to use to identify the resp we wanted, we now just send the timeout to
the oldest query that was pending.
- There was some additional refactoring in the resolver: combining
fctx_join() and fctx_try_events() into one function to reduce code
duplication, and using fixednames in fetchctx and fetchevent.
- Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the
code can be simplified.
The additional processing method has been expanded to take the
owner name of the record, as HTTPS and SVBC need it to process "."
in service form.
The additional section callback can now return the RRset that was
added. We use this when adding CNAMEs. Previously, the recursion
would stop if it detected that a record you added already exists. With
CNAMEs this rule doesn't work, as you ultimately care about the RRset
at the target of the CNAME and not the presence of the CNAME itself.
Returning the record allows the caller to restart with the target
name. As CNAMEs can form loops, loop protection was added.
As HTTPS and SVBC can produce infinite chains, we prevent this by
tracking recursion depth and stopping if we go too deep.