The high-water allows administrators to better tune the recursive
clients limit without having to to poll the statistics channel in high
rates to get this number.
An RPZ response's SOA record TTL is set to 1 instead of the SOA TTL,
a boolean value is passed on to query_addsoa, which is supposed to be
a TTL value. I don't see what value is appropriate to be used for
overriding, so we will pass UINT32_MAX.
Use the (1 << N) form for defining the flags, in order to avoid
errors like the one fixed in the previous commit.
Also convert the definitions to an enum, as done in some of our
recent refactoring work.
The DNS_GETDB_STALEFIRST flag is defined as 0x0C, which is the
combination of the DNS_GETDB_PARTIAL (0x04) and the
DNS_GETDB_IGNOREACL (0x08) flags (0x04 | 0x08 == 0x0C) , which is
an obvious error.
All the flags should be power of two, so they don't interfere with
each other. Fix the DNS_GETDB_STALEFIRST flag by setting it to 0x10.
If we are in the process of looking for the A records as part of
dns64 processing and the server-stale timeout triggers, redo the
dns64 changes that had been made to the orignal qctx.
The wrong result value was being saved for resumption with
nxdomain-redirect when performing the fetch. This lead to an assert
when checking that RFC 1918 reverse queries where not leaking to
the global internet.
When serve-stale is enabled and recursive resolution fails, the fallback
to lookup stale data always happens in the cache database. Any
authoritative data is ignored, and only information learned through
recursive resolution is examined.
If there is data in the cache that could lead to an answer, and this can
be just the root delegation, the resolver will iterate further, getting
closer to the answer that can be found by recursing down the root, and
eventually puts the final response in the cache.
Change the fallback to serve-stale to use 'query_getdb()', that finds
out the best matching database for the given query.
when synthesizing a new CNAME, we now check whether the target
matches the query already being processed. if so, we do not
restart the query; this prevents a waste of resources.
Add a trace point that would report when a query gets dropped or slipped
by rate limits. It reports the client IP, the zone, and the RRL result
code.
Co-authored-by: Paul Frieden <pfrieden@yahooinc.com>
The dns_badcache unit had (yet another) own locked hashtable
implementation. Replace the hashtable used by dns_badcache with
lock-free cds_lfht implementation from liburcu.
Under some circumstances when processing a query response - for example,
when it contains a CNAME or DNAME - a query will have to be restarted
from the beginning to look up a new target.
This was previously handled by recursively calling the ns__query_start()
function directly from ns_query_done(). However, performance test data
indicated that chains of CNAMEs could consume quite a bit of time inside
the worker thread, increasing latency for other waiting queries. This
has now been changed so that restarted queries are run asynchronously.
ultimately we want the slab implementation of dns_rdataset to
be usable by more database implementaions than just rbtdb. this
commit moves rdataset_methods to rdataslab.c, renamed
dns_rdataslab_rdatasetmethods.
new database methods have been added: locknode, unlocknode,
addglue, expiredata, and deletedata, allowing external functions to
perform functions that previously required internal access to the
database implementation.
database and heap pointers are now stored in the dns_slabheader object
so that header is the only thing that needs to be passed to some
functions; this will simplify moving functions that process slabheaders
out of rbtdb.c so they can be used by other database implementations.
BIND's rdataset structure is a view of some DNS records. It is
polymorphic, so the details of how the records are stored can vary.
For instance, the records can be held in an rdatalist, or in an
rdataslab in the rbtdb.
The dns_rdataset structure previously had a number of fields called
`private1` up to `private7`, which were used by the various rdataset
implementations. It was not at all clear what these fields were for,
without reading the code and working it out from context.
This change makes the rdataset inheritance hierarchy more clear. The
polymorphic part of a `struct dns_rdataset` is now a union of structs,
each of which is named for the class of implementation using it. The
fields of these structs replace the old `privateN` fields. (Note: the
term "inheritance hierarchy" refers to the fact that the builtin and
SDLZ implementations are based on and inherit from the rdatalist
implementation, which in turn inherits from the generic rdataset.
Most of this change is mechanical, but there are a few extras.
In keynode.c there were a number of REQUIRE()ments that were not
necessary: they had already been checked by the rdataset method
dispatch code. On the other hand, In ncache.c there was a public
function which needed to REQUIRE() that an rdataset was valid.
I have removed lots of "reset iterator state" comments, because it
should now be clear from `target->iter = NULL` where before
`target->private5 = NULL` could have been doing anything.
Initialization is a bit neater in a few places, using C structure
literals where appropriate.
The pointer arithmetic for translating between an rdataslab header and
its raw contents is now fractionally safer.
The server was previously tolerant of out-of-date or otherwise bad
DNS SERVER COOKIES that where well formed unless require-cookie was
set. BADCOOKIE is now return for these conditions.
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.
However, this triggered a new bug because despite the answer from
cache not being stale, the lookup may be triggered by serve-stale.
If the answer from database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.
Add 'answer_found' checks to the serve-stale logic to fix this issue.
The following code block repeats quite often:
if (rdata.type == dns_rdatatype_dnskey ||
rdata.type == dns_rdatatype_cdnskey ||
rdata.type == dns_rdatatype_cds)
Introduce a new function to reduce the repetition.
In e18541287231b721c9cdb7e492697a2a80fd83fc, the TCP accept quota code
became broken in a subtle way - the quota would get initialized on the
first accept for the server socket and then deleted from the server
socket, so it would never get applied again.
Properly fixing this required a bigger refactoring of the isc_quota API
code to make it much simpler. The new code decouples the ownership of
the quota and acquiring/releasing the quota limit.
After (during) the refactoring it became more clear that we need to use
the callback from the child side of the accepted connection, and not the
server side.
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix to cleanup races on shutdown.)
One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.
After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
This is a simple replacement using the semantic patch from the previous
commit and as added bonus, one removal of previously undetected unused
variable in named/server.c.
This implements node reference tracing that passes all the internal
layers from dns_db API (and friends) to increment_reference() and
decrement_reference().
It can be enabled by #defining DNS_DB_NODETRACE in <dns/trace.h> header.
The output then looks like this:
incr:node:check_address_records:rootns.c:409:0x7f67f5a55a40->references = 1
decr:node:check_address_records:rootns.c:449:0x7f67f5a55a40->references = 0
incr:nodelock:check_address_records:rootns.c:409:0x7f67f5a55a40:0x7f68304d7040->references = 1
decr:nodelock:check_address_records:rootns.c:449:0x7f67f5a55a40:0x7f68304d7040->references = 0
There's associated python script to find the missing detach located at:
https://gitlab.isc.org/isc-projects/bind9/-/snippets/1038
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.
functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.
the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
callback events from dns_resolver_createfetch() are now posted
using isc_async_run.
other modules which called the resolver and maintained task/taskmgr
objects for this purpose have been cleaned up.
Instead of using an extra rarely-used paramater to dns_clientinfo_init()
to set ECS information for a client, this commit adds a function
dns_clientinfo_setecs() which can be called only when ECS is needed.
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.
Although 'dns_fetch_t' fetch can have two associated events, one for
each of 'DNS_EVENT_FETCHDONE' and 'DNS_EVENT_TRYSTALE' types, the
dns_resolver_cancelfetch() function is designed in a way that it
expects only one existing event, which it must cancel, and when it
happens so that 'stale-answer-client-timeout' is enabled and there
are two events, only one of them is canceled, and it results in an
assertion in dns_resolver_destroyfetch(), when it finds a dangling
event.
Change the logic of dns_resolver_cancelfetch() function so that it
cancels both the events (if they exist), and in the right order.
dns_db_findext() asserts if RRSIG is passed to it and
query_lookup_stale() failed to map RRSIG to ANY to prevent this. To
avoid cases like this in the future, move the mapping of SIG and RRSIG
to ANY for qctx->type to qctx_init().
With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;',
consider the following situation:
A CNAME record and its target record are in the cache, then the CNAME
record expires, but the target record is still valid.
When a new query for the CNAME record arrives, and the query fails,
the stale record is used, and then the query "restarts" to follow
the CNAME target. The problem is that the query's multiple stale
options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()'
treats the restarted query as a lookup following a failed lookup,
and returns a SERVFAIL answer when there is no stale data found in the
cache, even if there is valid non-stale data there available.
With this change, query_lookup() now considers non-stale data in the
cache in the first place, and returns it if it is available.
This reverts commit 629f66ea8e7a3455f22f57394eef54cfabcb8860, reversing
changes made to 84a7be327e801cfda207629285bf3302a71e8119.
It also removes release note 6038, since the fix is reverted.
With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;',
consider the following situation:
A CNAME record and its target record are in the cache, then the CNAME
record expires, but the target record is still valid.
When a new query for the CNAME record arrives, and the query fails,
the stale record is used, and then the query "restarts" to follow
the CNAME target. The problem is that the query's multiple stale
options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()'
treats the restarted query as a lookup following a failed lookup,
and returns a SERVFAIL answer when there is no stale data found in the
cache, even if there is valid non-stale data there available.
With this change, query_lookup() now considers non-stale data in the
cache in the first place, and returns it if it is available.