2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 13:08:06 +00:00

283 Commits

Author SHA1 Message Date
Mark Andrews
3969e2c5f7 Return BADCOOKIE on validly formed bad SERVER COOKIES
The server was previously tolerant of out-of-date or otherwise bad
DNS SERVER COOKIES that where well formed unless require-cookie was
set.  BADCOOKIE is now return for these conditions.
2023-07-13 01:58:53 +00:00
Mark Andrews
dd00b3c50b Use NS rather than A records for qname-minimization relaxed
Remove all references to DNS_FETCHOPT_QMIN_USE_A and adjust
the expected tests results in the qmin system test.
2023-06-28 11:45:59 +10:00
Mark Andrews
783c6a9538
Use dns_view_findzone instead of dns_zt_find
This ensures that rcu locking is properly applied for
view->zonetable.
2023-06-01 16:51:38 +02:00
Matthijs Mekking
74d30879ba Extend serve-stale logging
Print the database lookup result in serve-stale logs for debugging
potential future serve-stale issues.
2023-05-30 11:58:19 +02:00
Matthijs Mekking
bbd163acf6 Fix serve-stale bug when cache has no data
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.

However, this triggered a new bug because despite the answer from
cache not being stale, the lookup may be triggered by serve-stale.
If the answer from database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.

Add 'answer_found' checks to the serve-stale logic to fix this issue.
2023-05-30 11:58:19 +02:00
Matthijs Mekking
ef58f2444f Add new dns_rdatatype_iskeymaterial() function
The following code block repeats quite often:

    if (rdata.type == dns_rdatatype_dnskey ||
        rdata.type == dns_rdatatype_cdnskey ||
        rdata.type == dns_rdatatype_cds)

Introduce a new function to reduce the repetition.
2023-05-23 08:53:23 +02:00
Ondřej Surý
1715cad685
Refactor the isc_quota code and fix the quota in TCP accept code
In e18541287231b721c9cdb7e492697a2a80fd83fc, the TCP accept quota code
became broken in a subtle way - the quota would get initialized on the
first accept for the server socket and then deleted from the server
socket, so it would never get applied again.

Properly fixing this required a bigger refactoring of the isc_quota API
code to make it much simpler.  The new code decouples the ownership of
the quota and acquiring/releasing the quota limit.

After (during) the refactoring it became more clear that we need to use
the callback from the child side of the accepted connection, and not the
server side.
2023-04-12 14:10:37 +02:00
Tony Finch
b171cacf4f Use a qp-trie for the zone table
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix to cleanup races on shutdown.)

One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.

After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
2023-04-05 12:38:11 +01:00
Evan Hunt
80e2a23f9e
silence coverity warnings
silence coverity warnings in the DNSPRS code:
- CID 451097, failure to check return value of rpz_ready()
- CID 451099, resource leak
2023-04-05 09:23:51 +02:00
Ondřej Surý
46f06c1d6e
Apply the semantic patch to remove isc_stdtime_get()
This is a simple replacement using the semantic patch from the previous
commit and as added bonus, one removal of previously undetected unused
variable in named/server.c.
2023-03-31 13:32:56 +02:00
Evan Hunt
bed8f85ff2 import libdummyrpz test library for DNSRPS
libdummyrpz is a limited version of the fastrpz library for use in
testing the dnsrps API.
2023-03-28 15:44:31 -07:00
Ondřej Surý
cd632ad31d
Implement dns_db node tracing
This implements node reference tracing that passes all the internal
layers from dns_db API (and friends) to increment_reference() and
decrement_reference().

It can be enabled by #defining DNS_DB_NODETRACE in <dns/trace.h> header.

The output then looks like this:

    incr:node:check_address_records:rootns.c:409:0x7f67f5a55a40->references = 1
    decr:node:check_address_records:rootns.c:449:0x7f67f5a55a40->references = 0

    incr:nodelock:check_address_records:rootns.c:409:0x7f67f5a55a40:0x7f68304d7040->references = 1
    decr:nodelock:check_address_records:rootns.c:449:0x7f67f5a55a40:0x7f68304d7040->references = 0

There's associated python script to find the missing detach located at:
https://gitlab.isc.org/isc-projects/bind9/-/snippets/1038
2023-02-28 11:44:15 +01:00
Evan Hunt
a52b17d39b
remove isc_task completely
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.

functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.

the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
2023-02-16 18:35:32 +01:00
Evan Hunt
0312789129
refactor dns_resolver to use loop callbacks
callback events from dns_resolver_createfetch() are now posted
using isc_async_run.

other modules which called the resolver and maintained task/taskmgr
objects for this purpose have been cleaned up.
2023-02-16 17:27:59 +01:00
Evan Hunt
b061c7e27f
refactor plugin hook resumption to use loop callbacks
plugins supporting asynchronous operation now use a loop callback
to resume operation in query_hookresume() rather than a task.
2023-02-16 17:16:41 +01:00
Tony Finch
6927a30926 Remove do-nothing header <isc/print.h>
This one really truly did nothing. No lines added!
2023-02-15 16:44:47 +00:00
Evan Hunt
ff3fdaa424 refactor dns_clientinfo_init(); use separate function to set ECS
Instead of using an extra rarely-used paramater to dns_clientinfo_init()
to set ECS information for a client, this commit adds a function
dns_clientinfo_setecs() which can be called only when ECS is needed.
2023-02-07 23:48:22 -08:00
Michał Kępień
4e934bae0b BIND 9.19.9
-----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEENKwGS3ftSQfs1TU17QVz/8hFYQUFAmPAfwYPHG1pY2hhbEBp
 c2Mub3JnAAoJEO0Fc//IRWEFpmAP/23tasuol54W1dxnjGoQ7NYDV89ywQiWplyn
 syPs+iESFb3I9SlAHHhRGM0IREuDxjuexFdrIJOfZqokg36qPj+z81LRlRuRuetc
 HigGzpt2CDP41rVMsxzW3vyh2a3fTrjBKYT4tnDlsdnbwJOfFG4N/hdB7jqDPWut
 u1Itf/lD8iHhsISgFqvtKiQqc6XFwwzVAeSPH6pHnmngt16imVoQiddnw1RYn0vB
 EPcqhVvSeYS1AGWprnHpaWt8bru460iZwet+QKlxNxW6p4mOXGr6jQWqhZ+6ORDr
 Vo/a3+5Di+tNn89GJSbehLi5UQbvrcMR8WiQ54WP/k0PPTgoqMRC4PerLsNU8Vzq
 y1k18n8DMsuro92cNAdJk3gXuXYgGNF2sk9JtqwmiDo1/6G3afKfDiVKjiK1CxK0
 1CMKD+mPHCWB/H5U50oL1z89OCZDVUBUDT0YIrCBBrTIitzyXyAFkh+sjbRbdzww
 kg1GdZ4ODaydcWYH7r3RCHWDX6nkwADqGRk0SYvrJTFL2Hu150mwuxZj/5UZcmsz
 of6qh5b9yZrDrnBHgoqknnepuxiORFF7l3kk63fA13WG6S1m6h2ZONoVLw0J67dx
 mnAo0nlnWKi+TEl/CHiHcMZbeVhE/jrHAMPIcQQphKbCeQT1NPFSU2FQxa+dpix+
 V+y8x6Qb
 =TTpT
 -----END PGP SIGNATURE-----

Merge tag 'v9_19_9'

BIND 9.19.9
2023-01-25 21:16:00 +01:00
Aram Sargsyan
41dc48bfd7 Refactor isc_nm_xfr_allowed()
Return 'isc_result_t' type value instead of 'bool' to indicate
the actual failure. Rename the function to something not suggesting
a boolean type result. Make changes in the places where the API
function is being used to check for the result code instead of
a boolean value.
2023-01-19 10:24:08 +00:00
Aram Sargsyan
ec2098ca35 Cancel all fetch events in dns_resolver_cancelfetch()
Although 'dns_fetch_t' fetch can have two associated events, one for
each of 'DNS_EVENT_FETCHDONE' and 'DNS_EVENT_TRYSTALE' types, the
dns_resolver_cancelfetch() function is designed in a way that it
expects only one existing event, which it must cancel, and when it
happens so that 'stale-answer-client-timeout' is enabled and there
are two events, only one of them is canceled, and it results in an
assertion in dns_resolver_destroyfetch(), when it finds a dangling
event.

Change the logic of dns_resolver_cancelfetch() function so that it
cancels both the events (if they exist), and in the right order.
2023-01-12 12:43:32 +01:00
Mark Andrews
56eae06418 Move the mapping of SIG and RRSIG to ANY
dns_db_findext() asserts if RRSIG is passed to it and
query_lookup_stale() failed to map RRSIG to ANY to prevent this.  To
avoid cases like this in the future, move the mapping of SIG and RRSIG
to ANY for qctx->type to qctx_init().
2023-01-12 12:22:58 +01:00
Matthijs Mekking
91a1a8efc5 Consider non-stale data when in serve-stale mode
With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;',
consider the following situation:

A CNAME record and its target record are in the cache, then the CNAME
record expires, but the target record is still valid.

When a new query for the CNAME record arrives, and the query fails,
the stale record is used, and then the query "restarts" to follow
the CNAME target. The problem is that the query's multiple stale
options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()'
treats the restarted query as a lookup following a failed lookup,
and returns a SERVFAIL answer when there is no stale data found in the
cache, even if there is valid non-stale data there available.

With this change, query_lookup() now considers non-stale data in the
cache in the first place, and returns it if it is available.
2023-01-09 10:44:01 +01:00
Artem Boldariev
03e33a014c BIND: use Stream DNS for DNS over TLS connections
This commit makes BIND use the new Stream DNS transport for DNS over
TLS.
2022-12-20 22:13:52 +02:00
Tom Krizek
ba1607747c
Revert "Merge branch '3678-serve-stale-servfailing-unexpectedly' into 'main'"
This reverts commit 629f66ea8e7a3455f22f57394eef54cfabcb8860, reversing
changes made to 84a7be327e801cfda207629285bf3302a71e8119.

It also removes release note 6038, since the fix is reverted.
2022-12-08 10:30:44 +01:00
Mark Andrews
7695c36a5d Extend dns_db_allrdatasets to control interation results
Add an options parameter to control what rdatasets are returned when
iteratating over the node.  Specific modes will be added later.
2022-12-07 22:20:02 +00:00
Matthijs Mekking
86a80e723f Consider non-stale data when in serve-stale mode
With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;',
consider the following situation:

A CNAME record and its target record are in the cache, then the CNAME
record expires, but the target record is still valid.

When a new query for the CNAME record arrives, and the query fails,
the stale record is used, and then the query "restarts" to follow
the CNAME target. The problem is that the query's multiple stale
options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()'
treats the restarted query as a lookup following a failed lookup,
and returns a SERVFAIL answer when there is no stale data found in the
cache, even if there is valid non-stale data there available.

With this change, query_lookup() now considers non-stale data in the
cache in the first place, and returns it if it is available.
2022-12-06 13:26:53 +00:00
Mark Andrews
bce1cf6c62 Log type with stale answer log messages
Add more information about which query type is dealing with serve-stale.
Update the expected log messages in the serve-stale system test.
2022-11-30 14:32:58 +01:00
Michal Nowak
afdb41a5aa
Update sources to Clang 15 formatting 2022-11-29 08:54:34 +01:00
Evan Hunt
305a50dbe1 ensure RPZ lookups handle CD=1 correctly
RPZ rewrites called dns_db_findext() without passing through the
client database options; as as result, if the client set CD=1,
DNS_DBFIND_PENDINGOK was not used as it should have been, and
cache lookups failed, resulting in failure of the rewrite.
2022-10-19 11:36:11 -07:00
Petr Špaček
53b3ceacd4
Replace #define DNS_NAMEATTR_ with struct of bools
sizeof(dns_name_t) did not change but the boolean attributes are now
separated as one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, plus we
got rid of manual bit manipulation code.
2022-10-13 17:04:02 +02:00
Matthijs Mekking
0681b15225 If refresh stale RRset times out, start stale-refresh-time
The previous commit failed some tests because we expect that if a
fetch fails and we have stale candidates in cache, the
stale-refresh-time window is started. This means that if we hit a stale
entry in cache and answering stale data is allowed, we don't bother
resolving it again for as long we are within the stale-refresh-time
window.

This is useful for two reasons:
- If we failed to fetch the RRset that we are looking for, we are not
  hammering the authoritative servers.

- Successor clients don't need to wait for stale-answer-client-timeout
  to get their DNS response, only the first one to query will take
  the latency penalty.

The latter is not useful when stale-answer-client-timeout is 0 though.

So this exception code only to make sure we don't try to refresh the
RRset again if it failed to do so recently.
2022-10-05 08:20:48 +02:00
Matthijs Mekking
64d51285d5 Reuse recursion type code for refresh stale RRset
Refreshing a stale RRset is similar to prefetching an RRset, so
reuse the existing code. When refreshing an RRset we need to clear
all db options related to serve-stale so that stale RRsets in cache
are ignored during the refresh.

We no longer need to set the "nodetach" flag, because the refresh
fetch is now a "fetch and forget". So we can detach from the client
in the query_send().

This code will break some serve-stale test cases, this will be fixed
in the successor commit.

TODO: add explanation why the serve-stale test cases fail.
2022-10-05 08:20:48 +02:00
Matthijs Mekking
5fb8e555bc Add new recursion type for refreshing stale RRset
Refreshing a stale RRset is similar to a prefetch query, so we can
refactor this code to use the new recursion types introduced in !5883.
2022-10-05 08:20:48 +02:00
Michał Kępień
2ee16067c5 BIND 9.19.5
-----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEENKwGS3ftSQfs1TU17QVz/8hFYQUFAmMZ2WwPHG1pY2hhbEBp
 c2Mub3JnAAoJEO0Fc//IRWEFZz0P/3B8tQXCztMneNsAzvQ11hASuQH3RVvd1p9z
 H6yPfbBuqyBM7FOJWozLQSI0JvxwBPXW+G+AmEhafSB4plgJBfNb12TsN7ZpECbF
 E6ckVQTiLwiYWt/2neu2OYg0aOnl5mhO5J4ESkSgqXGXcDihQ922xLJFQdAAgeAj
 T6TzrF1rv0fVNNlAcE1hrsZsGChTdPAguo/jVPXJjOO8hcEFGEqCWGhCX+wuyY6t
 WRXYcnh37/rlLIY29R3sVKttPIrD7DN6doGuz0/BP0PuuXCFnWBz/t61Et8Q/nxO
 hTS4RoKs/14IXRH7UBspo1dnG7khGYu2z44mCRwx15+fjpJ+zAL/Ym9xa0ElLOWg
 +Asd8w1N275xUQdrcTxpM7z/2z7SP/+bxtLJjIPW+9Z2a8rk8ifLu1yjtWASwOUO
 vLIK0WU3T7FPhpdP+0VgeSYAlJgLEoIgwIWCB+u+I4dR9DJJ7TtjPHDcfrJKXaJ6
 eTTFIZ97xIFEpH53mT+QRG52PFP39fiLa0i7ylM+C0UbMklG++UgtkHz2CkkzV4H
 hqVcQ0Usk8XICkZ0PHAQklaDnDhXBD48x0J7wJOQSy+KS1foAyMFSPXv0ZelwiRM
 Q0StU+t+wXTAK3QID0tBqU4CyFD8fKO3cFwUnv5zqmrRc4ITu3etObT17MDPQKJj
 KLSl1VyB
 =6VJu
 -----END PGP SIGNATURE-----

Merge tag 'v9_19_5'

BIND 9.19.5
2022-09-21 13:04:58 +02:00
Petr Špaček
fdf7456643
Log reason why cache peek is not available
Log which ACL caused RD=0 query into cache to be refused.
Expected performance impact is negligible.
2022-09-15 06:50:13 +02:00
Matthijs Mekking
d939d2ecde Only refresh RRset once
Don't attempt to resolve DNS responses for intermediate results. This
may create multiple refreshes and can cause a crash.

One scenario is where for the query there is a CNAME and canonical
answer in cache that are both stale. This will trigger a refresh of
the RRsets because we encountered stale data and we prioritized it over
the lookup. It will trigger a refresh of both RRsets. When we start
recursing, it will detect a recursion loop because the recursion
parameters will eventually be the same. In 'dns_resolver_destroyfetch'
the sanity check fails, one of the callers did not get its event back
before trying to destroy the fetch.

Move the call to 'query_refresh_rrset' to 'ns_query_done', so that it
is only called once per client request.

Another scenario is where for the query there is a stale CNAME in the
cache that points to a record that is also in cache but not stale. This
will trigger a refresh of the RRset (because we encountered stale data
and we prioritized it over the lookup).

We mark RRsets that we add to the message with
DNS_RDATASETATTR_STALE_ADDED to prevent adding a duplicate RRset when
a stale lookup and a normal lookup conflict with each other. However,
the other non-stale RRset when following a CNAME chain will be added to
the message without setting that attribute, because it is not stale.

This is a variant of the bug in #2594. The fix covered the same crash
but for stale-answer-client-timeout > 0.

Fix this by clearing all RRsets from the message before refreshing.
This requires the refresh to happen after the query is send back to
the client.
2022-09-08 11:24:37 +02:00
Aram Sargsyan
baa9698c9d Fix RRL responses-per-second bypass using wildcard names
It is possible to bypass Response Rate Limiting (RRL)
`responses-per-second` limitation using specially crafted wildcard
names, because the current implementation, when encountering a found
DNS name generated from a wildcard record, just strips the leftmost
label of the name before making a key for the bucket.

While that technique helps with limiting random requests like
<random>.example.com (because all those requests will be accounted
as belonging to a bucket constructed from "example.com" name), it does
not help with random names like subdomain.<random>.example.com.

The best solution would have been to strip not just the leftmost
label, but as many labels as necessary until reaching the suffix part
of the wildcard record from which the found name is generated, however,
we do not have that information readily available in the context of RRL
processing code.

Fix the issue by interpreting all valid wildcard domain names as
the zone's origin name concatenated to the "*" name, so they all will
be put into the same bucket.
2022-09-08 09:15:30 +02:00
Mark Andrews
5805457d9d Remove dead code
*** CID 352816:  Control flow issues  (DEADCODE) /lib/ns/query.c: 8443 in query_dns64()
    8437     cleanup:
    8438     	if (buffer != NULL) {
    8439     		isc_buffer_free(&buffer);
    8440     	}
    8441
    8442     	if (dns64_rdata != NULL) {
    >>>     CID 352816:  Control flow issues  (DEADCODE)
    >>>     Execution cannot reach this statement: "dns_message_puttemprdata(cl...".
    8443     		dns_message_puttemprdata(client->message, &dns64_rdata);
    8444     	}
    8445
    8446     	if (dns64_rdataset != NULL) {
    8447     		dns_message_puttemprdataset(client->message, &dns64_rdataset);
    8448     	}
2022-09-06 12:47:08 +00:00
Mark Andrews
3ef734e0f5 Remove dead code
*** CID 352812:  Control flow issues  (DEADCODE) /lib/ns/query.c: 8584 in query_filter64()
    8578     cleanup:
    8579     	if (buffer != NULL) {
    8580     		isc_buffer_free(&buffer);
    8581     	}
    8582
    8583     	if (myrdata != NULL) {
    >>>     CID 352812:  Control flow issues  (DEADCODE)
    >>>     Execution cannot reach this statement: "dns_message_puttemprdata(cl...".
    8584     		dns_message_puttemprdata(client->message, &myrdata);
    8585     	}
    8586
    8587     	if (myrdataset != NULL) {
    8588     		dns_message_puttemprdataset(client->message, &myrdataset);
    8589     	}
2022-09-06 12:47:08 +00:00
Aram Sargsyan
83395f4cfb Set the extended DNS error code for RPZ-modified queries
When enabled through a configuration option, set the configured EDE code
for the modified queries.
2022-08-31 08:56:03 +00:00
Aram Sargsyan
c51b052827 dns_rdatalist_tordataset() and dns_rdatalist_fromrdataset() can not fail
Clean up dns_rdatalist_tordataset() and dns_rdatalist_fromrdataset()
functions by making them return void, because they cannot fail.

Clean up other functions that subsequently cannot fail.
2022-08-09 08:19:51 +00:00
Matthijs Mekking
c5b71e2472 Don't enable serve-stale on duplicate queries
When checking if we should enable serve-stale, add an early out case
when the result is an error signalling a duplicate query or a query
that would be dropped.
2022-08-09 09:13:53 +02:00
Mark Andrews
228dadb026 Check the synth-form-dnssec namespace when synthesising responses
Call dns_view_sfd_find to find the namespace to be used to verify
the covering NSEC records returned for the given QNAME.  Check that
the NSEC owner names are within that namespace.
2022-07-05 12:29:01 +10:00
Michał Kępień
887c666caf Obsolete the "glue-cache" option
The "glue-cache" option was marked as deprecated by commit
5ae33351f286feb25a965bf3c9e6b122ab495342 (first released in BIND 9.17.6,
back in October 2020), so now obsolete that option, removing all code
and documentation related to it.

Note: this causes the glue cache feature to be permanently enabled, not
disabled.
2022-06-30 15:24:08 +02:00
Michał Kępień
0d7e5d513c Assert on unknown isc_quota_attach() return values
The only values that the isc_quota_attach() function (called from
check_recursionquota() via recursionquotatype_attach_soft()) can
currently return are: ISC_R_SUCCESS, ISC_R_SOFTQUOTA, and ISC_R_QUOTA.
Instead of just propagating any other (unexpected) error up the call
stack, assert immediately, so that if the isc_quota_* API gets updated
in the future to return values currently matching the "default"
statement, check_recursionquota() can be promptly updated to handle such
new return values as desired.
2022-06-14 13:13:32 +02:00
Michał Kępień
8d64beb06f Use a switch statement in check_recursionquota()
Improve readability of the check_recursionquota() function by replacing
a sequence of conditional statements with a switch statement.
2022-06-14 13:13:32 +02:00
Michał Kępień
a06cdfc7b7 Add helper function for recursive-clients logging
Reduce code duplication in check_recursionquota() by extracting its
parts responsible for logging to a separate helper function.

Remove result text from the "no more recursive clients" message because
it always says "quota reached" (as the relevant branch is only evaluated
when 'result' is set to ISC_R_QUOTA) and therefore brings no additional
value.
2022-06-14 13:13:32 +02:00
Michał Kępień
7bc4425e2a Remove redundant recursion quota pointer checks
When the client->recursionquota pointer was overloaded by different
features, each of those features had to be aware of that fact and handle
any updates of that pointer gracefully.  Example: prefetch code
initiates recursion, attaching to client->recursionquota, then query
processing restarts due to a CNAME being encountered, then that CNAME is
not found in the cache, so another recursion is triggered, but
client->recursionquota is already attached to; even though it is not
CNAME chaining code that attached to that pointer, that code still has
to handle such a situation gracefully.

However, all features that can initiate recursion have now been updated
to use separate slots in the 'recursions' array, so keeping the old
checks in place means masking future programming errors that could
otherwise be caught - and should be caught because each feature needs to
properly maintain its own quota reference.

Remove outdated recursion quota pointer checks to enable the assertions
in isc_quota_*() functions to detect programming errors in code paths
that can start recursion.  Remove an outdated comment to prevent
confusion.
2022-06-14 13:13:32 +02:00
Michał Kępień
e09b36f2cc Adjust recursion quota when starting a fetch fails
Some functions fail to detach from the recursion quota if an error
occurs while initiating recursion.  This causes the recursive client
counter to be off.  Add missing recursionquota_detach() calls, reworking
cleanup code where appropriate.
2022-06-14 13:13:32 +02:00
Michał Kępień
172e15f7ad Attach to separate recursion quota pointers
Similarly to how different code paths reused common client handle
pointers and fetch references despite being logically unrelated, they
also reuse client->recursionquota, the field in which a reference to the
recursion quota is stored.  This unnecessarily forces all code using
that field to be aware of the fact that it is overloaded by different
features.

Overloading client->recursionquota also causes inconsistent behavior.
For example, if prefetch code triggers recursion and then delegation
handling code also triggers recursion, only one of these code paths will
be able to attach to the recursion quota, but both recursions will be
started anyway.  In other words, each code path only checks whether the
recursion quota has not been exceeded if the quota has not yet been
attached to by another code path.  This behavior theoretically allows
the configured recursion quota to be slightly exceeded; while it is not
expected to be a real-world operational issue, it is still confusing and
should therefore be fixed.

Extend the structures comprising the 'recursions' array with a new field
holding a pointer to the recursion quota that a given recursion process
attached to.  Update all code paths using client->recursionquota so that
they use the appropriate slot in the 'recursions' array.  Drop the
'recursionquota' field from ns_client_t.
2022-06-14 13:13:32 +02:00