mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-31 14:35:26 +00:00
Commit Graph

37807 Commits

Author SHA1 Message Date
Matthijs Mekking
ff5bacf17c Fix serve-stale hang at shutdown
The 'refresh_rrset' variable is used to determine whether we can detach from
the client, which can cause a hang at shutdown. To fix this, move the setting
of the 'nodetach' variable up to where 'refresh_rrset' is set (in
query_lookup(), and thus not in ns_query_done()), and set it to false
when actually refreshing the RRset, so that when this lookup is
completed, the client will be detached.
2023-06-09 14:54:48 +02:00
Evan Hunt
240caa32b9 Stale answer lookups could loop when over recursion quota
When a query was aborted because the recursion quota was exceeded,
but had triggered a stale answer response and a stale data refresh query,
it could cause named to loop back to the point where it iterates and
follows a delegation. Having no good answer in cache, we would fall back to
using serve-stale again, use the stale data, try to refresh the RRset,
and loop back again, without ever terminating until crashing due to
stack overflow.

This happens because in the functions 'query_notfound()' and
'query_delegation_recurse()', we check whether we can fall back to
serving stale data. We shouldn't do so if we are already refreshing
an RRset due to having prioritized stale data in cache.

In other words, we need to add an extra check to 'query_usestale()' to
disallow serving stale data if we are currently refreshing a stale
RRset.

As an additional mitigation to prevent looping, we now use the result
code ISC_R_ALREADYRUNNING rather than ISC_R_FAILURE when a recursion
loop is encountered, and we check for that condition in
'query_usestale()' as well.
2023-06-09 14:54:48 +02:00
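The extra condition that 240caa32b9 adds to 'query_usestale()' can be pictured as a small guard. This is a minimal sketch, not BIND's code: the `query_ctx_sketch` struct, its fields, and the `RESULT_*` enum are hypothetical stand-ins for the real query context and isc_result_t codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the isc_result_t codes named in the commit. */
typedef enum {
	RESULT_SUCCESS,
	RESULT_FAILURE,
	RESULT_ALREADYRUNNING /* now used instead of FAILURE for recursion loops */
} result_sketch_t;

/* Hypothetical subset of query state relevant to serve-stale. */
struct query_ctx_sketch {
	bool stale_enabled; /* serve-stale is configured */
	bool refresh_rrset; /* a stale RRset is already being refreshed */
};

/*
 * Return true if falling back to stale data is allowed for 'result'.
 * Refusing while a refresh is in progress, or when a recursion loop was
 * detected, breaks the stale -> refresh -> stale cycle described above.
 */
static bool
usestale_sketch(const struct query_ctx_sketch *qctx, result_sketch_t result) {
	assert(qctx != NULL);

	if (!qctx->stale_enabled) {
		return false;
	}
	if (qctx->refresh_rrset) {
		return false;
	}
	if (result == RESULT_ALREADYRUNNING) {
		return false;
	}
	return true;
}
```

With `refresh_rrset` set, the guard returns false regardless of the result code, so the lookup no longer re-enters serve-stale.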
Michal Nowak
e9af3d15d8 Merge branch '4055-improve-the-overmem-cache-cleaning-9.18' into 'security-bind-9.18'
[9.18] Improve RBT overmem cache cleaning

See merge request isc-private/bind9!527
2023-06-09 12:54:04 +00:00
Michal Nowak
ec72e11ee4 Set max-cache-size expectations for low values 2023-06-08 11:47:04 +02:00
Ondřej Surý
09fcd8f88a Add CHANGES and release note for [GL #4055] 2023-06-08 11:47:04 +02:00
Ondřej Surý
e9d5219fca Improve RBT overmem cache cleaning
When cache memory usage is over the configured cache size (overmem) and
we are cleaning unused entries, it might not be enough to clean just two
entries if the entries to be expired are smaller than the newly added
rdata.  This could be abused by an attacker to cause a remote Denial of
Service by possibly running out of the operating system memory.

Currently, addrdataset() tries to do a single TTL-based cleaning
considering the serve-stale TTL and then optionally moves to overmem
cleaning if we are in that condition.  Then overmem_purge() tries to
do another single TTL-based cleaning from the TTL heap and then continues
with LRU-based cleaning of up to 2 entries.

Squash the TTL-cleaning mechanism into a single call from addrdataset(),
but ignore the serve-stale TTL if we are currently overmem.

Then, instead of having a fixed number of entries to clean, pass the size
of the newly added rdatasetheader to the overmem_purge() function and
clean up at least the size of the newly added data.  This prevents the
cache from going over the configured memory limit (`max-cache-size`).

Additionally, refactor the overmem_purge() function to reduce for-loop
nesting for readability.
2023-06-08 11:43:18 +02:00
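The size-based purge described in e9d5219fca can be sketched as "walk the LRU list until at least the size of the new data has been freed", rather than stopping after a fixed two entries. A minimal sketch, assuming a hypothetical array-backed LRU list (`entry_sketch`, oldest first); BIND's RBT cache uses its own linked lists and header accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry on an LRU list, ordered oldest first. */
struct entry_sketch {
	size_t size; /* bytes accounted to this entry */
	int in_use;  /* nonzero if still referenced, so it cannot be purged */
	int purged;  /* set once the entry has been released */
};

/*
 * Release unused entries from least to most recently used until at
 * least 'purgesize' bytes have been freed or the list is exhausted.
 * Returns the number of bytes actually freed.
 */
static size_t
overmem_purge_sketch(struct entry_sketch *lru, size_t n, size_t purgesize) {
	size_t freed = 0;

	assert(lru != NULL || n == 0);
	for (size_t i = 0; i < n && freed < purgesize; i++) {
		if (lru[i].in_use || lru[i].purged) {
			continue;
		}
		lru[i].purged = 1;
		freed += lru[i].size;
	}
	return freed;
}
```

With four unused 10-byte entries and 25 bytes of new data, this frees three entries (30 bytes); a fixed two-entry limit would have freed only 20 bytes and let the cache grow past its limit.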
Arаm Sаrgsyаn
36d019ffce Merge branch '4105-QryDropped-stats-counter-documentation-update-9.18' into 'bind-9.18'
[9.18] QryDropped stats counter documentation update

See merge request isc-projects/bind9!8011
2023-06-07 15:13:17 +00:00
Aram Sargsyan
dd2973996f QryDropped stats counter documentation update
Document which dropped queries are counted by the QryDropped
statistics counter.

(cherry picked from commit 27c30fe8a4)
2023-06-07 14:01:46 +00:00
Arаm Sаrgsyаn
5206a06e11 Merge branch '4074-fix-stale-answer-client-timeout-with-clients-per-query-9.18' into 'bind-9.18'
[9.18] Fix a clients-per-query miscalculation bug

See merge request isc-projects/bind9!7997
2023-06-06 12:47:28 +00:00
Aram Sargsyan
545a3fe089 Add CHANGES and release notes for [GL #4074]
(cherry picked from commit 466a7d9b5f)
2023-06-06 12:46:18 +00:00
Aram Sargsyan
d91edda639 Fix a clients-per-query miscalculation bug
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit has been
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
fctx_sendevents() performs another calculation to decide whether the
limit needs to be increased, but this time the TRYSTALE responses are
not included (because of an early break from the loop), and because of
that the limit is never increased.

A single client can have more than one associated response/event in the
list (currently at most two), and counting them as separate "clients"
is not the intended behavior. For example, if 'stale-answer-enable' is
enabled and 'stale-answer-client-timeout' is set to a value larger than 0,
then each client will have two events, which effectively halves the
clients-per-query limit.

Fix the dns_resolver_createfetch() function to count only the
regular FETCHDONE responses/events.

Change the fctx_sendevents() function to also count only FETCHDONE
responses/events. Currently, this second change has no effect,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.

(cherry picked from commit 2ae5c4a674)
2023-06-06 12:45:00 +00:00
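The counting rule from d91edda639 (count only FETCHDONE events, so a client that also has a TRYSTALE event is counted once) can be sketched as follows. The event enum and helper names are hypothetical; the real code walks the fetch context's event list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types mirroring the FETCHDONE/TRYSTALE distinction. */
enum fetch_event_type { EVENT_FETCHDONE, EVENT_TRYSTALE };

struct fetch_event_sketch {
	enum fetch_event_type type;
};

/* Count clients by their FETCHDONE events only. */
static size_t
count_clients_sketch(const struct fetch_event_sketch *events, size_t n) {
	size_t count = 0;

	assert(events != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (events[i].type == EVENT_FETCHDONE) {
			count++;
		}
	}
	return count;
}

/* True if one more client may join without exceeding the quota. */
static bool
below_quota_sketch(const struct fetch_event_sketch *events, size_t n,
		   size_t clients_per_query) {
	return count_clients_sketch(events, n) < clients_per_query;
}
```

Two clients with stale-answer-client-timeout enabled produce four events; counted this way they use 2 of a quota of 3, whereas counting every event would report the quota as already exceeded.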
Aram Sargsyan
f82aaedbdc Add clients-per-query checks for the fetchlimit system test
Check if clients-per-query quota works as expected with or without
a positive stale-answer-client-timeout value and serve-stale answers
enabled.

(cherry picked from commit 3bb2babcd0)
2023-06-06 12:45:00 +00:00
Aram Sargsyan
71a27a2848 Light refactoring of the fetchlimit system test
Prepare the fetchlimit system test for adding a clients-per-query
check. Change some functions and commands to accept a destination
NS IP address instead of using the hardcoded 10.53.0.3.

(cherry picked from commit 7ebd055c78)
2023-06-06 12:45:00 +00:00
Aram Sargsyan
17e09d8a10 Fix fetchlimit system test issues
1. Fix the numbering.
2. Fix an artifacts rewriting issue.
3. Add missing checks of 'ret' after some checks.
4. Fix extracting the quota value from the ADB dump.

(cherry picked from commit 101d829b02)
2023-06-06 12:45:00 +00:00
Ondřej Surý
449124c56d Merge branch '4038-resize-send-buffers-to-avoid-excessive-memory-allocation-9.18' into 'bind-9.18'
[9.18] Use appropriately sized send buffers for DNS messages over TCP

See merge request isc-projects/bind9!8005
2023-06-06 12:21:56 +00:00
Artem Boldariev
2c145b1862 Update CHANGES and release note [GL #4038]
Mention that memory usage was reduced by allocating properly sized
send buffers for stream-based transports.

(cherry picked from commit 8672d54847)
2023-06-06 14:04:01 +02:00
Artem Boldariev
285e75b3b0 Use appropriately sized send buffers for DNS messages over TCP
This commit changes the send buffer allocation strategy for stream-based
transports. Before this change, we would allocate dynamic buffers
sized at 64 KiB even when that much was not needed. That could lead to
high memory usage on the server. Now we resize the send buffer to match
the size of the actual data, freeing the memory at the end of the
buffer so that it can be reused later.

(cherry picked from commit d8a5feb556)
2023-06-06 14:04:01 +02:00
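The idea in 285e75b3b0 can be illustrated by rendering a message into a worst-case buffer and then shrinking it to the bytes actually written. This is a sketch under assumptions: `render_tcp_message_sketch()` is hypothetical and uses plain malloc()/realloc() rather than BIND's isc_buffer and memory-context APIs. It relies only on the standard DNS-over-TCP framing, a 2-byte length prefix in network byte order (RFC 1035, section 4.2.2).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TCP_DNS_MSG 65535 /* largest message a 2-byte length allows */

/*
 * Build a TCP send buffer: length prefix plus message. Starts from a
 * worst-case allocation, then shrinks it to the bytes actually used so
 * the allocator can reuse the tail. Caller frees the result.
 */
static uint8_t *
render_tcp_message_sketch(const uint8_t *msg, uint16_t msglen, size_t *outlen) {
	assert(msg != NULL || msglen == 0);
	assert(outlen != NULL);

	uint8_t *buf = malloc(2 + MAX_TCP_DNS_MSG);
	if (buf == NULL) {
		return NULL;
	}

	/* Length prefix in network byte order. */
	buf[0] = (uint8_t)(msglen >> 8);
	buf[1] = (uint8_t)(msglen & 0xff);
	if (msglen > 0) {
		memcpy(buf + 2, msg, msglen);
	}

	size_t used = (size_t)msglen + 2;
	uint8_t *shrunk = realloc(buf, used);
	if (shrunk != NULL) {
		buf = shrunk; /* on failure, keep the larger buffer */
	}
	*outlen = used;
	return buf;
}
```

Whether shrinking actually returns memory depends on the allocator; the point of the sketch is that the long-lived buffer no longer holds about 64 KiB for a short reply.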
Arаm Sаrgsyаn
e72c92c497 Merge branch '4106-lock-order-inversion-in-resolver.c' into 'bind-9.18'
[9.18] Fix a lock-order-inversion bug in resolver.c

See merge request isc-projects/bind9!8000
2023-06-06 11:56:01 +00:00
Aram Sargsyan
db45cab546 Fix a lock-order-inversion bug in resolver.c
There is a lock-order inversion (potential deadlock) in resolver.c:
dns_resolver_shutdown() locks a resolver bucket lock while already
holding the resolver lock, whereas fctx_sendevents() locks the resolver
lock while a bucket lock is already held (fctx__done_detach() locks the
bucket before calling fctx_sendevents()).

The resolver lock/unlock in dns_resolver_shutdown() was added back in
commit 317e36d47e to make sure that
the function is finished before the resolver object is destroyed.

Since res->exiting is atomic, it should be possible to remove the
resolver locking in dns_resolver_shutdown() and add it to the
send_shutdown_events() function which requires it.

Also, since 'res->exiting' is now set while unlocked, the 'INSIST'
in spillattimer_countdown() is wrong, and is removed.
2023-06-06 11:02:24 +00:00
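The shape of the db45cab546 fix can be sketched with pthreads: the shutdown path sets an atomic 'exiting' flag without the resolver lock, takes bucket locks on their own, and only then takes the resolver lock inside its own helper. Every path that needs both locks therefore acquires them in one order (bucket, then resolver). All names here are hypothetical; this is a model of the locking pattern, not resolver.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature resolver with a single bucket. */
struct resolver_sketch {
	pthread_mutex_t lock;        /* resolver lock */
	pthread_mutex_t bucket_lock; /* one bucket, for brevity */
	atomic_bool exiting;         /* readable without holding 'lock' */
	int events_sent;             /* protected by 'lock' */
	int fetches_done;            /* protected by 'bucket_lock' */
};

static void
resolver_init_sketch(struct resolver_sketch *res) {
	assert(res != NULL);
	pthread_mutex_init(&res->lock, NULL);
	pthread_mutex_init(&res->bucket_lock, NULL);
	atomic_init(&res->exiting, false);
	res->events_sent = 0;
	res->fetches_done = 0;
}

/* Like the fixed send_shutdown_events(): takes the resolver lock itself. */
static void
send_shutdown_events_sketch(struct resolver_sketch *res) {
	pthread_mutex_lock(&res->lock);
	res->events_sent++;
	pthread_mutex_unlock(&res->lock);
}

/* Shutdown no longer holds the resolver lock while taking bucket locks. */
static void
shutdown_sketch(struct resolver_sketch *res) {
	atomic_store(&res->exiting, true);
	pthread_mutex_lock(&res->bucket_lock);
	/* ... shut down fetches in this bucket ... */
	pthread_mutex_unlock(&res->bucket_lock);
	send_shutdown_events_sketch(res);
}

/* Like fctx__done_detach() -> fctx_sendevents(): bucket, then resolver. */
static void
fetch_done_sketch(struct resolver_sketch *res) {
	pthread_mutex_lock(&res->bucket_lock);
	res->fetches_done++;
	pthread_mutex_lock(&res->lock);
	res->events_sent++;
	pthread_mutex_unlock(&res->lock);
	pthread_mutex_unlock(&res->bucket_lock);
}

static void *
fetch_worker_sketch(void *arg) {
	struct resolver_sketch *res = arg;
	for (int i = 0; i < 1000; i++) {
		fetch_done_sketch(res);
	}
	return NULL;
}

/* Run fetch completions concurrently with shutdown. Returns 0 on success. */
static int
run_sketch(struct resolver_sketch *res) {
	pthread_t t;

	if (pthread_create(&t, NULL, fetch_worker_sketch, res) != 0) {
		return -1;
	}
	shutdown_sketch(res);
	if (pthread_join(t, NULL) != 0) {
		return -1;
	}
	return 0;
}
```

Because no path holds the resolver lock while waiting for a bucket lock, the two threads cannot each hold one lock and block on the other.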
Arаm Sаrgsyаn
ff3d25a47f Merge branch 'aram/statschannel-spilled-clients-counter-9.18' into 'bind-9.18'
[9.18] Add ClientQuota statistics channel counter

See merge request isc-projects/bind9!7993
2023-05-31 14:51:08 +00:00
Aram Sargsyan
9a3e00478f Add a CHANGES note for [GL !7978]
(cherry picked from commit fa9172d996)
2023-05-31 11:07:08 +00:00
Aram Sargsyan
b6eec9ee51 Update the documentation of the resolver statistics counters
The reference manual doesn't document all the available resolver
statistics counters. Add information about the missing counters.

(cherry picked from commit 08ebf39d1e)
2023-05-31 11:07:08 +00:00
Aram Sargsyan
cd47429365 Add ClientQuota statistics channel counter
This counter indicates the number of queries spilled by the resolver
because the clients-per-query quota was reached.

(cherry picked from commit 04648d7c2f)
2023-05-31 11:07:08 +00:00
Michal Nowak
56ae462f21 Merge branch 'mnowak/alpine-3.18-9.18' into 'bind-9.18'
[9.18] Add Alpine Linux 3.18

See merge request isc-projects/bind9!7994
2023-05-31 10:16:09 +00:00
Michal Nowak
46e98810d7 Add Alpine Linux 3.18
(cherry picked from commit ddb846454d)
2023-05-31 12:03:52 +02:00
Michal Nowak
b751e2b4be Merge branch 'mnowak/look-for-core-files-in-TOP_BUILDDIR-9.18' into 'bind-9.18'
[9.18] Look for core files in $TOP_BUILDDIR

See merge request isc-projects/bind9!7986
2023-05-30 20:27:33 +00:00
Michal Nowak
2476d43acf Look for core files in $TOP_BUILDDIR
The get_core_dumps.sh script couldn't find and process core files of
out-of-tree configurations because it looked for them in the source
instead of the build directory.

(cherry picked from commit a13448a769)
2023-05-30 21:31:41 +02:00
Michal Nowak
99e910e4b9 Merge branch 'mnowak/custom-userspace-rcu-library-9.18' into 'bind-9.18'
[9.18] Change images for TSAN jobs

See merge request isc-projects/bind9!7987
2023-05-30 19:30:02 +00:00
Michal Nowak
44fff18b68 Change images for TSAN jobs
Fedora 38 and Debian "bullseye" images were "forked" into images used only
for TSAN CI jobs. The new images contain a TSAN-aware liburcu, which does
not work well with the ASAN CI jobs that also used the original images.

liburcu is not used in this branch, but images are shared among
branches, and their use needs to be consistent in all maintained
branches.

(cherry picked from commit 04dda8661f)
2023-05-30 20:35:12 +02:00
Tom Krizek
4f3cfba6c0 Merge branch 'tkrizek-fix-pytest-base-port-9.18' into 'bind-9.18'
[9.18] Fix base_port calculation in pytest runner

See merge request isc-projects/bind9!7983
2023-05-30 15:37:37 +00:00
Tom Krizek
1b8f0711f2 Fix base_port calculation in pytest runner
The selected base port should be in the range [port_min, port_max); the
previous formula was incorrect.

Credit for discovering this fault goes to Ondrej Sury.

(cherry picked from commit e8ea6b610b)
2023-05-30 15:37:29 +02:00
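The half-open range from 1b8f0711f2 can be illustrated with modular arithmetic. The pytest runner itself is Python and its exact formula is not reproduced here; `pick_base_port_sketch()` is a hypothetical C helper showing only the range arithmetic.

```c
#include <assert.h>

/*
 * Map a nonnegative random value 'r' (e.g. from rand()) to a base port
 * in the half-open range [port_min, port_max): port_min can be
 * returned, port_max never is.
 */
static int
pick_base_port_sketch(int port_min, int port_max, int r) {
	assert(port_min < port_max);
	assert(r >= 0);
	return port_min + r % (port_max - port_min);
}
```

Since `r % (port_max - port_min)` lies in `0 .. port_max - port_min - 1`, the result can reach `port_max - 1` but never `port_max`.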
Matthijs Mekking
076b8363fc Merge branch '3950-serve-stale-strikes-again-9.18' into 'bind-9.18'
[9.18] Fix serve-stale bug when cache has no data

See merge request isc-projects/bind9!7909
2023-05-30 13:15:13 +00:00
Matthijs Mekking
cbe0cddcd4 Add release note and changes for #3950
Fixing another serve-stale bug is still news.

(cherry picked from commit 23dbb6ba72)
2023-05-30 13:46:34 +02:00
Matthijs Mekking
b90bad93cb Fix serve-stale bug when cache has no data
We recently fixed a bug where in some cases (when following an
expired CNAME for example), named could return SERVFAIL if the target
record is still valid (see isc-projects/bind9#3678, and
isc-projects/bind9!7096). We fixed this by considering non-stale
RRsets as well during the stale lookup.

However, this triggered a new bug: even though the answer from the
cache is not stale, the lookup may have been triggered by serve-stale.
If the answer from the database is not stale, the fix in
isc-projects/bind9!7096 erroneously skips the serve-stale logic.

Add 'answer_found' checks to the serve-stale logic to fix this issue.

(cherry picked from commit bbd163acf6)
2023-05-30 13:46:00 +02:00
Matthijs Mekking
ad5d447348 Add serve-stale test case for GL #3950
Add a test case where, when priming the cache with a slow authoritative
server, the stale-answer-client-timeout option should not return
a delegation to the client (if no entry is found in the cache, it should
wait until an applicable answer is found).

(cherry picked from commit c3d4fd3449)
2023-05-30 13:45:54 +02:00
Ondřej Surý
2a498d944a Merge branch '4098-remove-cruft-epoll-kqueue-configure-options-9.18' into 'bind-9.18'
[9.18] Remove obsolete epoll/kqueue/devpoll configure options

See merge request isc-projects/bind9!7975
2023-05-29 06:07:16 +00:00
Ondřej Surý
4fb2c9568d Add CHANGES note for [GL #4098]
(cherry picked from commit 0266760fdd)
2023-05-29 07:58:51 +02:00
Ondřej Surý
6b6076c882 Remove obsolete epoll/kqueue/devpoll configure options
Since we don't use networking directly but rather via libuv, these
configure options were no-ops.  Remove the configure checks for epoll
(Linux), kqueue (BSDs) and /dev/poll (Solaris).

(cherry picked from commit 051f3d612f)
2023-05-29 07:58:03 +02:00
Mark Andrews
aca974dc29 Merge branch '4090-corrected-bad-insist-logic-in-isc_radix_remove-bind-9.18' into 'bind-9.18'
[9.18] Resolve "Corrected bad INSIST logic in isc_radix_remove()"

See merge request isc-projects/bind9!7974
2023-05-29 04:42:09 +00:00
Mark Andrews
eb52c30524 Add regression test for [GL #4090]
These insertions are added to produce a radix tree that will trigger
the INSIST reported in [GL #4090].  Due to fixes added since BIND 9.9,
an extra insertion is needed to ensure node->parent is non-NULL.

(cherry picked from commit 03ebe96110)
2023-05-29 13:27:51 +10:00
Mark Andrews
27eb8ed20f Move isc_mem_put to after node is checked for equality
isc_mem_put() sets the pointer to the memory being freed to NULL.  The
equality test 'parent->r == node' was therefore accidentally being
turned into a test against NULL.

(cherry picked from commit ac2e0bc3ff)
2023-05-29 13:27:51 +10:00
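The ordering bug fixed in 27eb8ed20f can be reproduced in a few lines. `MEM_PUT_SKETCH` is a hypothetical stand-in that, like isc_mem_put() per the commit message, NULLs the caller's pointer; to keep the example well-defined it only counts releases instead of freeing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int released_count = 0;

/* Hypothetical stand-in: records the release and NULLs the pointer. */
#define MEM_PUT_SKETCH(ptr)       \
	do {                      \
		released_count++; \
		(ptr) = NULL;     \
	} while (0)

struct node_sketch {
	struct node_sketch *r; /* right child, as in parent->r */
};

/* Buggy order: 'node' is already NULL when compared, so this really
 * tests whether parent->r is NULL. */
static bool
is_right_child_buggy(const struct node_sketch *parent, struct node_sketch *node) {
	assert(parent != NULL);
	MEM_PUT_SKETCH(node);
	return parent->r == node;
}

/* Fixed order: do the equality test first, then release the node. */
static bool
is_right_child_fixed(const struct node_sketch *parent, struct node_sketch *node) {
	assert(parent != NULL);
	bool is_right = (parent->r == node);
	MEM_PUT_SKETCH(node);
	return is_right;
}
```

For a node that really is the right child, the buggy version reports false, which is how the INSIST in isc_radix_remove() could be reached.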
Evan Hunt
9a1d565f07 Merge branch '3905-deprecate-tkey-dhkey-v9_18' into 'bind-9.18'
mark 'tkey-dhkey' as deprecated

See merge request isc-projects/bind9!7956
2023-05-28 08:07:25 +00:00
Evan Hunt
9a8f8d6046 CHANGES and release note for [GL #3905] 2023-05-28 00:55:55 -07:00
Evan Hunt
88383aa158 mark 'tkey-dhkey' as deprecated
Diffie-Hellman TKEY mode has been removed for 9.20.
2023-05-28 00:55:34 -07:00
Artem Boldariev
97d672d368 Merge branch '4091-syncrhonise-access-to-the-client-tlsctx-cache-9.18' into 'bind-9.18'
[9.18] ZMGR: TLS contexts cache - properly synchronise access

See merge request isc-projects/bind9!7972
2023-05-26 14:11:40 +00:00
Artem Boldariev
cec8947bc1 ZMGR: TLS contexts cache - properly synchronise access
This commit ensures that access to the TLS context cache within zone
manager is properly synchronised.

Previously, the cache pointer could unexpectedly become NULL for a brief
moment when dns_zonemgr_set_tlsctx_cache() was called from one thread
while the cache was being accessed from another (e.g. from
got_transfer_quota()). Under very rare circumstances, this could lead
to the server abort()ing on configuration reload.

That behaviour has been fixed.

(cherry picked from commit 0b95cf74ff)
2023-05-26 15:24:51 +03:00
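One way to picture the synchronisation described in cec8947bc1 is a mutex-protected pointer swap, with readers taking their own reference under the same lock so the cache is never seen as NULL mid-update and is not freed while in use. This is a sketch under assumptions: the struct and function names are hypothetical and the reference counting is an illustrative choice, not necessarily how the zone manager implements it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted TLS context cache. */
struct tlsctx_cache_sketch {
	atomic_int refs;
};

static struct tlsctx_cache_sketch *
cache_new_sketch(void) {
	struct tlsctx_cache_sketch *c = malloc(sizeof(*c));
	if (c != NULL) {
		atomic_init(&c->refs, 1);
	}
	return c;
}

/* Drop one reference; free the cache when the last one goes away. */
static void
cache_detach_sketch(struct tlsctx_cache_sketch **cp) {
	assert(cp != NULL && *cp != NULL);
	struct tlsctx_cache_sketch *c = *cp;
	*cp = NULL;
	if (atomic_fetch_sub(&c->refs, 1) == 1) {
		free(c);
	}
}

/* Hypothetical zone manager holding the shared cache pointer. */
struct zonemgr_sketch {
	pthread_mutex_t lock;
	struct tlsctx_cache_sketch *cache; /* protected by 'lock' */
};

static void
zonemgr_init_sketch(struct zonemgr_sketch *zmgr) {
	assert(zmgr != NULL);
	pthread_mutex_init(&zmgr->lock, NULL);
	zmgr->cache = NULL;
}

/* Swap in a new cache under the lock: readers see old or new, never NULL. */
static void
zonemgr_set_cache_sketch(struct zonemgr_sketch *zmgr,
			 struct tlsctx_cache_sketch *newc) {
	struct tlsctx_cache_sketch *old;

	assert(zmgr != NULL && newc != NULL);
	atomic_fetch_add(&newc->refs, 1); /* the zone manager's reference */
	pthread_mutex_lock(&zmgr->lock);
	old = zmgr->cache;
	zmgr->cache = newc;
	pthread_mutex_unlock(&zmgr->lock);
	if (old != NULL) {
		cache_detach_sketch(&old);
	}
}

/* Return an attached reference (or NULL); the caller must detach it. */
static struct tlsctx_cache_sketch *
zonemgr_get_cache_sketch(struct zonemgr_sketch *zmgr) {
	struct tlsctx_cache_sketch *c;

	assert(zmgr != NULL);
	pthread_mutex_lock(&zmgr->lock);
	c = zmgr->cache;
	if (c != NULL) {
		atomic_fetch_add(&c->refs, 1);
	}
	pthread_mutex_unlock(&zmgr->lock);
	return c;
}
```

A reader that obtained the old cache keeps it alive through its own reference even after a reload swaps in a new one.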
Michal Nowak
207f9bb1c3 Merge branch 'mnowak/gitlab-runner-autoscaling-9.18' into 'bind-9.18'
[9.18] Run most Docker CI jobs in AWS with autoscaler

See merge request isc-projects/bind9!7969
2023-05-26 09:54:12 +00:00
Michal Nowak
94d83b7960 Run most Docker CI jobs in AWS with autoscaler
All but the "respdiff-long" job, for which our AWS instances do not have
enough memory, are now spawned in AWS by the autoscaler executor.

(cherry picked from commit f09cf69594)
2023-05-26 11:47:20 +02:00
Evan Hunt
59827b21d1 Merge branch '4072-tcp-dispatch-timeout-bind-9.18' into 'bind-9.18'
[9.18] fix handling of TCP timeouts

See merge request isc-projects/bind9!7968
2023-05-26 09:32:22 +00:00
Evan Hunt
e9b6991357 fix handling of TCP timeouts
when a TCP dispatch times out, we call tcp_recv() with a result
value of ISC_R_TIMEDOUT; this cancels the oldest dispatch
entry in the dispatch's active queue, plus any additional entries
that have waited longer than their configured timeouts. if, at
that point, there were more dispatch entries still on the active
queue, it resumes reading, but until now it failed to restart
the timer.

this has been corrected: we now calculate a new timeout
based on the oldest dispatch entry still remaining.  this
requires us to initialize the start time of each dispatch entry
when it's first added to the queue.

in order to ensure that the handling of timed-out requests is
consistent, we now calculate the runtime of each dispatch
entry based on the same value for 'now'.

incidentally also fixed a compile error that turned up when
DNS_DISPATCH_TRACE was turned on.

(cherry picked from commit 0e800467ee)
2023-05-26 02:07:02 -07:00
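The timeout handling in e9b6991357 can be sketched over an oldest-first queue: cancel the oldest live entry plus any entry past its own timeout, all judged against one 'now', then re-arm the timer from the oldest entry still waiting. The `dispentry_sketch` struct and millisecond units are assumptions for illustration; dispatch.c has its own types and clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dispatch entry; times in milliseconds. */
struct dispentry_sketch {
	uint64_t start;   /* recorded when the entry joins the active queue */
	uint64_t timeout; /* this entry's configured timeout */
	bool canceled;
};

static uint64_t
elapsed_sketch(const struct dispentry_sketch *e, uint64_t now) {
	return now > e->start ? now - e->start : 0;
}

/*
 * Handle a read timeout at 'now' for a queue ordered oldest first.
 * Returns the timeout to re-arm for the oldest remaining entry, or 0
 * if no entries remain.
 */
static uint64_t
tcp_timeout_sketch(struct dispentry_sketch *q, size_t n, uint64_t now) {
	bool canceled_oldest = false;

	assert(q != NULL || n == 0);

	/* Cancel the oldest live entry plus any that exceeded their timeout. */
	for (size_t i = 0; i < n; i++) {
		if (q[i].canceled) {
			continue;
		}
		if (!canceled_oldest || elapsed_sketch(&q[i], now) >= q[i].timeout) {
			q[i].canceled = true;
			canceled_oldest = true;
		}
	}

	/* Remaining entries have elapsed < timeout, so this is positive. */
	for (size_t i = 0; i < n; i++) {
		if (!q[i].canceled) {
			return q[i].timeout - elapsed_sketch(&q[i], now);
		}
	}
	return 0;
}
```

Using a single `now` for every entry keeps the cancel decisions and the new timer value consistent with each other, which is the consistency the commit message describes.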