ZONEMD needs to be able to digest SIG and RRSIG records. The signer
field can be compressed in SIG so we need to call dns_name_digest().
While the signer field is not compressed in RRSIG records, the
canonical form has the signer field downcased (RFC 4034, Section 6.2). This
also implies that compare_rrsig needs to downcase the signer field
during comparison.
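As an illustration of the downcasing step, here is a minimal sketch that compares two signer names octet by octet as they would appear in uncompressed wire format (length-prefixed labels), folding ASCII letters to lowercase first. The helper `wire_name_casecmp()` is hypothetical, not BIND's `compare_rrsig()`. It only shows case folding during comparison and does not implement RFC 4034 canonical name ordering (Section 6.1), which compares labels right to left.

```c
#include <assert.h>
#include <stddef.h>

/* Downcase an ASCII letter; all other octets are returned unchanged. */
static unsigned char
ascii_lower(unsigned char c) {
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/*
 * Hypothetical helper: compare two uncompressed wire-format names
 * (length-prefixed labels), downcasing label data so that
 * "Example.COM." and "example.com." compare equal. Returns <0, 0 or >0.
 */
static int
wire_name_casecmp(const unsigned char *a, size_t alen,
		  const unsigned char *b, size_t blen) {
	size_t n = (alen < blen) ? alen : blen;
	size_t i = 0;

	assert(a != NULL && b != NULL);

	while (i < n) {
		/* Label length octets are compared exactly. */
		unsigned char len = a[i];
		if (a[i] != b[i]) {
			return (int)a[i] - (int)b[i];
		}
		i++;
		/* Label data is compared case-insensitively. */
		for (unsigned char j = 0; j < len && i < n; j++, i++) {
			int ca = ascii_lower(a[i]);
			int cb = ascii_lower(b[i]);
			if (ca != cb) {
				return ca - cb;
			}
		}
	}
	if (alen == blen) {
		return 0;
	}
	return (alen < blen) ? -1 : 1;
}
```

Passing the string literal's trailing NUL as part of the length conveniently supplies the root label, e.g. `"\007Example\003COM"` with length 13.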
(cherry picked from commit 006c5990ce88aa5b5869a6140392ef80f38e415a)
For technical reasons, --with-readline=libedit is no longer tested on
FreeBSD, as it's hard to have anchors that are both unified and specific.
(cherry picked from commit e0df774ca093bfc775232c5a543162de3c7245c2)
When a system test is run with the `USE_RR` environment variable set to 1, an `rr` trace is now correctly generated for each instance of `named`.
Closes #5079
Backport of MR !10197
Merge branch 'backport-5079-fix-rr-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10207
When running a system test with the USE_RR environment
variable set to 1, an rr trace is generated for named.
Because rr wasn't run using libtool --mode=execute, the
trace was actually generated for the wrapper script
created by libtool, not for the actual named binary.
(cherry picked from commit 00d7c7c3462dd13b0cf003ad825689c218624ff0)
isc_iterated_hash didn't work in offloaded threads as the per-thread
initialisation had not been done. This has been fixed.
Closes #5214
Backport of MR !10206
Merge branch 'backport-5214-call-isc__iterated_hash_initialize-in-isc__work_cb-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10210
The iterated hash implementation needs to be initialised
on the worker thread. Also clean it up after we are done.
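The pattern is "initialise on the thread that does the work, clean up afterwards". Below is a minimal sketch of that pattern, assuming a hypothetical thread-local flag that stands in for the per-thread hash state. `hash_thread_init()`, `hash_thread_cleanup()`, `toy_hash()` and `work_cb()` are illustrative names only, not BIND's `isc__iterated_hash_initialize()` or `isc__work_cb()`, and `toy_hash()` is not an iterated hash of any real algorithm.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state standing in for a hash context. */
static _Thread_local bool hash_ready = false;

static void
hash_thread_init(void) {
	hash_ready = true;
}

static void
hash_thread_cleanup(void) {
	hash_ready = false;
}

/* Toy "hash" that refuses to run without per-thread initialisation. */
static uint32_t
toy_hash(uint32_t x, unsigned iterations) {
	assert(hash_ready); /* fails on an uninitialised worker thread */
	for (unsigned i = 0; i < iterations; i++) {
		x = x * 2654435761u + 1u;
	}
	return x;
}

/* Offloaded work callback: initialise on the worker, then clean up. */
static void *
work_cb(void *arg) {
	uint32_t *value = arg;
	hash_thread_init();
	*value = toy_hash(*value, 10);
	hash_thread_cleanup();
	return NULL;
}

/* Run the work on a separate thread and return the result. */
static uint32_t
run_offloaded(uint32_t input) {
	pthread_t thread;
	uint32_t value = input;
	int rc = pthread_create(&thread, NULL, work_cb, &value);
	assert(rc == 0);
	rc = pthread_join(thread, NULL);
	assert(rc == 0);
	return value;
}
```

Because the flag is thread-local, initialising it on the main thread would not help the worker, which is the failure mode the commit describes.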
(cherry picked from commit 988dc57c8cb04748059769400275a2da5dd6a449)
When querying zone transfer information from the statistics channel, there was a rare possibility that `named` could terminate unexpectedly if a zone transfer was in a state where transfers from all the available primary servers had already failed. This has been fixed.
Closes #5198
Backport of MR !10182
Merge branch 'backport-5198-dns_remote_curraddr-bug-fix-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10194
When all the addresses have already been iterated over, the
dns_remote_curraddr() function triggers an assertion failure. So before calling it,
dns_zone_getprimaryaddr() now checks the address list using the
dns_remote_done() function. This also means that instead of
returning 'isc_sockaddr_t' it now returns 'isc_result_t' and
writes the primary's address into the provided pointer only when
returning success.
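A minimal sketch of the "check first, write the out-parameter only on success" shape described above. The types and functions (`remote_t`, `remote_done()`, `remote_curraddr()`, `get_primary_addr()`, `result_t`) are hypothetical simplifications, with plain `int` values standing in for `isc_sockaddr_t`; they are not the real `dns_remote_*` or `dns_zone_getprimaryaddr()` APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RESULT_SUCCESS = 0, RESULT_NOMORE = 1 } result_t;

/* Hypothetical stand-in for a list of remote primary addresses. */
typedef struct {
	const int *addrs; /* toy "addresses" */
	size_t count;
	size_t curr;      /* index of the current address */
} remote_t;

static bool
remote_done(const remote_t *r) {
	return r->curr >= r->count;
}

/* Like the behaviour described above: asserts if the list is exhausted. */
static int
remote_curraddr(const remote_t *r) {
	assert(!remote_done(r));
	return r->addrs[r->curr];
}

/*
 * Check remote_done() first and write *addrp only on success,
 * instead of returning an address unconditionally.
 */
static result_t
get_primary_addr(const remote_t *r, int *addrp) {
	assert(r != NULL && addrp != NULL);
	if (remote_done(r)) {
		return RESULT_NOMORE;
	}
	*addrp = remote_curraddr(r);
	return RESULT_SUCCESS;
}
```

Callers then branch on the result code and never read the address when the list has been exhausted.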
(cherry picked from commit 7293cb0612e9362ad78f5c318b767a4f8ce16451)
This merge request addresses several key performance bottlenecks in the DoH (DNS over HTTPS) implementation by introducing significant optimizations and improvements.
### Key Improvements
1. **Simplification and Optimisation of `http_do_bio()` Function**:
- The code flow in the `http_do_bio()` function has been significantly simplified.
2. **Flushing HTTP Write Buffer on Outgoing DNS Messages**:
- The buffer is flushed and a send operation is performed when there is an outgoing DNS message.
3. **Bumping Active Streams Processing Limit**:
- The limit on the total number of active streams has been raised to 60% of the total streams limit.
These changes collectively enhance the performance and reliability of the DoH implementation, making it more efficient and robust for handling high-load scenarios, particularly noticeable in long runs (>= 1h) of `stress:long:rpz:doh+udp:linux:*` tests. It improves performance in these tests for BIND 9.18 and is likely to have a positive, though less pronounced, effect on newer versions as well.
In essence, the merge request fixes three bottlenecks stacked upon each other.
*It is a logical continuation of merge request !10109.* !10109, unfortunately, did not completely [address the performance drop in 9.18](https://gitlab.isc.org/isc-projects/bind9/-/pipelines/221545) for longer runs of the stress test. This merge request [addresses that](https://gitlab.isc.org/isc-projects/bind9/-/pipelines/223661).
**P.S.**
The origin of the fixes is, in fact, the branch in !10193. So this MR is a ... *forward port* of them.
Backport of MR !10192
Merge branch 'backport-artem-doh-performance-drop-post-fix-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10199
This commit bumps the limit on the total number of active streams (i.e.
the opened streams for which a request has been received but the
response is not yet ready) to 60% of the total streams limit.
The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.
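The arithmetic behind the new cap, as a minimal sketch. The function names are hypothetical, and rounding down via integer division is an assumption. The commit does not say how the 60% figure is rounded or what the total streams limit is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: cap on active streams as 60% of the total streams limit. */
static uint32_t
max_active_streams(uint32_t total_streams_limit) {
	assert(total_streams_limit > 0);
	/* Widen before multiplying to avoid overflow; rounds down. */
	return (uint32_t)(((uint64_t)total_streams_limit * 60) / 100);
}

/* Process a new request only while active streams stay under the cap. */
static bool
can_process_new_stream(uint32_t active, uint32_t total_streams_limit) {
	return active < max_active_streams(total_streams_limit);
}
```

For example, with a total limit of 100 streams, at most 60 streams may be awaiting a response at once.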
(cherry picked from commit eaad0aefe668408d8ae0792796852cc7bccaff0f)
The check, while not active by default, has not been valid since commit
8b8f4d500d9c1d41d95d34a79c8935823978114c.
See the 'if (total == 0) { ...' branch below to understand why.
(cherry picked from commit 217a1ebd79d90d2a3eebf44256fd15ab61a7d2a9)
Previously, the code would try to avoid sending any data regardless of
what it is unless:
a) The flush limit is reached;
b) There are no sends in flight.
This strategy is used to avoid issuing numerous send requests that each
carry only a small amount of data. However, it has proven to be too
aggressive and, in fact, harms performance in some cases (e.g., on
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*").
Now, in addition to the listed cases, we also:
c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).
That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.
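The three flush conditions can be summarised as a single predicate. This is a minimal sketch with a hypothetical `flush_state_t`. The field names are assumptions, and the real code derives condition (c) from the presence of a send callback rather than from a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the send/flush decision. */
typedef struct {
	size_t pending_bytes;     /* bytes buffered, not yet sent */
	size_t flush_limit;       /* buffer size that forces a flush */
	unsigned sends_in_flight; /* send requests not yet completed */
	bool has_send_cb;         /* an outgoing DNS message was passed in */
} flush_state_t;

static bool
should_flush(const flush_state_t *s) {
	assert(s != NULL);
	return s->pending_bytes >= s->flush_limit /* (a) limit reached */
	       || s->sends_in_flight == 0        /* (b) nothing in flight */
	       || s->has_send_cb;                /* (c) outgoing DNS message */
}
```

Before the change, only (a) and (b) applied, so a small response could sit in the buffer while earlier sends were still in flight.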
(cherry picked from commit c5f7968856f3d0cf37c5882ac19fa70333bae4cc)
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.
This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.
(cherry picked from commit 0e1b02868a63d2c4f6f0c414b20d4b999adbcc46)
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that its previous structure was
indirectly caused by the bug with missing callback calls, fixed in
8b8f4d500d9c1d41d95d34a79c8935823978114c.
The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.
Additionally, it fixes a similar issue with potentially missing send
callback calls and hardens the code against potential use-after-free
errors related to the session object.
(cherry picked from commit 0956fb9b9ec4d07f1b744a829dacfe7a1251a58e)
Currently, the ChangeLog file is a dangling symlink pointing to the
removed CHANGES file. Fix the link by pointing to doc/arm/changelog.rst.
(cherry picked from commit de0598cbc3691e2443b6d9ac90b9ea464b7678e0)
Disabling dynamic tags ensures the Clang symbolizer creates a valid TSAN
report. For consistency, also add the option to gcc:tsan so they are
both on the same footing.
Closes #5149
Backport of MR !10185
Merge branch 'backport-5149-fix-tsan-flags-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10187
Disabling new dynamic ELF tags ensures the Clang symbolizer creates
valid TSAN reports. For consistency, also add the option to gcc:tsan so
they are both on the same footing.
(cherry picked from commit ac9eec632718a20a3454d2d547fea60d58fc603c)
29fd7564083731373bd132ec65ffc0a9072f8efc replaced "only" with "rules" in
.gitlab-ci.yml but forgot to drop the removal from here, hence the
script was broken.
(cherry picked from commit 6e2272d769a205e7ca98f698a578f9daf6d8d28d)
Execute DNS Shotgun performance tests on the regular MRs and compare the changes they introduce against the MR diff base. The results are evaluated automatically: the shotgun jobs will fail if the thresholds for CPU/memory/latency differences are exceeded.
Backport of MR !10127
Merge branch 'backport-nicki/ci-shotgun-eval-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10183
The rules keyword allows more flexible and complex conditions when
deciding whether to create the job and also makes it possible to tweak
variables or job properties depending on arbitrary rules. Since it's
not possible to combine only/except and rules together, replace all
uses of only/except to avoid any potential future issues.
(cherry picked from commit 29fd7564083731373bd132ec65ffc0a9072f8efc)
If the shotgun tests are executed for MRs, compare them against the MR's
base rather than the previous release. Only fail the job in case the
performance drops (pass on performance improvements).
Note that the start_in optimization was removed, since it isn't properly
supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be re-built
unnecessarily when testing different protocols. A workaround for the
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
(cherry picked from commit 4214c1e8a71d857fc8d602dc577260934c6342f5)
Answers to an "ANY" query which were processed by the RPZ "passthru"
policy had the response-policy's `max-policy-ttl` value unexpectedly
applied. This has been fixed.
Closes #5187
Backport of MR !10176
Merge branch 'backport-5187-rpz-passthru-any-type-ttl-bug-fix-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10180
Expand the test_rpz_passthru_logging() check in the "rpzextra" system
test to check the answer's TTL values with ANY type queries.
(cherry picked from commit 98ff3a4432172b1c5c869969f122b6204c2eb7ee)
Answers to an "ANY" query which are processed by the RPZ "passthru"
policy have the response-policy's 'max-policy-ttl' value unexpectedly
applied. Do not change the records' TTL when RPZ uses a policy which
does not alter the answer.
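A minimal sketch of the rule "clamp to max-policy-ttl only when the policy rewrites the answer". The policy enum is a hypothetical, simplified subset. Which RPZ policies other than passthru leave the answer unchanged is not covered by this sketch, and the function names are illustrative, not BIND's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified subset of RPZ policies. */
typedef enum {
	POLICY_PASSTHRU, /* answer is returned unchanged */
	POLICY_NXDOMAIN, /* answer replaced with NXDOMAIN */
	POLICY_RECORD    /* answer replaced with local data */
} policy_t;

static bool
policy_alters_answer(policy_t p) {
	assert(p == POLICY_PASSTHRU || p == POLICY_NXDOMAIN ||
	       p == POLICY_RECORD);
	return p != POLICY_PASSTHRU;
}

/* Apply max-policy-ttl only when the policy rewrote the answer. */
static uint32_t
answer_ttl(uint32_t ttl, policy_t p, uint32_t max_policy_ttl) {
	if (policy_alters_answer(p) && ttl > max_policy_ttl) {
		return max_policy_ttl;
	}
	return ttl;
}
```

Under this rule, a passthru answer keeps its original TTL (e.g. 3600) even when `max-policy-ttl` is lower.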
(cherry picked from commit 5633dc90d3f4d3e2bd4d461e07fcd8d611843e7f)
The dual-stack-servers configuration option was not working as expected; the specified servers were not being used when they should have been, leading to resolution failures. This has been fixed.
Closes #5019
Backport of MR !9708
Merge branch 'backport-5019-dual-stack-servers-wasn-t-working-in-all-cases-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10174
Now that fctx_try is being called when adb returns DNS_ADB_NOMOREADDRESSES
we don't need these priming queries for the dual-stack-servers test
to succeed.
(cherry picked from commit 14ab1629b7bd7c6e766216388803bde0fcf97a4f)
Named was stopping nameserver address resolution attempts too soon
when dual stack servers are configured. Dual stack servers are
used when there are *no* addresses for the server in a particular
address family, so find->status == DNS_ADB_NOMOREADDRESSES is not a
sufficient stopping condition when dual stack servers are available.
Call fctx_try to see if the alternate servers can be used.
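A simplified decision function that illustrates why "no more addresses" alone should not end the attempt when dual-stack servers are configured. Everything here (`adb_status_t`, `action_t`, `next_action()`) is hypothetical. `ACTION_TRY` loosely corresponds to "call fctx_try", and the real resolver logic has many more inputs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified address-lookup outcomes. */
typedef enum {
	ADB_PENDING,        /* lookups still in progress */
	ADB_NOMOREADDRESSES /* lookups finished, nothing usable found */
} adb_status_t;

typedef enum {
	ACTION_WAIT, /* wait for pending lookups */
	ACTION_TRY,  /* proceed to the "try" step (cf. fctx_try) */
	ACTION_FAIL  /* give up */
} action_t;

static action_t
next_action(adb_status_t status, bool have_addresses,
	    bool have_dual_stack_servers) {
	assert(status == ADB_PENDING || status == ADB_NOMOREADDRESSES);
	if (have_addresses) {
		return ACTION_TRY;
	}
	if (status == ADB_PENDING) {
		return ACTION_WAIT;
	}
	/*
	 * No more addresses in this family is not, on its own, a reason
	 * to stop: dual-stack servers exist for exactly this case.
	 */
	return have_dual_stack_servers ? ACTION_TRY : ACTION_FAIL;
}
```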
(cherry picked from commit f98a8331aab5e26266f66be4d974da2522f1d2e1)
The `NS_QUERY_DONE_BEGIN` and `NS_QUERY_DONE_SEND` plugin hooks could cause a reference leak if they returned `NS_HOOK_RETURN` without cleaning up the query context properly.
Closes #2094
Backport of MR !9971
Merge branch 'backport-2094-plugin-reference-leak-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10170
When testing, the client object doesn't have a proper
netmgr handle, so ns_client_error() needs to be a no-op.
(cherry picked from commit ae37ef45ff45db4919842ad29d3f8dfe0c77c76c)
If the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on
shutdown.
This has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
(cherry picked from commit c2e43582678ec5d0c40e19c671d60012b36ac312)
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow empty key and signature material after the algorithm identifier for PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within the expected use of these types, as no key material is shared and the signatures are ineffective, but these are private algorithms and they can be totally insecure.
Closes #5167
Backport of MR !10083
Merge branch 'backport-5167-relax-private-dnskey-constraints-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10173
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow
empty key and signature material after the algorithm identifier for
PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within
the expected use of these types as no key material is shared and
the signatures are ineffective, but these are private algorithms and
they can be totally insecure.
(cherry picked from commit b048190e237d6650cf32ac641b1ee4b96c90ac2e)
dnssec-signzone could dereference a NULL key pointer when resigning a zone. This has been fixed.
Closes #5192
Backport of MR !10161
Merge branch 'backport-5192-dnssec-signzone-needs-to-check-for-a-null-key-when-setting-offline-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10169
The zone file for example3 (ns1/example3.db) can be modified in the
upforwd test as example3 is updated as part of the test. Whether
the zone is written out or not by the end of the test is timing
dependent. Rename ns1/example3.db to ns1/example3.db.in and copy it to
ns1/example3.db in setup so we don't trigger post-test change checks.
Closes #5180
Backport of MR !10160
Merge branch 'backport-5180-create-example3-in-setup-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10163
The zone file for example3 (ns1/example3.db) can be modified in the
upforwd test as example3 is updated as part of the test. Whether
the zone is written out or not by the end of the test is timing
dependent. Rename ns1/example3.db to ns1/example3.db.in and copy
it to ns1/example3.db in setup so we don't trigger post-test change
checks.
(cherry picked from commit afc4413862f389b527119647b05982e5119508ef)
Previously, if a new counter was added to the hashtable
while dumping recursing clients via the `rndc recursing`
command, and `fetches-per-zone` was enabled, an assertion
failure could occur. This has been fixed.
Closes #5200
Backport of MR !10164
Merge branch 'backport-5200-destroy-iterator-inside-the-rwlock-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10168
Previously, the hashmap iterator for fetches-per-zone was destroyed
outside the rwlock. This could lead to an assertion failure due to a
timing race with the internal rehashing of the hashmap table as the
rehashing process requires no iterators to be running when rehashing the
hashmap table. This has been fixed by moving the destruction of the
iterator inside the read locked section.
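A minimal sketch of the fix pattern, assuming a hypothetical table that counts live iterators and asserts that none exist while rehashing under the write lock. `table_t`, `iter_t` and the functions below are illustrative only, not BIND's `isc_hashmap` API. Because the iterator is destroyed before the read lock is released, a writer that acquires the lock can never observe a live iterator. If the iterator were destroyed after unlocking, a rehash could slip in between the two steps and trip the assertion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hash table guarded by a rwlock. */
typedef struct {
	pthread_rwlock_t lock;
	int values[4];
	size_t count;
	atomic_uint live_iterators; /* iterators currently alive */
} table_t;

typedef struct {
	table_t *table;
	size_t pos;
} iter_t;

static void
table_init(table_t *t) {
	pthread_rwlock_init(&t->lock, NULL);
	for (size_t i = 0; i < 4; i++) {
		t->values[i] = (int)i + 1;
	}
	t->count = 4;
	atomic_init(&t->live_iterators, 0);
}

static void
iter_create(table_t *t, iter_t *it) {
	atomic_fetch_add(&t->live_iterators, 1);
	it->table = t;
	it->pos = 0;
}

static void
iter_destroy(iter_t *it) {
	atomic_fetch_sub(&it->table->live_iterators, 1);
	it->table = NULL;
}

/* Writer side: rehashing requires that no iterator is alive. */
static void
table_rehash(table_t *t) {
	pthread_rwlock_wrlock(&t->lock);
	assert(atomic_load(&t->live_iterators) == 0);
	/* ... buckets would be rebuilt here ... */
	pthread_rwlock_unlock(&t->lock);
}

/* Reader side: iterate, then destroy the iterator before unlocking. */
static int
table_sum(table_t *t) {
	iter_t it;
	int sum = 0;

	pthread_rwlock_rdlock(&t->lock);
	iter_create(t, &it);
	for (; it.pos < t->count; it.pos++) {
		sum += t->values[it.pos];
	}
	iter_destroy(&it); /* still inside the read-locked section */
	pthread_rwlock_unlock(&t->lock);
	return sum;
}
```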
(cherry picked from commit 1e4fb53c6142d0148be782aede47bebd8e00d5b2)
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
Closes #5197
Backport of MR !10157
Merge branch 'backport-5197-cache_name-logic-error-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10158
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
(cherry picked from commit d0fd9cbe3b0455d0db04b5afe67b7edc44e55965)
Previously, active resolver fetches were only dumped when the `fetches-per-zone` configuration option was enabled. Now, active resolver fetches are dumped along with the number of `clients-per-server` counters per resolver fetch.
Backport of MR !10107
Merge branch 'backport-ondrej/make-dns_resolver_dumpfetches-dump-fetches-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10148
Previously, dns_resolver_dumpfetches() iterated over the fetch
counters. However, because of an earlier optimization, the fetch counters
were only incremented when fetches-per-zone was not 0; otherwise, the
whole counting was skipped for performance reasons.
Instead of using the auxiliary fetch counters hash table, use the real
hash table that stores the fetch contexts to dump the ongoing fetches to
the recursing file.
Additionally print more information about the fetch context like start
and expiry times, number of fetch responses, number of queries and count
of allowed and dropped fetches.
(cherry picked from commit c6b0368b21a75145509fc4f2f490d1c27df7f1fd)
Previously, a data race could cause a newly created fetch context for a new client to be used before it had been fully initialized, which would cause the query to become stuck; queries for the same data would be either paused indefinitely or dropped because of the `clients-per-query` limit. This has been fixed.
Closes #5053
Backport of MR !10146
Merge branch 'backport-5053-fetch-context-create-data-race-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10147