2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-29 05:28:00 +00:00

42936 Commits

Author SHA1 Message Date
Artem Boldariev
c8104daf8d fix: dev: Post [CVE-2024-12705] Performance Drop Fixes, Part 2
This merge request addresses several key performance bottlenecks in the DoH (DNS over HTTPS) implementation by introducing significant optimizations and improvements.

### Key Improvements

1. **Simplification and Optimisation of `http_do_bio()` Function**:
   - The code flow in the `http_do_bio()` function has been significantly simplified.
2. **Flushing HTTP Write Buffer on Outgoing DNS Messages**:
   - The buffer is flushed and a send operation is performed when there is an outgoing DNS message.
3. **Bumping Active Streams Processing Limit**:
   - The total number of active streams has been increased to 60% of the total streams limit.

These changes collectively enhance the performance and reliability of the DoH implementation, making it more efficient and robust for handling high-load scenarios, particularly noticeable in long runs (>= 1h) of `stress:long:rpz:doh+udp:linux:*` tests. It improves perf. for tests for BIND 9.18, but it likely will have a positive but less pronounced effect on newer versions as well.

In essence, the merge request fixes three bottlenecks stacked upon each other.

*It is a logical continuation of the merge requests !10109.* !10109, unfortunately, did not completely [address the performance drop in 9.18](https://gitlab.isc.org/isc-projects/bind9/-/pipelines/221545) for longer runs of the stress test. This merge request [addresses that](https://gitlab.isc.org/isc-projects/bind9/-/pipelines/223661).

**P.S.**

The origin of the fixes is, in fact, the branch in !10193. So this MR is a ... *forward port* of them.

Merge branch 'artem-doh-performance-drop-post-fix' into 'main'

See merge request isc-projects/bind9!10192
2025-03-03 10:10:33 +00:00
Artem Boldariev
eaad0aefe6 DoH: Bump the active streams processing limit
This commit bumps the total number of active streams (= the opened
streams for which a request is received, but response is not ready) to
60% of the total streams limit.

The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.
2025-03-03 11:32:29 +02:00
Artem Boldariev
217a1ebd79 DoH: remove obsolete INSIST() check
The check, while not active by default, is not valid since the commit
8b8f4d500d9c1d41d95d34a79c8935823978114c.

See 'if (total == 0) { ...' below branch to understand why.
2025-03-03 11:32:11 +02:00
Artem Boldariev
c5f7968856 DoH: Flush HTTP write buffer on an outgoing DNS message
Previously, the code would try to avoid sending any data regardless of
what it is unless:

a) The flush limit is reached;
b) There are no sends in flight.

This strategy is used to avoid too numerous send requests with little
amount of data. However, it has been proven to be too aggressive and,
in fact, harms performance in some cases (e.g., on longer (≥1h) runs
of "stress:long:rpz:doh+udp:linux:*").

Now, additionally to the listed cases, we also:

c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).

That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.
2025-03-03 11:32:11 +02:00
Artem Boldariev
0e1b02868a DoH: Limit the number of delayed IO processing requests
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.

This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.
2025-03-03 11:32:11 +02:00
Artem Boldariev
0956fb9b9e DoH: Simplify http_do_bio()
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that the way it was structured
before was indirectly caused by the presence of the missing callback
calls bug, fixed in 8b8f4d500d9c1d41d95d34a79c8935823978114c.

The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.

Additionally, it fixes a similar issue with potentially missing send
callback calls processing and hardens the code against use-after-free
errors related to the session object (they can potentially occur).
2025-03-03 11:32:11 +02:00
Ondřej Surý
239712df16 rem: dev: Cleanup isc/util.h header and friends
Cleanup short list macros from <isc/util.h>, remove two unused headers, move locking macros to respective headers and use only the C11 static assertion.

Merge branch 'ondrej/cleanup-short-macros' into 'main'

See merge request isc-projects/bind9!10196
2025-03-01 06:35:58 +00:00
Ondřej Surý
ce7879c924
Remove STATIC_ASSERT variants in favor of the C11 variant
Previously, a gcc < 4.6 shim for _Static_assert() was included.  Such an
old compiler is not supported now anyway, so the macro variant has been
removed in favor of a single definition using _Static_assert().
2025-03-01 07:33:53 +01:00
Ondřej Surý
534069e048
Move locking macros into individual headers
Previously, the LOCK()/UNLOCK() and friends macros were defined in the
isc/util.h header.  Those macros were moved to their respective headers
as those would have to be included anyway if that particular lock was in
use.
2025-03-01 07:33:51 +01:00
Ondřej Surý
901637c25c
Remove superflous header includes from isc/util.h header
Formerly, isc/util.h would pull a few extra headers (isc/list.h,
isc/attributes.h, isc/result.h and errno.h).  These includes were
removed in favor of including them directly when used.
2025-03-01 07:33:40 +01:00
Ondřej Surý
c5075a9a61
Remove convenience list macros from isc/util.h
The short convenience list macros were used very sparingly and
inconsistenly in the code base.  As the consistency is prefered over
the convenience, all shortened list macro were removed in favor of
their ISC_LIST API targets.
2025-03-01 07:33:40 +01:00
Ondřej Surý
2aa70fff76
Remove unused isc_mutexblock and isc_condition units
The isc_mutexblock and isc_condition units were no longer in use and
were removed.
2025-03-01 07:33:09 +01:00
Arаm Sаrgsyаn
e02d73e7e3 fix: usr: Fix a bug in the statistics channel when querying zone transfers information
When querying zone transfers information from the statistics channel there was a rare possibility that `named` could terminate unexpectedly if a zone transfer was in a state when transferring from all the available primary servers had failed earlier. This has been fixed.

Closes #5198

Merge branch '5198-dns_remote_curraddr-bug-fix' into 'main'

See merge request isc-projects/bind9!10182
2025-02-28 15:34:05 +00:00
Aram Sargsyan
7293cb0612 Fix a bug in dns_zone_getprimaryaddr()
When all the addresses were already iterated over, the
dns_remote_curraddr() function asserts. So before calling it,
dns_zone_getprimaryaddr() now checks the address list using the
dns_remote_done() function. This also means that instead of
returning 'isc_sockaddr_t' it now returns 'isc_result_t' and
writes the primary's address into the provided pointer only when
returning success.
2025-02-28 15:33:37 +00:00
Michal Nowak
67d37a365e new: ci: Check dangling symlinks in the repository
Merge branch 'mnowak/check-dangling-symlinks' into 'main'

See merge request isc-projects/bind9!10120
2025-02-28 11:05:46 +00:00
Michal Nowak
de0598cbc3 Link ChangeLog to doc/arm/changelog.rst
Currently, the ChangeLog file is a dangling symlink pointing to the
removed CHANGES file. Fix the link by pointing to doc/arm/changelog.rst.
2025-02-28 11:02:28 +00:00
Michal Nowak
f3087f1299 Check dangling symlinks in the repository 2025-02-28 11:02:28 +00:00
Aydın Mercan
1b3e7f52ec rem: dev: remove log initialization checks from named
Logging initialization check is now redundant as there is a default global log context created during libisc's constructor.

`isc_log` calls can safely be made at any time outside libisc's constructor.

Merge branch 'aydin/remove-log-check' into 'main'

See merge request isc-projects/bind9!10186
2025-02-28 10:32:20 +00:00
Aydın Mercan
68bbf151a4 remove log initialization checks from named
This check is now redundant as there is a default global log context
created during libisc's constructor.
2025-02-28 10:31:46 +00:00
Michal Nowak
58ea2b1b22 fix: ci: Fix Clang TSAN reports
Disabling dynamic tags ensures the Clang symbolizer creates a valid TSAN
report. For consistency, also add the option to gcc:tsan so they are
both on the same footing.

Closes #5149

Merge branch '5149-fix-tsan-flags' into 'main'

See merge request isc-projects/bind9!10185
2025-02-28 10:15:06 +00:00
Michal Nowak
ac9eec6327
Fix Clang TSAN reports
Disabling new dynamic ELF tags ensures the Clang symbolizer creates
valid TSAN reports. For consistency, also add the option to gcc:tsan so
they are both on the same footing.
2025-02-28 09:01:46 +01:00
Michal Nowak
6e2272d769
No need to delete the "only" keyword in generate-tsan-stress-jobs.py
29fd7564083731373bd132ec65ffc0a9072f8efc replaced "only" with "rules" in
.gitlab-ci.yml but forgot to drop the removal from here, hence the
script was broken.
2025-02-28 09:01:46 +01:00
Evan Hunt
462f367d87 fix: nil: Complete the fix for the import_rdataset() crash
The fix in !10172 was incomplete; there were still some code paths in the resolver that could set the dns_fetchresponse `result` field to the wrong value when the rdataset type was CNAME or DNAME.

Closes #5201

Merge branch '5201-eresult-fix' into 'main'

See merge request isc-projects/bind9!10178
2025-02-27 19:01:02 +00:00
Evan Hunt
9ebeb60174 fix the fetchresponse result for CNAME/DNAME
the fix in commit 1edbbc32b4 was incomplete; the wrong
event result could also be set in cache_name() and validated().
2025-02-27 19:00:27 +00:00
Aydın Mercan
3de629d6b7 chg: dev: unify fips handling to isc_crypto and make the toggle one way
Since algorithm fetching is handled purely in libisc, FIPS mode toggling
can be purely done in within the library instead of provider fetching in
the binary for OpenSSL >=3.0.

Disabling FIPS mode isn't a realistic requirement and isn't done
anywhere in the codebase. Make the FIPS mode toggle enable-only to
reflect the situation.

Merge branch 'aydin/fips-in-crypto' into 'main'

See merge request isc-projects/bind9!9920
2025-02-27 15:29:07 +00:00
Aydın Mercan
f4ab4f07e3
unify fips handling to isc_crypto and make the toggle one way
Since algorithm fetching is handled purely in libisc, FIPS mode toggling
can be purely done in within the library instead of provider fetching in
the binary for OpenSSL >=3.0.

Disabling FIPS mode isn't a realistic requirement and isn't done
anywhere in the codebase. Make the FIPS mode toggle enable-only to
reflect the situation.
2025-02-27 17:37:43 +03:00
Nicki Křížek
ce47cb3ab6 new: ci: Run shotgun tests on MRs
Execute DNS Shotgun performance tests on the regular MRs and compare the changes they introduce against the MR diff base. The results are evaluated automatically - the shotgun jobs will fail if thresholds for CPU/memory/latency difference is exceeded.

Merge branch 'nicki/ci-shotgun-eval' into 'main'

See merge request isc-projects/bind9!10127
2025-02-27 13:30:26 +00:00
Nicki Křížek
29fd756408 Replace deprecated only/except with rules in .gitlab-ci.yml
The keyword rules allows more flexible and complex conditions when
deciding whether to create the job and also makes it possible run tweak
variables or job properties depending on arbitraty rules. Since it's
not possible to combine only/except and rules together, replace all
uses of only/except to avoid any potential future issues.
2025-02-27 14:26:38 +01:00
Nicki Křížek
4214c1e8a7 Run shotgun tests on MRs
If the shotgun tests are executed for MRs, compare it against the MR's
base rather than the previous release. Only fail the job in case the
performance drops (pass on performance improvements).

Note that start_in optimization was removed, since it isn't properly
supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be re-built
unnecessarily when testing different protocols. A workaround for the
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
2025-02-27 14:26:38 +01:00
Arаm Sаrgsyаn
23c1fbc609 fix: usr: Fix TTL issue with ANY queries processed through RPZ "passthru"
Answers to an "ANY" query which were processed by the RPZ "passthru"
policy had the response-policy's `max-policy-ttl` value unexpectedly
applied. This has been fixed.

Closes #5187

Merge branch '5187-rpz-passthru-any-type-ttl-bug-fix' into 'main'

See merge request isc-projects/bind9!10176
2025-02-27 09:19:12 +00:00
Aram Sargsyan
98ff3a4432 Test that RPZ "passthru" doesn't alter the answer's TTL with ANY queries
Expand the test_rpz_passthru_logging() check in the "rpzextra" system
test to check the answer's TTL values with ANY type queries.
2025-02-27 08:36:49 +00:00
Aram Sargsyan
5633dc90d3 Fix TTL issue with ANY queries processed through RPZ "passthru"
Answers to an "ANY" query which are processed by the RPZ "passthru"
policy have the response-policy's 'max-policy-ttl' value unexpectedly
applied. Do not change the records' TTL when RPZ uses a policy which
does not alter the answer.
2025-02-27 08:36:49 +00:00
Evan Hunt
49ccbe857a fix: dev: Validating ADB fetches could cause a crash in import_rdataset()
Previously, in some cases, the resolver could return rdatasets of type CNAME or DNAME without the result code being set to `DNS_R_CNAME` or `DNS_R_DNAME`. This could trigger an assertion failure in the ADB. The resolver error has been fixed.

Closes #5201

Merge branch '5201-adb-cname-error' into 'main'

See merge request isc-projects/bind9!10172
2025-02-26 20:34:27 +00:00
Evan Hunt
1edbbc32b4 set eresult based on the type in ncache_adderesult()
when the caching of a negative record failed because of the
presence of a positive one, ncache_adderesult() could override
this to ISC_R_SUCCESS. this could cause CNAME and DNAME responses
to be handled incorrectly.  ncache_adderesult() now sets the result
code correctly in such cases.
2025-02-25 21:29:19 -08:00
Mark Andrews
a102e504c3 fix: doc: Fix command to generate KSR in DNSSEC guide
Merge branch 'doc-fix-dnssec-ksr-request-command' into 'main'

See merge request isc-projects/bind9!10087
2025-02-26 01:51:33 +00:00
Doug Freed
0dd046d007 Fix command to generate KSR in DNSSEC guide 2025-02-26 01:08:52 +00:00
Evan Hunt
764eb65cf6 fix: dev: Remove 'target' from dns_adb
When a server name turns out to be a CNAME or DNAME, the ADB does not use it, but the `dns_adbname` structure still stored a copy of the target name. This is unnecessary and the code has been removed.

Merge branch 'each-remove-adb-target' into 'main'

See merge request isc-projects/bind9!10149
2025-02-26 00:43:46 +00:00
Evan Hunt
6c2af2ae3b remove 'target' from dns_adb
the target name parameter to dns_adb_createfind() was always passed as
NULL, so we can safely remove it.

relatedly, the 'target' field in the dns_adbname structure was never
referenced after being set.  the 'expire_target' field was used, but
only as a way to check whether an ADB name represents a CNAME or DNAME,
and that information can be stored as a single flag.
2025-02-26 00:43:21 +00:00
Mark Andrews
6af708f3b0 fix: usr: Fix dual-stack-servers configuration option
The dual-stack-servers configuration option was not working as expected; the specified servers were not being used when they should have been, leading to resolution failures. This has been fixed.

Closes #5019

Merge branch '5019-dual-stack-servers-wasn-t-working-in-all-cases' into 'main'

See merge request isc-projects/bind9!9708
2025-02-26 00:22:30 +00:00
Mark Andrews
14ab1629b7 Removing now unneeded priming queries
Now that fctx_try is being called when adb returns DNS_ADB_NOMOREADDRESSES
we don't need these priming queries for the dual-stack-servers test
to succeed.
2025-02-25 23:47:46 +00:00
Mark Andrews
f98a8331aa Fix dual-stack-servers
Named was stopping nameserver address resolution attempts too soon
when dual stack servers are configured.  Dual stack servers are
used when there are *not* addresses for the server in a particular
address family so find->status == DNS_ADB_NOMOREADDRESSES is not a
sufficient stopping condition when dual stack servers are available.
Call fctx_try to see if the alternate servers can be used.
2025-02-25 23:47:46 +00:00
Mark Andrews
1bc7016d7a fix: usr: Relax private DNSKEY and RRSIG constraints
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow empty key and signature material after the algorithm identifier for PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within the expected use of these types as no key material is shared and the signatures are ineffective but these are private algorithms and they can be totally insecure.

Closes #5167

Merge branch '5167-relax-private-dnskey-constraints' into 'main'

See merge request isc-projects/bind9!10083
2025-02-25 23:39:40 +00:00
Mark Andrews
b048190e23 Relax private DNSKEY and RRSIG constraints
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow
empty key and signature material after the algorithm identifier for
PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within
the expected use of these types as no key material is shared and
the signatures are ineffective but these are private algorithms and
they can be totally insecure.
2025-02-25 22:59:46 +00:00
Evan Hunt
5604d3a44e fix: dev: Prevent a reference leak when using plugins
The `NS_QUERY_DONE_BEGIN` and `NS_QUERY_DONE_SEND` plugin hooks could cause a reference leak if they returned `NS_HOOK_RETURN` without cleaning up the query context properly.

Closes #2094

Merge branch '2094-plugin-reference-leak' into 'main'

See merge request isc-projects/bind9!9971
2025-02-25 22:40:55 +00:00
Evan Hunt
ae37ef45ff wrap ns_client_error() for unit testing
When testing, the client object doesn't have a proper
netmgr handle, so ns_client_error() needs to be a no-op.
2025-02-25 22:40:48 +00:00
Evan Hunt
c2e4358267 prevent a reference leak from the ns_query_done hooks
if the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on shut
down.

this has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
2025-02-25 22:40:48 +00:00
Mark Andrews
26f8ee7229 fix: usr: dnssec-signzone needs to check for a NULL key when setting offline
dnssec-signzone could dereference a NULL key pointer when resigning a zone.  This has been fixed.

Closes #5192

Merge branch '5192-dnssec-signzone-needs-to-check-for-a-null-key-when-setting-offline' into 'main'

See merge request isc-projects/bind9!10161
2025-02-25 22:22:30 +00:00
Mark Andrews
1784e4a9ae Check if key is NULL before dereferencing it 2025-02-25 21:45:37 +00:00
Evan Hunt
e16560a650 fix: dev: Simplify some dns_name API calls
Several functions in the `dns_name` module have had parameters removed, that were rarely or never used:
- `dns_name_fromtext()` and `dns_name_concatenate()` no longer take a target buffer.
- `dns_name_towire()` no longer takes a compression offset pointer; this is now part of the compression context.
- `dns_name_towire()` with a `NULL` compression context will copy name data directly into a buffer with no processing.

Merge branch 'each-simplify-names' into 'main'

See merge request isc-projects/bind9!10152
2025-02-25 21:34:31 +00:00
Evan Hunt
2f7e6eb019 allow NULL compression context in dns_name_towire()
passing NULL as the compression context to dns_name_towire()
copies the uncompressed name data directly into the target buffer.
2025-02-25 12:53:25 -08:00