mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-30 14:07:59 +00:00
Commit Graph

42936 Commits

Author SHA1 Message Date
Michal Nowak
92a3487106 chg: test: Add stress tests with DoH and DoT
Validation pipeline: https://gitlab.isc.org/isc-projects/bind9/-/pipelines/160984

Prerequisites:
- [x] isc-private/devops!11
- [x] https://gitlab.isc.org/isc-projects/bind9-qa/-/merge_requests/9

Things to consider:
- FreeBSD DoH jobs are not added because Flamethrower queries always time out.
- This adds 15 more CI jobs:
  - Linux (AWS autoscaler): `(auth + recursive + RPZ) * (DoH + DoT) * (amd64 + arm64) = 12`
  - FreeBSD (one FreeBSD runner): `(auth + recursive + RPZ) * (DoT) * (amd64) = 3`
- The autoscaler is not yet available on FreeBSD. Adding 3 DoT CI jobs that run serially adds 3 hours to the pipeline runtime. Should we add just one FreeBSD DoT job to limit the runtime?
- DoH/DoT performance is slightly lower than pure TCP, so the threshold for the test to pass must be lowered by 5-10% (see isc-private/bind-qa!40).
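The job counts above are the Cartesian product of mode, transport and architecture. The sketch below simply enumerates that product to confirm the totals. The lowercase identifiers and the `stress_jobs` helper are illustrative only; they are not the names used by the real CI configuration.

```python
from itertools import product

# Dimensions taken from the job-count formulas above (names are illustrative).
MODES = ["auth", "recursive", "rpz"]


def stress_jobs(modes, protocols, arches):
    """Return one (mode, protocol, arch) tuple per CI job."""
    return list(product(modes, protocols, arches))


linux_jobs = stress_jobs(MODES, ["doh", "dot"], ["amd64", "arm64"])
freebsd_jobs = stress_jobs(MODES, ["dot"], ["amd64"])  # no DoH on FreeBSD

print(len(linux_jobs), len(freebsd_jobs), len(linux_jobs) + len(freebsd_jobs))
# 12 3 15
```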

Merge branch 'mnowak/stress-test-with-doh-dot' into 'main'

See merge request isc-projects/bind9!5800
2025-01-27 20:09:49 +00:00
Michal Nowak
9756292a5f Add DoH and DoT stress tests, generate test configurations
Add DoH and DoT stress test jobs. The DoH scenario on FreeBSD is omitted
because all of Flamethrower's DoH queries time out on this platform.

Since the response rate of DoT queries is lower than that of DoH and
TCP, the expected TCP response rate is 80%.

Due to the large number of similar stress test configurations, the
"util/generate-stress-test-configs.py" script now generates them as part
of a downstream pipeline. The script is expected to be run exclusively
within the CI environment, which provides all required environment
variables and files.

This refactoring brought the following changes:

- To start a stress test immediately and not wait for artifacts of the
  autoreconf job, run the "autoreconf -fi" command as part of every job.

- Drop the BIND_STRESS_TEST_* variables as they were rarely used and
  conflicted with mode and platform selection in the configuration
  generator.

- Most pipelines now include a few short, randomly selected stress test
  jobs. To schedule all stress tests, set the ALL_BIND_STRESS_TESTS
  environment variable, push a tag to CI, or run a scheduled pipeline.

- Set the BIND_STRESS_TESTS_RUN_TIME environment variable to choose the
  stress test runtime, and set the BIND_STRESS_TESTS_RATE environment
  variable to use a query rate other than the default.

- The job timeout is set to 30 minutes plus the stress test runtime in
  minutes.
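The following is a rough sketch of the scheduling and timeout rules just described. It is not the generator script itself. The environment variable names come from the text. `CI_COMMIT_TAG` and `CI_PIPELINE_SOURCE` are standard GitLab CI variables. The default runtime and the sample size are hypothetical placeholders, and the sketch does not try to model which jobs count as "short".

```python
import random

# Hypothetical placeholder: the real default runtime is not stated above.
DEFAULT_RUN_TIME_MINUTES = 60


def job_timeout_minutes(env):
    """Job timeout = 30 minutes + stress test runtime (in minutes)."""
    run_time = int(env.get("BIND_STRESS_TESTS_RUN_TIME", DEFAULT_RUN_TIME_MINUTES))
    return 30 + run_time


def schedule_all(env):
    """All stress tests run when forced, for tags, or for scheduled pipelines."""
    return (
        "ALL_BIND_STRESS_TESTS" in env
        or bool(env.get("CI_COMMIT_TAG"))
        or env.get("CI_PIPELINE_SOURCE") == "schedule"
    )


def select_jobs(jobs, env, sample_size=2, rng=random):
    """Return every job, or a small random sample for ordinary pipelines."""
    if schedule_all(env):
        return list(jobs)
    return rng.sample(list(jobs), min(sample_size, len(jobs)))
```

In CI the sketch would be called as `select_jobs(jobs, os.environ)`. Passing the environment in as a mapping keeps the functions easy to test outside CI.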
2025-01-27 16:17:39 +01:00
Colin Vidal
dc3c3efdbf fix: dev: fix EDE 22 time out detection
Extended DNS error 22 (No reachable authority) was previously detected when `fctx_expired` fired. It turns out this function is only a "safety net", and the timeout should be detected earlier.

It nevertheless worked because of another issue, which was fixed by !9927. Since that fix, the timeout of the recursive request is detected before `fctx_expired` fires, making it impossible to raise the EDE 22 error.

This fixes the problem by triggering EDE 22 in the part of the code that detects the (TCP or UDP) timeout and decides to cancel the whole fetch (i.e., there is no other server to contact).

Note that this change is not user-facing (no release note) because there are no released versions of BIND between !9927 and this change, so a release note would be confusing.

Closes #5137

Merge branch '5137-ede22' into 'main'

See merge request isc-projects/bind9!9985
2025-01-27 11:48:49 +00:00
Colin Vidal
39c2fc4670 fix byte order in EDE logging
When an EDE code is added to a message, the code is converted early in a
big-endian order so it can be memcpy-ed directly in the EDE buffer that
will go on the wire.

The previous change forgot to update the debug logs, which still assumed
the EDE code was in host byte order. Add a separate variable to
distinguish the two and avoid ambiguity.
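The byte-order mix-up can be reproduced with Python's `struct` module. This is a conceptual illustration, not BIND's C code. The EDE INFO-CODE is a 16-bit field in network (big-endian) order. Reading those wire bytes as a little-endian host integer gives a wrong value in the logs, while keeping a separate host-order value avoids the problem.

```python
import struct

EDE_NO_REACHABLE_AUTHORITY = 22

# Wire form: the INFO-CODE is a 16-bit big-endian (network order) field.
wire_code = struct.pack("!H", EDE_NO_REACHABLE_AUTHORITY)
print(wire_code)  # b'\x00\x16'

# Bug pattern: logging the wire bytes as if they were a little-endian host value.
misread = struct.unpack("<H", wire_code)[0]
print(misread)  # 5632

# Fix pattern: keep the host-order code in its own variable for logging.
host_code = struct.unpack("!H", wire_code)[0]
print(host_code)  # 22
```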
2025-01-27 11:49:44 +01:00
Colin Vidal
27f3b8950a update serve-stale test to support EDE 22
When EDE 3 (stale answer) was added, the serve-stale tests checked for it
exclusively, i.e., they grepped for the absence of "EDE" in the dig output
when no stale answer was expected.

However, some serve-stale tests disable stale answers and make the
authoritative server unresponsive, effectively triggering a timed-out
request and thus EDE 22. Update those tests so they still check for the
absence of EDE 3, but also for the presence of EDE 22.
2025-01-27 11:49:44 +01:00
Colin Vidal
7cb8a028fe add new EDE 22 system tests
This reworks a previously existing EDE 22 system test and adds another
one, which ensures that timeout detection also works when the resolver
contacts the authoritative server over UDP (the existing test used TCP
to contact the authoritative servers).
2025-01-27 11:49:44 +01:00
Colin Vidal
78274ec2b1 fix EDE 22 time out detection
Extended DNS error 22 (No reachable authority) was previously detected
when `fctx_expired` fired. It turns out this function is only a
"safety net", and the timeout should be detected earlier.

It nevertheless worked because of another issue, fixed by !9927. Since
that fix, the timeout of the recursive request is detected before
`fctx_expired` fires, so EDE 22 is no longer added to the response
message.

The fix is to add the EDE 22 code in two situations:

- When the dispatch code times out (rctx_timedout), the resolver code
  checks various properties to figure out whether it needs to make
  another fetch attempt. One of the parameters is the fetch expiration
  time. If it has expired, the whole recursion is canceled, so EDE 22 is
  now added here.

- If the fetch expiration time has not expired in the case above (and
  the other parameters allow it), a new fetch attempt is made
  (fctx_query). Before the new request is actually sent, the fetch
  expiration time is re-checked. By then it might have elapsed, in which
  case the whole recursion is canceled, so EDE 22 is now added here as
  well.
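The two situations can be modelled in a few lines. This is a deliberately simplified Python sketch, not the resolver code. `on_query_timeout` and `before_new_query` are hypothetical stand-ins that only loosely mirror `rctx_timedout` and `fctx_query`, and the other retry criteria are left out. Both paths check the same fetch expiration time and record EDE 22 when they cancel.

```python
from dataclasses import dataclass, field

EDE_NO_REACHABLE_AUTHORITY = 22  # RFC 8914 "No Reachable Authority"


@dataclass
class FetchContext:
    expires: float                          # absolute fetch expiration time
    ede: list = field(default_factory=list)
    canceled: bool = False

    def cancel_with_ede22(self):
        """Cancel the whole recursion and record EDE 22."""
        self.ede.append(EDE_NO_REACHABLE_AUTHORITY)
        self.canceled = True


def on_query_timeout(fctx, now):
    """Situation 1 (cf. rctx_timedout): give up if the fetch has expired."""
    if now >= fctx.expires:
        fctx.cancel_with_ede22()
        return False
    return True  # caller may make another attempt


def before_new_query(fctx, now):
    """Situation 2 (cf. fctx_query): re-check expiry right before sending."""
    if now >= fctx.expires:
        fctx.cancel_with_ede22()
        return False
    return True  # a real resolver would send the query here
```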
2025-01-27 11:49:44 +01:00
Aydın Mercan
2bee113a46 new: usr: add a rndc command to toggle jemalloc profiling
The new command is `rndc memprof`. The memory profiling status is also
reported by `rndc status`, which also shows whether named can toggle
memory profiling and whether the server was built with jemalloc.

Closes #4759

Merge branch '4759-add-a-trigger-to-dump-jeprof-data-or-memory-statistics' into 'main'

See merge request isc-projects/bind9!9370
2025-01-25 12:53:38 +00:00
Aydın Mercan
b495e9918e add a rndc command to toggle jemalloc profiling
The new command is `rndc memprof`. The memory profiling status is also
reported by `rndc status`, which also shows whether named can toggle
memory profiling and whether the server was built with jemalloc.
2025-01-25 14:28:41 +03:00
Nicki Křížek
b9db917752 chg: ci: Ensure changelog job builds docs with the new entry
The changelog job is supposed to test that the text from the GitLab MR
title and description is valid rst syntax and can be built with sphinx.
In 49128fc1, the way gitchangelog generates entries was changed: it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated note is actually written to a rendered
file that is part of the docs, so that the subsequent sphinx build
attempts to render the note.

Merge branch 'nicki/ci-fix-changelog-job' into 'main'

See merge request isc-projects/bind9!9804
2025-01-24 18:10:32 +00:00
Nicki Křížek
380a30ba8d Ensure changelog job builds docs with the new entry
The changelog job is supposed to test that the text from the GitLab MR
title and description is valid rst syntax and can be built with sphinx.
In 49128fc1, the way gitchangelog generates entries was changed: it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated note is actually written to a rendered
file that is part of the docs, so that the subsequent sphinx build
attempts to render the note.
2025-01-24 14:48:06 +01:00
Colin Vidal
5a8fce4851 new: usr: adds support for EDE code 1 and 2
Add support for EDE codes 1 and 2, which might occur during DNSSEC validation in case of an unsupported DNSKEY algorithm or DS digest type.

See #2715

Merge branch '2715-ede-unsupported-digest-alg' into 'main'

See merge request isc-projects/bind9!9948
2025-01-24 13:16:56 +00:00
Colin Vidal
244923b9dc add DNSSEC EDE test for unsupported digest and alg
DNSSEC validation can fail when multiple DNSKEYs are available for a
zone and none of them is supported, each for a different reason: one
has a DS record in the parent zone using an unsupported digest type,
while the other uses an unsupported algorithm.

Add a specific test case covering this flow and ensuring that two
extended DNS errors are provided, codes 1 and 2, which highlight the
unsupported algorithm and the unsupported digest respectively.
2025-01-24 12:26:30 +00:00
Colin Vidal
8b50d63fe1 tests for support for EDE 1 & 2
2025-01-24 12:26:30 +00:00
Colin Vidal
46a58acdf5 add support for EDE code 1 and 2
Add support for EDE codes 1 (Unsupported DNSKEY Algorithm) and 2
(Unsupported DS Digest Type), which might occur during DNSSEC
validation in case of an unsupported DNSKEY algorithm or DS digest type.

Because DNSSEC validation internally kicks off various fetches, we need
to copy all encountered extended errors from fetch responses to the
fetch context. Upon an event, the errors from the fetch context are
copied to the client response.
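Here is a minimal Python model of the propagation just described. EDEs raised by sub-fetches are collected in the fetch context, and the accumulated list is then copied to the client response. `FetchResponse`, `FetchContext.absorb` and `client_response_ede` are hypothetical names for illustration only. The code values follow RFC 8914.

```python
from dataclasses import dataclass, field

EDE_UNSUPPORTED_DNSKEY_ALGORITHM = 1  # RFC 8914 code 1
EDE_UNSUPPORTED_DS_DIGEST_TYPE = 2    # RFC 8914 code 2


@dataclass
class FetchResponse:
    ede: list  # (info_code, extra_text) pairs raised by one sub-fetch


@dataclass
class FetchContext:
    ede: list = field(default_factory=list)

    def absorb(self, response):
        """Copy every EDE from a sub-fetch response into the context."""
        self.ede.extend(response.ede)


def client_response_ede(fctx):
    """On completion, the context's EDEs are copied to the client response."""
    return list(fctx.ede)


fctx = FetchContext()
fctx.absorb(FetchResponse([(EDE_UNSUPPORTED_DS_DIGEST_TYPE, "")]))
fctx.absorb(FetchResponse([(EDE_UNSUPPORTED_DNSKEY_ALGORITHM, "")]))
print([code for code, _ in client_response_ede(fctx)])  # [2, 1]
```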
2025-01-24 12:26:30 +00:00
Michal Nowak
98522ab702 new: doc: Add linkcheck to CI
Merge branch 'mnowak/rtd-add-linkcheck' into 'main'

See merge request isc-projects/bind9!9680
2025-01-24 12:00:30 +00:00
Michal Nowak
48eab76427 Add linkcheck job
2025-01-24 12:18:37 +01:00
Michal Nowak
8302469507 Fix broken links in documentation
Some detected links are not to be verified (127.*, dnssec-or-not.com)
and some I can't fix (flaticon, godaddy, icann), but they are not
crucial.
2025-01-24 12:07:36 +01:00
Michal Nowak
4dfc12cb44 chg: test: Rewrite cipher-suites system test to pytest
The minimal dnspython version to run this test is 2.5.0.

Merge branch 'mnowak/pytest_rewrite_cipher-suites' into 'main'

See merge request isc-projects/bind9!8662
2025-01-24 08:53:25 +00:00
Michal Nowak
df7e9f4ac3 Rename have_* marks to with_*
Marks starting with "with" or "without" make more sense linguistically
than those starting with "have" or "have_not".
2025-01-24 08:45:51 +00:00
Nicki Křížek
23fb615963 Test cipher-suites after zone transfers complete
Ensure the zone transfers have completed (successfully or not) before
running the test cases, because they assume zone transfers have been
done.
2025-01-24 08:45:51 +00:00
Nicki Křížek
a72ff9fd57 Make servers fixture in pytest module-wide
The servers are set up and torn down once per test module. All the
logs and server state persist between individual tests within the same
module, so the servers fixture representing these servers should be
module-wide as well.
2025-01-24 08:45:51 +00:00
Michal Nowak
100b759863 Rewrite cipher-suites system test to pytest
The minimal required dnspython version is 2.5.0 because of the need for
the "verify" argument in dns.query.tls().
2025-01-24 08:45:51 +00:00
Michal Nowak
b2964cc922 Use Debian "sid" for pylint and mypy jobs to get recent dnspython
The base image tends to have a rather old dnspython version and when
used with pylint and mypy it produces errors about newer dnspython
features the old version does not know about.

    $ mypy "bin/tests/system/isctest/"
    bin/tests/system/isctest/query.py:55: error: Unexpected keyword argument "verify" for "tls"  [call-arg]
    /usr/lib/python3/dist-packages/dns/query.py:958: note: "tls" defined here

    $ pylint --rcfile $CI_PROJECT_DIR/.pylintrc --disable=wrong-import-position $(git ls-files 'bin/tests/system/*.py' | grep -vE 'ans\.py')
    ************* Module isctest.query
    bin/tests/system/isctest/query.py:55:11: E1123: Unexpected keyword argument 'verify' in function call (unexpected-keyword-arg)
2025-01-24 08:45:51 +00:00
Michal Nowak
df8c419058 Add isctest.query.tls() function
When explicitly set to True, the "verify" argument lets dnspython verify
certificates used for the connection. As most certificates in the system
test will inevitably be self-signed, the "verify" argument defaults to
False.

The "verify" argument has been present in dnspython since version 2.5.0.
2025-01-24 08:45:51 +00:00
Michal Nowak
feecbd8e77 Add "without_fips" mark
The "without_fips" mark disables a test function when BIND 9 is built
with FIPS mode enabled, as not everything works in FIPS-enabled
builds.
2025-01-24 08:45:51 +00:00
Evan Hunt
d3455be08c rem: dev: Clean up unused result codes
A number of result codes are obsolete and can be removed. Others, including `ISC_R_NOMEMORY`, are still checked in various places even though they can't occur any longer. These have been cleaned up.

Merge branch 'each-cleanup-results' into 'main'

See merge request isc-projects/bind9!9942
2025-01-24 00:58:06 +00:00
Evan Hunt
314741fcd0 deduplicate result codes
ISCCC_R_SYNTAX, ISCCC_R_EXPIRED, and ISCCC_R_CLOCKSKEW have the
same usage and text formats as DNS_R_SYNTAX, DNS_R_EXPIRED and
DNS_R_CLOCKSKEW respectively. this was originally done because
result codes were defined in separate libraries, and some tool
might be linked with libisccc but not libdns. as the result codes
are now defined in only one place, there's no need to retain the
duplicates.
2025-01-23 15:54:57 -08:00
Evan Hunt
a19f6c6654 clean up result codes that are never used
the following result codes are obsolete and have been removed
from result.h and result.c:

        - ISC_R_NOTHREADS
        - ISC_R_BOUND
        - ISC_R_NOTBOUND
        - ISC_R_NOTDIRECTORY
        - ISC_R_EMPTY
        - ISC_R_NOTBLOCKING
        - ISC_R_INPROGRESS
        - ISC_R_WOULDBLOCK

        - DNS_R_TOOMANYHOPS
        - DNS_R_NOREDATA
        - DNS_R_BADCKSUM
        - DNS_R_MOREDATA
        - DNS_R_NOVALIDDS
        - DNS_R_UNKNOWNOPT
        - DNS_R_NOVALIDKEY
        - DNS_R_NTACOVERED

        - DST_R_COMPUTESECRETFAILURE
        - DST_R_NORANDOMNESS
        - DST_R_NOCRYPTO
2025-01-23 15:54:57 -08:00
Evan Hunt
70e3d91396 clean up uses of DST_R_NOCRYPTO
building BIND without crypto support is no longer possible.
consequently this result code is never returned, and therefore we
don't need code in calling functions to handle it.
2025-01-23 15:54:57 -08:00
Evan Hunt
10accd6260 clean up uses of ISC_R_NOMEMORY
the isc_mem allocation functions can no longer fail; as a result,
ISC_R_NOMEMORY is now rarely used: only when an external library
such as libjson-c or libfstrm could return NULL. (even in
these cases, arguably we should assert rather than returning
ISC_R_NOMEMORY.)

code and comments that mentioned ISC_R_NOMEMORY have been
cleaned up, and the following functions have been changed to
type void, since (in most cases) the only value they could
return was ISC_R_SUCCESS:

- dns_dns64_create()
- dns_dyndb_create()
- dns_ipkeylist_resize()
- dns_kasp_create()
- dns_kasp_key_create()
- dns_keystore_create()
- dns_order_create()
- dns_order_add()
- dns_peerlist_new()
- dns_tkeyctx_create()
- dns_view_create()
- dns_zone_setorigin()
- dns_zone_setfile()
- dns_zone_setstream()
- dns_zone_getdbtype()
- dns_zone_setjournal()
- dns_zone_setkeydirectory()
- isc_lex_openstream()
- isc_portset_create()
- isc_symtab_create()

(the exception is dns_view_create(), which could have returned
other error codes in the event of a crypto library failure when
calling isc_file_sanitize(), but that should be a RUNTIME_CHECK
anyway.)
2025-01-23 15:54:57 -08:00
Nicki Křížek
af0a5dfeeb chg: ci: Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.

The respdiff and respdiff:asan jobs seem to have almost identical
results, typically around 0.07 % of differences with occasional spikes
up to around 0.11 %. The respdiff:tsan job shows similar results,
perhaps with more common spikes up to around 0.12 %. Set the limit to
0.15 % to allow for some tolerance due to network conditions, time of
day, etc.

The respdiff:third-party job has a slightly higher average of
disagreements, with typical values around 0.12 %. Set the limit to
0.2 %.

Exceeding either of those values should be a clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
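The pass/fail rule implied by these limits can be written as a simple threshold check. In this sketch, the job names and percentages come from the text above. How respdiff actually computes and reports its disagreement ratio is an assumption here; the sketch just takes disagreements over total queries.

```python
# Limits (in percent) taken from the text above.
LIMITS_PERCENT = {
    "respdiff": 0.15,
    "respdiff:asan": 0.15,
    "respdiff:tsan": 0.15,
    "respdiff:third-party": 0.2,
}


def disagreement_percent(disagreements, total):
    """Assumed metric: disagreements as a percentage of all compared queries."""
    return 100.0 * disagreements / total


def passes(job, disagreements, total):
    """A job passes while its disagreement rate stays within its limit."""
    return disagreement_percent(disagreements, total) <= LIMITS_PERCENT[job]
```

For example, 70 disagreements out of 100,000 queries (0.07 %) passes the respdiff limit, while 160 (0.16 %) does not.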

Merge branch 'nicki/ci-respdiff-limits' into 'main'

See merge request isc-projects/bind9!9950
2025-01-23 17:26:40 +00:00
Nicki Křížek
0584d3f65f Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.

The respdiff and respdiff:asan jobs seem to have almost identical
results, typically around 0.07 % of differences with occasional spikes
up to around 0.11 %. The respdiff:tsan job shows similar results,
perhaps with more common spikes up to around 0.12 %. Set the limit to
0.15 % to allow for some tolerance due to network conditions, time of
day, etc.

The respdiff:third-party job has a slightly higher average of
disagreements, with typical values around 0.12 %. Set the limit to
0.2 %.

Exceeding either of those values should be a clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
2025-01-23 18:19:35 +01:00
Matthijs Mekking
da207678f3 chg: doc: Document how secondaries refresh a zone in the ARM
Closes #5123

Merge branch '5123-document-refreshing-a-secondary' into 'main'

See merge request isc-projects/bind9!9966
2025-01-23 15:52:54 +00:00
Matthijs Mekking
8daf3782d1 Document how secondaries refresh a zone in the ARM
We have a KB article that describes this; put a condensed version into
the ARM.
2025-01-23 15:52:31 +00:00
Matthijs Mekking
417a0e331f fix: doc: Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.

Add text to clarify that the interval is calculated from the new
validity period.
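The sketch below shows what "calculated from the new validity period" means, assuming the `dnssec-signzone` behaviour documented for `-i`. By default the cycle interval is one quarter of the difference between the signature end and start times. An existing RRSIG is replaced if it expires within that interval from now, and retained otherwise. The function names are hypothetical, and the exact boundary comparison is a guess.

```python
DAY = 86400


def default_cycle_interval(new_start, new_end):
    """One quarter of the *new* validity period given with -s and -e."""
    return (new_end - new_start) // 4


def needs_refresh(rrsig_expire, now, interval):
    """An RRSIG expiring within `interval` of now is replaced; later ones are kept."""
    return rrsig_expire <= now + interval


now = 0
interval = default_cycle_interval(now, now + 30 * DAY)  # 7.5 days
print(needs_refresh(now + 5 * DAY, now, interval))   # True
print(needs_refresh(now + 10 * DAY, now, interval))  # False
```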

Closes #5128

Merge branch '5128-clarify-dnssec-signzone-interval' into 'main'

See merge request isc-projects/bind9!9955
2025-01-23 11:12:33 +00:00
Matthijs Mekking
ae42fa69fa Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.

Add text to clarify that the interval is calculated from the new
validity period.
2025-01-23 11:12:25 +00:00
Matthijs Mekking
8efb4e2f26 fix: usr: Fix a bug in dnssec-signzone related to keys being offline
In the case when `dnssec-signzone` is called on an already signed zone, and the private key file is unavailable, a signature that needs to be refreshed may be dropped without being able to generate a replacement. This has been fixed.

Closes #5126

Merge branch '5126-dnssec-signzone-retain-rrsig-if-key-is-offline' into 'main'

See merge request isc-projects/bind9!9951
2025-01-23 10:36:15 +00:00
Matthijs Mekking
5e3aef364f dnssec-signzone retain signature if key is offline
Track inside the dns_dnsseckey structure whether we have seen the
private key, or whether this key only has a public key file.

If the key only has a public key file, or a DNSKEY reference in the
zone, mark the key 'pubkey'. In dnssec-signzone, if only the public key
is available, consider the key to be offline. Any signature that should
be refreshed but whose key is not available is retained.

So in the code, 'expired' becomes 'refresh', and the new 'expired'
is only used to determine whether we need to keep the signature if
the corresponding key is not available (retaining the signature if
it is not expired).

In the 'keysthatsigned' function, we can remove:
  -	key->force_publish = false;
  -	key->force_sign = false;

because they are redundant ('dns_dnsseckey_create' already sets these
values to false).
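The 'refresh' / 'expired' split can be summarised as a small decision function. This is a hedged Python model of the logic described above, not the C code. `Key` and `handle_rrsig` are hypothetical names, and `pubkey` stands in for the new flag that marks a key whose private part was never seen.

```python
from dataclasses import dataclass


@dataclass
class Key:
    tag: int
    pubkey: bool  # True if only the public key (file or DNSKEY) was seen


def handle_rrsig(key, sig_expire, now, interval):
    """Decide what to do with an existing signature made by `key`."""
    refresh = sig_expire <= now + interval  # due for replacement
    expired = sig_expire <= now             # no longer valid at all
    if not refresh:
        return "keep"
    if not key.pubkey:
        return "resign"  # private key available: generate a replacement
    # Offline key: keep the old signature until it actually expires.
    return "drop" if expired else "keep"
```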
2025-01-23 09:43:07 +00:00
Matthijs Mekking
0a91321d78 Test dnssec-signzone with private key file missing
Add a test case for the scenario below.

There is a case when signing a zone with dnssec-signzone where the
private key file is moved outside the key directory (for offline
KSK purposes), and then the zone is re-signed. The signature of the
DNSKEY needs refreshing, but is not expired.

Rather than removing the signature without having a valid replacement,
leave the signature in the zone (even though it needs to be refreshed).
2025-01-23 09:43:07 +00:00
Matthijs Mekking
eec0aaa391 fix: dev: Fix possible truncation in dns_keymgr_status()
If the generated status output exceeded 4096 bytes, it was silently truncated; now the output indicates that the status was truncated.

Closes #4180

Merge branch '4180-possible-truncation-in-dns_keymgr_status' into 'main'

See merge request isc-projects/bind9!9905
2025-01-23 09:40:05 +00:00
Matthijs Mekking
7ae7851173 Fix possible truncation in dns_keymgr_status()
If the generated status output exceeded 4096 bytes, it was silently
truncated; now the output indicates that the status was truncated.
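The sketch below illustrates the general pattern of reporting truncation instead of silently cutting output. It is not `dns_keymgr_status()` itself. It counts characters and ignores the C NUL terminator, and the marker wording is hypothetical. Space for the marker is reserved so that the truncated result still fits in the buffer.

```python
STATUS_BUFSIZE = 4096  # buffer size quoted in the commit message

# Hypothetical wording; the real message is not given above.
TRUNCATED_MARKER = "\n[status truncated]\n"


def format_status(lines, bufsize=STATUS_BUFSIZE):
    """Join status lines, flagging truncation instead of dropping text silently."""
    text = "".join(lines)
    if len(text) <= bufsize:
        return text
    keep = bufsize - len(TRUNCATED_MARKER)  # reserve room for the marker
    return text[:keep] + TRUNCATED_MARKER
```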
2025-01-23 09:31:00 +01:00
Mark Andrews
e57ebb8f1b fix: usr: Yaml string not terminated in negative response in delv
Closes #5098

Merge branch '5098-missing-yaml-string-termination-delv' into 'main'

See merge request isc-projects/bind9!9922
2025-01-22 23:55:50 +00:00
Mark Andrews
9c04640def Check delv +yaml negative response output
2025-01-22 21:33:08 +00:00
Mark Andrews
89afc11389 Terminate yaml string after negative comment
2025-01-22 21:33:08 +00:00
Colin Vidal
076e47b427 new: usr: Add support for multiple extended DNS errors
The Extended DNS Errors (EDE) mechanism allows several errors to be raised during a single DNS resolution. `named` is now able to add up to three EDE codes to a DNS response. In the case of duplicate error codes, only the first one will be part of the DNS response.

Closes #5085

Merge branch '5085-multiple-ede' into 'main'

See merge request isc-projects/bind9!9952
2025-01-22 21:32:28 +00:00
Colin Vidal
950a0cffb3 add unit tests covering multiple EDE support
2025-01-22 21:07:44 +01:00
Colin Vidal
4096f27130 add support for multiple EDE
The Extended DNS Errors (EDE) mechanism allows several EDEs to be raised
during a DNS resolution (typically, a DNSSEC query will do multiple
fetches, each of which can produce an error). Add support for up to 3
EDE codes in a DNS response. If duplicates occur (two EDEs with the
same code; the extra text is not compared), only the first one will be
part of the DNS answer.

Because the maximum number of EDEs is statically fixed, an `ns_client_t`
object owns a static array of `DNS_DE_MAX_ERRORS` entries (instead of a
linked list, for instance). The array can be fully filled (all slots
point to an allocated `dns_ednsopt_t` object), partially filled, or
empty. In the latter two cases, the first NULL slot means there are no
more EDE objects.
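The following Python model shows the fixed-slot design described above, assuming a maximum of 3 as stated. `EdeSlots` is a hypothetical name, and `None` plays the role of a NULL slot. A new EDE goes into the first empty slot. A duplicate code is rejected without comparing the extra text, and iteration stops at the first empty slot.

```python
MAX_EDE = 3  # fixed maximum stated in the commit message


class EdeSlots:
    """Fixed-size EDE storage; None marks an empty (NULL) slot."""

    def __init__(self):
        self.slots = [None] * MAX_EDE

    def add(self, code, text=""):
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = (code, text)
                return True
            if slot[0] == code:  # duplicate code: extra text is not compared
                return False
        return False  # all slots already used

    def codes(self):
        out = []
        for slot in self.slots:
            if slot is None:  # first empty slot ends the list
                break
            out.append(slot[0])
        return out
```

With this model, adding 22, 22, 1, 2 and 3 in that order keeps codes `[22, 1, 2]`. The second 22 is a duplicate, and 3 is dropped because the array is full.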
2025-01-22 21:07:44 +01:00
Aram Sargsyan
66d4f9184a chg: dev: Use a suitable response in tcp_connected() when initiating a read
When 'ISC_R_TIMEDOUT' is received in 'tcp_recv()', it times out the
oldest response in the active responses queue, and only after that
checks whether other active responses have also timed out. So when
setting a timeout value for a read operation after a successful
connection, it makes sense to take the timeout value from the oldest
response in the active queue too, because, theoretically, the responses
can have different timeout values, e.g. when the TCP dispatch is shared.
Currently 'resp' is always NULL. Previously, when connect and read
timeouts were not separated in dispatch, this affected only logging;
but now that we set a new timeout after a successful connection, we
need to choose a suitable response from the active queue.
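The following Python model makes the "oldest response first" rule concrete. The dispatch queue is a FIFO in which the head is the oldest response. The read timeout after connecting is taken from the head, and a read timeout expires the head first. `Response`, `read_timeout_after_connect` and `on_read_timeout` are hypothetical names, not the dispatch API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Response:
    query_id: int
    timeout_ms: int  # responses sharing one TCP dispatch may differ


def read_timeout_after_connect(active):
    """Use the oldest active response (queue head) for the read timeout."""
    return active[0].timeout_ms if active else None


def on_read_timeout(active):
    """Like tcp_recv() on ISC_R_TIMEDOUT: time out the oldest response first."""
    return active.popleft() if active else None


active = deque([Response(1, 5000), Response(2, 1000)])
print(read_timeout_after_connect(active))  # 5000
print(on_read_timeout(active).query_id)    # 1
```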

Merge branch 'aram/dispatch-tcp_connected-fix' into 'main'

See merge request isc-projects/bind9!9927
2025-01-22 13:41:25 +00:00
Aram Sargsyan
a6d6c3cb45 Clean up fctx->next_timeout
Since the support for non-zero values of stale-answer-client-timeout
was removed in bd7463914f, 'next_timeout'
is unused. Clean it up.
2025-01-22 13:40:45 +00:00