Resolve "NSEC records aren't signed with both configured algorithms during NSEC3->NSEC transition"
Closes#3937
See merge request isc-projects/bind9!7682
If the zone already has existing NSEC/NSEC3 chains then zone_sign
needs to continue to use them. If there are no chains then use
kasp setting otherwise generate an NSEC chain.
The dnstap system test fails intermittently, and it appears to be
a timing issue - adding a short delay after running 'fstrm_capture',
and before running 'dnstap -reopen' improves the situation from
50% failures (5 out of 10 times) to 0% failures (0 out of 20 times),
tested locally.
The reason is that 'fstrm_capture' is executed in the background,
and due to OS scheduling and other factors, the listener socket
may not be ready when the following command runs and tells 'named'
to (re)open it.
The CodeQL and SonarCloud GitHub Actions would FTBFS because of missing
liburcu-dev package resulting. Install the required package to both
GitHub Action files.
[func] BIND now requires liburcu for lock-free data structures
and concurrent safe memory reclamation. It replaces the
home-grown lock-free linked list and QSBR machinery
added in changes 6108 and 6109. The qp-trie code has
been adjusted to use liburcu.
BIND needs a collection of standard lock-free data structures,
which we can find in liburcu, along with its RCU safe memory
reclamation machinery. We will use liburcu's QSBR variant instead
of the home-grown isc_qsbr.
ISC_REFCOUNT_TRACE_IMPL uses isc_tid(), but the corresponding header
file is not included, which breaks, for example, compiling BIND with
DNS_CATZ_TRACE defined in lib/dns/include/dns/catz.h.
Add '#include <isc/tid.h>' in lib/isc/include/isc/refcount.h.
BUILD_PARALLEL_JOBS environmental variable is set to 6, which does not
align well with 4 and 8 CPU core systems dedicated to CI "stress" tests.
When multiple parallel jobs run on the host, they compete for resources
with an undesirable result: 6 compiler processes of one job may starve
named, resulting in lower-than-expected throughput and minutes-long
query response latency spikes.
Better drop the build parallelism of BIND-under-test. About 1-2 minutes
are added to the 60-65 minutes long job duration.
Since pregenerated manual pages were removed from the BIND 9 repository,
Sphinx must be present in the build environment for manual pages to be
created and placed to release tarball. release-tarball-comparison.sh
script needs to be adapted to keep up with how to release tarballs are
prepared.
Dumping of the freshly transferred zone file can take some time.
Retry 5 times before failing.
The log excerpt below shows such a case, when dumping lasted more than
two seconds.
06-Mar-2023 09:32:09.973 zone example6/IN: Transfer started.
06-Mar-2023 09:32:10.301 zone example6/IN: zone transfer finished: success
06-Mar-2023 09:32:10.301 zone_dump: zone example6/IN: enter
06-Mar-2023 09:32:11.789 client @0x7fe9ab435d68 10.53.0.10#44113 (example6): AXFR request
06-Mar-2023 09:32:11.801 client @0x7fe9ab435d68 10.53.0.10#44113 (example6): transfer of 'example6/IN': AXFR ended: 5 messages, 2676 records, 55815 bytes, 0.011 secs (5074090 bytes/sec) (serial 1397051952)
06-Mar-2023 09:32:12.409 zone_gotwritehandle: zone example6/IN: enter
06-Mar-2023 09:32:12.421 dump_done: zone example6/IN: enter
06-Mar-2023 09:32:12.421 zone_journal_compact: zone example6/IN: target journal size 53044
There can be comments in dig output for a zone transfer only in case
of an error, so we should print those errors not when wait_for_tls_xfer
succeeds, but when it fails.
Also, there is no point in printing those comments when a failure was
indeed expected.
By omission, BIND was not built with common CFLAGS in the stress test
jobs. Building with common CFLAGS and -Og should help GDB produce a
backtrace with more information.
In base32_decode_char the GCC 12 static analyser fails to determine
that ctx->val[1], ctx->val[3], ctx->val[4] and ctx->val[6] are
assigned values by the previous call to base32_decode_char. Initialise
ctx->val to zeros when initalising the rest of ctx to silence the
false positive.
REQUIRE that rdata->type is dns_rdatatype_svcb to detect when
dns_rdata_checksvcb is called with the wrong rdata type. There are
no code paths that currently pass the wrong rdata to dns_rdata_checksvcb.
This was found by GCC 12 static analysis.
The serve-stale system test was intermittently failing due to a timing
issue:
I:serve-stale:check stale data.example TXT was refreshed...
I:serve-stale:failed
The RRset is refreshed, however, it first checks for an expected log
line, prior checking that the stale data.example TXT was refreshed
(using dig). This log line is there to ensure the record is actually
refreshed before we start querying again. Alternatively we could just
retry_quiet 10 <wait for dig output matches expectations>. It would
lower the chances for intermittent test failures, since there is no
longer a "check for log line, sleep one second if check fails, check
for log line, ...", prior to the check.
Rename and simplify dst__openssl_compare_keypair() to
dst__openssl_keypair_compare(), and introduce two additional functions
dst__openssl_keypair_isprivate and dst__openssl_keypair_destroy.
Use those to de-duplicated openssl{rsa,ecdsa}_isprivate, and
openssl{rsa,ecdsa}_destroy.
without diffie-hellman TKEY negotiation, some other code is
now effectively dead or unnecessary, and can be cleaned up:
- the rndc tsig-list and tsig-delete commands.
- a nonoperational command-line option to dnssec-keygen that
was documented as being specific to DH.
- the section of the ARM that discussed TKEY/DH.
- the functions dns_tkey_builddeletequery(), processdeleteresponse(),
and tkey_processgssresponse(), which are unused.