When resquery_response() was called with ISC_R_SHUTTINDOWN, the region
argument would be NULL, but rctx_respinit() would try to pass
region->base and region->len to the isc_buffer_init() leading to
a NULL pointer dereference. Properly handle non-ISC_R_SUCCESS by
ignoring the provided region.
Previously, an AXFR request would be issued every second while waiting
for the zone to be signed. This might've been the cause of issues in CI
where many tests are running in parallel and any extra load may increase
test instability.
Instead, check for the last NSEC record to have a signature before
commencing the AXFR request to check the zone has been fully signed.
Also increase the time for the zone signing to a total of 60+10 seconds
up from the previous 30.
Ensure messages from dupsigs system test end up in its log rather than
stdout. Previously, the output was hard to debug when running the tests
in parallel and messages wouldn't end up in the dupsigs.log.
This should delay the catalog zone from being destroyed during
shutdown, if the update process is still running.
Doing this should not introduce significant shutdown delays, as
the update function constantly checks the 'shuttingdown' flag
and cancels the process if it is set.
stop and restart the server in the 'tsiggss' test, in order
to confirm that GSS negotiated TSIG keys are saved and restored
when named loads.
added logging to dns_tsigkey_createfromkey() to indicate whether
a key has been statically configured, generated via GSS negotiation,
or restored from a file.
The instance-wide GitLab CI artifact retention time was changed to 1 day
up from the previous value of 12 hours. Remove our explicit overrides
for 1 day artifact retention time, as it is the default now.
Previously, most of our jobs had overrides for 1 day retention, while
some of our jobs used the default 12 hours. This discrepancy could be
quite impractical at times.
Resolve "NSEC records aren't signed with both configured algorithms during NSEC3->NSEC transition"
Closes#3937
See merge request isc-projects/bind9!7682
If the zone already has existing NSEC/NSEC3 chains then zone_sign
needs to continue to use them. If there are no chains then use
kasp setting otherwise generate an NSEC chain.
The dnstap system test fails intermittently, and it appears to be
a timing issue - adding a short delay after running 'fstrm_capture',
and before running 'dnstap -reopen' improves the situation from
50% failures (5 out of 10 times) to 0% failures (0 out of 20 times),
tested locally.
The reason is that 'fstrm_capture' is executed in the background,
and due to OS scheduling and other factors, the listener socket
may not be ready when the following command runs and tells 'named'
to (re)open it.
The CodeQL and SonarCloud GitHub Actions would FTBFS because of missing
liburcu-dev package resulting. Install the required package to both
GitHub Action files.
[func] BIND now requires liburcu for lock-free data structures
and concurrent safe memory reclamation. It replaces the
home-grown lock-free linked list and QSBR machinery
added in changes 6108 and 6109. The qp-trie code has
been adjusted to use liburcu.
BIND needs a collection of standard lock-free data structures,
which we can find in liburcu, along with its RCU safe memory
reclamation machinery. We will use liburcu's QSBR variant instead
of the home-grown isc_qsbr.
ISC_REFCOUNT_TRACE_IMPL uses isc_tid(), but the corresponding header
file is not included, which breaks, for example, compiling BIND with
DNS_CATZ_TRACE defined in lib/dns/include/dns/catz.h.
Add '#include <isc/tid.h>' in lib/isc/include/isc/refcount.h.
BUILD_PARALLEL_JOBS environmental variable is set to 6, which does not
align well with 4 and 8 CPU core systems dedicated to CI "stress" tests.
When multiple parallel jobs run on the host, they compete for resources
with an undesirable result: 6 compiler processes of one job may starve
named, resulting in lower-than-expected throughput and minutes-long
query response latency spikes.
Better drop the build parallelism of BIND-under-test. About 1-2 minutes
are added to the 60-65 minutes long job duration.
Since pregenerated manual pages were removed from the BIND 9 repository,
Sphinx must be present in the build environment for manual pages to be
created and placed to release tarball. release-tarball-comparison.sh
script needs to be adapted to keep up with how to release tarballs are
prepared.
Dumping of the freshly transferred zone file can take some time.
Retry 5 times before failing.
The log excerpt below shows such a case, when dumping lasted more than
two seconds.
06-Mar-2023 09:32:09.973 zone example6/IN: Transfer started.
06-Mar-2023 09:32:10.301 zone example6/IN: zone transfer finished: success
06-Mar-2023 09:32:10.301 zone_dump: zone example6/IN: enter
06-Mar-2023 09:32:11.789 client @0x7fe9ab435d68 10.53.0.10#44113 (example6): AXFR request
06-Mar-2023 09:32:11.801 client @0x7fe9ab435d68 10.53.0.10#44113 (example6): transfer of 'example6/IN': AXFR ended: 5 messages, 2676 records, 55815 bytes, 0.011 secs (5074090 bytes/sec) (serial 1397051952)
06-Mar-2023 09:32:12.409 zone_gotwritehandle: zone example6/IN: enter
06-Mar-2023 09:32:12.421 dump_done: zone example6/IN: enter
06-Mar-2023 09:32:12.421 zone_journal_compact: zone example6/IN: target journal size 53044
There can be comments in dig output for a zone transfer only in case
of an error, so we should print those errors not when wait_for_tls_xfer
succeeds, but when it fails.
Also, there is no point in printing those comments when a failure was
indeed expected.
By omission, BIND was not built with common CFLAGS in the stress test
jobs. Building with common CFLAGS and -Og should help GDB produce a
backtrace with more information.