Tweak various system test which have been unstable in the past weeks.
Closes#5406
Merge branch 'nicki/improve-system-test-stability' into 'main'
See merge request isc-projects/bind9!10690
The code which checks for both IPv4 and IPv6 mixed usage is inherently
unstable, since the address family is chosen randomly for each
connection.
Closes#5406
It's possible to use pytest.mark.flaky, which achieves the exact same
thing as our custom-defined isctest.mark.flaky -- attempts to rerun the
test on failure, but only is flaky package is available.
The test_kasp_case[secondary.kasp] can sometimes fail on freebsd13. It
appears the test gets stuck on some operation which should be very
quick, but for some reason takes at least a few seconds, causing the
cb_ixfr_is_signed() function to time out.
In one of the cases I investigated, it wasn't a query/response that
caused a timeout, but rather some operation in between. The test
attempts to read from a keyfile/statefile, but I see no reason why that
should block.
In any case, try to increase the timeout for the verification, as that
shouldn't hurt. Also allow the test to be re-run on freebsd13, as it's
likely to be caused by some odd behaviour on that platform -- the issue
doesn't appear anywhere else.
The check "unix socket message counts" sometimes fails with "dnstap
output file smaller than expected". This only happens on freebsd13 and
can't be reproduced easily. There was an attempt to decrease the
required file size in the past, but apparently, the issue can still
occur.
The serve_stale test has some inherent instabilities affecting many
different checks. While the failure rate isn't too high (about four
failures in past three weeks of nightlies), it gets ignored, because the
test has been unstable for a very long time.
This removes a leftover check which should've been removed in a prior
change (see #5244). The softhsm2 failures when attempting to delete the
token should be ignored.
Previously, the one-second sleep was unreliable, as it didn't properly
indicate that the rndc reconfig has been processed. The "test 'rndc
reconfig' with a broken config" check would sometimes fail under TSAN
in CI, because the previous rndc reconfig was still ongoing, and the
subsequent rndc reconfig was ignored.
These tests have been unstable under TSAN in the past, but it appears
that the same failure mode can happen outside of TSAN tests as well.
These tests have produced 12 failures combined in the past three weeks
in nightlies.
The fetchlimit test has failed 8 times in the nightly CI over the past
three weeks. That makes the overall failure rate somewhere around 1 %,
which isn't a lot, but is still annoying when lots of testing is going
on.
Rndc test "test 'rndc reconfig' with a broken config" was failing
intermittently.
Wait for 'running' to be logged rather than just using 'sleep 1' before
calling 'rndc reconfig' a second time to get the expected error message
rather than 'reconfig request ignored: already running'.
Closes#5408
Merge branch '5408-rndc-test-second-rndc-reconfig-happens-too-soon' into 'main'
See merge request isc-projects/bind9!10687
Rndc test "test 'rndc reconfig' with a broken config" was failing
intermittently.
Wait for 'running' to be logged rather than just using 'sleep 1' before
calling 'rndc reconfig' a second time to get the expected error message
rather than 'reconfig request ignored: already running'.
There are many system tests where we set `dnssec-validation yes;` only
to also set `trust-anchors { };` which effectively disables the
validation.
This MR replaces this convoluted setup with just `dnssec-validation no;`.
Merge branch 'stepan/empty-trust-anchors-in-system-tests' into 'main'
See merge request isc-projects/bind9!10684
There are many system tests where we set `dnssec-validation yes;` only
to also set `trust-anchors { };` which effectively disables the
validation.
This commit replaces this convoluted setup with just
`dnssec-validation no;`.
On MRs it uses the merge target as the reference.
In schedules it uses the latest released version for this branch as the reference.
This MR lays the ground work for using respdiff on non-standard configurations (like ECS) in the public repo, see https://gitlab.isc.org/isc-private/bind9/-/merge_requests/807#note_573140.
To reduce the future hassle when maintaining the -S version, most of the work (including an added job, so we know that it actually works) is done here.
Merge branch 'stepan/respdiff-against-merge-target-or-last-release' into 'main'
See merge request isc-projects/bind9!10664
There are three adbname flags that are used to identify different
types of adbname lookups when hashing rather than using multiple
hash tables. Separate these to their own structure element as these
need to be able to be read without locking the adbname structure.
Closes#5404
Merge branch '5404-seperate-out-adbname-type-flags' into 'main'
See merge request isc-projects/bind9!10677
There are three adbname flags that are used to identify different
types of adbname lookups when hashing rather than using multiple
hash tables. Separate these to their own structure element as these
need to be able to be read without locking the adbname structure.
In specific circumstances the :iscman:`named` resolver process could
terminate unexpectedly when stale answers were enabled and the
``stale-answer-client-timeout 0`` configuration option was used.
This has been fixed.
See isc-projects/bind9#5372
Merge branch '5372-security-serve-stale-crash-on-insist-unreachable' into 'v9.21.10-release'
See merge request isc-private/bind9!808
If ns__query_start() is called because of a chained query (e.g.
after encountering a CNAME), a previously set DNS_DBFIND_STALETIMEOUT
flag on the query's 'dboptions' field can cause an assertion
failure if the new query's 'stalefirst' value is not true (e.g. if the
target qname is an authoritative zone for the server). Reset the
DNS_DBFIND_STALETIMEOUT flag in the query_lookup() function before
evaluating the 'stalefirst' value, and make sure to assign a fresh
value to the `stalefirst' flag instead of conditionally assigning it
only if the value is 'true'.
Mark unstable unit tests with `flaky` test suite. Execute the stable
separately in CI. Allow the flaky ones to be re-executed once in case
they fail.
Closes#5385
Merge branch '5385-rerun-flaky-unit-tests' into 'main'
See merge request isc-projects/bind9!10665
The Meson build system does not use `configure.ac`. Remove all mentions
of this file from documentation and scripts.
See #5379
Merge branch 'andoni/remove-references-to-configureac' into 'main'
See merge request isc-projects/bind9!10672
dangerfile.py checked for new configure switches in `configure.ac`,
these were annotated with "# [pairwise:..." in a leading line. Meson
reads those from `meson_options.txt` instead.
The 'plain' optimization level doesn't add any flags and gives the
control to the packager. Similarly, avoid any hardening flags in this
level.
Necessary flags such as `-fno-delete-null-pointer-checks` and
`-fno-strict-aliasing` are still included.
Merge branch 'aydin/plain-build' into 'main'
See merge request isc-projects/bind9!10673
The 'plain' optimization level doesn't add any flags and gives the
control to the packager. Similarly, avoid any hardening flags in this
level.
Necessary flags such as `-fno-delete-null-pointer-checks` and
`-fno-strict-aliasing` are still included.
When the interface-interval parser was changed from uint32 parser to
duration parser, the default value stayed at plain number `60` which
now means 60 seconds instead of 60 minutes. The documentation also
incorrectly states that the value is in minutes. That has been fixed.
Closes#5246
Merge branch '5246-fix-default-interface-interval' into 'main'
See merge request isc-projects/bind9!10281
When the interface-interval parser was changed from uint32 parser to
duration parser, the default value stayed at plain 60 which now means 60
seconds instead of 60 minutes. Fix the default value and the
documentation to match the reality.
Root trust anchors are automatically updated as described in RFC5011.
Add a system test which ensures the root DNSKEYs are always queried by
named during startup.
Because this test uses real internet DNS root servers, it is enabled
only when `CI_ENABLE_LIVE_INTERNET_TESTS` is set.
Merge branch 'colin/updaterootdnskey' into 'main'
See merge request isc-projects/bind9!10615
Root trust anchors are automatically updated as described in RFC5011.
Add a system test which ensures the root DNSKEYs are always queried by
named during startup.
Because this test uses real internet DNS root servers, it is enabled
only when `CI_ENABLE_LIVE_INTERNET_TESTS` is set.
Change the .inuse member of memory context to have a loop-local
variable, so there's no contention even when the same memory
context is shared among multiple threads.
Closes#5354
Merge branch '5354-prevent-false-sharing-in-isc_mem' into 'main'
See merge request isc-projects/bind9!10555
The .inuse member was causing a lot of contention between threads using
the same memory context. Scather the .inuse and .overmem members of
isc_mem_t structure to be an per-tid array of variables to reduce the
contention as the writes are now independent of each other.
The array uses one tad bit nasty trick, as ISC_TID_UNKNOWN is now -1,
the array has been sized to fit the unknown tid with [-1] index into the
array accomplished with `ctx->stat = &ctx->stat_s[1];`. It will not win
a beauty contest, but it works seamlessly by just passing `isc_tid()` as
an index into the array.
The caveat here is that gathering the real inuse value requires walking
the whole array for all registered tid values (isc_tid_count()). The
gather part happens only when statistics are being gathered or when
isc_mem_isovermem() is called. As the isc_mem_isovermem() call happens
only when new data is being added to cache or ADB, it doesn't happen on
the hottest (read-only) path and according to the measurements, it
doesn't slow down neither the cold cache nor the hot cache latency.
As POSIX guarantees only that the type ssize_t shall be capable of
storing values at least in the range [-1, {SSIZE_MAX}], it can't be used
to calculate the difference between two memory sizes. Change the logic
for junk filling to test whether the new size is larger than old size
and then use size_t as the result will be always positive.
The .hi_called member was dead structure member and it hasn't been used
since the overmem callback has been removed in commit
14bdd21e0a7ad5f115bb2427d4f88fe7a84e9324.
The jemalloc arena in isc_mem was added to solve runaway memory problem
for outgoing TCP connections. In the end, this was a red herring and
the jemalloc arena code is now unused (via e28266bf). Remove the
support for jemalloc memory arenas as we can restore this at any time if
we need it ever again, but right now it's just a dead code.
In jemalloc_shim.h, we relied on including <isc/overflow.h> implicitly
instead of explicitly and same was happening inside isc/overflow.h - the
stdbool.h (for bool type) was being included implicitly instead of
explicitly.