Make the code block handling the --with-libidn2=/path/to/libidn2 form of
the --with-libidn2 build-time option behave more similarly to the
PKG_CHECK_MODULES() macro.
The DLZ_DRIVER_MYSQL_INCLUDES and DLZ_DRIVER_MYSQL_LIBS build variables
are not used anywhere. Remove their definitions and the associated
AC_SUBST() calls.
LMDB build variables are already substituted by AC_SUBST() calls in
configure.ac and therefore the latter should not be duplicated in the
AX_LIB_LMDB() helper macro.
The "stress" test can be run in different ways, depending on:
- the tested scenario (authoritative, recursive),
- the operating system used (Linux, FreeBSD),
- the architecture used (amd64, arm64).
Currently, all supported "stress" test variants are automatically
launched for all scheduled pipelines and for pipelines started for tags;
there is no possibility of running these tests on demand, which could be
useful in certain circumstances.
Employ the "only:variables" key to enable fine-grained control over the
list of "stress" test jobs to be run for a given pipeline. Three CI
variables are used to specify the list of "stress" test jobs to create:
- BIND_STRESS_TEST_MODE: specifies the test mode to use; must be
explicitly set in order for any "stress" test job to be created;
allowed values are: "authoritative", "recursive",
- BIND_STRESS_TEST_OS: specifies the operating system to run the test
on; allowed values are: "linux", "freebsd"; defaults to "linux", may
be overridden at pipeline creation time,
- BIND_STRESS_TEST_ARCH: specifies the architecture to run the test
on; allowed values are: "amd64", "arm64"; defaults to "amd64", may
be overridden at pipeline creation time.
Since case-insensitive regular expressions are used for determining
which jobs to run, every variable described above may contain multiple
values. For example, setting the BIND_STRESS_TEST_MODE variable to
"authoritative,recursive" will cause the "stress" test to be run in both
supported scenarios (either on the default OS/architecture combination,
i.e. Linux/amd64, or, if the relevant variables are explicitly
specified, the requested OS/architecture combinations).
When calling the high level netmgr functions, the callback would be
sometimes called synchronously if we catch the failure directly, or
asynchronously if it happens later. The synchronous call to the
callback could create deadlocks as the caller would not expect the
failed callback to be executed directly.
Add one test that checks the behavior when serve-stale is enabled
via configuration (as opposed to enabled via rndc).
Add one test that checks the behavior when stale-refresh-time is
disabled (set to 0).
Using a 'stale-answer-ttl' the same value as the authoritative ttl
value makes it hard to differentiate between a response from the
stale cache and a response from the authoritative server.
Change the stale-answer-ttl from 2 to 4, so that it differs from the
authoritative ttl.
The strategy of running many dig commands in parallel and
waiting for the respective output files to be non empty was
resulting in random test failures, hard to reproduce, where
it was possible that the subsequent reading of the files could
have been failing due to the file's content not being fully flushed.
Instead of checking if output files are non empty, we now wait
for the dig processes to finish.
This test works as follow:
- Query for data.example rrset.
- Sleep until its TTL expires (2 secs).
- Disable authoritative server.
- Query for data.example again.
- Since server is down, answer come from stale cache, which has
a configured stale-answer-ttl of 3 seconds.
- Enable authoritative server.
- Query for data.example again
- Since last query before activating authoritative server failed, and
since 'stale-refresh-time' seconds hasn't elapsed yet, answer should
come from stale cache and not from the authoritative server.
Before the stale-refresh-time feature, the system test for ancient rrset
was somewhat based on the average time the previous tests and queries
were taking, thus not very precise.
After the addition of stale-refresh-time the system test for ancient
rrset started to fail since the queries for stale records (low
max-stale-ttl) were not taking the time to do a full resolution
anymore, since the answers now were coming from the cache (because the
rrset were stale and within stale-refresh-time window after the
previous resolution failure).
To handle this, the correct time to wait before rrset become ancient is
calculated from max-stale-ttl configuration plus the TTL set in the
rrset used in the tests (ans2/ans.pl).
Then before sending queries for ancient rrset, we check if we need to
sleep enough to ensure those rrset will be marked as ancient.
RFC 8767 recommends that attempts to refresh to be done no more
frequently than every 30 seconds.
Added check into named-checkconf, which will warn if values below the
default are found in configuration.
BIND will also log the warning during loading of configuration in the
same fashion.
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution fails for any reason, only then BIND would
check for stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' is on).
The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed in the next seconds.
A better approach to improve performance in those cases, is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.
Only when this interval expires we then try to do a normal refresh of
the rrset.
The logic behind this commit is as following:
- In query.c / query_gotanswer(), if the test of 'result' variable falls
to the default case, an error is assumed to have happened, and a call
to 'query_usestale()' is made to check if serving of stale rrset is
enabled in configuration.
- If serving of stale answers is enabled, a flag will be turned on in
the query context to look for stale records:
query.c:6839
qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;
- A call to query_lookup() will be made again, inside it a call to
'dns_db_findext()' is made, which in turn will invoke rbdb.c /
cache_find().
- In rbtdb.c / cache_find() the important bits of this change is the
call to 'check_stale_header()', which is a function that yields true
if we should skip the stale entry, or false if we should consider it.
- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
is set, if that is the case we know that this new search for stale
records was made due to a failure in a normal resolution, so we keep
track of the time in which the failured occured in rbtdb.c:4559:
header->last_refresh_fail_ts = search->now;
- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
know this is a normal lookup, if the record is stale and the query
time is between last failure time + stale-refresh-time window, then
we return false so cache_find() knows it can consider this stale
rrset entry to return as a response.
The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh
Those were added so rbtdb can be aware of the value set in configuration
option, since in that level we have no access to the view object.
This commit adds couple of additional safeguards against running
sends/reads on inactive sockets. The changes was modeled after the
changes we made to netmgr/tcpdns.c