This commit allows specifying "disabled" or "off" in the
stale-answer-client-timeout statement. The logic to support this
behavior will be added in subsequent commits.
This commit also enforces an upper bound on stale-answer-client-timeout,
equal to one second less than 'resolver-query-timeout'.
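A minimal sketch of the parsing and upper bound described above, assuming both timeouts are expressed in milliseconds. The function name and the `STALE_CLIENT_TIMEOUT_DISABLED` sentinel are hypothetical, not named's actual code. This sketch clamps over-large values to the bound; the text does not say whether named clamps or rejects them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel for "disabled"/"off"; not necessarily how named
 * represents it internally. */
#define STALE_CLIENT_TIMEOUT_DISABLED UINT32_MAX

/*
 * Parse a stale-answer-client-timeout value given in milliseconds, or
 * one of the keywords "disabled"/"off", and cap numeric values at one
 * second less than resolver-query-timeout (assumed to be in
 * milliseconds too). Returns false if the text is neither a keyword
 * nor a plain decimal number.
 */
static bool
parse_stale_client_timeout(const char *text, uint32_t resolver_query_timeout_ms,
			   uint32_t *out) {
	assert(text != NULL && out != NULL);
	assert(resolver_query_timeout_ms > 1000);

	if (strcmp(text, "disabled") == 0 || strcmp(text, "off") == 0) {
		*out = STALE_CLIENT_TIMEOUT_DISABLED;
		return (true);
	}

	/* Reject signs and leading whitespace that strtoul() would accept. */
	if (text[0] < '0' || text[0] > '9') {
		return (false);
	}
	char *end = NULL;
	unsigned long value = strtoul(text, &end, 10);
	if (*end != '\0') {
		return (false);
	}

	/* Enforce the upper bound: resolver-query-timeout minus one second. */
	uint32_t upper = resolver_query_timeout_ms - 1000;
	*out = (value > upper) ? upper : (uint32_t)value;
	return (true);
}
```

For example, with a 10000 ms resolver-query-timeout, a configured value of 20000 becomes 9000, while "off" maps to the sentinel.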
After the addition of stale-answer-client-timeout, a test broke because
it expected the following behavior:
1. Prime the cache with data.example TXT.
2. Disable the authoritative server.
3. Send a query for data.example TXT.
4. The recursive server times out and answers from the cache with the
   stale RRset.
5. The recursive server activates stale-refresh-time due to the
   previous failure to refresh the RRset.
6. Send a query for data.example TXT.
7. Expect a stale answer from the cache because the stale-refresh-time
   window is active, even if the authoritative server is up.
The problem is that in step 4, due to the new
stale-answer-client-timeout option, the recursive server answers with
stale data before the actual fetch completes.
Since the original fetch is still running in the background, if we
re-enable the authoritative server during that time, the RRset is
actually refreshed successfully, and the stale-refresh-time window is
never activated. The test's subsequent checks then fail, because they
expect the TTL of the RRset to match the one in the stale cache, not
the one just refreshed.
To solve this, we explicitly disable stale-answer-client-timeout for
this test, as it is not the feature being tested here.
The general logic behind this new feature works as follows:
When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch and wait for its completion in fetch_callback.
With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback whenever stale
answers are enabled and the fetch takes longer than
stale-answer-client-timeout to complete.
When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the following happens:
1. Set up a new query context whose sole purpose is to look up stale
   RRset data only; for that purpose, a new flag,
   'DNS_DBFIND_STALEONLY', was added for use in database lookups.
   - If a stale RRset is found, mark the original client query as
     answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
     so that when the fetch completion event is received later, we
     avoid answering the client twice.
   - If a stale RRset is not found, clean up and wait for the normal
     fetch completion event.
2. In ns_query_done, we must change this part:

       /*
        * If we're recursing then just return; the query will
        * resume when recursion ends.
        */
       if (RECURSING(qctx->client)) {
               return (qctx->result);
       }

   To this:

       if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
               return (qctx->result);
       }

   Otherwise we would not proceed to answer the client if a stale
   answer was found during the stale-only lookup.
When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before (resuming the query, updating stats, etc.), but a few
exceptions had to be added, the two most important being:
1. Before answering the client (ns_client_send), check that the query
   has not already been answered.
2. Before detaching the client (e.g. via
   isc_nmhandle_detach(&client->reqhandle)), ensure that this is the
   fetch completion event and not the one triggered by
   stale-answer-client-timeout; a correct call would be:

       if (!QUERY_STALEONLY(client)) {
               isc_nmhandle_detach(&client->reqhandle);
       }
Apart from these notes, comments were added to the code in an attempt
to make these updates easier to follow.
This is a minor performance improvement: we store the result of the
first call to strlcat() and use it as an offset in the next call when
constructing the fctx->info string.
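The offset technique can be sketched as follows. Because strlcat() is not available in every C library, the sketch defines `local_strlcat()` with BSD semantics (it returns the length of the string it tried to create). `build_info()` and its "<name>/<type>" format are hypothetical and only loosely modeled on fctx->info. Passing `buf + off` means each later call starts at the current terminator instead of rescanning the whole string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in with BSD strlcat() semantics: append src to dst (total buffer
 * size `size`), NUL-terminate, and return the length of the string it
 * tried to create.
 */
static size_t
local_strlcat(char *dst, const char *src, size_t size) {
	size_t dlen = 0;
	while (dlen < size && dst[dlen] != '\0') {
		dlen++;
	}
	size_t slen = strlen(src);
	if (dlen == size) {
		return (size + slen); /* no terminator found: nothing appended */
	}
	size_t n = size - dlen - 1;
	if (n > slen) {
		n = slen;
	}
	memcpy(dst + dlen, src, n);
	dst[dlen + n] = '\0';
	return (dlen + slen);
}

/* Hypothetical helper building "<name>/<type>" into buf. */
static void
build_info(char *buf, size_t size, const char *name, const char *type) {
	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	size_t off = local_strlcat(buf, name, size);
	if (off >= size) {
		return; /* truncated; no room for the rest */
	}
	/* Append at buf + off so the call does not rescan the name. */
	off += local_strlcat(buf + off, "/", size - off);
	if (off >= size) {
		return;
	}
	local_strlcat(buf + off, type, size - off);
}
```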
The BIND 9 libraries are considered internal only, and hence their
API and ABI change a lot. Keeping track of the API/ABI changes takes
time and is complicated, as the safest way to keep everything stable
would be to bump every library in the dependency chain. In theory, if
libns links with libdns, a binary links with both, and we bump the
libdns SOVERSION but not the libns SOVERSION, the binary might load the
old libns, which pulls in the old libdns alongside the new libdns
loaded by the binary itself. The situation gets even more complicated
with plugins that were compiled against BIND 9 libraries a few
versions old and are then dynamically loaded into named.
We are picking the safest option that is usable for internal
libraries: instead of using -version-info, which has only a weak link
to the BIND 9 version number, we are using the -release libtool option,
which embeds the corresponding BIND 9 version number into the library
name. That means that instead of libisc.so.1701 (as an example), the
library will now be named libisc-9.17.10.so.
* Following the example set in 634bdfb16d, the tlsdns netmgr
  module now uses libuv and SSL primitives directly, rather than
  opening a TLS socket which opens a TCP socket, as the previous
  model was difficult to debug. Closes #2335.
* Remove the netmgr tls layer (we will have to re-add it for DoH).
* Add an isc_tls API to libisc that wraps the OpenSSL SSL_CTX object;
  move the OpenSSL initialization/deinitialization needed for
  OpenSSL 1.0.x from dstapi to isc_tls_{initialize,destroy}().
* Add a couple of new shims needed for OpenSSL 1.0.x.
* When LibreSSL is used, require at least version 2.7.0, which
  has the best OpenSSL 1.1.x compatibility and automatic init/deinit.
* Enforce OpenSSL 1.1.x usage on Windows.
* Add a TLSDNS unit test and implement a simple TLSDNS echo
  server and client.
The taskset command used for the cpu system test seems to be failing
under VMware, causing a test failure. We can try running the taskset
command first and skip the test if it doesn't work.
The 'filter-aaaa', 'filter-aaaa-on-v4', and 'filter-aaaa-on-v6' options
have been replaced by the filter-aaaa plugin. This plugin was introduced
in 9.13.5, so it is safe to remove the named.conf options.
When BIND 9 is compiled without lmdb, the lmdb-related configuration
options are now promoted from 'not operational' to 'not configured',
so setting them results in a failure (and no longer a warning).
Special-case certain system tests to avoid test failures on systems
that do not have lmdb.
These options were ancient or made obsolete a long time ago; it is
safe to remove them.
Also stop printing ancient options; they should be treated the same as
unknown options.
Removed options: lwres, geoip-use-ecs, sit-secret, use-ixfr,
acache-cleaning-interval, acache-enable, additional-from-auth,
additional-from-cache, allow-v6-synthesis, dnssec-enable,
max-acache-size, nosit-udp-size, queryport-pool-ports,
queryport-pool-updateinterval, request-sit, use-queryport-pool, and
support-ixfr.
The 'new default' clause flag was introduced in 2002 to signal that an
option had changed its default value, in this specific case the value
for 'auth-nxdomain'. However, this default has been unchanged for 18
years now, and logging that the default has changed no longer has
significant value.
This is also a good example of why the 'new default' clause flag is
broken: it easily gets out of date.
It is also easy to forget: we changed the default value for
'max-stale-ttl' and did not flag it with 'new default'.
Also, if operators care about a specific value, they should set it
explicitly. Using the default tells the software: use whatever you
think is best, and this may change over time. Default value changes
should be mentioned in the release notes, but do not require further
special treatment.
The clause flags 'not implemented' and 'not implemented yet' behave
the same as 'obsoleted'. These options will now be treated as
obsoleted (the idea being that if an option is implemented, it should
be functional).
The new DoT options are genuinely new; rather than flagging them as
obsolete, they should have been flagged as experimental, signalling
that these options are subject to change in the future.
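The clause-flag handling described in the last few paragraphs can be summarized as a mapping from flags to parser reactions. This is a sketch under stated assumptions: the flag names, bit values, and `check_clause_flags()` are hypothetical, and it assumes obsolete and experimental options produce a warning. The text itself only says that 'not implemented' now behaves like 'obsoleted', that ancient options are treated like unknown ones, and that 'not configured' results in a failure.

```c
#include <assert.h>

/* Hypothetical flag bits; the real parser's names and values differ. */
enum clause_flag {
	FLAG_OBSOLETE = 1 << 0,      /* accepted with a warning (assumed) */
	FLAG_NOTIMP = 1 << 1,        /* 'not implemented': same as obsolete */
	FLAG_NOTCONFIGURED = 1 << 2, /* feature not compiled in */
	FLAG_ANCIENT = 1 << 3,       /* removed long ago */
	FLAG_EXPERIMENTAL = 1 << 4,  /* functional, but may change */
};

enum verdict { VERDICT_ACCEPT, VERDICT_WARN, VERDICT_FAIL };

/* Map an option's clause flags to how the parser reacts when the
 * option is set. */
static enum verdict
check_clause_flags(unsigned int flags) {
	assert((flags & ~0x1fu) == 0); /* only the flags defined above */

	/* Ancient options behave like unknown ones, and unavailable
	 * features are a hard error rather than a warning. */
	if ((flags & (FLAG_ANCIENT | FLAG_NOTCONFIGURED)) != 0) {
		return (VERDICT_FAIL);
	}
	/* 'not implemented' is folded into 'obsolete'; experimental
	 * options work but are flagged as subject to change. */
	if ((flags & (FLAG_OBSOLETE | FLAG_NOTIMP | FLAG_EXPERIMENTAL)) != 0) {
		return (VERDICT_WARN);
	}
	return (VERDICT_ACCEPT);
}
```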
Some merge requests (e.g. those created for release branches) include
merge commits. Prevent Danger from warning about excessive subject line
length for merge commits. (While the proper way to detect a merge
commit would be to check the 'parents' attribute of a commit object,
Danger Python does not seem to populate that attribute, so a simple
string search is performed on the commit subject instead.)
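The subject-length exemption can be sketched as below. The actual Danger rules are written in Python; to keep one language across these examples, the logic is shown in C. The "Merge branch " prefix (what GitLab-generated merge commit subjects typically start with) and the 72-character limit are assumptions, not values taken from the Danger configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical threshold; the actual Danger limit may differ. */
#define MAX_SUBJECT_LEN 72

/* Assumed marker for merge commits: a "Merge branch " subject prefix.
 * Checking the commit's parents would be more robust, but that
 * attribute is not populated in this setting. */
static bool
is_merge_commit(const char *subject) {
	static const char prefix[] = "Merge branch ";
	assert(subject != NULL);
	return (strncmp(subject, prefix, sizeof(prefix) - 1) == 0);
}

/* Flag long subjects, except for merge commits. */
static bool
subject_too_long(const char *subject) {
	if (is_merge_commit(subject)) {
		return (false);
	}
	return (strlen(subject) > MAX_SUBJECT_LEN);
}
```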
The Danger GitLab CI job currently flags excessively long lines in
commit log messages. Exclude lines containing references (i.e. starting
with "[1]", "[2]", etc.) from this check. This allows e.g. long URLs to
be included in commit log messages without triggering Danger warnings.
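A sketch of the reference-line exemption, again in C rather than the Danger rules' Python. A line counts as a reference if it starts with "[", one or more digits, and "]"; the 72-character limit is a hypothetical value.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit; the actual Danger threshold may differ. */
#define MAX_BODY_LINE_LEN 72

/* A reference line starts with "[<digits>]", e.g. "[1] https://...". */
static bool
is_reference_line(const char *line) {
	assert(line != NULL);
	if (line[0] != '[') {
		return (false);
	}
	size_t i = 1;
	while (isdigit((unsigned char)line[i])) {
		i++;
	}
	return (i > 1 && line[i] == ']');
}

/* Flag overly long body lines, skipping reference lines. */
static bool
body_line_too_long(const char *line) {
	return (!is_reference_line(line) && strlen(line) > MAX_BODY_LINE_LEN);
}
```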
The Danger GitLab CI job currently generates a separate error message
about fixup commits being present in a merge request for every such
commit found. Prevent that by logging that error message only once per
run.
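A sketch of the once-per-run behavior, written in C for consistency with the other examples. Detecting fixup commits by the "fixup!" subject prefix (as produced by `git commit --fixup`) is an assumption, and the error message text is hypothetical. The loop stops at the first match, so at most one error is logged regardless of how many fixup commits exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns the number of errors logged (0 or 1). */
static int
report_fixup_commits(const char *const *subjects, size_t count) {
	static const char prefix[] = "fixup!";
	assert(subjects != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (strncmp(subjects[i], prefix, sizeof(prefix) - 1) == 0) {
			fprintf(stderr, "Fixup commits are still present in "
					"this merge request.\n");
			return (1); /* one message per run is enough */
		}
	}
	return (0);
}
```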