2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-26 03:58:15 +00:00

240 Commits

Author SHA1 Message Date
Michał Kępień
29d8d35869 Tweak query_addds() comments to avoid confusion
It has been noticed that commit 7a87bf468b9e092bf65db55a8e9234853c7db63d
did not only fix NSEC record handling in signed, insecure delegations
prepared using both wildcard expansion and CNAME chaining - it also
inadvertently fixed DS record handling in signed, secure delegations
of that flavor.  This is because the 'rdataset' variable in the relevant
location in query_addds() can be either a DS RRset or an NSEC RRset.
Update a code comment in query_addds() to avoid confusion.

Update the comments describing the purpose of query_addds() so that they
also mention NSEC(3) records.
2021-07-16 07:20:15 +02:00
Ondřej Surý
2bb454182b Make the DNS over HTTPS support optional
This commit adds two new autoconf options `--enable-doh` (enabled by
default) and `--with-libnghttp2` (mandatory when DoH is enabled).

When DoH support is disabled the library is not linked-in and support
for http(s) protocol is disabled in the netmgr, named and dig.
2021-07-07 09:50:53 +02:00
Ondřej Surý
4677bb28d1 Remove atomics emulated by a mutex-locked variable
Mutex atomics were intended to be used as a debugging tool only
and it has already served its purpose and it's not needed anymore.
2021-06-17 09:51:04 +02:00
Artem Boldariev
b84fa122ce Make BIND refuse to serve XFRs over DoH
We cannot use DoH for zone transfers.  According to RFC8484 a DoH
request contains exactly one DNS message (see Section 6: Definition of
the "application/dns-message" Media Type,
https://datatracker.ietf.org/doc/html/rfc8484#section-6).  This makes
DoH unsuitable for zone transfers as often (and usually!) these need
more than one DNS message, especially for larger zones.

As zone transfers over DoH are not (yet) standardised, nor discussed
in RFC8484, the best thing we can do is to return "not implemented."

Technically DoH can be used to transfer small zones which fit in one
message, but that is not enough for the generic case.

Also, this commit makes the server-side DoH code ensure that no
multiple responses could be attempted to be sent over one HTTP/2
stream. In HTTP/2 one stream is mapped to one request/response
transaction. Now the write callback will be called with failure error
code in such a case.
2021-06-14 11:37:36 +03:00
Michał Kępień
7a87bf468b Fix "no DS" proofs for wildcard+CNAME delegations
When answering a query requires wildcard expansion, the AUTHORITY
section of the response needs to include NSEC(3) record(s) proving that
the QNAME does not exist.

When a response to a query is an insecure delegation, the AUTHORITY
section needs to include an NSEC(3) proof that no DS record exists at
the parent side of the zone cut.

These two conditions combined trip up the NSEC part of the logic
contained in query_addds(), which expects the NS RRset to be owned by
the first name found in the AUTHORITY section of a delegation response.
This may not always be true, for example if wildcard expansion causes an
NSEC record proving QNAME nonexistence to be added to the AUTHORITY
section before the delegation is added to the response.  In such a case,
named incorrectly omits the NSEC record proving nonexistence of QNAME
from the AUTHORITY section.

The same block of code is affected by another flaw: if the same NSEC
record proves nonexistence of both the QNAME and the DS record at the
parent side of the zone cut, this NSEC record will be added to the
AUTHORITY section twice.

Fix by looking for the NS RRset in the entire AUTHORITY section and
adding the NSEC record to the delegation using query_addrrset() (which
handles duplicate RRset detection).
2021-06-10 10:13:23 +02:00
Kevin Chen
0cdf85d204 Several serve-stale improvements
Commit a83c8cb0afd88d54b9cf67239f2495c9b0391e97 updated masterdump so
that stale records in "rndc dumpdb" output no longer shows 0 TTLs.  In
this commit we change the name of the `rdataset->stale_ttl` field to
`rdataset->expired` to make its purpose clearer, and set it to zero in
cases where it's unused.

Add 'rbtdb->serve_stale_ttl' to various checks so that stale records
are not purged from the cache when they've been stale for RBTDB_VIRTUAL
(300) seconds.

Increment 'ns_statscounter_usedstale' when a stale answer is used.

Note: There was a question of whether 'overmem_purge' should be
purging ancient records, instead of stale ones.  It is left as purging
stale records, since stale records could take up the majority of the
cache.

This submission is copyrighted Akamai Technologies, Inc. and provided
under an MPL 2.0 license.

This commit was originally authored by Kevin Chen, and was updated by
Matthijs Mekking to match recent serve-stale developments.
2021-05-30 11:45:35 -07:00
Matthijs Mekking
c0dc5937c7 Reset DNS_FETCHOPT_TRYSTALE_ONTIMEOUT on resume
Once we resume a query, we should clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT
from the options to prevent triggering the stale-answer-client-timeout
on subsequent fetches.

If we don't this may cause a crash when for example when prefetch is
triggered after a query restart.
2021-05-30 00:03:51 -07:00
Evan Hunt
8bd8e995f1 clean up query correctly if already answered by serve-stale
when a serve-stale answer has been sent, the client continues waiting
for a proper answer. if a final completion event for the client does
arrive, it can just be cleaned up without sending a response, similar
to a canceled fetch.
2021-05-27 10:35:48 -07:00
Ondřej Surý
28b65d8256 Reduce the number of clientmgr objects created
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.

We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).

Each clientmgr object than would have a single task and single memory
context.
2021-05-24 20:44:54 +02:00
Evan Hunt
b0aadaac8e rename dns_name_copynf() to dns_name_copy()
dns_name_copy() is now the standard name-copying function.
2021-05-22 00:37:27 -07:00
Ondřej Surý
ce3e1abc1d Use dns_name_copynf() with dns_message_gettempname() when needed
dns_message_gettempname() returns an initialized name with a dedicated
buffer, associated with a dns_fixedname object.  Using dns_name_copynf()
to write a name into this object will actually copy the name data
from a source name. dns_name_clone() merely points target->ndata to
source->ndata, so it is faster, but it can lead to a use-after-free if
the source is freed before the target object is released via
dns_message_puttempname().

In a few places, clone was being used where copynf should have been;
this is now fixed.

As a side note, no memory was lost, because the ndata buffer used in
the dns_fixedname_t is internal to the structure, and is freed when
the dns_fixedname_t is freed regardless of the .ndata contents.
2021-05-21 21:28:10 -07:00
Evan Hunt
e31cc1eeb4 use a fixedname buffer in dns_message_gettempname()
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
2021-05-20 20:41:29 +02:00
Matthijs Mekking
66f2cd228d Use isdigit instead of checking character range
When looking for key files, we could use isdigit rather than checking
if the character is within the range [0-9].

Use (unsigned char) cast to ensure the value is representable in the
unsigned char type (as suggested by the isdigit manpage).

Change " & 0xff" occurrences to the recommended (unsigned char) type
cast.
2021-05-05 19:15:33 +02:00
Mark Andrews
c1190a3fe0 Handle DNAME lookup via itself
When answering a query, named should never attempt to add the same RRset
to the ANSWER section more than once.  However, such a situation may
arise when chasing DNAME records: one of the DNAME records placed in the
ANSWER section may turn out to be the final answer to a client query,
but there is no way to know that in advance.  Tweak the relevant INSIST
assertion in query_respond() so that it handles this case properly.
qctx->rdataset is freed later anyway, so there is no need to clean it up
in query_respond().
2021-04-29 10:30:00 +02:00
Matthijs Mekking
104b676235 Serve-stale nit fixes
While working on the serve-stale backports, I noticed the following
oddities:

1. In the serve-stale system test, in one case we keep track of the
   time how long it took for dig to complete. In commit
   aaed7f9d8c2465790d769221dfe8378c7147f5eb, the code removed the
   exception to check for result == ISC_R_SUCCESS on stale found
   answers, and adjusted the test accordingly. This failed to update
   the time tracking accordingly. Move the t1/t2 time track variables
   back around the two dig commands to ensure the lookups resolved
   faster than the resolver-query-timeout.

2. We can remove the setting of NS_QUERYATTR_STALEOK and
   DNS_RDATASETATTR_STALE_ADDED on the "else if (stale_timeout)"
   code path, because they are added later when we know we have
   actually found a stale answer on a stale timeout lookup.

3. We should clear the NS_QUERYATTR_STALEOK flag from the client
   query attributes instead of DNS_RDATASETATTR_STALE_ADDED (that
   flag is set on the rdataset attributes).

4. In 'bin/named/config.c' we should set the configuration options
   in alpabetical order.

5. In the ARM, in the backports we have added "(stale)" between
   "cached" and "RRset" to make more clear a stale RRset may be
   returned in this scenario.
2021-04-28 12:24:24 +02:00
Matthijs Mekking
3d3a6415f7 If RPZ config'd, bail stale-answer-client-timeout
When we are recursing, RPZ processing is not allowed. But when we are
performing a lookup due to "stale-answer-client-timeout", we are still
recursing. This effectively means that RPZ processing is disabled on
such a lookup.

In this case, bail the "stale-answer-client-timeout" lookup and wait
for recursion to complete, as we we can't perform the RPZ rewrite
rules reliably.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
839df94190 Rename "staleonly"
The dboption DNS_DBFIND_STALEONLY caused confusion because it implies
we are looking for stale data **only** and ignore any active RRsets in
the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear
the option is related to a lookup due to "stale-answer-client-timeout".

Rename other usages of "staleonly", instead use "lookup due to...".
Also rename related function and variable names.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
3f81d79ffb Restore the RECURSIONOK attribute after staleonly
When doing a staleonly lookup we don't want to fallback to recursion.
After all, there are obviously problems with recursion, otherwise we
wouldn't do a staleonly lookup.

When resuming from recursion however, we should restore the
RECURSIONOK flag, allowing future required lookups for this client
to recurse.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
aaed7f9d8c Remove result exception on staleonly lookup
When implementing "stale-answer-client-timeout", we decided that
we should only return positive answers prematurely to clients. A
negative response is not useful, and in that case it is better to
wait for the recursion to complete.

To do so, we check the result and if it is not ISC_R_SUCCESS, we
decide that it is not good enough. However, there are more return
codes that could lead to a positive answer (e.g. CNAME chains).

This commit removes the exception and now uses the same logic that
other stale lookups use to determine if we found a useful stale
answer (stale_found == true).

This means we can simplify two test cases in the serve-stale system
test: nodata.example is no longer treated differently than data.example.
2021-04-02 10:02:40 +02:00
Matthijs Mekking
3d5429f61f Remove INSIST on NS_QUERYATTR_ANSWERED
The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response
twice. Without the attribute, this may happen if a staleonly lookup
found a useful answer and sends a response to the client, and later
recursion ends and also tries to send a response.

The attribute was also used to mask adding a duplicate RRset. This is
considered harmful. When we created a response to the client with a
stale only lookup (regardless if we actually have send the response),
we should clear the rdatasets that were added during that lookup.

Mark such rdatasets with the a new attribute,
DNS_RDATASETATTR_STALE_ADDED. Set a query attribute
NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale
only lookup. Before creating a response on a normal lookup, check if
we can expect rdatasets to have been added during a staleonly lookup.
If so, clear the rdatasets from the message with the attribute
DNS_RDATASETATTR_STALE_ADDED set.
2021-04-02 09:15:07 +02:00
Matthijs Mekking
48b0dc159b Simplify when to detach the client
With stale-answer-client-timeout, we may send a response to the client,
but we may want to hold on to the network manager handle, because
recursion is going on in the background, or we need to refresh a
stale RRset.

Simplify the setting of 'nodetach':
* During a staleonly lookup we should not detach the nmhandle, so just
  set it prior to 'query_lookup()'.
* During a staleonly "stalefirst" lookup set the 'nodetach' to true
  if we are going to refresh the RRset.

Now there is no longer the need to clear the 'nodetach' if we go
through the "dbfind_stale", "stale_refresh_window", or "stale_only"
paths.
2021-04-02 09:14:09 +02:00
Matthijs Mekking
92f7a67892 Refactor stale lookups, ignore active RRsets
When doing a staleonly lookup, ignore active RRsets from cache. If we
don't, we may add a duplicate RRset to the message, and hit an
assertion failure in query.c because adding the duplicate RRset to the
ANSWER section failed.

This can happen on a race condition. When a client query is received,
the recursion is started. When 'stale-answer-client-timeout' triggers
around the same time the recursion completes, the following sequence
of events may happen:
1. Queue the "try stale" fetch_callback() event to the client task.
2. Add the RRsets from the authoritative response to the cache.
3. Queue the "fetch complete" fetch_callback() event to the client task.
4. Execute the "try stale" fetch_callback(), which retrieves the
   just-inserted RRset from the database.
5. In "ns_query_done()" we are still recursing, but the "staleonly"
   query attribute has already been cleared. In other words, the
   query will resume when recursion ends (it already has ended but is
   still on the task queue).
6. Execute the "fetch complete" fetch_callback(). It finds the answer
   from recursion in the cache again and tries to add the duplicate to
   the answer section.

This commit changes the logic for finding stale answers in the cache,
such that on "stale_only" lookups actually only stale RRsets are
considered. It refactors the code so that code paths for "dbfind_stale",
"stale_refresh_window", and "stale_only" are more clear.

First we call some generic code that applies in all three cases,
formatting the domain name for logging purposes, increment the
trystale stats, and check if we actually found stale data that we can
use.

The "dbfind_stale" lookup will return SERVFAIL if we didn't found a
usable answer, otherwise we will continue with the lookup
(query_gotanswer()). This is no different as before the introduction of
"stale-answer-client-timeout" and "stale-refresh-time".

The "stale_refresh_window" lookup is similar to the "dbfind_stale"
lookup: return SERVFAIL if we didn't found a usable answer, otherwise
continue with the lookup (query_gotanswer()).

Finally the "stale_only" lookup.

If the "stale_only" lookup was triggered because of an actual client
timeout (stale-answer-client-timeout > 0), and if database lookup
returned a stale usable RRset, trigger a response to the client.
Otherwise return and wait until the recursion completes (or the
resolver query times out).

If the "stale_only" lookup is a "stale-anwer-client-timeout 0" lookup,
preferring stale data over a lookup. In this case if there was no stale
data, or the data was not a positive answer, retry the lookup with the
stale options cleared, a.k.a. a normal lookup. Otherwise, continue
with the lookup (query_gotanswer()) and refresh the stale RRset. This
will trigger a response to the client, but will not detach the handle
because a fetch will be created to refresh the RRset.
2021-04-02 09:14:09 +02:00
Matthijs Mekking
fee164243f Keep track of allow client detach
The stale-answer-client-timeout feature introduced a dependancy on
when a client may be detached from the handle. The dboption
DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads
the meaning of this database option, and actually introduced a bug
because the option was checked in other places. In particular, in
'ns_query_done()' there is a check for 'RECURSING(qctx->client) &&
(!QUERY_STALEONLY(&qctx->client->query) || ...' and the condition is
satisfied because recursion has not completed yet and
DNS_DBFIND_STALEONLY is already cleared by that time (in
query_lookup()), because we found a useful answer and we should detach
the client from the handle after sending the response.

Add a new boolean to the client structure to keep track of client
detach from handle is allowed or not. It is only disallowed if we are
in a staleonly lookup and we didn't found a useful answer.
2021-04-02 09:14:09 +02:00
Matthijs Mekking
87591de6f7 Fix servestale fetchlimits crash
When we query the resolver for a domain name that is in the same zone
for which is already one or more fetches outstanding, we could
potentially hit the fetch limits. If so, recursion fails immediately
for the incoming query and if serve-stale is enabled, we may try to
return a stale answer.

If the resolver is also is authoritative for the parent zone (for
example the root zone), first a delegation is found, but we first
check the cache for a better response.

Nothing is found in the cache, so we try to recurse to find the
answer to the query.

Because of fetch-limits 'dns_resolver_createfetch()' returns an error,
which 'ns_query_recurse()' propagates to the caller,
'query_delegation_recurse()'.

Because serve-stale is enabled, 'query_usestale()' is called,
setting 'qctx->db' to the cache db, but leaving 'qctx->version'
untouched. Now 'query_lookup()' is called to search for stale data
in the cache database with a non-NULL 'qctx->version'
(which is set to a zone db version), and thus we hit an assertion
in rbtdb.

This crash was introduced in 'main' by commit
8bcd7fe69e5343071fc917738d6092a8b974ef3f.
2021-03-11 12:16:14 +01:00
Ondřej Surý
a0181056a8 Change the isc_thread_self() return type to uintptr_t
The pthread_self(), thrd_current() or GetCurrentThreadId() could
actually be a pointer, so we should rather convert the value into
uintptr_t instead of unsigned long.
2021-02-25 16:21:10 +01:00
Matthijs Mekking
f8b7b597e9 Don't servfail on staleonly lookups
When a staleonly lookup doesn't find a satisfying answer, it should
not try to respond to the client.

This is not true when the initial lookup is staleonly (that is when
'stale-answer-client-timeout' is set to 0), because no resolver fetch
has been created at this point. In this case continue with the lookup
normally.
2021-02-25 11:32:17 +01:00
Matthijs Mekking
9e061faaae Don't allow recursion on staleonly lookups
Fix a crash that can happen in the following scenario:

A client request is received. There is no data for it in the cache,
(not even stale data). A resolver fetch is created as part of
recursion.

Some time later, the fetch still hasn't completed, and
stale-answer-client-timeout is triggered. A staleonly lookup is
started. It will also find no data in the cache.

So 'query_lookup()' will call 'query_gotanswer()' with ISC_R_NOTFOUND,
so this will call 'query_notfound()' and this will start recursion.

We will eventually end up in 'ns_query_recurse()' and that requires
the client query fetch to be NULL:

    REQUIRE(client->query.fetch == NULL);

If the previously started fetch is still running this assertion
fails.

The crash is easily prevented by not requiring recursion for
staleonly lookups.

Also remove a redundant setting of the staleonly flag at the end of
'query_lookup_staleonly()' before destroying the query context.

Add a system test to catch this case.
2021-02-25 11:32:17 +01:00
Matthijs Mekking
8bcd7fe69e Use stale on error also when unable to recurse
The 'query_usestale()' function was only called when in
'query_gotanswer()' and an unexpected error occurred. This may have
been "quota reached", and thus we were in some cases returning
stale data on fetch-limits (and if serve-stale enabled of course).

But we can also hit fetch-limits when recursing because we are
following a referral (in 'query_notfound()' and
'query_delegation_recurse()'). Here we should also check for using
stale data in case an error occurred.

Specifically don't check for using stale data when refetching a
zero TTL RRset from cache.

Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()'
function to avoid code duplication.
2021-02-08 15:17:09 +01:00
Matthijs Mekking
aabdedeae3 Only start stale refresh window when resuming
If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.

Introduce a new flag DNS_DBFIND_STALESTART to differentiate between
a resolver failure and unexpected error. If we are resuming, this
indicates a resolver failure, then start the stale-refresh-time window,
otherwise don't start the stale-refresh-time window, but still fall
back to using stale data.

(This commit also wraps some docstrings to 80 characters width)
2021-01-28 16:38:34 +01:00
Matthijs Mekking
c6fd02aed5 Use stale data also if we are not resuming
Before this change, BIND will only fallback to using stale data if
there was an actual attempt to resolve the query. Then on a timeout,
the stale data from cache becomes eligible.

This commit changes this so that on any unexpected error stale data
becomes eligble (you would still have to have 'stale-answer-enable'
enabled of course).

If there is no stale data, this may return in an error again, so don't
loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is
set, this means we already tried to lookup stale data, so if that is
the case, don't use stale again.
2021-01-28 16:36:46 +01:00
Matthijs Mekking
fa0c9280d2 Update code flow in query.c wrt stale data
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.

Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generic in the code block
when dealing with stale data.

Putting all stale case handling in the code block when dealing with
stale data makes the code more easy to follow.

Update documentation to be more verbose and to match then new code
flow.
2021-01-25 10:48:16 -03:00
Diego Fronza
966060c03b Extracted common function from query_lookup and query_refresh_rrset
Both functions employed the same code lines to allocate query context
buffers, which are used to store query results, so this shared portion
of code was extracted out to a new function, qctx_prepare_buffers.

Also, this commit uses qctx_init to initialize the query context whitin
query_refresh_rrset function.
2021-01-25 10:48:16 -03:00
Diego Fronza
f89ac07b28 Small optimization in query_usestale
This commit makes the code in query_usestale easier to follow, it also
doesn't attach/detach to the database if stale answers are not enabled.
2021-01-25 10:48:16 -03:00
Diego Fronza
e219422575 Allow stale data to be used before name resolution
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.

For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
  RRset due to timeout (resolver-query-timeout), its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in configuration.

  This flag is also used to activate stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also during this window, the resolver will
  not try to resolve the query, in other words no attempt to refresh the
  data in cache is made when the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution,
  it also doesn't require stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

	1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
       respond the client with the data found, mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care of not detaching from the client, as a
       fetch will still be running in background, this is handled by the
       following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is algo used to activate
       stale-refresh-time window, as it means that we failed to refresh
       a RRset due to timeout.
       It is ok in this situation to detach from the client, as the
       fetch is already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset we respond to the client and mark the
       query as answered, but don't detach from the client yet as an
       attempt in refreshing the RRset will still be made by means of
       the new introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.
2021-01-25 10:47:14 -03:00
Diego Fronza
171a5b7542 Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.
2021-01-25 10:47:14 -03:00
Diego Fronza
74840ec50b Added dns_view_staleanswerenabled() function
Since it takes a couple lines of code to check whether stale answers
are enabled for a given view, code was extracted out to a proper
function.
2021-01-25 10:47:14 -03:00
Matthijs Mekking
87744f218d Remove a lot of obsoleted options
These options were ancient or made obsolete a long time ago, it is
safe to remove them.

Also stop printing ancient options, they should be treated the same as
unknown options.

Removed options: lwres, geoip-use-ecs, sit-secret, use-ixfr,
acache-cleaning-interval, acache-enable, additional-from-auth,
additional-from-cache, allow-v6-synthesis, dnssec-enable,
max-acache-size, nosit-udp-size, queryport-pool-ports,
queryport-pool-updateinterval, request-sit, use-queryport-pool, and
support-ixfr.
2021-01-19 10:12:40 +01:00
JINMEI Tatuya
d520f01c7b detach fetchhandle before resume query processing
otherwise, another hook async event or DNS recursion would
trigger an assertion failure.
2021-01-06 13:14:13 -08:00
Ondřej Surý
7ba18870dc Reformat sources using clang-format-11 2020-12-08 18:36:23 +01:00
Diego Fronza
95add01643 Silence coverity warnings in query.c
Return value of dns_db_getservestalerefresh() and
dns_db_getservestalettl() functions were previously unhandled.

This commit purposefully ignore those return values since there is
no side effect if those results are != ISC_R_SUCCESS, it also supress
Coverity warnings.
2020-11-26 14:55:14 +00:00
JINMEI Tatuya
75cdd758ed implementation of hook-based asynchronous functionality
previously query plugins were strictly synchrounous - the query
process would be interrupted at some point, data would be looked
up or a change would be made, and then the query processing would
resume immediately.

this commit enables query plugins to initiate asynchronous processes
and resume on a completion event, as with recursion.
2020-11-24 15:11:39 -08:00
JINMEI Tatuya
9c8dae041d ns_query refactoring for hook-based recursion
several small changes to query processing to make it easier to
use hook-based recursion (and other asynchronous functionlity)
later.

- recursion quota check is now a separate function,
  check_recursionquota(), which is called by ns_query_recurse().
- pass isc_result to query_nxdomain() instead of bool.
  the value of 'empty_wild' will be determined in the function
  based on the passed result.  this is similar to query_nodata(),
  and makes the signatures of the two functions more consistent.
- pass the current 'result' value into plugin hooks.
2020-11-24 15:11:39 -08:00
Mark Andrews
e980affba0 Fix DNAME when QTYPE is CNAME or ANY
The synthesised CNAME is not supposed to be followed when the
QTYPE is CNAME or ANY as the lookup is satisfied by the CNAME
record.
2020-11-19 10:18:01 +11:00
Diego Fronza
4827ad0ec4 Add stale-refresh-time option
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution fails for any reason, only then BIND would
check for stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' is on).

The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed in the next seconds.

A better approach to improve performance in those cases, is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.

Only when this interval expires we then try to do a normal refresh of
the rrset.

The logic behind this commit is as following:

- In query.c / query_gotanswer(), if the test of 'result' variable falls
  to the default case, an error is assumed to have happened, and a call
  to 'query_usestale()' is made to check if serving of stale rrset is
  enabled in configuration.

- If serving of stale answers is enabled, a flag will be turned on in
  the query context to look for stale records:
  query.c:6839
  qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;

- A call to query_lookup() will be made again, inside it a call to
  'dns_db_findext()' is made, which in turn will invoke rbdb.c /
  cache_find().

- In rbtdb.c / cache_find() the important bits of this change is the
  call to 'check_stale_header()', which is a function that yields true
  if we should skip the stale entry, or false if we should consider it.

- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
  is set, if that is the case we know that this new search for stale
  records was made due to a failure in a normal resolution, so we keep
  track of the time in which the failured occured in rbtdb.c:4559:
  header->last_refresh_fail_ts = search->now;

- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
  know this is a normal lookup, if the record is stale and the query
  time is between last failure time + stale-refresh-time window, then
  we return false so cache_find() knows it can consider this stale
  rrset entry to return as a response.

The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh

Those were added so rbtdb can be aware of the value set in configuration
option, since in that level we have no access to the view object.
2020-11-11 12:53:23 -03:00
Diego Fronza
d727eaae6c Always return address records in additional section for NS queries 2020-10-21 12:03:42 -03:00
Evan Hunt
86eddebc83 Purge memory pool upon plugin destruction
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:

 1. Original query context is created.
 2. An AAAA RRset is found in cache.
 3. Client-specific data is allocated from the filter-aaaa memory pool.
 4. Recursion is triggered for an A RRset.
 5. Original query context is torn down.

 6. Recursion for an A RRset completes.
 7. A second query context is created.
 8. Client-specific data is retrieved from the filter-aaaa memory pool.
 9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.

However, steps 6-12 are not executed if recursion for an A RRset is
canceled.  Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released.  This in turn
leads to a crash since every memory pool must not have any outstanding
allocations by the time isc_mempool_destroy() is called.

Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.
2020-09-25 13:32:34 -07:00
Mark Andrews
f0d9bf7c30 Clone the saved / query message buffers
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call.  Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.
2020-09-23 10:37:42 +10:00
Ondřej Surý
d4976e0ebe Add separate prefetch nmhandle to ns_client_t
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records.  The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.
2020-09-22 09:56:26 +02:00
Evan Hunt
dcee985b7f update all copyright headers to eliminate the typo 2020-09-14 16:20:40 -07:00
Evan Hunt
57b4dde974 change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.

A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:

        - reqhandle:    held while waiting for a request callback (query,
                        notify, update)
        - sendhandle:   held while waiting for a send callback
        - fetchhandle:  held while waiting for a recursive fetch to
                        complete
        - updatehandle: held while waiting for an update-forwarding
                        task to complete

control channel connection objects now contain:

        - readhandle: held while waiting for a read callback
        - sendhandle: held while waiting for a send callback
        - cmdhandle:  held while an rndc command is running

httpd connections contain:

        - readhandle: held while waiting for a read callback
        - sendhandle: held while waiting for a send callback
2020-09-11 12:17:57 -07:00