2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 13:08:06 +00:00

170 Commits

Author SHA1 Message Date
Matthijs Mekking
48b0dc159b Simplify when to detach the client
With stale-answer-client-timeout, we may send a response to the client,
but we may want to hold on to the network manager handle, because
recursion is going on in the background, or we need to refresh a
stale RRset.

Simplify the setting of 'nodetach':
* During a staleonly lookup we should not detach the nmhandle, so just
  set it prior to 'query_lookup()'.
* During a staleonly "stalefirst" lookup set the 'nodetach' to true
  if we are going to refresh the RRset.

Now there is no longer the need to clear the 'nodetach' if we go
through the "dbfind_stale", "stale_refresh_window", or "stale_only"
paths.
2021-04-02 09:14:09 +02:00
Matthijs Mekking
92f7a67892 Refactor stale lookups, ignore active RRsets
When doing a staleonly lookup, ignore active RRsets from cache. If we
don't, we may add a duplicate RRset to the message, and hit an
assertion failure in query.c because adding the duplicate RRset to the
ANSWER section failed.

This can happen on a race condition. When a client query is received,
the recursion is started. When 'stale-answer-client-timeout' triggers
around the same time the recursion completes, the following sequence
of events may happen:
1. Queue the "try stale" fetch_callback() event to the client task.
2. Add the RRsets from the authoritative response to the cache.
3. Queue the "fetch complete" fetch_callback() event to the client task.
4. Execute the "try stale" fetch_callback(), which retrieves the
   just-inserted RRset from the database.
5. In "ns_query_done()" we are still recursing, but the "staleonly"
   query attribute has already been cleared. In other words, the
   query will resume when recursion ends (it already has ended but is
   still on the task queue).
6. Execute the "fetch complete" fetch_callback(). It finds the answer
   from recursion in the cache again and tries to add the duplicate to
   the answer section.

This commit changes the logic for finding stale answers in the cache,
such that on "stale_only" lookups actually only stale RRsets are
considered. It refactors the code so that code paths for "dbfind_stale",
"stale_refresh_window", and "stale_only" are more clear.

First we call some generic code that applies in all three cases,
formatting the domain name for logging purposes, increment the
trystale stats, and check if we actually found stale data that we can
use.

The "dbfind_stale" lookup will return SERVFAIL if we didn't found a
usable answer, otherwise we will continue with the lookup
(query_gotanswer()). This is no different as before the introduction of
"stale-answer-client-timeout" and "stale-refresh-time".

The "stale_refresh_window" lookup is similar to the "dbfind_stale"
lookup: return SERVFAIL if we didn't found a usable answer, otherwise
continue with the lookup (query_gotanswer()).

Finally the "stale_only" lookup.

If the "stale_only" lookup was triggered because of an actual client
timeout (stale-answer-client-timeout > 0), and if database lookup
returned a stale usable RRset, trigger a response to the client.
Otherwise return and wait until the recursion completes (or the
resolver query times out).

If the "stale_only" lookup is a "stale-anwer-client-timeout 0" lookup,
preferring stale data over a lookup. In this case if there was no stale
data, or the data was not a positive answer, retry the lookup with the
stale options cleared, a.k.a. a normal lookup. Otherwise, continue
with the lookup (query_gotanswer()) and refresh the stale RRset. This
will trigger a response to the client, but will not detach the handle
because a fetch will be created to refresh the RRset.
2021-04-02 09:14:09 +02:00
Matthijs Mekking
fee164243f Keep track of allow client detach
The stale-answer-client-timeout feature introduced a dependancy on
when a client may be detached from the handle. The dboption
DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads
the meaning of this database option, and actually introduced a bug
because the option was checked in other places. In particular, in
'ns_query_done()' there is a check for 'RECURSING(qctx->client) &&
(!QUERY_STALEONLY(&qctx->client->query) || ...' and the condition is
satisfied because recursion has not completed yet and
DNS_DBFIND_STALEONLY is already cleared by that time (in
query_lookup()), because we found a useful answer and we should detach
the client from the handle after sending the response.

Add a new boolean to the client structure to keep track of client
detach from handle is allowed or not. It is only disallowed if we are
in a staleonly lookup and we didn't found a useful answer.
2021-04-02 09:14:09 +02:00
Matthijs Mekking
87591de6f7 Fix servestale fetchlimits crash
When we query the resolver for a domain name that is in the same zone
for which is already one or more fetches outstanding, we could
potentially hit the fetch limits. If so, recursion fails immediately
for the incoming query and if serve-stale is enabled, we may try to
return a stale answer.

If the resolver is also is authoritative for the parent zone (for
example the root zone), first a delegation is found, but we first
check the cache for a better response.

Nothing is found in the cache, so we try to recurse to find the
answer to the query.

Because of fetch-limits 'dns_resolver_createfetch()' returns an error,
which 'ns_query_recurse()' propagates to the caller,
'query_delegation_recurse()'.

Because serve-stale is enabled, 'query_usestale()' is called,
setting 'qctx->db' to the cache db, but leaving 'qctx->version'
untouched. Now 'query_lookup()' is called to search for stale data
in the cache database with a non-NULL 'qctx->version'
(which is set to a zone db version), and thus we hit an assertion
in rbtdb.

This crash was introduced in 'main' by commit
8bcd7fe69e5343071fc917738d6092a8b974ef3f.
2021-03-11 12:16:14 +01:00
Ondřej Surý
a0181056a8 Change the isc_thread_self() return type to uintptr_t
The pthread_self(), thrd_current() or GetCurrentThreadId() could
actually be a pointer, so we should rather convert the value into
uintptr_t instead of unsigned long.
2021-02-25 16:21:10 +01:00
Matthijs Mekking
f8b7b597e9 Don't servfail on staleonly lookups
When a staleonly lookup doesn't find a satisfying answer, it should
not try to respond to the client.

This is not true when the initial lookup is staleonly (that is when
'stale-answer-client-timeout' is set to 0), because no resolver fetch
has been created at this point. In this case continue with the lookup
normally.
2021-02-25 11:32:17 +01:00
Matthijs Mekking
9e061faaae Don't allow recursion on staleonly lookups
Fix a crash that can happen in the following scenario:

A client request is received. There is no data for it in the cache,
(not even stale data). A resolver fetch is created as part of
recursion.

Some time later, the fetch still hasn't completed, and
stale-answer-client-timeout is triggered. A staleonly lookup is
started. It will also find no data in the cache.

So 'query_lookup()' will call 'query_gotanswer()' with ISC_R_NOTFOUND,
so this will call 'query_notfound()' and this will start recursion.

We will eventually end up in 'ns_query_recurse()' and that requires
the client query fetch to be NULL:

    REQUIRE(client->query.fetch == NULL);

If the previously started fetch is still running this assertion
fails.

The crash is easily prevented by not requiring recursion for
staleonly lookups.

Also remove a redundant setting of the staleonly flag at the end of
'query_lookup_staleonly()' before destroying the query context.

Add a system test to catch this case.
2021-02-25 11:32:17 +01:00
Matthijs Mekking
8bcd7fe69e Use stale on error also when unable to recurse
The 'query_usestale()' function was only called when in
'query_gotanswer()' and an unexpected error occurred. This may have
been "quota reached", and thus we were in some cases returning
stale data on fetch-limits (and if serve-stale enabled of course).

But we can also hit fetch-limits when recursing because we are
following a referral (in 'query_notfound()' and
'query_delegation_recurse()'). Here we should also check for using
stale data in case an error occurred.

Specifically don't check for using stale data when refetching a
zero TTL RRset from cache.

Move the setting of DNS_DBFIND_STALESTART into the 'query_usestale()'
function to avoid code duplication.
2021-02-08 15:17:09 +01:00
Matthijs Mekking
aabdedeae3 Only start stale refresh window when resuming
If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.

Introduce a new flag DNS_DBFIND_STALESTART to differentiate between
a resolver failure and unexpected error. If we are resuming, this
indicates a resolver failure, then start the stale-refresh-time window,
otherwise don't start the stale-refresh-time window, but still fall
back to using stale data.

(This commit also wraps some docstrings to 80 characters width)
2021-01-28 16:38:34 +01:00
Matthijs Mekking
c6fd02aed5 Use stale data also if we are not resuming
Before this change, BIND will only fallback to using stale data if
there was an actual attempt to resolve the query. Then on a timeout,
the stale data from cache becomes eligible.

This commit changes this so that on any unexpected error stale data
becomes eligble (you would still have to have 'stale-answer-enable'
enabled of course).

If there is no stale data, this may return in an error again, so don't
loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is
set, this means we already tried to lookup stale data, so if that is
the case, don't use stale again.
2021-01-28 16:36:46 +01:00
Matthijs Mekking
fa0c9280d2 Update code flow in query.c wrt stale data
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.

Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generic in the code block
when dealing with stale data.

Putting all stale case handling in the code block when dealing with
stale data makes the code more easy to follow.

Update documentation to be more verbose and to match then new code
flow.
2021-01-25 10:48:16 -03:00
Diego Fronza
966060c03b Extracted common function from query_lookup and query_refresh_rrset
Both functions employed the same code lines to allocate query context
buffers, which are used to store query results, so this shared portion
of code was extracted out to a new function, qctx_prepare_buffers.

Also, this commit uses qctx_init to initialize the query context whitin
query_refresh_rrset function.
2021-01-25 10:48:16 -03:00
Diego Fronza
f89ac07b28 Small optimization in query_usestale
This commit makes the code in query_usestale easier to follow, it also
doesn't attach/detach to the database if stale answers are not enabled.
2021-01-25 10:48:16 -03:00
Diego Fronza
e219422575 Allow stale data to be used before name resolution
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.

For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.

To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:

. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
  RRset due to timeout (resolver-query-timeout), its intent is to
  try to look for stale data in cache as a fallback, but only if
  stale answers are enabled in configuration.

  This flag is also used to activate stale-refresh-time window, since it
  is the only way the database knows that a resolution has failed.

. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
  that it may use stale data. It is always set during query lookup if
  stale answers are enabled, but only effectively used during
  stale-refresh-time window. Also during this window, the resolver will
  not try to resolve the query, in other words no attempt to refresh the
  data in cache is made when the stale-refresh-time window is active.

. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
  stale data from the database, but not due to a failure in resolution,
  it also doesn't require stale-refresh-time window timer to be active.
  As long as there is a stale RRset available, it should be returned.
  It is mainly used in two situations:

    1. When stale-answer-client-timeout timer is triggered: in that case
       we want to know if there is stale data available to answer the
       client.
    2. When stale-answer-client-timeout value is set to zero: in that
       case, we also want to know if there is some stale RRset available
       to promptly answer the client.

We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:

	1. Are we running query_lookup() due to stale-answer-client-timeout
       timer being triggered?

       In this case, we look for stale data, making use of
       DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
       respond the client with the data found, mark this query as
       answered (query attribute NS_QUERYATTR_ANSWERED), so when the
       fetch completes the client won't be answered twice.

       We must also take care of not detaching from the client, as a
       fetch will still be running in background, this is handled by the
       following snippet:

       if (!QUERY_STALEONLY(&client->query)) {
           isc_nmhandle_detach(&client->reqhandle);
       }

       Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
       means we are here due to a stale-answer-client-timeout timer
       expiration.

    2. Are we running query_lookup() due to resolver-query-timeout being
       triggered?

       In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
       to look for stale data will be made.
       As already explained, this flag is algo used to activate
       stale-refresh-time window, as it means that we failed to refresh
       a RRset due to timeout.
       It is ok in this situation to detach from the client, as the
       fetch is already completed.

    3. Are we running query_lookup() during the first time, looking for
       a RRset in cache and stale-answer-client-timeout value is set to
       zero?

       In this case, if stale answers are enabled (probably), we must do
       an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
       indicate to the database that we want stale data.

       If we find an active RRset, proceed as normal, answer the client
       and the query is done.

       If we find a stale RRset we respond to the client and mark the
       query as answered, but don't detach from the client yet as an
       attempt in refreshing the RRset will still be made by means of
       the new introduced function 'query_resolve'.

       If no active or stale RRset is available, begin resolution as
       usual.
2021-01-25 10:47:14 -03:00
Diego Fronza
171a5b7542 Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.
2021-01-25 10:47:14 -03:00
Diego Fronza
74840ec50b Added dns_view_staleanswerenabled() function
Since it takes a couple lines of code to check whether stale answers
are enabled for a given view, code was extracted out to a proper
function.
2021-01-25 10:47:14 -03:00
Matthijs Mekking
87744f218d Remove a lot of obsoleted options
These options were ancient or made obsolete a long time ago, it is
safe to remove them.

Also stop printing ancient options, they should be treated the same as
unknown options.

Removed options: lwres, geoip-use-ecs, sit-secret, use-ixfr,
acache-cleaning-interval, acache-enable, additional-from-auth,
additional-from-cache, allow-v6-synthesis, dnssec-enable,
max-acache-size, nosit-udp-size, queryport-pool-ports,
queryport-pool-updateinterval, request-sit, use-queryport-pool, and
support-ixfr.
2021-01-19 10:12:40 +01:00
JINMEI Tatuya
d520f01c7b detach fetchhandle before resume query processing
otherwise, another hook async event or DNS recursion would
trigger an assertion failure.
2021-01-06 13:14:13 -08:00
Ondřej Surý
7ba18870dc Reformat sources using clang-format-11 2020-12-08 18:36:23 +01:00
Diego Fronza
95add01643 Silence coverity warnings in query.c
Return value of dns_db_getservestalerefresh() and
dns_db_getservestalettl() functions were previously unhandled.

This commit purposefully ignore those return values since there is
no side effect if those results are != ISC_R_SUCCESS, it also supress
Coverity warnings.
2020-11-26 14:55:14 +00:00
JINMEI Tatuya
75cdd758ed implementation of hook-based asynchronous functionality
previously query plugins were strictly synchrounous - the query
process would be interrupted at some point, data would be looked
up or a change would be made, and then the query processing would
resume immediately.

this commit enables query plugins to initiate asynchronous processes
and resume on a completion event, as with recursion.
2020-11-24 15:11:39 -08:00
JINMEI Tatuya
9c8dae041d ns_query refactoring for hook-based recursion
several small changes to query processing to make it easier to
use hook-based recursion (and other asynchronous functionlity)
later.

- recursion quota check is now a separate function,
  check_recursionquota(), which is called by ns_query_recurse().
- pass isc_result to query_nxdomain() instead of bool.
  the value of 'empty_wild' will be determined in the function
  based on the passed result.  this is similar to query_nodata(),
  and makes the signatures of the two functions more consistent.
- pass the current 'result' value into plugin hooks.
2020-11-24 15:11:39 -08:00
Mark Andrews
e980affba0 Fix DNAME when QTYPE is CNAME or ANY
The synthesised CNAME is not supposed to be followed when the
QTYPE is CNAME or ANY as the lookup is satisfied by the CNAME
record.
2020-11-19 10:18:01 +11:00
Diego Fronza
4827ad0ec4 Add stale-refresh-time option
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution fails for any reason, only then BIND would
check for stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' is on).

The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed in the next seconds.

A better approach to improve performance in those cases, is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.

Only when this interval expires we then try to do a normal refresh of
the rrset.

The logic behind this commit is as following:

- In query.c / query_gotanswer(), if the test of 'result' variable falls
  to the default case, an error is assumed to have happened, and a call
  to 'query_usestale()' is made to check if serving of stale rrset is
  enabled in configuration.

- If serving of stale answers is enabled, a flag will be turned on in
  the query context to look for stale records:
  query.c:6839
  qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;

- A call to query_lookup() will be made again, inside it a call to
  'dns_db_findext()' is made, which in turn will invoke rbdb.c /
  cache_find().

- In rbtdb.c / cache_find() the important bits of this change is the
  call to 'check_stale_header()', which is a function that yields true
  if we should skip the stale entry, or false if we should consider it.

- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
  is set, if that is the case we know that this new search for stale
  records was made due to a failure in a normal resolution, so we keep
  track of the time in which the failured occured in rbtdb.c:4559:
  header->last_refresh_fail_ts = search->now;

- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
  know this is a normal lookup, if the record is stale and the query
  time is between last failure time + stale-refresh-time window, then
  we return false so cache_find() knows it can consider this stale
  rrset entry to return as a response.

The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh

Those were added so rbtdb can be aware of the value set in configuration
option, since in that level we have no access to the view object.
2020-11-11 12:53:23 -03:00
Diego Fronza
d727eaae6c Always return address records in additional section for NS queries 2020-10-21 12:03:42 -03:00
Evan Hunt
86eddebc83 Purge memory pool upon plugin destruction
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:

 1. Original query context is created.
 2. An AAAA RRset is found in cache.
 3. Client-specific data is allocated from the filter-aaaa memory pool.
 4. Recursion is triggered for an A RRset.
 5. Original query context is torn down.

 6. Recursion for an A RRset completes.
 7. A second query context is created.
 8. Client-specific data is retrieved from the filter-aaaa memory pool.
 9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.

However, steps 6-12 are not executed if recursion for an A RRset is
canceled.  Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released.  This in turn
leads to a crash since every memory pool must not have any outstanding
allocations by the time isc_mempool_destroy() is called.

Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.
2020-09-25 13:32:34 -07:00
Mark Andrews
f0d9bf7c30 Clone the saved / query message buffers
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call.  Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.
2020-09-23 10:37:42 +10:00
Ondřej Surý
d4976e0ebe Add separate prefetch nmhandle to ns_client_t
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records.  The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.
2020-09-22 09:56:26 +02:00
Evan Hunt
dcee985b7f update all copyright headers to eliminate the typo 2020-09-14 16:20:40 -07:00
Evan Hunt
57b4dde974 change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.

A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:

        - reqhandle:    held while waiting for a request callback (query,
                        notify, update)
        - sendhandle:   held while waiting for a send callback
        - fetchhandle:  held while waiting for a recursive fetch to
                        complete
        - updatehandle: held while waiting for an update-forwarding
                        task to complete

control channel connection objects now contain:

        - readhandle: held while waiting for a read callback
        - sendhandle: held while waiting for a send callback
        - cmdhandle:  held while an rndc command is running

httpd connections contain:

        - readhandle: held while waiting for a read callback
        - sendhandle: held while waiting for a send callback
2020-09-11 12:17:57 -07:00
Diego Fronza
aab691d512 Fix ns_statscounter_recursclients underflow
The basic scenario for the problem was that in the process of
resolving a query, if any rrset was eligible for prefetching, then it
would trigger a call to query_prefetch(), this call would run in
parallel to the normal query processing.

The problem arises due to the fact that both query_prefetch(), and,
in the original thread, a call to ns_query_recurse(), try to attach
to the recursionquota, but recursing client stats counter is only
incremented if ns_query_recurse() attachs to it first.

Conversely, if fetch_callback() is called before prefetch_done(),
it would not only detach from recursionquota, but also decrement
the stats counter, if query_prefetch() attached to te quota first
that would result in a decrement not matched by an increment, as
expected.

To solve this issue an atomic bool was added, it is set once in
ns_query_recurse(), allowing fetch_callback() to check for it
and decrement stats accordingly.

For a more compreensive explanation check the thread comment below:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1719#note_145857
2020-07-13 11:46:18 -03:00
Mark Andrews
e5b2eca1d3 The dsset returned by dns_keynode_dsset needs to be thread safe.
- clone keynode->dsset rather than return a pointer so that thread
  use is independent of each other.
- hold a reference to the dsset (keynode) so it can't be deleted
  while in use.
- create a new keynode when removing DS records so that dangling
  pointers to the deleted records will not occur.
- use a rwlock when accessing the rdatalist to prevent instabilities
  when DS records are added.
2020-06-11 16:02:09 +10:00
Evan Hunt
57e54c46e4 change "expr == false" to "!expr" in conditionals 2020-05-25 16:09:57 -07:00
Evan Hunt
68a1c9d679 change 'expr == true' to 'expr' in conditionals 2020-05-25 16:09:57 -07:00
Diego Fronza
f2bf7beeb6 Added new logging category rpz-passthru
It is now possible to use the new logging category "rpz-passthru"
to redirect RPZ passthru activity to a dedicate logging channel.
2020-05-07 11:44:48 -03:00
Ondřej Surý
978c7b2e89 Complete rewrite the BIND 9 build system
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests.  Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.

This squashed commit contains following changes:

- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
  by using automake

- the libtool is now properly integrated with automake (the way we used it
  was rather hackish as the only official way how to use libtool is via
  automake

- the dynamic module loading was rewritten from a custom patchwork to libtool's
  libltdl (which includes the patchwork to support module loading on different
  systems internally)

- conversion of the unit test executor from kyua to automake parallel driver

- conversion of the system test executor from custom make/shell to automake
  parallel driver

- The GSSAPI has been refactored, the custom SPNEGO on the basis that
  all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
  support SPNEGO mechanism.

- The various defunct tests from bin/tests have been removed:
  bin/tests/optional and bin/tests/pkcs11

- The text files generated from the MD files have been removed, the
  MarkDown has been designed to be readable by both humans and computers

- The xsl header is now generated by a simple sed command instead of
  perl helper

- The <irs/platform.h> header has been removed

- cleanups of configure.ac script to make it more simpler, addition of multiple
  macros (there's still work to be done though)

- the tarball can now be prepared with `make dist`

- the system tests are partially able to run in oot build

Here's a list of unfinished work that needs to be completed in subsequent merge
requests:

- `make distcheck` doesn't yet work (because of system tests oot run is not yet
  finished)

- documentation is not yet built, there's a different merge request with docbook
  to sphinx-build rst conversion that needs to be rebased and adapted on top of
  the automake

- msvc build is non functional yet and we need to decide whether we will just
  cross-compile bind9 using mingw-w64 or fix the msvc build

- contributed dlz modules are not included neither in the autoconf nor automake
2020-04-21 14:19:48 +02:00
Ondřej Surý
78886d4bed Fix the statistic counter underflow in ns_client_t
In case of normal fetch, the .recursionquota is attached and
ns_statscounter_recursclients is incremented when the fetch is created.  Then
the .recursionquota is detached and the counter decremented in the
fetch_callback().

In case of prefetch or rpzfetch, the quota is attached, but the counter is not
incremented.  When we reach the soft-quota, the function returns early but don't
detach from the quota, and it gets destroyed during the ns_client_endrequest(),
so no memory was leaked.

But because the ns_statscounter_recursclients is only incremented during the
normal fetch the counter would be incorrectly decremented on two occassions:

1) When we reached the softquota, because the quota was not properly detached
2) When the prefetch or rpzfetch was cancelled mid-flight and the callback
   function was never called.
2020-04-03 19:41:46 +02:00
Diego Fronza
c786c578d7 Added RPZ configuration option "nsdname-wait-recurse"
This new option was added to fill a gap in RPZ configuration
options.

It was possible to instruct BIND wheter NSIP rewritting rules would
apply or not, as long as the required data was already in cache or not,
respectively, by means of the option nsip-wait-recurse.

A value of yes (default) could incur a little processing cost, since
BIND would need to recurse to find NS addresses in case they were not in
the cache.

This behavior could be changed by setting nsip-wait-recurse value to no,
in which case BIND would promptly return some error code if the NS IP addresses
data were not in cache, then BIND would start a recursive query
in background, so future similar requests would have the required data
(NS IPs) in cache, allowing BIND to apply NSIP rules accordingly.

A similar feature wasn't available for NSDNAME triggers, so this commit
adds the option nsdname-wait-recurse to fill this gap, as it was
expected by couple BIND users.
2020-03-16 15:18:46 -03:00
Evan Hunt
ba0313e649 fix spelling errors reported by Fossies. 2020-02-21 15:05:08 +11:00
Evan Hunt
0002377dca adjust the clang-format penalties to reduce string breaking
this corrects some style glitches such as:
```
        long_function_call(arg, arg2, arg3, arg4, arg5, "str"
                                                        "ing");
```
...by adjusting the penalties for breaking strings and call
parameter lists.
2020-02-17 14:23:58 -08:00
Ondřej Surý
5777c44ad0 Reformat using the new rules 2020-02-14 09:31:05 +01:00
Evan Hunt
e851ed0bb5 apply the modified style 2020-02-13 15:05:06 -08:00
Ondřej Surý
056e133c4c Use clang-tidy to add curly braces around one-line statements
The command used to reformat the files in this commit was:

./util/run-clang-tidy \
	-clang-tidy-binary clang-tidy-11
	-clang-apply-replacements-binary clang-apply-replacements-11 \
	-checks=-*,readability-braces-around-statements \
	-j 9 \
	-fix \
	-format \
	-style=file \
	-quiet
clang-format -i --style=format $(git ls-files '*.c' '*.h')
uncrustify -c .uncrustify.cfg --replace --no-backup $(git ls-files '*.c' '*.h')
clang-format -i --style=format $(git ls-files '*.c' '*.h')
2020-02-13 22:07:21 +01:00
Ondřej Surý
36c6105e4f Use coccinelle to add braces to nested single line statement
Both clang-tidy and uncrustify chokes on statement like this:

for (...)
	if (...)
		break;

This commit uses a very simple semantic patch (below) to add braces around such
statements.

Semantic patch used:

@@
statement S;
expression E;
@@

while (...)
- if (E) S
+ { if (E) { S } }

@@
statement S;
expression E;
@@

for (...;...;...)
- if (E) S
+ { if (E) { S } }

@@
statement S;
expression E;
@@

if (...)
- if (E) S
+ { if (E) { S } }
2020-02-13 21:58:55 +01:00
Ondřej Surý
f50b1e0685 Use clang-format to reformat the source files 2020-02-12 15:04:17 +01:00
Ondřej Surý
c73e5866c4 Refactor the isc_buffer_allocate() usage using the semantic patch
The isc_buffer_allocate() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_buffer_allocate() now returns void.
2020-02-03 08:29:00 +01:00
Evan Hunt
7fdf40770f remove all code that uses non-DS trust anchors
as initial-key and static-key trust anchors will now be stored as a
DS rrset, code referencing keynodes storing DNSKEY trust anchors will
no longer be reached.
2020-01-14 09:24:13 -08:00
Ondřej Surý
edd97cddc1 Refactor dns_name_dup() usage using the semantic patch 2019-11-29 14:00:37 +01:00
Evan Hunt
edafbf1c0f fix root key sentinel code to send the correct key ID for DS trust anchors 2019-11-15 15:47:57 -08:00
Evan Hunt
53f0b6c34d convert ns_client and related objects to use netmgr
- ns__client_request() is now called by netmgr with an isc_nmhandle_t
  parameter. The handle can then be permanently associated with an
  ns_client object.
- The task manager is paused so that isc_task events that may be
  triggred during client processing will not fire until after the netmgr is
  finished with it. Before any asynchronous event, the client MUST
  call isc_nmhandle_ref(client->handle), to prevent the client from
  being reset and reused while waiting for an event to process. When
  the asynchronous event is complete, isc_nmhandle_unref(client->handle)
  must be called to ensure the handle can be reused later.
- reference counting of client objects is now handled in the nmhandle
  object.  when the handle references drop to zero, the client's "reset"
  callback is used to free temporary resources and reiniialize it,
  whereupon the handle (and associated client) is placed in the
  "inactive handles" queue.  when the sysstem is shutdown and the
  handles are cleaned up, the client's "put" callback is called to free
  all remaining resources.
- because client allocation is no longer handled in the same way,
  the '-T clienttest' option has now been removed and is no longer
  used by any system tests.
- the unit tests require wrapping the isc_nmhandle_unref() function;
  when LD_WRAP is supported, that is used. otherwise we link a
  libwrap.so interposer library and use that.
2019-11-07 11:55:37 -08:00