2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 21:17:54 +00:00

994 Commits

Author SHA1 Message Date
Evan Hunt
6ac8723611
use isc_loop_now() for dispentry timeouts
store a pointer to the running loop when creating a dispatch entry
with dns_dispatch_add(), and use isc_loop_now() to get the timestamp for
the current event loop tick when we initialize the dispentry start time
and check for timeouts.
2023-07-19 15:32:21 +02:00
Evan Hunt
0955cf1af5 clean up numbering of FETCHOPT and ADDRINFO flags
in the past there was overlap between the fields used
as resolver fetch options and ADB addrinfo flags. this has
mostly been eliminated; now we can clean up the rest of
it and remove some confusing comments.
2023-07-04 18:23:57 +00:00
Evan Hunt
5ba73c785e fix a TSAN bug in "rndc fetchlimit"
fctx counters could be accessed without locking when
"rndc fetchlimit" is called; while this is probably harmless
in production, it triggered TSAN reports in system tests.
2023-06-30 06:52:01 +00:00
Evan Hunt
352d542d27 minor refactoring of resume_qmin() for clarity
make the code flow clearer by enumerating the result codes that
are treated as success conditions for an intermediate minimized
query (ISC_R_SUCCESS, DNS_R_DELEGATION, DNS_R_NXRRSET, etc), rather
than just folding them all into the 'default' branch of a switch
statement.
2023-06-29 10:14:20 -07:00
Mark Andrews
ea11650376 In rctx_answer return DNS_R_DELEGATION on NOFOLLOW
When DNS_FETCHOPT_NOFOLLOW is set DNS_R_DELEGATION needs to be
returned to restart the resolution process rather than converting
it to ISC_R_SUCCESS.
2023-06-28 11:48:32 +10:00
Mark Andrews
80bc0ee075 Skip some QNAME mininisation queries if possible
If we know that the NS RRset for an intermediate label doesn't exist
on cache contents don't query using that name when looking for a
referral.
2023-06-28 11:47:56 +10:00
Mark Andrews
dd00b3c50b Use NS rather than A records for qname-minimization relaxed
Remove all references to DNS_FETCHOPT_QMIN_USE_A and adjust
the expected tests results in the qmin system test.
2023-06-28 11:45:59 +10:00
Ondřej Surý
519481dcdb
Use per-loop memory contexts for dns_resolver child objects
The dns_resolver creates a lot of smaller objects (fetch context, fetch
counter, query, response, ...) and those are all loop-bound.
Previously, those objects were allocated from the a single resolver
context, which in turn increases contention between threads - remember
"dead by thousand atomic paper cuts".  Instead of using a single memory
context, use the per-loop memory contexts that are bound to a specific
loop and thus there's no contention between them when doing the memory
accounting.
2023-06-27 10:51:54 +02:00
Mark Andrews
971f49b3ad Use RCU for view->adb access
view->adb may be referenced while the view is shutting down as the
zone uses a weak reference to the view and examines view->adb but
dns_view_detach call dns_adb_detach to clear view->adb.
2023-06-14 19:21:28 +10:00
Mark Andrews
783c6a9538
Use dns_view_findzone instead of dns_zt_find
This ensures that rcu locking is properly applied for
view->zonetable.
2023-06-01 16:51:38 +02:00
Aram Sargsyan
2ae5c4a674 Fix a clients-per-query miscalculation bug
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit is
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
there is another calculation in fctx_sendevents(), when deciding
whether it is needed to increase the limit, but this time the TRYSTALE
responses are not included in the calculation (because of early break
from the loop), and because of that the limit is never increased.

A single client can have more than one associated response/event in the
list (currently max. two), and calculating them as separate "clients"
is unexpected. E.g. if 'stale-answer-enable' is enabled and
'stale-answer-client-timeout' is enabled and is larger than 0, then
each client will have two events, which will effectively halve the
clients-per-query limit.

Fix the dns_resolver_createfetch() function to calculate only the
regular FETCHDONE responses/events.

Change the fctx_sendevents() function to also calculate only FETCHDONE
responses/events. Currently, this second change doesn't have any impact,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.
2023-06-01 08:13:09 +00:00
Aram Sargsyan
04648d7c2f Add ClientQuota statistics channel counter
This counter indicates the number of the resolver's spilled
queries due to reaching the clients per query quota.
2023-05-31 09:08:58 +00:00
Mark Andrews
f3b24ba789 Handle FORMERR on unknown EDNS option that are echoed
If the resolver received a FORMERR response to a request with
an DNS COOKIE option present that echoes the option back, resend
the request without an DNS COOKIE option present.
2023-05-11 09:32:02 +10:00
Evan Hunt
2269a3e6fb
check for invalid protocol when dispatch fails
treat ISC_R_INVALIDPROTO as a networking error when it occurs.
2023-04-21 12:42:11 +02:00
Tony Finch
e8ff0f0c08 Correct value of DNS_NAME_MAXLABELS
It should be floor(DNS_NAME_MAXWIRE / 2) + 1 == 128

The mistake was introduced in c6bf51492dbd because:

  * I was refactoring an existing `DNS_MAX_LABELS` defined as 127

  * There was a longstanding bug in `dns_name_isvalid()` which
    checked the number of labels against 127U instead of 128

  * I mistakenly thought `dns_name_isvalid()` was correct and
    `dns_name_countlabels()` was incorrect, but the reverse was true.

After this commit, occurrances of `DNS_NAME_MAXLABELS` with value
128 are consistent with the use of 127 or 128 before commit
c6bf51492dbd except for the mistake in `dns_name_isvalid()`.
This commit adds a test case that checks the MAXLABELS case
in `dns_name_fromtext()` and `dns_name_isvalid()`.
2023-04-05 14:46:39 +00:00
Tony Finch
b171cacf4f Use a qp-trie for the zone table
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix to cleanup races on shutdown.)

One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.

After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
2023-04-05 12:38:11 +01:00
Ondřej Surý
b8d34e960b
Change dns_adbentry_overquota() to dns_adb_overquota()
The dns_adbentry_overquota() was violating the layers accessing the
adbentry struct members directly.  Change it to dns_adb_overquota() to
match the dns_adb API.
2023-04-04 16:21:49 +02:00
Ondřej Surý
a5f5f68502
Refactor isc_time_now() to return time, and not result
The isc_time_now() and isc_time_now_hires() were used inconsistently
through the code - either with status check, or without status check,
or via TIME_NOW() macro with RUNTIME_CHECK() on failure.

Refactor the isc_time_now() and isc_time_now_hires() to always fail when
getting current time has failed, and return the isc_time_t value as
return value instead of passing the pointer to result in the argument.
2023-03-31 15:02:06 +02:00
Ondřej Surý
46f06c1d6e
Apply the semantic patch to remove isc_stdtime_get()
This is a simple replacement using the semantic patch from the previous
commit and as added bonus, one removal of previously undetected unused
variable in named/server.c.
2023-03-31 13:32:56 +02:00
Ondřej Surý
1844590ad9
Refactor isc_job_run to not-make any allocations
Change the isc_job_run() to not-make any allocations.  The caller must
make sure that it allocates isc_job_t - usually as part of the argument
passed to the callback.

For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
2023-03-30 16:00:52 +02:00
Evan Hunt
8ce33dca6a change the log level of "resolver priming query complete"
this log message, formerly at level INFO, is now DEBUG(1),
so it won't be printed when running "delv +ns +nortrace".
2023-03-28 12:39:06 -07:00
Evan Hunt
833ca463d4 remove {root-}delegation-only
complete the removal of the delegation-only and root-delegation-only
options, and the delegation-only zone type.
2023-03-23 12:57:01 -07:00
Ondřej Surý
93259812dd
Properly handle ISC_R_SHUTTINGDOWN in resquery_response()
When resquery_response() was called with ISC_R_SHUTTINDOWN, the region
argument would be NULL, but rctx_respinit() would try to pass
region->base and region->len to the isc_buffer_init() leading to
a NULL pointer dereference.  Properly handle non-ISC_R_SUCCESS by
ignoring the provided region.
2023-03-23 11:51:22 +01:00
Ondřej Surý
bd4576b3ce Remove TKEY Mode 2 (Diffie-Hellman)
Completely remove the TKEY Mode 2 (Diffie-Hellman Exchanged Keying) from
BIND 9 (from named, named.conf and all the tools).  The TKEY usage is
fringe at best and in all known cases, GSSAPI is being used as it should.

The draft-eastlake-dnsop-rfc2930bis-tkey specifies that:

    4.2 Diffie-Hellman Exchanged Keying (Deprecated)

       The use of this mode (#2) is NOT RECOMMENDED for the following two
       reasons but the specification is still included in Appendix A in case
       an implementation is needed for compatibility with old TKEY
       implementations. See Section 4.6 on ECDH Exchanged Keying.

          The mixing function used does not meet current cryptographic
          standards because it uses MD5 [RFC6151].

          RSA keys must be excessively long to achieve levels of security
          required by current standards.

We might optionally implement Elliptic Curve Diffie-Hellman (ECDH) key
exchange mode 6 if the draft ever reaches the RFC status.  Meanwhile the
insecure DH mode needs to be removed.
2023-03-08 08:36:25 +01:00
Ondřej Surý
cd632ad31d
Implement dns_db node tracing
This implements node reference tracing that passes all the internal
layers from dns_db API (and friends) to increment_reference() and
decrement_reference().

It can be enabled by #defining DNS_DB_NODETRACE in <dns/trace.h> header.

The output then looks like this:

    incr:node:check_address_records:rootns.c:409:0x7f67f5a55a40->references = 1
    decr:node:check_address_records:rootns.c:449:0x7f67f5a55a40->references = 0

    incr:nodelock:check_address_records:rootns.c:409:0x7f67f5a55a40:0x7f68304d7040->references = 1
    decr:nodelock:check_address_records:rootns.c:449:0x7f67f5a55a40:0x7f68304d7040->references = 0

There's associated python script to find the missing detach located at:
https://gitlab.isc.org/isc-projects/bind9/-/snippets/1038
2023-02-28 11:44:15 +01:00
Tony Finch
c6bf51492d Define DNS_NAME_MAXLABELS and DNS_NAME_LABELLEN
Some qp-trie operations will need to know the maximum number of labels
in a name, so I wanted a standard macro definition with the right
value.

Replace DNS_MAX_LABELS from <dns/resolver.h with DNS_NAME_MAXLABELS in
<dns/name.h>, and add its counterpart DNS_NAME_LABELLEN.

Use these macros in `name.c` and `resolver.c`.

Fix an off-by-one error in an assertion in `dns_name_countlabels()`.
2023-02-27 11:27:12 +00:00
Evan Hunt
ae5ba54fbe move dispatchmgr from resolver to view
the 'dispatchmgr' member of the resolver object is used by both
the dns_resolver and dns_request modules, and may in the future
be used by others such as dns_xfrin. it doesn't make sense for it
to live in the resolver object; this commit moves it into dns_view.
2023-02-24 08:30:33 +00:00
Mark Andrews
9c17f4353b Cleanup left over 'fctx != NULL' test following refactoring
This was causing 'CID 436299: Null pointer dereferences (REVERSE_INULL)'
in Coverity.  Also removed an 'INSIST(fctx != NULL);' that should
no longer be needed.
2023-02-21 12:22:27 +00:00
Ondřej Surý
7da99414c0
Implement proper reference counting in dns_validator
use reference counting in dns_validator to prevent use after free.
2023-02-17 07:18:25 +01:00
Evan Hunt
b4715a34a0
additional refactoring of dns_validator
refactor validator so that the validation status object (previously
called dns_valstatus_t, which was derived from dns_validatorevent_t), is
now part of the dns_validator object.  when calling validator callbacks,
the validator itself is now sent as the argument.

(note: this necessitates caution in the callback functions that are
internal to validator.c validators spawn other validators, and it can be
confusing at times whether we need to be looking at val, val->subvalidator,
or val->parent.)
2023-02-17 07:18:25 +01:00
Evan Hunt
0312789129
refactor dns_resolver to use loop callbacks
callback events from dns_resolver_createfetch() are now posted
using isc_async_run.

other modules which called the resolver and maintained task/taskmgr
objects for this purpose have been cleaned up.
2023-02-16 17:27:59 +01:00
Evan Hunt
7a78a85b35
refactor dns_validator to use loop callbacks
The validator now uses loop callbacks to post its completion
events. (A task is still used for the fetches.)
2023-02-16 14:55:06 +01:00
Evan Hunt
31aee2ef9c
refactor dns_adb to use loop callbacks
The callbacks from dns_abd_createfind() are now posted using
isc_async_run() instead of isc_task_send().  ADB event types
have been replaced with a new dns_adbstatus_t type which is
included as find->status.

(The ADB still uses a task for dns_resolver_createfetch().)
2023-02-16 14:55:06 +01:00
Evan Hunt
106da9c190
refactor dns_request to use loopmgr callbacks
dns_request_create() and _createraw() now take a 'loop' parameter
and run the callback event on the specified loop.

as the task manager is no longer used, it has been removed from
the dns_requestmgr structure.  the dns_resolver_taskmgr() function
is also no longer used and has been removed.
2023-02-16 14:55:06 +01:00
Tony Finch
6927a30926 Remove do-nothing header <isc/print.h>
This one really truly did nothing. No lines added!
2023-02-15 16:44:47 +00:00
Ondřej Surý
3d3d3b8c58
Use C-RW-WP lock in the dns_resolver unit
Replace the isc_mutex with isc_rwlock in the dns_resolver unit,
specifically, both fetch context and fetch counters now uses the C-RW-WP
locks.
2023-02-15 09:30:04 +01:00
Ondřej Surý
70439e2494 Add magic to fctxcount and replace the atomics with integers
Add magic value to the fctxcount, to check for completely invalid
counters, or counters that have been already destroyed.

Improve the locking around the counters, and because of that we can drop
the atomics and use simple integers - the counters were already locked
and the tiny bits that used the atomics were not worth the extra effort.
2023-02-11 20:21:47 +00:00
Aram Sargsyan
410fcbfcfe Fix a bug in resolver's resume_dslookup() function
A recent refactoring in 7e4e125e5ea5b29c946ce4646461d06a75cd8702
had introduced a logical error which could result in calling the
dns_resolver_createfetch() function with 'nameservers' pointer set
to NULL, but with 'domain' not set to NULL, which is not allowed
by the function.

Make sure 'domain' is set only when 'nsrdataset' is valid.
2023-02-07 10:41:21 +00:00
Michał Kępień
4e934bae0b BIND 9.19.9
-----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEENKwGS3ftSQfs1TU17QVz/8hFYQUFAmPAfwYPHG1pY2hhbEBp
 c2Mub3JnAAoJEO0Fc//IRWEFpmAP/23tasuol54W1dxnjGoQ7NYDV89ywQiWplyn
 syPs+iESFb3I9SlAHHhRGM0IREuDxjuexFdrIJOfZqokg36qPj+z81LRlRuRuetc
 HigGzpt2CDP41rVMsxzW3vyh2a3fTrjBKYT4tnDlsdnbwJOfFG4N/hdB7jqDPWut
 u1Itf/lD8iHhsISgFqvtKiQqc6XFwwzVAeSPH6pHnmngt16imVoQiddnw1RYn0vB
 EPcqhVvSeYS1AGWprnHpaWt8bru460iZwet+QKlxNxW6p4mOXGr6jQWqhZ+6ORDr
 Vo/a3+5Di+tNn89GJSbehLi5UQbvrcMR8WiQ54WP/k0PPTgoqMRC4PerLsNU8Vzq
 y1k18n8DMsuro92cNAdJk3gXuXYgGNF2sk9JtqwmiDo1/6G3afKfDiVKjiK1CxK0
 1CMKD+mPHCWB/H5U50oL1z89OCZDVUBUDT0YIrCBBrTIitzyXyAFkh+sjbRbdzww
 kg1GdZ4ODaydcWYH7r3RCHWDX6nkwADqGRk0SYvrJTFL2Hu150mwuxZj/5UZcmsz
 of6qh5b9yZrDrnBHgoqknnepuxiORFF7l3kk63fA13WG6S1m6h2ZONoVLw0J67dx
 mnAo0nlnWKi+TEl/CHiHcMZbeVhE/jrHAMPIcQQphKbCeQT1NPFSU2FQxa+dpix+
 V+y8x6Qb
 =TTpT
 -----END PGP SIGNATURE-----

Merge tag 'v9_19_9'

BIND 9.19.9
2023-01-25 21:16:00 +01:00
Aram Sargsyan
6ea05ac3fe Resolver query forwarding to DoT-enabled upstream servers
Implement TLS transport usage in the resolver.

Use the configured TLS transport for the forwarders in the resolver.
2023-01-20 14:45:30 +00:00
Aram Sargsyan
ec2098ca35 Cancel all fetch events in dns_resolver_cancelfetch()
Although 'dns_fetch_t' fetch can have two associated events, one for
each of 'DNS_EVENT_FETCHDONE' and 'DNS_EVENT_TRYSTALE' types, the
dns_resolver_cancelfetch() function is designed in a way that it
expects only one existing event, which it must cancel, and when it
happens so that 'stale-answer-client-timeout' is enabled and there
are two events, only one of them is canceled, and it results in an
assertion in dns_resolver_destroyfetch(), when it finds a dangling
event.

Change the logic of dns_resolver_cancelfetch() function so that it
cancels both the events (if they exist), and in the right order.
2023-01-12 12:43:32 +01:00
Evan Hunt
916ea26ead remove nonfunctional DSCP implementation
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.

To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.
2023-01-09 12:15:21 -08:00
Aram Sargsyan
53afe1f978 Fix an ADB quota management error in the resolver
Normally, when a 'resquery_t' object is created in fctx_query(),
we call dns_adb_beginudpfetch() (which increases the ADB quota)
only if it's a UDP query. Then, in fctx_cancelquery(), we call
dns_adb_endudpfetch() to decreases back the ADB quota, again only
if it's a UDP query.

The problem is that a UDP query can become a TCP query, preventing
the quota from adjusting back in fctx_cancelquery() later.

Call dns_adb_beginudpfetch() also when switching the query type
from UDP to TCP.
2022-12-23 09:45:20 +00:00
Ondřej Surý
aea251f3bc
Change the isc_buffer_reserve() to take just buffer pointer
The isc_buffer_reserve() would be passed a reference to the buffer
pointer, which was unnecessary as the pointer would never be changed
in the current implementation.  Remove the extra dereference.
2022-12-20 19:13:48 +01:00
Ondřej Surý
6f317f27ea Fix the thread safety in the dns_dispatch unit
The dispatches are not thread-bound, and used freely between various
threads (see the dns_resolver and dns_request units for details).

This refactoring make sure that all non-const dns_dispatch_t and
dns_dispentry_t members are accessed under a lock, and both object now
track their internal state (NONE, CONNECTING, CONNECTED, CANCELED)
instead of guessing the state from the state of various struct members.

During the refactoring, the artificial limit DNS_DISPATCH_SOCKSQUOTA on
UDP sockets per dispatch was removed as the limiting needs to happen and
happens on in dns_resolver and limiting the number of UDP sockets
artificially in dispatch could lead to unpredictable behaviour in case
one dispatch has the limit exhausted by others are idle.

The TCP artificial limit of DNS_DISPATCH_MAXREQUESTS makes even less
sense as the TCP connections are only reused in the dns_request API
that's not a heavy user of the outgoing connections.

As a side note, the fact that UDP and TCP dispatch pretends to be same
thing, but in fact the connected UDP is handled from dns_dispentry_t and
dns_dispatch_t acts as a broker, but connected TCP is handled from
dns_dispatch_t and dns_dispatchmgr_t acts as a broker doesn't really
help the clarity of this unit.

This refactoring kept to API almost same - only dns_dispatch_cancel()
and dns_dispatch_done() were merged into dns_dispatch_done() as we need
to cancel active netmgr handles in any case to not leave dangling
connections around.  The functions handling UDP and TCP have been mostly
split to their matching counterparts and the dns_dispatch_<function>
functions are now thing wrappers that call <udp|tcp>_dispatch_<function>
based on the socket type.

More debugging-level logging was added to the unit to accomodate for
this fact.
2022-12-19 11:42:13 +01:00
Aram Sargsyan
03442d922b Clean up and refactor dns_adb_getcookie()
The dns_adb_getcookie() doesn't use the 'adb' parameter, remove it.

Refactor the dns_adb_getcookie() function to just return the size of
the cookie when the caller passes 'NULL' as the 'cookie' argument.
2022-12-15 12:34:26 +00:00
Ondřej Surý
5466a48fc9
Try next server on resolver timeout
Instead of resending to the same server on the (dispatch) timeout in the
resolver, try the next server.
2022-12-14 18:49:18 +01:00
Ondřej Surý
7292ee6d92
Fix intermittent memory leak in dns_resolver unit
A rdataset could have been left unassociated on the error path in the
resume_dslookup() in the dns_resolver unit.  Clone the rdataset after
the error check, so it's not cloned before we check whether we can make
further progress chasing DS records.
2022-12-14 10:48:06 +01:00
Ondřej Surý
7e4e125e5e Refactor the dns_resolver fetch context hash tables and locking
This is second in the series of fixing the usage of hashtables in the
dns_adb and the dns_resolver units.

Currently, the fetch buckets (used to hold the fetch context) and zone
buckets (used to hold per-domain counters) would never get cleaned from
the memory.  Combined with the fact that the hashtable now grows as
needed (instead of using hashtable as buckets), the memory usage in the
resolver can just grow and it never drops down.

In this commit, the usage of hashtables (hashmaps) has been completely
rewritten, so there are no "buckets" and all the matching conditions are
directly mapped into the hashtable key:

 1. For per-domain counter hashtable, this is simple as the lowercase
    domain name is used directly as a counter.

 2. For fetch context hashtable, this requires copying some extra flags
    back and forth in the key.

As we don't hold the "buckets" forever, the cleaning mechanism has been
rewritten as well:

 1. For per-domain counter hashtable, this is again much simpler, as we
    only need to check whether the usage counter is still zero under the
    lock and bail-out on cleaning if the counter is in use.

 2. For fetch context hashtable, this is more complicated as the fetch
    context cannot be reused after it has been finished.  The algorithm
    is different, the fetch context is always removed from the
    hashtable, but if we find the fetch context that has been marked
    as finished in the lookup function, we help with the cleaning from
    the hashtable and try again.

Couple of additional changes have been implemented in this refactoring
as those were needed for correct functionality and could not be split
into individual commits (or would not make sense as seperate commits):

 1. The dns_resolver_createfetch() has an option to create "unshared"
    fetch.  The "unshared" fetch will never get matched, so there's
    little point in storing the "unshared" fetch in the hashtable.
    Therefore the "unshared" fetches are now detached from the
    hashtable and live just on their own.

 2. Replace the custom reference counting with ISC_REFCOUNT_DECL/IMPL
    macros for better tracing.

 3. fctx_done_detach() is idempotent, it makes the "final" detach (the
    one matching the create function) only once.  But that also means
    that it has to be called before the detach that kept the fetch
    context alive in the callback.  A new macro fctx_done_unref() has
    been added to allow this code flow:

    fctx_done_unref(fctx, result);
    fctx_detach(&fctx);

    Doing this the other way around could cause fctx to get destroyed in
    the fctx_unref() first and fctx_done_detach() would cause UAF.

 4. The resume_qmin() and resume_dslookup() callbacks have been
    refactored for more readability and simpler code paths.  The
    validated() callback has also received some of the simplifications,
    but it should be refactored in the future as it is bit of spaghetti
    now.
2022-12-01 11:42:46 +01:00
Ondřej Surý
50f357cb36
Refactor the dns_adb unit
The dns_adb unit has been refactored to be much simpler.  Following
changes have been made:

1. Simplify the ADB to always allow GLUE and hints

   There were only two places where dns_adb_createfind() was used - in
   the dns_resolver unit where hints and GLUE addresses were ok, and in
   the dns_zone where dns_adb_createfind() would be called without
   DNS_ADBFIND_HINTOK and DNS_ADBFIND_GLUEOK set.

   Simplify the logic by allowing hint and GLUE addresses when looking
   up the nameserver addresses to notify.  The difference is negligible
   and would cause a difference in the notified addresses only when
   there's mismatch between the parent and child addresses and we
   haven't cached the child addresses yet.

2. Drop the namebuckets and entrybuckets

   Formerly, the namebuckets and entrybuckets were used to reduced the
   lock contention when accessing the double-linked lists stored in each
   bucket.  In the previous refactoring, the custom hashtable for the
   buckets has been replaced with isc_ht/isc_hashmap, so only a single
   item (mostly, see below) would end up in each bucket.

   Removing the entrybuckets has been straightforward, the only matching
   was done on the isc_sockaddr_t member of the dns_adbentry.

   Removing the zonebuckets required GLUEOK and HINTOK bits to be
   removed because the find could match entries with-or-without the bits
   set, and creating a custom key that stores the
   DNS_ADBFIND_STARTATZONE in the first byte of the key, so we can do a
   straightforward lookup into the hashtable without traversing a list
   that contains items with different flags.

3. Remove unassociated entries from ADB database

   Previously, the adbentries could live in the ADB database even after
   unlinking them from dns_adbnames.  Such entries would show up as
   "Unassociated entries" in the ADB dump.  The benefit of keeping such
   entries is little - the chance that we link such entry to a adbname
   is small, and it's simpler to evict unlinked entries from the ADB
   cache (and the hashtable) than create second LRU cleaning mechanism.

   Unlinked ADB entries are now directly deleted from the hash
   table (hashmap) upon destruction.

4. Cleanup expired entries from the hash table

   When buckets were still in place, the code would keep the buckets
   always allocated and never shrink the hash table (hashmap).  With
   proper reference counting in place, we can delete the adbnames from
   the hash table and the LRU list.

5. Stop purging the names early when we hit the time limit

   Because the LRU list is now time ordered, we can stop purging the
   names when we find a first entry that doesn't fullfil our time-based
   eviction criteria because no further entry on the LRU list will meet
   the criteria.

Future work:

1. Lock contention

   In this commit, the focus was on correctness of the data structure,
   but in the future, the lock contention in the ADB database needs to
   be addressed.  Currently, we use simple mutex to lock the hash
   tables, because we almost always need to use a write lock for
   properly purging the hashtables.  The ADB database needs to be
   sharded (similar to the effect that buckets had in the past).  Each
   shard would contain own hashmap and own LRU list.

2. Time-based purging

   The ADB names and entries stay intact when there are no lookups.
   When we add separate shards, a timer needs to be added for time-based
   cleaning in case there's no traffic hashing to the inactive shard.

3. Revisit the 30 minutes limit

   The ADB cache is capped at 30 minutes.  This needs to be revisited,
   and at least the limit should be configurable (in both directions).
2022-11-30 10:03:24 +01:00