The memory context created in the clientmgr context was missing a name,
so it was nameless in the memory context statistics.
Set the clientmgr memory context name to "clientmgr".
A number of DNS implementation produce NSEC records with bad type
maps that don't contain types that exist at the name leading to
NODATA responses being synthesize instead of the records in the
zone. NSEC records with these bad type maps often have the NSEC
NSEC field set to '\000.QNAME'. We look for the first label of
this pattern.
e.g.
example.com NSEC \000.example.com SOA NS NSEC RRSIG
example.com RRRSIG NSEC ...
example.com SOA ...
example.com RRRSIG SOA ...
example.com NS ...
example.com RRRSIG NS ...
example.com A ...
example.com RRRSIG A ...
A is missing from the type map.
This introduces a temporary option 'reject-000-label' to control
this behaviour.
With isc_mem_get() and dns_name_dup() no longer being able to fail, some
functions can now only return ISC_R_SUCCESS. Change the return type to
void for the following function(s):
* dns_view_adddelegationonly()
* dns_view_excludedelegationonly()
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.
Additionally, add couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
The flow of operations in dispatch is changing and will now be similar
for both UDP and TCP queries:
1) Call dns_dispatch_addresponse() to assign a query ID and register
that we'll be listening for a response with that ID soon. the
parameters for this function include callback functions to inform the
caller when the socket is connected and when the message has been
sent, as well as a task action that will be sent when the response
arrives. (later this could become a netmgr callback, but at this
stage to minimize disruption to the calling code, we continue to use
isc_task for the response event.) on successful completion of this
function, a dispatch entry object will be instantiated.
2) Call dns_dispatch_connect() on the dispatch entry. this runs
isc_nm_udpconnect() or isc_nm_tcpdnsconnect(), as needed, and begins
listening for responses. the caller is informed via a callback
function when the connection is established.
3) Call dns_dispatch_send() on the dispatch entry. this runs
isc_nm_send() to send a request.
4) Call dns_dispatch_removeresponse() to terminate listening and close
the connection.
Implementation comments below:
- As we will be using netmgr buffers now. code to send the length in
TCP queries has also been removed as that is handled by the netmgr.
- TCP dispatches can be used by multiple simultaneous queries, so
dns_dispatch_connect() now checks whether the dispatch is already
connected before calling isc_nm_tcpdnsconnect() again.
- Running dns_dispatch_getnext() from a non-network thread caused a
crash due to assertions in the netmgr read functions that appear to be
unnecessary now. the assertions have been removed.
- fctx->nqueries was formerly incremented when the connection was
successful, but is now incremented when the query is started and
decremented if the connection fails.
- It's no longer necessary for each dispatch to have a pool of tasks, so
there's now a single task per dispatch.
- Dispatch code to avoid UDP ports already in use has been removed.
- dns_resolver and dns_request have been modified to use netmgr callback
functions instead of task events. some additional changes were needed
to handle shutdown processing correctly.
- Timeout processing is not yet fully converted to use netmgr timeouts.
- Fixed a lock order cycle reported by TSAN (view -> zone-> adb -> view)
by by calling dns_zt functions without holding the view lock.
The last remaining defines needed for platforms without NAME_MAX and
PATH_MAX (I'm looking at you, GNU Hurd) were moved to isc/dir.h where
it's prevalently used.
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.
We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).
Each clientmgr object than would have a single task and single memory
context.
The 'checknames' field wasn't initialized in dns_view_create(), but it
should otherwise AddressSanitizer identifies the following runtime error
in query_test.c.
runtime error: load of value 190, which is not a valid value for type '_Bool'
On 24-core machine, the tests would crash because we would run out of
the hazard pointers. We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.
Note: This is just a band-aid and needs a proper fix.
Add support for a "tls" key/value pair for zone primaries, referencing
either a "tls" configuration statement or "ephemeral". If set to use
TLS, zones will send SOA and AXFR/IXFR queries over a TLS channel.
Hold a weak reference to the view so that it can't go away while
nta is performing its lookups. Cancel nta timers once all external
references to the view have gone to prevent them triggering new work.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range. Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.
There were several problems with rbt hashtable implementation:
1. Our internal hashing function returns uint64_t value, but it was
silently truncated to unsigned int in dns_name_hash() and
dns_name_fullhash() functions. As the SipHash 2-4 higher bits are
more random, we need to use the upper half of the return value.
2. The hashtable implementation in rbt.c was using modulo to pick the
slot number for the hash table. This has several problems because
modulo is: a) slow, b) oblivious to patterns in the input data. This
could lead to very uneven distribution of the hashed data in the
hashtable. Combined with the single-linked lists we use, it could
really hog-down the lookup and removal of the nodes from the rbt
tree[a]. The Fibonacci Hashing is much better fit for the hashtable
function here. For longer description, read "Fibonacci Hashing: The
Optimization that the World Forgot"[b] or just look at the Linux
kernel. Also this will make Diego very happy :).
3. The hashtable would rehash every time the number of nodes in the rbt
tree would exceed 3 * (hashtable size). The overcommit will make the
uneven distribution in the hashtable even worse, but the main problem
lies in the rehashing - every time the database grows beyond the
limit, each subsequent rehashing will be much slower. The mitigation
here is letting the rbt know how big the cache can grown and
pre-allocate the hashtable to be big enough to actually never need to
rehash. This will consume more memory at the start, but since the
size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
will only consume maximum of 32GB of memory for hashtable in the
worst case (and max-cache-size would need to be set to more than
4TB). Calling the dns_db_adjusthashsize() will also cap the maximum
size of the hashtable to the pre-computed number of bits, so it won't
try to consume more gigabytes of memory than available for the
database.
FIXME: What is the average size of the rbt node that gets hashed? I
chose the pagesize (4k) as initial value to precompute the size of
the hashtable, but the value is based on feeling and not any real
data.
For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.
Notes:
a. A doubly linked list should be used here to speedup the removal of
the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
Function dns_view_findzonecut in view.c wasn't correctly handling
classes other than IN (chaos, hesiod, etc) whenever the name being
looked up wasn't in cache or in any of the configured zone views' database.
That resulted in a NULL fname being used in resolver.c:4900, which
in turn was triggering abort.
this function is used by dns_view_untrust() to handle revoked keys, so
it will still be needed after the keytable/validator refactoring is
complete, even though the keytable will be storing DS trust anchors
instead of keys. to simplify the way it's called, it now takes a DNSKEY
rdata struct instead of a DST key.
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable. As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.
Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
This commit add RUNTIME_CHECK() around all simple dns_name_copy() calls where
the third argument is NULL using the semantic patch from the previous commit.
so that cleanup can all be done in dns_view_weakattach().
(cherry picked from commit be8af3afb711a04ac140b1debce3b950922d714f)
(cherry picked from commit e3946327035832ec03c064410e5c6881841f8f6f)
There's a deadlock in BIND 9 code where (dns_view_t){ .lock } and
(dns_resolver_t){ .buckets[i].lock } gets locked in different order. When
view->weakrefs gets converted to a reference counting we can reduce the locking
in dns_view_weakdetach only to cases where it's the last instance of the
dns_view_t object.
(cherry picked from commit a7c9a52c89146490a0e797df2119026a268f294c)
(cherry picked from commit 232140edae6b80f8344db234cda28f8456269a6e)
If named is configured to perform DNSSEC validation and also forwards
all queries ("forward only;") to validating resolvers, negative trust
anchors do not work properly because the CD bit is not set in queries
sent to the forwarders. As a result, instead of retrieving bogus DNSSEC
material and making validation decisions based on its configuration,
named is only receiving SERVFAIL responses to queries for bogus data.
Fix by ensuring the CD bit is always set in queries sent to forwarders
if the query name is covered by an NTA.
- "hook" is now used only for hook points and hook actions
- the "hook" statement in named.conf is now "plugin"
- ns_module and ns_modlist are now ns_plugin and ns_plugins
- ns_module_load is renamed ns_plugin_register
- the mandatory functions in plugin modules (hook_register,
hook_check, hook_version, hook_destroy) have been renamed
- use a per-view module list instead of global hook_modules
- create an 'instance' pointer when registering modules, store it in
the module structure, and use it as action_data when calling
hook functions - this enables multiple module instances to be set
up in parallel
- also some nomenclature changes and cleanup
- make some cfg-parsing functions global so they can be run
from filter-aaaa.so
- add filter-aaaa options to the hook module's parser
- mark filter-aaaa options in named.conf as obsolete, remove
from named and checkconf, and update the filter-aaaa test not to
use checkconf anymore
- remove filter-aaaa-related struct members from dns_view
- allow multiple "hook" statements at global or view level
- add "optional bracketed text" type for optional parameter list
- load hook module from specified path rather than hardcoded path
- add a hooktable pointer (and a callback for freeing it) to the
view structure
- change the hooktable functions so they no longer update ns__hook_table
by default, and modify PROCESS_HOOK so it uses the view hooktable, if
set, rather than ns__hook_table. (ns__hook_table is retained for
use by unit tests.)
- update the filter-aaaa system test to load filter-aaaa.so
- add a prereq script to check for dlopen support before running
the filter-aaaa system test
not yet done:
- configuration parameters are not being passed to the filter-aaaa
module; the filter-aaaa ACL and filter-aaaa-on-{v4,v6} settings are
still stored in dns_view