If zone and denial data are going to be stored in the same qp storage,
the unit tests need to be updated to reflect this change. The code
changes mainly affect name to qpkey conversion, lookups, and
predecessors.
A note on predecessors: since the denial and zone data are now in the
same qp storage, the predecessor of the first name in the zone data will
consequently be the last name in the denial data.
In preparation to merge the three qp tries (tree, nsec, nsec3) into
one, add the piece of information into the qpkey. This is the most
significant bit of information, so prepend the denial type to the qpkey.
This means we need to pass on the denial type when constructing the
qpkey from a name, or doing a lookup.
Reuse the the DNS_DB_NSEC_* values. Most qp tries in the code we just
pass on 0 (nta, rpz, zt, etc.), because there is no need for denial of
existence, but for qpzone and qpcache we must pass the right value.
Change the code, so that node->nsec no longer can have the value
DNS_DB_NSEC_HAS_NSEC, instead track this in a new attribute 'havensec'.
Since we use node->nsec to convert names to keys, the value MUST be set
before inserting the node into the qp-trie.
Update the fuzzing and unit tests accordingly. This only adds a few
extra test cases, more are needed.
In the qp_test.c we can remove test code for empty keys as this is
no longer possible.
RRset ordering is now an enum inside struct rdataset attributes. This
was done to keep size to of the structure to its original value before
this MR.
I expect zero performance impact but it should be easier to deal with
attributes in debuggers and language servers.
The ns_client_t struct is reset and zero-ed out on every query,
but some fields (query, message, manager) are preserved.
We observe two things:
- The sendbuf field is going to be overwritten anyway, there's
no need to zero it out.
- The fields are copied out when the struct is zero-ed out, and
then copied back in. For the query field (which is 896 bytes)
this is very inefficient.
This commit makes the reset more efficient avoiding to unnecessary
zero-ing and copy.
Instead of having hand crafted attach/detach/destroy functions, replace
them with the standard ISC_REFCOUNT macro. This also have advantage
that delayed netmgr detach (from dns_dispatch) now doesn't cause
assertion failure. This can happen with delayed (call_rcu) shutdown of
dns_adb.
Qpzone employs a locking strategy where rwlocks are grouped into
buckets, and each zone gets 17 buckets.
This strategy is suboptimal in two ways:
- If named is serving a single zone or a zone is the majority of the
traffic, this strategy pretty much guarantees contention when using
more than a dozen threads.
- If named is serving many small zones, it causes substantial memory
usage.
This commit switches the locking to a global table initialized at start
time. This should have three effects:
- Performance should improve in the single zone case, since now we are
selecting from a bigger pool of locks.
- Memory consumption should go down significantly in the many zone
cases.
- Performance should not degrade substantially in the many zone cases.
The reason for this is that, while we could have substantially more
zones than locks, we can query/edit only O(num threads) at the same
time. So by making the global table much bigger than the expected
number of threads, we can limit contention.
In the current implementation, the resigning heap is part of the zone
database. This leads to a cycle, as the database has a reference to its
nodes, but each node needs a reference to the database.
This MR splits the resigning heap into its own separate struct, in order
to help breaking the cycle.
Recovering the node lock from a pointer to the header and a pointer to
the db is a common operation. This commit abstracts it away into a
function, so that the node lock selection logic may be modified more
easily.
Change the internal type used for isc_tid unit to isc_tid_t to hide the
specific integer type being used for the 'tid'. Internally, the signed
integer type is being used. This allows us to have negatively indexed
arrays that works both for threads with assigned tid and the threads
with unassigned tid. This should be used only in specific situations.
We need disable clang-format here to preserve the brackets around
the string concatenation to prevent -Wstring-concatenation -Werror
breaking the build.
Add support for proposed DS digest types that encode the private
algorithm identifier at the start of the DS digest as is done for
DNSKEY and RRSIG. This allows a DS record to identify the specific
DNSSEC algorithm, rather than a set of algorithms, when the algorithm
field is set to PRIVATEDNS or PRIVATEOID.
Meson is a modern build system that has seen a rise in adoption and some
version of it is available in almost every platform supported.
Compared to automake, meson has the following advantages:
* Meson provides a significant boost to the build and configuration time
by better exploiting parallelism.
* Meson is subjectively considered to be better in readability.
These merits alone justify experimenting with meson as a way of
improving development time and ergonomics. However, there are some
compromises to ensure the transition goes relatively smooth:
* The system tests currently rely on various files within the source
directory. Changing this requirement is a non-trivial task that can't
be currently justified. Currently the last compiled build directory
writes into the source tree which is in turn used by pytest.
* The minimum version supported has been fixed at 0.61. Increasing this
value will require choosing a baseline of distributions that can
package with meson. On the contrary, there will likely be an attempt
to decrease this value to ensure almost universal support for building
BIND 9 with meson.
The cache for unreachable primaries was added to BIND 9 in 2006 via
1372e172d0e0b08996376b782a9041d1e3542489. It features a 10-slot LRU
array with 600 seconds (10 minutes) fixed delay. During this time, any
primary with a hiccup would be blocked for the whole block duration
(unless overwritten by a different entry).
As this design is not very flexible (i.e. the fixed delay and the fixed
amount of the slots), redesign it based on the badcache.c module, which
was implemented earlier for a similar mechanism.
The differences between the new code and the badcache module were large
enough to create a new module instead of trying to make the badcache
module universal, which could complicate the implementation.
The new design implements an exponential backoff for entries which are
added again soon after expiring, i.e. the next expiration happens in
double the amount of time of the previous expiration, but in no more
time than the defined maximum value.
The initial and the maximum expiration values are hard-coded, but, if
required, it should be trivial to implement configurable knobs.
Special tokens can now be specified in a zone "file" option
in order to generate the filename parametrically. The first
instead of "$name" in the "file" option is replaced with the
zone origin, the first instance of "$type" is replaced with the
zone type (i.e., primary, secondary, etc), and the first instance
of "$view" is replaced with the view name..
This simplifies the creation of zones using initial-file templates.
For example:
$ rndc addzone <zonename> \
{ type primary; file "$name.db"; initial-file "template.db"
When loading a primary zone for the first time, if the zonefile
does not exist but an "initial-file" option has been set, then a
new file will be copied into place from the path specified by
"initial-file".
This can be used to simplify the process of adding new zones. For
instance, a template zonefile could be used by running:
$ rndc addzone example.com \
'{ type primary; file "example.db"; initial-file "template.db"; };'
Coverity flagged a potential divide by zero error in collect in
qpmulti.c when the elapsed time is zero but that is only called
once the elapsed time is greater than or equal to RUNTIME (1/4
second) so INSIST this is the case.
Instead of giving the memory pools names with an explicit call to
isc_mempool_setname(), add the name to isc_mempool_create() call to have
all the memory pools an unconditional name.
Instead of giving the memory context names with an explicit call to
isc_mem_setname(), add the name to isc_mem_create() call to have all the
memory contexts an unconditional name.
previously, ISC_LIST_FOREACH and ISC_LIST_FOREACH_SAFE were
two separate macros, with the _SAFE version allowing entries
to be unlinked during the loop. ISC_LIST_FOREACH is now also
safe, and the separate _SAFE macro has been removed.
similarly, the ISC_LIST_FOREACH_REV macro is now safe, and
ISC_LIST_FOREACH_REV_SAFE has also been removed.
The `max-rsa-exponent-size` could limit the exponents of the RSA
public keys during the DNSSEC verification. Instead of providing
a cryptic (not cryptographic) knob, hardcode the max exponent to
be 4096 (the theoretical maximum for DNSSEC).
The DST API has been cleaned up, duplicate functions has been squashed
into single call (verify and verify2 functions), and couple of unused
functions have been completely removed (createctx2, computesecret,
paramcompare, and cleanup).
In a previous change, the "algorithm" value passed to
dns_tsigkey_create() was changed from a DNS name to an integer;
the name was then chosen from a table of known algorithms. A
side effect of this change was that a query using an unknown TSIG
algorithm was no longer handled correctly, and could trigger an
assertion failure. This has been corrected.
The dns_tsigkey struct now stores the signing algorithm
as dst_algorithm_t value 'alg' instead of as a dns_name,
but retains an 'algname' field, which is used only when the
algorithm is DST_ALG_UNKNOWN. This allows the name of the
unrecognized algorithm name to be returned in a BADKEY
response.
Previously all kinds of TCP timeouts had a single getter and setter
functions. Separate each timeout to its own getter/setter functions,
because in majority of cases only one is required at a time, and it's
not optimal expanding those functions every time a new timeout value
is implemented.
The new 'tcp-primaries-timeout' configuration option works the same way
as the existing 'tcp-initial-timeout' option, but applies only to the
TCP connections made to the primary servers, so that the timeout value
can be set separately for them. The default is 15 seconds.
Also, while accommodating zone.c's code to support the new option, make
a light refactoring with the way UDP timeouts are calculated by using
definitions instead of hardcoded values.
We were failing to account for the length byte before the OID.
See RFC 4034.
Algorithm number 254 is reserved for private use and will never be
assigned to a specific algorithm. The public key area in the DNSKEY
RR and the signature area in the RRSIG RR begin with an unsigned
length byte followed by a BER encoded Object Identifier (ISO OID) of
that length. The OID indicates the private algorithm in use, and the
remainder of the area is whatever is required by that algorithm.
Entities should only use OIDs they control to designate their private
algorithms.
the pattern `for (x = ISC_LIST_HEAD(...); x != NULL; ISC_LIST_NEXT(...)`
has been changed to `ISC_LIST_FOREACH` throughout BIND, except in a few
cases where the change would be excessively complex.
in most cases this was a straightforward change. in some places,
however, the list element variable was referenced after the loop
ended, and the code was refactored to avoid this necessity.
also, because `ISC_LIST_FOREACH` uses typeof(list.head) to declare
the list elements, compilation failures can occur if the list object
has a `const` qualifier. some `const` qualifiers have been removed
from function parameters to avoid this problem, and where that was not
possible, `UNCONST` was used.
Previously changed mem_test (!10320) introduces a test which checks for
the value of `__FILE__`, which is different if the build is done
out-of-tree or not, even though this is not relevant for the test (only
the base filename is). This result in a broken test for out-of-tree
builds. Fix this by changing the way the "grep" is done in the test,
ignoring the optional path prefix in the filename.
When allocating memory under -m trace|record, the __FILE__ pointer is
stored, so it can be printed out later in order to figure out in which
file an allocation leaked. (among others, like the line number).
However named crashes when called with -m record and using a plugin
leaking memory. The reason is that plugins are unloaded earlier than
when the leaked allocations are dumped (obviously, as it's done as late
as possible). In such circumstances, __FILE__ is dangling because the
dynamically loaded library (the plugin) is not in memory anymore.
Fix the crash by systematically copying the __FILE__ string
instead of copying the pointer. Of course, this make each allocation to
consume a bit more memory (and longer, as it needs to calculate the
length of __FILE__) but this occurs only under -m trace|record debugging
flags.
In term of unit test, because grepping in C is not fun, and because the
whole "syntax" of the dump output is tested in other tests, this simply
search for a substring in the whole buffer to make sure the expected
allocations are found.
This is the core implementation of the SIEVE algorithm described in the
following paper:
Zhang, Yazhuo, Juncheng Yang, Yao Yue, Ymir Vigfusson, and K V
Rashmi. “SIEVE Is Simpler than LRU: An Efficient Turn-Key Eviction
Algorithm for Web Caches,” n.d.. available online from
https://junchengyang.com/publication/nsdi24-SIEVE.pdf
Unit test for isc_netaddr_masktoprefixlen were missing IPv6 mask cases.
Add those and few other IPv4 cases. Also, the test is refactored in
order to make it easy to add new cases.
The string literal initialalising compressed was too big for the
array as it has an unwanted NUL terminator. This is allowed for
in C for historical reasons but produces a warning with some
compilers. Adjust the declaration to include the NUL and adjust
the users to pass in an adjusted size which excludes the NUL rather
than sizeof(compressed).
Since algorithm fetching is handled purely in libisc, FIPS mode toggling
can be purely done in within the library instead of provider fetching in
the binary for OpenSSL >=3.0.
Disabling FIPS mode isn't a realistic requirement and isn't done
anywhere in the codebase. Make the FIPS mode toggle enable-only to
reflect the situation.
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow
empty key and signature material after the algorithm identifier for
PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within
the expected use of these types as no key material is shared and
the signatures are ineffective but these are private algorithms and
they can be totally insecure.
previously, dns_name_fromtext() took both a target name and an
optional target buffer parameter, which could override the name's
dedicated buffer. this interface is unnecessarily complex.
we now have two functions, dns_name_fromtext() to convert text
into a dns_name that has a dedicated buffer, and dns_name_wirefromtext()
to convert text into uncompressed DNS wire format and append it to a
target buffer.
in cases where it really is necessary to have both, we can use
dns_name_fromtext() to load the dns_name, then dns_name_towire()
to append the wire format to the target buffer.
dns_name_fromtext() stores the converted name in the 'name'
passed to it, and optionally also copies it in wire format to
a buffer 'target'. this makes the interface unnecessarily
complex, and could be simplified by having a different function
for each purpose. as a first step, remove uses of the target
buffer in calls to dns_name_fromtext() where it wasn't actually
needed.
this parameter was added as a (minor) optimization for
cases where dns_name_towire() is run repeatedly with the
same compression context, as when rendering all of the rdatas
in an rdataset. it is currently only used in one place.
we now simplify the interface by removing the extra parameter.
the compression offset value is now part of the compression
context, and can be activated when needed by calling
dns_compress_setmultiuse(). multiuse mode is automatically
deactivated by any subsequent call to dns_compress_permitted().