The rbtnode->rpz flag was left behind when rbt and rpz were disentangled
by CHANGES #4576. Removing it makes the comment above correct again.
This reduces the flags so they fit in a 32 bit word again. On 64
bit systems there is still padding so it doesn't change the size
of an rbtnode. On 32 bit systems it reduces an rbtnode by 4 bytes.
Fibonacci hashing was implemented in four separate places (rbt.c,
rbtdb.c, resolver.c, zone.c). This commit combines them into a single
implementation. The hash_32() function is now replaced with
isc_hash_bits32().
Historically, the inline keyword was a strong suggestion to the compiler
that it should inline the function marked inline. As compilers became
better at optimising, this functionality has receded, and using inline
as a suggestion to inline a function is obsolete. The compiler will
happily ignore it and inline something else entirely if it finds that's
a better optimisation.
Therefore, remove all the occurences of the inline keyword with static
functions inside single compilation unit and leave the decision whether
to inline a function or not entirely on the compiler
NOTE: We keep the usage the inline keyword when the purpose is to change
the linkage behaviour.
Previously, the unreachable code paths would have to be tagged with:
INSIST(0);
ISC_UNREACHABLE();
There was also older parts of the code that used comment annotation:
/* NOTREACHED */
Unify the handling of unreachable code paths to just use:
UNREACHABLE();
The UNREACHABLE() macro now asserts when reached and also uses
__builtin_unreachable(); when such builtin is available in the compiler.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
The __builtin_expect() can be used to provide the compiler with branch
prediction information. The Gcc manual says[1] on the subject:
In general, you should prefer to use actual profile feedback for
this (-fprofile-arcs), as programmers are notoriously bad at
predicting how their programs actually perform.
Stop using __builtin_expect() and ISC_LIKELY() and ISC_UNLIKELY() macros
to provide the branch prediction information as the performance testing
shows that named performs better when the __builtin_expect() is not
being used.
1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
Originally, the hash table used in RBT database would be resized when it
reached certain number of elements (defined by overcommit). This was
causing resolution brownouts for busy resolvers, because the rehashing
could take several seconds to complete. This was mitigated by
pre-allocating the hash table in the RBT database used for caching to be
large-enough as determined by max-cache-size. The downside of this
solution was that the pre-allocated hash table could take a significant
chunk of the memory even when the resolver cache would be otherwise
empty because the default value for max-cache-size is 90% of available
memory.
Implement incremental resizing[1] to perform the rehashing gradually:
1. During the resize, allocate the new hash table, but keep the old
table unchanged.
2. In each lookup or delete operation, check both tables.
3. Perform insertion operations only in the new table.
4. At each insertion also move r elements from the old table to the new
table.
5. When all elements are removed from the old table, deallocate it.
To ensure that the old table is completely copied over before the new
table itself needs to be enlarged, it is necessary to increase the
size of the table by a factor of at least (r + 1)/r during resizing.
In our implementation r is equal to 1.
The downside of this approach is that the old table and the new table
could stay in memory for longer when there are no new insertions into
the hash table for prolonged periods of time as the incremental
rehashing happens only during the insertions.
The upside of this approach is that it's no longer necessary to
pre-allocate large hash table, because the RBT hash table rehashing
doesn't cause resolution brownouts anymore and thus we can use the
memory as needed.
1. https://en.m.wikipedia.org/wiki/Hash_table#Dynamic_resizing
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.
Additionally, add couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
- fixed a size comparison using "signed int" that failed if the file
size was more than 2GB, since that was treated as a negative number.
- incidentally renamed deserialize32() to just deserialize(). we no
longer have separate 32 and 64 bit rbtdb implementations.
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers. This is just the final
cleanup commit.
When "max-cache-size" is changed to "unlimited" (or "0") for a running
named instance (using "rndc reconfig"), the hash table size limit for
each affected cache DB is not reset to the maximum possible value,
preventing those hash tables from being allowed to grow as a result of
new nodes being added.
Extend dns_rbt_adjusthashsize() to interpret "size" set to 0 as a signal
to remove any previously imposed limits on the hash table size. Adjust
API documentation for dns_db_adjusthashsize() accordingly. Move the
call to dns_db_adjusthashsize() from dns_cache_setcachesize() so that it
also happens when "size" is set to 0.
Upon creation, each dns_rbt_t structure has its "maxhashbits" field
initialized to the value of the RBT_HASH_MAX_BITS preprocessor macro,
i.e. 32. When the dns_rbt_adjusthashsize() function is called for the
first time for a given RBT (for cache RBTs, this happens when they are
first created, i.e. upon named startup), it lowers the value of the
"maxhashbits" field to the number of bits required to index the
requested number of hash table slots. When a larger hash table size is
subsequently requested, the value of the "maxhashbits" field should be
increased accordingly, up to RBT_HASH_MAX_BITS. However, the loop in
the rehash_bits() function currently ensures that the number of bits
necessary to index the resized hash table will not be larger than
rbt->maxhashbits instead of RBT_HASH_MAX_BITS, preventing the hash table
from being grown once the "maxhashbits" field of a given dns_rbt_t
structure is set to any value lower than RBT_HASH_MAX_BITS.
Fix by tweaking the loop guard condition in the rehash_bits() function
so that it compares the new number of bits used for indexing the hash
table against RBT_HASH_MAX_BITS rather than rbt->maxhashbits.
When calculating the new hashtable bitsize, there was an off-by-one
error that would allow the new bitsize to be larger than maximum allowed
causing assertion failure in the rehash() function.
The hash table rework MRs (!3865, !3871) increased the default RBT hash
table size from 64 to 65,536 entries (for 64-bit architectures, that is
512 bytes before vs. 524,288 bytes after). This works fine for RBTs
used for cache databases, but since three separate RBT databases are
created for every zone loaded (RRs, NSEC, NSEC3), memory usage would
skyrocket when BIND 9 is used as an authoritative DNS server with many
zones.
The default RBT hash table size before the rework was 64 entries, this
commit reduces it to 16 entries because our educated guess is that most
zones are just couple of entries (SOA, NS, A, AAAA, MX) and rehashing
small hash tables is actually cheap. The rework we did in the previous
MRs tries to avoid growing the hash tables for big-to-huge caches where
growing the hash table comes at a price because the whole cache needs to
be locked.
There were several problems with rbt hashtable implementation:
1. Our internal hashing function returns uint64_t value, but it was
silently truncated to unsigned int in dns_name_hash() and
dns_name_fullhash() functions. As the SipHash 2-4 higher bits are
more random, we need to use the upper half of the return value.
2. The hashtable implementation in rbt.c was using modulo to pick the
slot number for the hash table. This has several problems because
modulo is: a) slow, b) oblivious to patterns in the input data. This
could lead to very uneven distribution of the hashed data in the
hashtable. Combined with the single-linked lists we use, it could
really hog-down the lookup and removal of the nodes from the rbt
tree[a]. The Fibonacci Hashing is much better fit for the hashtable
function here. For longer description, read "Fibonacci Hashing: The
Optimization that the World Forgot"[b] or just look at the Linux
kernel. Also this will make Diego very happy :).
3. The hashtable would rehash every time the number of nodes in the rbt
tree would exceed 3 * (hashtable size). The overcommit will make the
uneven distribution in the hashtable even worse, but the main problem
lies in the rehashing - every time the database grows beyond the
limit, each subsequent rehashing will be much slower. The mitigation
here is letting the rbt know how big the cache can grown and
pre-allocate the hashtable to be big enough to actually never need to
rehash. This will consume more memory at the start, but since the
size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
will only consume maximum of 32GB of memory for hashtable in the
worst case (and max-cache-size would need to be set to more than
4TB). Calling the dns_db_adjusthashsize() will also cap the maximum
size of the hashtable to the pre-computed number of bits, so it won't
try to consume more gigabytes of memory than available for the
database.
FIXME: What is the average size of the rbt node that gets hashed? I
chose the pagesize (4k) as initial value to precompute the size of
the hashtable, but the value is based on feeling and not any real
data.
For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.
Notes:
a. A doubly linked list should be used here to speedup the removal of
the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
- change name of 'bytes' to 'xfrsize' in dns_db_getsize() parameter list
and related variables; this is a more accurate representation of what
the function is doing
- change the size calculations in dns_db_getsize() to more accurately
represent the space needed for a *XFR message or journal file to contain
the data in the database. previously we returned the sizes of all
rdataslabs, including header overhead and offset tables, which
resulted in the database size being reported as much larger than the
equivalent *XFR or journal.
- map files caused a particular problem here: the fullname can't be
determined from the node while a file is being deserialized, because
the uppernode pointers aren't set yet. so we store "full name length"
in the dns_rbtnode structure while serializing, and clear it after
deserialization is complete.
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
This commit was done by hand to add the RUNTIME_CHECK() around stray
dns_name_copy() calls with NULL as third argument. This covers the edge cases
that doesn't make sense to write a semantic patch since the usage pattern was
unique or almost unique.
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable. As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.
Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.