Historically, the inline keyword was a strong suggestion to the compiler
that it should inline the function marked inline. As compilers became
better at optimising, this functionality has receded, and using inline
as a suggestion to inline a function is obsolete. The compiler will
happily ignore it and inline something else entirely if it finds that's
a better optimisation.
Therefore, remove all the occurences of the inline keyword with static
functions inside single compilation unit and leave the decision whether
to inline a function or not entirely on the compiler
NOTE: We keep the usage the inline keyword when the purpose is to change
the linkage behaviour.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
When dns_adb is shutting down, first the adb->shutting_down flag is set
and then task is created that runs shutdown_stage2() that sets the
shutdown flag on names and entries. However, when dns_adb_createfind()
is called, only the individual shutdown flags are being checked, and the
global adb->shutting_down flag was not checked. Because of that it was
possible for a different thread to slip in and create new find between
the dns_adb_shutdown() and dns_adb_detach(), but before the
shutdown_stage2() task is complete. This was detected by
ThreadSanitizer as data race because the zonetable might have been
already detached by dns_view shutdown process and simultaneously
accessed by dns_adb_createfind().
This commit converts the adb->shutting_down to atomic_bool to prevent
the global adb lock when creating the find.
The NAME_FETCH_A and NAME_FETCH_AAAA macros were meant to be
boolean, indicating whether the pointers were set or not, while
the NAME_FETCH_V4 and NAME_FETCH_V6 macros were meant to return
the pointer values. The latter were only used as booleans, so
they've been removed in favor of the former.
Also did some style cleanup and removed an unreachable code block.
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.
Additionally, add couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
- The `timeout_action` parameter to dns_dispatch_addresponse() been
replaced with a netmgr callback that is called when a dispatch read
times out. this callback may optionally reset the read timer and
resume reading.
- Added a function to convert isc_interval to milliseconds; this is used
to translate fctx->interval into a value that can be passed to
dns_dispatch_addresponse() as the timeout.
- Note that netmgr timeouts are accurate to the millisecond, so code to
check whether a timeout has been reached cannot rely on microsecond
accuracy.
- If serve-stale is configured, then a timeout received by the resolver
may trigger it to return stale data, and then resume waiting for the
read timeout. this is no longer based on a separate stale timer.
- The code for canceling requests in request.c has been altered so that
it can run asynchronously.
- TCP timeout events apply to the dispatch, which may be shared by
multiple queries. since in the event of a timeout we have no query ID
to use to identify the resp we wanted, we now just send the timeout to
the oldest query that was pending.
- There was some additional refactoring in the resolver: combining
fctx_join() and fctx_try_events() into one function to reduce code
duplication, and using fixednames in fetchctx and fetchevent.
- Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the
code can be simplified.
On the isc_mem water change the old water_t structure could be used
after free. Instead of introducing reference counting on the hot-path
we are going to introduce additional constraints on the
isc_mem_setwater. Once it's set for the first time, the additional
calls have to be made with the same water and water_arg arguments.
The proper way how to disable the water limit in the isc_mem context is
to call:
isc_mem_setwater(ctx, NULL, NULL, 0, 0);
this ensures that the old water callback is called with ISC_MEM_LOWATER
if the callback was called with ISC_MEM_HIWATER before.
Historically, there were some places where the limits were disabled by
calling:
isc_mem_setwater(ctx, water, water_arg, 0, 0);
which would also call the old callback, but it also causes the water_t
to be allocated and extra check to be executed because water callback is
not NULL.
This commits unifies the calls to disable water to the preferred form.
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication. In this commit, we implement the required
changes and simplify the logic for picking the EDNS Buffer Size.
1. The defaults for `edns-udp-size`, `max-udp-size` and
`nocookie-udp-size` have been changed to `1232` (the value picked by
DNS Flag Day 2020).
2. The probing heuristics that would try 512->4096->1432->1232 buffer
sizes has been removed and the resolver will always use just the
`edns-udp-size` value.
3. Instead of just disabling the PMTUD mechanism on the UDP sockets, we
now set IP_DONTFRAG (IPV6_DONTFRAG) flag. That means that the UDP
packets won't get ever fragmented. If the ICMP packets are lost the
UDP will just timeout and eventually be retried over TCP.
There were several problems with rbt hashtable implementation:
1. Our internal hashing function returns uint64_t value, but it was
silently truncated to unsigned int in dns_name_hash() and
dns_name_fullhash() functions. As the SipHash 2-4 higher bits are
more random, we need to use the upper half of the return value.
2. The hashtable implementation in rbt.c was using modulo to pick the
slot number for the hash table. This has several problems because
modulo is: a) slow, b) oblivious to patterns in the input data. This
could lead to very uneven distribution of the hashed data in the
hashtable. Combined with the single-linked lists we use, it could
really hog-down the lookup and removal of the nodes from the rbt
tree[a]. The Fibonacci Hashing is much better fit for the hashtable
function here. For longer description, read "Fibonacci Hashing: The
Optimization that the World Forgot"[b] or just look at the Linux
kernel. Also this will make Diego very happy :).
3. The hashtable would rehash every time the number of nodes in the rbt
tree would exceed 3 * (hashtable size). The overcommit will make the
uneven distribution in the hashtable even worse, but the main problem
lies in the rehashing - every time the database grows beyond the
limit, each subsequent rehashing will be much slower. The mitigation
here is letting the rbt know how big the cache can grown and
pre-allocate the hashtable to be big enough to actually never need to
rehash. This will consume more memory at the start, but since the
size of the hashtable is capped to `1 << 32` (e.g. 4 mio entries), it
will only consume maximum of 32GB of memory for hashtable in the
worst case (and max-cache-size would need to be set to more than
4TB). Calling the dns_db_adjusthashsize() will also cap the maximum
size of the hashtable to the pre-computed number of bits, so it won't
try to consume more gigabytes of memory than available for the
database.
FIXME: What is the average size of the rbt node that gets hashed? I
chose the pagesize (4k) as initial value to precompute the size of
the hashtable, but the value is based on feeling and not any real
data.
For future work, there are more places where we use result of the hash
value modulo some small number and that would benefit from Fibonacci
Hashing to get better distribution.
Notes:
a. A doubly linked list should be used here to speedup the removal of
the entries from the hashtable.
b. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
If there are more that 5 NS record for a zone only perform a
maximum of 4 address lookups for all the name servers. This
limits the amount of remote lookup performed for server
addresses at each level for a given query.
Due to a way the stdatomic.h shim is implemented on Windows, the MSVC
always things that the outside type is the largest - atomic_(u)int_fast64_t.
This can lead to false positives as this one:
lib\dns\adb.c(3678): warning C4477: 'fprintf' : format string '%u' requires an argument of type 'unsigned int', but variadic argument 2 has type 'unsigned __int64'
We workaround the issue by loading the value in a scoped local variable
with correct type first.
Both clang-tidy and uncrustify chokes on statement like this:
for (...)
if (...)
break;
This commit uses a very simple semantic patch (below) to add braces around such
statements.
Semantic patch used:
@@
statement S;
expression E;
@@
while (...)
- if (E) S
+ { if (E) { S } }
@@
statement S;
expression E;
@@
for (...;...;...)
- if (E) S
+ { if (E) { S } }
@@
statement S;
expression E;
@@
if (...)
- if (E) S
+ { if (E) { S } }
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
The isc_mempool_create() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_mempool_create() now returns void.
- TSAN can't handle more than 64 locks in one thread, lock ADB bucket-by-bucket
in TSAN mode. This means that the dump won't be consistent but it's good
enough for testing
- Use proper order when unlocking adb->namelocks and adb->entrylocks when
dumping ADB.
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable. As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.
Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
- make qname-minimization option tristate {strict,relaxed,disabled}
- go straight for the record if we hit NXDOMAIN in relaxed mode
- go straight for the record after 3 labels without new delegation or 7 labels total
- use start of fetch (and not time of response) as 'now' time for querying cache for
zonecut when following delegation.
This commit reverts the previous change to use system provided
entropy, as (SYS_)getrandom is very slow on Linux because it is
a syscall.
The change introduced in this commit adds a new call isc_nonce_buf
that uses CSPRNG from cryptographic library provider to generate
secure data that can be and must be used for generating nonces.
Example usage would be DNS cookies.
The isc_random() API has been changed to use fast PRNG that is not
cryptographically secure, but runs entirely in user space. Two
contestants have been considered xoroshiro family of the functions
by Villa&Blackman and PCG by O'Neill. After a consideration the
xoshiro128starstar function has been used as uint32_t random number
provider because it is very fast and has good enough properties
for our usage pattern.
The other change introduced in the commit is the more extensive usage
of isc_random_uniform in places where the usage pattern was
isc_random() % n to prevent modulo bias. For usage patterns where
only 16 or 8 bits are needed (DNS Message ID), the isc_random()
functions has been renamed to isc_random32(), and isc_random16() and
isc_random8() functions have been introduced by &-ing the
isc_random32() output with 0xffff and 0xff. Please note that the
functions that uses stripped down bit count doesn't pass our
NIST SP 800-22 based random test.
The three functions has been modeled after the arc4random family of
functions, and they will always return random bytes.
The isc_random family of functions internally use these CSPRNG (if available):
1. getrandom() libc call (might be available on Linux and Solaris)
2. SYS_getrandom syscall (might be available on Linux, detected at runtime)
3. arc4random(), arc4random_buf() and arc4random_uniform() (available on BSDs and Mac OS X)
4. crypto library function:
4a. RAND_bytes in case OpenSSL
4b. pkcs_C_GenerateRandom() in case PKCS#11 library
Replace dns_fixedname_init() calls followed by dns_fixedname_name()
calls with calls to dns_fixedname_initname() where it is possible
without affecting current behavior and/or performance.
This patch was mostly prepared using Coccinelle and the following
semantic patch:
@@
expression fixedname, name;
@@
- dns_fixedname_init(&fixedname);
...
- name = dns_fixedname_name(&fixedname);
+ name = dns_fixedname_initname(&fixedname);
The resulting set of changes was then manually reviewed to exclude false
positives and apply minor tweaks.
It is likely that more occurrences of this pattern can be refactored in
an identical way. This commit only takes care of the low-hanging fruit.