- isc_mempool_get() can no longer fail; when there are no more objects
in the pool, more are always allocated. checking for NULL return is
no longer necessary.
- the isc_mempool_setmaxalloc() and isc_mempool_getmaxalloc() functions
are no longer used and have been removed.
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers. This is just the final
cleanup commit.
The last remaining defines needed for platforms without NAME_MAX and
PATH_MAX (I'm looking at you, GNU Hurd) were moved to isc/dir.h where
it's prevalently used.
The old approach where each zone structure has its own mutex that
a thread needs to obtain multiple locks to do safe keyfile I/O
operations lead to a race condition ending in a possible deadlock.
Consider a zone in two views. Each such zone is stored in a separate
zone structure. A thread that needs to read or write the key files for
this zone needs to obtain both mutexes in seperate structures. If
another thread is working on the same zone in a different view, they
race to get the locks. It would be possible that thread1 grabs the
lock of the zone in view1, while thread2 wins the race for the lock
of the zone in view2. Now both threads try to get the other lock, both
of them are already locked.
Ideally, when a thread wants to do key file operations, it only needs
to lock a single mutex. This commit introduces a key management hash
table, stored in the zonemgr structure. Each time a zone is being
managed, an object is added to the hash table (and removed when the
zone is being released). This object is identified by the zone name
and contains a mutex that needs to be locked prior to reading or
writing key files.
(cherry-picked from commit ef4619366d49efd46f9fae5f75c4a67c246ba2e6)
Similar to notify, add code to send and keep track of checkds requests.
On every zone_rekey event, we will check the DS at parental agents
(but we will only actually query parental agents if theree is a DS
scheduled to be published/withdrawn).
On a zone_rekey event, we will first clear the ongoing checkds requests.
Reset the counter, to avoid continuing KSK rollover premature.
This has the risk that if zone_rekey events happen too soon after each
other, there are redundant DS queries to the parental agents. But
if TTLs and the configured durations in the dnssec-policy are sane (as
in not ridiculous short) the chance of this happening is low.
This code gathers DNSSEC keys from key files and from the DNSKEY RRset.
It is used for the 'rndc dnssec -status' command, but will also be
needed for "checkds". Turn it into a function.
Change the static function 'get_ksk_zsk' to a library function that
can be used to determine the role of a dst_key. Add checks if the
boolean parameters to store the role are not NULL. Rename to
'dst_key_role'.
When performing the 'setnsec3param' task, zones that are not loaded will have
their task rescheduled. We should do this only if the zone load is still
pending, this prevents zones that failed to load get stuck in a busy wait and
causing a hang on shutdown.
The Makefile.tests was modifying global AM_CFLAGS and LDADD and could
accidentally pull /usr/include to be listed before the internal
libraries, which is known to cause problems if the headers from the
previous version of BIND 9 has been installed on the build machine.
This commit adds a unittest that tests private rdataset_getownercase()
and rdataset_setownercase() methods from rbtdb.c. The test setups
minimal mock dns_rbtdb_t and dns_rbtdbnode_t data structures.
As the rbtdb methods are generally hidden behind layers and layers, we
include the "rbtdb.c" directly from rbtdb_test.c, and thus we can use
the private methods and data structures directly. This also opens up
opportunity to add more unittest for the rbtdb private functions without
going through all the layers.
In the code that rdataset_setownercase() and rdataset_getownercase() we
now use tolower()/toupper()/isupper() functions appropriately instead of
rolling our own code.
When locking key files for a zone, we iterate over all the views and
lock a mutex inside the zone structure. However, if we envounter an
in-view zone, we will try to lock the key files twice, one time for
the home view and one time for the in-view view. This will lead to
a deadlock because one thread is trying to get the same lock twice.
When "max-cache-size" is changed to "unlimited" (or "0") for a running
named instance (using "rndc reconfig"), the hash table size limit for
each affected cache DB is not reset to the maximum possible value,
preventing those hash tables from being allowed to grow as a result of
new nodes being added.
Extend dns_rbt_adjusthashsize() to interpret "size" set to 0 as a signal
to remove any previously imposed limits on the hash table size. Adjust
API documentation for dns_db_adjusthashsize() accordingly. Move the
call to dns_db_adjusthashsize() from dns_cache_setcachesize() so that it
also happens when "size" is set to 0.
Upon creation, each dns_rbt_t structure has its "maxhashbits" field
initialized to the value of the RBT_HASH_MAX_BITS preprocessor macro,
i.e. 32. When the dns_rbt_adjusthashsize() function is called for the
first time for a given RBT (for cache RBTs, this happens when they are
first created, i.e. upon named startup), it lowers the value of the
"maxhashbits" field to the number of bits required to index the
requested number of hash table slots. When a larger hash table size is
subsequently requested, the value of the "maxhashbits" field should be
increased accordingly, up to RBT_HASH_MAX_BITS. However, the loop in
the rehash_bits() function currently ensures that the number of bits
necessary to index the resized hash table will not be larger than
rbt->maxhashbits instead of RBT_HASH_MAX_BITS, preventing the hash table
from being grown once the "maxhashbits" field of a given dns_rbt_t
structure is set to any value lower than RBT_HASH_MAX_BITS.
Fix by tweaking the loop guard condition in the rehash_bits() function
so that it compares the new number of bits used for indexing the hash
table against RBT_HASH_MAX_BITS rather than rbt->maxhashbits.
Instead of checking the value of the variable modified two lines earlier
(the number of SOA records present at the apex of the old version of the
zone), one of the RUNTIME_CHECK() assertions in zone_postload() checks
the number of SOA records present at the apex of the new version of the
zone, which is already checked before. Fix the assertion by making it
check the correct variable.
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.
We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
When named restarts, it will examine signed zones and checks if the
current denial of existence strategy matches the dnssec-policy. If not,
it will schedule to create a new NSEC(3) chain.
However, on startup the zone database may not be read yet, fooling
BIND that the denial of existence chain needs to be created. This
results in a replacement of the previous NSEC(3) chain.
Change the code such that if the NSEC3PARAM lookup failed (the result
did not return in ISC_R_SUCCESS or ISC_R_NOTFOUND), we will try
again later. The nsec3param structure has additional variables to
signal if the lookup is postponed. We also need to save the signal
if an explicit resalt was requested.
In addition to the two added boolean variables, we add a variable to
store the NSEC3PARAM rdata. This may have a yet to be determined salt
value. We can't create the private data yet because there may be a
mismatch in salt length and the NULL salt value.
When we rewrote the zone dumping to use the separate threadpool, the
dumping would acquire the read lock for the whole time the zone dumping
process is dumping the zone.
When combined with incoming IXFR that tries to acquire the write lock on
the same rwlock, we would end up blocking all the other readers.
In this commit, we pause the dbiterator every time we get next record
and before start dumping it to the disk.
Add a call to posix_fadvise() to indicate to the kernel, that `named`
won't be needing the dumped zone files any time soon with:
* POSIX_FADV_DONTNEED - The specified data will not be accessed in the
near future.
Notes:
POSIX_FADV_DONTNEED attempts to free cached pages associated with the
specified region. This is useful, for example, while streaming large
files. A program may periodically request the kernel to free cached
data that has already been used, so that more useful cached pages are
not discarded instead.
Previously, dumping the zones to the files were quantized, so it doesn't
slow down network IO processing. With the introduction of network
manager asynchronous threadpools, we can move the IO intensive work to
use that API and we don't have to quantize the work anymore as it the
file IO won't block anything except other zone dumping processes.
Rather than having an expensive 'expired' (fka 'stale_ttl') in the
rdataset structure, that is only used to be printed in a comment on
ancient RRsets, reuse the TTL field of the RRset.
Commit a83c8cb0afd88d54b9cf67239f2495c9b0391e97 updated masterdump so
that stale records in "rndc dumpdb" output no longer shows 0 TTLs. In
this commit we change the name of the `rdataset->stale_ttl` field to
`rdataset->expired` to make its purpose clearer, and set it to zero in
cases where it's unused.
Add 'rbtdb->serve_stale_ttl' to various checks so that stale records
are not purged from the cache when they've been stale for RBTDB_VIRTUAL
(300) seconds.
Increment 'ns_statscounter_usedstale' when a stale answer is used.
Note: There was a question of whether 'overmem_purge' should be
purging ancient records, instead of stale ones. It is left as purging
stale records, since stale records could take up the majority of the
cache.
This submission is copyrighted Akamai Technologies, Inc. and provided
under an MPL 2.0 license.
This commit was originally authored by Kevin Chen, and was updated by
Matthijs Mekking to match recent serve-stale developments.
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
* dns_journal_next() leaves the read point in the journal after the
transaction header so journal_seek() should be inside the loop.
* we need to recover from transaction header inconsistencies
Additionally when correcting for <size, serial0, serial1, 0> the
correct consistency check is isc_serial_gt() rather than
isc_serial_ge(). All instances updated.
According to the measurements (recorded on GL!5085), the fillcount of 2
for namepool and fillcount of 4 for rdspool can fit 99.99% of request
for tested scenarios.
This was discovered by perf recording the single second recursive test
using flamethrower where the initial malloc lit up like a flare.
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.
We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).
Each clientmgr object than would have a single task and single memory
context.
Similarly, the resolver code would create hundreds of memory contexts
just on the resolver setup. The contention will be reduced directly in
the allocator, so for now just attach to the view memory instead of
creating separate memory context for each bucket.
dns_name_copy() has been replaced nearly everywhere with
dns_name_copynf(). this commit changes the last two uses of
the original function. afterward, we can remove the old
dns_name_copy() implementation, and replace it with _copynf().