> Put a space before opening parentheses only after control statement
> keywords (for/if/while...) except this option doesn’t apply to ForEach
> and If macros. This is useful in projects where ForEach/If macros are
> treated as function calls instead of control statements.
All databases in the codebase follow the same structure: a database is
an associative container from DNS names to nodes, and each node is an
associative container from RR types to RR data.
Each database implementation (qpzone, qpcache, sdlz, builtin, dyndb) has
its own corresponding node type (qpznode, qpcnode, etc). However, some
code needs to work with nodes generically regardless of their specific
type - for example, to acquire locks, manage references, or
register/unregister slabs from the heap.
Currently, these generic node operations are implemented as methods in
the database vtable, which creates problematic coupling between database
and node lifetimes. If a node outlives its parent database, the node
destructor will destroy all RR data, and each RR data destructor will
try to unregister from heaps by calling a virtual function from the
database vtable. Since the database was already freed, this causes a
crash.
This commit breaks the coupling by standardizing the layout of all
database nodes, adding a dedicated vtable for node operations, and
moving node-specific methods from the database vtable to the node
vtable.
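As a rough illustration of the idea (not the actual BIND 9 definitions),
a standardized node header with its own method table could look like the
sketch below. All type and function names here are hypothetical; the
point is only that generic code dispatches through the node itself and
never touches a database vtable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the BIND 9 API. */
typedef struct node node_t;

/* Operations that generic code needs on any node type. */
typedef struct node_methods {
	void (*attach)(node_t *node);
	void (*detach)(node_t *node);
} node_methods_t;

/* Common header placed first in every concrete node type. */
struct node {
	const node_methods_t *methods;
	unsigned int references;
};

/* A concrete node type embeds the common header as its first member. */
typedef struct zone_node {
	node_t common;
	int destroyed; /* set when the last reference is dropped */
} zone_node_t;

static void
zone_node_attach(node_t *node) {
	node->references++;
}

static void
zone_node_detach(node_t *node) {
	assert(node->references > 0);
	if (--node->references == 0) {
		/* Valid because 'common' is the first member. */
		((zone_node_t *)node)->destroyed = 1;
	}
}

static const node_methods_t zone_node_methods = {
	.attach = zone_node_attach,
	.detach = zone_node_detach,
};

/* Generic helpers dispatch through the node, never through a database. */
void
node_attach(node_t *node) {
	node->methods->attach(node);
}

void
node_detach(node_t *node) {
	node->methods->detach(node);
}

void
zone_node_init(zone_node_t *zn) {
	zn->common.methods = &zone_node_methods;
	zn->common.references = 1;
	zn->destroyed = 0;
}
```

Because the method table is reachable from the node alone, dropping the
last reference works even if the owning database has already been freed.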
All the applications built on top of the loop manager were required to
create just a single instance of the loop manager. Refactor the loop
manager to not expose this instance to the callers and keep the loop
manager object internal to the isc_loop compilation unit.
This significantly simplifies a number of data structures and calls to
the isc_loop API.
Instead of giving the memory contexts names with an explicit call to
isc_mem_setname(), add the name to the isc_mem_create() call so that
every memory context is unconditionally named.
Replace the complicated mechanism that could be (in theory) used by
external libraries to register new categories and modules with
statically defined lists in <isc/log.h>. This is similar to what we
have done for <isc/result.h> result codes. All the libraries are now
internal to BIND 9, so we don't need to provide a mechanism to register
extra categories and modules.
Previously, the number of RR types for a single owner name was limited
only by the maximum number of the types (64k). As the data structure
that holds the RR types for the database node is just a linked list, and
there are places where we just walk through the whole list (again and
again), adding a large number of RR types for a single owner name
would slow down processing of that name (database node).
Add a configurable limit to cap the number of the RR types for a single
owner. This is enforced at the database (rbtdb, qpzone, qpcache) level
and configured with new max-types-per-name configuration option that
can be configured globally, per-view and per-zone.
Previously, the number of RRs in an RRSet was internally unlimited.
As the data structure that holds the RRs is just a linked list, and
there are places where we just walk through all of the RRs, adding an
RRSet with a huge number of RRs inside would slow down processing of
that RRSet.
Add a configurable limit to cap the number of the RRs in a single RRSet.
This is enforced at the database (rbtdb, qpzone, qpcache) level and
configured with new max-records-per-type configuration option that can
be configured globally, per-view and per-zone.
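The two limits above follow the same pattern: count what is already
stored and refuse the addition once the cap is reached. A minimal sketch
of that check for a linked list of records follows. The types, the
result codes, and the convention that 0 means "unlimited" are
assumptions made for illustration, not the database code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; the real database structures differ. */
typedef struct rr {
	struct rr *next;
	int data;
} rr_t;

typedef struct rrset {
	rr_t *head;
	size_t count;
	size_t max_records; /* assumption: 0 means unlimited */
} rrset_t;

typedef enum { ADD_OK, ADD_TOOMANY } add_result_t;

/* Refuse the addition instead of letting the list grow without bound. */
add_result_t
rrset_add(rrset_t *set, rr_t *rr) {
	assert(set != NULL && rr != NULL);
	if (set->max_records != 0 && set->count >= set->max_records) {
		return ADD_TOOMANY;
	}
	rr->next = set->head;
	set->head = rr;
	set->count++;
	return ADD_OK;
}
```

Keeping a running count avoids walking the list just to enforce the cap.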
Replace the ISC_LIST based deadnodes implementation with isc_queue,
which is wait-free: we don't have to acquire either the tree lock or
the node lock to append nodes to the queue, and the cleaning process
can copy (splice) the list into a local copy without locking the list.
Currently, there's little benefit to this as we need to hold those
locks anyway, but in the future, as we move to an RCU based
implementation, this will be ready.
To align the cleaning with our event loop based model, remove the
hardcoded count for the node locks and use the number of the event loops
instead. This way, each event loop can have its own cleaning as part of
the process. Use uniform random numbers to spread the nodes evenly
between the buckets (instead of hashing the domain name).
When the cache's memory context was in the overmem state at the time
the cache was flushed, LRU cleaning removed newly entered data from the
new cache straight away, until enough of the old cache had been
destroyed to take the context out of the overmem state. When flushing
the cache, create a new memory context for the new db to prevent this.
The dns_cache_flush() drops the old database and creates a new one, but
it forgets to pass the loop that runs the node pruning and cleaning
the rbtdb when flushing it next time. This causes the cleaning to skip
the parent nodes (with .down == NULL), leading to increased memory
usage over time until the database is unable to keep up and just
stays overmem all the time.
by default, QPDB is the database used by named and all tools and
unit tests. the old default of RBTDB can now be restored by using
"configure --with-zonedb=rbt --with-cachedb=rbt".
some tests have been fixed so they will work correctly with either
database.
CHANGES and release notes have been updated to reflect this change.
replace the string "rbt" throughout BIND with "qp" so that
qpdb databases will be used by default instead of rbtdb.
rbtdb databases can still be used by specifying "database rbt;"
in a zone statement.
Previously, there were two methods of working with the overmem
condition:
1. hi/lo water callback - when the overmem condition was reached
for the first time, the water callback was called with HIWATER
mark and .is_overmem boolean was set internally. Similarly,
when the used memory went below the lo water mark, the water
callback would be called with LOWATER mark and .is_overmem
was reset. This check would be called **every** time memory
was allocated or freed.
2. isc_mem_isovermem() - a simple getter for the internal
.is_overmem flag
This commit removes the first method and moves the hi/lo water checks
to the isc_mem_isovermem() function. We now have only a single method
of checking the overmem condition, and the check for hi/lo water is
removed from the hot path for memory contexts that don't use overmem
checks.
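A simplified sketch of the resulting shape is shown below, assuming a
single-threaded context. The struct, the function names, and the exact
comparisons against the marks are illustrative assumptions. The real
isc_mem code differs, but the structure is the same: allocation and free
only adjust a counter, and the water marks are compared only when the
caller asks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory context; not the isc_mem_t layout. */
typedef struct mem {
	size_t inuse;
	size_t hi_water; /* assumption: 0 means overmem checks are unused */
	size_t lo_water;
	bool is_overmem;
} mem_t;

/* Hot path: allocation and free only adjust the counter. */
void
mem_get(mem_t *m, size_t size) {
	m->inuse += size;
}

void
mem_put(mem_t *m, size_t size) {
	assert(m->inuse >= size);
	m->inuse -= size;
}

/* The water marks are compared only when the caller asks. */
bool
mem_isovermem(mem_t *m) {
	if (m->hi_water == 0) {
		return false;
	}
	if (!m->is_overmem && m->inuse > m->hi_water) {
		m->is_overmem = true;
	} else if (m->is_overmem && m->inuse < m->lo_water) {
		m->is_overmem = false;
	}
	return m->is_overmem;
}
```

The flag keeps the hysteresis of the old callback scheme: once set, it
stays set until usage drops below the low-water mark.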
Commit 4db150437e14b28c5b50ae466af9ce502fd73185 incorrectly removed the
call to isc_mem_setwater() from dns_cache_setcachesize(). The water()
function is a no-op, but we still need to set high- and low-water marks
in the memory context, otherwise overmem conditions will not be
detected.
When flushing the cache, we create a new cache database. The serve-stale
settings need to be restored after doing this. We already did this
for max-stale-ttl, but forgot to do this for stale-refresh-time.
The isc_stats_create() can no longer return anything other than
ISC_R_SUCCESS. Refactor isc_stats_create() and its variants in libdns,
libns and named to just return void.
to reduce the amount of common code that will need to be shared
between the separated cache and zone database implementations,
clean up unused portions of dns_db.
the methods dns_db_dump(), dns_db_isdnssec(), dns_db_printnode(),
dns_db_resigned(), dns_db_expirenode() and dns_db_overmem() were
either never called or were only implemented as nonoperational stub
functions: they have now been removed.
dns_db_nodefullname() was only used in one place, which turned out
to be unnecessary, so it has also been removed.
dns_db_ispersistent() and dns_db_transfernode() are used, but only
the default implementation in db.c was ever actually called. since
they were never overridden by database methods, there's no need to
retain methods for them.
in rbtdb.c, beginload() and endload() methods are no longer defined for
the cache database, because that was never used (except in a few unit
tests which can easily be modified to use the zone implementation
instead). issecure() is also no longer defined for the cache database,
as the cache is always insecure and the default implementation of
dns_db_issecure() returns false.
for similar reasons, hashsize() is no longer defined for zone databases.
implementation functions that are shared between zone and cache are now
prepended with 'dns__rbtdb_' so they can become nonstatic.
serve_stale_ttl is now a common member of dns_db.
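The "optional method with a default" pattern that this cleanup relies on
can be sketched as follows. The names are hypothetical and the zone
method is a placeholder. The only behavior taken from the description
above is that a database which defines no issecure() method is reported
as insecure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical database handle and method table. */
typedef struct db db_t;

typedef struct db_methods {
	bool (*issecure)(db_t *db); /* optional; may be NULL */
} db_methods_t;

struct db {
	const db_methods_t *methods;
};

/* Wrapper with a default: no method defined means "not secure". */
bool
db_issecure(db_t *db) {
	assert(db != NULL && db->methods != NULL);
	if (db->methods->issecure != NULL) {
		return db->methods->issecure(db);
	}
	return false;
}

static bool
zone_issecure(db_t *db) {
	(void)db;
	return true; /* placeholder; a real zone inspects its DNSSEC state */
}

/* The zone table defines the method; the cache table leaves it unset. */
const db_methods_t zone_methods = { .issecure = zone_issecure };
const db_methods_t cache_methods = { .issecure = NULL };
```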
The total memory counter again had little or no meaning once we removed
the internal memory allocator. It was just a monotonic counter that
would add up the allocation sizes but never subtract anything, so
it would be just a "big number".
The maxinuse memory counter indicated the highest amount of
memory allocated in the past. Checking and updating this high-
water mark value every time memory was allocated had an impact
on server performance, so it has been removed. Memory size can
be monitored more efficiently via an external tool logging RSS.
'DNS_DB_STALEOK' returns stale rdatasets as well as current rdatasets.
'DNS_DB_EXPIREDOK' returns expired rdatasets as well as current
rdatasets. This option is currently only set when DNS_DB_STALEOK is
also set.
The mechanism for associating a worker task to a database now
uses loops rather than tasks.
For this reason, the parameters to dns_cache_create() have been
updated to take a loop manager rather than a task manager.
The dns_cache API contained a cache cleaning mechanism that would be
disabled for 'rbt' based cache. As named doesn't have any other cache
implementations, remove the cache cleaning mechanism from dns_cache API.
Mostly generated automatically with the following semantic patch,
except where coccinelle was confused by #ifdef in lib/isc/net.c
@@ expression list args; @@
- UNEXPECTED_ERROR(__FILE__, __LINE__, args)
+ UNEXPECTED_ERROR(args)
@@ expression list args; @@
- FATAL_ERROR(__FILE__, __LINE__, args)
+ FATAL_ERROR(args)
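The semantic patch works because the macros can capture the call site
themselves. A minimal sketch of such a macro is below. REPORT_ERROR and
report_error() are hypothetical stand-ins rather than the ISC
definitions, and the message is stored in a buffer instead of being
logged so that the behavior is easy to inspect.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reporter; it stores the last message instead of logging. */
static char last_error[256];
static int last_line;

void
report_error(const char *file, int line, const char *fmt, ...) {
	char msg[128];
	va_list ap;

	assert(file != NULL && fmt != NULL);
	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);
	last_line = line;
	snprintf(last_error, sizeof(last_error), "%s:%d: %s", file, line, msg);
}

/* The macro supplies the call site, so callers pass only the message. */
#define REPORT_ERROR(...) report_error(__FILE__, __LINE__, __VA_ARGS__)

const char *
last_error_text(void) {
	return last_error;
}

int
last_error_line(void) {
	return last_line;
}
```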
Previously:
* applications were using isc_app as the base unit for running the
application and signal handling.
* networking was handled in the netmgr layer, which would start a
number of threads, each with a uv_loop event loop.
* task/event handling was done in the isc_task unit, which used
netmgr event loops to run the isc_event calls.
In this refactoring:
* the network manager now uses isc_loop instead of maintaining its
own worker threads and event loops.
* the taskmgr that manages isc_task instances now also uses isc_loopmgr,
and every isc_task runs on a specific isc_loop bound to the specific
thread.
* applications have been updated as necessary to use the new API.
* new ISC_LOOP_TEST macros have been added to enable unit tests to
run isc_loop event loops. unit tests have been updated to use this
where needed.
* isc_timer was rewritten using the uv_timer, and isc_timermgr_t was
completely removed; isc_timer objects are now directly created on the
isc_loop event loops.
* the isc_timer API has been simplified. the "inactive" timer type has
been removed; timers are now stopped by calling isc_timer_stop()
instead of resetting to inactive.
* isc_manager now creates a loop manager rather than a timer manager.
* modules and applications using isc_timer have been updated to use the
new API.
Previously, tasks could be created either unbound or bound to a specific
thread (worker loop). The unbound tasks would be assigned to a random
thread every time isc_task_send() was called. Because there's no logic
that would assign the task to the least busy worker, this just creates
unpredictability. Instead of random assignment, bind all the previously
unbound tasks to worker 0, which is guaranteed to exist.
After removing the isc_task_onshutdown(), the isc_task_shutdown() and
isc_task_destroy() became obsolete.
Remove calls to isc_task_shutdown() and replace the calls to
isc_task_destroy() with isc_task_detach().
Simplify the internal logic to destroy the task when the last reference
is removed.
The isc_task_onshutdown() was used to post an event that should be run
when the task is being shut down. This could happen explicitly in the
isc_task_shutdown() call or implicitly when we detach the last reference
to the task and there are no more events posted on the task.
This whole task onshutdown mechanism just makes things more complicated,
and it's easier to post the "shutdown" events when we are shutting down
explicitly; the existing code already always knows when it should
shut down the task that's being used to execute the onshutdown events.
Replace the isc_task_onshutdown() calls with explicit calls to execute
the shutdown tasks.
Historically, the inline keyword was a strong suggestion to the compiler
that it should inline the function marked inline. As compilers became
better at optimising, this functionality has receded, and using inline
as a suggestion to inline a function is obsolete. The compiler will
happily ignore it and inline something else entirely if it finds that's
a better optimisation.
Therefore, remove all the occurrences of the inline keyword on static
functions inside a single compilation unit and leave the decision
whether to inline a function or not entirely to the compiler.
NOTE: We keep the use of the inline keyword when the purpose is to
change the linkage behaviour.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds the used licenses to the LICENSES/ directory
2. Adds an "isc" template for adding the copyright boilerplate
3. Changes all source files to include a copyright and SPDX license
header; this includes all the C sources, documentation, zone files,
and configuration files. There are notes in the doc/dev/copyrights
file on how to add correct headers to new files.
4. Handles the rest that can't be modified via the .reuse/dep5 file.
The binary (or otherwise unmodifiable) files could have the license
placed next to them in a <foo>.license file, but this would lead to a
cluttered repository, and most of the files handled in the .reuse/dep5
file are system test files.
dns_db_nodecount can now be used to get counts from the auxiliary
rbt databases. The existing node count is returned by
tree=dns_dbtree_main. The nsec and nsec3 node counts are returned by
dns_dbtree_nsec and dns_dbtree_nsec3 respectively.
Note that when synthesising an answer involving wildcards we look in
the cache multiple times: once for the QNAME and once for the wildcard
name, which is constructed by looking at the names from the covering
NSEC returned by the QNAME miss.
Originally, the hash table used in the RBT database would be resized
when it reached a certain number of elements (defined by overcommit).
This was causing resolution brownouts for busy resolvers, because the
rehashing could take several seconds to complete. This was mitigated by
pre-allocating the hash table in the RBT database used for caching to be
large enough as determined by max-cache-size. The downside of this
solution was that the pre-allocated hash table could take a significant
chunk of the memory even when the resolver cache would be otherwise
empty because the default value for max-cache-size is 90% of available
memory.
Implement incremental resizing[1] to perform the rehashing gradually:
1. During the resize, allocate the new hash table, but keep the old
table unchanged.
2. In each lookup or delete operation, check both tables.
3. Perform insertion operations only in the new table.
4. At each insertion also move r elements from the old table to the new
table.
5. When all elements are removed from the old table, deallocate it.
To ensure that the old table is completely copied over before the new
table itself needs to be enlarged, it is necessary to increase the
size of the table by a factor of at least (r + 1)/r during resizing.
In our implementation r is equal to 1.
The downside of this approach is that the old table and the new table
could stay in memory for longer when there are no new insertions into
the hash table for prolonged periods of time as the incremental
rehashing happens only during the insertions.
The upside of this approach is that it's no longer necessary to
pre-allocate a large hash table, because the RBT hash table rehashing
doesn't cause resolution brownouts anymore and thus we can use the
memory as needed.
1. https://en.m.wikipedia.org/wiki/Hash_table#Dynamic_resizing
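The five steps above can be sketched with a small chained hash table.
This is a simplified, single-threaded illustration with hypothetical
names and a trivial modulo hash, not the RBT database code. Deletion is
omitted for brevity, and the load factor of 1 that triggers a resize is
an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chained hash table; names are illustrative only. */
typedef struct entry {
	struct entry *next;
	uint32_t key;
} entry_t;

typedef struct table {
	entry_t **buckets;
	size_t size; /* number of buckets */
} table_t;

typedef struct ht {
	table_t cur;      /* new table: receives all insertions */
	table_t old;      /* old table; buckets == NULL when not resizing */
	size_t count;     /* entries in both tables */
	size_t old_count; /* entries still waiting in the old table */
	size_t migrate;   /* next old bucket to drain */
} ht_t;

static void
table_push(table_t *t, entry_t *e) {
	size_t i = e->key % t->size;
	e->next = t->buckets[i];
	t->buckets[i] = e;
}

static bool
table_has(const table_t *t, uint32_t key) {
	if (t->buckets == NULL) {
		return false;
	}
	for (entry_t *e = t->buckets[key % t->size]; e != NULL; e = e->next) {
		if (e->key == key) {
			return true;
		}
	}
	return false;
}

bool
ht_init(ht_t *ht, size_t size) {
	assert(size > 0);
	*ht = (ht_t){ .cur = { calloc(size, sizeof(entry_t *)), size } };
	return ht->cur.buckets != NULL;
}

/* Step 2: a lookup checks both tables. */
bool
ht_lookup(const ht_t *ht, uint32_t key) {
	return table_has(&ht->cur, key) || table_has(&ht->old, key);
}

/* Step 4 with r = 1: move one entry from the old table to the new one. */
static void
ht_migrate_one(ht_t *ht) {
	entry_t **b = ht->old.buckets;
	if (b == NULL) {
		return;
	}
	while (ht->migrate < ht->old.size && b[ht->migrate] == NULL) {
		ht->migrate++;
	}
	if (ht->migrate < ht->old.size) {
		entry_t *e = b[ht->migrate];
		b[ht->migrate] = e->next;
		table_push(&ht->cur, e);
		ht->old_count--;
	}
	if (ht->old_count == 0) {
		free(b); /* step 5: the old table is empty */
		ht->old = (table_t){ NULL, 0 };
	}
}

bool
ht_insert(ht_t *ht, uint32_t key) {
	if (ht_lookup(ht, key)) {
		return false; /* duplicate key */
	}
	entry_t *e = malloc(sizeof(*e));
	if (e == NULL) {
		return false;
	}
	/* Step 1: start a resize; doubling satisfies (r + 1)/r = 2. */
	if (ht->old.buckets == NULL && ht->count >= ht->cur.size) {
		entry_t **nb = calloc(ht->cur.size * 2, sizeof(*nb));
		if (nb != NULL) {
			ht->old = ht->cur;
			ht->old_count = ht->count;
			ht->migrate = 0;
			ht->cur = (table_t){ nb, ht->old.size * 2 };
		}
	}
	e->key = key;
	table_push(&ht->cur, e); /* step 3: insert only into the new table */
	ht->count++;
	ht_migrate_one(ht);
	return true;
}
```

With r = 1 and doubling, the old table holds n entries when the resize
starts and the new table needs n further insertions before it fills, so
the old table is always drained before the next resize is due.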
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into a 32-bit enum type in a grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessarily large.
Additionally, add a couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
as libdns is no longer exported, it's not necessary to have
init and shutdown functions. the only purpose they served
was to create a private mctx and run dst_lib_init(), which
can be called directly instead.
"cache-file" was already documented as intended for testing
purposes only and not to be used, so we can remove it without
waiting. this commit marks the option as "ancient", and
removes all the documentation and implementing code, including
dns_cache_setfilename() and dns_cache_dump().
it also removes the documentation for the '-x cachefile'
parameter to named, which had already been removed, but the man
page was not updated at the time.