The previous refactoring added an assertion failure when negative RRSIG
would be added to the cache database. As result, any query for RRSIG in
any unsigned zone would trigger that assertion failure.
Allow the negative RRSIG entries to be stored in the cache database
again as not caching these would trigger new remote fetch every time
such query would be received from a client.
the if statements calling iterator_active() checked the EXISTS
flag on the header and then iterator_active() checked it again.
simplify so only the caller checks it.
As the qpcache has only one active header at the time, we can move the
SIEVE-LRU members from dns_slabheader_t to dns_slabtop_t structure thus
saving a little bit of memory in each slabheader and using it only once
per type.
The code that combines the top-level hierarchy (per-typepair) and
individual slab headers (per-version) saves a little bit of memory, but
makes the code convoluted, hard to read and hard to modify. Change the
top level hierarchy to be of different type with individual slabheaders
"hanging" from the per-typepair dns_slabtop_t structure.
This change makes the future enhancements (changing the top level data
structure for faster lookups; coupling type + sig(type) into single
slabtop) much easier.
The slabheader doesn't directly attach or link to 'db' anymore. Pass
only the memory context needed to create the slab header to make the
lack of relation ship more prominent.
Also don't call dns_slabheader_reset() from dns_slabheader_new(), it has
no added value.
Previously, when the new header was NOT added into the cache, we would
increment and then decrement stat counters immediately.
Delay incrementing the stat counters until after the newheader has
been actually added into the database.
A little cleanup to accomodate the fact that qpdb->rrsetstats is always
available was also done here.
Change the add() function in the dns_qpcache to properly return
DNS_R_UNCHANGED if the newheader was not actually consumed, and move
the dns_slabheader_destroy() call outside of the add() function.
The slabheader.c, qpzone.c and qpcache.c had couple of shared macros
that were copied and paste between the units. Move these common
attributes access macros into private header, so these can be shared
among the three compilation units.
Previously, when a negative header was stored in the cache, it would be
stored in the dns_typepair_t as .type = 0, .covers = <negative type>.
When searching the cache internally, we would have to look for both
positive and negative typepair and the slabheader .down list could be a
mix of positive and negative types.
Remove the extra representation of the negative type and simply use the
negative attribute on the slabheader. Other units (namely dns_ncache)
can still insert the (0, type) negative rdatasets into the cache, but
internally, those will be converted into (type, 0) slabheaders, and vice
versa - when binding the rdatasets, the negative (type, 0) slabheader
will be converted to (0, type) rdataset. Simple DNS_TYPEPAIR() helper
macro was added to simplify converting single rdatatype to typepair
value.
As a side-effect, the search logic in all places can exit early if
there's a negative header for the type we are looking for, f.e. when
searching for the zone cut, we don't have to walk through all the
slabheaders, if there's a stored negative slabheader.
Use dns_rdatatype_none instead of plain '0' for dns_rdatatype_t and
dns_typepair_t manipulation. While plain '0' is technically ok, it
doesn't carry the required semantic meaning, and using the named
dns_rdatatype_none constant makes the code more readable.
The RR type 0 is a reserved type for SIG[1] resource record. It should
not be ever inserted into the database nor queried. Add a special
handling to bail out quickly with DNS_R_DISALLOWED when inserting and
ISC_R_NOTFOUND when looking up TYPE0. This is also prerequisite for
stricter checks in the follow-up commit.
1. https://www.rfc-editor.org/rfc/rfc2535#section-4.1.8.1
The dns_typepair_t and dns_rdatatype_t variables were both named 'type'
in multiple places. Rename all dns_typepair_t variables to include word
'pair' in the variable name to make sure that the distinction between
the two types is more clear.
All databases in the codebase follow the same structure: a database is
an associative container from DNS names to nodes, and each node is an
associative container from RR types to RR data.
Each database implementation (qpzone, qpcache, sdlz, builtin, dyndb) has
its own corresponding node type (qpznode, qpcnode, etc). However, some
code needs to work with nodes generically regardless of their specific
type - for example, to acquire locks, manage references, or
register/unregister slabs from the heap.
Currently, these generic node operations are implemented as methods in
the database vtable, which creates problematic coupling between database
and node lifetimes. If a node outlives its parent database, the node
destructor will destroy all RR data, and each RR data destructor will
try to unregister from heaps by calling a virtual function from the
database vtable. Since the database was already freed, this causes a
crash.
This commit breaks the coupling by standardizing the layout of all
database nodes, adding a dedicated vtable for node operations, and
moving node-specific methods from the database vtable to the node
vtable.
All the applications built on top of the loop manager were required to
create just a single instance of the loop manager. Refactor the loop
manager to not expose this instance to the callers and keep the loop
manager object internal to the isc_loop compilation unit.
This significantly simplifies a number of data structures and calls to
the isc_loop API.
dns_qp_lookup was returning ISC_R_NOTFOUND rather than DNS_R_PARTIALMATCH
when there wasn't a parent with a NSEC record in the cache. This was
causing find_coveringnsec to fail rather than returing the covering NSEC.
Is there a time when new_qp(c|z)node() would not be followed by
assignment of the namespace? No, so let's add the assignment to the
function that creates the node.
Now that we have to code working, rename 'dns_qp_lookup2' back to
'dns_qp_lookup' and adjust all remaining 'dns_qp_lookup' occurrences
to take a space=0 parameter.
For now we only allow DNS_DB_NSEC_* values so it makes sense to change
the type to an enum.
Rename 'denial' to the more intuitive 'space', indicating the namespace
of the keyvalue pair.
In preparation to merge the three qp tries (tree, nsec, nsec3) into
one, add the piece of information into the qpkey. This is the most
significant bit of information, so prepend the denial type to the qpkey.
This means we need to pass on the denial type when constructing the
qpkey from a name, or doing a lookup.
Reuse the the DNS_DB_NSEC_* values. Most qp tries in the code we just
pass on 0 (nta, rpz, zt, etc.), because there is no need for denial of
existence, but for qpzone and qpcache we must pass the right value.
Change the code, so that node->nsec no longer can have the value
DNS_DB_NSEC_HAS_NSEC, instead track this in a new attribute 'havensec'.
Since we use node->nsec to convert names to keys, the value MUST be set
before inserting the node into the qp-trie.
Update the fuzzing and unit tests accordingly. This only adds a few
extra test cases, more are needed.
In the qp_test.c we can remove test code for empty keys as this is
no longer possible.
RRset ordering is now an enum inside struct rdataset attributes. This
was done to keep size to of the structure to its original value before
this MR.
I expect zero performance impact but it should be easier to deal with
attributes in debuggers and language servers.
Per pspacek, currently qp and qpcache logs are too verbose and enabled at a
level too low compared to how often the logging is useful.
This commit increases the logging level, while keeping it configurable
via a define.
qp-tries allocate their nodes (twigs) in chunks to reduce allocator
pressure and improve memory locality. The choice of chunk size presents
a tradeoff: larger chunks benefit qp-tries with many values (as seen
in large zones and resolvers) but waste memory in smaller use cases.
Previously, our fixed chunk size of 2^10 twigs meant that even an
empty qp-trie would consume 12KB of memory, while reducing this size
would negatively impact resolver performance.
This commit implements an adaptive chunking strategy that:
- Tracks the size of the most recently allocated chunk.
- Doubles the chunk size for each new allocation until reaching a
predefined maximum.
This approach effectively balances memory efficiency for small tries
while maintaining the performance benefits of larger chunk sizes for
bigger data structures.
This commit also splits the callback freeing qpmultis into two
phases, one that frees the underlying qptree, and one that reclaims
the qpmulti memory. In order to prevent races between the qpmulti
destructor and chunk garbage collection jobs, the second phase is
protected by reference counting.
The 'qpnode->nsec' structure member isn't protected by a lock and
there's a data race between the reading and writing parts in the
qpcache_addrdataset() function. Use a node read lock for accessing
'qpnode->nsec' in qpcache_addrdataset(). Add an additional
'qpnode->nsec != DNS_DB_NSEC_HAS_NSEC' check under a write lock
to be sure that no other competing thread changed it in the time
when the read lock is unlocked and a write lock is not acquired
yet.
The *DB_VIRTUAL value was introduced to allow the clients (presumably
ns_clients) that has been running for some time to access the cached
data that was valid at the time of its inception. The default value
of 5 minutes is way longer than longevity of the ns_client object as
the resolver will give up after 2 minutes.
Reduce the value to 10 seconds to accomodate to honour the original
more closely, but still allow some leeway for clients that started some
time in the past.
Our measurements show that even setting this value to 0 has no
statistically significant effect, thus the value of 10 seconds should be
on the safe side.
Profiles show that an high amount of CPU time spent in memset.
By removing zero initalization of certain large buffers we improve
performance in certain authoritative workloads.
This is the core implementation of the SIEVE algorithm described in the
following paper:
Zhang, Yazhuo, Juncheng Yang, Yao Yue, Ymir Vigfusson, and K V
Rashmi. “SIEVE Is Simpler than LRU: An Efficient Turn-Key Eviction
Algorithm for Web Caches,” n.d.. available online from
https://junchengyang.com/publication/nsdi24-SIEVE.pdf
In QPcache, there were two places that tried to upgrade the lock. In
clean_stale_header(), the code would try to upgrade the lock and cleanup
the header, and in qpzonode_release(), the tree lock would be optionally
upgraded, so we can cleanup the node directly if empty. These
optimizations are not needed and they have no effect on the performance.
dns_zonekey_iszonekey() was the only function defined in the
dns_zonekey module, and was only called from one place. it
makes more sense to group this with dns_dnssec functions.
The qpcache_findzonecut() accepts two "foundnames": 'foundname' and
'dcname' could be NULL. Originally, when 'dcname' would be NULL, the
'dcname' would be set to 'foundname'. Then code like this was present:
result = find_deepest_zonecut(&search, node, nodep, foundname,
rdataset,
sigrdataset DNS__DB_FLARG_PASS);
dns_name_copy(foundname, dcname);
Which basically means that we are copying the .ndata over itself for no
apparent reason. Cleanup the dcname vs foundname usage.
Co-authored-by: Evan Hunt <each@isc.org>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
The offsets were meant to speed-up the repeated dns_name operations, but
it was experimentally proven that there's actually no real-world
benefit. Remove the offsets and labels fields from the dns_name and the
static offsets fields to save 128 bytes from the fixedname in favor of
calculating labels and offsets only when needed.
Acquire the database refernce in the detachnode() to prevent the last
reference to be release while the NODE_LOCK being locked. The NODE_LOCK
is locked/unlocked inside the RCU critical section, thus it is most
probably this should not pose a problem as the database uses call_rcu
memory reclamation, but this it is still safer to acquire the reference
before releasing the node.
The "raw" version of the header was used for the noqname and the closest
proofs to save around 152 bytes of the dns_slabheader_t while bringing
an additional complexity. Remove the raw version of the dns_slabheader
API at the slight expense of having unused dns_slabheader_t data sitting
in front of the proofs.
when dns_rdataslab_fromrdataset() is run, in addition to
allocating space for a slab header, it also partially
initializes it, setting the type match rdataset->type and
rdataset->covers, the trust to rdataset->trust, and the TTL to
rdataset->ttl.
there are now two functions for creating an rdataslab from an
rdataset: dns_rdataslab_fromrdataset() creates a full slab (including
space for a slab header), and dns_rdataslab_raw_fromrdataset() creates
a raw slab.
- there are now two functions for getting rdataslab size:
dns_rdataslab_size() is for full slabs and dns_rdataslab_sizeraw()
for raw slabs. there is no longer a need for a reservelen parameter.
- dns_rdataslab_count() also no longer takes a reservelen parameter.
(currently it's never used for raw slabs, so there is no _countraw()
function.)
- dns_rdataslab_rdatasize() has been removed, because
dns_rdataslab_sizeraw() can do the same thing.
- dns_rdataslab_merge() and dns_rdataslab_subtract() both take
slabheader parameters instead of character buffers, and the
reservelen parameter has been removed.
if both rdataslabs being compared have zero length, return true.
also, since these functions are only ever called on slabheaders
with sizeof(dns_slabheader_t) as the reserve length, we can
simplify the API: remove the reservelen argument, and pass the
slabs as type dns_slabheader_t * instead of unsigned char *.
Database versions are not used in cache databases. Some places in
qpcache.c required the version argument to be NULL; others marked it
as UNUSED. Unify all cases to require version to be NULL.
Add new related_headers() function that simplifies the code
flow in qpcache_findrdataset(). Also use check_stale_header() function
to remove code duplication.
Add a helper function both_headers() that unifies the slabheader
matching for simple type: it returns true when both the type and
the matching RRSIG have been found.