2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 13:08:06 +00:00

81 Commits

Author SHA1 Message Date
Matthijs Mekking
c5a14f263f Add namespace to new_qp(c|z)node
Is there a time when new_qp(c|z)node() would not be followed by
assignment of the namespace? No, so let's add the assignment to the
function that creates the node.
2025-07-10 13:52:59 +00:00
Matthijs Mekking
df6763fd2a Rename DNS_DB_NSEC_ constants to DNS_DBNAMESPACE_
Naming is hard exercise.
2025-07-10 13:52:59 +00:00
Matthijs Mekking
a7021a3a51 Rename dns_qp_lookup2 back to dns_qp_lookup
Now that we have to code working, rename 'dns_qp_lookup2' back to
'dns_qp_lookup' and adjust all remaining 'dns_qp_lookup' occurrences
to take a space=0 parameter.
2025-07-10 13:52:59 +00:00
Matthijs Mekking
e052e14b40 Change denial type to enum
For now we only allow DNS_DB_NSEC_* values so it makes sense to change
the type to an enum.

Rename 'denial' to the more intuitive 'space', indicating the namespace
of the keyvalue pair.
2025-07-10 13:52:59 +00:00
Matthijs Mekking
16a1c5a623 Prepend qpkey with denial byte
In preparation to merge the three qp tries (tree, nsec, nsec3) into
one, add the piece of information into the qpkey. This is the most
significant bit of information, so prepend the denial type to the qpkey.

This means we need to pass on the denial type when constructing the
qpkey from a name, or doing a lookup.

Reuse the the DNS_DB_NSEC_* values. Most qp tries in the code we just
pass on 0 (nta, rpz, zt, etc.), because there is no need for denial of
existence, but for qpzone and qpcache we must pass the right value.

Change the code, so that node->nsec no longer can have the value
DNS_DB_NSEC_HAS_NSEC, instead track this in a new attribute 'havensec'.

Since we use node->nsec to convert names to keys, the value MUST be set
before inserting the node into the qp-trie.

Update the fuzzing and unit tests accordingly. This only adds a few
extra test cases, more are needed.

In the qp_test.c we can remove test code for empty keys as this is
no longer possible.
2025-07-10 13:52:59 +00:00
Petr Špaček
750d8a61b6 Convert DNS_RDATASETATTR_ bitfield manipulation to struct of bools
RRset ordering is now an enum inside struct rdataset attributes. This
was done to keep size to of the structure to its original value before
this MR.

I expect zero performance impact but it should be easier to deal with
attributes in debuggers and language servers.
2025-07-10 11:17:19 +02:00
Alessio Podda
ef95806e05 Change QP and qpcache logging from DEBUG(1) to DEBUG(3)
Per pspacek, currently qp and qpcache logs are too verbose and enabled at a
level too low compared to how often the logging is useful.

This commit increases the logging level, while keeping it configurable
via a define.
2025-06-25 14:37:01 +02:00
alessio
70b1777d8a Adaptive memory allocation strategy for qp-tries
qp-tries allocate their nodes (twigs) in chunks to reduce allocator
pressure and improve memory locality. The choice of chunk size presents
a tradeoff: larger chunks benefit qp-tries with many values (as seen
in large zones and resolvers) but waste memory in smaller use cases.

Previously, our fixed chunk size of 2^10 twigs meant that even an
empty qp-trie would consume 12KB of memory, while reducing this size
would negatively impact resolver performance.

This commit implements an adaptive chunking strategy that:
 - Tracks the size of the most recently allocated chunk.
 - Doubles the chunk size for each new allocation until reaching a
   predefined maximum.

This approach effectively balances memory efficiency for small tries
while maintaining the performance benefits of larger chunk sizes for
bigger data structures.

This commit also splits the callback freeing qpmultis into two
phases, one that frees the underlying qptree, and one that reclaims
the qpmulti memory. In order to prevent races between the qpmulti
destructor and chunk garbage collection jobs, the second phase is
protected by reference counting.
2025-05-22 15:19:27 -07:00
Aram Sargsyan
e1a415b412 Fix a date race in qpcache_addrdataset()
The 'qpnode->nsec' structure member isn't protected by a lock and
there's a data race between the reading and writing parts in the
qpcache_addrdataset() function. Use a node read lock for accessing
'qpnode->nsec' in qpcache_addrdataset(). Add an additional
'qpnode->nsec != DNS_DB_NSEC_HAS_NSEC' check under a write lock
to be sure that no other competing thread changed it in the time
when the read lock is unlocked and a write lock is not acquired
yet.
2025-04-23 13:02:43 +00:00
Ondřej Surý
6ed821beb4
Reduce QPDB_VIRTUAL to 10 seconds
The *DB_VIRTUAL value was introduced to allow the clients (presumably
ns_clients) that has been running for some time to access the cached
data that was valid at the time of its inception.  The default value
of 5 minutes is way longer than longevity of the ns_client object as
the resolver will give up after 2 minutes.

Reduce the value to 10 seconds to accomodate to honour the original
more closely, but still allow some leeway for clients that started some
time in the past.

Our measurements show that even setting this value to 0 has no
statistically significant effect, thus the value of 10 seconds should be
on the safe side.
2025-04-16 11:21:38 +02:00
alessio
4017a40b1d Remove zero initialization of large buffers
Profiles show that an high amount of CPU time spent in memset.
By removing zero initalization of certain large buffers we improve
performance in certain authoritative workloads.
2025-04-02 16:24:31 +02:00
Ondřej Surý
1233dc8a61 Add isc_sieve unit implementing SIEVE-LRU algorithm
This is the core implementation of the SIEVE algorithm described in the
following paper:

  Zhang, Yazhuo, Juncheng Yang, Yao Yue, Ymir Vigfusson, and K V
  Rashmi. “SIEVE Is Simpler than LRU: An Efficient Turn-Key Eviction
  Algorithm for Web Caches,” n.d.. available online from
  https://junchengyang.com/publication/nsdi24-SIEVE.pdf
2025-03-26 15:36:33 -07:00
Ondřej Surý
e8a1949566
Remove lock upgrading from the hot path in the cache
In QPcache, there were two places that tried to upgrade the lock.  In
clean_stale_header(), the code would try to upgrade the lock and cleanup
the header, and in qpzonode_release(), the tree lock would be optionally
upgraded, so we can cleanup the node directly if empty.  These
optimizations are not needed and they have no effect on the performance.
2025-03-25 10:57:19 +01:00
Ondřej Surý
3ef9b09620
Fix invalid cache-line padding for qpcache buckets
The isc_queue_t was missing in the calculation of the required
padding size inside the qpcache bucket structure.
2025-03-25 10:56:21 +01:00
Evan Hunt
341b962665 move dns_zonekey_iszonekey() to dns_dnssec module
dns_zonekey_iszonekey() was the only function defined in the
dns_zonekey module, and was only called from one place. it
makes more sense to group this with dns_dnssec functions.
2025-03-20 18:22:58 +00:00
Evan Hunt
606d30796e use new dns_rdatatype classification functions
modify code to use dns_rdatatype_ismulti(), dns_rdatatype_issig(),
dns_rdatatype_isaddr(), and dns_rdatatype_isalias() where applicable.
2025-03-15 00:27:54 +00:00
Ondřej Surý
1e4695510a
Revert "fix: dev: Delete dead nodes when committing a new version"
This reverts commit 67255da4b376f65138b299dcd5eb6a3b7f9735a9, reversing
changes made to 74c9ff384e695d1b27fa365d1fee84576f869d4c.
2025-03-05 17:46:54 +01:00
Ondřej Surý
303c20caf8
Fix the foundname vs dcname madness in qpcache_findzonecut()
The qpcache_findzonecut() accepts two "foundnames": 'foundname' and
'dcname' could be NULL.  Originally, when 'dcname' would be NULL, the
'dcname' would be set to 'foundname'.  Then code like this was present:

    result = find_deepest_zonecut(&search, node, nodep, foundname,
                                  rdataset,
                                  sigrdataset DNS__DB_FLARG_PASS);
    dns_name_copy(foundname, dcname);

Which basically means that we are copying the .ndata over itself for no
apparent reason.  Cleanup the dcname vs foundname usage.

Co-authored-by: Evan Hunt <each@isc.org>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
2025-03-05 07:49:46 +01:00
Ondřej Surý
08e966df82
Remove offsets from the dns_name and dns_fixedname structures
The offsets were meant to speed-up the repeated dns_name operations, but
it was experimentally proven that there's actually no real-world
benefit.  Remove the offsets and labels fields from the dns_name and the
static offsets fields to save 128 bytes from the fixedname in favor of
calculating labels and offsets only when needed.
2025-02-25 12:17:34 +01:00
Ondřej Surý
d1ef6a93c1
Acquire the database reference before possibly last node release
Acquire the database refernce in the detachnode() to prevent the last
reference to be release while the NODE_LOCK being locked.  The NODE_LOCK
is locked/unlocked inside the RCU critical section, thus it is most
probably this should not pose a problem as the database uses call_rcu
memory reclamation, but this it is still safer to acquire the reference
before releasing the node.
2025-02-24 20:05:56 +01:00
Ondřej Surý
2fc32c105d Remove the "raw" version of the dns_slabheader API
The "raw" version of the header was used for the noqname and the closest
proofs to save around 152 bytes of the dns_slabheader_t while bringing
an additional complexity.  Remove the raw version of the dns_slabheader
API at the slight expense of having unused dns_slabheader_t data sitting
in front of the proofs.
2025-02-19 15:00:15 -08:00
Evan Hunt
82edec67a5 initialize header in dns_rdataslab_fromrdataset()
when dns_rdataslab_fromrdataset() is run, in addition to
allocating space for a slab header, it also partially
initializes it, setting the type match rdataset->type and
rdataset->covers, the trust to rdataset->trust, and the TTL to
rdataset->ttl.
2025-02-19 14:58:32 -08:00
Evan Hunt
b4bde9bef4 clarify dns_rdataslab_fromrdataset()
there are now two functions for creating an rdataslab from an
rdataset: dns_rdataslab_fromrdataset() creates a full slab (including
space for a slab header), and dns_rdataslab_raw_fromrdataset() creates
a raw slab.
2025-02-19 14:58:32 -08:00
Evan Hunt
6908d1f9be more rdataslab refactoring
- there are now two functions for getting rdataslab size:
  dns_rdataslab_size() is for full slabs and dns_rdataslab_sizeraw()
  for raw slabs. there is no longer a need for a reservelen parameter.
- dns_rdataslab_count() also no longer takes a reservelen parameter.
  (currently it's never used for raw slabs, so there is no _countraw()
  function.)
- dns_rdataslab_rdatasize() has been removed, because
  dns_rdataslab_sizeraw() can do the same thing.
- dns_rdataslab_merge() and dns_rdataslab_subtract() both take
  slabheader parameters instead of character buffers, and the
  reservelen parameter has been removed.
2025-02-19 14:58:32 -08:00
Evan Hunt
4601d4299a fix and simplify dns_rdataset_equal() and _equalx()
if both rdataslabs being compared have zero length, return true.

also, since these functions are only ever called on slabheaders
with sizeof(dns_slabheader_t) as the reserve length, we can
simplify the API: remove the reservelen argument, and pass the
slabs as type dns_slabheader_t * instead of unsigned char *.
2025-02-19 14:58:32 -08:00
Evan Hunt
e58ce19cf2 when committing a new qpzone version, delete dead nodes
if all data has been deleted from a node in the qpzone
database, delete the node too.
2025-02-18 14:22:38 -08:00
Ondřej Surý
ce9f6e68c3 Unify how we handle database version in the cache
Database versions are not used in cache databases. Some places in
qpcache.c required the version argument to be NULL; others marked it
as UNUSED. Unify all cases to require version to be NULL.
2025-02-18 20:15:00 +00:00
Ondřej Surý
2d53796e28 Clean up 'now' usage in the cache
Unify the way we handle the 'now' argument in the cache: when it's
set to zero by the caller, it is replaced with isc_stdtime_now().
2025-02-18 20:15:00 +00:00
Ondřej Surý
3b2fe808c4 Clean up the search part in qpcache_find()
Slightly refactor the header search in qpcache_find(), so the scope
level is reduced and the cname parts are logically grouped together.
2025-02-18 20:15:00 +00:00
Ondřej Surý
bfb219ac2d Refactor the search in qpcache_findrdataset()
Add new related_headers() function that simplifies the code
flow in qpcache_findrdataset().  Also use check_stale_header() function
to remove code duplication.
2025-02-18 20:15:00 +00:00
Ondřej Surý
cf66ba02a4 Refactor simple slabheader matching
Add a helper function both_headers() that unifies the slabheader
matching for simple type: it returns true when both the type and
the matching RRSIG have been found.
2025-02-18 20:15:00 +00:00
Ondřej Surý
4cd1dd8dd7 Add new helper maybe_update_headers() function
The new maybe_update_headers() function unifies the LRU updates to the
slabheaders that was scattered all over the place.  More calls to update
headers after bindrdatasets() were also added for completeness.
2025-02-18 20:15:00 +00:00
Ondřej Surý
4448f1adb2 Add bindrdatasets() function that binds both rdatasets
This removes code duplication between the dual bindrdataset() calls.  It
also unifies the handling as there were small differences between the
calls: one variant was checking for !NEGATIVE(found) condition and one
wasn't, and it is technically ok to do the check for all variants.
2025-02-18 20:15:00 +00:00
Ondřej Surý
53d9ef5bd0 Refactor check_stale_header() function
The check_stale_header() function now updates header_prev directly
so it doesn't have to be handled in the outer loop; it's always
set to the correct value of the previous header in the chain.
2025-02-18 20:15:00 +00:00
Evan Hunt
5281c708d3 clean up unnecessary code in qpcache
some code was left in the cache database implementation after
it was separated from the zone database, and can be cleaned up
and refactored now:

- the DNS_SLABHEADERATTR_IGNORE flag is never set in the cache
- support for loading the cache from was removed, but the add()
  function still had a 'loading' flag that's always false
- two different macros were used for checking the
  DNS_SLABHEADERATTR_NONEXISTENT flag - EXISTS() and NONEXISTENT().
  it's clearer to just use EXISTS().
- the cache doesn't support versions, so it isn't necessary to
  walk down the 'down' pointer chain when iterating through the
  cache or looking for a header to update.  'down' now only points
  to records that are deleted from the cache but have not yet been
  purged from memory. this allows us to simplify both the iterator
  and the add() function.
2025-02-18 20:15:00 +00:00
Ondřej Surý
1fa5219fdf
Rely on call_rcu() to destroy the qpzone outside of locks
Reduce the number of qpzone_ref() and qpzone_unref() calls in
qpzone_detachnode() by relying on the call_rcu to delay
the destruction of the lock buckets.
2025-02-04 21:37:46 +01:00
Ondřej Surý
c602d76c1f
Reduce false sharing in dns_qpcache
Instead of having many node_lock_count * sizeof(<member>) arrays, pack
all the members into a qpcache_bucket_t struct that is cacheline aligned
and have a single array of those.

Additionaly, make both the head and the tail of isc_queue_t padded, not
just the head, to prevent false sharing of the lock-free structure with
the lock that follows it.
2025-02-04 21:37:46 +01:00
Ondřej Surý
355fc48472
Print the expiration time of the stale records (not ancient)
In #1870, the expiration time of ANCIENT records were printed, but
actually the ancient records are very short lived, and the information
carries a little value.

Instead of printing the expiration of ANCIENT records, print the
expiration time of STALE records.
2025-02-03 15:47:06 +01:00
Ondřej Surý
60f6b88c63
Remove duplicate 'now' argument from find_coveringnsec()
The find_coveringnsec() was getting the 'now' from two sources -
search->now and separate now argument.  Things like this are ticking
bombs, remove the extra 'now' argument and use single source of 'now'.
2025-02-03 14:39:06 +01:00
Ondřej Surý
58179e6a19
Expand the usage of mark_ancient() helper functions
When the mark_ancient() helper function was introduced, couple of places
with duplicate (or almost duplicate) code was missed.  Move the
mark_ancient() function closer to the top of the file, and correctly use
it in places that mark the header as ANCIENT.
2025-02-03 14:39:06 +01:00
Ondřej Surý
cfee6aa565
Add better ZEROTTL handling in bindrdataset()
If we know that the header has ZEROTTL set, the server should never send
stale records for it and the TTL should never be anything else than 0.
The comment was already there, but the code was not matching the
comment.
2025-02-03 14:39:06 +01:00
Ondřej Surý
e07f5a4a5b
In dns_slabheader_t structure, change .ttl to .expire
The old name was misleading as it never meant time-to-live, e.g. number
of seconds from now when the header should expire.  The true meaning was
an expiration time e.g. now + ttl.  This was the original design bug
that caused the slip when we assigned header->ttl to rdataset->ttl.
Because the name was matching, nobody has questioned the correctness of
the code both during the MR review and during the numerous re-reviews
when we were searching for the cause of the 54 year TTL.
2025-02-03 14:39:06 +01:00
Ondřej Surý
1bbb57f81b
In cache, set rdataset TTL to 0 when the header is not active
When the header has been marked as ANCIENT, but the ttl hasn't been
reset (this happens in couple of places), the rdataset TTL would be
set to the header timestamp instead to a reasonable TTL value.

Since this header has been already expired (ANCIENT is set), set the
rdataset TTL to 0 and don't reuse this field to print the expiration
time when dumping the cache.  Instead of printing the time, we now
just print 'expired (awaiting cleanup'.
2025-02-03 14:39:06 +01:00
Evan Hunt
1f095b902c
fix the cache findzonecut implementation
the search for the deepest known zone cut in the cache could
improperly reject a node containing stale data, even if the
NS rdataset wasn't the data that was stale.

this change also improves the efficiency of the search by
stopping it when both NS and RRSIG(NS) have been found.
2025-02-02 18:43:50 +01:00
Evan Hunt
d4f791793e Clarify reference counting in QP databases
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:

- newref() and decref() are now called qpcnode_acquire()/
  qpznode_acquire() and qpcnode_release()/qpznode_release()
  respectively; this reflects the fact that they modify both
  the internal and external reference counters for a node.

- qpcnode_newref() and qpznode_newref() are now called
  qpcnode_erefs_increment() and qpznode_erefs_increment(), and
  qpcnode_decref() and qpznode_decref() are now called
  qpcnode_erefs_decrement() and qpznode_erefs_decrement(),
  to reflect that they only increase and decrease the node's
  external reference counters, not internal.
2025-01-30 20:08:46 -08:00
Ondřej Surý
431513d8b3
Remove db_nodelock_t in favor of reference counted qpdb
This removes the db_nodelock_t structure and changes the node_locks
array to be composed only of isc_rwlock_t pointers.  The .reference
member has been moved to qpdb->references in addition to
common.references that's external to dns_db API users.  The .exiting
members has been completely removed as it has no use when the reference
counting is used correctly.
2025-01-30 16:43:02 +01:00
Ondřej Surý
36a26bfa1a
Remove origin_node from qpcache
The origin_node in qpcache was always NULL, so we can remove the
getoriginode() function and origin_node pointer as the
dns_db_getoriginnode() correctly returns ISC_R_NOTFOUND when the
function is not implemented.
2025-01-30 16:43:02 +01:00
Ondřej Surý
814b87da64
Refactor decref() in both qpcache.c and qpzone.c
Cleanup the pattern in the decref() functions in both qpcache.c and
qpzone.c, so it follows the similar patter as we already have in
newref() function.
2025-01-30 16:43:02 +01:00
Andoni Duarte Pintado
3a64b288c1 Merge tag 'v9.21.4' 2025-01-29 17:17:18 +01:00
JINMEI Tatuya
7f4471594d
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0.  In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.

Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
2025-01-22 14:27:13 +01:00