mir/bind - bind - Mike's Git repositories

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-30 05:57:52 +00:00

Author	SHA1	Message	Date
Alessio Podda	d7064c9b88	Tune min and max chunk size Before implementing adaptive chunk sizing, it was necessary to ensure that a chunk could hold up to 48 twigs, but the new logic will size-up new chunks to ensure that the current allocation can succeed. We exploit the new logic in two ways: - We make the minimum chunk size smaller than the old limit of 2^6, reducing memory consumption. - We make the maximum chunk size larger, as it has been observed that it improves resolver performance.	2025-05-22 15:19:48 -07:00
alessio	70b1777d8a	Adaptive memory allocation strategy for qp-tries qp-tries allocate their nodes (twigs) in chunks to reduce allocator pressure and improve memory locality. The choice of chunk size presents a tradeoff: larger chunks benefit qp-tries with many values (as seen in large zones and resolvers) but waste memory in smaller use cases. Previously, our fixed chunk size of 2^10 twigs meant that even an empty qp-trie would consume 12KB of memory, while reducing this size would negatively impact resolver performance. This commit implements an adaptive chunking strategy that: - Tracks the size of the most recently allocated chunk. - Doubles the chunk size for each new allocation until reaching a predefined maximum. This approach effectively balances memory efficiency for small tries while maintaining the performance benefits of larger chunk sizes for bigger data structures. This commit also splits the callback freeing qpmultis into two phases, one that frees the underlying qptree, and one that reclaims the qpmulti memory. In order to prevent races between the qpmulti destructor and chunk garbage collection jobs, the second phase is protected by reference counting.	2025-05-22 15:19:27 -07:00
Ondřej Surý	f5c204ac3e	Move the library init and shutdown to executables Instead of relying on the unreliable order of execution of the library constructors and destructors, move them to individual binaries. The advantage is that the execution time and order will remain constant and will not depend on the dynamic load dependency solver. This requires more work, but that is mitigated by a simple requirement: any executable using libisc and libdns must include <isc/lib.h> and <dns/lib.h> respectively (in this particular order). In turn, these two headers must not be included from within any library, as they contain inlined functions marked with constructor/destructor attributes.	2025-02-22 16:19:00 +01:00
Ondřej Surý	0258850f20	Remove redundant parentheses from the return statement	2024-11-19 12:27:22 +01:00
Evan Hunt	6231fd66af	rename QP-related types to use standard BIND nomenclature changed type names in QP trie code to match the usual convention: - qp_node_t -> dns_qpnode_t - qp_ref_t -> dns_qpref_t - qp_shift_t -> dns_qpshift_t - qp_weight_t -> dns_qpweight_t - qp_chunk_t -> dns_qpchunk_t - qp_cell_t -> dns_qpcell_t	2023-09-28 00:32:39 -07:00
Evan Hunt	7f0242b8c7	tidy the helper functions for retrieving twigs - the helper functions for accessing twigs beneath a branch (branch_twig_pos(), branch_twig_ptr(), etc) were somewhat confusing to read, since several of them were implemented by calling other helper functions. they now all show what they're really doing. - branch_twigs_vector() has been renamed to simply branch_twigs(). - revised some unrelated comments in qp_p.h for clarity.	2023-09-28 00:30:47 -07:00
Evan Hunt	7f766ba7c4	add a node chain traversal mechanism dns_qp_findname_ancestor() now takes an optional 'chain' parameter; if set, the dns_qpchain object it points to will be updated with an array of pointers to the populated nodes between the tree root and the requested name. the number of nodes in the chain can then be accessed using dns_qpchain_length() and the individual nodes using dns_qpchain_node().	2023-09-28 00:30:43 -07:00
Tony Finch	c319ccd4c9	Fixes for liburcu-qsbr Move registration and deregistration of the main thread from `isc_loopmgr_run()` into `isc__initialize()` / `isc__shutdown()`: liburcu-qsbr fails an assertion if we try to use it from an unregistered thread, and we need to be able to use it when the event loops are not running. Use `rcu_assign_pointer()` and `rcu_dereference()` in qp-trie transactions so that they properly mark threads as online. The RCU-protected pointer is no longer declared atomic because liburcu does not (yet) use standard C atomics. Fix the definition of `isc_qsbr_rcu_dereference()` to return the referenced value, and to call the right function inside liburcu. Change the thread sanitizer suppressions to match any variant of `rcu_*_barrier()`	2023-05-15 20:49:42 +00:00
Tony Finch	6217e434b5	Refactor the core qp-trie code to use liburcu A `dns_qpmulti_t` no longer needs to know about its loopmgr. We no longer keep a linked list of `dns_qpmulti_t` that have reclamation work, and we no longer mark chunks with the phase in which they are to be reclaimed. Instead, empty chunks are listed in an array in a `qp_rcu_t`, which is passed to call_rcu().	2023-05-12 20:48:31 +01:00
Tony Finch	b3e35fd120	A few qp-trie cleanups Revert refcount debug tracing (commit a8b29f0365), there are better ways to do it. Use the dns_qpmethods_t typedef where appropriate. Some stylistic improvements.	2023-04-05 12:35:04 +01:00
Tony Finch	8a3a216f40	Support for iterating over the leaves in a qp-trie The iterator object records a path through the trie, in a similar manner to the existing dns_rbtnodechain.	2023-04-05 12:35:04 +01:00
Tony Finch	3c333d02a0	More dns_qpkey_t safety checks My original idea had been that the core qp-trie code would be mostly independent of the storage for keys, so I did not make it check at run time that key lengths are sensible. However, the qp-trie search routines need to get keys out of leaf objects, for which they provide storage on the stack, which is particularly dangerous for unchecked buffer overflows. So this change checks that key lengths are in bounds at the API boundary between the qp-trie code and the rest of BIND, and there is no more pretence that keys might be longer.	2023-04-03 15:10:47 +00:00
Tony Finch	0858514ae8	Improve qp-trie compaction in write transactions In general, it's better to do one thorough compaction when a batch of work is complete, which is the way that `update` transactions work. Conversely, `write` transactions are designed so that lots of little transactions are not too inefficient, but they need explicit compaction. This changes `dns_qp_compact()` so that it is easier to compact any time that makes sense, if there isn't a better way to schedule compaction. And `dns_qpmulti_commit()` only recycles garbage when there is enough to make it worthwhile.	2023-02-27 13:47:57 +00:00
Tony Finch	a8b29f0365	Improve qp-trie refcount debugging Add some qp-trie tracing macros which can be enabled by a developer. These print a message when a leaf is attached or detached, indicating which part of the qp-trie implementation did so. The refcount methods must now return the refcount value so it can be printed by the trace macros.	2023-02-27 13:47:57 +00:00
Tony Finch	4b5ec07bb7	Refactor qp-trie to use QSBR The first working multi-threaded qp-trie was stuck with an unpleasant trade-off: * Use `isc_rwlock`, which has acceptable write performance, but terrible read scalability because the qp-trie made all accesses through a single lock. * Use `liburcu`, which has great read scalability, but terrible write performance, because I was relying on `rcu_synchronize()` which is rather slow. And `liburcu` is LGPL. To get the best of both worlds, we need our own scalable read side, which we now have with `isc_qsbr`. And we need to modify the write side so that it is not blocked by readers. Better write performance requires an async cleanup function like `call_rcu()`, instead of the blocking `rcu_synchronize()`. (There is no blocking cleanup in `isc_qsbr`, because I have concluded that it would be an attractive nuisance.) Until now, all my multithreading qp-trie designs have been based around two versions, read-only and mutable. This is too few to work with asynchronous cleanup. The bare minimum (as in epoch based reclamation) is three, but it makes more sense to support an arbitrary number. Doing multi-version support "properly" makes fewer assumptions about how safe memory reclamation works, and it makes snapshots and rollbacks simpler. To avoid making the memory management even more complicated, I have introduced a new kind of "packed reader node" to anchor the root of a version of the trie. This is simpler because it re-uses the existing chunk lifetime logic - see the discussion under "packed reader nodes" in `qp_p.h`. I have also made the chunk lifetime logic simpler. The idea of a "generation" is gone; instead, chunks are either mutable or immutable. And the QSBR phase number is used to indicate when a chunk can be reclaimed. Instead of the `shared_base` flag (which was basically a one-bit reference count, with a two version limit) the base array now has a refcount, which replaces the confusing ad-hoc lifetime logic with something more familiar and systematic.	2023-02-27 13:47:55 +00:00
Tony Finch	549854f63b	Some minor qp-trie improvements Adjust the dns_qp_memusage() and dns_qp_compact() functions to be more informative and flexible about handling fragmentation. Avoid wasting space in runt chunks. Switch from twigs_mutable() to cells_immutable() because that is the sense we usually want. Drop the redundant evacuate() function and rename evacuate_twigs() to evacuate(). Move some chunk test functions closer to their point of use. Clarify compact_recursive(). Some small cleanups to comments. Use isc_time_monotonic() for qp-trie timing stats. Use #define constants to control debug logging. Set up DNS name label offsets in dns_qpkey_fromname() so it is easier to use in cases where the name is not fully hydrated.	2023-02-27 13:47:25 +00:00
Tony Finch	4b09c9a6ae	qp-trie naming improvements Adjust to typename_operation style s/VALID_QP/QP_VALID/g s/QP_VALIDMULTI/QPMULTI_VALID/g Improved greppability s/\bctx\b/uctx/g Less cluttered logging s/QP_TRACE/TRACE/g s/QP_LOG_STATS/LOG_STATS/g	2023-02-27 13:47:25 +00:00
Tony Finch	6b9ddbd1ce	Add a qp-trie data structure A qp-trie is a kind of radix tree that is particularly well-suited to DNS servers. I invented the qp-trie in 2015, based on Dan Bernstein's crit-bit trees and Phil Bagwell's HAMT. https://dotat.at/prog/qp/ This code incorporates some new ideas that I prototyped using NLnet Labs NSD in 2020 (optimizations for DNS names as keys) and 2021 (custom allocator and garbage collector). https://dotat.at/cgi/git/nsd.git The BIND version of my qp-trie code has a number of improvements compared to the prototype developed for NSD. * The main omission in the prototype was the very sketchy outline of how locking might work. Now the locking has been implemented, using a reader/writer lock and a mutex. However, it is designed to benefit from liburcu if that is available. * The prototype was designed for two-version concurrency, one version for readers and one for the writer. The new code supports multiversion concurrency, to provide a basis for BIND's dbversion machinery, so that updates are not blocked by long-running zone transfers. * There are now two kinds of transaction that modify the trie: an `update` aims to support many very small zones without wasting memory; a `write` avoids unnecessary allocation to help the performance of many small changes to the cache. * There is also a single-threaded interface for situations where concurrent access is not necessary. * The API makes better use of types to make it more clear which operations are permitted when. * The lookup table used to convert a DNS name to a qp-trie key is now initialized by a run-time constructor instead of a programmer using copy-and-paste. Key conversion is more flexible, so the qp-trie can be used with keys other than DNS names. * There has been much refactoring and re-arranging things to improve the terminology and order of presentation in the code, and the internal documentation has been moved from a comment into a file of its own. Some of the required functionality has been stripped out, to be brought back later after the basics are known to work. * Garbage collector performance statistics are missing. * Fancy searches are missing, such as longest match and nearest match. * Iteration is missing. * Search for update is missing, for cases where the caller needs to know if the value object is mutable or not.	2023-02-27 13:47:25 +00:00

18 Commits
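
The adaptive chunk sizing described in commits 70b1777d8a and d7064c9b88 (track the size of the most recent chunk, double it for each new chunk up to a maximum, and size up when a single allocation needs more room) can be illustrated with a minimal sketch. This is not the BIND implementation: the names `sketch_alloc_t` and `sketch_next_chunk_size()` and the limits `SKETCH_MIN_SHIFT` and `SKETCH_MAX_SHIFT` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits for illustration; not BIND's actual constants. */
enum { SKETCH_MIN_SHIFT = 5, SKETCH_MAX_SHIFT = 12 };

/* Hypothetical allocator state: remembers the last chunk size. */
typedef struct sketch_alloc {
	unsigned int last_shift; /* log2 of last chunk size; 0 = none yet */
} sketch_alloc_t;

/*
 * Return the size (in twigs) of the next chunk to allocate.
 * The first chunk uses the minimum size; each later chunk doubles
 * the previous size, capped at the maximum. If `need` twigs would
 * not fit, keep doubling so the current allocation can succeed.
 */
static size_t
sketch_next_chunk_size(sketch_alloc_t *a, size_t need) {
	unsigned int shift;

	/* A single allocation may never exceed the maximum chunk. */
	assert(need <= ((size_t)1 << SKETCH_MAX_SHIFT));

	shift = (a->last_shift == 0) ? SKETCH_MIN_SHIFT
				     : a->last_shift + 1;
	if (shift > SKETCH_MAX_SHIFT) {
		shift = SKETCH_MAX_SHIFT;
	}
	while (((size_t)1 << shift) < need) {
		shift++; /* cannot pass the maximum, given the assert */
	}

	a->last_shift = shift;
	return (size_t)1 << shift;
}
```

Under these assumed limits, a small trie starts with a 32-twig chunk, while a trie that keeps growing reaches the 4096-twig cap after a handful of allocations, which is the trade-off between memory use and performance that the commit messages describe.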