Add a qp-trie data structure
A qp-trie is a kind of radix tree that is particularly well-suited to
DNS servers. I invented the qp-trie in 2015, based on Dan Bernstein's
crit-bit trees and Phil Bagwell's HAMT. https://dotat.at/prog/qp/
This code incorporates some new ideas that I prototyped using
NLnet Labs NSD in 2020 (optimizations for DNS names as keys)
and 2021 (custom allocator and garbage collector).
https://dotat.at/cgi/git/nsd.git
The BIND version of my qp-trie code has a number of improvements
compared to the prototype developed for NSD.
* The main omission in the prototype was the very sketchy outline of
how locking might work. Now the locking has been implemented,
using a reader/writer lock and a mutex. However, it is designed to
benefit from liburcu if that is available.
* The prototype was designed for two-version concurrency, one
version for readers and one for the writer. The new code supports
multiversion concurrency, to provide a basis for BIND's dbversion
machinery, so that updates are not blocked by long-running zone
transfers.
* There are now two kinds of transaction that modify the trie: an
`update` aims to support many very small zones without wasting
memory; a `write` avoids unnecessary allocation to help the
performance of many small changes to the cache.
* There is also a single-threaded interface for situations where
concurrent access is not necessary.
* The API makes better use of types to make it more clear which
operations are permitted when.
* The lookup table used to convert a DNS name to a qp-trie key is
now initialized by a run-time constructor instead of a programmer
using copy-and-paste. Key conversion is more flexible, so the
qp-trie can be used with keys other than DNS names.
* There has been much refactoring and re-arranging things to improve
the terminology and order of presentation in the code, and the
internal documentation has been moved from a comment into a file
of its own.
Some of the required functionality has been stripped out, to be
brought back later after the basics are known to work.
* Garbage collector performance statistics are missing.
* Fancy searches are missing, such as longest match and
nearest match.
* Iteration is missing.
* Search for update is missing, for cases where the caller needs to
know if the value object is mutable or not.
2022-05-09 14:31:35 +01:00
/*
 * Copyright (C) Internet Systems Consortium, Inc. ("ISC")
 *
 * SPDX-License-Identifier: MPL-2.0
 *
 * This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, you can obtain one at https://mozilla.org/MPL/2.0/.
 *
 * See the COPYRIGHT file distributed with this work for additional
 * information regarding copyright ownership.
 */

/*
 * For an overview, see doc/design/qp-trie.md
 */

#pragma once

/***********************************************************************
 *
 *  interior node basics
 */

/*
 * A qp-trie node can be a leaf or a branch. It consists of three 32-bit
 * words into which the components are packed. They are used as a 64-bit
 * word and a 32-bit word, but they are not declared like that to avoid
 * unwanted padding, keeping the size down to 12 bytes. They are in native
 * endian order so getting the 64-bit part should compile down to an
 * unaligned load.
 *
 * In a branch the 64-bit word is described by the enum below. The 32-bit
 * word is a reference to the packed sparse vector of "twigs", i.e. child
 * nodes. A branch node has at least 2 and fewer than SHIFT_OFFSET twigs
 * (see the enum below). The qp-trie update functions ensure that branches
 * actually branch, i.e. branches cannot have only 1 child.
 *
 * The contents of each leaf are set by the trie's user. The 64-bit word
 * contains a pointer value (which must be word-aligned), and the 32-bit
 * word is an arbitrary integer value.
 */
typedef struct qp_node {
#if WORDS_BIGENDIAN
	uint32_t bighi, biglo, small;
#else
	uint32_t biglo, bighi, small;
#endif
} qp_node_t;
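
/*
 * Illustration only, not part of this header: a minimal sketch of how the
 * two halves of a node could be combined into the 64-bit word and split
 * again. The `sketch_*` names are hypothetical, and the struct is a local
 * copy using the little-endian field order. Combining the halves with
 * shifts gives the same value on any host; the field order only affects
 * the layout in memory.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the 12-byte node layout (little-endian field order). */
typedef struct sketch_node {
	uint32_t biglo, bighi, small;
} sketch_node_t;

/* Combine the two 32-bit halves into the 64-bit word. */
static inline uint64_t
sketch_node64(const sketch_node_t *n) {
	return (((uint64_t)n->bighi << 32) | (uint64_t)n->biglo);
}

/* Build a node from a 64-bit word and a 32-bit word. */
static inline sketch_node_t
sketch_make_node(uint64_t big, uint32_t small) {
	return ((sketch_node_t){
		.biglo = (uint32_t)(big & 0xffffffffu),
		.bighi = (uint32_t)(big >> 32),
		.small = small,
	});
}
```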

/*
 * A branch node contains a 64-bit word comprising the branch/leaf tag,
 * the bitmap, and an offset into the key. It is called an "index word"
 * because it describes how to access the twigs vector (think "database
 * index"). The following enum sets up the bit positions of these parts.
 *
 * In a leaf, the same 64-bit word contains a pointer. The pointer
 * must be word-aligned so that the branch/leaf tag bit is zero.
 * This requirement is checked by the newleaf() constructor.
 *
 * The bitmap is just above the tag bit. The `bits_for_byte[]` table is
 * used to fill in a key so that bit tests can work directly against the
 * index word without superfluous masking or shifting; we don't need to
 * mask out the bitmap before testing a bit, but we do need to mask the
 * bitmap before calling popcount.
 *
 * The byte offset into the key is at the top of the word, so that it
 * can be extracted with just a shift, with no masking needed.
 *
 * The names are SHIFT_thing because they are qp_shift_t values. (See
 * below for the various `qp_*` type declarations.)
 *
 * These values are relatively fixed in practice; the symbolic names
 * avoid mystery numbers in the code.
 */
enum {
	SHIFT_BRANCH = 0,  /* branch / leaf tag */
	SHIFT_NOBYTE,	   /* label separator has no byte value */
	SHIFT_BITMAP,	   /* many bits here */
	SHIFT_OFFSET = 48, /* offset of byte in key */
};

/*
 * Value of the node type tag bit.
 *
 * It is defined this way to be explicit about where the value comes
 * from, even though we know it is always the bottom bit.
 */
#define BRANCH_TAG (1ULL << SHIFT_BRANCH)
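
/*
 * Illustration only, not part of this header: a minimal sketch of how an
 * index word laid out as described above could be taken apart. The `sk_*`
 * and `SK_*` names are hypothetical; the enum values are copied from the
 * enum above. Testing a bit needs no masking, the key offset needs only a
 * shift, and counting the twigs below a bit must first mask off the tag
 * bit and everything at or above the bit being looked up. A portable loop
 * stands in for a popcount instruction here.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
	SK_SHIFT_BRANCH = 0,
	SK_SHIFT_NOBYTE,
	SK_SHIFT_BITMAP,
	SK_SHIFT_OFFSET = 48,
};
#define SK_BRANCH_TAG (1ULL << SK_SHIFT_BRANCH)

/* The tag bit distinguishes a branch from a (word-aligned) leaf pointer. */
static inline bool
sk_is_branch(uint64_t index) {
	return ((index & SK_BRANCH_TAG) != 0);
}

/* The key offset sits at the top of the word: one shift, no mask. */
static inline size_t
sk_key_offset(uint64_t index) {
	return ((size_t)(index >> SK_SHIFT_OFFSET));
}

/* Test whether the branch has a twig for this bit position. */
static inline bool
sk_has_twig(uint64_t index, uint8_t bit) {
	return ((index & (1ULL << bit)) != 0);
}

/* Position of a twig: count the bitmap bits below `bit`, excluding the tag. */
static inline uint8_t
sk_twig_pos(uint64_t index, uint8_t bit) {
	uint64_t below = index & ((1ULL << bit) - 1 - SK_BRANCH_TAG);
	uint8_t count = 0;
	while (below != 0) {
		below &= below - 1; /* clear the lowest set bit */
		count++;
	}
	return (count);
}
```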

/***********************************************************************
 *
 *  garbage collector tuning parameters
 */

/*
 * A "cell" is a location that can contain a `qp_node_t`, and a "chunk"
 * is a moderately large array of cells. A big trie can occupy
 * multiple chunks. (Unlike other nodes, a trie's root node lives in
 * its `struct dns_qp` instead of being allocated in a cell.)
 *
 * The qp-trie allocator hands out space for twigs vectors. Allocations are
 * made sequentially from one of the chunks; this kind of "sequential
 * allocator" is also known as a "bump allocator", so in `struct dns_qp`
 * (see below) the allocation chunk is called `bump`.
 */
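
/*
 * Illustration only, not part of this header: a minimal sketch of a bump
 * allocator over a single chunk of cells. The `sk_*` names, the small
 * chunk size, and the SK_NO_SPACE sentinel are assumptions made for the
 * example; what the real allocator does when the bump chunk is full is
 * not shown here.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SK_CHUNK_SIZE 128u	 /* cells per chunk (assumed for the sketch) */
#define SK_NO_SPACE   UINT32_MAX /* hypothetical "chunk is full" result */

struct sk_chunk_usage {
	uint32_t used; /* the allocation point, increases monotonically */
};

/*
 * Hand out `size` contiguous cells from the chunk. Returns the offset of
 * the first cell, or SK_NO_SPACE if the request does not fit.
 */
static inline uint32_t
sk_bump_alloc(struct sk_chunk_usage *usage, uint32_t size) {
	if (size > SK_CHUNK_SIZE - usage->used) {
		return (SK_NO_SPACE);
	}
	uint32_t cell = usage->used;
	usage->used += size;
	return (cell);
}
```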

/*
 * Number of cells in a chunk is a power of 2, which must have space for
 * a full twigs vector (48 wide). When testing, use a much smaller chunk
 * size to make the allocator work harder.
 */
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
#define QP_CHUNK_LOG 7
#else
#define QP_CHUNK_LOG 10
#endif

STATIC_ASSERT(6 <= QP_CHUNK_LOG && QP_CHUNK_LOG <= 20,
	      "qp-trie chunk size is unreasonable");

#define QP_CHUNK_SIZE  (1U << QP_CHUNK_LOG)
#define QP_CHUNK_BYTES (QP_CHUNK_SIZE * sizeof(qp_node_t))

/*
 * A chunk needs to be compacted if it has fragmented this much.
 * (12.5% overhead seems reasonable)
 */
#define QP_MAX_FREE (QP_CHUNK_SIZE / 8)
#define QP_MIN_USED (QP_CHUNK_SIZE - QP_MAX_FREE)

/*
 * Compact automatically when we pass this threshold: when there is a lot
 * of free space in absolute terms, and when we have freed more than half
 * of the space we allocated.
 *
 * The current compaction algorithm scans the whole trie, so it is important
 * to scale the threshold based on the size of the trie to avoid quadratic
 * behaviour. XXXFANF find an algorithm that scans less of the trie!
 *
 * During a modification transaction, when we copy-on-write some twigs we
 * count the old copy as "free", because they will be when the transaction
 * commits. But they cannot be recovered immediately so they are also
 * counted as on hold, and discounted when we decide whether to compact.
 */
#define QP_MAX_GARBAGE(qp) \
	(((qp)->free_count - (qp)->hold_count) > QP_CHUNK_SIZE * 4 && \
	 ((qp)->free_count - (qp)->hold_count) > (qp)->used_count / 2)
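
/*
 * Illustration only, not part of this header: the same threshold written
 * as a function over a stand-alone struct, so the two conditions can be
 * tried with concrete numbers. The `sk_*` names are hypothetical, and the
 * chunk size assumes QP_CHUNK_LOG is 10 (1024 cells).
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_CHUNK_SIZE (1U << 10) /* assumes QP_CHUNK_LOG == 10 */

struct sk_counts {
	uint32_t used_count; /* cells allocated so far */
	uint32_t free_count; /* cells no longer needed */
	uint32_t hold_count; /* free cells that cannot be recovered yet */
};

/* True when there is enough reclaimable garbage to justify compaction. */
static inline bool
sk_max_garbage(const struct sk_counts *c) {
	uint32_t reclaimable = c->free_count - c->hold_count;
	return (reclaimable > SK_CHUNK_SIZE * 4 &&
		reclaimable > c->used_count / 2);
}
```

/*
 * With 10000 cells used and 6000 free, compaction is due; holding 2000 of
 * those free cells drops the reclaimable total below four chunks, so it
 * is not.
 */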

/*
 * The chunk base and usage arrays are resized geometrically and start off
 * with two entries.
 */
#define GROWTH_FACTOR(size) ((size) + (size) / 2 + 2)
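
/*
 * Illustration only, not part of this header: a minimal sketch of how the
 * growth formula behaves when applied repeatedly from an empty array. The
 * `SK_*` and `sk_*` names are hypothetical; the macro body is copied from
 * GROWTH_FACTOR() above. Starting from zero the sizes are 2, 5, 9, 15, so
 * each step adds roughly half the current size.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SK_GROWTH_FACTOR(size) ((size) + (size) / 2 + 2)

/* Grow `size` geometrically until it can hold `needed` entries. */
static inline uint32_t
sk_grow_until(uint32_t size, uint32_t needed) {
	while (size < needed) {
		size = SK_GROWTH_FACTOR(size);
	}
	return (size);
}
```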

/***********************************************************************
 *
 *  helper types
 */

/*
 * C is not strict enough with its integer types for these typedefs to
 * improve type safety, but it helps to have annotations saying what
 * particular kind of number we are dealing with.
 */

/*
 * The number or position of a bit inside a word. (0..63)
 *
 * Note: A dns_qpkey_t is logically an array of qp_shift_t values, but it
 * isn't declared that way because dns_qpkey_t is a public type whereas
 * qp_shift_t is private.
 *
 * A dns_qpkey element key[off] must satisfy
 *
 *	SHIFT_NOBYTE <= key[off] && key[off] < SHIFT_OFFSET
 */
typedef uint8_t qp_shift_t;

/*
 * The number of bits set in a word (as in Hamming weight or popcount)
 * which is used for the position of a node in the packed sparse
 * vector of twigs. (0..47) because our bitmap does not fill the word.
 */
typedef uint8_t qp_weight_t;

/*
 * A chunk number, i.e. an index into the chunk arrays.
 */
typedef uint32_t qp_chunk_t;

/*
 * Cell offset within a chunk, or a count of cells. Each cell in a
 * chunk can contain a node.
 */
typedef uint32_t qp_cell_t;

/*
 * A twig reference is used to refer to a twigs vector, which occupies a
 * contiguous group of cells.
 */
typedef uint32_t qp_ref_t;

/*
 * Constructors and accessors for qp_ref_t values, defined here to show
 * how the qp_ref_t, qp_chunk_t, qp_cell_t types relate to each other.
 */

static inline qp_ref_t
make_ref(qp_chunk_t chunk, qp_cell_t cell) {
	return (QP_CHUNK_SIZE * chunk + cell);
}

static inline qp_chunk_t
ref_chunk(qp_ref_t ref) {
	return (ref / QP_CHUNK_SIZE);
}

static inline qp_cell_t
ref_cell(qp_ref_t ref) {
	return (ref % QP_CHUNK_SIZE);
}
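
/*
 * Illustration only, not part of this header: a minimal sketch of a
 * reference round trip, plus one way a reference could be turned into a
 * node pointer through an array of chunk base pointers. The `sk_*` names
 * are hypothetical (in particular sk_ref_ptr() is only a guess at the
 * shape of a helper such as refptr()), and the chunk size assumes
 * QP_CHUNK_LOG is 10.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SK_CHUNK_LOG  10
#define SK_CHUNK_SIZE (1U << SK_CHUNK_LOG)

typedef struct sk_node {
	uint32_t biglo, bighi, small;
} sk_node_t;

typedef uint32_t sk_ref_t;

/* Pack a chunk number and a cell offset into one reference. */
static inline sk_ref_t
sk_make_ref(uint32_t chunk, uint32_t cell) {
	return (SK_CHUNK_SIZE * chunk + cell);
}

/* Recover the chunk number from a reference. */
static inline uint32_t
sk_ref_chunk(sk_ref_t ref) {
	return (ref / SK_CHUNK_SIZE);
}

/* Recover the cell offset from a reference. */
static inline uint32_t
sk_ref_cell(sk_ref_t ref) {
	return (ref % SK_CHUNK_SIZE);
}

/* Look up the chunk's base pointer, then index to the cell. */
static inline sk_node_t *
sk_ref_ptr(sk_node_t **base, sk_ref_t ref) {
	return (base[sk_ref_chunk(ref)] + sk_ref_cell(ref));
}
```

/*
 * Because the chunk size is a power of 2, a compiler can be expected to
 * turn the division and remainder into a shift and a mask.
 */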

/***********************************************************************
 *
 *  main qp-trie structures
 */

#define QP_MAGIC       ISC_MAGIC('t', 'r', 'i', 'e')
#define QP_VALID(qp)   ISC_MAGIC_VALID(qp, QP_MAGIC)

/*
 * This is annoying: C doesn't allow us to use a predeclared structure as
 * an anonymous struct member, so we have to fart around. The feature we
 * want is available in GCC and Clang with -fms-extensions, but a
 * non-standard extension won't make these declarations neater if we must
 * also have a standard alternative.
 */

/*
 * Lightweight read-only access to a qp-trie.
 *
 * Just the fields needed for the hot path. The `base` field points
 * to an array containing pointers to the base of each chunk like
 * `qp->base[chunk]` - see `refptr()` below.
 *
 * A `dns_qpread_t` has a lifetime that does not extend across multiple
 * write transactions, so it can share a chunk `base` array belonging to
 * the `dns_qpmulti_t` it came from.
 *
 * We're lucky with the layout on 64 bit systems: this is only 40 bytes,
 * with no padding.
 */
#define DNS_QPREAD_COMMON \
	uint32_t magic;   \
	qp_node_t root;   \
	qp_node_t **base; \
	void *uctx;       \
	const dns_qpmethods_t *methods

struct dns_qpread {
	DNS_QPREAD_COMMON;
};

/*
 * Heavyweight read-only snapshots of a qp-trie.
 *
 * Unlike a lightweight `dns_qpread_t`, a snapshot can survive across
 * multiple write transactions, any of which may need to expand the
 * chunk `base` array. So a `dns_qpsnap_t` keeps its own copy of the
 * array, which will always be equal to some prefix of the expanded
 * arrays in the `dns_qpmulti_t` that it came from.
 *
 * The `dns_qpmulti_t` keeps a refcount of its snapshots, and while
 * the refcount is non-zero, chunks are not freed or reused. When a
 * `dns_qpsnap_t` is destroyed, if it decrements the refcount to zero,
 * it can do any deferred cleanup.
 *
 * The generation number is used for tracing.
 */
struct dns_qpsnap {
	DNS_QPREAD_COMMON;
	uint32_t generation;
	dns_qpmulti_t *whence;
	qp_node_t *base_array[];
};

/*
 * Read-write access to a qp-trie requires extra fields to support the
 * allocator and garbage collector.
 *
 * The chunk `base` and `usage` arrays are separate because the `usage`
 * array is only needed for allocation, so it is kept separate from the
 * data needed by the read-only hot path. The arrays have empty slots where
 * new chunks can be placed, so `chunk_max` is the maximum number of chunks
 * (until the arrays are resized).
 *
 * Bare instances of a `struct dns_qp` are used for stand-alone
 * single-threaded tries. For multithreaded access, transactions alternate
 * between the `phase` pair of dns_qp objects inside a dns_qpmulti.
 *
 * For multithreaded access, the `generation` counter allows us to know
 * which chunks are writable or not: writable chunks were allocated in the
 * current generation. For single-threaded access, the generation counter
 * is always zero, so all chunks are considered to be writable.
 *
 * Allocations are made sequentially in the `bump` chunk. Lightweight write
 * transactions can re-use the `bump` chunk, so its prefix before `fender`
 * is immutable, and the rest is mutable even though its generation number
 * does not match the current generation.
 *
 * To decide when to compact and reclaim space, QP_MAX_GARBAGE() examines
 * the values of `used_count`, `free_count`, and `hold_count`. The
 * `hold_count` tracks nodes that need to be retained while readers are
 * using them; they are free but cannot be reclaimed until the transaction
 * has committed, so the `hold_count` is discounted from QP_MAX_GARBAGE()
 * during a transaction.
 *
 * There are some flags that alter the behaviour of write transactions.
 *
 *  - The `transaction_mode` indicates whether the current transaction is a
 *    light write or a heavy update, or (between transactions) the previous
 *    transaction's mode, because the setup for the next transaction
 *    depends on how the previous one committed. The mode is set at the
 *    start of each transaction. It is QP_NONE in a single-threaded qp-trie
 *    to detect if part of a `dns_qpmulti_t` is passed to dns_qp_destroy().
 *
 *  - The `compact_all` flag is used when every node in the trie should be
 *    copied. (Usually compaction aims to avoid moving nodes out of
 *    unfragmented chunks.) It is used when compaction is explicitly
 *    requested via `dns_qp_compact()`, and as an emergency mechanism if
 *    normal compaction failed to clear the QP_MAX_GARBAGE() condition.
 *    (This emergency is a bug even though we have a rescue mechanism.)
 *
 *  - The `shared_arrays` flag indicates that the chunk `base` and `usage`
 *    arrays are shared by both `phase`s in this trie's `dns_qpmulti_t`.
 *    This allows us to delay allocating copies of the arrays during a
 *    write transaction, until we definitely need to resize them.
 *
 *  - When built with fuzzing support, we can use mprotect() and munmap()
 *    to ensure that incorrect memory accesses cause fatal errors. The
 *    `write_protect` flag must be set straight after the `dns_qpmulti_t`
 *    is created, then left unchanged.
 *
 * Some of the dns_qp_t fields are only used for multithreaded transactions
 * (marked [MT] below) but the same code paths are also used for single-
 * threaded writes. To reduce the size of a dns_qp_t, these fields could
 * perhaps be moved into the dns_qpmulti_t, but that would require some kind
 * of conditional runtime downcast from dns_qp_t to dns_qpmulti_t, which is
 * likely to be ugly. It is probably best to keep things simple if most tries
 * need multithreaded access (XXXFANF do they? e.g. when there are many auth
 * zones).
 */
struct dns_qp {
	DNS_QPREAD_COMMON;
	isc_mem_t *mctx;
	/*% array of per-chunk allocation counters */
	struct {
		/*% the allocation point, increases monotonically */
		qp_cell_t used;
		/*% count of nodes no longer needed, also monotonic */
		qp_cell_t free;
		/*% when was this chunk allocated? */
		uint32_t generation;
	} *usage;
	/*% transaction counter [MT] */
	uint32_t generation;
	/*% number of slots in the chunk `base` and `usage` arrays */
	qp_chunk_t chunk_max;
	/*% which chunk is used for allocations */
	qp_chunk_t bump;
	/*% nodes in the `bump` chunk below `fender` are read only [MT] */
	qp_cell_t fender;
	/*% number of leaf nodes */
	qp_cell_t leaf_count;
	/*% total of all usage[] counters */
	qp_cell_t used_count, free_count;
	/*% cells that cannot be recovered right now */
	qp_cell_t hold_count;
	/*% what kind of transaction was most recently started [MT] */
	enum { QP_NONE, QP_WRITE, QP_UPDATE } transaction_mode : 2;
	/*% compact the entire trie [MT] */
	bool compact_all : 1;
	/*% chunk arrays are shared with a readonly qp-trie [MT] */
	bool shared_arrays : 1;
	/*% optionally when compiled with fuzzing support [MT] */
	bool write_protect : 1;
};

/*
 * Concurrent access to a qp-trie.
 *
 * The `read` pointer is used for read queries. It points to one of the
 * `phase` elements. During a transaction, the other `phase` (see
 * `write_phase()` below) is modified incrementally in copy-on-write
 * style. On commit the `read` pointer is swapped to the altered phase.
 */
struct dns_qpmulti {
	uint32_t magic;
	/*% controls access to the `read` pointer and its target phase */
	isc_rwlock_t rwlock;
	/*% points to phase[r] and swaps on commit */
	dns_qp_t *read;
	/*% protects the snapshot counter and `write_phase()` */
	isc_mutex_t mutex;
	/*% so we know when old chunks are still shared */
	unsigned int snapshots;
	/*% one is read-only, one is mutable */
	dns_qp_t phase[2];
};
|
|
|
|
|
|
|
|
/*
 * Get a pointer to the phase that isn't read-only.
 */
static inline dns_qp_t *
write_phase(dns_qpmulti_t *multi) {
	bool read0 = multi->read == &multi->phase[0];
	return (read0 ? &multi->phase[1] : &multi->phase[0]);
}

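/*
 * Illustration only, not part of this header: a minimal sketch of the
 * two-phase scheme described above, using hypothetical `demo_` types in
 * place of dns_qp_t and dns_qpmulti_t. Locking is deliberately left out;
 * the real code guards the `read` pointer with the rwlock and the write
 * phase with the mutex.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: the real dns_qp_t holds a whole trie. */
typedef struct demo_qp {
	int generation;
} demo_qp_t;

typedef struct demo_multi {
	demo_qp_t *read;    /* the phase that readers see */
	demo_qp_t phase[2]; /* one is read-only, one is mutable */
} demo_multi_t;

/* Same selection logic as write_phase(): the phase readers are not using. */
static demo_qp_t *
demo_write_phase(demo_multi_t *multi) {
	bool read0 = multi->read == &multi->phase[0];
	return (read0 ? &multi->phase[1] : &multi->phase[0]);
}

/* Publish the altered phase by swapping the read pointer. */
static void
demo_commit(demo_multi_t *multi) {
	multi->read = demo_write_phase(multi);
}

/* Walk through one transaction; returns the generation readers end up with. */
static int
demo_transaction(void) {
	demo_multi_t multi = { .phase = { { 1 }, { 1 } } };
	multi.read = &multi.phase[0];

	demo_qp_t *w = demo_write_phase(&multi);
	assert(w == &multi.phase[1]);
	w->generation = 2;                   /* modify the hidden phase */
	assert(multi.read->generation == 1); /* readers are unaffected */

	demo_commit(&multi);
	assert(demo_write_phase(&multi) == &multi.phase[0]);
	return (multi.read->generation);
}
```

/*
 * After demo_commit() the roles of the two phases are exchanged, so the
 * next transaction modifies the phase that readers have just left.
 */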
#define QPMULTI_MAGIC ISC_MAGIC('q', 'p', 'm', 'v')
#define QPMULTI_VALID(qp) ISC_MAGIC_VALID(qp, QPMULTI_MAGIC)
/***********************************************************************
 *
 * interior node constructors and accessors
 */

/*
 * See the comments under "interior node basics" above, which explain the
 * layout of nodes as implemented by the following functions.
 */

/*
 * Get the 64-bit word of a node.
 */
static inline uint64_t
node64(qp_node_t *n) {
	uint64_t lo = n->biglo;
	uint64_t hi = n->bighi;
	return (lo | (hi << 32));
}

/*
 * Get the 32-bit word of a node.
 */
static inline uint32_t
node32(qp_node_t *n) {
	return (n->small);
}

/*
 * Create a node from its parts
 */
static inline qp_node_t
make_node(uint64_t big, uint32_t small) {
	return ((qp_node_t){
		.biglo = (uint32_t)(big),
		.bighi = (uint32_t)(big >> 32),
		.small = small,
	});
}

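/*
 * Illustration only: a sketch of the round trip between make_node() and
 * node64(). The `demo_node_t` layout of three 32-bit words is an
 * assumption inferred from the accessors above; the real qp_node_t is
 * defined elsewhere in this file.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the 64-bit word is stored as two 32-bit halves. */
typedef struct demo_node {
	uint32_t biglo, bighi, small;
} demo_node_t;

static demo_node_t
demo_make_node(uint64_t big, uint32_t small) {
	return ((demo_node_t){
		.biglo = (uint32_t)(big),
		.bighi = (uint32_t)(big >> 32),
		.small = small,
	});
}

static uint64_t
demo_node64(const demo_node_t *n) {
	/* Widen before shifting so the high half is not truncated. */
	uint64_t lo = n->biglo;
	uint64_t hi = n->bighi;
	return (lo | (hi << 32));
}

/* True when both words survive being split and recombined. */
static bool
demo_roundtrip(uint64_t big, uint32_t small) {
	demo_node_t n = demo_make_node(big, small);
	assert(n.biglo == (uint32_t)big);
	return (demo_node64(&n) == big && n.small == small);
}
```

/*
 * For example, demo_roundtrip(0x0123456789abcdefULL, 42) holds because
 * the low half 0x89abcdef and the high half 0x01234567 are recombined
 * in the same order in which they were split.
 */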
/*
 * Test a node's tag bit.
 */
static inline bool
is_branch(qp_node_t *n) {
	return (n->biglo & BRANCH_TAG);
}

/* leaf nodes *********************************************************/

/*
 * Get a leaf's pointer value. The double cast is to avoid a warning
 * about mismatched pointer/integer sizes on 32 bit systems.
 */
static inline void *
leaf_pval(qp_node_t *n) {
	return ((void *)(uintptr_t)node64(n));
}

/*
 * Get a leaf's integer value
 */
static inline uint32_t
leaf_ival(qp_node_t *n) {
	return (node32(n));
}

/*
 * Create a leaf node from its parts
 */
static inline qp_node_t
make_leaf(const void *pval, uint32_t ival) {
	qp_node_t leaf = make_node((uintptr_t)pval, ival);
	REQUIRE(!is_branch(&leaf) && pval != NULL);
	return (leaf);
}

/* branch nodes *******************************************************/

/*
 * The following function names use plural `twigs` when they work on a
 * branch's twigs vector as a whole, and singular `twig` when they work on
 * a particular twig.
 */

/*
 * Get a branch node's index word
 */
static inline uint64_t
branch_index(qp_node_t *n) {
	return (node64(n));
}

/*
 * Get a reference to a branch node's child twigs.
 */
static inline qp_ref_t
branch_twigs_ref(qp_node_t *n) {
	return (node32(n));
}

/*
 * Bit positions in the bitmap come directly from the key. DNS names are
 * converted to keys using the tables declared at the end of this file.
 */
static inline qp_shift_t
qpkey_bit(const dns_qpkey_t key, size_t len, size_t offset) {
	if (offset < len) {
		return (key[offset]);
	} else {
		return (SHIFT_NOBYTE);
	}
}

/*
 * Extract a branch node's offset field, used to index the key.
 */
static inline size_t
branch_key_offset(qp_node_t *n) {
	return ((size_t)(branch_index(n) >> SHIFT_OFFSET));
}

/*
 * Which bit identifies the twig of this node for this key?
 */
static inline qp_shift_t
branch_keybit(qp_node_t *n, const dns_qpkey_t key, size_t len) {
	return (qpkey_bit(key, len, branch_key_offset(n)));
}

/*
 * Convert a twig reference into a pointer.
 */
static inline qp_node_t *
ref_ptr(dns_qpreadable_t qpr, qp_ref_t ref) {
	dns_qpread_t *qp = dns_qpreadable_cast(qpr);
	return (qp->base[ref_chunk(ref)] + ref_cell(ref));
}

/*
 * Get a pointer to a branch node's twigs vector.
 */
static inline qp_node_t *
branch_twigs_vector(dns_qpreadable_t qpr, qp_node_t *n) {
	return (ref_ptr(qpr, branch_twigs_ref(n)));
}

/*
 * Warm up the cache while calculating which twig we want.
 */
static inline void
prefetch_twigs(dns_qpreadable_t qpr, qp_node_t *n) {
	__builtin_prefetch(branch_twigs_vector(qpr, n));
}

/***********************************************************************
 *
 * bitmap popcount shenanigans
 */

/*
 * How many twigs appear in the vector before the one corresponding to the
 * given bit? Calculated using popcount of part of the branch's bitmap.
 *
 * To calculate a mask that covers the lesser bits in the bitmap, we
 * subtract 1 to set the bits, and subtract the branch tag because it
 * is not part of the bitmap.
 */
static inline qp_weight_t
branch_twigs_before(qp_node_t *n, qp_shift_t bit) {
	uint64_t mask = (1ULL << bit) - 1 - BRANCH_TAG;
	uint64_t bmp = branch_index(n) & mask;
	return ((qp_weight_t)__builtin_popcountll(bmp));
}

/*
 * How many twigs does this node have?
 *
 * The offset is directly after the bitmap, so the offset's lesser bits
 * cover the whole bitmap, and the bitmap's weight is the number of twigs.
 */
static inline qp_weight_t
branch_twigs_size(qp_node_t *n) {
	return (branch_twigs_before(n, SHIFT_OFFSET));
}

/*
 * Position of a twig within the packed sparse vector.
 */
static inline qp_weight_t
branch_twig_pos(qp_node_t *n, qp_shift_t bit) {
	return (branch_twigs_before(n, bit));
}

/*
 * Get a pointer to a particular twig.
 */
static inline qp_node_t *
branch_twig_ptr(dns_qpreadable_t qpr, qp_node_t *n, qp_shift_t bit) {
	return (branch_twigs_vector(qpr, n) + branch_twig_pos(n, bit));
}

/*
 * Is the twig identified by this bit present?
 */
static inline bool
branch_has_twig(qp_node_t *n, qp_shift_t bit) {
	return (branch_index(n) & (1ULL << bit));
}

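/*
 * Illustration only: a worked example of the popcount indexing used by
 * the functions above. DEMO_BRANCH_TAG and DEMO_SHIFT_OFFSET are
 * hypothetical values chosen for the sketch; the real BRANCH_TAG and
 * SHIFT_OFFSET are defined elsewhere and may differ.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BRANCH_TAG	  1ULL /* assumed: bit 0 is the tag, not a twig */
#define DEMO_SHIFT_OFFSET 49   /* assumed: first bit above the bitmap */

/* Count the bitmap bits below `bit`, excluding the tag bit. */
static unsigned
demo_twigs_before(uint64_t index, unsigned bit) {
	uint64_t mask = (1ULL << bit) - 1 - DEMO_BRANCH_TAG;
	return ((unsigned)__builtin_popcountll(index & mask));
}

static bool
demo_has_twig(uint64_t index, unsigned bit) {
	return ((index & (1ULL << bit)) != 0);
}

/* An index word with twigs at bits 3, 7 and 20, and key offset 5. */
static uint64_t
demo_index(void) {
	return (DEMO_BRANCH_TAG | (1ULL << 3) | (1ULL << 7) | (1ULL << 20) |
		((uint64_t)5 << DEMO_SHIFT_OFFSET));
}

static void
demo_popcount_walkthrough(void) {
	uint64_t index = demo_index();
	assert(demo_twigs_before(index, 3) == 0);  /* first in the vector */
	assert(demo_twigs_before(index, 7) == 1);  /* after the twig at 3 */
	assert(demo_twigs_before(index, 20) == 2); /* after 3 and 7 */
	/* Masking up to the offset field counts every twig. */
	assert(demo_twigs_before(index, DEMO_SHIFT_OFFSET) == 3);
	assert(demo_has_twig(index, 7) && !demo_has_twig(index, 8));
}
```

/*
 * The packed vector holds only the twigs that are present, so a twig's
 * position is the count of present twigs below it. The offset field and
 * the tag bit never contribute, because the mask excludes both.
 */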
/* twig logistics *****************************************************/

static inline void
move_twigs(qp_node_t *to, qp_node_t *from, qp_weight_t size) {
	memmove(to, from, size * sizeof(qp_node_t));
}

static inline void
zero_twigs(qp_node_t *twigs, qp_weight_t size) {
	memset(twigs, 0, size * sizeof(qp_node_t));
}

/***********************************************************************
 *
 * method invocation helpers
 */

static inline void
attach_leaf(dns_qpreadable_t qpr, qp_node_t *n) {
	dns_qpread_t *qp = dns_qpreadable_cast(qpr);
	qp->methods->attach(qp->uctx, leaf_pval(n), leaf_ival(n));
}

static inline void
detach_leaf(dns_qpreadable_t qpr, qp_node_t *n) {
	dns_qpread_t *qp = dns_qpreadable_cast(qpr);
	qp->methods->detach(qp->uctx, leaf_pval(n), leaf_ival(n));
}

static inline size_t
leaf_qpkey(dns_qpreadable_t qpr, qp_node_t *n, dns_qpkey_t key) {
	dns_qpread_t *qp = dns_qpreadable_cast(qpr);
	return (qp->methods->makekey(key, qp->uctx, leaf_pval(n),
				     leaf_ival(n)));
}

static inline char *
triename(dns_qpreadable_t qpr, char *buf, size_t size) {
	dns_qpread_t *qp = dns_qpreadable_cast(qpr);
	qp->methods->triename(qp->uctx, buf, size);
	return (buf);
}

#define TRIENAME(qp) \
	triename(qp, (char[DNS_QP_TRIENAME_MAX]){}, DNS_QP_TRIENAME_MAX)

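/*
 * Illustration only: a sketch of the compound-literal buffer idiom that
 * TRIENAME() relies on. The names `demo_name`, `DEMO_NAME` and
 * `DEMO_NAME_MAX` are hypothetical. The sketch spells the initializer
 * `{ 0 }` because the empty `{}` form is standard only from C23 and is
 * otherwise a compiler extension.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_MAX 32 /* assumed size, stands in for DNS_QP_TRIENAME_MAX */

/* Fill the caller's buffer and return it so the call can be used inline. */
static char *
demo_name(int id, char *buf, size_t size) {
	snprintf(buf, size, "trie-%d", id);
	return (buf);
}

/*
 * The compound literal is an unnamed array that lives until the
 * enclosing block exits, so the returned pointer can be passed straight
 * to a logging call without declaring a buffer first.
 */
#define DEMO_NAME(id) demo_name(id, (char[DEMO_NAME_MAX]){ 0 }, DEMO_NAME_MAX)

static void
demo_name_walkthrough(void) {
	assert(strcmp(DEMO_NAME(7), "trie-7") == 0);
	/* Each expansion gets its own buffer, so two uses do not collide. */
	const char *a = DEMO_NAME(1);
	const char *b = DEMO_NAME(2);
	assert(strcmp(a, "trie-1") == 0 && strcmp(b, "trie-2") == 0);
}
```

/*
 * The pointer must not be kept beyond the block in which the macro was
 * expanded, because the unnamed array ceases to exist at that point.
 */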
/***********************************************************************
 *
 * converting DNS names to trie keys
 */

/*
 * This is a deliberate simplification of the hostname characters,
 * because it doesn't matter much if we treat a few extra characters
 * favourably: there is plenty of space in the index word for a
 * slightly larger bitmap.
 */
static inline bool
qp_common_character(uint8_t byte) {
	return (('-' <= byte && byte <= '9') || ('_' <= byte && byte <= 'z'));
}

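/*
 * Illustration only: a sketch that enumerates what the two ranges in
 * qp_common_character() admit, assuming an ASCII execution character
 * set. The `demo_` names are hypothetical copies for the example.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same test as qp_common_character(): two contiguous byte ranges. */
static bool
demo_common_character(uint8_t byte) {
	return (('-' <= byte && byte <= '9') || ('_' <= byte && byte <= 'z'));
}

/* Count how many of the 256 byte values are classified as common. */
static unsigned
demo_common_count(void) {
	unsigned count = 0;
	for (unsigned byte = 0; byte < 256; byte++) {
		count += demo_common_character((uint8_t)byte) ? 1 : 0;
	}
	return (count);
}

static void
demo_common_walkthrough(void) {
	/* Hostname characters: hyphen, digits, underscore, lower case. */
	assert(demo_common_character('-') && demo_common_character('5'));
	assert(demo_common_character('_') && demo_common_character('q'));
	/* The extras swept in by using ranges: '.', '/' and '`'. */
	assert(demo_common_character('.') && demo_common_character('/'));
	assert(demo_common_character('`'));
	/* Upper case letters fall outside both ranges. */
	assert(!demo_common_character('A'));
	/* 13 bytes from '-' to '9' plus 28 bytes from '_' to 'z'. */
	assert(demo_common_count() == 41);
}
```

/*
 * Two range comparisons are cheaper and simpler than an exact set test,
 * at the cost of a handful of extra bitmap positions.
 */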
/*
 * Lookup table mapping bytes in DNS names to bit positions, used
 * by dns_qpkey_fromname() to convert DNS names to qp-trie keys.
 */
extern uint16_t dns_qp_bits_for_byte[];

/*
 * And the reverse, mapping bit positions to characters, so the tests
 * can print diagnostics involving qp-trie keys.
 */
extern uint8_t dns_qp_byte_for_bit[];

/**********************************************************************/