This commit modifies TLS Stream and DNS-over-HTTPS transports so that
they do not use the "sock->iface" and "sock->peer" of the lower level
transport directly.
That did not cause any problems before, as things worked as expected,
but with the introduction of PROXYv2 support we use handles to store
the information in both PROXY Stream and UDP Proxy
transports. Therefore, in order to propagate the information (like
addresses), extracted from PROXYv2 headers, from the lower level
transports to the higher-level ones, we need to get that information
from the lower-level handles rather than sockets. That means that we
should get the peer and interface addresses using the intended
APIs ("isc_nmhandle_peeraddr()" and "isc_nmhandle_localaddr()").
Add the new isc__nm_dump_active_manager() function that can be used
for debugging purposes: it dumps all active sockets withing the
network manager instance.
This commit adds a new transport that supports PROXYv2 over UDP. It is
built on top of PROXYv2 handling code (just like PROXY Stream). It
works by processing and stripping the PROXYv2 headers at the beginning
of a datagram (when accepting a datagram) or by placing a PROXYv2
header to the beginning of an outgoing datagram.
The transport is built in such a way that incoming datagrams are being
handled with minimal memory allocations and copying.
In the previous versions of the NM, detecting the case when worker is
shutting down was not that important and actual status code did not
matter much. However, that might be not the case all the time.
This commit makes necessary modifications to the code.
This commit makes it possible to use PROXY Stream not only over TCP,
but also over TLS. That is, now PROXY Stream can work in two modes as
far as TLS is involved:
1. PROXY over (plain) TCP - PROXYv2 headers are sent unencrypted before
TLS handshake messages. That is the main mode as described in the
PROXY protocol specification (as it is clearly stated there), and most
of the software expects PROXYv2 support to be implemented that
way (e.g. HAProxy);
2. PROXY over (encrypted) TLS - PROXYv2 headers are sent after the TLS
handshake has happened. For example, this mode is being used (only ?)
by "dnsdist". As far as I can see, that is, in fact, a deviation from
the spec, but I can certainly see how PROXYv2 could end up being
implemented this way elsewhere.
This commit modifies TLS Stream to make it possible to use over PROXY
Stream. That is required to add PROVYv2 support into TLS-based
transports (DNS over HTTP, DNS over TLS).
This commit adds a new stream-based transport with an interface
compatible with TCP. The transport is built on top of TCP transport
and the new PROXYv2 handling code. Despite being built on top of TCP,
it can be easily extended to work on top of any TCP-like stream-based
transport. The intention of having this transport is to add PROXYv2
support into all existing stream-based DNS transport (DNS over TCP,
DNS over TLS, DNS over HTTP) by making the work on top of this new
transport.
The idea behind the transport is simple after accepting the connection
or connecting to a remote server it enters PROXYv2 handling mode: that
is, it either attempts to read (when accepting the connection) or send
(when establishing a connection) a PROXYv2 header. After that it works
like a mere wrapper on top of the underlying stream-based
transport (TCP).
This commit adds a set of utilities for dealing with PROXYv2 headers,
both parsing and generating them. The code has no dependencies from
the networking code and is (for the most part) a "separate library".
The part responsible for handling incoming PROXYv2 headers is
structured as a state machine which accepts data as input and calls a
callback to notify the upper-level code about the data processing
status.
Such a design, among other things, makes it easy to write a thorough
unit test suite for that, as there are fewer dependencies as well as
will not stand in the way of any changes in the networking code.
Previously, there were two methods of working with the overmem
condition:
1. hi/lo water callback - when the overmem condition was reached
for the first time, the water callback was called with HIWATER
mark and .is_overmem boolean was set internally. Similarly,
when the used memory went below the lo water mark, the water
callback would be called with LOWATER mark and .is_overmem
was reset. This check would be called **every** time memory
was allocated or freed.
2. isc_mem_isovermem() - a simple getter for the internal
.is_overmem flag
This commit refactors removes the first method and move the hi/lo water
checks to the isc_mem_isovermem() function, thus we now have only a
single method of checking overmem condition and the check for hi/lo
water is removed from the hot path for memory contexts that doesn't use
overmem checks.
The AES algorithm for DNS cookies was being kept for legacy reasons, and
it can be safely removed in the next major release. Remove both the AES
usage for DNS cookies and the AES implementation itself.
Concurrent threads can access a hashmap for reading by creating and
then destroying an iterator, in which case the integer number of the
active iterators is increased or decreased from different threads,
introducing a data race. Use atomic operations to protect the variable.
Add complementary macros to ISC_LIST_FOREACH(_SAFE) that walk the lists
in reverse.
* ISC_LIST_FOREACH_REV(list, elt, link) - walk the static list from
tail to head
* ISC_LIST_FOREACH_REV_SAFE(list, elt, link, next) - walk the list
from tail to head in a manner that's safe against list member
deletions
OpenSSL 1.1 has already reached end-of-life and since we are
experiencing a weird memory leak in the mirror system test on just
Ubuntu 20.04 (Focal) with OpenSSL 1.1, we disable the legacy code for
enabling memory contexts for OpenSSL < 3.0.0 in this commit.
In units that support detailed reference tracing via ISC_REFCOUNT
macros, we were doing:
/* Define to 1 for detailed reference tracing */
#undef <unit>_TRACE
This would prevent using -D<unit>_TRACE=1 in the CFLAGS.
Convert the above mentioned snippet with just a comment how to enable
the detailed reference tracing:
/* Add -D<unit>_TRACE=1 to CFLAGS for detailed reference tracing */
Apply the semantic patch to catch all the places where we pass 'char' to
the <ctype.h> family of functions (isalpha() and friends, toupper(),
tolower()).
Instead of copying address back and forth when hashing addr+port, we can
use incremental hashing. Additionally, switch from 64-bit
isc_hash_function to 32-bit isc_hash32() as the resulting value is
32-bit.
The Unix Domain Sockets support in BIND 9 has been completely disabled
since BIND 9.18 and it has been a fatal error since then. Cleanup the
code and the documentation that suggest that Unix Domain Sockets are
supported.
When iterating the table, we can't add new nodes to the hashmap because
we can't assure that we are not adding the new node before the iterator.
This also applies to rehashing - which might be triggered by both
isc_hashmap_add() and isc_hashmap_delete(), but not
isc_hashmap_iter_delcurrent_next().
When isc_hashmap_iter_delcurrent_next calls hashmap_delete_node
nodes from the front of the table could be added to the end of
the table resulting in them being returned twice. Detect when
this is happening and prevent those nodes being returned twice
buy reducing the effective size of the table by one each time
it happens.
Instead of high number of dispatches (4 * named_g_udpdisp)[1], make the
dispatches bound to threads and make dns_dispatchset_t create a dispatch
for each thread (event loop).
This required couple of other changes:
1. The dns_dispatch_createudp() must be called on loop, so the isc_tid()
is already initialized - changes to nsupdate and mdig were required.
2. The dns_requestmgr had only a single dispatch per v4 and v6. Instead
of using single dispatch, use dns_dispatchset_t for each protocol -
this is same as dns_resolver.
Looking up unique message ID in the dns_dispatch has been using custom
hash tables. Rewrite the custom hashtable to use cds_lfht API, removing
one extra lock in the cold-cache resolver hot path.
Refactor isc_hashmap to allow custom matching functions. This allows us
to have better tailored keys that don't require fixed uint8_t arrays,
but can be composed of more fields from the stored data structure.
Add support for incremental hashing to the isc_hash unit, both 32-bit
and 64-bit incremental hashing is now supported.
This is commit second in series adding incremental hashing to libisc.
When inserting items into hashtables (hashmaps), we might have a
fragmented key (as an example we might want to hash DNS name + class +
type). We either need to construct continuous key in the memory and
then hash it en bloc, or incremental hashing is required.
This incremental version of SipHash 2-4 algorithm is the first building
block.
As SipHash 2-4 is often used in the hot paths, I've turned the
implementation into header-only version in the process.
We now depend on explicitly creating memory arenas and disabling tcache
on those, and these features are not available with jemalloc < 4.
Instead of working around these issues, make the jemalloc >= 4.0.0 hard
requirement by looking for sdallocx() symbol that's only available from
that version.
The jemalloc < 4 was only used by RHEL 7 which is not supported since
BIND 9.19+.
This commit extends the internal memory management middleware code in
BIND so that memory contexts backed by dedicated jemalloc arenas can
be created. A new function (isc_mem_create_arena()) is added for that.
Moreover, it extends the existing code so that specialised memory
contexts can be created easily, should we need that functionality for
other future purposes. We have achieved that by passing the flags to
the underlying jemalloc-related calls. See the above
isc_mem_create_arena(), which can serve as an example of this.
Having this opens up possibilities for creating memory contexts tuned
for specific needs.