Change the internal type used for isc_tid unit to isc_tid_t to hide the
specific integer type being used for the 'tid'. Internally, the signed
integer type is being used. This allows us to have negatively indexed
arrays that works both for threads with assigned tid and the threads
with unassigned tid. This should be used only in specific situations.
Coverity detected that 'optlen' was not being checked in 'process_opt'.
This is actually already done when the OPT record was initially
parsed. Add an INSIST to silence Coverity as is done in message.c.
Instead of giving the memory context names with an explicit call to
isc_mem_setname(), add the name to isc_mem_create() call to have all the
memory contexts an unconditional name.
Previously all kinds of TCP timeouts had a single getter and setter
functions. Separate each timeout to its own getter/setter functions,
because in majority of cases only one is required at a time, and it's
not optimal expanding those functions every time a new timeout value
is implemented.
The new 'tcp-primaries-timeout' configuration option works the same way
as the existing 'tcp-initial-timeout' option, but applies only to the
TCP connections made to the primary servers, so that the timeout value
can be set separately for them. The default is 15 seconds.
Also, while accommodating zone.c's code to support the new option, make
a light refactoring with the way UDP timeouts are calculated by using
definitions instead of hardcoded values.
This will help identify the broken server if we happen to break
EDNS version negotiation. It will also help protect the client
from spoofed BADVERSION responses.
use the ISC_LIST_FOREACH pattern in places where lists had
been iterated using a different pattern from the typical
`for` loop: for example, `while (!ISC_LIST_EMPTY(...))` or
`while ((e = ISC_LIST_HEAD(...)) != NULL)`.
the pattern `for (x = ISC_LIST_HEAD(...); x != NULL; ISC_LIST_NEXT(...)`
has been changed to `ISC_LIST_FOREACH` throughout BIND, except in a few
cases where the change would be excessively complex.
in most cases this was a straightforward change. in some places,
however, the list element variable was referenced after the loop
ended, and the code was refactored to avoid this necessity.
also, because `ISC_LIST_FOREACH` uses typeof(list.head) to declare
the list elements, compilation failures can occur if the list object
has a `const` qualifier. some `const` qualifiers have been removed
from function parameters to avoid this problem, and where that was not
possible, `UNCONST` was used.
If there was an EDNS ZONEVERSION option in the DNS request and the
answer was from a zone, return the zone's serial and number of
labels excluding the root label with the type set to 0 (ZONE-SERIAL).
this parameter was added as a (minor) optimization for
cases where dns_name_towire() is run repeatedly with the
same compression context, as when rendering all of the rdatas
in an rdataset. it is currently only used in one place.
we now simplify the interface by removing the extra parameter.
the compression offset value is now part of the compression
context, and can be activated when needed by calling
dns_compress_setmultiuse(). multiuse mode is automatically
deactivated by any subsequent call to dns_compress_permitted().
Instead of mixing the dns_resolver and dns_validator units directly with
the EDE code, split-out the dns_ede functionality into own separate
compilation unit and hide the implementation details behind abstraction.
Additionally, the EDE codes are directly copied into the ns_client
buffers by passing the EDE context to dns_resolver_createfetch().
This makes the dns_ede implementation simpler to use, although sligtly
more complicated on the inside.
Co-authored-by: Colin Vidal <colin@isc.org>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
When an EDE code is added to a message, the code is converted early in a
big-endian order so it can be memcpy-ed directly in the EDE buffer that
will go on the wire.
This previous change forget to update debug logs which still assume the
EDE code was in host byte order. Add a separate variable to
differentiate both and avoid ambiguities
the isc_mem allocation functions can no longer fail; as a result,
ISC_R_NOMEMORY is now rarely used: only when an external library
such as libjson-c or libfstrm could return NULL. (even in
these cases, arguably we should assert rather than returning
ISC_R_NOMEMORY.)
code and comments that mentioned ISC_R_NOMEMORY have been
cleaned up, and the following functions have been changed to
type void, since (in most cases) the only value they could
return was ISC_R_SUCCESS:
- dns_dns64_create()
- dns_dyndb_create()
- dns_ipkeylist_resize()
- dns_kasp_create()
- dns_kasp_key_create()
- dns_keystore_create()
- dns_order_create()
- dns_order_add()
- dns_peerlist_new()
- dns_tkeyctx_create()
- dns_view_create()
- dns_zone_setorigin()
- dns_zone_setfile()
- dns_zone_setstream()
- dns_zone_getdbtype()
- dns_zone_setjournal()
- dns_zone_setkeydirectory()
- isc_lex_openstream()
- isc_portset_create()
- isc_symtab_create()
(the exception is dns_view_create(), which could have returned
other error codes in the event of a crypto library failure when
calling isc_file_sanitize(), but that should be a RUNTIME_CHECK
anyway.)
Extended DNS error mechanism (EDE) enables to have several EDE raised
during a DNS resolution (typically, a DNSSEC query will do multiple
fetches which each of them can have an error). Add support to up to 3
EDE errors in an DNS response. If duplicates occur (two EDEs with the
same code, the extra text is not compared), only the first one will be
part of the DNS answer.
Because the maximum number of EDE is statically fixed, `ns_client_t`
object own a static vector of `DNS_DE_MAX_ERRORS` (instead of a linked
list, for instance). The array can be fully filled (all slots point to
an allocated `dns_ednsopt_t` object) or partially filled (or
empty). In such case, the first NULL slot means there is no more EDE
objects.
Instead of cleaning the dns_badcache opportunistically, add per-loop
LRU, so each thread-loop can clean the expired entries. This also
allows removal of the atomic operations as the badcache entries are now
immutable, instead of updating the badcache entry in place, the old
entry is now deleted from the hashtable and the LRU list, and the new
entry is inserted in the LRU.
RFC 9567 section 8.1 specifies that the agent domain cannot
be a subdomain of the domain it is reporting on. therefore,
in addition to making it illegal to configure that at the
zone level, we also need to disable send-report-channel for
any zone for which the global send-report-channel value is
a subdomain.
we also now warn if send-report-channel is configured
globally to a zone that we host, but that zone doesn't
have log-report-channel set.
If send-report-channel is set at the zone level, it will
be stored in the zone object and used instead of the
view-level agent-domain when constructing the EDNS
Report-Channel option.
This commit adds support for the EDNS Report-Channel option,
which is returned in authoritative responses when EDNS is in use.
"send-report-channel" sets the Agent-Domain value that will be
included in EDNS Report-Channel options. This is configurable at
the options/view level; the value is a DNS name. Setting the
Agent-Domain to the root zone (".") disables the option.
When this value has been set, incoming queries matchng the form
_er.<qtype>.<qname>.<extended-error-code>._er.<agent-domain>/TXT
will be logged to the dns-reporting-agent channel at INFO level.
(Note: error reporting queries will only be accepted if sent via
TCP or with a good server cookie. If neither is present, named
returns BADCOOKIE to complete the DNS COOKIE handshake, or TC=1
to switch the client to TCP.)
Remove the complicated mechanism that could be (in theory) used by
external libraries to register new categories and modules with
statically defined lists in <isc/log.h>. This is similar to what we
have done for <isc/result.h> result codes. All the libraries are now
internal to BIND 9, so we don't need to provide a mechanism to register
extra categories and modules.
When sending fails, the ns__client_request() would not reset the
connection and continue as nothing is happening. This comes from the
model that we don't care about failed UDP sends because datagrams are
unreliable anyway, but it greatly affects TCP connections with
keep-alive.
The worst case scenario is as follows:
1. the 3-way TCP handshake gets completed
2. the libuv calls the "uv_connection_cb" callback
3. the TCP connection gets queue because of the tcp-clients quota
4. the TCP client sends as many DNS messages as the buffers allow
5. the TCP connection gets dropped by the client due to the timeout
6. the TCP connection gets accepted by the server
7. the data already sent by the client gets read
8. all sending fails immediately because the TCP connection is dead
9. we consume all the data in the buffer in a very tight loop
As it doesn't make sense to trying to process more data on the TCP
connection when the sending is failing, drop the connection immediately
on the first sending error.
As ns_query_init() cannot fail now, remove the error paths, especially
in ns__client_setup() where we now don't have to care what to do with
the connection if setting up the client could fail. It couldn't fail
even before, but now it's formal.
View matching on an incoming query checks the query's signature,
which can be a CPU-heavy task for a SIG(0)-signed message. Implement
an asynchronous mode of the view matching function which uses the
offloaded signature checking facilities, and use it for the incoming
queries.
In order to protect from a malicious DNS client that sends many
queries with a SIG(0)-signed message, add a quota of simultaneously
running SIG(0) checks.
This protection can only help when named is using more than one worker
threads. For example, if named is running with the '-n 4' option, and
'sig0checks-quota 2;' is used, then named will make sure to not use
more than 2 workers for the SIG(0) signature checks in parallel, thus
leaving the other workers to serve the remaining clients which do not
use SIG(0)-signed messages.
That limitation is going to change when SIG(0) signature checks are
offloaded to "slow" threads in a future commit.
The 'sig0checks-quota-exempt' ACL option can be used to exempt certain
clients from the quota requirements using their IP or network addresses.
The 'sig0checks-quota-maxwait-ms' option is used to define a maximum
amount of time for named to wait for a quota to appear. If during that
time no new quota becomes available, named will answer to the client
with DNS_R_REFUSED.
The changes in this MR prevent the memory used for sending the outgoing
TCP requests to spike so much. That strictly remove the extra need for
own memory context, and thus since we generally prefer simplicity,
remove the extra memory context with own jemalloc arenas just for the
outgoing send buffers.
As a single thread can process only one TCP send at the time, we don't
really need a memory pool for the TCP buffers, but it's enough to have
a single per-loop (client manager) static buffer that's being used to
assemble the DNS message and then it gets copied into own sending
buffer.
In the future, this should get optimized by exposing the uv_try API
from the network manager, and first try to send the message directly
and allocate the sending buffer only if we need to send the data
asynchronously.
Constantly allocating, reallocating and deallocating 64K TCP send
buffers by 'ns_client' instances takes too much CPU time.
There is an existing mechanism to reuse the ns_clent_t structure
associated with the handle using 'isc_nmhandle_getdata/_setdata'
(see ns_client_request()), but it doesn't work with TCP, because
every time ns_client_request() is called it gets a new handle even
for the same TCP connection, see the comments in
streamdns_on_complete_dnsmessage().
To solve the problem, we introduce an array of available (unused)
TCP buffers stored in ns_clientmgr_t structure so that a 'client'
working via TCP can have a chance to reuse one (if there is one)
instead of allocating a new one every time.
Because some tests don't have a legtimate handle, provide a temporary
return early that should be fixed and removed before squashing. This
short circuiting is still correct until DoQ/DoH3 support is introduced.
GCC might fail to compile because it expects a return after UNREACHABLE.
It should ideally just work anyway since UNREACHABLE is either a
noreturn or UB (__builtin_unreachable / C23 unreachable).
Either way, it should be optimized almost always so the fallback is
free or basically free anyway when it isn't optimized out.
The main intention of PROXY protocol is to pass endpoints information
to a back-end server (in our case - BIND). That means that it is a
valid way to spoof endpoints information, as the addresses and ports
extracted from PROXYv2 headers, from the point of view of BIND, are
used instead of the real connection addresses.
Of course, an ability to easily spoof endpoints information can be
considered a security issue when used uncontrollably. To resolve that,
we introduce 'allow-proxy' and 'allow-proxy-on' ACL options. These are
the only ACL options in BIND that work with real PROXY connections
addresses, allowing a DNS server operator to specify from what clients
and on which interfaces he or she is willing to accept PROXY
headers. By default, for security reasons we do not allow to accept
them.
The AES algorithm for DNS cookies was being kept for legacy reasons, and
it can be safely removed in the next major release. Remove both the AES
usage for DNS cookies and the AES implementation itself.
All these pointers are guaranteed to be non-NULL.
Additionally, update a comment to remove obviously outdated
information about the function's requirements.
Instead of creating new memory pools for each new dns_message, change
dns_message_create() method to optionally accept externally created
dns_fixedname_t and dns_rdataset_t memory pools. This allows us to
preallocate the memory pools in ns_client and dns_resolver units for the
lifetime of dns_resolver_t and ns_clientmgr_t.
The dns_badcache unit had (yet another) own locked hashtable
implementation. Replace the hashtable used by dns_badcache with
lock-free cds_lfht implementation from liburcu.
The server was previously tolerant of out-of-date or otherwise bad
DNS SERVER COOKIES that where well formed unless require-cookie was
set. BADCOOKIE is now return for these conditions.