Instead of cleaning the dns_badcache opportunistically, add per-loop
LRU, so each thread-loop can clean the expired entries. This also
allows removal of the atomic operations as the badcache entries are now
immutable, instead of updating the badcache entry in place, the old
entry is now deleted from the hashtable and the LRU list, and the new
entry is inserted in the LRU.
RFC 9567 section 8.1 specifies that the agent domain cannot
be a subdomain of the domain it is reporting on. therefore,
in addition to making it illegal to configure that at the
zone level, we also need to disable send-report-channel for
any zone for which the global send-report-channel value is
a subdomain.
we also now warn if send-report-channel is configured
globally to a zone that we host, but that zone doesn't
have log-report-channel set.
If send-report-channel is set at the zone level, it will
be stored in the zone object and used instead of the
view-level agent-domain when constructing the EDNS
Report-Channel option.
This commit adds support for the EDNS Report-Channel option,
which is returned in authoritative responses when EDNS is in use.
"send-report-channel" sets the Agent-Domain value that will be
included in EDNS Report-Channel options. This is configurable at
the options/view level; the value is a DNS name. Setting the
Agent-Domain to the root zone (".") disables the option.
When this value has been set, incoming queries matchng the form
_er.<qtype>.<qname>.<extended-error-code>._er.<agent-domain>/TXT
will be logged to the dns-reporting-agent channel at INFO level.
(Note: error reporting queries will only be accepted if sent via
TCP or with a good server cookie. If neither is present, named
returns BADCOOKIE to complete the DNS COOKIE handshake, or TC=1
to switch the client to TCP.)
Remove the complicated mechanism that could be (in theory) used by
external libraries to register new categories and modules with
statically defined lists in <isc/log.h>. This is similar to what we
have done for <isc/result.h> result codes. All the libraries are now
internal to BIND 9, so we don't need to provide a mechanism to register
extra categories and modules.
When sending fails, the ns__client_request() would not reset the
connection and continue as nothing is happening. This comes from the
model that we don't care about failed UDP sends because datagrams are
unreliable anyway, but it greatly affects TCP connections with
keep-alive.
The worst case scenario is as follows:
1. the 3-way TCP handshake gets completed
2. the libuv calls the "uv_connection_cb" callback
3. the TCP connection gets queue because of the tcp-clients quota
4. the TCP client sends as many DNS messages as the buffers allow
5. the TCP connection gets dropped by the client due to the timeout
6. the TCP connection gets accepted by the server
7. the data already sent by the client gets read
8. all sending fails immediately because the TCP connection is dead
9. we consume all the data in the buffer in a very tight loop
As it doesn't make sense to trying to process more data on the TCP
connection when the sending is failing, drop the connection immediately
on the first sending error.
As ns_query_init() cannot fail now, remove the error paths, especially
in ns__client_setup() where we now don't have to care what to do with
the connection if setting up the client could fail. It couldn't fail
even before, but now it's formal.
View matching on an incoming query checks the query's signature,
which can be a CPU-heavy task for a SIG(0)-signed message. Implement
an asynchronous mode of the view matching function which uses the
offloaded signature checking facilities, and use it for the incoming
queries.
In order to protect from a malicious DNS client that sends many
queries with a SIG(0)-signed message, add a quota of simultaneously
running SIG(0) checks.
This protection can only help when named is using more than one worker
threads. For example, if named is running with the '-n 4' option, and
'sig0checks-quota 2;' is used, then named will make sure to not use
more than 2 workers for the SIG(0) signature checks in parallel, thus
leaving the other workers to serve the remaining clients which do not
use SIG(0)-signed messages.
That limitation is going to change when SIG(0) signature checks are
offloaded to "slow" threads in a future commit.
The 'sig0checks-quota-exempt' ACL option can be used to exempt certain
clients from the quota requirements using their IP or network addresses.
The 'sig0checks-quota-maxwait-ms' option is used to define a maximum
amount of time for named to wait for a quota to appear. If during that
time no new quota becomes available, named will answer to the client
with DNS_R_REFUSED.
The changes in this MR prevent the memory used for sending the outgoing
TCP requests to spike so much. That strictly remove the extra need for
own memory context, and thus since we generally prefer simplicity,
remove the extra memory context with own jemalloc arenas just for the
outgoing send buffers.
As a single thread can process only one TCP send at the time, we don't
really need a memory pool for the TCP buffers, but it's enough to have
a single per-loop (client manager) static buffer that's being used to
assemble the DNS message and then it gets copied into own sending
buffer.
In the future, this should get optimized by exposing the uv_try API
from the network manager, and first try to send the message directly
and allocate the sending buffer only if we need to send the data
asynchronously.
Constantly allocating, reallocating and deallocating 64K TCP send
buffers by 'ns_client' instances takes too much CPU time.
There is an existing mechanism to reuse the ns_clent_t structure
associated with the handle using 'isc_nmhandle_getdata/_setdata'
(see ns_client_request()), but it doesn't work with TCP, because
every time ns_client_request() is called it gets a new handle even
for the same TCP connection, see the comments in
streamdns_on_complete_dnsmessage().
To solve the problem, we introduce an array of available (unused)
TCP buffers stored in ns_clientmgr_t structure so that a 'client'
working via TCP can have a chance to reuse one (if there is one)
instead of allocating a new one every time.
Because some tests don't have a legtimate handle, provide a temporary
return early that should be fixed and removed before squashing. This
short circuiting is still correct until DoQ/DoH3 support is introduced.
GCC might fail to compile because it expects a return after UNREACHABLE.
It should ideally just work anyway since UNREACHABLE is either a
noreturn or UB (__builtin_unreachable / C23 unreachable).
Either way, it should be optimized almost always so the fallback is
free or basically free anyway when it isn't optimized out.
The main intention of PROXY protocol is to pass endpoints information
to a back-end server (in our case - BIND). That means that it is a
valid way to spoof endpoints information, as the addresses and ports
extracted from PROXYv2 headers, from the point of view of BIND, are
used instead of the real connection addresses.
Of course, an ability to easily spoof endpoints information can be
considered a security issue when used uncontrollably. To resolve that,
we introduce 'allow-proxy' and 'allow-proxy-on' ACL options. These are
the only ACL options in BIND that work with real PROXY connections
addresses, allowing a DNS server operator to specify from what clients
and on which interfaces he or she is willing to accept PROXY
headers. By default, for security reasons we do not allow to accept
them.
The AES algorithm for DNS cookies was being kept for legacy reasons, and
it can be safely removed in the next major release. Remove both the AES
usage for DNS cookies and the AES implementation itself.
All these pointers are guaranteed to be non-NULL.
Additionally, update a comment to remove obviously outdated
information about the function's requirements.
Instead of creating new memory pools for each new dns_message, change
dns_message_create() method to optionally accept externally created
dns_fixedname_t and dns_rdataset_t memory pools. This allows us to
preallocate the memory pools in ns_client and dns_resolver units for the
lifetime of dns_resolver_t and ns_clientmgr_t.
The dns_badcache unit had (yet another) own locked hashtable
implementation. Replace the hashtable used by dns_badcache with
lock-free cds_lfht implementation from liburcu.
The server was previously tolerant of out-of-date or otherwise bad
DNS SERVER COOKIES that where well formed unless require-cookie was
set. BADCOOKIE is now return for these conditions.
view->adb may be referenced while the view is shutting down as the
zone uses a weak reference to the view and examines view->adb but
dns_view_detach call dns_adb_detach to clear view->adb.
since it is not necessary to find partial matches when looking
up names in a TSIG keyring, we can use a hash table instead of
an RBT to store them.
the tsigkey object now stores the key name as a dns_fixedname
rather than allocating memory for it.
the `name` parameter to dns_tsigkeyring_add() has been removed;
it was unneeded since the tsigkey object already contains a copy
of the name.
the opportunistic cleanup_ring() function has been removed;
it was only slowing down lookups.
the prior practice of passing a dns_name containing the
expanded name of an algorithm to dns_tsigkey_create() and
dns_tsigkey_createfromkey() is unnecessarily cumbersome;
we can now pass the algorithm number instead.
This commit changes send buffers allocation strategy for stream based
transports. Before that change we would allocate a dynamic buffers
sized at 64Kb even when we do not need that much. That could lead to
high memory usage on server. Now we resize the send buffer to match
the size of the actual data, freeing the memory at the end of the
buffer for being reused later.
Use current used pointer - 16 instead of a saved pointer as Coverity
thinks the memory may be freed between assignment and use of 'cp'.
isc_buffer_put{mem,uint{8,16,32}} can theoretically free the memory
if there is a dynamic buffer in use but that is not the case here.
In e18541287231b721c9cdb7e492697a2a80fd83fc, the TCP accept quota code
became broken in a subtle way - the quota would get initialized on the
first accept for the server socket and then deleted from the server
socket, so it would never get applied again.
Properly fixing this required a bigger refactoring of the isc_quota API
code to make it much simpler. The new code decouples the ownership of
the quota and acquiring/releasing the quota limit.
After (during) the refactoring it became more clear that we need to use
the callback from the child side of the accepted connection, and not the
server side.
This should have no functional effects.
The message size stats are specified by RSSAC002 so it's best not
to mess around with how they appear in the statschannel. But it's
worth changing the implementation to use general-purpose histograms,
to reduce code size and benefit from sharded counters.
The isc_time_now() and isc_time_now_hires() were used inconsistently
through the code - either with status check, or without status check,
or via TIME_NOW() macro with RUNTIME_CHECK() on failure.
Refactor the isc_time_now() and isc_time_now_hires() to always fail when
getting current time has failed, and return the isc_time_t value as
return value instead of passing the pointer to result in the argument.
This is a simple replacement using the semantic patch from the previous
commit and as added bonus, one removal of previously undetected unused
variable in named/server.c.
this function was just a front-end for gethostname(). it was
needed when we supported windows, which has a different function
for looking up the hostname; it's not needed any longer.
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.
functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.
the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
The ns_client_aclchecksilent is used to check multiple ACLs before
the decision is made that a query is denied. It is also used to
determine if recursion is available. In those cases we should not
set the extended DNS error "Prohibited".
The Stream DNS implementation needs a peek methods that read the value
from the buffer, but it doesn't advance the current position. Add
isc_buffer_peekuintX methods, refactor the isc_buffer_{get,put}uintN
methods to modern integer types, and move the isc_buffer_getuintN to the
header as static inline functions.
Send the ns_query_cancel() on the recursing clients when we initiate the
named shutdown for faster shutdown.
When we are shutting down the resolver, we cancel all the outstanding
fetches, and the ISC_R_CANCEL events doesn't propagate to the ns_client
callback.
In the future, the better solution how to fix this would be to look at
the shutdown paths and let them all propagate from bottom (loopmgr) to
top (f.e. ns_client).
The dns_cache API contained a cache cleaning mechanism that would be
disabled for 'rbt' based cache. As named doesn't have any other cache
implementations, remove the cache cleaning mechanism from dns_cache API.