2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-23 02:28:55 +00:00

338 Commits

Author SHA1 Message Date
Ondřej Surý
4f5b4662b6 Remove the limit on the number of simultaneous TCP queries
There was an artificial limit of 23 on the number of simultaneous
pipelined queries in the single TCP connection.  The new network
managers is capable of handling "unlimited" (limited only by the TCP
read buffer size ) queries similar to "unlimited" handling of the DNS
queries receive over UDP.

Don't limit the number of TCP queries that we can process within a
single TCP read callback.
2022-02-17 16:19:12 -08:00
Ondřej Surý
4716c56ebb Reset the TCP connection when garbage is received
When invalid DNS message is received, there was a handling mechanism for
DoH that would be called to return proper HTTP response.

Reuse this mechanism and reset the TCP connection when the client is
blackholed, DNS message is completely bogus or the ns_client receives
response instead of query.
2022-02-17 20:39:55 +01:00
Ondřej Surý
a89d9e0fa6 Add isc_nmhandle_setwritetimeout() function
In some situations (unit test and forthcoming XFR timeouts MR), we need
to modify the write timeout independently of the read timeout.  Add a
isc_nmhandle_setwritetimeout() function that could be called before
isc_nm_send() to specify a custom write timeout interval.
2022-02-17 09:06:58 +01:00
Ondřej Surý
408b362169 Add TCP, TCPDNS and TLSDNS write timer
When the outgoing TCP write buffers are full because the other party is
not reading the data, the uv_write() could wait indefinitely on the
uv_loop and never calling the callback.  Add a new write timer that uses
the `tcp-idle-timeout` value to interrupt the TCP connection when we are
not able to send data for defined period of time.
2022-02-17 09:06:58 +01:00
Ondřej Surý
45a73c113f Rename sock->timer to sock->read_timer
Before adding the write timer, we have to remove the generic sock->timer
to sock->read_timer.  We don't touch the function names to limit the
impact of the refactoring.
2022-02-17 09:06:58 +01:00
Ondřej Surý
8715be1e4b Use UV_RUNTIME_CHECK() as appropriate
Replace the RUNTIME_CHECK() calls for libuv API calls with
UV_RUNTIME_CHECK() to get more detailed error message when
something fails and should not.
2022-02-16 11:16:57 +01:00
Ondřej Surý
b5e086257d Explicitly enable IPV6_V6ONLY on the netmgr sockets
Some operating systems (OpenBSD and DragonFly BSD) don't restrict the
IPv6 sockets to sending and receiving IPv6 packets only.  Explicitly
enable the IPV6_V6ONLY socket option on the IPv6 sockets to prevent
failures from using the IPv4-mapped IPv6 address.
2022-01-17 22:16:27 +01:00
Ondřej Surý
7370725008 Fix the UDP recvmmsg support
Previously, the netmgr/udp.c tried to detect the recvmmsg detection in
libuv with #ifdef UV_UDP_<foo> preprocessor macros.  However, because
the UV_UDP_<foo> are not preprocessor macros, but enum members, the
detection didn't work.  Because the detection didn't work, the code
didn't have access to the information when we received the final chunk
of the recvmmsg and tried to free the uvbuf every time.  Fortunately,
the isc__nm_free_uvbuf() had a kludge that detected attempt to free in
the middle of the receive buffer, so the code worked.

However, libuv 1.37.0 changed the way the recvmmsg was enabled from
implicit to explicit, and we checked for yet another enum member
presence with preprocessor macro, so in fact libuv recvmmsg support was
never enabled with libuv >= 1.37.0.

This commit changes to the preprocessor macros to autoconf checks for
declaration, so the detection now works again.  On top of that, it's now
possible to cleanup the alloc_cb and free_uvbuf functions because now,
the information whether we can or cannot free the buffer is available to
us.
2022-01-13 19:06:39 +01:00
Ondřej Surý
58bd26b6cf Update the copyright information in all files in the repository
This commit converts the license handling to adhere to the REUSE
specification.  It specifically:

1. Adds used licnses to LICENSES/ directory

2. Add "isc" template for adding the copyright boilerplate

3. Changes all source files to include copyright and SPDX license
   header, this includes all the C sources, documentation, zone files,
   configuration files.  There are notes in the doc/dev/copyrights file
   on how to add correct headers to the new files.

4. Handle the rest that can't be modified via .reuse/dep5 file.  The
   binary (or otherwise unmodifiable) files could have license places
   next to them in <foo>.license file, but this would lead to cluttered
   repository and most of the files handled in the .reuse/dep5 file are
   system test files.
2022-01-11 09:05:02 +01:00
Ondřej Surý
6269fce0fe Use isc_mem_get_aligned() for isc_queue and cleanup max_threads
The isc_queue_new() was using dirty tricks to allocate the head and tail
members of the struct aligned to the cacheline.  We can now use
isc_mem_get_aligned() to allocate the structure to the cacheline
directly.

Use ISC_OS_CACHELINE_SIZE (64) instead of arbitrary ALIGNMENT (128), one
cacheline size is enough to prevent false sharing.

Cleanup the unused max_threads variable - there was actually no limit on
the maximum number of threads.  This was changed a while ago.
2022-01-05 17:10:58 +01:00
Ondřej Surý
57d0fabadd Stop leaking mutex in nmworker and cond in nm socket
On FreeBSD, the pthread primitives are not solely allocated on stack,
but part of the object lives on the heap.  Missing pthread_*_destroy
causes the heap memory to grow and in case of fast lived object it's
possible to run out-of-memory.

Properly destroy the leaking mutex (worker->lock) and
the leaking condition (sock->cond).
2021-12-08 17:58:53 +01:00
Ondřej Surý
20ac73eb22 Improve the logging on failed TCP accept
Previously, when TCP accept failed, we have logged a message with
ISC_LOG_ERROR level.  One common case, how this could happen is that the
client hits TCP client quota and is put on hold and when resumed, the
client has already given up and closed the TCP connection.  In such
case, the named would log:

    TCP connection failed: socket is not connected

This message was quite confusing because it actually doesn't say that
it's related to the accepting the TCP connection and also it logs
everything on the ISC_LOG_ERROR level.

Change the log message to "Accepting TCP connection failed" and for
specific error states lower the severity of the log message to
ISC_LOG_INFO.
2021-12-02 13:50:00 +01:00
Artem Boldariev
f0e18f3927 Add isc_nm_has_encryption()
This commit adds an isc_nm_has_encryption() function intended to check
if a given handle is backed by a connection which uses encryption.
2021-11-30 12:20:22 +02:00
Artem Boldariev
07cf827b0b Add isc_nm_socket_type()
This commit adds an isc_nm_socket_type() function which can be used to
obtain a handle's socket type.

This change obsoletes isc_nm_is_tlsdns_handle() and
isc_nm_is_http_handle(). However, it was decided to keep the latter as
we eventually might end up supporting multiple HTTP versions.
2021-11-30 12:20:22 +02:00
Evan Hunt
7f63ee3bae address '--disable-doh' failures
Change 5756 (GL #2854) introduced build errors when using
'configure --disable-doh'.  To fix this, isc_nm_is_http_handle() is
now defined in all builds, not just builds that have DoH enabled.

Missing code comments were added both for that function and for
isc_nm_is_tlsdns_handle().
2021-11-17 13:48:43 -08:00
Artem Boldariev
80482f8d3e DoH: Add isc_nm_set_min_answer_ttl()
This commit adds an isc_nm_set_min_answer_ttl() function which is
intended to to be used to give a hint to the underlying transport
regarding the answer TTL.

The interface is intentionally kept generic because over time more
transports might benefit from this functionality, but currently it is
intended for DoH to set "max-age" value within "Cache-Control" HTTP
header (as recommended in the RFC8484, section 5.1 "Cache
Interaction").

It is no-op for other DNS transports for the time being.
2021-11-05 14:14:59 +02:00
Evan Hunt
32b50407bf check statichandle before attaching
it is possible for udp_recv_cb() to fire after the socket
is already shutting down and statichandle is NULL; we need to
create a temporary handle in this case.
2021-10-18 14:21:04 -07:00
Evan Hunt
a55589f881 remove all references to isc_socket and related types
Removed socket.c, socket.h, and all references to isc_socket_t,
isc_socketmgr_t, isc_sockevent_t, etc.
2021-10-15 01:01:25 -07:00
Evan Hunt
075139f60e netmgr: refactor isc__nm_incstats() and isc__nm_decstats()
route/netlink sockets don't have stats counters associated with them,
so it's now necessary to check whether socket stats exist before
incrementing or decrementing them. rather than relying on the caller
for this, we now just pass the socket and an index, and the correct
stats counter will be updated if it exists.
2021-10-15 00:57:02 -07:00
Evan Hunt
8c51a32e5c netmgr: add isc_nm_routeconnect()
isc_nm_routeconnect() opens a route/netlink socket, then calls a
connect callback, much like isc_nm_udpconnect(), with a handle that
can then be monitored for network changes.

Internally the socket is treated as a UDP socket, since route/netlink
sockets follow the datagram contract.
2021-10-15 00:56:58 -07:00
Evan Hunt
8d6bf826c6 netmgr: refactor isc__nm_incstats() and isc__nm_decstats()
After support for route/netlink sockets is merged, not all sockets
will have stats counters associated with them, so it's now necessary
to check whether socket stats exist before incrementing or decrementing
them. rather than relying on the caller for this, we now just pass the
socket and an index, and the correct stats counter will be updated if
it exists.
2021-10-15 00:40:37 -07:00
Artem Boldariev
25b2c6ad96 Require "dot" ALPN token for zone transfer requests over DoT (XoT)
This commit makes BIND verify that zone transfers are allowed to be
done over the underlying connection. Currently, it makes sense only
for DoT, but the code is deliberately made to be protocol-agnostic.
2021-10-05 11:23:47 +03:00
Artem Boldariev
eba3278e52 Add isc_nm_xfr_allowed() function
The intention of having this function is to have a predicate to check
if a zone transfer could be performed over the given handle. In most
cases we can assume that we can do zone transfers over any stream
transport except DoH, but this assumption will not work for zone
transfers over DoT (XoT), as the RFC9103 requires ALPN to happen,
which might not be the case for all deployments of DoT.
2021-10-05 11:23:47 +03:00
Evan Hunt
08ce69a0ea Rewrite dns_resolver and dns_request to use netmgr timeouts
- The `timeout_action` parameter to dns_dispatch_addresponse() been
  replaced with a netmgr callback that is called when a dispatch read
  times out.  this callback may optionally reset the read timer and
  resume reading.

- Added a function to convert isc_interval to milliseconds; this is used
  to translate fctx->interval into a value that can be passed to
  dns_dispatch_addresponse() as the timeout.

- Note that netmgr timeouts are accurate to the millisecond, so code to
  check whether a timeout has been reached cannot rely on microsecond
  accuracy.

- If serve-stale is configured, then a timeout received by the resolver
  may trigger it to return stale data, and then resume waiting for the
  read timeout. this is no longer based on a separate stale timer.

- The code for canceling requests in request.c has been altered so that
  it can run asynchronously.

- TCP timeout events apply to the dispatch, which may be shared by
  multiple queries.  since in the event of a timeout we have no query ID
  to use to identify the resp we wanted, we now just send the timeout to
  the oldest query that was pending.

- There was some additional refactoring in the resolver: combining
  fctx_join() and fctx_try_events() into one function to reduce code
  duplication, and using fixednames in fetchctx and fetchevent.

- Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the
  code can be simplified.
2021-10-02 11:39:56 -07:00
Ondřej Surý
9ee60e7a17 netmgr fixes needed for dispatch
- The read timer must always be stopped when reading stops.

- Read callbacks can now call isc_nm_read() again in TCP, TCPDNS and
  TLSDNS; previously this caused an assertion.

- The wrong failure code could be sent after a UDP recv failure because
  the if statements were in the wrong order. the check for a NULL
  address needs to be after the check for an error code, otherwise the
  result will always be set to ISC_R_EOF.

- When aborting a read or connect because the netmgr is shutting down,
  use ISC_R_SHUTTINGDOWN. (ISC_R_CANCELED is now reserved for when the
  read has been canceled by the caller.)

- A new function isc_nmhandle_timer_running() has been added enabling a
  callback to check whether the timer has been reset after processing a
  timeout.

- Incidental netmgr fix: always use isc__nm_closing() instead of
  referencing sock->mgr->closing directly

- Corrected a few comments that used outdated function names.
2021-10-02 11:39:56 -07:00
Evan Hunt
d9e1ad9e37 Remove reference count REQUIRE in isc_nm_read()
Previously isc_nm_read() required references on the handle to be at
least 2, under the assumption that it would only ever be called from a
connect or accept callback. however, it can also be called from a read
callback, in which case the reference count might be only 1.
2021-10-02 11:39:56 -07:00
Mark Andrews
7079829b84 Address use before NULL check warning of uvreq
move dereference of uvreq until the after NULL check.
2021-09-28 11:57:47 +10:00
Ondřej Surý
8248da3b83 Preserve the contents of socket buffer on realloc
On TCPDNS/TLSDNS read callback, the socket buffer could be reallocated
if the received contents would be larger than the buffer.  The existing
code would not preserve the contents of the existing buffer which lead
to the loss of the already received data.

This commit changes the isc_mem_put()+isc_mem_get() with isc_mem_reget()
to preserve the existing contents of the socket buffer.
2021-09-23 22:36:01 +02:00
Ondřej Surý
8edbd0929f Use isc_mem_reget() to handle the internal active handle cache
The netmgr, has an internal cache for freed active handles.  This cache
was allocated using isc_mem_allocate()/isc_mem_free() API because it was
simpler to reallocate the cache when we needed to grow it.  The new
isc_mem_reget() function could be used here reducing the need to use
isc_mem_allocate() API which is tad bit slower than isc_mem_get() API.
2021-09-23 22:17:15 +02:00
Evan Hunt
fc6f751fbe replace per-protocol keepalive functions with a common one
this commit removes isc__nm_tcpdns_keepalive() and
isc__nm_tlsdns_keepalive(); keepalive for these protocols and
for TCP will now be set directly from isc_nmhandle_keepalive().

protocols that have an underlying TCP socket (i.e., TLS stream
and HTTP), now have protocol-specific routines, called by
isc_nmhandle_keeaplive(), to set the keepalive value on the
underlying socket.
2021-08-27 10:02:10 -07:00
Evan Hunt
7867b8b57d enable keepalive when the keepalive EDNS option is seen
previously, receiving a keepalive option had no effect on how
long named would keep the connection open; there was a place to
configure the keepalive timeout but it was never used. this commit
corrects that.

this also fixes an error in isc__nm_{tcp,tls}dns_keepalive()
in which the sense of a REQUIRE test was reversed; previously this
error had not been noticed because the functions were not being
used.
2021-08-27 09:56:51 -07:00
Ondřej Surý
87d5c8ab7c Disable the Path MTU Discover on UDP Sockets
Instead of disabling the fragmentation on the UDP sockets, we now
disable the Path MTU Discovery by setting IP(V6)_MTU_DISCOVER socket
option to IP_PMTUDISC_OMIT on Linux and disabling IP(V6)_DONTFRAG socket
option on FreeBSD.  This option sets DF=0 in the IP header and also
ignores the Path MTU Discovery.

As additional mitigation on Linux, we recommend setting
net.ipv4.ip_no_pmtu_disc to Mode 3:

    Mode 3 is a hardend pmtu discover mode. The kernel will only accept
    fragmentation-needed errors if the underlying protocol can verify
    them besides a plain socket lookup. Current protocols for which pmtu
    events will be honored are TCP, SCTP and DCCP as they verify
    e.g. the sequence number or the association. This mode should not be
    enabled globally but is only intended to secure e.g. name servers in
    namespaces where TCP path mtu must still work but path MTU
    information of other protocols should be discarded. If enabled
    globally this mode could break other protocols.
2021-08-19 07:12:33 +02:00
Artem Boldariev
170cc41d5c Get rid of some HTTP/2 related types when NGHTTP2 is not available
This commit removes definitions of some DoH-related types when
libnghttp2 is not available.
2021-08-04 10:32:27 +03:00
Ondřej Surý
a9e6a7ae57 Disable setting the thread affinity
It was discovered that setting the thread affinity on both the netmgr
and netthread threads lead to inconsistent recursive performance because
sometimes the netmgr and netthread threads would compete over single
resource and sometimes not.

Removing setting the affinity causes a slight dip in the authoritative
performance around 5% (the measured range was from 3.8% to 7.8%), but
the recursive performance is now consistently good.
2021-07-13 14:48:29 +02:00
Ondřej Surý
29a285a67d Revert the allocate/free -> get/put change from jemalloc change
In the jemalloc merge request, we missed the fact that ah_frees and ah_handles
are reallocated which is not compatible with using isc_mem_get() for allocation
and isc_mem_put() for deallocation.  This commit reverts that part and restores
use of isc_mem_allocate() and isc_mem_free().
2021-07-09 18:19:57 +02:00
Ondřej Surý
f487c6948b Replace locked mempools with memory contexts
Current mempools are kind of hybrid structures - they serve two
purposes:

 1. mempool with a lock is basically static sized allocator with
    pre-allocated free items

 2. mempool without a lock is a doubly-linked list of preallocated items

The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.

The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
2021-07-09 15:58:02 +02:00
Ondřej Surý
5ab05d1696 Replace isc_mem_allocate() usage with isc_mem_get() in netmgr.c
The isc_mem_allocate() comes with additional cost because of the memory
tracking.  In this commit, we replace the usage with isc_mem_get()
because we track the allocated sizes anyway, so it's possible to also
replace isc_mem_free() with isc_mem_put().
2021-07-09 15:58:02 +02:00
Artem Boldariev
c6d0e3d3a7 Return HTTP status code for small/malformed requests
This commit makes BIND return HTTP status codes for malformed or too
small requests.

DNS request processing code would ignore such requests. Such an
approach works well for other DNS transport but does not make much
sense for HTTP, not allowing it to complete the request/response
sequence.

Suppose execution has reached the point where DNS message handling
code has been called. In that case, it means that the HTTP request has
been successfully processed, and, thus, we are expected to respond to
it either with a message containing some DNS payload or at least to
return an error status code. This commit ensures that BIND behaves
this way.
2021-07-09 16:37:08 +03:00
Ondřej Surý
2bb454182b Make the DNS over HTTPS support optional
This commit adds two new autoconf options `--enable-doh` (enabled by
default) and `--with-libnghttp2` (mandatory when DoH is enabled).

When DoH support is disabled the library is not linked-in and support
for http(s) protocol is disabled in the netmgr, named and dig.
2021-07-07 09:50:53 +02:00
Ondřej Surý
b941411072 Disable IP fragmentation on the UDP sockets
In DNS Flag Day 2020, we started setting the DF (Don't Fragment socket
option on the UDP sockets.  It turned out, that this code was incomplete
leading to dropping the outgoing UDP packets.

This has been now remedied, so it is possible to disable the
fragmentation on the UDP sockets again as the sending error is now
handled by sending back an empty response with TC (truncated) bit set.

This reverts commit 66eefac78c92b64b6689a1655cc677a2b1d13496.
2021-06-23 17:41:34 +02:00
Ondřej Surý
ec86759401 Replace netmgr per-protocol sequential function with a common one
Previously, each protocol (TCPDNS, TLSDNS) has specified own function to
disable pipelining on the connection.  An oversight would lead to
assertion failure when opcode is not query over non-TCPDNS protocol
because the isc_nm_tcpdns_sequential() function would be called over
non-TCPDNS socket.  This commit removes the per-protocol functions and
refactors the code to have and use common isc_nm_sequential() function
that would either disable the pipelining on the socket or would handle
the request in per specific manner.  Currently it ignores the call for
HTTP sockets and causes assertion failure for protocols where it doesn't
make sense to call the function at all.
2021-06-22 17:21:44 +03:00
Artem Boldariev
5b507c1136 Fix BIND to serve large HTTP responses
This commit makes NM code to report HTTP as a stream protocol. This
makes it possible to handle large responses properly. Like:

dig +https @127.0.0.1 A cmts1-dhcp.longlines.com
2021-06-14 11:37:17 +03:00
Ondřej Surý
440fb3d225 Completely remove BIND 9 Windows support
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.

We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
2021-06-09 14:35:14 +02:00
Ondřej Surý
f14d870d15 Fix copy&paste error in setsockopt_off
Because of copy&paste error the setsockopt_off macro would enable
the socket option instead of disabling it.
2021-06-02 17:47:14 +02:00
Ondřej Surý
87fe97ed91 Add asynchronous work API to the network manager
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.

This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.

The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
2021-05-31 14:52:05 +02:00
Mark Andrews
715a2c7fc1 Add missing initialisations
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
2021-05-26 08:15:08 +00:00
Ondřej Surý
50270de8a0 Refactor the interface handling in the netmgr
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before.  This means less type-casting and
shorter path to access isc_sockaddr_t members.

At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
2021-05-26 09:43:12 +02:00
Ondřej Surý
28b65d8256 Reduce the number of clientmgr objects created
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.

We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).

Each clientmgr object than would have a single task and single memory
context.
2021-05-24 20:44:54 +02:00
Mark Andrews
7e83c6df94 initialise worker->cond_prio 2021-05-18 07:47:42 +00:00
Ondřej Surý
9e3cb396b2 Replace netmgr quantum with loop-preventing barrier
Instead of using fixed quantum, this commit adds atomic counter for
number of items on each queue and uses the number of netievents
scheduled to run as the limit of maximum number of netievents for a
single process_queue() run.

This prevents the endless loops when the netievent would schedule more
netievents onto the same loop, but we don't have to pick "magic" number
for the quantum.
2021-05-17 11:59:19 +02:00