2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 21:17:54 +00:00

425 Commits

Author SHA1 Message Date
Artem Boldariev
64cd7e8a7f Fix crash in DoH on empty query string in GET requests
An unhandled code path left GET query string data uninitialised (equal
to NULL) and led to a crash during the requests' base64 data
decoding. This commit fixes that.
2021-07-13 16:53:51 +03:00
Ondřej Surý
a9e6a7ae57 Disable setting the thread affinity
It was discovered that setting the thread affinity on both the netmgr
and netthread threads lead to inconsistent recursive performance because
sometimes the netmgr and netthread threads would compete over single
resource and sometimes not.

Removing setting the affinity causes a slight dip in the authoritative
performance around 5% (the measured range was from 3.8% to 7.8%), but
the recursive performance is now consistently good.
2021-07-13 14:48:29 +02:00
Ondřej Surý
29a285a67d Revert the allocate/free -> get/put change from jemalloc change
In the jemalloc merge request, we missed the fact that ah_frees and ah_handles
are reallocated which is not compatible with using isc_mem_get() for allocation
and isc_mem_put() for deallocation.  This commit reverts that part and restores
use of isc_mem_allocate() and isc_mem_free().
2021-07-09 18:19:57 +02:00
Ondřej Surý
f487c6948b Replace locked mempools with memory contexts
Current mempools are kind of hybrid structures - they serve two
purposes:

 1. mempool with a lock is basically static sized allocator with
    pre-allocated free items

 2. mempool without a lock is a doubly-linked list of preallocated items

The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.

The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
2021-07-09 15:58:02 +02:00
Ondřej Surý
5ab05d1696 Replace isc_mem_allocate() usage with isc_mem_get() in netmgr.c
The isc_mem_allocate() comes with additional cost because of the memory
tracking.  In this commit, we replace the usage with isc_mem_get()
because we track the allocated sizes anyway, so it's possible to also
replace isc_mem_free() with isc_mem_put().
2021-07-09 15:58:02 +02:00
Artem Boldariev
c6d0e3d3a7 Return HTTP status code for small/malformed requests
This commit makes BIND return HTTP status codes for malformed or too
small requests.

DNS request processing code would ignore such requests. Such an
approach works well for other DNS transport but does not make much
sense for HTTP, not allowing it to complete the request/response
sequence.

Suppose execution has reached the point where DNS message handling
code has been called. In that case, it means that the HTTP request has
been successfully processed, and, thus, we are expected to respond to
it either with a message containing some DNS payload or at least to
return an error status code. This commit ensures that BIND behaves
this way.
2021-07-09 16:37:08 +03:00
Artem Boldariev
fedff2cd6c Return "Bad Request" (400) in a case of Base64 decoding error
This error code fits better than the more generic "Internal Server
Error" (500) which implies that the problem is on the server.

Also, do not end the whole HTTP/2 session on a bad request.
2021-07-09 16:26:46 +03:00
Artem Boldariev
1792740075 Ignore an "Accept" HTTP header value
We were too strict regarding the value and presence of "Accept" HTTP
header, slightly breaking compatibility with the specification.

According to RFC8484 client SHOULD add "Accept" header to the requests
but MUST be able to handle "application/dns-message" media type
regardless of the value of the header. That basically suggests we
ignore its value.

Besides, verifying the value of the "Accept" header is a bit tricky
because it could contain multiple media types, thus requiring proper
parsing. That is doable but does not provide us with any benefits.

Among other things, not verifying the value also fixes compatibility
with clients, which could advertise multiple media types as supported,
which we should accept. For example, it is possible for a perfectly
valid request to contain "application/dns-message", "application/*",
and "*/*" in the "Accept" header value. Still, we would treat such a
request as invalid.
2021-07-09 16:26:46 +03:00
Artem Boldariev
7b6945fb60 Fix BIND hanging when browsers end HTTP/2 streams prematurely
The commit fixes BIND hanging when browsers end HTTP/2 streams
prematurely (for example, by sending RST_STREAM). It ensures that
isc__nmsocket_prep_destroy() will be called for an HTTP/2 stream,
allowing it to be properly disposed.

The problem was impossible to reproduce using dig or DoH benchmarking
software (e.g. flamethrower) because these do not tend to end HTTP/2
streams prematurely.
2021-07-09 15:42:44 +03:00
Artem Boldariev
094fcc10e7 Move the code which calls server read callback into a separate func
This commit moves the code which calls server read callback into a
separate function to avoid code repetition.
2021-07-09 15:42:44 +03:00
Ondřej Surý
2bb454182b Make the DNS over HTTPS support optional
This commit adds two new autoconf options `--enable-doh` (enabled by
default) and `--with-libnghttp2` (mandatory when DoH is enabled).

When DoH support is disabled the library is not linked-in and support
for http(s) protocol is disabled in the netmgr, named and dig.
2021-07-07 09:50:53 +02:00
Ondřej Surý
b941411072 Disable IP fragmentation on the UDP sockets
In DNS Flag Day 2020, we started setting the DF (Don't Fragment socket
option on the UDP sockets.  It turned out, that this code was incomplete
leading to dropping the outgoing UDP packets.

This has been now remedied, so it is possible to disable the
fragmentation on the UDP sockets again as the sending error is now
handled by sending back an empty response with TC (truncated) bit set.

This reverts commit 66eefac78c92b64b6689a1655cc677a2b1d13496.
2021-06-23 17:41:34 +02:00
Evan Hunt
a3ba95116e Handle UDP send errors when sending DNS message larger than MTU
When the fragmentation is disabled on UDP sockets, the uv_udp_send()
call can fail with UV_EMSGSIZE for messages larger than path MTU.
Previously, this error would end with just discarding the response.  In
this commit, a proper handling of such case is added and on such error,
a new DNS response with truncated bit set is generated and sent to the
client.

This change allows us to disable the fragmentation on the UDP
sockets again.
2021-06-23 17:41:34 +02:00
Ondřej Surý
ec86759401 Replace netmgr per-protocol sequential function with a common one
Previously, each protocol (TCPDNS, TLSDNS) has specified own function to
disable pipelining on the connection.  An oversight would lead to
assertion failure when opcode is not query over non-TCPDNS protocol
because the isc_nm_tcpdns_sequential() function would be called over
non-TCPDNS socket.  This commit removes the per-protocol functions and
refactors the code to have and use common isc_nm_sequential() function
that would either disable the pipelining on the socket or would handle
the request in per specific manner.  Currently it ignores the call for
HTTP sockets and causes assertion failure for protocols where it doesn't
make sense to call the function at all.
2021-06-22 17:21:44 +03:00
Artem Boldariev
dc356bb196 Fix ASAN error in DoH (passing NULL to memmove())
The warning was produced by an ASAN build:

runtime error: null pointer passed as argument 2, which is declared to
never be null

This commit fixes it by checking if nghttp2_session_mem_send() has
actually returned anything.
2021-06-16 17:46:10 +03:00
Artem Boldariev
ccd2267b1c Set sock->iface and sock->peer properly for layered connection types
This change sets the mentioned fields properly and gets rid of klusges
added in the times when we were keeping pointers to isc_sockaddr_t
instead of copies. Among other things it helps to avoid a situation
when garbage instead of an address appears in dig output.
2021-06-14 11:37:36 +03:00
Artem Boldariev
b84fa122ce Make BIND refuse to serve XFRs over DoH
We cannot use DoH for zone transfers.  According to RFC8484 a DoH
request contains exactly one DNS message (see Section 6: Definition of
the "application/dns-message" Media Type,
https://datatracker.ietf.org/doc/html/rfc8484#section-6).  This makes
DoH unsuitable for zone transfers as often (and usually!) these need
more than one DNS message, especially for larger zones.

As zone transfers over DoH are not (yet) standardised, nor discussed
in RFC8484, the best thing we can do is to return "not implemented."

Technically DoH can be used to transfer small zones which fit in one
message, but that is not enough for the generic case.

Also, this commit makes the server-side DoH code ensure that no
multiple responses could be attempted to be sent over one HTTP/2
stream. In HTTP/2 one stream is mapped to one request/response
transaction. Now the write callback will be called with failure error
code in such a case.
2021-06-14 11:37:36 +03:00
Artem Boldariev
009752cab0 Pass an HTTP handle to the read callback when finishing a stream
This commit fixes a leftover from an earlier version of the client-side
DoH code when the underlying transport handle was used directly.
2021-06-14 11:37:36 +03:00
Artem Boldariev
d5d20cebb2 Fix a crash in the client-side DoH code (header processing callback)
Support a situation in header processing callback when client side
code could receive a belated response or part of it. That could
happen when the HTTP/2 session was already closed, but there were some
response data from server in flight. Other client-side nghttp2
callbacks code already handled this case.

The bug became apparent after HTTP/2 write buffering was supported,
leading to rare unit test failures.
2021-06-14 11:37:33 +03:00
Artem Boldariev
2dfc0d9afc Nullify connect.cstream in time and keep track of all client streams
This commit ensures that sock->h2.connect.cstream gets nullified when
the object in question is deleted. This fixes a nasty crash in dig
exposed when receiving large responses leading to double free()ing.

Also, it refactors how the client-side code keeps track of client
streams (hopefully) preventing from similar errors appearing in the
future.
2021-06-14 11:37:29 +03:00
Artem Boldariev
5b507c1136 Fix BIND to serve large HTTP responses
This commit makes NM code to report HTTP as a stream protocol. This
makes it possible to handle large responses properly. Like:

dig +https @127.0.0.1 A cmts1-dhcp.longlines.com
2021-06-14 11:37:17 +03:00
Ondřej Surý
440fb3d225 Completely remove BIND 9 Windows support
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.

We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
2021-06-09 14:35:14 +02:00
Ondřej Surý
f14d870d15 Fix copy&paste error in setsockopt_off
Because of copy&paste error the setsockopt_off macro would enable
the socket option instead of disabling it.
2021-06-02 17:47:14 +02:00
Ondřej Surý
67afea6cfc Cleanup the remaining of HAVE_UV_<func> macros
While cleaning up the usage of HAVE_UV_<func> macros, we forgot to
cleanup the HAVE_UV_UDP_CONNECT in the actual code and
HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail
on uv_udp_send() because the socket was already connected and we were
falsely assuming that it was not.

The platforms with autoconf support were not affected, because we were
still checking for the functions from the configure.
2021-06-02 11:23:36 +02:00
Artem Boldariev
35d0027f36 HTTP/2 write buffering
This commit adds the ability to consolidate HTTP/2 write requests if
there is already one in flight. If it is the case, the code will
consolidate multiple subsequent write request into a larger one
allowing to utilise the network in a more efficient way by creating
larger TCP packets as well as by reducing TLS records overhead (by
creating large TLS records instead of multiple small ones).

This optimisation is especially efficient for clients, creating many
concurrent HTTP/2 streams over a transport connection at once.  This
way, the code might create a small amount of multi-kilobyte requests
instead of many 50-120 byte ones.

In fact, it turned out to work so well that I had to add a work-around
to the code to ensure compatibility with the flamethrower, which, at
the time of writing, does not support TLS records larger than two
kilobytes. Now the code tries to flush the write buffer after 1.5
kilobyte, which is still pretty adequate for our use case.

Essentially, this commit implements a recommendation given by nghttp2
library:

https://nghttp2.org/documentation/nghttp2_session_mem_send.html
2021-06-01 21:07:45 +03:00
Ondřej Surý
87fe97ed91 Add asynchronous work API to the network manager
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.

This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.

The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
2021-05-31 14:52:05 +02:00
Ondřej Surý
211bfefbaa Use UV_VERSION_HEX to decide whether we need libuv shim functions
Instead of having a configure check for every missing function that has
been added in later version of libuv, we now use UV_VERSION_HEX to
decide whether we need the shim or not.
2021-05-31 14:52:05 +02:00
Ondřej Surý
7477d1b2ed Add uv_os_getenv() and uv_os_setenv() compatibility shims
The uv_os_getenv() and uv_os_setenv() functions were introduced in the
libuv >= 1.12.0.  Add simple compatibility shims for older versions.
2021-05-31 14:52:05 +02:00
Ondřej Surý
f752840db3 Add uv_req_get_data() and uv_req_set_data() compatibility shims
The uv_req_get_data() and uv_req_set_data() functions were introduced in
libuv >= 1.19.0, so we need to add compatibility shims with older libuv
versions.
2021-05-31 14:52:05 +02:00
Mark Andrews
715a2c7fc1 Add missing initialisations
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
2021-05-26 08:15:08 +00:00
Ondřej Surý
a227562f13 Cleanup the struct isc_nmiface
In previous MR, I forgot to remove the `struct isc_nmiface`, this commit
rectifies that.
2021-05-26 09:55:10 +02:00
Ondřej Surý
50270de8a0 Refactor the interface handling in the netmgr
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before.  This means less type-casting and
shorter path to access isc_sockaddr_t members.

At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
2021-05-26 09:43:12 +02:00
Ondřej Surý
28b65d8256 Reduce the number of clientmgr objects created
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.

We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).

Each clientmgr object than would have a single task and single memory
context.
2021-05-24 20:44:54 +02:00
Mark Andrews
7e83c6df94 initialise worker->cond_prio 2021-05-18 07:47:42 +00:00
Ondřej Surý
9e3cb396b2 Replace netmgr quantum with loop-preventing barrier
Instead of using fixed quantum, this commit adds atomic counter for
number of items on each queue and uses the number of netievents
scheduled to run as the limit of maximum number of netievents for a
single process_queue() run.

This prevents the endless loops when the netievent would schedule more
netievents onto the same loop, but we don't have to pick "magic" number
for the quantum.
2021-05-17 11:59:19 +02:00
Ondřej Surý
4509089419 Add configuration option to set send/recv buffers on the nm sockets
This commit adds a new configuration option to set the receive and send
buffer sizes on the TCP and UDP netmgr sockets.  The default is `0`
which doesn't set any value and just uses the value set by the operating
system.

There's no magic value here - set it too small and the performance will
drop, set it too large, the buffers can fill-up with queries that have
already timeouted on the client side and nobody is interested for the
answer and this would just make the server clog up even more by making
it produce useless work.

The `netstat -su` can be used on POSIX systems to monitor the receive
and send buffer errors.
2021-05-17 08:47:09 +02:00
Ondřej Surý
cd413234f7 Fix the outgoing UDP socket selection on Windows
The outgoing UDP socket selection would pick unintialized children
socket on Windows, because we have more netmgr workers than we have
listening sockets.  This commit fixes the selection by keeping the
outgoing socket the same, so it's always run on existing socket.
2021-05-13 15:04:48 +02:00
Artem Boldariev
6816a741ca Fix crash in TLS caused by improper handling of shutdown messages
The problem was found when flamethrower was accidentally run in DoT
mode against DoH port.
2021-05-13 10:42:25 +03:00
Artem Boldariev
1947f6372d Limit the number of active concurrent HTTP/2 streams
The initial intent was to limit the number of concurrent streams by
the value of 100 but due to the error when reading the documentation
it was set to the maximum possible number of streams per session.

This could lead to security issues, e.g. a remote attacker could have
taken down the BIND instance by creating lots of sessions via low
number of transport connections. This commit fixes that.
2021-05-13 10:42:25 +03:00
Artem Boldariev
d80d1b0dd9 Do not allow empty DoH endpoints to be added
It was possible to specify empty DoH endpoint in BIND's configuration
file: that was an error, we should not allow doing so.
2021-05-13 10:42:25 +03:00
Artem Boldariev
9155a87528 Do not call nghttp2_session_terminate_session() in server-side code
We should not call nghttp2_session_terminate_session() in server-side
code after all of the active HTTP/2 streams are processed. The
underlying transport connection is expected to remain opened at least
for some time in this case for new HTTP/2 requests to arrive. That is
what flamethrower was expecting and it makes perfect sense from the
HTTP/2 perspective.
2021-05-13 10:42:25 +03:00
Mark Andrews
0f6ae9000a initalise sock->cond 2021-05-11 14:06:26 +02:00
Ondřej Surý
3713a38689 Bump the netmgr quantum to 1024
During the stress testing, it was discovered that the default netmgr
quantum of 128 is not enough and there was a performance drop for TCP on
FreeBSD.  Bumping the default quantum to 1024 solves the performance
issue and is still enough to prevent the endless loops.
2021-05-10 21:32:31 +02:00
Ondřej Surý
365c6a9851 ensure interlocked netmgr events run on worker[0]
Network manager events that require interlock (pause, resume, listen)
are now always executed in the same worker thread, mgr->workers[0],
to prevent races.

"stoplistening" events no longer require interlock.
2021-05-07 14:28:32 -07:00
Evan Hunt
c44423127d fix shutdown deadlocks
- ensure isc_nm_pause() and isc_nm_resume() work the same whether
  run from inside or outside of the netmgr.
- promote 'stop' events to the priority event level so they can
  run while the netmgr is pausing or paused.
- when pausing, drain the priority queue before acquiring an
  interlock; this prevents a deadlock when another thread is waiting
  for us to complete a task.
- release interlock after pausing, reacquire it when resuming, so
  that stop events can happen.

some incidental changes:
- use a function to enqueue pause and resume events (this was part of a
  different change attempt that didn't work out; I kept it because I
  thought was more readable).
- make mgr->nworkers a signed int to remove some annoying integer casts.
2021-05-07 14:28:32 -07:00
Ondřej Surý
4c8f6ebeb1 Use barriers for netmgr synchronization
The netmgr listening, stoplistening, pausing and resuming functions
now use barriers for synchronization, which makes the code much simpler.

isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier
on platforms where that works, and pthread_barrier where it doesn't
(including TSAN builds).
2021-05-07 14:28:32 -07:00
Ondřej Surý
2eae7813b6 Run isc__nm_http_stoplistening() synchronously in netmgr
When isc__nm_http_stoplistening() is run from inside the netmgr, we need
to make sure it's run synchronously.  This commit is just a band-aid
though, as the desired behvaior for isc_nm_stoplistening() is not always
the same:

  1. When run from outside user of the interface, the call must be
     synchronous, e.g. the calling code expects the call to really stop
     listening on the interfaces.

  2. But if there's a call from listen<proto> when listening fails,
     that needs to be scheduled to run asynchronously, because
     isc_nm_listen<proto> is being run in a paused (interlocked)
     netmgr thread and we could get stuck.

The proper solution would be to make isc_nm_stoplistening()
behave like uv_close(), i.e., to have a proper callback.
2021-05-07 14:28:32 -07:00
Evan Hunt
5c08f97791 only run tasks as privileged if taskmgr is in privileged mode
all zone loading tasks have the privileged flag, but we only want
them to run as privileged tasks when the server is being initialized;
if we privilege them the rest of the time, the server may hang for a
long time after a reload/reconfig. so now we call isc_taskmgr_setmode()
to turn privileged execution mode on or off in the task manager.

isc_task_privileged() returns true if the task's privilege flag is
set *and* the taskmgr is in privileged execution mode. this is used
to determine in which netmgr event queue the task should be run.
2021-05-07 14:28:30 -07:00
Ondřej Surý
29a208aaf7 Fix crash when allocating UDP socket fails on OpenBSD
When socket() call fails, the UDP connect code would call the connectcb
with empty req->handle.  This has been fixed.
2021-05-07 14:28:30 -07:00
Ondřej Surý
dacf586e18 Make the netmgr queue processing quantized
There was a theoretical possibility of clogging up the queue processing
with an endless loop where currently processing netievent would schedule
new netievent that would get processed immediately.  This wasn't such a
problem when only netmgr netievents were processed, but with the
addition of the tasks, there are at least two situation where this could
happen:

 1. In lib/dns/zone.c:setnsec3param() the task would get re-enqueued
    when the zone was not yet fully loaded.

 2. Tasks have internal quantum for maximum number of isc_events to be
    processed, when the task quantum is reached, the task would get
    rescheduled and then immediately processed by the netmgr queue
    processing.

As the isc_queue doesn't have a mechanism to atomically move the queue,
this commit adds a mechanism to quantize the queue, so enqueueing new
netievents will never stop processing other uv_loop_t events.
The default quantum size is 128.

Since the queue used in the network manager allows items to be enqueued
more than once, tasks are now reference-counted around task_ready()
and task_run(). task_ready() now has a public API wrapper,
isc_task_ready(), that the netmgr can use to reschedule processing
of a task if the quantum has been reached.

Incidental changes: Cleaned up some unused fields left in isc_task_t
and isc_taskmgr_t after the last refactoring, and changed atomic
flags to atomic_bools for easier manipulation.
2021-05-07 14:28:30 -07:00