The netievent handler for isc_nmsocket_set_tlsctx() was inadvertently
ifdef'd out when BIND was built with --disable-doh, resulting in an
assertion failure on startup when DoT was configured.
It was possible that accept callback can be called after listener
shutdown. In such a case the callback pointer equals NULL, leading to
segmentation fault. This commit fixes that.
This commit introduces a primitive isc__nmsocket_stop() which performs
shutting down on a multilayered socket ensuring the proper order of
the operations.
The shared data within the socket object can be destroyed after the
call completed, as it is guaranteed to not be used from within the
context of other worker threads.
During loop manager refactoring isc_nmsocket_set_tlsctx() was not
properly adapted. The function is expected to broadcast the new TLS
context for every worker, but this behaviour was accidentally broken.
The isc_nm_udpconnect() erroneously set the reuse port with
load-balancing on the outgoing connected UDP sockets. This socket
option makes only sense for the listening sockets. Don't set the
load-balancing reuse port option on the outgoing UDP sockets.
This commit fixes TLS DNS verification error message reporting which
we probably broke during one of the recent networking code
refactorings.
This prevent e.g. dig from producing useful error messages related to
TLS certificates verification.
Ensure that TLS error is empty before calling SSL_get_error() or doing
SSL I/O so that the result will not get affected by prior error
statuses.
In particular, the improper error handling led to intermittent unit
test failure and, thus, could be responsible for some of the system
test failures and other intermittent TLS-related issues.
See here for more details:
https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html
In particular, it mentions the following:
> The current thread's error queue must be empty before the TLS/SSL
> I/O operation is attempted, or SSL_get_error() will not work
> reliably.
As we use the result of SSL_get_error() to decide on I/O operations,
we need to ensure that it works reliably by cleaning the error queue.
TLS DNS: empty error queue before attempting I/O
The check is left from when tcp_connect_direct() called isc__nm_socket()
and it was uncertain whether it had succeeded, but now isc__nm_socket()
is called before tcp_connect_direct(), so sock->fd cannot be -1.
*** CID 357292: (REVERSE_NEGATIVE)
/lib/isc/netmgr/tcp.c: 309 in isc_nm_tcpconnect()
303
304 atomic_store(&sock->active, true);
305
306 result = tcp_connect_direct(sock, req);
307 if (result != ISC_R_SUCCESS) {
308 atomic_store(&sock->active, false);
>>> CID 357292: (REVERSE_NEGATIVE)
>>> You might be using variable "sock->fd" before verifying that it is >= 0.
309 if (sock->fd != (uv_os_sock_t)(-1)) {
310 isc__nm_tcp_close(sock);
311 }
312 isc__nm_connectcb(sock, req, result, true);
313 }
314
Add new semantic patch to replace the straightfoward uses of:
ptr = isc_mem_{get,allocate}(..., size);
memset(ptr, 0, size);
with the new API call:
ptr = isc_mem_{get,allocate}x(..., size, ISC_MEM_ZERO);
The isc__nm_udp_send() callback would be called synchronously when
shutting down or when the socket has been closed. This could lead to
double locking in the calling code and thus those callbacks needs to be
called asynchronously.
By bumping the minimum libuv version to 1.34.0, it allows us to remove
all libuv shims we ever had and makes the code much cleaner. The
up-to-date libuv is available in all distributions supported by BIND
9.19+ either natively or as a backport.
After the loopmgr work has been merged, we can now cleanup the TCP and
TLS protocols a little bit, because there are stronger guarantees that
the sockets will be kept on the respective loops/threads. We only need
asynchronous call for listening sockets (start, stop) and reading from
the TCP (because the isc_nm_read() might be called from read callback
again.
This commit does the following changes (they are intertwined together):
1. Cleanup most of the asynchronous events in the TCP code, and add
comments for the events that needs to be kept asynchronous.
2. Remove isc_nm_resumeread() from the netmgr API, and replace
isc_nm_resumeread() calls with existing isc_nm_read() calls.
3. Remove isc_nm_pauseread() from the netmgr API, and replace
isc_nm_pauseread() calls with a new isc_nm_read_stop() call.
4. Disable the isc_nm_cancelread() for the streaming protocols, only the
datagram-like protocols can use isc_nm_cancelread().
5. Add isc_nmhandle_close() that can be used to shutdown the socket
earlier than after the last detach. Formerly, the socket would be
closed only after all reading and sending would be finished and the
last reference would be detached. The new isc_nmhandle_close() can
be used to close the underlying socket earlier, so all the other
asynchronous calls would call their respective callbacks immediately.
Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
The destructor for the isc__nmsocket_t was missing call to the
isc_refcount_destroy() on the reference counter, which might lead to
spurious ThreadSanitizer data race warnings if we ever change the
acquire-release memory order in the isc_refcount_decrement().
Simplify the closing code - during the loopmgr implementation, it was
discovered that the various lists used by the uv_loop_t aren't FIFO, but
LIFO. See doc/dev/libuv.md for more details.
With this knowledge, we can close the protocol handles (uv_udp_t and
uv_tcp_t) and uv_timer_t at the same time by reordering the uv_close()
calls, and thus making sure that after calling the
isc__nm_stoplistening(), the code will not issue any additional callback
calls (accept, read) on the socket that stopped listening.
This might help with the TLS and DoH shutting down sequence as described
in the [GL #3509] as we now stop the reading, stop the timer and call
the uv_close() as earliest as possible.
The network manager UDP code was misinterpreting when the libuv called
the udp_recv_cb with nrecv == 0 and addr == NULL -> this doesn't really
mean that the "stream" has ended, but the libuv indicates that the
receive buffer can be freed. This could lead to assertion failure in
the code that calls isc_nm_read() from the network manager read callback
due to the extra spurious callbacks.
Properly handle the extra callback calls from the libuv in the client
read callback, and refactor the UDP isc_nm_read() implementation to be
synchronous, so no datagram is lost between the time that we stop the
reading from the UDP socket and we restart it again in the asychronous
udpread event.
Add a unit test that tests the isc_nm_read() call from the read
callback to receive two datagrams.
Commit b69e783164cd50e3306364668558e460617ee8fc inadvertently caused
builds using the --disable-doh switch to fail, by putting the
declaration of the isc__nm_async_settlsctx() function inside an #ifdef
block that is only evaluated when DNS-over-HTTPS support is enabled.
This results in the following compilation errors being triggered:
netmgr/netmgr.c:2657:1: error: no previous prototype for 'isc__nm_async_settlsctx' [-Werror=missing-prototypes]
2657 | isc__nm_async_settlsctx(isc__networker_t *worker, isc__netievent_t *ev0) {
| ^~~~~~~~~~~~~~~~~~~~~~~
Fix by making the declaration of the isc__nm_async_settlsctx() function
in lib/isc/netmgr/netmgr-int.h visible regardless of whether
DNS-over-HTTPS support is enabled or not.
The isc_nm_listentlsdns() function erroneously calls
isc__nm_tcpdns_stoplistening() instead of isc__nm_tlsdns_stoplistening()
when something goes wrong, which can cause an assertion failure.
When we are closing the listening sockets, there's a time window in
which the TCP connection could be accepted although the respective
stoplistening function has already returned to control to the caller.
Clear the accept callback function early, so it doesn't get called when
we are not interested in the incoming connections anymore.
Previously:
* applications were using isc_app as the base unit for running the
application and signal handling.
* networking was handled in the netmgr layer, which would start a
number of threads, each with a uv_loop event loop.
* task/event handling was done in the isc_task unit, which used
netmgr event loops to run the isc_event calls.
In this refactoring:
* the network manager now uses isc_loop instead of maintaining its
own worker threads and event loops.
* the taskmgr that manages isc_task instances now also uses isc_loopmgr,
and every isc_task runs on a specific isc_loop bound to the specific
thread.
* applications have been updated as necessary to use the new API.
* new ISC_LOOP_TEST macros have been added to enable unit tests to
run isc_loop event loops. unit tests have been updated to use this
where needed.
In some circumstances generic TLS code could have resumed data reading
unexpectedly on the TCP layer code. Due to this, the behaviour of
isc_nm_pauseread() and isc_nm_resumeread() might have been
unexpected. This commit fixes that.
The bug does not seems to have real consequences in the existing code
due to the way the code is used. However, the bug could have lead to
unexpected behaviour and, at any rate, makes the TLS code behave
differently from the TCP code, with which it attempts to be as
compatible as possible.
Sometimes tls_do_bio() might be called when there is no new data to
process (most notably, when resuming reads), in such a case internal
TLS session state will remain untouched and old value in 'errno' will
alter the result of SSL_get_error() call, possibly making it to return
SSL_ERROR_SYSCALL. This value will be treated as an error, and will
lead to closing the connection, which is not what expected.
The STATID_CONNECT and STATID_CONNECTFAIL statistics were used
incorrectly. The STATID_CONNECT was incremented twice (once in
the *_connect_direct() and once in the callback) and STATID_CONNECTFAIL
would not be incremented at all if the failure happened in the callback.
Closes: #3452
On FreeBSD (and perhaps other *BSD) systems, the TCP connect() call (via
uv_tcp_connect()) can fail with transient UV_EADDRINUSE error. The UDP
code already handles this by trying three times (is a charm) before
giving up. Add a code for the TCP, TCPDNS and TLSDNS layers to also try
three times before giving up by calling uv_tcp_connect() from the
callback two more time on UV_EADDRINUSE error.
Additionally, stop the timer only if we succeed or on hard error via
isc__nm_failed_connect_cb().
Before this change the TLS code would ignore the accept callback result,
and would not try to gracefully close the connection. This had not been
noticed, as it is not really required for DoH. Now the code tries to
shut down the TLS connection gracefully when accepting it is not
successful.
Otherwise the code path will lead to a call to SSL_get_error()
returning SSL_ERROR_SSL, which in turn might lead to closing
connection to early in an unexpected way, as it is clearly not what is
intended.
The issue was found when working on loppmgr branch and appears to
be timing related as well. Might be responsible for some unexpected
transmission failures e.g. on zone transfers.
In some operations - most prominently when establishing connection -
it might be beneficial to bail out earlier when the network manager
is stopping.
The issue is backported from loopmgr branch, where such a change is
not only beneficial, but required.
In some cases - in particular, in case of errors, NULL might be passed
to a connection callback instead of a handle that could have led to
an abort. This commit ensures that such a situation will not occur.
The issue was found when working on the loopmgr branch.
This commit ensures that the underlying TCP socket of a TLS connection
gets closed earlier whenever there are no pending operations on it.
In the loop-manager branch, in some circumstances the connection
could have remained opened for far too long for no reason. This
commit ensures that will not happen.
it's a style violation to have REQUIRE or INSIST contain code that
must run for the server to work. this was being done with some
atomic_compare_exchange calls. these have been cleaned up. uses
of atomic_compare_exchange in assertions have been replaced with
a new macro atomic_compare_exchange_enforced, which uses RUNTIME_CHECK
to ensure that the exchange was successful.
Before the changes from this commit were introduced, the accept
callback function will get called twice when accepting connection
during two of these stages:
* when accepting the TCP connection;
* when handshake has completed.
That is clearly an error, as it should have been called only once. As
far as I understand it the mistake is a result of TLS DNS transport
being essentially a fork of TCP transport, where calling the accept
callback immediately after accepting TCP connection makes sense.
This commit fixes this mistake. It did not have any very serious
consequences because in BIND the accept callback only checks an ACL
and updates stats.
Under specific rare timing circumstances the uv_read_start() could
fail with UV_EINVAL when the connection is reset between the connect (or
accept) and the uv_read_start() call on the nmworker loop. Handle such
situation gracefully by propagating the errors from uv_read_start() into
upper layers, so the socket can be internally closed().
The commit fixes a corner case in client-side DoH code, when a write
attempt is done on a closing socket (session).
The change ensures that the write call-back will be called with a
proper error code (see failed_send_cb() call in client_httpsend()).
Setting the sock->write_timeout from the TCP, TCPDNS, and TLSDNS send
functions could lead to (harmless) data race when setting the value for
the first time when the isc_nm_send() function would be called from
thread not-matching the socket we are sending to. Move the setting the
sock->write_timeout to the matching async function which is always
called from the matching thread.
Since the fctx hash table is now self-resizing, and resolver tasks are
selected to match the thread that created the fetch context, there
shouldn't be any significant advantage to having multiple tasks per CPU;
a single task per thread should be sufficient.
Additionally, the fetch context is always pinned to the calling netmgr
thread to minimize the contention just to coalesced fetches - if two
threads starts the same fetch, it will be pinned to the first one to get
the bucket.
This commit fixes a crash in generic TLS stream code, which could be
reproduced during some runs of the 'sslyze' tool.
The intention of this commit is twofold.
Firstly, it ensures that the TLS socket object cannot be destroyed too
early. Now it is being deleted alongside the underlying TCP socket
object.
Secondly, it ensures that the TLS socket object cannot be destroyed as
a result of calling 'tls_do_bio()' (the primary function which
performs encryption/decryption during the IO) as the code did not
expect that. This code path is fixed now.
As we are going to use libuv outside of the netmgr, we need the shims to
be readily available for the rest of the codebase.
Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to
uv.c, and as a rule of thumb, the users of libuv should include
<isc/uv.h> instead of <uv.h> directly.
Additionally, merge netmgr/uverr2result.c into uv.c and rename the
single function from isc__nm_uverr2result() to isc_uverr2result().
Move the netmgr socket related functions from netmgr/netmgr.c and
netmgr/uv-compat.c to netmgr/socket.c, so they are all present all in
the same place. Adjust the names of couple interal functions
accordingly.
This commit ensures that write callbacks are getting called only after
the data has been sent via the network.
Without this fix, a situation could appear when a write callback could
get called before the actual encrypted data would have been sent to
the network. Instead, it would get called right after it would have
been passed to the OpenSSL (i.e. encrypted).
Most likely, the issue does not reveal itself often because the
callback call was asynchronous, so in most cases it should have been
called after the data has been sent, but that was not guaranteed by
the code logic.
Also, this commit removes one memory allocation (netievent) from a hot
path, as there is no need to call this callback asynchronously
anymore.
The connect()ed UDP socket provides feedback on a variety of ICMP
errors (eg port unreachable) which bind can then use to decide what to
do with errors (report them to the client, try again with a different
nameserver etc). However, Linux's implementation does not report what
it considers "transient" conditions, which is defined as Destination
host Unreachable, Destination network unreachable, Source Route Failed
and Message Too Big.
Explicitly enable IP_RECVERR / IPV6_RECVERR (via libuv uv_udp_bind()
flag) to learn about ICMP destination network/host unreachable.