Continuing the effort to move all uses of the isc_socket API into
dispatch.c, this commit removes the dns_tcpmsg module entirely, as
dispatch was its only caller, and moves the parts of its functionality
that were being used into the dispatch module.
This code will be removed when we switch to using netmgr TCPDNS.
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
An IXFR containing SOA records with owner names different than the
transferred zone's origin can result in named serving a version of that
zone without an SOA record at the apex. This causes a RUNTIME_CHECK
assertion failure the next time such a zone is refreshed. Fix by
immediately rejecting a zone transfer (either an incremental or
non-incremental one) upon detecting an SOA record not placed at the apex
of the transferred zone.
When we are reading from the xfrin socket, and the transfer would be
shutdown, the shutdown function would call `xfrin_fail()` which in turns
calls `xfrin_cancelio()` that causes the read callback to be invoked
with `ISC_R_CANCELED` status code and that caused yet another
`xfrin_fail()` call.
The fix here is to ensure the `xfrin_fail()` would be run only once
properly using better synchronization on xfr->shuttingdown flag.
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).
This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
Add support for a "tls" key/value pair for zone primaries, referencing
either a "tls" configuration statement or "ephemeral". If set to use
TLS, zones will send SOA and AXFR/IXFR queries over a TLS channel.
This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested. It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.
NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.
The changes that are included in this commit are listed here
(extensively, but not exclusively):
* The netmgr_test unit test was split into individual tests (udp_test,
tcp_test, tcpdns_test and newly added tcp_quota_test)
* The udp_test and tcp_test has been extended to allow programatic
failures from the libuv API. Unfortunately, we can't use cmocka
mock() and will_return(), so we emulate the behaviour with #define and
including the netmgr/{udp,tcp}.c source file directly.
* The netievents that we put on the nm queue have variable number of
members, out of these the isc_nmsocket_t and isc_nmhandle_t always
needs to be attached before enqueueing the netievent_<foo> and
detached after we have called the isc_nm_async_<foo> to ensure that
the socket (handle) doesn't disappear between scheduling the event and
actually executing the event.
* Cancelling the in-flight TCP connection using libuv requires to call
uv_close() on the original uv_tcp_t handle which just breaks too many
assumptions we have in the netmgr code. Instead of using uv_timer for
TCP connection timeouts, we use platform specific socket option.
* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
waiting for socket to either end up with error (that path was fine) or
to be listening or connected using condition variable and mutex.
Several things could happen:
0. everything is ok
1. the waiting thread would miss the SIGNAL() - because the enqueued
event would be processed faster than we could start WAIT()ing.
In case the operation would end up with error, it would be ok, as
the error variable would be unchanged.
2. the waiting thread miss the sock->{connected,listening} = `true`
would be set to `false` in the tcp_{listen,connect}close_cb() as
the connection would be so short lived that the socket would be
closed before we could even start WAIT()ing
* The tcpdns has been converted to using libuv directly. Previously,
the tcpdns protocol used tcp protocol from netmgr, this proved to be
very complicated to understand, fix and make changes to. The new
tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
Closes: #2194, #2283, #2318, #2266, #2034, #1920
* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
pass accepted TCP sockets between netthreads, but instead (similar to
UDP) uses per netthread uv_loop listener. This greatly reduces the
complexity as the socket is always run in the associated nm and uv
loops, and we are also not touching the libuv internals.
There's an unfortunate side effect though, the new code requires
support for load-balanced sockets from the operating system for both
UDP and TCP (see #2137). If the operating system doesn't support the
load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
on FreeBSD 12+), the number of netthreads is limited to 1.
* The netmgr has now two debugging #ifdefs:
1. Already existing NETMGR_TRACE prints any dangling nmsockets and
nmhandles before triggering assertion failure. This options would
reduce performance when enabled, but in theory, it could be enabled
on low-performance systems.
2. New NETMGR_TRACE_VERBOSE option has been added that enables
extensive netmgr logging that allows the software engineer to
precisely track any attach/detach operations on the nmsockets and
nmhandles. This is not suitable for any kind of production
machine, only for debugging.
* The tlsdns netmgr protocol has been split from the tcpdns and it still
uses the old method of stacking the netmgr boxes on top of each other.
We will have to refactor the tlsdns netmgr protocol to use the same
approach - build the stack using only libuv and openssl.
* Limit but not assert the tcp buffer size in tcp_alloc_cb
Closes: #2061
there were two failures during observed in testing, both occurring
when 'rndc halt' was run rather than 'rndc stop' - the latter dumps
zone contents to disk and presumably introduced enough delay to
prevent the races:
- a failure when the zone was shut down and called dns_xfrin_detach()
before the xfrin had finished connecting; the connect timeout
terminated without detaching its handle
- a failure when the tcpdns socket timer fired after the outerhandle
had already been cleared.
this commit incidentally addresses a failure observed in mutexatomic
due to a variable having been initialized incorrectly.
Previously, the xfrin object relied on four different reference counters
(`refs`, `connects`, `sends`, `recvs`) and destroyed the xfrin object
only if all of them were zero. This commit reduces the reference
counting only to the `references` (renamed from `refs`) counter. We
keep the existing `connects`, `sends` and `recvs` as safe guards, but
they are not formally needed.
since the network manager is now handling timeouts, xfrin doesn't
need an isc_task object.
it may be necessary to revert this later if we find that it's
important for zone_xfrdone() to be executed in the zone task context.
currently things seem to be working well without that, though.
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.
There was a case where an primary server sent a response
on the wrong TCP connection and failure to check the question
section resulted in a truncated zone being served.
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
The "mirror" system test checks whether log messages announcing a mirror
zone coming into effect are emitted properly. However, the helper
functions responsible for waiting for zone transfers and zone loading to
complete do not wait for these exact log messages, but rather for other
ones preceding them, which introduces a possibility of false positives.
This problem cannot be addressed by just changing the log message to
look for because the test still needs to discern between transferring a
zone and loading a zone.
Add two new log messages at debug level 99 (which is what named
instances used in system tests are configured with) that are to be
emitted after the log messages announcing a mirror zone coming into
effect. Tweak the aforementioned helper functions to only return once
the log messages they originally looked for are followed by the newly
added log messages. This reliably prevents races when looking for
"mirror zone is now in use" log messages and also enables a workaround
previously put into place in the "mirror" system test to be reverted.
Log a message when a mirror zone is successfully transferred and
verified, but only if no database for that zone was yet loaded at the
time the transfer was initiated.
This could have been implemented in a simpler manner, e.g. by modifying
zone_replacedb(), but (due to the calling order of the functions
involved in finalizing a zone transfer) that would cause the resulting
logs to suggest that a mirror zone comes into effect before its transfer
is finished, which would be confusing given the nature of mirror zones
and the fact that no message is logged upon successful mirror zone
verification.
Once the dns_zone_replacedb() call in axfr_finalize() is made, it
becomes impossible to determine whether the transferred zone had a
database attached before the transfer was started. Thus, that check is
instead performed when the transfer context is first created and the
result of this check is passed around in a field of the transfer context
structure. If it turns out to be desired, the relevant log message is
then emitted just before the transfer context is freed.
Taking this approach means that the log message added by this commit is
not timed precisely, i.e. mirror zone data may be used before this
message is logged. However, that can only be fixed by logging the
message inside zone_replacedb(), which causes arguably more dire issues
discussed above.
dns_zone_isloaded() is not used to double-check that transferred zone
data was correctly loaded since the 'shutdown_result' field of the zone
transfer context will not be set to ISC_R_SUCCESS unless axfr_finalize()
succeeds (and that in turn will not happen unless dns_zone_replacedb()
succeeds).
Update axfr_commit() so that all incoming versions of a mirror zone
transferred using AXFR are verified before being used. If zone
verification fails, discard the received version of the zone, wait until
the next refresh and retry.
This commit reverts the previous change to use system provided
entropy, as (SYS_)getrandom is very slow on Linux because it is
a syscall.
The change introduced in this commit adds a new call isc_nonce_buf
that uses CSPRNG from cryptographic library provider to generate
secure data that can be and must be used for generating nonces.
Example usage would be DNS cookies.
The isc_random() API has been changed to use fast PRNG that is not
cryptographically secure, but runs entirely in user space. Two
contestants have been considered xoroshiro family of the functions
by Villa&Blackman and PCG by O'Neill. After a consideration the
xoshiro128starstar function has been used as uint32_t random number
provider because it is very fast and has good enough properties
for our usage pattern.
The other change introduced in the commit is the more extensive usage
of isc_random_uniform in places where the usage pattern was
isc_random() % n to prevent modulo bias. For usage patterns where
only 16 or 8 bits are needed (DNS Message ID), the isc_random()
functions has been renamed to isc_random32(), and isc_random16() and
isc_random8() functions have been introduced by &-ing the
isc_random32() output with 0xffff and 0xff. Please note that the
functions that uses stripped down bit count doesn't pass our
NIST SP 800-22 based random test.
The three functions has been modeled after the arc4random family of
functions, and they will always return random bytes.
The isc_random family of functions internally use these CSPRNG (if available):
1. getrandom() libc call (might be available on Linux and Solaris)
2. SYS_getrandom syscall (might be available on Linux, detected at runtime)
3. arc4random(), arc4random_buf() and arc4random_uniform() (available on BSDs and Mac OS X)
4. crypto library function:
4a. RAND_bytes in case OpenSSL
4b. pkcs_C_GenerateRandom() in case PKCS#11 library