The full netmgr test suite is unstable when run in CI due to various
timing issues. Previously, we enabled the full test suite only when
CI_ENABLE_ALL_TESTS environment variable was set, but that went against
original intent of running the full suite when an individual developer
would run it locally.
This change disables the full test suite only when running in the CI and
the CI_ENABLE_ALL_TESTS is not set.
See "BUGS" section at:
https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html
It is mentioned there that when TLS status equals SSL_ERROR_SYSCALL
AND errno == 0 it means that underlying transport layer returned EOF
prematurely. However, we are managing the transport ourselves, so we
should just resume reading from the TCP socket.
It seems that this case has been handled properly on modern versions
of OpenSSL. That being said, the situation goes in line with the
manual: it is briefly mentioned there that SSL_ERROR_SYSCALL might be
returned not only in a case of low-level errors (like system call
failures).
This commit fixes crash in dig when it encounters non-expected header
value. The bug was introduced at some point late in the last DoH
development cycle. Also, refactors the relevant code a little bit to
ensure better incoming data validation for client-side DoH
connections.
util.h requires ISC_CONSTRUCTOR definition, which depends on config.h
inclusion. It does not include it from isc/util.h (or any other header).
Using isc/util.h fails hard when isc/util.h is used without including
bind's config.h.
Move the check to c file, where ISC_CONSTRUCTOR is used. Ensure config.h
is included there.
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.
This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
When NETMGR_TRACE(_VERBOSE) is enabled, the build would fail on some
non-Linux non-glibc platforms because:
* Use <stdint.h> print macros because uint_fast32_t is not always
unsigned long
* The header <execinfo.h> is not available on non-glibc, thus commit
adds dummy backtrace() and backtrace_symbols_fd() functions for
platforms without HAVE_BACKTRACE
The netmgr unit tests were designed to push the system limits to maximum
by sending as many queries as possible in the busy loop from multiple
threads. This mostly works with UDP, but in the stateful protocol where
establishing the connection takes more time, it failed quite often in
the CI. On FreeBSD, this happened more often, because the socket() call
would fail spuriosly making the problem even worse.
This commit does several things to improve reliability:
* return value of isc_nm_<proto>connect() is always checked and retried
when scheduling the connection fails
* The busy while loop has been slowed down with usleep(1000); so the
netmgr threads could schedule the work and get executed.
* The isc_thread_yield() was replaced with usleep(1000); also to allow
the other threads to do any work.
* Instead of waiting on just one variable, we wait for multiple
variables to reach the final value
* We are wrapping the netmgr operations (connects, reads, writes,
accepts) with reference counting and waiting for all the callbacks to
be accounted for.
This has two effects:
a) the isc_nm_t is always clean of active sockets and handles when
destroyed, so it will prevent the spurious INSIST(references == 1)
from isc_nm_destroy()
b) the unit test now ensures that all the callbacks are always called
when they should be called, so any stuck test means that there was
a missing callback call and it is always a real bug
These changes allows us to remove the workaround that would not run
certain tests on systems without port load-balancing.
In tls_error(), we now call isc__nm_tlsdns_failed_read() instead of just
stopping timer and reading from the socket. This allows us to properly
cleanup any pending operation on the socket.
When shutting down, calling the isc__nm_failed_connect_cb() was delayed
until the connect callback would be called. It turned out that the
connect callback might not get called at all when the socket is being
shut down. Call the failed_connect_cb() directly in the
tlsdns_shutdown() instead of waiting for the connect callback to call it.
After a partial write the tls.senddata buffer would be rearranged to
contain only the data tha wasn't sent and the len part would be made
shorter, which would lead to attempt to free only part of a socket's
tls.senddata buffer.
The tlsdns_cycle() might call uv_write() to write data to the socket,
when this happens and the socket is shutdown before the callback
completes, the uvreq structure was not freed because the callback would
be called with non-zero status code.
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit. The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.
This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
The udp, tcpdns and tlsdns contained lot of cut&paste code or code that
was very similar making the stack harder to maintain as any change to
one would have to be copied to the the other protocols.
In this commit, we merge the common parts into the common functions
under isc__nm_<foo> namespace and just keep the little differences based
on the socket type.
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.
This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.
This commit changes the timeout logic in this way:
* On TCP connection accept the tcp-initial-timeout is applied
and the timer is started
* When we are processing and/or sending any DNS message the timer is
stopped
* When we stop processing all DNS messages, the tcp-idle-timeout
is applied and the timer is started again
This commit fixes loading the certificate chain files so that the full
chain could be sent to the clients which require that for
verification. Before that fix only the top most certificate would be
loaded from the chain and sent to clients preventing some of them to
perform certificate validation (e.g. Windows 10 DoH client).
It is advisable to disable Nagle's algorithm for HTTP/2 connections
because multiple HTTP/2 streams could be multiplexed over one
transport connection. Thus, delays when delivering small packets could
bring down performance for the whole session. HTTP/2 is meant to be
used this way.
when called from within the context of a network thread,
isc_nm_tlsconnect() hangs. it is waiting for the socket's
result code to be updated, but that update is supposed to happen
asynchronously in the network thread, and if we're already blocking
in the network thread, it can never occur.
we can kluge around this by setting the socket result code
early; this works for most clients (including "dig"), but it causes
inconsistent behaviors that manifest as test failures in the DoH unit
test.
so we kluged around it even more by setting the socket result code
early *only when running in the network thread*. we need a better
solution for this problem, but this will do for now.
This commit makes the server-side code polite.
It fixes the error handling code on the server side and fixes
returning error code in responses (there was a nasty bug which could
potentially crash the server).
Also, in this commit we limit max size POST request data to 96K, max
processed data size in headers to 128K (should be enough to handle any
GET requests).
If these limits are surpassed, server will terminate the request with
RST_STREAM without responding with error code. Otherwise it politely
responds with error code.
This commit also limits number of concurrent HTTP/2 streams per
transport connection on server to 100 (as nghttp2 advises by default).
Ideally, these parameters should be configurable both globally and per
every HTTP endpoint description in the configuration file, but for now
putting sane limits should be enough.
- style, cleanup, and removal of unnecessary code.
- combined isc_nm_http_add_endpoint() and isc_nm_http_add_doh_endpoint()
into one function, renamed isc_http_endpoint().
- moved isc_nm_http_connect_send_request() into doh_test.c as a helper
function; remove it from the public API.
- renamed isc_http2 and isc_nm_http2 types and functions to just isc_http
and isc_nm_http, for consistency with other existing names.
- shortened a number of long names.
- the caller is now responsible for determining the peer address.
in isc_nm_httpconnect(); this eliminates the need to parse the URI
and the dependency on an external resolver.
- the caller is also now responsible for creating the SSL client context,
for consistency with isc_nm_tlsdnsconnect().
- added setter functions for HTTP/2 ALPN. instead of setting up ALPN in
isc_tlsctx_createclient(), we now have a function
isc_tlsctx_enable_http2client_alpn() that can be run from
isc_nm_httpconnect().
- refactored isc_nm_httprequest() into separate read and send functions.
isc_nm_send() or isc_nm_read() is called on an http socket, it will
be stored until a corresponding isc_nm_read() or _send() arrives; when
we have both halves of the pair the HTTP request will be initiated.
- isc_nm_httprequest() is renamed isc__nm_http_request() for use as an
internal helper function by the DoH unit test. (eventually doh_test
should be rewritten to use read and send, and this function should
be removed.)
- added implementations of isc__nm_tls_settimeout() and
isc__nm_http_settimeout().
- increased NGHTTP2 header block length for client connections to 128K.
- use isc_mem_t for internal memory allocations inside nghttp2, to
help track memory leaks.
- send "Cache-Control" header in requests and responses. (note:
currently we try to bypass HTTP caching proxies, but ideally we should
interact with them: https://tools.ietf.org/html/rfc8484#section-5.1)
Simple typecast to size_t should be enough to silence the warning on
ARMv7, even though the code is in fact correct, because the readlen is
checked for being < 0 in the block before the warning.
Call the libisc isc__initialize() constructor and isc__shutdown()
destructor from DllMain instead of having duplicate code between
those and DllMain() code.
When AddressSanitizer is in use, disable the internal mempool
implementation and redirect the isc_mempool_get to isc_mem_get
(and similarly for isc_mempool_put). This is the method recommended
by the AddressSanitizer authors for tracking allocations and
deallocations instead of custom poison/unpoison code (see
https://github.com/google/sanitizers/wiki/AddressSanitizerManualPoisoning).
The pthread_self(), thrd_current() or GetCurrentThreadId() could
actually be a pointer, so we should rather convert the value into
uintptr_t instead of unsigned long.
Convert the isc_hp API to use the globally available isc_tid_v instead
of locally defined tid_v. This should solve most of the problems on
machines with many number of cores / CPUs.
The current isc_hp API uses internal tid_v variable that gets
incremented for each new thread using hazard pointers. This tid_v
variable is then used as a index to global shared table with hazard
pointers state. Since the tid_v is only incremented and never
decremented the table could overflow very quickly if we create set of
threads for short period of time, they finish the work and cease to
exist. Then we create identical set of threads and so on and so on.
This is not a problem for a normal `named` operation as the set of
threads is stable, but the problematic place are the unit tests where we
test network manager or other APIs (task, timer) that create threads.
This commits adds a thin wrapper around any function called from
isc_thread_create() that adds unique-but-reusable small digit thread id
that can be used as index to f.e. hazard pointer tables. The trampoline
wrapper ensures that the thread ids will be reused, so the highest
thread_id number doesn't grow indefinitely when threads are created and
destroyed and then created again. This fixes the hazard pointer table
overflow on machines with many cores. [GL #2396]
The BIND 9 libraries on Windows define DllMain() optional entry point
into a dynamic-link library (DLL). When the system starts or terminates
a process or thread, it calls the entry-point function for each loaded
DLL using the first thread of the process.
When the DLL is being loaded into the virtual address space of the
current process as a result of the process starting up, we make a call
to DisableThreadLibraryCalls() which should disable the
DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for the specified
dynamic-link library (DLL).
This seems not be the case because we never check the return value of
the DisableThreadLibraryCalls() call, and it could in fact fail. The
DisableThreadLibraryCalls() function fails if the DLL specified by
hModule has active static thread local storage, or if hModule is an
invalid module handle.
In this commit, we remove the safe-guard assertion put in place for the
DLL_THREAD_ATTACH and DLL_THREAD_DETACH events and we just ignore them.
BIND 9 doesn't create/destroy enough threads for it actually to make any
difference, and in fact we do use static thread local storage in the
code.
The addition of lib/isc/tls_p.h to the source tree was not accounted for
in the relevant variable in lib/isc/Makefile.am and thus the former file
is not being included in release tarballs prepared using "make dist".
Fix by tweaking the libisc_la_SOURCES list in lib/isc/Makefile.am
accordingly.
Instead of calling isc_tls_initialize()/isc_tls_destroy() explicitly use
gcc/clang attributes on POSIX and DLLMain on Windows to initialize and
shutdown OpenSSL library.
This resolves the issue when isc_nm_create() / isc_nm_destroy() was
called multiple times and it would call OpenSSL library destructors from
isc_nm_destroy().
At the same time, since we now have introduced the ctor/dtor for libisc,
this commit moves the isc_mem API initialization (the list of the
contexts) and changes the isc_mem_checkdestroyed() to schedule the
checking of memory context on library unload instead of executing the
code immediately.
Disables the DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for
the specified dynamic-link library (DLL). This can reduce the size of
the working set for some applications.
Although harmless, the memmove() in tlsdns and tcpdns was guarded by a
current message length variable that was always bigger than 0 instead of
correct current buffer length remainder variable.
Since we now require both libcrypto and libssl to be initialized for
netmgr, we move all the OpenSSL initialization code except the engine
initialization to isc_tls API.
The isc_tls_initialize() and isc_tls_destroy() has been made idempotent,
so they could be called multiple time. However when isc_tls_destroy()
has been called, the isc_tls_initialize() could not be called again.
The ISC_MEM_CHECKOVERRUN would add canary byte at the end of every
allocations and check whether the canary byte hasn't been changed at the
free time. The AddressSanitizer and valgrind memory checks surpases
simple checks like this, so there's no need to actually keep the code
inside the allocator.
Previously, the mem_{get,put} benchmark would pass the allocation size
as thread_create argument. This has been now changed, so the allocation
size is stored and decremented (divided) in atomic variable and the
thread create routing is given a memory context. This will allow to
write tests where each thread is given different memory context and do
the same for mempool benchmarking.
The two memory debugging features: ISC_MEM_DEFAULTFILL
(ISC_MEMFLAG_FILL) and ISC_MEM_TRACKLINES were always enabled in all
builds and the former was only disabled in `named`.
This commits disables those two features in non-developer build to make
the memory allocator significantly faster.
This is yet another step into unlocking some parts of the memory
contexts. All the regularly updated variables has been turned into
atomic types, so we can later remove the locks when updating various
counters.
Also unlock as much code as possible without breaking anything.
On 24-core machine, the tests would crash because we would run out of
the hazard pointers. We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.
Note: This is just a band-aid and needs a proper fix.
Previously, the applications using libisc would be able to override the
internal memory methods with own implementation. This was no longer
possible, but the extra level of indirection was not removed. This
commit removes the extra level of indirection for the memory methods and
the default_memalloc() and default_memfree().
The internal memory allocator had an extra code to keep a list of blocks
for small size allocation. This would help to reduce the interactions
with the system malloc as the memory would be already allocated from the
system, but there's an extra cost associated with that - all the
allocations/deallocations must be locked, effectively eliminating any
optimizations in the system allocator targeted at multi-threaded
applications. While the isc_mem API is still using locks pretty heavily,
this is a first step into reducing the memory allocation/deallocation
contention.
In DNS Flag Day 2020, the development branch started setting the
IP_DONTFRAG option on the UDP sockets. It turned out, that this
code was incomplete leading to dropping the outgoing UDP packets.
Henceforth this commit rolls back this setting until we have a
proper fix that would send back empty response with TC flag set.
Commit fa505bfb0e omitted two unit tests
while introducing the SKIP_TEST_EXIT_CODE preprocessor macro. Fix the
outliers to make use of SKIP_TEST_EXIT_CODE consistent across all unit
tests. Also make sure lib/dns/tests/dnstap_test returns an exit code
that indicates a skipped test when dnstap is not enabled.
The <isc/readline.h> header provided a compatibility shim to use when
other non-GNU readline libraries are in use. The two places where
readline library is being used is nslookup and nsupdate, so the header
file has been moved to bin/dig directory and it's directly included from
bin/nsupdate.
This also conceals any readline headers exposed from the libisc headers.