2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-30 14:07:59 +00:00
Commit Graph

4452 Commits

Author SHA1 Message Date
Ondřej Surý
0aad979175 Disable netmgr tests only when running under CI
The full netmgr test suite is unstable when run in CI due to various
timing issues.  Previously, we enabled the full test suite only when
CI_ENABLE_ALL_TESTS environment variable was set, but that went against
original intent of running the full suite when an individual developer
would run it locally.

This change disables the full test suite only when running in the CI and
the CI_ENABLE_ALL_TESTS is not set.
2021-04-07 15:36:58 +02:00
Artem Boldariev
ee10948e2d Remove dead code which was supposed to handle TLS shutdowns nicely
Fixes Coverity issue CID 330954 (See #2612).
2021-04-07 11:21:08 +03:00
Artem Boldariev
e6062210c7 Handle buggy situations with SSL_ERROR_SYSCALL
See "BUGS" section at:

https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html

It is mentioned there that when TLS status equals SSL_ERROR_SYSCALL
AND errno == 0 it means that underlying transport layer returned EOF
prematurely.  However, we are managing the transport ourselves, so we
should just resume reading from the TCP socket.

It seems that this case has been handled properly on modern versions
of OpenSSL. That being said, the situation goes in line with the
manual: it is briefly mentioned there that SSL_ERROR_SYSCALL might be
returned not only in a case of low-level errors (like system call
failures).
2021-04-07 11:21:08 +03:00
Artem Boldariev
fa062162a7 Fix crash (regression) in DIG when handling non-DoH responses
This commit fixes crash in dig when it encounters non-expected header
value. The bug was introduced at some point late in the last DoH
development cycle. Also, refactors the relevant code a little bit to
ensure better incoming data validation for client-side DoH
connections.
2021-04-01 17:31:29 +03:00
Artem Boldariev
11ed7aac5d TLS code refactoring, fixes and unit-tests
This commit fixes numerous stability issues with TLS transport code as
well as adds unit tests for it.
2021-04-01 17:31:29 +03:00
Petr Mensik
81eb3396bf Do not require config.h to use isc/util.h
util.h requires ISC_CONSTRUCTOR definition, which depends on config.h
inclusion. It does not include it from isc/util.h (or any other header).
Using isc/util.h fails hard when isc/util.h is used without including
bind's config.h.

Move the check to c file, where ISC_CONSTRUCTOR is used. Ensure config.h
is included there.
2021-03-26 11:41:22 +01:00
Patrick McLean
ebced74b19 Add isc_time_now_hires function to get current time with high resolution
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.

This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
2021-03-20 11:25:55 -07:00
Ondřej Surý
d016ea745f Fix compilation with NETMGR_TRACE(_VERBOSE) enabled on non-Linux
When NETMGR_TRACE(_VERBOSE) is enabled, the build would fail on some
non-Linux non-glibc platforms because:

  * Use <stdint.h> print macros because uint_fast32_t is not always
    unsigned long

  * The header <execinfo.h> is not available on non-glibc, thus commit
    adds dummy backtrace() and backtrace_symbols_fd() functions for
    platforms without HAVE_BACKTRACE
2021-03-19 16:25:28 +01:00
Ondřej Surý
42e4e3b843 Improve reliability of the netmgr unit tests
The netmgr unit tests were designed to push the system limits to maximum
by sending as many queries as possible in the busy loop from multiple
threads.  This mostly works with UDP, but in the stateful protocol where
establishing the connection takes more time, it failed quite often in
the CI.  On FreeBSD, this happened more often, because the socket() call
would fail spuriosly making the problem even worse.

This commit does several things to improve reliability:

* return value of isc_nm_<proto>connect() is always checked and retried
  when scheduling the connection fails

* The busy while loop has been slowed down with usleep(1000); so the
  netmgr threads could schedule the work and get executed.

* The isc_thread_yield() was replaced with usleep(1000); also to allow
  the other threads to do any work.

* Instead of waiting on just one variable, we wait for multiple
  variables to reach the final value

* We are wrapping the netmgr operations (connects, reads, writes,
  accepts) with reference counting and waiting for all the callbacks to
  be accounted for.

  This has two effects:

  a) the isc_nm_t is always clean of active sockets and handles when
     destroyed, so it will prevent the spurious INSIST(references == 1)
     from isc_nm_destroy()

  b) the unit test now ensures that all the callbacks are always called
     when they should be called, so any stuck test means that there was
     a missing callback call and it is always a real bug

These changes allows us to remove the workaround that would not run
certain tests on systems without port load-balancing.
2021-03-19 16:25:28 +01:00
Ondřej Surý
e4e0e9e3c1 Call isc__nm_tlsdns_failed_read on tls_error to cleanup the socket
In tls_error(), we now call isc__nm_tlsdns_failed_read() instead of just
stopping timer and reading from the socket.  This allows us to properly
cleanup any pending operation on the socket.
2021-03-19 15:28:52 +01:00
Ondřej Surý
e4b0730387 Call the isc__nm_failed_connect_cb() early when shutting down
When shutting down, calling the isc__nm_failed_connect_cb() was delayed
until the connect callback would be called.  It turned out that the
connect callback might not get called at all when the socket is being
shut down.  Call the failed_connect_cb() directly in the
tlsdns_shutdown() instead of waiting for the connect callback to call it.
2021-03-18 14:31:15 -07:00
Ondřej Surý
73c574e553 Fix typo in processbuffer() - tcpdns vs tlsdns
The processbuffer() would call isc__nm_tcpdns_processbuffer() instead of
isc__nm_tlsdns_processbuffer() for the isc_nm_tlsdnssocket type of
socket.
2021-03-18 21:35:13 +01:00
Ondřej Surý
1d64d4cde8 Fix memory accounting bug in TLSDNS
After a partial write the tls.senddata buffer would be rearranged to
contain only the data tha wasn't sent and the len part would be made
shorter, which would lead to attempt to free only part of a socket's
tls.senddata buffer.
2021-03-18 18:14:38 +01:00
Ondřej Surý
5cc406a920 Fix dangling uvreq when data is sent from tlsdns_cycle()
The tlsdns_cycle() might call uv_write() to write data to the socket,
when this happens and the socket is shutdown before the callback
completes, the uvreq structure was not freed because the callback would
be called with non-zero status code.
2021-03-18 17:58:56 +01:00
Ondřej Surý
36ddefacb4 Change the isc_nm_(get|set)timeouts() to work with milliseconds
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit.  The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.

This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
2021-03-18 16:37:57 +01:00
Ondřej Surý
1ef232f93d Merge the common parts between udp, tcpdns and tlsdns protocol
The udp, tcpdns and tlsdns contained lot of cut&paste code or code that
was very similar making the stack harder to maintain as any change to
one would have to be copied to the the other protocols.

In this commit, we merge the common parts into the common functions
under isc__nm_<foo> namespace and just keep the little differences based
on the socket type.
2021-03-18 16:37:57 +01:00
Ondřej Surý
caa5b6548a Fix TCPDNS and TLSDNS timers
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.

This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.

This commit changes the timeout logic in this way:

  * On TCP connection accept the tcp-initial-timeout is applied
    and the timer is started
  * When we are processing and/or sending any DNS message the timer is
    stopped
  * When we stop processing all DNS messages, the tcp-idle-timeout
    is applied and the timer is started again
2021-03-18 16:37:57 +01:00
Mark Andrews
a9f883cbc2 Stop using deprecated calls in lib/isc/tls.c
from Rosen Penev @neheb
2021-03-17 20:05:47 +00:00
Artem Boldariev
75363dcb7c Load full certificate chain from a certificate chain file
This commit fixes loading the certificate chain files so that the full
chain could be sent to the clients which require that for
verification. Before that fix only the top most certificate would be
loaded from the chain and sent to clients preventing some of them to
perform certificate validation (e.g. Windows 10 DoH client).
2021-03-16 11:49:04 +02:00
Mark Andrews
99bd0c346f cast (char) to (unsigned char) when calling is*() 2021-03-15 14:18:03 +11:00
Artem Boldariev
7a59fb8207 Disable Nagle's algorithm for HTTP/2 connections
It is advisable to disable Nagle's algorithm for HTTP/2 connections
because multiple HTTP/2 streams could be multiplexed over one
transport connection. Thus, delays when delivering small packets could
bring down performance for the whole session. HTTP/2 is meant to be
used this way.
2021-03-05 18:09:42 +02:00
Artem Boldariev
66d20cf28b Fix deadlock in isc_nm_tlsconnect()
when called from within the context of a network thread,
isc_nm_tlsconnect() hangs. it is waiting for the socket's
result code to be updated, but that update is supposed to happen
asynchronously in the network thread, and if we're already blocking
in the network thread, it can never occur.

we can kluge around this by setting the socket result code
early; this works for most clients (including "dig"), but it causes
inconsistent behaviors that manifest as test failures in the DoH unit
test.

so we kluged around it even more by setting the socket result code
early *only when running in the network thread*. we need a better
solution for this problem, but this will do for now.
2021-03-05 18:09:22 +02:00
Artem Boldariev
ca9a15e3bc DoH: call send callbacks after data was actually sent 2021-03-05 13:29:32 +02:00
Artem Boldariev
71668437d4 Put sane limitations in place to handle bad requests gracefully
This commit makes the server-side code polite.

It fixes the error handling code on the server side and fixes
returning error code in responses (there was a nasty bug which could
potentially crash the server).

Also, in this commit we limit max size POST request data to 96K, max
processed data size in headers to 128K (should be enough to handle any
GET requests).

If these limits are surpassed, server will terminate the request with
RST_STREAM without responding with error code. Otherwise it politely
responds with error code.

This commit also limits number of concurrent HTTP/2 streams per
transport connection on server to 100 (as nghttp2 advises by default).

Ideally, these parameters should be configurable both globally and per
every HTTP endpoint description in the configuration file, but for now
putting sane limits should be enough.
2021-03-05 13:29:32 +02:00
Evan Hunt
88752b1121 refactor outgoing HTTP connection support
- style, cleanup, and removal of unnecessary code.
- combined isc_nm_http_add_endpoint() and isc_nm_http_add_doh_endpoint()
  into one function, renamed isc_http_endpoint().
- moved isc_nm_http_connect_send_request() into doh_test.c as a helper
  function; remove it from the public API.
- renamed isc_http2 and isc_nm_http2 types and functions to just isc_http
  and isc_nm_http, for consistency with other existing names.
- shortened a number of long names.
- the caller is now responsible for determining the peer address.
  in isc_nm_httpconnect(); this eliminates the need to parse the URI
  and the dependency on an external resolver.
- the caller is also now responsible for creating the SSL client context,
  for consistency with isc_nm_tlsdnsconnect().
- added setter functions for HTTP/2 ALPN. instead of setting up ALPN in
  isc_tlsctx_createclient(), we now have a function
  isc_tlsctx_enable_http2client_alpn() that can be run from
  isc_nm_httpconnect().
- refactored isc_nm_httprequest() into separate read and send functions.
  isc_nm_send() or isc_nm_read() is called on an http socket, it will
  be stored until a corresponding isc_nm_read() or _send() arrives; when
  we have both halves of the pair the HTTP request will be initiated.
- isc_nm_httprequest() is renamed isc__nm_http_request() for use as an
  internal helper function by the DoH unit test. (eventually doh_test
  should be rewritten to use read and send, and this function should
  be removed.)
- added implementations of isc__nm_tls_settimeout() and
  isc__nm_http_settimeout().
- increased NGHTTP2 header block length for client connections to 128K.
- use isc_mem_t for internal memory allocations inside nghttp2, to
  help track memory leaks.
- send "Cache-Control" header in requests and responses. (note:
  currently we try to bypass HTTP caching proxies, but ideally we should
  interact with them: https://tools.ietf.org/html/rfc8484#section-5.1)
2021-03-05 13:29:26 +02:00
Ondřej Surý
a55bdb28f9 Assigning uint64_t from buffer might be misaligned in netmgr tests
Resolve possible 8-byte unaligned access when assigning the magic
value from the received buffer.
2021-03-04 15:02:24 +01:00
Ondřej Surý
d3bb3ae64f Fix comparison between signed and unsigned integer expressions
Simple typecast to size_t should be enough to silence the warning on
ARMv7, even though the code is in fact correct, because the readlen is
checked for being < 0 in the block before the warning.
2021-03-04 11:21:43 +01:00
Ondřej Surý
a50f5d0cf5 Call isc__initialize()/isc__shutdown() from win32 DllMain
Call the libisc isc__initialize() constructor and isc__shutdown()
destructor from DllMain instead of having duplicate code between
those and DllMain() code.
2021-03-01 14:24:57 +01:00
Ondřej Surý
888bdfc1ff Add mempool get/put tracking with AddressSanitizer
When AddressSanitizer is in use, disable the internal mempool
implementation and redirect the isc_mempool_get to isc_mem_get
(and similarly for isc_mempool_put).  This is the method recommended
by the AddressSanitizer authors for tracking allocations and
deallocations instead of custom poison/unpoison code (see
https://github.com/google/sanitizers/wiki/AddressSanitizerManualPoisoning).
2021-02-26 10:05:42 -08:00
Ondřej Surý
a0181056a8 Change the isc_thread_self() return type to uintptr_t
The pthread_self(), thrd_current() or GetCurrentThreadId() could
actually be a pointer, so we should rather convert the value into
uintptr_t instead of unsigned long.
2021-02-25 16:21:10 +01:00
Ondřej Surý
bea333f7c9 Use globally assigned thread_id in the isc_hp API
Convert the isc_hp API to use the globally available isc_tid_v instead
of locally defined tid_v.  This should solve most of the problems on
machines with many number of cores / CPUs.
2021-02-25 16:21:10 +01:00
Ondřej Surý
cbbecfcc82 Add isc_trampoline API to have simple accounting around threads
The current isc_hp API uses internal tid_v variable that gets
incremented for each new thread using hazard pointers.  This tid_v
variable is then used as a index to global shared table with hazard
pointers state.  Since the tid_v is only incremented and never
decremented the table could overflow very quickly if we create set of
threads for short period of time, they finish the work and cease to
exist.  Then we create identical set of threads and so on and so on.
This is not a problem for a normal `named` operation as the set of
threads is stable, but the problematic place are the unit tests where we
test network manager or other APIs (task, timer) that create threads.

This commits adds a thin wrapper around any function called from
isc_thread_create() that adds unique-but-reusable small digit thread id
that can be used as index to f.e. hazard pointer tables.  The trampoline
wrapper ensures that the thread ids will be reused, so the highest
thread_id number doesn't grow indefinitely when threads are created and
destroyed and then created again.  This fixes the hazard pointer table
overflow on machines with many cores. [GL #2396]
2021-02-25 16:21:10 +01:00
Mark Andrews
3ac53daa06 Address unbalanced lock/unlock
Also address race between reading and testing mpctx->allocated
and incrementing mpctx->allocated.
2021-02-25 13:08:07 +11:00
Ondřej Surý
c5887c4312 Disable safe-guard assertion in DLL_THREAD_ATTACH/DLL_THREAD_DETACH
The BIND 9 libraries on Windows define DllMain() optional entry point
into a dynamic-link library (DLL).  When the system starts or terminates
a process or thread, it calls the entry-point function for each loaded
DLL using the first thread of the process.

When the DLL is being loaded into the virtual address space of the
current process as a result of the process starting up, we make a call
to DisableThreadLibraryCalls() which should disable the
DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for the specified
dynamic-link library (DLL).

This seems not be the case because we never check the return value of
the DisableThreadLibraryCalls() call, and it could in fact fail.  The
DisableThreadLibraryCalls() function fails if the DLL specified by
hModule has active static thread local storage, or if hModule is an
invalid module handle.

In this commit, we remove the safe-guard assertion put in place for the
DLL_THREAD_ATTACH and DLL_THREAD_DETACH events and we just ignore them.
BIND 9 doesn't create/destroy enough threads for it actually to make any
difference, and in fact we do use static thread local storage in the
code.
2021-02-24 08:31:42 +01:00
Ondřej Surý
f53e7ed12c Include lib/isc/tls_p.h in release tarballs
The addition of lib/isc/tls_p.h to the source tree was not accounted for
in the relevant variable in lib/isc/Makefile.am and thus the former file
is not being included in release tarballs prepared using "make dist".
Fix by tweaking the libisc_la_SOURCES list in lib/isc/Makefile.am
accordingly.
2021-02-19 13:25:18 +01:00
Ondřej Surý
494d0da522 Use library constructor/destructor to initialize OpenSSL
Instead of calling isc_tls_initialize()/isc_tls_destroy() explicitly use
gcc/clang attributes on POSIX and DLLMain on Windows to initialize and
shutdown OpenSSL library.

This resolves the issue when isc_nm_create() / isc_nm_destroy() was
called multiple times and it would call OpenSSL library destructors from
isc_nm_destroy().

At the same time, since we now have introduced the ctor/dtor for libisc,
this commit moves the isc_mem API initialization (the list of the
contexts) and changes the isc_mem_checkdestroyed() to schedule the
checking of memory context on library unload instead of executing the
code immediately.
2021-02-18 19:33:54 +01:00
Ondřej Surý
4bde4f050b Disable calling DllMain() on thread creation/destruction
Disables the DLL_THREAD_ATTACH and DLL_THREAD_DETACH notifications for
the specified dynamic-link library (DLL).  This can reduce the size of
the working set for some applications.
2021-02-18 19:33:54 +01:00
Ondřej Surý
f225462055 Fix the invalid condition variable
Although harmless, the memmove() in tlsdns and tcpdns was guarded by a
current message length variable that was always bigger than 0 instead of
correct current buffer length remainder variable.
2021-02-18 19:33:54 +01:00
Ondřej Surý
4775e9f256 Move most of the OpenSSL initialization to isc_tls
Since we now require both libcrypto and libssl to be initialized for
netmgr, we move all the OpenSSL initialization code except the engine
initialization to isc_tls API.

The isc_tls_initialize() and isc_tls_destroy() has been made idempotent,
so they could be called multiple time.  However when isc_tls_destroy()
has been called, the isc_tls_initialize() could not be called again.
2021-02-18 19:33:54 +01:00
Ondřej Surý
ff47b47f1a Remove overrun checking code from memory allocator
The ISC_MEM_CHECKOVERRUN would add canary byte at the end of every
allocations and check whether the canary byte hasn't been changed at the
free time.  The AddressSanitizer and valgrind memory checks surpases
simple checks like this, so there's no need to actually keep the code
inside the allocator.
2021-02-18 19:33:54 +01:00
Ondřej Surý
549e5b693a Modify the way we benchmark mem_{get,put}
Previously, the mem_{get,put} benchmark would pass the allocation size
as thread_create argument.  This has been now changed, so the allocation
size is stored and decremented (divided) in atomic variable and the
thread create routing is given a memory context.  This will allow to
write tests where each thread is given different memory context and do
the same for mempool benchmarking.
2021-02-18 19:33:54 +01:00
Ondřej Surý
f34f943b16 Disable memory debugging features in non-developer build
The two memory debugging features: ISC_MEM_DEFAULTFILL
(ISC_MEMFLAG_FILL) and ISC_MEM_TRACKLINES were always enabled in all
builds and the former was only disabled in `named`.

This commits disables those two features in non-developer build to make
the memory allocator significantly faster.
2021-02-18 19:33:54 +01:00
Ondřej Surý
c9fe12443f Make the mempool names unconditional
The named memory pools were default and always compiled-in.  Remove the
extra complexity by removing the #define and #ifdefs around the code.
2021-02-18 19:33:54 +01:00
Ondřej Surý
b09106e93a Make the memory and mempool counters to be stdatomic types
This is yet another step into unlocking some parts of the memory
contexts.  All the regularly updated variables has been turned into
atomic types, so we can later remove the locks when updating various
counters.

Also unlock as much code as possible without breaking anything.
2021-02-18 19:33:51 +01:00
Ondřej Surý
0f44139145 Bump the maximum number of hazard pointers in tests
On 24-core machine, the tests would crash because we would run out of
the hazard pointers.  We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.

Note: This is just a band-aid and needs a proper fix.
2021-02-18 19:32:55 +01:00
Ondřej Surý
7de846977b Remove the extra level of indirection via isc_memmethods_t
Previously, the applications using libisc would be able to override the
internal memory methods with own implementation.  This was no longer
possible, but the extra level of indirection was not removed.  This
commit removes the extra level of indirection for the memory methods and
the default_memalloc() and default_memfree().
2021-02-18 19:32:55 +01:00
Ondřej Surý
55ace5d3aa Remove the internal memory allocator
The internal memory allocator had an extra code to keep a list of blocks
for small size allocation.  This would help to reduce the interactions
with the system malloc as the memory would be already allocated from the
system, but there's an extra cost associated with that - all the
allocations/deallocations must be locked, effectively eliminating any
optimizations in the system allocator targeted at multi-threaded
applications. While the isc_mem API is still using locks pretty heavily,
this is a first step into reducing the memory allocation/deallocation
contention.
2021-02-18 19:32:02 +01:00
Ondřej Surý
66eefac78c Rollback setting IP_DONTFRAG option on the UDP sockets
In DNS Flag Day 2020, the development branch started setting the
IP_DONTFRAG option on the UDP sockets.  It turned out, that this
code was incomplete leading to dropping the outgoing UDP packets.
Henceforth this commit rolls back this setting until we have a
proper fix that would send back empty response with TC flag set.
2021-02-17 08:09:56 +01:00
Michal Nowak
c286341703 Use SKIPPED_TEST_EXIT_CODE consistently
Commit fa505bfb0e omitted two unit tests
while introducing the SKIP_TEST_EXIT_CODE preprocessor macro.  Fix the
outliers to make use of SKIP_TEST_EXIT_CODE consistent across all unit
tests.  Also make sure lib/dns/tests/dnstap_test returns an exit code
that indicates a skipped test when dnstap is not enabled.
2021-02-16 13:41:50 +01:00
Ondřej Surý
d1448a4c2a Move the <isc/readline.h> header to bin/dig/readline.h
The <isc/readline.h> header provided a compatibility shim to use when
other non-GNU readline libraries are in use.  The two places where
readline library is being used is nslookup and nsupdate, so the header
file has been moved to bin/dig directory and it's directly included from
bin/nsupdate.

This also conceals any readline headers exposed from the libisc headers.
2021-02-16 01:04:46 +00:00