2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-29 05:28:00 +00:00

4339 Commits

Author SHA1 Message Date
Artem Boldariev
4c5b36780b Fix flawed DoH unit tests logic
This commit fixes some logical mistakes in DoH unit tests logic,
causing them either to fail or not to do what they are intended to do.
2021-05-07 15:47:24 +03:00
Matthijs Mekking
66f2cd228d Use isdigit instead of checking character range
When looking for key files, we could use isdigit rather than checking
if the character is within the range [0-9].

Use (unsigned char) cast to ensure the value is representable in the
unsigned char type (as suggested by the isdigit manpage).

Change " & 0xff" occurrences to the recommended (unsigned char) type
cast.
2021-05-05 19:15:33 +02:00
Ondřej Surý
dfd56b84f5 Add support for generating backtraces on Windows
This commit adds support for generating backtraces on Windows and
refactors the isc_backtrace API to match the Linux/BSD API (without
the isc_ prefix)

 * isc_backtrace_gettrace() was renamed to isc_backtrace(), the third
   argument was removed and the return type was changed to int
 * isc_backtrace_symbols() was added
 * isc_backtrace_symbols_fd() was added and used as appropriate
2021-05-03 20:31:52 +02:00
Ondřej Surý
37c0d196e3 Use uv_sleep in the netmgr code
libuv added uv_sleep(unsigned int msec) to the API since 1.34.0.  Use that in
the netmgr code and define usleep based shim for libuv << 1.34.0.
2021-05-03 20:22:54 +02:00
Ondřej Surý
c37ff5d188 Add nanosleep and usleep Windows shims
This commit adds POSIX nanosleep() and usleep() shim implementation for
Windows to help implementors use less #ifdef _WIN32 in the code.
2021-05-03 20:22:54 +02:00
Ondřej Surý
cd54bbbd9a Add trampoline around iocompletionport_createthreads()
On Windows, the iocompletionport_createthreads() didn't use
isc_thread_create() to create new threads for processing IO, but just a
simple CreateThread() function that completely circumvent the
isc_trampoline mechanism to initialize global isc_tid_v.  This lead to
segmentation fault in isc_hp API because '-1' isn't valid index to the
hazard pointer array.

This commit changes the iocompletionport_createthreads() to use
isc_thread_create() instead of CreateThread() to properly initialize
isc_tid_v.
2021-05-03 20:21:15 +02:00
Diego Fronza
7729844150 Address comparison of integers with different signedess 2021-05-03 06:54:30 +00:00
Diego Fronza
54aa60eef8 Add malloc attribute to memory allocation functions
The malloc attribute allows compiler to do some optmizations on
functions that behave like malloc/calloc, like assuming that the
returned pointer do not alias other pointers.
2021-04-26 11:32:17 -03:00
Diego Fronza
efb9c540cd Removed unnecessary check (mpctx->items == NULL)
There is no possibility for mpctx->items to be NULL at the point where
the code was removed, since we enforce that fillcount > 0, if
mpctx->items == NULL when isc_mempool_get is called, then we will
allocate fillcount more items and add to the mpctx->items list.
2021-04-26 11:32:17 -03:00
Artem Boldariev
62033110b9 Use a constant for timeouts in soft-timeout tests
It makes it easier to change the value should the need arise.
2021-04-23 10:01:42 -07:00
Evan Hunt
7f367b0c7f use the correct handle when calling the read callback
when calling isc_nm_read() on an HTTP socket, the read callback
was being run with the incorrect handle. this has been corrected.
2021-04-23 10:01:42 -07:00
Evan Hunt
f0d75ee7c3 fix DOH timeout recovery
as with TLS, the destruction of a client stream on failed read
needs to be conditional: if we reached failed_read_cb() as a
result of a timeout on a timer which has subsequently been
reset, the stream must not be closed.
2021-04-23 10:01:42 -07:00
Evan Hunt
b258df8562 add HTTP timeout recovery test
NOTE: this test currently fails
2021-04-22 12:40:04 -07:00
Evan Hunt
23ec011298 fix TLS timeout recovery
the destruction of the socket in tls_failed_read_cb() needs to be
conditional; if reached due to a timeout on a timer that has
subsequently been reset, the socket must not be destroyed.
2021-04-22 12:08:04 -07:00
Evan Hunt
c90da99180 fix TCP timeout recovery
removed an unnecessary assert in the failed_read_cb() function.
also renamed to isc__nm_tcp_failed_read_cb() to match the practice
in other modules.
2021-04-22 12:08:04 -07:00
Evan Hunt
25ef0547a9 add TCP and TLS timeout recovery tests
NOTE: currently these tests fail
2021-04-22 12:08:04 -07:00
Evan Hunt
52f256f9ae add TCPDNS and TLSDNS timeout recovery tests
this is similar in structure to the UDP timeout recovery test.

this commit adds a new mechanism to the netmgr test allowing the
listen socket to accept incoming TCP connections but never send
a response. this forces the client to time out on read.
2021-04-22 12:08:04 -07:00
Evan Hunt
bcf5b2a675 run read callbacks synchronously on timeout
when running read callbacks, if the event result is not ISC_R_SUCCESS,
the callback is always run asynchronously. this is a problem on timeout,
because there's no chance to reset the timer before the socket has
already been destroyed. this commit allows read callbacks to run
synchronously for both ISC_R_SUCCESS and ISC_R_TIMEDOUT result codes.
2021-04-22 12:08:04 -07:00
Evan Hunt
609975ad20 add a UDP timeout recovery test
this test sets up a server socket that listens for UDP connections
but never responds. the client will always time out; it should retry
five times before giving up.
2021-04-22 12:08:04 -07:00
Evan Hunt
1f41d59a5e allow client read callback to be assignable
allow netmgr client tests to choose the function that will be
used as a read callback, without having to write a different
connect callback handler.
2021-04-22 12:08:04 -07:00
Ondřej Surý
b540722bc3 Refactor taskmgr to run on top of netmgr
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers.  While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:

 * The taskmgr has no concept of universal privileged mode - rather the
   tasks are either privileged or unprivileged (normal).  The privileged
   tasks are run as a first thing when the netmgr is unpaused.  There
   are now four different queues in in the netmgr:

   1. priority queue - netievent on the priority queue are run even when
      the taskmgr enter exclusive mode and netmgr is paused.  This is
      needed to properly start listening on the interfaces, free
      resources and resume.

   2. privileged task queue - only privileged tasks are queued here and
      this is the first queue that gets processed when network manager
      is unpaused using isc_nm_resume().  All netmgr workers need to
      clean the privileged task queue before they all proceed normal
      operation.  Both task queues are processed when the workers are
      finished.

   3. task queue - only (traditional) task are scheduled here and this
      queue along with privileged task queues are process when the
      netmgr workers are finishing.  This is needed to process the task
      shutdown events.

   4. normal queue - this is the queue with netmgr events, e.g. reading,
      sending, callbacks and pretty much everything is processed here.

 * The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
   object.

 * The isc_nm_destroy() function now waits for indefinite time, but it
   will print out the active objects when in tracing mode
   (-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
   made a little bit more asynchronous and it might take longer time to
   shutdown all the active networking connections.

 * Previously, the isc_nm_stoplistening() was a synchronous operation.
   This has been changed and the isc_nm_stoplistening() just schedules
   the child sockets to stop listening and exits.  This was needed to
   prevent a deadlock as the the (traditional) tasks are now executed on
   the netmgr threads.

 * The socket selection logic in isc__nm_udp_send() was flawed, but
   fortunatelly, it was broken, so we never hit the problem where we
   created uvreq_t on a socket from nmhandle_t, but then a different
   socket could be picked up and then we were trying to run the send
   callback on a socket that had different threadid than currently
   running.
2021-04-20 23:22:28 +02:00
Ondřej Surý
16fe0d1f41 Cleanup the public vs private ISCAPI remnants
Since all the libraries are internal now, just cleanup the ISCAPI remnants
in isc_socket, isc_task and isc_timer APIs.  This means, there's one less
layer as following changes have been done:

 * struct isc_socket and struct isc_socketmgr have been removed
 * struct isc__socket and struct isc__socketmgr have been renamed
   to struct isc_socket and struct isc_socketmgr
 * struct isc_task and struct isc_taskmgr have been removed
 * struct isc__task and struct isc__taskmgr have been renamed
   to struct isc_task and struct isc_taskmgr
 * struct isc_timer and struct isc_timermgr have been removed
 * struct isc__timer and struct isc__timermgr have been renamed
   to struct isc_timer and struct isc_timermgr
 * All the associated code that dealt with typing isc_<foo>
   to isc__<foo> and back has been removed.
2021-04-19 13:18:24 +02:00
Ondřej Surý
3388ef36b3 Cleanup the isc_<*>mgr_createinc() constructors
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant, that would create the mgr on top of existing appctx.  This was
no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
directly without any extra code.

This commit just cleans up the extra function.
2021-04-19 10:22:56 +02:00
Artem Boldariev
66432dcd65 Handle a situation when SSL shutdown messages were sent and received
It fixes a corner case which was causing dig to print annoying
messages like:

14-Apr-2021 18:48:37.099 SSL error in BIO: 1 TLS error (errno:
0). Arguments: received_data: (nil), send_data: (nil), finish: false

even when all the data was properly processed.
2021-04-15 15:49:36 +03:00
Artem Boldariev
513cdb52ec TLS: try to close TCP socket descriptor earlier when possible
Before this fix underlying TCP sockets could remain opened for longer
than it is actually required, causing unit tests to fail with lots of
ISC_R_TOOMANYOPENFILES errors.

The change also enables graceful SSL shutdown (before that it  would
happen only in the case when isc_nm_cancelread() were called).
2021-04-15 15:49:36 +03:00
Ondřej Surý
202b1d372d Merge the tls_test.c into netmgr_test.c and extend the tests suite
This commit merges TLS tests into the common Network Manager unit
tests suite and extends the unit test framework to include support for
additional "ping-pong" style tests where all data could be sent via
lesser number of connections (the behaviour of the old test
suite). The tests for TCP and TLS were extended to make use of the new
mode, as this mode better translates to how the code is used in DoH.

Both TLS and TCP tests now share most of the unit tests' code, as they
are expected to function similarly from a users's perspective anyway.

Additionally to the above, the TLS test suite was extended to include
TLS tests using the connections quota facility.
2021-04-15 15:49:36 +03:00
Artem Boldariev
8da12738f1 Use T_CONNECT timeout constant for TCP tests (instead of 1 ms)
The netmgr_test would be failing on heavily loaded systems because the
connection timeout was set to 1 ms.  Use the global constant instead.
2021-04-07 15:37:10 +02:00
Ondřej Surý
72ef5f465d Refactor async callbacks and fix the double tlsdnsconnect callback
The isc_nm_tlsdnsconnect() call could end up with two connect callbacks
called when the timeout fired and the TCP connection was aborted,
but the TLS handshake was not complete yet.  isc__nm_connecttimeout_cb()
forgot to clean up sock->tls.pending_req when the connect callback was
called with ISC_R_TIMEDOUT, leading to a second callback running later.

A new argument has been added to the isc__nm_*_failed_connect_cb and
isc__nm_*_failed_read_cb functions, to indicate whether the callback
needs to run asynchronously or not.
2021-04-07 15:36:59 +02:00
Ondřej Surý
58e75e3ce5 Skip long tls_tests in the CI
We already skip most of the recv_send tests in CI because they are
too timing-related to be run in overloaded environment.  This commit
adds a similar change to tls_test before we merge tls_test into
netmgr_test.
2021-04-07 15:36:59 +02:00
Artem Boldariev
340235c855 Prevent short TLS tests from hanging in case of errors
The tests in tls_test.c could hang in the event of a connect
error.  This commit allows the tests to bail out when such an
error occurs.
2021-04-07 15:36:59 +02:00
Evan Hunt
426c40c96d rearrange nm_teardown() to check correctness after shutting down
if a test failed at the beginning of nm_teardown(), the function
would abort before isc_nm_destroy() or isc_tlsctx_free() were reached;
we would then abort when nm_setup() was run for the next test case.
rearranging the teardown function prevents this problem.
2021-04-07 15:36:59 +02:00
Ondřej Surý
86f4872dd6 isc_nm_*connect() always return via callback
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).

This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
2021-04-07 15:36:59 +02:00
Evan Hunt
a70cd026df move UDP connect retries from dig into isc_nm_udpconnect()
dig previously ran isc_nm_udpconnect() three times before giving
up, to work around a freebsd bug that caused connect() to return
a spurious transient EADDRINUSE. this commit moves the retry code
into the network manager itself, so that isc_nm_udpconnect() no
longer needs to return a result code.
2021-04-07 15:36:59 +02:00
Ondřej Surý
ca12e25bb0 Use generic functions for reading and timers in TCP
The TCP module has been updated to use the generic functions from
netmgr.c instead of its own local copies.  This brings the module
mostly up to par with the TCPDNS and TLSDNS modules.
2021-04-07 15:36:59 +02:00
Ondřej Surý
7df8c7061c Fix and clean up handling of connect callbacks
Serveral problems were discovered and fixed after the change in
the connection timeout in the previous commits:

  * In TLSDNS, the connection callback was not called at all under some
    circumstances when the TCP connection had been established, but the
    TLS handshake hadn't been completed yet.  Additional checks have
    been put in place so that tls_cycle() will end early when the
    nmsocket is invalidated by the isc__nm_tlsdns_shutdown() call.

  * In TCP, TCPDNS and TLSDNS, new connections would be established
    even when the network manager was shutting down.  The new
    call isc__nm_closing() has been added and is used to bail out
    early even before uv_tcp_connect() is attempted.
2021-04-07 15:36:59 +02:00
Ondřej Surý
5a87c7372c Make it possible to recover from connect timeouts
Similarly to the read timeout, it's now possible to recover from
ISC_R_TIMEDOUT event by restarting the timer from the connect callback.

The change here also fixes platforms that missing the socket() options
to set the TCP connection timeout, by moving the timeout code into user
space.  On platforms that support setting the connect timeout via a
socket option, the timeout has been hardcoded to 2 minutes (the maximum
value of tcp-initial-timeout).
2021-04-07 15:36:58 +02:00
Ondřej Surý
33c00c281f Make it possible to recover from read timeouts
Previously, when the client timed out on read, the client socket would
be automatically closed and destroyed when the nmhandle was detached.
This commit changes the logic so that it's possible for the callback to
recover from the ISC_R_TIMEDOUT event by restarting the timer. This is
done by calling isc_nmhandle_settimeout(), which prevents the timeout
handling code from destroying the socket; instead, it continues to wait
for data.

One specific use case for multiple timeouts is serve-stale - the client
socket could be created with shorter timeout (as specified with
stale-answer-client-timeout), so we can serve the requestor with stale
answer, but keep the original query running for a longer time.
2021-04-07 15:36:58 +02:00
Ondřej Surý
0aad979175 Disable netmgr tests only when running under CI
The full netmgr test suite is unstable when run in CI due to various
timing issues.  Previously, we enabled the full test suite only when
CI_ENABLE_ALL_TESTS environment variable was set, but that went against
original intent of running the full suite when an individual developer
would run it locally.

This change disables the full test suite only when running in the CI and
the CI_ENABLE_ALL_TESTS is not set.
2021-04-07 15:36:58 +02:00
Artem Boldariev
ee10948e2d Remove dead code which was supposed to handle TLS shutdowns nicely
Fixes Coverity issue CID 330954 (See #2612).
2021-04-07 11:21:08 +03:00
Artem Boldariev
e6062210c7 Handle buggy situations with SSL_ERROR_SYSCALL
See "BUGS" section at:

https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html

It is mentioned there that when TLS status equals SSL_ERROR_SYSCALL
AND errno == 0 it means that underlying transport layer returned EOF
prematurely.  However, we are managing the transport ourselves, so we
should just resume reading from the TCP socket.

It seems that this case has been handled properly on modern versions
of OpenSSL. That being said, the situation goes in line with the
manual: it is briefly mentioned there that SSL_ERROR_SYSCALL might be
returned not only in a case of low-level errors (like system call
failures).
2021-04-07 11:21:08 +03:00
Artem Boldariev
fa062162a7 Fix crash (regression) in DIG when handling non-DoH responses
This commit fixes crash in dig when it encounters non-expected header
value. The bug was introduced at some point late in the last DoH
development cycle. Also, refactors the relevant code a little bit to
ensure better incoming data validation for client-side DoH
connections.
2021-04-01 17:31:29 +03:00
Artem Boldariev
11ed7aac5d TLS code refactoring, fixes and unit-tests
This commit fixes numerous stability issues with TLS transport code as
well as adds unit tests for it.
2021-04-01 17:31:29 +03:00
Petr Mensik
81eb3396bf Do not require config.h to use isc/util.h
util.h requires ISC_CONSTRUCTOR definition, which depends on config.h
inclusion. It does not include it from isc/util.h (or any other header).
Using isc/util.h fails hard when isc/util.h is used without including
bind's config.h.

Move the check to c file, where ISC_CONSTRUCTOR is used. Ensure config.h
is included there.
2021-03-26 11:41:22 +01:00
Patrick McLean
ebced74b19 Add isc_time_now_hires function to get current time with high resolution
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.

This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
2021-03-20 11:25:55 -07:00
Ondřej Surý
d016ea745f Fix compilation with NETMGR_TRACE(_VERBOSE) enabled on non-Linux
When NETMGR_TRACE(_VERBOSE) is enabled, the build would fail on some
non-Linux non-glibc platforms because:

  * Use <stdint.h> print macros because uint_fast32_t is not always
    unsigned long

  * The header <execinfo.h> is not available on non-glibc, thus commit
    adds dummy backtrace() and backtrace_symbols_fd() functions for
    platforms without HAVE_BACKTRACE
2021-03-19 16:25:28 +01:00
Ondřej Surý
42e4e3b843 Improve reliability of the netmgr unit tests
The netmgr unit tests were designed to push the system limits to maximum
by sending as many queries as possible in the busy loop from multiple
threads.  This mostly works with UDP, but in the stateful protocol where
establishing the connection takes more time, it failed quite often in
the CI.  On FreeBSD, this happened more often, because the socket() call
would fail spuriosly making the problem even worse.

This commit does several things to improve reliability:

* return value of isc_nm_<proto>connect() is always checked and retried
  when scheduling the connection fails

* The busy while loop has been slowed down with usleep(1000); so the
  netmgr threads could schedule the work and get executed.

* The isc_thread_yield() was replaced with usleep(1000); also to allow
  the other threads to do any work.

* Instead of waiting on just one variable, we wait for multiple
  variables to reach the final value

* We are wrapping the netmgr operations (connects, reads, writes,
  accepts) with reference counting and waiting for all the callbacks to
  be accounted for.

  This has two effects:

  a) the isc_nm_t is always clean of active sockets and handles when
     destroyed, so it will prevent the spurious INSIST(references == 1)
     from isc_nm_destroy()

  b) the unit test now ensures that all the callbacks are always called
     when they should be called, so any stuck test means that there was
     a missing callback call and it is always a real bug

These changes allows us to remove the workaround that would not run
certain tests on systems without port load-balancing.
2021-03-19 16:25:28 +01:00
Ondřej Surý
e4e0e9e3c1 Call isc__nm_tlsdns_failed_read on tls_error to cleanup the socket
In tls_error(), we now call isc__nm_tlsdns_failed_read() instead of just
stopping timer and reading from the socket.  This allows us to properly
cleanup any pending operation on the socket.
2021-03-19 15:28:52 +01:00
Ondřej Surý
e4b0730387 Call the isc__nm_failed_connect_cb() early when shutting down
When shutting down, calling the isc__nm_failed_connect_cb() was delayed
until the connect callback would be called.  It turned out that the
connect callback might not get called at all when the socket is being
shut down.  Call the failed_connect_cb() directly in the
tlsdns_shutdown() instead of waiting for the connect callback to call it.
2021-03-18 14:31:15 -07:00
Ondřej Surý
73c574e553 Fix typo in processbuffer() - tcpdns vs tlsdns
The processbuffer() would call isc__nm_tcpdns_processbuffer() instead of
isc__nm_tlsdns_processbuffer() for the isc_nm_tlsdnssocket type of
socket.
2021-03-18 21:35:13 +01:00
Ondřej Surý
1d64d4cde8 Fix memory accounting bug in TLSDNS
After a partial write the tls.senddata buffer would be rearranged to
contain only the data tha wasn't sent and the len part would be made
shorter, which would lead to attempt to free only part of a socket's
tls.senddata buffer.
2021-03-18 18:14:38 +01:00