2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-09-02 07:35:26 +00:00
Commit Graph

14012 Commits

Author SHA1 Message Date
Ondřej Surý
7e59b8a4a1 Pause the dbiterator when dumping the zone to the disk
When we rewrote the zone dumping to use the separate threadpool, the
dumping would acquire the read lock for the whole time the zone dumping
process is dumping the zone.

When combined with incoming IXFR that tries to acquire the write lock on
the same rwlock, we would end up blocking all the other readers.

In this commit, we pause the dbiterator every time we get next record
and before start dumping it to the disk.
2021-06-04 08:25:05 +00:00
Mark Andrews
66d1df57cb Report which assertion failed when calling set_global_error 2021-06-03 11:55:31 +10:00
Ondřej Surý
f14d870d15 Fix copy&paste error in setsockopt_off
Because of copy&paste error the setsockopt_off macro would enable
the socket option instead of disabling it.
2021-06-02 17:47:14 +02:00
Ondřej Surý
67afea6cfc Cleanup the remaining of HAVE_UV_<func> macros
While cleaning up the usage of HAVE_UV_<func> macros, we forgot to
cleanup the HAVE_UV_UDP_CONNECT in the actual code and
HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail
on uv_udp_send() because the socket was already connected and we were
falsely assuming that it was not.

The platforms with autoconf support were not affected, because we were
still checking for the functions from the configure.
2021-06-02 11:23:36 +02:00
Artem Boldariev
35d0027f36 HTTP/2 write buffering
This commit adds the ability to consolidate HTTP/2 write requests if
there is already one in flight. If it is the case, the code will
consolidate multiple subsequent write request into a larger one
allowing to utilise the network in a more efficient way by creating
larger TCP packets as well as by reducing TLS records overhead (by
creating large TLS records instead of multiple small ones).

This optimisation is especially efficient for clients, creating many
concurrent HTTP/2 streams over a transport connection at once.  This
way, the code might create a small amount of multi-kilobyte requests
instead of many 50-120 byte ones.

In fact, it turned out to work so well that I had to add a work-around
to the code to ensure compatibility with the flamethrower, which, at
the time of writing, does not support TLS records larger than two
kilobytes. Now the code tries to flush the write buffer after 1.5
kilobyte, which is still pretty adequate for our use case.

Essentially, this commit implements a recommendation given by nghttp2
library:

https://nghttp2.org/documentation/nghttp2_session_mem_send.html
2021-06-01 21:07:45 +03:00
Ondřej Surý
e83b6569da Indicate to the kernel that we won't be needing the zone dumps
Add a call to posix_fadvise() to indicate to the kernel, that `named`
won't be needing the dumped zone files any time soon with:

 * POSIX_FADV_DONTNEED - The specified data will not be accessed in the
   near future.

Notes:

 POSIX_FADV_DONTNEED attempts to free cached pages associated with the
 specified region. This is useful, for example, while streaming large
 files. A program may periodically request the kernel to free cached
 data that has already been used, so that more useful cached pages are
 not discarded instead.
2021-05-31 14:52:05 +02:00
Ondřej Surý
8a5c62de83 Refactor zone dumping code to use netmgr async threadpools
Previously, dumping the zones to the files were quantized, so it doesn't
slow down network IO processing.  With the introduction of network
manager asynchronous threadpools, we can move the IO intensive work to
use that API and we don't have to quantize the work anymore as it the
file IO won't block anything except other zone dumping processes.
2021-05-31 14:52:05 +02:00
Ondřej Surý
7670f98377 Add isc_task_getnetmgr() function
Add a function to pull the attached netmgr from inside the executed
task.  This is needed for any task that needs to call the netmgr API.
2021-05-31 14:52:05 +02:00
Ondřej Surý
87fe97ed91 Add asynchronous work API to the network manager
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.

This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.

The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
2021-05-31 14:52:05 +02:00
Ondřej Surý
211bfefbaa Use UV_VERSION_HEX to decide whether we need libuv shim functions
Instead of having a configure check for every missing function that has
been added in later version of libuv, we now use UV_VERSION_HEX to
decide whether we need the shim or not.
2021-05-31 14:52:05 +02:00
Ondřej Surý
7477d1b2ed Add uv_os_getenv() and uv_os_setenv() compatibility shims
The uv_os_getenv() and uv_os_setenv() functions were introduced in the
libuv >= 1.12.0.  Add simple compatibility shims for older versions.
2021-05-31 14:52:05 +02:00
Ondřej Surý
f752840db3 Add uv_req_get_data() and uv_req_set_data() compatibility shims
The uv_req_get_data() and uv_req_set_data() functions were introduced in
libuv >= 1.19.0, so we need to add compatibility shims with older libuv
versions.
2021-05-31 14:52:05 +02:00
Matthijs Mekking
f7f543d99b Reuse rdatset->ttl when dumping ancient RRsets
Rather than having an expensive 'expired' (fka 'stale_ttl') in the
rdataset structure, that is only used to be printed in a comment on
ancient RRsets, reuse the TTL field of the RRset.
2021-05-30 11:48:36 -07:00
Kevin Chen
0cdf85d204 Several serve-stale improvements
Commit a83c8cb0af updated masterdump so
that stale records in "rndc dumpdb" output no longer shows 0 TTLs.  In
this commit we change the name of the `rdataset->stale_ttl` field to
`rdataset->expired` to make its purpose clearer, and set it to zero in
cases where it's unused.

Add 'rbtdb->serve_stale_ttl' to various checks so that stale records
are not purged from the cache when they've been stale for RBTDB_VIRTUAL
(300) seconds.

Increment 'ns_statscounter_usedstale' when a stale answer is used.

Note: There was a question of whether 'overmem_purge' should be
purging ancient records, instead of stale ones.  It is left as purging
stale records, since stale records could take up the majority of the
cache.

This submission is copyrighted Akamai Technologies, Inc. and provided
under an MPL 2.0 license.

This commit was originally authored by Kevin Chen, and was updated by
Matthijs Mekking to match recent serve-stale developments.
2021-05-30 11:45:35 -07:00
Matthijs Mekking
c0dc5937c7 Reset DNS_FETCHOPT_TRYSTALE_ONTIMEOUT on resume
Once we resume a query, we should clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT
from the options to prevent triggering the stale-answer-client-timeout
on subsequent fetches.

If we don't this may cause a crash when for example when prefetch is
triggered after a query restart.
2021-05-30 00:03:51 -07:00
Evan Hunt
8bd8e995f1 clean up query correctly if already answered by serve-stale
when a serve-stale answer has been sent, the client continues waiting
for a proper answer. if a final completion event for the client does
arrive, it can just be cleaned up without sending a response, similar
to a canceled fetch.
2021-05-27 10:35:48 -07:00
Mark Andrews
d68b009cfe Remove priority from attribute constructor/destructor
On some platforms, the __attribute__ constructor and destructor won't
take priorities and the compilation failed.  On such platform would be
macOS.  For this reason, the constructor/destructor in the libisc was
reworked to not use priorities, but have a single constructor and
destructor that calls the appropriate routines in correct order.

This commit removes the extra priority because it's now not needed and
it also breaks a compilation on macOS with GCC 10.
2021-05-27 08:02:21 +02:00
Mark Andrews
715a2c7fc1 Add missing initialisations
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
2021-05-26 08:15:08 +00:00
Ondřej Surý
2db5290579 Fix the sizeof() for array holding the pointers to clientmgr
The size of the array holding the pointers to clientmgr was created so
big it could hold the actual clientmgr objects, not just the pointer.
This commit fixes the size to be just the ncpus * sizeof(pointer).
2021-05-26 10:03:52 +02:00
Ondřej Surý
a227562f13 Cleanup the struct isc_nmiface
In previous MR, I forgot to remove the `struct isc_nmiface`, this commit
rectifies that.
2021-05-26 09:55:10 +02:00
Ondřej Surý
50270de8a0 Refactor the interface handling in the netmgr
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before.  This means less type-casting and
shorter path to access isc_sockaddr_t members.

At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
2021-05-26 09:43:12 +02:00
Mark Andrews
0a45af2e2f Consolidate xhdr fixups 2021-05-26 08:16:35 +10:00
Mark Andrews
00609f5094 Correct size calculation in dns_journal_iter_init()
* dns_journal_next() leaves the read point in the journal after the
transaction header so journal_seek() should be inside the loop.
* we need to recover from transaction header inconsistencies

Additionally when correcting for <size, serial0, serial1, 0> the
correct consistency check is isc_serial_gt() rather than
isc_serial_ge().  All instances updated.
2021-05-25 22:27:54 +10:00
Ondřej Surý
d0d37aa6d1 Don't set memory context name in resolver.c
We now attach to existing memory context instead of creating a new
memory context, so we should not set its name.
2021-05-25 07:25:44 +02:00
Ondřej Surý
a1c6fd5ede Adjust the fillcount and freemax for dns_message mempools
According to the measurements (recorded on GL!5085), the fillcount of 2
for namepool and fillcount of 4 for rdspool can fit 99.99% of request
for tested scenarios.

This was discovered by perf recording the single second recursive test
using flamethrower where the initial malloc lit up like a flare.
2021-05-24 20:44:58 +02:00
Ondřej Surý
28b65d8256 Reduce the number of clientmgr objects created
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.

We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).

Each clientmgr object than would have a single task and single memory
context.
2021-05-24 20:44:54 +02:00
Ondřej Surý
aad7856b8e Don't create per bucket memory contexts in resolver
Similarly, the resolver code would create hundreds of memory contexts
just on the resolver setup.  The contention will be reduced directly in
the allocator, so for now just attach to the view memory instead of
creating separate memory context for each bucket.
2021-05-24 20:02:20 +02:00
Ondřej Surý
4db5e30177 Run shutdown events with the task's existing threadid
Previously, task->threadid was reassigned to 0 while shutting
down, which caused an assertion.
2021-05-24 20:02:20 +02:00
Ondřej Surý
0be7ea78be Reduce the number of client tasks and bind them to netmgr queues
Since a client object is bound to a netmgr handle, each client
will always be processed by the same netmgr worker, so we can
simplify the code by binding client->task to the same thread as
the client. Since ns__client_request() now runs in the same event
loop as client->task events, is no longer necessary to pause the
task manager before launching them.

Also removed some functions in isc_task that were not used.
2021-05-24 20:02:20 +02:00
Ondřej Surý
c07f8c5a43 Reduce the number of tasks in the clientmgr
We now use one task per CPU per dispatchmgr (that's still a lot).
2021-05-24 20:02:20 +02:00
Ondřej Surý
0719f032e1 Reduce the number of mctx created in clientmgr
The number of memory contexts created in the clientmgr was enormous.  It
could easily create thousands of memory contexts because the formula was:

    nprotocols * ncpus * ninterfaces * CLIENT_NMCTXS_PERCPU (8)

The original goal was to reduce the contention when allocating the
memory, but after a while nobody noticed that the amount of memory
context allocated would not reduce contention at all.

This commit removes the whole mctxpool and just uses the mctx from
clientmgr as the contention will be reduced directly in the allocator.
2021-05-24 20:02:20 +02:00
Evan Hunt
b0aadaac8e rename dns_name_copynf() to dns_name_copy()
dns_name_copy() is now the standard name-copying function.
2021-05-22 00:37:27 -07:00
Evan Hunt
ea7b28f101 remove dns_name_copy() implementation
Remove dns_name_copy() and refactor the underlying code since
it will only be called by dns_name_copynf() now, and can't fail.
2021-05-22 00:22:32 -07:00
Evan Hunt
b1fe1b8ae3 remove the remaining uses of dns_name_copy()
dns_name_copy() has been replaced nearly everywhere with
dns_name_copynf().  this commit changes the last two uses of
the original function.  afterward, we can remove the old
dns_name_copy() implementation, and replace it with _copynf().
2021-05-22 00:22:32 -07:00
Ondřej Surý
ce3e1abc1d Use dns_name_copynf() with dns_message_gettempname() when needed
dns_message_gettempname() returns an initialized name with a dedicated
buffer, associated with a dns_fixedname object.  Using dns_name_copynf()
to write a name into this object will actually copy the name data
from a source name. dns_name_clone() merely points target->ndata to
source->ndata, so it is faster, but it can lead to a use-after-free if
the source is freed before the target object is released via
dns_message_puttempname().

In a few places, clone was being used where copynf should have been;
this is now fixed.

As a side note, no memory was lost, because the ndata buffer used in
the dns_fixedname_t is internal to the structure, and is freed when
the dns_fixedname_t is freed regardless of the .ndata contents.
2021-05-21 21:28:10 -07:00
Ondřej Surý
5ee9edc4ce Optimize rdataset_getownercase not to use bitshifts
The last rdataset_getownercase() left it in a state where the code was
mix of microoptimizations (manual loop unrolling, complicated bitshifts)
with a code that would always rewrite the character even if it stayed
the same after transformation.

This commit makes sure that we modify only the characters that actually
need to change, removes the manual loop unrolling, and replaces the
weird bit arithmetics with a simple shift and bit-and.
2021-05-20 20:41:29 +02:00
Evan Hunt
e31cc1eeb4 use a fixedname buffer in dns_message_gettempname()
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
2021-05-20 20:41:29 +02:00
Matthijs Mekking
252a1ae0a1 Lock kasp when looking for zone keys
We should also lock kasp when reading key files, because at the same
time the zone in another view may be updating the key file.
2021-05-20 09:15:43 +02:00
Artem Boldariev
67c50abe5a Add DoH quota tests
This commit adds unit tests which ensure that DoH code is compatible
with quota functionality.
2021-05-19 10:28:47 +03:00
Matthijs Mekking
19395fd168 Fix coverity issue 331478
Move the "cannot start rollover" warning into code block that checks
if 'active_key' is not NULL.
2021-05-19 00:45:54 +00:00
Mark Andrews
314b5362a8 Remove dns_zone_setflag()
This function has never been used since it was added to the source tree
by commit 686b27bfd3 back in 1999.  As
the dns_zoneflg_t type is only defined in lib/dns/zone.c, no function
external to that file would be able to use dns_zone_setflag() properly
anyway - the DNS_ZONE_SETFLAG() and DNS_ZONE_CLRFLAG() macros should be
used instead. Zone options that can be set from outside zone.c are set
using dns_zone_setoption().
2021-05-18 16:02:18 -07:00
Matthijs Mekking
494e8b2cbd Check key-directory duplicates for kasp zones
Don't allow the same zone with different dnssec-policies in separate
views have the same key-directory.

Track zones plus key-directory in a symtab and if there is a match,
check the offending zone's dnssec-policy name. If the name is "none"
(there is no kasp for the offending zone), or if the name is the same
(the zone shares keys), it is fine, otherwise it is an error (zones
in views using different policies cannot share the same key-directory).
2021-05-18 15:47:02 +02:00
Mark Andrews
5d21042ed8 Adjust returned method from dns_updatemethod_date
if dns_updatemethod_date is used do that the returned method is only
set to dns_updatemethod_increment if the new serial does not encode
the current day (YYYYMMDDXX).
2021-05-18 12:30:22 +00:00
Mark Andrews
7e83c6df94 initialise worker->cond_prio 2021-05-18 07:47:42 +00:00
Mark Andrews
29f1c1e677 Silence gcc-10-fanalyzer false positive
If 'state == ft_ordinary' then 'label' can't be NULL. Add
INSIST to reflect this.
2021-05-18 15:51:51 +10:00
Mark Andrews
683ad6e4bd Silence gcc-10-fanalyzer false positive
Add REQUIRE(type == dns_rdatatype_nsec3 || firstp != NULL); so
that dereferences of *firstp is not flagged as a NULL pointer
dereference.
2021-05-18 15:19:28 +10:00
Mark Andrews
8eed392add Address potential resource leak in dst_key_fromnamedfile 2021-05-18 10:33:43 +10:00
Ondřej Surý
9e3cb396b2 Replace netmgr quantum with loop-preventing barrier
Instead of using fixed quantum, this commit adds atomic counter for
number of items on each queue and uses the number of netievents
scheduled to run as the limit of maximum number of netievents for a
single process_queue() run.

This prevents the endless loops when the netievent would schedule more
netievents onto the same loop, but we don't have to pick "magic" number
for the quantum.
2021-05-17 11:59:19 +02:00
Ondřej Surý
4509089419 Add configuration option to set send/recv buffers on the nm sockets
This commit adds a new configuration option to set the receive and send
buffer sizes on the TCP and UDP netmgr sockets.  The default is `0`
which doesn't set any value and just uses the value set by the operating
system.

There's no magic value here - set it too small and the performance will
drop, set it too large, the buffers can fill-up with queries that have
already timeouted on the client side and nobody is interested for the
answer and this would just make the server clog up even more by making
it produce useless work.

The `netstat -su` can be used on POSIX systems to monitor the receive
and send buffer errors.
2021-05-17 08:47:09 +02:00
Michal Nowak
c628f2c71b Make masterXX.data.in reachable by out-of-tree builds
Unit test run for out-of-tree builds used to fail to find
masterXX.data.in files:

    /usr/bin/perl -w /builds/mnowak/bind9/lib/dns/tests/mkraw.pl < testdata/master/master12.data.in > testdata/master/master12.data
    /bin/bash: testdata/master/master12.data.in: No such file or directory
    make[4]: *** [Makefile:1910: testdata/master/master12.data] Error 1
2021-05-14 13:22:09 +02:00