Instead of holding the catzs->lock the whole time we process the catz
update, only hold it for hash table lookup and then release it. This
should unblock any other threads that might be processing updates to
catzs triggered by extra incoming transfer.
Offload catalog zone processing so that the network manager threads
are not interrupted by a large catalog zone update.
Introduce a new 'updaterunning' state alongside with 'updatepending',
like it is done in the RPZ module.
Note that the dns__catz_update_cb() function currently holds the
catzs->lock during the whole process, which is far from being optimal,
but the issue is going to be addressed separately.
This change should make sure that catalog zone update processing
doesn't happen when the catalog zone is being shut down. This
should help avoid races when offloading the catalog zone updates
in the follow-up commit.
The configure_catz() function creates the catalog zones structure
for the view even when it is not needed, in which case it then
discards it (by detaching) later.
Instead, call dns_catz_new_zones() only when it is needed, i.e. when
there is no existing "previous" view with an existing 'catzs', that
is going to be reused.
* Change 'dns_catz_new_zones()' function's prototype (the order of the
arguments) to synchronize it with the similar function in rpz.c.
* Rename 'refs' to 'references' in preparation of ISC_REFCOUNT_*
macros usage for reference tracking.
* Unify dns_catz_zone_t naming to catz, and dns_catz_zones_t naming to
catzs, following the logic of similar changes in rpz.c.
* Use C compound literals for structure initialization.
* Synchronize the "new zone version came too soon" log message with the
one in rpz.c.
* Use more of 'sizeof(*ptr)' style instead of the 'sizeof(type_t)' style
expressions when allocating or freeing memory for 'ptr'.
`libirs` used to be a reference implementation of `getaddrinfo` and
related modern resolver APIs. It was stripped down in BIND 9.18
leaving only the `irs_resconf` module, which parses
`/etc/resolv.conf`. I have kept its include path and namespace prefix,
so it remains a little fragment of libirs now embedded in libdns.
Add new SonarCloud GitHub Action and configuration; something (maybe
the way the builds were submitted) has apparently changed and the
project got deleted and the analysis wasn't working.
when a message arrives over a TCP connection matching an expected
QID, the dispatch is updated so it no longer expects that QID,
but continues reading. subsequent messages with the same QID are
ignored, unless the dispatch entry has called dns_dispatch_getnext()
or dns_dispatch_resume().
however, a coding error caused those functions to have no effect
when the dispatch was reading, so streams of messages with the same
QID could not be received over a single TCP connection, breaking *XFR.
this has been corrected by changing the order of operations in
tcp_dispatch_getnext() so that disp->reading isn't checked until
after the dispatch entry has been reactivated.
the dns_xfrin module was still using the network manager directly to
manage TCP connections and send and receive messages. this commit
changes it to use the dispatch manager instead.
use ISC_REFCOUNT_IMPL for dns_xfrin_ctx_t (which has been renamed
to dns_xfrin_t to keep the function names dns_xfrin_attach() and
dns_xfrin_detach() unchanged).
the 'dispatchmgr' member of the resolver object is used by both
the dns_resolver and dns_request modules, and may in the future
be used by others such as dns_xfrin. it doesn't make sense for it
to live in the resolver object; this commit moves it into dns_view.
This "quiescent state based reclamation" module provides support for
the qp-trie module in dns/qp. It is a replacement for liburcu, written
without reference to the urcu source code, and in fact it works in a
significantly different way.
A few specifics of BIND make this variant of QSBR somewhat simpler:
* We can require that wait-free access to a qp-trie only happens in
an isc_loop callback. The loop provides a natural quiescent state,
after the callbacks are done, when no qp-trie access occurs.
* We can dispense with any API like rcu_synchronize(). In practice,
it takes far too long to wait for a grace period to elapse for each
write to a data structure.
* We use the idea of "phases" (aka epochs or eras) from EBR to
reduce the amount of bookkeeping needed to track memory that is no
longer needed, knowing that the qp-trie does most of that work
already.
I considered hazard pointers for safe memory reclamation. They have
more read-side overhead (updating the hazard pointers) and it wasn't
clear to me how to nicely schedule the cleanup work. Another
alternative, epoch-based reclamation, is designed for fine-grained
lock-free updates, so it needs some rethinking to work well with the
heavily read-biased design of the qp-trie. QSBR has the fastest read
side of the basic SMR algorithms (with no barriers), and fits well
into a libuv loop. More recent hybrid SMR algorithms do not appear to
have enough benefits to justify the extra complexity.
the isc_glob module was originally needed to support posix-style glob
processing on Windows, but is now just an unnecessary wrapper around
glob(3). this commit removes it.
the parser could crash when "include" specified an empty string in place
of the filename. this has been fixed by returning ISC_R_FILENOTFOUND
when the string length is 0.
Previously, the async job queue would use a locked-list (ISC_LIST).
With introduction of atomic stack (that has to be drained at once), we
could use it to remove some contention between the threads and simplify
the async queue.
Fortunately, the reverse order still works for us - instead of append
and tail/prev operation on the list, we are now using prepend and
head/next operation on the atomic stack.
Add a singly-linked stack that supports lock-free prepend and drain (to
empty the list and clean up its elements). Intended for use with QSBR
to collect objects that need safe memory reclamation, or any other user
that works with adding objects to the stack and then draining them in
one go like various work queues.
In <isc/atomic.h>, add an `atomic_ptr()` macro to make type
declarations a little less abominable, and clean up a duplicate
definition of `atomic_compare_exchange_strong_acq_rel()`
[CVE-2022-3924] Add a reproducer for the serve-stale crash when recursive clients soft quota is reached
Closes#3619
See merge request isc-projects/bind9!7575
Reproduce the assertion by configuring a 'named' resolver with
'recursive-clients 10;' configuration option and running 20
queries is parallel.
Also tweak the 'ans2/ans.pl' to simulate a 50ms network latency
when qname starts with "latency". This makes sure that queries
running in parallel don't get served immediately, thus allowing
the configured recursive clients quota limitation to be activated.
removed references in code comments, doc/dev documentation, etc, to
isc_task, isc_timer_reset(), and isc_timertype_inactive. also removed a
coccinelle patch related to isc_timer_reset() that was no longer needed.