The previous work in this area was driven by the belief that we might be
calling call_rcu() from within call_rcu() callbacks. After carefully
checking all the current callbacks, it became evident that this is not
the case, and that the problem is not an insufficient number of
rcu_barrier() calls, but something else entirely.
Call rcu_barrier() just once, as that is enough; the multiple
rcu_barrier() calls will no longer hide the real problem, so we can
find it.
As was pointed out, alignas() can't be used on objects larger than
`max_align_t`, otherwise the compiler might miscompile the code to use
auto-vectorization on unaligned memory.
As we were only using alignas() as a way to prevent false sharing, we
can use manual padding in the affected structures instead.
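For illustration, a minimal sketch of the manual-padding approach (the
structure and its fields are hypothetical, not taken from the source):

    #include <stdint.h>

    #define CACHELINE 64 /* assumed L1 cache line size */

    struct counter {
            uint64_t value;
            /* Pad to a full cache line so adjacent counters in an
             * array never share one. */
            uint8_t padding[CACHELINE - sizeof(uint64_t)];
    };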
Instead of the crude 5x rcu_barrier() calls in isc__mem_destroy(),
change the mechanism to call rcu_barrier() until the memory use and
reference counts stop decreasing. This should deal with any number of
nested call_rcu() levels.
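A minimal sketch of the intended loop, with hypothetical accessors
standing in for the memory-context internals:

    size_t inuse, refs;
    do {
            inuse = mem_inuse(ctx);     /* hypothetical accessor */
            refs = mem_references(ctx); /* hypothetical accessor */
            rcu_barrier(); /* flush one level of call_rcu() callbacks */
    } while (mem_inuse(ctx) < inuse || mem_references(ctx) < refs);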
Additionally, don't destroy the contextslock if the list of contexts
isn't empty; destroying the lock could make late threads crash.
Because we don't use the jemalloc functions directly, but only via the
libisc library, the dynamic linker might pull in the jemalloc library
too late, after memory has already been allocated via the standard libc
allocator.
Add a workaround around isc_mem_create() that makes the dynamic linker
pull in jemalloc earlier than libc.
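Illustrative sketch only (not the actual workaround): referencing a
jemalloc-only symbol from libisc creates a hard link-time dependency,
so the dynamic linker binds libjemalloc before the first allocation:

    #include <jemalloc/jemalloc.h>

    void
    isc__mem_initialize(void) { /* hypothetical early-init hook */
            /* mallocx()/sdallocx() exist only in jemalloc, so this
             * forces libjemalloc to be resolved ahead of libc. */
            void *ptr = mallocx(16, 0);
            sdallocx(ptr, 16, 0);
    }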
Previously, there were two methods of working with the overmem
condition:
1. hi/lo water callback - when the overmem condition was reached
   for the first time, the water callback was called with the HIWATER
   mark and the .is_overmem boolean was set internally. Similarly,
   when the used memory went below the lo water mark, the water
   callback would be called with the LOWATER mark and .is_overmem
   was reset. This check ran **every** time memory was allocated
   or freed.
2. isc_mem_isovermem() - a simple getter for the internal
   .is_overmem flag
This commit removes the first method and moves the hi/lo water checks
into the isc_mem_isovermem() function. Thus we now have only a single
method of checking the overmem condition, and the hi/lo water check is
removed from the hot path for memory contexts that don't use overmem
checks.
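A minimal sketch of the lazy check, with hypothetical field names
standing in for the real memory-context internals:

    bool
    isc_mem_isovermem(isc_mem_t *ctx) {
            size_t inuse = atomic_load_relaxed(&ctx->inuse);
            bool over = atomic_load_relaxed(&ctx->is_overmem);

            if (!over && inuse > ctx->hiwater) {
                    atomic_store_relaxed(&ctx->is_overmem, true);
                    return true;
            }
            if (over && inuse < ctx->lowater) {
                    atomic_store_relaxed(&ctx->is_overmem, false);
                    return false;
            }
            return over;
    }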
We now depend on explicitly creating memory arenas and disabling tcache
on those, and these features are not available with jemalloc < 4.
Instead of working around these issues, make jemalloc >= 4.0.0 a hard
requirement by looking for the sdallocx() symbol, which is only
available from that version onwards.
jemalloc < 4 was only used by RHEL 7, which has not been supported
since BIND 9.19.
This commit extends the internal memory management middleware code in
BIND so that memory contexts backed by dedicated jemalloc arenas can
be created. A new function (isc_mem_create_arena()) is added for that.
Moreover, it extends the existing code so that specialised memory
contexts can be created easily, should we need that functionality for
other purposes in the future. We achieved that by passing flags to the
underlying jemalloc-related calls; see isc_mem_create_arena() above,
which can serve as an example of this.
Having this opens up possibilities for creating memory contexts tuned
for specific needs.
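A hypothetical usage sketch (the function name comes from the commit
text; the exact signature is assumed):

    isc_mem_t *mctx = NULL;
    isc_mem_create_arena(&mctx); /* backed by its own jemalloc arena */
    void *data = isc_mem_get(mctx, 4096);
    isc_mem_put(mctx, data, 4096);
    isc_mem_destroy(&mctx);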
Use the new isc_mem_c*() calloc-like API for allocations that are
zeroed.
In turn, this also fixes a couple of incorrect usages of ISC_MEM_ZERO
on structures that need to be zeroed explicitly.
There are a few places where isc_mem_cput() is used on structures with
a flexible array member (or similar).
Add new isc_mem_cget(), isc_mem_creget(), and isc_mem_cput() macros to
complement isc_mem_callocate() (which works like calloc()).
The overflow checks are implemented as macros in <isc/mem.h>, so that
the compiler can see that the element size is constant: it should
always be `sizeof(something)`.
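For illustration, such a macro could be written like this (the real
<isc/mem.h> definitions differ in detail, e.g. in how an overflow is
reported):

    #define isc_mem_cget(ctx, count, size)                      \
            ((size) != 0 && (count) > SIZE_MAX / (size)         \
                     ? NULL                                     \
                     : isc_mem_callocate(ctx, (count) * (size)))

With a constant `sizeof(type)` as the element size, the compiler can
fold the division and usually elide the check entirely.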
Because rcu_barrier() needs to be called as many times as the number of
nested call_rcu() calls (call_rcu() calls made from the call_rcu
thread), and currently there's no mechanism to detect whether more
call_rcu callbacks are scheduled, we simply call rcu_barrier() multiple
times. The overhead is negligible and it prevents rare assertion
failures caused by the check for memory leaks in isc__mem_destroy().
As well as clearing the fresh memory, `calloc()`-like functions must
ensure that the count and size do not overflow when multiplied.
Use `isc_mem_callocate()` in `isc__uv_calloc()`.
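A minimal sketch of the multiplication guard, assuming GCC/Clang's
__builtin_mul_overflow(); the real isc_mem_callocate() body may differ:

    size_t total;
    if (__builtin_mul_overflow(count, size, &total)) {
            return NULL; /* count * size would overflow */
    }
    void *ptr = isc_mem_allocate(mctx, total);
    memset(ptr, 0, total); /* clear the fresh memory */
    return ptr;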
Memory reclamation by `call_rcu()` is asynchronous, so during shutdown
it can lose a race with the destruction of its memory context. When we
defer memory reclamation, we need to attach to the memory context to
indicate that it is still in use, but that is not enough to delay its
destruction. So, call `rcu_barrier()` in `isc_mem_destroy()` to wait
for pending RCU work to finish before proceeding to destroy the memory
context.
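A minimal sketch of the deferred-reclamation pattern that the barrier
waits for; the object type and callback are illustrative:

    static void
    free_cb(struct rcu_head *head) {
            object_t *obj = caa_container_of(head, object_t, rcu_head);
            isc_mem_putanddetach(&obj->mctx, obj, sizeof(*obj));
    }

    /* The attached reference pins the memory context until the
     * callback has run, and rcu_barrier() in isc_mem_destroy()
     * waits for the callback itself. */
    isc_mem_attach(mctx, &obj->mctx);
    call_rcu(&obj->rcu_head, free_cb);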
isc_bind9 was a global bool used to indicate whether the library was
being used internally by BIND or by an external caller. External use
is no longer supported, but the variable was retained for use by
dyndb, which needed it only when being built without libtool.
Building without libtool is *also* no longer supported, so the variable
can go away.
This restores the Malloced memory counter; it is now always equal to
the InUse counter. This is only for backwards-compatibility reasons;
there is no separate counter.
The commit also cleans up little things like a structure with a single
item (summary.inuse), and silences a wrong cppcheck warning (the
notorious NULL check after assignment).
Instead of enforcing stronger synchronization between threads, make all
the atomic operations relaxed. We are not really interested in exact
numbers at all times; the single place where we need the exact number
is when the memory context is being destroyed. Even where there's an
overmem counter, we don't care about exact ordering or exact numbers.
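For example, the statistics counters become plain relaxed operations
(the field name is assumed):

    /* allocation path */
    atomic_fetch_add_explicit(&ctx->inuse, size, memory_order_relaxed);
    /* deallocation path */
    atomic_fetch_sub_explicit(&ctx->inuse, size, memory_order_relaxed);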
The Lost memory counter would count the memory "lost" by external
libraries. There's really no such thing, as `named` requires the
memory contexts to be clean on destroy.
The stats buckets were again more useful for the internal allocator,
because we would see the individual "block" caches that the allocations
would fall into. Remove the stats buckets; if needed, we can pull more
detailed statistics out of jemalloc.
The total memory counter again had little or no meaning once we removed
the internal memory allocator. It was just a monotonic counter that
would add up the allocation sizes but never subtract anything, so it
would be just a "big number".
The maxinuse memory counter indicated the highest amount of
memory allocated in the past. Checking and updating this high-
water mark value every time memory was allocated had an impact
on server performance, so it has been removed. Memory size can
be monitored more efficiently via an external tool logging RSS.
The malloced and maxmalloced memory counters were mostly useless since
we removed the internal allocator blocks - they would only differ from
inuse by the size of the memory context structure itself.
This commit ensures that BIND and supplementary tools still can be
built on newer versions of DragonFly BSD. It used to be the case, but
somewhere between versions 6.2 and 6.4 the OS developers rearranged
headers and moved some function definitions around.
Before that, the fact that it worked was more of a coincidence; this
time we at least looked at the related man pages included with the OS.
No in-depth testing has been done on this OS, as we do not really
support this platform - so it is more of a goodwill act. We can,
however, use this platform for testing purposes, too. Also, we know
that the OS users do use BIND, as it is included in its ports
directory.
Building with './configure' and with './configure --without-jemalloc'
has been fixed and is known to work at the time this commit is made.
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
Mostly generated automatically with the following semantic patch,
except where coccinelle was confused by #ifdef in lib/isc/net.c
@@ expression list args; @@
- UNEXPECTED_ERROR(__FILE__, __LINE__, args)
+ UNEXPECTED_ERROR(args)
@@ expression list args; @@
- FATAL_ERROR(__FILE__, __LINE__, args)
+ FATAL_ERROR(args)
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/once.h with
PTHREADS_RUNTIME_CHECK(), in order to improve error reporting for any
once-related run-time failures (by augmenting error messages with
file/line/caller information and the error string corresponding to
errno).
Previously, isc_mem_get_aligned() and friends took the alignment size
as one of the arguments. Replace the specific functions with a more
generic extended variant that now accepts ISC_MEM_ALIGN(alignment) for
aligned allocations and ISC_MEM_ZERO for allocations that zero the
(re-)allocated memory before returning the pointer to the caller.
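A hypothetical usage sketch; the flag names come from the commit text,
while the function name and exact signature are assumed:

    void *ptr = isc_mem_getx(mctx, size,
                             ISC_MEM_ALIGN(64) | ISC_MEM_ZERO);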
Unlike standard free(), isc_mem_free() is not a no-op when passed a
NULL pointer. For size accounting purposes it calls sallocx(), which
crashes when passed a NULL pointer. To get more helpful diagnostics,
REQUIRE() that the pointer is not NULL so that when the programmer
makes a mistake they get a backtrace that shows what went wrong.
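A minimal sketch of the guard (the surrounding size-accounting code is
elided):

    REQUIRE(ptr != NULL);   /* fail loudly, with a usable backtrace */
    size = sallocx(ptr, 0); /* would segfault on a NULL pointer */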
Previously, isc_mem_debugging was a single global variable that
affected the behavior of the memory contexts whenever it was changed,
which could be after some allocations had already been done.
Change the memory debugging options to be local to the memory context
and immutable, so all allocations within the same memory context are
treated the same.
It's a style violation to have REQUIRE or INSIST contain code that
must run for the server to work. This was being done with some
atomic_compare_exchange calls; these have been cleaned up. Uses of
atomic_compare_exchange in assertions have been replaced with a new
macro, atomic_compare_exchange_enforced, which uses RUNTIME_CHECK
to ensure that the exchange was successful.
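A plausible sketch of the new macro (the exact definition may differ):

    #define atomic_compare_exchange_enforced(obj, expected, desired) \
            RUNTIME_CHECK(                                           \
                    atomic_compare_exchange_strong(obj, expected,    \
                                                   desired))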
Historically, the inline keyword was a strong suggestion to the compiler
that it should inline the function marked inline. As compilers became
better at optimising, this functionality has receded, and using inline
as a suggestion to inline a function is obsolete. The compiler will
happily ignore it and inline something else entirely if it finds that's
a better optimisation.
Therefore, remove all the occurrences of the inline keyword on static
functions inside a single compilation unit and leave the decision
whether to inline a function or not entirely to the compiler.
NOTE: We keep the usage of the inline keyword when the purpose is to
change the linkage behaviour.
Previously, the unreachable code paths would have to be tagged with:
INSIST(0);
ISC_UNREACHABLE();
There were also older parts of the code that used the comment
annotation:
/* NOTREACHED */
Unify the handling of unreachable code paths to just use:
UNREACHABLE();
The UNREACHABLE() macro now asserts when reached and also uses
__builtin_unreachable() when such a builtin is available in the
compiler.
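A plausible sketch of the macro (the real definition may differ):

    #define UNREACHABLE()                        \
            do {                                 \
                    INSIST(0);                   \
                    __builtin_unreachable();     \
            } while (0)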
The C17 standard deprecated the ATOMIC_VAR_INIT() macro (see [1]).
Follow suit and remove the ATOMIC_VAR_INIT() usage in favor of simple
assignment of the value, as this is what all supported stdatomic.h
implementations do anyway:
* MacOSX.platform: #define ATOMIC_VAR_INIT(__v) {__v}
* GCC stdatomic.h: #define ATOMIC_VAR_INIT(VALUE) (VALUE)
1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1138r0.pdf
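The mechanical change is therefore (illustrative sketch; the variable
is hypothetical):

    /* before */
    static atomic_uint_fast32_t refs = ATOMIC_VAR_INIT(1);
    /* after */
    static atomic_uint_fast32_t refs = 1;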
The IBM POWER architecture has an L1 cache line size of 128 bytes.
Take advantage of that on that architecture and do not force the more
common value of 64. When it is possible to detect a higher value, use
that value instead. Keep the default at 64.
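A minimal sketch of the resulting compile-time default; the macro name
and the exact detection logic here are assumptions:

    #if defined(__powerpc__) || defined(__powerpc64__)
    #define ISC_CACHELINE_SIZE 128 /* IBM POWER */
    #else
    #define ISC_CACHELINE_SIZE 64 /* common default */
    #endif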
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds the used licenses to the LICENSES/ directory
2. Adds an "isc" template for adding the copyright boilerplate
3. Changes all source files to include the copyright and SPDX license
   header; this includes all the C sources, documentation, zone files,
   and configuration files. There are notes in the doc/dev/copyrights
   file on how to add correct headers to new files.
4. Handles the rest that can't be modified via the .reuse/dep5 file.
   The binary (or otherwise unmodifiable) files could have their
   licenses placed next to them in <foo>.license files, but this would
   lead to a cluttered repository, and most of the files handled in
   the .reuse/dep5 file are system test files.
There are some situations where having aligned allocations would be
useful, so we don't have to play tricks with padding the data to the
cacheline sizes.
Add isc_mem_{get,put,reget,putanddetach}_aligned() functions that take
the alignment and size as the last arguments, mimicking the POSIX
posix_memalign() function, on systems with jemalloc (see the
documentation on MALLOCX_ALIGN() for more details). On systems without
jemalloc, those functions are the same as the non-aligned variants.
Previously, the whole isc_mempool_get() and isc_mempool_put() would be
replaced by simpler versions when run with the address sanitizer.
Change the code to limit the fillcount to 1 and the freemax to 0. This
change makes isc_mempool_get() always allocate and use a single new
item, and isc_mempool_put() always return the item to the allocator.
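A minimal sketch of the tuning described above, assuming the
isc_mempool setter functions keep their historical names:

    #if defined(__SANITIZE_ADDRESS__)
    isc_mempool_setfillcount(pool, 1); /* allocate one item at a time */
    isc_mempool_setfreemax(pool, 0);   /* never cache freed items */
    #endif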
The __builtin_expect() can be used to provide the compiler with branch
prediction information. The GCC manual says[1] on the subject:
In general, you should prefer to use actual profile feedback for
this (-fprofile-arcs), as programmers are notoriously bad at
predicting how their programs actually perform.
Stop using __builtin_expect() and the ISC_LIKELY() and ISC_UNLIKELY()
macros to provide branch prediction information, as performance testing
shows that named performs better when __builtin_expect() is not being
used.
1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
It was discovered that named could crash due to a segmentation fault
when jemalloc was in use and memory allocation failed. This was not
intended to happen as jemalloc's "xmalloc" option was set to "true" in
the "malloc_conf" configuration variable. However, that variable was
only set after jemalloc was already done with parsing it, which
effectively caused setting that variable to have no effect.
While investigating this issue, it was also discovered that enabling the
"xmalloc" option makes jemalloc use a slow processing path, decreasing
its performance by about 25%. [1]
Additionally, further testing (carried out after fixing the way
"malloc_conf" was set) revealed that the non-default configuration
options do not have any measurable effect on either authoritative or
recursive DNS server performance.
Replace code setting various jemalloc options to non-default values with
assertion checks of mallocx()/rallocx() return values.
[1] https://github.com/jemalloc/jemalloc/pull/523
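A minimal sketch of the replacement pattern:

    void *ptr = mallocx(size, flags);
    INSIST(ptr != NULL); /* abort with a backtrace instead of relying
                          * on jemalloc's "xmalloc" option */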