The C17 standard deprecated ATOMIC_VAR_INIT() macro (see [1]). Follow
the suite and remove the ATOMIC_VAR_INIT() usage in favor of simple
assignment of the value as this is what all supported stdatomic.h
implementations do anyway:
* MacOSX.plaform: #define ATOMIC_VAR_INIT(__v) {__v}
* Gcc stdatomic.h: #define ATOMIC_VAR_INIT(VALUE) (VALUE)
1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1138r0.pdf
Previously, the function(s) in the commit subject could fail for various
reasons - mostly allocation failures, or other functions returning
different return code than ISC_R_SUCCESS. Now, the aforementioned
function(s) cannot ever fail and they would always return ISC_R_SUCCESS.
Change the function(s) to return void and remove the extra checks in
the code that uses them.
Previously, the function(s) in the commit subject could fail for various
reasons - mostly allocation failures, or other functions returning
different return code than ISC_R_SUCCESS. Now, the aforementioned
function(s) cannot ever fail and they would always return ISC_R_SUCCESS.
Change the function(s) to return void and remove the extra checks in
the code that uses them.
The current implementation of isc_queue uses Michael-Scott lock-free
queue that in turn uses hazard pointers. It was discovered that the way
we use the isc_queue, such complicated mechanism isn't really needed,
because most of the time, we either execute the work directly when on
nmthread (in case of UDP) or schedule the work from the matching
nmthreads.
Replace the current implementation of the isc_queue with a simple locked
ISC_LIST. There's a slight improvement - since copying the whole list
is very lightweight - we move the queue into a new list before we start
the processing and locking just for moving the queue and not for every
single item on the list.
NOTE: There's a room for future improvements - since we don't guarantee
the order in which the netievents are processed, we could have two lists
- one unlocked that would be used when scheduling the work from the
matching thread and one locked that would be used from non-matching
thread.
The keep-response-order option has been obsoleted, and in this commit,
remove the keep-response-order ACL map rendering the option no-op, the
call the isc_nm_sequential() and the now unused isc_nm_sequential()
function itself.
In some situations (unit test and forthcoming XFR timeouts MR), we need
to modify the write timeout independently of the read timeout. Add a
isc_nmhandle_setwritetimeout() function that could be called before
isc_nm_send() to specify a custom write timeout interval.
The isc_thread_setaffinity call was removed in !5265 and we are not
going to restore it because it was proven that the performance is better
without it. Additionally, remove the already disabled cpu system test.
The isc_thread_setconcurrency function is unused and also calling
pthread_setconcurrency() on Linux has no meaning, formerly it was
added because of Solaris in 2001 and it was removed when taskmgr was
refactored to run on top of netmgr in !4918.
IBM power architecture has L1 cache line size equal to 128. Take
advantage of that on that architecture, do not force more common value
of 64. When it is possible to detect higher value, use that value
instead. Keep the default to be 64.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
The isc_queue_new() was using dirty tricks to allocate the head and tail
members of the struct aligned to the cacheline. We can now use
isc_mem_get_aligned() to allocate the structure to the cacheline
directly.
Use ISC_OS_CACHELINE_SIZE (64) instead of arbitrary ALIGNMENT (128), one
cacheline size is enough to prevent false sharing.
Cleanup the unused max_threads variable - there was actually no limit on
the maximum number of threads. This was changed a while ago.
There are some situations where having aligned allocations would be
useful, so we don't have to play tricks with padding the data to the
cacheline sizes.
Add isc_mem_{get,put,reget,putanddetach}_aligned() functions that has
alignment and size as last argument mimicking the POSIX posix_memalign()
functions on systems with jemalloc (see the documentation on
MALLOX_ALIGN() for more details). On systems without jemalloc, those
functions are same as non-aligned variants.
Add library ctor and dtor for isc_os compilation unit which initializes
the numbers of the CPUs and also checks whether L1 cacheline size is
really 64 if the sysconf() call is available.
For consistency with similar functions, rename `pcache` to `cachep`,
call a separate destroy function when references reach 0, and add
a missing call to isc_refcount_destroy().
This commit adds a TLS context object cache implementation. The
intention of having this object is manyfold:
- In the case of client-side contexts: allow reusing the previously
created contexts to employ the context-specific TLS session resumption
cache. That will enable XoT connection to be reestablished faster and
with fewer resources by not going through the full TLS handshake
procedure.
- In the case of server-side contexts: reduce the number of contexts
created on startup. That could reduce startup time in a case when
there are many "listen-on" statements referring to a smaller amount of
`tls` statements, especially when "ephemeral" certificates are
involved.
- The long-term goal is to provide in-memory storage for additional
data associated with the certificates, like runtime
representation (X509_STORE) of intermediate CA-certificates bundle for
Strict TLS/Mutual TLS ("ca-file").
TLS pre-master secrets will be dumped to disk using the logging
framework provided by libisc. Add a new logging category for this type
of debugging data in order to enable exporting it to a dedicated
channel. Derive the name of the new category from the name of the
relevant environment variable, SSLKEYLOGFILE.
OpenSSL 3.0.1 does not accept 0 as a digest buffer length when
calling EVP_DigestSignFinal as it now checks that the digest buffer
length is large enough for the digest. Pass the digest buffer
length instead.
Mutex debugging code (used when the ISC_MUTEX_DEBUG preprocessor macro
is set to 1 and PTHREAD_MUTEX_ERRORCHECK is defined) has been broken for
the past 3 years (since commit 2f3eee5a4fdad6606135116c70875b3180c7ed83)
and nobody complained, which is a strong indication that this code is
not being used these days any more. External tools for detecting
locking issues are already wired into various GitLab CI checks. Drop
all code depending on the ISC_MUTEX_DEBUG preprocessor macro being set.
Mutex profiling code (used when the ISC_MUTEX_PROFILE preprocessor macro
is set to 1) has been broken for the past 3 years (since commit
0bed9bfc28a204cde57c6f68170ecc89ebfa6dc8) and nobody complained, which
is a strong indication that this code is not being used these days any
more. External tools for both measuring performance and detecting
locking issues are already wired into various GitLab CI checks. Drop
all code depending on the ISC_MUTEX_PROFILE preprocessor macro being
set.
This commit adds an isc_nm_socket_type() function which can be used to
obtain a handle's socket type.
This change obsoletes isc_nm_is_tlsdns_handle() and
isc_nm_is_http_handle(). However, it was decided to keep the latter as
we eventually might end up supporting multiple HTTP versions.
Change 5756 (GL #2854) introduced build errors when using
'configure --disable-doh'. To fix this, isc_nm_is_http_handle() is
now defined in all builds, not just builds that have DoH enabled.
Missing code comments were added both for that function and for
isc_nm_is_tlsdns_handle().
This commit adds an isc_nm_set_min_answer_ttl() function which is
intended to to be used to give a hint to the underlying transport
regarding the answer TTL.
The interface is intentionally kept generic because over time more
transports might benefit from this functionality, but currently it is
intended for DoH to set "max-age" value within "Cache-Control" HTTP
header (as recommended in the RFC8484, section 5.1 "Cache
Interaction").
It is no-op for other DNS transports for the time being.
isc_nm_routeconnect() opens a route/netlink socket, then calls a
connect callback, much like isc_nm_udpconnect(), with a handle that
can then be monitored for network changes.
Internally the socket is treated as a UDP socket, since route/netlink
sockets follow the datagram contract.
The __builtin_expect() can be used to provide the compiler with branch
prediction information. The Gcc manual says[1] on the subject:
In general, you should prefer to use actual profile feedback for
this (-fprofile-arcs), as programmers are notoriously bad at
predicting how their programs actually perform.
Stop using __builtin_expect() and ISC_LIKELY() and ISC_UNLIKELY() macros
to provide the branch prediction information as the performance testing
shows that named performs better when the __builtin_expect() is not
being used.
1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
Unify the header guard style and replace the inconsistent include guards
with #pragma once.
The #pragma once is widely and very well supported in all compilers that
BIND 9 supports, and #pragma once was already in use in several new or
refactored headers.
Using simpler method will also allow us to automate header guard checks
as this is simpler to programatically check.
For reference, here are the reasons for the change taken from
Wikipedia[1]:
> In the C and C++ programming languages, #pragma once is a non-standard
> but widely supported preprocessor directive designed to cause the
> current source file to be included only once in a single compilation.
>
> Thus, #pragma once serves the same purpose as include guards, but with
> several advantages, including: less code, avoidance of name clashes,
> and sometimes improvement in compilation speed. On the other hand,
> #pragma once is not necessarily available in all compilers and its
> implementation is tricky and might not always be reliable.
1. https://en.wikipedia.org/wiki/Pragma_once
Replace some "master/slave" terminology in the code with the preferred
"primary/secondary" keywords. This also changes user output such as
log messages, and fixes a typo ("seconary") in cfg_test.c.
There are still some references to "master" and "slave" for various
reasons:
- The old syntax can still be used as a synonym.
- The master syntax is kept when it refers to master files and formats.
- This commit replaces mainly keywords that are local. If "master" or
"slave" is used in for example a structure that is all over the
place, it is considered out of scope for the moment.
Remove the dynamic registration of result codes. Convert isc_result_t
from unsigned + #defines into 32-bit enum type in grand unified
<isc/result.h> header. Keep the existing values of the result codes
even at the expense of the description and identifier tables being
unnecessary large.
Additionally, add couple of:
switch (result) {
[...]
default:
break;
}
statements where compiler now complains about missing enum values in the
switch statement.
Previously, when using compiler without support for static assertions,
the STATIC_ASSERT() macro would be replaced with runtime assertion.
Change the STATIC_ASSERT() macro to a version that's compile time
assertion even when using pre-C11 compilers.
Courtesy of Joseph Quinsey: https://godbolt.org/z/K9RvWS
This commit makes BIND verify that zone transfers are allowed to be
done over the underlying connection. Currently, it makes sense only
for DoT, but the code is deliberately made to be protocol-agnostic.
The intention of having this function is to have a predicate to check
if a zone transfer could be performed over the given handle. In most
cases we can assume that we can do zone transfers over any stream
transport except DoH, but this assumption will not work for zone
transfers over DoT (XoT), as the RFC9103 requires ALPN to happen,
which might not be the case for all deployments of DoT.
- Responses received by the dispatch are no longer sent to the caller
via a task event, but via a netmgr-style recv callback. the 'action'
parameter to dns_dispatch_addresponse() is now called 'response' and
is called directly from udp_recv() or tcp_recv() when a valid response
has been received.
- All references to isc_task and isc_taskmgr have been removed from
dispatch functions.
- All references to dns_dispatchevent_t have been removed and the type
has been deleted.
- Added a task to the resolver response context, to be used for fctx
events.
- When the caller cancels an operation, the response handler will be
called with ISC_R_CANCELED; it can abort immediately since the caller
will presumably have taken care of cleanup already.
- Cleaned up attach/detach in resquery and request.
- The `timeout_action` parameter to dns_dispatch_addresponse() been
replaced with a netmgr callback that is called when a dispatch read
times out. this callback may optionally reset the read timer and
resume reading.
- Added a function to convert isc_interval to milliseconds; this is used
to translate fctx->interval into a value that can be passed to
dns_dispatch_addresponse() as the timeout.
- Note that netmgr timeouts are accurate to the millisecond, so code to
check whether a timeout has been reached cannot rely on microsecond
accuracy.
- If serve-stale is configured, then a timeout received by the resolver
may trigger it to return stale data, and then resume waiting for the
read timeout. this is no longer based on a separate stale timer.
- The code for canceling requests in request.c has been altered so that
it can run asynchronously.
- TCP timeout events apply to the dispatch, which may be shared by
multiple queries. since in the event of a timeout we have no query ID
to use to identify the resp we wanted, we now just send the timeout to
the oldest query that was pending.
- There was some additional refactoring in the resolver: combining
fctx_join() and fctx_try_events() into one function to reduce code
duplication, and using fixednames in fetchctx and fetchevent.
- Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the
code can be simplified.
The flow of operations in dispatch is changing and will now be similar
for both UDP and TCP queries:
1) Call dns_dispatch_addresponse() to assign a query ID and register
that we'll be listening for a response with that ID soon. the
parameters for this function include callback functions to inform the
caller when the socket is connected and when the message has been
sent, as well as a task action that will be sent when the response
arrives. (later this could become a netmgr callback, but at this
stage to minimize disruption to the calling code, we continue to use
isc_task for the response event.) on successful completion of this
function, a dispatch entry object will be instantiated.
2) Call dns_dispatch_connect() on the dispatch entry. this runs
isc_nm_udpconnect() or isc_nm_tcpdnsconnect(), as needed, and begins
listening for responses. the caller is informed via a callback
function when the connection is established.
3) Call dns_dispatch_send() on the dispatch entry. this runs
isc_nm_send() to send a request.
4) Call dns_dispatch_removeresponse() to terminate listening and close
the connection.
Implementation comments below:
- As we will be using netmgr buffers now. code to send the length in
TCP queries has also been removed as that is handled by the netmgr.
- TCP dispatches can be used by multiple simultaneous queries, so
dns_dispatch_connect() now checks whether the dispatch is already
connected before calling isc_nm_tcpdnsconnect() again.
- Running dns_dispatch_getnext() from a non-network thread caused a
crash due to assertions in the netmgr read functions that appear to be
unnecessary now. the assertions have been removed.
- fctx->nqueries was formerly incremented when the connection was
successful, but is now incremented when the query is started and
decremented if the connection fails.
- It's no longer necessary for each dispatch to have a pool of tasks, so
there's now a single task per dispatch.
- Dispatch code to avoid UDP ports already in use has been removed.
- dns_resolver and dns_request have been modified to use netmgr callback
functions instead of task events. some additional changes were needed
to handle shutdown processing correctly.
- Timeout processing is not yet fully converted to use netmgr timeouts.
- Fixed a lock order cycle reported by TSAN (view -> zone-> adb -> view)
by by calling dns_zt functions without holding the view lock.
- Many dispatch attributes can be set implicitly instead of being passed
in. we can infer whether to set DNS_DISPATCHATTR_TCP or _UDP from
whether we're calling dns_dispatch_createtcp() or _createudp(). we
can also infer DNS_DISPATCHATTR_IPV4 or _IPV6 from the addresses or
the socket that were passed in.
- We no longer use dup'd sockets in UDP dispatches, so the 'dup_socket'
parameter has been removed from dns_dispatch_createudp(), along with
the code implementing it. also removed isc_socket_dup() since it no
longer has any callers.
- The 'buffersize' parameter was ignored and has now been removed;
buffersize is now fixed at 4096.
- Maxbuffers and maxrequests don't need to be passed in on every call to
dns_dispatch_createtcp() and _createudp().
In all current uses, the value for mgr->maxbuffers will either be
raised once from its default of 20000 to 32768, or else left
alone. (passing in a value lower than 20000 does not lower it.) there
isn't enough difference between these values for there to be any need
to configure this.
The value for disp->maxrequests controls both the quota of concurrent
requests for a dispatch and also the size of the dispatch socket
memory pool. it's not clear that this quota is necessary at all. the
memory pool size currently starts at 32768, but is sometimes lowered
to 4096, which is definitely unnecessary.
This commit sets both values permanently to 32768.
- Previously TCP dispatches allocated their own separate QID table,
which didn't incorporate a port table. this commit removes
per-dispatch QID tables and shares the same table between all
dispatches. since dispatches are created for each TCP socket, this may
speed up the dispatch allocation process. there may be a slight
increase in lock contention since all dispatches are sharing a single
QID table, but since TCP sockets are used less often than UDP
sockets (which were already sharing a QID table), it should not be a
substantial change.
- The dispatch port table was being used to determine whether a port was
already in use; if so, then a UDP socket would be bound with
REUSEADDR. this commit removes the port table, and always binds UDP
sockets that way.
This commit adds the ability to enable or disable stateless TLS
session resumption tickets (see RFC5077). Having this ability is
twofold.
Firstly, these tickets are encrypted by the server, and the algorithm
might be weaker than the algorithm negotiated during the TLS session
establishment (it is in general the case for TLSv1.2, but the generic
principle applies to TLSv1.3 as well, despite it having better ciphers
for session tickets). Thus, they might compromise Perfect Forward
Secrecy.
Secondly, disabling it might be necessary if the same TLS key/cert
pair is supposed to be used by multiple servers to achieve, e.g., load
balancing because the session ticket by default gets generated in
runtime, while to achieve successful session resumption ability, in
this case, would have required using a shared key.
The proper alternative to having the ability to disable stateless TLS
session resumption tickets is to implement a proper session tickets
key rollover mechanism so that key rotation might be performed
often (e.g. once an hour) to not compromise forward secrecy while
retaining the associated performance benefits. That is much more work,
though. On the other hand, having the ability to disable session
tickets allows having a deployable configuration right now in the
cases when either forward secrecy is wanted or sharing the TLS
key/cert pair between multiple servers is needed (or both).
This commit adds support for enforcing the preference of server
ciphers over the client ones. This way, the server attains control
over the ciphers priority and, thus, can choose more strong cyphers
when a client prioritises less strong ciphers over the more strong
ones, which is beneficial when trying to achieve Perfect Forward
Secrecy.
This commit adds support for setting TLS cipher list string in the
format specified in the OpenSSL
documentation (https://www.openssl.org/docs/man1.1.1/man1/ciphers.html).
The syntax of the cipher list is verified so that specifying the wrong
string will prevent the configuration from being loaded.
This commit adds support for loading DH-parameters (Diffie-Hellman
parameters) via the new "dhparam-file" option within "tls" clause. In
particular, Diffie-Hellman parameters are needed to enable the range
of forward-secrecy enabled cyphers for TLSv1.2, which are getting
silently disabled otherwise.
This commit adds the ability to specify allowed TLS protocols versions
within the "tls" clause. If an unsupported TLS protocol version is
specified in a file, the configuration file will not pass
verification.
Also, this commit adds strict checks for "tls" clauses verification,
in particular:
- it ensures that loading configuration files containing duplicated
"tls" clauses is not allowed;
- it ensures that loading configuration files containing "tls" clauses
missing "cert-file" or "key-file" is not allowed;
- it ensures that loading configuration files containing "tls" clauses
named as "ephemeral" or "none" is not allowed.
The isc_mem_get() and isc_mem_put() functions are leaving the memory
allocation size tracking to the users of the API, while
isc_mem_allocate() and isc_mem_free() would track the sizes internally.
This allowed to have isc_mem_rellocate() to manipulate the memory
allocations by the later set, but not the former set of the functions.
This commit introduces isc_mem_reget(ctx, old_ptr, old_size, new_size)
function that operates on the memory allocations with external size
tracking completing the API.