when parsing key pairs, if the '=' character fell at max_token
a protective INSIST preventing buffer overrun could be triggered.
Attempt to grow the buffer immediately before the INSIST.
Also removed an unnecessary INSIST on the opening double quote
of key buffer pair.
The isc__nmsocket_reset() was missing a case for raw TCP sockets (used
by RNDC and DoH) which would case a assertion failure when write timeout
would be triggered.
TCP sockets are now also properly handled in isc__nmsocket_reset().
When isc__nm_uvreq_t gets deactivated, it could be just put onto array
stack to be reused later to save some initialization time.
Unfortunately, this might hide some use-after-free errors.
Disable the inactive uvreqs caching when compiled with Address or
Thread Sanitizer.
When isc_nmhandle_t gets deactivated, it could be just put onto array
stack to be reused later to safe some initialization time.
Unfortunately, this might hide some use-after-free errors.
Disable the inactive handles caching when compiled with Address or
Thread Sanitizer.
The isc__nmsocket_t has locked array of isc_nmhandle_t that's not used
for anything. The isc__nmhandle_get() adds the isc_nmhandle_t to the
locked array (and resized if necessary) and removed when
isc_nmhandle_put() finally destroys the handle. That's all it does, so
it serves no useful purpose.
Remove the .ah_handles, .ah_size, and .ah_frees members of the
isc__nmsocket_t and .ah_pos member of the isc_nmhandle_t struct.
When the TCP, TCPDNS or TLSDNS connection times out, the isc__nm_uvreq_t
would be pushed into sock->inactivereqs before the uv_tcp_connect()
callback finishes. Because the isc__nmsocket_t keeps the list of
inactive isc__nm_uvreq_t, this would cause use-after-free only when the
sock->inactivereqs is full (which could never happen because the failure
happens in connection timeout callback) or when the sock->inactivereqs
mechanism is completely removed (f.e. when running under Address or
Thread Sanitizer).
Delay isc__nm_uvreq_t deallocation to the connection callback and only
signal the connection callback should be called by shutting down the
libuv socket from the connection timeout callback.
The keep-response-order option has been obsoleted, and in this commit,
remove the keep-response-order ACL map rendering the option no-op, the
call the isc_nm_sequential() and the now unused isc_nm_sequential()
function itself.
There was an artificial limit of 23 on the number of simultaneous
pipelined queries in the single TCP connection. The new network
managers is capable of handling "unlimited" (limited only by the TCP
read buffer size ) queries similar to "unlimited" handling of the DNS
queries receive over UDP.
Don't limit the number of TCP queries that we can process within a
single TCP read callback.
When invalid DNS message is received, there was a handling mechanism for
DoH that would be called to return proper HTTP response.
Reuse this mechanism and reset the TCP connection when the client is
blackholed, DNS message is completely bogus or the ns_client receives
response instead of query.
Use the isc_nmhandle_setwritetimeout() function in the netmgr unit test
to allow more time for writing and reading the responses because some of
the intervals that are used in the unit tests are really small leaving a
little room for any delays.
In some situations (unit test and forthcoming XFR timeouts MR), we need
to modify the write timeout independently of the read timeout. Add a
isc_nmhandle_setwritetimeout() function that could be called before
isc_nm_send() to specify a custom write timeout interval.
When the outgoing TCP write buffers are full because the other party is
not reading the data, the uv_write() could wait indefinitely on the
uv_loop and never calling the callback. Add a new write timer that uses
the `tcp-idle-timeout` value to interrupt the TCP connection when we are
not able to send data for defined period of time.
The uv_tcp_close_reset() function was added in libuv 1.32.0 and since we
support older libuv releases, we have to add a shim uv_tcp_close_reset()
implementation loosely based on libuv.
Before adding the write timer, we have to remove the generic sock->timer
to sock->read_timer. We don't touch the function names to limit the
impact of the refactoring.
When libuv functions fail, they return correct return value that could
be useful for more detailed debugging. Currently, we usually just check
whether the return value is 0 and invoke assertion error if it doesn't
throwing away the details why the call has failed. Unfortunately, this
often happen on more exotic platforms.
Add a UV_RUNTIME_CHECK() macro that can be used to print more detailed
error message (via uv_strerror() before ending the execution of the
program abruptly with the assertion.
The task exclusive mode stops all processing (tasks and networking IO)
except the designated exclusive task events. This has impact on the
operation of the server. Add log messages indicating when we start the
exclusive mode, and when we end exclusive task mode.
There are reported occurences where the statitic counters underflows and
starts reporting non-sense.
Add a check for the underflow, when ``named`` is compiled in the
developer mode.
The isc_thread_setaffinity call was removed in !5265 and we are not
going to restore it because it was proven that the performance is better
without it. Additionally, remove the already disabled cpu system test.
The isc_thread_setconcurrency function is unused and also calling
pthread_setconcurrency() on Linux has no meaning, formerly it was
added because of Solaris in 2001 and it was removed when taskmgr was
refactored to run on top of netmgr in !4918.
When isc_quota_attach_cb() API returns ISC_R_QUOTA (meaning hard quota
was reached) the accept_connection() would return without logging a
message about quota reached.
Change the connection callback to log the quota reached message.
IBM power architecture has L1 cache line size equal to 128. Take
advantage of that on that architecture, do not force more common value
of 64. When it is possible to detect higher value, use that value
instead. Keep the default to be 64.
TLS clients can have their clock a short time in the past which will
result in not being able to validate the certificate.
Setting the "not before" property 5 minutes in the past will
accommodate with some possible clock skew across systems.
On some systems, the glibc can return 0 instead of cache-line size to
indicate the cache line sizes cannot be determined. This is comment
from glibc source code:
/* In general we cannot determine these values. Therefore we
return zero which indicates that no information is
available. */
As the goal of the check is to determine whether the L1 cache line size
is still 64 and we would use this value in case the sysconf() call is
not available, we can also ignore the invalid values returned by the
sysconf() call.
Some operating systems (OpenBSD and DragonFly BSD) don't restrict the
IPv6 sockets to sending and receiving IPv6 packets only. Explicitly
enable the IPV6_V6ONLY socket option on the IPv6 sockets to prevent
failures from using the IPv4-mapped IPv6 address.
The server_send_error_response() function is supposed to be used only
in case of failures and never in case of legitimate requests. Ensure
that ISC_HTTP_ERROR_SUCCESS is never passed there by mistake.
The commit itself is harmless, but at the same time it is also useless,
so we are reverting it.
This reverts commit 11c869a3d53eafa4083b404e6b6686a120919c26.
Previously, the netmgr/udp.c tried to detect the recvmmsg detection in
libuv with #ifdef UV_UDP_<foo> preprocessor macros. However, because
the UV_UDP_<foo> are not preprocessor macros, but enum members, the
detection didn't work. Because the detection didn't work, the code
didn't have access to the information when we received the final chunk
of the recvmmsg and tried to free the uvbuf every time. Fortunately,
the isc__nm_free_uvbuf() had a kludge that detected attempt to free in
the middle of the receive buffer, so the code worked.
However, libuv 1.37.0 changed the way the recvmmsg was enabled from
implicit to explicit, and we checked for yet another enum member
presence with preprocessor macro, so in fact libuv recvmmsg support was
never enabled with libuv >= 1.37.0.
This commit changes to the preprocessor macros to autoconf checks for
declaration, so the detection now works again. On top of that, it's now
possible to cleanup the alloc_cb and free_uvbuf functions because now,
the information whether we can or cannot free the buffer is available to
us.
Clients can cache the TLS certificates and refuse to accept
another one with the same serial number from the same issuer.
Generate a random serial number for the self-signed certificates
instead of using a fixed value.
GnuTLS, NSS, and possibly other TLS libraries currently fail to work
with compressed point conversion form supported by OpenSSL.
Use uncompressed point conversion form for better compatibility.
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
The isc__nm_tcp_resumeread() was using maybe_enqueue function to enqueue
netmgr event which could case the read callback to be executed
immediately if there was enough data waiting in the TCP queue.
If such thing would happen, the read callback would be called before the
previous read callback was finished and the worker receive buffer would
be still marked "in use" causing a assertion failure.
This would affect only raw TCP channels, e.g. rndc and http statistics.
The isc_queue_new() was using dirty tricks to allocate the head and tail
members of the struct aligned to the cacheline. We can now use
isc_mem_get_aligned() to allocate the structure to the cacheline
directly.
Use ISC_OS_CACHELINE_SIZE (64) instead of arbitrary ALIGNMENT (128), one
cacheline size is enough to prevent false sharing.
Cleanup the unused max_threads variable - there was actually no limit on
the maximum number of threads. This was changed a while ago.
The hazard pointers implementation was bit of frivolous with memory
usage allocating memory based on maximum constants rather than on the
usage.
Make the retired list bit use exactly the memory needed for specified
number of hazard pointers. This reduced the memory used by hazard
pointers to one quarter in our specific case because we only use single
HP in the queue implementation (as opposed to allocating memory for
HP_MAX_HPS = 4).
Previously, the alignment to prevent false sharing was double the
cacheline size. This was copied from the ConcurrencyFreaks
implementation, but one cacheline size is enough to prevent false
sharing, so we are using this now to save few bits of memory.
The top level hazard pointers and retired list arrays are now not
aligned to the cacheline size - they are read-only for the whole
life-time of the isc_hp object. Only hp (hazard pointer) and
rl (retired list) array members are allocated aligned to the cacheline
size to avoid false sharing between threads.
Cleanup HP_MAX_HPS and HP_THRESHOLD_R constants from the paper, because
we don't use them in the code. HP_THRESHOLD_R was 0, so the check
whether the retired list size was smaller than the value was basically a
dead code.
There are some situations where having aligned allocations would be
useful, so we don't have to play tricks with padding the data to the
cacheline sizes.
Add isc_mem_{get,put,reget,putanddetach}_aligned() functions that has
alignment and size as last argument mimicking the POSIX posix_memalign()
functions on systems with jemalloc (see the documentation on
MALLOX_ALIGN() for more details). On systems without jemalloc, those
functions are same as non-aligned variants.
Add library ctor and dtor for isc_os compilation unit which initializes
the numbers of the CPUs and also checks whether L1 cacheline size is
really 64 if the sysconf() call is available.
While doing code review, it was found that the taskmgr->exiting is set
under taskmgr->lock, but accessed under taskmgr->excl_lock in the
isc_task_beginexclusive().
Additionally, before the change that moved running the tasks to the
netmgr, the task_ready() subrouting of isc_task_detach() would lock
mgr->lock, requiring the mgr->excl to be protected mgr->excl_lock
to prevent deadlock in the code. After !4918 has been merged, this is
no longer true, and we can remove taskmgr->excl_lock and use
taskmgr->lock in its stead.
Solve both issues by removing the taskmgr->excl_lock and exclusively use
taskmgr->lock to protect both taskmgr->excl and taskmgr->exiting which
now doesn't need to be atomic_bool, because it's always accessed from
within the locked section.
The isc_taskmgr_excltask() would return ISC_R_NOTFOUND either when the
exclusive task was not set (yet) or when the taskmgr is shutting down
and the exclusive task has been already cleared.
Distinguish between the two states and return ISC_R_SHUTTINGDOWN when
the taskmgr is being shut down instead of ISC_R_NOTFOUND.
For consistency with similar functions, rename `pcache` to `cachep`,
call a separate destroy function when references reach 0, and add
a missing call to isc_refcount_destroy().
Using the TLS context cache for server-side contexts could reduce the
number of contexts to initialise in the configurations when e.g. the
same 'tls' entry is used in multiple 'listen-on' statements for the
same DNS transport, binding to multiple IP addresses.
In such a case, only one TLS context will be created, instead of a
context per IP address, which could reduce the initialisation time, as
initialising even a non-ephemeral TLS context introduces some delay,
which can be *visually* noticeable by log activity.
Also, this change lays down a foundation for Mutual TLS (when the
server validates a client certificate, additionally to a client
validating the server), as the TLS context cache can be extended to
store additional data required for validation (like intermediates CA
chain).
Additionally to the above, the change ensures that the contexts are
not being changed after initialisation, as such a practice is frowned
upon. Previously we would set the supported ALPN tags within
isc_nm_listenhttp() and isc_nm_listentlsdns(). We do not do that for
client-side contexts, so that appears to be an overlook. Now we set
the supported ALPN tags right after server-side contexts creation,
similarly how we do for client-side ones.
This commit adds a TLS context object cache implementation. The
intention of having this object is manyfold:
- In the case of client-side contexts: allow reusing the previously
created contexts to employ the context-specific TLS session resumption
cache. That will enable XoT connection to be reestablished faster and
with fewer resources by not going through the full TLS handshake
procedure.
- In the case of server-side contexts: reduce the number of contexts
created on startup. That could reduce startup time in a case when
there are many "listen-on" statements referring to a smaller amount of
`tls` statements, especially when "ephemeral" certificates are
involved.
- The long-term goal is to provide in-memory storage for additional
data associated with the certificates, like runtime
representation (X509_STORE) of intermediate CA-certificates bundle for
Strict TLS/Mutual TLS ("ca-file").
Commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c erroneously introduced
duplicate conditions to several existing conditional statements
responsible for determining error codes passed to connection callbacks
upon failure. Fix the affected expressions to ensure connection
callbacks are invoked with:
- the ISC_R_SHUTTINGDOWN error code when a global netmgr shutdown is
in progress,
- the ISC_R_CANCELED error code when a specific operation has been
canceled.
This does not fix any known bugs, it only adjusts the changes introduced
by commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c so that they match
its original intent.
The SSL_CTX_set_keylog_callback() function is a fairly recent OpenSSL
addition, having first appeared in version 1.1.1. Add a configure.ac
check for the availability of that function to prevent build errors on
older platforms. Sort similar checks alphabetically.
This makes the SSLKEYLOGFILE mechanism a silent no-op on unsupported
platforms, which is considered acceptable for a debugging feature.