mir/bind - bind - Mike's Git repositories

mir/bind

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 21:17:54 +00:00

Author	SHA1	Message	Date
Ondřej Surý	eb8f2974b1	Abort when libuv at runtime mismatches libuv at compile time When we compile with libuv that has some capabilities via flags passed to f.e. uv_udp_listen() or uv_udp_bind(), the call with such flags would fail with invalid arguments when older libuv version is linked at the runtime that doesn't understand the flag that was available at the compile time. Enforce minimal libuv version when flags have been available at the compile time, but are not available at the runtime. This check is less strict than enforcing the runtime libuv version to be same or higher than compile time libuv version.	2022-04-26 11:40:40 +02:00
Ondřej Surý	f55a4d3e55	Allow listening on less than nworkers threads For some applications, it's useful to not listen on full battery of threads. Add workers argument to all isc_nm_listen*() functions and convenience ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL macros.	2022-04-19 11:08:13 +02:00
Artem Boldariev	df317184eb	Add isc_nmsocket_set_tlsctx() This commit adds isc_nmsocket_set_tlsctx() - an asynchronous function that replaces the TLS context within a given TLS-enabled listener socket object. It is based on the newly added reference counting functionality. The intention of adding this function is to add functionality to replace a TLS context without recreating the whole socket object, including the underlying TCP listener socket, as a BIND process might not have enough permissions to re-create it fully on reconfiguration.	2022-04-06 18:45:57 +03:00
Artem Boldariev	25609156a5	Maintain a per-thread TLS ctx reference in TLS stream code This commit changes the generic TLS stream code to maintain a per-worker thread TLS context reference.	2022-04-06 18:45:57 +03:00
Artem Boldariev	9256026d18	Use isc_tlsctx_attach() in TLS DNS code This commit adds proper reference counting for TLS contexts into generic TLS DNS (DoT) code.	2022-04-06 18:45:57 +03:00
Artem Boldariev	b52d46612f	Use isc_tlsctx_attach() in TLS stream code This commit adds proper reference counting for TLS contexts into generic TLS stream code.	2022-04-06 18:45:57 +03:00
Ondřej Surý	142c63dda8	Enable the load-balance-sockets configuration Previously, HAVE_SO_REUSEPORT_LB has been defined only in the private netmgr-int.h header file, making the configuration of load balanced sockets inoperable. Move the missing HAVE_SO_REUSEPORT_LB define the isc/netmgr.h and add missing isc_nm_getloadbalancesockets() implementation.	2022-04-05 01:30:58 +02:00
Ondřej Surý	85c6e797aa	Add option to configure load balance sockets Previously, the option to enable kernel load balancing of the sockets was always enabled when supported by the operating system (SO_REUSEPORT on Linux and SO_REUSEPORT_LB on FreeBSD). It was reported that in scenarios where the networking threads are also responsible for processing long-running tasks (like RPZ processing, CATZ processing or large zone transfers), this could lead to intermitten brownouts for some clients, because the thread assigned by the operating system might be busy. In such scenarious, the overall performance would be better served by threads competing over the sockets because the idle threads can pick up the incoming traffic. Add new configuration option (`load-balance-sockets`) to allow enabling or disabling the load balancing of the sockets.	2022-04-04 23:10:04 +02:00
Ondřej Surý	30e0fd942b	Remove task privileged mode Previously, the task privileged mode has been used only when the named was starting up and loading the zones from the disk as the "first" thing to do. The privileged task was setup with quantum == 2, which made the taskmgr/netmgr spin around the privileged queue processing two events at the time. The same effect can be achieved by setting the quantum to UINT_MAX (e.g. practically unlimited) for the loadzone task, hence the privileged task mode was removed in favor of just processing all the events on the loadzone task in a single task_run().	2022-04-01 23:55:26 +02:00
Ondřej Surý	4dceab142d	Consistenly use UNREACHABLE() instead of ISC_UNREACHABLE() In couple places, we have missed INSIST(0) or ISC_UNREACHABLE() replacement on some branches with UNREACHABLE(). Replace all ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().	2022-03-28 23:26:08 +02:00
Artem Boldariev	783663db80	Add ISC_R_TLSBADPEERCERT error code to the TLS related code This commit adds support for ISC_R_TLSBADPEERCERT error code, which is supposed to be used to signal for TLS peer certificates verification in dig and other code. The support for this error code is added to our TLS and TLS DNS implementations. This commit also adds isc_nm_verify_tls_peer_result_string() function which is supposed to be used to get a textual description of the reason for getting a ISC_R_TLSBADPEERCERT error.	2022-03-28 15:32:30 +03:00
Ondřej Surý	9de10cd153	Remove extrahandle size from netmgr Previously, it was possible to assign a bit of memory space in the nmhandle to store the client data. This was complicated and prevents further refactoring of isc_nmhandle_t caching (future work). Instead of caching the data in the nmhandle, allocate the hot-path ns_client_t objects from per-thread clientmgr memory context and just assign it to the isc_nmhandle_t via isc_nmhandle_set().	2022-03-25 10:38:35 +01:00
Ondřej Surý	584f0d7a7e	Simplify way we tag unreachable code with only ISC_UNREACHABLE() Previously, the unreachable code paths would have to be tagged with: INSIST(0); ISC_UNREACHABLE(); There was also older parts of the code that used comment annotation: /* NOTREACHED */ Unify the handling of unreachable code paths to just use: UNREACHABLE(); The UNREACHABLE() macro now asserts when reached and also uses __builtin_unreachable(); when such builtin is available in the compiler.	2022-03-25 08:33:43 +01:00
Ondřej Surý	fe7ce629f4	Add FALLTHROUGH macro for __attribute__((fallthrough)) Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which is explicit version of the /* FALLTHROUGH / comment we are currently using. Add and apply FALLTHROUGH macro that uses the attribute if available, but does nothing on older compilers. In one case (lib/dns/zone.c), using the macro revealed that we were using the / FALLTHROUGH */ comment in wrong place, remove that comment.	2022-03-25 08:33:43 +01:00
Ondřej Surý	d70daa29f7	Make netmgr the authority on number of threads running Instead of passing the "workers" variable back and forth along with passing the single isc_nm_t instance, add isc_nm_getnworkers() function that returns the number of netmgr threads are running. Change the ns_interfacemgr and ns_taskmgr to utilize the newly acquired knowledge.	2022-03-18 21:53:28 +01:00
Ondřej Surý	bfa4b9c141	Run .closehandle_cb asynchrounosly in nmhandle_detach_cb() When sock->closehandle_cb is set, we need to run nmhandle_detach_cb() asynchronously to ensure correct order of multiple packets processing in the isc__nm_process_sock_buffer(). When not run asynchronously, it would cause: a) out-of-order processing of the return codes from processbuffer(); b) stack growth because the next TCP DNS message read callback will be called from within the current TCP DNS message read callback. The sock->closehandle_cb is set to isc__nm_resume_processing() for TCP sockets which calls isc__nm_process_sock_buffer(). If the read callback (called from isc__nm_process_sock_buffer()->processbuffer()) doesn't attach to the nmhandle (f.e. because it wants to drop the processing or we send the response directly via uv_try_write()), the isc__nm_resume_processing() (via .closehandle_cb) would call isc__nm_process_sock_buffer() recursively. The below shortened code path shows how the stack can grow: 1: ns__client_request(handle, ...); 2: isc_nm_tcpdns_sequential(handle); 3: ns_query_start(client, handle); 4: query_lookup(qctx); 5: query_send(qctcx->client); 6: isc__nmhandle_detach(&client->reqhandle); 7: nmhandle_detach_cb(&handle); 8: sock->closehandle_cb(sock); // isc__nm_resume_processing 9: isc__nm_process_sock_buffer(sock); 10: processbuffer(sock); // isc__nm_tcpdns_processbuffer 11: isc_nmhandle_attach(req->handle, &handle); 12: isc__nm_readcb(sock, req, ISC_R_SUCCESS); 13: isc__nm_async_readcb(NULL, ...); 14: uvreq->cb.recv(...); // ns__client_request Instead, if 'sock->closehandle_cb' is set, we need to run detach the handle asynchroniously in 'isc__nmhandle_detach', so that on line 8 in the code flow above does not start this recursion. This ensures the correct order when processing multiple packets in the function 'isc__nm_process_sock_buffer()' and prevents the stack growth. When not run asynchronously, the out-of-order processing leaves the first TCP socket open until all requests on the stream have been processed. If the pipelining is disabled on the TCP via `keep-response-order` configuration option, named would keep the first socket in lingering CLOSE_WAIT state when the client sends an incomplete packet and then closes the connection from the client side.	2022-03-16 22:11:49 +01:00
Ondřej Surý	6ddac2d56d	On shutdown, reset the established TCP connections Previously, the established TCP connections (both client and server) would be gracefully closed waiting for the write timeout. Don't wait for TCP connections to gracefully shutdown, but directly reset them for faster shutdown.	2022-03-11 09:56:57 +01:00
Ondřej Surý	a761aa59e3	Change single write timer to per-send timers Previously, there was a single per-socket write timer that would get restarted for every new write. This turned out to be insufficient because the other side could keep reseting the timer, and never reading back the responses. Change the single write timer to per-send timer which would in turn reset the TCP connection on the first send timeout.	2022-03-11 09:56:57 +01:00
Ondřej Surý	f251d69eba	Remove usage of deprecated ATOMIC_VAR_INIT() macro The C17 standard deprecated ATOMIC_VAR_INIT() macro (see [1]). Follow the suite and remove the ATOMIC_VAR_INIT() usage in favor of simple assignment of the value as this is what all supported stdatomic.h implementations do anyway: * MacOSX.plaform: #define ATOMIC_VAR_INIT(__v) {__v} * Gcc stdatomic.h: #define ATOMIC_VAR_INIT(VALUE) (VALUE) 1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1138r0.pdf	2022-03-08 23:55:10 +01:00
Ondřej Surý	8098a58581	Set TCP maximum segment size to minimum size of 1220 Previously the socket code would set the TCPv6 maximum segment size to minimum value to prevent IP fragmentation for TCP. This was not yet implemented for the network manager. Implement network manager functions to set and use minimum MTU socket option and set the TCP_MAXSEG socket option for both IPv4 and IPv6 and use those to clamp the TCP maximum segment size for TCP, TCPDNS and TLSDNS layers in the network manager to 1220 bytes, that is 1280 (IPv6 minimum link MTU) minus 40 (IPv6 fixed header) minus 20 (TCP fixed header) We already rely on a similar value for UDP to prevent IP fragmentation and it make sense to use the same value for IPv4 and IPv6 because the modern networks are required to support IPv6 packet sizes. If there's need for small TCP segment values, the MTU on the interfaces needs to be properly configured.	2022-03-08 10:27:05 +01:00
Ondřej Surý	5d34a14f22	Set minimum MTU (1280) on IPv6 sockets The IPV6_USE_MIN_MTU socket option directs the IP layer to limit the IPv6 packet size to the minimum required supported MTU from the base IPv6 specification, i.e. 1280 bytes. Many implementations of TCP running over IPv6 neglect to check the IPV6_USE_MIN_MTU value when performing MSS negotiation and when constructing a TCP segment despite MSS being defined to be the MTU less the IP and TCP header sizes (60 bytes for IPv6). This leads to oversized IPv6 packets being sent resulting in unintended Path Maximum Transport Unit Discovery (PMTUD) being performed and to fragmented IPv6 packets being sent. Add and use a function to set socket option to limit the MTU on IPv6 sockets to the minimum MTU (1280) both for UDP and TCP.	2022-03-08 10:27:05 +01:00
Ondřej Surý	6bd025942c	Replace netievent lock-free queue with simple locked queue The current implementation of isc_queue uses Michael-Scott lock-free queue that in turn uses hazard pointers. It was discovered that the way we use the isc_queue, such complicated mechanism isn't really needed, because most of the time, we either execute the work directly when on nmthread (in case of UDP) or schedule the work from the matching nmthreads. Replace the current implementation of the isc_queue with a simple locked ISC_LIST. There's a slight improvement - since copying the whole list is very lightweight - we move the queue into a new list before we start the processing and locking just for moving the queue and not for every single item on the list. NOTE: There's a room for future improvements - since we don't guarantee the order in which the netievents are processed, we could have two lists - one unlocked that would be used when scheduling the work from the matching thread and one locked that would be used from non-matching thread.	2022-03-04 13:49:51 +01:00
Ondřej Surý	b220fb32bd	Handle TCP sockets in isc__nmsocket_reset() The isc__nmsocket_reset() was missing a case for raw TCP sockets (used by RNDC and DoH) which would case a assertion failure when write timeout would be triggered. TCP sockets are now also properly handled in isc__nmsocket_reset().	2022-02-28 02:06:03 -08:00
Ondřej Surý	ecf042991c	Fix typo __SANITIZE_ADDRESS -> __SANITIZE_ADDRESS__ When checking for Address Sanitizer to disable the inactivehandles caching, there was a typo in the macro.	2022-02-24 00:15:16 +01:00
Ondřej Surý	be339b3c83	Disable inactive uvreqs caching when compiled with sanitizers When isc__nm_uvreq_t gets deactivated, it could be just put onto array stack to be reused later to save some initialization time. Unfortunately, this might hide some use-after-free errors. Disable the inactive uvreqs caching when compiled with Address or Thread Sanitizer.	2022-02-24 00:15:16 +01:00
Ondřej Surý	92cce1da65	Disable inactive handles caching when compiled with sanitizers When isc_nmhandle_t gets deactivated, it could be just put onto array stack to be reused later to safe some initialization time. Unfortunately, this might hide some use-after-free errors. Disable the inactive handles caching when compiled with Address or Thread Sanitizer.	2022-02-23 23:21:29 +01:00
Ondřej Surý	e2555a306f	Remove active handles tracking from isc__nmsocket_t The isc__nmsocket_t has locked array of isc_nmhandle_t that's not used for anything. The isc__nmhandle_get() adds the isc_nmhandle_t to the locked array (and resized if necessary) and removed when isc_nmhandle_put() finally destroys the handle. That's all it does, so it serves no useful purpose. Remove the .ah_handles, .ah_size, and .ah_frees members of the isc__nmsocket_t and .ah_pos member of the isc_nmhandle_t struct.	2022-02-23 22:54:47 +01:00
Ondřej Surý	3268627916	Delay isc__nm_uvreq_t deallocation to connection callback When the TCP, TCPDNS or TLSDNS connection times out, the isc__nm_uvreq_t would be pushed into sock->inactivereqs before the uv_tcp_connect() callback finishes. Because the isc__nmsocket_t keeps the list of inactive isc__nm_uvreq_t, this would cause use-after-free only when the sock->inactivereqs is full (which could never happen because the failure happens in connection timeout callback) or when the sock->inactivereqs mechanism is completely removed (f.e. when running under Address or Thread Sanitizer). Delay isc__nm_uvreq_t deallocation to the connection callback and only signal the connection callback should be called by shutting down the libuv socket from the connection timeout callback.	2022-02-23 22:54:47 +01:00
Ondřej Surý	88418c3372	Properly free up enqueued netievents in nm_destroy() When the isc_netmgr is being destroyed, the normal and priority queues should be dequeued and netievents properly freed. This wasn't the case.	2022-02-23 22:51:12 +01:00
Ondřej Surý	d01562f22b	Remove the keep-response-order ACL map The keep-response-order option has been obsoleted, and in this commit, remove the keep-response-order ACL map rendering the option no-op, the call the isc_nm_sequential() and the now unused isc_nm_sequential() function itself.	2022-02-18 09:16:03 +01:00
Ondřej Surý	4f5b4662b6	Remove the limit on the number of simultaneous TCP queries There was an artificial limit of 23 on the number of simultaneous pipelined queries in the single TCP connection. The new network managers is capable of handling "unlimited" (limited only by the TCP read buffer size ) queries similar to "unlimited" handling of the DNS queries receive over UDP. Don't limit the number of TCP queries that we can process within a single TCP read callback.	2022-02-17 16:19:12 -08:00
Ondřej Surý	3c7b04d015	Add network manager based timer API This commits adds API that allows to create arbitrary timers associated with the network manager handles.	2022-02-17 21:38:17 +01:00
Ondřej Surý	4716c56ebb	Reset the TCP connection when garbage is received When invalid DNS message is received, there was a handling mechanism for DoH that would be called to return proper HTTP response. Reuse this mechanism and reset the TCP connection when the client is blackholed, DNS message is completely bogus or the ns_client receives response instead of query.	2022-02-17 20:39:55 +01:00
Ondřej Surý	a89d9e0fa6	Add isc_nmhandle_setwritetimeout() function In some situations (unit test and forthcoming XFR timeouts MR), we need to modify the write timeout independently of the read timeout. Add a isc_nmhandle_setwritetimeout() function that could be called before isc_nm_send() to specify a custom write timeout interval.	2022-02-17 09:06:58 +01:00
Ondřej Surý	408b362169	Add TCP, TCPDNS and TLSDNS write timer When the outgoing TCP write buffers are full because the other party is not reading the data, the uv_write() could wait indefinitely on the uv_loop and never calling the callback. Add a new write timer that uses the `tcp-idle-timeout` value to interrupt the TCP connection when we are not able to send data for defined period of time.	2022-02-17 09:06:58 +01:00
Ondřej Surý	cd3b58622c	Add uv_tcp_close_reset compat The uv_tcp_close_reset() function was added in libuv 1.32.0 and since we support older libuv releases, we have to add a shim uv_tcp_close_reset() implementation loosely based on libuv.	2022-02-17 09:06:58 +01:00
Ondřej Surý	45a73c113f	Rename sock->timer to sock->read_timer Before adding the write timer, we have to remove the generic sock->timer to sock->read_timer. We don't touch the function names to limit the impact of the refactoring.	2022-02-17 09:06:58 +01:00
Ondřej Surý	8715be1e4b	Use UV_RUNTIME_CHECK() as appropriate Replace the RUNTIME_CHECK() calls for libuv API calls with UV_RUNTIME_CHECK() to get more detailed error message when something fails and should not.	2022-02-16 11:16:57 +01:00
Ondřej Surý	62e15bb06d	Add UV_RUNTIME_CHECK() macro to print uv_strerror() When libuv functions fail, they return correct return value that could be useful for more detailed debugging. Currently, we usually just check whether the return value is 0 and invoke assertion error if it doesn't throwing away the details why the call has failed. Unfortunately, this often happen on more exotic platforms. Add a UV_RUNTIME_CHECK() macro that can be used to print more detailed error message (via uv_strerror() before ending the execution of the program abruptly with the assertion.	2022-02-16 11:16:57 +01:00
Ondřej Surý	2ae84702ad	Add log message when hard quota is reached in TCP accept When isc_quota_attach_cb() API returns ISC_R_QUOTA (meaning hard quota was reached) the accept_connection() would return without logging a message about quota reached. Change the connection callback to log the quota reached message.	2022-02-01 21:00:05 +01:00
Ondřej Surý	b5e086257d	Explicitly enable IPV6_V6ONLY on the netmgr sockets Some operating systems (OpenBSD and DragonFly BSD) don't restrict the IPv6 sockets to sending and receiving IPv6 packets only. Explicitly enable the IPV6_V6ONLY socket option on the IPv6 sockets to prevent failures from using the IPv4-mapped IPv6 address.	2022-01-17 22:16:27 +01:00
Evan Hunt	be0bc24c7f	add UV_ENOTSUP to isc___nm_uverr2result() This error code is now mapped to ISC_R_FAMILYNOSUPPORT.	2022-01-17 11:45:10 +01:00
Artem Boldariev	ca9fe3559a	DoH: ensure that server_send_error_response() is used properly The server_send_error_response() function is supposed to be used only in case of failures and never in case of legitimate requests. Ensure that ISC_HTTP_ERROR_SUCCESS is never passed there by mistake.	2022-01-14 16:00:42 +02:00
Artem Boldariev	a38b4945c1	DoH: add bad HTTP/2 requests logging Add some error logging when facing bad requests over HTTP/2. Log the address and the error description.	2022-01-14 16:00:42 +02:00
Ondřej Surý	0a4e91ee47	Revert "Always enqueue isc__nm_tcp_resumeread()" The commit itself is harmless, but at the same time it is also useless, so we are reverting it. This reverts commit 11c869a3d53eafa4083b404e6b6686a120919c26.	2022-01-13 19:06:39 +01:00
Ondřej Surý	7370725008	Fix the UDP recvmmsg support Previously, the netmgr/udp.c tried to detect the recvmmsg detection in libuv with #ifdef UV_UDP_<foo> preprocessor macros. However, because the UV_UDP_<foo> are not preprocessor macros, but enum members, the detection didn't work. Because the detection didn't work, the code didn't have access to the information when we received the final chunk of the recvmmsg and tried to free the uvbuf every time. Fortunately, the isc__nm_free_uvbuf() had a kludge that detected attempt to free in the middle of the receive buffer, so the code worked. However, libuv 1.37.0 changed the way the recvmmsg was enabled from implicit to explicit, and we checked for yet another enum member presence with preprocessor macro, so in fact libuv recvmmsg support was never enabled with libuv >= 1.37.0. This commit changes to the preprocessor macros to autoconf checks for declaration, so the detection now works again. On top of that, it's now possible to cleanup the alloc_cb and free_uvbuf functions because now, the information whether we can or cannot free the buffer is available to us.	2022-01-13 19:06:39 +01:00
Ondřej Surý	58bd26b6cf	Update the copyright information in all files in the repository This commit converts the license handling to adhere to the REUSE specification. It specifically: 1. Adds used licnses to LICENSES/ directory 2. Add "isc" template for adding the copyright boilerplate 3. Changes all source files to include copyright and SPDX license header, this includes all the C sources, documentation, zone files, configuration files. There are notes in the doc/dev/copyrights file on how to add correct headers to the new files. 4. Handle the rest that can't be modified via .reuse/dep5 file. The binary (or otherwise unmodifiable) files could have license places next to them in <foo>.license file, but this would lead to cluttered repository and most of the files handled in the .reuse/dep5 file are system test files.	2022-01-11 09:05:02 +01:00
Ondřej Surý	11c869a3d5	Always enqueue isc__nm_tcp_resumeread() The isc__nm_tcp_resumeread() was using maybe_enqueue function to enqueue netmgr event which could case the read callback to be executed immediately if there was enough data waiting in the TCP queue. If such thing would happen, the read callback would be called before the previous read callback was finished and the worker receive buffer would be still marked "in use" causing a assertion failure. This would affect only raw TCP channels, e.g. rndc and http statistics.	2022-01-06 10:34:04 -08:00
Ondřej Surý	6269fce0fe	Use isc_mem_get_aligned() for isc_queue and cleanup max_threads The isc_queue_new() was using dirty tricks to allocate the head and tail members of the struct aligned to the cacheline. We can now use isc_mem_get_aligned() to allocate the structure to the cacheline directly. Use ISC_OS_CACHELINE_SIZE (64) instead of arbitrary ALIGNMENT (128), one cacheline size is enough to prevent false sharing. Cleanup the unused max_threads variable - there was actually no limit on the maximum number of threads. This was changed a while ago.	2022-01-05 17:10:58 +01:00
Artem Boldariev	5b7d4341fe	Use the TLS context cache for server-side contexts Using the TLS context cache for server-side contexts could reduce the number of contexts to initialise in the configurations when e.g. the same 'tls' entry is used in multiple 'listen-on' statements for the same DNS transport, binding to multiple IP addresses. In such a case, only one TLS context will be created, instead of a context per IP address, which could reduce the initialisation time, as initialising even a non-ephemeral TLS context introduces some delay, which can be visually noticeable by log activity. Also, this change lays down a foundation for Mutual TLS (when the server validates a client certificate, additionally to a client validating the server), as the TLS context cache can be extended to store additional data required for validation (like intermediates CA chain). Additionally to the above, the change ensures that the contexts are not being changed after initialisation, as such a practice is frowned upon. Previously we would set the supported ALPN tags within isc_nm_listenhttp() and isc_nm_listentlsdns(). We do not do that for client-side contexts, so that appears to be an overlook. Now we set the supported ALPN tags right after server-side contexts creation, similarly how we do for client-side ones.	2021-12-29 10:25:14 +02:00

1 2 3 4 5 ...

425 Commits