mir/bind - bind - Mike's Git repositories

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 13:08:06 +00:00

Author	SHA1	Message	Date
Ondřej Surý	15329d471e	Add memory pools for isc_nmsocket_t structures To reduce memory pressure, we can add light per-loop (netmgr worker) memory pools for isc_nmsocket_t structures. This will help in situations where there's a lot of churn creating and destroying the nmsockets.	2024-02-08 15:13:47 +01:00
Mark Andrews	decc17d3b0	Ineffective DbC protections Dereference before NULL checks. Thanks to Eric Sesterhenn from X41 D-Sec GmbH for reporting this.	2023-11-21 14:48:43 +11:00
Ondřej Surý	55c29b8d83	Do extra manual isc_mem_cget() conversions Some of the cases weren't caught by the coccinelle and there were some places where cget+memmove() could get converted to simple creget().	2023-08-31 22:08:35 +02:00
Ondřej Surý	f36e118b9a	Limit the number of inactive handles kept for reuse Instead of growing and never shrinking the list of the inactive handles (to be reused mostly on the UDP connections), limit the maximum number of inactive handles kept to 64. Instead of caching the inactive handles for all listening sockets, enable the caching only on UDP listening sockets. For TCP, the handles were cached for each accepted socket thus reusing the handles only for long-standing TCP connections, but not reusing the handles across different TCP streams.	2023-08-21 16:34:30 +02:00
Ondřej Surý	3b10814569	Fix the streaming read callback shutdown logic When shutting down TCP sockets, the read callback calling logic was flawed; it would call either one less callback or one extra. Fix the logic in the following way: 1. When isc_nm_read() has been called but isc_nm_read_stop() hasn't on the handle, the read callback will be called with ISC_R_CANCELED to cancel active reading from the socket/handle. 2. When isc_nm_read() has been called and isc_nm_read_stop() has been called on the handle, the read callback will be called with ISC_R_SHUTTINGDOWN to signal that the dormant (not-reading) socket is being shut down. 3. The .reading and .recv_read flags are a little bit tricky. The .reading flag indicates if the outer layer is reading the data (that would be uv_tcp_t for TCP and isc_nmsocket_t (TCP) for TLSStream), the .recv_read flag indicates whether somebody is interested in the data read from the socket. Usually, you would expect that the .reading should be false when .recv_read is false, but it gets even more tricky with TLSStream as the TLS protocol might need to read from the socket even when sending data. Fix the usage of the .recv_read and .reading flags in the TLSStream to their true meaning - which mostly consists of using .recv_read everywhere and then wrapping isc_nm_read() and isc_nm_read_stop() with the .reading flag. 4. The TLS failed read helper has been modified to resemble the TCP code as much as possible, clearing and re-setting the .recv_read flag in the TCP timeout code has been fixed and .recv_read is now cleared when isc_nm_read_stop() has been called on the streaming socket. 5. The use of Network Manager in the named_controlconf, isccc_ccmsg, and isc_httpd units has been greatly simplified due to the improved design. 6. More unit tests for TCP and TLS testing the shutdown conditions have been added. Co-authored-by: Ondřej Surý <ondrej@isc.org> Co-authored-by: Artem Boldariev <artem@isc.org>	2023-04-20 12:58:32 +02:00
Ondřej Surý	1715cad685	Refactor the isc_quota code and fix the quota in TCP accept code In e18541287231b721c9cdb7e492697a2a80fd83fc, the TCP accept quota code became broken in a subtle way - the quota would get initialized on the first accept for the server socket and then deleted from the server socket, so it would never get applied again. Properly fixing this required a bigger refactoring of the isc_quota API code to make it much simpler. The new code decouples the ownership of the quota and acquiring/releasing the quota limit. After (during) the refactoring it became more clear that we need to use the callback from the child side of the accepted connection, and not the server side.	2023-04-12 14:10:37 +02:00
Ondřej Surý	2846888c57	Attach the accept "client" socket to .listener member of the socket When accepting a TCP connection in the higher layers (tlsstream, streamdns, and http) attach to the socket the connection was accepted on, and use this socket instead of the parent listening socket. This has an advantage - accessing the sock->listener now doesn't break the thread boundaries, so we can properly check whether the socket is being closed without requiring .closing member to be atomic_bool.	2023-03-30 16:10:08 +02:00
Ondřej Surý	45365adb32	Convert sock->active to non-atomic variable, cleanup rchildren The last atomic_bool variable sock->active was converted to non-atomic bool by properly handling the listening socket case where we were checking parent socket instead of children sockets. This is no longer necessary as we properly set the .active to false on the children sockets. Additionally, cleanup the .rchildren - the atomic variable was used for mutex+condition to block until all children were listening, but that's now being handled by a barrier. Finally, just remove dead .self and .active_child_connections members of the netmgr socket.	2023-03-30 16:10:08 +02:00
Ondřej Surý	e1a4572fd6	Refactor the use of atomics in netmgr Now that everything runs on their own loop and we don't cross the thread boundaries (with few exceptions), most of the atomic_bool variables used to track the socket state have been unatomicized because they are always accessed from the matching thread. The remaining few have been relaxed: a) the sock->active is now using acquire/release memory ordering; b) the various global limits are now using relaxed memory ordering - we don't really care about the synchronization for those.	2023-03-30 16:10:08 +02:00
Ondřej Surý	639d5065a3	Refactor the isc__nm_uvreq_t to have idle callback Change the isc__nm_uvreq_t to have the idle callback as a separate member as we always need to use it to properly close the uvreq. Slightly refactor uvreq_put and uvreq_get to remove the unneeded arguments - in uvreq_get(), we always use sock->worker, and in uvreq_put, we always use req->sock, so there's no reason to pass those extra arguments.	2023-03-29 21:16:44 +02:00
Ondřej Surý	1baffb6ff5	Convert canceling UDP socket to isc_async callback Simplify the canceling of the UDP socket by using the isc_async API from the loopmgr instead of using the asynchronous netievent mechanism in the netmgr.	2023-03-24 07:58:52 +01:00
Ondřej Surý	8cb4cfd9db	Convert stopping UDP children to isc_async callback Simplify the stopping of the UDP children by using the isc_async API from the loopmgr instead of using the asynchronous netievent mechanism in the netmgr.	2023-03-24 07:58:52 +01:00
Ondřej Surý	b25dd5eaf5	Convert starting UDP children to isc_async callback Simplify the starting of the UDP children by using the isc_async API from the loopmgr instead of using the asynchronous netievent mechanism in the netmgr.	2023-03-24 07:58:52 +01:00
Mark Andrews	b74dd2e8c2	Use INSIST rather than REQUIRE to meet DbC usage rules	2023-01-20 11:05:24 +11:00
Mark Andrews	624f5a0dae	isc_nm_listenudp: treat socket failures gracefully The old code didn't handle race conditions and errors on systems with non load balancing sockets gracefully. Look for an error on any child socket and if found close all the child sockets and return an error.	2023-01-20 11:05:24 +11:00
Ondřej Surý	d06602f036	Get rid of locking during UDP and TCP listen We already have a synchronization mechanism when starting the UDP and TCP listener children - barriers. Change how we start the first-born child (tid == 0), so we don't have to race for sock->parent->result and sock->parent->fd.	2023-01-11 07:17:46 +01:00
Ondřej Surý	5bbba0d1a1	Simplify tracing the reference counting in isc_netmgr Always track the per-worker sockets in the .active_sockets field in the isc__networker_t struct and always track the per-socket handles in the .active_handles field in the isc_nmsocket_t struct.	2023-01-10 19:57:39 +01:00
Evan Hunt	9c577e10c3	use separate barriers for "stop" and "listen" operations On some platforms, when a synchronizing barrier is cleared, one thread can progress while other threads are still in the process of releasing the barrier. If a barrier is reused by the progressing thread during this window, it can cause a deadlock. This can occur if, for example, we stop listening immediately after we start, because the stop and listen functions both use socket->barrier. This has been addressed by using separate barrier objects for stop and listen.	2023-01-07 16:30:21 -08:00
Michal Nowak	afdb41a5aa	Update sources to Clang 15 formatting	2022-11-29 08:54:34 +01:00
Ondřej Surý	f3004da3a5	Make the netmgr send callback to be asynchronous only when needed Previously, the send callback would be synchronous only on success. Add an option (similar to what other callbacks have) to decide whether we need the asynchronous send callback on a higher level. On a general level, we need the asynchronous callbacks to happen only when we are invoking the callback from the public API. If the path to the callback went through the libuv callback or netmgr callback, we are already on asynchronous path, and there's no need to make the call to the callback asynchronous again. For the send callback, this means we need the asynchronous path for failure paths inside the isc_nm_send() (which calls isc__nm_udp_send(), isc__nm_tcp_send(), etc...) - all other invocations of the send callback could be synchronous, because those are called from the respective libuv send callbacks.	2022-11-25 15:46:25 +01:00
Ondřej Surý	5ca49942a3	Make the netmgr read callback to be asynchronous only when needed Previously, the read callback would be synchronous only on success or timeout. Add an option (similar to what other callbacks have) to decide whether we need the asynchronous read callback on a higher level. On a general level, we need the asynchronous callbacks to happen only when we are invoking the callback from the public API. If the path to the callback went through the libuv callback or netmgr callback, we are already on asynchronous path, and there's no need to make the call to the callback asynchronous again. For the read callback, this means we need the asynchronous path for failure paths inside the isc_nm_read() (which calls isc__nm_udp_read(), isc__nm_tcp_read(), etc...) - all other invocations of the read callback could be synchronous, because those are called from the respective libuv or netmgr read callbacks.	2022-11-25 15:46:15 +01:00
Artem Boldariev	5ab2c0ebb3	Synchronise stop listening operation for multi-layer transports This commit introduces a primitive isc__nmsocket_stop() which performs shutting down on a multilayered socket ensuring the proper order of the operations. The shared data within the socket object can be destroyed after the call completed, as it is guaranteed to not be used from within the context of other worker threads.	2022-10-18 12:06:00 +03:00
Ondřej Surý	b6b7a6886a	Don't set load-balancing socket option on the UDP connect sockets The isc_nm_udpconnect() erroneously set the reuse port with load-balancing on the outgoing connected UDP sockets. This socket option makes only sense for the listening sockets. Don't set the load-balancing reuse port option on the outgoing UDP sockets.	2022-10-12 15:36:25 +02:00
Ondřej Surý	c1d26b53eb	Add and use semantic patch to replace isc_mem_get/allocate+memset Add new semantic patch to replace the straightforward uses of: ptr = isc_mem_{get,allocate}(..., size); memset(ptr, 0, size); with the new API call: ptr = isc_mem_{get,allocate}x(..., size, ISC_MEM_ZERO);	2022-10-05 16:44:05 +02:00
Ondřej Surý	173c352452	Call the isc__nm_udp_send() callbacks asynchronously on shutdown The isc__nm_udp_send() callback would be called synchronously when shutting down or when the socket has been closed. This could lead to double locking in the calling code and thus those callbacks need to be called asynchronously.	2022-09-29 11:06:58 +02:00
Ondřej Surý	0086ebf3fc	Bump the libuv requirement to libuv >= 1.34.0 By bumping the minimum libuv version to 1.34.0, it allows us to remove all libuv shims we ever had and makes the code much cleaner. The up-to-date libuv is available in all distributions supported by BIND 9.19+ either natively or as a backport.	2022-09-27 17:09:10 +02:00
Ondřej Surý	fffd444440	Cleanup the asynchronous code in the stream implementations After the loopmgr work has been merged, we can now cleanup the TCP and TLS protocols a little bit, because there are stronger guarantees that the sockets will be kept on the respective loops/threads. We only need asynchronous call for listening sockets (start, stop) and reading from the TCP (because the isc_nm_read() might be called from read callback again). This commit makes the following changes (they are intertwined together): 1. Cleanup most of the asynchronous events in the TCP code, and add comments for the events that need to be kept asynchronous. 2. Remove isc_nm_resumeread() from the netmgr API, and replace isc_nm_resumeread() calls with existing isc_nm_read() calls. 3. Remove isc_nm_pauseread() from the netmgr API, and replace isc_nm_pauseread() calls with a new isc_nm_read_stop() call. 4. Disable the isc_nm_cancelread() for the streaming protocols, only the datagram-like protocols can use isc_nm_cancelread(). 5. Add isc_nmhandle_close() that can be used to shutdown the socket earlier than after the last detach. Formerly, the socket would be closed only after all reading and sending would be finished and the last reference would be detached. The new isc_nmhandle_close() can be used to close the underlying socket earlier, so all the other asynchronous calls would call their respective callbacks immediately. Co-authored-by: Ondřej Surý <ondrej@isc.org> Co-authored-by: Artem Boldariev <artem@isc.org>	2022-09-22 14:51:15 +02:00
Ondřej Surý	f6e4f620b3	Use the semantic patch to do the unsigned -> unsigned int change Apply the semantic patch on the whole code base to get rid of 'unsigned' usage in favor of explicit 'unsigned int'.	2022-09-19 15:56:02 +02:00
Ondřej Surý	9b8d432403	Reorder the uv_close() calls to close the socket immediately Simplify the closing code - during the loopmgr implementation, it was discovered that the various lists used by the uv_loop_t aren't FIFO, but LIFO. See doc/dev/libuv.md for more details. With this knowledge, we can close the protocol handles (uv_udp_t and uv_tcp_t) and uv_timer_t at the same time by reordering the uv_close() calls, and thus making sure that after calling the isc__nm_stoplistening(), the code will not issue any additional callback calls (accept, read) on the socket that stopped listening. This might help with the TLS and DoH shutting down sequence as described in [GL #3509] as we now stop the reading, stop the timer and call the uv_close() as early as possible.	2022-09-19 14:38:56 +02:00
Ondřej Surý	eac8bc5c1a	Prevent unexpected UDP client read callbacks The network manager UDP code was misinterpreting when the libuv called the udp_recv_cb with nrecv == 0 and addr == NULL -> this doesn't really mean that the "stream" has ended, but the libuv indicates that the receive buffer can be freed. This could lead to assertion failure in the code that calls isc_nm_read() from the network manager read callback due to the extra spurious callbacks. Properly handle the extra callback calls from the libuv in the client read callback, and refactor the UDP isc_nm_read() implementation to be synchronous, so no datagram is lost between the time that we stop the reading from the UDP socket and we restart it again in the asynchronous udpread event. Add a unit test that tests the isc_nm_read() call from the read callback to receive two datagrams.	2022-09-19 12:20:41 +02:00
Ondřej Surý	718e92c31a	Clear the callbacks when isc_nm_stoplistening() is called When we are closing the listening sockets, there's a time window in which the TCP connection could be accepted although the respective stoplistening function has already returned control to the caller. Clear the accept callback function early, so it doesn't get called when we are not interested in the incoming connections anymore.	2022-08-26 09:09:25 +02:00
Ondřej Surý	b69e783164	Update netmgr, tasks, and applications to use isc_loopmgr Previously: * applications were using isc_app as the base unit for running the application and signal handling. * networking was handled in the netmgr layer, which would start a number of threads, each with a uv_loop event loop. * task/event handling was done in the isc_task unit, which used netmgr event loops to run the isc_event calls. In this refactoring: * the network manager now uses isc_loop instead of maintaining its own worker threads and event loops. * the taskmgr that manages isc_task instances now also uses isc_loopmgr, and every isc_task runs on a specific isc_loop bound to the specific thread. * applications have been updated as necessary to use the new API. * new ISC_LOOP_TEST macros have been added to enable unit tests to run isc_loop event loops. unit tests have been updated to use this where needed.	2022-08-26 09:09:24 +02:00
Ondřej Surý	a280855f7b	Handle the transient TCP connect() failures on FreeBSD On FreeBSD (and perhaps other *BSD) systems, the TCP connect() call (via uv_tcp_connect()) can fail with transient UV_EADDRINUSE error. The UDP code already handles this by trying three times (is a charm) before giving up. Add code for the TCP, TCPDNS and TLSDNS layers to also try three times before giving up by calling uv_tcp_connect() from the callback two more times on UV_EADDRINUSE error. Additionally, stop the timer only if we succeed or on hard error via isc__nm_failed_connect_cb().	2022-07-14 14:20:10 +02:00
Ondřej Surý	b432d5d3bc	Gracefully handle uv_read_start() failures Under specific rare timing circumstances the uv_read_start() could fail with UV_EINVAL when the connection is reset between the connect (or accept) and the uv_read_start() call on the nmworker loop. Handle such situation gracefully by propagating the errors from uv_read_start() into upper layers, so the socket can be internally closed().	2022-06-14 11:33:02 +02:00
Ondřej Surý	b43812692d	Move netmgr/uv-compat.h to <isc/uv.h> As we are going to use libuv outside of the netmgr, we need the shims to be readily available for the rest of the codebase. Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to uv.c, and as a rule of thumb, the users of libuv should include <isc/uv.h> instead of <uv.h> directly. Additionally, merge netmgr/uverr2result.c into uv.c and rename the single function from isc__nm_uverr2result() to isc_uverr2result().	2022-05-03 10:02:19 +02:00
Ondřej Surý	24c3879675	Move socket related functions to netmgr/socket.c Move the netmgr socket related functions from netmgr/netmgr.c and netmgr/uv-compat.c to netmgr/socket.c, so they are all present in the same place. Adjust the names of a couple of internal functions accordingly.	2022-05-03 09:52:49 +02:00
Ondřej Surý	407b37c3f2	Set IP(V6)_RECVERR on connect UDP sockets (via libuv) The connect()ed UDP socket provides feedback on a variety of ICMP errors (eg port unreachable) which bind can then use to decide what to do with errors (report them to the client, try again with a different nameserver etc). However, Linux's implementation does not report what it considers "transient" conditions, which is defined as Destination host Unreachable, Destination network unreachable, Source Route Failed and Message Too Big. Explicitly enable IP_RECVERR / IPV6_RECVERR (via libuv uv_udp_bind() flag) to learn about ICMP destination network/host unreachable.	2022-04-26 12:22:18 +02:00
Ondřej Surý	f55a4d3e55	Allow listening on less than nworkers threads For some applications, it's useful to not listen on full battery of threads. Add workers argument to all isc_nm_listen*() functions and convenience ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL macros.	2022-04-19 11:08:13 +02:00
Ondřej Surý	85c6e797aa	Add option to configure load balance sockets Previously, the option to enable kernel load balancing of the sockets was always enabled when supported by the operating system (SO_REUSEPORT on Linux and SO_REUSEPORT_LB on FreeBSD). It was reported that in scenarios where the networking threads are also responsible for processing long-running tasks (like RPZ processing, CATZ processing or large zone transfers), this could lead to intermittent brownouts for some clients, because the thread assigned by the operating system might be busy. In such scenarios, the overall performance would be better served by threads competing over the sockets because the idle threads can pick up the incoming traffic. Add new configuration option (`load-balance-sockets`) to allow enabling or disabling the load balancing of the sockets.	2022-04-04 23:10:04 +02:00
Ondřej Surý	9de10cd153	Remove extrahandle size from netmgr Previously, it was possible to assign a bit of memory space in the nmhandle to store the client data. This was complicated and prevents further refactoring of isc_nmhandle_t caching (future work). Instead of caching the data in the nmhandle, allocate the hot-path ns_client_t objects from per-thread clientmgr memory context and just assign it to the isc_nmhandle_t via isc_nmhandle_set().	2022-03-25 10:38:35 +01:00
Ondřej Surý	584f0d7a7e	Simplify way we tag unreachable code with only ISC_UNREACHABLE() Previously, the unreachable code paths would have to be tagged with: INSIST(0); ISC_UNREACHABLE(); There were also older parts of the code that used comment annotation: /* NOTREACHED */ Unify the handling of unreachable code paths to just use: UNREACHABLE(); The UNREACHABLE() macro now asserts when reached and also uses __builtin_unreachable(); when such builtin is available in the compiler.	2022-03-25 08:33:43 +01:00
Ondřej Surý	a761aa59e3	Change single write timer to per-send timers Previously, there was a single per-socket write timer that would get restarted for every new write. This turned out to be insufficient because the other side could keep resetting the timer, and never reading back the responses. Change the single write timer to per-send timer which would in turn reset the TCP connection on the first send timeout.	2022-03-11 09:56:57 +01:00
Ondřej Surý	5d34a14f22	Set minimum MTU (1280) on IPv6 sockets The IPV6_USE_MIN_MTU socket option directs the IP layer to limit the IPv6 packet size to the minimum required supported MTU from the base IPv6 specification, i.e. 1280 bytes. Many implementations of TCP running over IPv6 neglect to check the IPV6_USE_MIN_MTU value when performing MSS negotiation and when constructing a TCP segment despite MSS being defined to be the MTU less the IP and TCP header sizes (60 bytes for IPv6). This leads to oversized IPv6 packets being sent resulting in unintended Path Maximum Transport Unit Discovery (PMTUD) being performed and to fragmented IPv6 packets being sent. Add and use a function to set socket option to limit the MTU on IPv6 sockets to the minimum MTU (1280) both for UDP and TCP.	2022-03-08 10:27:05 +01:00
Ondřej Surý	408b362169	Add TCP, TCPDNS and TLSDNS write timer When the outgoing TCP write buffers are full because the other party is not reading the data, the uv_write() could wait indefinitely on the uv_loop and never call the callback. Add a new write timer that uses the `tcp-idle-timeout` value to interrupt the TCP connection when we are not able to send data for a defined period of time.	2022-02-17 09:06:58 +01:00
Ondřej Surý	45a73c113f	Rename sock->timer to sock->read_timer Before adding the write timer, we have to rename the generic sock->timer to sock->read_timer. We don't touch the function names to limit the impact of the refactoring.	2022-02-17 09:06:58 +01:00
Ondřej Surý	8715be1e4b	Use UV_RUNTIME_CHECK() as appropriate Replace the RUNTIME_CHECK() calls for libuv API calls with UV_RUNTIME_CHECK() to get more detailed error message when something fails and should not.	2022-02-16 11:16:57 +01:00
Ondřej Surý	b5e086257d	Explicitly enable IPV6_V6ONLY on the netmgr sockets Some operating systems (OpenBSD and DragonFly BSD) don't restrict the IPv6 sockets to sending and receiving IPv6 packets only. Explicitly enable the IPV6_V6ONLY socket option on the IPv6 sockets to prevent failures from using the IPv4-mapped IPv6 address.	2022-01-17 22:16:27 +01:00
Ondřej Surý	7370725008	Fix the UDP recvmmsg support Previously, the netmgr/udp.c tried to detect the recvmmsg support in libuv with #ifdef UV_UDP_<foo> preprocessor macros. However, because the UV_UDP_<foo> are not preprocessor macros, but enum members, the detection didn't work. Because the detection didn't work, the code didn't have access to the information when we received the final chunk of the recvmmsg and tried to free the uvbuf every time. Fortunately, the isc__nm_free_uvbuf() had a kludge that detected attempt to free in the middle of the receive buffer, so the code worked. However, libuv 1.37.0 changed the way the recvmmsg was enabled from implicit to explicit, and we checked for yet another enum member presence with preprocessor macro, so in fact libuv recvmmsg support was never enabled with libuv >= 1.37.0. This commit changes the preprocessor macros to autoconf checks for declaration, so the detection now works again. On top of that, it's now possible to cleanup the alloc_cb and free_uvbuf functions because now, the information whether we can or cannot free the buffer is available to us.	2022-01-13 19:06:39 +01:00
Ondřej Surý	58bd26b6cf	Update the copyright information in all files in the repository This commit converts the license handling to adhere to the REUSE specification. It specifically: 1. Adds used licenses to LICENSES/ directory 2. Adds "isc" template for adding the copyright boilerplate 3. Changes all source files to include copyright and SPDX license header, this includes all the C sources, documentation, zone files, configuration files. There are notes in the doc/dev/copyrights file on how to add correct headers to the new files. 4. Handles the rest that can't be modified via .reuse/dep5 file. The binary (or otherwise unmodifiable) files could have licenses placed next to them in <foo>.license file, but this would lead to cluttered repository and most of the files handled in the .reuse/dep5 file are system test files.	2022-01-11 09:05:02 +01:00
Evan Hunt	8c51a32e5c	netmgr: add isc_nm_routeconnect() isc_nm_routeconnect() opens a route/netlink socket, then calls a connect callback, much like isc_nm_udpconnect(), with a handle that can then be monitored for network changes. Internally the socket is treated as a UDP socket, since route/netlink sockets follow the datagram contract.	2021-10-15 00:56:58 -07:00
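
Commit f36e118b9a in the log above describes capping the list of inactive handles kept for reuse at 64 instead of letting it grow without bound. The idea can be sketched roughly as a LIFO free list with a hard cap. This is only an illustration, not the netmgr implementation: `handle_t`, `handle_cache_t`, `handle_get()`, `handle_put()` and `INACTIVE_HANDLES_MAX` are hypothetical names, and plain `calloc()`/`free()` stand in for the BIND memory context API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* The limit named in the commit message; the macro name is hypothetical. */
#define INACTIVE_HANDLES_MAX 64

/* Hypothetical stand-in for a network manager handle. */
typedef struct handle {
	struct handle *next;
} handle_t;

/* Hypothetical per-socket cache: a LIFO free list with a counter. */
typedef struct handle_cache {
	handle_t *head;
	size_t count;
} handle_cache_t;

/* Reuse a cached handle if one is available, otherwise allocate a new one. */
static handle_t *
handle_get(handle_cache_t *cache) {
	handle_t *h = cache->head;

	if (h != NULL) {
		cache->head = h->next;
		cache->count--;
		h->next = NULL;
		return h;
	}
	return calloc(1, sizeof(*h));
}

/*
 * Keep the handle for reuse unless the cache already holds the maximum;
 * in that case free it. Returns true if the handle was cached.
 */
static bool
handle_put(handle_cache_t *cache, handle_t *h) {
	if (cache->count >= INACTIVE_HANDLES_MAX) {
		free(h);
		return false;
	}
	h->next = cache->head;
	cache->head = h;
	cache->count++;
	return true;
}
```

With this shape, a burst that releases many handles at once leaves at most 64 of them cached; the rest are freed immediately, so the list can shrink back after a spike.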

133 Commits