mir/bind - bind - Mike's Git repositories

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-24 02:58:38 +00:00

Author	SHA1	Message	Date
Ondřej Surý	6ddac2d56d	On shutdown, reset the established TCP connections. Previously, the established TCP connections (both client and server) would be gracefully closed, waiting for the write timeout. Don't wait for TCP connections to shut down gracefully; reset them directly for a faster shutdown.	2022-03-11 09:56:57 +01:00
Ondřej Surý	a761aa59e3	Change single write timer to per-send timers. Previously, there was a single per-socket write timer that would get restarted for every new write. This turned out to be insufficient because the other side could keep resetting the timer and never read back the responses. Change the single write timer to per-send timers, which in turn reset the TCP connection on the first send timeout.	2022-03-11 09:56:57 +01:00
Ondřej Surý	8098a58581	Set TCP maximum segment size to minimum size of 1220. Previously, the socket code would set the TCPv6 maximum segment size to the minimum value to prevent IP fragmentation for TCP. This was not yet implemented for the network manager. Implement network manager functions to set and use the minimum MTU socket option and to set the TCP_MAXSEG socket option for both IPv4 and IPv6. Use those functions to clamp the TCP maximum segment size for the TCP, TCPDNS and TLSDNS layers in the network manager to 1220 bytes. That is 1280 (IPv6 minimum link MTU) minus 40 (IPv6 fixed header) minus 20 (TCP fixed header). We already rely on a similar value for UDP to prevent IP fragmentation. It makes sense to use the same value for IPv4 and IPv6 because modern networks are required to support IPv6 packet sizes. If there's a need for smaller TCP segment values, the MTU on the interfaces needs to be properly configured.	2022-03-08 10:27:05 +01:00
Ondřej Surý	5d34a14f22	Set minimum MTU (1280) on IPv6 sockets. The IPV6_USE_MIN_MTU socket option directs the IP layer to limit the IPv6 packet size to the minimum required supported MTU from the base IPv6 specification, i.e. 1280 bytes. Many implementations of TCP running over IPv6 neglect to check the IPV6_USE_MIN_MTU value when performing MSS negotiation and when constructing a TCP segment. This happens despite MSS being defined as the MTU less the IP and TCP header sizes (60 bytes for IPv6). As a result, oversized IPv6 packets are sent, unintended Path Maximum Transmission Unit Discovery (PMTUD) is performed, and fragmented IPv6 packets are sent. Add and use a function to set the socket option that limits the MTU on IPv6 sockets to the minimum MTU (1280), both for UDP and TCP.	2022-03-08 10:27:05 +01:00
Ondřej Surý	6bd025942c	Replace netievent lock-free queue with simple locked queue. The current implementation of isc_queue uses the Michael-Scott lock-free queue, which in turn uses hazard pointers. It was discovered that, given the way we use isc_queue, such a complicated mechanism isn't really needed. Most of the time, we either execute the work directly when on the nmthread (in the case of UDP) or schedule the work from the matching nmthreads. Replace the current implementation of isc_queue with a simple locked ISC_LIST. There's a slight improvement: copying the whole list is very lightweight, so we move the queue into a new list before we start the processing. We lock only for moving the queue, not for every single item on the list. NOTE: There's room for future improvements. Since we don't guarantee the order in which the netievents are processed, we could have two lists. One would be unlocked and used when scheduling the work from the matching thread, and the other would be locked and used from non-matching threads.	2022-03-04 13:49:51 +01:00
Ondřej Surý	b220fb32bd	Handle TCP sockets in isc__nmsocket_reset(). The isc__nmsocket_reset() function was missing a case for raw TCP sockets (used by RNDC and DoH), which would cause an assertion failure when the write timeout was triggered. TCP sockets are now also properly handled in isc__nmsocket_reset().	2022-02-28 02:06:03 -08:00
Ondřej Surý	ecf042991c	Fix typo __SANITIZE_ADDRESS -> __SANITIZE_ADDRESS__. When checking for Address Sanitizer to disable the inactivehandles caching, there was a typo in the macro.	2022-02-24 00:15:16 +01:00
Ondřej Surý	be339b3c83	Disable inactive uvreqs caching when compiled with sanitizers. When an isc__nm_uvreq_t gets deactivated, it could just be put onto an array stack to be reused later to save some initialization time. Unfortunately, this might hide some use-after-free errors. Disable the inactive uvreqs caching when compiled with Address or Thread Sanitizer.	2022-02-24 00:15:16 +01:00
Ondřej Surý	92cce1da65	Disable inactive handles caching when compiled with sanitizers. When an isc_nmhandle_t gets deactivated, it could just be put onto an array stack to be reused later to save some initialization time. Unfortunately, this might hide some use-after-free errors. Disable the inactive handles caching when compiled with Address or Thread Sanitizer.	2022-02-23 23:21:29 +01:00
Ondřej Surý	e2555a306f	Remove active handles tracking from isc__nmsocket_t. The isc__nmsocket_t has a locked array of isc_nmhandle_t that's not used for anything. isc__nmhandle_get() adds the isc_nmhandle_t to the locked array (resizing it if necessary). The handle is removed when isc_nmhandle_put() finally destroys it. That's all the array does, so it serves no useful purpose. Remove the .ah_handles, .ah_size, and .ah_frees members of the isc__nmsocket_t and the .ah_pos member of the isc_nmhandle_t struct.	2022-02-23 22:54:47 +01:00
Ondřej Surý	3268627916	Delay isc__nm_uvreq_t deallocation to connection callback. When the TCP, TCPDNS or TLSDNS connection times out, the isc__nm_uvreq_t would be pushed into sock->inactivereqs before the uv_tcp_connect() callback finishes. Because the isc__nmsocket_t keeps the list of inactive isc__nm_uvreq_t, this would cause a use-after-free only in two cases. The first is when sock->inactivereqs is full, which could never happen because the failure happens in the connection timeout callback. The second is when the sock->inactivereqs mechanism is completely removed (e.g. when running under Address or Thread Sanitizer). Delay isc__nm_uvreq_t deallocation to the connection callback. From the connection timeout callback, only signal that the connection callback should be called, by shutting down the libuv socket.	2022-02-23 22:54:47 +01:00
Ondřej Surý	88418c3372	Properly free up enqueued netievents in nm_destroy(). When the isc_netmgr is being destroyed, the normal and priority queues should be dequeued and the netievents properly freed. This wasn't the case.	2022-02-23 22:51:12 +01:00
Ondřej Surý	d01562f22b	Remove the keep-response-order ACL map. The keep-response-order option has been obsoleted. This commit removes the keep-response-order ACL map (rendering the option a no-op), the call to isc_nm_sequential(), and the now-unused isc_nm_sequential() function itself.	2022-02-18 09:16:03 +01:00
Ondřej Surý	4f5b4662b6	Remove the limit on the number of simultaneous TCP queries. There was an artificial limit of 23 on the number of simultaneous pipelined queries in a single TCP connection. The new network manager is capable of handling an "unlimited" number of queries (limited only by the TCP read buffer size). This is similar to the "unlimited" handling of DNS queries received over UDP. Don't limit the number of TCP queries that we can process within a single TCP read callback.	2022-02-17 16:19:12 -08:00
Ondřej Surý	4716c56ebb	Reset the TCP connection when garbage is received. When an invalid DNS message was received, there was a handling mechanism for DoH that would be called to return a proper HTTP response. Reuse this mechanism and reset the TCP connection in three cases: when the client is blackholed, when the DNS message is completely bogus, or when the ns_client receives a response instead of a query.	2022-02-17 20:39:55 +01:00
Ondřej Surý	a89d9e0fa6	Add isc_nmhandle_setwritetimeout() function. In some situations (unit tests and the forthcoming XFR timeouts MR), we need to modify the write timeout independently of the read timeout. Add an isc_nmhandle_setwritetimeout() function that can be called before isc_nm_send() to specify a custom write timeout interval.	2022-02-17 09:06:58 +01:00
Ondřej Surý	408b362169	Add TCP, TCPDNS and TLSDNS write timer. When the outgoing TCP write buffers are full because the other party is not reading the data, uv_write() could wait indefinitely on the uv_loop and never call the callback. Add a new write timer that uses the `tcp-idle-timeout` value to interrupt the TCP connection when we are not able to send data for a defined period of time.	2022-02-17 09:06:58 +01:00
Ondřej Surý	45a73c113f	Rename sock->timer to sock->read_timer. Before adding the write timer, we have to rename the generic sock->timer to sock->read_timer. We don't touch the function names, to limit the impact of the refactoring.	2022-02-17 09:06:58 +01:00
Ondřej Surý	8715be1e4b	Use UV_RUNTIME_CHECK() as appropriate. Replace the RUNTIME_CHECK() calls for libuv API calls with UV_RUNTIME_CHECK() to get a more detailed error message when a call fails that should not fail.	2022-02-16 11:16:57 +01:00
Ondřej Surý	b5e086257d	Explicitly enable IPV6_V6ONLY on the netmgr sockets. Some operating systems (OpenBSD and DragonFly BSD) don't restrict IPv6 sockets to sending and receiving IPv6 packets only. Explicitly enable the IPV6_V6ONLY socket option on IPv6 sockets to prevent failures from using IPv4-mapped IPv6 addresses.	2022-01-17 22:16:27 +01:00
Ondřej Surý	7370725008	Fix the UDP recvmmsg support. Previously, netmgr/udp.c tried to detect recvmmsg support in libuv with #ifdef UV_UDP_<foo> preprocessor checks. However, UV_UDP_<foo> are enum members, not preprocessor macros, so the detection didn't work. As a result, the code had no way to know when it had received the final chunk of the recvmmsg, and it tried to free the uvbuf every time. Fortunately, isc__nm_free_uvbuf() had a kludge that detected attempts to free in the middle of the receive buffer, so the code worked. However, libuv 1.37.0 changed the way recvmmsg was enabled from implicit to explicit. We checked for the presence of yet another enum member with a preprocessor macro, so libuv recvmmsg support was in fact never enabled with libuv >= 1.37.0. This commit replaces the preprocessor macros with autoconf checks for declarations, so the detection now works again. On top of that, it's now possible to clean up the alloc_cb and free_uvbuf functions, because the information on whether we can or cannot free the buffer is now available to us.	2022-01-13 19:06:39 +01:00
Ondřej Surý	58bd26b6cf	Update the copyright information in all files in the repository. This commit converts the license handling to adhere to the REUSE specification. It specifically: 1. Adds the used licenses to the LICENSES/ directory. 2. Adds an "isc" template for adding the copyright boilerplate. 3. Changes all source files to include a copyright and SPDX license header; this includes all the C sources, documentation, zone files, and configuration files. There are notes in the doc/dev/copyrights file on how to add correct headers to new files. 4. Handles the remaining files, which can't be modified, via the .reuse/dep5 file. The binary (or otherwise unmodifiable) files could have their license placed next to them in a <foo>.license file. However, this would lead to a cluttered repository, and most of the files handled in the .reuse/dep5 file are system test files.	2022-01-11 09:05:02 +01:00
Ondřej Surý	6269fce0fe	Use isc_mem_get_aligned() for isc_queue and cleanup max_threads. isc_queue_new() was using dirty tricks to allocate the head and tail members of the struct aligned to the cacheline. We can now use isc_mem_get_aligned() to allocate the structure aligned to the cacheline directly. Use ISC_OS_CACHELINE_SIZE (64) instead of the arbitrary ALIGNMENT (128); one cacheline is enough to prevent false sharing. Clean up the unused max_threads variable; there was actually no limit on the maximum number of threads. This was changed a while ago.	2022-01-05 17:10:58 +01:00
Ondřej Surý	57d0fabadd	Stop leaking mutex in nmworker and cond in nm socket. On FreeBSD, the pthread primitives are not allocated solely on the stack; part of the object lives on the heap. Missing pthread_*_destroy() calls cause the heap memory to grow, and with short-lived objects it's possible to run out of memory. Properly destroy the leaking mutex (worker->lock) and the leaking condition variable (sock->cond).	2021-12-08 17:58:53 +01:00
Ondřej Surý	20ac73eb22	Improve the logging on failed TCP accept. Previously, when TCP accept failed, we logged a message at the ISC_LOG_ERROR level. One common way this could happen: the client hits the TCP client quota and is put on hold, and by the time it is resumed, the client has already given up and closed the TCP connection. In such a case, named would log: "TCP connection failed: socket is not connected". This message was quite confusing because it doesn't say that it's related to accepting the TCP connection. It also logs everything at the ISC_LOG_ERROR level. Change the log message to "Accepting TCP connection failed" and, for specific error states, lower the severity of the log message to ISC_LOG_INFO.	2021-12-02 13:50:00 +01:00
Artem Boldariev	f0e18f3927	Add isc_nm_has_encryption(). This commit adds an isc_nm_has_encryption() function intended to check whether a given handle is backed by a connection that uses encryption.	2021-11-30 12:20:22 +02:00
Artem Boldariev	07cf827b0b	Add isc_nm_socket_type(). This commit adds an isc_nm_socket_type() function, which can be used to obtain a handle's socket type. This change obsoletes isc_nm_is_tlsdns_handle() and isc_nm_is_http_handle(). However, it was decided to keep the latter, as we might eventually end up supporting multiple HTTP versions.	2021-11-30 12:20:22 +02:00
Evan Hunt	7f63ee3bae	address '--disable-doh' failures. Change 5756 (GL #2854) introduced build errors when using 'configure --disable-doh'. To fix this, isc_nm_is_http_handle() is now defined in all builds, not just builds that have DoH enabled. Missing code comments were added both for that function and for isc_nm_is_tlsdns_handle().	2021-11-17 13:48:43 -08:00
Artem Boldariev	80482f8d3e	DoH: Add isc_nm_set_min_answer_ttl(). This commit adds an isc_nm_set_min_answer_ttl() function, which is intended to be used to give a hint to the underlying transport regarding the answer TTL. The interface is intentionally kept generic because over time more transports might benefit from this functionality. Currently, it is intended for DoH to set the "max-age" value within the "Cache-Control" HTTP header, as recommended in RFC 8484, section 5.1 "Cache Interaction". It is a no-op for other DNS transports for the time being.	2021-11-05 14:14:59 +02:00
Evan Hunt	32b50407bf	check statichandle before attaching. It is possible for udp_recv_cb() to fire after the socket is already shutting down and statichandle is NULL; we need to create a temporary handle in this case.	2021-10-18 14:21:04 -07:00
Evan Hunt	a55589f881	remove all references to isc_socket and related types. Removed socket.c, socket.h, and all references to isc_socket_t, isc_socketmgr_t, isc_sockevent_t, etc.	2021-10-15 01:01:25 -07:00
Evan Hunt	075139f60e	netmgr: refactor isc__nm_incstats() and isc__nm_decstats(). Route/netlink sockets don't have stats counters associated with them, so it's now necessary to check whether socket stats exist before incrementing or decrementing them. Rather than relying on the caller for this, we now just pass the socket and an index, and the correct stats counter will be updated if it exists.	2021-10-15 00:57:02 -07:00
Evan Hunt	8c51a32e5c	netmgr: add isc_nm_routeconnect(). isc_nm_routeconnect() opens a route/netlink socket, then calls a connect callback, much like isc_nm_udpconnect(), with a handle that can then be monitored for network changes. Internally the socket is treated as a UDP socket, since route/netlink sockets follow the datagram contract.	2021-10-15 00:56:58 -07:00
Evan Hunt	8d6bf826c6	netmgr: refactor isc__nm_incstats() and isc__nm_decstats(). After support for route/netlink sockets is merged, not all sockets will have stats counters associated with them, so it's now necessary to check whether socket stats exist before incrementing or decrementing them. Rather than relying on the caller for this, we now just pass the socket and an index, and the correct stats counter will be updated if it exists.	2021-10-15 00:40:37 -07:00
Artem Boldariev	25b2c6ad96	Require "dot" ALPN token for zone transfer requests over DoT (XoT). This commit makes BIND verify that zone transfers are allowed over the underlying connection. Currently, this makes sense only for DoT, but the code is deliberately protocol-agnostic.	2021-10-05 11:23:47 +03:00
Artem Boldariev	eba3278e52	Add isc_nm_xfr_allowed() function. This function provides a predicate to check whether a zone transfer can be performed over the given handle. In most cases we can assume that zone transfers are possible over any stream transport except DoH. This assumption does not hold for zone transfers over DoT (XoT), because RFC 9103 requires ALPN, which might not be used in all deployments of DoT.	2021-10-05 11:23:47 +03:00
Evan Hunt	08ce69a0ea	Rewrite dns_resolver and dns_request to use netmgr timeouts. - The `timeout_action` parameter to dns_dispatch_addresponse() has been replaced with a netmgr callback that is called when a dispatch read times out. This callback may optionally reset the read timer and resume reading. - Added a function to convert isc_interval to milliseconds; this is used to translate fctx->interval into a value that can be passed to dns_dispatch_addresponse() as the timeout. - Note that netmgr timeouts are accurate to the millisecond, so code that checks whether a timeout has been reached cannot rely on microsecond accuracy. - If serve-stale is configured, a timeout received by the resolver may trigger it to return stale data and then resume waiting for the read timeout. This is no longer based on a separate stale timer. - The code for canceling requests in request.c has been altered so that it can run asynchronously. - TCP timeout events apply to the dispatch, which may be shared by multiple queries. In the event of a timeout we have no query ID to identify the resp we wanted, so we now just send the timeout to the oldest pending query. - There was some additional refactoring in the resolver: fctx_join() and fctx_try_events() were combined into one function to reduce code duplication, and fixednames are now used in fetchctx and fetchevent. - Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the code can be simplified.	2021-10-02 11:39:56 -07:00
Ondřej Surý	9ee60e7a17	netmgr fixes needed for dispatch. - The read timer must always be stopped when reading stops. - Read callbacks can now call isc_nm_read() again in TCP, TCPDNS and TLSDNS; previously this caused an assertion failure. - The wrong failure code could be sent after a UDP recv failure because the if statements were in the wrong order. The check for a NULL address needs to come after the check for an error code, otherwise the result will always be set to ISC_R_EOF. - When aborting a read or connect because the netmgr is shutting down, use ISC_R_SHUTTINGDOWN. (ISC_R_CANCELED is now reserved for when the read has been canceled by the caller.) - A new function, isc_nmhandle_timer_running(), has been added, enabling a callback to check whether the timer has been reset after processing a timeout. - Incidental netmgr fix: always use isc__nm_closing() instead of referencing sock->mgr->closing directly. - Corrected a few comments that used outdated function names.	2021-10-02 11:39:56 -07:00
Evan Hunt	d9e1ad9e37	Remove reference count REQUIRE in isc_nm_read(). Previously isc_nm_read() required references on the handle to be at least 2, under the assumption that it would only ever be called from a connect or accept callback. However, it can also be called from a read callback, in which case the reference count might be only 1.	2021-10-02 11:39:56 -07:00
Mark Andrews	7079829b84	Address use before NULL check warning of uvreq. Move the dereference of uvreq to after the NULL check.	2021-09-28 11:57:47 +10:00
Ondřej Surý	8248da3b83	Preserve the contents of socket buffer on realloc. In the TCPDNS/TLSDNS read callback, the socket buffer could be reallocated if the received contents were larger than the buffer. The existing code did not preserve the contents of the existing buffer, which led to the loss of already received data. This commit replaces isc_mem_put()+isc_mem_get() with isc_mem_reget() to preserve the existing contents of the socket buffer.	2021-09-23 22:36:01 +02:00
Ondřej Surý	8edbd0929f	Use isc_mem_reget() to handle the internal active handle cache. The netmgr has an internal cache for freed active handles. This cache was allocated using the isc_mem_allocate()/isc_mem_free() API because that made it simpler to reallocate the cache when it needed to grow. The new isc_mem_reget() function can be used here instead, reducing the need for the isc_mem_allocate() API, which is a tad slower than the isc_mem_get() API.	2021-09-23 22:17:15 +02:00
Evan Hunt	fc6f751fbe	replace per-protocol keepalive functions with a common one. This commit removes isc__nm_tcpdns_keepalive() and isc__nm_tlsdns_keepalive(); keepalive for these protocols and for TCP will now be set directly from isc_nmhandle_keepalive(). Protocols that have an underlying TCP socket (i.e., TLS stream and HTTP) now have protocol-specific routines, called by isc_nmhandle_keepalive(), to set the keepalive value on the underlying socket.	2021-08-27 10:02:10 -07:00
Evan Hunt	7867b8b57d	enable keepalive when the keepalive EDNS option is seen. Previously, receiving a keepalive option had no effect on how long named would keep the connection open; there was a place to configure the keepalive timeout, but it was never used. This commit corrects that. It also fixes an error in isc__nm_{tcp,tls}dns_keepalive() in which the sense of a REQUIRE test was reversed; previously this error had not been noticed because the functions were not being used.	2021-08-27 09:56:51 -07:00
Ondřej Surý	87d5c8ab7c	Disable the Path MTU Discovery on UDP Sockets. Instead of disabling fragmentation on the UDP sockets, we now disable Path MTU Discovery. On Linux, this is done by setting the IP(V6)_MTU_DISCOVER socket option to IP_PMTUDISC_OMIT. On FreeBSD, it is done by disabling the IP(V6)_DONTFRAG socket option. This sets DF=0 in the IP header and also ignores Path MTU Discovery. As an additional mitigation on Linux, we recommend setting net.ipv4.ip_no_pmtu_disc to Mode 3: "Mode 3 is a hardened pmtu discover mode. The kernel will only accept fragmentation-needed errors if the underlying protocol can verify them besides a plain socket lookup. Current protocols for which pmtu events will be honored are TCP, SCTP and DCCP as they verify e.g. the sequence number or the association. This mode should not be enabled globally but is only intended to secure e.g. name servers in namespaces where TCP path mtu must still work but path MTU information of other protocols should be discarded. If enabled globally this mode could break other protocols."	2021-08-19 07:12:33 +02:00
Artem Boldariev	170cc41d5c	Get rid of some HTTP/2 related types when NGHTTP2 is not available. This commit removes definitions of some DoH-related types when libnghttp2 is not available.	2021-08-04 10:32:27 +03:00
Ondřej Surý	a9e6a7ae57	Disable setting the thread affinity. It was discovered that setting the thread affinity on both the netmgr and netthread threads led to inconsistent recursive performance. Sometimes the netmgr and netthread threads would compete over a single resource and sometimes not. Removing the affinity setting causes a slight dip in authoritative performance of around 5% (the measured range was from 3.8% to 7.8%), but the recursive performance is now consistently good.	2021-07-13 14:48:29 +02:00
Ondřej Surý	29a285a67d	Revert the allocate/free -> get/put change from jemalloc change. In the jemalloc merge request, we missed the fact that ah_frees and ah_handles are reallocated. Reallocation is not compatible with using isc_mem_get() for allocation and isc_mem_put() for deallocation. This commit reverts that part and restores the use of isc_mem_allocate() and isc_mem_free().	2021-07-09 18:19:57 +02:00
Ondřej Surý	f487c6948b	Replace locked mempools with memory contexts. Current mempools are hybrid structures that serve two purposes: 1. a mempool with a lock is basically a static-sized allocator with pre-allocated free items; 2. a mempool without a lock is a doubly-linked list of preallocated items. The first kind of usage can easily be replaced with jemalloc small-sized arena objects and thread-local caches. The second usage cannot be replaced as easily, and we need to keep it (in libdns:message.c) for performance reasons.	2021-07-09 15:58:02 +02:00
Ondřej Surý	5ab05d1696	Replace isc_mem_allocate() usage with isc_mem_get() in netmgr.c. isc_mem_allocate() comes with an additional cost because of the memory tracking. In this commit, we replace its usage with isc_mem_get(). Because we track the allocated sizes anyway, it's also possible to replace isc_mem_free() with isc_mem_put().	2021-07-09 15:58:02 +02:00

201 Commits
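
Commit 8098a58581 derives its TCP maximum segment size clamp as 1280 (IPv6 minimum link MTU) - 40 (IPv6 fixed header) - 20 (TCP fixed header) = 1220 bytes. The C sketch below shows that arithmetic. It also shows one way to apply the clamp with the standard setsockopt(TCP_MAXSEG) call on a socket that has not connected yet. It assumes a POSIX system whose <netinet/tcp.h> defines TCP_MAXSEG, as Linux and the BSDs do. The macro names and the clamp_tcp_mss() helper are hypothetical; this is not BIND's netmgr code.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h> /* close() */

/* Sizes quoted in the commit message; the macro names are illustrative. */
#define MIN_IPV6_LINK_MTU 1280 /* IPv6 minimum link MTU */
#define IPV6_HEADER_LEN	  40   /* fixed IPv6 header */
#define TCP_HEADER_LEN	  20   /* TCP header without options */

/* Largest TCP payload that still fits in a single minimum-MTU IPv6 packet. */
static int
min_mtu_mss(void) {
	return MIN_IPV6_LINK_MTU - IPV6_HEADER_LEN - TCP_HEADER_LEN;
}

/*
 * Hypothetical helper: clamp the MSS on a TCP socket before connect().
 * Returns the setsockopt() result (0 on success, -1 with errno on error).
 */
static int
clamp_tcp_mss(int fd) {
	int mss = min_mtu_mss();

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}
```

For example, calling clamp_tcp_mss() on a fresh socket(AF_INET, SOCK_STREAM, 0) descriptor before connect() limits outgoing segments. On Linux, it also lowers the MSS announced in the SYN. How exactly other kernels honour the value is platform-dependent.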
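
Commit 6bd025942c replaces the lock-free queue with a locked list that is moved out in a single step before processing. Below is a minimal C sketch of that "detach under the lock, process unlocked" pattern, assuming POSIX threads. The work_t type and the queue_* and add_to_sum names are hypothetical stand-ins, not ISC_LIST or BIND's netievent code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item standing in for a netievent. */
typedef struct work {
	int value;
	struct work *next;
} work_t;

/* A simple mutex-protected FIFO (a sketch, not ISC_LIST itself). */
typedef struct {
	pthread_mutex_t lock;
	work_t *head;
	work_t *tail;
} locked_queue_t;

static void
queue_init(locked_queue_t *q) {
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/* Producers take the lock once per enqueued item. */
static void
queue_push(locked_queue_t *q, work_t *w) {
	w->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL) {
		q->tail->next = w;
	} else {
		q->head = w;
	}
	q->tail = w;
	pthread_mutex_unlock(&q->lock);
}

/*
 * The consumer detaches the whole list under the lock (constant time),
 * then processes and frees every item without holding the lock.
 * Returns the number of items processed.
 */
static size_t
queue_drain(locked_queue_t *q, void (*process)(work_t *, void *), void *ctx) {
	pthread_mutex_lock(&q->lock);
	work_t *batch = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	size_t n = 0;
	while (batch != NULL) {
		work_t *next = batch->next;
		process(batch, ctx);
		free(batch);
		batch = next;
		n++;
	}
	return n;
}

/* Example callback: add each item's value to the long pointed to by ctx. */
static void
add_to_sum(work_t *w, void *ctx) {
	*(long *)ctx += w->value;
}
```

The design point is that producers only ever contend with the consumer for the few instructions needed to swap out the head and tail pointers, regardless of how many items are queued. Items must be heap-allocated because queue_drain() frees them after processing.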
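
Commit 7370725008 explains why the recvmmsg detection never fired: an `#ifdef` on an enum member is always false, because the preprocessor runs before the compiler knows about enums. The self-contained C sketch below reproduces the pitfall with a hypothetical DEMO_UDP_MMSG_CHUNK enum member. It contrasts that check with a config.h-style HAVE_DECL_* macro of the kind a configure-time declaration check produces; autoconf's AC_CHECK_DECLS, for example, defines HAVE_DECL_<NAME> to 1 or 0. Here the HAVE_DECL value is hardcoded to stand in for configure output.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a libuv-style flag that is an enum member. */
enum demo_udp_flags {
	DEMO_UDP_MMSG_CHUNK = 8,
};

/* What configure would write into config.h after a successful declaration
 * check (hardcoded here for illustration). */
#define HAVE_DECL_DEMO_UDP_MMSG_CHUNK 1

/* Old approach: the preprocessor never sees enum members, so this is false. */
static bool
detected_with_ifdef(void) {
#ifdef DEMO_UDP_MMSG_CHUNK
	return true;
#else
	return false;
#endif
}

/* Fixed approach: test the macro produced by the configure-time check. */
static bool
detected_with_config_h(void) {
#if HAVE_DECL_DEMO_UDP_MMSG_CHUNK
	return true;
#else
	return false;
#endif
}
```

The enum member is still perfectly usable in ordinary C expressions. It is only invisible to `#ifdef`, which is why detection has to move into the build system.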
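
Commit 8248da3b83 fixes data loss by switching the TCPDNS/TLSDNS socket buffer from isc_mem_put()+isc_mem_get() to isc_mem_reget(). The sketch below uses plain C realloc() as an analogy for that behaviour. When the buffer must grow, realloc() keeps the bytes already received, whereas a free-then-allocate pair would discard them. The recvbuf_t type and recvbuf_append() are hypothetical and do not reproduce BIND's socket buffer code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive buffer standing in for the socket buffer. */
typedef struct {
	unsigned char *data;
	size_t len;  /* bytes already received */
	size_t size; /* allocated capacity */
} recvbuf_t;

/*
 * Append newly received bytes, doubling the capacity when needed.
 * realloc() preserves the first `len` bytes; on allocation failure the
 * old buffer is left intact and -1 is returned. Returns 0 on success.
 */
static int
recvbuf_append(recvbuf_t *b, const void *src, size_t n) {
	if (b->len + n > b->size) {
		size_t newsize = (b->size != 0) ? b->size : 64;
		while (newsize < b->len + n) {
			newsize *= 2;
		}
		unsigned char *p = realloc(b->data, newsize);
		if (p == NULL) {
			return -1;
		}
		b->data = p;
		b->size = newsize;
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}
```

In this sketch, the two-byte DNS-over-TCP length prefix received first survives the resize triggered by a larger follow-up read.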