mir/bind - bind - Mike's Git repositories

mir/bind

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-24 11:08:45 +00:00

Author	SHA1	Message	Date
Ondřej Surý	2e1dd56d0b	Fix the data race in accessing the isc_nm_t timers The following TSAN report about accessing the mgr timers (mgr->init, mgr->idle, mgr->keepalive and mgr->advertised) has been fixed in this commit: ================== WARNING: ThreadSanitizer: data race (pid=2746) Read of size 4 at 0x7b440008a948 by thread T18: #0 isc__nm_tcpdns_read /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:849:25 (libisc.so.1706+0x2ba0f) #1 isc_nm_read /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1679:3 (libisc.so.1706+0x22258) #2 tcpdns_connect_connect_cb /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:363:2 (tcpdns_test+0x4bc5fb) #3 isc__nm_async_connectcb /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1816:2 (libisc.so.1706+0x228c9) #4 isc__nm_connectcb /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:1791:3 (libisc.so.1706+0x22713) #5 tcpdns_connect_cb /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:343:2 (libisc.so.1706+0x2d89d) #6 uv__stream_connect /home/ondrej/Projects/tsan/libuv/src/unix/stream.c:1381:5 (libuv.so.1+0x27c18) #7 uv__stream_io /home/ondrej/Projects/tsan/libuv/src/unix/stream.c:1298:5 (libuv.so.1+0x25977) #8 uv__io_poll /home/ondrej/Projects/tsan/libuv/src/unix/linux-core.c:462:11 (libuv.so.1+0x2e795) #9 uv_run /home/ondrej/Projects/tsan/libuv/src/unix/core.c:385:5 (libuv.so.1+0x158ec) #10 nm_thread /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:530:11 (libisc.so.1706+0x1c94a) Previous write of size 4 at 0x7b440008a948 by main thread: #0 isc_nm_settimeouts /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:490:12 (libisc.so.1706+0x1dda5) #1 tcpdns_recv_two /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:601:2 (tcpdns_test+0x4bad0e) #2 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70be) #3 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a) Location is heap block of size 281 at 0x7b440008a840 allocated by main thread: #0 malloc <null> (tcpdns_test+0x42864b) #1 default_memalloc /home/ondrej/Projects/bind9/lib/isc/mem.c:713:8 (libisc.so.1706+0x6d261) #2 mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:622:8 (libisc.so.1706+0x69b9c) #3 isc___mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:1044:9 (libisc.so.1706+0x6d379) #4 isc__mem_get /home/ondrej/Projects/bind9/lib/isc/mem.c:2432:10 (libisc.so.1706+0x6889e) #5 isc_nm_start /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:203:8 (libisc.so.1706+0x1c219) #6 nm_setup /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:244:11 (tcpdns_test+0x4baaa4) #7 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70fd) #8 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a) Thread T18 'isc-net-0000' (tid=3513, running) created by main thread at: #0 pthread_create <null> (tcpdns_test+0x429e7b) #1 isc_thread_create /home/ondrej/Projects/bind9/lib/isc/pthreads/thread.c:73:8 (libisc.so.1706+0x8476a) #2 isc_nm_start /home/ondrej/Projects/bind9/lib/isc/netmgr/netmgr.c:271:3 (libisc.so.1706+0x1c66a) #3 nm_setup /home/ondrej/Projects/bind9/lib/isc/tests/tcpdns_test.c:244:11 (tcpdns_test+0x4baaa4) #4 cmocka_run_one_test_or_fixture <null> (libcmocka.so.0+0x70fd) #5 __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:308:16 (libc.so.6+0x2409a) SUMMARY: ThreadSanitizer: data race /home/ondrej/Projects/bind9/lib/isc/netmgr/tcpdns.c:849:25 in isc__nm_tcpdns_read ================== ThreadSanitizer: reported 1 warnings	2020-12-02 10:14:31 +01:00
Ondřej Surý	634bdfb16d	Refactor netmgr and add more unit tests This is a part of the works that intends to make the netmgr stable, testable, maintainable and tested. It contains a numerous changes to the netmgr code and unfortunately, it was not possible to split this into smaller chunks as the work here needs to be committed as a complete works. NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and tcpdns.c and it should be a subject to refactoring in the future. The changes that are included in this commit are listed here (extensively, but not exclusively): * The netmgr_test unit test was split into individual tests (udp_test, tcp_test, tcpdns_test and newly added tcp_quota_test) * The udp_test and tcp_test has been extended to allow programatic failures from the libuv API. Unfortunately, we can't use cmocka mock() and will_return(), so we emulate the behaviour with #define and including the netmgr/{udp,tcp}.c source file directly. * The netievents that we put on the nm queue have variable number of members, out of these the isc_nmsocket_t and isc_nmhandle_t always needs to be attached before enqueueing the netievent_<foo> and detached after we have called the isc_nm_async_<foo> to ensure that the socket (handle) doesn't disappear between scheduling the event and actually executing the event. * Cancelling the in-flight TCP connection using libuv requires to call uv_close() on the original uv_tcp_t handle which just breaks too many assumptions we have in the netmgr code. Instead of using uv_timer for TCP connection timeouts, we use platform specific socket option. * Fix the synchronization between {nm,async}_{listentcp,tcpconnect} When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was waiting for socket to either end up with error (that path was fine) or to be listening or connected using condition variable and mutex. Several things could happen: 0. everything is ok 1. the waiting thread would miss the SIGNAL() - because the enqueued event would be processed faster than we could start WAIT()ing. In case the operation would end up with error, it would be ok, as the error variable would be unchanged. 2. the waiting thread miss the sock->{connected,listening} = `true` would be set to `false` in the tcp_{listen,connect}close_cb() as the connection would be so short lived that the socket would be closed before we could even start WAIT()ing * The tcpdns has been converted to using libuv directly. Previously, the tcpdns protocol used tcp protocol from netmgr, this proved to be very complicated to understand, fix and make changes to. The new tcpdns protocol is modeled in a similar way how tcp netmgr protocol. Closes: #2194, #2283, #2318, #2266, #2034, #1920 * The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to pass accepted TCP sockets between netthreads, but instead (similar to UDP) uses per netthread uv_loop listener. This greatly reduces the complexity as the socket is always run in the associated nm and uv loops, and we are also not touching the libuv internals. There's an unfortunate side effect though, the new code requires support for load-balanced sockets from the operating system for both UDP and TCP (see #2137). If the operating system doesn't support the load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB on FreeBSD 12+), the number of netthreads is limited to 1. * The netmgr has now two debugging #ifdefs: 1. Already existing NETMGR_TRACE prints any dangling nmsockets and nmhandles before triggering assertion failure. This options would reduce performance when enabled, but in theory, it could be enabled on low-performance systems. 2. New NETMGR_TRACE_VERBOSE option has been added that enables extensive netmgr logging that allows the software engineer to precisely track any attach/detach operations on the nmsockets and nmhandles. This is not suitable for any kind of production machine, only for debugging. * The tlsdns netmgr protocol has been split from the tcpdns and it still uses the old method of stacking the netmgr boxes on top of each other. We will have to refactor the tlsdns netmgr protocol to use the same approach - build the stack using only libuv and openssl. * Limit but not assert the tcp buffer size in tcp_alloc_cb Closes: #2061	2020-12-01 16:47:07 +01:00
Ondřej Surý	a49d88568f	Turn all the callback to be always asynchronous When calling the high level netmgr functions, the callback would be sometimes called synchronously if we catch the failure directly, or asynchronously if it happens later. The synchronous call to the callback could create deadlocks as the caller would not expect the failed callback to be executed directly.	2020-11-11 22:15:40 +01:00
Ondřej Surý	fa424225af	netmgr: Add additional safeguards to netmgr/tls.c This commit adds couple of additional safeguards against running sends/reads on inactive sockets. The changes was modeled after the changes we made to netmgr/tcpdns.c	2020-11-10 14:17:20 +01:00
Witold Kręcicki	38b78f59a0	Add DoT support to bind Parse the configuration of tls objects into SSL_CTX* objects. Listen on DoT if 'tls' option is setup in listen-on directive. Use DoT/DoH ports for DoT/DoH.	2020-11-10 14:16:55 +01:00
Witold Kręcicki	b2ee0e9dc3	netmgr: server-side TLS support Add server-side TLS support to netmgr - that includes moving some of the isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper tcp or tls function, and a new isc_nm_listentls() function. Add DoT support to tcpdns - isc_nm_listentlsdns().	2020-11-10 14:16:27 +01:00
Evan Hunt	e011521ef1	address some possible shutdown races in xfrin there were two failures during observed in testing, both occurring when 'rndc halt' was run rather than 'rndc stop' - the latter dumps zone contents to disk and presumably introduced enough delay to prevent the races: - a failure when the zone was shut down and called dns_xfrin_detach() before the xfrin had finished connecting; the connect timeout terminated without detaching its handle - a failure when the tcpdns socket timer fired after the outerhandle had already been cleared. this commit incidentally addresses a failure observed in mutexatomic due to a variable having been initialized incorrectly.	2020-11-09 12:33:37 -08:00
Evan Hunt	4be63c5b00	add isc_nmhandle_settimeout() function this function sets the read timeout for the socket associated with a netmgr handle and, if the timer is running, resets it. for TCPDNS sockets it also sets the read timeout and resets the timer on the outer TCP socket.	2020-11-07 20:49:53 +01:00
Ondřej Surý	2191d2bf44	fix nmhandle attach/detach errors in tcpdnsconnect_cb() we need to attach to the statichandle when connecting TCPDNS sockets, same as with UDP.	2020-11-07 20:49:53 +01:00
Ondřej Surý	c14c1fdd2c	Put up additional safe guards to not use inactive/closed tcpdns socket When we are operating on the tcpdns socket, we need to double check whether the socket or its outerhandle or its listener or its mgr is still active and when not, bail out early.	2020-11-02 20:58:00 +01:00
Witold Kręcicki	3ab3d90de0	Fix improper closed connection handling in tcpdns. If dnslisten_readcb gets a read callback it needs to verify that the outer socket wasn't closed in the meantime, and issue a CANCELED callback if it was.	2020-11-02 15:10:28 +01:00
Evan Hunt	5dcdc00b93	add netmgr functions to support outgoing DNS queries - isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection. - isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a timeout argument to ensure connections time out and are correctly cleaned up on failure. - isc_nm_read() now supports UDP; it reads a single datagram and then stops until the next time it's called. - isc_nm_cancelread() now runs asynchronously to prevent assertion failure if reading is interrupted by a non-network thread (e.g. a timeout). - isc_nm_cancelread() can now apply to UDP sockets. - added shim code to support UDP connection in versions of libuv prior to 1.27, when uv_udp_connect() was added all these functions will be used to support outgoing queries in dig, xfrin, dispatch, etc.	2020-10-30 11:11:54 +01:00
Witold Kręcicki	c41ce8e0c9	Properly handle outer TCP connection closed in TCPDNS. If the connection is closed while we're processing the request we might access TCPDNS outerhandle which is already reset. Check for this condition and call the callback with ISC_R_CANCELED result.	2020-10-29 12:32:25 +01:00
Ondřej Surý	8797e5efd5	Fix the data race when read-writing sock->active by using cmpxchg	2020-10-22 11:46:58 -07:00
Ondřej Surý	f7c82e406e	Fix the isc_nm_closedown() to actually close the pending connections 1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking whether the socket was still alive and scheduling reads/sends on closed socket. 2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been changed to always return the error conditions via the callbacks, so they always succeed. This applies to all protocols (UDP, TCP and TCPDNS).	2020-10-22 11:37:16 -07:00
Ondřej Surý	e8b56acb49	Clone the csock in accept_connection(), not in callback If we clone the csock (children socket) in TCP accept_connection() instead of passing the ssock (server socket) to the call back and cloning it there we unbreak the assumption that every socket is handled inside it's own worker thread and therefore we can get rid of (at least) callback locking.	2020-10-08 07:24:31 +02:00
Ondřej Surý	d86a74d8a4	Change the isc__nm_tcpdns_stoplistening() to be asynchronous event The isc__nm_tcpdns_stoplistening() would call isc__nmsocket_clearcb() that would clear the .accept_cb from non-netmgr thread. Change the tcpdns_stoplistening to enqueue ievent that would get processed in the right netmgr thread to avoid locking.	2020-10-08 07:24:31 +02:00
Evan Hunt	dcee985b7f	update all copyright headers to eliminate the typo	2020-09-14 16:20:40 -07:00
Ondřej Surý	89c534d3b9	properly lock the setting/unsetting of callbacks in isc_nmsocket_t changes to socket callback functions were not thread safe.	2020-09-11 12:17:57 -07:00
Evan Hunt	57b4dde974	change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach Attaching and detaching handle pointers will make it easier to determine where and why reference counting errors have occurred. A handle needs to be referenced more than once when multiple asynchronous operations are in flight, so callers must now maintain multiple handle pointers for each pending operation. For example, ns_client objects now contain: - reqhandle: held while waiting for a request callback (query, notify, update) - sendhandle: held while waiting for a send callback - fetchhandle: held while waiting for a recursive fetch to complete - updatehandle: held while waiting for an update-forwarding task to complete control channel connection objects now contain: - readhandle: held while waiting for a read callback - sendhandle: held while waiting for a send callback - cmdhandle: held while an rndc command is running httpd connections contain: - readhandle: held while waiting for a read callback - sendhandle: held while waiting for a send callback	2020-09-11 12:17:57 -07:00
Witold Kręcicki	7eb4564895	assorted small netmgr-related changes - rename isc_nmsocket_t->tcphandle to statichandle - cancelread functions now take handles instead of sockets - add a 'client' flag in socket objects, currently unused, to indicate whether it is to be used as a client or server socket	2020-09-11 10:24:36 -07:00
Evan Hunt	55896df79d	use handles for isc_nm_pauseread() and isc_nm_resumeread() by having these functions act on netmgr handles instead of socket objects, they can be used in callback functions outside the netgmr.	2020-07-13 13:17:08 -07:00
Evan Hunt	23c7373d68	restore "blackhole" functionality the blackhole ACL was accidentally disabled with respect to client queries during the netmgr conversion. in order to make this work for TCP, it was necessary to add a return code to the accept callback functions passed to isc_nm_listentcp() and isc_nm_listentcpdns().	2020-06-30 17:29:09 -07:00
Evan Hunt	591b79b597	Make netmgr tcpdns send calls asynchronous isc__nm_tcpdns_send() was not asynchronous and accessed socket internal fields in an unsafe manner, which could lead to a race condition and subsequent crash. Fix it by moving tcpdns processing to a proper netmgr thread.	2020-06-26 00:19:42 -07:00
Evan Hunt	3704c4fff2	clean up outerhandle when a tcpdns socket is disconnected this prevents a crash when some non-netmgr thread, such as a recursive lookup, times out after the TCP socket is already disconnected.	2020-06-26 00:19:42 -07:00
Evan Hunt	75c985c07f	change the signature of recv callbacks to include a result code this will allow recv event handlers to distinguish between cases in which the region is NULL because of error, shutdown, or cancelation.	2020-06-19 12:33:26 -07:00
Witold Kręcicki	cd79b49538	allow tcpdns sockets to self-reference while connected A TCPDNS socket creates a handle for each complete DNS message. Previously, when all the handles were disconnected, the socket would be closed, but the wrapped TCP socket might still have more to read. Now, when a connection is established, the TCPDNS socket creates a reference to itself by attaching itself to sock->self. This reference isn't cleared until the connection is closed via EOF, timeout, or server shutdown. This allows the socket to remain open even when there are no active handles for it.	2020-06-19 09:39:50 +02:00
Evan Hunt	5ea26ee1f1	modify reference counting within netmgr - isc__nmhandle_get() now attaches to the sock in the nmhandle object. the caller is responsible for dereferencing the original socket pointer when necessary. - tcpdns listener sockets attach sock->outer to the outer tcp listener socket. tcpdns connected sockets attach sock->outerhandle to the handle for the tcp connected socket. - only listener sockets need to be attached/detached directly. connected sockets should only be accessed and reference-counted via their associated handles.	2020-06-19 09:39:50 +02:00
Evan Hunt	9e740cad21	make isc_nmsocket_{attach,detach}{} functions private there is no need for a caller to reference-count socket objects. they need tto be able tto close listener sockets (i.e., those returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close() function has been added for that. other sockets are only accessed via handles.	2020-06-19 09:39:50 +02:00
Witold Kręcicki	5fedd21e16	netmgr refactoring: use generic functions when operating on sockets. tcpdns used transport-specific functions to operate on the outer socket. Use generic ones instead, and select the proper call in netmgr.c. Make the missing functions (e.g. isc_nm_read) generic and add type-specific calls (isc__nm_tcp_read). This is the preparation for netmgr TLS layer.	2020-03-24 20:31:43 +00:00
Witold Kręcicki	4b9962d4a3	Only use tcpdns timer if it's initialized.	2020-03-05 23:13:39 +01:00
Witold Kręcicki	ae1499ca19	Fix TCPDNS socket closing issues	2020-03-05 18:02:27 +00:00
Ondřej Surý	5777c44ad0	Reformat using the new rules	2020-02-14 09:31:05 +01:00
Evan Hunt	e851ed0bb5	apply the modified style	2020-02-13 15:05:06 -08:00
Ondřej Surý	f50b1e0685	Use clang-format to reformat the source files	2020-02-12 15:04:17 +01:00
Witold Kręcicki	8d6dc8613a	clean up some handle/client reference counting errors in error cases. We weren't consistent about who should unreference the handle in case of network error. Make it consistent so that it's always the client code responsibility to unreference the handle - either in the callback or right away if send function failed and the callback will never be called.	2020-01-20 22:28:36 +01:00
Witold Kręcicki	eda4300bbb	netmgr: have a single source of truth for tcpdns callback We pass interface as an opaque argument to tcpdns listening socket. If we stop listening on an interface but still have in-flight connections the opaque 'interface' is not properly reference counted, and we might hit a dead memory. We put just a single source of truth in a listening socket and make the child sockets use that instead of copying the value from listening socket. We clean the callback when we stop listening.	2020-01-15 17:22:13 +01:00
Evan Hunt	80a5c9f5c8	associate socket stats counters with netmgr socket objects - the socket stat counters have been moved from socket.h to stats.h. - isc_nm_t now attaches to the same stats counter group as isc_socketmgr_t, so that both managers can increment the same set of statistics - isc__nmsocket_init() now takes an interface as a paramter so that the address family can be determined when initializing the socket. - based on the address family and socket type, a group of statistics counters will be associated with the socket - for example, UDP4Active with IPv4 UDP sockets and TCP6Active with IPv6 TCP sockets. note that no counters are currently associated with TCPDNS sockets; those stats will be handled by the underlying TCP socket. - the counters are not actually used by netmgr sockets yet; counter increment and decrement calls will be added in a later commit.	2020-01-13 14:05:02 -08:00
Diego Fronza	ed9853e739	Fix tcp-highwater stats updating After the network manager rewrite, tcp-higwater stats was only being updated when a valid DNS query was received over tcp. It turns out tcp-quota is updated right after a tcp connection is accepted, before any data is read, so in the event that some client connect but don't send a valid query, it wouldn't be taken into account to update tcp-highwater stats, that is wrong. This commit fix tcp-highwater to update its stats whenever a tcp connection is established, independent of what happens after (timeout/invalid request, etc).	2019-12-12 11:23:10 -08:00
Evan Hunt	31b3980ef0	shorten some names reduce line breaks and general unwieldiness by changing some function, type, and parameter names.	2019-12-09 21:44:04 +01:00
Witold Kręcicki	a34ced776e	Remove read callback before detaching from inner socket in tcpdns	2019-12-09 21:44:04 +01:00
Witold Kręcicki	b0779cc429	netmgr: Add more DbC checks for asynchronous calls.	2019-12-09 21:44:04 +01:00
Witold Kręcicki	3e66b7ba1c	Fix a race in tcpdns close with uv_close on timer stop timers before closing netmgr: tcpdns_close needs to be asynchronous, it manipulates sock->timer	2019-12-09 21:43:45 +01:00
Evan Hunt	b05194160b	style, comments	2019-12-09 11:15:27 -08:00
Witold Kręcicki	8c5aaacbef	- Add separate priority event queue for events that must be processed even when worker is paused (e.g. interface reconfiguration). This is needed to prevent deadlocks when reconfiguring interfaces - as network manager is paused then, but we still need to stop/start listening. - Proper handling of TCP listen errors in netmgr - bind to the socket first, then return the error code.	2019-12-09 11:15:27 -08:00
Witold Kręcicki	5a65ec0aff	Add uv_handle_{get,set}_data functions that's absent in pre-1.19 libuv to make code clearer. This might be removed when we stop supporting older libuv versions.	2019-12-09 11:15:27 -08:00
Witold Kręcicki	37354ee225	netmgr: fix TCP backlog and client quota count - add support for TCP backlog, using the value provided by config. - don't attach to TCP client quota for listening sockets, only connected sockets.	2019-11-22 16:46:32 -08:00
Evan Hunt	199bd6b623	netmgr: make TCP timeouts configurable - restore support for tcp-initial-timeout, tcp-idle-timeout, tcp-keepalive-timeout and tcp-advertised-timeout configuration options, which were ineffective previously.	2019-11-22 16:46:31 -08:00
Witold Kręcicki	b7a72b1667	netmgr: TCP improvements - add timeout support for TCP and TCPDNS connections to protect against slowloris style attacks. currently, all timeouts are hard-coded. - rework and simplify the TCPDNS state machine.	2019-11-22 16:46:31 -08:00
Evan Hunt	123ee350dc	place a limit on pipelined queries that can be processed simultaneously when the TCPDNS_CLIENTS_PER_CONN limit has been exceeded for a TCP DNS connection, switch to sequential mode to ensure that memory cannot be exhausted by too many simultaneous queries.	2019-11-17 18:59:39 -08:00

1 2 3

101 Commits