This is a part of the work that intends to make the netmgr stable,
testable, maintainable and tested. It contains numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a
complete whole.
NOTE: There's quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject of refactoring in the future.
The changes that are included in this commit are listed here
(extensively, but not exclusively):
* The netmgr_test unit test was split into individual tests (udp_test,
tcp_test, tcpdns_test and the newly added tcp_quota_test)
* The udp_test and tcp_test have been extended to allow programmatic
failures from the libuv API. Unfortunately, we can't use cmocka
mock() and will_return(), so we emulate the behaviour by #define-ing
the libuv functions and including the netmgr/{udp,tcp}.c source files
directly (a sketch of the trick follows below).
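For illustration, a minimal sketch of the interception trick, with
hypothetical wrapper and variable names (not the actual test code):

```c
#include <uv.h>

/*
 * Sketch only: the libuv symbol is redirected to a local wrapper before
 * the netmgr source file is included, so the wrapper can inject
 * failures. uv_udp_recv_start_result and __wrap_uv_udp_recv_start are
 * hypothetical names.
 */
static int uv_udp_recv_start_result = 0; /* 0 = success, <0 = injected error */

static int
__wrap_uv_udp_recv_start(uv_udp_t *handle, uv_alloc_cb alloc_cb,
			 uv_udp_recv_cb recv_cb) {
	if (uv_udp_recv_start_result != 0) {
		/* simulate a libuv failure requested by the test */
		return (uv_udp_recv_start_result);
	}
	return (uv_udp_recv_start(handle, alloc_cb, recv_cb));
}

#define uv_udp_recv_start __wrap_uv_udp_recv_start
#include "netmgr/udp.c" /* compile the real implementation with the wrapper */
```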
* The netievents that we put on the nm queue have a variable number of
members; of these, the isc_nmsocket_t and isc_nmhandle_t always
need to be attached before enqueueing the netievent_<foo> and
detached after we have called the isc_nm_async_<foo>, to ensure that
the socket (handle) doesn't disappear between scheduling the event and
actually executing it (see the sketch below).
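As an illustration of that rule, a minimal sketch with simplified
stand-ins for the real netievent types and helpers:

```c
/*
 * Illustrative sketch only: every type and function below is a
 * simplified stand-in for the real netmgr ones (netievent_<foo>,
 * isc_nm_async_<foo>, socket attach/detach, the event queue).
 */
typedef struct nmsocket nmsocket_t;

typedef struct netievent_foo {
	nmsocket_t *sock; /* attached reference, owned by the event */
} netievent_foo_t;

/* hypothetical helpers provided elsewhere */
void nmsocket_attach(nmsocket_t *sock, nmsocket_t **target);
void nmsocket_detach(nmsocket_t **sockp);
netievent_foo_t *get_netievent_foo(nmsocket_t *sock);
void enqueue_ievent(nmsocket_t *sock, netievent_foo_t *ievent);
void async_foo(netievent_foo_t *ievent);

static void
schedule_foo(nmsocket_t *sock) {
	netievent_foo_t *ievent = get_netievent_foo(sock);

	/* attach before enqueueing: keeps the socket alive until the
	 * event has actually been executed */
	nmsocket_attach(sock, &ievent->sock);
	enqueue_ievent(sock, ievent);
}

/* runs later, on the netthread that owns the socket */
static void
process_foo(netievent_foo_t *ievent) {
	async_foo(ievent);

	/* detach only after the async handler has finished */
	nmsocket_detach(&ievent->sock);
}
```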
* Cancelling an in-flight TCP connection using libuv requires calling
uv_close() on the original uv_tcp_t handle, which breaks too many
assumptions we have in the netmgr code. Instead of using a uv_timer for
TCP connection timeouts, we use a platform-specific socket option
(sketched below).
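A hedged sketch of setting such an option on the raw socket descriptor;
which option exists, and its unit, varies by platform, and the names
below (TCP_USER_TIMEOUT, TCP_CONNECTIONTIMEOUT) are examples rather
than a statement of what the netmgr actually uses:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Best-effort sketch: ask the kernel to abort a TCP connection attempt
 * after roughly timeout_ms milliseconds.
 */
static int
set_connect_timeout(int fd, unsigned int timeout_ms) {
#if defined(TCP_CONNECTIONTIMEOUT)
	/* macOS; value documented in seconds */
	int secs = (int)(timeout_ms / 1000);
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONNECTIONTIMEOUT, &secs,
			   sizeof(secs)));
#elif defined(TCP_USER_TIMEOUT)
	/* Linux; value in milliseconds */
	unsigned int ms = timeout_ms;
	return (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms,
			   sizeof(ms)));
#else
	(void)fd;
	(void)timeout_ms;
	return (0); /* no supported option; keep the default behaviour */
#endif
}
```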
* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
When isc_nm_listentcp() or isc_nm_tcpconnect() was called, it waited,
using a condition variable and a mutex, for the socket either to end up
in an error state (that path was fine) or to become listening or
connected. Several things could happen:
0. everything is ok
1. the waiting thread would miss the SIGNAL() because the enqueued
event would be processed faster than we could start WAIT()ing.
If the operation ended with an error this was still ok, as the
error variable would be unchanged.
2. the waiting thread would miss the moment when sock->{connected,listening}
was set to `true`, because it would already be set back to `false` in
the tcp_{listen,connect}close_cb(); the connection would be so
short-lived that the socket would be closed before we could even start
WAIT()ing (the predicate-based wait sketched below avoids both races).
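For reference, the usual way to make such a wait safe is to re-check a
persistent predicate under the lock instead of relying on catching the
SIGNAL(); a minimal pthread sketch (illustrative, not the actual netmgr
code):

```c
#include <pthread.h>
#include <stdbool.h>

/* Illustrative state shared between the waiting caller and the callback. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool done = false; /* set by the callback, never cleared early */
static int result = 0;    /* result/error code set by the callback */

static void
callback(int r) {
	pthread_mutex_lock(&lock);
	result = r;
	done = true; /* record the fact, not just the signal */
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

static int
wait_for_completion(void) {
	pthread_mutex_lock(&lock);
	while (!done) {
		/* a missed signal cannot be lost: the flag persists */
		pthread_cond_wait(&cond, &lock);
	}
	pthread_mutex_unlock(&lock);
	return (result);
}
```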
* The tcpdns has been converted to use libuv directly. Previously,
the tcpdns protocol used the tcp protocol from the netmgr; this proved
to be very complicated to understand, fix and change. The new
tcpdns protocol is modeled in a way similar to the tcp netmgr protocol.
Closes: #2194, #2283, #2318, #2266, #2034, #1920
* The tcp and tcpdns protocols no longer use isc_uv_import/isc_uv_export
to pass accepted TCP sockets between netthreads, but instead (similarly
to UDP) use a per-netthread uv_loop listener. This greatly reduces the
complexity, as the socket always runs in the associated nm and uv
loops, and we are also not touching the libuv internals.
There's an unfortunate side effect though: the new code requires
support for load-balanced sockets from the operating system for both
UDP and TCP (see #2137). If the operating system doesn't support
load-balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
on FreeBSD 12+; see the sketch below), the number of netthreads is
limited to 1.
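For illustration, a hedged sketch of enabling the load-balanced socket
option (the helper is illustrative; the real logic lives in the netmgr
socket setup code):

```c
#include <sys/socket.h>

/*
 * Enable kernel load-balancing of incoming packets/connections across
 * multiple sockets bound to the same address and port.  Returns 0 on
 * success, -1 if the platform offers no such option.
 */
static int
set_load_balancing(int fd) {
#if defined(SO_REUSEPORT_LB)
	int on = 1; /* FreeBSD 12+ */
	return (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT_LB, &on, sizeof(on)));
#elif defined(SO_REUSEPORT)
	int on = 1; /* Linux */
	return (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)));
#else
	(void)fd;
	return (-1); /* no support: the caller limits netthreads to 1 */
#endif
}
```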
* The netmgr now has two debugging #ifdefs:
1. The already existing NETMGR_TRACE prints any dangling nmsockets and
nmhandles before triggering an assertion failure. This option
reduces performance when enabled, but in theory it could be enabled
on low-performance systems.
2. A new NETMGR_TRACE_VERBOSE option has been added that enables
extensive netmgr logging that allows the software engineer to
precisely track any attach/detach operations on the nmsockets and
nmhandles. This is not suitable for any kind of production
machine, only for debugging.
* The tlsdns netmgr protocol has been split from the tcpdns and it still
uses the old method of stacking the netmgr boxes on top of each other.
We will have to refactor the tlsdns netmgr protocol to use the same
approach: build the stack using only libuv and openssl.
* Limit, but do not assert, the tcp buffer size in tcp_alloc_cb
Closes: #2061
1. The isc__nm_tcp_send() and isc__nm_tcp_read() were not checking
whether the socket was still alive and were scheduling reads/sends on
a closed socket.
2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
changed to always return the error conditions via the callbacks, so
they always succeed. This applies to all protocols (UDP, TCP and
TCPDNS); a sketch of the resulting callback-only error handling follows.
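For illustration, a hedged sketch of what callback-only error handling
looks like on the caller side (the connection type and its field are
hypothetical stand-ins):

```c
#include <isc/netmgr.h>
#include <isc/result.h>

/* hypothetical per-connection state carried through cbarg */
typedef struct my_conn {
	isc_nmhandle_t *readhandle;
} my_conn_t;

/*
 * With the read reporting every outcome through the callback, timeouts,
 * cancellation and socket errors are all handled in one place.
 */
static void
read_cb(isc_nmhandle_t *handle, isc_result_t eresult, isc_region_t *region,
	void *cbarg) {
	my_conn_t *conn = cbarg;

	if (eresult != ISC_R_SUCCESS) {
		/* timeouts, cancellation and socket errors all land here */
		isc_nmhandle_detach(&conn->readhandle);
		return;
	}
	/* ... process region->base / region->length ... */
	(void)handle;
}
```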
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause an abort), so we change the function
to return void and clean up the calls.
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation (a simplified
sketch follows after the lists below). For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
control channel connection objects now contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- cmdhandle: held while an rndc command is running
httpd connections contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
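For illustration, a simplified sketch of the pattern described above;
the field names follow the lists, but the real structures contain many
more members:

```c
#include <isc/netmgr.h>

/*
 * Simplified sketch of the per-operation handle pointers held by an
 * ns_client; control channel and httpd connections follow the same
 * pattern with their own readhandle/sendhandle/cmdhandle members.
 */
struct ns_client_sketch {
	isc_nmhandle_t *reqhandle;    /* request callback pending */
	isc_nmhandle_t *sendhandle;   /* send callback pending */
	isc_nmhandle_t *fetchhandle;  /* recursive fetch in flight */
	isc_nmhandle_t *updatehandle; /* update-forwarding task in flight */
};
```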
Now that the log message has been printed, set the result code to
DNS_R_FORMERR. We don't do this via dns_result_torcode() as we
don't want upstream errors to produce FORMERR if that processing
ends with DNS_R_BADTSIG.
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
NS_CLIENT_TCP_BUFFER_SIZE was 2 bytes too large following the
move to netmgr and the associated changes to lib/ns/client.c;
as a result, an INSIST could be triggered if the DNS message being
constructed had a checkpoint stage that fell in those two extra
bytes. Adjusted NS_CLIENT_TCP_BUFFER_SIZE and cleaned up
client_allocsendbuf() now that the previously reserved 2 bytes
are no longer used.
when built with "configure --enable-singletrace", named will produce
detailed query logging at the highest debug level for any query with
query ID zero.
this enables monitoring of the progress of a single query by specifying
the QID using "dig +qid=0". the "client" logging category should be set
to a low severity level to suppress logging of other queries. (the
chance of another query using QID=0 at the same time is only 1 in 2^16.)
"--enable-singletrace" turns on "--enable-querytrace" as well, so if the
logging severity is not lowered, all other queries will be logged
verbosely as well. compiling with either of these options will impair
query performance; they should only be turned on when testing or
troubleshooting.
In the case of a normal fetch, the .recursionquota is attached and
ns_statscounter_recursclients is incremented when the fetch is created. Then
the .recursionquota is detached and the counter decremented in
fetch_callback().
In the case of a prefetch or rpzfetch, the quota is attached, but the counter
is not incremented. When we reach the soft quota, the function returns early
but doesn't detach from the quota; the quota gets destroyed during
ns_client_endrequest(), so no memory was leaked.
But because ns_statscounter_recursclients is only incremented during the
normal fetch, the counter would be incorrectly decremented on two occasions:
1) When we reached the soft quota, because the quota was not properly detached
2) When the prefetch or rpzfetch was cancelled mid-flight and the callback
function was never called.
Fixes a race between ns_client_killoldestquery and ns_client_endrequest:
killoldestquery takes a client from the `recursing` list while endrequest
destroys the client object, and killoldestquery then works on a destroyed
client object. Prevent this by holding the reclist lock while cancelling
the query.
Fix a potential assertion failure on shutdown in ns__client_endrequest.
Scenario:
1. We are shutting down, interface->clientmgr is gone.
2. We receive a packet, it gets through ns__client_request
3. mgr == NULL, return
4. isc_nmhandle_detach calls ns_client_reset_cb
5. ns_client_reset_cb calls ns_client_endrequest
6. INSIST(client->state == NS_CLIENTSTATE_WORKING ||
client->state == NS_CLIENTSTATE_RECURSING) is not met
- we haven't started processing this packet so
client->state == NS_CLIENTSTATE_READY.
As a solution, don't do anything in ns_client_reset_cb if the client
is still in the READY state (a sketch follows).
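A minimal sketch of that guard, with simplified stand-ins for the real
ns_client state machinery:

```c
/*
 * Simplified sketch: if request processing never started, there is
 * nothing to end, so the endrequest step is skipped entirely.
 */
typedef enum {
	CLIENTSTATE_READY,
	CLIENTSTATE_WORKING,
	CLIENTSTATE_RECURSING
} clientstate_t;

typedef struct client {
	clientstate_t state;
} client_t;

void client_endrequest(client_t *client); /* hypothetical stand-in */

static void
client_reset_cb(void *arg) {
	client_t *client = arg;

	if (client->state == CLIENTSTATE_READY) {
		/* the packet was never processed; nothing to clean up */
		return;
	}
	client_endrequest(client);
	/* ... further per-request cleanup ... */
}
```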
Also disable the semantic patch, as the code needs tweaks here and there
because some destroy functions might not destroy the object but return
early if the object is still in use.
The isc_buffer_allocate() function now cannot fail with ISC_R_NOMEMORY.
This commit removes all the checks on the return code using the semantic
patch from the previous commit, as isc_buffer_allocate() now returns void.
We weren't consistent about who should unreference the handle in
case of a network error. Make it consistent so that it's always the
client code's responsibility to unreference the handle, either
in the callback, or right away if the send function failed and the
callback will never be called (see the sketch below).
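A minimal sketch of the resulting pattern, assuming isc_nm_send() still
reports immediate failures through its return value (the sendctx type
is a hypothetical stand-in for the caller's per-operation state):

```c
#include <isc/netmgr.h>
#include <isc/result.h>
#include <isc/util.h>

/* hypothetical per-operation state carried through cbarg */
typedef struct sendctx {
	isc_nmhandle_t *sendhandle;
} sendctx_t;

static void
send_cb(isc_nmhandle_t *handle, isc_result_t eresult, void *cbarg) {
	sendctx_t *ctx = cbarg;

	UNUSED(handle);
	UNUSED(eresult);
	/* callback path: release the reference taken for this send */
	isc_nmhandle_detach(&ctx->sendhandle);
}

static void
start_send(isc_nmhandle_t *handle, isc_region_t *region, sendctx_t *ctx) {
	isc_result_t result;

	isc_nmhandle_attach(handle, &ctx->sendhandle);
	result = isc_nm_send(ctx->sendhandle, region, send_cb, ctx);
	if (result != ISC_R_SUCCESS) {
		/* the callback will never run; release the reference now */
		isc_nmhandle_detach(&ctx->sendhandle);
	}
}
```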
If the taskmgr is shutting down, ns_client_setup() will fail to create
a task for the newly created client; we weren't cleaning up the already
created/attached things (memory context, server, clientmgr).
We pass the interface as an opaque argument to the tcpdns listening socket.
If we stop listening on an interface but still have in-flight connections,
the opaque 'interface' is not properly reference counted, and we might
hit dead memory. We keep just a single source of truth in the listening
socket and make the child sockets use that instead of copying the
value from the listening socket. We clear the callback when we stop listening.
After the network manager rewrite, the tcp-highwater stats were only being
updated when a valid DNS query was received over tcp.
It turns out tcp-quota is updated right after a tcp connection is
accepted, before any data is read, so if a client connected but didn't
send a valid query, it wouldn't be taken into account when updating the
tcp-highwater stats, which is wrong.
This commit fixes tcp-highwater to update its stats whenever a tcp connection
is established, independent of what happens afterwards (timeout, invalid
request, etc.).
- restore support for tcp-initial-timeout, tcp-idle-timeout,
tcp-keepalive-timeout and tcp-advertised-timeout configuration
options, which were ineffective previously.
when the TCPDNS_CLIENTS_PER_CONN limit has been exceeded for a TCP
DNS connection, switch to sequential mode to ensure that memory cannot
be exhausted by too many simultaneous queries.
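A hedged sketch of that limit check (the limit value, connection type
and helper names here are hypothetical, not the actual tcpdns code):

```c
#include <stdbool.h>

#define CLIENTS_PER_CONN 23 /* hypothetical per-connection pipeline limit */

typedef struct conn {
	int pipelined;   /* queries currently being processed */
	bool sequential; /* stop reading until outstanding queries finish */
} conn_t;

void pause_reading(conn_t *conn); /* hypothetical: stop further reads */

static void
query_received(conn_t *conn) {
	conn->pipelined++;
	if (!conn->sequential && conn->pipelined > CLIENTS_PER_CONN) {
		/* too many simultaneous queries: switch to one-at-a-time */
		conn->sequential = true;
		pause_reading(conn);
	}
}
```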
- ns__client_request() is now called by netmgr with an isc_nmhandle_t
parameter. The handle can then be permanently associated with an
ns_client object.
- The task manager is paused so that isc_task events that may be
triggered during client processing will not fire until after the netmgr is
finished with it. Before any asynchronous event, the client MUST
call isc_nmhandle_ref(client->handle), to prevent the client from
being reset and reused while waiting for an event to process. When
the asynchronous event is complete, isc_nmhandle_unref(client->handle)
must be called to ensure the handle can be reused later (see the sketch
after this list).
- reference counting of client objects is now handled in the nmhandle
object. when the handle references drop to zero, the client's "reset"
callback is used to free temporary resources and reinitialize it,
whereupon the handle (and associated client) is placed in the
"inactive handles" queue. when the system is shut down and the
handles are cleaned up, the client's "put" callback is called to free
all remaining resources.
- because client allocation is no longer handled in the same way,
the '-T clienttest' option has now been removed and is no longer
used by any system tests.
- the unit tests require wrapping the isc_nmhandle_unref() function;
when LD_WRAP is supported, that is used. otherwise we link a
libwrap.so interposer library and use that.
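A minimal sketch of the ref/unref rule from the task manager bullet
above (the client type and schedule_event() are hypothetical stand-ins
for the real ns_client machinery):

```c
#include <isc/netmgr.h>

/* hypothetical container; the real ns_client_t has many more members */
typedef struct client {
	isc_nmhandle_t *handle;
} client_t;

void schedule_event(client_t *client); /* hypothetical async dispatch */

static void
start_async_operation(client_t *client) {
	/* pin the client so it is not reset/reused while the event is pending */
	isc_nmhandle_ref(client->handle);
	schedule_event(client);
}

static void
async_operation_done(client_t *client) {
	/* ... handle the completed event ... */
	/* drop the reference; the handle and client may now be reused */
	isc_nmhandle_unref(client->handle);
}
```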
The double-locked queue implementation is still currently in use
in ns_client, but will be replaced by a fetch-and-add array queue.
This commit moves it from queue.h to list.h so that queue.h can be
used for the new data structure, and cleans up dependencies between
list.h and types.h. Later, when the ISC_QUEUE is no longer in use,
it will be removed completely.
This variable will report the maximum number of simultaneous tcp clients
that BIND has served while running.
It can be verified by running rndc status and inspecting "tcp high-water:
count", or by generating the statistics file (rndc stats) and inspecting
the line with the "TCP connection high-water" text.
The tcp-highwater variable is atomically updated based on the existing
tcp-quota system handled in ns/client.c.
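For illustration, a minimal sketch of such an atomic high-water update
(not the actual BIND code):

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Raise the stored maximum only if the current number of active TCP
 * clients exceeds it; concurrent updates are resolved with CAS.
 */
static atomic_uint_fast32_t tcp_highwater = 0;

static void
update_tcp_highwater(uint_fast32_t curr_active) {
	uint_fast32_t prev = atomic_load(&tcp_highwater);

	while (curr_active > prev &&
	       !atomic_compare_exchange_weak(&tcp_highwater, &prev,
					     curr_active))
	{
		/* prev was refreshed by the failed CAS; retry */
	}
}
```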
This commit changes the BIND cookie algorithms to match
draft-sury-toorop-dnsop-server-cookies-00. Namely, it changes the Client Cookie
algorithm to use SipHash 2-4, adds the new Server Cookie algorithm using SipHash
2-4, and changes the default for the Server Cookie algorithm to be siphash24.
Add siphash24 cookie algorithm, and make it keep legacy aes as