mir/bind

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-29 05:28:00 +00:00

Author	SHA1	Message	Date
Mark Andrews	66d1df57cb	Report which assertion failed when calling set_global_error	2021-06-03 11:55:31 +10:00
Ondřej Surý	f14d870d15	Fix copy&paste error in setsockopt_off Because of copy&paste error the setsockopt_off macro would enable the socket option instead of disabling it.	2021-06-02 17:47:14 +02:00
Ondřej Surý	67afea6cfc	Cleanup the remaining of HAVE_UV_<func> macros While cleaning up the usage of HAVE_UV_<func> macros, we forgot to cleanup the HAVE_UV_UDP_CONNECT in the actual code and HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail on uv_udp_send() because the socket was already connected and we were falsely assuming that it was not. The platforms with autoconf support were not affected, because we were still checking for the functions from the configure.	2021-06-02 11:23:36 +02:00
Artem Boldariev	35d0027f36	HTTP/2 write buffering This commit adds the ability to consolidate HTTP/2 write requests if there is already one in flight. In that case, the code consolidates multiple subsequent write requests into a larger one, which uses the network more efficiently by creating larger TCP packets and by reducing TLS record overhead (large TLS records instead of multiple small ones). This optimisation is especially effective for clients that create many concurrent HTTP/2 streams over a transport connection at once. This way, the code might create a small number of multi-kilobyte requests instead of many 50-120 byte ones. In fact, it turned out to work so well that I had to add a work-around to the code to ensure compatibility with flamethrower, which, at the time of writing, does not support TLS records larger than two kilobytes. Now the code tries to flush the write buffer after 1.5 kilobytes, which is still adequate for our use case. Essentially, this commit implements a recommendation given by the nghttp2 library: https://nghttp2.org/documentation/nghttp2_session_mem_send.html	2021-06-01 21:07:45 +03:00
Ondřej Surý	7670f98377	Add isc_task_getnetmgr() function Add a function to pull the attached netmgr from inside the executed task. This is needed for any task that needs to call the netmgr API.	2021-05-31 14:52:05 +02:00
Ondřej Surý	87fe97ed91	Add asynchronous work API to the network manager libuv has support for running long-running tasks in dedicated thread pools, so they don't affect networking I/O. This commit adds an isc_nm_work_enqueue() wrapper that wraps the libuv API and runs it on top of the associated worker loop. The only limitation is that the function must be called from inside a network manager thread, so the call to the function should be wrapped inside a (bound) task.	2021-05-31 14:52:05 +02:00
Ondřej Surý	211bfefbaa	Use UV_VERSION_HEX to decide whether we need libuv shim functions Instead of having a configure check for every missing function that has been added in later version of libuv, we now use UV_VERSION_HEX to decide whether we need the shim or not.	2021-05-31 14:52:05 +02:00
Ondřej Surý	7477d1b2ed	Add uv_os_getenv() and uv_os_setenv() compatibility shims The uv_os_getenv() and uv_os_setenv() functions were introduced in the libuv >= 1.12.0. Add simple compatibility shims for older versions.	2021-05-31 14:52:05 +02:00
Ondřej Surý	f752840db3	Add uv_req_get_data() and uv_req_set_data() compatibility shims The uv_req_get_data() and uv_req_set_data() functions were introduced in libuv >= 1.19.0, so we need to add compatibility shims with older libuv versions.	2021-05-31 14:52:05 +02:00
Mark Andrews	d68b009cfe	Remove priority from attribute constructor/destructor On some platforms, the __attribute__ constructor and destructor don't take priorities and the compilation failed. One such platform is macOS. For this reason, the constructor/destructor in libisc was reworked to not use priorities, but to have a single constructor and destructor that call the appropriate routines in the correct order. This commit removes the extra priority because it's no longer needed and it also breaks compilation on macOS with GCC 10.	2021-05-27 08:02:21 +02:00
Mark Andrews	715a2c7fc1	Add missing initialisations configuring with --enable-mutex-atomics flagged these incorrectly initialised variables on systems where pthread_mutex_init doesn't just zero out the structure.	2021-05-26 08:15:08 +00:00
Ondřej Surý	a227562f13	Cleanup the struct isc_nmiface In previous MR, I forgot to remove the `struct isc_nmiface`, this commit rectifies that.	2021-05-26 09:55:10 +02:00
Ondřej Surý	50270de8a0	Refactor the interface handling in the netmgr The isc_nmiface_t type was holding just a single isc_sockaddr_t, so we got rid of the datatype and use plain isc_sockaddr_t in place where isc_nmiface_t was used before. This means less type-casting and shorter path to access isc_sockaddr_t members. At the same time, instead of keeping the reference to the isc_sockaddr_t that was passed to us when we start listening, we will keep a local copy. This prevents the data race on destruction of the ns_interface_t objects where pending nmsockets could reference the sockaddr of already destroyed ns_interface_t object.	2021-05-26 09:43:12 +02:00
Ondřej Surý	28b65d8256	Reduce the number of clientmgr objects created Previously, as a way of reducing the contention between threads, a clientmgr object would be created for each interface/IP address. With tasks being more strictly bound to netmgr workers, this is no longer needed and we can just create one clientmgr object per worker queue (ncpus). Each clientmgr object then has a single task and a single memory context.	2021-05-24 20:44:54 +02:00
Ondřej Surý	4db5e30177	Run shutdown events with the task's existing threadid Previously, task->threadid was reassigned to 0 while shutting down, which caused an assertion.	2021-05-24 20:02:20 +02:00
Ondřej Surý	0be7ea78be	Reduce the number of client tasks and bind them to netmgr queues Since a client object is bound to a netmgr handle, each client will always be processed by the same netmgr worker, so we can simplify the code by binding client->task to the same thread as the client. Since ns__client_request() now runs in the same event loop as client->task events, it is no longer necessary to pause the task manager before launching them. Also removed some functions in isc_task that were not used.	2021-05-24 20:02:20 +02:00
Artem Boldariev	67c50abe5a	Add DoH quota tests This commit adds unit tests which ensure that DoH code is compatible with quota functionality.	2021-05-19 10:28:47 +03:00
Mark Andrews	7e83c6df94	initialise worker->cond_prio	2021-05-18 07:47:42 +00:00
Ondřej Surý	9e3cb396b2	Replace netmgr quantum with loop-preventing barrier Instead of using fixed quantum, this commit adds atomic counter for number of items on each queue and uses the number of netievents scheduled to run as the limit of maximum number of netievents for a single process_queue() run. This prevents the endless loops when the netievent would schedule more netievents onto the same loop, but we don't have to pick "magic" number for the quantum.	2021-05-17 11:59:19 +02:00
Ondřej Surý	4509089419	Add configuration option to set send/recv buffers on the nm sockets This commit adds a new configuration option to set the receive and send buffer sizes on the TCP and UDP netmgr sockets. The default is `0`, which doesn't set any value and just uses the value set by the operating system. There's no magic value here: set it too small and performance will drop; set it too large and the buffers can fill up with queries that have already timed out on the client side, so nobody is interested in the answer, and this would just make the server clog up even more by making it do useless work. `netstat -su` can be used on POSIX systems to monitor the receive and send buffer errors.	2021-05-17 08:47:09 +02:00
Ondřej Surý	cd413234f7	Fix the outgoing UDP socket selection on Windows The outgoing UDP socket selection would pick an uninitialized child socket on Windows, because we have more netmgr workers than we have listening sockets. This commit fixes the selection by keeping the outgoing socket the same, so it's always run on an existing socket.	2021-05-13 15:04:48 +02:00
Artem Boldariev	bab9309231	Fix DoH unit tests logic This commit fixes logic bugs in DoH test suite revealed by making DoH not to call nghttp2_session_terminate_session() in server-side code.	2021-05-13 10:42:25 +03:00
Artem Boldariev	6816a741ca	Fix crash in TLS caused by improper handling of shutdown messages The problem was found when flamethrower was accidentally run in DoT mode against DoH port.	2021-05-13 10:42:25 +03:00
Artem Boldariev	1947f6372d	Limit the number of active concurrent HTTP/2 streams The initial intent was to limit the number of concurrent streams by the value of 100 but due to the error when reading the documentation it was set to the maximum possible number of streams per session. This could lead to security issues, e.g. a remote attacker could have taken down the BIND instance by creating lots of sessions via low number of transport connections. This commit fixes that.	2021-05-13 10:42:25 +03:00
Artem Boldariev	d80d1b0dd9	Do not allow empty DoH endpoints to be added It was possible to specify empty DoH endpoint in BIND's configuration file: that was an error, we should not allow doing so.	2021-05-13 10:42:25 +03:00
Artem Boldariev	9155a87528	Do not call nghttp2_session_terminate_session() in server-side code We should not call nghttp2_session_terminate_session() in server-side code after all of the active HTTP/2 streams are processed. The underlying transport connection is expected to remain opened at least for some time in this case for new HTTP/2 requests to arrive. That is what flamethrower was expecting and it makes perfect sense from the HTTP/2 perspective.	2021-05-13 10:42:25 +03:00
Mark Andrews	0f6ae9000a	initialise sock->cond	2021-05-11 14:06:26 +02:00
Ondřej Surý	3713a38689	Bump the netmgr quantum to 1024 During the stress testing, it was discovered that the default netmgr quantum of 128 is not enough and there was a performance drop for TCP on FreeBSD. Bumping the default quantum to 1024 solves the performance issue and is still enough to prevent the endless loops.	2021-05-10 21:32:31 +02:00
Ondřej Surý	e623c12757	Destroy reference to taskmgr after all tasks are done We were clearing the pointer to taskmgr as soon as isc_taskmgr_destroy() would be called and before all tasks were finished. Unfortunately, some tasks would use global named_g_taskmgr objects from inside the events and this would cause either a data race or NULL pointer dereference. This commit fixes the data race by moving the destruction of the referenced pointer to the time after all tasks are finished.	2021-05-10 12:13:27 -07:00
Ondřej Surý	6c57a6cc3d	Add isc_taskmgr_detach when task is created while shutting down When taskmgr is shutting down, creating a task would attach to the taskmgr, but would not detach on the error condition.	2021-05-10 11:39:51 +02:00
Ondřej Surý	0133096c88	improvements to socket_test - be more strict, but patient, waiting for event completion. - use an atomic pointer for the socket to silence TSAN warnings.	2021-05-07 14:28:33 -07:00
Ondřej Surý	365c6a9851	ensure interlocked netmgr events run on worker[0] Network manager events that require interlock (pause, resume, listen) are now always executed in the same worker thread, mgr->workers[0], to prevent races. "stoplistening" events no longer require interlock.	2021-05-07 14:28:32 -07:00
Evan Hunt	c44423127d	fix shutdown deadlocks - ensure isc_nm_pause() and isc_nm_resume() work the same whether run from inside or outside of the netmgr. - promote 'stop' events to the priority event level so they can run while the netmgr is pausing or paused. - when pausing, drain the priority queue before acquiring an interlock; this prevents a deadlock when another thread is waiting for us to complete a task. - release interlock after pausing, reacquire it when resuming, so that stop events can happen. some incidental changes: - use a function to enqueue pause and resume events (this was part of a different change attempt that didn't work out; I kept it because I thought it was more readable). - make mgr->nworkers a signed int to remove some annoying integer casts.	2021-05-07 14:28:32 -07:00
Ondřej Surý	4c8f6ebeb1	Use barriers for netmgr synchronization The netmgr listening, stoplistening, pausing and resuming functions now use barriers for synchronization, which makes the code much simpler. isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier on platforms where that works, and pthread_barrier where it doesn't (including TSAN builds).	2021-05-07 14:28:32 -07:00
Ondřej Surý	2eae7813b6	Run isc__nm_http_stoplistening() synchronously in netmgr When isc__nm_http_stoplistening() is run from inside the netmgr, we need to make sure it's run synchronously. This commit is just a band-aid though, as the desired behavior for isc_nm_stoplistening() is not always the same: 1. When run from an outside user of the interface, the call must be synchronous, e.g. the calling code expects the call to really stop listening on the interfaces. 2. But if there's a call from listen<proto> when listening fails, that needs to be scheduled to run asynchronously, because isc_nm_listen<proto> is being run in a paused (interlocked) netmgr thread and we could get stuck. The proper solution would be to make isc_nm_stoplistening() behave like uv_close(), i.e., to have a proper callback.	2021-05-07 14:28:32 -07:00
Evan Hunt	5c08f97791	only run tasks as privileged if taskmgr is in privileged mode all zone loading tasks have the privileged flag, but we only want them to run as privileged tasks when the server is being initialized; if we privilege them the rest of the time, the server may hang for a long time after a reload/reconfig. so now we call isc_taskmgr_setmode() to turn privileged execution mode on or off in the task manager. isc_task_privileged() returns true if the task's privilege flag is set and the taskmgr is in privileged execution mode. this is used to determine in which netmgr event queue the task should be run.	2021-05-07 14:28:30 -07:00
Ondřej Surý	29a208aaf7	Fix crash when allocating UDP socket fails on OpenBSD When socket() call fails, the UDP connect code would call the connectcb with empty req->handle. This has been fixed.	2021-05-07 14:28:30 -07:00
Ondřej Surý	dacf586e18	Make the netmgr queue processing quantized There was a theoretical possibility of clogging up the queue processing with an endless loop where the netievent currently being processed would schedule a new netievent that would get processed immediately. This wasn't such a problem when only netmgr netievents were processed, but with the addition of the tasks, there are at least two situations where this could happen: 1. In lib/dns/zone.c:setnsec3param() the task would get re-enqueued when the zone was not yet fully loaded. 2. Tasks have an internal quantum for the maximum number of isc_events to be processed; when the task quantum is reached, the task would get rescheduled and then immediately processed by the netmgr queue processing. As the isc_queue doesn't have a mechanism to atomically move the queue, this commit adds a mechanism to quantize the queue, so enqueueing new netievents will never stop processing other uv_loop_t events. The default quantum size is 128. Since the queue used in the network manager allows items to be enqueued more than once, tasks are now reference-counted around task_ready() and task_run(). task_ready() now has a public API wrapper, isc_task_ready(), that the netmgr can use to reschedule processing of a task if the quantum has been reached. Incidental changes: Cleaned up some unused fields left in isc_task_t and isc_taskmgr_t after the last refactoring, and changed atomic flags to atomic_bools for easier manipulation.	2021-05-07 14:28:30 -07:00
Ondřej Surý	b5bf58b419	Destroy netmgr before destroying taskmgr With taskmgr running on top of netmgr, the ordering of how the tasks and netmgr shutdown interacts was wrong as previously isc_taskmgr_destroy() was waiting until all tasks were properly shutdown and detached. This responsibility was moved to netmgr, so we now need to do the following: 1. shutdown all the tasks - this schedules all shutdown events onto the netmgr queue 2. shutdown the netmgr - this also makes sure all the tasks and events are properly executed 3. Shutdown the taskmgr - this now waits for all the tasks to finish running before returning 4. Shutdown the netmgr - this call waits for all the netmgr netievents to finish before returning This solves the race when the taskmgr object would be destroyed before all the tasks were finished running in the netmgr loops.	2021-05-07 14:28:30 -07:00
Ondřej Surý	a011d42211	Add new isc_managers API to simplify <*>mgr create/destroy Previously, netmgr, taskmgr, timermgr and socketmgr all had their own isc_<*>mgr_create() and isc_<*>mgr_destroy() functions. The new isc_managers_create() and isc_managers_destroy() fold all four into a single function and make sure the objects are created and destroyed in the correct order. Especially now, when taskmgr runs on top of netmgr, the correct order is important, and when the code was duplicated in many places it was easy to make a mistake. The former isc_<*>mgr_create() and isc_<*>mgr_destroy() functions were made private and a single call to isc_managers_create() and isc_managers_destroy() is required at program startup / shutdown.	2021-05-07 10:19:05 -07:00
Artem Boldariev	8c0ea01f34	DoH: close active server streams when finishing session Under some circumstances a situation might occur when server-side session gets finished while there are still active HTTP/2 streams. This would lead to isc_nm_httpsocket object leaks. This commit fixes this behaviour as well as refactors failed_read_cb() to allow better code reuse.	2021-05-07 15:47:24 +03:00
Artem Boldariev	a9e97f28b7	Fix crash in client side DoH code This commit fixes a situation when a cstream object could get unlinked from the list as a result of a cstream->read_cb call. Thus, unlinking it after the call could crash the program.	2021-05-07 15:47:24 +03:00
Artem Boldariev	cd178043d9	Make some TLS tests actually use quota A directive to check quota was missing from some of the TLS tests which were supposed to test TLS code with quotas.	2021-05-07 15:47:24 +03:00
Artem Boldariev	22376fc69a	TLS: cancel reading on the underlying TCP socket after (see below) ... the last handle has been detached after calling write callback. That makes it possible to detach from the underlying socket and not to keep the socket object alive for too long. This issue was causing TLS tests with quota to fail because quota might not have been detached on time (because it was still referenced by the underlying TCP socket). One could say that this commit is an ideological continuation of: 513cdb52ecd4e63566672217f7390574f68c4d2d.	2021-05-07 15:47:24 +03:00
Artem Boldariev	3bf331c453	Fix crashes in TLS when handling TLS shutdown messages This commit fixes some situations which could appear in TLS code when dealing with shutdown messages and lead to crashes.	2021-05-07 15:47:24 +03:00
Artem Boldariev	0d3f503dc9	Avoid creating connect netievents during low level failures in HTTP This way we create less netievent objects, not bombarding NM with the messages in case of numerous low-level errors (like too many open files) in e.g. unit tests.	2021-05-07 15:47:24 +03:00
Artem Boldariev	0e8ac61d6e	Avoid creating httpclose netievents in case of low level failures This way we create less load on NM workers by avoiding netievent creation.	2021-05-07 15:47:24 +03:00
Artem Boldariev	8510c5cd59	Always call TCP connect callback from within a worker context This change ensures that a TCP connect callback is called from within the context of a worker thread in case of a low-level error when descriptors cannot be created (e.g. when there are too many open file descriptors).	2021-05-07 15:47:24 +03:00
Artem Boldariev	1349142333	Got rid of tlsconnect event and corresponding code We do not need it since we decided to not return values from connect functions.	2021-05-07 15:47:24 +03:00
Artem Boldariev	39448c1581	Finish HTTP session on write failure Not doing so caused client-side code to not free file descriptors as soon as possible, that was causing unit tests to fail.	2021-05-07 15:47:24 +03:00
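
Commit 9e3cb396b2 above replaces the fixed netmgr quantum with a per-pass limit: the number of events already queued when process_queue() starts. The following is a minimal single-threaded Python model of that idea, not BIND's C code. The `handle` callback, the `requeue` handler, and the use of `len(queue)` in place of the atomic per-queue counter are assumptions made for illustration.

```python
from collections import deque


def process_queue(queue, handle):
    """Run at most as many events as were queued when this pass began."""
    # Snapshot taken once, before any handler runs. In the commit this role
    # is played by an atomic counter; len() is a single-threaded stand-in.
    budget = len(queue)
    processed = 0
    while processed < budget and queue:
        event = queue.popleft()
        handle(event, queue)  # a handler may append follow-up events
        processed += 1
    return processed


# Hypothetical handler: every event re-enqueues itself. Draining the queue
# until it is empty would never terminate with this handler.
def requeue(event, queue):
    queue.append(event)


pending = deque(["a", "b", "c"])
ran = process_queue(pending, requeue)
print(ran, list(pending))
```

Events appended during a pass stay queued for the next pass, so the caller can return to the event loop between passes without choosing a fixed quantum.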

4339 Commits