

Netmgr

Netmgr (aka rainbow duck) is the new networking system for BIND. It is based on libuv, but it does not expose any of the libuv API, in order to keep the netmgr API agnostic of the underlying library.

A bit of history

Networking in BIND9 up to 9.12 works with a single event loop (epoll() on Linux, kqueue() on FreeBSD, etc.).

When a client wants to read from a socket, it creates a socket event associated with a task that will receive this event. An isc_socket_{read,write,etc.} operation tries to read directly from the socket; if it succeeds, it sends the socket event to the task provided by the caller. If it doesn't, it adds an event to an event loop; when this event is received, the listener is re-set and an internal task is launched to read the data from the socket. After the internal task is done, it launches the task from the socket event provided by the caller. This means that a simple socket operation can cause many context switches.

9.14 fixed some of these issues by having multiple event loops in separate threads (one per CPU), that can read the data immediately and then call the socket event, but this is still sub-optimal.

Basic concepts

isc_nm_t

The isc_nm_t structure represents the network manager itself. It contains a configurable number (generally the same as the number of CPUs) of 'networker' objects, each of which represents a thread for executing networking events.

The manager contains flags to indicate whether it has been paused or interlocked, and counters for the number of workers running and the number of workers paused.

Each networker object contains a queue of incoming asynchronous events and a pool of buffers into which messages will be copied when received.

isc_nmsocket_t

isc_nmsocket_t is a wrapper around a libuv socket. It is configured with

isc_nmhandle_t

An isc_nmhandle_t object represents an interface that can be read or written. For TCP it's a socket, and for UDP it's a socket with a peer address. It is always associated with one and only one isc_nmsocket_t object.

When a handle object is allocated, it may be allocated with a block of 'extra' space in which another object will be stored that is associated with that handle: for example, an ns_client_t structure storing information about an incoming request.

The handle is reference-counted. When the reference count drops to zero, the handle calls the 'reset' callback for its associated object and places itself on a stack of inactive handles in its corresponding isc_nmsocket_t structure, so that it can be quickly reused when the next incoming message is received. When the handle is freed instead (which may happen if the socket's inactive-handles stack is full, or when the socket is destroyed), the associated object's 'put' callback is called to free any resources it allocated.

UDP listening

UDP listener sockets automatically create an array of 'child' sockets, each associated with one networker, all listening on the same address via SO_REUSEADDR. (The parent's reference counter is used for the parent and all child sockets together; none are destroyed until there are no remaining references to any of them.)

TCP listening

A TCP listener socket cannot listen on multiple threads in parallel, so receiving a TCP connection can cause a context switch, but this is expected to be rare enough not to impact performance significantly.

When connected, a TCP socket will attach to the system-wide TCP clients quota.

TCP listening for DNS

A TCPDNS listener is a wrapper around a TCP socket which specifically handles DNS traffic, including the two-byte length field that precedes each DNS message sent over TCP.

Other wrapper socket types can be added in the future, such as a TLS socket wrapper to implement encryption or an HTTP wrapper to implement the HTTP protocol. This will enable the system to have a transport-neutral network manager socket over which DNS can be sent without knowing anything about transport, encryption, etc.