When closing the socket that is actively reading from the stream, the
read_cb() could be called between uv_close() and close callback when the
server socket has been already detached hence using sock->statichandle
after it has been already freed.
When networking statistics was added to the netmgr (in commit
5234a8e00a6ae1df738020f27544594ccb8d5215), two lines were added that
increment the 'STATID_RECVFAIL' statistic: One if 'uv_read_start'
fails and one at the end of the 'read_cb'. The latter happens
if 'nread < 0'.
According to the libuv documentation, I/O read callbacks (such as for
files and sockets) are passed a parameter 'nread'. If 'nread' is less
than 0, there was an error and 'UV_EOF' is the end of file error, which
you may want to handle differently.
In other words, we should not treat EOF as a RECVFAIL error.
If we clone the csock (children socket) in TCP accept_connection()
instead of passing the ssock (server socket) to the call back and
cloning it there we unbreak the assumption that every socket is handled
inside it's own worker thread and therefore we can get rid of (at least)
callback locking.
The isc__nm_tcpdns_stoplistening() would call isc__nmsocket_clearcb()
that would clear the .accept_cb from non-netmgr thread. Change the
tcpdns_stoplistening to enqueue ievent that would get processed in the
right netmgr thread to avoid locking.
On POSIX based systems both uv_os_sock_t and uv_os_fd_t are both typedef
to int. That's not true on Windows, where uv_os_sock_t is SOCKET and
uv_os_fd_t is HANDLE and they differ in level of indirection.
The isc__nm_socket_freebind() has been refactored to match other
isc__nm_socket_...() helper functions and take uv_os_fd_t and
sa_family_t as function arguments.
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
control channel connection objects now contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- cmdhandle: held while an rndc command is running
httpd connections contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- rename isc_nmsocket_t->tcphandle to statichandle
- cancelread functions now take handles instead of sockets
- add a 'client' flag in socket objects, currently unused, to
indicate whether it is to be used as a client or server socket
Each worker has a receive buffer with space for 20 DNS messages of up
to 2^16 bytes each, and the allocator function passed to uv_read_start()
or uv_udp_recv_start() will reserve a portion of it for use by sockets.
UDP can use recvmmsg() and so it needs that entire space, but TCP reads
one message at a time.
This commit introduces separate allocator functions for TCP and UDP
setting different buffer size limits, so that libuv will provide the
correct buffer sizes to each of them.
When a new IPv6 interface/address appears it's first in a tentative
state - in which we cannot bind to it, yet it's already being reported
by the route socket. Because of that BIND9 is unable to listen on any
newly detected IPv6 addresses. Fix it by setting IP_FREEBIND option (or
equivalent option on other OSes) and then retrying bind() call.
There's a possibility of race in isc__nm_tcpconnect if the asynchronous
connect operation finishes with all the callbacks before we exit the
isc__nm_tcpconnect itself we might access an already freed memory.
Fix it by creating an additional reference to the socket freed at the
end of isc__nm_tcpconnect.
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
The isc_nm_cancelread() function cancels reading on a connected
socket and calls its read callback function with a 'result'
parameter of ISC_R_CANCELED.
the isc_nm_tcpconnect() function establishes a client connection via
TCP. once the connection is esablished, a callback function will be
called with a newly created network manager handle.
A TCPDNS socket creates a handle for each complete DNS message.
Previously, when all the handles were disconnected, the socket
would be closed, but the wrapped TCP socket might still have
more to read.
Now, when a connection is established, the TCPDNS socket creates
a reference to itself by attaching itself to sock->self. This
reference isn't cleared until the connection is closed via
EOF, timeout, or server shutdown. This allows the socket to remain
open even when there are no active handles for it.
- isc__nmhandle_get() now attaches to the sock in the nmhandle object.
the caller is responsible for dereferencing the original socket
pointer when necessary.
- tcpdns listener sockets attach sock->outer to the outer tcp listener
socket. tcpdns connected sockets attach sock->outerhandle to the handle
for the tcp connected socket.
- only listener sockets need to be attached/detached directly. connected
sockets should only be accessed and reference-counted via their
associated handles.
there is no need for a caller to reference-count socket objects.
they need tto be able tto close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
There's a possibility of a race in TCP accepting code:
T1 accepts a connection C1
T2 accepts a connection C2
T1 tries to accept a connection C3, but we hit a quota,
isc_quota_cb_init() sets quota_accept_cb for the socket,
we return from accept_connection
T2 drops C2, but we race in quota_release with accepting C3 so
we don't see quota->waiting is > 0, we don't launch the callback
T1 accepts a connection C4, we are able to get the quota we clear
the quota_accept_cb from sock->quotacb
T1 drops C1, tries to call the callback which is zeroed, sigsegv.
Instead of using bind() and passing the listening socket to the children
threads using uv_export/uv_import use one thread that does the accepting,
and then passes the connected socket using uv_export/uv_import to a random
worker. The previous solution had thundering herd problems (all workers
waking up on one connection and trying to accept()), this one avoids this
and is simpler.
The tcp clients quota is simplified with isc_quota_attach_cb - a callback
is issued when the quota is available.
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
tcpdns used transport-specific functions to operate on the outer socket.
Use generic ones instead, and select the proper call in netmgr.c.
Make the missing functions (e.g. isc_nm_read) generic and add type-specific
calls (isc__nm_tcp_read). This is the preparation for netmgr TLS layer.
In tcp and udp stoplistening code we accessed libuv structures
from a different thread, which caused a shutdown crash when named
was under load. Also added additional DbC checks making sure we're
in a proper thread when accessing uv_ functions.
- isc__netievent_storage_t was to small to contain
isc__netievent__socket_streaminfo_t on Windows
- handle isc_uv_export and isc_uv_import errors properly
- rewrite isc_uv_export and isc_uv_import on Windows
If a connection was closed early (right after accept()) an assertion
that assumed that the connection was still alive could be triggered
in accept_connection. Handle those errors properly and not with
assertions, free all the resources afterwards.
- the socket stat counters have been moved from socket.h to stats.h.
- isc_nm_t now attaches to the same stats counter group as
isc_socketmgr_t, so that both managers can increment the same
set of statistics
- isc__nmsocket_init() now takes an interface as a paramter so that
the address family can be determined when initializing the socket.
- based on the address family and socket type, a group of statistics
counters will be associated with the socket - for example, UDP4Active
with IPv4 UDP sockets and TCP6Active with IPv6 TCP sockets. note
that no counters are currently associated with TCPDNS sockets; those
stats will be handled by the underlying TCP socket.
- the counters are not actually used by netmgr sockets yet; counter
increment and decrement calls will be added in a later commit.
- use UV_{TC,UD}P_IPV6ONLY for IPv6 sockets, keeping the pre-netmgr
behaviour.
- add a new listening_error bool flag which is set if the child
listener fails to start listening. This fixes a bug where named would
hang if, e.g., we failed to bind to a TCP socket.
For multithreaded TCP listening we need to pass a bound socket to all
listening threads. Instead of using uv_pipe handle passing method which
is quite complex (lots of callbacks, each of them with its own error
handling) we now use isc_uv_export() to export the socket, pass it as a
member of the isc__netievent_tcpchildlisten_t structure, and then
isc_uv_import() it in the child thread, simplifying the process
significantly.
- make tcp listening IPC pipe name saner
- put the pipe in /tmp on unices
- add pid to the pipe name to avoid conflicts between processes
- fsync directory in which the pipe resides to make sure that the
child threads will see it and be able to open it