When the outgoing TCP dispatch times-out active response, we might still
receive the answer during the lifetime of the connection. Previously,
we would just ignore any non-matching DNS answers, which would allow the
server to feed us with otherwise valid DNS answer and keep the
connection open.
Add a counter for timed-out DNS queries over TCP and tear down the whole
TCP connection if we receive unexpected number of DNS answers.
Previously, when invalid DNS message is received over TCP we throw the
garbage DNS message away and continued looking for valid DNS message
that would match our outgoing queries. This logic makes sense for UDP,
because anyone can send DNS message over UDP.
Change the logic that the TCP connection is closed when we receive
garbage, because the other side is acting malicious.
When outgoing TCP connection was prematurely terminated (f.e. with
connection reset), the dispatch code would not cleanup the resources
used by such connection leading to dangling dns_dispentry_t entries.
When a UDP dispatch receives a mismatched response, it checks whether
there is still enough time to wait for the correct one to arrive before
the timeout fires. If there is not, the result code is set to
ISC_R_TIMEDOUT, but it is not subsequently used anywhere as 'response'
is set to NULL a few lines earlier. This results in the higher-level
read callback (resquery_response() in case of resolver code) not being
called. However, shortly afterwards, a few levels up the call chain,
isc__nm_udp_read_cb() calls isc__nmsocket_timer_stop() on the dispatch
socket, effectively disabling read timeout handling for that socket.
Combined with the fact that reading is not restarted in such a case
(e.g. by calling dispatch_getnext() from udp_recv()), this leads to the
higher-level query structure remaining referenced indefinitely because
the dispatch socket it uses will neither be read from nor closed due to
a timeout. This in turn causes fetch contexts to linger around
indefinitely, which in turn i.a. prevents certain cache nodes (those
containing rdatasets used by fetch contexts, like fctx->nameservers)
from being cleaned.
Fix by making sure the higher-level callback does get invoked with the
ISC_R_TIMEDOUT result code when udp_recv() determines there is no more
time left to receive the correct UDP response before the timeout fires.
This allows the higher-level callback to clean things up, preventing the
reference leak described above.
there was a race possible in which a dispatch was put into
the 'connected' state before it had a TCP handle attached,
which could cause an assertion failure in dns_dispatch_gettcp().
Renamed some functions for clarity and readability:
- dns_dispatch_addresponse() -> dns_dispatch_add()
- dns_dispatch_removeresponse() -> dns_dispatch_done()
The dns_dispatch_cancel() function now calls dns_dispatch_done()
directly, so it is no longer ever necessary to call both functions.
dns_dispatch_cancel() is used to terminate dispatch connections
that are still pending, while dns_dispatch_done() is used when they
are complete.
udp_recv() will call dispatch_getnext() if the message received is
invalid or doesn't match; we need to reduce the timeout each time this
happens so we can't be starved forever by someone sending garbage
packets.
- disp_connected() has been split into two functions,
udp_connected() (which takes 'resp' as an argument) and
tcp_connected() (which takes 'disp', and calls the connect callbacks
for all pending resps).
- In dns_dispatch_connect(), if a connection is already open, we need to
detach the dispentry immediately because we won't be running
tcp_connected().
- dns_disptach_cancel() also now calls the connect callbacks for pending
TCP responses, and the response callbacks for open TCP connections
waiting on read.
- If udp_connected() runs after dns_dispatch_cancel() has been called,
ensure that the caller's connect callback is run.
- If a UDP connection fails with EADDRINUSE, we try again up to five
times with a different local port number before giving up.
- If a TCP connection is canceled while still pending connection, the
connect timeout may still fire. we attach the dispatch before
connecting to ensure that it won't be detached too soon in this case.
- The dispentry is no longer removed from the pending list when
deactivating, so that the connect callback can still be run if
dns_dispatch_removeresponse() was run while the connecting was
pending.
- Rewrote dns_dispatch_gettcp() to avoid a data race.
- startrecv() and dispatch_getnext() can be called with a NULL resp when
using TCP.
- Refactored udp_recv() and tcp_recv() and added result logging.
- EOF is now treated the same as CANCELED in response callbacks.
- ISC_R_SHUTTINGDOWN is sent to the reponse callbacks for all resps if
tcp_recv() is triggered by a netmgr shutdown. (response callbacks
are *not* sent by udp_recv() in this case.)
- startrecv() and getnext() have been rewritten.
- Don't set TCP flag when connecting a UDP dispatch.
- Prevent TCP connections from trying to connect twice.
- dns_dispatch_gettcp() can now find a matching TCP dispatch that has
not yet fully connected, and attach to it. when the connection is
completed, the connect callbacks are run for all of the pending
entries.
- An atomic 'state' variable is now used for connection state instead of
attributes.
- When dns_dispatch_cancel() is called on a TCP dispatch entry, only
that one entry is canceled. the dispatch itself should not be shut
down until there are no dispatch entries left associated with it.
- Other incidental cleanup, including removing DNS_DISPATCHATTR_IPV4 and
_IPV6 (they were being set in the dispatch attributes but never used),
cleaning up dns_requestmgr_create(), and renaming dns_dispatch_read()
to the more descriptive dns_dispatch_resume().
- It is no longer necessary to pass a 'timeout' callback to
dns_dispatch_addresponse(); timeouts are handled directly by the
'response' callback instead.
- The netmgr handle is no longer passed to dispatch callbacks, since
they don't (and can't) use it. instead, dispatch_cb_t now takes a
result code, region, and argument.
- Cleaned up timeout-related tests in dispatch_test.c
- Responses received by the dispatch are no longer sent to the caller
via a task event, but via a netmgr-style recv callback. the 'action'
parameter to dns_dispatch_addresponse() is now called 'response' and
is called directly from udp_recv() or tcp_recv() when a valid response
has been received.
- All references to isc_task and isc_taskmgr have been removed from
dispatch functions.
- All references to dns_dispatchevent_t have been removed and the type
has been deleted.
- Added a task to the resolver response context, to be used for fctx
events.
- When the caller cancels an operation, the response handler will be
called with ISC_R_CANCELED; it can abort immediately since the caller
will presumably have taken care of cleanup already.
- Cleaned up attach/detach in resquery and request.
Remove the debugging printfs. (leaving this as a separate commit rather
than squashing it so we can restore it in the future if we ever need it
again.)
Since every dispsock was associated with a dispentry anyway (though not
always vice versa), the members of dispsock have been combined into
dispentry, which is now reference-counted. dispentry objects are now
attached before connecting and detached afterward to prevent races
between the connect callback and dns_dispatch_removeresponse().
Dispatch and dispatchmgr objects are now reference counted as well, and
the shutdown process has been simplified. reference counting of
resquery and request objects has also been cleaned up significantly.
dns_dispatch_cancel() now flags a dispentry as having been canceled, so
that if the connect callback runs after cancellation, it will not
initiate a read.
The isblackholed() function has been simplified.
- The `timeout_action` parameter to dns_dispatch_addresponse() been
replaced with a netmgr callback that is called when a dispatch read
times out. this callback may optionally reset the read timer and
resume reading.
- Added a function to convert isc_interval to milliseconds; this is used
to translate fctx->interval into a value that can be passed to
dns_dispatch_addresponse() as the timeout.
- Note that netmgr timeouts are accurate to the millisecond, so code to
check whether a timeout has been reached cannot rely on microsecond
accuracy.
- If serve-stale is configured, then a timeout received by the resolver
may trigger it to return stale data, and then resume waiting for the
read timeout. this is no longer based on a separate stale timer.
- The code for canceling requests in request.c has been altered so that
it can run asynchronously.
- TCP timeout events apply to the dispatch, which may be shared by
multiple queries. since in the event of a timeout we have no query ID
to use to identify the resp we wanted, we now just send the timeout to
the oldest query that was pending.
- There was some additional refactoring in the resolver: combining
fctx_join() and fctx_try_events() into one function to reduce code
duplication, and using fixednames in fetchctx and fetchevent.
- Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the
code can be simplified.
The flow of operations in dispatch is changing and will now be similar
for both UDP and TCP queries:
1) Call dns_dispatch_addresponse() to assign a query ID and register
that we'll be listening for a response with that ID soon. the
parameters for this function include callback functions to inform the
caller when the socket is connected and when the message has been
sent, as well as a task action that will be sent when the response
arrives. (later this could become a netmgr callback, but at this
stage to minimize disruption to the calling code, we continue to use
isc_task for the response event.) on successful completion of this
function, a dispatch entry object will be instantiated.
2) Call dns_dispatch_connect() on the dispatch entry. this runs
isc_nm_udpconnect() or isc_nm_tcpdnsconnect(), as needed, and begins
listening for responses. the caller is informed via a callback
function when the connection is established.
3) Call dns_dispatch_send() on the dispatch entry. this runs
isc_nm_send() to send a request.
4) Call dns_dispatch_removeresponse() to terminate listening and close
the connection.
Implementation comments below:
- As we will be using netmgr buffers now. code to send the length in
TCP queries has also been removed as that is handled by the netmgr.
- TCP dispatches can be used by multiple simultaneous queries, so
dns_dispatch_connect() now checks whether the dispatch is already
connected before calling isc_nm_tcpdnsconnect() again.
- Running dns_dispatch_getnext() from a non-network thread caused a
crash due to assertions in the netmgr read functions that appear to be
unnecessary now. the assertions have been removed.
- fctx->nqueries was formerly incremented when the connection was
successful, but is now incremented when the query is started and
decremented if the connection fails.
- It's no longer necessary for each dispatch to have a pool of tasks, so
there's now a single task per dispatch.
- Dispatch code to avoid UDP ports already in use has been removed.
- dns_resolver and dns_request have been modified to use netmgr callback
functions instead of task events. some additional changes were needed
to handle shutdown processing correctly.
- Timeout processing is not yet fully converted to use netmgr timeouts.
- Fixed a lock order cycle reported by TSAN (view -> zone-> adb -> view)
by by calling dns_zt functions without holding the view lock.
We now use dns_dispatch_cancel() for this purpose. NOTE: The caller
still has to track whether there are pending send or connect events in
the dispatch or dispatch entry; later this should be moved into the
dispatch module as well.
Also removed some public dns_dispatch_*() API calls that are no longer
used outside dispatch itself.
dns_dispatch_connect() connects a dispatch socket (for TCP) or a
dispatch entry socket (for UDP). This is the next step in moving all
uses of the isc_socket code into the dispatch module.
This API is temporary; it needs to be cleaned up further so that it can
be called the same way for both TCP and UDP.
Continuing the effort to move all uses of the isc_socket API into
dispatch.c, this commit removes the dns_tcpmsg module entirely, as
dispatch was its only caller, and moves the parts of its functionality
that were being used into the dispatch module.
This code will be removed when we switch to using netmgr TCPDNS.
Previously, creation of TCP dispatches differed from UDP in that a TCP
dispatch was created to attach to an existing socket, whereas a UDP
dispatch would be created in a vacuum and sockets would be opened on
demand when a transaction was initiated.
We are moving as much socket code as possible into the dispatch module,
so that it can be replaced with a netmgr version as easily as
possible. (This will also have the side effect of making TCP and UDP
dispatches more similar.)
As a step in that direction, this commit changes
dns_dispatch_createtcp() so that it creates the TCP socket.
- Many dispatch attributes can be set implicitly instead of being passed
in. we can infer whether to set DNS_DISPATCHATTR_TCP or _UDP from
whether we're calling dns_dispatch_createtcp() or _createudp(). we
can also infer DNS_DISPATCHATTR_IPV4 or _IPV6 from the addresses or
the socket that were passed in.
- We no longer use dup'd sockets in UDP dispatches, so the 'dup_socket'
parameter has been removed from dns_dispatch_createudp(), along with
the code implementing it. also removed isc_socket_dup() since it no
longer has any callers.
- The 'buffersize' parameter was ignored and has now been removed;
buffersize is now fixed at 4096.
- Maxbuffers and maxrequests don't need to be passed in on every call to
dns_dispatch_createtcp() and _createudp().
In all current uses, the value for mgr->maxbuffers will either be
raised once from its default of 20000 to 32768, or else left
alone. (passing in a value lower than 20000 does not lower it.) there
isn't enough difference between these values for there to be any need
to configure this.
The value for disp->maxrequests controls both the quota of concurrent
requests for a dispatch and also the size of the dispatch socket
memory pool. it's not clear that this quota is necessary at all. the
memory pool size currently starts at 32768, but is sometimes lowered
to 4096, which is definitely unnecessary.
This commit sets both values permanently to 32768.
- Previously TCP dispatches allocated their own separate QID table,
which didn't incorporate a port table. this commit removes
per-dispatch QID tables and shares the same table between all
dispatches. since dispatches are created for each TCP socket, this may
speed up the dispatch allocation process. there may be a slight
increase in lock contention since all dispatches are sharing a single
QID table, but since TCP sockets are used less often than UDP
sockets (which were already sharing a QID table), it should not be a
substantial change.
- The dispatch port table was being used to determine whether a port was
already in use; if so, then a UDP socket would be bound with
REUSEADDR. this commit removes the port table, and always binds UDP
sockets that way.
Currently the netmgr doesn't support unconnected, shared UDP sockets, so
there's no reason to retain that functionality in the dispatcher prior
to porting to the netmgr.
In this commit, the DNS_DISPATCHATTR_EXCLUSIVE attribute has been
removed as it is now non-optional; UDP dispatches are alwasy exclusive.
Code implementing non-exclusive UDP dispatches has been removed.
dns_dispatch_getentrysocket() now always returns the dispsocket for UDP
dispatches and the dispatch socket for TCP dispatches.
There is no longer any need to search for existing dispatches from
dns_dispatch_getudp(), so the 'mask' option has been removed, and the
function renamed to the more descriptive dns_dispatch_createudp().
- style cleanup
- removed NULL checks in places where they are not currently needed
- use isc_refcount for dispatch reference counting
- revised code flow for readability
- remove some #ifdefs that are no longer relevant
- remove unused struct members
- removed unnecessary function parameters
- use C99 struct initialization
- UDP buffersize is now established when creating dispatch manager
and is always set to 4096.
- Set up the default port range in dispatchmgr before setting the magic
number.
- Magic is not set until dispatchmgr is fully created.
- DNS_DISPATCHATTR_CANREUSE was never set. the code that implements it
has been removed.
- DNS_DISPATCHOPT_FIXEDID and DNS_DISPATCHATTR_FIXEDID were both
defined, but only the DISPATCHOPT was ever set; it appears the
DISPATCHATTR was added accidentally.
- DNS_DISPATCHATTR_NOLISTEN was set but never used.
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
cppcheck 2.2 reports the following false positive:
lib/dns/dispatch.c:1239:14: warning: Either the condition 'resp==NULL' is redundant or there is possible null pointer dereference: resp. [nullPointerRedundantCheck]
if (disp != resp->disp) {
^
lib/dns/dispatch.c:1210:11: note: Assuming that condition 'resp==NULL' is not redundant
if (resp == NULL) {
^
lib/dns/dispatch.c:1239:14: note: Null pointer dereference
if (disp != resp->disp) {
^
Apparently this version of cppcheck gets confused about conditional
"goto" statements because line 1239 can never be reached if 'resp' is
NULL.
Move a code block to prevent the above false positive from being
reported without affecting the processing logic.
Also disable the semantic patch as the code needs tweaks here and there because
some destroy functions might not destroy the object and return early if the
object is still in use.
The isc_mempool_create() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_mempool_create() now returns void.