When shutting down, calling the isc__nm_failed_connect_cb() was delayed
until the connect callback would be called. It turned out that the
connect callback might not get called at all when the socket is being
shut down. Call the failed_connect_cb() directly in the
tlsdns_shutdown() instead of waiting for the connect callback to call it.
After a partial write the tls.senddata buffer would be rearranged to
contain only the data tha wasn't sent and the len part would be made
shorter, which would lead to attempt to free only part of a socket's
tls.senddata buffer.
The tlsdns_cycle() might call uv_write() to write data to the socket,
when this happens and the socket is shutdown before the callback
completes, the uvreq structure was not freed because the callback would
be called with non-zero status code.
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit. The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.
This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
The udp, tcpdns and tlsdns contained lot of cut&paste code or code that
was very similar making the stack harder to maintain as any change to
one would have to be copied to the the other protocols.
In this commit, we merge the common parts into the common functions
under isc__nm_<foo> namespace and just keep the little differences based
on the socket type.
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.
This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.
This commit changes the timeout logic in this way:
* On TCP connection accept the tcp-initial-timeout is applied
and the timer is started
* When we are processing and/or sending any DNS message the timer is
stopped
* When we stop processing all DNS messages, the tcp-idle-timeout
is applied and the timer is started again
The system tests were missing a test that would test tcp-initial-timeout
and tcp-idle-timeout.
This commit adds new "timeouts" system test that adds:
* Test that waits longer than tcp-initial-timeout and then checks
whether the socket was closed
* Test that sends and receives DNS message then waits longer than
tcp-initial-timeout but shorter time than tcp-idle-timeout than
sends DNS message again than waits longer than tcp-idle-timeout
and checks whether the socket was closed
* Similar test, but bursting 25 DNS messages than waiting longer than
tcp-initial-timeout and shorter than tcp-idle-timeout than do second
25 DNS message burst
* Check whether transfer longer than tcp-initial-timeout succeeds
Add a test for freezing, manually updating, and then thawing a dynamic
zone with "dnssec-policy". In the kasp system test we add parameters
to the "update_is_signed" check to signal the indicated IP addresses
for the labels "a" and "d". If set to '-', the test is skipped.
After nsupdating the dynamic.kasp zone, we revert the update (with
nsupdate) and update the zone again, but now with the freeze/thaw
approach.
Dynamic zones with dnssec-policy could not be thawed because KASP
zones were considered always dynamic. But a dynamic KASP zone should
also check whether updates are disabled.
This commit fixes loading the certificate chain files so that the full
chain could be sent to the clients which require that for
verification. Before that fix only the top most certificate would be
loaded from the chain and sent to clients preventing some of them to
perform certificate validation (e.g. Windows 10 DoH client).
The transport should also be detached when we skip a master, otherwise
named will crash when sending a SOA query to the next master over TLS,
because the transport must be NULL when we enter
'dns_view_gettransport'.
When we query the resolver for a domain name that is in the same zone
for which is already one or more fetches outstanding, we could
potentially hit the fetch limits. If so, recursion fails immediately
for the incoming query and if serve-stale is enabled, we may try to
return a stale answer.
If the resolver is also is authoritative for the parent zone (for
example the root zone), first a delegation is found, but we first
check the cache for a better response.
Nothing is found in the cache, so we try to recurse to find the
answer to the query.
Because of fetch-limits 'dns_resolver_createfetch()' returns an error,
which 'ns_query_recurse()' propagates to the caller,
'query_delegation_recurse()'.
Because serve-stale is enabled, 'query_usestale()' is called,
setting 'qctx->db' to the cache db, but leaving 'qctx->version'
untouched. Now 'query_lookup()' is called to search for stale data
in the cache database with a non-NULL 'qctx->version'
(which is set to a zone db version), and thus we hit an assertion
in rbtdb.
This crash was introduced in 'main' by commit
8bcd7fe69e5343071fc917738d6092a8b974ef3f.