2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 21:17:54 +00:00

994 Commits

Author SHA1 Message Date
Evan Hunt
81a22cbf5f Remove DISPATH_TRACE, RESOLVER_TRACE and REQ_TRACE logging
Remove the debugging printfs. (leaving this as a separate commit rather
than squashing it so we can restore it in the future if we ever need it
again.)
2021-10-02 11:39:56 -07:00
Evan Hunt
7dc54fa6f2 Refactor dispatch, resolver and request
Since every dispsock was associated with a dispentry anyway (though not
always vice versa), the members of dispsock have been combined into
dispentry, which is now reference-counted.  dispentry objects are now
attached before connecting and detached afterward to prevent races
between the connect callback and dns_dispatch_removeresponse().

Dispatch and dispatchmgr objects are now reference counted as well, and
the shutdown process has been simplified.  reference counting of
resquery and request objects has also been cleaned up significantly.

dns_dispatch_cancel() now flags a dispentry as having been canceled, so
that if the connect callback runs after cancellation, it will not
initiate a read.

The isblackholed() function has been simplified.
2021-10-02 11:39:56 -07:00
Evan Hunt
08ce69a0ea Rewrite dns_resolver and dns_request to use netmgr timeouts
- The `timeout_action` parameter to dns_dispatch_addresponse() been
  replaced with a netmgr callback that is called when a dispatch read
  times out.  this callback may optionally reset the read timer and
  resume reading.

- Added a function to convert isc_interval to milliseconds; this is used
  to translate fctx->interval into a value that can be passed to
  dns_dispatch_addresponse() as the timeout.

- Note that netmgr timeouts are accurate to the millisecond, so code to
  check whether a timeout has been reached cannot rely on microsecond
  accuracy.

- If serve-stale is configured, then a timeout received by the resolver
  may trigger it to return stale data, and then resume waiting for the
  read timeout. this is no longer based on a separate stale timer.

- The code for canceling requests in request.c has been altered so that
  it can run asynchronously.

- TCP timeout events apply to the dispatch, which may be shared by
  multiple queries.  since in the event of a timeout we have no query ID
  to use to identify the resp we wanted, we now just send the timeout to
  the oldest query that was pending.

- There was some additional refactoring in the resolver: combining
  fctx_join() and fctx_try_events() into one function to reduce code
  duplication, and using fixednames in fetchctx and fetchevent.

- Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the
  code can be simplified.
2021-10-02 11:39:56 -07:00
Evan Hunt
308bc46a59 Convert dispatch to netmgr
The flow of operations in dispatch is changing and will now be similar
for both UDP and TCP queries:

1) Call dns_dispatch_addresponse() to assign a query ID and register
   that we'll be listening for a response with that ID soon. the
   parameters for this function include callback functions to inform the
   caller when the socket is connected and when the message has been
   sent, as well as a task action that will be sent when the response
   arrives. (later this could become a netmgr callback, but at this
   stage to minimize disruption to the calling code, we continue to use
   isc_task for the response event.) on successful completion of this
   function, a dispatch entry object will be instantiated.

2) Call dns_dispatch_connect() on the dispatch entry. this runs
   isc_nm_udpconnect() or isc_nm_tcpdnsconnect(), as needed, and begins
   listening for responses. the caller is informed via a callback
   function when the connection is established.

3) Call dns_dispatch_send() on the dispatch entry. this runs
   isc_nm_send() to send a request.

4) Call dns_dispatch_removeresponse() to terminate listening and close
   the connection.

Implementation comments below:

- As we will be using netmgr buffers now.  code to send the length in
  TCP queries has also been removed as that is handled by the netmgr.

- TCP dispatches can be used by multiple simultaneous queries, so
  dns_dispatch_connect() now checks whether the dispatch is already
  connected before calling isc_nm_tcpdnsconnect() again.

- Running dns_dispatch_getnext() from a non-network thread caused a
  crash due to assertions in the netmgr read functions that appear to be
  unnecessary now. the assertions have been removed.

- fctx->nqueries was formerly incremented when the connection was
  successful, but is now incremented when the query is started and
  decremented if the connection fails.

- It's no longer necessary for each dispatch to have a pool of tasks, so
  there's now a single task per dispatch.

- Dispatch code to avoid UDP ports already in use has been removed.

- dns_resolver and dns_request have been modified to use netmgr callback
  functions instead of task events. some additional changes were needed
  to handle shutdown processing correctly.

- Timeout processing is not yet fully converted to use netmgr timeouts.

- Fixed a lock order cycle reported by TSAN (view -> zone-> adb -> view)
  by by calling dns_zt functions without holding the view lock.
2021-10-02 11:39:56 -07:00
Evan Hunt
e76a7f764e Move isc_socket_cancel() calls into dispatch
We now use dns_dispatch_cancel() for this purpose. NOTE: The caller
still has to track whether there are pending send or connect events in
the dispatch or dispatch entry; later this should be moved into the
dispatch module as well.

Also removed some public dns_dispatch_*() API calls that are no longer
used outside dispatch itself.
2021-10-02 11:39:56 -07:00
Evan Hunt
2523be1cbe Move isc_socket_sendto2() calls into dispatch
We now use dns_dispatch_send() for this purpose.
2021-10-02 11:39:56 -07:00
Evan Hunt
655e7fcacc Move isc_socket_getsockname() calls into dispatch
We now use dns_dispentry_getlocaladdress(). (this API is likely to be
cleaned up further later.)
2021-10-02 11:39:56 -07:00
Evan Hunt
9f9a327b22 Move isc_socket_connect() calls into dispatch
dns_dispatch_connect() connects a dispatch socket (for TCP) or a
dispatch entry socket (for UDP). This is the next step in moving all
uses of the isc_socket code into the dispatch module.

This API is temporary; it needs to be cleaned up further so that it can
be called the same way for both TCP and UDP.
2021-10-02 11:39:56 -07:00
Evan Hunt
4f30b679e7 Creating TCP dispatch now creates/binds the socket
Previously, creation of TCP dispatches differed from UDP in that a TCP
dispatch was created to attach to an existing socket, whereas a UDP
dispatch would be created in a vacuum and sockets would be opened on
demand when a transaction was initiated.

We are moving as much socket code as possible into the dispatch module,
so that it can be replaced with a netmgr version as easily as
possible. (This will also have the side effect of making TCP and UDP
dispatches more similar.)

As a step in that direction, this commit changes
dns_dispatch_createtcp() so that it creates the TCP socket.
2021-10-02 11:39:34 -07:00
Evan Hunt
f439eb5d99 Dispatch API simplification
- Many dispatch attributes can be set implicitly instead of being passed
  in. we can infer whether to set DNS_DISPATCHATTR_TCP or _UDP from
  whether we're calling dns_dispatch_createtcp() or _createudp().  we
  can also infer DNS_DISPATCHATTR_IPV4 or _IPV6 from the addresses or
  the socket that were passed in.

- We no longer use dup'd sockets in UDP dispatches, so the 'dup_socket'
  parameter has been removed from dns_dispatch_createudp(), along with
  the code implementing it. also removed isc_socket_dup() since it no
  longer has any callers.

- The 'buffersize' parameter was ignored and has now been removed;
  buffersize is now fixed at 4096.

- Maxbuffers and maxrequests don't need to be passed in on every call to
  dns_dispatch_createtcp() and _createudp().

  In all current uses, the value for mgr->maxbuffers will either be
  raised once from its default of 20000 to 32768, or else left
  alone. (passing in a value lower than 20000 does not lower it.) there
  isn't enough difference between these values for there to be any need
  to configure this.

  The value for disp->maxrequests controls both the quota of concurrent
  requests for a dispatch and also the size of the dispatch socket
  memory pool. it's not clear that this quota is necessary at all. the
  memory pool size currently starts at 32768, but is sometimes lowered
  to 4096, which is definitely unnecessary.

  This commit sets both values permanently to 32768.

- Previously TCP dispatches allocated their own separate QID table,
  which didn't incorporate a port table. this commit removes
  per-dispatch QID tables and shares the same table between all
  dispatches. since dispatches are created for each TCP socket, this may
  speed up the dispatch allocation process. there may be a slight
  increase in lock contention since all dispatches are sharing a single
  QID table, but since TCP sockets are used less often than UDP
  sockets (which were already sharing a QID table), it should not be a
  substantial change.

- The dispatch port table was being used to determine whether a port was
  already in use; if so, then a UDP socket would be bound with
  REUSEADDR. this commit removes the port table, and always binds UDP
  sockets that way.
2021-10-02 10:21:49 +02:00
Evan Hunt
9fd375217d Remove DNS_DISPATCHATTR_MAKEQUERY
This attribute was set but was no longer being used.
2021-10-02 10:21:46 +02:00
Evan Hunt
5dcf55da03 Remove support for shared UDP dispatch sockets
Currently the netmgr doesn't support unconnected, shared UDP sockets, so
there's no reason to retain that functionality in the dispatcher prior
to porting to the netmgr.

In this commit, the DNS_DISPATCHATTR_EXCLUSIVE attribute has been
removed as it is now non-optional; UDP dispatches are alwasy exclusive.
Code implementing non-exclusive UDP dispatches has been removed.
dns_dispatch_getentrysocket() now always returns the dispsocket for UDP
dispatches and the dispatch socket for TCP dispatches.

There is no longer any need to search for existing dispatches from
dns_dispatch_getudp(), so the 'mask' option has been removed, and the
function renamed to the more descriptive dns_dispatch_createudp().
2021-10-02 10:21:43 +02:00
Evan Hunt
300392ae2f General code refactoring
- style cleanup
- removed NULL checks in places where they are not currently needed
- use isc_refcount for dispatch reference counting
- revised code flow for readability
- remove some #ifdefs that are no longer relevant
- remove unused struct members
- removed unnecessary function parameters
- use C99 struct initialization
2021-10-02 10:21:38 +02:00
Evan Hunt
ca11f68d61 Simplify dns_dispatchmgr_create with fixed buffersize
- UDP buffersize is now established when creating dispatch manager
  and is always set to 4096.

- Set up the default port range in dispatchmgr before setting the magic
  number.

- Magic is not set until dispatchmgr is fully created.
2021-10-02 10:21:32 +02:00
Mark Andrews
cd985d96e3 Add additional processing to HTTPS and SVBC records
The additional processing method has been expanded to take the
owner name of the record, as HTTPS and SVBC need it to process "."
in service form.

The additional section callback can now return the RRset that was
added.  We use this when adding CNAMEs.  Previously, the recursion
would stop if it detected that a record you added already exists.  With
CNAMEs this rule doesn't work, as you ultimately care about the RRset
at the target of the CNAME and not the presence of the CNAME itself.
Returning the record allows the caller to restart with the target
name.  As CNAMEs can form loops, loop protection was added.

As HTTPS and SVBC can produce infinite chains, we prevent this by
tracking recursion depth and stopping if we go too deep.
2021-08-18 13:49:48 +10:00
Mark Andrews
3c942a3e3a Update out of date comment 2021-07-12 12:33:46 +10:00
Mark Andrews
a3fda086f7 Check that there was no OPT record before falling back
to plain DNS on FORMERR.
2021-07-12 12:30:03 +10:00
Ondřej Surý
7cbfbc8faa Clean up the dns_dispatch_getudp API
Cleanup unused parts of dns_dispatch_getudp API, remove
dns_dispatch_getudp_dup() function and related code.
2021-07-09 15:58:02 +02:00
Ondřej Surý
29c2e52484 The isc/platform.h header has been completely removed
The isc/platform.h header was left empty which things either already
moved to config.h or to appropriate headers.  This is just the final
cleanup commit.
2021-07-06 05:33:48 +00:00
Ondřej Surý
d0d37aa6d1 Don't set memory context name in resolver.c
We now attach to existing memory context instead of creating a new
memory context, so we should not set its name.
2021-05-25 07:25:44 +02:00
Ondřej Surý
28b65d8256 Reduce the number of clientmgr objects created
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.

We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).

Each clientmgr object than would have a single task and single memory
context.
2021-05-24 20:44:54 +02:00
Ondřej Surý
aad7856b8e Don't create per bucket memory contexts in resolver
Similarly, the resolver code would create hundreds of memory contexts
just on the resolver setup.  The contention will be reduced directly in
the allocator, so for now just attach to the view memory instead of
creating separate memory context for each bucket.
2021-05-24 20:02:20 +02:00
Evan Hunt
b0aadaac8e rename dns_name_copynf() to dns_name_copy()
dns_name_copy() is now the standard name-copying function.
2021-05-22 00:37:27 -07:00
Evan Hunt
e31cc1eeb4 use a fixedname buffer in dns_message_gettempname()
dns_message_gettempname() now returns a pointer to an initialized
name associated with a dns_fixedname_t object. it is no longer
necessary to allocate a buffer for temporary names associated with
the message object.
2021-05-20 20:41:29 +02:00
Ondřej Surý
0f44139145 Bump the maximum number of hazard pointers in tests
On 24-core machine, the tests would crash because we would run out of
the hazard pointers.  We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.

Note: This is just a band-aid and needs a proper fix.
2021-02-18 19:32:55 +01:00
Mark Andrews
59bf6e71e2 Address theoretical buffer overrun in recent change
The strlcat() call was wrong.

    *** CID 316608:  Memory - corruptions  (OVERRUN)
    /lib/dns/resolver.c: 5017 in fctx_create()
    5011     	 * Make fctx->info point to a copy of a formatted string
    5012     	 * "name/type".
    5013     	 */
    5014     	dns_name_format(name, buf, sizeof(buf));
    5015     	dns_rdatatype_format(type, typebuf, sizeof(typebuf));
    5016     	p = strlcat(buf, "/", sizeof(buf));
    >>>     CID 316608:  Memory - corruptions  (OVERRUN)
    >>>     Calling "strlcat" with "buf + p" and "1036UL" is suspicious because "buf" points into a buffer of 1036 bytes and the function call may access "(char *)(buf + p) + 1035UL". [Note: The source code implementation of the function has been overridden by a builtin model.]
    5017     	strlcat(buf + p, typebuf, sizeof(buf));
    5018     	fctx->info = isc_mem_strdup(mctx, buf);
    5019
    5020     	FCTXTRACE("create");
    5021     	dns_name_init(&fctx->name, NULL);
    5022     	dns_name_dup(name, mctx, &fctx->name);
2021-02-14 22:41:46 +00:00
Mark Andrews
3b11bacbb7 Cleanup redundant isc_rwlock_init() result checks 2021-02-03 12:22:33 +11:00
Diego Fronza
171a5b7542 Add stale-answer-client-timeout option
The general logic behind the addition of this new feature works as
folows:

When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.

With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.

When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:

1. Setup a new query context with the sole purpose of looking up for
   stale RRset only data, for that matters a new flag was added
   'DNS_DBFIND_STALEONLY' used in database lookups.

    . If a stale RRset is found, mark the original client query as
      answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
      so when the fetch completion event is received later, we avoid
      answering the client twice.

    . If a stale RRset is not found, cleanup and wait for the normal
      fetch completion event.

2. In ns_query_done, we must change this part:
	/*
	 * If we're recursing then just return; the query will
	 * resume when recursion ends.
	 */
	if (RECURSING(qctx->client)) {
		return (qctx->result);
	}

   To this:

	if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
		return (qctx->result);
	}

   Otherwise we would not proceed to answer the client if it happened
   that a stale answer was found when looking up for stale only data.

When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:

1. Before answering the client (ns_client_send), check if the query
   wasn't already answered before.

2. Before detaching a client, e.g.
   isc_nmhandle_detach(&client->reqhandle), ensure that this is the
   fetch completion event, and not the one triggered due to
   stale-answer-client-timeout, so a correct call would be:
   if (!QUERY_STALEONLY(client)) {
        isc_nmhandle_detach(&client->reqhandle);
   }

Other than these notes, comments were added in code in attempt to make
these updates easier to follow.
2021-01-25 10:47:14 -03:00
Diego Fronza
49c40827f6 Avoid iterating name twice when constructing fctx->info
This is a minor performance improvement, we store the result of the
first call to strlcat to use as an offset in the next call when
constructing fctx->info string.
2021-01-25 10:47:14 -03:00
Mark Andrews
ab0bf49203 Adjust default value of "max-recursion-queries"
Since the queries sent towards root and TLD servers are now included in
the count (as a result of the fix for CVE-2020-8616),
"max-recursion-queries" has a higher chance of being exceeded by
non-attack queries.  Increase its default value from 75 to 100.
2020-12-01 23:47:23 +11:00
Mark Andrews
0e3b1f5a25 Tighten DNS COOKIE response handling
Fallback to TCP when we have already seen a DNS COOKIE response
from the given address and don't have one in this UDP response. This
could be a server that has turned off DNS COOKIE support, a
misconfigured anycast server with partial DNS COOKIE support, or a
spoofed response. Falling back to TCP is the correct behaviour in
all 3 cases.
2020-11-26 20:48:46 +00:00
Mark Andrews
ed783a8139 remove fctx:id field 2020-11-06 01:54:44 +00:00
Mark Andrews
1f63bb15b3 Restore the dns_message_reset() call before the dns_dispatch_getnext()
This was accidentally lost in the process of moving rmessage from fctx
to query.  Without this dns_message_setclass() will fail.
2020-10-08 10:55:35 +11:00
Ondřej Surý
bb990030d3 Simplify the EDNS buffer size logic for DNS Flag Day 2020
The DNS Flag Day 2020 aims to remove the IP fragmentation problem from
the UDP DNS communication.  In this commit, we implement the required
changes and simplify the logic for picking the EDNS Buffer Size.

1. The defaults for `edns-udp-size`, `max-udp-size` and
   `nocookie-udp-size` have been changed to `1232` (the value picked by
   DNS Flag Day 2020).

2. The probing heuristics that would try 512->4096->1432->1232 buffer
   sizes has been removed and the resolver will always use just the
   `edns-udp-size` value.

3. Instead of just disabling the PMTUD mechanism on the UDP sockets, we
   now set IP_DONTFRAG (IPV6_DONTFRAG) flag.  That means that the UDP
   packets won't get ever fragmented.  If the ICMP packets are lost the
   UDP will just timeout and eventually be retried over TCP.
2020-10-05 16:21:21 +02:00
Ondřej Surý
33eefe9f85 The dns_message_create() cannot fail, change the return to void
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.
2020-09-29 08:22:08 +02:00
Diego Fronza
cde6227a68 Properly handling dns_message_t shared references
This commit fix the problems that arose when moving the dns_message_t
object from fetchctx_t to the query structure.

Since the lifetime of query objects are different than that of a fetchctx
and the dns_message_t object held by the query may be being used by some
external module, e.g. validator, even after the query may have been destroyed,
propery handling of the references to the message were added in this commit to
avoid accessing an already destroyed object.

Specifically, in rctx_done(), a reference to the message is attached at
the beginning of the function and detached at the end, since a possible call
to fctx_cancelquery() would release the dns_message_t object, and in the next
lines of code a call to rctx_nextserver() or rctx_chaseds() would require
a valid pointer to the same object.

In valcreate() a new reference is attached to the message object, this
ensures that if the corresponding query object is destroyed before the
validator attempts to access it, no invalid pointer access occurs.

In validated() we have to attach a new reference to the message, since
we destroy the validator object at the beginning of the function,
and we need access to the message in the next lines of the same function.

rctx_nextserver() and rctx_chaseds() functions were adapted to receive
a new parameter of dns_message_t* type, this was so they could receive a
valid reference to a dns_message_t since using the response context respctx_t
to access the message through rctx->query->rmessage could lead to an already
released reference due to the query being canceled.
2020-09-29 08:22:08 +02:00
Diego Fronza
02f9e125c1 Fix invalid dns message state in resolver's logic
The assertion failure REQUIRE(msg->state == DNS_SECTION_ANY),
caused by calling dns_message_setclass within function resquery_response()
in resolver.c, was happening due to wrong management of dns message_t
objects used to process responses to the queries issued by the resolver.

Before the fix, a resolver's fetch context (fetchctx_t) would hold
a pointer to the message, this same reference would then be used over all
the attempts to resolve the query, trying next server, etc... for this to work
the message object would have it's state reset between each iteration, marking
it as ready for a new processing.

The problem arose in a scenario with many different forwarders configured,
managing the state of the dns_message_t object was lacking better
synchronization, which have led it to a invalid dns_message_t state in
resquery_response().

Instead of adding unnecessarily complex code to synchronize the object,
the dns_message_t object was moved from fetchctx_t structure to the
query structure, where it better belongs to, since each query will produce
a response, this way whenever a new query is created an associated
dns_messate_t is also created.

This commit deals mainly with moving the dns_message_t object from fetchctx_t
to the query structure.
2020-09-29 08:22:08 +02:00
Diego Fronza
12d6d13100 Refactored dns_message_t for using attach/detach semantics
This commit will be used as a base for the next code updates in order
to have a better control of dns_message_t objects' lifetime.
2020-09-29 08:22:08 +02:00
Evan Hunt
dcee985b7f update all copyright headers to eliminate the typo 2020-09-14 16:20:40 -07:00
Mark Andrews
992a79a14b Address lock-order-inversion
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=12714)
  Cycle in lock order graph: M100252 (0x7b7c00010a08) => M1171 (0x7b7400000dc8) => M100252

  Mutex M1171 acquired here while holding mutex M100252 in thread T1:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 dns_resolver_createfetch3 /builds/isc-projects/bind9/lib/dns/resolver.c:9585:2 (libdns.so.1110+0x1769fd)
    #2 dns_resolver_createfetch /builds/isc-projects/bind9/lib/dns/resolver.c:9504:10 (libdns.so.1110+0x174e17)
    #3 create_fetch /builds/isc-projects/bind9/lib/dns/validator.c:1156:10 (libdns.so.1110+0x1c1e5f)
    #4 validatezonekey /builds/isc-projects/bind9/lib/dns/validator.c:2124:13 (libdns.so.1110+0x1c3b6d)
    #5 start_positive_validation /builds/isc-projects/bind9/lib/dns/validator.c:2301:10 (libdns.so.1110+0x1bfde9)
    #6 validator_start /builds/isc-projects/bind9/lib/dns/validator.c:3647:12 (libdns.so.1110+0x1bef62)
    #7 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #8 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Mutex M100252 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 validator_start /builds/isc-projects/bind9/lib/dns/validator.c:3628:2 (libdns.so.1110+0x1bee31)
    #2 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #3 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Mutex M100252 acquired here while holding mutex M1171 in thread T1:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 dns_validator_destroy /builds/isc-projects/bind9/lib/dns/validator.c:3912:2 (libdns.so.1110+0x1bf788)
    #2 validated /builds/isc-projects/bind9/lib/dns/resolver.c:4916:2 (libdns.so.1110+0x18fdfd)
    #3 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #4 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Mutex M1171 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (delv+0x4483a6)
    #1 validated /builds/isc-projects/bind9/lib/dns/resolver.c:4907:2 (libdns.so.1110+0x18fc3d)
    #2 dispatch /builds/isc-projects/bind9/lib/isc/task.c:1157:7 (libisc.so.1107+0x507d5)
    #3 run /builds/isc-projects/bind9/lib/isc/task.c:1331:2 (libisc.so.1107+0x4d729)

  Thread T1 'isc-worker0000' (tid=12729, running) created by main thread at:
    #0 pthread_create <null> (delv+0x42afdb)
    #1 isc_thread_create /builds/isc-projects/bind9/lib/isc/pthreads/thread.c:60:8 (libisc.so.1107+0x726d8)
    #2 isc__taskmgr_create /builds/isc-projects/bind9/lib/isc/task.c:1468:7 (libisc.so.1107+0x4d635)
    #3 isc_taskmgr_createinctx /builds/isc-projects/bind9/lib/isc/task.c:2091:11 (libisc.so.1107+0x4f4ac)
    #4 main /builds/isc-projects/bind9/bin/delv/delv.c:1639:2 (delv+0x4b7f96)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/builds/isc-projects/bind9/bin/delv/.libs/delv+0x4483a6) in pthread_mutex_lock
2020-09-09 14:08:33 +10:00
Ondřej Surý
0a22024c27 Print diagnostics on dns_name_issubdomain() failure in fctx_create()
Log diagnostic message when dns_name_issubdomain() in the fctx_create()
when the resolver is qname minimizing and forwarding at the same time.
2020-09-02 18:08:15 +02:00
Diego Fronza
230d79c191 Fix resolution of unusual ip6.arpa names
Before this commit, BIND was unable to resolve ip6.arpa names like
the one reported in issue #1847 when using query minimization.

As reported in the issue, an attempt to resolve a name like
'rec-test-dom-158937817846788.test123.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.3.4.3.5.4.0.8.2.6.0.1.0.0.2.ip6.arpa'
using default settings would fail.

The reason was that query minimization algorithm in 'fctx_minimize_qname'
would divide any ip6.arpa names in increasing number of labels,
7,11, ... up to 35, thus limiting the destination name (minimized) to a number
of 35 labels.

In case the last query minimization attempt (with 35 labels) would fail with
NXDOMAIN, BIND would attempt the query mininimization again with the exact
same QNAME, limited on the 35 labels, and that in turn would fail again.

This fix avoids this fail loop by considering the extra labels that may appear
in the leftmost part of an ip6.arpa name, those after the IPv6 part.
2020-09-01 15:47:00 -03:00
Evan Hunt
51c9ea98a3 permanently disable QNAME minimization in a fetch when forwarding
QNAME minimization is normally disabled when forwarding. if, in the
course of processing a fetch, we switch back to normal recursion at
some point, we can't safely start minimizing because we may have
been left in an inconsistent state.
2020-08-05 15:43:52 +02:00
Mark Andrews
bde5c7632a Always check the return from isc_refcount_decrement.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range.  Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.
2020-07-31 10:15:44 +10:00
Michał Kępień
953d704bd2 Fix idle timeout for connected TCP sockets
When named acting as a resolver connects to an authoritative server over
TCP, it sets the idle timeout for that connection to 20 seconds.  This
fixed timeout was picked back when the default processing timeout for
each client query was hardcoded to 30 seconds.  Commit
000a8970f840a0c27c5cc404826853c4674362ac made this processing timeout
configurable through "resolver-query-timeout" and decreased its default
value to 10 seconds, but the idle TCP timeout was not adjusted to
reflect that change.  As a result, with the current defaults in effect,
a single hung TCP connection will consistently cause the resolution
process for a given query to time out.

Set the idle timeout for connected TCP sockets to half of the client
query processing timeout configured for a resolver.  This allows named
to handle hung TCP connections more robustly and prevents the timeout
mismatch issue from resurfacing in the future if the default is ever
changed again.
2020-07-30 10:58:39 +02:00
Witold Kręcicki
b4f3fafcff Fix assertion failure during startup when the server is under load.
When we're coming back from recursion fetch_callback does not accept
DNS_R_NXDOMAIN as an rcode - query_gotanswer calls query_nxdomain in
which an assertion fails on qctx->is_zone. Yet, under some
circumstances, qname minimization will return an DNS_R_NXDOMAIN - when
root zone mirror is not yet loaded. The fix changes the DNS_R_NXDOMAIN
answer to DNS_R_SERVFAIL.
2020-07-01 12:25:36 +02:00
Evan Hunt
364b349ad2 ensure clientstr is null-terminated 2020-06-05 18:56:40 -07:00
Witold Kręcicki
175c4d9055 Fix a data access race in resolver
We were passing client address to dns_resolver_createfetch as a pointer
and it was saved as a pointer. The client (with its address) could be
gone before the fetch is finished, and in a very odd scenario
log_formerr would call isc_sockaddr_format() which first checks if the
address family is valid (and at this point it still is), then the
sockaddr is cleared, and then isc_netaddr_fromsockaddr is called which
fails an assertion as the address family is now invalid.
2020-06-05 16:06:42 +02:00
Michał Kępień
fb123df2b2 Improve the "hint" variable comment
Replace an existing comment with a more verbose explanation of when the
"hint" variable is set in resquery_send() and how its value affects the
advertised UDP buffer size in outgoing queries.
2020-05-25 14:34:56 +02:00
Michał Kępień
d27f96cc98 Ensure server-specific "edns-udp-size" is obeyed
If "edns-udp-size" is set in a "server" block matching the queried
server, it is accounted for in the process of determining the advertised
UDP buffer size, but its value may still be overridden before the query
is sent.  This behavior contradicts the ARM which claims that when set,
the server-specific "edns-udp-size" value is used for all EDNS queries
sent to a given server.

Furthermore, calling dns_peer_getudpsize() with the "udpsize" variable
as an argument makes the code hard to follow as that call may either
update the value of "udpsize" or leave it untouched.

Ensure the code matches the documentation by moving the
dns_peer_getudpsize() call below all other blocks of code potentially
affecting the advertised UDP buffer size, which is where it was located
when server-specific "edns-udp-size" support was first implemented [1].
Improve code readability by calling dns_peer_getudpsize() with a helper
variable instead of "udpsize".

[1] see commit 1c153afce556ff3c687986fb7c4a0b0a7f5e7cd8
2020-05-25 14:34:56 +02:00