mir/bind - bind - Mike's Git repositories


mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-29 05:28:00 +00:00

Author	SHA1	Message	Date
Tony Finch	66b3cb9732	Remove several superfluous newlines in log messages	2022-05-02 23:49:38 +01:00
Matthijs Mekking	c66b9abc0b	Add stale answer extended errors Add DNS extended errors 3 (Stale Answer) and 19 (Stale NXDOMAIN Answer) to responses. Add extra text with the reason why the stale answer was returned. To test, we need to change the configuration such that for the first set of tests the stale-refresh-time window does not interfere with the expected extended errors.	2022-04-28 09:58:25 +02:00
Ondřej Surý	8138a595d9	Add isc_rwlock around dns_aclenv .localhost and .localnets member In order to modify the .localhost and .localnets members of the dns_aclenv, all other processing on the netmgr loops needed to be stopped using the task exclusive mode. Add the isc_rwlock to the dns_aclenv, so any modifications to the .localhost and .localnets can be done under the write lock.	2022-04-04 19:27:00 +02:00
Ondřej Surý	4dceab142d	Consistently use UNREACHABLE() instead of ISC_UNREACHABLE() In a couple of places, we have missed INSIST(0) or ISC_UNREACHABLE() replacement on some branches with UNREACHABLE(). Replace all ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().	2022-03-28 23:26:08 +02:00
Ondřej Surý	1f35977423	Remove ns_client_t .shuttingdown member The ns_client_t .shuttingdown member was practically dead code. The .shuttingdown would be set to true only in ns__client_put() function meaning that we have detached from all ns_client_t .handles and the ns_client_t object is being freed: client->magic = 0; client->shuttingdown = true; [...] isc_mem_put(manager->ctx, client, sizeof(client)) Meanwhile the ns_client_t object is accessed like this: isc_nmhandle_detach(&client->fetchhandle); client->query.attributes &= ~NS_QUERYATTR_RECURSING; client->state = NS_CLIENTSTATE_WORKING; qctx_init(client, &devent, 0, &qctx); client_shuttingdown = ns_client_shuttingdown(client); if (fetch_canceled || fetch_answered || client_shuttingdown) { [...] } Even if the isc_nmhandle_detach(...) was the last handle detach, it would mean that immediately, after calling the isc_nmhandle_detach(), we would be causing use-after-free, because the ns_client_t is immediately destroyed after setting .shuttingdown to true. The similar code in the query_hookresume() already noticed this: /* * This event is running under a client task, so it's safe to detach * the fetch handle. And it should be done before resuming query * processing below, since that may trigger another recursion or * asynchronous hook event. */	2022-03-25 10:38:35 +01:00
Ondřej Surý	23195f18bc	Remove extra copies and stray members from ns_client_t The ns_client_t is always attached to ns_clientmgr_t which has associated memory context, server context, task and threadid. Use those directly from the ns_clientmgr_t instead of attaching it to an extra copy in ns_client_t to make the ns_client_t more sleek and lean. Additionally, remove some stray ns_client_t struct members that were not used anywhere.	2022-03-25 10:18:11 +01:00
Ondřej Surý	20f0936cf2	Remove use of the inline keyword used as suggestion to compiler Historically, the inline keyword was a strong suggestion to the compiler that it should inline the function marked inline. As compilers became better at optimising, this functionality has receded, and using inline as a suggestion to inline a function is obsolete. The compiler will happily ignore it and inline something else entirely if it finds that's a better optimisation. Therefore, remove all the occurrences of the inline keyword with static functions inside a single compilation unit and leave the decision whether to inline a function or not entirely to the compiler. NOTE: We keep the usage of the inline keyword when the purpose is to change the linkage behaviour.	2022-03-25 08:33:43 +01:00
Ondřej Surý	584f0d7a7e	Simplify the way we tag unreachable code with only ISC_UNREACHABLE() Previously, the unreachable code paths would have to be tagged with: INSIST(0); ISC_UNREACHABLE(); There were also older parts of the code that used comment annotation: /* NOTREACHED */ Unify the handling of unreachable code paths to just use: UNREACHABLE(); The UNREACHABLE() macro now asserts when reached and also uses __builtin_unreachable(); when such builtin is available in the compiler.	2022-03-25 08:33:43 +01:00
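The behaviour described in the commit above (abort when reached, plus a compiler hint where the builtin exists) could be sketched roughly as follows. This is not BIND's actual definition: `SKETCH_UNREACHABLE`, `SKETCH_HINT`, and the `sketch_color` example are hypothetical names made up for illustration, and the compiler check is a simplifying assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in; BIND's real UNREACHABLE() may be defined differently. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_HINT() __builtin_unreachable()
#else
#define SKETCH_HINT() ((void)0) /* no builtin available: the abort() is enough */
#endif

#define SKETCH_UNREACHABLE()                                         \
	do {                                                         \
		fprintf(stderr, "%s:%d: unreachable code reached\n", \
			__FILE__, __LINE__);                         \
		abort();                                             \
		SKETCH_HINT();                                       \
	} while (0)

enum sketch_color { SKETCH_RED, SKETCH_GREEN, SKETCH_BLUE };

/* Every enum value returns from the switch, so the tail is never executed. */
int
sketch_color_weight(enum sketch_color c) {
	switch (c) {
	case SKETCH_RED:
		return 1;
	case SKETCH_GREEN:
		return 2;
	case SKETCH_BLUE:
		return 3;
	}
	SKETCH_UNREACHABLE();
}

/* Small usage check for the valid paths. */
void
sketch_color_demo(void) {
	assert(sketch_color_weight(SKETCH_GREEN) == 2);
}
```

A single macro replaces the older two-statement idiom, so a forgotten second statement can no longer leave a path only half annotated.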
Ondřej Surý	fe7ce629f4	Add FALLTHROUGH macro for __attribute__((fallthrough)) Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which is an explicit version of the /* FALLTHROUGH */ comment we are currently using. Add and apply FALLTHROUGH macro that uses the attribute if available, but does nothing on older compilers. In one case (lib/dns/zone.c), using the macro revealed that we were using the /* FALLTHROUGH */ comment in the wrong place, remove that comment.	2022-03-25 08:33:43 +01:00
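A macro of the kind described above might look like the following minimal sketch. `SKETCH_FALLTHROUGH` and `sketch_steps_from()` are hypothetical names, and the feature test via `__has_attribute` is an assumption about how availability could be detected, not necessarily how BIND does it.

```c
#include <assert.h>

/* Use the attribute when the compiler advertises it, otherwise expand to a no-op. */
#if defined(__has_attribute)
#if __has_attribute(fallthrough)
#define SKETCH_FALLTHROUGH __attribute__((fallthrough))
#endif
#endif
#ifndef SKETCH_FALLTHROUGH
#define SKETCH_FALLTHROUGH \
	do {               \
	} while (0)
#endif

/* Count how many stages run when processing starts at `stage`. */
int
sketch_steps_from(int stage) {
	int steps = 0;

	switch (stage) {
	case 0:
		steps++;
		SKETCH_FALLTHROUGH;
	case 1:
		steps++;
		SKETCH_FALLTHROUGH;
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}

/* Small usage check: starting at stage 0 runs all three stages. */
void
sketch_steps_demo(void) {
	assert(sketch_steps_from(0) == 3);
}
```

Unlike a comment, the attribute is checked by the compiler, which is how a misplaced annotation such as the one in lib/dns/zone.c can surface.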
Michał Kępień	f7482b68b9	Fix more ns_statscounter_recursclients underflows Commit aab691d51266f552a7923db32686fb9398b1d255 did not fix all possible scenarios in which the ns_statscounter_recursclients counter underflows. The solution implemented therein can be ineffective e.g. when CNAME chaining happens with prefetching enabled. Here is an example recursive resolution scenario in which the ns_statscounter_recursclients counter can underflow with the current logic in effect: 1. Query processing starts, the answer is not found in the cache, so recursion is started. The NS_CLIENTATTR_RECURSING attribute is set. ns_statscounter_recursclients is incremented (Δ = +1). 2. Recursion completes, returning a CNAME. client->recursionquota is non-NULL, so the NS_CLIENTATTR_RECURSING attribute remains set. ns_statscounter_recursclients is decremented (Δ = 0). 3. Query processing restarts. 4. The current QNAME (the target of the CNAME from step 2) is found in the cache, with a TTL low enough to trigger a prefetch. 5. query_prefetch() attaches to client->recursionquota. ns_statscounter_recursclients is not incremented because query_prefetch() does not do that (Δ = 0). 6. Query processing restarts. 7. The current QNAME (the target of the CNAME from step 4) is not found in the cache, so recursion is started. client->recursionquota is already attached to (since step 5) and the NS_CLIENTATTR_RECURSING attribute is set (since step 1), so ns_statscounter_recursclients is not incremented (Δ = 0). 8. The prefetch from step 5 completes. client->recursionquota is detached from in prefetch_done(). ns_statscounter_recursclients is not decremented because prefetch_done() does not do that (Δ = 0). 9. Recursion for the current QNAME completes. client->recursionquota is already detached from, i.e. set to NULL (since step 8), and the NS_CLIENTATTR_RECURSING attribute is set (since step 1), so ns_statscounter_recursclients is decremented (Δ = -1). 
Another possible scenario is that after step 7, recursion for the target of the CNAME from step 4 completes before the prefetch for the CNAME itself. fetch_callback() then notices that client->recursionquota is non-NULL and decrements ns_statscounter_recursclients, even though client->recursionquota was attached to by query_prefetch() and therefore not accompanied by an incrementation of ns_statscounter_recursclients. The net result is also an underflow. Instead of trying to properly handle all possible orderings of events set into motion by normal recursion and prefetch-triggered recursion, adjust ns_statscounter_recursclients whenever the recursive clients quota is successfully attached to or detached from. Remove the NS_CLIENTATTR_RECURSING attribute altogether as its only purpose is made obsolete by this change.	2022-02-23 14:39:11 +01:00
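The approach chosen in the commit above, adjusting the counter only when the quota is actually attached or detached, can be illustrated with a simplified model. All names below (`sketch_quota`, `sketch_client`, `sketch_recursclients`, and the helper functions) are hypothetical and do not mirror BIND's real structures; the replay only loosely follows the ordering of events in the numbered scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_quota {
	int used;
	int max;
};

struct sketch_client {
	struct sketch_quota *recursionquota;
};

/* Stand-in for the statistics counter. */
int sketch_recursclients = 0;

/* Increment the counter only when the attach really happens. */
bool
sketch_quota_attach(struct sketch_client *c, struct sketch_quota *q) {
	if (c->recursionquota != NULL || q->used >= q->max) {
		return false;
	}
	q->used++;
	c->recursionquota = q;
	sketch_recursclients++;
	return true;
}

/* Decrement the counter only when something was attached. */
void
sketch_quota_detach(struct sketch_client *c) {
	if (c->recursionquota == NULL) {
		return;
	}
	c->recursionquota->used--;
	c->recursionquota = NULL;
	sketch_recursclients--;
}

/* Replay an interleaving of recursion and prefetch; return the lowest counter value seen. */
int
sketch_replay(void) {
	struct sketch_quota q = { 0, 10 };
	struct sketch_client c = { NULL };
	int lowest = 0;

	sketch_recursclients = 0;
	sketch_quota_attach(&c, &q); /* recursion starts */
	sketch_quota_detach(&c);     /* recursion completes with a CNAME */
	sketch_quota_attach(&c, &q); /* prefetch attaches */
	sketch_quota_attach(&c, &q); /* second recursion: already attached, no change */
	sketch_quota_detach(&c);     /* prefetch completes */
	if (sketch_recursclients < lowest) {
		lowest = sketch_recursclients;
	}
	sketch_quota_detach(&c);     /* recursion completes: nothing left to detach */
	if (sketch_recursclients < lowest) {
		lowest = sketch_recursclients;
	}
	assert(c.recursionquota == NULL);
	return lowest;
}
```

Because the counter moves in lockstep with the attachment state, no ordering of attach and detach calls can drive it below zero in this model.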
Evan Hunt	737e658602	allow dns_clientinfo to store client ECS data this brings DNS_CLIENTINFO_VERSION into line with the subscription branch so that fixes applied to clientinfo processing can also be applied to the main branch without diverging.	2022-01-27 13:53:59 -08:00
Ondřej Surý	58bd26b6cf	Update the copyright information in all files in the repository This commit converts the license handling to adhere to the REUSE specification. It specifically: 1. Adds used licenses to LICENSES/ directory 2. Adds "isc" template for adding the copyright boilerplate 3. Changes all source files to include copyright and SPDX license header, this includes all the C sources, documentation, zone files, configuration files. There are notes in the doc/dev/copyrights file on how to add correct headers to the new files. 4. Handles the rest that can't be modified via .reuse/dep5 file. The binary (or otherwise unmodifiable) files could have license placed next to them in <foo>.license file, but this would lead to cluttered repository and most of the files handled in the .reuse/dep5 file are system test files.	2022-01-11 09:05:02 +01:00
Mark Andrews	3fa3b11ef8	Add synthesis of NODATA at wildcard The old code rejected NSEC that proved the wildcard name existed (exists). The new code rejects NSEC that prove that the wildcard name exists and that the type exists (exists && data) but accepts NSEC that prove the wildcard name exists. query_synthnxdomain (renamed query_synthnxdomainnodata) already took the NSEC records and added the correct records to the message body for NXDOMAIN or NODATA responses with the above change. The only additional change needed was to ensure the correct RCODE is set.	2021-12-02 14:24:37 +01:00
Mark Andrews	4bdd5a9953	Ignore NSEC records without RRSIG and NSEC present dns_nsec_noexistnodata now checks that RRSIG and NSEC are present in the type map. Both types should be present in a correctly constructed NSEC record. This check is in addition to similar checks in resolver.c and validator.c.	2021-12-02 14:18:42 +01:00
Artem Boldariev	07cf827b0b	Add isc_nm_socket_type() This commit adds an isc_nm_socket_type() function which can be used to obtain a handle's socket type. This change obsoletes isc_nm_is_tlsdns_handle() and isc_nm_is_http_handle(). However, it was decided to keep the latter as we eventually might end up supporting multiple HTTP versions.	2021-11-30 12:20:22 +02:00
Matthijs Mekking	ca7f2fd903	Add EDE to query messages Add extended DNS error on refused queries. All instances are related to unauthorized clients, so set extended DNS error code 18 (Prohibited).	2021-11-19 09:44:28 +01:00
Evan Hunt	7f63ee3bae	address '--disable-doh' failures Change 5756 (GL #2854) introduced build errors when using 'configure --disable-doh'. To fix this, isc_nm_is_http_handle() is now defined in all builds, not just builds that have DoH enabled. Missing code comments were added both for that function and for isc_nm_is_tlsdns_handle().	2021-11-17 13:48:43 -08:00
Ondřej Surý	e603983ec9	Stop providing branch prediction information The __builtin_expect() can be used to provide the compiler with branch prediction information. The Gcc manual says[1] on the subject: In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. Stop using __builtin_expect() and ISC_LIKELY() and ISC_UNLIKELY() macros to provide the branch prediction information as the performance testing shows that named performs better when the __builtin_expect() is not being used. 1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect	2021-10-14 10:33:24 +02:00
Matthijs Mekking	71b92d4d19	Replace "master/slave" terms in code comments Replace those terms with the preferred "primary/secondary" keywords.	2021-10-12 13:09:00 -07:00
Ondřej Surý	2e3a2eecfe	Make isc_result a static enum Remove the dynamic registration of result codes. Convert isc_result_t from unsigned + #defines into 32-bit enum type in grand unified <isc/result.h> header. Keep the existing values of the result codes even at the expense of the description and identifier tables being unnecessary large. Additionally, add couple of: switch (result) { [...] default: break; } statements where compiler now complains about missing enum values in the switch statement.	2021-10-06 11:22:20 +02:00
Artem Boldariev	25b2c6ad96	Require "dot" ALPN token for zone transfer requests over DoT (XoT) This commit makes BIND verify that zone transfers are allowed to be done over the underlying connection. Currently, it makes sense only for DoT, but the code is deliberately made to be protocol-agnostic.	2021-10-05 11:23:47 +03:00
Evan Hunt	08ce69a0ea	Rewrite dns_resolver and dns_request to use netmgr timeouts - The `timeout_action` parameter to dns_dispatch_addresponse() has been replaced with a netmgr callback that is called when a dispatch read times out. this callback may optionally reset the read timer and resume reading. - Added a function to convert isc_interval to milliseconds; this is used to translate fctx->interval into a value that can be passed to dns_dispatch_addresponse() as the timeout. - Note that netmgr timeouts are accurate to the millisecond, so code to check whether a timeout has been reached cannot rely on microsecond accuracy. - If serve-stale is configured, then a timeout received by the resolver may trigger it to return stale data, and then resume waiting for the read timeout. this is no longer based on a separate stale timer. - The code for canceling requests in request.c has been altered so that it can run asynchronously. - TCP timeout events apply to the dispatch, which may be shared by multiple queries. since in the event of a timeout we have no query ID to use to identify the resp we wanted, we now just send the timeout to the oldest query that was pending. - There was some additional refactoring in the resolver: combining fctx_join() and fctx_try_events() into one function to reduce code duplication, and using fixednames in fetchctx and fetchevent. - Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the code can be simplified.	2021-10-02 11:39:56 -07:00
Evan Hunt	916760ae46	rename dns_zone_master and dns_zone_slave dns_zone_master and dns_zone_slave are renamed as dns_zone_primary and dns_zone_secondary.	2021-08-30 11:06:12 -07:00
Mark Andrews	cd985d96e3	Add additional processing to HTTPS and SVCB records The additional processing method has been expanded to take the owner name of the record, as HTTPS and SVCB need it to process "." in service form. The additional section callback can now return the RRset that was added. We use this when adding CNAMEs. Previously, the recursion would stop if it detected that a record you added already exists. With CNAMEs this rule doesn't work, as you ultimately care about the RRset at the target of the CNAME and not the presence of the CNAME itself. Returning the record allows the caller to restart with the target name. As CNAMEs can form loops, loop protection was added. As HTTPS and SVCB can produce infinite chains, we prevent this by tracking recursion depth and stopping if we go too deep.	2021-08-18 13:49:48 +10:00
Mark Andrews	f0265b8fa6	Make whether to follow additional data records generic Adds dns_rdatatype_followadditional() and DNS_RDATATYPEATTR_FOLLOWADDITIONAL	2021-08-18 13:49:48 +10:00
Michał Kępień	29d8d35869	Tweak query_addds() comments to avoid confusion It has been noticed that commit 7a87bf468b9e092bf65db55a8e9234853c7db63d did not only fix NSEC record handling in signed, insecure delegations prepared using both wildcard expansion and CNAME chaining - it also inadvertently fixed DS record handling in signed, secure delegations of that flavor. This is because the 'rdataset' variable in the relevant location in query_addds() can be either a DS RRset or an NSEC RRset. Update a code comment in query_addds() to avoid confusion. Update the comments describing the purpose of query_addds() so that they also mention NSEC(3) records.	2021-07-16 07:20:15 +02:00
Ondřej Surý	2bb454182b	Make the DNS over HTTPS support optional This commit adds two new autoconf options `--enable-doh` (enabled by default) and `--with-libnghttp2` (mandatory when DoH is enabled). When DoH support is disabled the library is not linked-in and support for http(s) protocol is disabled in the netmgr, named and dig.	2021-07-07 09:50:53 +02:00
Ondřej Surý	4677bb28d1	Remove atomics emulated by a mutex-locked variable Mutex atomics were intended to be used as a debugging tool only and it has already served its purpose and it's not needed anymore.	2021-06-17 09:51:04 +02:00
Artem Boldariev	b84fa122ce	Make BIND refuse to serve XFRs over DoH We cannot use DoH for zone transfers. According to RFC8484 a DoH request contains exactly one DNS message (see Section 6: Definition of the "application/dns-message" Media Type, https://datatracker.ietf.org/doc/html/rfc8484#section-6). This makes DoH unsuitable for zone transfers as often (and usually!) these need more than one DNS message, especially for larger zones. As zone transfers over DoH are not (yet) standardised, nor discussed in RFC8484, the best thing we can do is to return "not implemented." Technically DoH can be used to transfer small zones which fit in one message, but that is not enough for the generic case. Also, this commit makes the server-side DoH code ensure that no multiple responses could be attempted to be sent over one HTTP/2 stream. In HTTP/2 one stream is mapped to one request/response transaction. Now the write callback will be called with failure error code in such a case.	2021-06-14 11:37:36 +03:00
Michał Kępień	7a87bf468b	Fix "no DS" proofs for wildcard+CNAME delegations When answering a query requires wildcard expansion, the AUTHORITY section of the response needs to include NSEC(3) record(s) proving that the QNAME does not exist. When a response to a query is an insecure delegation, the AUTHORITY section needs to include an NSEC(3) proof that no DS record exists at the parent side of the zone cut. These two conditions combined trip up the NSEC part of the logic contained in query_addds(), which expects the NS RRset to be owned by the first name found in the AUTHORITY section of a delegation response. This may not always be true, for example if wildcard expansion causes an NSEC record proving QNAME nonexistence to be added to the AUTHORITY section before the delegation is added to the response. In such a case, named incorrectly omits the NSEC record proving nonexistence of QNAME from the AUTHORITY section. The same block of code is affected by another flaw: if the same NSEC record proves nonexistence of both the QNAME and the DS record at the parent side of the zone cut, this NSEC record will be added to the AUTHORITY section twice. Fix by looking for the NS RRset in the entire AUTHORITY section and adding the NSEC record to the delegation using query_addrrset() (which handles duplicate RRset detection).	2021-06-10 10:13:23 +02:00
Kevin Chen	0cdf85d204	Several serve-stale improvements Commit a83c8cb0afd88d54b9cf67239f2495c9b0391e97 updated masterdump so that stale records in "rndc dumpdb" output no longer show 0 TTLs. In this commit we change the name of the `rdataset->stale_ttl` field to `rdataset->expired` to make its purpose clearer, and set it to zero in cases where it's unused. Add 'rbtdb->serve_stale_ttl' to various checks so that stale records are not purged from the cache when they've been stale for RBTDB_VIRTUAL (300) seconds. Increment 'ns_statscounter_usedstale' when a stale answer is used. Note: There was a question of whether 'overmem_purge' should be purging ancient records, instead of stale ones. It is left as purging stale records, since stale records could take up the majority of the cache. This submission is copyrighted Akamai Technologies, Inc. and provided under an MPL 2.0 license. This commit was originally authored by Kevin Chen, and was updated by Matthijs Mekking to match recent serve-stale developments.	2021-05-30 11:45:35 -07:00
Matthijs Mekking	c0dc5937c7	Reset DNS_FETCHOPT_TRYSTALE_ONTIMEOUT on resume Once we resume a query, we should clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT from the options to prevent triggering the stale-answer-client-timeout on subsequent fetches. If we don't, this may cause a crash, for example when prefetch is triggered after a query restart.	2021-05-30 00:03:51 -07:00
Evan Hunt	8bd8e995f1	clean up query correctly if already answered by serve-stale when a serve-stale answer has been sent, the client continues waiting for a proper answer. if a final completion event for the client does arrive, it can just be cleaned up without sending a response, similar to a canceled fetch.	2021-05-27 10:35:48 -07:00
Ondřej Surý	28b65d8256	Reduce the number of clientmgr objects created Previously, as a way of reducing the contention between threads a clientmgr object would be created for each interface/IP address. With tasks being more strictly bound to netmgr workers, this is no longer needed and we can just create a clientmgr object per worker queue (ncpus). Each clientmgr object then would have a single task and single memory context.	2021-05-24 20:44:54 +02:00
Evan Hunt	b0aadaac8e	rename dns_name_copynf() to dns_name_copy() dns_name_copy() is now the standard name-copying function.	2021-05-22 00:37:27 -07:00
Ondřej Surý	ce3e1abc1d	Use dns_name_copynf() with dns_message_gettempname() when needed dns_message_gettempname() returns an initialized name with a dedicated buffer, associated with a dns_fixedname object. Using dns_name_copynf() to write a name into this object will actually copy the name data from a source name. dns_name_clone() merely points target->ndata to source->ndata, so it is faster, but it can lead to a use-after-free if the source is freed before the target object is released via dns_message_puttempname(). In a few places, clone was being used where copynf should have been; this is now fixed. As a side note, no memory was lost, because the ndata buffer used in the dns_fixedname_t is internal to the structure, and is freed when the dns_fixedname_t is freed regardless of the .ndata contents.	2021-05-21 21:28:10 -07:00
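The difference between cloning and copying described in the commit above can be shown with a simplified model. The `sketch_name` structure and the two helpers below are hypothetical and only imitate the idea of a name that either borrows another name's data pointer or owns a private buffer; they are not the dns_name or dns_fixedname API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXLEN 255

struct sketch_name {
	const unsigned char *ndata;           /* points at the name data */
	size_t length;
	unsigned char buffer[SKETCH_MAXLEN];  /* dedicated storage, as with a fixedname */
};

/* Clone: fast, but the target only borrows the source's data. */
void
sketch_clone(struct sketch_name *target, const struct sketch_name *source) {
	target->ndata = source->ndata;
	target->length = source->length;
}

/* Copy: the data is duplicated into the target's own buffer. */
void
sketch_copy(struct sketch_name *target, const struct sketch_name *source) {
	assert(source->length <= SKETCH_MAXLEN);
	memcpy(target->buffer, source->ndata, source->length);
	target->ndata = target->buffer;
	target->length = source->length;
}

/* Returns 1 when the clone tracks changes to the source and the copy does not. */
int
sketch_copy_is_independent(void) {
	unsigned char wire[] = { 3, 'w', 'w', 'w', 0 };
	struct sketch_name source = { wire, sizeof(wire), { 0 } };
	struct sketch_name cloned = { NULL, 0, { 0 } };
	struct sketch_name copied = { NULL, 0, { 0 } };

	sketch_clone(&cloned, &source);
	sketch_copy(&copied, &source);
	wire[1] = 'x'; /* stands in for the source being released or reused */
	return cloned.ndata[1] == 'x' && copied.ndata[1] == 'w';
}
```

In this model the cloned name silently follows whatever happens to the source storage, which is the property that turns into a use-after-free once the source is freed first.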
Evan Hunt	e31cc1eeb4	use a fixedname buffer in dns_message_gettempname() dns_message_gettempname() now returns a pointer to an initialized name associated with a dns_fixedname_t object. it is no longer necessary to allocate a buffer for temporary names associated with the message object.	2021-05-20 20:41:29 +02:00
Matthijs Mekking	66f2cd228d	Use isdigit instead of checking character range When looking for key files, we could use isdigit rather than checking if the character is within the range [0-9]. Use (unsigned char) cast to ensure the value is representable in the unsigned char type (as suggested by the isdigit manpage). Change " & 0xff" occurrences to the recommended (unsigned char) type cast.	2021-05-05 19:15:33 +02:00
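The cast mentioned in the commit above matters because passing a negative `char` value to `isdigit()` is undefined behaviour. A minimal sketch, using a hypothetical helper `sketch_all_digits()` rather than the actual key-file lookup code:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Return true when `s` is non-empty and consists only of decimal digits. */
bool
sketch_all_digits(const char *s) {
	if (*s == '\0') {
		return false;
	}
	for (; *s != '\0'; s++) {
		/* Cast first: plain char may be signed, and isdigit() needs 0..UCHAR_MAX or EOF. */
		if (!isdigit((unsigned char)*s)) {
			return false;
		}
	}
	return true;
}

/* Small usage check. */
void
sketch_all_digits_demo(void) {
	assert(sketch_all_digits("12345"));
}
```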
Mark Andrews	c1190a3fe0	Handle DNAME lookup via itself When answering a query, named should never attempt to add the same RRset to the ANSWER section more than once. However, such a situation may arise when chasing DNAME records: one of the DNAME records placed in the ANSWER section may turn out to be the final answer to a client query, but there is no way to know that in advance. Tweak the relevant INSIST assertion in query_respond() so that it handles this case properly. qctx->rdataset is freed later anyway, so there is no need to clean it up in query_respond().	2021-04-29 10:30:00 +02:00
Matthijs Mekking	104b676235	Serve-stale nit fixes While working on the serve-stale backports, I noticed the following oddities: 1. In the serve-stale system test, in one case we keep track of the time how long it took for dig to complete. In commit aaed7f9d8c2465790d769221dfe8378c7147f5eb, the code removed the exception to check for result == ISC_R_SUCCESS on stale found answers, and adjusted the test accordingly. This failed to update the time tracking accordingly. Move the t1/t2 time track variables back around the two dig commands to ensure the lookups resolved faster than the resolver-query-timeout. 2. We can remove the setting of NS_QUERYATTR_STALEOK and DNS_RDATASETATTR_STALE_ADDED on the "else if (stale_timeout)" code path, because they are added later when we know we have actually found a stale answer on a stale timeout lookup. 3. We should clear the NS_QUERYATTR_STALEOK flag from the client query attributes instead of DNS_RDATASETATTR_STALE_ADDED (that flag is set on the rdataset attributes). 4. In 'bin/named/config.c' we should set the configuration options in alphabetical order. 5. In the ARM, in the backports we have added "(stale)" between "cached" and "RRset" to make more clear a stale RRset may be returned in this scenario.	2021-04-28 12:24:24 +02:00
Matthijs Mekking	3d3a6415f7	If RPZ config'd, bail stale-answer-client-timeout When we are recursing, RPZ processing is not allowed. But when we are performing a lookup due to "stale-answer-client-timeout", we are still recursing. This effectively means that RPZ processing is disabled on such a lookup. In this case, bail the "stale-answer-client-timeout" lookup and wait for recursion to complete, as we can't perform the RPZ rewrite rules reliably.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	839df94190	Rename "staleonly" The dboption DNS_DBFIND_STALEONLY caused confusion because it implies we are looking for stale data only and ignore any active RRsets in the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear the option is related to a lookup due to "stale-answer-client-timeout". Rename other usages of "staleonly", instead use "lookup due to...". Also rename related function and variable names.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	3f81d79ffb	Restore the RECURSIONOK attribute after staleonly When doing a staleonly lookup we don't want to fall back to recursion. After all, there are obviously problems with recursion, otherwise we wouldn't do a staleonly lookup. When resuming from recursion however, we should restore the RECURSIONOK flag, allowing future required lookups for this client to recurse.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	aaed7f9d8c	Remove result exception on staleonly lookup When implementing "stale-answer-client-timeout", we decided that we should only return positive answers prematurely to clients. A negative response is not useful, and in that case it is better to wait for the recursion to complete. To do so, we check the result and if it is not ISC_R_SUCCESS, we decide that it is not good enough. However, there are more return codes that could lead to a positive answer (e.g. CNAME chains). This commit removes the exception and now uses the same logic that other stale lookups use to determine if we found a useful stale answer (stale_found == true). This means we can simplify two test cases in the serve-stale system test: nodata.example is no longer treated differently than data.example.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	3d5429f61f	Remove INSIST on NS_QUERYATTR_ANSWERED The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response twice. Without the attribute, this may happen if a staleonly lookup found a useful answer and sends a response to the client, and later recursion ends and also tries to send a response. The attribute was also used to mask adding a duplicate RRset. This is considered harmful. When we created a response to the client with a stale only lookup (regardless if we actually have sent the response), we should clear the rdatasets that were added during that lookup. Mark such rdatasets with a new attribute, DNS_RDATASETATTR_STALE_ADDED. Set a query attribute NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale only lookup. Before creating a response on a normal lookup, check if we can expect rdatasets to have been added during a staleonly lookup. If so, clear the rdatasets from the message with the attribute DNS_RDATASETATTR_STALE_ADDED set.	2021-04-02 09:15:07 +02:00
Matthijs Mekking	48b0dc159b	Simplify when to detach the client With stale-answer-client-timeout, we may send a response to the client, but we may want to hold on to the network manager handle, because recursion is going on in the background, or we need to refresh a stale RRset. Simplify the setting of 'nodetach': * During a staleonly lookup we should not detach the nmhandle, so just set it prior to 'query_lookup()'. * During a staleonly "stalefirst" lookup set the 'nodetach' to true if we are going to refresh the RRset. Now there is no longer the need to clear the 'nodetach' if we go through the "dbfind_stale", "stale_refresh_window", or "stale_only" paths.	2021-04-02 09:14:09 +02:00
Matthijs Mekking	92f7a67892	Refactor stale lookups, ignore active RRsets When doing a staleonly lookup, ignore active RRsets from cache. If we don't, we may add a duplicate RRset to the message, and hit an assertion failure in query.c because adding the duplicate RRset to the ANSWER section failed. This can happen on a race condition. When a client query is received, the recursion is started. When 'stale-answer-client-timeout' triggers around the same time the recursion completes, the following sequence of events may happen: 1. Queue the "try stale" fetch_callback() event to the client task. 2. Add the RRsets from the authoritative response to the cache. 3. Queue the "fetch complete" fetch_callback() event to the client task. 4. Execute the "try stale" fetch_callback(), which retrieves the just-inserted RRset from the database. 5. In "ns_query_done()" we are still recursing, but the "staleonly" query attribute has already been cleared. In other words, the query will resume when recursion ends (it already has ended but is still on the task queue). 6. Execute the "fetch complete" fetch_callback(). It finds the answer from recursion in the cache again and tries to add the duplicate to the answer section. This commit changes the logic for finding stale answers in the cache, such that on "stale_only" lookups actually only stale RRsets are considered. It refactors the code so that code paths for "dbfind_stale", "stale_refresh_window", and "stale_only" are more clear. First we call some generic code that applies in all three cases, formatting the domain name for logging purposes, increment the trystale stats, and check if we actually found stale data that we can use. The "dbfind_stale" lookup will return SERVFAIL if we didn't find a usable answer, otherwise we will continue with the lookup (query_gotanswer()). This is no different than before the introduction of "stale-answer-client-timeout" and "stale-refresh-time". 
The "stale_refresh_window" lookup is similar to the "dbfind_stale" lookup: return SERVFAIL if we didn't find a usable answer, otherwise continue with the lookup (query_gotanswer()). Finally the "stale_only" lookup. If the "stale_only" lookup was triggered because of an actual client timeout (stale-answer-client-timeout > 0), and if database lookup returned a stale usable RRset, trigger a response to the client. Otherwise return and wait until the recursion completes (or the resolver query times out). If the "stale_only" lookup is a "stale-answer-client-timeout 0" lookup, we prefer stale data over a lookup. In this case if there was no stale data, or the data was not a positive answer, retry the lookup with the stale options cleared, a.k.a. a normal lookup. Otherwise, continue with the lookup (query_gotanswer()) and refresh the stale RRset. This will trigger a response to the client, but will not detach the handle because a fetch will be created to refresh the RRset.	2021-04-02 09:14:09 +02:00
Matthijs Mekking	fee164243f	Keep track of allow client detach The stale-answer-client-timeout feature introduced a dependency on when a client may be detached from the handle. The dboption DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads the meaning of this database option, and actually introduced a bug because the option was checked in other places. In particular, in 'ns_query_done()' there is a check for 'RECURSING(qctx->client) && (!QUERY_STALEONLY(&qctx->client->query) || ...' and the condition is satisfied because recursion has not completed yet and DNS_DBFIND_STALEONLY is already cleared by that time (in query_lookup()), because we found a useful answer and we should detach the client from the handle after sending the response. Add a new boolean to the client structure to keep track of whether client detach from the handle is allowed. It is only disallowed if we are in a staleonly lookup and we didn't find a useful answer.	2021-04-02 09:14:09 +02:00
Matthijs Mekking	87591de6f7	Fix servestale fetchlimits crash When we query the resolver for a domain name that is in the same zone for which there are already one or more fetches outstanding, we could potentially hit the fetch limits. If so, recursion fails immediately for the incoming query and if serve-stale is enabled, we may try to return a stale answer. If the resolver is also authoritative for the parent zone (for example the root zone), first a delegation is found, but we first check the cache for a better response. Nothing is found in the cache, so we try to recurse to find the answer to the query. Because of fetch-limits 'dns_resolver_createfetch()' returns an error, which 'ns_query_recurse()' propagates to the caller, 'query_delegation_recurse()'. Because serve-stale is enabled, 'query_usestale()' is called, setting 'qctx->db' to the cache db, but leaving 'qctx->version' untouched. Now 'query_lookup()' is called to search for stale data in the cache database with a non-NULL 'qctx->version' (which is set to a zone db version), and thus we hit an assertion in rbtdb. This crash was introduced in 'main' by commit 8bcd7fe69e5343071fc917738d6092a8b974ef3f.	2021-03-11 12:16:14 +01:00
Ondřej Surý	a0181056a8	Change the isc_thread_self() return type to uintptr_t The pthread_self(), thrd_current() or GetCurrentThreadId() could actually be a pointer, so we should rather convert the value into uintptr_t instead of unsigned long.	2021-02-25 16:21:10 +01:00


365 Commits