mir/bind - bind - Mike's Git repositories

mir/bind

mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 21:17:54 +00:00

Author	SHA1	Message	Date
Michał Kępień	7eb87270a4	Add CHANGES entry	2021-04-08 10:33:44 +02:00
Michał Kępień	d954e152d9	Free resources when gss_accept_sec_context() fails Even if a call to gss_accept_sec_context() fails, it might still cause a GSS-API response token to be allocated and left for the caller to release. Make sure the token is released before an early return from dst_gssapi_acceptctx().	2021-04-08 10:33:44 +02:00
Ondřej Surý	3c5267cc5c	Merge branch '2600-general-error-managed-keys-zone-dns_journal_compact-failed-no-more' into 'main' Resolve "general: error: managed-keys-zone: dns_journal_compact failed: no more" Closes #2600 See merge request isc-projects/bind9!4849	2021-04-07 19:28:39 +00:00
Mark Andrews	0174098aca	Add CHANGES and release note for [GL #2600 ]	2021-04-07 21:02:10 +02:00
Mark Andrews	bb6f0faeed	Check that upgrade of managed-keys.bind.jnl succeeded Update the system to include a recoverable managed.keys journal created with <size,serial0,serial1,0> transactions and test that it has been updated as part of the start up process.	2021-04-07 20:27:22 +02:00
Mark Andrews	0fbdf189c7	Rewrite managed-key journal immediately Both managed keys and regular zone journals need to be updated immediately when a recoverable error is discovered.	2021-04-07 20:23:46 +02:00
Mark Andrews	83310ffd92	Update dns_journal_compact() to handle bad transaction headers Previously, dns_journal_begin_transaction() could reserve the wrong amount of space. We now check that the transaction is internally consistent when upgrading / downgrading a journal and we also handle the bad transaction headers.	2021-04-07 20:23:46 +02:00
Mark Andrews	520509ac7e	Compute transaction size based on journal/transaction type previously the code assumed that it was a new transaction.	2021-04-07 20:20:57 +02:00
Mark Andrews	5a6112ec8f	Use journal_write_xhdr() to write the dummy transaction header Instead of journal_write(), use correct format call journal_write_xhdr() to write the dummy transaction header which looks at j->header_ver1 to determine which transaction header to write instead of always writing a zero filled journal_rawxhdr_t header.	2021-04-07 20:18:44 +02:00
Ondřej Surý	81c5f5e6a8	Merge branch '2401-ISC_R_TIMEDOUT-is-recoverable' into 'main' netmgr: Make it possible to recover from ISC_R_TIMEDOUT Closes #2401 See merge request isc-projects/bind9!4845	2021-04-07 14:34:46 +00:00
Evan Hunt	5496e51a80	Add CHANGES note for GL #2401	2021-04-07 15:38:16 +02:00
Artem Boldariev	8da12738f1	Use T_CONNECT timeout constant for TCP tests (instead of 1 ms) The netmgr_test would be failing on heavily loaded systems because the connection timeout was set to 1 ms. Use the global constant instead.	2021-04-07 15:37:10 +02:00
Evan Hunt	d2ea8f4245	Ensure dig lookup is detached on UDP connect failure dig could hang when UDP connect failed due to a dangling lookup object.	2021-04-07 15:36:59 +02:00
Ondřej Surý	72ef5f465d	Refactor async callbacks and fix the double tlsdnsconnect callback The isc_nm_tlsdnsconnect() call could end up with two connect callbacks called when the timeout fired and the TCP connection was aborted, but the TLS handshake was not complete yet. isc__nm_connecttimeout_cb() forgot to clean up sock->tls.pending_req when the connect callback was called with ISC_R_TIMEDOUT, leading to a second callback running later. A new argument has been added to the isc__nm__failed_connect_cb and isc__nm__failed_read_cb functions, to indicate whether the callback needs to run asynchronously or not.	2021-04-07 15:36:59 +02:00
Ondřej Surý	58e75e3ce5	Skip long tls_tests in the CI We already skip most of the recv_send tests in CI because they are too timing-related to be run in overloaded environment. This commit adds a similar change to tls_test before we merge tls_test into netmgr_test.	2021-04-07 15:36:59 +02:00
Artem Boldariev	340235c855	Prevent short TLS tests from hanging in case of errors The tests in tls_test.c could hang in the event of a connect error. This commit allows the tests to bail out when such an error occurs.	2021-04-07 15:36:59 +02:00
Evan Hunt	426c40c96d	rearrange nm_teardown() to check correctness after shutting down if a test failed at the beginning of nm_teardown(), the function would abort before isc_nm_destroy() or isc_tlsctx_free() were reached; we would then abort when nm_setup() was run for the next test case. rearranging the teardown function prevents this problem.	2021-04-07 15:36:59 +02:00
Ondřej Surý	86f4872dd6	isc_nm_connect() always return via callback The isc_nm_connect() functions were refactored to always return the connection status via the connect callback instead of sometimes returning the hard failure directly (for example, when the socket could not be created, or when the network manager was shutting down). This commit changes the connect functions in all the network manager modules, and also makes the necessary refactoring changes in places where the connect functions are called.	2021-04-07 15:36:59 +02:00
Evan Hunt	a70cd026df	move UDP connect retries from dig into isc_nm_udpconnect() dig previously ran isc_nm_udpconnect() three times before giving up, to work around a freebsd bug that caused connect() to return a spurious transient EADDRINUSE. this commit moves the retry code into the network manager itself, so that isc_nm_udpconnect() no longer needs to return a result code.	2021-04-07 15:36:59 +02:00
Ondřej Surý	ca12e25bb0	Use generic functions for reading and timers in TCP The TCP module has been updated to use the generic functions from netmgr.c instead of its own local copies. This brings the module mostly up to par with the TCPDNS and TLSDNS modules.	2021-04-07 15:36:59 +02:00
Ondřej Surý	7df8c7061c	Fix and clean up handling of connect callbacks Serveral problems were discovered and fixed after the change in the connection timeout in the previous commits: * In TLSDNS, the connection callback was not called at all under some circumstances when the TCP connection had been established, but the TLS handshake hadn't been completed yet. Additional checks have been put in place so that tls_cycle() will end early when the nmsocket is invalidated by the isc__nm_tlsdns_shutdown() call. * In TCP, TCPDNS and TLSDNS, new connections would be established even when the network manager was shutting down. The new call isc__nm_closing() has been added and is used to bail out early even before uv_tcp_connect() is attempted.	2021-04-07 15:36:59 +02:00
Ondřej Surý	5a87c7372c	Make it possible to recover from connect timeouts Similarly to the read timeout, it's now possible to recover from ISC_R_TIMEDOUT event by restarting the timer from the connect callback. The change here also fixes platforms that missing the socket() options to set the TCP connection timeout, by moving the timeout code into user space. On platforms that support setting the connect timeout via a socket option, the timeout has been hardcoded to 2 minutes (the maximum value of tcp-initial-timeout).	2021-04-07 15:36:58 +02:00
Ondřej Surý	33c00c281f	Make it possible to recover from read timeouts Previously, when the client timed out on read, the client socket would be automatically closed and destroyed when the nmhandle was detached. This commit changes the logic so that it's possible for the callback to recover from the ISC_R_TIMEDOUT event by restarting the timer. This is done by calling isc_nmhandle_settimeout(), which prevents the timeout handling code from destroying the socket; instead, it continues to wait for data. One specific use case for multiple timeouts is serve-stale - the client socket could be created with shorter timeout (as specified with stale-answer-client-timeout), so we can serve the requestor with stale answer, but keep the original query running for a longer time.	2021-04-07 15:36:58 +02:00
Ondřej Surý	0aad979175	Disable netmgr tests only when running under CI The full netmgr test suite is unstable when run in CI due to various timing issues. Previously, we enabled the full test suite only when CI_ENABLE_ALL_TESTS environment variable was set, but that went against original intent of running the full suite when an individual developer would run it locally. This change disables the full test suite only when running in the CI and the CI_ENABLE_ALL_TESTS is not set.	2021-04-07 15:36:58 +02:00
Matthijs Mekking	ad25ca8bc6	Merge branch '2608-stale-answer-client-timeout-default-off' into 'main' Change default stale-answer-client-timeout to off Closes #2608 See merge request isc-projects/bind9!4862	2021-04-07 12:45:48 +00:00
Matthijs Mekking	e443279bbf	Change default stale-answer-client-timeout to off Using "stale-answer-client-timeout" turns out to have unforeseen negative consequences, and thus it is better to disable the feature by default for the time being.	2021-04-07 14:10:31 +02:00
Diego dos Santos Fronza	e8313d91ea	Merge branch '2582-threadsanitizer-data-race-lib-dns-zone-c-10272-7-in-zone_maintenance' into 'main' Resolve "ThreadSanitizer: data race lib/dns/zone.c:10272:7 in zone_maintenance" Closes #2582 See merge request isc-projects/bind9!4864	2021-04-07 12:05:05 +00:00
Diego Fronza	6e08307bc8	Resolve TSAN data race in zone_maintenance Fix race between zone_maintenance and dns_zone_notifyreceive functions, zone_maintenance was attempting to read a zone flag calling DNS_ZONE_FLAG(zone, flag) while dns_zone_notifyreceive was updating a flag in the same zone calling DNS_ZONE_SETFLAG(zone, ...). The code reading the flag in zone_maintenance was not protected by the zone's lock, to avoid a race the zone's lock is now being acquired before an attempt to read the zone flag is made.	2021-04-07 12:04:01 +00:00
Michał Kępień	2e5a6ab7fc	Merge branch '2579-enforce-a-run-time-limit-on-unit-test-binaries' into 'main' Enforce a run time limit on unit test binaries Closes #2579 See merge request isc-projects/bind9!4802	2021-04-07 09:46:40 +00:00
Michał Kępień	6bdd55a9b3	Enforce a run time limit on unit test binaries When a unit test binary hangs, the GitLab CI job in which it is run is stuck until its run time limit is exceeded. Furthermore, it is not trivial to determine which test(s) hung in a given GitLab CI job based on its log. To prevent these issues, enforce a run time limit on every binary executed by the lib/unit-test-driver.sh script. Use a timeout of 5 minutes for consistency with older BIND 9 branches, which employed Kyua for running unit tests. Report an exit code of 124 when the run time limit is exceeded for a unit test binary, for consistency with the "timeout" tool included in GNU coreutils.	2021-04-07 11:41:45 +02:00
Artem Boldariev	d1bb1b01b9	Merge branch '2611-doth-failure' into 'main' Fix "doth" system test failure with SSL_ERROR_SYSCALL (5) See merge request isc-projects/bind9!4863	2021-04-07 08:44:38 +00:00
Artem Boldariev	ee10948e2d	Remove dead code which was supposed to handle TLS shutdowns nicely Fixes Coverity issue CID 330954 (See #2612).	2021-04-07 11:21:08 +03:00
Artem Boldariev	e6062210c7	Handle buggy situations with SSL_ERROR_SYSCALL See "BUGS" section at: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html It is mentioned there that when TLS status equals SSL_ERROR_SYSCALL AND errno == 0 it means that underlying transport layer returned EOF prematurely. However, we are managing the transport ourselves, so we should just resume reading from the TCP socket. It seems that this case has been handled properly on modern versions of OpenSSL. That being said, the situation goes in line with the manual: it is briefly mentioned there that SSL_ERROR_SYSCALL might be returned not only in a case of low-level errors (like system call failures).	2021-04-07 11:21:08 +03:00
Mark Andrews	6b121171a5	Merge branch '2613-lib-dns-gen-is-not-deleted-on-make-clean' into 'main' Resolve "lib/dns/gen is not deleted on make clean" Closes #2613 See merge request isc-projects/bind9!4865	2021-04-07 07:18:53 +00:00
Mark Andrews	9c28df2204	remove lib/dns/gen when running 'make clean'	2021-04-07 08:06:49 +10:00
Matthijs Mekking	8556c7f854	Merge branch '2594-servestale-staleonly-recursion-race' into 'main' Serve-stale "staleonly" recursion race condition See merge request isc-projects/bind9!4859	2021-04-02 11:26:57 +00:00
Matthijs Mekking	3d3a6415f7	If RPZ config'd, bail stale-answer-client-timeout When we are recursing, RPZ processing is not allowed. But when we are performing a lookup due to "stale-answer-client-timeout", we are still recursing. This effectively means that RPZ processing is disabled on such a lookup. In this case, bail the "stale-answer-client-timeout" lookup and wait for recursion to complete, as we we can't perform the RPZ rewrite rules reliably.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	839df94190	Rename "staleonly" The dboption DNS_DBFIND_STALEONLY caused confusion because it implies we are looking for stale data only and ignore any active RRsets in the cache. Rename it to DNS_DBFIND_STALETIMEOUT as it is more clear the option is related to a lookup due to "stale-answer-client-timeout". Rename other usages of "staleonly", instead use "lookup due to...". Also rename related function and variable names.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	3f81d79ffb	Restore the RECURSIONOK attribute after staleonly When doing a staleonly lookup we don't want to fallback to recursion. After all, there are obviously problems with recursion, otherwise we wouldn't do a staleonly lookup. When resuming from recursion however, we should restore the RECURSIONOK flag, allowing future required lookups for this client to recurse.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	aaed7f9d8c	Remove result exception on staleonly lookup When implementing "stale-answer-client-timeout", we decided that we should only return positive answers prematurely to clients. A negative response is not useful, and in that case it is better to wait for the recursion to complete. To do so, we check the result and if it is not ISC_R_SUCCESS, we decide that it is not good enough. However, there are more return codes that could lead to a positive answer (e.g. CNAME chains). This commit removes the exception and now uses the same logic that other stale lookups use to determine if we found a useful stale answer (stale_found == true). This means we can simplify two test cases in the serve-stale system test: nodata.example is no longer treated differently than data.example.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	e44bcc6f53	Add notes and changes for [#2594 ] Pretty newsworthy.	2021-04-02 10:02:40 +02:00
Matthijs Mekking	3d5429f61f	Remove INSIST on NS_QUERYATTR_ANSWERED The NS_QUERYATTR_ANSWERED attribute is to prevent sending a response twice. Without the attribute, this may happen if a staleonly lookup found a useful answer and sends a response to the client, and later recursion ends and also tries to send a response. The attribute was also used to mask adding a duplicate RRset. This is considered harmful. When we created a response to the client with a stale only lookup (regardless if we actually have send the response), we should clear the rdatasets that were added during that lookup. Mark such rdatasets with the a new attribute, DNS_RDATASETATTR_STALE_ADDED. Set a query attribute NS_QUERYATTR_STALEOK if we may have added rdatasets during a stale only lookup. Before creating a response on a normal lookup, check if we can expect rdatasets to have been added during a staleonly lookup. If so, clear the rdatasets from the message with the attribute DNS_RDATASETATTR_STALE_ADDED set.	2021-04-02 09:15:07 +02:00
Matthijs Mekking	48b0dc159b	Simplify when to detach the client With stale-answer-client-timeout, we may send a response to the client, but we may want to hold on to the network manager handle, because recursion is going on in the background, or we need to refresh a stale RRset. Simplify the setting of 'nodetach': * During a staleonly lookup we should not detach the nmhandle, so just set it prior to 'query_lookup()'. * During a staleonly "stalefirst" lookup set the 'nodetach' to true if we are going to refresh the RRset. Now there is no longer the need to clear the 'nodetach' if we go through the "dbfind_stale", "stale_refresh_window", or "stale_only" paths.	2021-04-02 09:14:09 +02:00
Matthijs Mekking	92f7a67892	Refactor stale lookups, ignore active RRsets When doing a staleonly lookup, ignore active RRsets from cache. If we don't, we may add a duplicate RRset to the message, and hit an assertion failure in query.c because adding the duplicate RRset to the ANSWER section failed. This can happen on a race condition. When a client query is received, the recursion is started. When 'stale-answer-client-timeout' triggers around the same time the recursion completes, the following sequence of events may happen: 1. Queue the "try stale" fetch_callback() event to the client task. 2. Add the RRsets from the authoritative response to the cache. 3. Queue the "fetch complete" fetch_callback() event to the client task. 4. Execute the "try stale" fetch_callback(), which retrieves the just-inserted RRset from the database. 5. In "ns_query_done()" we are still recursing, but the "staleonly" query attribute has already been cleared. In other words, the query will resume when recursion ends (it already has ended but is still on the task queue). 6. Execute the "fetch complete" fetch_callback(). It finds the answer from recursion in the cache again and tries to add the duplicate to the answer section. This commit changes the logic for finding stale answers in the cache, such that on "stale_only" lookups actually only stale RRsets are considered. It refactors the code so that code paths for "dbfind_stale", "stale_refresh_window", and "stale_only" are more clear. First we call some generic code that applies in all three cases, formatting the domain name for logging purposes, increment the trystale stats, and check if we actually found stale data that we can use. The "dbfind_stale" lookup will return SERVFAIL if we didn't found a usable answer, otherwise we will continue with the lookup (query_gotanswer()). This is no different as before the introduction of "stale-answer-client-timeout" and "stale-refresh-time". The "stale_refresh_window" lookup is similar to the "dbfind_stale" lookup: return SERVFAIL if we didn't found a usable answer, otherwise continue with the lookup (query_gotanswer()). Finally the "stale_only" lookup. If the "stale_only" lookup was triggered because of an actual client timeout (stale-answer-client-timeout > 0), and if database lookup returned a stale usable RRset, trigger a response to the client. Otherwise return and wait until the recursion completes (or the resolver query times out). If the "stale_only" lookup is a "stale-anwer-client-timeout 0" lookup, preferring stale data over a lookup. In this case if there was no stale data, or the data was not a positive answer, retry the lookup with the stale options cleared, a.k.a. a normal lookup. Otherwise, continue with the lookup (query_gotanswer()) and refresh the stale RRset. This will trigger a response to the client, but will not detach the handle because a fetch will be created to refresh the RRset.	2021-04-02 09:14:09 +02:00
Matthijs Mekking	fee164243f	Keep track of allow client detach The stale-answer-client-timeout feature introduced a dependancy on when a client may be detached from the handle. The dboption DNS_DBFIND_STALEONLY was reused to track this attribute. This overloads the meaning of this database option, and actually introduced a bug because the option was checked in other places. In particular, in 'ns_query_done()' there is a check for 'RECURSING(qctx->client) && (!QUERY_STALEONLY(&qctx->client->query) \|\| ...' and the condition is satisfied because recursion has not completed yet and DNS_DBFIND_STALEONLY is already cleared by that time (in query_lookup()), because we found a useful answer and we should detach the client from the handle after sending the response. Add a new boolean to the client structure to keep track of client detach from handle is allowed or not. It is only disallowed if we are in a staleonly lookup and we didn't found a useful answer.	2021-04-02 09:14:09 +02:00
Artem Boldariev	e7fe606020	Merge branch 'artem/tls-tests-and-fixes' into 'main' TLS transport code refactoring and unit tests See merge request isc-projects/bind9!4851	2021-04-01 15:41:52 +00:00
Artem Boldariev	fa062162a7	Fix crash (regression) in DIG when handling non-DoH responses This commit fixes crash in dig when it encounters non-expected header value. The bug was introduced at some point late in the last DoH development cycle. Also, refactors the relevant code a little bit to ensure better incoming data validation for client-side DoH connections.	2021-04-01 17:31:29 +03:00
Artem Boldariev	11ed7aac5d	TLS code refactoring, fixes and unit-tests This commit fixes numerous stability issues with TLS transport code as well as adds unit tests for it.	2021-04-01 17:31:29 +03:00
Ondřej Surý	01cd310407	Merge branch '2607-remove-custom-spnego' into 'main' Remove custom ISC SPNEGO implementation Closes #2607 See merge request isc-projects/bind9!4856	2021-04-01 14:14:00 +00:00
Ondřej Surý	66bd47a129	Add CHANGES and release note for GL #2607	2021-04-01 16:08:19 +02:00

... 3 4 5 6 7 ...

33814 Commits