The draft says that the NSEC(3) TTL must be set to the minimum of the
SOA MINIMUM field and the SOA TTL. This was always the intended
behaviour.
Update the zone structure to also track the SOA TTL. Whenever we
use the MINIMUM value to determine the NSEC(3) TTL, use the minimum
of MINIMUM and SOA TTL instead.
There is no specific test for this; however, two existing tests needed
adjusting because they would otherwise fail: they were checking NSEC3
records including the TTL. Update these checks to use 600 (the SOA TTL)
rather than 3600 (the SOA MINIMUM).
It is more intuitive to have the countdown of 'max-stale-ttl' as the
RRset TTL, instead of a TTL of 0. This information was already available
in a comment ("; stale (will be retained for x more seconds)"), but
Support suggested putting it in the TTL field instead.
Before binding an RRset, check the time and see if this record is
stale (or perhaps even ancient). Marking a header stale or ancient
happens only when looking up an RRset in cache, but binding an RRset
can also happen on other occasions (for example when dumping the
database).
Check the time and compare it to the header. If, according to the
time, the entry is stale but not ancient, set the STALE attribute;
if it is ancient, set the ANCIENT attribute.
We could mark the header stale or ancient here, but that requires
locking, so we only compare the current time against the rdh_ttl.
Adjust the test to check the dump-db output before querying for data. In
the dumped file the entry should be marked as stale, even though no
cache lookup has happened since the initial query.
When introducing change 5149, "rndc dumpdb" started to print a line
above a stale RRset, indicating how long the data will be retained.
At that time, I thought it should also be possible to load
a cache from file. But if a TTL had a value of 0 (because the entry was
stale), stale entries wouldn't be loaded from file. So, I added the
'max-stale-ttl' to TTL values, and adjusted the $DATE accordingly.
Since we don't actually have a "load cache from file" feature, this
is premature and is causing confusion among operators. This commit
removes the 'max-stale-ttl' adjustments.
A check in the serve-stale system test is added for a non-stale
RRset (longttl.example) to make sure the TTL in cache is sensible.
Also, the comment above stale RRsets could contain nonsensical
values. A possible reason why this may happen is when the RRset was
marked as stale but the 'max-stale-ttl' has already passed (so the RRset
is actually awaiting cleanup). This would cause the "will be retained"
value to be negative, but since it is stored in a uint32_t, you would
get a nonsensical value instead (e.g. 4294362497).
To mitigate this, we now also check that the header is not ancient.
In addition, we check whether the stale_ttl would be negative and,
if so, set it to 0. Most likely this will not happen, because the
header would already have been marked ancient, but there is a possible
race condition where the 'rdh_ttl + serve_stale_ttl' time has passed
but the header has not yet been checked for staleness.
When system tests are run on Windows, they are assigned port ranges that
are 100 ports wide and start from port number 5000. This is a different
port assignment method than the one used on Unix systems. Drop the "-p"
command line option from bin/tests/system/run.sh invocations used for
starting system tests on Windows to unify the port assignment method
used across all operating systems.
The get_ports.sh script is used for determining the range of ports a
given system test should use. It first determines the start of the port
range to return (the base port); it can either be specified explicitly
by the caller or chosen randomly. Subsequent ports are picked
sequentially, starting from the base port. To ensure no single port is
used by multiple tests, a state file (get_ports.state) containing the
last assigned port is maintained by the script. Concurrent access to
the state file is protected by a lock file (get_ports.lock); if one
instance of the script holds the lock file while another instance tries
to acquire it, the latter retries its attempt to acquire the lock file
after sleeping for 1 second; this retry process can be repeated up to 10
times before the script returns an error.
There are some problems with this approach:
- the sleep period in case of failure to acquire the lock file is
fixed, which leads to a "thundering herd" type of problem, where
(depending on how processes are scheduled by the operating system)
multiple system tests try to acquire the lock file at the same time
and subsequently sleep for 1 second, only for the same situation to
likely happen the next time around,
- the lock file is being locked and then unlocked for every single
port assignment made, not just once for the entire range of ports a
system test should use; in other words, the lock file is currently
locked and unlocked 13 times per system test; this increases the
odds of the "thundering herd" problem described above preventing a
system test from getting one or more ports assigned before the
maximum retry count is reached (assuming multiple system tests are
run in parallel); it also enables the range of ports used by a given
system test to be non-sequential (which is a rather cosmetic issue,
but one that can make log interpretation harder than necessary when
test failures are diagnosed),
- both issues described above cause unnecessary delays when multiple
system tests are started in parallel (due to high lock file
contention among the system tests being started),
- maintaining a state file requires ensuring proper locking, which
complicates the script's source code.
Rework the get_ports.sh script so that it assigns non-overlapping port
ranges to its callers without using a state file or a lock file:
- add a new command line switch, "-t", which takes the name of the
system test to assign ports for,
- ensure every instance of get_ports.sh knows how many ports all
system tests which form the test suite are going to need in total
(based on the number of subdirectories found in bin/tests/system/),
- in order to ensure all instances of get_ports.sh work on the same
global port range (so that no port range collisions happen), a
stable (throughout the expected run time of a single system test
suite) base port selection method is used instead of the random one;
specifically, the base port, unless specified explicitly using the
"-p" command line switch, is derived from the number of hours which
passed since the Unix Epoch time,
- use the name of the system test to assign ports for (passed via the
new "-t" command line switch) as a unique index into the global
system test range, to ensure all system tests use disjoint port
ranges.
The fromhex.pl script needs to be copied from the source directory to
the build directory before any test is run; otherwise, out-of-tree
builds fail to find it. Given that the script is only used in system
tests, move it to bin/tests/system/.
Even if a call to gss_accept_sec_context() fails, it might still cause a
GSS-API response token to be allocated and left for the caller to
release. Make sure the token is released before an early return from
dst_gssapi_acceptctx().
Update the system test to include a recoverable managed-keys journal
created with <size,serial0,serial1,0> transactions and test that it has
been updated as part of the startup process.
Previously, dns_journal_begin_transaction() could reserve the wrong
amount of space. We now check that the transaction is internally
consistent when upgrading/downgrading a journal, and we also handle bad
transaction headers.
Instead of journal_write(), use the correct formatting call,
journal_write_xhdr(), to write the dummy transaction header; it looks at
j->header_ver1 to determine which transaction header format to write,
instead of always writing a zero-filled journal_rawxhdr_t header.
The isc_nm_tlsdnsconnect() call could end up with two connect callbacks
called when the timeout fired and the TCP connection was aborted,
but the TLS handshake was not complete yet. isc__nm_connecttimeout_cb()
forgot to clean up sock->tls.pending_req when the connect callback was
called with ISC_R_TIMEDOUT, leading to a second callback running later.
A new argument has been added to the isc__nm_*_failed_connect_cb and
isc__nm_*_failed_read_cb functions, to indicate whether the callback
needs to run asynchronously or not.
We already skip most of the recv_send tests in CI because they are
too timing-sensitive to be run in an overloaded environment. This commit
adds a similar change to tls_test before we merge tls_test into
netmgr_test.
If a test failed at the beginning of nm_teardown(), the function
would abort before isc_nm_destroy() or isc_tlsctx_free() were reached;
we would then abort when nm_setup() was run for the next test case.
Rearranging the teardown function prevents this problem.
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).
This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
dig previously ran isc_nm_udpconnect() three times before giving
up, to work around a FreeBSD bug that caused connect() to return
a spurious transient EADDRINUSE. This commit moves the retry code
into the network manager itself, so that isc_nm_udpconnect() no
longer needs to return a result code.
The TCP module has been updated to use the generic functions from
netmgr.c instead of its own local copies. This brings the module
mostly up to par with the TCPDNS and TLSDNS modules.
Several problems were discovered and fixed after the change to
the connection timeout in the previous commits:
* In TLSDNS, the connection callback was not called at all under some
circumstances when the TCP connection had been established, but the
TLS handshake hadn't been completed yet. Additional checks have
been put in place so that tls_cycle() will end early when the
nmsocket is invalidated by the isc__nm_tlsdns_shutdown() call.
* In TCP, TCPDNS and TLSDNS, new connections would be established
even when the network manager was shutting down. The new
call isc__nm_closing() has been added and is used to bail out
early even before uv_tcp_connect() is attempted.
Similarly to the read timeout, it's now possible to recover from an
ISC_R_TIMEDOUT event by restarting the timer from the connect callback.
The change here also fixes platforms that are missing the socket()
option for setting the TCP connection timeout, by moving the timeout
code into user space. On platforms that do support setting the connect
timeout via a socket option, the timeout has been hardcoded to 2 minutes
(the maximum value of tcp-initial-timeout).
Previously, when the client timed out on read, the client socket would
be automatically closed and destroyed when the nmhandle was detached.
This commit changes the logic so that it's possible for the callback to
recover from the ISC_R_TIMEDOUT event by restarting the timer. This is
done by calling isc_nmhandle_settimeout(), which prevents the timeout
handling code from destroying the socket; instead, it continues to wait
for data.
One specific use case for multiple timeouts is serve-stale - the client
socket could be created with shorter timeout (as specified with
stale-answer-client-timeout), so we can serve the requestor with stale
answer, but keep the original query running for a longer time.
The full netmgr test suite is unstable when run in CI due to various
timing issues. Previously, we enabled the full test suite only when the
CI_ENABLE_ALL_TESTS environment variable was set, but that went against
the original intent of running the full suite when an individual
developer runs it locally.
This change disables the full test suite only when running in CI and
CI_ENABLE_ALL_TESTS is not set.
Using "stale-answer-client-timeout" turns out to have unforeseen
negative consequences, and thus it is better to disable the feature
by default for the time being.
Fix a race between the zone_maintenance and dns_zone_notifyreceive
functions: zone_maintenance was attempting to read a zone flag by
calling DNS_ZONE_FLAG(zone, flag) while dns_zone_notifyreceive was
updating a flag in the same zone by calling DNS_ZONE_SETFLAG(zone, ...).
The code reading the flag in zone_maintenance was not protected by the
zone's lock; to avoid the race, the zone's lock is now acquired before
the zone flag is read.
When a unit test binary hangs, the GitLab CI job in which it is run is
stuck until its run time limit is exceeded. Furthermore, it is not
trivial to determine which test(s) hung in a given GitLab CI job based
on its log. To prevent these issues, enforce a run time limit on every
binary executed by the lib/unit-test-driver.sh script. Use a timeout of
5 minutes for consistency with older BIND 9 branches, which employed
Kyua for running unit tests. Report an exit code of 124 when the run
time limit is exceeded for a unit test binary, for consistency with the
"timeout" tool included in GNU coreutils.
See the "BUGS" section at:
https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html
It is mentioned there that when the TLS status equals SSL_ERROR_SYSCALL
AND errno == 0, it means that the underlying transport layer returned
EOF prematurely. However, we are managing the transport ourselves, so we
should just resume reading from the TCP socket.
It seems that this case is handled properly in modern versions of
OpenSSL. That being said, the situation is in line with the manual: it
is briefly mentioned there that SSL_ERROR_SYSCALL might be returned not
only in the case of low-level errors (like system call failures).