Replace the custom DNS server used in the "dispatch" system test with
new code based on the isctest.asyncserver module.
Merge branch 'stepan/dispatch-asyncserver' into 'main'
See merge request isc-projects/bind9!10689
When the tests-connreset.py module was initially implemented in commit
5c17919019ef0af8226e5bb61214b805bb3e2451, the dispatch code did not
properly apply the idle timeout to TCP connections. This allowed the
check in that test module to reset the TCP connection after 5 seconds as
named did not attempt to tear the connection down earlier than that.
However, as the dispatch code was improved, the idle timeout started
being enforced for TCP dispatches; the exact value it is set to in the
current code depends on a given server's SRTT, but it defaults to about
1.2 seconds for responsive servers. This means that the code paths
triggered by the "dispatch" system test are now different than the ones
it was originally supposed to trigger because it is now named itself
that shuts the TCP connection down cleanly before the ans3 server gets a
chance to reset it.
Account for the above by lowering the amount of time after which the
ans3 server in the "dispatch" system test resets TCP connections to just
1 second, so that the test actually does what its name implies.
Add a TCP connection handler, ConnectionReset, which enables closing TCP
connections without emptying the client socket buffer, causing the
kernel to send an RST segment to the client. This relies on a horrible
asyncio hack that can break at any point in the future due to abusing
implementation details in the Python Standard Library. Despite the eye
bleeding this code may cause, the approach it takes was still deemed
preferable to implementing an asyncio transport from scratch just to
enable triggering connection resets.
In response to client queries, AsyncDnsServer users can currently only
make the server either send a reply or silently ignore the query. In
the case of TCP queries, neither of these actions causes the client's
connection to be closed - the onus of doing that is on the client.
However, in some cases the server may be required to close the
connection on its own, so AsyncDnsServer users need to have some way of
requesting such an action.
Add a new ResponseAction subclass, ResponseDropAndCloseConnection, which
enables AsyncDnsServer users to conveniently request TCP connections to
be closed. Instead of returning the response to send,
ResponseDropAndCloseConnection raises a custom exception that
AsyncDnsServer._handle_tcp() handles accordingly.
Move the client_addrs and client_refs to libtest to prevent this.
Merge branch 'ondrej/fix-odr-violation' into 'main'
See merge request isc-projects/bind9!10771
Locally, clang reported odr-violation:
=================================================================
==588371==ERROR: AddressSanitizer: odr-violation (0x55555556a060):
[1] size=256 'client_addrs' ../tests/ns/netmgr_wrap.c:36:18 in /home/ondrej/Projects/bind9/build/tests/ns/query
[2] size=256 'client_addrs' ../tests/ns/netmgr_wrap.c:36:18 in /home/ondrej/Projects/bind9/build/tests/ns/../libbindtest.so
These globals were registered at these points:
[1]:
#0 0x7ffff785306f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
#1 0x7ffff6a2a303 in call_init ../csu/libc-start.c:145
#2 0x7ffff6a2a303 in __libc_start_main_impl ../csu/libc-start.c:347
#3 0x55555555a084 in _start (/home/ondrej/Projects/bind9/build/tests/ns/query+0x6084) (BuildId: fbe4a3fcf1a249c7d7da69ee8b255a1dbb610c7a)
[2]:
#0 0x7ffff785306f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
#1 0x7ffff7fca71e in call_init elf/dl-init.c:74
#2 0x7ffff7fca823 in call_init elf/dl-init.c:120
#3 0x7ffff7fca823 in _dl_init elf/dl-init.c:121
#4 0x7ffff7fe459f (/lib64/ld-linux-x86-64.so.2+0x1f59f) (BuildId: 281ac1521b4102509b1c7ac7004db7c1efb81796)
==588371==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'client_addrs' at ../tests/ns/netmgr_wrap.c:36:18 in /home/ondrej/Projects/bind9/build/tests/ns/query
==588371==ABORTING
Move the client_addrs and client_refs to libtest to prevent this.
Refactor the network manager to be a single object which is not exposed to the caller.
Merge branch 'ondrej/refactor-isc_netmgr-to-be-singleton' into 'main'
See merge request isc-projects/bind9!10735
There is only a single network manager running on top of the loop
manager (except for tests). Refactor the network manager to be a
singleton (a single instance) and change the unit tests, so that the
shorter read timeouts apply only to a specific handle, not the whole
extra 'connect_nm' network manager instance.
All the applications built on top of the loop manager were required to
create a single instance of the loop manager. Refactor the loop
manager not to expose this instance to the callers, and keep the loop
manager object internal to the `isc_loop` compilation unit.
This significantly simplifies a number of data structures and calls to
the `isc_loop` API.
Merge branch 'ondrej/refactor-isc_loopmgr-to-be-singleton' into 'main'
See merge request isc-projects/bind9!10733
All the applications built on top of the loop manager were required to
create just a single instance of the loop manager. Refactor the loop
manager to not expose this instance to the callers and keep the loop
manager object internal to the isc_loop compilation unit.
This significantly simplifies a number of data structures and calls to
the isc_loop API.
The log message 'shut down hung fetch while resolving' may be confusing
because no detection of hung fetches actually takes place, but rather
the timer on the fetch context expires and the resolver gives up.
Change the log message to actually say that instead of the original
cryptic message about hung fetch.
Closes#3148
Merge branch '3148-rename-shut-down-hung-fetch' into 'main'
See merge request isc-projects/bind9!10759
The log message 'shut down hung fetch while resolving' may be confusing
because no detection of hung fetches actually takes place, but rather
the timer on the fetch context expires and the resolver gives up.
Change the log message to actually say that instead of the original
cryptic message about hung fetch.
The original `ans.pl` server was a copy of the one in `fetchlimit`, so
there are some changes:
- The server now only responds with A replies (which is the only thing
needed).
- The incrementing of the IP address goes beyond the least significant
octet (so, after 192.0.2.255 it will yield 192.0.3.0).
Merge branch 'stepan/zero-asyncserver' into 'main'
See merge request isc-projects/bind9!10597
The original `ans.pl` server was based on a copy of the one in
`fetchlimit`, so there are some changes:
- The server now only responds with A replies (which is the only thing
needed).
- The incrementing of the IP address goes beyond the least significant
octet (so, after 192.0.2.255 it will yield 192.0.3.0).
With serve-stale enabled, a CNAME chain that contains a stale RRset, the refresh query doesn't always properly refresh the stale RRsets. This has been fixed.
Closes#5243
Merge branch '5243-stale-refresh-as-prefetch' into 'main'
See merge request isc-projects/bind9!10720
A serve-stale refresh is similar to a prefetch, the only difference
is when it triggers. Where a prefetch is done when an RRset is about
to expire, a serve-stale refresh is done when the RRset is already
stale.
This means that the check for the stale-refresh window needs to
move into query_stale_refresh(). We need to clear the
DNS_DBFIND_STALEENABLED option at the same places as where we clear
DNS_DBFIND_STALETIMEOUT.
Now that serve-stale refresh acts the same as prefetch, there is no
worry that the same rdataset is added to the message twice. This makes
some code obsolete, specifically where we need to clear rdatasets from
the message.
The LSP server (using clangd) was always complaining about:
Suspicious string literal, probably missing a comma
for the two Local IPv6 Unicast Addresses strings that spanned
across multiple lines. Disable clang-format for these two lines.
Merge branch 'ondrej/fix-suspicious-string-literal-probably-missing-comma' into 'main'
See merge request isc-projects/bind9!10764
The LSP server (using clangd) was always complaining about:
Suspicious string literal, probably missing a comma
for the two Local IPv6 Unicast Addresses strings that spanned
across multiple lines. Disable clang-format for these two lines.
The beauty and horrors of the C - the compiler properly detects variable
shadowing, but you can freely shadow a standard function 'free()' with
variable called 'free'. And if you reference 'free()' just as 'free'
you get the function pointer which means you can do also pointer
arithmetics, so 'free > 0' is always valid even when you delete the
local variable.
Replace the local variables 'free' with a name that doesn't shadow the
'free()' function to prevent future hard to detect bugs.
Replace the custom DNS server used in the "fetchlimit" system test
with new code based on the isctest.asyncserver module.
Merge branch 'stepan/fetchlimit-asyncserver' into 'main'
See merge request isc-projects/bind9!10614
Aggressive use of DNSSEC-Validated cache with NSEC was not working in scenarios when no parent NSEC was not in cache. This has been fixed.
Closes#5422
Merge branch '5422-aggressive-nsec-not-working' into 'main'
See merge request isc-projects/bind9!10736
Add \007.no-apex-covering as an owner name so that the cache does
not get primed with a parent NSEC RRset to test the case where
dns_qp_lookup returns ISC_R_NOTFOUND.
dns_qp_lookup was returning ISC_R_NOTFOUND rather than DNS_R_PARTIALMATCH
when there wasn't a parent with a NSEC record in the cache. This was
causing find_coveringnsec to fail rather than returing the covering NSEC.
The kasp test cases assume that keymgr operations on the zone under test
have been completed before the test is executed. These are typically
quite fast, but the logs need to be explicitly checked for the messages,
otherwise there's a possibility of race conditions causing the
kasp/rollover tests to become unstable.
Call the wait function in all the kasp/rollover tests where it is
expected (which is generally in each test, unless we're dealing with
unsigned zones).
Closes#5371
Merge branch '5371-wait-keymgr-done-rollover-kasp-tests' into 'main'
See merge request isc-projects/bind9!10717
The kasp test cases assume that keymgr operations on the zone under test
have been completed before the test is executed. These are typically
quite fast, but the logs need to be explicitly checked for the messages,
otherwise there's a possibility of race conditions causing the
kasp/rollover tests to become unstable.
Call the wait function in all the kasp/rollover tests where it is
expected (which is generally in each test, unless we're dealing with
unsigned zones).
Many of our test cases only use a single NamedInstance from the
`servers` fixture. Introduce `nsX` helper fixtures to simplify these
tests and reduce boilterplate code further.
Specifically, the test no longer has to either define its own variable
to extract a single server from the list, or use the longer
servers["nsX"] syntax. While this may seem minor, the amount of times it
is repeated across the tests justifies the change. It also promotes
using more explicit server identification, i.e. `nsX`, rather than
generic `server`. This also improves the clarity of the tests and may be
helpful in traceback during debugging as well.
Prior to this change, there was a single `rollover` test directory, containing 8 tests. These contained even more test scenarios, that were mostly unrelated to each other. This made debugging or even comprehending the tests difficult, as you'd often have to grasp the importance (or rather lack of it) of thousands of lines of setup, configuration and test code, and debug logs.
Now the tests were split up into 14 different test directories, containing 67 tests in total. This makes it much more comprehensible to understand what's going on in any single of these test cases, as there is no unrelated code. It also allows better parallelization and debugging of individual test cases, because of the improved granularity.
Merge branch 'nicki/split-rollover-test-cases' into 'main'
See merge request isc-projects/bind9!10581
Previously, a lot of the checking was re-implemented and duplicated from
check_rollover_step(). Use that function where possible and only
override the needed checks.