The bin/tests/system/start.pl script waits until a "running" message is
logged by a given name server instance before attempting to send a
version.bind/CH/TXT query to it. The idea behind this was to make the
script wait until named loads all the zones it is configured to serve
before telling the system test framework that a given server is ready to
use; this prevents the need to add boilerplate code that waits for a
specific zone to be loaded to each test expecting that.
The problem is that when it looks for "running" messages, the
bin/tests/system/start.pl script assumes that the existence of any such
message in the named.run file indicates that a given named instance has
already finished loading all zones. Meanwhile, some system tests
restart all the named instances they use throughout their lifetime (some
even do that a few times), for example to run Python-based tests. The
bin/tests/system/start.pl script handles such a scenario incorrectly: as
soon as it finds any "running" message in the named.run file it inspects
and it gets a response to a version.bind/CH/TXT query, it tells the
system test framework that a given server is ready to use, which might
not be true - it is possible that only the "version.bind" zone is loaded
at that point and the "running" message found was logged by a
previously-shutdown named instance. This triggers intermittent failures
for Python-based tests.
Fix by improving the logic that the bin/tests/system/start.pl script
uses to detect server startup: check how many "running" lines are
present in a given named.run file before attempting to start a named
instance and only proceed with version.bind/CH/TXT queries when the
number of "running" lines found in that named.run file increases after
the server is started.
In the "rrsetorder" system test, the ns2 named instance is restarted
without passing the --restart option to bin/tests/system/start.pl. This
causes the log file for that named instance to be needlessly truncated.
Prevent this from happening by restarting the affected named instance
in the same way as all the other named instances used in system tests.
Getting the recorded value of 'edns-udp-size' from the resolver requires
strong attach to the dns_view because we are accessing `view->resolver`.
This is not the case in places (f.e. dns_zone unit) where `.udpsize` is
accessed. By moving the .udpsize field from `struct dns_resolver` to
`struct dns_view`, we can access the value directly even with weakly
attached dns_view without the need to lock the view because `.udpsize`
can be accessed after the dns_view object has been shut down.
The dns_view implements weak and strong reference counting. When strong
reference counting reaches zero, the adb, ntatable and resolver objects
are shut down and detached.
In dns_zone and dns_nta the dns_view was weakly attached, but the
view->resolver reference was accessed directly leading to dereferencing
the NULL pointer.
Add dns_view_getresolver() method which attaches to view->resolver
object under the lock (if it still exists) ensuring the dns_resolver
will be kept referenced until not needed.
The value of `sign_bit` is platform-dependent but constant at compile
time. Use a cast to convert the boolean `sign_bit` to 0 or 1 instead of
ternary `?:` because one branch of the conditional is dead code. (We
could leave out the cast to `size_t` but our style prefers to handle
booleans more explicitly, hence the `?:` that caused the issue.)
*** CID 358310: Possible Control flow issues (DEADCODE)
/lib/isc/resource.c: 118 in isc_resource_setlimit()
112 * rlim_t, and whether rlim_t has a sign bit.
113 */
114 isc_resourcevalue_t rlim_max = UINT64_MAX;
115 size_t wider = sizeof(rlim_max) - sizeof(rlim_t);
116 bool sign_bit = (double)(rlim_t)-1 < 0;
117
>>> CID 358310: Possible Control flow issues (DEADCODE)
>>> Execution cannot reach the expression "1" inside this statement: "rlim_max >>= 8UL * wider + ...".
118 rlim_max >>= CHAR_BIT * wider + (sign_bit ? 1 : 0);
119 rlim_value = ISC_MIN(value, rlim_max);
120 }
121
122 rl.rlim_cur = rl.rlim_max = rlim_value;
123 unixresult = setrlimit(unixresource, &rl);
While refactoring the isc_mem_getx(...) usage, couple places were
identified where the memory was resized manually. Use the
isc_mem_reget(...) that was introduced in [GL !5440] to resize the
arrays via function rather than a custom code.
In several places, the structures were cleaned with memset(...)) and
thus the semantic patch converted the isc_mem_get(...) to
isc_mem_getx(..., ISC_MEM_ZERO). Use the designated initializer to
initialized the structures instead of zeroing the memory with
ISC_MEM_ZERO flag as this better matches the intended purpose.
Add new semantic patch to replace the straightfoward uses of:
ptr = isc_mem_{get,allocate}(..., size);
memset(ptr, 0, size);
with the new API call:
ptr = isc_mem_{get,allocate}x(..., size, ISC_MEM_ZERO);
Previously, the isc_mem_get_aligned() and friends took alignment size as
one of the arguments. Replace the specific function with more generic
extended variant that now accepts ISC_MEM_ALIGN(alignment) for aligned
allocations and ISC_MEM_ZERO for allocations that zeroes
the (re-)allocated memory before returning the pointer to the caller.
Coverity is optimistic that we might do thousands of hashes in less
than a microsecond.
/tests/bench/siphash.c: 54 in main()
48 count++;
49 }
50
51 isc_time_now_hires(&finish);
52
53 us = isc_time_microdiff(&finish, &start);
>>> CID 358309: Integer handling issues (DIVIDE_BY_ZERO)
>>> In expression "count * 1000UL / us", division by expression "us" which may be zero has undefined behavior.
54 printf("%f us wide-lower len %3zu, %7llu kh/s (%llx)\n",
55 (double)us / 1000000.0, len,
56 (unsigned long long)(count * 1000 / us),
57 (unsigned long long)sum);
58 }
59
This is hopefully end of duplication. This batch did not cause clashes
in Sphinx but it was pointless nonetheless as we have auto-generated
anchors for all statements.
Some statement names like "allow-query" had manually defined link anchor
_allow-query and also implicit anchor created by
.. namedconf:statement:: syntax. This causes warnings if a ambiguous
reference is made using :any:`allow-query` syntax.
Remove (hopefully all) manually defined anchors which pointed to
identical place as the implicit anchor. This allows :any: to work.
In rare cases where manual anchor points to descriptive text separated
from statement definition the reference was disamguated by replacing
:any:`notify` with :ref:`notify` (for manual anchor)
vs. :namedconf:ref:`notify` (for statement definition).
Please note that `options` statement is a trap: It is ambiguous even
without manual anchor because rndc.conf has its own `options`. Use
:namedconf:ref:`options` vs. :rndcconf:ref:`options` to select
appropriate target.
Fixup for 2c3b2dabe9a6b3c4a10f6498a1169f39ed031eed.
We forgot to update TSAN paths when moving all the unit tests to
/tests/. Let's remove paths from find to make it less dependent on
exact location, and store all untracked files as we do in the normal
unit test template.
Related: !6243
The previous commit failed some tests because we expect that if a
fetch fails and we have stale candidates in cache, the
stale-refresh-time window is started. This means that if we hit a stale
entry in cache and answering stale data is allowed, we don't bother
resolving it again for as long we are within the stale-refresh-time
window.
This is useful for two reasons:
- If we failed to fetch the RRset that we are looking for, we are not
hammering the authoritative servers.
- Successor clients don't need to wait for stale-answer-client-timeout
to get their DNS response, only the first one to query will take
the latency penalty.
The latter is not useful when stale-answer-client-timeout is 0 though.
So this exception code only to make sure we don't try to refresh the
RRset again if it failed to do so recently.
Refreshing a stale RRset is similar to prefetching an RRset, so
reuse the existing code. When refreshing an RRset we need to clear
all db options related to serve-stale so that stale RRsets in cache
are ignored during the refresh.
We no longer need to set the "nodetach" flag, because the refresh
fetch is now a "fetch and forget". So we can detach from the client
in the query_send().
This code will break some serve-stale test cases, this will be fixed
in the successor commit.
TODO: add explanation why the serve-stale test cases fail.
Formerly, the isc_hash32() would have to change the key in a local copy
to make it case insensitive. Change the isc_siphash24() and
isc_halfsiphash24() functions to lowercase the input directly when
reading it from the memory and converting the uint8_t * array to
64-bit (respectively 32-bit numbers).
dohpath is specfied in draft-ietf-add-svcb-dns and has a value
of 7. It must be a relative path (start with a /), be encoded
as UTF8 and contain the variable dns ({?dns}).
On systems with signed rlim_t the old code calculated its maximum
value by shifting 1 into the sign bit, which is undefined behaviour.
Avoid the bug by using an unsigned shift.
When the TCP test is run on the busy server, the server might take a
while to wind the server down because it might still be processing all
that 300k invalid XFR requests.
Increate the rncd wait time to 120 seconds, the SIGTERM time to 300
seconds, and reduce the time to wait for ans servers from 1200 second
to just 120 seconds.