When both 'broken' and 'failed' test cases appear in unit test output
...
===> Broken tests
lib/isc/tests/socket_test:main -> broken: Test case timed out [300.022s]
===> Failed tests
lib/isc/tests/time_test:main -> failed: 2 of 6 tests failed [0.006s]
===> Summary
...
spurious '===>' string gets matched, that results in the following
error:
Usage error for command debug: '===>' is not a test case identifier (missing ':'?).
Following change makes sure the string is omitted.
I checked on FreeBSD and OpenBSD that the AWK construct is supported.
If we created a key, mark its SyncPublish time as 'now' and started
bind the key might not be published if the SyncPublish time is in
the same second as the time the zone is loaded. This is mostly
for dnssec system test, as this kind of scenario is very unlikely
in a real world environment.
To reproduce the race - create a task, send two events to it, first one
must take some time. Then, from the outside, pause(), unpause() and detach()
the task.
When the long-running event is processed by the task it is in
task_state_running state. When we called pause() the state changed to
task_state_paused, on unpause we checked that there are events in the task
queue, changed the state to task_state_ready and enqueued the task on the
workers readyq. We then detach the task.
The dispatch() is done with processing the event, it processes the second
event in the queue, and then shuts down the task and frees it (as it's not
referenced anymore). Dispatcher then takes the, already freed, task from
the queue where it was wrongly put, causing an use-after free and,
subsequently, either an assertion failure or a segmentation fault.
The probability of this happening is very slim, yet it might happen under a
very high load, more probably on a recursive resolver than on an
authoritative.
The fix introduces a new 'task_state_pausing' state - to which tasks
are moved if they're being paused while still running. They are moved
to task_state_paused state when dispatcher is done with them, and
if we unpause a task in paused state it's moved back to task_state_running
and not requeued.
This is a bug I encountered when trying to schedule an algorithm
rollover. My plan, for a zone whose maximum TTL is 48h, was to sign
with the new algorithm and schedule a change of CDS records for more
than 48 hours in the future, roughly like this:
$ dnssec-keygen -a 13 -fk -Psync now+50h $zone
$ dnssec-keygen -a 13 $zone
$ dnssec-settime -Dsync now+50h $zone_ksk_old
However the algorithm 13 CDS was published immediately, which could
have made the zone bogus.
To reveal the bug using the `smartsign` test, this change just adds a
KSK with all its times in the future, so it should not affect the
existing checks at all. But the final check (that there are no CDS or
CDSNSKEY records after -Dsync) fails with the old `syncpublish()`
logic, because the future key's sync records appear early. With the
new `syncpublish()` logic the future key does not affect the test, as
expected, and it now passes.
When two threads unreferenced handles coming from one socket while
the socket was being destructed we could get a use-after-free:
Having handle H1 coming from socket S1, H2 coming from socket S2,
S0 being a parent socket to S1 and S2:
Thread A Thread B
Unref handle H1 Unref handle H2
Remove H1 from S1 active handles Remove H2 from S2 active handles
nmsocket_maybe_destroy(S1) nmsocket_maybe_destroy(S2)
nmsocket_maybe_destroy(S0) nmsocket_maybe_destroy(S0)
LOCK(S0->lock)
Go through all children, figure
out that we have no more active
handles:
sum of S0->children[i]->ah == 0
UNLOCK(S0->lock)
destroy(S0)
LOCK(S0->lock)
- but S0 is already gone
We weren't consistent about who should unreference the handle in
case of network error. Make it consistent so that it's always the
client code responsibility to unreference the handle - either
in the callback or right away if send function failed and the callback
will never be called.
If taskmgr is shutting down ns_client_setup will fail to create
a task for the newly created client, we weren't cleaning up already
created/attached things (memory context, server, clientmgr).
In tcp and udp stoplistening code we accessed libuv structures
from a different thread, which caused a shutdown crash when named
was under load. Also added additional DbC checks making sure we're
in a proper thread when accessing uv_ functions.
We had a race in which n UDP socket could have been already closing
by libuv but we still sent data to it. Mark socket as not-active
when stopping listening and verify that socket is not active when
trying to send data to it.
if validator_start() is called with validator->event->message set to
NULL, we can't use message->rcode to decide which negative proofs are
needed, so we use the rdataset attributes instead to determine whether
the rdataset was cached as NXDOMAIN or NODATA.