In case of normal fetch, the .recursionquota is attached and
ns_statscounter_recursclients is incremented when the fetch is created. Then
the .recursionquota is detached and the counter decremented in the
fetch_callback().
In case of prefetch or rpzfetch, the quota is attached, but the counter is not
incremented. When we reach the soft-quota, the function returns early but don't
detach from the quota, and it gets destroyed during the ns_client_endrequest(),
so no memory was leaked.
But because the ns_statscounter_recursclients is only incremented during the
normal fetch the counter would be incorrectly decremented on two occassions:
1) When we reached the softquota, because the quota was not properly detached
2) When the prefetch or rpzfetch was cancelled mid-flight and the callback
function was never called.
Rather than group key ids together, group key id with its
corresponding counters. This should make growing / shrinking easier
than having keyids then counters.
Add a statschannel test case for DNSSEC sign metrics that has more
keys than there are allocated stats counters for. This will produce
gibberish, but at least it should not crash.
The first attempt to add DNSSEC sign statistics was naive: for each
zone we allocated 64K counters, twice. In reality each zone has at
most four keys, so the new approach only has room for four keys per
zone. If after a rollover more keys have signed the zone, existing
keys are rotated out.
The DNSSEC sign statistics has three counters per key, so twelve
counters per zone. First counter is actually a key id, so it is
clear what key contributed to the metrics. The second counter
tracks the number of generated signatures, and the third tracks
how many of those are refreshes.
This means that in the zone structure we no longer need two separate
references to DNSSEC sign metrics: both the resign and refresh stats
are kept in a single dns_stats structure.
Incrementing dnssecsignstats:
Whenever a dnssecsignstat is incremented, we look up the key id
to see if we already are counting metrics for this key. If so,
we update the corresponding operation counter (resign or
refresh).
If the key is new, store the value in a new counter and increment
corresponding counter.
If all slots are full, we rotate the keys and overwrite the last
slot with the new key.
Dumping dnssecsignstats:
Dumping dnssecsignstats is no longer a simple wrapper around
isc_stats_dump, but uses the same principle. The difference is that
rather than dumping the index (key tag) and counter, we have to look
up the corresponding counter.
Resolve "Changing from auto-dnssec maintain to dnssec-policy x immediately deletes existing keys"
Closes#1706
See merge request isc-projects/bind9!3322
Add a test to ensure migration from 'auto-dnssec maintain;' to
dnssec-policy works even if the algorithm is changed. The existing
keys should not be removed immediately, but their goal should be
changed to become hidden, and the new keys with the different
algorithm should be introduced immediately.
If we initialize goals on all keys, superfluous keys that match
the policy all desire to be active. For example, there are six
keys available for a policy that needs just two, we only want to
set the goal state to OMNIPRESENT on two keys, not six.
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Earlier commit deals with migration where existing keys
match the policy. This commit deals with migration where existing
keys do not match the policy. In that case, named must not
immediately delete the existing keys, but gracefully roll to the
dnssec-policy.
However, named did remove the existing keys immediately. This is
because the legacy key states were initialized badly. Because
those keys had their states initialized to HIDDEN or RUMOURED, the
keymgr decides that they can be removed (because only when the key
has its states in OMNIPRESENT it can be used safely).
The original thought to initialize key states to HIDDEN (and
RUMOURED to deal with existing keys) was to ensure that those keys
will go through the required propagation time before the keymgr
decides they can be used safely. However, those keys are already
in the zone for a long time and making the key states represent
otherwise is dangerous: keys may be pulled out of the zone while
in fact they are required to establish the chain of trust.
Fix initializing key states for existing keys by looking more closely
at the time metadata. Add TTL and propagation delays to the time
metadata and see if the DNSSEC records have been propagated.
Initialize the state to OMNIPRESENT if so, otherwise initialize to
RUMOURED. If the time metadata is in the future, or does not exist,
keep initializing the state to HIDDEN.
The added test makes sure that new keys matching the policy are
introduced, but existing keys are kept in the zone until the new
keys have been propagated.
A few kasp system test tweaks to improve test failure debugging and
deal with tests related to migration to dnssec-policy.
1. When clearing a key, set lifetime to "none". If "none", skip
expect no lifetime set in the state file. Legacy keys that
are migrated but don't match the dnssec-policy will not have a
lifetime.
2. The kasp system test prints which key id and file it is checking.
Log explicitly if we are checking the id or a file.
3. Add quotes around "ID" when setting the key id, for consistency.
4. Fix a typo (non -> none).
5. Print which key ids are found, this way it is easier to see what
KEY[1-4] failed to match one of the key files.
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Several adjustments in the keymgr are required to get it right:
- Set published time on keys when we calculate prepublication time.
This is not strictly necessary, but it is weird to have an active
key without the published time set.
- Initalize key states also before matching keys. Determine the
target state by looking at existing time metadata: If the time
data is set and is in the past, it is a hint that the key and
its corresponding records have been published in the zone already,
and the state is initialized to RUMOURED. Otherwise, initialize it
as HIDDEN. This fixes migration to dnssec-policy from existing
keys.
- Initialize key goal on keys that match key policy to OMNIPRESENT.
These may be existing legacy keys that are being migrated.
- A key that has its goal to OMNIPRESENT *or* an active key can
match a kasp key. The code was changed with CHANGE 5354 that
was a bugfix to prevent creating new KSK keys for zones in the
initial stage of signing. However, this caused problems for
restarts when rollovers are in progress, because an outroducing
key can still be an active key.
The test for this introduces a new KEY property 'legacy'. This is
used to skip tests related to .state files.
The rwlock introduced to protect the .logconfig member of isc_log_t
structure caused a significant performance drop because of the rwlock
contention. It was also found, that the debug_level member of said
structure was not protected from concurrent read/writes.
The .dynamic and .highest_level members of isc_logconfig_t structure
were actually just cached values pulled from the assigned channels.
We introduced an even higher cache level for .dynamic and .highest_level
members directly into the isc_log_t structure, so we don't have to
access the .logconfig member in the isc_log_wouldlog() function.
After an RPZ zone is updated via zone transfer, the RPZ summary
database is updated, inserting the newly added names in the policy
zone and deleting the newly removed ones. The first part of this
was quantized so it would not run too long and starve other tasks
during large updates, but the second part was not quantized, so
that an update in which a large number of records were deleted
could cause named to become briefly unresponsive.
We could have a race between handle closing and processing async
callback. Deactivate the handle before issuing the callback - we
have the socket referenced anyway so it's not a problem.
We introduce a isc_quota_attach_cb function - if ISC_R_QUOTA is returned
at the time the function is called, then a callback will be called when
there's quota available (with quota already attached). The callbacks are
organized as a LIFO queue in the quota structure.
It's needed for TCP client quota - with old networking code we had one
single place where tcp clients quota was processed so we could resume
accepting when the we had spare slots, but it's gone with netmgr - now
we need to notify the listener/accepter that there's quota available so
that it can resume accepting.
Remove unused isc_quota_force() function.
The isc_quote_reserve and isc_quota_release were used only internally
from the quota.c and the tests. We should not expose API we are not
using.
ORACLE MySQL 8.0 has dropped the my_bool type, so we need to reinstate
it back when compiling with that version or higher. MariaDB is still
keeping the my_bool type. The numbering between the two (MariaDB 5.x
jumped to MariaDB 10.x) doesn't make the life of the developer easy.
Most build/test job names already contain a "clang", "gcc", or "msvc"
prefix which indicates the compiler used for a given job. Apply that
naming convention to all build/test job names.
Multiple YAML keys have identical values for both TSAN unit test job
definitions. Extract these common keys to a YAML anchor and use it in
TSAN unit test job definitions to reduce code duplication.
Definitions of jobs running unit tests under TSAN contain an
"after_script" YAML key. Since the "unit_test_job" anchor is included
in those job definitions before "after_script" is defined, the
job-specific value of that key overrides the one defined in the included
anchor. This prevents "kyua report-html" from being run for TSAN unit
test jobs. Moving the invocation of "kyua report-html" to the "script"
key in the "unit_test_job" anchor is not acceptable as it would cause
the exit code of that command to determine the result of all unit test
jobs and we need that to be the exit code of "make unit". Instead, add
"kyua report-html" invocations to the "after_script" key of TSAN unit
test job definitions to address the problem without affecting other job
definitions.
Multiple YAML keys have identical values for both TSAN system test job
definitions. Extract these common keys to a YAML anchor and use it in
TSAN system test job definitions to reduce code duplication.
Both "system_test_job" and "unit_test_job" YAML anchors contain a
"before_script" key. TSAN job definitions first specify their own value
of the "before_script" key and then include the aforementioned YAML
anchors, which results in the value of the "before_script" key being
overridden with the value specified by the included anchor. Given this,
remove "before_script" definitions specific to TSAN jobs as they serve
no practical purpose.
All assignments for the TSAN_OPTIONS variable are identical across the
entire .gitlab-ci.yml file. Define a global TSAN_OPTIONS_COMMON
variable and use it in job definitions to reduce code duplication.