While working on 'rndc dnssec -rollover' I noticed the following
(small) issues:
- The key files where updated with hints set to "-when" and that
should always be "now.
- The kasp system test did not properly update the test number when
calling 'rndc dnssec -checkds' (and ensuring that works).
- There was a missing ']' in the rndc.c help output.
Add to the keymgr a function that will schedule a rollover. This
basically means setting the time when the key needs to retire,
and updating the key lifetime, then update the state file. The next
time that named runs the keymgr the new lifetime will be taken into
account.
The CDS/CDNSKEY record will be published when the DS is in the
rumoured state. However, with the introduction of the rndc '-checkds'
command, the logic in the keymgr was changed to prevent the DS
state to go in RUMOURED unless the specific command was given. Hence,
the CDS was never published before it was seen in the parent.
Initially I thought this was a policy approval rule, however it is
actually a DNSSEC timing rule. Remove the restriction from
'keymgr_policy_approval' and update the 'keymgr_transition_time'
function. When looking to move the DS state to OMNIPRESENT it will
no longer calculate the state from its last change, but from when
the DS was seen in the parent, "DS Publish". If the time was not set,
default to next key event of an hour.
Similarly for moving the DS state to HIDDEN, the time to wait will
be derived from the "DS Delete" time, not from when the DS state
last changed.
Add a new 'rndc' command 'dnssec -checkds' that allows the user to
signal named that a new DS record has been seen published in the
parent, or that an existing DS record has been withdrawn from the
parent.
Upon the 'checkds' request, 'named' will write out the new state for
the key, updating the 'DSPublish' or 'DSRemoved' timing metadata.
This replaces the "parent-registration-delay" configuration option,
this was unreliable because it was purely time based (if the user
did not actually submit the new DS to the parent for example, this
could result in an invalid DNSSEC state).
Because we cannot rely on the parent registration delay for state
transition, we need to replace it with a different guard. Instead,
if a key wants its DS state to be moved to RUMOURED, the "DSPublish"
time must be set and must not be in the future. If a key wants its
DS state to be moved to UNRETENTIVE, the "DSRemoved" time must be set
and must not be in the future.
By default, with '-checkds' you set the time that the DS has been
published or withdrawn to now, but you can set a different time with
'-when'. If there is only one KSK for the zone, that key has its
DS state moved to RUMOURED. If there are multiple keys for the zone,
specify the right key with '-key'.
Fix Coverity CHECKED_RETURN reports for dst_key_getbool(). In most
cases we do not really care about its return value, but it is prudent
to check it.
In one case, where a dst_key_getbool() error should be treated
identically as success, cast the return value to void and add a relevant
comment.
Implement the 'rndc dnssec -status' command that will output
some information about the key states, such as which policy is
used for the zone, what keys are in use, and when rollover is
scheduled.
Add loose testing in the kasp system test, the actual times are
already tested via key file inspection.
When creating the successor, the current active key (predecessor)
should change its goal state to HIDDEN.
Also add two useful debug logs in the keymgr_key_rollover function.
Catch a case where if the prepublication time of the successor key
is later than the retire time of the predecessor. If that is the
case we should prepublish as soon as possible, a.k.a. now.
The `dns_keymgr_run()` function became quite long, put the logic
that looks if a new key needs to be created (start a key rollover)
in a separate function.
The logic in `keymgr_key_has_successor(key, keyring)` is flawed, it
returns true if there is any key in the keyring that has a successor,
while what we really want here is to make sure that the given key
has a successor in the given keyring.
Rather than relying on `keymgr_key_exists_with_state`, walk the
list of keys in the keyring and check if the key is a successor of
the given predecessor key.
This improves keytime testing on CSK rollover. It now
tests for specific times, and also tests for SyncPublish and
Removed keytimes.
Since an "active key" for ZSK and KSK means something
different, this makes it tricky to decide when a CSK is
active. An "active key" intuitively means the key is signing
so we say a CSK is active when it is creating zone signatures.
This change means a lot of timings for the CSK rollover tests
need to be adjusted.
The keymgr code needs a slight change on calculating the
prepublication time: For a KSK we need to include the parent
registration delay, but for CSK we look at the zone signing
property and stick with the ZSK prepublication calculation.
Registration delay is not part of the Iret retire interval, thus
removed from the calculation when setting the Delete time metadata.
Include the registration delay in prepublication time, because
we need to prepublish the key sooner than just the Ipub
publication interval.
While kasp relies on key states to determine when a key needs to
be published or be used for signing, the keytimes are used by
operators to get some expectation of key publication and usage.
Update the code such that these keytimes are set appropriately.
That means:
- Print "PublishCDS" and "DeleteCDS" times in the state files.
- The keymgr sets the "Removed" and "PublishCDS" times and derives
those from the dnssec-policy.
- Tweak setting of the "Retired" time, when retiring keys, only
update the time to now when the retire time is not yet set, or is
in the future.
This also fixes a bug in "keymgr_transition_time" where we may wait
too long before zone signatrues become omnipresent or hidden. Not
only can we skip waiting the sign delay Dsgn if there is no
predecessor, we can also skip it if there is no successor.
Finally, this commit moves setting the lifetime, reducing two calls
to one.
Coverity showed that the return value of `dst_key_gettime` was
unchecked in INITIALIZE_STATE. If DST_TIME_CREATED was not set we
would set the state to be initialized to a weird last changed time.
This would normally not happen because DST_TIME_CREATED is always
set. However, we would rather set the time to now (as the comment
also indicates) not match the creation time.
The comment on INITIALIZE_STATE also needs updating as we no
longer always initialize to HIDDEN.
If we initialize goals on all keys, superfluous keys that match
the policy all desire to be active. For example, there are six
keys available for a policy that needs just two, we only want to
set the goal state to OMNIPRESENT on two keys, not six.
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Earlier commit deals with migration where existing keys
match the policy. This commit deals with migration where existing
keys do not match the policy. In that case, named must not
immediately delete the existing keys, but gracefully roll to the
dnssec-policy.
However, named did remove the existing keys immediately. This is
because the legacy key states were initialized badly. Because
those keys had their states initialized to HIDDEN or RUMOURED, the
keymgr decides that they can be removed (because only when the key
has its states in OMNIPRESENT it can be used safely).
The original thought to initialize key states to HIDDEN (and
RUMOURED to deal with existing keys) was to ensure that those keys
will go through the required propagation time before the keymgr
decides they can be used safely. However, those keys are already
in the zone for a long time and making the key states represent
otherwise is dangerous: keys may be pulled out of the zone while
in fact they are required to establish the chain of trust.
Fix initializing key states for existing keys by looking more closely
at the time metadata. Add TTL and propagation delays to the time
metadata and see if the DNSSEC records have been propagated.
Initialize the state to OMNIPRESENT if so, otherwise initialize to
RUMOURED. If the time metadata is in the future, or does not exist,
keep initializing the state to HIDDEN.
The added test makes sure that new keys matching the policy are
introduced, but existing keys are kept in the zone until the new
keys have been propagated.
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Several adjustments in the keymgr are required to get it right:
- Set published time on keys when we calculate prepublication time.
This is not strictly necessary, but it is weird to have an active
key without the published time set.
- Initalize key states also before matching keys. Determine the
target state by looking at existing time metadata: If the time
data is set and is in the past, it is a hint that the key and
its corresponding records have been published in the zone already,
and the state is initialized to RUMOURED. Otherwise, initialize it
as HIDDEN. This fixes migration to dnssec-policy from existing
keys.
- Initialize key goal on keys that match key policy to OMNIPRESENT.
These may be existing legacy keys that are being migrated.
- A key that has its goal to OMNIPRESENT *or* an active key can
match a kasp key. The code was changed with CHANGE 5354 that
was a bugfix to prevent creating new KSK keys for zones in the
initial stage of signing. However, this caused problems for
restarts when rollovers are in progress, because an outroducing
key can still be an active key.
The test for this introduces a new KEY property 'legacy'. This is
used to skip tests related to .state files.
Algorithm rollover waited too long before introducing zone
signatures. It waited to make sure all signatures were resigned,
but when introducing a new algorithm, all signatures are resigned
immediately. Only add the sign delay if there is a predecessor key.
Algorithm rollover was stuck on submitting DS because keymgr thought
it would move to an invalid state. It did not match the current
key because it checked it against the current key in the next state.
Fixed by when checking the current key, check it against the desired
state, not the existing state.
When you do a restart or reconfig of named, or rndc loadkeys, this
triggers the key manager to run. The key manager will check if new
keys need to be created. If there is an active key, and key rollover
is scheduled far enough away, no new key needs to be created.
However, there was a bug that when you just start to sign your zone,
it takes a while before the KSK becomes an active key. An active KSK
has its DS submitted or published, but before the key manager allows
that, the DNSKEY needs to be omnipresent. If you restart named
or rndc loadkeys in quick succession when you just started to sign
your zone, new keys will be created because the KSK is not yet
considered active.
Fix is to check for introducing as well as active keys. These keys
all have in common that their goal is to become omnipresent.
Add a key manager to named. If a 'dnssec-policy' is set, 'named'
will run a key manager on the matching keys. This will do a couple
of things:
1. Create keys when needed (in case of rollover for example)
according to the set policy.
2. Retire keys that are in excess of the policy.
3. Maintain key states according to "Flexible and Robust Key
Rollover" [1]. After key manager ran, key files will be saved to
disk.
[1] https://matthijsmekking.nl/static/pdf/satin2012-Schaeffer.pdf
KEY GENERATION
Create keys according to DNSSEC policy. Zones configured with
'dnssec-policy' will allow 'named' to create DNSSEC keys (similar
to dnssec-keymgr) if not available.
KEY ROLLOVER
Rather than determining the desired state from timing metadata,
add a key state goal. Any keys that are created or picked from the
key ring and selected to be a successor has its key state goal set
to OMNIPRESENT (this key wants to be signing!). At the same time,
a key that is being retired has its key state goal set to HIDDEN.
The keymgr state machine with the three rules will make sure no
introduction or withdrawal of DNSSEC records happens too soon.
KEY TIMINGS
All timings are based on RFC 7583.
The keymgr will return when the next action is happening so
that the zone can set the proper rekey event. Prior to this change
the rekey event will run every hour by default (configurable),
but with kasp we can determine exactly when we need to run again.
The prepublication time is derived from policy.