Overview
On 2022-08-25, the history of the main branch in the public BIND 9
project was partially [1] rewritten. This action was taken to amend a
series of 13 broken commits which triggered badTimezone errors when
git fsck was run on a fresh clone of the repository.
How Was the Problem Introduced?
While working on merge requests isc-projects/bind9!6570,
isc-projects/bind9!6571, and isc-projects/bind9!6572, a malformed
git filter-repo command was used to update the author and commit dates
of the relevant commits before merging them (because they were drafted
months before their intended merge date). That command used the string
+020 instead of +0200 as the timezone offset, which resulted in Git
interpreting the time zone offset as "0 hours 20 minutes" instead of
"2 hours 0 minutes". As no known timezone uses an offset of 20 minutes
from UTC, git fsck flags that as an error.
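The difference between the two strings can be illustrated with a short Python sketch. This is not Git's actual parsing code: the helper names are hypothetical, and the well-formedness rule used here (a sign followed by exactly four digits) is an assumption about what a check such as git fsck expects.

```python
import re

# Assumption: a well-formed offset is a sign followed by exactly four
# digits (HHMM). This mirrors the idea of the check, not Git's source.
WELL_FORMED = re.compile(r"[+-]\d{4}")


def is_well_formed(offset):
    """Return True if the offset looks like +HHMM or -HHMM."""
    return WELL_FORMED.fullmatch(offset) is not None


def as_hours_minutes(offset):
    """Read the digits as one decimal number, then split it as HHMM."""
    value = int(offset[1:])
    return value // 100, value % 100


for offset in ("+020", "+0200"):
    hours, minutes = as_hours_minutes(offset)
    print(offset, is_well_formed(offset), hours, minutes)
```

With these definitions, +020 is read as 0 hours 20 minutes and fails the format check, while +0200 is read as 2 hours 0 minutes and passes it.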
Why Was the Problem Not Noticed Earlier?
In most default Git configurations, git fsck is not automatically run
during typical day-to-day operations. The error was discovered during
an auto-scheduled monthly GitLab repository check.
We will be adding routine git fsck checks to GitLab CI pipelines so
that similar issues are caught earlier in the future.
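As a rough illustration of what such a routine check might look like, the Python sketch below runs git fsck in a checkout and returns its exit status, printing any lines that mention badTimezone. It is a hypothetical example, not the check used in the BIND 9 pipelines: the function names, the choice of the --no-dangling option, and the way the output is filtered are all assumptions.

```python
import subprocess
import sys


def bad_timezone_lines(fsck_output):
    """Return the lines of git fsck output that mention badTimezone."""
    return [line for line in fsck_output.splitlines() if "badTimezone" in line]


def run_fsck(repo_path="."):
    """Run git fsck in repo_path and return its exit status.

    Hypothetical helper: a CI job could pass the returned value to
    sys.exit() so that the job fails whenever git fsck reports errors.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "fsck", "--no-dangling"],
        capture_output=True,
        text=True,
    )
    # git fsck may write to either stream, so both are inspected.
    report = result.stdout + result.stderr
    for line in bad_timezone_lines(report):
        print(line, file=sys.stderr)
    return result.returncode
```

A CI job could call sys.exit(run_fsck()) as its last step, so that a non-zero status from git fsck fails the pipeline.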
Was Rewriting Public Branch History the Only Solution?
It was the only solution that resolved the problem for good for all future users of the BIND 9 source code repository. Modifying commit metadata inevitably changes the SHA-1 hashes of all the affected commits and of every commit built on top of them (by design).
The alternative was to not do anything and leave the malformed commits
in the main branch. However, some tools/environments run git fsck
during routine operations (like cloning) and error out when the check
fails. In other words, leaving the broken commits in the main branch
could possibly prevent people from cloning it in the future, and the
longer we waited to address the problem, the bigger the potential
disruption would have become.
The two most important factors that led to the decision to rewrite public Git history were that:
- a relatively low number of commits (~200) were affected,
- the malformed commits were never backported to any stable branches.
Unfortunately, one public development release (v9_19_4) had already
been tagged with the malformed commits present in its history. That tag
had to be refreshed [2] to point to the rewritten commits; otherwise,
the broken commits would have remained reachable indefinitely, rendering
the applied fix ineffective. Refreshing a tag was deemed acceptable for
a development release.
Are All Traces of the Old Commits Gone?
Not really.
Once all refs/heads/* refs are rebased on top of the rewritten main
branch, fresh clones of the repository will no longer contain the
malformed commits. However, GitLab tends to hang on to anything for
which a merge request or a pipeline was created pretty much
indefinitely, in the form of hidden refs/keep-around/* refs. Those
hidden refs are not advertised during cloning and only exist in the
GitLab-side repository, so they should not cause trouble for external
users. Still, git fsck run on the GitLab server will continue to report
badTimezone errors for as long as those hidden refs exist (because they
keep the broken commits reachable), i.e. indefinitely. That is not
considered to be a major issue.
Furthermore, GitLab's repository cleanup feature was not used for cleaning up the rewritten commits as it was determined that doing that removes some of the hidden refs (which e.g. makes it impossible to view the MRs whose commits were rewritten in GitLab's web interface), but not all of them. In other words, taking that action would break harmless things (as this was not e.g. a case of leaking sensitive information) while still not enabling the malformed commits to eventually be completely purged from the repository. Given this, using the repository cleanup mechanism was deemed pointless.