If a TCP connection fails while attempting to send a query to a server,
the fetch context will be restarted without marking the target server as
a bad one. If this happens for a server which:
- was already marked with the DNS_FETCHOPT_EDNS512 flag,
- responds to EDNS queries with the UDP payload size set to 512 bytes,
- does not send response packets larger than 512 bytes,
and the response for the query being sent is larger than 512 byes, then
named will pointlessly alternate between sending UDP queries with EDNS
UDP payload size set to 512 bytes (which are responded to with truncated
answers) and TCP connections until the fetch context retry limit is
reached. Prevent such query loops by marking the server as bad for a
given fetch context if the advertised EDNS UDP payload size for that
server gets reduced to 512 bytes and it is impossible to reach it using
TCP.
I was truncating zone files for experimental purposes when I found
that `named-compilezone | head` got stuck. The full command line that
exhibited the problem was:
dig axfr dotat.at |
named-compilezone -o /dev/stdout dotat.at /dev/stdin |
head
This requires a large enough zone to exhibit the problem, more than
about 70000 bytes of plain text output from named-compilezone.
I was running the command on Debian Stretch amd64.
This was puzzling since it looked like something was suppressing the
SIGPIPE. I used `strace` to examine what was happening at the hang.
The program was just calling write() a lot to print the zone file, and
the last write() hanged until I sent it a SIGINT.
During some discussion with friends, Ian Jackson guessed that opening
/dev/stdout O_RDRW might be the problem, and after some tests we found
that this does in fact suppress SIGPIPE.
Since `named-compilezone` only needs to write to its output file, the
fix is to omit the stdio "+" update flag.
It was found that NSEC Aggressive Caching has a significant performance impact
on BIND 9 when used as recursor. This commit disables the synth-from-dnssec
configuration option by default to provide immediate remedy for people running
BIND 9.12+. The NSEC Aggressive Cache will be enabled again after a proper fix
will be prepared.
Make the release checklist match the current release process better by
adding missing steps, rearranging existing ones, reassigning
responsibilities, and dividing the list into sections (by due date).
cppcheck 1.89 emits a false positive for lib/dns/spnego_asn1.c:
lib/dns/spnego_asn1.c:698:9: error: Uninitialized variable: data [uninitvar]
memset(data, 0, sizeof(*data));
^
lib/dns/spnego.c:1707:47: note: Calling function 'decode_NegTokenResp', 3rd argument '&resp' value is <Uninit>
ret = decode_NegTokenResp(buf + taglen, len, &resp, NULL);
^
lib/dns/spnego_asn1.c:698:9: note: Uninitialized variable: data
memset(data, 0, sizeof(*data));
^
This message started appearing with cppcheck 1.89 [1], but it will be
gone in the next release [2], so just suppress it for the time being.
[1] af214e8212
[2] 2595b82634
cppcheck 1.89 enabled certain value flow analysis mechanisms [1] which
trigger null pointer dereference false positives in lib/dns/rpz.c:
lib/dns/rpz.c:582:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:582:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:596:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:596:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:610:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:610:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
It seems that cppcheck no longer treats at least some REQUIRE()
assertion failures as fatal, so add extra assertion macro definitions to
lib/isc/include/isc/util.h that are only used when the CPPCHECK
preprocessor macro is defined; these definitions make cppcheck 1.89
behave as expected.
There is an important requirement for these custom definitions to work:
cppcheck must properly treat abort() as a function which does not
return. In order for that to happen, the __GNUC__ macro must be set to
a high enough number (because system include directories are used and
system headers compile attributes away if __GNUC__ is not high enough).
__GNUC__ is thus set to the major version number of the GCC compiler
used, which is what that latter does itself during compilation.
[1] aaeec462e6
Commit afa81ee4e4 omitted some spots in
the source tree which are still referencing the removed --with-cc-alg
"configure" option. Make sure the latter is removed completely.
When a GitLab CI runner is not under load, a single OpenBSD system test
job completes in about 12 minutes, which is considered decent. However,
such jobs are usually multiplexed with other system test jobs on the
same host, which causes each of them to take even 40 minutes to
complete. Taking retries into account, this is completely unacceptable
for everyday use, so only start OpenBSD system test jobs for pipelines
created through GitLab's web interface and for pipelines created for Git
tags.
Since the Windows build job does not use the files created as a result
of running "autoreconf -fi" in the "autoreconf:sid:amd64" job, set its
dependencies to an empty list.
Since it is currently not possible to use "needs: []" for jobs which do
not belong to the first stage of a pipeline, set the "needs" key for the
Windows build job to the "autoreconf:sid:amd64" job so that all build
jobs are started at the same time (without this change, the Windows
build job does not start until all jobs in the "precheck" stage are
finished).
As a side note, these changes also attempt to eliminate intermittent,
bogus GitLab error messages ("There has been a missing dependency
failure").
The intended purpose of the "autoreconf:sid:amd64" GitLab CI job is to
run "autoreconf -fi" and then pass the updated files on to subsequent
non-Windows build jobs. However, the artifacts currently created by
that job only include files which are not tracked by Git. Since we
currently do track e.g. "configure" with Git, the aforementioned job is
essentially a no-op. Fix by manually specifying the files generated by
the "autoreconf:sid:amd64" job that should be passed on to subsequent
build jobs.