During the loop manager refactoring, isc_nmsocket_set_tlsctx() was not
properly adapted. The function is expected to broadcast the new TLS
context to every worker, but this behaviour was accidentally broken.
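The broadcast requirement can be illustrated with a minimal sketch. It assumes a hypothetical listener that keeps one TLS context pointer per worker (`sketch_listener_t`, `sketch_set_tlsctx()` and `SKETCH_NWORKERS` are invented for illustration). The real netmgr code may also need to hand each update to the worker's own event loop; this sketch ignores threading entirely.

```c
#include <stddef.h>

/* Hypothetical stand-in for a TLS context (OpenSSL's SSL_CTX in practice). */
typedef struct sketch_tlsctx {
	int id;
} sketch_tlsctx_t;

#define SKETCH_NWORKERS 4 /* hypothetical worker count */

/* Hypothetical listener holding one TLS context pointer per worker. */
typedef struct sketch_listener {
	sketch_tlsctx_t *tlsctx[SKETCH_NWORKERS];
} sketch_listener_t;

/*
 * Broadcast: every worker slot receives the new context. Updating only
 * one slot would leave the other workers using the stale context.
 */
static void
sketch_set_tlsctx(sketch_listener_t *listener, sketch_tlsctx_t *ctx) {
	for (size_t i = 0; i < SKETCH_NWORKERS; i++) {
		listener->tlsctx[i] = ctx;
	}
}
```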
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/once.h with
PTHREADS_RUNTIME_CHECK() in order to improve error reporting for any
once-related run-time failures (by augmenting error messages with
file/line/caller information and the error string corresponding to
errno).
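A minimal sketch of this kind of check, using a hypothetical `SKETCH_PTHREADS_CHECK()` macro rather than the real BIND definition. Because pthread functions report failure through their return value, this sketch passes the returned error number to strerror() and prints file, line and caller before aborting.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical macro: evaluate the pthread call once; on a non-zero
 * return, report file/line/caller plus the error string, then abort.
 */
#define SKETCH_PTHREADS_CHECK(func, ret)                                 \
	do {                                                             \
		int sketch_ret_ = (ret);                                 \
		if (sketch_ret_ != 0) {                                  \
			fprintf(stderr, "%s:%d:%s(): %s() failed: %s\n", \
				__FILE__, __LINE__, __func__, #func,     \
				strerror(sketch_ret_));                  \
			abort();                                         \
		}                                                        \
	} while (0)

static pthread_once_t sketch_once_control = PTHREAD_ONCE_INIT;
static int sketch_init_count = 0;

static void
sketch_init(void) {
	sketch_init_count++;
}

/* A once-wrapper that uses the detailed check instead of a bare assertion. */
static void
sketch_once(void) {
	SKETCH_PTHREADS_CHECK(pthread_once,
			      pthread_once(&sketch_once_control, sketch_init));
}
```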
There are multiple reasons to remove this test as obsolete:
- The test cannot have worked for over 2.5 years, since
98b3b93791 removed the rndc.py Python
tool on which this test relies.
- It isn't part of the test suite, either in CI or locally, unless it is
explicitly enabled. As a result, issues introduced by various
refactoring efforts have accumulated over time and now prevent the
test from being executed.
- Even if the test could be executed, it has no clear failure condition:
if the Python script(s) fail, the test still passes.
Now that the artificial limit on the recv buffer has been removed, the
current system test always fails, because it tests whether truncation
has happened.
Add a test that sending more than 10 headers causes the connection to
be closed, and add a test that sending a huge HTTP request causes the
connection to be closed.
Rewrite the isc_httpd to be more robust.
1. Replace the hand-crafted HTTP request parser with picohttpparser for
parsing entire HTTP/1.0 and HTTP/1.1 requests. Limit the number
of allowed headers to 10 (an arbitrary number).
2. Replace the hand-crafted URL parser with isc_url_parse for parsing
the URL from the HTTP request.
3. Increase the receive buffer to match the isc_netmgr buffers, so that
we can receive at least two full isc_nm_read()s. This makes the
truncation processing much simpler.
4. Process the buffer received from a single isc_nm_read() in a single
loop, and schedule the sends independently of each other.
The first two changes make the code simpler and rely on existing
libraries that we either already had (isc_url, based on nodejs) or
that are used elsewhere (picohttpparser).
The last two changes remove the artificial "truncation" limit on
parsing multiple requests. Now only a request that has too many
headers (currently 10) or is too big (i.e. the receive buffer fills up
without reaching the end of the request) will end the connection.
We can be lenient with the limits here, because the statistics
channel is by definition private and access must be allowed only to
administrators of the server. There are no timers, no rate-limiting, no
upper limit on the number of requests that can be served, etc.
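The two closing conditions and the single-pass processing of one receive buffer can be modelled in a short sketch. It deliberately does not use picohttpparser; `sketch_process()` and its helpers are hypothetical and only look for the blank line ending each request and count the header lines before it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_HEADERS 10 /* the arbitrary limit from the text */

/* Offset just past the first "\r\n\r\n" in buf[0..len), or 0 if absent. */
static size_t
sketch_find_end(const char *buf, size_t len) {
	for (size_t i = 0; i + 4 <= len; i++) {
		if (memcmp(buf + i, "\r\n\r\n", 4) == 0) {
			return i + 4;
		}
	}
	return 0;
}

/* Header lines = CRLFs minus the request line and the final empty line. */
static size_t
sketch_count_headers(const char *head, size_t len) {
	size_t crlf = 0;
	for (size_t i = 0; i + 2 <= len; i++) {
		if (head[i] == '\r' && head[i + 1] == '\n') {
			crlf++;
		}
	}
	return crlf >= 2 ? crlf - 2 : 0;
}

/*
 * Serve every complete request in buf in one loop. Returns false when
 * the connection must be closed: too many headers, or the unprocessed
 * data fills the whole buffer without an end-of-request marker.
 */
static bool
sketch_process(const char *buf, size_t len, size_t bufsize, size_t *nreqs,
	       size_t *consumed) {
	size_t off = 0;

	*nreqs = 0;
	for (;;) {
		size_t end = sketch_find_end(buf + off, len - off);
		if (end == 0) {
			break; /* no further complete request */
		}
		if (sketch_count_headers(buf + off, end) > SKETCH_MAX_HEADERS) {
			*consumed = off;
			return false;
		}
		(*nreqs)++; /* a real server would schedule a send here */
		off += end;
	}
	*consumed = off;
	return len - off != bufsize;
}
```

A partial request that still fits in the buffer is kept for the next read; only the two conditions described above end the connection.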
PicoHTTPParser is a tiny, primitive, fast HTTP request/response parser.
Unlike most parsers, it is stateless and does not allocate memory by
itself. All it does is accept a pointer to a buffer and the output
structure, and set up the pointers in the latter to point at the
necessary portions of the buffer.
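The stateless, zero-copy style can be shown on the request line alone. This is a hypothetical sketch (`sketch_parse_reqline()` and `sketch_reqline_t` are invented), not picohttpparser itself, whose entry point phr_parse_request() also fills in the headers. The return convention mimics picohttpparser's: bytes consumed on success, -2 for incomplete input, -1 on a parse error.

```c
#include <stddef.h>
#include <string.h>

/* Output: views into the caller's buffer; nothing is copied or allocated. */
typedef struct sketch_reqline {
	const char *method;
	size_t method_len;
	const char *path;
	size_t path_len;
	int minor_version;
} sketch_reqline_t;

/*
 * Parse "METHOD SP PATH SP HTTP/1.x CRLF". No state is kept between
 * calls: with more data, the caller simply calls again on the larger
 * buffer.
 */
static int
sketch_parse_reqline(const char *buf, size_t len, sketch_reqline_t *out) {
	const char *eol = NULL;
	const char *p = buf;

	for (size_t i = 0; i + 1 < len; i++) {
		if (buf[i] == '\r' && buf[i + 1] == '\n') {
			eol = buf + i;
			break;
		}
	}
	if (eol == NULL) {
		return -2; /* incomplete */
	}

	out->method = p;
	while (p < eol && *p != ' ') {
		p++;
	}
	out->method_len = (size_t)(p - out->method);
	if (p == eol || out->method_len == 0) {
		return -1;
	}
	p++;

	out->path = p;
	while (p < eol && *p != ' ') {
		p++;
	}
	out->path_len = (size_t)(p - out->path);
	if (p == eol || out->path_len == 0) {
		return -1;
	}
	p++;

	if (eol - p != 8 || memcmp(p, "HTTP/1.", 7) != 0 ||
	    (p[7] != '0' && p[7] != '1'))
	{
		return -1;
	}
	out->minor_version = p[7] - '0';
	return (int)(eol + 2 - buf);
}
```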
Works nicely together with:
git config --add blame.ignoreRevsFile .git-blame-ignore-revs
The list was generated by hand-picking from git log --oneline augmented
with:
--author=tbox
--grep=clang-format
--grep=copyright
--grep=reformat
--grep=whitespace
plus
git log --format='commit %H %s' --stat | grep -E 'commit|changed' | grep -B1 '[0-9][0-9][0-9] files changed'
plus some sanity checking.
Comments were added with:
for COMMIT in $(cat .git-blame-ignore-revs)
do git log -1 --format="# %s" "$COMMIT"
echo $COMMIT
done
sizeof(dns_name_t) did not change, but the boolean attributes are now
separate one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, and it
gets rid of the manual bit-manipulation code.
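The difference between the two encodings can be sketched as follows. The flag macros and `struct sketch_name_attrs` are hypothetical names modelled on the DNS_NAMEATTR_ style, not the actual dns_name_t definition.

```c
#include <stdbool.h>

/* Old style (hypothetical): flags in an unsigned int, set and tested with masks. */
#define SKETCH_ATTR_ABSOLUTE 0x0001
#define SKETCH_ATTR_READONLY 0x0002
#define SKETCH_ATTR_DYNAMIC  0x0004

/* New style (hypothetical): one-bit members a debugger prints by name. */
struct sketch_name_attrs {
	bool absolute : 1;
	bool readonly : 1;
	bool dynamic  : 1;
};

/* Convert old-style flags, showing that both encodings carry the same bits. */
static struct sketch_name_attrs
sketch_from_flags(unsigned int flags) {
	struct sketch_name_attrs a = {
		.absolute = (flags & SKETCH_ATTR_ABSOLUTE) != 0,
		.readonly = (flags & SKETCH_ATTR_READONLY) != 0,
		.dynamic = (flags & SKETCH_ATTR_DYNAMIC) != 0,
	};
	return a;
}
```

With bit-fields, setting an attribute is a plain assignment such as `a.readonly = true;` instead of `attrs |= SKETCH_ATTR_READONLY;`.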
Originally, the RBT node stored the three lowest bits of the dns_name_t attributes.
This had a curious side-effect noticed by Tony Finch:
If you create an rbt node from a DYNAMIC name then the flag will be
propagated through dns_rbt_namefromnode() ... if you subsequently call
dns_name_free() it will try to isc_mem_put() a piece of an rbt node ...
but dns_name_free() REQUIRE()s that the name is dynamic so in the usual
case where rbt nodes are created from non-dynamic names, this kind of
code will fail an assertion.
This bug dates back to June 1999, when NAMEATTR_DYNAMIC was
invented.
Apparently it does not happen often :-)
I'm planning to get rid of the DNS_NAMEATTR_ definitions and bit
operations, so removing this "three-bit-subset" assignment is a first step.
We can keep only the ABSOLUTE flag in the RBT node and nothing else, because
names attached to RBT nodes are always read-only: the internal node_name()
function always sets NAMEATTR_READONLY when making a dns_name that
refers to the node's name, so the READONLY flag will be set in the name
returned by dns_rbt_namefromnode().
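Why storing only ABSOLUTE is enough can be shown with a small sketch. `sketch_name_t`, `sketch_node_t`, `sketch_node_store()` and `sketch_node_name()` are hypothetical simplifications of dns_name_t, the RBT node and node_name(). Whatever flags the original name had, a name rebuilt from the node is read-only and never dynamic, so dns_name_free() on it can no longer be reached through a stale DYNAMIC flag.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified name. */
typedef struct sketch_name {
	const unsigned char *ndata;
	bool absolute : 1;
	bool readonly : 1;
	bool dynamic  : 1;
} sketch_name_t;

/* Hypothetical RBT node: remembers only whether its name is absolute. */
typedef struct sketch_node {
	const unsigned char *ndata;
	bool absolute : 1;
} sketch_node_t;

/* Storing a name in a node keeps the ABSOLUTE flag and nothing else. */
static void
sketch_node_store(sketch_node_t *node, const sketch_name_t *name) {
	node->ndata = name->ndata;
	node->absolute = name->absolute;
}

/* The returned name points into node memory: always READONLY, never DYNAMIC. */
static sketch_name_t
sketch_node_name(const sketch_node_t *node) {
	sketch_name_t name = {
		.ndata = node->ndata,
		.absolute = node->absolute,
		.readonly = true,
		.dynamic = false,
	};
	return name;
}
```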
Co-authored-by: Tony Finch <fanf@isc.org>
doth system test fixes: decrease the size of the HTTP listener quota, increase transfers-in/out limits
Closes #3596
See merge request isc-projects/bind9!6898
Sometimes the doth test could fail intermittently shortly after start
because a zone transfer could not be completed in time. As it turned
out, this could happen because of the transfers-in/out limits. Initially
the defaults were fine, but over time, especially when adding Strict/Mutual
TLS, we added more than 10 zones, so it became possible to hit the limits.
This commit takes care of that by bumping the limits.
This commit reduces the size of the HTTP listener quota from 300 (the
default) to 100, which makes hitting any global limits unlikely when
multiple tests run in parallel in multiple containers.
This significantly reduces the need to open many file descriptors of
different kinds (e.g. client-side connections and pipes), while the
required code paths are still verified.
In ac4cc8443d, handling of ISC_R_CONNREFUSED was
removed from connect_read_cb, but this result can actually occur in the udp_test:
[ RUN ] udp_recv_send
connect_read_cb(0x7f2c2801a270, connection refused, (nil))
ISC_R_SHUTTINGDOWN should be handled the same way as ISC_R_CANCELED in
udp__send_cb(), as we might be sending data after the
loopmgr/netmgr shutdown has been initiated.
In rare circumstances, the UDP port for the listening socket and the UDP
port for the connecting socket might be the same. Because we use the
"reuse" port socket option, this isn't caught when binding the socket,
and thus the connected client socket could send a datagram to itself,
completely bypassing the server. This doesn't happen during normal
operation because `named` listens on a privileged port (53), and even
if it did not, it doesn't usually talk to itself the way the tests do.
Pick an arbitrary port for listening (9153-9156) that is outside the
ephemeral port range for the network-manager-related unit tests (except
the `doh_test`).
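The port choice can be checked with a trivial sketch. The range used here is an assumption: the Linux default for net.ipv4.ip_local_port_range (32768-60999). Other systems use different defaults (RFC 6335 suggests 49152-65535), and the range is tunable. `sketch_port_outside_ephemeral()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ephemeral range: Linux default net.ipv4.ip_local_port_range. */
#define SKETCH_EPHEMERAL_LO 32768
#define SKETCH_EPHEMERAL_HI 60999

/*
 * A listening port outside the ephemeral range cannot be picked by the
 * kernel for an unbound client socket, so a client cannot end up
 * sharing the listener's port and sending datagrams to itself.
 */
static bool
sketch_port_outside_ephemeral(uint16_t port) {
	return port < SKETCH_EPHEMERAL_LO || port > SKETCH_EPHEMERAL_HI;
}
```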
isc_nm_udpconnect() erroneously set the load-balancing reuse port
option on outgoing connected UDP sockets. This socket option only
makes sense for listening sockets. Don't set the
load-balancing reuse port option on the outgoing UDP sockets.
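A minimal POSIX sketch of the rule, assuming Linux, where SO_REUSEPORT also load-balances datagrams between sockets bound to the same address (FreeBSD uses SO_REUSEPORT_LB for the load-balancing variant). `sketch_udp_socket()` and `sketch_reuseport_enabled()` are hypothetical helpers, not netmgr functions.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket; request port reuse only if it will be a listener. */
static int
sketch_udp_socket(bool listening) {
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		return -1;
	}
#ifdef SO_REUSEPORT
	if (listening) {
		int on = 1;
		if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on,
			       sizeof(on)) != 0)
		{
			close(fd);
			return -1;
		}
	}
#else
	(void)listening;
#endif
	return fd;
}

/* Report whether SO_REUSEPORT is set on fd (0 if unsupported or on error). */
static int
sketch_reuseport_enabled(int fd) {
#ifdef SO_REUSEPORT
	int val = 0;
	socklen_t len = sizeof(val);
	if (getsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, &len) != 0) {
		return 0;
	}
	return val != 0;
#else
	(void)fd;
	return 0;
#endif
}
```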
Since we are testing UDP on localhost over a single interface, the
UDP datagrams can't get lost. Change the connect read callback so that it
starts reading again on timeout instead of getting stuck, and so that it
fails when any result code other than ISC_R_SUCCESS or ISC_R_TIMEDOUT
is received, because we don't expect any other results in these simple
tests.
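The callback policy can be sketched as a small decision function. The result and action enums are hypothetical stand-ins for the ISC_R_* codes, and `sketch_drive()` simply replays a sequence of read results.

```c
/* Hypothetical subset of result codes, named after the ISC_R_* values. */
typedef enum {
	SKETCH_R_SUCCESS,
	SKETCH_R_TIMEDOUT,
	SKETCH_R_CONNREFUSED,
} sketch_result_t;

typedef enum {
	SKETCH_DONE,       /* datagram received */
	SKETCH_READ_AGAIN, /* timed out: restart the read */
	SKETCH_FAIL,       /* unexpected result: fail the test */
} sketch_action_t;

static sketch_action_t
sketch_connect_read_action(sketch_result_t result) {
	switch (result) {
	case SKETCH_R_SUCCESS:
		return SKETCH_DONE;
	case SKETCH_R_TIMEDOUT:
		/* Datagrams are not lost on localhost, so just retry. */
		return SKETCH_READ_AGAIN;
	default:
		return SKETCH_FAIL;
	}
}

/* Replay results; return the number of reads issued, or -1 on failure. */
static int
sketch_drive(const sketch_result_t *results, int n) {
	for (int i = 0; i < n; i++) {
		switch (sketch_connect_read_action(results[i])) {
		case SKETCH_DONE:
			return i + 1;
		case SKETCH_READ_AGAIN:
			continue;
		case SKETCH_FAIL:
			return -1;
		}
	}
	return -1; /* never completed */
}
```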
This commit removes broken remnants of unit test slowdown logic, which
caused unit test hangs on platforms susceptible to "too many open
files" error, notably OpenBSD.
This commit fixes TLS DNS verification error message reporting, which
we probably broke during one of the recent networking code
refactorings.
The breakage prevented e.g. dig from producing useful error messages
related to TLS certificate verification.
Ensure that the TLS error queue is empty before calling SSL_get_error()
or doing SSL I/O, so that the result is not affected by prior error
statuses.
In particular, the improper error handling led to intermittent unit
test failures and thus could be responsible for some of the system
test failures and other intermittent TLS-related issues.
See here for more details:
https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html
In particular, it mentions the following:
> The current thread's error queue must be empty before the TLS/SSL
> I/O operation is attempted, or SSL_get_error() will not work
> reliably.
As we use the result of SSL_get_error() to decide on I/O operations,
we need to ensure that it works reliably by clearing the error queue.
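The failure mode can be shown without linking OpenSSL, using a toy model. With the real library, the fix amounts to calling ERR_clear_error() before SSL_read()/SSL_write()/SSL_do_handshake() and then consulting SSL_get_error(). Here `sketch_get_error()` roughly mirrors the documented rule that a non-empty error queue takes priority over the I/O return value; all names are hypothetical and the "queue" is a plain array.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
	SKETCH_ERR_NONE,
	SKETCH_ERR_WANT_READ,
	SKETCH_ERR_SSL,
} sketch_err_t;

/* Toy per-thread error queue. */
static unsigned long sketch_queue[8];
static size_t sketch_queue_len = 0;

static void
sketch_push_error(unsigned long code) {
	if (sketch_queue_len < 8) {
		sketch_queue[sketch_queue_len++] = code;
	}
}

/* Plays the role of ERR_clear_error(). */
static void
sketch_clear_error(void) {
	sketch_queue_len = 0;
}

/* Plays the role of SSL_get_error(): queued errors win over io_ret. */
static sketch_err_t
sketch_get_error(int io_ret) {
	if (io_ret > 0) {
		return SKETCH_ERR_NONE;
	}
	if (sketch_queue_len > 0) {
		return SKETCH_ERR_SSL;
	}
	return SKETCH_ERR_WANT_READ; /* modelled as "would block" */
}

/* A non-blocking read that would block: queues nothing and returns -1. */
static sketch_err_t
sketch_read_would_block(bool clear_first) {
	if (clear_first) {
		sketch_clear_error();
	}
	return sketch_get_error(-1);
}
```

A stale error left by an unrelated earlier call makes a harmless "would block" look like a fatal SSL error unless the queue is cleared first.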
TLS DNS: empty error queue before attempting I/O
There was inconsistency in which error codes would be accepted and
ignored in the network manager unit test callbacks. Add the following
results, so that we just detach the handle instead of causing an assertion
failure:
* ISC_R_SHUTTINGDOWN - when the network manager is shutting down
* ISC_R_CANCELED - the socket has been shut down
* ISC_R_EOF - the (TCP) communication has ended on the other side
* ISC_R_CONNECTIONRESET - the TCP connection was reset
This should fix some of the spurious unit test failures.
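The list above can be expressed as a single classification function. The enum values are hypothetical stand-ins named after the ISC_R_* results, and `sketch_should_just_detach()` is an invented helper.

```c
#include <stdbool.h>

/* Hypothetical result codes named after the ISC_R_* values. */
typedef enum {
	SKETCH_R_SUCCESS,
	SKETCH_R_SHUTTINGDOWN,
	SKETCH_R_CANCELED,
	SKETCH_R_EOF,
	SKETCH_R_CONNECTIONRESET,
	SKETCH_R_UNEXPECTED,
} sketch_result_t;

/* Results for which a test callback just detaches the handle. */
static bool
sketch_should_just_detach(sketch_result_t result) {
	switch (result) {
	case SKETCH_R_SHUTTINGDOWN:    /* network manager shutting down */
	case SKETCH_R_CANCELED:        /* socket has been shut down */
	case SKETCH_R_EOF:             /* peer ended the TCP stream */
	case SKETCH_R_CONNECTIONRESET: /* TCP connection was reset */
		return true;
	default:
		return false;
	}
}
```

Keeping the list in one place avoids the per-callback inconsistency described above.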
The check is a leftover from when tcp_connect_direct() called
isc__nm_socket() and it was uncertain whether that call had succeeded;
now isc__nm_socket() is called before tcp_connect_direct(), so sock->fd
cannot be -1.
*** CID 357292: (REVERSE_NEGATIVE)
/lib/isc/netmgr/tcp.c: 309 in isc_nm_tcpconnect()
303
304 atomic_store(&sock->active, true);
305
306 result = tcp_connect_direct(sock, req);
307 if (result != ISC_R_SUCCESS) {
308 atomic_store(&sock->active, false);
>>> CID 357292: (REVERSE_NEGATIVE)
>>> You might be using variable "sock->fd" before verifying that it is >= 0.
309 if (sock->fd != (uv_os_sock_t)(-1)) {
310 isc__nm_tcp_close(sock);
311 }
312 isc__nm_connectcb(sock, req, result, true);
313 }
314
If sending took too long, isc_nm_read() could time out twice, leading
to an extra increment of the 'cread' counter in the udp_cancel_read test.
Increment the cread counter only on ISC_R_EOF (canceled read) and handle
multiple ISC_R_TIMEDOUT results gracefully.
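The counting rule can be sketched with hypothetical names (`sketch_counters_t`, `sketch_read_cb()`); only the canceled read increments `cread`, however many timeouts precede it.

```c
/* Hypothetical result codes named after the ISC_R_* values. */
typedef enum {
	SKETCH_R_SUCCESS,
	SKETCH_R_TIMEDOUT,
	SKETCH_R_EOF,
} sketch_result_t;

typedef struct {
	unsigned int cread;    /* canceled reads, counted on EOF only */
	unsigned int timeouts; /* any number is tolerated */
} sketch_counters_t;

static void
sketch_read_cb(sketch_counters_t *c, sketch_result_t result) {
	switch (result) {
	case SKETCH_R_EOF:
		c->cread++;
		break;
	case SKETCH_R_TIMEDOUT:
		c->timeouts++;
		break;
	default:
		break;
	}
}
```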
The bin/tests/system/start.pl script waits until a "running" message is
logged by a given name server instance before attempting to send a
version.bind/CH/TXT query to it. The idea behind this was to make the
script wait until named loads all the zones it is configured to serve
before telling the system test framework that a given server is ready to
use; this prevents the need to add boilerplate code that waits for a
specific zone to be loaded to each test expecting that.
The problem is that when it looks for "running" messages, the
bin/tests/system/start.pl script assumes that the existence of any such
message in the named.run file indicates that a given named instance has
already finished loading all zones. Meanwhile, some system tests
restart all the named instances they use throughout their lifetime (some
even do that a few times), for example to run Python-based tests. The
bin/tests/system/start.pl script handles such a scenario incorrectly: as
soon as it finds any "running" message in the named.run file it inspects
and it gets a response to a version.bind/CH/TXT query, it tells the
system test framework that a given server is ready to use, which might
not be true: it is possible that only the "version.bind" zone is loaded
at that point and that the "running" message found was logged by a
previously shut down named instance. This triggers intermittent failures
for Python-based tests.
Fix by improving the logic that the bin/tests/system/start.pl script
uses to detect server startup: check how many "running" lines are
present in a given named.run file before attempting to start a named
instance and only proceed with version.bind/CH/TXT queries when the
number of "running" lines found in that named.run file increases after
the server is started.
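The detection rule is easy to model. start.pl itself is a Perl script; this C sketch only illustrates the counting logic, and `sketch_count_running()` and `sketch_server_ready()` are hypothetical helpers that scan an in-memory copy of named.run.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count lines of a log buffer that contain "running". */
static size_t
sketch_count_running(const char *log) {
	size_t count = 0;
	const char *line = log;

	while (*line != '\0') {
		const char *nl = strchr(line, '\n');
		size_t len = (nl != NULL) ? (size_t)(nl - line) : strlen(line);
		for (size_t i = 0; i + 7 <= len; i++) {
			if (memcmp(line + i, "running", 7) == 0) {
				count++;
				break;
			}
		}
		if (nl == NULL) {
			break;
		}
		line = nl + 1;
	}
	return count;
}

/* Ready only when a new "running" line appeared after the (re)start. */
static bool
sketch_server_ready(size_t baseline, const char *log_now) {
	return sketch_count_running(log_now) > baseline;
}
```

The baseline is taken before starting the instance, so a "running" line left by a previous instance no longer counts.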
In the "rrsetorder" system test, the ns2 named instance is restarted
without passing the --restart option to bin/tests/system/start.pl. This
causes the log file for that named instance to be needlessly truncated.
Prevent this from happening by restarting the affected named instance
in the same way as all the other named instances used in system tests.