Instead of passing the "workers" variable back and forth along with
passing the single isc_nm_t instance, add isc_nm_getnworkers() function
that returns the number of netmgr threads are running.
Change the ns_interfacemgr and ns_taskmgr to utilize the newly acquired
knowledge.
From an attacker's point of view, a VLA declaration is essentially a
primitive for performing arbitrary arithmetic on the stack pointer. If
the attacker can control the size of a VLA they have a very powerful
tool for causing memory corruption.
To mitigate this kind of attack, and the more general class of stack
clash vulnerabilities, C compilers insert extra code when allocating a
VLA to probe the growing stack one page at a time. If these probes hit
the stack guard page, the program will crash.
From the point of view of a C programmer, there are a few things to
consider about VLAs:
* If it is important to handle allocation failures in a controlled
manner, don't use VLAs. You can use VLAs if it is OK for
unreasonable inputs to cause an uncontrolled crash.
* If the VLA is known to be smaller than some known fixed size,
use a fixed size array and a run-time check to ensure it is large
enough. This will be more efficient than the compiler's stack
probes that need to cope with arbitrary-size VLAs.
* If the VLA might be large, allocate it on the heap. The heap
allocator can allocate multiple pages in one shot, whereas the
stack clash probes work one page at a time.
Most of the existing uses of VLAs in BIND are in test code where they
are benign, but there was one instance in `named`, in the GSS-TSIG
verification code, which has now been removed.
This commit adjusts the style guide and the C compiler flags to allow
VLAs in test code but not elsewhere.
In the GSS-TSIG verification code there was an alarming
variable-length array whose size came off the network, from the
signature in the request. It turned out to be safe, because the caller
had previously checked that the signature had a reasonable size.
However, the safety checks are in the generic TSIG implementation, and
the risky VLA usage was in the GSS-specific code, and they are
separated by the DST indirection layer, so it wasn't immediately
obvious that the risky VLA was in fact safe.
In fact this risky VLA was completely unnecessary, because the GSS
signature can be verified in place without being copied to the stack,
like the message covered by the signature. The `REGION_TO_GBUFFER()`
macro backwardly assigns the region in its left argument to the GSS
buffer in its right argument; this is just a pointer and length
conversion, without copying any data. The `gss_verify_mic()` call uses
both message and signature GSS buffers in a read-only manner.
Rework the "ans8" server in the "digdelv" system test to support various
modes of operations using a control channel.
The supported modes are:
1. `silent` (do not respond)
2. `close` (UDP: same as `silent`; TCP: also close the connection)
3. `servfail` (always respond with `SERVFAIL`)
4. `unstable` (constantly switch between `silent` and `servfail`)
Add multiple tests to check the handling of both TCP and UDP socket
error scenarios in dig/host.
When encountering a TCP connection error while trying to initiate a
connection to a server, dig erroneously cancels the lookup even when
there are other server(s) to try, which results in an assertion failure.
Cancel the lookup only when there are no more queries left in the
lookup's queries list (i.e. `next` is NULL).
Add a test to check whether dig tries the next query/server after
a connection error.
Add a test to check whether dig tries the next query/server after
a one or more (default is 3) connection/request timeouts.
When timing-out or having other types of socket errors during a query,
dig isn't trying to perform the lookup using other servers which exist
in the lookup's queries list.
After configured amount of timeout retries, or after a socket error,
check if there are other queries/servers in the lookup's queries list,
and start the next one if it exists, instead of unconditionally failing.
This test ensures that `dig` retries with another attempt after a
timed-out request, and that it does not crash when the retried
request returns a SERVFAIL result. See [GL #3020] for the latter
issue.
When a query times out, and `dig` (or `host`) creates a new query
to resend the request, it is being prepended to the lookup's queries
list, which can cause a confusion later, making `dig` (or `host`)
believe that there is another new query in the list, but that is
actually the old one, which was timed out. That mistake will result
in an assertion failure.
That can happen, in particular, when after a timed out request,
the retried request returns a SERVFAIL result, and the recursion
is enabled, and `+nofail` option was used with `dig` (that is the
default behavior in `host`, unless the `-s` option is provided).
Fix the problem by inserting the query just after the current,
timed-out query, instead of prepending to the list.
Before calling start_udp() detach `l->current_query`, like it is
done in another place in the function.
Slightly update a couple of debug messages to make them more
consistent.
After a query results in a SERVFAIL result, and there is another
registered query in the lookup's queries list, `dig` starts the next
query to try another server, but for some reason, reports about that
also when the current query is in the head of the list, even if there
is no other query in the list to try.
Use the same condition for both decisions, and after starting the next
query, jump to the "detach_query" label instead of "next_lookup",
because there is no need to start the next lookup after we just started
a query in the current lookup.
When max-transfer-*-out timeouts were reintroduced, the log message
about starting the timer was errorneously left as ISC_LOG_ERROR.
Change the log level of said message to ISC_LOG_DEBUG(1).
The clang-format-15 has new option InsertBraces that could add missing
branches around single line statements. Use that to our advantage
without switching to not-yet-released LLVM version to add missing braces
in couple of places.
As incremental rehashing has been added to isc_ht implementation, we
need to test whether the rehashing works.
Update the isc_ht unit test to test:
* preinitialized hash table large enough to hold all the elements
* smallest hash table that fully grows to hold all the elements
* partially preinitialized hash table that grows
* iterating while rehashing is in progress
Previously, an incremental hash table resizing was implemented for the
dns_rbt_t hash table implementation. Using that as a base, also
implement the incremental hash table resizing also for isc_ht API
hashtables:
1. During the resize, allocate the new hash table, but keep the old
table unchanged.
2. In each lookup, delete, or iterator operation, check both tables.
3. Perform insertion operations only in the new table.
4. At each insertion also move <r> elements from the old table to
the new table.
5. When all elements are removed from the old table, deallocate it.
To ensure that the old table is completely copied over before the new
table itself needs to be enlarged, it is necessary to increase the
size of the table by a factor of at least (<r> + 1)/<r> during resizing.
In our implementation <r> is equal to 1.
The downside of this approach is that the old table and the new table
could stay in memory for longer when there are no new insertions into
the hash table for prolonged periods of time as the incremental
rehashing happens only during the insertions.
The fetch can be in the shutting down state when resume_dslookup() is
trying to operate on it.
This is also a security issue, because a malicious actor can set up a
name server which delays certain queries in such a way that the fetch
will time out and shut down, which will cause named to crash.
Add a check to see if the fetch has the shutting down attribute set,
and cancel any further operations on it in such case.
A similar bug had been fixed earlier for the resume_qmin() function,
see [GL #966].
This is an optimisation as we can skip a lot of pointless work when we
know there is a DNAME there.
When we have a partial match and a DNAME above the QNAME, the closest
encloser has the same owner as the DNAME, will have the DNAME bit set
in the type map, and we wouldn't use it as we would return the
DNAME + RRSIG(DNAME) instead.
So there is no point in looking for it nor in attempting to check that
it is valid for the QNAME.
When sock->closehandle_cb is set, we need to run nmhandle_detach_cb()
asynchronously to ensure correct order of multiple packets processing in
the isc__nm_process_sock_buffer(). When not run asynchronously, it
would cause:
a) out-of-order processing of the return codes from processbuffer();
b) stack growth because the next TCP DNS message read callback will
be called from within the current TCP DNS message read callback.
The sock->closehandle_cb is set to isc__nm_resume_processing() for TCP
sockets which calls isc__nm_process_sock_buffer(). If the read callback
(called from isc__nm_process_sock_buffer()->processbuffer()) doesn't
attach to the nmhandle (f.e. because it wants to drop the processing or
we send the response directly via uv_try_write()), the
isc__nm_resume_processing() (via .closehandle_cb) would call
isc__nm_process_sock_buffer() recursively.
The below shortened code path shows how the stack can grow:
1: ns__client_request(handle, ...);
2: isc_nm_tcpdns_sequential(handle);
3: ns_query_start(client, handle);
4: query_lookup(qctx);
5: query_send(qctcx->client);
6: isc__nmhandle_detach(&client->reqhandle);
7: nmhandle_detach_cb(&handle);
8: sock->closehandle_cb(sock); // isc__nm_resume_processing
9: isc__nm_process_sock_buffer(sock);
10: processbuffer(sock); // isc__nm_tcpdns_processbuffer
11: isc_nmhandle_attach(req->handle, &handle);
12: isc__nm_readcb(sock, req, ISC_R_SUCCESS);
13: isc__nm_async_readcb(NULL, ...);
14: uvreq->cb.recv(...); // ns__client_request
Instead, if 'sock->closehandle_cb' is set, we need to run detach the
handle asynchroniously in 'isc__nmhandle_detach', so that on line 8 in
the code flow above does not start this recursion. This ensures the
correct order when processing multiple packets in the function
'isc__nm_process_sock_buffer()' and prevents the stack growth.
When not run asynchronously, the out-of-order processing leaves the
first TCP socket open until all requests on the stream have been
processed.
If the pipelining is disabled on the TCP via `keep-response-order`
configuration option, named would keep the first socket in lingering
CLOSE_WAIT state when the client sends an incomplete packet and then
closes the connection from the client side.
'setup_delegation' depends on 'foundname' being the value returned
by 'dns_rbt_findnode' in the cache and 'find_coveringnsec' was
modifying 'foundname' when a covering NSEC was not found.
When caching glue, we need to ensure that there is no closer
source of truth for the name. If the owner name for the glue
record would be answered by a locally configured zone, do not
cache.
When caching additional and glue data *not* from a forwarder, we must
check that there is no "forward only" clause covering the owner name
that would take precedence. Such names would normally be allowed by
baliwick rules, but a "forward only" zone introduces a new baliwick
scope.
If we are using a fowarder, in addition to checking that names to
be cached are subdomains of the forwarded namespace, we must also
check that there are no subsidiary forwarded namespaces which would
take precedence. To be safe, we don't cache any responses if the
forwarding configuration has changed since the query was sent.
Commit 4ca74eee49 update the zone grammar
such that the zone statement is printed with the valid options per
zone type.
This commit is a follow-up, putting back the ZONE heading and adding
a note that these zone statements may also be put inside the view
statement.
It is tricky to actually print the zone statements inside
the view statement, and so we decided that we would add a note to say
that this is possible.
Change the isc_interval_t implementation from separate data type and
separate implementation to be shim implementation on top of isc_time_t.
The distinction between isc_interval_t and isc_time_t has been kept
because they are semantically different - isc_interval_t is relative and
isc_time_t is absolute, but this allows isc_time_t and isc_interval_t to
be freely interchangeable, f.e. this:
isc_time_t *t1;
isc_interval_t *interval;
isc_time_t *t2;
isc_interval_set(interval, isc_time_seconds(t2), isc_time_nanoseconds(t2);;
isc_time_subtract(t1, interval, t2);
isc_interval_set(interval, isc_time_seconds(t2), isc_time_nanoseconds(t2));
to just:
isc_time_t *t1;
isc_interval_t *interval;
isc_time_t *t2;
isc_time_subtract(t1, t2, interval);
without introducing a whole set of new functions.