The netmgr unit tests were designed to push the system limits to maximum
by sending as many queries as possible in the busy loop from multiple
threads. This mostly works with UDP, but in the stateful protocol where
establishing the connection takes more time, it failed quite often in
the CI. On FreeBSD, this happened more often, because the socket() call
would fail spuriosly making the problem even worse.
This commit does several things to improve reliability:
* return value of isc_nm_<proto>connect() is always checked and retried
when scheduling the connection fails
* The busy while loop has been slowed down with usleep(1000); so the
netmgr threads could schedule the work and get executed.
* The isc_thread_yield() was replaced with usleep(1000); also to allow
the other threads to do any work.
* Instead of waiting on just one variable, we wait for multiple
variables to reach the final value
* We are wrapping the netmgr operations (connects, reads, writes,
accepts) with reference counting and waiting for all the callbacks to
be accounted for.
This has two effects:
a) the isc_nm_t is always clean of active sockets and handles when
destroyed, so it will prevent the spurious INSIST(references == 1)
from isc_nm_destroy()
b) the unit test now ensures that all the callbacks are always called
when they should be called, so any stuck test means that there was
a missing callback call and it is always a real bug
These changes allows us to remove the workaround that would not run
certain tests on systems without port load-balancing.
On 24-core machine, the tests would crash because we would run out of
the hazard pointers. We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.
Note: This is just a band-aid and needs a proper fix.
The isc_mem API now crashes on memory allocation failure, and this is
the next commit in series to cleanup the code that could fail before,
but cannot fail now, e.g. isc_result_t return type has been changed to
void for the isc_log API functions that could only return ISC_R_SUCCESS.
- the netmgr was not correctly being specified when creating the task
manager, and was cleaned up in the wrong order when shutting down.
- on freebsd, timer_test appears to be prone to failure if the
netmgr is set up and torn down before and after ever test case, but
less so if it's only set up once at the beginning and once at the
end.
This commit renames isctest {mctx,lctx} to test_{mctx,lctx} and cleans
up their usage in the individual unit tests. This allows embedding
library .c files directly into the unit tests.
When a task manager is created, we can now specify an `isc_nm`
object to associate with it; thereafter when the task manager is
placed into exclusive mode, the network manager will be paused.
The three functions has been modeled after the arc4random family of
functions, and they will always return random bytes.
The isc_random family of functions internally use these CSPRNG (if available):
1. getrandom() libc call (might be available on Linux and Solaris)
2. SYS_getrandom syscall (might be available on Linux, detected at runtime)
3. arc4random(), arc4random_buf() and arc4random_uniform() (available on BSDs and Mac OS X)
4. crypto library function:
4a. RAND_bytes in case OpenSSL
4b. pkcs_C_GenerateRandom() in case PKCS#11 library
3535. [func] Add support for setting Differentiated Services Code
Point (DSCP) values in named. Most configuration
options which take a "port" option (e.g.,
listen-on, forwarders, also-notify, masters,
notify-source, etc) can now also take a "dscp"
option specifying a code point for use with
outgoing traffic, if supported by the underlying
OS. [RT #27596]