mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-29 13:38:26 +00:00

14012 Commits

Author SHA1 Message Date
Artem Boldariev
eb37d967c2 Add TLS context cache
This commit adds a TLS context object cache implementation. The
intention of having this object is manifold:

- In the case of client-side contexts: allow reusing the previously
created contexts to employ the context-specific TLS session resumption
cache. That will enable XoT connections to be reestablished faster and
with fewer resources by not going through the full TLS handshake
procedure.

- In the case of server-side contexts: reduce the number of contexts
created on startup. That could reduce startup time when
there are many "listen-on" statements referring to a smaller number of
`tls` statements, especially when "ephemeral" certificates are
involved.

- The long-term goal is to provide in-memory storage for additional
data associated with the certificates, like runtime
representation (X509_STORE) of intermediate CA-certificates bundle for
Strict TLS/Mutual TLS ("ca-file").
2021-12-29 10:25:11 +02:00
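The caching idea in the commit above can be illustrated with a minimal sketch, assuming a small fixed-size table keyed by the name of the `tls` statement. The names `ctx_cache_t`, `ctx_cache_find()` and `ctx_cache_add()` are hypothetical, and an opaque `void *` stands in for the `SSL_CTX` pointer; this is not the API the commit introduces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 16 /* hypothetical fixed capacity */

typedef struct {
    const char *name; /* name of the `tls` statement (cache key) */
    void *ctx;        /* stands in for an SSL_CTX pointer */
} ctx_entry_t;

typedef struct {
    ctx_entry_t slots[CACHE_SLOTS];
    size_t used;
} ctx_cache_t;

/* Return the cached context for `name`, or NULL if none is stored. */
void *
ctx_cache_find(const ctx_cache_t *cache, const char *name) {
    assert(cache != NULL && name != NULL);
    for (size_t i = 0; i < cache->used; i++) {
        if (strcmp(cache->slots[i].name, name) == 0) {
            return cache->slots[i].ctx;
        }
    }
    return NULL;
}

/*
 * Store `ctx` under `name` unless an entry already exists.  Returns the
 * context that ends up associated with `name`, or NULL if the table is
 * full.
 */
void *
ctx_cache_add(ctx_cache_t *cache, const char *name, void *ctx) {
    void *found = ctx_cache_find(cache, name);
    if (found != NULL) {
        return found; /* reuse the previously created context */
    }
    if (cache->used == CACHE_SLOTS) {
        return NULL;
    }
    cache->slots[cache->used].name = name;
    cache->slots[cache->used].ctx = ctx;
    cache->used++;
    return ctx;
}
```

With a table like this, several "listen-on" statements that name the same `tls` statement would share one context, since the second `ctx_cache_add()` call returns the first context instead of storing a new one.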
Michał Kępień
ea89ab80ae Fix error codes passed to connection callbacks
Commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c erroneously introduced
duplicate conditions to several existing conditional statements
responsible for determining error codes passed to connection callbacks
upon failure.  Fix the affected expressions to ensure connection
callbacks are invoked with:

  - the ISC_R_SHUTTINGDOWN error code when a global netmgr shutdown is
    in progress,

  - the ISC_R_CANCELED error code when a specific operation has been
    canceled.

This does not fix any known bugs, it only adjusts the changes introduced
by commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c so that they match
its original intent.
2021-12-28 15:09:50 +01:00
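The intended mapping described in the commit above can be sketched as follows. The enum values are hypothetical stand-ins for the libisc result codes, and letting a global shutdown take precedence over a cancellation is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the libisc result codes named above. */
typedef enum {
    R_FAILURE,
    R_SHUTTINGDOWN,
    R_CANCELED
} result_t;

/*
 * Pick the error code for a failed connection callback.  `other` is
 * returned when neither special condition applies.  Checking the
 * shutdown condition first is an assumption of this sketch.
 */
result_t
connect_failure_code(bool mgr_closing, bool op_canceled, result_t other) {
    if (mgr_closing) {
        return R_SHUTTINGDOWN;
    }
    if (op_canceled) {
        return R_CANCELED;
    }
    return other;
}
```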
Mark Andrews
dc8595936c remove broken-nsec and reject-000-label options 2021-12-23 15:13:46 +11:00
Michał Kępień
7983d5fa7c Check for SSL_CTX_set_keylog_callback() support
The SSL_CTX_set_keylog_callback() function is a fairly recent OpenSSL
addition, having first appeared in version 1.1.1.  Add a configure.ac
check for the availability of that function to prevent build errors on
older platforms.  Sort similar checks alphabetically.

This makes the SSLKEYLOGFILE mechanism a silent no-op on unsupported
platforms, which is considered acceptable for a debugging feature.
2021-12-22 18:17:26 +01:00
Michał Kępień
060fed3097 Log TLS pre-master secrets when requested
Generate log messages containing TLS pre-master secrets when the
SSLKEYLOGFILE environment variable is set.  This only ensures such
messages are prepared using the right logging category and passed to
libisc for further processing.

The TLS pre-master secret logging callback needs to be set on a
per-context basis, so ensure it happens for both client-side and
server-side TLS contexts.
2021-12-22 18:17:26 +01:00
Michał Kępień
3081bda798 Add a logging category for TLS pre-master secrets
TLS pre-master secrets will be dumped to disk using the logging
framework provided by libisc.  Add a new logging category for this type
of debugging data in order to enable exporting it to a dedicated
channel.  Derive the name of the new category from the name of the
relevant environment variable, SSLKEYLOGFILE.
2021-12-22 18:17:26 +01:00
Aram Sargsyan
5d87725fdc Use ECDSA P-256 instead of 4096-bit RSA for 'tls ephemeral'
ECDSA P-256 performs considerably better than the previously used
4096-bit RSA (can be observed using `openssl speed`), and, according
to RFC 6605, provides a security level comparable to 3072-bit RSA.
2021-12-20 10:09:05 +00:00
Ondřej Surý
ee1f8b60c5 Simplify Address Sanitizer tweaks in mem.c
Previously, the whole of isc_mempool_get() and isc_mempool_put() would be
replaced by simpler versions when run with Address Sanitizer.

Change the code to limit the fillcount to 1 and freemax to 0.  This
change makes isc_mempool_get() always allocate and use a single
new item, and isc_mempool_put() always return the item to the
allocator.
2021-12-17 14:43:05 +01:00
Mark Andrews
a23507c4fa Pass the digest buffer length to EVP_DigestSignFinal
OpenSSL 3.0.1 does not accept 0 as a digest buffer length when
calling EVP_DigestSignFinal as it now checks that the digest buffer
length is large enough for the digest.  Pass the digest buffer
length instead.
2021-12-17 20:28:01 +11:00
Ondřej Surý
72cc25465f Reduce freemax values for dns_message mempools
It was discovered that NAME_FREEMAX and RDATASET_FREEMAX were based on
NAME_FILLCOUNT and RDATASET_FILLCOUNT respectively, multiplied by 8,
and then, when used in isc_mempool_setfreemax, the value would again be
multiplied by 32.

Keep the 8 multiplier in the #define and remove the 32 multiplier as it
was kept in error.  The default fillcount can fit 99.99% of the requests
under normal circumstances, so we don't need to keep that many free
items on the mempool.
2021-12-15 21:25:00 +01:00
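The double multiplication described in the commit above can be shown with a short arithmetic sketch. The fill count of 4 is a hypothetical value chosen for illustration; the real NAME_FILLCOUNT and RDATASET_FILLCOUNT values are not given in the commit message.

```c
#include <assert.h>

/* Hypothetical fill count, for illustration only. */
#define EXAMPLE_FILLCOUNT 4
/* The 8 multiplier that the commit keeps in the #define. */
#define EXAMPLE_FREEMAX (8 * EXAMPLE_FILLCOUNT)

/* Old behaviour: the already-multiplied value was multiplied by 32 again. */
unsigned int
freemax_before(void) {
    return EXAMPLE_FREEMAX * 32;
}

/* New behaviour: the #define is passed through unchanged. */
unsigned int
freemax_after(void) {
    return EXAMPLE_FREEMAX;
}
```

Under this assumed fill count, the limit on free items kept in the pool drops from 1024 to 32.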
Evan Hunt
df2ddc9e7e remove ns_interface reference counting
reference counting of ns_interface objects has not been used
since the clientmgr cleanup in #2433, and it no longer really
makes sense now - when we want to destroy an interface on a
rescan, we want it to be destroyed, not kept active by some
other caller. so ns_interface_attach() has been removed,
ns_interface_detach() has been replaced with a static
interface_destroy(), and do_scan() has been simplified
accordingly.
2021-12-15 09:46:06 -08:00
Evan Hunt
6df5cf1ee6 keep track of non-listening interfaces
previously, if "listen-on-v6" was set to "none", then every
time a scan saw an IPv6 address it would appear to be a new
one.  this commit retains all known interfaces in a list
and sets a flag in the ones that are listening, so that
configured interfaces that have been seen before will be
recognized as such.

as an incidental fix, the ns__interfacemgr_getif() and _nextif()
functions have been removed since they were never used.
2021-12-15 09:46:06 -08:00
Artem Boldariev
fb4e1ed5b2 Examine RTM_NEWADDR, RTM_DELADDR messages contents
This commit modifies the NetLink handling code so that the contents
of the messages we are interested in are checked for local address
changes only. This helps to avoid spurious interface re-scans.

The 'route_recv' log messages are also reduced from DEBUG(3) to
DEBUG(9).
2021-12-15 09:46:06 -08:00
Ondřej Surý
ce75d4a96b Set the clientmgr isc_mem_t context name
The memory context created in the clientmgr context was missing a name,
so it was nameless in the memory context statistics.

Set the clientmgr memory context name to "clientmgr".
2021-12-14 19:15:58 +00:00
Michal Nowak
9c013f37d0
Drop cppcheck workarounds
As cppcheck was removed from the CI, associated workarounds and
suppressions are not required anymore.
2021-12-14 15:03:56 +01:00
Aram Sargsyan
f595a75cd6 Recreate HTTPS and TLS interfaces only during reconfiguration
The 850e9e59bf8c29f895a981211c72c0b3c294bcfd commit intended to recreate
the HTTPS and TLS interfaces during reconfiguration, but they are being
recreated also during regular interface re-scans.

Make sure the HTTPS and TLS interfaces are being recreated only during
reconfiguration.
2021-12-14 09:28:01 +00:00
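The decision described in the two commits around this point (reuse an existing interface unless it carries a TLS context and a reconfiguration is in progress) can be sketched as a single predicate. The function name and its boolean parameters are hypothetical; the real interface manager code tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if a matching existing interface may be reused during a
 * scan.  Interfaces with a TLS context are recreated only when the scan
 * is part of a reconfiguration; plain re-scans leave them alone.
 */
bool
reuse_interface(bool has_tls_ctx, bool reconfiguring) {
    return !(has_tls_ctx && reconfiguring);
}
```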
Aram Sargsyan
850e9e59bf Recreate TLS interfaces during reconfiguration
For DoH and DoT listeners, a reconfiguration event triggers a creation
of a new 'SSL_CTX' TLS context, and a destruction of the old one.

The network manager, though, keeps using the old context which causes
errors.

During interface scanning, when a matching existing interface is found,
reuse it only when it doesn't have a TLS context, otherwise shut it down
and recreate with a new TLS context.
2021-12-13 10:19:57 +00:00
Petr Menšík
929bbe192d Improve error message when directory name is given
A surprising "IO error" is returned when a directory name
is given instead of a named.conf file. The name can be passed to
named-checkconf or to an include statement. Make a simple change to return
"Invalid file" instead. It is still not precise, but it is a much better error message.

Fix of rhbz#490837.
2021-12-10 10:50:21 +01:00
Michał Kępień
eb4713c8e5 Remove mutex debugging code
Mutex debugging code (used when the ISC_MUTEX_DEBUG preprocessor macro
is set to 1 and PTHREAD_MUTEX_ERRORCHECK is defined) has been broken for
the past 3 years (since commit 2f3eee5a4fdad6606135116c70875b3180c7ed83)
and nobody complained, which is a strong indication that this code is
not being used these days any more.  External tools for detecting
locking issues are already wired into various GitLab CI checks.  Drop
all code depending on the ISC_MUTEX_DEBUG preprocessor macro being set.
2021-12-09 14:02:36 +01:00
Michał Kępień
0964a94ad5 Remove mutex profiling code
Mutex profiling code (used when the ISC_MUTEX_PROFILE preprocessor macro
is set to 1) has been broken for the past 3 years (since commit
0bed9bfc28a204cde57c6f68170ecc89ebfa6dc8) and nobody complained, which
is a strong indication that this code is not being used these days any
more.  External tools for both measuring performance and detecting
locking issues are already wired into various GitLab CI checks.  Drop
all code depending on the ISC_MUTEX_PROFILE preprocessor macro being
set.
2021-12-09 12:25:21 +01:00
Evan Hunt
157d7bd0e9 incidental cleanups
the 'dispatchmgr->state' was never set, so the MGR_IS_SHUTTINGDOWN
macro was always false. both of these have been removed.

renamed the 'dispatch->state' field to 'tcpstate' to make its purpose
less ambiguous.

changed an FCTXTRACE log message from "response did not match question"
to the more correctly descriptive "invalid question section".
2021-12-08 10:22:03 -08:00
Evan Hunt
5f82fc11a9 prevent a shutdown hang on non-matching TCP responses
When a non-matching DNS response is received by the resolver,
it calls dns_dispatch_getnext() to resume reading. This is necessary
for UDP but not for TCP, because TCP connections automatically
resume reading after any valid DNS response.

This commit adds a 'tcpreading' flag to TCP dispatches, so that
`dispatch_getnext()` can be called multiple times without subsequent
calls having any effect.
2021-12-08 10:22:03 -08:00
Ondřej Surý
57d0fabadd Stop leaking mutex in nmworker and cond in nm socket
On FreeBSD, the pthread primitives are not solely allocated on stack,
but part of the object lives on the heap.  Missing pthread_*_destroy
causes the heap memory to grow, and in the case of short-lived objects it's
possible to run out of memory.

Properly destroy the leaking mutex (worker->lock) and
the leaking condition (sock->cond).
2021-12-08 17:58:53 +01:00
Ondřej Surý
c6f3e12fe7 Reduce the number of hazard pointers
Previously, we set the number of the hazard pointers to be 4 times the
number of workers because the dispatch ran on the old socket code.
Since the old socket code was removed there's a smaller number of
threads, namely:

 - 1 main thread
 - 1 timer thread
 - <n> netmgr threads
 - <n> threadpool threads

Set the number of hazard pointers to 2 + 2 * workers.
2021-12-07 21:12:53 +01:00
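The sizing change in the commit above amounts to the following arithmetic. The two function names are hypothetical; the formulas are the ones stated in the commit message.

```c
#include <assert.h>

/* Old sizing: four times the number of workers. */
unsigned int
hp_count_old(unsigned int workers) {
    return 4 * workers;
}

/*
 * New sizing: 1 main thread + 1 timer thread + <n> netmgr threads +
 * <n> threadpool threads.
 */
unsigned int
hp_count_new(unsigned int workers) {
    return 2 + 2 * workers;
}
```

For 8 workers, for example, the old formula gives 32 and the new one gives 18.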
Ondřej Surý
15ce1737fa Fix the isc_hp initialization and memory usage
Previously, isc_hp_init() could not lower the value of
isc__hp_max_threads, but because of a mistake isc__hp_max_threads
would be initialized to HP_MAX_THREADS (e.g. 128 threads), so it would
always stay at 128.  This would result in increased memory usage even
when a small number of workers was in use.

Change the default value of isc__hp_max_threads to be 1.

Additionally, enforce the max_hps value in isc_hp_new() to be smaller than or
equal to HP_MAX_HPS.  The only user is isc_queue, which uses just 1
hazard pointer, so it's only a theoretical issue.
2021-12-07 20:41:46 +01:00
Petr Špaček
74d83910d5
Mark broken-nsec option as deprecated
It's unclear if we are going to keep it or not, so let's mark it as
deprecated for good measure. It's easier to un-deprecate it than the
other way around.
2021-12-06 16:55:55 +01:00
Evan Hunt
4d4cea243a restore the fetch lifetime timer
the lifetime expiry timer for the fetch context was removed
when we switched to using in-band netmgr timeouts. however,
it turns out some dependency loops can occur between a fetch
and the ADB or the validator; these deadlocks were formerly broken
when the timer fired, and now there's no timer. we can fix these
errors individually, but in the meantime we don't want the server
to get hung at shutdown because of dangling fetches.

this commit puts back a single timer, which fires two seconds
after the fetch should have completed, and shuts it down. it also
logs a message at level INFO so we know about the problems when
they occur.
2021-12-03 09:49:24 +01:00
Mark Andrews
0aaaa8768f
Reject NSEC records with next field with \000 label
A number of DNS implementations produce NSEC records with bad type
maps that don't contain types that exist at the name, leading to
NODATA responses being synthesized instead of the records in the
zone being returned.  NSEC records with these bad type maps often have
the NSEC NEXT field set to '\000.QNAME'.  We look for the first label of
this pattern.

e.g.
	example.com NSEC \000.example.com SOA NS NSEC RRSIG
	example.com RRSIG NSEC ...
	example.com SOA ...
	example.com RRSIG SOA ...
	example.com NS ...
	example.com RRSIG NS ...
	example.com A ...
	example.com RRSIG A ...

	A is missing from the type map.

This introduces a temporary option 'reject-000-label' to control
this behaviour.
2021-12-02 14:27:18 +01:00
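The first-label check described in the commit above can be sketched on an uncompressed wire-format name, where a '\000' label appears as a length octet of 1 followed by a single zero octet. The function name is hypothetical, and this sketch works on raw bytes rather than on BIND's dns_name_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the first label of an uncompressed wire-format DNS
 * name is the single octet 0x00, i.e. the name starts with the bytes
 * 0x01 0x00 (label length 1, then the label data).
 */
bool
first_label_is_000(const unsigned char *name, size_t len) {
    assert(name != NULL);
    return len >= 2 && name[0] == 1 && name[1] == 0;
}
```

Applied to the NEXT field of the example record, `\000.example.com` matches while `example.com` and the root name do not.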
Mark Andrews
3faccb16cc
Add server christmas tree test
This sets as many server options as possible at once to detect
cut-and-paste bugs when implementing new server options in peer.c.
Most of the accessor functions are similar and it is easy to miss
updating a macro name or structure element name when adding new
accessor functions.

checkconf/setup.sh is there to minimise the difference to branches
with optional server options where the list is updated at runtime.
2021-12-02 14:27:18 +01:00
Mark Andrews
733f58a7a5
Allow servers that emit broken NSEC records to be identified
'server <prefix> { broken-nsec yes; };' can now be used to stop
NSEC records from negative responses from servers in the given
prefix being cached and hence available to synth-from-dnssec.
2021-12-02 14:27:14 +01:00
Mark Andrews
454c29046f
Check that SOA and DNSKEY are consistent in NSEC typemaps
If there is a SOA record present then there should also be a
DNSKEY record present as the DNSKEY is supposed to live at the
zone apex like the SOA.
2021-12-02 14:24:37 +01:00
Mark Andrews
5252985a21
Look for covering NSEC under two more conditions
1) when after processing a node there were no headers that
   contained active records.

   When

       if (check_stale_header(node, header, &locktype, lock, &search,
			      &header_prev))

   succeeds or

       if (EXISTS(header) && !ANCIENT(header))

   fails for all entries in the list leading to 'empty_node' remaining
   true.

   If there are no active records we know nothing about the
   current state of the name so we treat it as ISC_R_NOTFOUND.

2) when there was a covering NOQNAME proof found or all the
   active headers were negative.

   When

	if (header->noqname != NULL &&
	    header->trust == dns_trust_secure)

   succeeds or

	if (!NEGATIVE(header))

   never succeeds.  Under these conditions there could (should be for
   found_noqname) be a covering NSEC earlier in the tree.
2021-12-02 14:24:37 +01:00
Mark Andrews
3fa3b11ef8
Add synthesis of NODATA at wildcard
The old code rejected NSEC that proved the wildcard name existed
(exists).  The new code rejects NSEC that prove that the wildcard
name exists and that the type exists (exists && data) but accepts
NSEC that prove only that the wildcard name exists.

query_synthnxdomain (renamed query_synthnxdomainnodata) already
took the NSEC records and added the correct records to the message
body for NXDOMAIN or NODATA responses with the above change.  The
only additional change needed was to ensure the correct RCODE is
set.
2021-12-02 14:24:37 +01:00
Mark Andrews
4bdd5a9953
Ignore NSEC records without RRSIG and NSEC present
dns_nsec_noexistnodata now checks that RRSIG and NSEC are
present in the type map.  Both types should be present in
a correctly constructed NSEC record.  This check is in
addition to similar checks in resolver.c and validator.c.
2021-12-02 14:18:42 +01:00
Mark Andrews
8ff2c133b5
Add dns_nsec_requiredtypespresent
checks an NSEC rdataset to ensure that both NSEC and RRSIG are
present in the type map.  These types are required for the NSEC
to be valid
2021-12-02 14:18:42 +01:00
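The kind of check dns_nsec_requiredtypespresent performs can be sketched against the RFC 4034 type bitmap encoding (a sequence of window number, bitmap length, and bitmap octets). The function names below are hypothetical and operate on raw bytes, not on a dns_rdataset_t as the real function does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TYPE_RRSIG 46
#define TYPE_NSEC  47

/* Test one type bit in an RFC 4034 type bitmap (sequence of windows). */
static bool
typemap_has(const unsigned char *map, size_t len, unsigned int type) {
    size_t i = 0;
    while (i + 2 <= len) {
        unsigned int window = map[i];
        size_t maplen = map[i + 1];
        if (maplen == 0 || maplen > 32 || i + 2 + maplen > len) {
            return false; /* malformed bitmap */
        }
        if (window == type / 256) {
            size_t octet = (type % 256) / 8;
            if (octet >= maplen) {
                return false; /* bit lies beyond the stored octets */
            }
            /* Bits are numbered from the most significant bit. */
            return (map[i + 2 + octet] & (0x80 >> (type % 8))) != 0;
        }
        i += 2 + maplen;
    }
    return false;
}

/* Both RRSIG and NSEC must be set for the NSEC to be considered valid. */
bool
required_types_present(const unsigned char *map, size_t len) {
    assert(map != NULL);
    return typemap_has(map, len, TYPE_RRSIG) &&
           typemap_has(map, len, TYPE_NSEC);
}
```

Both types fall in window 0, octet 5, so a bitmap passes only if that octet has the two lowest bits (0x03) set.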
Mark Andrews
43316a40a0
Record how often DNS_R_COVERINGNSEC is returned from the cache
reported as "covering nsec returned" when dumping cache stats
and as "CoveringNSEC" in json and xml cache statistics.
2021-12-02 14:18:41 +01:00
Mark Andrews
62dd9ec9c1
Report Cache NSEC auxiliary database size 2021-12-02 14:18:41 +01:00
Mark Andrews
85bfcaeb2e
Extend dns_db_nodecount to access auxiliary rbt node counts
dns_db_nodecount can now be used to get counts from the auxiliary
rbt databases.  The existing node count is returned by
tree=dns_dbtree_main.  The nsec and nsec3 node counts by dns_dbtree_nsec
and dns_dbtree_nsec3 respectively.
2021-12-02 14:18:41 +01:00
Mark Andrews
c8a7f92b9e
Allow "black lies" to be cached
"black lies" differ from "white lies" in that the owner name of the
NSEC record matches the QNAME and the intent is to return NODATA
instead of NXDOMAIN for all types.  Caching this NSEC does not lead
to unexpected behaviour on synthesis when the QNAME matches the
NSEC owner which it does for the general "white lie" response.

"black lie" QNAME NSEC \000.QNAME NSEC RRSIG

"white lie" QNAME- NSEC QNAME+ NSEC RRSIG

where QNAME- is a name that is close to QNAME but sorts before QNAME
and QNAME+ is a name that is close to QNAME but sorts after QNAME.

Black lies are safe to cache as they don't bring into existence
names that are not intended to exist.  "Black lies" intentionally change
NXDOMAIN to NODATA. "White lies" bring QNAME- into existence and named
would synthesize NODATA for QNAME+ if it is queried for that name
instead of discovering the, presumably, NXDOMAIN response.

Note that rejecting NSEC RRsets with NEXT names starting with the label
'\000' renders this change ineffective (see reject-000-label).
2021-12-02 14:18:41 +01:00
Mark Andrews
6fae151c9d
Do not cache minimal NSEC records (NSEC + RRSIG only)
these are not useful for dnssec synthesis as they can result in
false NODATA responses and just consume cache memory
2021-12-02 14:18:41 +01:00
Mark Andrews
27acf56ba3
Remove unnecessary dns_rbt_fullnamefromnode call
the results from dns_rbt_fullnamefromnode are not used.
2021-12-02 14:18:40 +01:00
Mark Andrews
89542b8a15
Count DNS_R_COVERINGNSEC as a cache {query}hit
Note that when synthesising answers involving wildcards we look in the
cache multiple times, once for the QNAME and once for the wildcard
name which is constructed by looking at the names from the covering
NSEC returned by the QNAME miss.
2021-12-02 14:18:40 +01:00
Mark Andrews
3a5652ccb1
Rework rbtdb.c:find_coveringnsec() to use the auxiliary nsec rbt
this improves the performance of looking for NSEC and RRSIG(NSEC)
records in the cache by skipping lots of nodes in the main trees
in the cache without these records present.  This is a simplified
version of previous_closest_nsec() which uses the same underlying
mechanism to look for NSEC and RRSIG(NSEC) records in authoritative
zones.

The auxiliary NSEC tree was already being maintained as a side effect
of looking for the covering NSEC in large zones where there can be
lots of glue records that needed to be skipped.  Nodes are added
to the tree whenever a NSEC record is added to the primary tree.
They are removed when the corresponding node is removed from the
primary tree.

Having nodes in the NSEC tree w/o NSEC records in the primary tree
should not impact on synth-from-dnssec efficiency as that node would
have held the NSEC we would have needed to synthesise the
response.  Removing the node when the NSEC RRset expires would only
cause rbtdb to return a NSEC which would be rejected at a higher
level.
2021-12-02 14:18:40 +01:00
Ondřej Surý
20ac73eb22 Improve the logging on failed TCP accept
Previously, when TCP accept failed, we logged a message at the
ISC_LOG_ERROR level.  One common way this could happen is that the
client hits the TCP client quota and is put on hold, and when resumed, the
client has already given up and closed the TCP connection.  In such a
case, named would log:

    TCP connection failed: socket is not connected

This message was quite confusing because it doesn't say that
it's related to accepting the TCP connection, and also it logs
everything at the ISC_LOG_ERROR level.

Change the log message to "Accepting TCP connection failed" and for
specific error states lower the severity of the log message to
ISC_LOG_INFO.
2021-12-02 13:50:00 +01:00
Evan Hunt
fa8f409af2 On non-matching answer, check for missed timeout
A TCP connection may be held open past its proper timeout if it's
receiving a stream of DNS responses that don't match any queries.
In this case, we now check whether the oldest query should have timed
out.
2021-12-01 11:45:55 -08:00
Ondřej Surý
ba1cadf14a Tear down the TCP connection on too many unexpected DNS messages
When the outgoing TCP dispatch times out an active response, we might still
receive the answer during the lifetime of the connection.  Previously,
we would just ignore any non-matching DNS answers, which would allow the
server to feed us otherwise valid DNS answers and keep the
connection open.

Add a counter for timed-out DNS queries over TCP and tear down the whole
TCP connection if we receive an unexpected number of DNS answers.
2021-12-01 11:45:55 -08:00
Ondřej Surý
c84ed5056e Refactor tcp_recv()
The tcp_recv() function used a lot of gotos that made the function hard to
read.  Refactor the function by splitting it into smaller logical chunks.
2021-12-01 11:45:55 -08:00
Ondřej Surý
10f4f1a250 Shutdown all TCP connection on invalid DNS message
Previously, when an invalid DNS message was received over TCP, we threw the
garbage DNS message away and continued looking for a valid DNS message
that would match our outgoing queries.  This logic makes sense for UDP,
because anyone can send a DNS message over UDP.

Change the logic so that the TCP connection is closed when we receive
garbage, because the other side is acting maliciously.
2021-12-01 11:45:55 -08:00
Ondřej Surý
9230473324 Shutdown all active TCP connections on error
When an outgoing TCP connection was prematurely terminated (e.g. with a
connection reset), the dispatch code would not clean up the resources
used by such a connection, leading to dangling dns_dispentry_t entries.
2021-12-01 11:45:55 -08:00
Artem Boldariev
5f859d8a98 TLS context handling code: Fix an abort on ancient OpenSSL version
There was a logical bug when setting a list of enabled TLS protocols,
which could lead to a crash (an abort()) on systems with ancient OpenSSL
versions.

The problem was that we were INSIST()ing on support for
all of the TLS versions, while the intent was to check only those
mentioned in the configuration.
2021-12-01 12:00:30 +02:00