2
0
mirror of https://gitlab.isc.org/isc-projects/bind9 synced 2025-08-28 13:08:06 +00:00

74 Commits

Author SHA1 Message Date
Ondřej Surý
4c8f6ebeb1 Use barriers for netmgr synchronization
The netmgr listening, stoplistening, pausing and resuming functions
now use barriers for synchronization, which makes the code much simpler.

isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier
on platforms where that works, and pthread_barrier where it doesn't
(including TSAN builds).
2021-05-07 14:28:32 -07:00
Ondřej Surý
29a208aaf7 Fix crash when allocating UDP socket fails on OpenBSD
When socket() call fails, the UDP connect code would call the connectcb
with empty req->handle.  This has been fixed.
2021-05-07 14:28:30 -07:00
Evan Hunt
bcf5b2a675 run read callbacks synchronously on timeout
when running read callbacks, if the event result is not ISC_R_SUCCESS,
the callback is always run asynchronously. this is a problem on timeout,
because there's no chance to reset the timer before the socket has
already been destroyed. this commit allows read callbacks to run
synchronously for both ISC_R_SUCCESS and ISC_R_TIMEDOUT result codes.
2021-04-22 12:08:04 -07:00
Ondřej Surý
b540722bc3 Refactor taskmgr to run on top of netmgr
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers.  While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:

 * The taskmgr has no concept of universal privileged mode - rather the
   tasks are either privileged or unprivileged (normal).  The privileged
   tasks are run as a first thing when the netmgr is unpaused.  There
   are now four different queues in in the netmgr:

   1. priority queue - netievent on the priority queue are run even when
      the taskmgr enter exclusive mode and netmgr is paused.  This is
      needed to properly start listening on the interfaces, free
      resources and resume.

   2. privileged task queue - only privileged tasks are queued here and
      this is the first queue that gets processed when network manager
      is unpaused using isc_nm_resume().  All netmgr workers need to
      clean the privileged task queue before they all proceed normal
      operation.  Both task queues are processed when the workers are
      finished.

   3. task queue - only (traditional) task are scheduled here and this
      queue along with privileged task queues are process when the
      netmgr workers are finishing.  This is needed to process the task
      shutdown events.

   4. normal queue - this is the queue with netmgr events, e.g. reading,
      sending, callbacks and pretty much everything is processed here.

 * The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
   object.

 * The isc_nm_destroy() function now waits for indefinite time, but it
   will print out the active objects when in tracing mode
   (-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
   made a little bit more asynchronous and it might take longer time to
   shutdown all the active networking connections.

 * Previously, the isc_nm_stoplistening() was a synchronous operation.
   This has been changed and the isc_nm_stoplistening() just schedules
   the child sockets to stop listening and exits.  This was needed to
   prevent a deadlock as the the (traditional) tasks are now executed on
   the netmgr threads.

 * The socket selection logic in isc__nm_udp_send() was flawed, but
   fortunatelly, it was broken, so we never hit the problem where we
   created uvreq_t on a socket from nmhandle_t, but then a different
   socket could be picked up and then we were trying to run the send
   callback on a socket that had different threadid than currently
   running.
2021-04-20 23:22:28 +02:00
Ondřej Surý
72ef5f465d Refactor async callbacks and fix the double tlsdnsconnect callback
The isc_nm_tlsdnsconnect() call could end up with two connect callbacks
called when the timeout fired and the TCP connection was aborted,
but the TLS handshake was not complete yet.  isc__nm_connecttimeout_cb()
forgot to clean up sock->tls.pending_req when the connect callback was
called with ISC_R_TIMEDOUT, leading to a second callback running later.

A new argument has been added to the isc__nm_*_failed_connect_cb and
isc__nm_*_failed_read_cb functions, to indicate whether the callback
needs to run asynchronously or not.
2021-04-07 15:36:59 +02:00
Ondřej Surý
86f4872dd6 isc_nm_*connect() always return via callback
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).

This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
2021-04-07 15:36:59 +02:00
Evan Hunt
a70cd026df move UDP connect retries from dig into isc_nm_udpconnect()
dig previously ran isc_nm_udpconnect() three times before giving
up, to work around a freebsd bug that caused connect() to return
a spurious transient EADDRINUSE. this commit moves the retry code
into the network manager itself, so that isc_nm_udpconnect() no
longer needs to return a result code.
2021-04-07 15:36:59 +02:00
Ondřej Surý
7df8c7061c Fix and clean up handling of connect callbacks
Serveral problems were discovered and fixed after the change in
the connection timeout in the previous commits:

  * In TLSDNS, the connection callback was not called at all under some
    circumstances when the TCP connection had been established, but the
    TLS handshake hadn't been completed yet.  Additional checks have
    been put in place so that tls_cycle() will end early when the
    nmsocket is invalidated by the isc__nm_tlsdns_shutdown() call.

  * In TCP, TCPDNS and TLSDNS, new connections would be established
    even when the network manager was shutting down.  The new
    call isc__nm_closing() has been added and is used to bail out
    early even before uv_tcp_connect() is attempted.
2021-04-07 15:36:59 +02:00
Ondřej Surý
5a87c7372c Make it possible to recover from connect timeouts
Similarly to the read timeout, it's now possible to recover from
ISC_R_TIMEDOUT event by restarting the timer from the connect callback.

The change here also fixes platforms that missing the socket() options
to set the TCP connection timeout, by moving the timeout code into user
space.  On platforms that support setting the connect timeout via a
socket option, the timeout has been hardcoded to 2 minutes (the maximum
value of tcp-initial-timeout).
2021-04-07 15:36:58 +02:00
Ondřej Surý
1ef232f93d Merge the common parts between udp, tcpdns and tlsdns protocol
The udp, tcpdns and tlsdns contained lot of cut&paste code or code that
was very similar making the stack harder to maintain as any change to
one would have to be copied to the the other protocols.

In this commit, we merge the common parts into the common functions
under isc__nm_<foo> namespace and just keep the little differences based
on the socket type.
2021-03-18 16:37:57 +01:00
Ondřej Surý
caa5b6548a Fix TCPDNS and TLSDNS timers
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.

This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.

This commit changes the timeout logic in this way:

  * On TCP connection accept the tcp-initial-timeout is applied
    and the timer is started
  * When we are processing and/or sending any DNS message the timer is
    stopped
  * When we stop processing all DNS messages, the tcp-idle-timeout
    is applied and the timer is started again
2021-03-18 16:37:57 +01:00
Witold Kręcicki
7a96081360 nghttp2-based HTTP layer in netmgr
This commit includes work-in-progress implementation of
DNS-over-HTTP(S).

Server-side code remains mostly untested, and there is only support
for POST requests.
2021-02-03 12:06:17 +01:00
Ondřej Surý
5caf33feda Fix HAVE_SO_REUSEPORT_LB macro name definition
A typo in macro definition caused the load-balanced sockets to be
disabled even on platforms with existing support for load-balanced
sockets.
2020-12-04 14:45:22 +01:00
Ondřej Surý
87c5867202 Use sock->nchildren instead of mgr->nworkers when initializing NM
On Windows, we were limiting the number of listening children to just 1,
but we were then iterating on mgr->nworkers.  That lead to scheduling
more async_*listen() than actually allocated and out-of-bound read-write
operation on the heap.
2020-12-03 18:03:25 +01:00
Ondřej Surý
151852f428 Fix datarace when UDP/TCP connect fails and we are in nmthread
When we were in nmthread, the isc__nm_async_<proto>connect() function
executes in the same thread as the isc__nm_<proto>connect() and on a
failure, it would block indefinitely because the failure branch was
setting sock->active to false before the condition around the wait had a
chance to skip the WAIT().

This also fixes the zero system test being stuck on FreeBSD 11, so we
re-enable the test in the commit.
2020-12-03 13:56:34 +01:00
Ondřej Surý
1d066e4bc5 Distribute queries among threads even on platforms without lb sockets
On platforms without load-balancing socket all the queries would be
handle by a single thread.  Currently, the support for load-balanced
sockets is present in Linux with SO_REUSEPORT and FreeBSD 12 with
SO_REUSEPORT_LB.

This commit adds workaround for such platforms that:

1. setups single shared listening socket for all listening nmthreads for
   UDP, TCP and TCPDNS netmgr transports

2. Calls uv_udp_bind/uv_tcp_bind on the underlying socket just once and
   for rest of the nmthreads only copy the internal libuv flags (should
   be just UV_HANDLE_BOUND and optionally UV_HANDLE_IPV6).

3. start reading on UDP socket or listening on TCP socket

The load distribution among the nmthreads is uneven, but it's still
better than utilizing just one thread for processing all the incoming
queries
2020-12-03 09:20:33 +01:00
Ondřej Surý
d6d2fbe0e9 Avoid netievent allocations when the callbacks can be called directly
After turning the users callbacks to be asynchronous, there was a
visible performance drop.  This commit prevents the unnecessary
allocations while keeping the code paths same for both asynchronous and
synchronous calls.

The same change was done to the isc__nm_udp_{read,send} as those two
functions are in the hot path.
2020-12-02 09:45:05 +01:00
Ondřej Surý
634bdfb16d Refactor netmgr and add more unit tests
This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested.  It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.

NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.

The changes that are included in this commit are listed here
(extensively, but not exclusively):

* The netmgr_test unit test was split into individual tests (udp_test,
  tcp_test, tcpdns_test and newly added tcp_quota_test)

* The udp_test and tcp_test has been extended to allow programatic
  failures from the libuv API.  Unfortunately, we can't use cmocka
  mock() and will_return(), so we emulate the behaviour with #define and
  including the netmgr/{udp,tcp}.c source file directly.

* The netievents that we put on the nm queue have variable number of
  members, out of these the isc_nmsocket_t and isc_nmhandle_t always
  needs to be attached before enqueueing the netievent_<foo> and
  detached after we have called the isc_nm_async_<foo> to ensure that
  the socket (handle) doesn't disappear between scheduling the event and
  actually executing the event.

* Cancelling the in-flight TCP connection using libuv requires to call
  uv_close() on the original uv_tcp_t handle which just breaks too many
  assumptions we have in the netmgr code.  Instead of using uv_timer for
  TCP connection timeouts, we use platform specific socket option.

* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}

  When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
  waiting for socket to either end up with error (that path was fine) or
  to be listening or connected using condition variable and mutex.

  Several things could happen:

    0. everything is ok

    1. the waiting thread would miss the SIGNAL() - because the enqueued
       event would be processed faster than we could start WAIT()ing.
       In case the operation would end up with error, it would be ok, as
       the error variable would be unchanged.

    2. the waiting thread miss the sock->{connected,listening} = `true`
       would be set to `false` in the tcp_{listen,connect}close_cb() as
       the connection would be so short lived that the socket would be
       closed before we could even start WAIT()ing

* The tcpdns has been converted to using libuv directly.  Previously,
  the tcpdns protocol used tcp protocol from netmgr, this proved to be
  very complicated to understand, fix and make changes to.  The new
  tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
  Closes: #2194, #2283, #2318, #2266, #2034, #1920

* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
  pass accepted TCP sockets between netthreads, but instead (similar to
  UDP) uses per netthread uv_loop listener.  This greatly reduces the
  complexity as the socket is always run in the associated nm and uv
  loops, and we are also not touching the libuv internals.

  There's an unfortunate side effect though, the new code requires
  support for load-balanced sockets from the operating system for both
  UDP and TCP (see #2137).  If the operating system doesn't support the
  load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
  on FreeBSD 12+), the number of netthreads is limited to 1.

* The netmgr has now two debugging #ifdefs:

  1. Already existing NETMGR_TRACE prints any dangling nmsockets and
     nmhandles before triggering assertion failure.  This options would
     reduce performance when enabled, but in theory, it could be enabled
     on low-performance systems.

  2. New NETMGR_TRACE_VERBOSE option has been added that enables
     extensive netmgr logging that allows the software engineer to
     precisely track any attach/detach operations on the nmsockets and
     nmhandles.  This is not suitable for any kind of production
     machine, only for debugging.

* The tlsdns netmgr protocol has been split from the tcpdns and it still
  uses the old method of stacking the netmgr boxes on top of each other.
  We will have to refactor the tlsdns netmgr protocol to use the same
  approach - build the stack using only libuv and openssl.

* Limit but not assert the tcp buffer size in tcp_alloc_cb
  Closes: #2061
2020-12-01 16:47:07 +01:00
Ondřej Surý
a49d88568f Turn all the callback to be always asynchronous
When calling the high level netmgr functions, the callback would be
sometimes called synchronously if we catch the failure directly, or
asynchronously if it happens later.  The synchronous call to the
callback could create deadlocks as the caller would not expect the
failed callback to be executed directly.
2020-11-11 22:15:40 +01:00
Ondřej Surý
8af7f81d6c netmgr: Don't crash if socket() returns an error in udpconnect
socket() call can return an error - e.g. EMFILE, so we need to handle
this nicely and not crash.

Additionally wrap the socket() call inside a platform independent helper
function as the Socket data type on Windows is unsigned integer:

> This means, for example, that checking for errors when the socket and
> accept functions return should not be done by comparing the return
> value with –1, or seeing if the value is negative (both common and
> legal approaches in UNIX). Instead, an application should use the
> manifest constant INVALID_SOCKET as defined in the Winsock2.h header
> file.
2020-11-08 13:36:12 -08:00
Ondřej Surý
050258bda4 netmgr: Always load the result from async socket
Because we use result earlier for setting the loadbalancing on the
socket, we could be left with a ISC_R_NOTIMPLEMENTED value stored in the
variable and when the UDP connection would succeed, we would
errorneously return this value instead of ISC_R_SUCCESS.
2020-11-07 21:12:08 +01:00
Evan Hunt
ea2b04c361 dig: use new netmgr timeout mechanism
use isc_nmhandle_settimeout() to set read/recv timeouts, and get rid
of connect_timeout() and related functions in dighost.c.
2020-11-07 20:49:53 +01:00
Evan Hunt
4be63c5b00 add isc_nmhandle_settimeout() function
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
2020-11-07 20:49:53 +01:00
Ondřej Surý
ed3ab63f74 Fix more races between connect and shutdown
There were more races that could happen while connecting to a
socket while closing or shutting down the same socket.  This
commit introduces a .closing flag to guard the socket from
being closed twice.
2020-10-30 11:11:54 +01:00
Ondřej Surý
6cfadf9db0 Fix a race between isc__nm_async_shutdown() and new sends/reads
There was a data race where a new event could be scheduled after
isc__nm_async_shutdown() had cleaned up all the dangling UDP/TCP
sockets from the loop.
2020-10-30 11:11:54 +01:00
Ondřej Surý
5fcd52209a Refactor udp_recv_cb()
- more logical code flow.
- propagate errors back to the caller.
- add a 'reading' flag and call the callback from failed_read_cb()
  only when it the socket was actively reading.
2020-10-30 11:11:54 +01:00
Ondřej Surý
cdccac4993 Fix netmgr read/connect timeout issues
- don't bother closing sockets that are already closing.
- UDP read timeout timer was not stopped after reading.
- improve handling of TCP connection failures.
2020-10-30 11:11:54 +01:00
Ondřej Surý
7a6056bc8f Add isc__nm_udp_shutdown() function
This function will be called during isc_nm_closedown() to ensure
that all UDP sockets are closed and detached.
2020-10-30 11:11:54 +01:00
Evan Hunt
5dcdc00b93 add netmgr functions to support outgoing DNS queries
- isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
  timeout argument to ensure connections time out and are correctly
  cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
  stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
  failure if reading is interrupted by a non-network thread (e.g.
  a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
  prior to 1.27, when uv_udp_connect() was added

all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
2020-10-30 11:11:54 +01:00
Ondřej Surý
8797e5efd5 Fix the data race when read-writing sock->active by using cmpxchg 2020-10-22 11:46:58 -07:00
Ondřej Surý
f7c82e406e Fix the isc_nm_closedown() to actually close the pending connections
1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking
   whether the socket was still alive and scheduling reads/sends on
   closed socket.

2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
   changed to always return the error conditions via the callbacks, so
   they always succeed.  This applies to all protocols (UDP, TCP and
   TCPDNS).
2020-10-22 11:37:16 -07:00
Ondřej Surý
afca2e3b21 Fix the way udp_send_direct() is used
There were two problems how udp_send_direct() was used:

1. The udp_send_direct() can return ISC_R_CANCELED (or translated error
   from uv_udp_send()), but the isc__nm_async_udpsend() wasn't checking
   the error code and not releasing the uvreq in case of an error.

2. In isc__nm_udp_send(), when the UDP send is already in the right
   netthread, it uses udp_send_direct() to send the UDP packet right
   away.  When that happened the uvreq was not freed, and the error code
   was returned to the caller.  We need to return ISC_R_SUCCESS and
   rather use the callback to report an error in such case.
2020-10-22 11:37:16 -07:00
Ondřej Surý
e8b56acb49 Clone the csock in accept_connection(), not in callback
If we clone the csock (children socket) in TCP accept_connection()
instead of passing the ssock (server socket) to the call back and
cloning it there we unbreak the assumption that every socket is handled
inside it's own worker thread and therefore we can get rid of (at least)
callback locking.
2020-10-08 07:24:31 +02:00
Ondřej Surý
b9a42446e8 Enable DF (don't fragment) flag on listening UDP sockets
This commits uses the isc__nm_socket_dontfrag() helper function to
enable setting DF bit on the outgoing UDP packets.
2020-10-05 16:21:21 +02:00
Ondřej Surý
fd975a551d Split reusing the addr/port and load-balancing socket options
The SO_REUSEADDR, SO_REUSEPORT and SO_REUSEPORT_LB has different meaning
on different platform. In this commit, we split the function to set the
reuse of address/port and setting the load-balancing into separate
functions.

The libuv library already have multiplatform support for setting
SO_REUSEADDR and SO_REUSEPORT that allows binding to the same address
and port, but unfortunately, when used after the load-balancing socket
options have been already set, it overrides the previous setting, so we
need our own helper function to enable the SO_REUSEADDR/SO_REUSEPORT
first and then enable the load-balancing socket option.
2020-10-05 15:18:28 +02:00
Ondřej Surý
9dc01a636b Refactor isc__nm_socket_freebind() to take fd and sa_family as args
The isc__nm_socket_freebind() has been refactored to match other
isc__nm_socket_...() helper functions and take uv_os_fd_t and
sa_family_t as function arguments.
2020-10-05 15:18:24 +02:00
Ondřej Surý
5daaca7146 Add SO_REUSEPORT and SO_INCOMING_CPU helper functions
The setting of SO_REUSE**** and SO_INCOMING_CPU have been moved into a
separate helper functions.
2020-10-05 14:54:24 +02:00
Evan Hunt
dcee985b7f update all copyright headers to eliminate the typo 2020-09-14 16:20:40 -07:00
Ondřej Surý
89c534d3b9 properly lock the setting/unsetting of callbacks in isc_nmsocket_t
changes to socket callback functions were not thread safe.
2020-09-11 12:17:57 -07:00
Evan Hunt
57b4dde974 change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.

A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:

        - reqhandle:    held while waiting for a request callback (query,
                        notify, update)
        - sendhandle:   held while waiting for a send callback
        - fetchhandle:  held while waiting for a recursive fetch to
                        complete
        - updatehandle: held while waiting for an update-forwarding
                        task to complete

control channel connection objects now contain:

        - readhandle: held while waiting for a read callback
        - sendhandle: held while waiting for a send callback
        - cmdhandle:  held while an rndc command is running

httpd connections contain:

        - readhandle: held while waiting for a read callback
        - sendhandle: held while waiting for a send callback
2020-09-11 12:17:57 -07:00
Witold Kręcicki
7eb4564895 assorted small netmgr-related changes
- rename isc_nmsocket_t->tcphandle to statichandle
- cancelread functions now take handles instead of sockets
- add a 'client' flag in socket objects, currently unused, to
  indicate whether it is to be used as a client or server socket
2020-09-11 10:24:36 -07:00
Evan Hunt
38264b6a4d Use different allocators for UDP and TCP
Each worker has a receive buffer with space for 20 DNS messages of up
to 2^16 bytes each, and the allocator function passed to uv_read_start()
or uv_udp_recv_start() will reserve a portion of it for use by sockets.
UDP can use recvmmsg() and so it needs that entire space, but TCP reads
one message at a time.

This commit introduces separate allocator functions for TCP and UDP
setting different buffer size limits, so that libuv will provide the
correct buffer sizes to each of them.
2020-08-05 12:57:23 +02:00
Witold Kręcicki
a0f7d28967 netmgr: retry binding with IP_FREEBIND when EADDRNOTAVAIL is returned.
When a new IPv6 interface/address appears it's first in a tentative
state - in which we cannot bind to it, yet it's already being reported
by the route socket. Because of that BIND9 is unable to listen on any
newly detected IPv6 addresses. Fix it by setting IP_FREEBIND option (or
equivalent option on other OSes) and then retrying bind() call.
2020-07-31 12:44:22 +02:00
Witold Kręcicki
1cf65cd882 Fix a shutdown race in netmgr udp
We need to mark the socket as inactive early (and synchronously)
in the stoplistening process; otherwise we might destroy the
callback argument before we actually stop listening, and call
the callback on bad memory.
2020-06-26 00:19:42 -07:00
Evan Hunt
75c985c07f change the signature of recv callbacks to include a result code
this will allow recv event handlers to distinguish between cases
in which the region is NULL because of error, shutdown, or cancelation.
2020-06-19 12:33:26 -07:00
Evan Hunt
5ea26ee1f1 modify reference counting within netmgr
- isc__nmhandle_get() now attaches to the sock in the nmhandle object.
  the caller is responsible for dereferencing the original socket
  pointer when necessary.
- tcpdns listener sockets attach sock->outer to the outer tcp listener
  socket. tcpdns connected sockets attach sock->outerhandle to the handle
  for the tcp connected socket.
- only listener sockets need to be attached/detached directly. connected
  sockets should only be accessed and reference-counted via their
  associated handles.
2020-06-19 09:39:50 +02:00
Evan Hunt
9e740cad21 make isc_nmsocket_{attach,detach}{} functions private
there is no need for a caller to reference-count socket objects.
they need tto be able tto close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
2020-06-19 09:39:50 +02:00
Ondřej Surý
4ec357da0a Don't check the result of setting SO_INCOMING_CPU
The SO_INCOMING_CPU is available since Linux 3.19 for getting the value,
but only since Linux 4.4 for setting the value (see below for a full
description).  BIND 9 should not fail when setting the option on the
socket fails, as this is only an optimization and not hard requirement
to run BIND 9.

    SO_INCOMING_CPU (gettable since Linux 3.19, settable since Linux 4.4)
        Sets or gets the CPU affinity of a socket.  Expects an integer flag.

            int cpu = 1;
            setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));

        Because all of the packets for a single stream (i.e., all
	packets for the same 4-tuple) arrive on the single RX queue that
	is associated with a particular CPU, the typical use case is to
	employ one listening process per RX queue, with the incoming
	flow being handled by a listener on the same CPU that is
	handling the RX queue.  This provides optimal NUMA behavior and
	keeps CPU caches hot.
2020-06-03 12:44:44 +02:00
Witold Kręcicki
fa02f6438b Don't set UDP recv/send buffer sizes - use system defaults (unless explicitly defined) 2020-05-01 17:04:00 +02:00
Ondřej Surý
09ba47b067 Use SO_REUSEPORT only on Linux, use SO_REUSEPORT_LB on FreeBSD
The SO_REUSEPORT socket option on Linux means something else on BSD
based systems.  On FreeBSD there's 1:1 option SO_REUSEPORT_LB, so we can
use that.
2020-05-01 15:20:55 +02:00