Introduction
Modern Linux kernels enable "peeking" into any running application using so-called uprobes. To facilitate that, BIND 9.19.17 added several USDT (User Statically-Defined Tracing) tracepoints so that useful data can be extracted from the "interesting" parts of the code. More such tracepoints will be added over time.
The general advantage of uprobes compared to other debugging techniques is that tracepoints embedded in the binaries add zero overhead unless they are enabled (probed). This allows such instrumented binaries to run in production and only be "asked" for debugging information when the need arises.
Requirements
System
At a minimum, using uprobes requires:
- a modern Linux kernel (3.5+) built with
CONFIG_UPROBES=y
, - a working toolchain of choice (SystemTap, perf, BCC); due to the fragile nature of the build process and the kernel-side dependencies of these tools, it is strongly recommended to use packaged installations of these tools,
- root privileges on the host.
BIND 9
- BIND 9.19.17+ with debug symbols,
- the
dtrace
utility (part of SystemTap) must be present inPATH
, - the
<sys/sdt.h>
C header (part of SystemTap) must be present in the include path when building BIND 9.
The USDT tracepoints in BIND 9 will only be enabled if ./configure
produces this line:
checking whether to enable User Statically Defined Tracing support (default is auto)... yes
(Note the yes
keyword at the end of the line.)
Usage
Listing available USDT trace points
To get a list of all available tracepoints in a given BIND 9 version, run the following command in the root directory of a BIND 9 source tree:
find . -name probes.d | xargs grep .
This will produce output similar to the following:
...
./lib/isc/probes.d:provider libisc {
./lib/isc/probes.d: probe job_cb_after(void *, void *, void *);
./lib/isc/probes.d: probe job_cb_before(void *, void *, void *);
./lib/isc/probes.d: probe rwlock_destroy(void *);
./lib/isc/probes.d: probe rwlock_downgrade(void *);
./lib/isc/probes.d: probe rwlock_init(void *);
...
To see where each listed tracepoint is defined, prepend the provider name to the probe name and convert it to uppercase, then search for it in the source tree. So, for probe rwlock_init
, look for LIBISC_RWLOCK_INIT
in the source:
$ git grep -C5 -n LIBISC_RWLOCK_INIT
lib/isc/rwlock.c-295-
lib/isc/rwlock.c-296- atomic_init(&rwl->writers_lock, ISC_RWLOCK_UNLOCKED);
lib/isc/rwlock.c-297- atomic_init(&rwl->writers_barrier, 0);
lib/isc/rwlock.c-298- atomic_init(&rwl->readers_ingress, 0);
lib/isc/rwlock.c-299- atomic_init(&rwl->readers_egress, 0);
lib/isc/rwlock.c:300: LIBISC_RWLOCK_INIT(rwl);
lib/isc/rwlock.c-301-}
lib/isc/rwlock.c-302-
lib/isc/rwlock.c-303-void
lib/isc/rwlock.c-304-isc_rwlock_destroy(isc_rwlock_t *rwl) {
lib/isc/rwlock.c-305- LIBISC_RWLOCK_DESTROY(rwl);
Basic sanity check
The example below shows how to confirm that the USDT probe for read-write lock initialization works as expected. perf
is used in this example as it is arguably the most commonly-available tool from the ones that can be used for probing USDT tracepoints.
USDT tracepoints defined in BIND 9 are spread across various libraries (libisc, libdns, libns, etc.) To probe a specific tracepoint, the library providing it must first be identified. For the LIBISC_RWLOCK_INIT
tracepoint, that would be libisc, i.e. the libisc-9.x.y-z.so
shared library. Its location must be established before moving on - this depends on the build method used and is therefore left as an exercise for the reader.
Let's assume the libisc library was installed at /tmp/bind9-install/lib/libisc-9.19.17-dev.so
. First, we need to ask perf
to scan that library for available tracepoints.
Open a ⚠️ root terminal ⚠️ and run the following command:
perf buildid-cache --add /tmp/bind9-install/lib/libisc-9.19.17-dev.so
This command should return no output. Success can be confirmed by asking perf
to list known tracepoints:
# perf list | grep sdt_
sdt_libisc:job_cb_after [SDT event]
sdt_libisc:job_cb_before [SDT event]
sdt_libisc:rwlock_destroy [SDT event]
sdt_libisc:rwlock_downgrade [SDT event]
sdt_libisc:rwlock_init [SDT event]
sdt_libisc:rwlock_rdlock_acq [SDT event]
sdt_libisc:rwlock_rdlock_req [SDT event]
sdt_libisc:rwlock_rdunlock [SDT event]
sdt_libisc:rwlock_tryrdlock [SDT event]
sdt_libisc:rwlock_tryupgrade [SDT event]
sdt_libisc:rwlock_trywrlock [SDT event]
sdt_libisc:rwlock_wrlock_acq [SDT event]
sdt_libisc:rwlock_wrlock_req [SDT event]
sdt_libisc:rwlock_wrunlock [SDT event]
If the above command does not return any output, something went wrong. Stop here and figure out what and why before moving forward.
Once perf
is "aware" of the above tracepoints, they can be probed. Run:
# perf probe --add="sdt_libisc:rwlock_init"
Added new event:
sdt_libisc:rwlock_init (on %rwlock_init in /tmp/bind9-install/lib/libisc-9.19.17-dev.so)
You can now use it in all perf tools, such as:
perf record -e sdt_libisc:rwlock_init -aR sleep 1
and then start recording sdt_libisc:rwlock_init
events by leaving the following command running in a terminal:
perf record -e sdt_libisc:rwlock_init -a
Then, in another terminal, start named -g -c /dev/null
. Switch back to the terminal running perf record
and interrupt the command using CTRL+C. Wait until it terminates. Then, run:
perf script
The output of this command should look similar to this:
...
named 25786 [003] 7436.802108: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=94615411565728
named 25786 [003] 7436.802110: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=94615411566048
named 25786 [003] 7436.802117: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=94615408800192
named 25786 [003] 7436.802548: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=94615408812880
isc-loop-0002 25792 [002] 7436.816947: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=140102974060704
isc-loop-0002 25792 [002] 7436.817036: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=140102974064960
isc-loop-0002 25792 [002] 7436.817362: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=140102974060704
isc-loop-0002 25792 [002] 7436.820991: sdt_libisc:rwlock_init: (7f6c555b213f) arg1=140102974136880
This indicates that the USDT tracepoints work correctly.
If the output of the perf script
command is empty, it means perf
was unable to record any sdt_libisc:rwlock_init
events, i.e. something went wrong. For starters, double-check whether the named
instance started up loads the exact shared library that was passed to perf build-id
for scanning - if it loads a different copy of this library from another path, it won't work.
Basic SystemTap example
SystemTap is a very powerful tool, capable of manipulating raw data extracted from user-space probes using a domain-specific language. It works by converting a SystemTap script into C code that is subsequently compiled into a kernel module and loaded into the running kernel. The requested data is then collected from the probes, acted upon, and then the kernel module is unloaded. At a minimum, it requires a working C compiler and Linux kernel headers to be installed on the host machine.
SystemTap works by performing requested actions when specific probes are hit. Here is a basic example for extracting AXFR information from named
using USDT tracepoints.
First, create a SystemTap script by putting the following contents into a filed called axfr.stap
:
probe process("/tmp/bind9-install/lib/libdns-9.19.17-dev.so").mark("xfrin_start") {
printf("[%lu] starting transfer: %s\n", gettimeofday_ms(), user_string($arg2))
}
probe process("/tmp/bind9-install/lib/libdns-9.19.17-dev.so").mark("xfrin_axfr_finalize_end") {
printf("[%lu] transfer done: %s, result=%d\n", gettimeofday_ms(), user_string($arg2), $arg3)
}
(Remember to adjust the absolute path to the libdns-9.x.y-z.so
shared library so that it matches your environment!)
Then, run the following command in a terminal:
stap -v /path/to/axfr.stap
(Remember to include the -v
switch as it is very helpful in tracking the work being carried out.)
This command should produce output similar to the following:
Pass 1: parsed user script and 484 library scripts using 139544virt/114028res/10496shr/103116data kb, in 150usr/20sys/168real ms.
Pass 2: analyzed script: 2 probes, 6 functions, 2 embeds, 0 globals using 141800virt/117576res/11740shr/105372data kb, in 10usr/0sys/7real ms.
Pass 3: translated to C into "/tmp/stapPxT19T/stap_935383170d337a4066c50632eb43e54b_7258_src.c" using 142040virt/119948res/13760shr/105612data kb, in 20usr/70sys/99real ms.
Pass 4: compiled C into "stap_935383170d337a4066c50632eb43e54b_7258.ko" in 3290usr/470sys/3462real ms.
Pass 5: starting run.
Wait patiently until the Pass 5: starting run.
appears - building a kernel module can take a while, particularly the first time around on a given host.
Once stap
is up and running, create a minimal configuration file, named.conf
, that tells named
to mirror the root zone:
zone "." {
type mirror;
};
Then, run named -g -c named.conf
. This should cause output similar to the following to appear in the terminal with stap
running:
[1694426087887] starting transfer: ./IN
[1694426088350] transfer done: ./IN, result=0
This indicates that a root zone (./IN
) transfer started at Unix timestamp 1694426087.887
and completed successfully (result=0
) at Unix timestamp 1694426088.350
. Note how these timestamps correlate to log messages produced by named
:
11-Sep-2023 11:54:47.882 zone ./IN: Transfer started.
...
11-Sep-2023 11:54:48.345 zone ./IN: mirror zone is now in use
If the stap
command above produces no output after starting named
, something went wrong. As in the basic sanity check above, first ensure that the library paths specified in axfr.stap
match those actually loaded by the named
binary started.