Update system test documentation

Rewrite and reorganize the test documentation to focus on the pytest runner, omit any mentions of the legacy runner which are no longer relevant, and mention a few pytest tricks.
2025-08-29 13:38:26 +00:00 · 2023-11-06 14:45:07 +01:00 · 2023-11-06 14:45:07 +01:00 · fba295600b
commit fba295600b
parent 3e26d99c3c
2 changed files with 470 additions and 793 deletions
--- a/bin/tests/system/README
+++ b/bin/tests/system/README
@ -1,793 +0,0 @@
-Copyright (C) Internet Systems Consortium, Inc. ("ISC")
-
-SPDX-License-Identifier: MPL-2.0
-
-This Source Code Form is subject to the terms of the Mozilla Public
-License, v. 2.0.  If a copy of the MPL was not distributed with this
-file, you can obtain one at https://mozilla.org/MPL/2.0/.
-
-See the COPYRIGHT file distributed with this work for additional
-information regarding copyright ownership.
-
-Introduction
-===
-This directory holds a simple test environment for running bind9 system tests
-involving multiple name servers.
-
-Each system test directory holds a set of scripts and configuration files to
-test different parts of BIND.  The directories are named for the aspect of BIND
-they test, for example:
-
-  dnssec/       DNSSEC tests
-  forward/      Forwarding tests
-  glue/         Glue handling tests
-
-etc.
-
-A system test directory must start with an alphabetic character and may not
-contain any special characters. Only hyphen may be used as a word separator.
-
-Typically each set of tests sets up 2-5 name servers and then performs one or
-more tests against them.  Within the test subdirectory, each name server has a
-separate subdirectory containing its configuration data.  These subdirectories
-are named "nsN" or "ansN" (where N is a number between 1 and 8, e.g. ns1, ans2
-etc.)
-
-The tests are completely self-contained and do not require access to the real
-DNS.  Generally, one of the test servers (usually ns1) is set up as a root
-nameserver and is listed in the hints file of the others.
-
-
-Preparing to Run the Tests
-===
-To enable all servers to run on the same machine, they bind to separate virtual
-IP addresses on the loopback interface.  ns1 runs on 10.53.0.1, ns2 on
-10.53.0.2, etc.  Before running any tests, you must set up these addresses by
-running the command
-
-    sh ifconfig.sh up
-
-as root.  The interfaces can be removed by executing the command:
-
-    sh ifconfig.sh down
-
-... also as root.
-
-The servers use unprivileged ports (above 1024) instead of the usual port 53,
-so they can be run without root privileges once the interfaces have been set
-up.
-
-
-Note for MacOS Users
---
-If you wish to make the interfaces survive across reboots, copy
-org.isc.bind.system and org.isc.bind.system.plist to /Library/LaunchDaemons
-then run
-
-    launchctl load /Library/LaunchDaemons/org.isc.bind.system.plist
-
-... as root.
-
-
-Running the System Tests with pytest
-===
-
-The pytest system test runner is currently in development, but it is the
-recommended way to run tests. Please report issues to QA.
-
-Running an Individual Test
---
-
-pytest -k <test-name>
-
-Note that in comparison to the legacy test runner, some additional tests might
-be picked up when specifying just the system test directory name. To check
-which tests will be executed, you can use the `--collect-only` option. You
-might also be able to find a more specific test name to provide to ensure only
-your desired test is executed. See help for `-k` option in `pytest --help` for
-more info.
-
-It is also possible to run a single individual pytest test case. For example,
-you can use the name test_sslyze_dot to execute just the test_sslyze_dot()
-function from doth/tests_sslyze.py. The entire needed setup and teardown will
-be handled by the framework.
-
-Running All the System Tests
---
-
-Issuing plain `pytest` command without any argument will execute all tests
-sequenatially. To execute them in parallel, ensure you have pytest-xdist
-installed and run:
-
-pytest -n <number-of-workers>
-
-
-Running the System Tests Using the Legacy Runner
-===
-
-!!! WARNING !!!
---
-The legacy way to run system tests is currently being reworked into a pytest
-system test runner described in the previous section. The contents of this
-section might be out of date and no longer applicable. Please try and use the
-pytest runner if possible and report issues and missing features.
-
-Running an Individual Test
---
-The tests can be run individually using the following command:
-
-    sh legacy.run.sh [flags] <test-name> [<test-arguments>]
-
-e.g.
-
-    sh legacy.run.sh [flags] notify
-
-Optional flags are:
-
-    -k              Keep servers running after the test completes.  Each test
-                    usually starts a number of nameservers, either instances
-                    of the "named" being tested, or custom servers (written in
-                    Python or Perl) that feature test-specific behavior.  The
-                    servers are automatically started before the test is run
-                    and stopped after it ends.  This flag leaves them running
-                    at the end of the test, so that additional queries can be
-                    sent by hand.  To stop the servers afterwards, use the
-                    command "sh stop.sh <test-name>".
-
-    -n              Noclean - do not remove the output files if the test
-                    completes successfully.  By default, files created by the
-                    test are deleted if it passes;  they are not deleted if the
-                    test fails.
-
-    -p <number>     Sets the range of ports used by the test.  A block of 100
-                    ports is available for each test, the number given to the
-                    "-p" switch being the number of the start of that block
-                    (e.g. "-p 7900" will mean that the test is able to use
-                    ports 7900 through 7999).  If not specified, the test will
-                    have ports 5000 to 5099 available to it.
-
-Arguments are:
-
-    test-name       Mandatory.  The name of the test, which is the name of the
-                    subdirectory in bin/tests/system holding the test files.
-
-    test-arguments  Optional arguments that are passed to each of the test's
-                    scripts.
-
-
-Running All The System Tests
---
-To run all the system tests, enter the command:
-
-    make [-j numproc] test
-
-The optional "numproc" argument specifies the maximum number of tests that can
-run in parallel.  The default is 1, which means that all of the tests run
-sequentially.  If greater than 1, up to "numproc" tests will run simultaneously,
-new tests being started as tests finish.  Each test will get a unique set of
-ports, so there is no danger of tests interfering with one another.  Parallel
-running will reduce the total time taken to run the BIND system tests, but will
-mean that the output from all the tests sent to the screen will be mixed up
-with one another.
-
-In this case, retention of the output files after a test completes successfully
-is specified by setting the environment variable SYSTEMTEST_NO_CLEAN to 1 prior
-to running make, e.g.
-
-    SYSTEMTEST_NO_CLEAN=1 make [-j numproc] test
-
-while setting environment variable SYSTEMTEST_FORCE_COLOR to 1 forces system
-test output to be printed in color.
-
-
-Format of Test Output
---
-All output from the system tests is in the form of lines with the following
-structure:
-
-    <letter>:<test-name>:<message> [(<number>)]
-
-e.g.
-
-    I:catz:checking that dom1.example is not served by primary (1)
-
-The meanings of the fields are as follows:
-
-<letter>
-This indicates the type of message.  This is one of:
-
-    S   Start of the test
-    A   Start of test (retained for backwards compatibility)
-    T   Start of test (retained for backwards compatibility)
-    E   End of the test
-    I   Information.  A test will typically output many of these messages
-        during its run, indicating test progress.  Note that such a message may
-        be of the form "I:testname:failed", indicating that a sub-test has
-        failed.
-    R   Result.  Each test will result in one such message, which is of the
-        form:
-
-                R:<test-name>:<result>
-
-        where <result> is one of:
-
-            PASS        The test passed
-            FAIL        The test failed
-            SKIPPED     The test was not run, usually because some
-                        prerequisites required to run the test are missing.
-
-<test-name>
-This is the name of the test from which the message emanated, which is also the
-name of the subdirectory holding the test files.
-
-<message>
-This is text output by the test during its execution.
-
-(<number>)
-If present, this will correlate with a file created by the test.  The tests
-execute commands and route the output of each command to a file.  The name of
-this file depends on the command and the test, but will usually be of the form:
-
-    <command>.out.<suffix><number>
-
-e.g. nsupdate.out.test28, dig.out.q3.  This aids diagnosis of problems by
-allowing the output that caused the problem message to be identified.
-
-
-Re-Running the Tests
---
-If there is a requirement to re-run a test (or the entire test suite), the
-files produced by the tests should be deleted first.  Normally, these files are
-deleted if the test succeeds but are retained on error.  The legacy.run.sh
-script automatically calls a given test's clean.sh script before invoking its
-setup.sh script.
-
-Deletion of the files produced by the set of tests (e.g. after the execution of
-make) can be carried out using the command:
-
-    sh cleanall.sh
-
-or
-
-    make testclean
-
-(Note that the Makefile has two other targets for cleaning up files: "clean"
-will delete all the files produced by the tests, as well as the object and
-executable files used by the tests.  "distclean" does all the work of "clean"
-as well as deleting configuration files produced by "configure".)
-
-
-Developer Notes
-===
-This section is intended for developers writing new tests.
-
-
-Overview
---
-As noted above, each test is in a separate directory.  To interact with the
-test framework, the directories contain the following standard files:
-
-prereq.sh   Run at the beginning to determine whether the test can be run at
-            all; if not, we see a R:SKIPPED result.  This file is optional:
-            if not present, the test is assumed to have all its prerequisites
-            met.
-
-setup.sh    Run after prereq.sh, this sets up the preconditions for the tests.
-            Although optional, virtually all tests will require such a file to
-            set up the ports they should use for the test.
-
-tests.sh    Runs the actual tests.  This file is mandatory.
-
-tests_sh_xyz.py  A glue file for the pytest runner for executing shell tests.
-
-clean.sh    Run at the end to clean up temporary files, but only if the test
-            was completed successfully and its running was not inhibited by the
-            "-n" switch being passed to "legacy.run.sh".  Otherwise the
-            temporary files are left in place for inspection.
-
-ns<N>       These subdirectories contain test name servers that can be queried
-	    or can interact with each other.  The value of N indicates the
-	    address the server listens on: for example, ns2 listens on
-	    10.53.0.2, and ns4 on 10.53.0.4.  All test servers use an
-	    unprivileged port, so they don't need to run as root.  These
-	    servers log at the highest debug level and the log is captured in
-	    the file "named.run".
-
-ans<N>      Like ns[X], but these are simple mock name servers implemented in
-            Perl or Python.  They are generally programmed to misbehave in ways
-            named would not so as to exercise named's ability to interoperate
-            with badly behaved name servers.
-
-
-Port Usage
---
-In order for the tests to run in parallel, each test requires a unique set of
-ports.  These are specified by the "-p" option passed to "legacy.run.sh", which
-sets environment variables that the scripts listed above can reference.
-
-The convention used in the system tests is that the number passed is the start
-of a range of 100 ports.  The test is free to use the ports as required,
-although the first ten ports in the block are named and generally tests use the
-named ports for their intended purpose.  The names of the environment variables
-are:
-
-    PORT                     Number to be used for the query port.
-    CONTROLPORT              Number to be used as the RNDC control port.
-    EXTRAPORT1 - EXTRAPORT8  Eight port numbers that can be used as needed.
-
-Two other environment variables are defined:
-
-    LOWPORT                  The lowest port number in the range.
-    HIGHPORT                 The highest port number in the range.
-
-Since port ranges usually start on a boundary of 10, the variables are set such
-that the last digit of the port number corresponds to the number of the
-EXTRAPORTn variable.  For example, if the port range were to start at 5200, the
-port assignments would be:
-
-    PORT = 5200
-    EXTRAPORT1 = 5201
-        :
-    EXTRAPORT8 = 5208
-    CONTROLPORT = 5209
-    LOWPORT = 5200
-    HIGHPORT = 5299
-
-When running tests in parallel (i.e. giving a value of "numproc" greater than 1
-in the "make" command listed above), it is guaranteed that each
-test will get a set of unique port numbers.
-
-
-Writing a Test
---
-The test framework requires up to four shell scripts (listed above) as well as
-a number of nameserver instances to run.  Certain expectations are put on each
-script:
-
-
-General
---
-1. Each of the four scripts will be invoked with the command
-
-    (cd <test-directory> ; sh <script> [<arguments>] )
-
-... so that working directory when the script starts executing is the test
-directory.
-
-2. Arguments can be only passed to the script if the test is being run as a
-one-off with "legacy.run.sh".  In this case, everything on the command line
-after the name of the test is passed to each script.  For example, the command:
-
-    sh legacy.run.sh -p 12300 mytest -D xyz
-
-... will run "mytest" with a port range of 12300 to 12399.  Each of the
-framework scripts provided by the test will be invoked using the remaining
-arguments, e.g.:
-
-   (cd mytest ; sh prereq.sh -D xyz)
-   (cd mytest ; sh setup.sh -D xyz)
-   (cd mytest ; sh tests.sh -D xyz)
-   (cd mytest ; sh clean.sh -D xyz)
-
-No arguments will be passed to the test scripts if the test is run as part of
-a run of the full test suite (e.g. the tests are started with make).
-
-3.  Each script should start with the following lines:
-
-    . ../conf.sh
-
-"conf.sh" defines a series of environment variables together with functions
-useful for the test scripts.
-
-
-prereq.sh
---
-As noted above, this is optional.  If present, it should check whether specific
-software needed to run the test is available and/or whether BIND has been
-configured with the appropriate options required.
-
-    * If the software required to run the test is present and the BIND
-      configure options are correct, prereq.sh should return with a status code
-      of 0.
-
-    * If the software required to run the test is not available and/or BIND
-      has not been configured with the appropriate options, prereq.sh should
-      return with a status code of 1.
-
-    * If there is some other problem (e.g. prerequisite software is available
-      but is not properly configured), a status code of 255 should be returned.
-
-
-setup.sh
---
-This is responsible for setting up the configuration files used in the test.
-
-To cope with the varying port number, ports are not hard-coded into
-configuration files (or, for that matter, scripts that emulate nameservers).
-Instead, setup.sh is responsible for editing the configuration files to set the
-port numbers.
-
-To do this, configuration files should be supplied in the form of templates
-containing tokens identifying ports.  The tokens have the same name as the
-environment variables listed above, but are prefixed and suffixed by the "@"
-symbol.  For example, a fragment of a configuration file template might look
-like:
-
-    controls {
-        inet 10.53.0.1 port @CONTROLPORT@ allow { any; } keys { rndc_key; };
-    };
-
-    options {
-        query-source address 10.53.0.1;
-        notify-source 10.53.0.1;
-        transfer-source 10.53.0.1;
-        port @PORT@;
-        allow-new-zones yes;
-    };
-
-setup.sh should copy the template to the desired filename using the
-"copy_setports" shell function defined in "conf.sh", i.e.
-
-    copy_setports ns1/named.conf.in ns1/named.conf
-
-This replaces the tokens @PORT@, @CONTROLPORT@, @EXTRAPORT1@ through
-@EXTRAPORT8@ with the contents of the environment variables listed above.
-setup.sh should do this for all configuration files required when the test
-starts.
-
-("setup.sh" should also use this method for replacing the tokens in any Perl or
-Python name servers used in the test.)
-
-
-tests.sh
---
-This is the main test file and the contents depend on the test.  The contents
-are completely up to the developer, although most test scripts have a form
-similar to the following for each sub-test:
-
-    1. n=$((n + 1))
-    2. echo_i "prime cache nodata.example ($n)"
-    3. ret=0
-    4. $DIG -p ${PORT} @10.53.0.1 nodata.example TXT > dig.out.test$n
-    5. grep "status: NOERROR" dig.out.test$n > /dev/null || ret=1
-    6. grep "ANSWER: 0," dig.out.test$n > /dev/null || ret=1
-    7. if [ $ret != 0 ]; then echo_i "failed"; fi
-    8. status=$((status + ret))
-
-1.  Increment the test number "n" (initialized to zero at the start of the
-    script).
-
-2.  Indicate that the sub-test is about to begin.  Note that "echo_i" instead
-    of "echo" is used.  echo_i is a function defined in "conf.sh" which will
-    prefix the message with "I:<testname>:", so allowing the output from each
-    test to be identified within the output.  The test number is included in
-    the message in order to tie the sub-test with its output.
-
-3. Initialize return status.
-
-4 - 6. Carry out the sub-test.  In this case, a nameserver is queried (note
-    that the port used is given by the PORT environment variable, which was set
-    by the inclusion of the file "conf.sh" at the start of the script).  The
-    output is routed to a file whose suffix includes the test number.  The
-    response from the server is examined and, in this case, if the required
-    string is not found, an error is indicated by setting "ret" to 1.
-
-7.  If the sub-test failed, a message is printed. "echo_i" is used to print
-    the message to add the prefix "I:<test-name>:" before it is output.
-
-8.  "status", used to track how many of the sub-tests have failed, is
-    incremented accordingly.  The value of "status" determines the status
-    returned by "tests.sh", which in turn determines whether the framework
-    prints the PASS or FAIL message.
-
-Regardless of this, rules that should be followed are:
-
-a.  Use the environment variables set by conf.sh to determine the ports to use
-    for sending and receiving queries.
-
-b.  Use a counter to tag messages and to associate the messages with the output
-    files.
-
-c.  Store all output produced by queries/commands into files.  These files
-    should be named according to the command that produced them, e.g. "dig"
-    output should be stored in a file "dig.out.<suffix>", the suffix being
-    related to the value of the counter.
-
-d.  Use "echo_i" to output informational messages.
-
-e.  Retain a count of test failures and return this as the exit status from
-    the script.
-
-
-tests_sh_xyz.py
---------------
-This glue file is required by the pytest runner in order to find and execute
-the shell tests in tests.sh.
-
-Replace the "xyz" with the system test name and create the file with the
-following contents.
-
-    def test_xyz(run_tests_sh):
-        run_tests_sh()
-
-clean.sh
---
-The inverse of "setup.sh", this is invoked by the framework to clean up the
-test directory.  It should delete all files that have been created by the test
-during its run.
-
-
-Starting Nameservers
---
-As noted earlier, a system test will involve a number of nameservers.  These
-will be either instances of named, or special servers written in a language
-such as Perl or Python.
-
-For the former, the version of "named" being run is that in the "bin/named"
-directory in the tree holding the tests (i.e. if "make test" is being run
-immediately after "make", the version of "named" used is that just built).  The
-configuration files, zone files etc. for these servers are located in
-subdirectories of the test directory named "nsN", where N is a small integer.
-The latter are special nameservers, mostly used for generating deliberately bad
-responses, located in subdirectories named "ansN" (again, N is an integer).
-In addition to configuration files, these directories should hold the
-appropriate script files as well.
-
-Note that the "N" for a particular test forms a single number space, e.g. if
-there is an "ns2" directory, there cannot be an "ans2" directory as well.
-Ideally, the directory numbers should start at 1 and work upwards.
-
-When running a test, the servers are started using "start.sh" (which is nothing
-more than a wrapper for start.pl).  The options for "start.pl" are documented
-in the header for that file, so will not be repeated here.  In summary, when
-invoked by "legacy.run.sh", start.pl looks for directories named "nsN" or
-"ansN" in the test directory and starts the servers it finds there.
-
-
-"named" Command-Line Options
---
-By default, start.pl starts a "named" server with the following options:
-
-    -c named.conf   Specifies the configuration file to use (so by implication,
-                    each "nsN" nameserver's configuration file must be called
-                    named.conf).
-
-    -d 99           Sets the maximum debugging level.
-
-    -D <name>       The "-D" option sets a string used to identify the
-                    nameserver in a process listing.  In this case, the string
-                    is the name of the subdirectory.
-
-    -g              Runs the server in the foreground and logs everything to
-                    stderr.
-
-    -m record
-                    Turns on these memory usage debugging flags.
-
-    -U 4            Uses four listeners.
-
-      Acquires a lock on this file in the "nsN" directory, so
-                    preventing multiple instances of this named running in this
-                    directory (which could possibly interfere with the test).
-
-All output is sent to a file called "named.run" in the nameserver directory.
-
-The options used to start named can be altered.  There are three ways of doing
-this.  "start.pl" checks the methods in a specific order: if a check succeeds,
-the options are set and any other specification is ignored.  In order, these
-are:
-
-1. Specifying options to "start.sh"/"start.pl" after the name of the test
-directory, e.g.
-
-    sh start.sh reclimit ns1 -- "-c n.conf -d 43"
-
-(This is only really useful when running tests interactively.)
-
-2. Including a file called "named.args" in the "nsN" directory.  If present,
-the contents of the first non-commented, non-blank line of the file are used as
-the named command-line arguments.  The rest of the file is ignored.
-
-3. Tweaking the default command line arguments with "-T" options.  This flag is
-used to alter the behavior of BIND for testing and is not documented in the
-ARM.  The presence of certain files in the "nsN" directory adds flags to
-the default command line (the content of the files is irrelevant - it
-is only the presence that counts):
-
-    named.noaa       Appends "-T noaa" to the command line, which causes
-                     "named" to never set the AA bit in an answer.
-
-    named.dropedns   Adds "-T dropedns" to the command line, which causes
-                     "named" to recognise EDNS options in messages, but drop
-                     messages containing them.
-
-    named.maxudp1460 Adds "-T maxudp1460" to the command line, setting the
-                     maximum UDP size handled by named to 1460.
-
-    named.maxudp512  Adds "-T maxudp512" to the command line, setting the
-                     maximum UDP size handled by named to 512.
-
-    named.noedns     Appends "-T noedns" to the command line, which disables
-                     recognition of EDNS options in messages.
-
-    named.notcp      Adds "-T notcp", which disables TCP in "named".
-
-    named.soa        Appends "-T nosoa" to the command line, which disables
-                     the addition of SOA records to negative responses (or to
-                     the additional section if the response is triggered by RPZ
-                     rewriting).
-
-Starting Other Nameservers
---
-In contrast to "named", nameservers written in Perl or Python (whose script
-file should have the name "ans.pl" or "ans.py" respectively) are started with a
-fixed command line.  In essence, the server is given the address and nothing
-else.
-
-(This is not strictly true: Python servers are provided with the number of the
-query port to use.  Altering the port used by Perl servers currently requires
-creating a template file containing the "@PORT@" token, and having "setup.sh"
-substitute the actual port being used before the test starts.)
-
-
-Stopping Nameservers
---
-As might be expected, the test system stops nameservers with the script
-"stop.sh", which is little more than a wrapper for "stop.pl".  Like "start.pl",
-the options available are listed in the file's header and will not be repeated
-here.
-
-In summary though, the nameservers for a given test, if left running by
-specifying the "-k" flag to "legacy.run.sh" when the test is started, can be
-stopped by the command:
-
-    sh stop.sh <test-name> [server]
-
-... where if the server (e.g. "ns1", "ans3") is not specified, all servers
-associated with the test are stopped.
-
-
-Adding a Test to the System Test Suite
---
-Once a test has been created, the following files should be edited:
-
-* conf.sh.common The name of the test should be added to the PARALLEL_COMMON
-variable.
-
-* Makefile.am The name of the test should be added to the TESTS variable.
-
-(It is likely that a future iteration of the system test suite will remove the
-need to edit multiple files to add a test.)
-
-
-Valgrind
---
-When running system tests, named can be run under Valgrind.  The output from
-Valgrind are sent to per-process files that can be reviewed after the test has
-completed.  To enable this, set the USE_VALGRIND environment variable to
-"helgrind" to run the Helgrind tool, or any other value to run the Memcheck
-tool.  To use "helgrind" effectively, build BIND with --disable-atomic.
-
-Developer Notes for pytest runner
-===
-
-Test discovery and collection
---
-There are two distinct types of system tests. The first is a shell script
-tests.sh containing individual test cases executed sequentially and the
-success/failure is determined by return code. The second type is a regular
-pytest file which contains test functions.
-
-Dealing with the regular pytest files doesn't require any special consideration
-as long as the naming conventions are met. Discovering the tests.sh tests is
-more complicated.
-
-The chosen solution is to add a bit of glue for each system test. For every
-tests.sh, there is an accompanying tests_sh_*.py file that contains a test
-function which utilizes a custom run_tests_sh fixture to call the tests.sh
-script. Other solutions were tried and eventually rejected. While this
-introduces a bit of extra glue, it is the most portable, compatible and least
-complex solution.
-
-Module scope
---
-Pytest fixtures can have a scope. The "module" scope is the most important for
-our use. A module is a python file which contains test functions. Every system
-test directory may contain multiple modules (i.e. tests_*.py files)!
-
-The server setup/teardown is done for a module. Bundling test cases together
-inside a single module may save some resources. However, test cases inside a
-single module can't be executed in parallel.
-
-It is possible to execute different modules defined within a single system test
-directory in parallel. This is possible thanks to executing the tests inside a
-temporary directory and proper port assignment to ensure there won't be any
-conflicts.
-
-Test logging
---
-Each module has a separate log which will be saved as pytest.log.txt in the
-temporary directory in which the test is executed. This log includes messages
-for this module setup/teardown as well as any logging from the tests using the
-`logger` fixture. Logging level DEBUG and above will be present in this log.
-
-In general, any log messages using INFO or above will also be printed out
-during pytest execution. In CI, the pytest output is also saved to
-pytest.out.txt in the bin/tests/system directory.
-
-Parallel execution
---
-As mentioned in the previous section, test cases inside a single module can't
-be executed in parallel. To put it differently, all tests cases inside the same
-module must be performed by the same worker/thread. Otherwise, server
-setup/teardown fixtures won't be shared and runtime issues due to port
-collisions are likely to occur.
-
-Pytest-xdist is used for executing pytest test cases in parallel using the `-n
-N_WORKERS` option. By default, xdist will distribute any test case to any
-worker, which would lead to the issue described above. Therefore, conftest.py
-enforces equivalent of `--dist loadscope` option which ensures that test cases
-within the same (module) scope will be handled by the same worker. Parallelism
-is automatically disabled when xdist.scheduler.loadscope library is not
-available.
-
-$ pytest -n auto
-
-Test selection
---
-It is possible to run just a single pytest test case from any module. Use
-standard pytest facility to select the desired test case(s), i.e. pass a
-sufficiently unique identifier for `-k` parameter. You can also check which
-tests will be executed by using the `--collect-only` flag to debug your `-k`
-expression.
-
-Compatibility with older pytest version
---
-Keep in mind that the pytest runner must work with ancient versions of pytest.
-When implementing new features, it is advisable to check feature support in
-pytest and pytest-xdist in older distributions first.
-
-As a general rule, any changes to the pytest runner need to keep working on all
-platforms in CI that use the pytest runner. As of 2023-01-13, the oldest
-supported version is whatever is available in EL8.
-
-We may need to add more compat code eventually to handle breaking upstream
-changes. For example, using request.fspath attribute is already deprecatred in
-latest pytest.
-
-Maintenance Notes for legacy runner
-===
-This section is aimed at developers maintaining BIND's system test framework.
-
-Notes on Parallel Execution
---
-Although execution of an individual test is controlled by "legacy.run.sh",
-which executes the above shell scripts (and starts the relevant servers) for
-each test, the running of all tests in the test suite is controlled by the
-Makefile.
-
-All system tests are capable of being run in parallel.  For this to work, each
-test needs to use a unique set of ports.  To avoid the need to define which
-tests use which ports (and so risk port clashes as further tests are added),
-the ports are determined by "get_ports.sh", a port broker script which keeps
-track of ports given to each individual system test.
-
-
-Cleaning Up From Tests
---
-When a test is run, up to three different types of files are created:
-
-1. Files generated by the test itself, e.g. output from "dig" and "rndc", are
-stored in the test directory.
-
-2. Files produced by named which may not be cleaned up if named exits
-abnormally, e.g. core files, PID files etc., are stored in the test directory.
-
-If the test fails, all these files are retained.  But if the test succeeds,
-they are cleaned up at different times:
-
-1. Files generated by the test itself are cleaned up by the test's own
-"clean.sh", which is called from "legacy.run.sh".
-
-2. Files that may not be cleaned up if named exits abnormally can be removed
-using the "cleanall.sh" script.
--- a/bin/tests/system/README.md
+++ b/bin/tests/system/README.md
@ -0,0 +1,470 @@
+<!--
+Copyright (C) Internet Systems Consortium, Inc. ("ISC")
+
+SPDX-License-Identifier: MPL-2.0
+
+This Source Code Form is subject to the terms of the Mozilla Public
+License, v. 2.0.  If a copy of the MPL was not distributed with this
+file, you can obtain one at https://mozilla.org/MPL/2.0/.
+
+See the COPYRIGHT file distributed with this work for additional
+information regarding copyright ownership.
+-->
+
+# BIND9 System Test Framework
+
+This directory holds test environments for running bind9 system tests involving
+multiple name servers.
+
+Each system test directory holds a set of scripts and configuration files to
+test different parts of BIND.  The directories are named for the aspect of BIND
+they test, for example:
+
+    dnssec/       DNSSEC tests
+    forward/      Forwarding tests
+    glue/         Glue handling tests
+
+etc.
+
+A system test directory must start with an alphabetic character and may not
+contain any special characters. Only hyphen may be used as a word separator.
+
+Typically each set of tests sets up 2-5 name servers and then performs one or
+more tests against them.  Within the test subdirectory, each name server has a
+separate subdirectory containing its configuration data.  These subdirectories
+are named "nsN" or "ansN" (where N is a number between 1 and 8, e.g. ns1, ans2
+etc.)
+
+The tests are completely self-contained and do not require access to the real
+DNS.  Generally, one of the test servers (usually ns1) is set up as a root
+nameserver and is listed in the hints file of the others.
+
+
+## Running the Tests
+
+### Prerequisites
+
+To run system tests, make sure you have the following dependencies installed:
+
+- python3
+- pytest
+- perl
+- dnspython
+- pytest-xdist (for parallel execution)
+
+Individual system tests might also require additional dependencies. If those
+are missing, the affected tests will be skipped and should produce a message
+specifying what additional prerequisites they expect.
+
+### Network Setup
+
+To enable all servers to run on the same machine, they bind to separate virtual
+IP addresses on the loopback interface.  ns1 runs on 10.53.0.1, ns2 on
+10.53.0.2, etc.  Before running any tests, you must set up these addresses by
+running the command
+
+    sh ifconfig.sh up
+
+as root.  The interfaces can be removed by executing the command:
+
+    sh ifconfig.sh down
+
+... also as root.
+
+The servers use unprivileged ports (above 1024) instead of the usual port 53,
+so they can be run without root privileges once the interfaces have been set
+up.
+
+**Note for MacOS Users**
+
+If you wish to make the interfaces survive across reboots, copy
+org.isc.bind.system and org.isc.bind.system.plist to /Library/LaunchDaemons
+then run
+
+    launchctl load /Library/LaunchDaemons/org.isc.bind.system.plist
+
+... as root.
+
+### Running a Single Test
+
+The recommended way is to use pytest and its test selection facilities:
+
+    pytest -k <test-name-or-pattern>
+
+Using `-k` to specify a pattern allows to run a single pytest test case within
+a system test. E.g. you can use `-k test_sslyze_dot` to execute just the
+`test_sslyze_dot()` function from `doth/tests_sslyze.py`.
+
+However, using the `-k` pattern might pick up more tests than intended. You can
+use the `--collect-only` option to check the list of tests which match you `-k`
+pattern. If you just want to execute all system tests within a single test
+directory, you can also use the utility script:
+
+    ./run.sh system_test_dir_name
+
+### Running All the System Tests
+
+Issuing plain `pytest` command without any argument will execute all tests
+sequentially. To execute them in parallel, ensure you have pytest-xdist
+installed and run:
+
+    pytest [-n <number-of-workers>]
+
+Alternately, using the make command is also supported:
+
+    make [-j numproc] test
+
+### Test Artifacts
+
+Each test module is executed inside a unique temporary directory which contains
+all the artifacts from the test run. If the tests succeed, they are deleted by
+default. To override this behaviour, pass `--noclean` to pytest.
+
+The directory name starts with the system test name, followed by `_tmp_XXXXXX`,
+i.e. `dns64_tmp_r07vei9s` for `dns64` test run. Since this name changes each
+run, a convenience symlink that has a stable name is also created. It points to
+the latest test artifacts directory and has a form of `dns64_sh_dns64`
+(depending on the particular test module).
+
+To clean up the temporary directories and symlinks, run `make clean-local` in
+the system test directory.
+
+The following test artifacts are typically available:
+
+- pytest.log.txt: main log file with test output
+- files generated by the test itself, e.g. output from "dig" and "rndc"
+- files produced by named, other tools or helper scripts
+
+
+## Writing System Tests
+
+### File Overview
+
+Tests are organized into system test directories which may hold one or more
+test modules (python files). Each module may have multiple test cases. The
+system test directories may contain the following standard files:
+
+- `tests_*.py`: These python files are picked up by pytest as modules. If they
+  contain any test functions, they're added to the test suite.
+
+- `setup.sh`: This sets up the preconditions for the tests. Although optional,
+  virtually all tests will require such a file to set up the ports they should
+  use for the test.
+
+- `tests.sh`: Any shell-based tests are located within this file. Runs the
+  actual tests.
+
+- `tests_sh_*.py`: A glue file for the pytest runner for executing shell tests.
+
+- `ns<N>`: These subdirectories contain test name servers that can be queried
+  or can interact with each other. The value of N indicates the address the
+  server listens on: for example, ns2 listens on 10.53.0.2, and ns4 on
+  10.53.0.4. All test servers use an unprivileged port, so they don't need to
+  run as root. These servers log at the highest debug level and the log is
+  captured in the file "named.run".
+
+- `ans<N>`: Like ns[X], but these are simple mock name servers implemented in
+  Perl or Python.  They are generally programmed to misbehave in ways named
+  would not so as to exercise named's ability to interoperate with badly
+  behaved name servers.
+
+### Module Scope
+
+A module is a python file which contains test functions. Every system
+test directory may contain multiple modules (i.e. tests_*.py files).
+
+The server setup/teardown is performed for each module. Bundling test cases
+together inside a single module may save some resources. However, test cases
+inside a single module can't be executed in parallel.
+
+It is possible to execute different modules defined within a single system test
+directory in parallel. This is possible thanks to executing the tests inside a
+temporary directory and proper port assignment to ensure there won't be any
+conflicts.
+
+### Port Usage
+
+In order for the tests to run in parallel, each test requires a unique set of
+ports. This is ensured by the pytest runner, which assigns a unique set of
+ports to each test module.
+
+Inside the python tests, it is possible to use the `ports` fixture to get the
+assigned port numbers. They're also set as environment variables. These include:
+
+- `PORT`: used as the basic dns port
+- `TLSPORT`: used as the port for DNS-over-TLS
+- `HTTPPORT`, `HTTPSPORT`: used as the ports for DNS-over-HTTP
+- `CONTROLPORT`: used as the RNDC control port
+- `EXTRAPORT1` through `EXTRAPORT8`: additional ports that can be used as needed
+
+### Logging
+
+Each module has a separate log which will be saved as pytest.log.txt in the
+temporary directory in which the test is executed. This log includes messages
+for this module setup/teardown as well as any logging from the tests using the
+`logger` fixture. Logging level DEBUG and above will be present in this log.
+
+In general, any log messages using INFO or above will also be printed out
+during pytest execution. In CI, the pytest output is also saved to
+pytest.out.txt in the bin/tests/system directory.
+
+### Adding a Test to the System Test Suite
+
+Once a test has been created it will be automatically picked up by the pytest
+runner if it upholds the convention expected by pytest (especially when it comes
+to naming files and test functions).
+
+However, if a new system test directory is created, it also needs to be added to
+`TESTS` in `Makefile.am`, in order to work with `make check`.
+
+
+## Test Files
+
+### setup.sh
+
+This script is responsible for setting up the configuration files used in the
+test. It is used by both the python and shell tests. It is interpreted just
+before the servers are started up for each test module.
+
+To cope with the varying port number, ports are not hard-coded into
+configuration files (or, for that matter, scripts that emulate nameservers).
+Instead, setup.sh is responsible for editing the configuration files to set the
+port numbers.
+
+To do this, configuration files should be supplied in the form of templates
+containing tokens identifying ports.  The tokens have the same name as the
+environment variables listed above, but are prefixed and suffixed by the "@"
+symbol.  For example, a fragment of a configuration file template might look
+like:
+
+    controls {
+        inet 10.53.0.1 port @CONTROLPORT@ allow { any; } keys { rndc_key; };
+    };
+
+    options {
+        query-source address 10.53.0.1;
+        notify-source 10.53.0.1;
+        transfer-source 10.53.0.1;
+        port @PORT@;
+        allow-new-zones yes;
+    };
+
+setup.sh should copy the template to the desired filename using the
+"copy_setports" shell function defined in "conf.sh", i.e.
+
+    copy_setports ns1/named.conf.in ns1/named.conf
+
+This replaces tokens like @PORT@ with the contents of the environment variables
+listed above. setup.sh should do this for all configuration files required when
+the test starts.
+
+("setup.sh" should also use this method for replacing the tokens in any Perl or
+Python name servers used in the test.)
+
+### tests_*.py
+
+These are test modules containing tests written in python. Every test is a
+function which begins with the name `test_` (according to pytest convention). It
+is possible to pass fixtures to the test function by specifying their name as
+function arguments. Fixtures are used to provide context to the tests, e.g.:
+
+- `ports` is a dictionary with assigned port numbers
+- `logger` is a test-specific logging object
+
+### tests_sh_*.py
+
+These are glue files that are required to execute shell based tests (see below).
+These modules shouldn't contain any python tests (use a separate file instead).
+
+### tests.sh
+
+This is the test file for shell based tests.
+
+
+## Nameservers
+
+As noted earlier, a system test will involve a number of nameservers.  These
+will be either instances of named, or special servers written in a language
+such as Perl or Python.
+
+For the former, the version of "named" being run is that in the "bin/named"
+directory in the tree holding the tests (i.e. if "make test" is being run
+immediately after "make", the version of "named" used is that just built).  The
+configuration files, zone files etc. for these servers are located in
+subdirectories of the test directory named "nsN", where N is a small integer.
+The latter are special nameservers, mostly used for generating deliberately bad
+responses, located in subdirectories named "ansN" (again, N is an integer).
+In addition to configuration files, these directories should hold the
+appropriate script files as well.
+
+Note that the "N" for a particular test forms a single number space, e.g. if
+there is an "ns2" directory, there cannot be an "ans2" directory as well.
+Ideally, the directory numbers should start at 1 and work upwards.
+
+When tests are executed, pytest takes care of the test setup and teardown. It
+looks for any `nsN` and `ansN` directories in the system test directory and
+starts those servers.
+
+### `named` Command-Line Options
+
+By default, `named` server is started with the following options:
+
+    -c named.conf   Specifies the configuration file to use (so by implication,
+                    each "nsN" nameserver's configuration file must be called
+                    named.conf).
+
+    -d 99           Sets the maximum debugging level.
+
+    -D <name>       The "-D" option sets a string used to identify the
+                    nameserver in a process listing.  In this case, the string
+                    is the name of the subdirectory.
+
+    -g              Runs the server in the foreground and logs everything to
+                    stderr.
+
+    -m record
+                    Turns on these memory usage debugging flags.
+
+    -U 4            Uses four listeners.
+
+All output is sent to a file called `named.run` in the nameserver directory.
+
+The options used to start named can be altered. There are a couple ways of
+doing this. `start.pl` checks the methods in a specific order: if a check
+succeeds, the options are set and any other specification is ignored.  In order,
+these are:
+
+1. Specifying options to `start.pl` or `start_server` shell utility function
+   after the name of the test directory, e.g.
+
+   start_server --noclean --restart --port ${PORT} ns1 -- "-D xfer-ns1 -T transferinsecs -T transferslowly"
+
+2. Including a file called "named.args" in the "nsN" directory.  If present,
+the contents of the first non-commented, non-blank line of the file are used as
+the named command-line arguments.  The rest of the file is ignored.
+
+3. Tweaking the default command line arguments with "-T" options.  This flag is
+used to alter the behavior of BIND for testing and is not documented in the
+ARM.  The presence of certain files in the "nsN" directory adds flags to
+the default command line (the content of the files is irrelevant - it
+is only the presence that counts):
+
+    named.noaa       Appends "-T noaa" to the command line, which causes
+                     "named" to never set the AA bit in an answer.
+
+    named.dropedns   Adds "-T dropedns" to the command line, which causes
+                     "named" to recognise EDNS options in messages, but drop
+                     messages containing them.
+
+    named.maxudp1460 Adds "-T maxudp1460" to the command line, setting the
+                     maximum UDP size handled by named to 1460.
+
+    named.maxudp512  Adds "-T maxudp512" to the command line, setting the
+                     maximum UDP size handled by named to 512.
+
+    named.noedns     Appends "-T noedns" to the command line, which disables
+                     recognition of EDNS options in messages.
+
+    named.notcp      Adds "-T notcp", which disables TCP in "named".
+
+    named.soa        Appends "-T nosoa" to the command line, which disables
+                     the addition of SOA records to negative responses (or to
+                     the additional section if the response is triggered by RPZ
+                     rewriting).
+
+### Running Nameservers Interactively
+
+In order to debug the nameservers, you can let pytest perform the nameserver
+setup and interact with the servers before the test starts, or even at specific
+points during the test, using the `--trace` option to drop you into pdb debugger
+which pauses the execution of the tests, while keeping the server state intact:
+
+    pytest -k dns64 --trace
+
+
+## Developer Notes
+
+### Test discovery and collection
+
+There are two distinct types of system tests. The first is a shell script
+tests.sh containing individual test cases executed sequentially and the
+success/failure is determined by return code. The second type is a regular
+pytest file which contains test functions.
+
+Dealing with the regular pytest files doesn't require any special consideration
+as long as the naming conventions are met. Discovering the tests.sh tests is
+more complicated.
+
+The chosen solution is to add a bit of glue for each system test. For every
+tests.sh, there is an accompanying tests_sh_*.py file that contains a test
+function which utilizes a custom run_tests_sh fixture to call the tests.sh
+script. Other solutions were tried and eventually rejected. While this
+introduces a bit of extra glue, it is the most portable, compatible and least
+complex solution.
+
+### Compatibility with older pytest version
+
+Keep in mind that the pytest runner must work with ancient versions of pytest.
+When implementing new features, it is advisable to check feature support in
+pytest and pytest-xdist in older distributions first.
+
+As a general rule, any changes to the pytest runner need to keep working on all
+platforms in CI that use the pytest runner. As of 2023-11-14, the oldest
+supported version is whatever is available in EL8.
+
+We may need to add more compat code eventually to handle breaking upstream
+changes. For example, using request.fspath attribute is already deprecated in
+latest pytest.
+
+### Format of Shell Test Output
+
+Shell-based tests have the following format of output:
+
+    <letter>:<test-name>:<message> [(<number>)]
+
+e.g.
+
+    I:catz:checking that dom1.example is not served by primary (1)
+
+The meanings of the fields are as follows:
+
+<letter>
+This indicates the type of message.  This is one of:
+
+    S   Start of the test
+    A   Start of test (retained for backwards compatibility)
+    T   Start of test (retained for backwards compatibility)
+    E   End of the test
+    I   Information.  A test will typically output many of these messages
+        during its run, indicating test progress.  Note that such a message may
+        be of the form "I:testname:failed", indicating that a sub-test has
+        failed.
+    R   Result.  Each test will result in one such message, which is of the
+        form:
+
+                R:<test-tmpdir>:<result>
+
+        where <result> is one of:
+
+            PASS        The test passed
+            FAIL        The test failed
+            SKIPPED     The test was not run, usually because some
+                        prerequisites required to run the test are missing.
+
+<test-tmpdir>
+This is the name of the temporary test directory from which the message
+emanated, which is also the name of the subdirectory holding the test files.
+
+<message>
+This is text output by the test during its execution.
+
+(<number>)
+If present, this will correlate with a file created by the test.  The tests
+execute commands and route the output of each command to a file.  The name of
+this file depends on the command and the test, but will usually be of the form:
+
+    <command>.out.<suffix><number>
+
+e.g. nsupdate.out.test28, dig.out.q3.  This aids diagnosis of problems by
+allowing the output that caused the problem message to be identified.
+