Rework get_ports.sh to make it not use a lock file

The get_ports.sh script is used for determining the range of ports a given system test should use. It first determines the start of the port range to return (the base port); it can either be specified explicitly by the caller or chosen randomly. Subsequent ports are picked sequentially, starting from the base port. To ensure no single port is used by multiple tests, a state file (get_ports.state) containing the last assigned port is maintained by the script. Concurrent access to the state file is protected by a lock file (get_ports.lock); if one instance of the script holds the lock file while another instance tries to acquire it, the latter retries its attempt to acquire the lock file after sleeping for 1 second; this retry process can be repeated up to 10 times before the script returns an error. There are some problems with this approach: - the sleep period in case of failure to acquire the lock file is fixed, which leads to a "thundering herd" type of problem, where (depending on how processes are scheduled by the operating system) multiple system tests try to acquire the lock file at the same time and subsequently sleep for 1 second, only for the same situation to likely happen the next time around, - the lock file is being locked and then unlocked for every single port assignment made, not just once for the entire range of ports a system test should use; in other words, the lock file is currently locked and unlocked 13 times per system test; this increases the odds of the "thundering herd" problem described above preventing a system test from getting one or more ports assigned before the maximum retry count is reached (assuming multiple system tests are run in parallel); it also enables the range of ports used by a given system test to be non-sequential (which is a rather cosmetic issue, but one that can make log interpretation harder than necessary when test failures are diagnosed), - both issues described above cause unnecessary delays when multiple system tests are started in parallel (due to high lock file contention among the system tests being started), - maintaining a state file requires ensuring proper locking, which complicates the script's source code. Rework the get_ports.sh script so that it assigns non-overlapping port ranges to its callers without using a state file or a lock file: - add a new command line switch, "-t", which takes the name of the system test to assign ports for, - ensure every instance of get_ports.sh knows how many ports all system tests which form the test suite are going to need in total (based on the number of subdirectories found in bin/tests/system/), - in order to ensure all instances of get_ports.sh work on the same global port range (so that no port range collisions happen), a stable (throughout the expected run time of a single system test suite) base port selection method is used instead of the random one; specifically, the base port, unless specified explicitly using the "-p" command line switch, is derived from the number of hours which passed since the Unix Epoch time, - use the name of the system test to assign ports for (passed via the new "-t" command line switch) as a unique index into the global system test range, to ensure all system tests use disjoint port ranges.
2025-08-31 14:35:26 +00:00 · 2021-04-08 11:12:37 +02:00
parent a2484f2673
commit 31e5ca4bd9
4 changed files with 32 additions and 74 deletions
--- a/bin/tests/system/.gitignore
+++ b/bin/tests/system/.gitignore
@@ -13,8 +13,6 @@ named.run
 parallel.mk
 /*.log
 /*.trs
-/get_ports.state
-/get_ports.lock
 /run.sh
 /run.log
 /start.sh
--- a/bin/tests/system/Makefile.am
+++ b/bin/tests/system/Makefile.am
@@ -241,7 +241,4 @@ AM_LOG_FLAGS = -r

 $(TESTS): run.sh

-clean-local:
-	-rm -f get_ports.state get_ports.lock
-
 test-local: check
--- a/bin/tests/system/get_ports.sh
+++ b/bin/tests/system/get_ports.sh
@@ -12,60 +12,15 @@
 # This script is a 'port' broker.  It keeps track of ports given to the
 # individual system subtests, so every test is given a unique port range.

-lockfile=get_ports.lock
-statefile=get_ports.state
+total_tests=$(find . -maxdepth 1 -mindepth 1 -type d | wc -l)
+ports_per_test=20

 port_min=5001
-port_max=32767
-
-get_random() (
-    # shellcheck disable=SC2005,SC2046
-    echo $(dd if=/dev/urandom bs=1 count=2 2>/dev/null | od -tu2 -An) | sed -e 's/^0*//'
-)
-
-get_port() {
-    tries=10
-    port=0
-    while [ "${tries}" -gt 0 ]; do
-	if ( set -o noclobber; echo "$$" > "${lockfile}" ) 2> /dev/null; then
-	    trap 'rm -f "${lockfile}"; exit $?' INT TERM EXIT
-
-	    port=$(cat "${statefile}" 2>/dev/null)
-
-	    if [ -z "${port}" ]; then
-		if [ "$1" -gt 0 ]; then
-		    port="$1"
-		else
-		    port_range=$((port_max-port_min))
-		    port_random=$(get_random)
-		    port=$((port_random%port_range+port_min))
-		fi
-	    fi
-
-	    if [ "$((port+1))" -gt "${port_max}" ]; then
-		port="${port_min}"
-	    fi
-
-	    echo $((port+1)) > get_ports.state
-
-	    # clean up after yourself, and release your trap
-	    rm -f "${lockfile}"
-	    trap - INT TERM EXIT
-
-	    # we have our port
-	    break
-	fi
-	sleep 1
-	tries=$((tries-1))
-    done
-    if [ "$port" -eq 0 ]; then
-	exit 1
-    fi
-    echo "$port"
-}
+port_max=$((32767 - (total_tests * ports_per_test)))

 baseport=0
-while getopts "p:-:" OPT; do
+test_index=0
+while getopts "p:t:-:" OPT; do
    if [ "$OPT" = "-" ] && [ -n "$OPTARG" ]; then
 	OPT="${OPTARG%%=*}"
 	OPTARG="${OPTARG#$OPT}"
@@ -75,24 +30,36 @@ while getopts "p:-:" OPT; do
    # shellcheck disable=SC2214
    case "$OPT" in
 	p | port) baseport=$OPTARG ;;
+	t | test)
+		test_index=$(find . -maxdepth 1 -mindepth 1 -type d | sort | grep -F -x -n "./${OPTARG}" | cut -d: -f1)
+		if [ -z "${test_index}" ]; then
+			echo "Test '${OPTARG}' not found" >&2
+			exit 1
+		fi
+		;;
 	-) break ;;
 	*) echo "invalid option" >&2; exit 1 ;;
    esac
 done

-echo "export PORT=$(get_port "$baseport")"
-echo "export TLSPORT=$(get_port)"
-echo "export HTTPPORT=$(get_port)"
-echo "export HTTPSPORT=$(get_port)"
-echo "export EXTRAPORT1=$(get_port)"
-echo "export EXTRAPORT2=$(get_port)"
-echo "export EXTRAPORT3=$(get_port)"
-echo "export EXTRAPORT4=$(get_port)"
-echo "export EXTRAPORT5=$(get_port)"
-echo "export EXTRAPORT6=$(get_port)"
-echo "export EXTRAPORT7=$(get_port)"
-echo "export EXTRAPORT8=$(get_port)"
-echo "export CONTROLPORT=$(get_port)"
+port_pool_size=$((port_max - port_min))
+if [ "${baseport}" -eq 0 ]; then
+	baseport="$((($(date +%s) / 3600 % port_pool_size) + port_min + (test_index * ports_per_test)))"
+fi
+
+echo "export PORT=$((baseport))"
+echo "export TLSPORT=$((baseport + 1))"
+echo "export HTTPPORT=$((baseport + 2))"
+echo "export HTTPSPORT=$((baseport + 3))"
+echo "export EXTRAPORT1=$((baseport + 4))"
+echo "export EXTRAPORT2=$((baseport + 5))"
+echo "export EXTRAPORT3=$((baseport + 6))"
+echo "export EXTRAPORT4=$((baseport + 7))"
+echo "export EXTRAPORT5=$((baseport + 8))"
+echo "export EXTRAPORT6=$((baseport + 9))"
+echo "export EXTRAPORT7=$((baseport + 10))"
+echo "export EXTRAPORT8=$((baseport + 11))"
+echo "export CONTROLPORT=$((baseport + 12))"

 # Local Variables:
 # sh-basic-offset: 4
--- a/bin/tests/system/run.sh.in
+++ b/bin/tests/system/run.sh.in
@@ -110,12 +110,8 @@ if [ ! -d "${systest}" ]; then
 fi


-# Get the first 10 ports in the set (it is assumed that each test has access
-# to ten or more ports): the query port, the control port and eight extra
-# ports.  Since the lowest numbered port (specified in the command line)
-# will usually be a multiple of 10, the names are chosen so that if this is
-# true, the last digit of EXTRAPORTn is "n".
-eval "$(${srcdir}/get_ports.sh -p "$baseport")"
+# Determine which ports to use for this system test.
+eval "$(cd "${srcdir}" && ./get_ports.sh -p "$baseport" -t "$systest")"

 restart=false