mirror of
https://gitlab.isc.org/isc-projects/bind9
synced 2025-08-30 14:07:59 +00:00
Update DNS Shotgun integration into Gitlab CI
@@ -1,92 +1 @@
Work in progress, please report problems to @tkrizek (@pspacek)

Usage
=====

Decide what you want to test
------------------------------

Configurable parameters:
- `SHOTGUN_TEST_VERSION`
  - list of versions to be tested
  - at least one value must be provided
  - accepts a commit, branch, or tag name
- `SHOTGUN_SCENARIO` - `udp` (default), `tcp`, `dot`, `doh`
- `SHOTGUN_TRAFFIC_MULTIPLIER` - simulated load; if unsure, leave the default value "10", which is roughly 94 k QPS
- `SHOTGUN_DURATION` - test duration in seconds; the first 60 seconds (default) are the most interesting because we always start with a fresh instance and an empty cache
- `SHOTGUN_ROUNDS` - number of test rounds; three (default) are recommended so we are not fooled by noise
- `SHOTGUN_FLAMEGRAPH` - whether a flamegraph should be created; don't use it if the test is longer than 60 seconds, otherwise the runner will run out of space and the job will fail
- `SHOTGUN_SERVER_THREADS` - default 16 is also the maximum; can be used to limit online CPUs to test performance with fewer cores
- `SHOTGUN_CI_IMAGE_TAG` - leave it as `latest` unless you know what you're doing

Parameters `SHOTGUN_TEST_VERSION`, `SHOTGUN_SCENARIO`, and `SHOTGUN_TRAFFIC_MULTIPLIER` accept either a single string or a list of strings in Python syntax: `['main', 'v9_16_15', '1234567abcdef']`. If more than one parameter contains a list, the Cartesian product of all provided values is tested.
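
To make the Cartesian product concrete, here is a minimal sketch of how such parameters could be expanded into individual test jobs. It is not the code used by the pipeline; the helper names `parse_param` and `expand_jobs` are hypothetical, and the assumption is that a value is either a bare string or a list in Python syntax.

```python
import ast
import itertools


def parse_param(raw):
    """Return a list of string values from a bare string or a Python-syntax list.

    Hypothetical helper for illustration only.
    """
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Bare strings such as "main" or "udp" are not Python literals.
        return [raw]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    # A single literal such as "10": keep the text exactly as the user typed it.
    return [raw]


def expand_jobs(versions, scenarios, multipliers):
    """Return every (version, scenario, multiplier) combination to be tested."""
    return list(
        itertools.product(
            parse_param(versions),
            parse_param(scenarios),
            parse_param(multipliers),
        )
    )


jobs = expand_jobs("['main', 'v9_16_15', '1234567abcdef']", "udp", "[10, 12]")
for job in jobs:
    print(job)
```

With three versions, one scenario, and two multipliers, this yields 3 x 1 x 2 = 6 combinations, each of which is then repeated `SHOTGUN_ROUNDS` times.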

Example #1: Determining maximum load for one version
- `SHOTGUN_TEST_VERSION` = `1234567abcdef`
- `SHOTGUN_SCENARIO` = `udp`
- `SHOTGUN_TRAFFIC_MULTIPLIER` = `[10, 12, 14]`
- `SHOTGUN_DURATION` = `60`
- `SHOTGUN_ROUNDS` = `3`

Run the specified version and fire at it over UDP "10 x base load", "12 x base load", and "14 x base load". Repeat three times for each load value. Produces 9 (3 loads x 3 runs) charts with response rates. Good for finding the maximum load by determining when the response rate starts dropping.
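
If you prefer to read the response rates off the charts and reason about them programmatically, the "when does it start dropping" decision can be sketched as below. This is only an illustration: the function name, the 0.99 threshold, and the numbers in the usage example are all made up and are not produced by DNS Shotgun.

```python
def max_sustained_multiplier(response_rates, threshold=0.99):
    """Return the highest multiplier at which every round kept up.

    response_rates maps a traffic multiplier to the response rates
    (fraction of answered queries, 0.0 to 1.0) observed in each round.
    Multipliers are checked in increasing order and the search stops at
    the first one where any round falls below the threshold.
    Returns None if even the lowest multiplier did not keep up.
    """
    best = None
    for multiplier in sorted(response_rates):
        if min(response_rates[multiplier]) < threshold:
            break
        best = multiplier
    return best


# Hypothetical values for illustration only, NOT real measurements.
example_rates = {
    10: [0.999, 0.998, 0.999],
    12: [0.997, 0.995, 0.996],
    14: [0.910, 0.880, 0.930],
}
print(max_sustained_multiplier(example_rates))
```

Using the worst round per multiplier is a deliberately conservative choice, since individual runs against the live Internet are noisy.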

Example #2: Comparing performance between versions
- `SHOTGUN_TEST_VERSION` = `['main', 'v9_16_15', '1234567abcdef']`
- `SHOTGUN_SCENARIO` = `udp`
- `SHOTGUN_TRAFFIC_MULTIPLIER` = `10`
- `SHOTGUN_DURATION` = `60`
- `SHOTGUN_ROUNDS` = `3`

Run each version three times, and fire "10 x base load" at it over UDP (roughly 10 x 9.4 k QPS). Produces 9 (3 versions x 3 runs) charts with response rates. Good for comparison between versions. Assumes the load (traffic multiplier) is set to a value where at least one version is able to keep up; otherwise the results would be hard to interpret.
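
The arithmetic behind the load and the number of charts is simple enough to check by hand, but a small sketch may help when planning bigger test matrices. The base load of roughly 9.4 k QPS is taken from the text above; the function names are hypothetical, and one chart per run is assumed.

```python
BASE_LOAD_QPS = 9_400  # approximate base load quoted above ("roughly 9.4 k QPS")


def approx_qps(traffic_multiplier):
    """Approximate query rate for a given SHOTGUN_TRAFFIC_MULTIPLIER."""
    return traffic_multiplier * BASE_LOAD_QPS


def chart_count(n_versions, n_scenarios, n_multipliers, rounds):
    """Number of response-rate charts, assuming one chart per run."""
    return n_versions * n_scenarios * n_multipliers * rounds


print(approx_qps(10))           # default multiplier
print(chart_count(3, 1, 1, 3))  # Example #2: 3 versions x 3 rounds
```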

Start test
--------------
Go to https://gitlab.isc.org/isc-projects/bind9-shotgun-ci/-/pipelines/new, fill in the parameters, and click on `Run pipeline`.

Getting results
----------------
Go to https://gitlab.isc.org/isc-projects/bind9-shotgun-ci/-/pipelines and find your new pipeline. The pipeline dynamically creates a "[child pipeline](https://docs.gitlab.com/ee/ci/parent_child_pipelines.html)" that contains the actual jobs and results. To access it, click on the performance or downstream job.

Wait until all jobs are finished. Then download all artifacts from the `postproc` job; it has all the charts and (optionally) profiling flame charts for each run.

The charts in the artifacts are various representations of the same data. Depending on what you are trying to find out, it might be beneficial either to look at individual runs and study rcodes, or to look at summary charts for all runs without rcodes.

Interpretation
--------------
See the [video](https://ripe79.ripe.net/archives/video/198/) and [slides](https://ripe79.ripe.net/presentations/45-benchmarking.pdf) for an explanation of the basic principles.

Obviously, DNS Shotgun does not tell you "why" something is happening; that's left to your imagination.

Bear in mind that we are testing against the live Internet, so results are noisy and can change over time. Do not compare "old" and "new" results; it's better to retest than to chase ghosts of non-existent performance regressions.

Obviously, you can also ask @tkrizek or @pspacek :-)

Implementation overview
============
1. User starts Shotgun test job in GitLab CI with tag `linux-benchmarking`
2. TODO: describe child pipeline generation magic
3. GitLab directs the job to a dedicated Docker executor on a VM running inside AWS.
   - The runner VM needs a couple of [configuration tricks](https://gitlab.isc.org/isc-private/devops/-/merge_requests/8) to get IPv6 to work inside a Docker container running on an AWS VM
4. The Docker executor starts the `.gitlab-ci.yaml` script inside the dedicated Docker image [shotgun-controller](https://gitlab.isc.org/isc-projects/images/-/merge_requests/114)
5. [Script `shotgun_aws.py`](https://gitlab.isc.org/isc-private/bind-qa/-/merge_requests/35) creates two ephemeral VMs in AWS, dedicated to this test:
   - To do that, the script needs AWS permissions to manage VMs and related resources. The runner machine is associated with the special AWS IAM role `arn:aws:iam::766250944489:role/al2-amd64-bind9-resolver-benchmarking-IAMRole-OKGJY309ABZR`.
   - VMs use an AMI (VM image) [dedicated for DNS Shotgun tests](https://gitlab.isc.org/isc-projects/images/-/merge_requests/113)
   - To avoid hardcoding values into `shotgun_aws.py`, the script uses an AWS Launch Template (ID `lt-0161f30b78633fdb2`) which can be modified in the AWS console
   - New VMs are tagged with a timestamp in the `isc:remove_after` tag, which denotes the deadline by which the job must have finished and after which all its resources in AWS can be deleted. This is intended as a guard against unlimited spending when a GitLab CI job is cancelled before it finishes its teardown phase.
   - The cleanup script [`cleanup_ephemeral.py`](https://gitlab.isc.org/isc-private/bind-qa/-/merge_requests/35) is run from a cron job on the GitLab CI runner machine.
6. When the VMs are ready, `shotgun_aws.py` executes an [Ansible playbook](https://gitlab.nic.cz/knot/resolver-benchmarking/) which orchestrates the test on the two VMs.
7. The Ansible playbook connects to the test VMs using SSH and runs DNS Shotgun on one machine and the resolver under test on the other machine.
   - VMs act as Docker hosts, i.e. Shotgun and the resolver run inside Docker containers
   - Docker network isolation is disabled using `--network=host`, so the containers share the host's network stack
   - The VM running DNS Shotgun has an extra partition with PCAPs (AWS snapshot ID `snap-0ae93969448ffb83f`)
8. When the test is finished, the Ansible playbook gathers test results and stores them inside the Docker container executed directly by GitLab CI.
9. TODO: Describe result gathering & postprocessing magic.
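
The `isc:remove_after` guard described above boils down to comparing a tag value with the current time. The sketch below shows that check in isolation. It is not the code of `cleanup_ephemeral.py`: the function name is hypothetical, and the tag is assumed to hold an ISO 8601 timestamp, which the real scripts may or may not use.

```python
from datetime import datetime, timezone

REMOVE_AFTER_TAG = "isc:remove_after"


def is_expired(tags, now=None):
    """Return True if a resource's removal deadline has passed.

    tags is a plain dict of tag name to tag value. The timestamp format
    (ISO 8601) is an assumption made for this sketch.
    """
    deadline_raw = tags.get(REMOVE_AFTER_TAG)
    if deadline_raw is None:
        # Resources without the tag are never considered for removal.
        return False
    deadline = datetime.fromisoformat(deadline_raw)
    if deadline.tzinfo is None:
        # Treat naive timestamps as UTC (another assumption).
        deadline = deadline.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now > deadline


vm_tags = {REMOVE_AFTER_TAG: "2025-01-01T12:00:00+00:00"}
print(is_expired(vm_tags, now=datetime(2025, 1, 1, 13, 0, tzinfo=timezone.utc)))
```

Skipping untagged resources keeps a periodic cleanup job from touching anything it did not create, at the cost of never cleaning up resources whose tagging failed.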

Moved to https://gitlab.isc.org/isc-projects/bind9-shotgun-ci/#usage