2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00

Documentation: Add USDT documentation and bpftrace example.

Add the USDT documentation and a bpftrace example using the
bridge run USDT probes.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit is contained in:
Eelco Chaudron 2021-12-22 10:17:44 +01:00 committed by Ilya Maximets
parent 512fab8f21
commit ff4c712d45
6 changed files with 399 additions and 6 deletions

View File

@ -57,6 +57,7 @@ DOC_SOURCE = \
Documentation/topics/porting.rst \ Documentation/topics/porting.rst \
Documentation/topics/record-replay.rst \ Documentation/topics/record-replay.rst \
Documentation/topics/tracing.rst \ Documentation/topics/tracing.rst \
Documentation/topics/usdt-probes.rst \
Documentation/topics/userspace-tso.rst \ Documentation/topics/userspace-tso.rst \
Documentation/topics/userspace-tx-steering.rst \ Documentation/topics/userspace-tx-steering.rst \
Documentation/topics/windows.rst \ Documentation/topics/windows.rst \

View File

@ -56,3 +56,4 @@ OVS
idl-compound-indexes idl-compound-indexes
ovs-extensions ovs-extensions
userspace-tx-steering userspace-tx-steering
usdt-probes

View File

@ -0,0 +1,269 @@
..
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
Convention for heading levels in Open vSwitch documentation:
======= Heading 0 (reserved for the title in a document)
------- Heading 1
~~~~~~~ Heading 2
+++++++ Heading 3
''''''' Heading 4
Avoid deeper levels because they do not render well.
=============================================
User Statically-Defined Tracing (USDT) probes
=============================================
Sometimes it's desired to troubleshoot one of OVS's components in the field.
One of the techniques used is to add dynamic tracepoints, for example using
perf_. However, the desired dynamic tracepoint and/or the desired variable,
might not be available due to compiler optimizations.
In this case, a well-thought-off, static tracepoint could be permanently added,
so it's always available. For OVS we use the DTrace probe macro's, which have
little to no overhead when disabled. Various tools exist to enable them. See
some examples below.
Compiling with USDT probes enabled
----------------------------------
Since USDT probes are compiled out by default, a compile-time option is
available to include them. To add the probes to the generated code, use the
following configure option ::
$ ./configure --enable-usdt-probes
The following line should be seen in the configure output when the above option
is used ::
checking whether USDT probes are enabled... yes
Listing available probes
------------------------
There are various ways to display USDT probes available in a userspace
application. Here we show three examples. All assuming ovs-vswitchd is in the
search path with USDT probes enabled:
You can use the **perf** tool as follows ::
$ perf buildid-cache --add $(which ovs-vswitchd)
$ perf list | grep sdt_
sdt_main:poll_block [SDT event]
sdt_main:run_start [SDT event]
You can use the bpftrace_ tool ::
# bpftrace -l "usdt:$(which ovs-vswitchd):*"
usdt:/usr/sbin/ovs-vswitchd:main:poll_block
usdt:/usr/sbin/ovs-vswitchd:main:run_start
.. note::
If you execute this on a running process,
``bpftrace -lp $(pidof ovs-vswitchd) "usdt:*"`` , it will list all USDT
events, i.e., also the ones available in the used shared libraries.
Finally, you can use the **tplist** tool which is part of the bcc_ framework ::
$ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd)
b'main':b'poll_block' [sema 0x0]
location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc
b'main':b'run_start' [sema 0x0]
location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6
Using probes
------------
We will use the OVS sandbox environment in combination with the probes shown
above to try out some of the available trace tools. To start up the virtual
environment use the ``make sandbox`` command. In addition we have to create
a bridge to kick of the main bridge loop ::
$ ovs-vsctl add-br test_bridge
$ ovs-vsctl show
055acdca-2f0c-4f6e-b542-f4b6d2c44e08
Bridge test_bridge
Port test_bridge
Interface test_bridge
type: internal
perf
~~~~
Perf is using Linux uprobe based event tracing to for capturing the events.
To enable the main:\* probes as displayed above and take an actual trace, you
need to execute the following sequence of perf commands ::
# perf buildid-cache --add $(which ovs-vswitchd)
# perf list | grep sdt_
sdt_main:poll_block [SDT event]
sdt_main:run_start [SDT event]
# perf probe --add=sdt_main:poll_block --add=sdt_main:run_start
Added new events:
sdt_main:poll_block (on %poll_block in /usr/sbin/ovs-vswitchd)
sdt_main:run_start (on %run_start in /usr/sbin/ovs-vswitchd)
You can now use it in all perf tools, such as:
perf record -e sdt_main:run_start -aR sleep 1
# perf record -e sdt_main:run_start -e sdt_main:poll_block \
-p $(pidof ovs-vswitchd) sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ]
# perf script
ovs-vswitchd 8576 [011] 21031.340433: sdt_main:run_start: (407ff6)
ovs-vswitchd 8576 [011] 21031.340516: sdt_main:poll_block: (407fdc)
ovs-vswitchd 8576 [011] 21031.841726: sdt_main:run_start: (407ff6)
ovs-vswitchd 8576 [011] 21031.842088: sdt_main:poll_block: (407fdc)
...
Note that the above examples works with the sandbox environment, so make sure
you execute the above command while in the sandbox shell!
There are a lot more options available with perf, for example, the
``--call-graph dwarf`` option, which would give you a call graph in the
``perf script`` output. See the perf documentation for more information.
One other interesting feature is that the perf data can be converted for use
by the trace visualizer `Trace Compass`_. This can be done using the
``--all --to-ctf`` option to the ``perf data convert`` tool.
bpftrace
~~~~~~~~
bpftrace is a high-level tracing language based on eBPF, which can be used to
script USDT probes. Here we will show a simple one-liner to display the
USDT probes being hit. However, the script section below reference some more
advanced bpftrace scripts.
This is a simple bpftrace one-liner to show all ``main:*`` USDT probes ::
# bpftrace -p $(pidof ovs-vswitchd) -e \
'usdt::main:* { printf("%s %u [%u] %u %s\n",
comm, pid, cpu, elapsed, probe); }'
Attaching 2 probes...
ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start
ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block
ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start
ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block
bcc
~~~
The BPF Compiler Collection (BCC) is a set of tools and scripts that also use
eBPF for tracing. The example below uses the ``trace`` tool to show the events
while they are being generated ::
# /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \
'u::main:run_start' 'u::main:poll_block'
TIME PID TID COMM FUNC
15:49:06 8576 8576 ovs-vswitchd main:run_start
15:49:06 8576 8576 ovs-vswitchd main:poll_block
15:49:06 8576 8576 ovs-vswitchd main:run_start
15:49:06 8576 8576 ovs-vswitchd main:poll_block
^C
Scripts
-------
To not have to re-invent the wheel when trying to debug complex OVS issues, a
set of scripts are provided in the source repository. They are located in the
``utilities/usdt-scripts/`` directory, and each script contains detailed
information on how they should be used, and what information they provide.
Available probes
----------------
The next sections describes all the available probes, their use case, and if
used in any script, which one. Any new probes being added to OVS should get
their own section. See the below "Adding your own probes" section for the
used naming convention.
Available probes in ``ovs_vswitchd``:
- main:poll_block
- main:run_start
probe main:run_start
~~~~~~~~~~~~~~~~~~~~
**Description**:
The ovs-vswitchd's main process contains a loop that runs every time some work
needs to be done. This probe gets triggered every time the loop starts from the
beginning. See also the ``main:poll_block`` probe below.
**Arguments**:
*None*
**Script references**:
- ``utilities/usdt-scripts/bridge_loop.bt``
probe main:poll_block
~~~~~~~~~~~~~~~~~~~~~
**Description**:
The ovs-vswitchd's main process contains a loop that runs every time some work
needs to be done. This probe gets triggered every time the loop is done, and
it's about to wait for being re-started by a poll_block() call returning.
See also the ``main:run_start`` probe above.
**Arguments**:
*None*
**Script references**:
- ``utilities/usdt-scripts/bridge_loop.bt``
Adding your own probes
----------------------
Adding your own probes is as simple as adding the ``OVS_USDT_PROBE()`` macro
to the code. It's similar to the ``DTRACE_PROBExx`` macro's with the difference
that it does automatically determine the number of optional arguments.
The macro requires at least two arguments. The first one being the *provider*,
and the second one being the *name*. To keep some consistency with the probe
naming, please use the following convention. The *provider* should be the
function name, and the *name* should be the name of the tracepoint. If you do
function entry and exit like probes, please use ``entry`` and ``exit``.
If, for some reason, you do not like to use the function name as a *provider*,
please prefix it with ``__``, so we know it's not a function name.
The remaining parameters, up to 10, can be variables, pointers, etc., that
might be of interest to capture at this point in the code. Note that the
provided variables can cause the compiler to be less effective in optimizing.
.. _perf : https://developers.redhat.com/blog/2020/05/29/debugging-vhost-user-tx-contention-in-open-vswitch#
.. _bpftrace : https://github.com/iovisor/bpftrace
.. _bcc : https://github.com/iovisor/bcc
.. _Trace Compass : https://www.eclipse.org/tracecompass/

1
NEWS
View File

@ -43,6 +43,7 @@ Post-v2.16.0
* Encap & Decap action support for MPLS packet type. * Encap & Decap action support for MPLS packet type.
- Ingress policing on Linux now uses 'matchall' classifier instead of - Ingress policing on Linux now uses 'matchall' classifier instead of
'basic', if available. 'basic', if available.
- Add User Statically-Defined Tracing (USDT) probe framework support.
v2.16.0 - 16 Aug 2021 v2.16.0 - 16 Aug 2021

View File

@ -56,12 +56,13 @@ EXTRA_DIST += \
utilities/ovs-vlan-test.in \ utilities/ovs-vlan-test.in \
utilities/ovs-vsctl-bashcomp.bash \ utilities/ovs-vsctl-bashcomp.bash \
utilities/checkpatch.py \ utilities/checkpatch.py \
utilities/docker/Makefile \ utilities/docker/Makefile \
utilities/docker/ovs-override.conf \ utilities/docker/ovs-override.conf \
utilities/docker/start-ovs \ utilities/docker/start-ovs \
utilities/docker/create_ovs_db.sh \ utilities/docker/create_ovs_db.sh \
utilities/docker/debian/Dockerfile \ utilities/docker/debian/Dockerfile \
utilities/docker/debian/build-kernel-modules.sh utilities/docker/debian/build-kernel-modules.sh \
utilities/usdt-scripts/bridge_loop.bt
MAN_ROOTS += \ MAN_ROOTS += \
utilities/ovs-testcontroller.8.in \ utilities/ovs-testcontroller.8.in \
utilities/ovs-dpctl.8.in \ utilities/ovs-dpctl.8.in \

View File

@ -0,0 +1,120 @@
#!/usr/bin/env bpftrace
/*
* Copyright (c) 2021 Red Hat, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*
* Script information:
* -------------------
* bridge_loop.bt uses the USDTs in the ovs-vswitchd's main bridge loop
* to report how long it spends running a single loop, as well as the time
* it spends waiting in the poll_loop(). Once done, it will also print a
* histogram for the time spent.
*
* The following is an example of how to use the script on the running
* ovs-vswitchd process:
*
* $ bridge_loop.bt -p `pgrep -n ovs-vswitchd`
* Attaching 4 probes...
* --------------------------------------------------------------
* Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.
* --------------------------------------------------------------
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000230706
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.501854292
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000266445
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499750288
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000254856
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499944280
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000267390
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.093566288
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000316774
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.406697754
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000264505
* ^C
* --------------------------------------------------------------
* Showing run time histograms in micro seconds:
* --------------------------------------------------------------
*
* @bridge_run_time:
* [0, 1000) 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
*
*
* @poll_block_wait_time:
* [90000, 120000) 1 |@@@@@@@@@@@@@@@@@ |
* [120000, 150000) 0 | |
* [150000, 180000) 0 | |
* [180000, 210000) 0 | |
* [210000, 240000) 0 | |
* [240000, 270000) 0 | |
* [270000, 300000) 0 | |
* [300000, 330000) 0 | |
* [330000, 360000) 0 | |
* [360000, 390000) 0 | |
* [390000, 420000) 1 |@@@@@@@@@@@@@@@@@ |
* [420000, 450000) 0 | |
* [450000, 480000) 0 | |
* [480000, 510000) 3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
*/
BEGIN
{
printf("--------------------------------------------------------------\n");
printf("Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.\n");
printf("--------------------------------------------------------------\n");
}
usdt::main:poll_block
{
@pb_start[tid] = nsecs;
if (@rs_start[tid] != 0) {
$delta = nsecs - @rs_start[tid];
printf("- [%d@%s] bridge run loop time : %u:%2.2u:%2.2u.%9.9u\n",
tid, comm,
$delta / 3600 / 1000000000,
$delta / 60 / 1000000000 % 60,
$delta / 1000000000 % 60,
$delta % 1000000000);
@bridge_run_time = lhist($delta / 1000, 0, 1000000, 1000);
}
}
usdt::main:run_start
{
@rs_start[tid] = nsecs;
if (@pb_start[tid] != 0) {
$delta = nsecs - @pb_start[tid];
printf("- [%d@%s] poll_block() wait time: %u:%2.2u:%2.2u.%9.9u\n",
tid, comm,
$delta / 3600 / 1000000000,
$delta / 60 / 1000000000 % 60,
$delta / 1000000000 % 60,
$delta % 1000000000);
@poll_block_wait_time = lhist($delta / 1000, 0, 30000000, 30000);
}
}
END
{
clear(@rs_start);
clear(@pb_start);
printf("\n");
printf("--------------------------------------------------------------\n");
printf("Showing run time histograms in micro seconds:\n");
printf("--------------------------------------------------------------");
}