mirror of
https://github.com/openvswitch/ovs
synced 2025-08-22 09:58:01 +00:00
Documentation: Add USDT documentation and bpftrace example.
Add the USDT documentation and a bpftrace example using the bridge run USDT probes. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit is contained in:
parent
512fab8f21
commit
ff4c712d45
@ -57,6 +57,7 @@ DOC_SOURCE = \
|
||||
Documentation/topics/porting.rst \
|
||||
Documentation/topics/record-replay.rst \
|
||||
Documentation/topics/tracing.rst \
|
||||
Documentation/topics/usdt-probes.rst \
|
||||
Documentation/topics/userspace-tso.rst \
|
||||
Documentation/topics/userspace-tx-steering.rst \
|
||||
Documentation/topics/windows.rst \
|
||||
|
@ -56,3 +56,4 @@ OVS
|
||||
idl-compound-indexes
|
||||
ovs-extensions
|
||||
userspace-tx-steering
|
||||
usdt-probes
|
||||
|
269
Documentation/topics/usdt-probes.rst
Normal file
269
Documentation/topics/usdt-probes.rst
Normal file
@ -0,0 +1,269 @@
|
||||
..
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may
|
||||
not use this file except in compliance with the License. You may obtain
|
||||
a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
|
||||
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||||
License for the specific language governing permissions and limitations
|
||||
under the License.
|
||||
|
||||
Convention for heading levels in Open vSwitch documentation:
|
||||
|
||||
======= Heading 0 (reserved for the title in a document)
|
||||
------- Heading 1
|
||||
~~~~~~~ Heading 2
|
||||
+++++++ Heading 3
|
||||
''''''' Heading 4
|
||||
|
||||
Avoid deeper levels because they do not render well.
|
||||
|
||||
=============================================
|
||||
User Statically-Defined Tracing (USDT) probes
|
||||
=============================================
|
||||
|
||||
Sometimes it's desired to troubleshoot one of OVS's components in the field.
|
||||
One of the techniques used is to add dynamic tracepoints, for example using
|
||||
perf_. However, the desired dynamic tracepoint and/or the desired variable,
|
||||
might not be available due to compiler optimizations.
|
||||
|
||||
In this case, a well-thought-off, static tracepoint could be permanently added,
|
||||
so it's always available. For OVS we use the DTrace probe macro's, which have
|
||||
little to no overhead when disabled. Various tools exist to enable them. See
|
||||
some examples below.
|
||||
|
||||
|
||||
Compiling with USDT probes enabled
|
||||
----------------------------------
|
||||
|
||||
Since USDT probes are compiled out by default, a compile-time option is
|
||||
available to include them. To add the probes to the generated code, use the
|
||||
following configure option ::
|
||||
|
||||
$ ./configure --enable-usdt-probes
|
||||
|
||||
The following line should be seen in the configure output when the above option
|
||||
is used ::
|
||||
|
||||
checking whether USDT probes are enabled... yes
|
||||
|
||||
|
||||
Listing available probes
|
||||
------------------------
|
||||
|
||||
There are various ways to display USDT probes available in a userspace
|
||||
application. Here we show three examples. All assuming ovs-vswitchd is in the
|
||||
search path with USDT probes enabled:
|
||||
|
||||
You can use the **perf** tool as follows ::
|
||||
|
||||
$ perf buildid-cache --add $(which ovs-vswitchd)
|
||||
$ perf list | grep sdt_
|
||||
sdt_main:poll_block [SDT event]
|
||||
sdt_main:run_start [SDT event]
|
||||
|
||||
You can use the bpftrace_ tool ::
|
||||
|
||||
# bpftrace -l "usdt:$(which ovs-vswitchd):*"
|
||||
usdt:/usr/sbin/ovs-vswitchd:main:poll_block
|
||||
usdt:/usr/sbin/ovs-vswitchd:main:run_start
|
||||
|
||||
.. note::
|
||||
|
||||
If you execute this on a running process,
|
||||
``bpftrace -lp $(pidof ovs-vswitchd) "usdt:*"`` , it will list all USDT
|
||||
events, i.e., also the ones available in the used shared libraries.
|
||||
|
||||
Finally, you can use the **tplist** tool which is part of the bcc_ framework ::
|
||||
|
||||
$ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd)
|
||||
b'main':b'poll_block' [sema 0x0]
|
||||
location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc
|
||||
b'main':b'run_start' [sema 0x0]
|
||||
location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6
|
||||
|
||||
|
||||
Using probes
|
||||
------------
|
||||
|
||||
We will use the OVS sandbox environment in combination with the probes shown
|
||||
above to try out some of the available trace tools. To start up the virtual
|
||||
environment use the ``make sandbox`` command. In addition we have to create
|
||||
a bridge to kick of the main bridge loop ::
|
||||
|
||||
$ ovs-vsctl add-br test_bridge
|
||||
$ ovs-vsctl show
|
||||
055acdca-2f0c-4f6e-b542-f4b6d2c44e08
|
||||
Bridge test_bridge
|
||||
Port test_bridge
|
||||
Interface test_bridge
|
||||
type: internal
|
||||
|
||||
perf
|
||||
~~~~
|
||||
|
||||
Perf is using Linux uprobe based event tracing to for capturing the events.
|
||||
To enable the main:\* probes as displayed above and take an actual trace, you
|
||||
need to execute the following sequence of perf commands ::
|
||||
|
||||
# perf buildid-cache --add $(which ovs-vswitchd)
|
||||
|
||||
# perf list | grep sdt_
|
||||
sdt_main:poll_block [SDT event]
|
||||
sdt_main:run_start [SDT event]
|
||||
|
||||
# perf probe --add=sdt_main:poll_block --add=sdt_main:run_start
|
||||
Added new events:
|
||||
sdt_main:poll_block (on %poll_block in /usr/sbin/ovs-vswitchd)
|
||||
sdt_main:run_start (on %run_start in /usr/sbin/ovs-vswitchd)
|
||||
|
||||
You can now use it in all perf tools, such as:
|
||||
|
||||
perf record -e sdt_main:run_start -aR sleep 1
|
||||
|
||||
# perf record -e sdt_main:run_start -e sdt_main:poll_block \
|
||||
-p $(pidof ovs-vswitchd) sleep 30
|
||||
[ perf record: Woken up 1 times to write data ]
|
||||
[ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ]
|
||||
|
||||
# perf script
|
||||
ovs-vswitchd 8576 [011] 21031.340433: sdt_main:run_start: (407ff6)
|
||||
ovs-vswitchd 8576 [011] 21031.340516: sdt_main:poll_block: (407fdc)
|
||||
ovs-vswitchd 8576 [011] 21031.841726: sdt_main:run_start: (407ff6)
|
||||
ovs-vswitchd 8576 [011] 21031.842088: sdt_main:poll_block: (407fdc)
|
||||
...
|
||||
|
||||
Note that the above examples works with the sandbox environment, so make sure
|
||||
you execute the above command while in the sandbox shell!
|
||||
|
||||
There are a lot more options available with perf, for example, the
|
||||
``--call-graph dwarf`` option, which would give you a call graph in the
|
||||
``perf script`` output. See the perf documentation for more information.
|
||||
|
||||
One other interesting feature is that the perf data can be converted for use
|
||||
by the trace visualizer `Trace Compass`_. This can be done using the
|
||||
``--all --to-ctf`` option to the ``perf data convert`` tool.
|
||||
|
||||
|
||||
bpftrace
|
||||
~~~~~~~~
|
||||
|
||||
bpftrace is a high-level tracing language based on eBPF, which can be used to
|
||||
script USDT probes. Here we will show a simple one-liner to display the
|
||||
USDT probes being hit. However, the script section below reference some more
|
||||
advanced bpftrace scripts.
|
||||
|
||||
This is a simple bpftrace one-liner to show all ``main:*`` USDT probes ::
|
||||
|
||||
# bpftrace -p $(pidof ovs-vswitchd) -e \
|
||||
'usdt::main:* { printf("%s %u [%u] %u %s\n",
|
||||
comm, pid, cpu, elapsed, probe); }'
|
||||
Attaching 2 probes...
|
||||
ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start
|
||||
ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block
|
||||
ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start
|
||||
ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block
|
||||
|
||||
|
||||
bcc
|
||||
~~~
|
||||
|
||||
The BPF Compiler Collection (BCC) is a set of tools and scripts that also use
|
||||
eBPF for tracing. The example below uses the ``trace`` tool to show the events
|
||||
while they are being generated ::
|
||||
|
||||
# /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \
|
||||
'u::main:run_start' 'u::main:poll_block'
|
||||
TIME PID TID COMM FUNC
|
||||
15:49:06 8576 8576 ovs-vswitchd main:run_start
|
||||
15:49:06 8576 8576 ovs-vswitchd main:poll_block
|
||||
15:49:06 8576 8576 ovs-vswitchd main:run_start
|
||||
15:49:06 8576 8576 ovs-vswitchd main:poll_block
|
||||
^C
|
||||
|
||||
|
||||
Scripts
|
||||
-------
|
||||
To not have to re-invent the wheel when trying to debug complex OVS issues, a
|
||||
set of scripts are provided in the source repository. They are located in the
|
||||
``utilities/usdt-scripts/`` directory, and each script contains detailed
|
||||
information on how they should be used, and what information they provide.
|
||||
|
||||
|
||||
Available probes
|
||||
----------------
|
||||
The next sections describes all the available probes, their use case, and if
|
||||
used in any script, which one. Any new probes being added to OVS should get
|
||||
their own section. See the below "Adding your own probes" section for the
|
||||
used naming convention.
|
||||
|
||||
Available probes in ``ovs_vswitchd``:
|
||||
|
||||
- main:poll_block
|
||||
- main:run_start
|
||||
|
||||
|
||||
probe main:run_start
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Description**:
|
||||
The ovs-vswitchd's main process contains a loop that runs every time some work
|
||||
needs to be done. This probe gets triggered every time the loop starts from the
|
||||
beginning. See also the ``main:poll_block`` probe below.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
*None*
|
||||
|
||||
**Script references**:
|
||||
|
||||
- ``utilities/usdt-scripts/bridge_loop.bt``
|
||||
|
||||
|
||||
probe main:poll_block
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Description**:
|
||||
The ovs-vswitchd's main process contains a loop that runs every time some work
|
||||
needs to be done. This probe gets triggered every time the loop is done, and
|
||||
it's about to wait for being re-started by a poll_block() call returning.
|
||||
See also the ``main:run_start`` probe above.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
*None*
|
||||
|
||||
**Script references**:
|
||||
|
||||
- ``utilities/usdt-scripts/bridge_loop.bt``
|
||||
|
||||
|
||||
Adding your own probes
|
||||
----------------------
|
||||
|
||||
Adding your own probes is as simple as adding the ``OVS_USDT_PROBE()`` macro
|
||||
to the code. It's similar to the ``DTRACE_PROBExx`` macro's with the difference
|
||||
that it does automatically determine the number of optional arguments.
|
||||
|
||||
The macro requires at least two arguments. The first one being the *provider*,
|
||||
and the second one being the *name*. To keep some consistency with the probe
|
||||
naming, please use the following convention. The *provider* should be the
|
||||
function name, and the *name* should be the name of the tracepoint. If you do
|
||||
function entry and exit like probes, please use ``entry`` and ``exit``.
|
||||
|
||||
If, for some reason, you do not like to use the function name as a *provider*,
|
||||
please prefix it with ``__``, so we know it's not a function name.
|
||||
|
||||
The remaining parameters, up to 10, can be variables, pointers, etc., that
|
||||
might be of interest to capture at this point in the code. Note that the
|
||||
provided variables can cause the compiler to be less effective in optimizing.
|
||||
|
||||
|
||||
|
||||
.. _perf : https://developers.redhat.com/blog/2020/05/29/debugging-vhost-user-tx-contention-in-open-vswitch#
|
||||
.. _bpftrace : https://github.com/iovisor/bpftrace
|
||||
.. _bcc : https://github.com/iovisor/bcc
|
||||
.. _Trace Compass : https://www.eclipse.org/tracecompass/
|
1
NEWS
1
NEWS
@ -43,6 +43,7 @@ Post-v2.16.0
|
||||
* Encap & Decap action support for MPLS packet type.
|
||||
- Ingress policing on Linux now uses 'matchall' classifier instead of
|
||||
'basic', if available.
|
||||
- Add User Statically-Defined Tracing (USDT) probe framework support.
|
||||
|
||||
|
||||
v2.16.0 - 16 Aug 2021
|
||||
|
@ -56,12 +56,13 @@ EXTRA_DIST += \
|
||||
utilities/ovs-vlan-test.in \
|
||||
utilities/ovs-vsctl-bashcomp.bash \
|
||||
utilities/checkpatch.py \
|
||||
utilities/docker/Makefile \
|
||||
utilities/docker/ovs-override.conf \
|
||||
utilities/docker/start-ovs \
|
||||
utilities/docker/create_ovs_db.sh \
|
||||
utilities/docker/debian/Dockerfile \
|
||||
utilities/docker/debian/build-kernel-modules.sh
|
||||
utilities/docker/Makefile \
|
||||
utilities/docker/ovs-override.conf \
|
||||
utilities/docker/start-ovs \
|
||||
utilities/docker/create_ovs_db.sh \
|
||||
utilities/docker/debian/Dockerfile \
|
||||
utilities/docker/debian/build-kernel-modules.sh \
|
||||
utilities/usdt-scripts/bridge_loop.bt
|
||||
MAN_ROOTS += \
|
||||
utilities/ovs-testcontroller.8.in \
|
||||
utilities/ovs-dpctl.8.in \
|
||||
|
120
utilities/usdt-scripts/bridge_loop.bt
Executable file
120
utilities/usdt-scripts/bridge_loop.bt
Executable file
@ -0,0 +1,120 @@
|
||||
#!/usr/bin/env bpftrace
|
||||
/*
|
||||
* Copyright (c) 2021 Red Hat, Inc.
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at:
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*
|
||||
*
|
||||
* Script information:
|
||||
* -------------------
|
||||
* bridge_loop.bt uses the USDTs in the ovs-vswitchd's main bridge loop
|
||||
* to report how long it spends running a single loop, as well as the time
|
||||
* it spends waiting in the poll_loop(). Once done, it will also print a
|
||||
* histogram for the time spent.
|
||||
*
|
||||
* The following is an example of how to use the script on the running
|
||||
* ovs-vswitchd process:
|
||||
*
|
||||
* $ bridge_loop.bt -p `pgrep -n ovs-vswitchd`
|
||||
* Attaching 4 probes...
|
||||
* --------------------------------------------------------------
|
||||
* Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.
|
||||
* --------------------------------------------------------------
|
||||
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000230706
|
||||
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.501854292
|
||||
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000266445
|
||||
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499750288
|
||||
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000254856
|
||||
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499944280
|
||||
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000267390
|
||||
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.093566288
|
||||
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000316774
|
||||
* - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.406697754
|
||||
* - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000264505
|
||||
* ^C
|
||||
* --------------------------------------------------------------
|
||||
* Showing run time histograms in micro seconds:
|
||||
* --------------------------------------------------------------
|
||||
*
|
||||
* @bridge_run_time:
|
||||
* [0, 1000) 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
||||
*
|
||||
*
|
||||
* @poll_block_wait_time:
|
||||
* [90000, 120000) 1 |@@@@@@@@@@@@@@@@@ |
|
||||
* [120000, 150000) 0 | |
|
||||
* [150000, 180000) 0 | |
|
||||
* [180000, 210000) 0 | |
|
||||
* [210000, 240000) 0 | |
|
||||
* [240000, 270000) 0 | |
|
||||
* [270000, 300000) 0 | |
|
||||
* [300000, 330000) 0 | |
|
||||
* [330000, 360000) 0 | |
|
||||
* [360000, 390000) 0 | |
|
||||
* [390000, 420000) 1 |@@@@@@@@@@@@@@@@@ |
|
||||
* [420000, 450000) 0 | |
|
||||
* [450000, 480000) 0 | |
|
||||
* [480000, 510000) 3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
|
||||
*/
|
||||
|
||||
|
||||
BEGIN
|
||||
{
|
||||
printf("--------------------------------------------------------------\n");
|
||||
printf("Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.\n");
|
||||
printf("--------------------------------------------------------------\n");
|
||||
}
|
||||
|
||||
|
||||
usdt::main:poll_block
|
||||
{
|
||||
@pb_start[tid] = nsecs;
|
||||
if (@rs_start[tid] != 0) {
|
||||
$delta = nsecs - @rs_start[tid];
|
||||
printf("- [%d@%s] bridge run loop time : %u:%2.2u:%2.2u.%9.9u\n",
|
||||
tid, comm,
|
||||
$delta / 3600 / 1000000000,
|
||||
$delta / 60 / 1000000000 % 60,
|
||||
$delta / 1000000000 % 60,
|
||||
$delta % 1000000000);
|
||||
@bridge_run_time = lhist($delta / 1000, 0, 1000000, 1000);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
usdt::main:run_start
|
||||
{
|
||||
@rs_start[tid] = nsecs;
|
||||
if (@pb_start[tid] != 0) {
|
||||
$delta = nsecs - @pb_start[tid];
|
||||
printf("- [%d@%s] poll_block() wait time: %u:%2.2u:%2.2u.%9.9u\n",
|
||||
tid, comm,
|
||||
$delta / 3600 / 1000000000,
|
||||
$delta / 60 / 1000000000 % 60,
|
||||
$delta / 1000000000 % 60,
|
||||
$delta % 1000000000);
|
||||
@poll_block_wait_time = lhist($delta / 1000, 0, 30000000, 30000);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
END
|
||||
{
|
||||
clear(@rs_start);
|
||||
clear(@pb_start);
|
||||
|
||||
printf("\n");
|
||||
printf("--------------------------------------------------------------\n");
|
||||
printf("Showing run time histograms in micro seconds:\n");
|
||||
printf("--------------------------------------------------------------");
|
||||
}
|
Loading…
x
Reference in New Issue
Block a user