2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 01:51:26 +00:00
ovs/vswitchd/ovs-vswitchd.8.in
Ilya Maximets 56e315937e vswitchd: Only lock pages that are faulted in.
The main purpose of locking the memory is to ensure that OVS can keep
doing what it did before in case of increased memory pressure, e.g.,
during VM ingest / migration.  Fulfilling this requirement can be
achieved without locking all the allocated memory, but only the pages
already accessed in the past (faulted in).  Processing of the new
traffic involves new memory allocations.  Latency on these operations
can't be guaranteed by the locking.  The main difference would be
the pre-faulting of the stack memory.  However, in order to revalidate
or process upcalls on the same traffic, the same amount of stack is
likely needed, so all the necessary memory will already be faulted in.

Switch 'mlockall' to MCL_ONFAULT to avoid consuming unnecessarily
large amounts of RAM on systems with high core counts.  For example,
in a densely populated OVN cluster this saves about 650 MB of RAM per
node on a system with 64 cores.  This equates to 320 GB of allocated
but unused RAM in a 500 node cluster.

This also makes OVS better suited by default for small systems with
limited amount of memory.

The MCL_ONFAULT flag was introduced in Linux kernel 4.4 and wasn't
available at the time of '--mlockall' introduction, but we can use it
now.  Falling back to an old way of locking in case we're running on
an older kernel just in case.

Only locking the faulted in pages also makes locking compatible with
vhost post-copy live migration by default, because we'll no longer
pre-fault all the guest's memory.  Post-copy relies on userfaultfd
to work on shared huge pages, which is only available in 4.11+ kernels.
So, technically, it should not be possible for MCL_ONFAULT to fail and
the call without it to succeed.  But keeping the check just in case
for now.

Acked-by: Simon Horman <horms@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-06-28 23:44:53 +02:00

387 lines
16 KiB
Groff

.\" -*- nroff -*-
.so lib/ovs.tmac
.TH ovs\-vswitchd 8 "@VERSION@" "Open vSwitch" "Open vSwitch Manual"
.\" This program's name:
.ds PN ovs\-vswitchd
.
.SH NAME
ovs\-vswitchd \- Open vSwitch daemon
.
.SH SYNOPSIS
\fBovs\-vswitchd \fR[\fIdatabase\fR]
.
.SH DESCRIPTION
A daemon that manages and controls any number of Open vSwitch switches
on the local machine.
.PP
The \fIdatabase\fR argument specifies how \fBovs\-vswitchd\fR connects
to \fBovsdb\-server\fR. \fIdatabase\fR may be an OVSDB active or
passive connection method, as described in \fBovsdb\fR(7). The
default is \fBunix:@RUNDIR@/db.sock\fR.
.PP
\fBovs\-vswitchd\fR retrieves its configuration from \fIdatabase\fR at
startup. It sets up Open vSwitch datapaths and then operates
switching across each bridge described in its configuration files. As
the database changes, \fBovs\-vswitchd\fR automatically updates its
configuration to match.
.PP
\fBovs\-vswitchd\fR switches may be configured with any of the following
features:
.
.IP \(bu
L2 switching with MAC learning.
.
.IP \(bu
NIC bonding with automatic fail-over and source MAC-based TX load
balancing ("SLB").
.
.IP \(bu
802.1Q VLAN support.
.
.IP \(bu
Port mirroring, with optional VLAN tagging.
.
.IP \(bu
NetFlow v5 flow logging.
.
.IP \(bu
sFlow(R) monitoring.
.
.IP \(bu
Connectivity to an external OpenFlow controller, such as NOX.
.
.PP
Only a single instance of \fBovs\-vswitchd\fR is intended to run at a time.
A single \fBovs\-vswitchd\fR can manage any number of switch instances, up
to the maximum number of supported Open vSwitch datapaths.
.PP
\fBovs\-vswitchd\fR does all the necessary management of Open vSwitch
datapaths itself. Thus, \fBovs\-dpctl\fR(8) (and its userspace
datapath counterparts accessible via \fBovs\-appctl
dpctl/\fIcommand\fR) are not needed with \fBovs\-vswitchd\fR and should
not be used because they can interfere with its operation. These
tools are still useful for diagnostics.
.PP
An Open vSwitch datapath kernel module must be loaded for \fBovs\-vswitchd\fR
to be useful. Refer to the documentation for instructions on how to build and
load the Open vSwitch kernel module.
.PP
.SH OPTIONS
.IP "\fB\-\-mlockall\fR"
Causes \fBovs\-vswitchd\fR to call the \fBmlockall()\fR function, to attempt to
lock all of its process memory into physical RAM on page faults (on allocation,
when running on Linux kernel 4.4 or older), preventing the kernel from paging
any of its memory to disk. This helps to avoid networking interruptions due to
system memory pressure.
.IP
Some systems do not support \fBmlockall()\fR at all, and other systems
only allow privileged users, such as the superuser, to use it.
\fBovs\-vswitchd\fR emits a log message if \fBmlockall()\fR is
unavailable or unsuccessful.
.
.SS "DPDK Options"
For details on initializing \fBovs\-vswitchd\fR to use DPDK ports,
refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5).
.SS "DPDK HW Access Options"
.IP "\fB\-\-hw\-rawio\-access\fR"
Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability,
to allow userspace drivers access to raw hardware memory. This will
also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and
\fBioperm()\fR functions as well as access memory devices to set port
access. This is a \fBvery\fR powerful capability, so generally only
enable as needed for specific hardware (for example mlx5 with full
hardware offload via rte_flow).
.SS "Daemon Options"
.ds DD \
\fBovs\-vswitchd\fR detaches only after it has connected to the \
database, retrieved the initial configuration, and set up that \
configuration.
.so lib/daemon.man
.SS "Service Options"
.so lib/service.man
.SS "Public Key Infrastructure Options"
.so lib/ssl.man
.so lib/ssl-bootstrap.man
.so lib/ssl-peer-ca-cert.man
.SS "Logging Options"
.so lib/vlog.man
.SS "Other Options"
.so lib/unixctl.man
.so lib/common.man
.
.SH "RUNTIME MANAGEMENT COMMANDS"
\fBovs\-appctl\fR(8) can send commands to a running
\fBovs\-vswitchd\fR process. The currently supported commands are
described below. The command descriptions assume an understanding of
how to configure Open vSwitch.
.SS "GENERAL COMMANDS"
.IP "\fBexit\fR \fI--cleanup\fR"
Causes \fBovs\-vswitchd\fR to gracefully terminate. If \fI--cleanup\fR
is specified, deletes flows from datapaths and releases other datapath
resources configured by \fBovs\-vswitchd\fR. Otherwise, datapath
flows and other resources remains undeleted. Resources of datapaths
that are integrated into \fBovs\-vswitchd\fR (e.g. the \fBnetdev\fR
datapath type) are always released regardless of \fI--cleanup\fR
except for ports with \fBinternal\fR type. Use \fI--cleanup\fR to
release \fBinternal\fR ports too.
.
.IP "\fBqos/show-types\fR \fIinterface\fR"
Queries the interface for a list of Quality of Service types that are
configurable via Open vSwitch for the given \fIinterface\fR.
.IP "\fBqos/show\fR \fIinterface\fR"
Queries the kernel for Quality of Service configuration and statistics
associated with the given \fIinterface\fR.
.IP "\fBbfd/show\fR [\fIinterface\fR]"
Displays detailed information about Bidirectional Forwarding Detection
configured on \fIinterface\fR. If \fIinterface\fR is not specified,
then displays detailed information about all interfaces with BFD
enabled.
.IP "\fBbfd/set-forwarding\fR [\fIinterface\fR] \fIstatus\fR"
Force the fault status of the BFD module on \fIinterface\fR (or all
interfaces if none is given) to be \fIstatus\fR. \fIstatus\fR can be
"true", "false", or "normal" which reverts to the standard behavior.
.IP "\fBcfm/show\fR [\fIinterface\fR]"
Displays detailed information about Connectivity Fault Management
configured on \fIinterface\fR. If \fIinterface\fR is not specified,
then displays detailed information about all interfaces with CFM
enabled.
.IP "\fBcfm/set-fault\fR [\fIinterface\fR] \fIstatus\fR"
Force the fault status of the CFM module on \fIinterface\fR (or all
interfaces if none is given) to be \fIstatus\fR. \fIstatus\fR can be
"true", "false", or "normal" which reverts to the standard behavior.
.IP "\fBstp/tcn\fR [\fIbridge\fR]"
Forces a topology change event on \fIbridge\fR if it's running STP. This
may cause it to send Topology Change Notifications to its peers and flush
its MAC table. If no \fIbridge\fR is given, forces a topology change
event on all bridges.
.IP "\fBstp/show\fR [\fIbridge\fR]"
Displays detailed information about spanning tree on the \fIbridge\fR. If
\fIbridge\fR is not specified, then displays detailed information about all
bridges with STP enabled.
.IP "\fBrstp/tcn\fR [\fIbridge\fR]"
Forces a topology change event on \fIbridge\fR if it's running RSTP. This
may cause it to send Topology Change Notifications to its peers and flush
its MAC table. If no \fIbridge\fR is given, forces a topology change
event on all bridges.
.IP "\fBrstp/show\fR [\fIbridge\fR]"
Displays detailed information about rapid spanning tree on the \fIbridge\fR.
If \fIbridge\fR is not specified, then displays detailed information about all
bridges with RSTP enabled.
.SS "BRIDGE COMMANDS"
These commands manage bridges.
.IP "\fBfdb/add\fR \fIbridge\fR \fIport\fR \fIvlan\fR \fImac\fR"
Adds \fImac\fR address to a \fIport\fR and \fIvlan\fR on a \fIbridge\fR. This
utility can be used to pre-populate fdb table without relying on dynamic
mac learning.
.IP "\fBfdb/del\fR \fIbridge\fR \fIvlan\fR \fImac\fR"
Deletes \fImac\fR address from a \fIport\fR and \fIvlan\fR on a \fIbridge\fR.
.IP "\fBfdb/flush\fR [\fIbridge\fR]"
Flushes \fIbridge\fR MAC address learning table, or all learning tables
if no \fIbridge\fR is given.
.IP "\fBfdb/show\fR \fIbridge\fR"
Lists each MAC address/VLAN pair learned by the specified \fIbridge\fR,
along with the port on which it was learned and the age of the entry,
in seconds.
.IP "\fBfdb/stats-clear\fR [\fIbridge\fR]"
Clear \fIbridge\fR MAC address learning table statistics, or all
statistics if no \fIbridge\fR is given.
.IP "\fBfdb/stats-show\fR \fIbridge\fR"
Show MAC address learning table statistics for the specified \fIbridge\fR.
.IP "\fBmdb/flush\fR [\fIbridge\fR]"
Flushes \fIbridge\fR multicast snooping table, or all snooping tables
if no \fIbridge\fR is given.
.IP "\fBmdb/show\fR \fIbridge\fR"
Lists each multicast group/VLAN pair learned by the specified \fIbridge\fR,
along with the port on which it was learned and the age of the entry,
in seconds.
.IP "\fBbridge/reconnect\fR [\fIbridge\fR]"
Makes \fIbridge\fR drop all of its OpenFlow controller connections and
reconnect. If \fIbridge\fR is not specified, then all bridges drop
their controller connections and reconnect.
.IP
This command might be useful for debugging OpenFlow controller issues.
.
.IP "\fBbridge/dump\-flows\fR [\fB\-\-offload-stats\fR] \fIbridge\fR"
Lists all flows in \fIbridge\fR, including those normally hidden to
commands such as \fBovs\-ofctl dump\-flows\fR. Flows set up by mechanisms
such as in-band control and fail-open are hidden from the controller
since it is not allowed to modify or override them.
If \fB\-\-offload-stats\fR are specified then also list statistics for
offloaded packets and bytes, which are a subset of the total packets and
bytes.
.SS "BOND COMMANDS"
These commands manage bonded ports on an Open vSwitch's bridges. To
understand some of these commands, it is important to understand a
detail of the bonding implementation called ``source load balancing''
(SLB). Instead of directly assigning Ethernet source addresses to
members, the bonding implementation computes a function that maps an
48-bit Ethernet source addresses into an 8-bit value (a ``MAC hash''
value). All of the Ethernet addresses that map to a single 8-bit
value are then assigned to a single member.
.IP "\fBbond/list\fR"
Lists all of the bonds, and their members, on each bridge.
.
.IP "\fBbond/show\fR [\fIport\fR]"
Lists all of the bond-specific information (updelay, downdelay, time
until the next rebalance) about the given bonded \fIport\fR, or all
bonded ports if no \fIport\fR is given. Also lists information about
each members: whether it is enabled or disabled, the time to completion
of an updelay or downdelay if one is in progress, whether it is the
active member, the hashes assigned to the member. Any LACP information
related to this bond may be found using the \fBlacp/show\fR command.
.
.IP "\fBbond/migrate\fR \fIport\fR \fIhash\fR \fImember\fR"
Only valid for SLB bonds. Assigns a given MAC hash to a new member.
\fIport\fR specifies the bond port, \fIhash\fR the MAC hash to be
migrated (as a decimal number between 0 and 255), and \fImember\fR the
new member to be assigned.
.IP
The reassignment is not permanent: rebalancing or fail-over will
cause the MAC hash to be shifted to a new member in the usual
manner.
.IP
A MAC hash cannot be migrated to a disabled member.
.IP "\fBbond/set\-active\-member\fR \fIport\fR \fImember\fR"
Sets \fImember\fR as the active member on \fIport\fR. \fImember\fR must
currently be enabled.
.IP
The setting is not permanent: a new active member will be selected
if \fImember\fR becomes disabled.
.IP "\fBbond/enable\-member\fR \fIport\fR \fImember\fR"
.IQ "\fBbond/disable\-member\fR \fIport\fR \fImember\fR"
Enables (or disables) \fImember\fR on the given bond \fIport\fR, skipping any
updelay (or downdelay).
.IP
This setting is not permanent: it persists only until the carrier
status of \fImember\fR changes.
.IP "\fBbond/hash\fR \fImac\fR [\fIvlan\fR] [\fIbasis\fR]"
Returns the hash value which would be used for \fImac\fR with \fIvlan\fR
and \fIbasis\fR if specified.
.
.IP "\fBlacp/show\fR [\fIport\fR]"
Lists all of the LACP related information about the given \fIport\fR:
active or passive, aggregation key, system id, and system priority. Also
lists information about each member: whether it is enabled or disabled,
whether it is attached or detached, port id and priority, actor
information, and partner information. If \fIport\fR is not specified,
then displays detailed information about all interfaces with CFM
enabled.
.
.IP "\fBlacp/stats-show\fR [\fIport\fR]"
Lists various stats about LACP PDUs (number of RX/TX PDUs, bad PDUs received)
and member state (number of times its state expired/defaulted and carrier
status changed) for the given \fIport\fR. If \fIport\fR is not specified,
then displays stats of all interfaces with LACP enabled.
.SS "DPCTL DATAPATH DEBUGGING COMMANDS"
The primary way to configure \fBovs\-vswitchd\fR is through the Open
vSwitch database, e.g. using \fBovs\-vsctl\fR(8). These commands
provide a debugging interface for managing datapaths. They implement
the same features (and syntax) as \fBovs\-dpctl\fR(8). Unlike
\fBovs\-dpctl\fR(8), these commands work with datapaths that are
integrated into \fBovs\-vswitchd\fR (e.g. the \fBnetdev\fR datapath
type).
.PP
.
.ds DX \fBdpctl/\fR
.de DO
\\$2 \\$1 \\$3
..
.so lib/dpctl.man
.
.so lib/dpdk-unixctl.man
.so lib/dpif-netdev-unixctl.man
.so lib/dpif-netlink-unixctl.man
.so lib/netdev-dpdk-unixctl.man
.so lib/odp-execute-unixctl.man
.so ofproto/ofproto-dpif-unixctl.man
.so ofproto/ofproto-unixctl.man
.so lib/vlog-unixctl.man
.so lib/memory-unixctl.man
.so lib/coverage-unixctl.man
.so ofproto/ofproto-tnl-unixctl.man
.
.SH "OPENFLOW IMPLEMENTATION"
.
.PP
This section documents aspects of OpenFlow for which the OpenFlow
specification requires documentation.
.
.SS "Packet buffering."
The OpenFlow specification, version 1.2, says:
.
.IP
Switches that implement buffering are expected to expose, through
documentation, both the amount of available buffering, and the length
of time before buffers may be reused.
.
.PP
Open vSwitch does not maintains any packet buffers.
.
.SS "Bundle lifetime"
The OpenFlow specification, version 1.4, says:
.
.IP
If the switch does not receive any OFPT_BUNDLE_CONTROL or
OFPT_BUNDLE_ADD_MESSAGE message for an opened bundle_id for a switch
defined time greater than 1s, it may send an ofp_error_msg with
OFPET_BUNDLE_FAILED type and OFPBFC_TIMEOUT code. If the switch does
not receive any new message in a bundle apart from echo request and
replies for a switch defined time greater than 1s, it may send an
ofp_error_msg with OFPET_BUNDLE_FAILED type and OFPBFC_TIMEOUT code.
.
.PP
Open vSwitch implements default idle bundle lifetime of 10 seconds.
(This is configurable via \fBother-config:bundle-idle-timeout\fR in
the \fBOpen_vSwitch\fR table. See \fBovs-vswitchd.conf.db\fR(5)
for details.)
.
.SH "LIMITS"
.
.PP
We believe these limits to be accurate as of this writing. These
limits assume the use of the Linux kernel datapath.
.
.IP \(bu
\fBovs\-vswitchd\fR started through \fBovs\-ctl\fR(8) provides a limit of 65535
file descriptors. The limits on the number of bridges and ports is decided by
the availability of file descriptors. With the Linux kernel datapath, creation
of a single bridge consumes three file descriptors and each port
consumes one additional file descriptor. Other platforms
may have different limitations.
.
.IP \(bu
8,192 MAC learning entries per bridge, by default. (This is
configurable via \fBother\-config:mac\-table\-size\fR in the
\fBBridge\fR table. See \fBovs\-vswitchd.conf.db\fR(5) for details.)
.
.IP \(bu
Kernel flows are limited only by memory available to the kernel.
Performance will degrade beyond 1,048,576 kernel flows per bridge with
a 32-bit kernel, beyond 262,144 with a 64-bit kernel.
(\fBovs\-vswitchd\fR should never install anywhere near that many
flows.)
.
.IP \(bu
OpenFlow flows are limited only by available memory. Performance is
linear in the number of unique wildcard patterns. That is, an
OpenFlow table that contains many flows that all match on the same
fields in the same way has a constant-time lookup, but a table that
contains many flows that match on different fields requires lookup
time linear in the number of flows.
.
.IP \(bu
255 ports per bridge participating in 802.1D Spanning Tree Protocol.
.
.IP \(bu
32 mirrors per bridge.
.
.IP \(bu
15 bytes for the name of a port, for ports implemented in the Linux
kernel. Ports implemented in userspace, such as patch ports, do not
have an arbitrary length limitation. OpenFlow also limit port names
to 15 bytes.
.
.SH "SEE ALSO"
.BR ovs\-appctl (8),
.BR ovsdb\-server (1).