mirror of
https://github.com/openvswitch/ovs
synced 2025-10-23 14:57:06 +00:00
Previously when deciding whether to migrate a hash between slaves we would never move it if it would cause more load on the new slave than the old. This could lead to a situation where the slaves would be imbalanced but no migration would occur since it would flip the load. This will do the migration if it will decrease the ratio. Bug NIC-49
131 lines
6.4 KiB
Plaintext
131 lines
6.4 KiB
Plaintext
========================
|
|
ovs-vswitchd Internals
|
|
========================
|
|
|
|
This document describes some of the internals of the ovs-vswitchd
|
|
process. It is not complete. It tends to be updated on demand, so if
|
|
you have questions about the vswitchd implementation, ask them and
|
|
perhaps we'll add some appropriate documentation here.
|
|
|
|
Most of the ovs-vswitchd implementation is in vswitchd/bridge.c, so
|
|
code references below should be assumed to refer to that file except
|
|
as otherwise specified.
|
|
|
|
Bonding
|
|
=======
|
|
|
|
Bonding allows two or more interfaces (the "slaves") to share network
|
|
traffic. From a high-level point of view, bonded interfaces act like
|
|
a single port, but they have the bandwidth of multiple network
|
|
devices, e.g. two 1 GB physical interfaces act like a single 2 GB
|
|
interface. Bonds also increase robustness: the bonded port does not
|
|
go down as long as at least one of its slaves is up.
|
|
|
|
In vswitchd, a bond always has at least two slaves (and may have
|
|
more). If a configuration error, etc. would cause a bond to have only
|
|
one slave, the port becomes an ordinary port, not a bonded port, and
|
|
none of the special features of bonded ports described in this section
|
|
apply.
|
|
|
|
There are many forms of bonding, but ovs-vswitchd currently implements
|
|
only a single kind, called "source load balancing" or SLB bonding.
|
|
SLB bonding divides traffic among the slaves based on the Ethernet
|
|
source address. This is useful only if the traffic over the bond has
|
|
multiple Ethernet source addresses, for example if network traffic
|
|
from multiple VMs are multiplexed over the bond.
|
|
|
|
Enabling and Disabling Slaves
|
|
-----------------------------
|
|
|
|
When a bond is created, a slave is initially enabled or disabled based
|
|
on whether carrier is detected on the NIC (see iface_create()). After
|
|
that, a slave is disabled if its carrier goes down for a period of
|
|
time longer than the downdelay, and it is enabled if carrier comes up
|
|
for longer than the updelay (see bond_link_status_update()). There is
|
|
one exception where the updelay is skipped: if no slaves at all are
|
|
currently enabled, then the first slave on which carrier comes up is
|
|
enabled immediately.
|
|
|
|
The updelay should be set to a time longer than the STP forwarding
|
|
delay of the physical switch to which the bond port is connected (if
|
|
STP is enabled on that switch). Otherwise, the slave will be enabled,
|
|
and load may be shifted to it, before the physical switch starts
|
|
forwarding packets on that port, which can cause some data to be
|
|
"blackholed" for a time. The exception for a single enabled slave
|
|
does not cause any problem in this regard because when no slaves are
|
|
enabled all output packets are blackholed anyway.
|
|
|
|
When a slave becomes disabled, the vswitch immediately chooses a new
|
|
output port for traffic that was destined for that slave (see
|
|
bond_enable_slave()). It also sends a "gratuitous learning packet" on
|
|
the bond port (on the newly chosen slave) for each MAC address that
|
|
the vswitch has learned on a port other than the bond (see
|
|
bond_send_learning_packets()), to teach the physical switch that the
|
|
new slave should be used in place of the one that is now disabled.
|
|
(This behavior probably makes sense only for a vswitch that has only
|
|
one port (the bond) connected to a physical switch; vswitchd should
|
|
probably provide a way to disable or configure it in other scenarios.)
|
|
|
|
Bond Packet Input
|
|
-----------------
|
|
|
|
Bond packet input processing takes place in process_flow().
|
|
|
|
Bonding accepts unicast packets on any bond slave. This can
|
|
occasionally cause packet duplication for the first few packets sent
|
|
to a given MAC, if the physical switch attached to the bond is
|
|
flooding packets to that MAC because it has not yet learned the
|
|
correct slave for that MAC.
|
|
|
|
Bonding only accepts multicast (and broadcast) packets on a single
|
|
bond slave (the "active slave") at any given time. Multicast packets
|
|
received on other slaves are dropped. Otherwise, every multicast
|
|
packet would be duplicated, once for every bond slave, because the
|
|
physical switch attached to the bond will flood those packets.
|
|
|
|
Bonding also drops received packets when the vswitch has learned that
|
|
the packet's MAC is on a port other than the bond port itself. This is
|
|
because it is likely that the vswitch itself sent the packet out the
|
|
bond port on a different slave and is now receiving the packet back.
|
|
This occurs when the packet is multicast or the physical switch has not
|
|
yet learned the MAC and is flooding it. However, the vswitch makes an
|
|
exception to this rule for broadcast ARP replies, which indicate that
|
|
the MAC has moved to another switch, probably due to VM migration.
|
|
(ARP replies are normally unicast, so this exception does not match
|
|
normal ARP replies. It will match the learning packets sent on bond
|
|
fail-over.)
|
|
|
|
The active slave is simply the first slave to be enabled after the
|
|
bond is created (see bond_choose_active_iface()). If the active slave
|
|
is disabled, then a new active slave is chosen among the slaves that
|
|
remain active. Currently due to the way that configuration works,
|
|
this tends to be the remaining slave whose interface name is first
|
|
alphabetically, but this is by no means guaranteed.
|
|
|
|
Bond Packet Output
|
|
------------------
|
|
|
|
When a packet is sent out a bond port, the bond slave actually used is
|
|
selected based on the packet's source MAC (see choose_output_iface()).
|
|
In particular, the source MAC is hashed into one of 256 values, and
|
|
that value is looked up in a hash table (the "bond hash") kept in the
|
|
"bond_hash" member of struct port. The hash table entry identifies a
|
|
bond slave. If no bond slave has yet been chosen for that hash table
|
|
entry, vswitchd chooses one arbitrarily.
|
|
|
|
Every 10 seconds, vswitchd rebalances the bond slaves (see
|
|
bond_rebalance_port()). To rebalance, vswitchd examines the
|
|
statistics for the number of bytes transmitted by each slave over
|
|
approximately the past minute, with data sent more recently weighted
|
|
more heavily than data sent less recently. It considers each of the
|
|
slaves in order from most-loaded to least-loaded. If highly loaded
|
|
slave H is significantly more heavily loaded than the least-loaded
|
|
slave L, and slave H carries at least two hashes, then vswitchd shifts
|
|
one of H's hashes to L. However, vswitchd will only shift a hash from
|
|
H to L if it will decrease the ratio of the load between H and L by at
|
|
least 0.1.
|
|
|
|
Currently, "significantly more loaded" means that H must carry at
|
|
least 1 Mbps more traffic, and that traffic must be at least 3%
|
|
greater than L's.
|