mirror of
https://github.com/openvswitch/ovs
synced 2025-10-29 15:28:56 +00:00
With this patch vlan tags are incorporated into bond hashes in addition to the source mac address. NIC-294
132 lines
6.4 KiB
Plaintext
132 lines
6.4 KiB
Plaintext
========================
|
|
ovs-vswitchd Internals
|
|
========================
|
|
|
|
This document describes some of the internals of the ovs-vswitchd
|
|
process. It is not complete. It tends to be updated on demand, so if
|
|
you have questions about the vswitchd implementation, ask them and
|
|
perhaps we'll add some appropriate documentation here.
|
|
|
|
Most of the ovs-vswitchd implementation is in vswitchd/bridge.c, so
|
|
code references below should be assumed to refer to that file except
|
|
as otherwise specified.
|
|
|
|
Bonding
|
|
=======
|
|
|
|
Bonding allows two or more interfaces (the "slaves") to share network
|
|
traffic. From a high-level point of view, bonded interfaces act like
|
|
a single port, but they have the bandwidth of multiple network
|
|
devices, e.g. two 1 GB physical interfaces act like a single 2 GB
|
|
interface. Bonds also increase robustness: the bonded port does not
|
|
go down as long as at least one of its slaves is up.
|
|
|
|
In vswitchd, a bond always has at least two slaves (and may have
|
|
more). If a configuration error, etc. would cause a bond to have only
|
|
one slave, the port becomes an ordinary port, not a bonded port, and
|
|
none of the special features of bonded ports described in this section
|
|
apply.
|
|
|
|
There are many forms of bonding, but ovs-vswitchd currently implements
|
|
only a single kind, called "source load balancing" or SLB bonding.
|
|
SLB bonding divides traffic among the slaves based on the Ethernet
|
|
source address. This is useful only if the traffic over the bond has
|
|
multiple Ethernet source addresses, for example if network traffic
|
|
from multiple VMs are multiplexed over the bond.
|
|
|
|
Enabling and Disabling Slaves
|
|
-----------------------------
|
|
|
|
When a bond is created, a slave is initially enabled or disabled based
|
|
on whether carrier is detected on the NIC (see iface_create()). After
|
|
that, a slave is disabled if its carrier goes down for a period of
|
|
time longer than the downdelay, and it is enabled if carrier comes up
|
|
for longer than the updelay (see bond_link_status_update()). There is
|
|
one exception where the updelay is skipped: if no slaves at all are
|
|
currently enabled, then the first slave on which carrier comes up is
|
|
enabled immediately.
|
|
|
|
The updelay should be set to a time longer than the STP forwarding
|
|
delay of the physical switch to which the bond port is connected (if
|
|
STP is enabled on that switch). Otherwise, the slave will be enabled,
|
|
and load may be shifted to it, before the physical switch starts
|
|
forwarding packets on that port, which can cause some data to be
|
|
"blackholed" for a time. The exception for a single enabled slave
|
|
does not cause any problem in this regard because when no slaves are
|
|
enabled all output packets are blackholed anyway.
|
|
|
|
When a slave becomes disabled, the vswitch immediately chooses a new
|
|
output port for traffic that was destined for that slave (see
|
|
bond_enable_slave()). It also sends a "gratuitous learning packet" on
|
|
the bond port (on the newly chosen slave) for each MAC address that
|
|
the vswitch has learned on a port other than the bond (see
|
|
bond_send_learning_packets()), to teach the physical switch that the
|
|
new slave should be used in place of the one that is now disabled.
|
|
(This behavior probably makes sense only for a vswitch that has only
|
|
one port (the bond) connected to a physical switch; vswitchd should
|
|
probably provide a way to disable or configure it in other scenarios.)
|
|
|
|
Bond Packet Input
|
|
-----------------
|
|
|
|
Bond packet input processing takes place in process_flow().
|
|
|
|
Bonding accepts unicast packets on any bond slave. This can
|
|
occasionally cause packet duplication for the first few packets sent
|
|
to a given MAC, if the physical switch attached to the bond is
|
|
flooding packets to that MAC because it has not yet learned the
|
|
correct slave for that MAC.
|
|
|
|
Bonding only accepts multicast (and broadcast) packets on a single
|
|
bond slave (the "active slave") at any given time. Multicast packets
|
|
received on other slaves are dropped. Otherwise, every multicast
|
|
packet would be duplicated, once for every bond slave, because the
|
|
physical switch attached to the bond will flood those packets.
|
|
|
|
Bonding also drops received packets when the vswitch has learned that
|
|
the packet's MAC is on a port other than the bond port itself. This is
|
|
because it is likely that the vswitch itself sent the packet out the
|
|
bond port on a different slave and is now receiving the packet back.
|
|
This occurs when the packet is multicast or the physical switch has not
|
|
yet learned the MAC and is flooding it. However, the vswitch makes an
|
|
exception to this rule for broadcast ARP replies, which indicate that
|
|
the MAC has moved to another switch, probably due to VM migration.
|
|
(ARP replies are normally unicast, so this exception does not match
|
|
normal ARP replies. It will match the learning packets sent on bond
|
|
fail-over.)
|
|
|
|
The active slave is simply the first slave to be enabled after the
|
|
bond is created (see bond_choose_active_iface()). If the active slave
|
|
is disabled, then a new active slave is chosen among the slaves that
|
|
remain active. Currently due to the way that configuration works,
|
|
this tends to be the remaining slave whose interface name is first
|
|
alphabetically, but this is by no means guaranteed.
|
|
|
|
Bond Packet Output
|
|
------------------
|
|
|
|
When a packet is sent out a bond port, the bond slave actually used is
|
|
selected based on the packet's source MAC and VLAN tag (see
|
|
choose_output_iface()). In particular, the source MAC and VLAN tag
|
|
are hashed into one of 256 values, and that value is looked up in a
|
|
hash table (the "bond hash") kept in the "bond_hash" member of struct
|
|
port. The hash table entry identifies a bond slave. If no bond slave
|
|
has yet been chosen for that hash table entry, vswitchd chooses one
|
|
arbitrarily.
|
|
|
|
Every 10 seconds, vswitchd rebalances the bond slaves (see
|
|
bond_rebalance_port()). To rebalance, vswitchd examines the
|
|
statistics for the number of bytes transmitted by each slave over
|
|
approximately the past minute, with data sent more recently weighted
|
|
more heavily than data sent less recently. It considers each of the
|
|
slaves in order from most-loaded to least-loaded. If highly loaded
|
|
slave H is significantly more heavily loaded than the least-loaded
|
|
slave L, and slave H carries at least two hashes, then vswitchd shifts
|
|
one of H's hashes to L. However, vswitchd will only shift a hash from
|
|
H to L if it will decrease the ratio of the load between H and L by at
|
|
least 0.1.
|
|
|
|
Currently, "significantly more loaded" means that H must carry at
|
|
least 1 Mbps more traffic, and that traffic must be at least 3%
|
|
greater than L's.
|