..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License.  You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

===================
OVS Faucet Tutorial
===================
This tutorial demonstrates how Open vSwitch works with a general-purpose
OpenFlow controller, using the Faucet controller as a simple way to get
started.  It was tested with the "master" branch of Open vSwitch and version
1.6.15 of Faucet.  It does not use advanced or recently added features in OVS
or Faucet, so other versions of both pieces of software are likely to work
equally well.

The goal of the tutorial is to demonstrate Open vSwitch and Faucet in an
end-to-end way, that is, to show how it works from the Faucet controller
configuration at the top, through the OpenFlow flow table, to the datapath
processing.  Along the way, in addition to helping to understand the
architecture at each level, we discuss performance and troubleshooting issues.
We hope that this demonstration makes it easier for users and potential users
to understand how Open vSwitch works and how to debug and troubleshoot it.

We provide enough detail in the tutorial that you should be able to follow
along fully with the instructions.
Setting Up OVS
--------------

This section explains how to set up Open vSwitch for the purpose of using it
with Faucet for the tutorial.

You might already have Open vSwitch installed on one or more computers or VMs,
perhaps set up to control a set of VMs or a physical network.  This is
admirable, but we will be using Open vSwitch in a different way, to set up a
simulation environment called the OVS "sandbox".  The sandbox does not use
virtual machines or containers, which makes it more limited, but on the other
hand it is (in this writer's opinion) easier to set up.

There are two ways to start a sandbox: one that uses the Open vSwitch that is
already installed on a system, and another that uses a copy of Open vSwitch
that has been built but not yet installed.  The latter is more often used and
thus better tested, but both should work.  The instructions below explain both
approaches:

1. Get a copy of the Open vSwitch source repository using Git, then ``cd``
   into the new directory::

      $ git clone https://github.com/openvswitch/ovs.git
      $ cd ovs

   The default checkout is the master branch.  You can check out a tag (such
   as v2.8.0) or a branch (such as origin/branch-2.8), if you prefer.

2. If you do not already have an installed copy of Open vSwitch on your
   system, or if you do not want to use it for the sandbox (the sandbox will
   not disturb the functionality of any existing switches), then proceed to
   step 3.  If you do have an installed copy and you want to use it for the
   sandbox, try to start the sandbox by running::

      $ tutorial/ovs-sandbox

   If it is successful, you will find yourself in a subshell environment,
   which is the sandbox (you can exit with ``exit`` or Control+D).  If so,
   you're finished and do not need to complete the rest of the steps.  If it
   fails, you can proceed to step 3 to build Open vSwitch anyway.

3. Before you build, you might want to check that your system meets the build
   requirements.  Read :doc:`/intro/install/general` to find out.  For this
   tutorial, there is no need to compile the Linux kernel module, or to use
   any of the optional libraries such as OpenSSL, DPDK, or libcap-ng.

4. Configure and build Open vSwitch::

      $ ./boot.sh
      $ ./configure
      $ make -j4

5. Try out the sandbox by running::

      $ make sandbox

   You can exit the sandbox with ``exit`` or Control+D.
Setting up Faucet
-----------------

This section explains how to get a copy of Faucet and set it up appropriately
for the tutorial.  There are many other ways to install Faucet, but this
simple approach worked well for me.  It has the advantage that it does not
require modifying any system-level files or directories on your machine.  It
does, on the other hand, require Docker, so make sure you have it installed
and working.

It will be a little easier to go through the rest of the tutorial if you run
these instructions in a separate terminal from the one that you're using for
Open vSwitch, because it's often necessary to switch between one and the
other.

1. Get a copy of the Faucet source repository using Git, then ``cd`` into the
   new directory::

      $ git clone https://github.com/faucetsdn/faucet.git
      $ cd faucet

   At this point I checked out the latest tag::

      $ latest_tag=$(git describe --tags $(git rev-list --tags --max-count=1))
      $ git checkout $latest_tag

2. Build a docker container image::

      $ docker build -t faucet/faucet .

   This will take a few minutes.

3. Create an installation directory under the ``faucet`` directory for the
   docker image to use::

      $ mkdir inst

   The Faucet configuration will go in ``inst/faucet.yaml`` and its main log
   will appear in ``inst/faucet.log``.  (The official Faucet installation
   instructions call for putting these in ``/etc/ryu/faucet`` and
   ``/var/log/ryu/faucet``, respectively, but we avoid modifying these system
   directories.)

4. Create a container and start Faucet::

      $ docker run -d --name faucet -v $(pwd)/inst/:/etc/ryu/faucet/ -v $(pwd)/inst/:/var/log/ryu/faucet/ -p 6653:6653 faucet/faucet

5. Look in ``inst/faucet.log`` to verify that Faucet started.  It will
   probably start with an exception and traceback because we have not yet
   created ``inst/faucet.yaml``.

6. Later on, to make a new or updated Faucet configuration take effect
   quickly, you can run::

      $ docker exec faucet pkill -HUP -f faucet.faucet

   Another way is to stop and start the Faucet container::

      $ docker restart faucet

   You can also stop and delete the container; after this, to start it again,
   you need to rerun the ``docker run`` command::

      $ docker stop faucet
      $ docker rm faucet
Overview
--------

Now that Open vSwitch and Faucet are ready, here's an overview of what we're
going to do for the remainder of the tutorial:

1. Switching: Set up an L2 network with Faucet.

2. Routing: Route between multiple L3 networks with Faucet.

3. ACLs: Add and modify access control rules.

At each step, we will take a look at how the features in question work from
Faucet at the top to the data plane layer at the bottom.  From the highest to
lowest level, these layers and the software components that connect them are:

Faucet.
    As the top level in the system, this is the authoritative source of the
    network configuration.

    Faucet connects to a variety of monitoring and performance tools, but we
    won't use them in this tutorial.  Our main insights into the system will
    be through ``faucet.yaml`` for configuration and ``faucet.log`` to
    observe state, such as MAC learning and ARP resolution, and to tell when
    we've screwed up configuration syntax or semantics.

The OpenFlow subsystem in Open vSwitch.
    OpenFlow is the protocol, standardized by the Open Networking Foundation,
    that controllers like Faucet use to control how Open vSwitch and other
    switches treat packets in the network.

    We will use ``ovs-ofctl``, a utility that comes with Open vSwitch, to
    observe and occasionally modify Open vSwitch's OpenFlow behavior.  We
    will also use ``ovs-appctl``, a utility for communicating with
    ``ovs-vswitchd`` and other Open vSwitch daemons, to ask "what-if?" type
    questions.

    In addition, the OVS sandbox by default raises the Open vSwitch logging
    level for OpenFlow high enough that we can learn a great deal about
    OpenFlow behavior simply by reading its log file.

Open vSwitch datapath.
    This is essentially a cache designed to accelerate packet processing.
    Open vSwitch includes a few different datapaths, such as one based on the
    Linux kernel and a userspace-only datapath (sometimes called the "DPDK"
    datapath).  The OVS sandbox uses the latter, but the principles behind it
    apply equally well to other datapaths.

At each step, we discuss how the design of each layer influences performance.
We demonstrate how Open vSwitch features can be used to debug, troubleshoot,
and understand the system as a whole.
Switching
---------

Layer-2 (L2) switching is the basis of modern networking.  It's also very
simple and a good place to start, so let's set up a switch with some VLANs in
Faucet and see how it works at each layer.  Begin by putting the following
into ``inst/faucet.yaml``::

    dps:
        switch-1:
            dp_id: 0x1
            timeout: 3600
            arp_neighbor_timeout: 3600
            interfaces:
                1:
                    native_vlan: 100
                2:
                    native_vlan: 100
                3:
                    native_vlan: 100
                4:
                    native_vlan: 200
                5:
                    native_vlan: 200
    vlans:
        100:
        200:

This configuration file defines a single switch ("datapath" or "dp") named
``switch-1``.  The switch has five ports, numbered 1 through 5.  Ports 1, 2,
and 3 are in VLAN 100, and ports 4 and 5 are in VLAN 200.  Faucet can
identify the switch from its datapath ID, which is defined to be 0x1.

.. note::

   This also sets high MAC learning and ARP timeouts.  The defaults are 5
   minutes and about 8 minutes, which are fine in production but sometimes
   too fast for manual experimentation.  (Don't use a timeout bigger than
   about 65000 seconds, because it will crash Faucet.)

Now restart Faucet so that the configuration takes effect, e.g.::

    $ docker restart faucet

Assuming that the configuration update is successful, you should now see a
new line at the end of ``inst/faucet.log``::

    Jan 06 15:14:35 faucet INFO Add new datapath DPID 1 (0x1)
Faucet is now waiting for a switch with datapath ID 0x1 to connect to it over
OpenFlow, so our next step is to create a switch with OVS and make it connect
to Faucet.  To do that, switch to the terminal where you checked out OVS and
start a sandbox with ``make sandbox`` or ``tutorial/ovs-sandbox`` (as
explained earlier under `Setting Up OVS`_).  You should see something like
this toward the end of the output::

    ----------------------------------------------------------------------
    You are running in a dummy Open vSwitch environment.  You can use
    ovs-vsctl, ovs-ofctl, ovs-appctl, and other tools to work with the
    dummy switch.

    Log files, pidfiles, and the configuration database are in the
    "sandbox" subdirectory.

    Exit the shell to kill the running daemons.
    blp@sigabrt:~/nicira/ovs/tutorial(0)$
Inside the sandbox, create a switch ("bridge") named ``br0``, set its
datapath ID to 0x1, add simulated ports to it named ``p1`` through ``p5``,
and tell it to connect to the Faucet controller.  To make it easier to
understand, we request for port ``p1`` to be assigned OpenFlow port 1, ``p2``
port 2, and so on.  As a final touch, configure the controller to be
"out-of-band" (this is mainly to avoid some annoying messages in the
``ovs-vswitchd`` logs; for more information, run ``man ovs-vswitchd.conf.db``
and search for ``connection_mode``)::

    $ ovs-vsctl add-br br0 \
             -- set bridge br0 other-config:datapath-id=0000000000000001 \
             -- add-port br0 p1 -- set interface p1 ofport_request=1 \
             -- add-port br0 p2 -- set interface p2 ofport_request=2 \
             -- add-port br0 p3 -- set interface p3 ofport_request=3 \
             -- add-port br0 p4 -- set interface p4 ofport_request=4 \
             -- add-port br0 p5 -- set interface p5 ofport_request=5 \
             -- set-controller br0 tcp:127.0.0.1:6653 \
             -- set controller br0 connection-mode=out-of-band

.. note::

   You don't have to run all of these as a single ``ovs-vsctl`` invocation.
   It is a little more efficient, though, and since it updates the OVS
   configuration in a single database transaction it means that, for example,
   there is never a time when the controller is set but it has not yet been
   configured as out-of-band.
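The ``other-config:datapath-id`` value must be written as exactly 16
hexadecimal digits.  The conversion is easy to get wrong by hand, so here is
a small Python sketch of it (our own convenience helpers, not part of any OVS
tool):

```python
# Convert between an integer datapath ID and the 16-hex-digit string form
# that ovs-vsctl expects in other-config:datapath-id.  These helper names
# are hypothetical, purely for illustration.

def dpid_to_string(dpid):
    """Format an integer datapath ID as 16 zero-padded hex digits."""
    return format(dpid, "016x")

def string_to_dpid(s):
    """Parse a 16-hex-digit datapath ID string back to an integer."""
    return int(s, 16)

print(dpid_to_string(0x1))                 # -> 0000000000000001
print(string_to_dpid("0000000000000001"))  # -> 1
```

This is the same datapath ID (0x1) that we told Faucet about in
``inst/faucet.yaml``, which is how Faucet matches this bridge to its
configuration.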
Now, if you look at ``inst/faucet.log`` again, you should see that Faucet
recognized and configured the new switch and its ports::

    Jan 06 15:17:10 faucet INFO DPID 1 (0x1) connected
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Cold start configuring DP
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Configuring VLAN 100 vid:100 ports:Port 1,Port 2,Port 3
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Configuring VLAN 200 vid:200 ports:Port 4,Port 5
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 1 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 2 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 3 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 4 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 5 up, configuring

Over on the Open vSwitch side, you can see a lot of related activity if you
take a look in ``sandbox/ovs-vswitchd.log``.  For example, here is the basic
OpenFlow session setup and Faucet's probe of the switch's ports and
capabilities::

    rconn|INFO|br0<->tcp:127.0.0.1:6653: connecting...
    vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_HELLO (OF1.4) (xid=0x1):
     version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_HELLO (OF1.3) (xid=0x2f24810a):
     version bitmap: 0x01, 0x02, 0x03, 0x04
    vconn|DBG|tcp:127.0.0.1:6653: negotiated OpenFlow version 0x04 (we support version 0x05 and earlier, peer supports version 0x04 and earlier)
    rconn|INFO|br0<->tcp:127.0.0.1:6653: connected
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x2f24810b): 0 bytes of payload
    vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x2f24810b): 0 bytes of payload
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FEATURES_REQUEST (OF1.3) (xid=0x2f24810c):
    vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_FEATURES_REPLY (OF1.3) (xid=0x2f24810c): dpid:0000000000000001
    n_tables:254, n_buffers:0
    capabilities: FLOW_STATS TABLE_STATS PORT_STATS GROUP_STATS QUEUE_STATS
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPST_PORT_DESC request (OF1.3) (xid=0x2f24810d): port=ANY
    vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPST_PORT_DESC reply (OF1.3) (xid=0x2f24810d):
     1(p1): addr:aa:55:aa:55:00:14
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     2(p2): addr:aa:55:aa:55:00:15
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     3(p3): addr:aa:55:aa:55:00:16
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     4(p4): addr:aa:55:aa:55:00:17
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     5(p5): addr:aa:55:aa:55:00:18
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(br0): addr:c6:64:ff:59:48:41
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max

After that, you can see Faucet delete all existing flows and then start
adding new ones::

    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f24810e): DEL table:255 priority=0 actions=drop
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_BARRIER_REQUEST (OF1.3) (xid=0x2f24810f):
    vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_BARRIER_REPLY (OF1.3) (xid=0x2f24810f):
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f248110): ADD priority=0 cookie:0x5adc15c0 out_port:0 actions=drop
    vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f248111): ADD table:1 priority=0 cookie:0x5adc15c0 out_port:0 actions=drop
    ...
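The ``OFPT_HELLO`` exchange in the log above shows each side advertising a
version bitmap (OVS offers wire versions 0x01 through 0x05, i.e. OpenFlow 1.0
through 1.4; Faucet offers 0x01 through 0x04), and the result is the highest
version present in both: 0x04, OpenFlow 1.3.  The rule can be sketched in a
few lines of Python (our own illustration of the negotiation rule, not code
from either project):

```python
# OpenFlow version negotiation: each side sends a set of supported wire
# versions in its HELLO; the connection uses the highest common one.
# Wire encoding: 0x01 = OF1.0, 0x02 = OF1.1, ..., 0x05 = OF1.4.

def negotiate(ours, peer):
    """Return the highest OpenFlow wire version both sides support."""
    common = ours & peer
    if not common:
        raise ValueError("no common OpenFlow version")
    return max(common)

ovs_versions = {0x01, 0x02, 0x03, 0x04, 0x05}   # from the sent HELLO
faucet_versions = {0x01, 0x02, 0x03, 0x04}      # from the received HELLO
print(hex(negotiate(ovs_versions, faucet_versions)))  # -> 0x4 (OpenFlow 1.3)
```

This matches the "negotiated OpenFlow version 0x04" line in the log.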
OpenFlow Layer
~~~~~~~~~~~~~~

Let's take a look at the OpenFlow tables that Faucet set up.  Before we do
that, it's helpful to take a look at ``docs/architecture.rst`` in the Faucet
documentation to learn how Faucet structures its flow tables.  In summary,
this document says:

Table 0
    Port-based ACLs

Table 1
    Ingress VLAN processing

Table 2
    VLAN-based ACLs

Table 3
    Ingress L2 processing, MAC learning

Table 4
    L3 forwarding for IPv4

Table 5
    L3 forwarding for IPv6

Table 6
    Virtual IP processing, e.g. for router IP addresses implemented by Faucet

Table 7
    Egress L2 processing

Table 8
    Flooding
With that in mind, let's dump the flow tables.  The simplest way is to just
run plain ``ovs-ofctl dump-flows``::

    $ ovs-ofctl dump-flows br0

If you run that bare command, it produces a lot of extra junk that makes the
output harder to read, like statistics and "cookie" values that are all the
same.  In addition, for historical reasons ``ovs-ofctl`` always defaults to
using OpenFlow 1.0 even though Faucet and most modern controllers use
OpenFlow 1.3, so it's best to force it to use OpenFlow 1.3.  We could throw
in a lot of options to fix these, but we'll want to do this more than once,
so let's start by defining a shell function for ourselves::

    $ dump-flows () {
        ovs-ofctl -OOpenFlow13 --names --no-stat dump-flows "$@" \
          | sed 's/cookie=0x5adc15c0, //'
      }

Let's also define ``save-flows`` and ``diff-flows`` functions for later use::

    $ save-flows () {
        ovs-ofctl -OOpenFlow13 --no-names --sort dump-flows "$@"
      }
    $ diff-flows () {
        ovs-ofctl -OOpenFlow13 diff-flows "$@" | sed 's/cookie=0x5adc15c0 //'
      }

Now let's take a look at the flows we've got and what they mean, like this::

    $ dump-flows br0

First, table 0 has a flow that just jumps to table 1 for each configured
port, and drops other unrecognized packets.  Presumably it would do more if
we had configured port-based ACLs::

     priority=9099,in_port=p1 actions=goto_table:1
     priority=9099,in_port=p2 actions=goto_table:1
     priority=9099,in_port=p3 actions=goto_table:1
     priority=9099,in_port=p4 actions=goto_table:1
     priority=9099,in_port=p5 actions=goto_table:1
     priority=0 actions=drop
Table 1, for ingress VLAN processing, has a bunch of flows that drop
inappropriate packets, such as LLDP and STP::

    table=1, priority=9099,dl_dst=01:80:c2:00:00:00 actions=drop
    table=1, priority=9099,dl_dst=01:00:0c:cc:cc:cd actions=drop
    table=1, priority=9099,dl_type=0x88cc actions=drop

Table 1 also has some more interesting flows that recognize packets without a
VLAN header on each of our ports (``vlan_tci=0x0000/0x1fff``), push on the
VLAN configured for the port, and proceed to table 3.  Presumably these skip
table 2 because we did not configure any VLAN-based ACLs.  There is also a
fallback flow to drop other packets, which in practice means that if any
received packet already has a VLAN header then it will be dropped::

    table=1, priority=9000,in_port=p1,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3
    table=1, priority=9000,in_port=p2,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3
    table=1, priority=9000,in_port=p3,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3
    table=1, priority=9000,in_port=p4,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4296->vlan_vid,goto_table:3
    table=1, priority=9000,in_port=p5,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4296->vlan_vid,goto_table:3
    table=1, priority=0 actions=drop

.. note::

   The syntax ``set_field:4196->vlan_vid`` is curious and somewhat
   misleading.  OpenFlow 1.3 defines the ``vlan_vid`` field as a 13-bit
   field where bit 12 is set to 1 if the VLAN header is present.  Thus,
   since 4196 is 0x1064, this action sets VLAN value 0x64, which in decimal
   is 100.
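The arithmetic in the note can be checked directly.  Here is a small Python
sketch of the OpenFlow 1.3 ``vlan_vid`` encoding (our own illustration; the
helper names are hypothetical, though ``OFPVID_PRESENT`` is the name the spec
gives bit 12):

```python
# OpenFlow 1.3 encodes vlan_vid as a 13-bit value: the low 12 bits hold the
# VLAN ID and bit 12 (OFPVID_PRESENT) is set when a VLAN header is present.

OFPVID_PRESENT = 0x1000  # bit 12: VLAN header present

def encode_vlan_vid(vid):
    """Encode a 12-bit VLAN ID as an OpenFlow 1.3 vlan_vid value."""
    assert 0 <= vid < 4096
    return OFPVID_PRESENT | vid

def decode_vlan_vid(value):
    """Extract the VLAN ID from an OpenFlow 1.3 vlan_vid value."""
    return value & 0xfff

print(encode_vlan_vid(100))   # -> 4196, as in the flows for VLAN 100
print(encode_vlan_vid(200))   # -> 4296, as in the flows for VLAN 200
print(decode_vlan_vid(4196))  # -> 100
```

This explains why the flows for VLAN 200 use ``set_field:4296->vlan_vid``:
4296 is 0x10c8, i.e. the present bit plus VLAN 0xc8 (200 decimal).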
Table 2 isn't used because there are no VLAN-based ACLs.  It just has a drop
flow::

    table=2, priority=0 actions=drop

Table 3 is used for MAC learning, but the controller hasn't learned any MAC
yet.  It also drops some inappropriate packets, such as those that claim to
be from a broadcast source address (why not from all multicast source
addresses, though?).  We'll come back here later::

    table=3, priority=9099,dl_src=ff:ff:ff:ff:ff:ff actions=drop
    table=3, priority=9001,dl_src=0e:00:00:00:00:01 actions=drop
    table=3, priority=0 actions=drop
    table=3, priority=9000 actions=CONTROLLER:96,goto_table:7

Tables 4, 5, and 6 aren't used because we haven't configured any routing::

    table=4, priority=0 actions=drop
    table=5, priority=0 actions=drop
    table=6, priority=0 actions=drop

Table 7 is used to direct packets to learned MACs, but Faucet hasn't learned
any MACs yet, so it just sends all the packets along to table 8::

    table=7, priority=0 actions=drop
    table=7, priority=9000 actions=goto_table:8

Table 8 implements flooding, broadcast, and multicast.  The flows for
broadcast and flood are easy to understand: if the packet came in on a given
port and needs to be flooded or broadcast, output it to all the other ports
in the same VLAN::

    table=8, priority=9008,in_port=p1,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p2,output:p3
    table=8, priority=9008,in_port=p2,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p3
    table=8, priority=9008,in_port=p3,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p2
    table=8, priority=9008,in_port=p4,dl_vlan=200,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p5
    table=8, priority=9008,in_port=p5,dl_vlan=200,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p4
    table=8, priority=9000,in_port=p1,dl_vlan=100 actions=pop_vlan,output:p2,output:p3
    table=8, priority=9000,in_port=p2,dl_vlan=100 actions=pop_vlan,output:p1,output:p3
    table=8, priority=9000,in_port=p3,dl_vlan=100 actions=pop_vlan,output:p1,output:p2
    table=8, priority=9000,in_port=p4,dl_vlan=200 actions=pop_vlan,output:p5
    table=8, priority=9000,in_port=p5,dl_vlan=200 actions=pop_vlan,output:p4
.. note::

   These flows could apparently be simpler, because OpenFlow says that
   ``output:<port>`` is ignored if ``<port>`` is the input port.  That means
   that the first three flows above could apparently be collapsed into
   just::

      table=8, priority=9008,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p2,output:p3

   There might be some reason why this won't work or isn't practical, but
   that isn't obvious from looking at the flow table.
There are also some flows for handling some standard forms of multicast, and
a fallback drop flow::

    table=8, priority=9006,in_port=p1,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p2,output:p3
    table=8, priority=9006,in_port=p2,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p1,output:p3
    table=8, priority=9006,in_port=p3,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p1,output:p2
    table=8, priority=9006,in_port=p4,dl_vlan=200,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p5
    table=8, priority=9006,in_port=p5,dl_vlan=200,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p4
    table=8, priority=9002,in_port=p1,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p2,output:p3
    table=8, priority=9002,in_port=p2,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p3
    table=8, priority=9002,in_port=p3,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p2
    table=8, priority=9004,in_port=p1,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p2,output:p3
    table=8, priority=9004,in_port=p2,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p3
    table=8, priority=9004,in_port=p3,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p2
    table=8, priority=9002,in_port=p4,dl_vlan=200,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p5
    table=8, priority=9002,in_port=p5,dl_vlan=200,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p4
    table=8, priority=9004,in_port=p4,dl_vlan=200,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p5
    table=8, priority=9004,in_port=p5,dl_vlan=200,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p4
    table=8, priority=0 actions=drop
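All of the per-input-port flows in table 8 follow the same pattern: output to
every port on the packet's VLAN except the one it arrived on.  A short Python
sketch of how such a list of output ports could be generated (our own
illustration with the tutorial's port names, not Faucet code):

```python
# Flood-port computation: a flooded packet goes to every port in its VLAN
# except the input port, matching the table 8 flows above.

VLAN_PORTS = {100: ["p1", "p2", "p3"], 200: ["p4", "p5"]}

def flood_ports(vlan, in_port):
    """Ports a packet arriving on in_port should be flooded to."""
    return [p for p in VLAN_PORTS[vlan] if p != in_port]

print(flood_ports(100, "p1"))  # -> ['p2', 'p3']
print(flood_ports(200, "p5"))  # -> ['p4']
```

Note that flooding never crosses a VLAN boundary: the port lists for VLAN 100
and VLAN 200 are disjoint, which is exactly the isolation the configuration
asked for.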
Tracing
~~~~~~~

Let's go a level deeper.  So far, everything we've done has been fairly
general.  We can also look at something more specific: the path that a
particular packet would take through Open vSwitch.  We can use the
``ofproto/trace`` command to play "what-if?" games.  This command is one
that we send directly to ``ovs-vswitchd``, using the ``ovs-appctl`` utility.

.. note::

   ``ovs-appctl`` is actually a very simple-minded JSON-RPC client, so you
   could also use some other utility that speaks JSON-RPC, or access it from
   a program as an API.

The ``ovs-vswitchd``\(8) manpage has a lot of detail on how to use
``ofproto/trace``, but let's just start by building up from a simple
example.  You can start with a command that just specifies the datapath
(e.g. ``br0``), an input port, and nothing else; unspecified fields default
to all-zeros.  Let's look at the full output for this trivial example::
    $ ovs-appctl ofproto/trace br0 in_port=p1
    Flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000

    bridge("br0")
    -------------
     0. in_port=1, priority 9099, cookie 0x5adc15c0
        goto_table:1
     1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4196->vlan_vid
        goto_table:3
     3. priority 9000, cookie 0x5adc15c0
        CONTROLLER:96
        goto_table:7
     7. priority 9000, cookie 0x5adc15c0
        goto_table:8
     8. in_port=1,dl_vlan=100, priority 9000, cookie 0x5adc15c0
        pop_vlan
        output:2
        output:3

    Final flow: unchanged
    Megaflow: recirc_id=0,eth,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000
    Datapath actions: push_vlan(vid=100,pcp=0),userspace(pid=0,controller(reason=1,flags=1,recirc_id=1,rule_cookie=0x5adc15c0,controller_id=0,max_len=96)),pop_vlan,2,3

The first line of output, beginning with ``Flow:``, just repeats our request
in a more verbose form, including the L2 fields that were zeroed.
Each of the numbered items under ``bridge("br0")`` shows what would happen to
our hypothetical packet in the table with the given number.  For example, we
see in table 1 that the packet matches a flow that pushes on a VLAN header,
sets the VLAN ID to 100, and goes on to further processing in table 3.  In
table 3, the packet gets sent to the controller to allow MAC learning to
take place, and then table 8 floods the packet to the other ports in the
same VLAN.

Summary information follows the numbered tables.  The packet hasn't been
changed since ingress (overall, even though a VLAN was pushed and then
popped back off), hence ``Final flow: unchanged``.  We'll look at the
``Megaflow`` information later.  The ``Datapath actions`` summarize what
would actually happen to such a packet.
Triggering MAC Learning
~~~~~~~~~~~~~~~~~~~~~~~

We just saw how a packet gets sent to the controller to trigger MAC
learning.  Let's actually send the packet and see what happens.  But before
we do that, let's save a copy of the current flow tables for later
comparison::

    $ save-flows br0 > flows1

Now use ``ofproto/trace``, as before, with a few new twists: we specify the
source and destination Ethernet addresses and append the ``-generate``
option so that side effects like sending a packet to the controller actually
happen::

    $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:11:11:00:00:00,dl_dst=00:22:22:00:00:00 -generate

The output is almost identical to that before, so it is not repeated here.
But, take a look at ``inst/faucet.log`` now.  It should now include a line
at the end that says that it learned about our MAC 00:11:11:00:00:00, like
this::

    Jan 06 15:56:02 faucet.valve INFO DPID 1 (0x1) L2 learned 00:11:11:00:00:00 (L2 type 0x0000, L3 src None) on Port 1 on VLAN 100 (1 hosts total)
Now compare the flow tables that we saved to the current ones::
|
|
|
|
diff-flows flows1 br0
|
|
|
|
The result should look like this, showing new flows for the learned
|
|
MACs::
|
|
|
|
+table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:11:11:00:00:00 hard_timeout=3601 actions=goto_table:7
|
|
+table=7 priority=9099,dl_vlan=100,dl_dst=00:11:11:00:00:00 idle_timeout=3601 actions=pop_vlan,output:1
|
|
|
|
To demonstrate the usefulness of the learned MAC, try tracing (with
|
|
side effects) a packet arriving on ``p2`` (or ``p3``) and destined to
|
|
the address learned on ``p1``, like this::
|
|
|
|
$ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate
|
|
|
|
The first time you run this command, you will notice that it sends the
|
|
packet to the controller, to learn ``p2``'s 00:22:22:00:00:00 source
|
|
address::
|
|
|
|
bridge("br0")
|
|
-------------
|
|
0. in_port=2, priority 9099, cookie 0x5adc15c0
|
|
goto_table:1
|
|
1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
|
|
push_vlan:0x8100
|
|
set_field:4196->vlan_vid
|
|
goto_table:3
|
|
3. priority 9000, cookie 0x5adc15c0
|
|
CONTROLLER:96
|
|
goto_table:7
|
|
7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0
|
|
pop_vlan
|
|
output:1
|
|
|
|
If you check ``inst/faucet.log``, you can see that ``p2``'s MAC has
|
|
been learned too::
|
|
|
|
Jan 06 15:58:09 faucet.valve INFO DPID 1 (0x1) L2 learned 00:22:22:00:00:00 (L2 type 0x0000, L3 src None) on Port 2 on VLAN 100 (2 hosts total)
|
|
|
|
Similarly for ``diff-flows``::
|
|
|
|
$ diff-flows flows1 br0
|
|
+table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:11:11:00:00:00 hard_timeout=3601 actions=goto_table:7
|
|
+table=3 priority=9098,in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00 hard_timeout=3604 actions=goto_table:7
|
|
+table=7 priority=9099,dl_vlan=100,dl_dst=00:11:11:00:00:00 idle_timeout=3601 actions=pop_vlan,output:1
|
|
+table=7 priority=9099,dl_vlan=100,dl_dst=00:22:22:00:00:00 idle_timeout=3604 actions=pop_vlan,output:2
|
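
The learning behavior driving these flows can be summed up in a few
lines of Python.  This is only a toy model for illustration, not
Faucet's actual implementation: a learned source MAC installs a
``(vlan, mac) -> port`` entry with a timeout, and a destination lookup
either finds a learned port or tells the caller to flood the VLAN::

```python
class L2Table:
    """Toy model of the MAC learning that Faucet drives via OpenFlow:
    a learned source MAC installs a (vlan, mac) -> port entry with a
    timeout, and unknown destinations are flooded."""

    def __init__(self, timeout=3600):
        self.timeout = timeout
        self.entries = {}  # (vlan, mac) -> (port, expiry)

    def learn(self, vlan, mac, port, now):
        # Corresponds to the table 3 flow (match on source MAC) plus the
        # table 7 flow (output toward the learned port).
        self.entries[(vlan, mac)] = (port, now + self.timeout)

    def lookup(self, vlan, mac, now):
        entry = self.entries.get((vlan, mac))
        if entry is None or entry[1] < now:
            return None  # unknown or expired: caller floods the VLAN
        return entry[0]

table = L2Table(timeout=3601)
table.learn(100, "00:11:11:00:00:00", 1, now=0)
print(table.lookup(100, "00:11:11:00:00:00", now=10))   # 1 (forward to p1)
print(table.lookup(100, "00:22:22:00:00:00", now=10))   # None (flood)
```

The real mechanism differs in one important way: the "table" lives in
the OpenFlow flow tables themselves, with the switch expiring entries
via ``hard_timeout`` and ``idle_timeout`` rather than the controller
checking timestamps.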

Then, if you re-run either of the ``ofproto/trace`` commands (with or
without ``-generate``), you can see that the packets go back and forth
without any further MAC learning, e.g.::

    $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate
    Flow: in_port=2,vlan_tci=0x0000,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000

    bridge("br0")
    -------------
     0. in_port=2, priority 9099, cookie 0x5adc15c0
        goto_table:1
     1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4196->vlan_vid
        goto_table:3
     3. in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00, priority 9098, cookie 0x5adc15c0
        goto_table:7
     7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0
        pop_vlan
        output:1

    Final flow: unchanged
    Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000
    Datapath actions: 1

Performance
~~~~~~~~~~~

Open vSwitch has a concept of a "fast path" and a "slow path"; ideally
all packets stay in the fast path.  This distinction between slow path
and fast path is the key to making sure that Open vSwitch performs as
fast as possible.

Some factors can force a flow or a packet to take the slow path.  As one
example, all CFM, BFD, LACP, STP, and LLDP processing takes place in the
slow path, in the cases where Open vSwitch processes these protocols
itself instead of delegating to controller-written flows.  As a second
example, any flow that modifies ARP fields is processed in the slow
path.  These are corner cases that are unlikely to cause performance
problems in practice because these protocols send packets at a
relatively slow rate, and users and controller authors do not normally
need to be concerned about them.

To understand what cases users and controller authors should consider,
we need to talk about how Open vSwitch optimizes for performance.  The
Open vSwitch code is divided into two major components which, as
already mentioned, are called the "slow path" and "fast path" (aka
"datapath").  The slow path is embedded in the ``ovs-vswitchd``
userspace program.  It is the part of the Open vSwitch packet
processing logic that understands OpenFlow.  Its job is to take a
packet and run it through the OpenFlow tables to determine what should
happen to it.  It outputs a list of actions in a form similar to
OpenFlow actions but simpler, called "ODP actions" or "datapath
actions".  It then passes the ODP actions to the datapath, which
applies them to the packet.

.. note::

  Open vSwitch contains a single slow path and multiple fast paths.
  The difference between using Open vSwitch with the Linux kernel
  versus with DPDK is the datapath.

If every packet passed through the slow path and the fast path in this
way, performance would be terrible.  The key to getting high
performance from this architecture is caching.  Open vSwitch includes
a multi-level cache.  It works like this:

1. A packet initially arrives at the datapath.  Some datapaths (such
   as DPDK and the in-tree version of the OVS kernel module) have a
   first-level cache called the "microflow cache".  The microflow
   cache is the key to performance for relatively long-lived, high
   packet rate flows.  If the datapath has a microflow cache, then it
   consults it and, if there is a cache hit, the datapath executes the
   associated actions.  Otherwise, it proceeds to step 2.

2. The datapath consults its second-level cache, called the "megaflow
   cache".  The megaflow cache is the key to performance for shorter
   or low packet rate flows.  If there is a megaflow cache hit, the
   datapath executes the associated actions.  Otherwise, it proceeds
   to step 3.

3. The datapath passes the packet to the slow path, which runs it
   through the OpenFlow table to yield ODP actions, a process that is
   often called "flow translation".  It then passes the packet back to
   the datapath to execute the actions and to, if possible, install a
   megaflow cache entry so that subsequent similar packets can be
   handled directly by the fast path.  (We already described above
   most of the cases where a cache entry cannot be installed.)
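
The three steps above can be sketched as a Python toy model.  This is
not OVS code, just an illustration under simplified assumptions (the
pretend ``slow_path`` translation only needs ``in_port`` and
``dl_dst``, so everything else is wildcarded)::

```python
def slow_path(packet):
    # Stand-in for OpenFlow flow translation: returns the fields the
    # translation actually consulted, plus the resulting actions.
    match = {k: packet[k] for k in ("in_port", "dl_dst")}
    return match, ["output:1"]

microflow = {}   # exact packet key -> actions
megaflow = []    # list of (partial match dict, actions)

def lookup(packet):
    key = tuple(sorted(packet.items()))
    if key in microflow:                       # 1. microflow cache hit
        return microflow[key], "microflow"
    for match, actions in megaflow:            # 2. megaflow cache hit
        if all(packet.get(f) == v for f, v in match.items()):
            microflow[key] = actions
            return actions, "megaflow"
    match, actions = slow_path(packet)         # 3. slow path
    megaflow.append((match, actions))          # install megaflow entry
    microflow[key] = actions
    return actions, "slow path"

p = {"in_port": 2, "dl_src": "00:22:22:00:00:00",
     "dl_dst": "00:11:11:00:00:00", "tp_dst": 80}
print(lookup(p)[1])                            # slow path
print(lookup(dict(p, tp_dst=443))[1])          # megaflow (tp_dst wildcarded)
print(lookup(p)[1])                            # microflow
```

Note how the second packet, which differs only in a field the
translation never looked at, is served from the megaflow cache without
another slow-path trip.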

The megaflow cache is the key cache to consider for performance
tuning.  Open vSwitch provides tools for understanding and optimizing
its behavior.  The ``ofproto/trace`` command that we have already been
using is the most common tool for this use.  Let's take another look
at the most recent ``ofproto/trace`` output::

    $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate
    Flow: in_port=2,vlan_tci=0x0000,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000

    bridge("br0")
    -------------
     0. in_port=2, priority 9099, cookie 0x5adc15c0
        goto_table:1
     1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4196->vlan_vid
        goto_table:3
     3. in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00, priority 9098, cookie 0x5adc15c0
        goto_table:7
     7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0
        pop_vlan
        output:1

    Final flow: unchanged
    Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000
    Datapath actions: 1

This time, it's the last line that we're interested in.  This line
shows the entry that Open vSwitch would insert into the megaflow cache
given the particular packet with the current flow tables.  The
megaflow entry includes:

* ``recirc_id``.  This is an implementation detail that users don't
  normally need to understand.

* ``eth``.  This just indicates that the cache entry matches only
  Ethernet packets; Open vSwitch also supports other types of packets,
  such as IP packets not encapsulated in Ethernet.

* All of the fields matched by any of the flows that the packet
  visited:

  ``in_port``
    In tables 0, 1, and 3.

  ``vlan_tci``
    In tables 1, 3, and 7 (``vlan_tci`` includes the VLAN ID and PCP
    fields and ``dl_vlan`` is just the VLAN ID).

  ``dl_src``
    In table 3.

  ``dl_dst``
    In table 7.

* All of the fields matched by flows that had to be ruled out to
  ensure that the ones that actually matched were the highest priority
  matching rules.

The last one is important.  Notice how the megaflow matches on
``dl_type=0x0000``, even though none of the tables matched on
``dl_type`` (the Ethernet type).  One reason is because of this flow
in OpenFlow table 1 (which shows up in ``dump-flows`` output)::

    table=1, priority=9099,dl_type=0x88cc actions=drop

This flow has higher priority than the flow in table 1 that actually
matched.  This means that, to put it in the megaflow cache,
``ovs-vswitchd`` has to add a match on ``dl_type`` to ensure that the
cache entry doesn't match LLDP packets (with Ethertype 0x88cc).

.. note::

  In fact, in some cases ``ovs-vswitchd`` matches on fields that
  aren't strictly required according to this description.  ``dl_type``
  is actually one of those, so deleting the LLDP flow probably would
  not have any effect on the megaflow.  But the principle here is
  sound.

So why does any of this matter?  It's because, the more specific a
megaflow is, that is, the more fields or bits within fields that a
megaflow matches, the less valuable it is from a caching viewpoint.  A
very specific megaflow might match on L2 and L3 addresses and L4 port
numbers.  When that happens, only packets in one (half-)connection
match the megaflow.  If that connection has only a few packets, as
many connections do, then the high cost of the slow path translation
is amortized over only a few packets, so the average cost of
forwarding those packets is high.  On the other hand, if a megaflow
only matches a relatively small number of L2 and L3 fields, then the
cache entry can potentially be used by many individual connections,
and the average cost is low.

For more information on how Open vSwitch constructs megaflows,
including about ways that it can make megaflow entries less specific
than one would infer from the discussion here, please refer to the
2015 NSDI paper, "The Design and Implementation of Open vSwitch",
which focuses on this algorithm.
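
The amortization argument is easy to make concrete with a rough
back-of-the-envelope simulation (our own construction, not from the
paper): run 100 short connections that differ only in L4 port through
a cache and count slow-path translations under a specific cache key
versus one that wildcards the port::

```python
# 100 connections between the same pair of hosts, differing only in
# destination port, one packet each.
connections = [("10.100.0.1", "10.200.0.1", 1000 + i) for i in range(100)]

def misses(key_fn):
    """Count slow-path translations given a cache key function."""
    cache, n = set(), 0
    for conn in connections:
        key = key_fn(conn)
        if key not in cache:
            n += 1          # cache miss: slow-path translation
            cache.add(key)
    return n

specific = misses(lambda c: c)            # key includes the L4 port
broad = misses(lambda c: (c[0], c[1]))    # L4 port wildcarded
print(specific, broad)                    # 100 1
```

With the port in the key, every connection pays a translation; with it
wildcarded, a single translation serves all of them.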

Routing
-------

We've looked at how Faucet implements switching in OpenFlow, and how
Open vSwitch implements OpenFlow through its datapath architecture.
Now let's start over, adding L3 routing into the picture.

It's remarkably easy to enable routing.  We just change our ``vlans``
section in ``inst/faucet.yaml`` to specify a router IP address for
each VLAN and define a router between them.  The ``dps`` section is
unchanged::

    dps:
        switch-1:
            dp_id: 0x1
            timeout: 3600
            arp_neighbor_timeout: 3600
            interfaces:
                1:
                    native_vlan: 100
                2:
                    native_vlan: 100
                3:
                    native_vlan: 100
                4:
                    native_vlan: 200
                5:
                    native_vlan: 200
    vlans:
        100:
            faucet_vips: ["10.100.0.254/24"]
        200:
            faucet_vips: ["10.200.0.254/24"]
    routers:
        router-1:
            vlans: [100, 200]

Then we restart Faucet::

    $ docker restart faucet

.. note::

  One should be able to tell Faucet to re-read its configuration file
  without restarting it.  I sometimes saw anomalous behavior when I
  did this, although I didn't characterize it well enough to make a
  quality bug report.  I found restarting the container to be
  reliable.

OpenFlow Layer
~~~~~~~~~~~~~~

Back in the OVS sandbox, let's see how the flow table has changed,
with::

    $ diff-flows flows1 br0

First, table 3 has new flows to direct ARP packets to table 6 (the
virtual IP processing table), presumably to handle ARP for the router
IPs.  New flows also send IP packets destined to a particular Ethernet
address to table 4 (the L3 forwarding table); we can make the educated
guess that the Ethernet address is the one used by the Faucet router::

    +table=3 priority=9131,arp,dl_vlan=100 actions=goto_table:6
    +table=3 priority=9131,arp,dl_vlan=200 actions=goto_table:6
    +table=3 priority=9099,ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01 actions=goto_table:4
    +table=3 priority=9099,ip,dl_vlan=200,dl_dst=0e:00:00:00:00:01 actions=goto_table:4

The new flows in table 4 appear to be verifying that the packets are
indeed addressed to a network or IP address that Faucet knows how to
route::

    +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.100.0.254 actions=goto_table:6
    +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.200.0.254 actions=goto_table:6
    +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.100.0.0/24 actions=goto_table:6
    +table=4 priority=9123,ip,dl_vlan=200,nw_dst=10.100.0.0/24 actions=goto_table:6
    +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.200.0.0/24 actions=goto_table:6
    +table=4 priority=9123,ip,dl_vlan=200,nw_dst=10.200.0.0/24 actions=goto_table:6
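
Notice how the priorities encode route specificity: the /32 host
routes at priority 9131 beat the /24 network routes at priority 9123,
which is how longest-prefix matching is expressed in a priority-ordered
OpenFlow table.  A hedged sketch of the idea (illustrative names, not
Faucet code)::

```python
import ipaddress

# Each flow is (priority, destination prefix, action label).  A /32
# host route gets a higher priority than the /24 network routes, so
# the highest-priority match is also the longest-prefix match.
flows = [
    (9131, ipaddress.ip_network("10.100.0.254/32"), "to_router_vip"),
    (9123, ipaddress.ip_network("10.100.0.0/24"), "route_vlan100"),
    (9123, ipaddress.ip_network("10.200.0.0/24"), "route_vlan200"),
]

def match(dst):
    dst = ipaddress.ip_address(dst)
    best = max((f for f in flows if dst in f[1]),
               key=lambda f: f[0], default=None)
    return best[2] if best else "miss"

print(match("10.100.0.254"))   # to_router_vip
print(match("10.100.0.7"))     # route_vlan100
print(match("192.168.0.1"))    # miss
```

An OpenFlow switch does the same selection in hardware or in the
datapath: among all matching flows in a table, the highest priority
wins.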

Table 6 has a few different things going on.  It sends ARP requests
for the router IPs to the controller; presumably the controller will
generate replies and send them back to the requester.  It switches
other ARP packets, either broadcasting them if they have a broadcast
destination or attempting to unicast them otherwise.  It sends all
other IP packets to the controller::

    +table=6 priority=9133,arp,arp_tpa=10.100.0.254 actions=CONTROLLER:128
    +table=6 priority=9133,arp,arp_tpa=10.200.0.254 actions=CONTROLLER:128
    +table=6 priority=9132,arp,dl_dst=ff:ff:ff:ff:ff:ff actions=goto_table:8
    +table=6 priority=9131,arp actions=goto_table:7
    +table=6 priority=9130,ip actions=CONTROLLER:128

Performance is clearly going to be poor if every packet that needs to
be routed has to go to the controller, but it's unlikely that's the
full story.  In the next section, we'll take a closer look.

Tracing
~~~~~~~

As in our switching example, we can play some "what-if?" games to
figure out how this works.  Let's suppose that a machine with IP
10.100.0.1, on port ``p1``, wants to send an IP packet to a machine
with IP 10.200.0.1 on port ``p4``.  Assuming that these hosts have not
been in communication recently, the steps to accomplish this are
normally the following:

1. Host 10.100.0.1 sends an ARP request to router 10.100.0.254.

2. The router sends an ARP reply to the host.

3. Host 10.100.0.1 sends an IP packet to 10.200.0.1, via the router's
   Ethernet address.

4. The router broadcasts an ARP request to ``p4`` and ``p5``, the
   ports that carry the 10.200.0.<x> network.

5. Host 10.200.0.1 sends an ARP reply to the router.

6. Either the router sends the IP packet (which it buffered) to
   10.200.0.1, or eventually 10.100.0.1 times out and resends it.

Let's use ``ofproto/trace`` to see whether Faucet and OVS follow this
procedure.

Before we start, save a new snapshot of the flow tables for later
comparison::

    $ save-flows br0 > flows2

Step 1: Host ARP for Router
+++++++++++++++++++++++++++

Let's simulate the ARP from 10.100.0.1 to its gateway router
10.100.0.254.  This requires more detail than any of the packets we've
simulated previously::

    $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x806,arp_spa=10.100.0.1,arp_tpa=10.100.0.254,arp_sha=00:01:02:03:04:05,arp_tha=ff:ff:ff:ff:ff:ff,arp_op=1 -generate

The important part of the output is where it shows that the packet was
recognized as an ARP request destined to the router gateway and
therefore sent to the controller::

     6. arp,arp_tpa=10.100.0.254, priority 9133, cookie 0x5adc15c0
        CONTROLLER:128

The Faucet log shows that Faucet learned the host's MAC address,
its MAC-to-IP mapping, and responded to the ARP request::

    Jan 06 16:12:23 faucet.valve INFO DPID 1 (0x1) Adding new route 10.100.0.1/32 via 10.100.0.1 (00:01:02:03:04:05) on VLAN 100
    Jan 06 16:12:23 faucet.valve INFO DPID 1 (0x1) Responded to ARP request for 10.100.0.254 from 10.100.0.1 (00:01:02:03:04:05) on VLAN 100
    Jan 06 16:12:23 faucet.valve INFO DPID 1 (0x1) L2 learned 00:01:02:03:04:05 (L2 type 0x0806, L3 src 10.100.0.1) on Port 1 on VLAN 100 (1 hosts total)

We can also look at the changes to the flow tables::

    $ diff-flows flows2 br0
    +table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:01:02:03:04:05 hard_timeout=3600 actions=goto_table:7
    +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.100.0.1 actions=set_field:4196->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:01:02:03:04:05->eth_dst,dec_ttl,goto_table:7
    +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.100.0.1 actions=set_field:4196->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:01:02:03:04:05->eth_dst,dec_ttl,goto_table:7
    +table=7 priority=9099,dl_vlan=100,dl_dst=00:01:02:03:04:05 idle_timeout=3600 actions=pop_vlan,output:1

The new flows include one in table 3 and one in table 7 for the
learned MAC, which have the same forms we saw before.  The new flows
in table 4 are different.  They match packets directed to 10.100.0.1
(in two VLANs) and forward them to the host by updating the Ethernet
source and destination addresses appropriately, decrementing the TTL,
and skipping ahead to unicast output in table 7.  This means that
packets sent **to** 10.100.0.1 should now get to their destination.

Step 2: Router Sends ARP Reply
++++++++++++++++++++++++++++++

``inst/faucet.log`` said that the router sent an ARP reply.  How can
we see it?  Simulated packets just get dropped by default.  One way is
to configure the dummy ports to write the packets they receive to a
file.  Let's try that.  First configure the port::

    $ ovs-vsctl set interface p1 options:pcap=p1.pcap

Then re-run the "trace" command::

    $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x806,arp_spa=10.100.0.1,arp_tpa=10.100.0.254,arp_sha=00:01:02:03:04:05,arp_tha=ff:ff:ff:ff:ff:ff,arp_op=1 -generate

And dump the reply packet::

    $ /usr/sbin/tcpdump -evvvr sandbox/p1.pcap
    reading from file sandbox/p1.pcap, link-type EN10MB (Ethernet)
    16:14:47.670727 0e:00:00:00:00:01 (oui Unknown) > 00:01:02:03:04:05 (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.100.0.254 is-at 0e:00:00:00:00:01 (oui Unknown), length 46

We clearly see the ARP reply, which tells us that the Faucet router's
Ethernet address is 0e:00:00:00:00:01 (as we guessed before from the
flow table).

Let's configure the rest of our ports to log their packets, too::

    $ for i in 2 3 4 5; do ovs-vsctl set interface p$i options:pcap=p$i.pcap; done

Step 3: Host Sends IP Packet
++++++++++++++++++++++++++++

Now that host 10.100.0.1 has the MAC address for its router, it can
send an IP packet to 10.200.0.1 via the router's MAC address, like
this::

    $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,udp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64 -generate
    Flow: udp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0

    bridge("br0")
    -------------
     0. in_port=1, priority 9099, cookie 0x5adc15c0
        goto_table:1
     1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4196->vlan_vid
        goto_table:3
     3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0
        goto_table:4
     4. ip,dl_vlan=100,nw_dst=10.200.0.0/24, priority 9123, cookie 0x5adc15c0
        goto_table:6
     6. ip, priority 9130, cookie 0x5adc15c0
        CONTROLLER:128

    Final flow: udp,in_port=1,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0
    Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.0/25,nw_frag=no
    Datapath actions: push_vlan(vid=100,pcp=0),userspace(pid=0,controller(reason=1,flags=0,recirc_id=6,rule_cookie=0x5adc15c0,controller_id=0,max_len=128))

Observe that the packet gets recognized as destined to the router, in
table 3, and then as properly destined to the 10.200.0.0/24 network,
in table 4.  In table 6, however, it gets sent to the controller.
Presumably, this is because Faucet has not yet resolved an Ethernet
address for the destination host 10.200.0.1.  It probably sent out an
ARP request.  Let's take a look in the next step.

Step 4: Router Broadcasts ARP Request
+++++++++++++++++++++++++++++++++++++

The router needs to know the Ethernet address of 10.200.0.1.  It knows
that, if this machine exists, it's on port ``p4`` or ``p5``, since we
configured those ports as VLAN 200.

Let's make sure::

    $ /usr/sbin/tcpdump -evvvr sandbox/p4.pcap
    reading from file sandbox/p4.pcap, link-type EN10MB (Ethernet)
    16:17:43.174006 0e:00:00:00:00:01 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.200.0.1 tell 10.200.0.254, length 46

and::

    $ /usr/sbin/tcpdump -evvvr sandbox/p5.pcap
    reading from file sandbox/p5.pcap, link-type EN10MB (Ethernet)
    16:17:43.174268 0e:00:00:00:00:01 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.200.0.1 tell 10.200.0.254, length 46

For good measure, let's make sure that it wasn't sent to ``p3``::

    $ /usr/sbin/tcpdump -evvvr sandbox/p3.pcap
    reading from file sandbox/p3.pcap, link-type EN10MB (Ethernet)

Step 5: Host 2 Sends ARP Reply
++++++++++++++++++++++++++++++

The Faucet controller sent an ARP request, so we can send an ARP
reply::

    $ ovs-appctl ofproto/trace br0 in_port=p4,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,dl_type=0x806,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01,arp_op=2 -generate
    Flow: arp,in_port=4,vlan_tci=0x0000,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_op=2,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01

    bridge("br0")
    -------------
     0. in_port=4, priority 9099, cookie 0x5adc15c0
        goto_table:1
     1. in_port=4,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4296->vlan_vid
        goto_table:3
     3. arp,dl_vlan=200, priority 9131, cookie 0x5adc15c0
        goto_table:6
     6. arp,arp_tpa=10.200.0.254, priority 9133, cookie 0x5adc15c0
        CONTROLLER:128

    Final flow: arp,in_port=4,dl_vlan=200,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_op=2,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01
    Megaflow: recirc_id=0,eth,arp,in_port=4,vlan_tci=0x0000/0x1fff,dl_dst=0e:00:00:00:00:01,arp_tpa=10.200.0.254
    Datapath actions: push_vlan(vid=200,pcp=0),userspace(pid=0,controller(reason=1,flags=0,recirc_id=7,rule_cookie=0x5adc15c0,controller_id=0,max_len=128))

It shows up in ``inst/faucet.log``::

    Jan 06 03:20:11 faucet.valve INFO DPID 1 (0x1) Adding new route 10.200.0.1/32 via 10.200.0.1 (00:10:20:30:40:50) on VLAN 200
    Jan 06 03:20:11 faucet.valve INFO DPID 1 (0x1) ARP response 10.200.0.1 (00:10:20:30:40:50) on VLAN 200
    Jan 06 03:20:11 faucet.valve INFO DPID 1 (0x1) L2 learned 00:10:20:30:40:50 (L2 type 0x0806, L3 src 10.200.0.1) on Port 4 on VLAN 200 (1 hosts total)

and in the OVS flow tables::

    $ diff-flows flows2 br0
    +table=3 priority=9098,in_port=4,dl_vlan=200,dl_src=00:10:20:30:40:50 hard_timeout=3601 actions=goto_table:7
    ...
    +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.200.0.1 actions=set_field:4296->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:10:20:30:40:50->eth_dst,dec_ttl,goto_table:7
    +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.200.0.1 actions=set_field:4296->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:10:20:30:40:50->eth_dst,dec_ttl,goto_table:7
    ...
    +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.200.0.0/24 actions=goto_table:6
    +table=7 priority=9099,dl_vlan=200,dl_dst=00:10:20:30:40:50 idle_timeout=3601 actions=pop_vlan,output:4

Step 6: IP Packet Delivery
++++++++++++++++++++++++++

Now both the host and the router have everything they need to deliver
the packet.  There are two ways it might happen.  If Faucet's router
is smart enough to buffer the packet that triggered ARP resolution,
then it might have delivered it already.  If so, then it should show
up in ``p4.pcap``.  Let's take a look::

    $ /usr/sbin/tcpdump -evvvr sandbox/p4.pcap ip
    reading from file sandbox/p4.pcap, link-type EN10MB (Ethernet)

Nope.  That leaves the other possibility, which is that Faucet waits
for the original sending host to re-send the packet.  We can do that
by re-running the trace::

    $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,udp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64 -generate
    Flow: udp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0

    bridge("br0")
    -------------
     0. in_port=1, priority 9099, cookie 0x5adc15c0
        goto_table:1
     1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4196->vlan_vid
        goto_table:3
     3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0
        goto_table:4
     4. ip,dl_vlan=100,nw_dst=10.200.0.1, priority 9131, cookie 0x5adc15c0
        set_field:4296->vlan_vid
        set_field:0e:00:00:00:00:01->eth_src
        set_field:00:10:20:30:40:50->eth_dst
        dec_ttl
        goto_table:7
     7. dl_vlan=200,dl_dst=00:10:20:30:40:50, priority 9099, cookie 0x5adc15c0
        pop_vlan
        output:4

    Final flow: udp,in_port=1,vlan_tci=0x0000,dl_src=0e:00:00:00:00:01,dl_dst=00:10:20:30:40:50,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=0,tp_dst=0
    Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no
    Datapath actions: set(eth(src=0e:00:00:00:00:01,dst=00:10:20:30:40:50)),set(ipv4(dst=10.200.0.1,ttl=63)),4

Finally, we have working IP packet forwarding!

Performance
~~~~~~~~~~~

Take another look at the megaflow line above::

    Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no

This means that (almost) any packet between these Ethernet source and
destination hosts, destined to the given IP host, will be handled by
this single megaflow cache entry.  So regardless of the number of UDP
packets or TCP connections that these hosts exchange, Open vSwitch
packet processing won't need to fall back to the slow path.  It is
quite efficient.

.. note::

  The exceptions are packets with a TTL other than 64, and fragmented
  packets.  Most hosts use a constant TTL for outgoing packets, and
  fragments are rare.  If either of those did change, then that would
  simply result in a new megaflow cache entry.

The datapath actions might also be worth a look::

    Datapath actions: set(eth(src=0e:00:00:00:00:01,dst=00:10:20:30:40:50)),set(ipv4(dst=10.200.0.1,ttl=63)),4

This just means that, to process these packets, the datapath changes
the Ethernet source and destination addresses and the IP TTL, and then
transmits the packet to port ``p4`` (also numbered 4).  Notice in
particular that, despite the OpenFlow actions that pushed, modified,
and popped back off a VLAN, there is nothing in the datapath actions
about VLANs.  This is because the OVS flow translation code "optimizes
out" redundant or unneeded actions, which saves time when the cache
entry is executed later.

.. note::

  It's not clear why the actions also re-set the IP destination
  address to its original value.  Perhaps this is a minor performance
  bug.

ACLs
----

Let's try out some ACLs, since they do a good job illustrating some of
the ways that OVS tries to optimize megaflows.  Update
``inst/faucet.yaml`` to the following::

    dps:
        switch-1:
            dp_id: 0x1
            timeout: 3600
            arp_neighbor_timeout: 3600
            interfaces:
                1:
                    native_vlan: 100
                    acl_in: 1
                2:
                    native_vlan: 100
                3:
                    native_vlan: 100
                4:
                    native_vlan: 200
                5:
                    native_vlan: 200
    vlans:
        100:
            faucet_vips: ["10.100.0.254/24"]
        200:
            faucet_vips: ["10.200.0.254/24"]
    routers:
        router-1:
            vlans: [100, 200]
    acls:
        1:
            - rule:
                dl_type: 0x800
                nw_proto: 6
                tcp_dst: 8080
                actions:
                    allow: 0
            - rule:
                actions:
                    allow: 1

Then restart Faucet::

    $ docker restart faucet

On port 1, this new configuration blocks all traffic to TCP port 8080
and allows all other traffic.  The resulting change in the flow table
shows this clearly too::

    $ diff-flows flows2 br0
    -priority=9099,in_port=1 actions=goto_table:1
    +priority=9098,in_port=1 actions=goto_table:1
    +priority=9099,tcp,in_port=1,tp_dst=8080 actions=drop

The most interesting question here is performance.  If you recall the
earlier discussion, when a packet passing through the flow table
encounters a match on a given field, the resulting megaflow has to
match on that field, even if the flow didn't actually match.  This is
expensive.

In particular, here you can see that any TCP packet is going to
encounter the ACL flow, even if it is directed to a port other than
8080.  If that means that every megaflow for a TCP packet is going to
have to match on the TCP destination, that's going to be bad for
caching performance because there will be a need for a separate
megaflow for every TCP destination port that actually appears in
traffic, which means a lot more megaflows than otherwise.  (Really, in
practice, if such a simple ACL blew up performance, OVS wouldn't be a
very good switch!)

Let's see what happens, by sending a packet to port 80 (instead of
8080)::

    $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,tcp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64,tp_dst=80 -generate
    Flow: tcp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0

    bridge("br0")
    -------------
     0. in_port=1, priority 9098, cookie 0x5adc15c0
        goto_table:1
     1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0
        push_vlan:0x8100
        set_field:4196->vlan_vid
        goto_table:3
     3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0
        goto_table:4
     4. ip,dl_vlan=100,nw_dst=10.200.0.0/24, priority 9123, cookie 0x5adc15c0
        goto_table:6
     6. ip, priority 9130, cookie 0x5adc15c0
        CONTROLLER:128

    Final flow: tcp,in_port=1,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
    Megaflow: recirc_id=0,eth,tcp,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_frag=no,tp_dst=0x0/0xf000
    Datapath actions: push_vlan(vid=100,pcp=0)

Take a look at the Megaflow line and in particular the match on
``tp_dst``, which says ``tp_dst=0x0/0xf000``.  What this means is that
the megaflow matches on only the top 4 bits of the TCP destination
port.  That works because::

      80 (base 10) == 0000,0000,0101,0000 (base 2)
    8080 (base 10) == 0001,1111,1001,0000 (base 2)

and so by matching on only the top 4 bits, rather than all 16, the OVS
fast path can distinguish port 80 from port 8080.  This allows this
megaflow to match one-sixteenth of the TCP destination port address
space, rather than just 1/65536th of it.
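
The mask arithmetic is easy to check directly.  Under the top-four-bit
mask, ports 80 and 8080 fall into different equivalence classes, so
this cache entry cannot accidentally swallow port-8080 packets that
the ACL must drop::

```python
mask = 0xf000                    # top 4 bits of the 16-bit port

print(hex(80 & mask))            # 0x0: port 80's masked value
print(hex(8080 & mask))          # 0x1000: differs, so the entry is safe
assert (80 & mask) != (8080 & mask)

# One-sixteenth of the port space shares port 80's masked value:
print(sum((p & mask) == 0 for p in range(65536)))   # 4096
```

Any packet whose destination port has a zero top nibble (4096 of the
65536 possible ports) will hit this one megaflow entry.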

.. note::

  The algorithm OVS uses for this purpose isn't perfect.  In this
  case, a single-bit match would work (e.g. tp_dst=0x0/0x1000), and
  would be superior since it would only match half the port address
  space instead of one-sixteenth.

For details of this algorithm, please refer to ``lib/classifier.c`` in
the Open vSwitch source tree, or our 2015 NSDI paper "The Design and
Implementation of Open vSwitch".

Finishing Up
------------

When you're done, you probably want to exit the sandbox session, with
Control+D or ``exit``, and stop the Faucet controller with ``docker
stop faucet; docker rm faucet``.

Further Directions
------------------

We've looked a fair bit at how Faucet interacts with Open vSwitch.  If
you still have some interest, you might want to explore some of these
directions:

* Adding more than one switch.  Faucet can control multiple switches
  but we've only been simulating one of them.  It's easy enough to
  make a single OVS instance act as multiple switches (just
  ``ovs-vsctl add-br`` another bridge), or you could use genuinely
  separate OVS instances.

* Additional features.  Faucet has more features than we've
  demonstrated, such as IPv6 routing and port mirroring.  These should
  also interact gracefully with Open vSwitch.

* Real performance testing.  We've looked at how flows and traces
  **should** demonstrate good performance, but of course there's no
  proof until it actually works in practice.  We've also only tested
  with trivial configurations.  Open vSwitch can scale to millions of
  OpenFlow flows, but the scaling in practice depends on the
  particular flow tables and traffic patterns, so it's valuable to
  test with large configurations, either in the way we've done it or
  with real traffic.