mirror of https://github.com/openvswitch/ovs synced 2025-10-25 15:07:05 +00:00

ovn: Design and Schema changes for Container integration.

The design was arrived at after input from and discussions with multiple
people, including (in alphabetical order) Aaron Rosen, Ben Pfaff,
Ganesan Chandrashekhar, Justin Pettit, Russell Bryant and Somik Behera.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Gurucharan Shetty
2015-03-03 18:01:43 -08:00
parent a416ff28ed
commit 9fb4636f6c
7 changed files with 387 additions and 43 deletions

ovn/CONTAINERS.OpenStack.md Normal file

@@ -0,0 +1,121 @@
Integration of Containers with OVN and OpenStack
------------------------------------------------
Isolation between containers is weaker than isolation between VMs, so
some environments deploy containers for different tenants in separate
VMs as an additional security measure. This document describes creation of
containers inside VMs and how they can be made part of the logical networks
securely. The created logical network can include VMs, containers and
physical machines as endpoints. To better understand the proposed integration
of containers with OVN and OpenStack, this document describes the end to end
workflow with an example.
* An OpenStack tenant creates a VM (say VM-A) with a single network interface
that belongs to a management logical network. The VM is meant to host
containers. OpenStack Nova chooses the hypervisor on which VM-A is created.
* A Neutron port may have been created in advance and passed in to Nova
with the request to create a new VM. If not, Nova will issue a request
to Neutron to create a new port. The ID of the logical port from
Neutron will also be used as the vif-id for the virtual network
interface (VIF) of VM-A.
* When VM-A is created on a hypervisor, its VIF gets added to the
Open vSwitch integration bridge. This creates a row in the Interface table
of the Open_vSwitch database. As explained in the [IntegrationGuide.md],
the vif-id associated with the VM network interface gets added in the
external_ids:iface-id column of the newly created row in the Interface table.
* Since VM-A belongs to a logical network, it gets an IP address. This IP
address is used to spawn containers (either manually or through container
orchestration systems) inside that VM and to monitor the health of the
created containers.
* The vif-id associated with the VM's network interface can be obtained by
making a call to Neutron using tenant credentials.
* This flow assumes a component called a "container network plugin".
If you take Docker as an example for containers, you could envision
the plugin to be either a wrapper around Docker or a feature of Docker itself
that understands how to perform part of this workflow to get a container
connected to a logical network managed by Neutron. The rest of the flow
refers to this logical component that does not yet exist as the
"container network plugin".
* All the calls to Neutron will need tenant credentials. These calls can
either be made from inside the tenant VM as part of a container network plugin
or from outside the tenant VM (if the tenant is not comfortable using temporary
Keystone tokens from inside the tenant VMs). For simplicity, this document
explains the workflow using the former method.
* The container hosting VM will need Open vSwitch installed in it. The only
work for Open vSwitch inside the VM is to tag network traffic coming from
containers.
* When a container needs to be created inside the VM with a container network
interface that is expected to be attached to a particular logical switch, the
network plugin in that VM chooses any unused VLAN. This VLAN tag only needs to
be unique inside that VM, which limits the number of container interfaces
inside a single VM to 4096. The tag is stripped out in the hypervisor
by OVN and is only useful as context (or metadata) for OVN.
* The container network plugin then makes a call to Neutron to create a
logical port. In addition to all the inputs that a call to create a port in
Neutron currently needs, it sends the vif-id and the VLAN tag as
inputs.
* Neutron in turn verifies that the vif-id belongs to the tenant in question
and then uses the OVN specific plugin to create a new row in the Logical_Port
table of the OVN Northbound Database. Neutron responds with an
IP address and MAC address for that network interface. Neutron thus becomes
the IPAM system and provides unique IP and MAC addresses across VMs and
containers in the same logical network.
* The Neutron API call above to create a logical port for the container
could add a relatively significant amount of time to container creation.
However, an optimization is possible here. Logical ports could be
created in advance and reused by the container system doing container
orchestration. Additional Neutron API calls would only be needed if the
port needs to be attached to a different logical network.
* When a container is eventually deleted, the network plugin in that VM
may make a call to Neutron to delete that port. Neutron in turn will
delete the entry in the Logical_Port table of the OVN Northbound Database.
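The VLAN selection step in the list above can be sketched as follows. This is a minimal illustration, assuming the container network plugin keeps an in-memory set of the tags already used inside the VM. The function name `choose_unused_vlan` is hypothetical, and the 0 to 4095 range is taken from the `tag` column added to the schema in this commit. A real plugin may want to skip tags 0 and 4095, which IEEE 802.1Q reserves.

```python
VLAN_MIN = 0
VLAN_MAX = 4095  # mirrors the maxInteger of the `tag` column in the schema


def choose_unused_vlan(used_tags):
    """Return the lowest VLAN tag not yet used inside this VM.

    `used_tags` is the set of tags already assigned to container
    interfaces in the same VM (hypothetical bookkeeping kept by the
    plugin). Raises RuntimeError when every tag is taken.
    """
    for tag in range(VLAN_MIN, VLAN_MAX + 1):
        if tag not in used_tags:
            return tag
    raise RuntimeError("no unused VLAN tag left in this VM")


if __name__ == "__main__":
    used = {0, 1, 2, 5}
    tag = choose_unused_vlan(used)
    used.add(tag)  # remember the tag so the next container gets another one
    print(tag)
```

Because the tag only has to be unique inside one VM, the bookkeeping can stay local to the plugin and does not need coordination with other VMs.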
As an example, consider Docker containers. Since Docker currently does not
have a network plugin feature, this example uses a hypothetical wrapper
around Docker to make calls to Neutron.
* Create a logical switch, e.g.:
```
% ovn-docker --cred=cca86bd13a564ac2a63ddf14bf45d37f create network LS1
```
The above command will make a call to Neutron with the credentials to create
a logical switch. The above is optional if the logical switch has already
been created from outside the VM.
* List networks available to the tenant.
```
% ovn-docker --cred=cca86bd13a564ac2a63ddf14bf45d37f list networks
```
* Create a container and attach an interface to the previously created switch
as a logical port.
```
% ovn-docker --cred=cca86bd13a564ac2a63ddf14bf45d37f --vif-id=$VIF_ID \
--network=LS1 run -d --net=none ubuntu:14.04 /bin/sh -c \
"while true; do echo hello world; sleep 1; done"
```
The above command will make a call to Neutron with all the inputs it currently
needs to create a logical port. In addition, it passes the $VIF_ID and an
unused VLAN. Neutron will add that information in OVN and return
a MAC address and IP address for that interface. ovn-docker will then create
a veth pair, insert one end inside the container as 'eth0' and add the other end
to a local OVS bridge as an access port of the chosen VLAN.
[IntegrationGuide.md]:IntegrationGuide.md
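The final plumbing step described above (veth pair, one end renamed to 'eth0' inside the container, the other end added to the local OVS bridge as an access port) might look roughly like the sketch below. It only builds the command lines and does not run them. The function name, the bridge name `br0`, and the interface names are assumptions made for illustration, since `ovn-docker` is itself hypothetical. The `ip link`, `nsenter`, and `ovs-vsctl add-port ... tag=N` invocations are standard forms of those tools.

```python
def plumbing_commands(container_pid, vlan_tag, bridge="br0",
                      host_end="veth_host0", container_end="veth_cont0"):
    """Build the commands that attach one container to the VM's OVS bridge.

    All names are illustrative defaults. `container_pid` is the PID of a
    process inside the container, used to reach its network namespace.
    """
    if not 0 <= vlan_tag <= 4095:
        raise ValueError("VLAN tag must be in the range 0..4095")
    pid = str(container_pid)
    return [
        # Create the veth pair inside the VM.
        ["ip", "link", "add", host_end, "type", "veth",
         "peer", "name", container_end],
        # Move one end into the container's network namespace.
        ["ip", "link", "set", container_end, "netns", pid],
        # Rename that end to eth0 from within the container's namespace.
        ["nsenter", "-t", pid, "-n",
         "ip", "link", "set", "dev", container_end, "name", "eth0"],
        ["nsenter", "-t", pid, "-n",
         "ip", "link", "set", "dev", "eth0", "up"],
        # Add the other end to the bridge as an access port of the VLAN.
        ["ovs-vsctl", "add-port", bridge, host_end, "tag=%d" % vlan_tag],
        ["ip", "link", "set", "dev", host_end, "up"],
    ]


if __name__ == "__main__":
    for cmd in plumbing_commands(container_pid=1234, vlan_tag=42):
        print(" ".join(cmd))
```

Returning the commands as lists keeps the sketch free of side effects; a wrapper could pass each list to `subprocess.check_call` when run with sufficient privileges inside the VM.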


@@ -74,7 +74,9 @@ SUFFIXES += .xml
$(AM_V_GEN)$(run_python) $(srcdir)/build-aux/xml2nroff \
--version=$(VERSION) $< > $@.tmp && mv $@.tmp $@
EXTRA_DIST += ovn/TODO
EXTRA_DIST += \
ovn/TODO \
ovn/CONTAINERS.OpenStack.md
# ovn IDL
OVSIDL_BUILT += \


@@ -149,8 +149,9 @@
<li>
<code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
software gateway. Northbound, it connects to the OVN Database to learn
about OVN configuration and status and to populate the PN and <code>Bindings</code>
tables with the hypervisor's status. Southbound, it connects to
about OVN configuration and status and to populate the PN table and the
<code>Chassis</code> column in <code>Bindings</code> table with the
hypervisor's status. Southbound, it connects to
<code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control over
network traffic, and to the local <code>ovsdb-server</code>(1) to allow
it to monitor and control Open vSwitch configuration.
@@ -258,6 +259,12 @@
understand. Here's an example.
</p>
<p>
A VIF on a hypervisor is a virtual network interface attached either
to a VM or a container running directly on that hypervisor. (This is
different from the interface of a container running inside a VM.)
</p>
<p>
The steps in this example refer often to details of the OVN and OVN
Northbound database schemas. Please see <code>ovn</code>(5) and
@@ -288,7 +295,10 @@
rows to the OVN database <code>Pipeline</code> table to reflect the new
port, e.g. add a flow to recognize that packets destined to the new
port's MAC address should be delivered to it, and update the flow that
delivers broadcast and multicast packets to include the new port.
delivers broadcast and multicast packets to include the new port. It
also creates a record in the <code>Bindings</code> table and populates
all its columns except the column that identifies the
<code>chassis</code>.
</li>
<li>
@@ -316,24 +326,25 @@
notices <code>external-ids</code>:<code>iface-id</code> in the new
Interface. In response, it updates the local hypervisor's OpenFlow
tables so that packets to and from the VIF are properly handled.
Afterward, it updates the <code>Bindings</code> table in the OVN DB,
adding a row that links the logical port from
<code>external-ids</code>:<code>iface-id</code> to the hypervisor.
Afterward, in the OVN DB, it updates the <code>Bindings</code> table's
<code>chassis</code> column for the row that links the logical port
from <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
</li>
<li>
Some CMS systems, including OpenStack, fully start a VM only when its
networking is ready. To support this, <code>ovn-nbd</code> notices the
new row in the <code>Bindings</code> table, and pushes this upward by
updating the <ref column="up" table="Logical_Port" db="OVN_NB"/> column
in the OVN Northbound database's <ref table="Logical_Port" db="OVN_NB"/>
table to indicate that the VIF is now up. The CMS, if it uses this
feature, can then react by allowing the VM's execution to proceed.
<code>chassis</code> column updated for the row in <code>Bindings</code>
table and pushes this upward by updating the <ref column="up"
table="Logical_Port" db="OVN_NB"/> column in the OVN Northbound
database's <ref table="Logical_Port" db="OVN_NB"/> table to indicate
that the VIF is now up. The CMS, if it uses this feature, can then
react by allowing the VM's execution to proceed.
</li>
<li>
On every hypervisor but the one where the VIF resides,
<code>ovn-controller</code> notices the new row in the
<code>ovn-controller</code> notices the completely populated row in the
<code>Bindings</code> table. This provides <code>ovn-controller</code>
the physical location of the logical port, so each instance updates the
OpenFlow tables of its switch (based on logical datapath flows in the OVN
@@ -350,16 +361,16 @@
<li>
On the hypervisor where the VM was powered on,
<code>ovn-controller</code> notices that the VIF was deleted. In
response, it removes the logical port's row from the
<code>Bindings</code> table.
response, it removes the <code>Chassis</code> column content in the
<code>Bindings</code> table for the logical port.
</li>
<li>
On every hypervisor, <code>ovn-controller</code> notices the row removed
from the <code>Bindings</code> table. This means that
<code>ovn-controller</code> no longer knows the physical location of the
logical port, so each instance updates its OpenFlow table to reflect
that.
On every hypervisor, <code>ovn-controller</code> notices the empty
<code>Chassis</code> column in the <code>Bindings</code> table's row
for the logical port. This means that <code>ovn-controller</code> no
longer knows the physical location of the logical port, so each instance
updates its OpenFlow table to reflect that.
</li>
<li>
@@ -376,8 +387,8 @@
<li>
<code>ovn-nbd</code> receives the OVN Northbound update and in turn
updates the OVN database accordingly, by removing or updating the
rows from the OVN database <code>Pipeline</code> table that were related
to the now-destroyed VIF.
rows from the OVN database <code>Pipeline</code> table and
<code>Bindings</code> table that were related to the now-destroyed VIF.
</li>
<li>
@@ -390,4 +401,137 @@
</li>
</ol>
<h2>Life Cycle of a container interface inside a VM</h2>
<p>
OVN provides virtual network abstractions by converting information
written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure
virtual networking for multiple tenants can only be provided if the OVN controller
is the only entity that can modify flows in Open vSwitch. When the
Open vSwitch integration bridge resides in the hypervisor, it is fair to
assume that tenant workloads running inside VMs cannot
make any changes to Open vSwitch flows.
</p>
<p>
If the infrastructure provider trusts the applications inside the
containers not to break out and modify the Open vSwitch flows, then
containers can be run in hypervisors. This is also the case when
containers are run inside the VMs and Open vSwitch integration bridge
with flows added by OVN controller resides in the same VM. For both
the above cases, the workflow is the same as explained with an example
in the previous section ("Life Cycle of a VIF").
</p>
<p>
This section talks about the life cycle of a container interface (CIF)
when containers are created in the VMs and the Open vSwitch integration
bridge resides inside the hypervisor. In this case, even if a container
application breaks out, other tenants are not affected because the
containers running inside the VMs cannot modify the flows in the
Open vSwitch integration bridge.
</p>
<p>
When multiple containers are created inside a VM, there are multiple
CIFs associated with them. The network traffic associated with these
CIFs needs to reach the Open vSwitch integration bridge running in the
hypervisor for OVN to support virtual network abstractions. OVN should
also be able to distinguish network traffic coming from different CIFs.
There are two ways to distinguish network traffic of CIFs.
</p>
<p>
One way is to provide one VIF for every CIF (1:1 model). This means that
there could be a lot of network devices in the hypervisor. This would slow
down OVS because of all the additional CPU cycles needed for the management
of all the VIFs. It would also mean that the entity creating the
containers in a VM should also be able to create the corresponding VIFs in
the hypervisor.
</p>
<p>
The second way is to provide a single VIF for all the CIFs (1:many model).
OVN could then distinguish network traffic coming from different CIFs via
a tag written in every packet. OVN uses this mechanism and uses VLAN as
the tagging mechanism.
</p>
<ol>
<li>
A CIF's life cycle begins when a container is spawned inside a VM by
either the same CMS that created the VM, a tenant that owns that VM,
or even a container orchestration system that is different from the CMS
that initially created the VM. Whoever the entity is, it will need to
know the <var>vif-id</var> that is associated with the network interface
of the VM through which the container interface's network traffic is
expected to go. The entity that creates the container interface
will also need to choose an unused VLAN inside that VM.
</li>
<li>
The container spawning entity (either directly or through the CMS that
manages the underlying infrastructure) updates the OVN Northbound
database to include the new CIF, by adding a row to the
<code>Logical_Port</code> table. In the new row, <code>name</code> is
any unique identifier, <code>parent_name</code> is the <var>vif-id</var>
of the VM through which the CIF's network traffic is expected to go,
and <code>tag</code> is the VLAN tag that identifies the
network traffic of that CIF.
</li>
<li>
<code>ovn-nbd</code> receives the OVN Northbound database update. In
turn, it makes the corresponding updates to the OVN database, by adding
rows to the OVN database's <code>Pipeline</code> table to reflect the new
port and also by creating a new row in the <code>Bindings</code> table
and populating all its columns except the column that identifies the
<code>chassis</code>.
</li>
<li>
On every hypervisor, <code>ovn-controller</code> subscribes to the
changes in the <code>Bindings</code> table. When a new row is created
by <code>ovn-nbd</code> that includes a value in the <code>parent_port</code>
column of the <code>Bindings</code> table, the <code>ovn-controller</code>
in the hypervisor whose OVN integration bridge has that same value in
<var>vif-id</var> in <code>external-ids</code>:<code>iface-id</code>
updates the local hypervisor's OpenFlow tables so that packets to and
from the VIF with the particular VLAN <code>tag</code> are properly
handled. Afterward it updates the <code>chassis</code> column of
the <code>Bindings</code> table to reflect the physical location.
</li>
<li>
One can only start the application inside the container after the
underlying network is ready. To support this, <code>ovn-nbd</code>
notices the updated <code>chassis</code> column in the <code>Bindings</code>
table and updates the <ref column="up" table="Logical_Port"
db="OVN_NB"/> column in the OVN Northbound database's
<ref table="Logical_Port" db="OVN_NB"/> table to indicate that the
CIF is now up. The entity responsible for starting the container application
queries this value and starts the application.
</li>
<li>
Eventually the entity that created and started the container stops it.
The entity, through the CMS (or directly), deletes its row in the
<code>Logical_Port</code> table.
</li>
<li>
<code>ovn-nbd</code> receives the OVN Northbound update and in turn
updates the OVN database accordingly, by removing or updating the
rows from the OVN database <code>Pipeline</code> table that were related
to the now-destroyed CIF. It also deletes the row in the
<code>Bindings</code> table for that CIF.
</li>
<li>
On every hypervisor, <code>ovn-controller</code> receives the
<code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
previous step. <code>ovn-controller</code> updates OpenFlow tables to
reflect the update.
</li>
</ol>
</manpage>
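The container interface life cycle added above can be traced with a small in-memory model. This is only a sketch for following the steps, not how `ovn-nbd` or `ovn-controller` are implemented: the class `OvnModel`, its method names, and the use of plain dictionaries in place of OVSDB tables are all assumptions made for illustration.

```python
class OvnModel:
    """Toy stand-in for the Logical_Port (northbound) and Bindings tables."""

    def __init__(self):
        self.logical_ports = {}  # name -> Logical_Port-like row
        self.bindings = {}       # logical_port -> Bindings-like row

    def add_logical_port(self, name, parent_name=None, tag=None):
        # The container-spawning entity (or the CMS) adds a Logical_Port row.
        self.logical_ports[name] = {
            "name": name, "parent_name": parent_name, "tag": tag, "up": False}

    def delete_logical_port(self, name):
        # The entity that stopped the container deletes the row.
        del self.logical_ports[name]

    def nbd_sync_bindings(self):
        # ovn-nbd creates a Bindings row per logical port and fills in
        # every column except chassis; it also removes stale rows.
        for name, port in self.logical_ports.items():
            self.bindings.setdefault(name, {
                "logical_port": name, "parent_port": port["parent_name"],
                "tag": port["tag"], "chassis": ""})
        for name in list(self.bindings):
            if name not in self.logical_ports:
                del self.bindings[name]

    def controller_claim(self, chassis, local_iface_ids):
        # ovn-controller sets chassis on rows whose port (or parent port)
        # matches an external-ids:iface-id present on its hypervisor.
        for row in self.bindings.values():
            key = row["parent_port"] or row["logical_port"]
            if key in local_iface_ids:
                row["chassis"] = chassis

    def nbd_sync_up(self):
        # ovn-nbd reflects a populated chassis column as up=true.
        for name, row in self.bindings.items():
            if name in self.logical_ports:
                self.logical_ports[name]["up"] = bool(row["chassis"])


if __name__ == "__main__":
    model = OvnModel()
    model.add_logical_port("vm-a-vif")
    model.add_logical_port("cif-1", parent_name="vm-a-vif", tag=42)
    model.nbd_sync_bindings()
    model.controller_claim("hv1", {"vm-a-vif"})
    model.nbd_sync_up()
    print(model.logical_ports["cif-1"])
```

The container's row is claimed through its parent: the hypervisor never sees an `iface-id` for `cif-1`, only for `vm-a-vif`.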


@@ -17,6 +17,12 @@
"refTable": "Logical_Switch",
"refType": "strong"}}},
"name": {"type": "string"},
"parent_name": {"type": {"key": "string", "min": 0, "max": 1}},
"tag": {
"type": {"key": {"type": "integer",
"minInteger": 0,
"maxInteger": 4095},
"min": 0, "max": 1}},
"macs": {"type": {"key": "string",
"min": 0,
"max": "unlimited"}},


@@ -73,12 +73,46 @@
</column>
<column name="name">
The logical port name. The name used here must match those used in the
<p>
The logical port name.
</p>
<p>
For entities (VMs or containers) that are spawned in the hypervisor,
the name used here must match the one used in the <ref key="iface-id"
table="Interface" column="external_ids" db="Open_vSwitch"/> in the
<ref db="Open_vSwitch"/> database's <ref table="Interface"
db="Open_vSwitch"/> table, because hypervisors use <ref key="iface-id"
table="Interface" column="external_ids" db="Open_vSwitch"/> as a lookup
key to identify the network interface of that entity.
</p>
<p>
For containers that are spawned inside a VM, the name can be
any unique identifier. In such a case, <ref column="parent_name"/>
must be populated.
</p>
</column>
<column name="parent_name">
When <ref column="name"/> identifies the interface of a container
spawned inside a tenant VM, this column represents the VM interface
through which the container interface sends its network traffic.
The name used here must match the one used in the <ref key="iface-id"
table="Interface" column="external_ids" db="Open_vSwitch"/> in the
<ref db="Open_vSwitch"/> database's <ref table="Interface"
db="Open_vSwitch"/> table, because hypervisors in this case use
<ref key="iface-id" table="Interface" column="external_ids"
db="Open_vSwitch"/> in the <ref db="Open_vSwitch"/> database's <ref
table="Interface" db="Open_vSwitch"/> table, because hypervisors use <ref
key="iface-id" table="Interface" column="external_ids"
db="Open_vSwitch"/> as a lookup key for logical ports.
db="Open_vSwitch"/> as a lookup key to identify the network interface
of the tenant VM.
</column>
<column name="tag">
When <ref column="name"/> identifies the interface of a container
spawned inside a tenant VM, this column identifies the VLAN tag in
the network traffic associated with that container's network interface.
When there are multiple container interfaces inside a VM, all of
them send their network traffic through a single VM network interface and
this value helps OVN identify the correct container interface.
</column>
<column name="up">
@@ -87,8 +121,9 @@
physical location in the OVN database <ref db="OVN" table="Bindings"/>
table, <code>ovn-nbd</code> sets this column to <code>true</code>;
otherwise, or if the port becomes unbound later, it sets it to
<code>false</code>. This allows the CMS to wait for a VM's networking to
become active before it allows the VM to start.
<code>false</code>. This allows the CMS to wait for a VM's
(or container's) networking to become active before it allows the
VM (or container) to start.
</column>
<column name="macs">
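The constraints on the two new `Logical_Port` columns can be restated as a small check. This is a sketch over plain dictionaries rather than real OVSDB rows, and `check_logical_port` is a hypothetical helper. The type and range checks follow the schema hunk above (both columns optional, `tag` an integer from 0 to 4095). The rule that `parent_name` and `tag` appear together is an assumption drawn from the column descriptions and is not enforced by the schema itself.

```python
def check_logical_port(row):
    """Return a list of problems found in a Logical_Port-like dict."""
    problems = []
    parent = row.get("parent_name")
    tag = row.get("tag")

    if parent is not None and not isinstance(parent, str):
        problems.append("parent_name must be a string")

    if tag is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(tag, bool) or not isinstance(tag, int):
            problems.append("tag must be an integer")
        elif not 0 <= tag <= 4095:
            problems.append("tag must be in 0..4095")

    # Assumption from the prose, not from the schema: a container inside
    # a VM needs both its parent VIF and its VLAN tag to be identified.
    if (parent is None) != (tag is None):
        problems.append("parent_name and tag should be set together")
    return problems


if __name__ == "__main__":
    print(check_logical_port({"name": "vm-a-vif"}))
    print(check_logical_port(
        {"name": "cif-1", "parent_name": "vm-a-vif", "tag": 42}))
    print(check_logical_port(
        {"name": "cif-2", "parent_name": "vm-a-vif", "tag": 4096}))
```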


@@ -45,6 +45,12 @@
"Bindings": {
"columns": {
"logical_port": {"type": "string"},
"parent_port": {"type": {"key": "string", "min": 0, "max": 1}},
"tag": {
"type": {"key": {"type": "integer",
"minInteger": 0,
"maxInteger": 4095},
"min": 0, "max": 1}},
"chassis": {"type": "string"},
"mac": {"type": {"key": "string",
"min": 0,


@@ -467,34 +467,64 @@
<table name="Bindings" title="Physical-Logical Bindings">
<p>
Each row in this table identifies the physical location of a logical
port. Each hypervisor, via <code>ovn-controller</code>, populates this
table with rows for the logical ports that are located on its hypervisor,
which <code>ovn-controller</code> in turn finds out by monitoring the
local hypervisor's Open_vSwitch database, which identifies logical ports
via the conventions described in <code>IntegrationGuide.md</code>.
port.
</p>
<p>
When a chassis shuts down gracefully, it should remove its bindings.
For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
database, <code>ovn-nbd</code> creates a record in this table.
<code>ovn-nbd</code> is responsible for populating every column except
the <code>chassis</code> column.
</p>
<p>
<code>ovn-controller</code> populates the <code>chassis</code> column
for the records that identify the logical ports that are located on its
hypervisor, which <code>ovn-controller</code> in turn finds out by
monitoring the local hypervisor's Open_vSwitch database, which
identifies logical ports via the conventions described in
<code>IntegrationGuide.md</code>.
</p>
<p>
When a chassis shuts down gracefully, it should clean up the
<code>chassis</code> column that it had previously populated.
(This is not critical because resources hosted on the chassis are equally
unreachable regardless of whether their rows are present.) To handle the
case where a VM is shut down abruptly on one chassis, then brought up
again on a different one, <code>ovn-controller</code> must delete any
existing <ref table="Binding"/> record for a logical port when it adds a
new one.
again on a different one, <code>ovn-controller</code> must overwrite the
<code>chassis</code> column with new information.
</p>
<column name="logical_port">
A logical port, taken from <ref key="iface-id" table="Interface"
column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch database's
<ref table="Interface" db="Open_vSwitch"/> table. OVN does not prescribe
a particular format for the logical port ID.
A logical port, taken from <ref table="Logical_Port" column="name"
db="OVN_Northbound"/> in the OVN_Northbound database's
<ref table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
prescribe a particular format for the logical port ID.
</column>
<column name="parent_port">
For containers created inside a VM, this is taken from
<ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
in the OVN_Northbound database's <ref table="Logical_Port"
db="OVN_Northbound"/> table. It is left empty if
<ref column="logical_port"/> belongs to a VM or a container created
in the hypervisor.
</column>
<column name="tag">
When <ref column="logical_port"/> identifies the interface of a container
spawned inside a VM, this column identifies the VLAN tag in
the network traffic associated with that container's network interface.
It is left empty if <ref column="logical_port"/> belongs to a VM or a
container created in the hypervisor.
</column>
<column name="chassis">
The physical location of the logical port. To successfully identify a
chassis, this column must match the <ref table="Chassis" column="name"/>
column in some row in the <ref table="Chassis"/> table.
column in some row in the <ref table="Chassis"/> table. This is
populated by <code>ovn-controller</code>.
</column>
<column name="mac">
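The `parent_port` and `tag` columns described above are what let a hypervisor tell apart traffic from several containers that share one VM interface. The sketch below shows that lookup in isolation, assuming Bindings rows are available as dictionaries. The helper names `build_demux_table` and `classify` are hypothetical, and the real data path does this with OpenFlow flows rather than a Python dictionary.

```python
def build_demux_table(bindings):
    """Map (VIF iface-id, VLAN tag or None) to a logical port name."""
    table = {}
    for row in bindings:
        parent = row.get("parent_port")
        if parent is None:
            # A VM, or a container created directly in the hypervisor:
            # its traffic arrives untagged on its own VIF.
            table[(row["logical_port"], None)] = row["logical_port"]
        else:
            # A container inside a VM: its traffic arrives on the parent
            # VIF carrying the VLAN tag chosen for that container.
            table[(parent, row["tag"])] = row["logical_port"]
    return table


def classify(table, iface_id, vlan_tag=None):
    """Return the logical port for a packet, or None if it is unknown."""
    return table.get((iface_id, vlan_tag))


if __name__ == "__main__":
    rows = [
        {"logical_port": "vm-a-vif", "parent_port": None, "tag": None},
        {"logical_port": "cif-1", "parent_port": "vm-a-vif", "tag": 42},
        {"logical_port": "cif-2", "parent_port": "vm-a-vif", "tag": 43},
    ]
    table = build_demux_table(rows)
    print(classify(table, "vm-a-vif", 42))
    print(classify(table, "vm-a-vif"))
```

Keying on the pair rather than on the tag alone reflects that a tag only needs to be unique inside one VM, so two VMs may reuse the same tag for different containers.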