
datapath-windows: Update DESIGN document.

In this patch, we update the design document to reflect the netlink-based
kernel-userspace interface implementation and a few other changes. These
changes are covered at a high level.

Please feel free to extend the document with any details that you think
have been missed.

Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Sorin Vinturis <svinturis@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit is contained in:
Nithin Raju
2014-11-25 09:06:43 -08:00
committed by Ben Pfaff
parent 5dd9826c9d
commit 36013bb157


OVS-on-Hyper-V Design Document
==============================
There has been a community effort to develop Open vSwitch on Microsoft Hyper-V.
This document provides details of the development effort. We believe this
document should give enough information to members of the community who are
curious about the developments of OVS on Hyper-V. The community should also be
able to get enough information to make plans to leverage the deliverables of
this effort.
The userspace portion of the OVS has been ported to Hyper-V in a separate
effort, and committed to the openvswitch repo. So, this document will mostly
emphasize the kernel driver, though we touch upon some of the aspects of
userspace as well.
We cover the following topics:
1. Background on relevant Hyper-V architecture
In Hyper-V, the virtual machine is called the Child Partition. Each VIF or
physical NIC on the Hyper-V extensible switch is attached via a port. Each port
is both on the ingress path or the egress path of the switch. The ingress path
is used for packets being sent out of a port, and egress is used for packet
being received on a port. By design, NDIS provides a layered interface. In this
layered interface, higher level layers call into lower level layers in the
ingress path. In the egress path, it is the other way round. In addition, there
is an object identifier (OID) interface for control operations, e.g. addition
of a port. The workflow for the calls is similar in nature to the packets,
where higher level layers call into the lower level layers. A good
representational diagram of this architecture is in [4].
Windows Filtering Platform (WFP)[5] is a platform implemented on Hyper-V that
provides APIs and services for filtering packets. WFP has been utilized to
has been used to retrieve some of the configuration information that OVS needs.
| |
+------+ +--------------+ | +-----------+ +------------+ |
| | | | | | | | | |
| ovs- | | OVS- | | | Virtual | | Virtual | |
| *ctl | | USERSPACE | | | Machine #1| | Machine #2 | |
| | | DAEMON | | | | | | |
+------+-++---+---------+ | +--+------+-+ +----+------++ | +--------+
| dpif- | | netdev- | | |VIF #1| |VIF #2| | |Physical|
| netlink | | windows | | +------+ +------+ | | NIC |
+---------+ +---------+ | || /\ | +--------+
User /\ /\ | || *#1* *#4* || | /\
=========||=========||============+------||-------------------||--+ ||
Kernel || || \/ || ||=====/
\/ \/ +-----+ +-----+ *#5*
+-------------------------------+ | | | |
| +----------------------+ | | | | |
| | Netlink Impl. | | | | | |
| ----------------- | | I | | |
| +------------+ | | N | | E |
| | Flowtable | +------------+ | | G | | G |
| +------------+ | Packet | |*#2*| R | | R |
Figure 2 shows the various blocks involved in the OVS Windows implementation,
along with some of the components available in the NDIS stack, and also the
virtual machines. The workflow of a packet being transmitted from a VIF out and
into another VIF and to a physical NIC is also shown. Later on in this section,
we will discuss the flow of a packet at a high level.
The figure gives a general idea of where the OVS userspace and the kernel
components fit in, and how they interface with each other.
a forwarding extension roughly implementing the following
sub-modules/functionality. Details of each of these sub-components in the
kernel are contained in later sections:
* Interfacing with the NDIS stack
* Netlink message parser
* Netlink sockets
* Switch/Datapath management
* Interfacing with userspace portion of the OVS solution to implement the
necessary functionality that userspace needs
* Port management
* Flowtable/Actions/packet forwarding
* Tunneling
are:
* Interface between the userspace and the kernel module.
* Event notifications are significantly different.
* The communication interface between DPIF and the kernel module need not be
implemented in the way OVS on Linux does. That said, it would be
advantageous to have a similar interface to the kernel module for reasons of
readability and maintainability.
* Any licensing issues of using Linux kernel code directly.
Due to these differences, it was a straightforward decision to develop the
datapath for OVS on Hyper-V from scratch rather than porting the one on Linux.
A re-development focused on the following goals:
* Adhere to the existing requirements of userspace portion of OVS (such as
ovs-vswitchd), to minimize changes in the userspace workflow.
* Fit well into the typical workflow of a Hyper-V extensible switch forwarding
extension.
The userspace portion of the OVS solution is mostly POSIX code, and not very
Linux specific. The majority of the userspace code does not interface directly
with the kernel datapath and was ported independently of the kernel datapath
effort.
As explained in the OVS porting design document [7], DPIF is the portion of
userspace that interfaces with the kernel portion of the OVS. The interface
that each DPIF provider has to implement is defined in dpif-provider.h [3].
Though each platform is allowed to have its own implementation of the DPIF
provider, community feedback showed that it is desirable to share code whenever
possible. Thus, the DPIF provider for OVS on Hyper-V shares code with the DPIF
provider on Linux. This interface is implemented in dpif-netlink.c, formerly
dpif-linux.c.

We will elaborate more on the kernel-userspace interface in a dedicated section
below. Here it suffices to say that the DPIF provider implementation for
Windows is netlink-based and shares code with the Linux one.
2.a) Kernel module (datapath)
-----------------------------
This is consistent with using a single datapath in the kernel on Linux. All the
physical adapters are connected as external adapters to the extensible switch.
When the OVS switch extension registers itself as a filter driver, it also
registers callbacks for the switch/port management and datapath functions. In
other words, when a switch is created on the Hyper-V root partition (host), the
extension gets an activate callback upon which it can initialize the data
structures necessary for OVS to function. Similarly, there are callbacks for
when a port gets added to the Hyper-V switch, and an External Network adapter
packet is received on an external NIC.
As shown in the figures, an extensible switch extension gets to see a packet
sent by the VM (VIF) twice - once on the ingress path and once on the egress
path. Forwarding decisions are to be made on the ingress path. Correspondingly,
we will be hooking onto the following interfaces:
* Ingress send indication: intercept packets for performing flow based
forwarding. This includes straight forwarding to output ports. Any packet
modifications that need to be performed are done here either inline or by
Interfacing with OVS userspace
------------------------------
We have implemented a pseudo device interface for letting OVS userspace talk to
the OVS kernel module. This is equivalent to the typical character device
interface on POSIX platforms, where we can register custom functions for read,
write and ioctl functionality. The pseudo device supports a number of ioctls
that netdev and DPIF on OVS userspace make use of.
Netlink message parser
----------------------
The communication between OVS userspace and OVS kernel datapath is in the form
of Netlink messages [8]. More details about this are provided in section 2.c,
the kernel-userspace interface. In the kernel, a full-fledged netlink message
parser has been implemented along the lines of the netlink message parser in
OVS userspace. In fact, a lot of the code is ported code.

Along the lines of 'struct ofpbuf' in OVS userspace, a managed buffer has been
implemented in the kernel datapath to make it easier to parse and construct
netlink messages.
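To illustrate the idea, here is a minimal, hypothetical C sketch of such a
managed buffer appending and looking up netlink-style TLV attributes. The
attribute layout (16-bit length including a 4-byte header, 16-bit type, 4-byte
alignment) follows netlink conventions; the names and the fixed-size buffer
are illustrative, not the actual datapath code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NLA_HDR_SIZE 4
#define NLA_ALIGN(n) (((n) + 3) & ~3)   /* attributes are 4-byte aligned */

/* Illustrative managed buffer; overflow checks omitted for brevity. */
struct nlbuf {
    uint8_t data[256];
    size_t used;
};

/* Append one attribute: 16-bit total length (header + payload), then
 * 16-bit type, then the payload itself. */
static void nl_put_attr(struct nlbuf *b, uint16_t type,
                        const void *payload, uint16_t len)
{
    uint16_t total = NLA_HDR_SIZE + len;
    memcpy(b->data + b->used, &total, 2);
    memcpy(b->data + b->used + 2, &type, 2);
    memcpy(b->data + b->used + NLA_HDR_SIZE, payload, len);
    b->used += NLA_ALIGN(total);
}

/* Linear scan for the first attribute of the given type; returns a
 * pointer to its payload, or NULL if absent. */
static const void *nl_find_attr(const struct nlbuf *b, uint16_t type)
{
    size_t off = 0;
    while (off + NLA_HDR_SIZE <= b->used) {
        uint16_t total, t;
        memcpy(&total, b->data + off, 2);
        memcpy(&t, b->data + off + 2, 2);
        if (t == type) {
            return b->data + off + NLA_HDR_SIZE;
        }
        off += NLA_ALIGN(total);
    }
    return 0;
}
```

The length-prefixed layout is what makes construction and parsing symmetric:
the same walk that builds the message can validate it on the other side.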
Netlink sockets
---------------
On Linux, OVS userspace utilizes netlink sockets to pass netlink messages back
and forth. Since much of the userspace code, including the DPIF provider in
dpif-netlink.c (formerly dpif-linux.c), has been reused, pseudo-netlink sockets
have been implemented in OVS userspace. Windows lacks native netlink socket
support, and the socket family is not extensible either. Hence it is not
possible to provide a native implementation of netlink sockets. We emulate
netlink sockets in lib/netlink-socket.c and support all of the nl_* APIs to
higher levels. The implementation opens a handle to the pseudo device for each
netlink socket. Some more details on this topic are provided in the userspace
section on netlink sockets.
Typical netlink semantics of read message, write message, dump, and transaction
have been implemented so that higher level layers are not affected by the
netlink implementation not being native.
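A hypothetical sketch of the transact semantics: the emulated socket writes a
request tagged with a sequence number to the device handle, then reads until
the reply carrying the matching sequence arrives. The stub "device" below
stands in for the kernel and simply acks each request; all names are
illustrative, not the real lib/netlink-socket.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal message; real netlink messages carry a full header and payload. */
struct msg { uint16_t type; uint32_t seq; int32_t error; };
enum { MSG_REPLY = 1, MSG_ERROR = 2 };

/* Stub pseudo-device: acks every request. A real emulated socket would
 * hold an open handle to the OVS pseudo device instead. */
static struct msg dev_queue[4];
static int dev_head, dev_tail;

static void dev_write(const struct msg *req)
{
    /* The stub "kernel" acks with a zero-error MSG_ERROR (success). */
    struct msg ack = { MSG_ERROR, req->seq, 0 };
    dev_queue[dev_tail++ % 4] = ack;
}

static int dev_read(struct msg *out)
{
    if (dev_head == dev_tail) {
        return -1;   /* nothing queued */
    }
    *out = dev_queue[dev_head++ % 4];
    return 0;
}

/* Transact: send one request, read replies until the sequence number
 * matches, and surface the kernel's error code to the caller. */
static int nl_transact(struct msg *req, uint32_t seq)
{
    struct msg reply;
    req->seq = seq;
    dev_write(req);
    while (dev_read(&reply) == 0) {
        if (reply.seq == seq) {
            return reply.type == MSG_ERROR ? reply.error : 0;
        }
    }
    return -1;   /* no matching reply */
}
```

Matching replies to requests by sequence number is what lets read, write,
dump and transact all share one underlying device handle.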
Switch/Datapath management
--------------------------
As explained above, we hook onto the management callback functions in the NDIS
Port management
---------------
As explained above, we hook onto the management callback functions in the NDIS
interface to know when a port is added/connected to the Hyper-V switch. We use
these callbacks to initialize the port related data structures in OVS. Also,
some of the ports are tunnel ports that don't exist on the Hyper-V switch and
get added from OVS userspace.
In order to identify a Hyper-V port, we use the value of 'FriendlyName' field
in each Hyper-V port. We call this the "OVS-port-name". The idea is that OVS
userspace sets 'OVS-port-name' in each Hyper-V port to the same value as the
'name' field of the 'Interface' table in OVSDB. When OVS userspace calls into
the kernel datapath to add a port, we match the name of the port with the
'OVS-port-name' of a Hyper-V port.
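As an illustration, the match can be thought of as a lookup of the
userspace-supplied port name against the 'OVS-port-name' recorded for each
Hyper-V port. This hypothetical C sketch uses a flat array for clarity; the
driver itself keeps ports in hash tables, and the names here are made up.

```c
#include <assert.h>
#include <string.h>

/* Illustrative port record; the real driver uses hash tables. */
struct hv_port {
    char ovs_port_name[64];   /* copied from the Hyper-V 'FriendlyName' */
    unsigned long port_no;
};

/* Return the Hyper-V port whose OVS-port-name equals the name that
 * userspace passed in the add-port request, or NULL if none matches. */
static struct hv_port *lookup_port(struct hv_port *ports, int n,
                                   const char *name)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(ports[i].ovs_port_name, name) == 0) {
            return &ports[i];
        }
    }
    return 0;
}
```

The add-port path succeeds only when a Hyper-V port with the requested name
already exists, which is why userspace must set 'OVS-port-name' first.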
We maintain separate hash tables, and separate counters for ports that have
been added from the Hyper-V switch, and for ports that have been added from OVS
userspace.
Flowtable/Actions/packet forwarding
-----------------------------------
2.b) Userspace components
-------------------------
The userspace portion of the OVS solution is mostly POSIX code, and not very
Linux specific. The majority of the userspace code does not interface directly
with the kernel datapath and was ported independently of the kernel datapath
effort.
In this section, we cover the userspace components that interface with the
kernel datapath.
As explained earlier, OVS on Hyper-V shares the DPIF provider implementation
with Linux. The DPIF provider on Linux uses netlink sockets and netlink
messages. Netlink sockets and messages are extensively used on Linux to
exchange information between userspace and kernel. In order to satisfy these
dependencies, netlink sockets (pseudo and non-native) and netlink messages
are implemented on Hyper-V.
The following are the major advantages of sharing DPIF provider code:
1. Maintenance is simpler:
Any change made to the interface defined in dpif-provider.h need not be
propagated to multiple implementations. Also, developers familiar with the
Linux implementation of the DPIF provider can easily ramp up on the Hyper-V
implementation as well.
2. Netlink messages provide inherent advantages:
Netlink messages are known for their extensibility. Each message is
versioned, so the provided data structures offer a mechanism to perform
version checking and forward/backward compatibility with the kernel
module.
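The extensibility argument can be made concrete: because every attribute is
length-prefixed, a receiver built against an older attribute set can step over
types it does not recognize. A hypothetical sketch, following netlink TLV
conventions but not taken from the actual OVS code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN4(n) (((n) + 3u) & ~3u)

/* Walk a TLV attribute stream, summing the 32-bit payloads of the one
 * attribute type we know about, and skipping every unknown type by its
 * declared length. A newer sender can thus add attributes freely. */
static uint32_t sum_known(const uint8_t *buf, size_t len, uint16_t known)
{
    uint32_t sum = 0;
    size_t off = 0;
    while (off + 4 <= len) {
        uint16_t total, type;
        memcpy(&total, buf + off, 2);   /* length includes 4-byte header */
        memcpy(&type, buf + off + 2, 2);
        if (type == known && total == 8) {
            uint32_t v;
            memcpy(&v, buf + off + 4, 4);
            sum += v;
        }
        off += ALIGN4(total);           /* unknown types are skipped */
    }
    return sum;
}
```

This skip-by-length behavior is what gives netlink-based interfaces their
forward/backward compatibility between userspace and the kernel module.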
Netlink sockets
---------------
As explained in other sections, an emulation of netlink sockets has been
implemented in lib/netlink-socket.c for Windows. The implementation creates a
handle to the OVS pseudo device, and emulates netlink socket semantics of
receive message, send message, dump, and transact. Most of the nl_* functions
are supported.
The fact that the implementation is non-native manifests in various ways.
One example is that PID for the netlink socket is not automatically assigned in
userspace when a handle is created to the OVS pseudo device. There's an extra
command (defined in OvsDpInterfaceExt.h) that is used to grab the PID generated
in the kernel.
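A hypothetical sketch of that extra step: opening the device yields a handle
but no PID, so the socket wrapper issues an explicit get-PID command and
records the kernel-assigned value. The command and the stub "kernel" below are
illustrative, not the real OvsDpInterfaceExt.h definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stub "kernel": hands out a fresh PID per request, standing in for the
 * driver's handling of the get-PID device command. */
static uint32_t next_pid = 100;
static uint32_t kernel_cmd_get_pid(void)
{
    return next_pid++;
}

struct nl_sock {
    int dev_handle;   /* handle to the pseudo device (stubbed as an int) */
    uint32_t pid;     /* kernel-assigned netlink PID */
};

/* Emulated socket creation: open the device, then fetch the PID with an
 * explicit command rather than assigning it in userspace. */
static void nl_sock_create(struct nl_sock *s, int handle)
{
    s->dev_handle = handle;
    s->pid = kernel_cmd_get_pid();
}
```

Letting the kernel assign the PID keeps the value unique across every handle
open on the pseudo device, which userspace could not guarantee on its own.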
DPIF provider
--------------
As has been mentioned in earlier sections, the netlink socket and netlink
message based DPIF provider on Linux has been ported to Windows.
Correspondingly, the file is now called lib/dpif-netlink.c, renamed from its
former name of lib/dpif-linux.c.
Most of the code is common. Some divergence is in the code to receive
packets: the Linux implementation uses epoll() [9], which is not natively
supported on Windows.
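For illustration, the readiness-wait that the Linux receive path relies on
looks like the following, shown here with the simpler POSIX poll() rather
than epoll itself; on Windows, the equivalent has to be built on native
event objects instead. This is a sketch, not the dpif-netlink.c code.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Block until fd is readable or the timeout (in ms) expires; returns 1 if
 * readable, 0 otherwise. epoll generalizes this to many fds efficiently,
 * which is what the Linux upcall receive path takes advantage of. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN) != 0;
}
```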
Netdev-Windows
--------------
We have a Windows implementation of the interface defined in
lib/netdev-provider.h. The implementation provides functionality to get
extended information about an interface. It is limited in functionality
compared to the Linux implementation of the netdev provider and cannot be used
to add any interfaces in the kernel such as a tap interface or to send/receive
packets. The netdev-windows implementation uses the datapath interface
extensions defined in:
datapath-windows/include/OvsDpInterfaceExt.h
Powershell extensions to set "OVS-port-name"
--------------------------------------------
As explained in the section on "Port management", each Hyper-V port has a
'FriendlyName' field, which we call the "OVS-port-name" field. We have
implemented powershell command extensions to be able to set the "OVS-port-name"
of a Hyper-V port.
2.c) Kernel-Userspace interface
-------------------------------
openvswitch.h and OvsDpInterfaceExt.h
-------------------------------------
Since the DPIF provider is shared with Linux, the kernel datapath provides the
same interface as the Linux datapath. The interface is defined in
datapath/linux/compat/include/linux/openvswitch.h. Derivatives of this
interface file are created during OVS userspace compilation. The derivative for
the kernel datapath on Hyper-V is provided in the following location:
datapath-windows/include/OvsDpInterface.h
That said, there are Windows-specific extensions that are defined in the
interface file:
datapath-windows/include/OvsDpInterfaceExt.h
2.d) Flow of a packet
---------------------
Reference list:
===============
1. Hyper-V Extensible Switch
http://msdn.microsoft.com/en-us/library/windows/hardware/hh598161(v=vs.85).aspx
2. Hyper-V Extensible Switch Extensions
http://msdn.microsoft.com/en-us/library/windows/hardware/hh598169(v=vs.85).aspx
3. DPIF Provider
http://openvswitch.sourcearchive.com/documentation/1.1.0-1/dpif-
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366510(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/hardware/ff557015(v=vs.85).aspx
7. How to Port Open vSwitch to New Software or Hardware
http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=PORTING
8. Netlink
http://en.wikipedia.org/wiki/Netlink
9. epoll
http://en.wikipedia.org/wiki/Epoll