.. NOTE(stephenfin): If making changes to this file, ensure that the
   start-after/end-before lines found in 'Documentation/intro/what-is-ovs'
   are kept up-to-date.

============
Open vSwitch
============

.. image:: https://github.com/openvswitch/ovs/workflows/Build%20and%20Test/badge.svg
    :target: https://github.com/openvswitch/ovs/actions
.. image:: https://ci.appveyor.com/api/projects/status/github/openvswitch/ovs?branch=main&svg=true&retina=true
    :target: https://ci.appveyor.com/project/blp/ovs/history
.. image:: https://api.cirrus-ci.com/github/openvswitch/ovs.svg
    :target: https://cirrus-ci.com/github/openvswitch/ovs
.. image:: https://readthedocs.org/projects/openvswitch/badge/?version=latest
    :target: https://docs.openvswitch.org/en/latest/

What is Open vSwitch?
---------------------

Open vSwitch is a multilayer software switch licensed under the open source
Apache 2 license.  Our goal is to implement a production-quality switch
platform that supports standard management interfaces and opens the forwarding
functions to programmatic extension and control.

Open vSwitch is well suited to function as a virtual switch in VM environments.
In addition to exposing standard control and visibility interfaces to the
virtual networking layer, it was designed to support distribution across
multiple physical servers.  Open vSwitch supports multiple Linux-based
virtualization technologies, including KVM and VirtualBox.

The bulk of the code is written in platform-independent C and is easily ported
to other environments.  The current release of Open vSwitch supports the
following features:

- Standard 802.1Q VLAN model with trunk and access ports
- NIC bonding with or without LACP on the upstream switch
- NetFlow, sFlow(R), and mirroring for increased visibility
- QoS (Quality of Service) configuration, plus policing
- Geneve, GRE, VXLAN, ERSPAN, GTP-U, SRv6, and Bareudp tunneling
- 802.1ag connectivity fault management
- OpenFlow 1.0 plus numerous extensions
- Transactional configuration database with C and Python bindings
- High-performance forwarding using a Linux kernel module

Open vSwitch can also operate entirely in userspace without assistance from
a kernel module.  This userspace implementation should be easier to port than
the kernel-based switch.  OVS in userspace can access Linux or DPDK devices.
Note that Open vSwitch with the userspace datapath and non-DPDK devices is
considered experimental and comes with a performance cost.

What's here?
------------

The main components of this distribution are:

- ovs-vswitchd, a daemon that implements the switch, along with a companion
  Linux kernel module for flow-based switching.
- ovsdb-server, a lightweight database server that ovs-vswitchd queries to
  obtain its configuration.
- ovs-dpctl, a tool for configuring the switch kernel module.
- Scripts and specs for building RPMs for Red Hat Enterprise Linux and
  deb packages for Ubuntu/Debian.
- ovs-vsctl, a utility for querying and updating the configuration of
  ovs-vswitchd.
- ovs-appctl, a utility that sends commands to running Open vSwitch daemons.

Open vSwitch also provides some tools:

- ovs-ofctl, a utility for querying and controlling OpenFlow switches and
  controllers.
- ovs-pki, a utility for creating and managing the public-key infrastructure
  for OpenFlow switches.
- ovs-testcontroller, a simple OpenFlow controller that may be useful for
  testing (though not for production).
- A patch to tcpdump that enables it to parse OpenFlow messages.

What other documentation is available?
--------------------------------------

.. TODO(stephenfin): Update with a link to the hosting site of the docs, once
   we know where that is

To install Open vSwitch on a regular Linux or FreeBSD host, please read the
`installation guide <Documentation/intro/install/general.rst>`__.  For
platform-specific installation instructions, refer to one of the `other
installation guides <Documentation/intro/install/index.rst>`__.

For answers to common questions, refer to the `FAQ <Documentation/faq>`__.

To learn about some advanced features of the Open vSwitch software switch, read
the `tutorial <Documentation/tutorials/ovs-advanced.rst>`__.

Each Open vSwitch userspace program is accompanied by a manpage.  Many of the
manpages are customized to your configuration as part of the build process, so
we recommend building Open vSwitch before reading the manpages.

License
-------

The following is a summary of the licensing of files in this distribution.
As mentioned, Open vSwitch is licensed under the open source Apache 2 license.
Some files may be marked specifically with a different license, in which case
that license applies to the file in question.


Files under the datapath directory are licensed under the GNU General Public
License, version 2.

File build-aux/cccl is licensed under the GNU General Public License, version 2.

The following files are licensed under the 2-clause BSD license:

- include/windows/getopt.h
- lib/getopt_long.c
- lib/conntrack-tcp.c

The following files are licensed under the 3-clause BSD license:

- include/windows/netinet/icmp6.h
- include/windows/netinet/ip6.h
- lib/strsep.c

Files lib/sflow*.[ch] are licensed under the terms of either the
Sun Industry Standards Source License 1.1, that is available at:
        http://host-sflow.sourceforge.net/sissl.html
or the InMon sFlow License, that is available at:
        http://www.inmon.com/technology/sflowlicense.txt

Contact
-------

bugs@openvswitch.org