mirror of https://github.com/openvswitch/ovs synced 2025-08-22 01:51:26 +00:00
ovs/NEWS
Post-v3.0.0
--------------------
- ovs-vswitchd now detects changes in CPU affinity and adjusts the number
of handler and revalidator threads if necessary.
- AF_XDP:
* Added support for building with libxdp and libbpf >= 0.7.
* Support for AF_XDP is now enabled by default if all dependencies are
available at build time. Use --disable-afxdp to disable.
Use --enable-afxdp to fail the build if dependencies are not present.
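As a minimal sketch, the two build modes described above map to the following configure invocations (paths and other configure options omitted):

```shell
# Fail the build if AF_XDP dependencies (libxdp / libbpf) are missing.
./configure --enable-afxdp

# Build without AF_XDP support even if dependencies are present.
./configure --disable-afxdp
```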
- ovs-appctl:
* "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
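A hedged usage sketch; the bridge name and flow below are illustrative, and the exact option placement is per ovs-vswitchd(8):

```shell
# Trace a packet through the OpenFlow pipeline; with --names, ports in
# the output are printed by name rather than OpenFlow port number.
ovs-appctl ofproto/trace --names br0 in_port=eth1,tcp,nw_dst=10.0.0.2
```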
- OVSDB-IDL:
* Add the support to specify the persistent uuid for row insert in both
C and Python IDLs.
- Windows:
* Conntrack IPv6 fragment support.
- DPDK:
* Add support for DPDK 22.11.1.
- For the QoS max-rate and STP/RSTP path-cost configuration, OVS now assumes
10 Gbps link speed by default in case the actual link speed cannot be
determined. Previously it was 10 Mbps. Values can still be overridden
by specifying 'max-rate' or '[r]stp-path-cost' respectively.
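A sketch of overriding both values explicitly instead of relying on the link-speed based defaults; the port name and numbers are illustrative, and the exact columns are per ovs-vswitchd.conf.db(5):

```shell
# Pin the STP path cost of a port.
ovs-vsctl set Port eth0 other_config:stp-path-cost=100

# Pin the QoS max-rate (in bps) via a QoS record attached to the port.
ovs-vsctl set Port eth0 qos=@q -- \
    --id=@q create QoS type=linux-htb other-config:max-rate=1000000000
```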
- ovs-ctl:
* New option '--dump-hugepages' to include hugepages in core dumps. This
can assist with postmortem analysis involving DPDK, but may also produce
significantly larger core dump files.
- Support for travis-ci.org based continuous integration builds has been
dropped.
- Userspace datapath:
* Add '-secs' argument to appctl 'dpif-netdev/pmd-rxq-show' to show
the pmd usage of an Rx queue over a configurable time period.
* Add new experimental PMD load based sleeping feature. PMD threads can
request to sleep up to a user configured 'pmd-maxsleep' value under
low load conditions.
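The two userspace-datapath knobs above can be exercised as follows (a sketch; the sleep value, in microseconds, is illustrative):

```shell
# Show per-Rx-queue PMD usage measured over the last 5 seconds only.
ovs-appctl dpif-netdev/pmd-rxq-show -secs 5

# Experimental: allow PMD threads to sleep up to 500 us under low load.
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
```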
v3.0.0 - 15 Aug 2022
--------------------
- libopenvswitch API change:
* To fix the Undefined Behavior issue causing the compiler to incorrectly
optimize important parts of code, container iteration macros (e.g.,
LIST_FOR_EACH) have been re-implemented in a UB-safe way.
* Backwards compatibility has mostly been preserved; however, the
user-provided pointer is now set to NULL after the loop (unless it
exited via "break;").
* Users of libopenvswitch will need to double-check the use of such loop
macros before compiling with a new version.
* Since the change is limited to the definitions within the headers, the
ABI is not affected.
- OVSDB:
* 'relay' service model now supports transaction history, i.e. honors the
'last-txn-id' field in 'monitor_cond_since' requests from clients.
* New unixctl command 'ovsdb-server/tlog-set DB:TABLE on|off'.
If turned on, ovsdb-server will log (at level INFO and rate limited)
all operations that are committed to table TABLE in the DB database.
* New Local_Config schema added to support Connections (--remote)
configuration in a clustered database independently for each server.
E.g. for listening on unique addresses. See the ovsdb.local-config.5
manpage for schema details.
* Returning unused memory to the OS after the database compaction is now
enabled by default. Use 'ovsdb-server/memory-trim-on-compaction off'
unixctl command to disable.
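A sketch of the two unixctl commands above; the database and table names are illustrative:

```shell
# Log (rate limited, at level INFO) all operations committed to table
# Logical_Flow of database OVN_Southbound.
ovs-appctl -t ovsdb-server ovsdb-server/tlog-set OVN_Southbound:Logical_Flow on

# Disable returning unused memory to the OS after compaction.
ovs-appctl -t ovsdb-server ovsdb-server/memory-trim-on-compaction off
```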
* Most of the work for the automatic database compaction in clustered
mode has been moved to a separate thread to avoid blocking the process.
- OVSDB-IDL:
* New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
applications to relax atomicity requirements when dealing with
columns whose value has been rewritten (but not changed).
- OpenFlow:
* Extended Flow Monitoring support for all supported OpenFlow versions:
OpenFlow versions 1.0-1.2 with Nicira Extensions
OpenFlow version 1.3 with Open Networking Foundation extensions
OpenFlow versions 1.4+, as defined in the OpenFlow specification
- Python:
* Added a new flow parsing library ovs.flow capable of parsing
both OpenFlow and datapath flows.
- IPsec:
* Added support for custom per-tunnel options via 'options:ipsec_*' knobs.
See Documentation/tutorials/ipsec.rst for details.
- Windows:
* Conntrack support for TCPv6, UDPv6, ICMPv6, FTPv6.
* IPv6 Geneve tunnel support.
- DPDK:
* OVS validated with DPDK 21.11.1. It is recommended to use this version
until further releases.
* Delay creating or reusing a mempool for vhost ports until the VM
is started. A failure to create a mempool will now be logged only
when the VM is started.
* New configuration knob 'other_config:shared-mempool-config' to set MTU
that shared mempool mbuf size is based on. This allows interfaces with
different MTU sizes to share mempools.
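As a sketch, the knob takes a comma-separated list of MTU values, each optionally restricted to a NUMA node with ':N' (values illustrative):

```shell
# Ports with MTU <= 9000 may share a 9000-MTU based mempool on any NUMA
# node; 1500- and 6000-MTU based pools are additionally allowed on NUMA 1.
ovs-vsctl --no-wait set Open_vSwitch . \
    other_config:shared-mempool-config=9000,1500:1,6000:1
```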
- Userspace datapath:
* Improved multi-thread scalability of the userspace connection tracking.
* 'dpif-netdev/subtable-lookup-prio-get' appctl command renamed to
'dpif-netdev/subtable-lookup-info-get' to better reflect its purpose.
The old variant is kept for backward compatibility.
* Add actions auto-validator function to compare different actions
implementations against default implementation.
* Add command line option to switch between different actions
implementations available at run time.
* Add build time configure command to enable auto-validator as default
actions implementation at build time.
* Add AVX512 implementation of actions.
- Debian packaging updated to be on par with package source in Debian/Ubuntu.
* Provided an openvswitch-switch-dpdk package that integrates with the
dpdk package in the distributions so that end users can opt into a
DPDK-enabled Open vSwitch binary.
* Provided systemd service files.
* Provided openvswitch-source package for reproducible integrated builds
of, for example, OVN.
* Shared library and subsequently libopenvswitch and libopenvswitch-dev
binary packages are no longer built.
- Linux TC offload:
* Add support for offloading meters via tc police.
* Add support for offloading the check_pkt_len action.
- New configuration knob 'other_config:all-members-active' for
balance-slb bonds.
- Previously deprecated Linux kernel module is now fully removed from
the OVS source tree. The version provided with the Linux kernel
should be used instead.
- XenServer: Support for integration with XenServer has been removed due to
lack of maintenance and bitrot.
v2.17.0 - 17 Feb 2022
---------------------
- Userspace datapath:
* Optimized flow lookups for datapath flows with simple match criteria.
See 'Simple Match Lookup' in Documentation/topics/dpdk/bridge.rst.
* New per-interface configuration knob 'other_config:tx-steering'. If set
to 'hash', enables hash-based Tx packet steering mode to utilize all the
Tx queues of the interface regardless of the number of PMD threads.
* Removed experimental tag for PMD Auto Load Balance.
* New configuration knob 'other_config:n-offload-threads' to change the
number of HW offloading threads.
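A sketch of setting both knobs; the interface name and thread count are illustrative:

```shell
# Enable hash-based Tx packet steering on one interface.
ovs-vsctl set Interface dpdk0 other_config:tx-steering=hash

# Use four hardware offload threads instead of the default.
ovs-vsctl set Open_vSwitch . other_config:n-offload-threads=4
```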
- DPDK:
* EAL argument --socket-mem is no longer configured by default upon
start-up. If dpdk-socket-mem and dpdk-alloc-mem are not specified,
DPDK defaults will be used.
* EAL argument --socket-limit no longer takes on the value of --socket-mem
by default. 'other_config:dpdk-socket-limit' can be set equal to
the 'other_config:dpdk-socket-mem' to preserve the legacy memory
limiting behavior.
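A sketch of restoring the legacy memory-limiting behavior on a two-socket system (values are per NUMA node, in MB, and illustrative):

```shell
ovs-vsctl set Open_vSwitch . \
    other_config:dpdk-socket-mem=1024,1024 \
    other_config:dpdk-socket-limit=1024,1024
```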
* EAL argument --in-memory is applied by default if supported.
* Add hardware offload support for matching IPv4/IPv6 frag types
(experimental).
* Add hardware offload support for GRE flows (experimental).
Available only if DPDK experimental APIs enabled during the build.
* Add support for DPDK 21.11.
* Forbid use of DPDK multiprocess feature.
* Add support for running threads on cores >= RTE_MAX_LCORE.
- Python:
* For SSL support, the use of the pyOpenSSL library has been replaced
with the native 'ssl' module.
- OVSDB:
* Python library for OVSDB clients now also supports faster
resynchronization with a clustered database after a brief disconnection,
i.e. 'monitor_cond_since' monitoring method.
* Major improvement in the performance of the OVSDB server. See the
"OVSDB: Performance and Scale Journey '21" talk of OVS+OVN Conf'21.
- ovs-dpctl and 'ovs-appctl dpctl/':
* New commands 'cache-get-size' and 'cache-set-size' that allow getting
or configuring Linux kernel datapath cache sizes.
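A hedged sketch of the two commands; the datapath name and size are illustrative, and the exact argument syntax is per ovs-dpctl(8):

```shell
ovs-appctl dpctl/cache-get-size system@ovs-system
ovs-appctl dpctl/cache-set-size system@ovs-system 512
```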
- ovs-ofctl dump-flows no longer prints "igmp". Instead the flag
"ip,nw_proto=2" is used.
- ovs-appctl:
* New command tnl/neigh/aging to read/write the neigh aging time.
- OpenFlow:
* Default selection method for select groups with up to 256 buckets is
now dp_hash. Previously this was limited to 64 buckets. This change
is mainly for the benefit of OVN load balancing configurations.
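dp_hash can also be requested explicitly when adding a select group; a sketch, with illustrative bridge, group id, and buckets:

```shell
ovs-ofctl -O OpenFlow15 add-group br0 \
    'group_id=1,type=select,selection_method=dp_hash,bucket=output:1,bucket=output:2'
```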
* Encap & Decap action support for MPLS packet type.
- Ingress policing on Linux now uses 'matchall' classifier instead of
'basic', if available.
- Add User Statically-Defined Tracing (USDT) probe framework support.
v2.16.0 - 16 Aug 2021
---------------------
- Removed support for 1024-bit Diffie-Hellman key exchange, which is now
considered unsafe.
- Ingress Policing:
* Rate limiting configuration now supports setting packet-per-second
limits in addition to the previously configurable byte rate settings.
This is not supported in the userspace datapath yet.
- OVSDB:
* Introduced new database service model - "relay". Targeted to scale out
read-mostly access (ovn-controller) to existing databases.
For more information: ovsdb(7) and Documentation/topics/ovsdb-relay.rst
* New command line options --record/--replay for ovsdb-server and
ovsdb-client to record and replay all the incoming transactions,
monitors, etc. More details in Documentation/topics/record-replay.rst.
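A hedged sketch of recording a session and replaying it later; the directory and database file names are illustrative, and the exact option syntax is per Documentation/topics/record-replay.rst:

```shell
# Record all incoming transactions, monitors, etc. into a directory.
ovsdb-server --record=./replay-dir conf.db

# Later, replay the recorded session against the same database.
ovsdb-server --replay=./replay-dir conf.db
```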
* The Python Idl class now has a cooperative_yield() method that can be
overridden by an application that uses eventlet / gevent / asyncio with
the desired yield method (e.g. {eventlet,gevent,asyncio}.sleep(0)) to
prevent the application from being blocked for a long time while
processing database updates.
- In ovs-vsctl and vtep-ctl, the "find" command now accepts new
operators {in} and {not-in}.
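A hedged sketch of the {in} operator; the table, column, and values are illustrative, and the exact set syntax is per ovs-vsctl(8):

```shell
# Find bridges whose name is one of the listed values.
ovs-vsctl find Bridge 'name{in}br0,br1'
```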
- Userspace datapath:
* Auto load balancing of PMDs now partially supports cross-NUMA polling
cases, e.g. if all PMD threads are running on the same NUMA node.
* Userspace datapath now supports up to 2^18 meters.
* Added support for systems with non-contiguous NUMA nodes and core ids.
* Added all-zero IP SNAT handling to conntrack. In case of collision,
using ct(src=0.0.0.0), the source port will be replaced with another
non-colliding port in the ephemeral range (1024, 65535).
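A sketch of a flow using the all-zero SNAT: the source address is preserved, but the source port is re-chosen if the committed tuple would collide (bridge and match are illustrative):

```shell
ovs-ofctl add-flow br0 \
    'ip,ct_state=-trk,actions=ct(commit,nat(src=0.0.0.0)),normal'
```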
* Refactor lib/dpif-netdev.c to multiple header files.
* Add avx512 implementation of dpif which can process non-recirculated
packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
* Add commands to get and set the dpif implementations.
* Add a partial HWOL PMD statistic counting hits similar to existing
EMC/SMC/DPCLS stats.
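A hedged sketch of inspecting and switching the dpif implementation at run time, assuming the dpif-netdev command names below:

```shell
# Show the available dpif implementations and which one is active.
ovs-appctl dpif-netdev/dpif-impl-get

# Switch to the AVX512 implementation.
ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
```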
* Enable AVX512 optimized DPCLS to search subtables with larger miniflows.
* Add more specialized DPCLS subtables to cover common rules, enhancing
the lookup performance.
* Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
* Add command line option to switch between MFEX function pointers.
* Add miniflow extract auto-validator function to compare different
miniflow extract implementations against default implementation.
* Add study function to miniflow function table which studies packet
and automatically chooses the best miniflow implementation for that
traffic.
* Add build time configure option to enable the auto-validator as the
default miniflow implementation.
* Cache results for CPU ISA checks, reduces overhead on repeated lookups.
* Add AVX512 based optimized miniflow extract functions for traffic types
IPv4/UDP, IPv4/TCP, Vlan/IPv4/UDP and Vlan/IPv4/TCP.
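The MFEX entries above are driven through unixctl commands; a minimal sketch
(assumes a running ovs-vswitchd with the userspace datapath, and that the
command names match this release):

```shell
# List the available miniflow extract implementations and which is in use.
ovs-appctl dpif-netdev/miniflow-parser-get

# Switch to the study function, which profiles incoming packets and then
# automatically selects the best implementation for that traffic.
ovs-appctl dpif-netdev/miniflow-parser-set study
```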
* Added new 'group' option to pmd-rxq-assign. This will assign rxqs to
pmds purely based on rxq and pmd load.
* Add new 'pmd-rxq-isolate' option that can be set to 'false' so that
pmd cores which are pinned with rxqs using 'pmd-rxq-affinity' remain
available for assigning other non-pinned rxqs.
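Both rxq scheduling knobs live in the Open_vSwitch table's other_config
column; a sketch of how they might be set (assumes a running ovs-vswitchd):

```shell
# Assign rxqs to PMDs purely by measured load ('cycles' is the default).
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group

# Let cores pinned via pmd-rxq-affinity also receive non-pinned rxqs.
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false
```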
- ovs-ctl:
* New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
* New command 'record-hostname-if-not-set' to update hostname in ovsdb.
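A sketch of how the two ovs-ctl additions fit together (option and command
names as introduced above):

```shell
# Start OVS without writing the hostname into the database.
ovs-ctl --no-record-hostname start

# Later, record the hostname only if it has not been set already.
ovs-ctl record-hostname-if-not-set
```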
- DPDK:
* OVS validated with DPDK 20.11.1. It is recommended to use this version
until further releases.
* New debug appctl command 'dpdk/get-malloc-stats'.
* Add hardware offload support for tunnel pop action (experimental).
Available only if DPDK experimental APIs enabled during the build.
* Add hardware offload support for VXLAN flows (experimental).
Available only if DPDK experimental APIs enabled during the build.
* Default values of the EAL options --socket-mem and --socket-limit will
be removed with the 2.17 release. Logging added to alert users.
- ovsdb-tool:
* New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
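A sketch of the new option (paths, schema file, timer value and address are
placeholders; the timer is given in milliseconds):

```shell
# Create a clustered database with a 5-second leader election timer.
ovsdb-tool create-cluster --election-timer=5000 \
    /etc/openvswitch/conf.db vswitch.ovsschema tcp:10.0.0.1:6641
```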
- OVS now reports the datapath capability 'ct_zero_snat', which reflects
whether the SNAT with all-zero IP address is supported.
See ovs-vswitchd.conf.db(5) for details.
- ovs-appctl:
* Added ability to add and delete static mac entries using:
'ovs-appctl fdb/add <bridge> <port> <vlan> <mac>'
'ovs-appctl fdb/del <bridge> <vlan> <mac>'
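For example, adding and removing a static (non-aging) entry on bridge br0,
port p1, VLAN 0 (names and MAC are illustrative):

```shell
# Add a static entry: bridge, port, VLAN, MAC.
ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05

# Delete it again: bridge, VLAN, MAC.
ovs-appctl fdb/del br0 0 50:54:00:00:00:05
```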
- Linux datapath:
* ovs-vswitchd will configure the kernel module using per-cpu dispatch
mode (if available). This changes the way upcalls are delivered to user
space in order to resolve a number of issues with per-vport dispatch.
* New vswitchd unixctl command `dpif-netlink/dispatch-mode` will return
the current dispatch mode for each datapath.
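The new command takes no arguments; a sketch (assumes a running ovs-vswitchd
with at least one kernel datapath):

```shell
# Query the upcall dispatch mode (per-cpu or per-vport) of each datapath.
ovs-appctl dpif-netlink/dispatch-mode
```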
v2.15.0 - 15 Feb 2021
---------------------
- OVSDB:
* Changed format in which ovsdb transactions are stored in database files.
Now each transaction contains diff of data instead of the whole new
value of a column.
New ovsdb-server process will be able to read old database format, but
old processes will *fail* to read database created by the new one.
For cluster and active-backup service models follow upgrade instructions
in 'Upgrading from version 2.14 and earlier to 2.15 and later' section
of ovsdb(7).
* New unixctl command 'ovsdb-server/get-db-storage-status' to show the
status of the storage that's backing a database.
* New unixctl command 'ovsdb-server/memory-trim-on-compaction on|off'.
If turned on, ovsdb-server will try to reclaim all the unused memory
after every DB compaction back to OS. Disabled by default.
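A sketch of enabling this at runtime (assumes the ovsdb-server control
socket is reachable under its default target name):

```shell
# Return unused heap memory to the OS after each successful compaction.
ovs-appctl -t ovsdb-server ovsdb-server/memory-trim-on-compaction on
```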
* Maximum backlog on RAFT connections limited to 500 messages or 4GB.
Once the threshold is reached, the connection is dropped (and
re-established). Use the 'cluster/set-backlog-threshold' command to
change limits.
- DPDK:
* Removed support for vhost-user dequeue zero-copy.
* Add support for DPDK 20.11.
- Userspace datapath:
* Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
restricts a flow dump to a single PMD thread if set.
* New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
that allows configuring the MAC address of a VF representor.
* Add generic IP protocol support to conntrack. With this change, all
traffic other than UDP, TCP, and ICMP will be treated as general L3
traffic, i.e. using 3-tuples.
* Add parameters 'pmd-auto-lb-load-threshold' and
'pmd-auto-lb-improvement-threshold' to configure PMD auto load balance
behaviour.
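Both thresholds are percentages set in the Open_vSwitch table; a sketch
with illustrative (not default) values:

```shell
# Consider rebalancing only when a PMD exceeds 70% load...
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-load-threshold="70"

# ...and only when the estimated improvement is at least 25%.
ovs-vsctl set Open_vSwitch . \
    other_config:pmd-auto-lb-improvement-threshold="25"
```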
- The environment variable OVS_UNBOUND_CONF, if set, is now used
as the DNS resolver's (unbound) configuration file.
- Linux datapath:
* Support for kernel versions up to 5.8.x.
- Terminology:
* The terms "master" and "slave" have been replaced by "primary" and
"secondary", respectively, for OpenFlow connection roles.
* The term "slave" has been replaced by "member", for bonds, LACP, and
OpenFlow bundle actions.
- Support for GitHub Actions based continuous integration builds has been
added.
- Bareudp Tunnel
* Bareudp device support is present in the Linux kernel from version 5.7.
* The kernel bareudp device is not backported to the OVS tree.
* Userspace datapath support is not added.
- ovs-dpctl and 'ovs-appctl dpctl/':
* New commands '{add,mod,del}-flows' were added, which allow adding,
deleting, or modifying flows based on information read from a file.
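A sketch of batch-loading datapath flows from a file (the flow syntax and
file name are illustrative; flows use the datapath format shown by
dpctl/dump-flows):

```shell
# flows.txt contains one datapath flow per line, e.g.:
#   in_port(2),eth(),eth_type(0x0800),ipv4() actions:1
ovs-appctl dpctl/add-flows flows.txt
```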
- IPsec:
* Add option '--no-cleanup' to allow ovs-monitor-ipsec to stop without
tearing down IPsec tunnels.
* Add option '--no-restart-ike-daemon' to allow ovs-monitor-ipsec to start
without restarting ipsec daemon.
- Building the Linux kernel module from the OVS source tree is deprecated
* Support for the Linux kernel is capped at version 5.8
* Only bug fixes for the Linux OOT kernel module will be accepted.
* The Linux kernel module will be fully removed from the OVS source tree
in OVS branch 2.18
v2.14.0 - 17 Aug 2020
---------------------
- ovs-vswitchd no longer deletes datapath flows on exit by default.
- OpenFlow:
* The OpenFlow ofp_desc/serial_num may now be configured by setting the
value of other-config:dp-sn in the Bridge table.
* Added support to watch CONTROLLER port status in fast failover group.
* New action "delete_field".
- DPDK:
* Deprecated DPDK pdump packet capture support removed.
* Deprecated DPDK ring ports (dpdkr) are no longer supported.
* Add hardware offload support for VLAN Push/Pop actions (experimental).
* Add hardware offload support for matching IPv6 protocol (experimental).
* Add hardware offload support for set of IPv6 src/dst/ttl
and tunnel push-output actions (experimental).
* OVS validated with DPDK 19.11.2, due to the inclusion of fixes for
CVE-2020-10722, CVE-2020-10723, CVE-2020-10724, CVE-2020-10725 and
CVE-2020-10726, this DPDK version is strongly recommended to be used.
* New 'ovs-appctl dpdk/log-list' and 'ovs-appctl dpdk/log-set' commands
to list and change log levels in DPDK components.
* Vhost-user Dequeue zero-copy support is deprecated and will be removed
in the next release.
- Linux datapath:
* Support for kernel versions up to 5.5.x.
- AF_XDP:
* New netdev class 'afxdp-nonpmd' for netdev-afxdp to save CPU cycles
by enabling interrupt mode.
- Userspace datapath:
* Removed artificial datapath flow limit that was 65536.
Now number of datapath flows is fully controlled by revalidators and the
'other_config:flow-limit' knob.
* Add support for conntrack zone-based timeout policy.
* New configuration knob 'other_config:lb-output-action' for bond ports
that enables new datapath action 'lb_output' to avoid recirculation
in balance-tcp mode. Disabled by default.
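A sketch of enabling the new action on a balance-tcp bond (port name is a
placeholder; the bond-show command assumes this release's userspace
datapath):

```shell
# Enable the lb_output datapath action for this bond port.
ovs-vsctl set port bond0 other_config:lb-output-action=true

# Inspect the per-bond hash-to-member mapping used by lb_output.
ovs-appctl dpif-netdev/bond-show
```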
* Add runtime CPU ISA detection to allow optimized ISA functions.
* Add support for dynamically changing DPCLS subtable lookup functions.
* Add ISA optimized DPCLS lookup function using AVX512.
- New configuration knob 'other_config:bond-primary' for AB bonds that
specifies the interface that will be the preferred port if it is active.
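A sketch of designating a primary member on an active-backup bond (bond and
interface names are placeholders):

```shell
# Prefer eth0 as the active member whenever it is enabled.
ovs-vsctl set port bond0 other_config:bond-primary=eth0
```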
- Tunnels: TC Flower offload
* Masked match on the tunnel local endpoint address is supported.
* Masked match on the tunnel remote endpoint address is supported.
- GTP-U Tunnel Protocol
* Add two new fields: tun_gtpu_flags, tun_gtpu_msgtype.
* Only supported in the userspace datapath.
v2.13.0 - 14 Feb 2020
---------------------
- OVN:
* OVN has been removed from this repository. It now exists as a
separate project. You can find it at
https://github.com/ovn-org/ovn.git
- Userspace datapath:
* Add option to enable, disable and query TCP sequence checking in
conntrack.
* Add support for conntrack zone limits.
* Command "ovs-appctl dpctl/dump-flows" refactored to show subtable
miniflow bits for userspace datapath.
- AF_XDP:
* New option 'use-need-wakeup' for netdev-afxdp to control enabling
of corresponding 'need_wakeup' flag in AF_XDP rings. Enabled by default
if supported by libbpf.
* 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
Modes also updated. New values:
native-with-zerocopy - former DRV
native - new one, DRV without zero-copy
generic - former SKB
best-effort [default] - new one, chooses the best available from
3 above modes
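As a usage sketch of the renamed option (the bridge and interface names 'br0' and 'eth0' are placeholders), a specific mode can be pinned instead of relying on the default best-effort selection:

```shell
# Add an AF_XDP port pinned to native XDP without zero-copy
# ('br0' and 'eth0' are placeholder names).
ovs-vsctl add-port br0 eth0 -- set interface eth0 type=afxdp \
    options:xdp-mode=native

# Omitting the option (or setting it explicitly) keeps the default:
ovs-vsctl set interface eth0 options:xdp-mode=best-effort
```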
- DPDK:
* DPDK pdump packet capture support disabled by default. New configure
option '--enable-dpdk-pdump' to enable it.
* DPDK pdump support is deprecated and will be removed in a future release.
* DPDK ring ports (dpdkr) are deprecated and will be removed in a future
release.
* Add support for DPDK 19.11.
* Add hardware offload support for output, drop, set of MAC, IPv4 and
TCP/UDP ports actions (experimental).
* Add experimental support for TSO.
- RSTP:
* The rstp_statistics column in Port table will only be updated every
stats-update-interval configured in Open_vSwitch table.
- OVSDB:
* When ovsdb-server is running in backup mode, the default value of probe
interval is increased to 60 seconds for the connection to the
replication server. This value is configurable with the unixctl
command - ovsdb-server/set-active-ovsdb-server-probe-interval.
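A minimal sketch of adjusting the probe interval at runtime; the argument is assumed here to be in milliseconds (check ovsdb-server(1) for the exact unit and syntax):

```shell
# Set the probe interval for the connection to the replication
# server (30000 ms assumed to mean 30 seconds).
ovs-appctl -t ovsdb-server \
    ovsdb-server/set-active-ovsdb-server-probe-interval 30000
```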
* ovsdb-server: New OVSDB extension to allow clients to specify row UUIDs.
- 'ovs-appctl dpctl/dump-flows' can now show offloaded=partial for
partially offloaded flows and dp:dpdk for flows fully offloaded by DPDK;
the type filter supports new filters: "dpdk" and "partially-offloaded".
- Add a new argument '--offload-stats' for the command
'ovs-appctl bridge/dump-flows' so it can display offloaded packet
statistics.
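A minimal usage sketch ('br0' is a placeholder bridge name):

```shell
# Dump bridge flows along with offloaded packet statistics.
ovs-appctl bridge/dump-flows --offload-stats br0
```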
v2.12.0 - 03 Sep 2019
---------------------
- DPDK:
* New option 'other_config:dpdk-socket-limit' to limit amount of
hugepage memory that can be used by DPDK.
* Add support for vHost Post-copy Live Migration (experimental).
* OVS validated with DPDK 18.11.2 which is the new minimal supported
version.
* DPDK 18.11.1 and lower is no longer supported.
* New option 'tx-retries-max' to set the maximum number of vhost tx
retries that can be made.
- OpenFlow:
* All features required by OpenFlow 1.5 are now implemented, so
ovs-vswitchd now enables OpenFlow 1.5 by default (in addition to
OpenFlow 1.0 to 1.4).
* Removed support for OpenFlow 1.6 (draft), which ONF abandoned.
* New action "check_pkt_larger".
* Support for OpenFlow 1.5 "meter" action.
- Userspace datapath:
* ICMPv6 ND enhancements: support for match and set ND options type
and reserved fields.
* Add v4/v6 fragmentation support for conntrack.
* New ovs-appctl "dpctl/ipf-set-enabled" and "dpctl/ipf-set-disabled"
commands for userspace datapath conntrack fragmentation support.
* New "ovs-appctl dpctl/ipf-set-min-frag" command for userspace
datapath conntrack fragmentation support.
* New "ovs-appctl dpctl/ipf-set-max-nfrags" command for userspace datapath
conntrack fragmentation support.
* New "ovs-appctl dpctl/ipf-get-status" command for userspace datapath
conntrack fragmentation support.
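A sketch of the new fragmentation-handling commands together; the argument forms shown (address family, minimum fragment size, fragment count) are assumptions based on the command names, so check ovs-vswitchd(8) for the exact syntax:

```shell
# Enable IPv4 fragment reassembly for userspace conntrack.
ovs-appctl dpctl/ipf-set-enabled v4

# Assumed arguments: family and minimum fragment size in bytes.
ovs-appctl dpctl/ipf-set-min-frag v4 1000

# Assumed argument: maximum number of fragments tracked at once.
ovs-appctl dpctl/ipf-set-max-nfrags 500

# Query the current fragmentation-handling status.
ovs-appctl dpctl/ipf-get-status
```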
* New action "check_pkt_len".
* Port configuration with "other-config:priority-tags" now has a mode
that retains the 802.1Q header even if VLAN and priority are both zero.
* 'ovs-appctl exit' now implies cleanup of non-internal ports in userspace
datapath regardless of '--cleanup' option. Use '--cleanup' to remove
internal ports too.
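The two exit variants described above can be sketched as:

```shell
# Exit; non-internal userspace datapath ports are now cleaned up
# even without '--cleanup'.
ovs-appctl exit

# Exit and additionally remove internal ports.
ovs-appctl exit --cleanup
```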
* Removed experimental tag for SMC cache.
* Datapath classifier code refactored to enable function pointers to select
the lookup implementation at runtime. This enables specialization of
specific subtables based on the miniflow attributes, enhancing the
performance of the subtable search.
* Add Linux AF_XDP support through a new experimental netdev type "afxdp".
- OVSDB:
* OVSDB clients can now resynchronize with clustered servers much more
quickly after a brief disconnection, saving bandwidth and CPU time.
See section 4.1.15 of ovsdb-server(7) for details of related OVSDB
protocol extension.
* Support for converting a clustered database to a standalone database
is now available via ovsdb-tool, for use when the cluster is down and
cannot be revived. See "Database Migration Commands" in the ovsdb-tool
manpage.
- OVN:
* IPAM/MACAM:
- select IPAM mac_prefix in a random manner if not provided by the user
- add the capability to specify a static IPv4 and/or IPv6 address and
get the L2 one allocated dynamically using the following syntax:
ovn-nbctl lsp-set-addresses <port> "dynamic <IPv4 addr> <IPv6 addr>"
* Added the HA chassis group support.
* Added 'external' logical port support.
* Added Policy-based routing (PBR) support to create permit/deny/reroute
policies on the logical router. New table (Logical_Router_Policy) added in
OVN-NB schema. New "ovn-nbctl" commands to add/delete/list PBR policies.
* Support for Transport Zones, a way to separate chassis into
logical groups so that tunnels are only formed between
members of the same transport zone(s).
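Per the feature's commit message, a chassis is assigned to one or more transport zones through its local OVS instance ('tz1'/'tz2' are placeholder zone names):

```shell
# Place the local chassis in a single transport zone.
ovs-vsctl set open . external-ids:ovn-transport-zones=tz1

# Or in multiple zones, comma-separated.
ovs-vsctl set open . external-ids:ovn-transport-zones=tz1,tz2
```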
* Support for IGMP Snooping and IGMP Querier.
- New QoS type "linux-netem" on Linux.
- Added support for TLS Server Name Indication (SNI).
- Linux datapath:
* Support for the kernel versions 4.19.x and 4.20.x.
* Support for the kernel version 5.0.x.
* Add support for conntrack zone-based timeout policy.
- 'ovs-dpctl dump-flows' is no longer suitable for dumping offloaded flows.
'ovs-appctl dpctl/dump-flows' should be used instead.
- Add L2 GRE tunnel over IPv6 support.
v2.11.0 - 19 Feb 2019
---------------------
- OpenFlow:
* OFPMP_TABLE_FEATURES_REQUEST can now modify table features.
- ovs-ofctl:
* "mod-table" command can now change OpenFlow table names.
- ovn:
* OVN-SB schema changed: a duplicated IP with the same encapsulation
type is no longer allowed. Please refer to
Documentation/intro/install/ovn-upgrades.rst for instructions
in case problems are encountered when upgrading from an earlier
version.
* New support for IPsec encrypted tunnels between hypervisors.
* ovn-ctl: allow passing user:group ids to the OVN daemons.
* IPAM/MACAM:
- add the capability to dynamically assign just L2 addresses
- add the capability to specify a static ip address and get the L2 one
allocated dynamically using the following syntax:
ovn-nbctl lsp-set-addresses <port> "dynamic <IP>"
- DPDK:
* Add support for DPDK 18.11
* Add support for port representors.
- Userspace datapath:
* Add option for simple round-robin based Rxq to PMD assignment.
It can be set with pmd-rxq-assign.
* Add support for Auto load balancing of PMDs (experimental)
* Added new per-port configurable option to manage EMC:
'other_config:emc-enable'.
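Per the auto load balancing commit message, the feature is off by default and is enabled (with an optional rebalance interval in minutes) via other_config; the per-port EMC example below uses a placeholder port name 'dpdk0':

```shell
# Enable PMD auto load balancing, rebalancing at most every 5 minutes.
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb="true"
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-rebalance-intvl="5"

# Disable the EMC on a single port ('dpdk0' is a placeholder).
ovs-vsctl set Interface dpdk0 other_config:emc-enable=false
```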
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
* New configuration option "offload-rebalance", that enables dynamic
rebalancing of offloaded flows.
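Per the feature's commit message, "offload-rebalance" defaults to false and only takes effect when hardware offload is enabled with 'tc-policy' set to 'skip_sw'; a minimal enabling sketch:

```shell
# Prerequisites: hardware offload with skip_sw tc policy.
ovs-vsctl set Open_vSwitch . other_config:hw-offload="true"
ovs-vsctl set Open_vSwitch . other_config:tc-policy="skip_sw"

# Enable pps-rate-based rebalancing of offloaded flows.
ovs-vsctl set Open_vSwitch . other_config:offload-rebalance="true"
```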
- The environment variable OVS_SYSLOG_METHOD, if set, is now used
as the default syslog method.
- The environment variable OVS_CTL_TIMEOUT, if set, is now used
as the default timeout for control utilities.
- The environment variable OVS_RESOLV_CONF, if set, is now used
as the DNS server configuration file.
- RHEL packaging:
* OVN packages are split from OVS packages. A new spec
file - ovn-fedora.spec.in is added to generate OVN packages.
- Linux datapath:
* Support for the kernel versions 4.16.x, 4.17.x, and 4.18.x.
v2.10.0 - 18 Aug 2018
---------------------
- ovs-vswitchd and utilities now support DNS names in OpenFlow and
OVSDB remotes.
- ovs-vswitchd:
* New options --l7 and --l7-len to "ofproto/trace" command.
* Previous versions gave OpenFlow tables default names of the form
"table#". These are not helpful names for the purpose of accepting
and displaying table names, so now tables by default have no names.
* The "null" interface type, deprecated since 2013, has been removed.
* Add minimum network namespace support for Linux.
* New command "lacp/show-stats"
- ovs-ofctl:
* ovs-ofctl now accepts and display table names in place of numbers. By
default it always accepts names and in interactive use it displays them;
use --names or --no-names to override. See ovs-ofctl(8) for details.
- ovs-vsctl: New commands "add-bond-iface" and "del-bond-iface".
- ovs-dpctl:
* New commands "ct-set-limits", "ct-del-limits", and "ct-get-limits".
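A sketch of the new conntrack zone limit commands; the 'default=' and 'zone=,limit=' argument forms are assumptions based on the kernel interface, so check ovs-dpctl(8) for the exact syntax:

```shell
# Set a default per-zone limit and a specific limit for zone 2.
ovs-dpctl ct-set-limits default=1000000 zone=2,limit=500000

# Query and then remove the limit for zone 2.
ovs-dpctl ct-get-limits zone=2
ovs-dpctl ct-del-limits zone=2
```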
- OpenFlow:
* OFPT_ROLE_STATUS is now available in OpenFlow 1.3.
* OpenFlow 1.5 extensible statistics (OXS) now implemented.
* New OpenFlow 1.0 extensions for group support.
* Default selection method for select groups is now dp_hash with improved
accuracy.
- Linux datapath
* Add support for compiling OVS with the latest Linux 4.14 kernel.
* Added support for meters.
* Add support for conntrack zone limit.
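The zone limit support is driven through ovs-dpctl; a minimal sketch of how it might be used (the zone number and limits here are illustrative):

```shell
# Set a default limit of 4096 connections for all zones, plus a
# tighter per-zone limit of 100 connections for zone 5.
ovs-dpctl ct-set-limits default=4096 zone=5,limit=100

# Inspect configured limits and current usage counts.
ovs-dpctl ct-get-limits default zone=5

# Remove the per-zone limit; zone 5 falls back to the default.
ovs-dpctl ct-del-limits zone=5
```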
- ovn:
* Implemented icmp4/icmp6/tcp_reset actions in order to drop the packet
and reply with a RST for TCP or ICMPv4/ICMPv6 unreachable message for
other IPv4/IPv6-based protocols whenever a reject ACL rule is hit.
* ACL match conditions can now match on Port_Groups as well as address
sets that are automatically generated by Port_Groups. ACLs can be
applied directly to Port_Groups as well.
* ovn-nbctl can now run as a daemon (long-lived, background process).
See ovn-nbctl(8) for details.
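Daemon mode is opted into per session; a sketch of the intended workflow (the logical switch name sw0 is illustrative):

```shell
# Start the daemon; it prints the path of its control socket,
# which subsequent invocations pick up from the environment.
export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file)

# These now talk to the long-lived daemon instead of opening a
# fresh database connection per command.
ovn-nbctl show
ovn-nbctl ls-add sw0
```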
- DPDK:
* New 'check-dpdk' Makefile target to run a new system testsuite.
See Testing topic for the details.
* Add LSC interrupt support for DPDK physical devices.
* Allow init to fail and record DPDK status/version in OVS database.
* Add experimental flow hardware offload support
* Support both shared and per port mempools for DPDK devices.
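Shared mempools are the default; per port mempools are opted into with a global knob that should be set along with the other DPDK init parameters (changing it at runtime requires restarting the vswitch daemon):

```shell
# Enable per port mempools for DPDK devices (default: shared).
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
```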
- Userspace datapath:
* Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD
* Detailed PMD performance metrics available with new command
ovs-appctl dpif-netdev/pmd-perf-show
* Supervision of PMD performance metrics and logging of suspicious
iterations
* Add signature match cache (SMC) as experimental feature. When turned on,
it improves throughput when traffic has many more flows than EMC size.
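The userspace datapath features above are all driven from ovs-vsctl and ovs-appctl; a sketch (thresholds, iteration counts, and the PMD core number are illustrative):

```shell
# Enable collection of detailed PMD metrics (roughly 1% overhead).
ovs-vsctl set Open_vSwitch . other_config:pmd-perf-metrics=true

# Show the last 5 iterations for the PMD on core 1 only,
# suppressing the histograms.
ovs-appctl dpif-netdev/pmd-perf-show -nh -it 5 -pmd 1

# Log 5 iterations before/after any suspicious iteration, with a
# 400 us duration threshold and a vhost qlen threshold of 256.
ovs-appctl dpif-netdev/pmd-perf-log-set on -b 5 -a 5 -us 400 -q 256

# Turn on the experimental signature match cache.
ovs-vsctl set Open_vSwitch . other_config:smc-enable=true
```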
- ERSPAN:
* Implemented ERSPAN protocol (draft-foschiano-erspan-00.txt) for
both kernel datapath and userspace datapath.
* Added port-based and flow-based ERSPAN tunnel port support, added
OpenFlow rules matching ERSPAN fields. See ovs-fields(7).
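A sketch of a port-based ERSPAN tunnel port plus a flow matching an ERSPAN field (addresses, key, and index are illustrative; field names per ovs-fields(7)):

```shell
# ERSPAN version 1 (type II) tunnel port.
ovs-vsctl add-port br0 erspan1 -- set interface erspan1 type=erspan \
    options:remote_ip=172.31.1.1 options:key=1 \
    options:erspan_ver=1 options:erspan_idx=3

# Match on the ERSPAN index of received tunnel traffic.
ovs-ofctl add-flow br0 "in_port=erspan1,tun_erspan_idx=3,actions=normal"
```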
- ovs-pki
     * ovs-pki now generates X.509 version 3 certificates. The new format
       adds a subjectAltName field and sets its value to the same as the
       common name (CN).
v2.9.0 - 19 Feb 2018
--------------------
- NSH implementation now conforms to latest draft (draft-ietf-sfc-nsh-28).
* Add ttl field.
* Add a new action dec_nsh_ttl.
* Enable NSH support in kernel datapath.
- OVSDB has new, experimental support for database clustering:
* New high-level documentation in ovsdb(7).
* New file format documentation for developers in ovsdb(5).
* Protocol documentation moved from ovsdb-server(1) to ovsdb-server(7).
* ovsdb-server now supports online schema conversion via
"ovsdb-client convert".
* ovsdb-server now always hosts a built-in database named _Server. See
ovsdb-server(5) for more details.
* ovsdb-client: New "get-schema-cksum", "query", "backup", "restore",
and "wait" commands. New --timeout option.
* ovsdb-tool: New "create-cluster", "join-cluster", "db-cid", "db-sid",
"db-local-address", "db-is-clustered", "db-is-standalone", "db-name",
"schema-name", "compare-versions", and "check-cluster" commands.
* ovsdb-server: New ovs-appctl commands for managing clusters.
* ovs-sandbox: New support for clustered databases.
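A sketch of bootstrapping a cluster with the new ovsdb-tool commands (addresses are illustrative, and the schema name placeholder must match the database's actual schema name):

```shell
# On the first server, create a new clustered database.
ovsdb-tool create-cluster db1.db schema.ovsschema tcp:10.0.0.1:6644

# On each additional server, join the existing cluster.
ovsdb-tool join-cluster db2.db <schema-name> \
    tcp:10.0.0.2:6644 tcp:10.0.0.1:6644

# Check cluster status through a running ovsdb-server.
ovs-appctl -t ovsdb-server cluster/status <schema-name>
```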
- ovs-vsctl and other commands that display data in tables now support a
--max-column-width option to limit column width.
   - Traffic that sends a copy of a packet to a controller is no longer
     forced onto the slow path. Applications, such as OVN ACL logging, can
     send a copy of a packet to a controller while leaving the actual
     packet forwarding in the datapath.
- OVN:
* The "requested-chassis" option for a logical switch port now accepts a
chassis "hostname" in addition to a chassis "name".
* IPv6
- Added support to send IPv6 Router Advertisement packets in response to
the IPv6 Router Solicitation packets from the VIF ports.
- Added support to generate Neighbor Solicitation packets using the OVN
action 'nd_ns' to resolve unknown next hop MAC addresses for the
IPv6 packets.
* Add support for QoS bandwidth limit with DPDK.
* ovn-ctl: New commands run_nb_ovsdb and run_sb_ovsdb.
* ovn-sbctl, ovn-nbctl: New options --leader-only, --no-leader-only.
- OpenFlow:
* ct_clear action is now backed by kernel datapath. Support is probed for
when OVS starts.
- Linux kernel 4.13
* Add support for compiling OVS with the latest Linux 4.13 kernel
- ovs-dpctl and related ovs-appctl commands:
     * "flush-conntrack" now accepts a 5-tuple to delete a specific
connection tracking entry.
* New "ct-set-maxconns", "ct-get-maxconns", and "ct-get-nconns" commands
for userspace datapath.
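A sketch of the new conntrack commands through ovs-appctl (the 5-tuple values and limit are illustrative):

```shell
# Delete one specific tracked connection by its 5-tuple.
ovs-appctl dpctl/flush-conntrack \
    "ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=6,ct_tp_src=45000,ct_tp_dst=80"

# Cap and query the userspace datapath connection table.
ovs-appctl dpctl/ct-set-maxconns 100000
ovs-appctl dpctl/ct-get-maxconns
ovs-appctl dpctl/ct-get-nconns
```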
- No longer send packets to the Linux TAP device if it's DOWN unless it is
in another networking namespace.
- DPDK:
* Add support for DPDK v17.11
* Add support for vHost IOMMU
* New debug appctl command 'netdev-dpdk/get-mempool-info'.
* All the netdev-dpdk appctl commands described in ovs-vswitchd man page.
* Custom statistics:
- DPDK physical ports now return custom set of "dropped", "error" and
"management" statistics.
     - ovs-ofctl dump-ports command now prints the new set of custom statistics
if available (for OpenFlow 1.4+).
* Switch from round-robin allocation of rxq to pmd assignments to a
utilization-based allocation.
* New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq to
pmd assignments.
* Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
* Add support for vHost dequeue zero copy (experimental).
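The rxq assignment commands above can be sketched as:

```shell
# Show rxq-to-pmd assignments along with per-rxq utilization.
ovs-appctl dpif-netdev/pmd-rxq-show

# Re-run the utilization-based assignment of rxqs to pmds.
ovs-appctl dpif-netdev/pmd-rxq-rebalance
```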
- Userspace datapath:
* Output packet batching support.
- vswitchd:
* Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
* Configuring a controller, or unconfiguring all controllers, now deletes
all groups and meters (as well as all flows).
- New --enable-sparse configure option enables "sparse" checking by default.
- Added additional information to vhost-user status.
v2.8.0 - 31 Aug 2017
--------------------
- ovs-ofctl:
* ovs-ofctl can now accept and display port names in place of numbers. By
default it always accepts names and in interactive use it displays them;
use --names or --no-names to override. See ovs-ofctl(8) for details.
* "ovs-ofctl dump-flows" now accepts --no-stats to omit flow statistics.
- New ovs-dpctl command "ct-stats-show" to show connection tracking stats.
- Tunnels:
* Added support to set packet mark for tunnel endpoint using
`egress_pkt_mark` OVSDB option.
* When using Linux kernel datapath tunnels may be created using rtnetlink.
This will allow us to take advantage of new tunnel features without
having to make changes to the vport modules.
- EMC insertion probability is reduced to 1% and is configurable via
the new 'other_config:emc-insert-inv-prob' option.
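The option is an inverse probability; a sketch of tuning it (the value 50 is illustrative):

```shell
# Insert on average one of every 50 flows into the EMC
# (the default of 100 corresponds to the new 1% behaviour).
ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=50

# 1 inserts every flow; 0 disables EMC insertion entirely.
ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=1
```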
- DPDK:
* DPDK log messages redirected to OVS logging subsystem.
Log level can be changed in a usual OVS way using
'ovs-appctl vlog' commands for 'dpdk' module. Lower bound
still can be configured via extra arguments for DPDK EAL.
* dpdkvhostuser ports are marked as deprecated. They will be removed
in an upcoming release.
* Support for DPDK v17.05.1.
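Since DPDK messages now flow through OVS logging, their verbosity is adjusted with the usual vlog commands; a sketch:

```shell
# Raise the 'dpdk' module to debug level on all destinations.
ovs-appctl vlog/set dpdk:dbg

# Or for the log file destination only.
ovs-appctl vlog/set dpdk:file:dbg
```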
- IPFIX now provides additional counters:
* Total counters since metering process startup.
* Per-flow TCP flag counters.
* Multicast, broadcast, and unicast counters.
- New support for multiple VLANs (802.1ad or "QinQ"), including a new
"dot1q-tunnel" port VLAN mode.
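A sketch of the new port VLAN mode (port name, S-tag, and C-VLAN list are illustrative):

```shell
# p1 tunnels customer VLANs inside service VLAN 100 (802.1ad S-tag).
ovs-vsctl set port p1 vlan_mode=dot1q-tunnel tag=100

# Optionally restrict which customer VLANs the tunnel port carries.
ovs-vsctl set port p1 cvlans=10,20
```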
- In ovn-vsctl and vtep-ctl, record UUIDs in commands may now be
abbreviated to 4 hex digits.
- Userspace Datapath:
* Added NAT support for userspace datapath.
* Added FTP and TFTP support with NAT for userspace datapath.
* Experimental NSH (Network Service Header) support in userspace datapath.
- OVN:
* New built-in DNS support.
* IPAM for IPv4 can now exclude user-defined addresses from assignment.
* IPAM can now assign IPv6 addresses.
* Make the DHCPv4 router setting optional.
* Gratuitous ARP for NAT addresses on a distributed logical router.
* Allow ovn-controller SSL configuration to be obtained from vswitchd
database.
* ovn-trace now has basic support for tracing distributed firewalls.
* In ovn-nbctl and ovn-sbctl, record UUIDs in commands may now be
abbreviated to 4 hex digits.
* "ovn-sbctl lflow-list" can now print OpenFlow flows that correspond
to logical flows.
* Now uses OVSDB RBAC support to reduce impact of compromised hypervisors.
* Multiple chassis may now be specified for L3 gateways. When more than
one chassis is specified, OVN will manage high availability for that
gateway.
* Add support for ACL logging.
* ovn-northd now has native support for active-standby high availability.
- Tracing with ofproto/trace now traces through recirculation.
- OVSDB:
* New support for role-based access control (see ovsdb-server(1)).
- New commands 'stp/show' and 'rstp/show' (see ovs-vswitchd(8)).
- OpenFlow:
* All features required by OpenFlow 1.4 are now implemented, so
ovs-vswitchd now enables OpenFlow 1.4 by default (in addition to
OpenFlow 1.0 to 1.3).
* Increased support for OpenFlow 1.6 (draft).
* Bundles now support hashing by just nw_src or nw_dst.
* The "learn" action now supports a "limit" option (see ovs-ofctl(8)).
* The port status bit OFPPS_LIVE now reflects link aliveness.
* OpenFlow 1.5 packet-out is now supported.
* Support for OpenFlow 1.5 field packet_type and packet-type-aware
pipeline (PTAP).
* Added generic encap and decap actions (EXT-382).
First supported use case is encap/decap for Ethernet.
     * Added NSH (Network Service Header) support in the userspace datapath.
       The generic encap and decap actions are used to implement
       encapsulation and decapsulation of the NSH header.
IETF NSH draft - https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
* Conntrack state is only available to the processing path that
follows the "recirc_table" argument of the ct() action. Starting
in OVS 2.8, this state is now cleared for the current processing
path whenever ct() is called.
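The generic encap and decap actions can be exercised from ovs-ofctl; a sketch, assuming an NSH-capable build (port names illustrative):

```shell
# Wrap a packet in an NSH header (MD type 1), then in Ethernet.
ovs-ofctl add-flow br0 \
    "in_port=p1,actions=encap(nsh(md_type=1)),encap(ethernet),output:p2"

# Strip the outer header and continue with the inner packet.
ovs-ofctl add-flow br0 "in_port=p2,actions=decap(),output:p1"
```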
- Fedora Packaging:
* OVN services are no longer restarted automatically after upgrade.
* ovs-vswitchd and ovsdb-server run as non-root users by default.
- Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)).
- L3 tunneling:
* Use new tunnel port option "packet_type" to configure L2 vs. L3.
     * In conjunction with PTAP, tunnel ports can handle a mix of L2 and L3
       payload.
* New vxlan tunnel extension "gpe" to support VXLAN-GPE tunnels.
* New support for non-Ethernet (L3) payloads in GRE and VXLAN-GPE.
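A sketch of combining these options into a single VXLAN-GPE tunnel port that accepts both L2 and L3 payloads (the address is illustrative):

```shell
ovs-vsctl add-port br0 vxgpe0 -- set interface vxgpe0 type=vxlan \
    options:remote_ip=192.168.1.2 options:exts=gpe \
    options:packet_type=ptap
```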
- The BFD detection multiplier is now user-configurable.
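The multiplier is set per interface in the bfd column; a sketch (interface name and value illustrative; valid values are 1-255, default 3):

```shell
# Use a detection multiplier of 4 instead of the default 3 on p1.
ovs-vsctl set Interface p1 bfd:mult=4

# The local and remote multipliers appear in the bfd status output.
ovs-appctl bfd/show p1
```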
- Add experimental support for hardware offloading
* HW offloading is disabled by default.
* HW offloading is done through the TC interface.
 - IPv6 link local addresses are now supported on Linux. Append '%' and the
   device name (e.g. fe80::1%eth0) to designate the scope device.
v2.7.0 - 21 Feb 2017
---------------------
- Utilities and daemons that support SSL now allow protocols and
ciphers to be configured with --ssl-protocols and --ssl-ciphers.
- OVN:
* QoS is now implemented via egress shaping rather than ingress policing.
* DSCP marking is now supported, via the new northbound QoS table.
* IPAM now supports fixed MAC addresses.
* Support for source IP address based routing.
* ovn-trace:
- New --ovs option to also print OpenFlow flows.
- put_dhcp_opts and put_dhcp_optsv6 actions may now be traced.
* Support for managing SSL and remote connection configuration in
northbound and southbound databases.
* TCP connections to northbound and southbound databases are no
longer enabled by default and must be explicitly configured.
See documentation for ovn-sbctl/ovn-nbctl "set-connection"
command or the ovn-ctl "--db-sb-create-insecure-remote" and
"--db-nb-create-insecure-remote" command-line options for
information regarding remote connection configuration.
* New appctl "inject-pkt" command in ovn-controller that allows
packets to be injected into the connected OVS instance.
* Distributed logical routers may now be connected directly to
logical switches with localnet ports, by specifying a
"redirect-chassis" on the distributed gateway port of the
logical router. NAT rules may be specified directly on the
distributed logical router, and are handled either centrally on
the "redirect-chassis", or in many cases are handled locally on
the hypervisor where the corresponding logical port resides.
Gratuitous ARP for NAT addresses on a distributed logical
router is not yet supported, but will be added in a future
version.
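The remote connection configuration mentioned above amounts to a one-time setup on the central databases plus pointing ovn-controller at the southbound database; a sketch (addresses and certificate paths are illustrative):

```shell
# One-time setup of SSL and listeners on the central databases.
ovn-nbctl set-ssl /etc/ovn/privkey.pem /etc/ovn/cert.pem /etc/ovn/cacert.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/ovn/privkey.pem /etc/ovn/cert.pem /etc/ovn/cacert.pem
ovn-sbctl set-connection pssl:6642

# Point ovn-controller at the southbound database over SSL.
ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=ssl:10.0.0.1:6642
```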
- Fixed regression in table stats maintenance introduced in OVS
2.3.0, wherein the number of OpenFlow table hits and misses was
not accurate.
- OpenFlow:
* OFPT_PACKET_OUT messages are now supported in bundles.
* A new "selection_method=dp_hash" type for OpenFlow select group
bucket selection that uses the datapath computed 5-tuple hash
without making datapath flows match the 5-tuple fields, which
is useful for more efficient load balancing, for example. This
uses the Netronome extension to OpenFlow 1.5+ that allows
control over the OpenFlow select groups selection method. See
"selection_method" and related options in ovs-ofctl(8) for
details.
* The "sample" action now supports "ingress" and "egress" options.
* The "ct" action now supports the TFTP ALG where support is available.
* New actions "clone" and "ct_clear".
* The "meter" action is now supported in the userspace datapath.
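The dp_hash selection method is requested when the select group is created; a sketch (group id and bucket ports illustrative):

```shell
# Select group that load-balances using the datapath-computed hash.
ovs-ofctl -O OpenFlow15 add-group br0 \
    "group_id=1,type=select,selection_method=dp_hash,bucket=output:p1,bucket=output:p2"
```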
- ovs-ofctl:
* 'bundle' command now supports packet-out messages.
* New syntax for 'ovs-ofctl packet-out' command, which uses the
same string parser as the 'bundle' command. The old 'packet-out'
syntax is deprecated and will be removed in a later OVS
release.
* New unixctl "ofctl/packet-out" command, which can be used to
instruct a flow monitor to issue OpenFlow packet-out messages.
- ovsdb-server:
* Remote connections can now be made read-only (see ovsdb-server(1)).
- Tunnels:
* TLV mappings for protocols such as Geneve are now segregated on
a per-OpenFlow bridge basis rather than globally. (The interface
has not changed.)
* Removed support for IPsec tunnels.
- DPDK:
* New 'n_rxq_desc' and 'n_txq_desc' options for DPDK interfaces, which
set the number of rx and tx descriptors to use for the given port.
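For example (the port name "dpdk0" and descriptor counts are illustrative):

```shell
# Use 4096 rx and 2048 tx descriptors on a DPDK port.
ovs-vsctl set Interface dpdk0 \
    options:n_rxq_desc=4096 options:n_txq_desc=2048
```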
* Support for DPDK v16.11.
* Support for rx checksum offload. Refer to the DPDK HOWTO for details.
* Port Hotplug is now supported.
* DPDK physical ports can now have arbitrary names. The PCI address of
the device must be set using the 'dpdk-devargs' option. Compatibility
with the old dpdk<portid> naming scheme is broken, and as such a
device will not be available for use until a valid dpdk-devargs is
specified.
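A sketch of adding a port under the new scheme (the port name "my-nic" and the PCI address are illustrative):

```shell
# The port name is now arbitrary; the PCI address in dpdk-devargs is
# what binds the OVS port to the physical device.
ovs-vsctl add-port br0 my-nic -- set Interface my-nic type=dpdk \
    options:dpdk-devargs=0000:01:00.0
```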
* Virtual DPDK Poll Mode Driver (vdev PMD) support.
* Removed experimental tag.
- Fedora packaging:
* A package upgrade does not automatically restart OVS service.
- ovs-vswitchd/ovs-vsctl:
* Ports now have a "protected" flag. Protected ports can not forward
frames to other protected ports. Unprotected ports can receive and
forward frames to protected and other unprotected ports.
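A minimal sketch, assuming the new "protected" column on the Port table (port names are illustrative):

```shell
# p1 and p2 may still talk to unprotected ports, but not to each other.
ovs-vsctl set Port p1 protected=true
ovs-vsctl set Port p2 protected=true
```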
- ovs-vsctl, ovn-nbctl, ovn-sbctl, vtep-ctl:
* Database commands now accept integer ranges, e.g. "set port
eth0 trunks=1-10" to enable trunking VLANs 1 to 10.
v2.6.0 - 27 Sep 2016
---------------------
- First supported release of OVN. See ovn-architecture(7) for more
details.
- ovsdb-server:
* New "monitor_cond", "monitor_cond_update" and "update2" extensions to
RFC 7047.
- OpenFlow:
* OpenFlow 1.3+ bundles now expire after 10 seconds since the
last time the bundle was either opened, modified, or closed.
* OpenFlow 1.3 Extension 230, adding OpenFlow Bundles support, is
now implemented.
* OpenFlow 1.3+ bundles are now supported for group mods as well as
flow mods and port mods. Both 'atomic' and 'ordered' bundle
flags are supported for group mods as well as flow mods.
* Internal OpenFlow rule representation for load and set-field
actions is now much more memory efficient. For a complex flow
table this can reduce rule memory consumption by 40%.
* Bundles are now much more memory efficient than in OVS 2.5.
Together with memory efficiency improvements in OpenFlow rule
representation, the peak OVS resident memory use during a
bundle commit for large complex set of flow mods can be only
25% of that in OVS 2.5 (4x lower).
* OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
* OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
* OpenFlow 1.4+ OFPT_TABLE_STATUS is now supported.
* New property-based packet-in message format NXT_PACKET_IN2 with support
for arbitrary user-provided data and for serializing flow table
traversal into a continuation for later resumption.
* New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like
control over asynchronous messages in earlier versions of OpenFlow.
* New OpenFlow extension NXM_NX_MPLS_TTL to provide access to MPLS TTL.
* New output option, output(port=N,max_len=M), to allow truncating a
packet to size M bytes when outputting to port N.
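For instance, a truncated-mirror flow might look like this (bridge, ports, and the 100-byte limit are illustrative):

```shell
# Forward full packets to port 3, but mirror only the first 100 bytes
# of each packet to port 2.
ovs-ofctl add-flow br0 \
    "in_port=1,actions=output(port=2,max_len=100),output:3"
```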
* New command OFPGC_ADD_OR_MOD for OFPT_GROUP_MOD message that adds a
new group or modifies an existing group.
* The optional OpenFlow packet buffering feature is deprecated in
this release, and will be removed in the next OVS release
(2.7). After the change OVS always sends the 'buffer_id' as
0xffffffff in packet-in messages and will send an error
response if any other value of this field is included in
packet-out and flow mod messages sent by a controller. Controllers are
already expected to work properly in cases where the switch can
not buffer packets, so this change should not affect existing
users.
* New OpenFlow extension NXT_CT_FLUSH_ZONE to flush conntrack zones.
- Improved OpenFlow version compatibility for actions:
* New OpenFlow extension to support the "group" action in OpenFlow 1.0.
* OpenFlow 1.0 "enqueue" action now properly translated to OpenFlow 1.1+.
* OpenFlow 1.1 "mod_nw_ecn" and OpenFlow 1.1+ "mod_nw_ttl" actions now
properly translated to OpenFlow 1.0.
- ovs-ofctl:
* queue-get-config command now allows a queue ID to be specified.
* '--bundle' option can now be used with OpenFlow 1.3 and with group mods.
* New "bundle" command allows executing a mixture of flow and group mods
as a single atomic transaction.
* New option "--color" to produce colorized output for some commands.
* New option '--may-create' to use OFPGC_ADD_OR_MOD in mod-group command.
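With '--may-create', a single command creates the group if it is absent and modifies it otherwise, so the controller-side state need not be known (group id and bucket are illustrative):

```shell
# Succeeds whether or not group 100 already exists.
ovs-ofctl -O OpenFlow13 --may-create mod-group br0 \
    "group_id=100,type=indirect,bucket=actions=output:2"
```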
- IPFIX:
* New "sampling_port" option for "sample" action to allow sampling
ingress and egress tunnel metadata with IPFIX.
* New ovs-ofctl commands "dump-ipfix-bridge" and "dump-ipfix-flow" to
dump bridge IPFIX statistics and flow based IPFIX statistics.
* New setting other-config:virtual_obs_id to add an arbitrary string
to IPFIX records.
- Linux:
* OVS Linux datapath now implements Conntrack NAT action with all
supported Linux kernels.
* Support for truncate action.
* New QoS type "linux-noop" that prevents Open vSwitch from trying to
manage QoS for a given port (useful when other software manages QoS).
- DPDK:
* New option "n_rxq" for PMD interfaces.
Old 'other_config:n-dpdk-rxqs' is no longer supported.
Not supported by vHost interfaces; for those, the number of rx and tx
queues is taken from the connected virtio device.
* New 'other_config:pmd-rxq-affinity' field for PMD interfaces, which
allows pinning a port's rx queues to desired cores.
* New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
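The two knobs above can be exercised together like this (the port name "dpdk0" and core numbers are illustrative):

```shell
# Pin rx queue 0 to core 3 and rx queue 1 to core 7.
ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:3,1:7"

# Inspect the resulting port/rxq-to-PMD assignment.
ovs-appctl dpif-netdev/pmd-rxq-show
```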
* Type of log messages from PMD threads changed from INFO to DBG.
* QoS functionality with sample egress-policer implementation.
* The mechanism for configuring DPDK has changed to use database
entries instead of command-line arguments.
* Sensible defaults have been introduced for many of the required
configuration options.
* DB entries have been added for many of the DPDK EAL command line
arguments. Additional arguments can be passed via the dpdk-extra
entry.
* Add ingress policing functionality.
* PMD threads servicing vHost User ports can now come from the NUMA
node that device memory is located on if CONFIG_RTE_LIBRTE_VHOST_NUMA
is enabled in DPDK.
* Basic connection tracking for the userspace datapath (no ALG,
fragmentation or NAT support yet)
* Support for DPDK 16.07
* Optional support for DPDK pdump enabled.
* Jumbo frame support
* Remove dpdkvhostcuse port type.
* OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
* 'dpdkvhostuserclient' port type.
- Increase number of registers to 16.
- ovs-benchmark: This utility has been removed due to lack of use and
bitrot.
- ovs-appctl:
* New "vlog/close" command.
- ovs-ctl:
* Added the ability to selectively start the forwarding and database
functions (ovs-vswitchd and ovsdb-server, respectively).
- ovsdb-server:
* Removed the limit on the maximum number of sessions, to enable
connection scaling testing.
- python:
* Added support for Python 3.4+ in addition to existing support
for 2.7+.
- SELinux:
* Introduced SELinux policy package.
- Datapath Linux kernel compatibility:
* Dropped support for kernels older than 3.10.
* Removed VLAN splinters feature.
* Datapath supports kernels up to 4.7.
- Tunnels:
* Flow based tunnel match and action can be used for IPv6 address using
tun_ipv6_src, tun_ipv6_dst fields.
* Added support for IPv6 tunnels; check out the FAQ for details.
* Deprecated support for IPsec tunnel ports.
- A wrapper script, 'ovs-tcpdump', to easily port-mirror an OVS port and
watch the traffic with tcpdump.
- Introduce --no-self-confinement flag that allows daemons to work with
sockets outside their run directory.
- ovs-pki: Changed message digest algorithm from SHA-1 to SHA-512 because
SHA-1 is no longer secure and some operating systems have started to
disable it in OpenSSL.
- Add 'mtu_request' column to the Interface table. It can be used to
configure the MTU of the ports.
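For example (port name and MTU value are illustrative):

```shell
# Request a 9000-byte MTU on a port.
ovs-vsctl set Interface eth0 mtu_request=9000
```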
Known issues:
- Using openvswitch module in conjunction with upstream Linux tunnels:
* When using the openvswitch module distributed with OVS against kernel
versions 4.4 to 4.6, the openvswitch module cannot be loaded or used at
the same time as "ip_gre".
- Conntrack FTP ALGs: When using the openvswitch module distributed with
OVS, particular Linux distribution kernels versions may provide diminished
functionality. This typically affects active FTP data connections when
using "actions=ct(alg=ftp),..." in flow tables. Specifically:
* Centos 7.1 kernels (3.10.0-2xx) are unable to correctly set
up expectations for FTP data connections in multiple zones,
eg "actions=ct(zone=1,alg=ftp),ct(zone=2,alg=ftp),...". Executing the
"ct" action for subsequent data connections may fail to determine that
the data connection is "related" to an existing connection.
* Centos 7.2 kernels (3.10.0-3xx) may not establish FTP ALG state
correctly for NATed connections. As a result, flows that perform NAT,
eg "actions=ct(nat,alg=ftp,table=1),..." may fail to NAT the packet,
and will populate the "ct_state=inv" bit in the flow.
v2.5.0 - 26 Feb 2016
---------------------
- Dropped support for Python older than version 2.7. As a consequence,
using Open vSwitch 2.5 or later on XenServer 6.5 or earlier (which
have Python 2.4) requires first installing Python 2.7.
- OpenFlow:
* Group chaining (where one OpenFlow group triggers another) is
now supported.
* OpenFlow 1.4+ "importance" is now considered for flow eviction.
* OpenFlow 1.4+ OFPTC_EVICTION is now implemented.
* OpenFlow 1.4+ OFPTC_VACANCY_EVENTS is now implemented.
* OpenFlow 1.4+ OFPMP_TABLE_DESC is now implemented.
* Allow modifying the ICMPv4/ICMPv6 type and code fields.
* OpenFlow 1.4+ OFPT_SET_ASYNC_CONFIG and OFPT_GET_ASYNC_CONFIG are
now implemented.
- ovs-ofctl:
* New "out_group" keyword for OpenFlow 1.1+ matching on output group.
- Tunnels:
* Geneve tunnels can now match and set options and the OAM bit.
* The nonstandard GRE64 tunnel extension has been dropped.
- Support Multicast Listener Discovery (MLDv1 and MLDv2).
- Add 'symmetric_l3l4' and 'symmetric_l3l4+udp' hash functions.
- sFlow agent now reports tunnel and MPLS structures.
- New 'check-system-userspace', 'check-kmod' and 'check-kernel' Makefile
targets to run a new system testsuite. These tests can be run inside
a Vagrant box. See INSTALL.md for details.
- Mark --syslog-target argument as deprecated. It will be removed in
the next OVS release.
- Added --user option to all daemons
- Add support for connection tracking through the new "ct" action
and "ct_state"/"ct_zone"/"ct_mark"/"ct_label" match fields. Only
available on Linux kernels with the connection tracking module loaded.
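A minimal stateful-firewall sketch using the new action and fields (bridge, ports, zone, and priorities are illustrative): allow TCP initiated from port 1, and drop unsolicited TCP arriving on port 2.

```shell
# Outbound TCP from port 1: commit the connection, then forward.
ovs-ofctl add-flow br0 \
    "table=0,priority=10,in_port=1,tcp,ct_state=-trk,actions=ct(commit,zone=9),2"
# Inbound TCP on port 2: send through conntrack, resume in table 1.
ovs-ofctl add-flow br0 \
    "table=0,priority=10,in_port=2,tcp,ct_state=-trk,actions=ct(zone=9,table=1)"
# Only replies to established connections are let through.
ovs-ofctl add-flow br0 "table=1,in_port=2,tcp,ct_state=+trk+est,actions=1"
ovs-ofctl add-flow br0 "table=1,in_port=2,tcp,ct_state=+trk+new,actions=drop"
```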
- Add experimental version of OVN. OVN, the Open Virtual Network, is a
system to support virtual network abstraction. OVN complements the
existing capabilities of OVS to add native support for virtual network
abstractions, such as virtual L2 and L3 overlays and security groups.
- RHEL packaging:
* DPDK ports may now be created via network scripts (see README.RHEL).
- DPDK:
* Requires DPDK 2.2
* Added multiqueue support to vhost-user
* Note: QEMU 2.5+ required for multiqueue support
v2.4.0 - 20 Aug 2015
---------------------
- Flow table modifications are now atomic, meaning that each packet
now sees a coherent version of the OpenFlow pipeline. For
example, if a controller removes all flows with a single OpenFlow
"flow_mod", no packet sees an intermediate version of the OpenFlow
pipeline where only some of the flows have been deleted.
- Added support for SFQ, FQ_CoDel and CoDel qdiscs.
- Add bash command-line completion support for ovs-vsctl. Please check
utilities/ovs-command-compgen.INSTALL.md for how to use it.
- The MAC learning feature now includes per-port fairness to mitigate
MAC flooding attacks.
- New support for a "conjunctive match" OpenFlow extension, which
allows constructing OpenFlow matches of the form "field1 in
{a,b,c...} AND field2 in {d,e,f...}" and generalizations. For details,
see documentation for the "conjunction" action in ovs-ofctl(8).
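As a sketch of the syntax (a bridge named br0 and a running ovs-vswitchd are assumed), a match of the form "nw_src in {10.0.0.1, 10.0.0.2} AND nw_dst in {10.0.0.3}" could be written as:

```shell
# Dimension 1 of 2: the set of permitted source addresses.
ovs-ofctl add-flow br0 'ip,nw_src=10.0.0.1,actions=conjunction(1,1/2)'
ovs-ofctl add-flow br0 'ip,nw_src=10.0.0.2,actions=conjunction(1,1/2)'
# Dimension 2 of 2: the set of permitted destination addresses.
ovs-ofctl add-flow br0 'ip,nw_dst=10.0.0.3,actions=conjunction(1,2/2)'
# The conjunction flow fires only when one flow from each dimension matched.
ovs-ofctl add-flow br0 'conj_id=1,ip,actions=drop'
```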
- Add bash command-line completion support for ovs-appctl/ovs-dpctl/
ovs-ofctl/ovsdb-tool commands. Please check
utilities/ovs-command-compgen.INSTALL.md for how to use.
- The "learn" action supports a new flag "delete_learned" that causes
the learned flows to be deleted when the flow with the "learn" action
is deleted.
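A hedged sketch (bridge name br0 is assumed): a MAC-learning flow whose learned flows in table 1 are removed together with it when it is deleted:

```shell
# Learn destination-MAC flows into table 1; "delete_learned" ties their
# lifetime to this flow.
ovs-ofctl add-flow br0 'table=0,actions=learn(table=1,delete_learned,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],output:NXM_OF_IN_PORT[]),resubmit(,1)'
# Deleting the table 0 flow now also deletes the flows it learned.
ovs-ofctl del-flows br0 'table=0'
```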
- Basic support for the Geneve tunneling protocol. It is not yet
possible to generate or match options. This is planned for a future
release. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-00
- The OVS database now reports controller rate limiting statistics.
- sflow now exports information about LACP-based bonds, port names, and
OpenFlow port numbers, as well as datapath performance counters.
- ovs-dpctl functionality is now available for datapaths integrated
into ovs-vswitchd, via ovs-appctl. Some existing ovs-appctl
commands are now redundant and will be removed in a future
release. See ovs-vswitchd(8) for details.
- OpenFlow:
* OpenFlow 1.4 bundles are now supported for flow mods and port
mods. For flow mods, both 'atomic' and 'ordered' bundle flags
are trivially supported, as all bundled messages are executed
in the order they were added and all flow table modifications
are now atomic to the datapath. Port mods may not appear in
atomic bundles, as port status modifications are not atomic.
* IPv6 flow label and neighbor discovery fields are now modifiable.
* OpenFlow 1.5 extended registers are now supported.
* The OpenFlow 1.5 actset_output field is now supported.
* OpenFlow 1.5 Copy-Field action is now supported.
* OpenFlow 1.5 masked Set-Field action is now supported.
* OpenFlow 1.3+ table features requests are now supported (read-only).
* Nicira extension "move" actions may now be included in action sets.
* "resubmit" actions may now be included in action sets. The resubmit
is executed last, and only if the action set has no "output" or "group"
action.
* OpenFlow 1.4+ flow "importance" is now maintained in the flow table.
* A new Netronome extension to OpenFlow 1.5+ allows control over the
fields hashed for OpenFlow select groups. See "selection_method" and
related options in ovs-ofctl(8) for details.
- ovs-ofctl has a new '--bundle' option that makes the flow mod commands
('add-flow', 'add-flows', 'mod-flows', 'del-flows', and 'replace-flows')
use an OpenFlow 1.4 bundle to operate the modifications as a single
atomic transaction. If any of the flow mods in a transaction fail, none
of them are executed. All flow mods in a bundle appear to datapath
lookups simultaneously.
- ovs-ofctl 'add-flow' and 'add-flows' commands now accept arbitrary flow
mods as an input by allowing the flow specification to start with an
explicit 'add', 'modify', 'modify_strict', 'delete', or 'delete_strict'
keyword. A missing keyword is treated as 'add', so this is fully
backwards compatible. With the new '--bundle' option all the flow mods
are executed as a single atomic transaction using an OpenFlow 1.4 bundle.
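For example (bridge name br0 is assumed), a file of mixed flow mods can be applied atomically:

```shell
# A missing keyword means "add", so existing flow files keep working.
cat > flows.txt <<'EOF'
delete table=0,tcp,tp_dst=80
add table=0,tcp,tp_dst=8080,actions=normal
EOF
# All mods succeed or fail together as one OpenFlow 1.4 bundle.
ovs-ofctl --bundle add-flows br0 flows.txt
```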
- ovs-pki: Changed message digest algorithm from MD5 to SHA-1 because
MD5 is no longer secure and some operating systems have started to disable
it in OpenSSL.
- ovsdb-server: New OVSDB protocol extension allows inequality tests on
"optional scalar" columns. See ovsdb-server(1) for details.
- ovs-vsctl now permits immutable columns in a new row to be modified in
the same transaction that creates the row.
- test-controller has been renamed ovs-testcontroller at request of users
who find it useful for testing basic OpenFlow setups. It is still not
a necessary or desirable part of most Open vSwitch deployments.
- Support for travis-ci.org based continuous integration builds has been
added. Build failures are reported to build@openvswitch.org. See the
INSTALL.md file for additional details.
- Support for the Rapid Spanning Tree Protocol (IEEE 802.1D-2004).
The implementation has been tested successfully against the Ixia Automated
Network Validation Library (ANVL).
- Stats are no longer updated on fake bond interface.
- Keep active bond interface selection across OVS restart.
- A simple wrapper script, 'ovs-docker', to integrate OVS with Docker
containers. If and when there is a native integration of Open vSwitch
with Docker, the wrapper script will be retired.
- Added support for DPDK tunneling. VXLAN, GRE, and Geneve are supported
protocols. This is a generic tunneling mechanism for the userspace datapath.
- Support for multicast snooping (IGMPv1, IGMPv2 and IGMPv3)
- Support for Linux kernels up to 4.0.x
- The documentation now uses the term 'destination' to mean one of syslog,
console, or file for vlog logging, instead of the previously used term
'facility'.
- Support for VXLAN Group Policy extension
- Initial support for the IETF Auto-Attach SPBM draft standard. This
contains rudimentary support for the LLDP protocol as needed for
Auto-Attach.
- The default OpenFlow and OVSDB ports are now the IANA-assigned
numbers. OpenFlow is 6653 and OVSDB is 6640.
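For example (the controller address is a placeholder), a controller target and an OVSDB listener using the IANA-assigned ports might look like:

```shell
# OpenFlow controller on the IANA-assigned port 6653.
ovs-vsctl set-controller br0 tcp:192.0.2.10:6653
# OVSDB manager listener on the IANA-assigned port 6640.
ovs-vsctl set-manager ptcp:6640
```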
- Support for DPDK vHost.
- Support for outer UDP checksums in Geneve and VXLAN.
- The kernel vports with dependencies are no longer part of the overall
openvswitch.ko but built and loaded automatically as individual kernel
modules (vport-*.ko).
- Support for STT tunneling.
- ovs-sim: New developer tool for simulating multiple OVS instances.
See ovs-sim(1) for more information.
- Support for configuring the method (--syslog-method argument) that
determines how daemons talk to syslog.
- Support for the "ovs-appctl vlog/list-pattern" command, which queries the
logging message format for each destination.
v2.3.0 - 14 Aug 2014
---------------------
- OpenFlow 1.1, 1.2, and 1.3 are now enabled by default in
ovs-vswitchd.
- Linux kernel datapath now has an exact match cache optimizing the
flow matching process.
- Datapath flows now have partially wildcarded transport port field
matches. This reduces userspace upcalls, but increases the
number of different masks in the datapath. The kernel datapath
exact match cache removes the overhead of matching the incoming
packets with the larger number of masks, but when paired with an
older kernel module, some workloads may perform worse with the
new userspace.
- Compatibility with autoconf 2.63 (previously >=2.64)
v2.2.0 - Internal Release
---------------------
- Internal ports are no longer brought up by default, because it
should be an administrator task to bring up devices as they are
configured properly.
- ovs-vsctl now reports when ovs-vswitchd fails to create a new port or
bridge.
- Port creation and configuration errors are now stored in a new error
column of the Interface table and included in 'ovs-vsctl show'.
- The "ovsdbmonitor" graphical tool has been removed, because it was
poorly maintained and not widely used.
- New "check-ryu" Makefile target for running Ryu tests for OpenFlow
controllers against Open vSwitch. See INSTALL.md for details.
- Added IPFIX support for SCTP flows and templates for ICMPv4/v6 flows.
- Upon the receipt of a SIGHUP signal, ovs-vswitchd no longer reopens its
log file (it will terminate instead). Please use 'ovs-appctl vlog/reopen'
instead.
- Support for Linux kernels up to 3.14. From kernel 3.12 onward, OVS uses
the kernel tunnel API for GRE and VXLAN.
- Added DPDK support.
- Added support for custom vlog patterns in Python
v2.1.0 - 19 Mar 2014
---------------------
- Address prefix tracking support for flow tables. New columns
"prefixes" in OVS-DB table "Flow_Table" controls which packet
header fields are used for address prefix tracking. Prefix
tracking allows the classifier to skip rules with longer than
necessary prefixes, resulting in better wildcarding for datapath
flows. Default configuration is to not use any fields for prefix
tracking. However, if any flow tables contain both exact matches
and masked matches for IP address fields, OVS performance may be
increased by using this feature.
* As of now, the fields for which prefix lookup can be enabled
are: 'tun_id', 'tun_src', 'tun_dst', 'nw_src', 'nw_dst' (or
aliases 'ip_src' and 'ip_dst'), 'ipv6_src', and 'ipv6_dst'.
(Using this feature for 'tun_id' would only make sense if the
tunnel IDs have prefix structure similar to IP addresses.)
* There is a maximum number of fields that can be enabled for any
one flow table. Currently this limit is 3.
* Examples:
$ ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
--id=@N1 create Flow_Table name=table0
$ ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
--id=@N1 create Flow_Table name=table1
$ ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
$ ovs-vsctl set Flow_Table table1 prefixes=[]
- TCP flags matching: OVS now supports matching of TCP flags. This
has an adverse performance impact when using OVS userspace 1.10
or older (no megaflows support) together with the new OVS kernel
module. It is recommended that the kernel and userspace modules
both are upgraded at the same time.
- The default OpenFlow and OVSDB ports will change to
IANA-assigned numbers in a future release. Consider updating
your installations to specify port numbers instead of using the
defaults.
- OpenFlow:
* The OpenFlow 1.1+ "Write-Actions" instruction is now supported.
* OVS limits the OpenFlow port numbers it assigns to port 32767 and
below, leaving port numbers above that range free for assignment
by the controller.
* ovs-vswitchd now honors changes to the "ofport_request" column
in the Interface table by changing the port's OpenFlow port
number.
* The Open vSwitch software switch now supports OpenFlow groups.
- The ovs-vswitchd.conf.db(5) manpage contains a graphviz/dot diagram
only if the graphviz package was installed at build time.
- Support for Linux kernels up to 3.11
- ovs-dpctl:
* The "show" command also displays megaflow mask stats.
- ovs-ofctl:
* New command "ofp-parse-pcap" to dump OpenFlow from PCAP files.
- ovs-controller has been renamed test-controller. It is no longer
packaged or installed by default, because too many users assumed
incorrectly that ovs-controller was a necessary or desirable part
of an Open vSwitch deployment.
- Added vlog option to export to a UDP syslog sink.
- ovsdb-client:
* The "monitor" command can now monitor all tables in a database,
instead of being limited to a single table.
- The flow-eviction-threshold has been replaced by the flow-limit, which is a
hard limit on the number of flows in the datapath. It defaults to 200,000
flows. OVS automatically adjusts this number depending on network
conditions.
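The limit lives in the Open_vSwitch table's other_config column; for example, to lower it (sketch; requires a running ovsdb-server):

```shell
# Cap the datapath at 100,000 flows instead of the 200,000 default.
ovs-vsctl set Open_vSwitch . other_config:flow-limit=100000
```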
- Added IPv6 support for active and passive socket communications.
v2.0.0 - 15 Oct 2013
---------------------
- The ovs-vswitchd process is no longer single-threaded. Multiple
threads are now used to handle flow set up and asynchronous
logging.
- OpenFlow:
* Experimental support for OpenFlow 1.1 (in addition to 1.2 and
1.3, which had experimental support in 1.10).
* Experimental protocol support for OpenFlow 1.1+ groups. This
does not yet include an implementation in the Open vSwitch
software switch.
* Experimental protocol support for OpenFlow 1.2+ meters. This
does not yet include an implementation in the Open vSwitch
software switch.
* New support for matching outer source and destination IP address
of tunneled packets, for tunnel ports configured with the newly
added "remote_ip=flow" and "local_ip=flow" options.
* Support for matching on metadata 'pkt_mark' for interacting with
other system components. On Linux this corresponds to the skb
mark.
* Support matching, rewriting SCTP ports
- The Interface table in the database has a new "ifindex" column to
report the interface's OS-assigned ifindex.
- New "check-oftest" Makefile target for running OFTest against Open
vSwitch. See README-OFTest for details.
- The flow eviction threshold has been moved to the Open_vSwitch table.
- Database names are now mandatory when specifying ovsdb-server options
through database paths (e.g. Private key option with the database name
should look like "--private-key=db:Open_vSwitch,SSL,private_key").
- Added ovs-dev.py, a utility script helpful for Open vSwitch developers.
- Support for Linux kernels up to 3.10
- ovs-ofctl:
* New "ofp-parse" for printing OpenFlow messages read from a file.
* New commands for OpenFlow 1.1+ groups.
- Added configurable flow caching support to IPFIX exporter.
- Dropped support for Linux pre-2.6.32.
- Log file timestamps and ovsdb commit timestamps are now reported
with millisecond resolution. (Previous versions only reported
whole seconds.)
v1.11.0 - 28 Aug 2013
---------------------
- Support for megaflows, which allows wildcarding in the kernel (and
any dpif implementation that supports wildcards). Depending on
the flow table and switch configuration, flow set up rates are
close to the Linux bridge.
- The "tutorial" directory contains a new tutorial for some advanced
Open vSwitch features.
- Stable bond mode has been removed.
- The autopath action has been removed.
- New support for the data encapsulation format of the LISP tunnel
protocol (RFC 6830). An external control plane or manual flow
setup is required for EID-to-RLOC mapping.
- OpenFlow:
* The "dec_mpls_ttl" and "set_mpls_ttl" actions from OpenFlow
1.1 and later are now implemented.
* New "stack" extension for use in actions, to push and pop from
NXM fields.
* The "load" and "set_field" actions can now modify the "in_port". (This
allows one to enable output to a flow's input port by setting the
in_port to some unused value, such as OFPP_NONE.)
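For example (bridge br0 assumed), a hairpin flow that sends packets back out their ingress port by first overwriting in_port with an unused value:

```shell
# OFPP_NONE (0xffff) is never a real port, so "output:1" is no longer
# suppressed for packets that arrived on port 1.
ovs-ofctl add-flow br0 'in_port=1,actions=load:0xffff->NXM_OF_IN_PORT[],output:1'
```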
- ovs-dpctl:
* New debugging commands "add-flow", "mod-flow", "del-flow".
* "dump-flows" now has a -m option to increase output verbosity.
- In dpif-based bridges, cache action translations, which can improve
flow set up performance by 80% with a complicated flow table.
- New syslog format, prefixed with "ovs|", to be easier to filter.
- RHEL: Removes the default firewall rule that allowed GRE traffic to
pass through. Any users that relied on this automatic firewall hole
will have to manually configure it. The ovs-ctl(8) manpage documents
the "enable-protocol" command that can be used as an alternative.
- New CFM demand mode which uses data traffic to indicate interface
liveness.
v1.10.0 - 01 May 2013
---------------------
- Bridge compatibility support has been removed. Any uses that
rely on ovs-brcompatd will have to stick with Open vSwitch 1.9.x
or adapt to native Open vSwitch support (e.g. use ovs-vsctl instead
of brctl).
- The maximum size of the MAC learning table is now configurable.
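For example (bridge br0 assumed), the limit is set through the bridge's other-config column:

```shell
# Allow up to 4096 learned MAC addresses on this bridge.
ovs-vsctl set bridge br0 other-config:mac-table-size=4096
```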
- With the Linux datapath, packets for new flows are now queued
separately on a per-port basis, so it should no longer be
possible for a large number of new flows arriving on one port to
prevent new flows from being processed on other ports.
- ovs-vsctl:
* Previously ovs-vsctl would retry connecting to the database forever,
causing it to hang if ovsdb-server was not running. Now, ovs-vsctl
only tries once by default (use --retry to try forever). This change
means that you may want to remove uses of --timeout to avoid hangs
in ovs-vsctl calls.
* Many "ovs-vsctl" database commands now accept an --if-exists option.
Please refer to the ovs-vsctl manpage for details.
- OpenFlow:
- Experimental support for newer versions of OpenFlow. See
the "What versions of OpenFlow does Open vSwitch support?"
question in the FAQ for more details.
- The OpenFlow "dp_desc" may now be configured by setting the
value of other-config:dp-desc in the Bridge table.
- It is possible to request the OpenFlow port number with the
"ofport_request" column in the Interface table.
- The NXM flow_removed message now reports the OpenFlow table ID
from which the flow was removed.
- Tunneling:
- New support for the VXLAN tunnel protocol (see the IETF draft here:
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03).
- Tunneling requires the version of the kernel module paired with
Open vSwitch 1.9.0 or later.
- Inheritance of the Don't Fragment bit in IP tunnels (df_inherit)
is no longer supported.
- Path MTU discovery is no longer supported.
- CAPWAP tunneling support removed.
- Tunnels with multicast destination ports are no longer supported.
- ovs-dpctl:
- The "dump-flows" and "del-flows" no longer require an argument
if only one datapath exists.
- ovs-appctl:
- New "vlog/disable-rate-limit" and "vlog/enable-rate-limit"
commands allow control over logging rate limits.
- New "dpif/dump-dps", "dpif/show", and "dpif/dump-flows" command
that mimic the equivalent ovs-dpctl commands.
- The ofproto library is now responsible for assigning OpenFlow port
numbers. An ofproto implementation should assign them when
port_construct() is called.
- All dpif-based bridges of a particular type share a common
datapath called "ovs-<type>", e.g. "ovs-system". The ovs-dpctl
commands will now return information on that shared datapath. To
get the equivalent bridge-specific information, use the new
"ovs-appctl dpif/*" commands.
- Backward-incompatible changes:
- Earlier Open vSwitch versions treated ANY as a wildcard in flow
syntax. OpenFlow 1.1 adds a port named ANY, which introduces a
conflict. ANY was rarely used in flow syntax, so we chose to
retire that meaning of ANY in favor of the OpenFlow 1.1 meaning.
- Patch ports no longer require kernel support, so they now work
with FreeBSD and the kernel module built into Linux 3.3 and later.
- New "sample" action.
v1.9.0 - 26 Feb 2013
------------------------
- Datapath:
- Support for IPv6 set action.
- SKB mark matching and setting.
- Support for Linux kernels up to 3.8
- FreeBSD is now a supported platform, thanks to code contributions from
Gaetano Catalli, Ed Maste, and Giuseppe Lettieri.
- ovs-bugtool: New --ovs option to report only OVS related information.
- New %t and %T log escapes to identify the subprogram within a
cooperating group of processes or threads that emitted a log message.
The default log patterns now include this information.
- OpenFlow:
- Allow bitwise masking for SHA and THA fields in ARP, SLL and TLL
fields in IPv6 neighbor discovery messages, and IPv6 flow label.
- Adds support for writing to the metadata field for a flow.
- Tunneling:
- The tunneling code no longer assumes input and output keys are
symmetric. If they are not, PMTUD needs to be disabled for
tunneling to work. Note this only applies to flow-based keys.
- New support for a nonstandard form of GRE that supports a 64-bit key.
- Tunnel Path MTU Discovery default value was set to 'disabled'.
This feature is deprecated and will be removed soon.
- Tunnel header caching removed.
- ovs-ofctl:
- Commands and actions that accept port numbers now also accept keywords
that represent those ports (such as LOCAL, NONE, and ALL). This is
also the recommended way to specify these ports, for compatibility
with OpenFlow 1.1 and later (which use the OpenFlow 1.0 numbers
for these ports for different purposes).
- ovs-dpctl:
- Support requesting the port number with the "port_no" option in
the "add-if" command.
- ovs-pki: The "online PKI" features have been removed, along with
the ovs-pki-cgi program that facilitated it, because of some
alarmist insecurity claims. We do not believe that these claims
are true, but because we do not know of any users for this
feature it seems better on balance to remove it. (The ovs-pki-cgi
program was not included in distribution packaging.)
- ovsdb-server now enforces the immutability of immutable columns. This
was not enforced in earlier versions due to an oversight.
- The following features are now deprecated. They will be removed no
earlier than February 2013. Please email dev@openvswitch.org with
concerns.
- Bridge compatibility.
- Stable bond mode.
- The autopath action.
- Interface type "null".
- Numeric values for reserved ports (see "ovs-ofctl" note above).
- Tunnel Path MTU Discovery.
- CAPWAP tunnel support.
- The data in the RARP packets can now be matched in the same way as the
data in ARP packets.
v1.8.0 - 26 Feb 2013
------------------------
*** Internal only release ***
- New FAQ. Please send updates and additions!
- Authors of controllers, please read the new section titled "Action
Reproduction" in DESIGN, which describes an Open vSwitch change in
behavior in corner cases that may affect some controllers.
- ovs-l3ping:
- A new test utility that can create an L3 tunnel between two Open
vSwitches and detect connectivity issues.
- ovs-ofctl:
- New --sort and --rsort options for "dump-flows" command.
- "mod-port" command can now control all OpenFlow config flags.
- OpenFlow:
- Allow general bitwise masking for IPv4 and IPv6 addresses in
IPv4, IPv6, and ARP packets. (Previously, only CIDR masks
were allowed.)
- Allow support for arbitrary Ethernet masks. (Previously, only
the multicast bit in the destination address could be individually
masked.)
- New field OXM_OF_METADATA, to align with OpenFlow 1.1.
- The OFPST_QUEUE request now reports an error if a specified port or
queue does not exist, or for requests for a specific queue on all
ports, if the specified queue does not exist on any port. (Previous
versions generally reported an empty set of results.)
- New "flow monitor" feature to allow controllers to be notified of
flow table changes as they happen.
- Additional protocols are not mirrored and dropped when forward-bpdu is
false. For a full list, see the ovs-vswitchd.conf.db man page.
- Open vSwitch now sends RARP packets in situations where it previously
sent a custom protocol, making it consistent with behavior of QEMU and
VMware.
- All Open vSwitch programs and log files now show timestamps in UTC,
instead of the local timezone, by default.
v1.7.0 - 30 Jul 2012
------------------------
- kernel modules are renamed. openvswitch_mod.ko is now
openvswitch.ko and brcompat_mod.ko is now brcompat.ko.
- Increased the number of NXM registers to 8.
- Added ability to configure DSCP setting for manager and controller
connections. By default, these connections have a DSCP value of
Internetwork Control (0xc0).
- Added the granular link health statistics, 'cfm_health', to an
interface.
- OpenFlow:
- Added support to mask nd_target for ICMPv6 neighbor discovery flows.
- Added support for OpenFlow 1.3 port description (OFPMP_PORT_DESC)
multipart messages.
- ovs-ofctl:
- Added the "dump-ports-desc" command to retrieve port
information using the new port description multipart messages.
- ovs-test:
- Added support for spawning ovs-test server from the client.
- Now ovs-test is able to automatically create test bridges and ports.
- "ovs-dpctl dump-flows" now prints observed TCP flags in TCP flows.
- Tripled flow setup performance.
- The "coverage/log" command previously available through ovs-appctl
has been replaced by "coverage/show". The new command replies with
coverage counter values, instead of logging them.
v1.6.1 - 25 Jun 2012
------------------------
- Allow OFPP_CONTROLLER as the in_port for packet-out messages.
v1.6.0 - 24 Feb 2012
------------------------
*** Internal only release ***
- bonding
- LACP bonds no longer fall back to balance-slb when negotiations fail.
Instead they drop traffic.
- The default bond_mode changed from SLB to active-backup, to protect
unsuspecting users from the significant risks of SLB bonds (which are
documented in vswitchd/INTERNALS).
- Load balancing can be disabled by setting the bond-rebalance-interval
to zero.
- OpenFlow:
- Added support for bitwise matching on TCP and UDP ports.
See ovs-ofctl(8) for more information.
- NXM flow dumps now include times elapsed toward idle and hard
timeouts.
- Added an OpenFlow extension NXT_SET_ASYNC_CONFIG that allows
controllers more precise control over which OpenFlow messages they
receive asynchronously.
- New "fin_timeout" action.
- Added "fin_timeout" support to "learn" action.
- New Nicira action NXAST_CONTROLLER that offers additional features
over output to OFPP_CONTROLLER.
- When QoS settings for an interface do not configure a default queue
(queue 0), Open vSwitch now uses a default configuration for that
queue, instead of dropping all packets as in previous versions.
- Logging:
- Logging to console and file will have UTC timestamp as a default for
all the daemons. An example of the default format is
2012-01-27T16:35:17Z. ovs-appctl can be used to change the default
format as before.
- The syntax of commands and options to set log levels was simplified,
to make it easier to remember.
- New support for limiting the number of flows in an OpenFlow flow
table, with configurable policy for evicting flows upon
overflow. See the Flow_Table table in ovs-vswitchd.conf.db(5)
for more information.
- New "enable-async-messages" column in the Controller table. If set to
false, OpenFlow connections to the controller will initially have all
asynchronous messages disabled, overriding normal OpenFlow behavior.
- ofproto-provider interface:
- "struct rule" has a new member "used" that ofproto implementations
should maintain by updating with ofproto_rule_update_used().
- ovsdb-client:
- The new option --timestamp causes the "monitor" command to print
a timestamp with every update.
- CFM module CCM broadcasts can now be tagged with an 802.1p priority.
v1.5.0 - 01 Jun 2012
------------------------
- OpenFlow:
- Added support for querying, modifying, and deleting flows
based on flow cookie when using NXM.
- Added new NXM_PACKET_IN format.
- Added new NXAST_DEC_TTL action.
- ovs-ofctl:
- Added daemonization support to the monitor and snoop commands.
- ovs-vsctl:
- The "find" command supports new set relational operators
{=}, {!=}, {<}, {>}, {<=}, and {>=}.
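For example, to list interfaces whose OpenFlow port number is at least 1 (sketch; requires a running ovsdb-server):

```shell
ovs-vsctl find Interface 'ofport{>=}1'
```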
- ovsdb-tool now uses the typical database and schema installation
directories as defaults.
- The default MAC learning timeout has been increased from 60 seconds
to 300 seconds. The MAC learning timeout is now configurable.
v1.4.0 - 30 Jan 2012
------------------------
- Compatible with Open vSwitch kernel module included in Linux 3.3.
- New "VLAN splinters" feature to work around buggy device drivers
in old Linux versions. (This feature is deprecated. When
broken device drivers are no longer in widespread use, we will
delete this feature.) See ovs-vswitchd.conf.db(5) for more
information.
- OpenFlow:
- Added ability to match on IPv6 flow label through NXM.
- Added ability to match on ECN bits in IPv4 and IPv6 through NXM.
- Added ability to match on TTL in IPv4 and IPv6 through NXM.
- Added ability to modify ECN bits in IPv4.
- Added ability to modify TTL in IPv4.
- ovs-vswitchd:
- Don't require the "normal" action to use mirrors. Traffic will
now be properly mirrored for any flows, regardless of their
actions.
- Track packet and byte statistics sent on mirrors.
- The sFlow implementation can now usually infer the correct agent
device instead of having to be told explicitly.
- ovs-appctl:
- New "fdb/flush" command to flush a bridge's MAC learning table.
- ovs-test:
- A new distributed testing tool for diagnosing performance and
connectivity issues. This tool is not currently included in the
RHEL or XenServer packages.
- RHEL packaging now supports integration with Red Hat network scripts.
- bonding:
- After the 1.4.* series, OVS will change the default bond mode from
balance-slb to active-backup. SLB bonds carry significant risks
(documented in vswitchd/INTERNALS) that we want to keep
unsuspecting users from running into. Users are advised to
explicitly set the bond mode they want to use, so that their
scripts and configuration are unaffected by the change in default.
v1.3.0 - 09 Dec 2011
------------------------
- OpenFlow:
- Added an OpenFlow extension which allows the "output" action to accept
NXM fields.
- Added an OpenFlow extension for flexible learning.
- Bumped number of NXM registers from four to five.
- ovs-appctl:
- New "version" command to determine version of running daemon.
- If no argument is provided for "cfm/show", displays detailed
information about all interfaces with CFM enabled.
- If no argument is provided for "lacp/show", displays detailed
information about all ports with LACP enabled.
- ovs-dpctl:
- New "set-if" command to modify a datapath port's configuration.
- ovs-vswitchd:
- The software switch now supports 255 OpenFlow tables, instead
of just one. By default, only table 0 is consulted, but the
new NXAST_RESUBMIT_TABLE action can look up in additional
tables. Tables 128 and above are reserved for use by the
switch itself; please use only tables 0 through 127.
- Added support for 802.1D spanning tree (STP).
- Fragment handling extensions:
- New OFPC_FRAG_NX_MATCH fragment handling mode, in which L4
fields are made available for matching in fragments with
offset 0.
- New NXM_NX_IP_FRAG match field for matching IP fragments (usable
via "ip_frag" in ovs-ofctl).
- New ovs-ofctl "get-frags" and "set-frags" commands to get and set
fragment handling policy.
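Together these pieces allow L4 ACLs on first fragments; a sketch, assuming a bridge named br0:

```shell
# Inspect, then switch the bridge to the NXM fragment mode so that L4
# fields of offset-0 fragments become available for matching.
ovs-ofctl get-frags br0
ovs-ofctl set-frags br0 nx-match

# Drop first fragments of TCP traffic to port 80; reassembly then fails
# at the receiver, effectively blocking the whole fragmented flow.
ovs-ofctl add-flow br0 'tcp,ip_frag=first,tp_dst=80,actions=drop'
```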
- CAPWAP tunneling now supports an extension to transport a 64-bit key.
By default it remains compatible with the old version and other
standards-based implementations.
- Flow setups are now processed in a round-robin manner across ports
to prevent any single client from monopolizing the CPU and conducting
a denial of service attack.
- Added support for native VLAN tagging. A new "vlan_mode"
parameter can be set for "port". Possible values: "access",
"trunk", "native-tagged" and "native-untagged".
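A native-VLAN port can be configured like this; a sketch, with the port name and VLAN numbers as placeholders:

```shell
# eth0 carries VLAN 100 untagged (the native VLAN) and VLAN 200 tagged.
ovs-vsctl set Port eth0 vlan_mode=native-untagged tag=100 trunks=100,200
```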
- test-openflowd has been removed. Please use ovs-vswitchd instead.
v1.2.0 - 03 Aug 2011
------------------------
- New "ofproto" abstraction layer to ease porting to hardware
switching ASICs.
- Packaging for Red Hat Enterprise Linux 5.6 and 6.0.
- Datapath support for Linux kernels up to 3.0.
- OpenFlow:
- New "bundle" and "bundle_load" action extensions.
- Database:
- Implement table unique constraints.
- Support cooperative locking between callers.
- ovs-dpctl:
- New "-s" option for "show" command prints packet and byte
counters for each port.
- ovs-ofctl:
- New "--readd" option for "replace-flows".
- ovs-vsctl:
- New "show" command to print an overview of configuration.
- New "comment" command to add remark that explains intentions.
- ovs-brcompatd has been rewritten to fix long-standing bugs.
- ovs-openflowd has been renamed test-openflowd and moved into the
tests directory. Its presence confused too many users. Please
use ovs-vswitchd instead.
- New ovs-benchmark utility to test flow setup performance.
- A new log level "off" has been added. Configuring a log facility
"off" prevents any messages from being logged to it. Previously,
"emer" was effectively "off" because no messages were ever logged at
level "emer". Now, errors that cause a process to exit are logged
at "emer" level.
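The new level plugs into the usual vlog controls; a sketch of silencing one log destination while keeping another:

```shell
# Turn off console logging entirely while leaving file logging at "info".
ovs-appctl vlog/set console:off
ovs-appctl vlog/set file:info
```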
- "configure" option --with-l26 has been renamed --with-linux, and
--with-l26-source has been renamed --with-linux-source. The old
names will be removed after the next release, so please update
your scripts.
- The "-2.6" suffix has been dropped from the datapath/linux-2.6 and
datapath/linux-2.6/compat-2.6 directories.
- Feature removals:
- Dropped support for "tun_id_from_cookie" OpenFlow extension.
Please use the extensible match extensions instead.
- Removed the Maintenance_Point and Monitor tables in an effort
to simplify 802.1ag configuration.
- Performance and scalability improvements
- Bug fixes
v1.1.0 - 05 Apr 2011
------------------------
- Ability to define policies over IPv6
- LACP
- 802.1ag CCM
- Support for extensible match extensions to OpenFlow
- QoS:
- Support for HFSC qdisc.
- Queue used by in-band control can now be configured.
- Kernel:
- Kernel<->userspace interface has been reworked and should be
close to a stable ABI now.
- "Port group" concept has been dropped.
- GRE over IPSEC tunnels
- Bonding:
- New active backup bonding mode.
- New L4 hashing support when LACP is enabled.
- Source MAC hash now includes VLAN field also.
- miimon support.
- Greatly improved handling of large flow tables
- ovs-dpctl:
- "show" command now prints full vport configuration.
- "dump-groups" command removed since kernel support for
port groups was dropped.
- ovs-vsctl:
- New commands for working with the new Managers table.
- "list" command enhanced with new formatting options and --columns
option.
- "get" command now accepts new --id option.
- New "find" command.
- ovs-ofctl:
- New "queue-stats" command for printing queue stats.
- New commands "replace-flows" and "diff-flows".
- Commands to add and remove flows can now read from files.
- New --flow-format option to enable or disable NXM.
- New --more option to increase OpenFlow message verbosity.
- Removed "tun-cookie" command, which is no longer useful.
- ovs-controller enhancements for testing various features.
- New ovs-vlan-test command for testing for Linux kernel driver VLAN
bugs. New ovs-vlan-bug-workaround command for enabling and
disabling a workaround for these driver bugs.
- OpenFlow support:
- "Resubmit" actions now update flow statistics.
- New "register" extension for use in matching and actions, via NXM.
- New "multipath" experimental action extension.
- New support for matching multicast Ethernet frames, via NXM.
- New extension for OpenFlow vendor error codes.
- New extension to set the QoS output queue without actually
sending to an output port.
- Open vSwitch now reports a single flow table, instead of
separate hash and wildcard tables. This better models the
current implementation.
- New experimental "note" action.
- New "ofproto/trace" ovs-appctl command and associated utilities
to ease debugging complex flow tables.
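A typical invocation feeds a flow description to the command; a sketch assuming a bridge named br0 (the flow fields are illustrative):

```shell
# Show, step by step, which rules br0 would consult and which actions it
# would take for a TCP packet arriving on OpenFlow port 1.
ovs-appctl ofproto/trace br0 \
    'in_port=1,tcp,nw_src=10.0.0.1,nw_dst=10.0.0.2,tp_dst=80'
```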
- Database:
- Schema documentation now includes an entity-relationship diagram.
- The database is now garbage collected. In most tables,
unreferenced rows will be deleted automatically.
- Many tables now include statistics updated periodically by
ovs-vswitchd or ovsdb-server.
- Every table now has an "external-ids" column for use by OVS
integrators.
- There is no default controller anymore. Each bridge must have its
controller individually specified.
- The "fail-mode" is now a property of a Bridge instead of a Controller.
- New versioning and checksum features.
- New Managers table and manager_options column in Open_vSwitch table
for specifying managers. The old "managers" column in the
Open_vSwitch table has been removed.
- Many "name" columns are now immutable.
- Feature removals:
- Dropped support for XenServer pre-5.6.100.
- Dropped support for Linux pre-2.6.18.
- Dropped controller discovery support.
- Dropped "ovs-ofctl status" and the OpenFlow extension that it used.
Statistics reporting in the database is a rough equivalent.
- Dropped the "corekeeper" package (now separate, at
http://openvswitch.org/cgi-bin/gitweb.cgi?p=corekeeper).
- Performance and scalability improvements
- Bug fixes
v1.1.0pre2 - 13 Sep 2010
------------------------
- Bug fixes
v1.1.0pre1 - 31 Aug 2010
------------------------
- OpenFlow 1.0 slicing (QoS) functionality
- Python bindings for configuration database (no write support)
- Performance and scalability improvements
- Bug fixes
v1.0.1 - 31 May 2010
--------------------
- New "patch" interface type
- Bug fixes
v1.0.0 - 15 May 2010
--------------------
- Configuration database with remote management
- OpenFlow 1.0
- GRE tunneling
- Support for XenServer 5.5 and 5.6
- Performance and scalability improvements
- Bug fixes
v0.99.2 - 18 Feb 2010
---------------------
- Bug fixes
v0.99.1 - 25 Jan 2010
---------------------
- Add support for sFlow(R)
- Make headers compatible with C++
- Bug fixes
v0.99.0 - 14 Jan 2010
---------------------
- User-space forwarding engine
- Bug fixes
v0.90.7 - 29 Nov 2009
---------------------
- Add support for NetFlow active timeouts
- Bug fixes
v0.90.6 - 6 Oct 2009
--------------------
- Bug fixes
v0.90.5 - 21 Sep 2009
---------------------
- Generalize in-band control to more diverse network setups
- Bug fixes