Modified the dpcls info-get command output to include
the use count for the different dpcls implementations.
$ ovs-appctl dpif-netdev/subtable-lookup-info-get
Available dpcls implementations:
autovalidator (Use count: 1, Priority: 5)
generic (Use count: 0, Priority: 1)
avx512_gather (Use count: 0, Priority: 3)
Test case to verify changes:
1061: PMD - dpcls configuration ok
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The FreeBSD CI builds keep failing because test processes are not
properly killed. This leaves the build hanging until it times out and
ultimately fails.
Changes name of pidfile pid2 to 2.pid so that the
on_exit 'kill `cat *.pid`' will capture all pidfiles.
Fixes pidfile name logic in test that uses OVSDB_SERVER_SHUTDOWN_N, so
that all pidfile names match the form *.pid.
Replaces unnecessary --pidfile="`pwd`"/pid with just --pidfile, because
by default this argument creates a pidfile named <proc-name>.pid.
Removes extra [test ! -e pid || kill `cat pid`] that run upon AT_CHECK
failure, because those processes are killed with on_exit. Also adds
on_exit in tests where it was missing.
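The effect of the rename can be checked with a small sketch. It is written in Python purely for illustration and assumes that the shell glob *.pid behaves like fnmatch-style matching for these simple names. The name ovsdb-server.pid is only a stand-in for a default <proc-name>.pid file; pid2 and 2.pid are the names from the description above.

```python
import fnmatch


def pidfiles_killed_on_exit(names, pattern="*.pid"):
    """Return the names that the glob in on_exit 'kill `cat *.pid`' would match."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# "pid2" is the old pidfile name, "2.pid" the new one.
# "ovsdb-server.pid" is an assumed example of a default <proc-name>.pid file.
before = pidfiles_killed_on_exit(["ovsdb-server.pid", "pid2"])
after = pidfiles_killed_on_exit(["ovsdb-server.pid", "2.pid"])

print("matched before rename:", before)
print("matched after rename: ", after)
```

With the old name, the second process is never listed, so on_exit cannot kill it.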
Fixes: 561205007e17 ("tests: Get rid of overly specific --pidfile and --unixctl options.")
Fixes: 0be15ad76f0f ("ovsdb-server.at: Add unit test for record/replay.")
Fixes: 7964ffe7d2bf ("ovsdb: relay: Add support for transaction forwarding.")
Fixes: e879d33e8398 ("ovsdb/jsonrpc-server: ovsdb-server closes accepted connections immediately.")
Reported-at: https://github.com/cirruslabs/cirrus-ci-docs/issues/910
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When only one OpenFlow version is set in the protocols of a bridge, the
function get_highest_ofp_version in ovs-save parses it incorrectly.
For example:
$ ovs-vsctl get bridge br-int protocols
[OpenFlow15]
$ ovs-vsctl get bridge br-int protocols |
sed 's/[][]//g' | sed 's/\ //g' | awk -F ',' '{ print (NF>1)? $(NF) : "OpenFlow14" }'
OpenFlow14
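The behaviour of the pipeline above can be modelled in a few lines. This is only a Python sketch of the sed/awk logic, not the ovs-save script itself. The function names are made up, and highest_version_fixed shows one possible correction (fall back to OpenFlow14 only when no version is listed), which is an assumption rather than the exact change made.

```python
def strip_brackets_and_spaces(protocols):
    # Same effect as the two sed commands: drop '[', ']' and spaces.
    return protocols.replace("[", "").replace("]", "").replace(" ", "")


def highest_version_buggy(protocols):
    # Mirrors the awk expression: (NF > 1) ? $(NF) : "OpenFlow14".
    fields = strip_brackets_and_spaces(protocols).split(",")
    return fields[-1] if len(fields) > 1 else "OpenFlow14"


def highest_version_fixed(protocols):
    # Hypothetical correction: use the default only for an empty list.
    stripped = strip_brackets_and_spaces(protocols)
    return stripped.split(",")[-1] if stripped else "OpenFlow14"


print(highest_version_buggy("[OpenFlow15]"))
print(highest_version_fixed("[OpenFlow15]"))
```

A single-element list has only one comma-separated field, so the NF>1 test fails and the default is printed instead of the configured version.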
Signed-off-by: Han Ding <handing@chinatelecom.cn>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The hash of the port number is only needed when a DPCLS needs to be
created. Move the hash calculation inside the if to accomplish this.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Note: There is still a UB instance when using SSE4.2, as reported here:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/390904.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some OpenFlow actions can be misaligned, e.g., actions within OF 1.0
replies to statistics reply messages which have a header of 12 bytes
and no additional padding.
Also, buggy controllers might incorrectly encode actions.
When decoding multiple actions in ofpacts_decode(), make sure that
when advancing to the next action it will be properly aligned
(multiple of OFPACT_ALIGNTO).
Detected by UB Sanitizer when running one of the fuzz tests:
lib/ofp-actions.c:5347:12:
runtime error: member access within misaligned address 0x0000016ba274
for type 'const struct nx_action_learn', which requires 8 byte alignment
0x0000016ba274: note: pointer points here
20 20 20 20 ff ff 00 38 00 00 23 20 00 10 20 20
^
20 20 20 20 20 20 20 20 20 20 20 20 00 03 20 00
0 0x52cece in decode_LEARN_common lib/ofp-actions.c:5347
1 0x52dcf6 in decode_NXAST_RAW_LEARN lib/ofp-actions.c:5463
2 0x548604 in ofpact_decode lib/ofp-actions.inc2:4723
3 0x53ee43 in ofpacts_decode lib/ofp-actions.c:7781
4 0x53efc1 in ofpacts_pull_openflow_actions__ lib/ofp-actions.c:7820
5 0x5409e1 in ofpacts_pull_openflow_instructions lib/ofp-actions.c:8396
6 0x5608a8 in ofputil_decode_flow_stats_reply lib/ofp-flow.c:1100
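The alignment arithmetic behind this report can be sketched as follows. This Python snippet only illustrates the "multiple of OFPACT_ALIGNTO" check and rounding. It assumes OFPACT_ALIGNTO is 8, matching the 8-byte alignment named in the report, and it does not reproduce the actual decoding code. The helper names are hypothetical.

```python
OFPACT_ALIGNTO = 8  # assumed value, consistent with "requires 8 byte alignment"


def is_aligned(address, align=OFPACT_ALIGNTO):
    """True if 'address' is a multiple of 'align'."""
    return address % align == 0


def round_up(offset, align=OFPACT_ALIGNTO):
    """Smallest multiple of 'align' that is greater than or equal to 'offset'."""
    return (offset + align - 1) // align * align


reported = 0x0000016ba274  # address from the sanitizer report
print(hex(reported), "aligned:", is_aligned(reported))
print("offset after a 12-byte header rounds up to", round_up(12))
```

The reported address leaves a remainder of 4 when divided by 8, which is why the access to struct nx_action_learn is flagged.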
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
One example is parsing the OVN "eth.dst[40] = 1;" action, which
triggered the following warning from UndefinedBehaviorSanitizer:
lib/meta-flow.c:3210:9:
runtime error: member access within misaligned address 0x000000de4e36
for type 'const union mf_value', which requires 8 byte alignment
0x000000de4e36: note: pointer points here
00 00 00 00 01 00 00 00 00 00 00 00 00 00 70 4e de 00 00 00 00 00
^
10 51 de 00 00 00 00 00 c0 4f
0 0x5818bc in mf_format lib/meta-flow.c:3210
1 0x5b6047 in format_SET_FIELD lib/ofp-actions.c:3342
2 0x5d68ab in ofpact_format lib/ofp-actions.c:9213
3 0x5d6ee0 in ofpacts_format lib/ofp-actions.c:9237
4 0x410922 in test_parse_actions tests/test-ovn.c:1360
To avoid this we now change the internal representation of the set_field
actions, struct ofpact_set_field, such that the mask is always stored
at a correctly aligned address, multiple of OFPACT_ALIGNTO.
We also need to adapt the "ovs-ofctl show-flows - Oversized flow" test
because now the ofpact representation of the set_field action uses more
bytes in memory (for the extra alignment). Change the test to use
dec_ttl instead.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This is undefined behavior and was reported by UB Sanitizer:
lib/meta-flow.c:3445:16: runtime error:
member access within null pointer of type 'struct vl_mf_field'
0 0x6aad0f in mf_get_vl_mff lib/meta-flow.c:3445
1 0x6d96d7 in mf_from_oxm_header lib/nx-match.c:260
2 0x6d9e2e in nx_pull_header__ lib/nx-match.c:341
3 0x6daafa in nx_pull_header lib/nx-match.c:488
4 0x6abcb6 in mf_vl_mff_nx_pull_header lib/meta-flow.c:3605
5 0x73b9be in decode_NXAST_RAW_REG_MOVE lib/ofp-actions.c:2652
6 0x764ccd in ofpact_decode lib/ofp-actions.inc2:4681
[...]
lib/sset.c:315:12: runtime error: applying zero offset to null pointer
0 0xcc2e6a in sset_at_position lib/sset.c:315:12
1 0x5734b3 in port_dump_next ofproto/ofproto-dpif.c:4083:20
[...]
lib/ovsdb-data.c:2194:56:
runtime error: applying zero offset to null pointer
0 0x5e9530 in ovsdb_datum_added_removed lib/ovsdb-data.c:2194:56
1 0x4d6258 in update_row_ref_count ovsdb/transaction.c:335:17
2 0x4c360b in for_each_txn_row ovsdb/transaction.c:1572:33
[...]
lib/ofpbuf.c:440:30:
runtime error: applying zero offset to null pointer
0 0x75066d in ofpbuf_push_uninit lib/ofpbuf.c:440
1 0x46ac8a in ovnacts_parse lib/actions.c:4190
2 0x46ad91 in ovnacts_parse_string lib/actions.c:4208
3 0x4106d1 in test_parse_actions tests/test-ovn.c:1324
[...]
lib/ofp-actions.c:3205:22:
runtime error: applying non-zero offset 2 to null pointer
0 0x6e1641 in set_field_split_str lib/ofp-actions.c:3205:22
[...]
lib/tnl-ports.c:74:12:
runtime error: applying zero offset to null pointer
0 0xceffe7 in tnl_port_cast lib/tnl-ports.c:74:12
1 0xcf14c3 in map_insert lib/tnl-ports.c:116:13
[...]
ofproto/ofproto.c:8905:16:
runtime error: applying zero offset to null pointer
0 0x556795 in eviction_group_hash_rule ofproto/ofproto.c:8905:16
1 0x503f8d in eviction_group_add_rule ofproto/ofproto.c:9022:42
[...]
Also, it's valid to have an empty ofpact list and we should be able to
try to iterate through it.
UB Sanitizer report:
include/openvswitch/ofp-actions.h:222:12:
runtime error: applying zero offset to null pointer
0 0x665d69 in ofpact_end ./include/openvswitch/ofp-actions.h:222:12
1 0x66b2cf in ofpacts_put_openflow_actions lib/ofp-actions.c:8861:5
2 0x6ffdd1 in ofputil_encode_flow_mod lib/ofp-flow.c:447:9
[...]
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
UB Sanitizer reports:
tests/test-hash.c:59:40:
runtime error: shift exponent 64 is too large for 64-bit type
'long unsigned int'
0 0x44c3c9 in get_range128 tests/test-hash.c:59
1 0x44cb2e in check_hash_bytes128 tests/test-hash.c:178
2 0x44d14d in test_hash_main tests/test-hash.c:282
[...]
ofproto/ofproto-dpif-xlate.c:5607:45:
runtime error: left shift of 65535 by 16 places cannot be represented
in type 'int'
0 0x53fe9f in xlate_sample_action ofproto/ofproto-dpif-xlate.c:5607
1 0x54d625 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7160
2 0x553b76 in xlate_actions ofproto/ofproto-dpif-xlate.c:7806
3 0x4fcb49 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1237
4 0x4fe02f in process_upcall ofproto/ofproto-dpif-upcall.c:1456
5 0x4fda99 in upcall_cb ofproto/ofproto-dpif-upcall.c:1358
[...]
tests/test-util.c:89:23:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
0 0x476415 in test_ctz tests/test-util.c:89
[...]
lib/dpif-netlink.c:396:33:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
0 0x571b9f in dpif_netlink_open lib/dpif-netlink.c:396
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In some places it is using Markdown syntax and in others
it is not needed as there is already a code block.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
My xml editor keeps autofixing these which means I have to be
careful during 'git add' for unrelated changes. Might as well
just fix them.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ovs_dump_packets will break the formatting of the GDB
terminal UI, resulting in artifacts displayed on the screen that
may make packets difficult to read. This patch suppresses stderr
output from tcpdump and feeds tcpdump's stdout into the paginated
output stream.
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
During revalidation/upcall, a mirror could be removed. Instead of
crashing the process, we can simply skip the deleted mirror.
The issue had been triggered multiple times by ovs-tcpdump in my test.
Fixes: ec7ceaed4f3e ("ofproto-dpif: Modularize mirror code.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The QEMU version requirement of >= 2.7 is for vhost-user-client ports
specifically.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tunnel offload APIs have the '__rte_experimental' attribute and are
therefore available only if ALLOW_EXPERIMENTAL_API is defined. Document it.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The function dp_netdev_pmd_flush_output_on_port() iterates over the
p->output_pkts batch directly, when it should be using the special
iterator macro, DP_PACKET_BATCH_FOR_EACH.
However, this wasn't possible because the macro could not accept
&p->output_pkts.
The addition of parentheses when BATCH is dereferenced allows the macro
to expand properly. Parenthesizing arguments in macros is good practice
to be able to handle whichever expressions are passed in.
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Dropped packets were not counted as tx_dropped when a DPDK netdev is
down (like after calling netdev-dpdk/set-admin-state dpdk1 down).
Fixes: 3b1fb0779b87 ("netdev-dpdk: Don't call rte_dev_stop() in update_flags().")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A lock annotation was left behind after removing the nonpmd mutex.
Remove it.
Fixes: 1166b0d82043 ("netdev-dpdk: Remove useless nonpmd_mempool_mutex.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch splits out the common code between the vhost and
dpdk transmit paths into shared functions to simplify the
code and fix an issue.
The issue is that a packet coming from a non-DPDK device
and egressing on a DPDK device currently skips the hwol
preparation.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
nx_to_ofp_flow_update_event() aborts the execution if an incorrect
event is passed, so checking has to be done before conversion
in order to avoid the crash while decoding a malformed flow update
message:
==397030==ERROR: AddressSanitizer: ABRT on unknown address 0x... )
0 0x7fd26688418b in raise
1 0x7fd266863858 in abort
2 0x6a6cbd in nx_to_ofp_flow_update_event lib/ofp-monitor.c:399:9
3 0x6a6cbd in ofputil_decode_flow_update lib/ofp-monitor.c:856:25
4 0x56491d in ofp_print_flow_monitor_reply lib/ofp-print.c:779:22
5 0x55f0a0 in ofp_to_string__ lib/ofp-print.c:1154:16
6 0x55f0a0 in ofp_to_string lib/ofp-print.c:1244:21
7 0x5603a5 in ofp_print lib/ofp-print.c:1288:28
Credit to OSS-Fuzz.
Additionally removed the extra 'reply' word from the error message,
since ofpraw_get_name(raw) already has one.
Fixes: c3e64047d1cc ("ofp-monitor: Support flow monitoring for OpenFlow 1.3, 1.4+.")
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47112
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tunnels in LibreSwan and OpenSwan allow for many options to be set on a
per-tunnel basis. Pass through any options starting with ipsec_ to the
connection in the configuration file. Administrators are responsible for
picking valid key/value pairs.
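The pass-through can be pictured with the following sketch. It is a simplified Python model, not the code from the IPsec monitor. The function names are hypothetical, and stripping the ipsec_ prefix before writing the key is an assumption made for this example.

```python
IPSEC_PREFIX = "ipsec_"


def custom_conn_options(options):
    """Select options starting with 'ipsec_'; the prefix is stripped (assumption)."""
    return {
        key[len(IPSEC_PREFIX):]: value
        for key, value in options.items()
        if key.startswith(IPSEC_PREFIX)
    }


def render_conn_lines(options):
    """Format the selected options as 'key=value' lines for a connection section."""
    selected = custom_conn_options(options)
    return ["    %s=%s" % (key, value) for key, value in sorted(selected.items())]


# Example interface options; no validation is done, as noted above.
example = {"remote_ip": "192.0.2.1", "ipsec_encapsulation": "yes"}
for line in render_conn_lines(example):
    print(line)
```

Because nothing is validated, an invalid key/value pair would be written out unchanged, which is why the choice is left to the administrator.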
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv4 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
For VLANs, the match of ethernet type should be specified in inner_type
field of the vlan match, and not type field in ethernet match.
Fix it.
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Salem Sol <salems@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
DPDK 20.11 introduced an ability to specify existence/non-existence of
VLAN tag by [1].
Use this attribute.
[1]: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN items")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Salem Sol <salems@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Prior to 4e3966e64, when calling _uuid_to_row, it would raise an
AttributeError when trying to access base.ref_table.rows if the
referenced table was not registered. When called from
Row.__getattr__(), this would appropriately raise an AttributeError.
After 4e3966e64, a KeyError would be raised, which is not expected
from a getattr() or hasattr() call, which could break existing
code.
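Why the exception type matters can be shown with plain Python, independent of the IDL. The two classes below are hypothetical stand-ins for a Row whose __getattr__() fails in one of the two ways described. The sketch relies only on the documented behaviour that hasattr() and three-argument getattr() swallow AttributeError and nothing else.

```python
class RaisesAttributeError:
    """Stand-in for the behaviour before 4e3966e64."""

    def __getattr__(self, name):
        raise AttributeError(name)


class RaisesKeyError:
    """Stand-in for the behaviour after 4e3966e64."""

    def __getattr__(self, name):
        raise KeyError(name)


def probe(obj, name):
    """Return what a caller using hasattr() would observe."""
    try:
        return hasattr(obj, name)
    except KeyError:
        return "KeyError escaped"


print(probe(RaisesAttributeError(), "ref"))
print(probe(RaisesKeyError(), "ref"))
```

Existing code that guards an access with hasattr() or getattr(obj, name, default) keeps working in the first case and crashes in the second.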
Fixes: 4e3966e64bed ("python: Politely handle misuse of table.condition.")
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch removes the newly added NEWS entry and adds it as a leaf
under post 2.17.
Add OVS version instead of specifying that the feature is supported
for IPv6 connection tracking and Geneve IPv6 tunnels.
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Extended OpenFlow monitoring support
* OpenFlow 1.3 with ONF extensions
* OpenFlow 1.4+ as defined in OpenFlow specification 1.4+.
ONF extensions are similar to Nicira extensions except for onf_flow_monitor_request{},
where out_port is defined as a 32-bit OF(1.1) port number and OXM match formats are
used in update and request messages.
Flow monitoring support in 1.4+ is slightly different from Nicira and ONF
extensions.
* More flow monitoring flags are defined.
* Monitor add/modify/delete command is introduced in flow_monitor
request message.
* Addition of out_group as part of flow_monitor request message
Description of changes:
1. Generate ofp-msgs.inc to be able to support 1.3, 1.4+ flow Monitoring messages.
include/openvswitch/ofp-msgs.h
2. Modify openflow header files with protocol specific headers.
include/openflow/openflow-1.3.h
include/openflow/openflow-1.4.h
3. Modify OvS abstraction of openflow headers. ofp-monitor.h leverages enums
from nicira extensions for creating protocol abstraction headers. OF(1.4+)
enums are a superset of nicira extensions.
include/openvswitch/ofp-monitor.h
4. Changes to these files reflect encoding and decoding of new protocol messages.
lib/ofp-monitor.c
5. Changes to modules using ofp-monitor APIs. Most of the changes here are to
migrate enums from nicira to OF 1.4+ versions.
ofproto/connmgr.c
ofproto/connmgr.h
ofproto/ofproto-provider.h
ofproto/ofproto.c
6. Extended protocol decoding tests to verify all protocol versions
FLOW_MONITOR_CANCEL
FLOW_MONITOR_PAUSED
FLOW_MONITOR_RESUMED
FLOW_MONITOR request
FLOW_MONITOR reply
tests/ofp-print.at
7. Modify flow monitoring tests so that they can be executed by all protocol versions.
tests/ofproto.at
8. Modified documentation highlighting the change
utilities/ovs-ofctl.8.in
NEWS
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383915.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently OVS supports flow-monitoring for OpenFlow 1.0 and Nicira Extensions.
Any other OpenFlow versioned messages are not accepted. This change will allow
OpenFlow1.0-1.2 Flow Monitoring with Nicira extensions to be accepted. Also made
sure that flow-monitoring updates, flow monitoring pause messages, resume
messages are sent in the same OpenFlow version as that of flow-monitor request.
Description of changes:
1. Generate ofp-msgs.inc to be able to support 1.0-1.2 Flow Monitoring messages.
include/openvswitch/ofp-msgs.h
2. Support vconn to accept user specified version and use it for vconn
flow-monitoring session
ofproto/ofproto.c
3. Modify APIs to use protocol as an argument to encode and decode messages
include/openvswitch/ofp-monitor.h
lib/ofp-monitor.c
ofproto/connmgr.c
ofproto/connmgr.h
ofproto/ofproto.c
4. Modified following testcases to be verified across supported OF Versions
ofproto - flow monitoring
ofproto - flow monitoring with !own
ofproto - flow monitoring with out_port
ofproto - flow monitoring pause and resume
ofproto - flow monitoring usable protocols
tests/ofproto.at
5. Updated NEWS with the support added with this commit
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-December/050820.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
At first glance, change tracking should never be allowed for
write-only columns. However, some clients (e.g., ovn-northd) that are
mostly exclusive writers of a database, use change tracking to avoid
duplicating the IDL row records into a local cache when implementing
incremental processing.
The default behavior of the IDL is to automatically turn a write-only
column into a read-write column whenever the client enables change
tracking for that column.
For the aforementioned clients, this becomes a performance issue.
Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
change a column's value.") explains why writes that don't change a
column's value cannot be optimized out early if the column is
read/write.
Furthermore, if there is at least one record in any table that
changed during a transaction, then *all* records that have been
written are added to the transaction, even if their values didn't
change. If there are many such rows (e.g., like in ovn-northd's
case) this incurs a significant overhead because:
a. the client has to build this large transaction
b. the transaction has to be sent over the network
c. the server needs to parse this (mostly) no-op update
We now introduce new IDL APIs allowing users to set a new monitoring
mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
atomicity constraints may be relaxed and written columns that don't
change value can be skipped from the current transaction.
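The difference between the two modes can be modelled conceptually as follows. This is a Python sketch of the idea only. It does not use the IDL API, the function and parameter names are made up, and rows are reduced to plain dictionaries of column values.

```python
_MISSING = object()


def columns_in_txn(current, written, write_changed_only):
    """Return the column writes that would end up in the transaction.

    current: column values currently known for the row
    written: column values the client wrote in this transaction
    write_changed_only: models the OVSDB_IDL_WRITE_CHANGED_ONLY mode flag
    """
    if not write_changed_only:
        # Default behaviour: every written column is kept, changed or not.
        return dict(written)
    # Relaxed behaviour: writes that leave the value unchanged are skipped.
    return {
        column: value
        for column, value in written.items()
        if current.get(column, _MISSING) != value
    }


row = {"nb_cfg": 5, "name": "sw0"}        # hypothetical row contents
writes = {"nb_cfg": 6, "name": "sw0"}     # only nb_cfg really changes
print(columns_in_txn(row, writes, write_changed_only=False))
print(columns_in_txn(row, writes, write_changed_only=True))
```

With many rows whose writes are no-ops, the second form is what keeps the transaction small.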
We benchmarked ovn-northd performance when using this new mode
against NB and SB databases taken from ovn-kubernetes scale tests.
We noticed that when a minor change is performed to the Northbound
database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
build the Southbound transaction becomes negligible (vs ~1.5 seconds
before this change).
End-to-end ovn-kubernetes scale tests on 120-node clusters also show
significant reduction of latency to bring up pods; both average and P99
latency decreased by ~30%.
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When a packet is received over an access port that needs to be sent
over a vxlan tunnel, the access port VLAN id is used in the lookup
leading to a wrong packet being crafted and sent over the tunnel.
Clear out the flow's VLAN field as it should not be used while
performing mac lookup for the outer tunnel and also at this point
the VLAN action related to inner flow is already committed.
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393566.html
Reported-at: https://bugzilla.redhat.com/2060552
Signed-off-by: Thilak Raj Surendra Babu <thilakraj.sb@nutanix.com>
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Co-authored-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
UINT64_C(1) is required in this bitshift since batch_size can be 32 and
1 << 32 overflows UINT32_C(1).
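The rule involved can be stated as a tiny check. Python integers are unbounded, so this sketch does not reproduce the C behaviour. It only encodes the C rule that a shift count must be smaller than the width of the (promoted) left operand. The helper name is hypothetical.

```python
def shift_is_defined(count, width):
    """C rule: the shift count must be non-negative and less than the operand width."""
    return 0 <= count < width


BATCH_SIZE = 32  # the batch_size value named above

print("32-bit operand:", shift_is_defined(BATCH_SIZE, 32))
print("64-bit operand:", shift_is_defined(BATCH_SIZE, 64))
print("bits needed for 1 << 32:", (1 << BATCH_SIZE).bit_length())
```

The value 1 << 32 needs 33 bits, so it can only be produced with a left operand wider than 32 bits.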
Fixes: ba0a2619ca0c ("dpif-netdev-avx512: Fix ubsan shift error in bitmasks.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The code changes here are to handle (1 << i) shifts where 'i' is the
packet index in the batch, and 1 << 31 is an overflow of the signed '1'.
Fixed by adding UINT32_C() around the 1 character, ensuring the compiler knows
the 1 is unsigned (and 32-bits). Undefined Behaviour sanitizer is now happy
with the bit-shifts at runtime.
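The representability problem can be illustrated numerically. This is a Python sketch with hypothetical helper names. It checks whether the mathematical result of 1 << 31 fits in a signed or an unsigned 32-bit type, which is the distinction the sanitizer reports. It does not emulate C arithmetic.

```python
def fits_signed(value, bits):
    """True if 'value' is representable in a two's-complement signed type of 'bits' width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def fits_unsigned(value, bits):
    """True if 'value' is representable in an unsigned type of 'bits' width."""
    return 0 <= value <= (1 << bits) - 1


LAST_INDEX = 31  # the shift amount named above
result = 1 << LAST_INDEX

print("fits in signed 32-bit:  ", fits_signed(result, 32))
print("fits in unsigned 32-bit:", fits_unsigned(result, 32))
```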
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Before 46d44cf3b, it was technically possible to assign a monitor
condition directly to Idl.tables[table_name].condition. If done
before the connection was established, it would successfully apply
the condition (where cond_change() actually would fail).
Although this wasn't meant to be supported, several OpenStack
projects made use of this. After 46d44cf3b, .condition is no
longer a list, but a ConditionState. Assigning a list to it breaks
the Idl.
The Neutron and ovsdbapp projects have patches in-flight to
use Idl.cond_change() if ConditionState exists, as it now works
before connection as well, but there could be other users that also
start failing when upgrading to OVS 2.17.
Instead of directly adding attributes to TableSchema, this adds
the IdlTable/IdlColumn objects which hold Idl-specific data and
adds a 'condition' property to TableSchema that maintains the old
interface.
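How a property can keep the old assignment style working is sketched below. All class and attribute names here are hypothetical simplifications and not the actual python IDL classes. The real setter may route the value differently, for example through cond_change().

```python
class ConditionState:
    """Hypothetical stand-in for the Idl's per-table condition bookkeeping."""

    def __init__(self):
        self.latest = [True]


class Table:
    """Hypothetical table object exposing the old 'condition' interface."""

    def __init__(self):
        self._cond_state = ConditionState()

    @property
    def condition(self):
        # Reads still return a plain list, as older callers expect.
        return self._cond_state.latest

    @condition.setter
    def condition(self, cond):
        # Old-style "table.condition = [...]" lands here, so the state
        # object is updated instead of being replaced by a list.
        self._cond_state.latest = list(cond)


table = Table()
table.condition = [["name", "==", "br-int"]]
print(table.condition)
print(type(table._cond_state).__name__)
```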
Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-By: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A packet received from a tunnel port with legacy_l3 packet-type (e.g.
lisp, L3 gre, gtpu) is conceptually wrapped in a dummy Ethernet header
for processing in an OF pipeline that is not packet-type-aware. Before
transmission of the packet to another legacy_l3 tunnel port, the dummy
Ethernet header is stripped again.
In ofproto-xlate, wrapping in the dummy Ethernet header is done by
simply changing the packet_type to PT_ETH. The generation of the
push_eth datapath action is deferred until the packet's flow changes
need to be committed, for example at output to a normal port. The
deferred Ethernet encapsulation is marked in the pending_encap flag.
This patch fixes a bug in the translation of the output action to a
legacy_l3 tunnel port, where the packet_type of the flow is reverted
from PT_ETH to PT_IPV4 or PT_IPV6 (depending on the dl_type) to remove
its Ethernet header without clearing the pending_encap flag if it was
set. At the subsequent commit of the flow changes, the unexpected
combination of pending_encap == true with a PT_IPV4 or PT_IPV6
packet_type hit the OVS_NOT_REACHED() abort clause.
The pending_encap is now cleared in this situation.
Reported-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The test relied on the flows installed by recv_upcalls() after
upcall_receive() returned ENODEV if the packet was initially
originated by packet-out with OFPP_CONTROLLER as in_port.
Since 323ae1e808e6 ("ofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.")
the test stopped working because recirculation in such scenario got
fixed and upcall_receive() no longer returns ENODEV.
Fix it by setting an invalid port as "in_port" in order to similarly
trigger the same behavior.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Implementation on Windows:
Currently, IPv4 conntrack is supported on the Windows platform.
In this patch we have implemented IPv6 conntrack functions according
to the current logic of the IPv4 conntrack. This implementation
includes TcpV6 (nat and normal scenario), UdpV6 (nat and normal scenario),
IcmpV6 conntrack of echo request/reply packets and
FtpV6 (nat and normal scenario).
Testing Topology:
On a Windows VM running on an ESXi host, two Hyper-V ports are attached
to the OVS bridge; one Hyper-V port works as the client and the
other port works as the server.
Testing Case:
1. TcpV6
a) Tcp request/reply conntrack for normal scenario.
In this scenario, 20::1 as client, 20::2 as server, it will generate
following conntrack entry:
(Origin(src=20::1, src_port=1555, dst=20::2, dst_port=1556),
reply(src=20::2,src_port=1556,dst=20::1,dst_port=1555),protocol=tcp)
b) Tcp request/reply conntrack for nat scenario.
In this scenario, 20::1 as client, 20::10 as floating ip, 21::3 as server,
it will generate following conntrack entry:
(Origin(src=20::1, src_port=1555, dst=20::10, dst_port=1556),
reply(src=21::3, src_port=1556, dst=20::1, dst_port= 1555),protocol=tcp)
2. UdpV6
a) Udp request/reply conntrack for normal scenario.
(Origin(src=20::1, src_port=1555, dst=20::2, dst_port=1556),
reply(src=20::2,src_port=1556,dst=20::1,dst_port=1555),protocol=udp)
b) Udp request/reply conntrack for nat scenario.
(Origin(src=20::1, src_port=1555, dst=20::10, dst_port=1556),
reply(src=21::3, src_port=1556, dst=20::1, dst_port= 1555),protocol=udp)
3. IcmpV6:
a) Icmpv6 request/reply conntrack for normal scenario.
Currently Icmpv6 only supports constructing conntrack entries for
echo request/reply packets. Taking (20::1 -> 20::2) as an example,
it will generate the following conntrack entry:
(origin(src = 20::1, dst=20::2), reply(src=20::2, dst=20::1), protocol=icmp)
b) Icmp request/reply conntrack for dnat scenario,
for example (20::1->20::10->21::3), 20::1 is
client, 20::10 is floating ip, 21::3 is server ip.
It will generate flow like below:
(origin(src=20::1, dst=20::10), reply(src=21::3, dst=20::1), protocol=icmp)
4. FtpV6
a) Ftp request/reply conntrack for normal scenario.
In this scenario, take 20::1 as client, 20::2 as server, it will generate
two conntrack entries:
Ftp active mode
(Origin(src=20::1, src_port=1555, dst=20::2, dst_port=21),
reply(src=20::2, src_port=21, dst=20::1, dst_port=1555), protocol=tcp)
(Origin(src=20::2, src_port=20, dst=20::1, dst_port=1556),
reply(src=20::1, src_port=1556, dst=20::2, dst_port=20), protocol=tcp)
Ftp passive mode
(Origin(src=20::1, src_port=1555, dst=20::2, dst_port=21),
reply(src=20::2,src_port=21,dst=20::1,dst_port=1555),protocol=tcp)
(Origin(src=20::1, src_port=1556, dst=20::2, dst_port=1557),
reply(src=20::2,src_port=1557, dst=20::1, dst_port=1556) protocol=tcp)
b) Ftp request/reply conntrack for nat scenario.
Ftp passive mode,
In this scenario, 20::1 as client, 20::10 as floating ip, 21::3 as server
ip. It will generate following flow:
(Origin(src=20::1, src_port=1555, dst=20::10, dst_port=21),
reply(src=21::3, src_port=21, dst=20::1, dst_port= 1555),protocol=tcp)
(Origin(src=20::1, src_port=1556, dst=20::10, dst_port=1557),
reply(src=21::3, src_port=1557, dst=20::1, dst_port= 1556),protocol=tcp)
5. Regression test for IPv4 in Antrea project (about 60 test cases)
Future work:
1) IcmpV6 redirect packet conntrack.
2) IpV6 fragment support on Udp.
3) Support napt for IPv6.
4) FtpV6 active mode for nat.
Signed-off-by: ldejing <ldejing@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Updated 12.2 to 12.3 and 11.4 to 13.0.
'pkg update' fails on 12.2. 11.4 has reached end of life.
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As a first step, OVS Windows will support an IPv6 tunnel (Geneve IPv6 tunnel).
Implementation on Windows
-------------------------
1. For the IPv6 tunnel support, OvsIPTunnelKey will replace the original
OvsIPv4TunnelKey in the related flow context handling.
2. The related src and dst address will be changed to SOCKADDR_INET type from UINT32.
3. For the IPv6 tunnel, one node running OVS-Windows could encapsulate IPv4/IPv6
packets via an IPv6 Geneve tunnel, and the node could also encapsulate IPv4/IPv6 packets
via an IPv4 Geneve tunnel.
4. Related IPHelper data structures will be adapted to support IPv6 tunnels. In the IPHelper
part, the related Windows APIs (such as GetUnicastIpAddressTable/GetBestRoute2/GetIpNetEntry2/
ResolveIpNetEntry2) and Windows data structures (MIB_IPFORWARD_ROW2/MIB_IPNET_ROW2/IP_ADDRESS_PREFIX)
already support both IPv4 and IPv6. Some OVS Windows functions
and data structures have now been adjusted to support IPv6 tunnels as well.
5. The OVS_TUNNEL_KEY_ATTR_IPV6_SRC and OVS_TUNNEL_KEY_ATTR_IPV6_DST fields will be supported in
the OVS-Windows kernel for IPv6 tunnels.
Testing done.
-------------------------
Related topo: 1 Windows VM (Win2019) and 2 Ubuntu 16.04 servers. Both VMs
are running on one ESX host.
1. Set up one IPv6 Geneve tunnel between 1 Windows VM and 1 Ubuntu server.
Windows VM, vif0(6000::2/40.1.1.10) vif1(5000::2) -- Ubuntu VM Eth2(5000::9), name space ns1
with interface ns1_link_peer(6000::9/40.1.1.2)
Related tunnel,
ovs-vsctl.exe add-port br-int bms-tun0 -- set interface bms-tun0 type=Geneve options:csum=true
options:key=flow options:local_ip="5000::2" options:remote_ip=flow
In this topo, traffic from Vif0(Win) to ns1_link_peer(Ubuntu) will be gone through the Geneve tunnel
(5000::2—>5000::9) for both IPv4 traffic(40.1.1.10-->40.1.1.2) and IPv6 traffic(6000::2—>6000::9)
2. Setup one IPV4 Geneve Tunnel between Windows VM and 1 Ubuntu server.
Windows VM, vif0( 6000::2/40.1.1.10) vif1(50.1.1.11)—— Ubuntu, Eth2(50.1.1.9), name space ns1
with interface ns1_link_peer(6000::19/40.1.1.9)
Related tunnnel,
ovs-vsctl.exe -- set Interface bms-tun0 type=geneve options:csum=true options:key=flow
options:local_ip="50.1.1.11" options:remote_ip=flow
In this topo, traffic from Vif0(Win) to ns1_link_peer(Ubuntu) will be gone through the Geneve Tunnel
(50.1.1.11—>50.1.1.9) for both IPv4 traffic(40.1.1.10-->40.1.1.9) and IPv6 traffic(6000::2—>6000::19).
3.Regression test for IpV4 in Antrea project (about 60 test case) is PASS
Future Work
-----------
Add other type IPv6 tunnel support for Gre/Vxlan/Stt.
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
This patch checks for non-offloadable ct_state match flag combinations.
If any exist, it forces the +trk flag down to TC Flower.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Set cond_changed to true if _req_cond (the requested condition change)
is not None. This avoids falling into an endless poll loop,
because cond_changed being true triggers immediate_wake().
Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
Signed-off-by: Wentao Jia <wentao.jia@easystack.cn>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The dp_netdev_get_pmd() function uses only the hash of the core_id
to get the pmd structure. So in case of a hash collision, the wrong pmd
may be returned.
This patch fixes this by checking for the correct core_id using
the CMAP_FOR_EACH_WITH_HASH macro.
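
The difference between the two lookups can be sketched in Python. This is
only an illustration under assumptions: PmdMap, get_by_hash_only and
get_checked are hypothetical names, the buckets are plain lists rather than
a real cmap, and the deliberately weak hash function exists only to
provoke a collision.

```python
from collections import defaultdict


class PmdMap:
    """Toy stand-in for a hash map whose buckets are keyed by hash only."""

    def __init__(self, hash_fn):
        self._hash = hash_fn
        self._buckets = defaultdict(list)

    def insert(self, core_id, pmd):
        self._buckets[self._hash(core_id)].append((core_id, pmd))

    def get_by_hash_only(self, core_id):
        # Old behaviour: take the first node with a matching hash.
        bucket = self._buckets.get(self._hash(core_id), [])
        return bucket[0][1] if bucket else None

    def get_checked(self, core_id):
        # Fixed behaviour: walk every node with a matching hash and
        # compare the stored key, as an iteration over hash matches allows.
        for node_core_id, pmd in self._buckets.get(self._hash(core_id), []):
            if node_core_id == core_id:
                return pmd
        return None


# Weak hash chosen so that core 1 and core 5 collide.
pmds = PmdMap(hash_fn=lambda core_id: core_id % 4)
pmds.insert(1, "pmd-core-1")
pmds.insert(5, "pmd-core-5")
```

Here get_by_hash_only(5) returns the pmd of core 1, while get_checked(5)
returns the pmd of core 5, and get_checked() returns None for a core that
was never inserted.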
Fixes: 65f13b50c5aa ("dpif-netdev: Create multiple pmd threads by default.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
To parse a json string prior to this change, json_lex_input is called
with each character of the string. If the character needs to be copied
to the buffer, it is copied individually. This is an expensive
operation, as often there are multiple characters in a row
that need to be copied, and copying memory in blocks is more efficient
than byte by byte. To improve this, the string is now copied
in blocks with an offset counter. A copy is performed when the parser
state equals done.
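
The idea can be sketched in Python as follows. This is a simplified
illustration, not the C code in the patch: the function names are
hypothetical, the input is assumed to be the bytes following the opening
quote, and escape sequences are kept raw rather than decoded.

```python
QUOTE = ord('"')
BACKSLASH = ord("\\")


def lex_string_bytewise(data: bytes):
    """Copy every byte individually; return (raw_body, bytes_consumed)."""
    buf = bytearray()
    escaped = False
    for i, c in enumerate(data):
        if escaped:
            buf.append(c)          # byte after a backslash, copied as-is
            escaped = False
        elif c == BACKSLASH:
            buf.append(c)
            escaped = True
        elif c == QUOTE:
            return bytes(buf), i + 1
        else:
            buf.append(c)
    raise ValueError("unterminated string")


def lex_string_blockwise(data: bytes):
    """Only advance an offset; copy once when the string is done."""
    escaped = False
    for i, c in enumerate(data):
        if escaped:
            escaped = False
        elif c == BACKSLASH:
            escaped = True
        elif c == QUOTE:
            # Single block copy of everything scanned so far.
            return bytes(data[:i]), i + 1
    raise ValueError("unterminated string")
```

Both functions return the same result; the second one replaces many
single-byte appends with one slice copy at the end of the string.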
Functions that are called for each character use a lot of CPU cycles.
Making these functions inline greatly reduces the cycles used and
improves overall performance. Since json_lex_input was only needed in
one place, it doesn't have to be its own function.
There is also a conditional that checks if the current character is a
new line, which is quite unlikely. When this was examined with perf, the
comparison had a very high CPU cycle usage. To improve this, the
OVS_UNLIKELY macro was used, which hints to the compiler that the branch
is rarely taken so that it can reorder the instructions accordingly.
Here is the improvement seen in the json-string-benchmark test:
     SIZE   Q  S        BEFORE         AFTER    CHANGE
--------------------------------------------------------
   100000   0  0 :    0.842 ms      0.489 ms   -41.9 %
   100000   2  1 :    0.917 ms      0.535 ms   -41.7 %
   100000  10  1 :    1.063 ms      0.656 ms   -38.3 %
 10000000   0  0 :   85.328 ms     49.878 ms   -41.5 %
 10000000   2  1 :   92.555 ms     54.778 ms   -40.8 %
 10000000  10  1 :  106.728 ms     66.735 ms   -37.5 %
100000000   0  0 :  955.375 ms    621.950 ms   -34.9 %
100000000   2  1 : 1031.700 ms    665.200 ms   -35.5 %
100000000  10  1 : 1189.300 ms    796.050 ms   -33.0 %
Here Q is the probability (%) for a character to be a '\"' and
S is the probability (%) for it to be a special character (< 32).
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
PMD auto load balance currently only operates when the polling PMD
thread core will not change numa after reassignment.
Add a unit test for this.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>