Change sets in the OVSDB monitor store all the changes that happened
between a particular transaction ID and now. The initial change set
basically contains all the data.
On each monitor request a new initial change set is created by creating
an empty change set and adding all the database rows. Then it is
converted into a JSON reply and immediately untracked and destroyed.
This is causing significant performance issues if many clients are
requesting new monitors at the same time. For example, that is
happening after database schema conversion, because conversion triggers
cancellation of all monitors. After cancellation, every client sends
a new monitor request. The server then creates a new initial change
set, sends a reply, destroys initial change set and repeats that for
each client. On a system with a 200 MB database and 500 clients, a
cluster of 3 servers spends 20 minutes replying to all the clients
(200 MB x 500 = 100 GB):
timeval|WARN|Unreasonably long 1201525ms poll interval
Of course, all the clients are already disconnected due to inactivity
at this point. When they reconnect, the server accepts new
connections one at a time, so inactivity probes will not be triggered
anymore, but it still takes another 20 minutes to handle all the
incoming connections.
Let's keep the initial change set around for as long as the monitor
itself exists. This will allow us to not construct a new change set
on each new monitor request and even utilize the JSON cache in some
cases. All that at a relatively small maintenance cost, since we'll
need to commit changes to one extra change set on every transaction.
Measured memory usage increase due to keeping around a shallow copy
of a database is about 10%. Measured CPU usage difference during
normal operation is negligible.
With this change it takes only 30 seconds to send out all the monitor
replies in the example above. So, it's a 40x performance improvement.
On a more reasonable setup with 250 nodes, the process takes up to
8-10 seconds instead of 4-5 minutes.
Conditional monitoring will benefit from this change as well, however
results might be less impressive due to lack of JSON cache.
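The effect of keeping the initial change set can be sketched with a toy
model. This is not the ovsdb-server code: the Monitor class, its methods
and the rows_copied counter are hypothetical names introduced only for
illustration, and the row count is arbitrary. The sketch counts how many
rows are copied while serving 500 initial monitor replies, with and
without a cached initial change set.

```python
class Monitor:
    """Toy model of a monitor over a database (hypothetical names)."""

    def __init__(self, rows, keep_initial):
        self.rows = dict(rows)        # uuid -> row contents
        self.keep_initial = keep_initial
        self.initial = None           # cached initial change set, if kept
        self.rows_copied = 0          # amount of work done so far

    def _build_initial(self):
        # Building an initial change set touches every row in the database.
        self.rows_copied += len(self.rows)
        return dict(self.rows)

    def initial_reply(self):
        if not self.keep_initial:
            # Old behavior: build, reply, destroy, for every request.
            return self._build_initial()
        if self.initial is None:
            self.initial = self._build_initial()
        return self.initial

    def commit(self, uuid, row):
        # Maintenance cost: one extra change set to update per transaction.
        self.rows[uuid] = row
        if self.initial is not None:
            self.initial[uuid] = row


def serve(keep_initial, n_rows=1000, n_clients=500):
    """Return the number of rows copied while answering all clients."""
    mon = Monitor({i: "row%d" % i for i in range(n_rows)}, keep_initial)
    for _ in range(n_clients):
        mon.initial_reply()
    return mon.rows_copied
```

In this model the work drops from rows x clients to a single pass over
the rows, at the price of one extra update per committed transaction.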
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The cluster member that initiates the schema conversion converts the
database twice: the first time while verifying that the conversion is
possible, and the second time after reading the conversion request
back from the storage.
Keep the converted database from the first time around and use it
after reading the request back from the storage. This cuts in half
the conversion CPU cost.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, database schema conversion in the case of a clustered database
produces a transaction record with both the new schema and the converted
database data. So, the sequence of events is as follows:
1. Get the new schema.
2. Convert the database to a new schema.
3. Translate the newly converted database into JSON.
4. Write the schema + data JSON to the storage.
5. Destroy converted version of a database.
6. Read schema + data JSON from the storage and parse.
7. Create a new database from a parsed database data.
8. Replace current database with the new one.
Most of these steps are very computationally expensive. Also,
conversion to/from JSON is much more expensive than direct database
conversion with ovsdb_convert() that can make use of shallow data
copies.
Instead of doing all that, let's make use of previously introduced
ability to not write the converted data into the storage. The process
will look like this then:
1. Get the new schema.
2. Convert the database to a new schema
(to verify that it is possible).
3. Write the schema to the storage.
4. Destroy converted version of a database.
5. Read the new schema from the storage and parse.
6. Convert the database to a new schema.
7. Replace current database with the new one.
Most of the operations here are performed on the small schema object,
instead of the actual database data. Two remaining data operations
(actual conversion) are noticeably faster than conversion to/from
JSON due to reference counting and shallow data copies.
Steps 4-6 can be optimized later to not convert twice on the
process that initiates the conversion.
The change results in the following performance improvements in conversion
of the OVN_Southbound database schema from version 20.23.0 to 20.27.0
(measured on a single-server RAFT cluster with no clients):
          |           Before            |           After
          +---------+-------------------+---------+-------------------
  DB size | Total   | Max poll interval | Total   | Max poll interval
  --------+---------+-------------------+---------+-------------------
  542 MB  | 47 sec. | 26 sec.           | 15 sec. | 10 sec.
  225 MB  | 19 sec. | 10 sec.           | 6 sec.  | 4.5 sec.
542 MB database had 19.5 M atoms, 225 MB database had 7.5 M atoms.
Overall performance improvement is about 3x.
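The "about 3x" figure can be checked from the "Total" columns of the
table. The snippet below only redoes that arithmetic; the variable and
function names are chosen here for illustration.

```python
# Total conversion times in seconds, copied from the table above.
measurements = {
    "542 MB": {"before": 47, "after": 15},
    "225 MB": {"before": 19, "after": 6},
}


def speedup(before, after):
    """Ratio of the old duration to the new one."""
    return before / after


speedups = {
    size: speedup(m["before"], m["after"])
    for size, m in measurements.items()
}
```

Both ratios come out slightly above 3, consistent with the summary.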
Also, note that before this change database conversion basically
doubles the database file on disk. Now it only writes a small
schema JSON.
Since the change requires backward-incompatible database file format
changes, documentation is updated on how to perform an upgrade.
Handled the same way as we did for the previous incompatible format
change in 2.15 (column diffs).
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If the schema with no data was read from the clustered storage, it
should mean a database conversion request. In general, we can get:
1. Just data --> Transaction record.
2. Schema + Data --> Database conversion or raft snapshot install.
3. Just schema --> New. Database conversion request.
We cannot distinguish between conversion and snapshot installation
request in the current implementation, so we will keep handling
conversion with data in the same way as before, i.e. if data is
provided, we should use it.
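The three cases can be sketched as a small classifier. This is an
illustration only: classify_record() and its return strings are
hypothetical, and the "empty" case is an assumption that the text above
does not discuss.

```python
def classify_record(schema, data):
    """Classify a record read from clustered storage.

    `schema` and `data` are the parsed parts of the record, or None
    when the corresponding part is absent.
    """
    if schema is not None and data is not None:
        # Cannot tell these two apart, so the provided data is used.
        return "conversion-or-snapshot"
    if schema is not None:
        return "conversion-request"     # the new record type
    if data is not None:
        return "transaction"
    return "empty"                      # assumption: not covered above
```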
ovsdb-tool is updated to handle this record type as well while
converting cluster to standalone.
This change doesn't introduce a way for such records to appear in
the database. That will be added in the future commits targeting
conversion speed increase.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Clustered databases do not support ephemeral columns, but ovsdb-server
checks for them after the conversion result is read from the storage.
It's much easier to recover if this constraint is checked before writing
to the storage instead.
It's not a big problem, because the check is always performed by the
native ovsdb clients before sending a conversion request. But the
server, in general, should not trust clients to do the right thing.
Check in the update_schema() remains, because we shouldn't blindly
trust the storage.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If database conversion happens, both schema and the new data are
present in the database record. However, the schema is just silently
ignored by ovsdb-tool cluster-to-standalone. This creates data
inconsistency if the new data contains new columns, for example, so
the resulting database file will not be readable, or data will be lost.
Fix that by re-setting the database whenever a conversion record is
found and actually writing a new schema that will match the actual
data. The database file will not be that similar to the original,
but there is no way to represent conversion in a standalone database
file format otherwise.
Fixes: 00de46f9ee42 ("ovsdb-tool: Convert clustered db to standalone db.")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Caught during some code review.
SUPPORT_TC_INGRESS_PPS has been replaced with CHECK_TC_INGRESS_PPS().
Fixes: 5f0fdf5e2c2e ("test: Move check for tc ingress pps support to test script.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed as well as the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some conditions, like debugging.
This patch introduces new commands to get and set the sweep
interval in ms.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Testing that RPMs can be built to catch possible spec file
issues like missing dependencies.
GitHub seems to have an agreement with Docker Hub about rate
limiting of image downloads, so it should not affect us.
We may switch to quay.io if that will ever become a problem
in the future.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If there is a pipe behind ovs-tcpdump (such as ovs-tcpdump -i eth0
| grep "192.168.1.1"), the child process (grep "192.168.1.1") may
exit first and close the pipe when it receives SIGTERM. When the parent
process (ovs-tcpdump) exits, stdout is flushed into the broken pipe and
an IOError exception is raised. To avoid this, ovs-tcpdump now closes
stdout before exiting.
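A minimal sketch of the idea, assuming Python 3, where IOError is an
alias of OSError and BrokenPipeError is a subclass of it. The helper
name close_quietly() is hypothetical and is not taken from ovs-tcpdump.

```python
def close_quietly(stream):
    """Flush and close `stream`, ignoring errors from a vanished reader."""
    try:
        stream.flush()
    except OSError:
        # The reader side of the pipe is gone; nothing left to deliver.
        pass
    try:
        stream.close()
    except OSError:
        pass
```

A tool in this situation would call close_quietly(sys.stdout) as its
last step, so that the interpreter does not try to flush into the broken
pipe during shutdown.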
Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The ofproto layer currently treats nw_proto field as overloaded to mean
both that a proper nw layer exists, as well as the value contained in
the header for the nw proto. However, this is incorrect behavior as
relevant standards permit that any value, including '0' should be treated
as a valid value.
Because of this overload, when the ofproto layer builds the action list
for a packet with nw_proto of 0, it won't build the complete action list
that we expect to be built for the packet. As a result, all packets
passing through the datapath will fall into an incomplete action set.
The fix here is to unwildcard nw_proto, allowing us to preserve setting
actions for protocols which we know have support for the actions we
program. This means that traffic with nw_proto == 0 can no longer
cause connectivity breakage for other traffic on the link.
Reported-by: David Marchand <dmarchand@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Declaration of 'struct conn' will not be visible outside of this function.
Declaration of 'struct conntrack' will not be visible outside of this function.
Declaration of 'struct timeout_policy' will not be visible outside of this function.
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A device may be deleted and added back with a different ifindex.
The tc rules on the device are deleted together with the device, so
tc_del_filter() fails when the flow is deleted and the ufid-to-tc
mapping is not removed.
Traffic will then trigger the same flow (with the same ufid) to be put
to tc on the new device, and a duplicated ufid mapping will be added.
If the hashmap is expanded, the old mapping entry becomes the first
entry, and the dp flow can no longer be deleted.
Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The commit b8bf410a5 [0] broke the `ovs-vsctl add` command
which now overwrites the value if it existed already.
This patch reverts the code around the `cmd_add` function
to restore the previous behavior. It also adds testing coverage
for this functionality.
[0] b8bf410a5c
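The difference between the two behaviors, for a map column, can be
sketched as follows. The helpers add_to_map() and set_in_map() are
hypothetical and model only the semantics described above, not the
db-ctl-base code.

```python
def add_to_map(column, additions):
    """'add' semantics: new keys are added, existing keys keep their value."""
    result = dict(column)
    for key, value in additions.items():
        result.setdefault(key, value)
    return result


def set_in_map(column, updates):
    """Overwriting semantics, i.e. what 'add' did while it was broken."""
    result = dict(column)
    result.update(updates)
    return result
```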
Fixes: b8bf410a5c94 ("db-ctl-base: Use partial map/set updates for last add/set commands.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182767
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The description of SRv6 was missing in vswitch.xml, which is
used to generate the man page, so this patch adds it.
Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.")
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds ODP actions for SRv6 and its tests.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
An SRv6 (Segment Routing IPv6) tunnel vport is responsible
for encapsulating and decapsulating the inner packets with an
IPv6 header and an extension header called SRH
(Segment Routing Header). See the spec at:
https://datatracker.ietf.org/doc/html/rfc8754
This patch implements SRv6 tunneling in userspace datapath.
It uses `remote_ip` and `local_ip` options as with existing
tunnel protocols. It also adds a dedicated `srv6_segs` option
to define a sequence of routers called segment list.
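For reference, the SRH layout from RFC 8754 can be sketched as below.
This is an illustration of the header format only; build_srh() is a
hypothetical helper, and how the userspace datapath actually encodes
`srv6_segs` is not stated here. The sketch assumes a full segment list
with no TLVs, zero flags and zero tag.

```python
import ipaddress
import struct

SRH_ROUTING_TYPE = 4    # Routing Type assigned to the SRH in RFC 8754


def build_srh(segments, next_header):
    """Build an SRH for `segments`, given in travel order (first hop first)."""
    if not segments:
        raise ValueError("at least one segment is required")
    n = len(segments)
    hdr_ext_len = 2 * n      # in 8-octet units, excluding the first 8 octets
    segments_left = n - 1    # index of the currently active segment
    last_entry = n - 1       # index of the last element of the list
    fixed = struct.pack("!BBBBBBH", next_header, hdr_ext_len,
                        SRH_ROUTING_TYPE, segments_left, last_entry,
                        0,   # flags
                        0)   # tag
    # The segment list is encoded in reverse: entry 0 is the final segment.
    seg_bytes = b"".join(ipaddress.IPv6Address(s).packed
                         for s in reversed(segments))
    return fixed + seg_bytes
```

For two segments and an inner IPv6 packet (next header 41) this yields
a 40-byte header: 8 bytes of fixed fields plus two 16-byte addresses.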
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.
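The check described above can be sketched as a walk over the extension
header chain. This is a simplified model, not the OVS parser:
find_routing_header() is a hypothetical name, only four extension header
types are followed, non-first fragments are not treated specially, and
the address list is assumed to start right after the 8-byte fixed part
of the routing header, as it does in the SRH.

```python
IPPROTO_HOPOPTS = 0
IPPROTO_ROUTING = 43
IPPROTO_FRAGMENT = 44
IPPROTO_DSTOPTS = 60

EXT_HEADERS = (IPPROTO_HOPOPTS, IPPROTO_ROUTING,
               IPPROTO_FRAGMENT, IPPROTO_DSTOPTS)


def find_routing_header(next_header, payload):
    """Return (first address of the routing header or None, upper protocol).

    `next_header` is the Next Header value of the fixed IPv6 header and
    `payload` holds the bytes that follow the fixed IPv6 header.
    """
    offset = 0
    first_addr = None
    while next_header in EXT_HEADERS:
        if len(payload) - offset < 8:
            raise ValueError("truncated extension header")
        following = payload[offset]
        if next_header == IPPROTO_FRAGMENT:
            length = 8                  # the fragment header has a fixed size
        else:
            length = (payload[offset + 1] + 1) * 8
        if len(payload) - offset < length:
            raise ValueError("truncated extension header")
        if (next_header == IPPROTO_ROUTING and first_addr is None
                and length >= 24):
            first_addr = payload[offset + 8:offset + 24]
        offset += length
        next_header = following
    return first_addr, next_header
```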
If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In some tunnels, the inner packet needs to support both IPv4
and IPv6. Therefore, this patch allows two
protocols to be tied together in one tunnel.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The new ADD_VETH_NS macro creates two netns and connects them
with a veth pair. It can be used as a general-purpose helper in tests,
e.g.:
ADD_VETH_NS([ns1], [p1], [1.1.1.1/24], [ns2], [p2], [1.1.1.2/24])
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
With the current implementation the available CPUs will not be read
until 10s have passed since the system's boot. For systems that boot
faster, this can make ovs-vswitchd create fewer handlers than necessary
for some time.
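One plausible way to get this kind of bug is a cache whose "last
updated" timestamp starts at zero while the clock counts from boot; that
cause is an assumption here, not a description of the OVS code. The
sketch below uses a hypothetical CpuCountCache class with an explicit
"never read" state, so the first call always reads the value.

```python
import os
import time

UPDATE_INTERVAL_S = 10.0     # refresh period, matching the 10s above


class CpuCountCache:
    """Cache the CPU count and refresh it at most once per interval."""

    def __init__(self, read_cpus=os.cpu_count, clock=time.monotonic):
        self._read = read_cpus
        self._clock = clock
        self._last_update = None    # None means "never read", unlike 0
        self._value = None

    def get(self):
        now = self._clock()
        if (self._last_update is None
                or now - self._last_update >= UPDATE_INTERVAL_S):
            self._value = self._read()
            self._last_update = now
        return self._value
```

With this shape, a call made 3 seconds after boot still performs the
first read instead of returning an uninitialized value.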
Fixes: 0d23948a598a ("ovs-thread: Detect changes in number of CPUs.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2180460
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ensure at least 1 handler is created even if something goes wrong during
CPU detection or prime number calculation.
Fixes: a5cacea5f988 ("handlers: Create additional handler threads when using CPU isolation.")
Suggested-by: Aaron Conole <aconole@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
parse_tc_flower_to_actions() was not reporting errors, which would
cause parse_tc_flower_to_match() to ignore them.
Fixes: dd03672f7bbb ("netdev-offload-tc: Move flower_to_match action handling to isolated function.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Prior to v37.0.0, CryptographyDeprecationWarning could not be imported
from __init__.py resulting in:
  Traceback (most recent call last):
    File "mfex_fuzzy.py", line 9, in <module>
      category=cryptography.CryptographyDeprecationWarning,
  AttributeError: module 'cryptography' has no attribute
  'CryptographyDeprecationWarning'
This import was only added to __init__ to deprecate python3.6. Importing
the exception from cryptography.utils is the compatible option.
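A minimal sketch of the compatible import follows. The fallback for a
missing library is an addition made here so that the snippet runs
anywhere; it is not claimed to be part of the test script.

```python
import warnings

try:
    # Import from cryptography.utils rather than from the package root.
    from cryptography.utils import CryptographyDeprecationWarning
except ImportError:
    # Assumption: the library is not installed, so nothing to silence.
    CryptographyDeprecationWarning = None

if CryptographyDeprecationWarning is not None:
    warnings.filterwarnings("ignore",
                            category=CryptographyDeprecationWarning)
```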
Fixes: c3ed0bf34b8a ("tests/mfex: Silence Blowfish/CAST5 deprecation warnings.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Open vSwitch generally tries to let the underlying operating system
manage the low-level details of hardware, for example DMA mapping,
bus arbitration, etc. However, when using DPDK, the underlying
operating system yields control of many of these details to userspace
for management.
In the case of some DPDK port drivers, configuring rte_flow or even
allocating resources may require access to iopl/ioperm calls, which
are guarded by the CAP_SYS_RAWIO privilege on Linux systems. These
calls are dangerous, and can allow a process to completely compromise
a system. However, they are needed in the case of some userspace
driver code which manages the hardware (for example, the mlx
implementation of backend support for rte_flow).
Here, we create an opt-in flag passed to the command line to allow
this access. We need to do this before ever accessing the database,
because we want to drop all privileges asap, and cannot wait for
a connection to the database to be established and functional before
dropping. There may be distribution specific ways to do capability
management as well (using for example, systemd), but they are not
as universal to the vswitchd as a flag.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Specifying datapath with "dpctl/flush-conntrack" didn't
work as expected and caused error:
ovs-dpctl: field system@ovs-system missing value (Invalid argument)
To prevent that, check if we have datapath as first argument
and use it accordingly.
Also add a couple of test cases to ensure that everything works as
expected.
Fixes: a9ae73b916ba ("ofp, dpif: Allow CT flush based on partial match.")
Signed-off-by: Ales Musil <amusil@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Remove one of two consecutive time_msec() calls in the revalidate() function.
We take the time stamp after udpif_get_n_flows(), to avoid any potential
delays in getting the number of offloaded flows.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Depending on the driver implementation, it can take from 0.2 seconds
up to 2 seconds before offloaded flow statistics are updated. This is
true for both TC and rte_flow-based offloading. This is causing a
problem with min-revalidate-pps, as old statistic values are used
during this period.
This fix waits for at least 2 seconds, by default, before assuming that no
packets were received during this period.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The recently added test cases below are not passing on the af_xdp
datapath due to tcpdump not working on the OVS ports with this datapath.
conntrack - ICMP related NAT with single port
conntrack - ICMPv6 related NAT with single port
conntrack - ICMP from different source related with NAT
The tests are changed to attach tcpdump on the associated veth port in
the netns. Tests are now passing with all datapaths (afxdp, kernel, userspace,
and offloads).
Fixes: 8bd688063078 ("system-traffic.at: Add icmp error tests while dnatting address and port.")
Fixes: 0a7587034dc9 ("conntrack: Properly unNAT inner header of related traffic.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
We can use the "ip route add ... src ..." command to set the preferred
source address for each entry in the kernel FIB. OVS has a mechanism to
cache the FIB, but the preferred source address is ignored and
calculated with its own logic. This patch resolves the difference
between kernel FIB and OVS route table cache by retrieving the
RTA_PREFSRC attribute of Netlink messages.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When adding a route with ovs/route/add command, the source address
in "ovs_router_entry" structure is always the FIRST address that the
interface has. See "ovs_router_get_netdev_source_address"
function for more information.
If an interface has multiple IPv4 and/or IPv6 addresses, there are use
cases where the user wants to control the source address. This patch
therefore addresses this issue by adding a src parameter.
Note that the same constraints also exist when caching routes from the
kernel FIB with Netlink, but are not dealt with in this patch.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Fixed the manual page to indicate that both IPv4/IPv6
are supported. Also added missing pkt_mark on one side
and fixed the "gw" and "bridge" notation quirks.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch cleans up the parser to accept pkt_mark and gw in any order.
pkt_mark and gw are normally expected to be specified exactly once.
However, as with other tools, if specified multiple times, the last
specification is used. Also, pkt_mark and gw have separate prefix
strings so they can be parsed in any order.
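The "any order, last one wins" rule can be sketched as below. This is
not the OVS parser: parse_route_options() is a hypothetical helper, and
it assumes that the mark is written as pkt_mark=<number> while the
gateway is given as a bare IP address.

```python
import ipaddress


def parse_route_options(args):
    """Parse optional route arguments given in any order.

    Returns (gateway or None, pkt_mark or None). If an option is
    repeated, the last occurrence is used.
    """
    gw = None
    pkt_mark = None
    for arg in args:
        if arg.startswith("pkt_mark="):
            pkt_mark = int(arg[len("pkt_mark="):], 0)
        else:
            # Raises ValueError if the argument is not an IP address.
            gw = ipaddress.ip_address(arg)
    return gw, pkt_mark
```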
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This is useful in test cases where multiple IPv4/IPv6 addresses
are assigned together.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ATOMIC_VAR_INIT has a trivial definition
`#define ATOMIC_VAR_INIT(value) (value)`,
is deprecated in C17/C++20, and will be removed in newer standards in
newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
During recent testing in the Antrea project, some ports could not be
created on Windows.
While debugging, our team found that this happens when multiple ports
are waiting to be created with the correct port number.
For some system-type ports the netdev is created successfully, which
causes a conflict because on the dpif side the port is of the internal
type. As a result, port creation fails and cannot be easily recovered.
With this patch, netdev creation on Windows is blocked for the system
type when the ovs_type obtained from dpif is internal. A more detailed
description of the case is in reported issue No.262, linked below.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/262
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
The revalidator process uses the internal call udpif_get_n_flows()
to get the total number of flows installed in the system. It uses
this value for various decisions on flow installation and removal.
With tc offload this value is incorrect, as the hardware-offloaded
flows are not included. With rte_flow offload this is not a
problem, as dpif-netdev keeps both in sync.
This patch will include the hardware offloaded flows if the
underlying dpif implementation is not syncing them.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When the ukey's action set changes, it could cause the flow to use a
different datapath, for example, when it moves from tc to kernel.
This will cause the cached previous datapath statistics to be used.
This change will reset the cached statistics when a change in
datapath is discovered.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Flow lookup doesn't include masks of the final stage in the resulting
flow wildcards if that stage had an L4 ports match. Only the result
of the ports trie lookup is added to the mask. It might be sufficient in
many cases, but it's not correct, because the ports trie is not how we
decided that the packet didn't match in this subtable. In fact, we
used a full subtable mask in order to determine that, so all the
subtable mask bits have to be added.
Ports trie can still be used to adjust ports' mask, but it is not
sufficient to determine that the packet didn't match.
Assume we have the following 2 OpenFlow rules on the bridge:
table=0, priority=10,tcp,tp_dst=80,tcp_flags=+psh actions=drop
table=0, priority=0 actions=output(1)
The first, high-priority rule is supposed to drop all the TCP data traffic
sent on port 80. The handshake, however, is allowed for forwarding.
Both 'tcp_flags' and 'tp_dst' are on the final stage in the flow.
Since the stage mask from that stage is not incorporated into the flow
wildcards and only ports mask is getting updated, we have the following
megaflow for the SYN packet that has no match on 'tcp_flags':
$ ovs-appctl ofproto/trace br0 "in_port=br0,tcp,tp_dst=80,tcp_flags=syn"
Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80
Datapath actions: 1
If this flow gets installed into the datapath flow table, all the
packets for port 80, regardless of TCP flags, will be forwarded.
Fix that by incorporating all the looked-at bits from the final stage
into the stages map in order to get all the necessary wildcards. The
ports mask has to be updated as a last step, because it doesn't cover
the full 64-bit slot in the flowmap.
With this change, in the example above, OVS is producing correct
flow wildcards including match on TCP flags:
Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80,tcp_flags=-psh
Datapath actions: 1
This way only -psh packets will be forwarded, as expected.
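The consequence of the missing mask bits can be shown with a toy model
of megaflow matching. Packets and megaflows are plain dictionaries here,
and megaflow_matches() is a hypothetical helper; none of this mirrors
the classifier data structures.

```python
def megaflow_matches(megaflow, packet):
    """A megaflow matches if every un-wildcarded field equals the packet's."""
    return all(packet.get(field) == value
               for field, value in megaflow.items())


# Two packets to port 80: the SYN from the trace and a later data packet.
syn_packet = {"tp_dst": 80, "psh": False}
data_packet = {"tp_dst": 80, "psh": True}

# Megaflow produced for the SYN packet; its action is "output to port 1".
incomplete = {"tp_dst": 80}                  # ports mask only
complete = {"tp_dst": 80, "psh": False}      # full final stage mask

# With the incomplete mask the data packet hits the cached forwarding
# flow, although the OpenFlow rules say it must be dropped.
wrongly_forwarded = megaflow_matches(incomplete, data_packet)

# With the complete mask the data packet misses and is sent for a new
# lookup, where the drop rule can take effect.
missed_as_expected = not megaflow_matches(complete, data_packet)
```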
This issue affects all other fields on stage 4, not only TCP flags.
Tests included to cover tcp_flags, nd_target and ct_tp_src/dst.
The first two are frequently used; the ct ones share the same flowmap
slot with L4 ports, so they are important to test.
Before the pre-computation of stage masks, flow wildcards were updated
during lookup, so there was no issue. The bits of the final stage were
lost with the introduction of 'stages_map'.
Recent adjustment of segment boundaries exposed 'tcp_flags' to the issue.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/272
Fixes: ca44218515f0 ("classifier: Adjust segment boundary to execute prerequisite processing.")
Fixes: fa2fdbf8d0c1 ("classifier: Pre-compute stage masks.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The two tests verify, for both icmp and icmpv6, that the correct port
translation happens in the inner packet when an error is
received in the reply direction.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Move the check for tc ingress pps support from aclocal to the test
script. The aclocal check has several problems:
1. Stderr from failing commands is output when executing
various make targets.
2. There are various failure conditions that lead
to veth0 and veth1 being created but not cleaned up.
3. The check seems to execute for many make targets.
And it attempts to temporarily modify system state.
This seems inappropriate.
4. veth0 and veth1 seem far too generic and could easily
conflict with other parts of the system.
All these problems are addressed by this patch.
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Add options to the IPFIX table to configure the interval for sending
statistics and template information.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tunnel OpenFlow ports do not exist in the datapath, instead there is a
tunnel backing interface that serves all the tunnels of the same type.
For example, if the geneve port 'my_tunnel' is added to OVS, it will
create 'geneve_sys_6041' datapath port, if it doesn't already exist,
and use this port as a tunnel output.
However, while creating/opening a new datapath after re-start,
ovs-vswitchd only has a list of names of OpenFlow interfaces. And it
thinks that each datapath port, that is not on the list, is a stale
port that needs to be removed. This is obviously not correct for
tunnel backing interfaces that can serve multiple tunnel ports and do
not match OpenFlow port names.
This is causing removal and re-creation of all the tunnel backing
interfaces in the datapath on OVS restart, causing disruption in
existing connections.
It's hard to tell from the name alone whether an interface is a
tunnel backing interface or someone just named a normal interface
this way. So, instead of trying to determine that, do not remove any
interfaces at all while we don't know the types of the actual ports
we need.
Assume that all the ports that are currently not in the list of OF
ports are tunnel backing ports. Later, revalidation of tunnel backing
ports in type_run() will determine which ports are still needed and
which should be removed.
It's OK to add even non-tunnel stale ports into tnl_backers; they
will be cleaned up the same way as stale tunnel backers.
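The two-phase approach can be sketched with sets. The function names
and the port names other than 'geneve_sys_6041' and 'my_tunnel' are
hypothetical; the second function only stands in for the later
revalidation step and does not model type_run() itself.

```python
def classify_datapath_ports(datapath_ports, openflow_ports):
    """Startup phase: remove nothing.

    Ports that are not in the OpenFlow list are tentatively treated as
    tunnel backers instead of being deleted as stale.
    """
    datapath_ports = set(datapath_ports)
    openflow_ports = set(openflow_ports)
    known = datapath_ports & openflow_ports
    tentative_backers = datapath_ports - openflow_ports
    return known, tentative_backers


def revalidate_backers(tentative_backers, needed_backers):
    """Later phase: the needed backers are known, so stale ones can go."""
    keep = set(tentative_backers) & set(needed_backers)
    remove = set(tentative_backers) - set(needed_backers)
    return keep, remove
```

A stale non-tunnel port ends up in the "remove" set of the second phase,
which matches the cleanup described above, while a backer that is still
needed is never deleted and re-created.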
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-February/052215.html
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Cookies are an important part of flow descriptions and must be available
to the end user.
Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@arknetworks.am>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>