When a tunnel port is added to the bridge with the checksum option set
to true:
ovs-vsctl add-port br0 geneve0 \
-- set interface geneve0 type=geneve \
options:remote_ip=<remote_ip> options:key=<key> options:csum=true
the flow dump for the outgoing traffic will include a
"bad key length 1 ..." message:
ovs-appctl dpctl/dump-flows --names -m
ufid:<>, ..., dp:tc,
actions:set(tunnel(tun_id=<>,dst=<>,ttl=64,tp_dst=6081,
key6(bad key length 1, expected 0)(01)flags(key)))
,genev_sys_6081
This is due to a mismatch between the expected length (zero for
OVS_TUNNEL_KEY_ATTR_CSUM in ovs_tun_key_attr_lens) and the actual one
(one byte).
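To illustrate where a message like "bad key length 1, expected 0" comes from, here is a minimal sketch of a per-attribute length check driven by a table indexed by attribute type, in the spirit of ovs_tun_key_attr_lens. The enum names, the table and check_tun_key_attr() are hypothetical and are not the OVS definitions. The numbering is chosen so that attribute 6 is the checksum flag, matching the "key6" in the dump above. A flag attribute carries no payload, so emitting it with a 1-byte value fails the check.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tunnel key attribute types (not the real OVS enum). */
enum tun_key_attr {
    TUN_ATTR_ID = 0,
    TUN_ATTR_IPV4_SRC = 1,
    TUN_ATTR_IPV4_DST = 2,
    TUN_ATTR_TOS = 3,
    TUN_ATTR_TTL = 4,
    TUN_ATTR_DONT_FRAGMENT = 5,
    TUN_ATTR_CSUM = 6,
    TUN_ATTR_MAX
};

/* Expected payload length per attribute; flag attributes carry none. */
static const int tun_attr_lens[TUN_ATTR_MAX] = {
    [TUN_ATTR_ID] = 8,
    [TUN_ATTR_IPV4_SRC] = 4,
    [TUN_ATTR_IPV4_DST] = 4,
    [TUN_ATTR_TOS] = 1,
    [TUN_ATTR_TTL] = 1,
    [TUN_ATTR_DONT_FRAGMENT] = 0,
    [TUN_ATTR_CSUM] = 0,
};

/* Returns 0 if 'len' matches the expected length for 'type'.  Otherwise
 * writes a diagnostic into 'msg' and returns -1. */
static int
check_tun_key_attr(int type, size_t len, char *msg, size_t msg_size)
{
    if (type < 0 || type >= TUN_ATTR_MAX) {
        snprintf(msg, msg_size, "unknown key %d", type);
        return -1;
    }
    if (len != (size_t) tun_attr_lens[type]) {
        snprintf(msg, msg_size, "bad key length %zu, expected %d",
                 len, tun_attr_lens[type]);
        return -1;
    }
    msg[0] = '\0';
    return 0;
}
```

With this table, a checksum flag encoded with a one-byte payload produces exactly the "bad key length 1, expected 0" text, while the same attribute encoded as a zero-length flag passes.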
With this patch the same flow dump becomes:
ovs-appctl dpctl/dump-flows --names -m
ufid:<>, ..., dp:tc,
actions:set(tunnel(tun_id=<>,dst=<>,ttl=64,tp_dst=6081,
flags(csum|key))),genev_sys_6081
Fixes: d9677a1f0eaf ("netdev-tc-offloads: TC csum option is not matched with tunnel configuration")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Similarly to what already exists for L4, add conntrack_l3csum_err
and ipf_l3csum_err for L3.
Received packets with a bad L3 checksum increment ipf_l3csum_err if
they are fragments, and conntrack_l3csum_err otherwise.
Although the patch effectively covers only IPv4, the names are kept generic.
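The counter selection described above can be sketched as follows: verify the IPv4 header checksum with the RFC 1071 one's-complement sum, and on failure bump the fragment counter when the MF flag or a non-zero fragment offset is present, or the conntrack counter otherwise. The two counters are plain variables standing in for the coverage counters, and ipv4_l3csum_ok() is a hypothetical helper, not the function used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the two coverage counters. */
static unsigned long conntrack_l3csum_err;
static unsigned long ipf_l3csum_err;

/* RFC 1071 one's-complement sum over 'len' bytes ('len' is even for an
 * IPv4 header).  A valid header, checksum field included, folds to 0xffff. */
static uint16_t
ones_complement_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) ((data[i] << 8) | data[i + 1]);
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Returns 1 if the IPv4 header checksum is valid.  Otherwise increments
 * the fragment or the conntrack counter and returns 0. */
static int
ipv4_l3csum_ok(const uint8_t *hdr)
{
    size_t ihl = (size_t) (hdr[0] & 0x0f) * 4;
    uint16_t frag = (uint16_t) ((hdr[6] << 8) | hdr[7]);
    int is_fragment = (frag & 0x3fff) != 0;  /* MF flag or offset != 0. */

    if (ones_complement_sum(hdr, ihl) == 0xffff) {
        return 1;
    }
    if (is_fragment) {
        ipf_l3csum_err++;
    } else {
        conntrack_l3csum_err++;
    }
    return 0;
}
```

A 20-byte header with checksum 0xb861 (45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7) passes. Flipping one checksum bit counts against conntrack_l3csum_err, and additionally setting MF counts against ipf_l3csum_err.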
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
conntrack_l4csum_err is incremented only when corrupted ICMP packets
pass through conntrack. Increment it for the remaining bad checksum
cases as well, including when the checksum is offloaded.
Fixes: 38c69ccf8e29 ("conntrack: Add coverage count for l4csum error.")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This doesn't have any current users within the OVS repository. OVN will
use it.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
For VXLAN offload, matches should be done on the outer header for the
tunnel properties, as well as on the inner packet. Add a function for
parsing VXLAN tunnel matches.
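As a rough illustration of the tunnel part of such a match, the sketch below extracts the VNI from the 8-byte VXLAN header defined in RFC 7348 (flags byte with the "I" bit 0x08, 24-bit VNI in bytes 4 to 6). The struct vxlan_tnl_match type and parse_vxlan_hdr() are hypothetical. The outer IP and UDP fields are shown only to indicate what a full tunnel match would also carry; this sketch assumes the caller fills them from the outer headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_HDR_LEN   8
#define VXLAN_FLAG_VNI  0x08    /* "I" flag from RFC 7348. */

/* Hypothetical tunnel match: outer header fields plus the VNI. */
struct vxlan_tnl_match {
    uint32_t outer_src_ip;      /* Filled by the caller from outer IPv4. */
    uint32_t outer_dst_ip;
    uint16_t udp_dst_port;      /* 4789 is the IANA-assigned VXLAN port. */
    uint32_t vni;               /* 24-bit VXLAN Network Identifier. */
};

/* Parses the VXLAN header at 'hdr' into 'm->vni'.  Returns false if the
 * buffer is too short or the "I" flag is not set. */
static bool
parse_vxlan_hdr(const uint8_t *hdr, size_t len, struct vxlan_tnl_match *m)
{
    if (len < VXLAN_HDR_LEN || !(hdr[0] & VXLAN_FLAG_VNI)) {
        return false;
    }
    /* Bytes 4..6 carry the VNI; byte 7 is reserved. */
    m->vni = ((uint32_t) hdr[4] << 16) | ((uint32_t) hdr[5] << 8) | hdr[6];
    return true;
}
```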
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Vports are virtual, OVS-only logical devices, so rte_flow rules cannot
be applied to them as is. Instead, apply the rules on the physical port
from which the packet arrived, as provided by the orig_in_port field.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When an encapsulated packet is recirculated through a TUNNEL_POP
action, the metadata gets reinitialized and the originating physical
port information is lost. When this flow gets processed by the vport
and it needs to be offloaded, we can't figure out the physical port
through which the tunneled packet was received.
Add a new member to the metadata: 'orig_in_port'. This is passed to
the next stage during recirculation and the offload layer can use it
to offload the flow to this physical port.
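A minimal sketch of the idea, assuming a trimmed-down metadata struct: on recirculation after a tunnel pop, the metadata is reset for the next stage, but the originating physical port is carried over into 'orig_in_port'. The struct pkt_metadata_sketch type, PORT_NONE and recirc_after_tnl_pop() are hypothetical names, not the OVS pkt_metadata API.

```c
#include <stdint.h>
#include <string.h>

#define PORT_NONE UINT32_MAX    /* Hypothetical "no port" marker. */

/* Hypothetical, trimmed-down packet metadata. */
struct pkt_metadata_sketch {
    uint32_t in_port;           /* Port the packet is currently seen on. */
    uint32_t orig_in_port;      /* Physical port it originally arrived on. */
    uint32_t recirc_id;
    uint64_t tun_id;            /* Tunnel metadata filled by tunnel pop. */
};

/* Re-initializes the metadata for the next pipeline stage while keeping
 * track of the physical port, so the offload layer can still target it. */
static void
recirc_after_tnl_pop(struct pkt_metadata_sketch *md, uint32_t vport,
                     uint32_t recirc_id, uint64_t tun_id)
{
    /* Keep an already known original port; otherwise the current in_port
     * is the physical port the encapsulated packet came in on. */
    uint32_t orig = md->orig_in_port != PORT_NONE ? md->orig_in_port
                                                  : md->in_port;

    memset(md, 0, sizeof *md);
    md->in_port = vport;
    md->orig_in_port = orig;
    md->recirc_id = recirc_id;
    md->tun_id = tun_id;
}
```

For a packet received on physical port 1 and popped into vport 7, the next stage sees in_port 7 while orig_in_port still reports 1.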
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Support tunnel pop action.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In order to allow showing more debug messages, increase the rate limits.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The 'linux_tc' flow API is suitable only for tunneling vports with
backing Linux interfaces. The DPDK flow API is not suitable for such
ports.
With this change, the vport restriction can be dropped from dpif-netdev.
This is a prerequisite for enabling vport offloading in DPDK.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Virtual interfaces, like vports or DPDK vhost-user ports, have no
proper ifindex while still supporting some offloads.
This is a prerequisite for tunneling vport offloading with DPDK
flow API.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Recover the packet if it was partially processed by the HW. Fall back to
flow lookup by mark association.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A miss in virtual port offloads means the flow with tnl_pop was
offloaded, but not the following one. Recover the state and continue
with SW processing.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Make netdev_dpdk_flow_api_supported() accept vxlan devices, to allow
offloading of DPDK vxlan devices.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Introduce an API to traverse the ports added to the offload ports map,
with a generic callback for each one.
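A generic traversal with a per-port callback can look like the sketch below. It assumes a plain array in place of the real offload ports map and lets the callback stop the walk by returning true. struct offload_port, offload_port_cb, offload_ports_traverse() and the example callback match_port_id() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical offload port entry. */
struct offload_port {
    const char *name;
    int dpdk_port_id;
};

/* Callback invoked for each port; returning true stops the traversal. */
typedef bool (*offload_port_cb)(struct offload_port *, void *aux);

/* Calls 'cb' on every port until it returns true.  Returns the port that
 * stopped the walk, or NULL if the callback never returned true. */
static struct offload_port *
offload_ports_traverse(struct offload_port *ports, size_t n_ports,
                       offload_port_cb cb, void *aux)
{
    for (size_t i = 0; i < n_ports; i++) {
        if (cb(&ports[i], aux)) {
            return &ports[i];
        }
    }
    return NULL;
}

/* Example callback: stop at the port whose DPDK id equals *(int *) aux. */
static bool
match_port_id(struct offload_port *port, void *aux)
{
    return port->dpdk_port_id == *(const int *) aux;
}
```

Keeping the callback generic means the same walk can serve lookups (as above) as well as bulk operations such as flushing flows on every port, where the callback simply always returns false.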
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As a preliminary step towards tunnel offloads, introduce DPDK APIs.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When the HW offload involves multiple flows, like in tunnel decap path,
it is possible that not all flows in the path are offloaded, resulting
in partial processing in HW. In order to proceed with the rest of the
processing in SW, the packet state has to be recovered as if it had been
processed in SW from the beginning. In the case of tunnel decap, the
state to recover could be the outer tunnel header, restored into the
packet metadata.
Add an API for that.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
These tests focus on enabling/disabling and user parameters.
Co-Authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some tests get the current log line number so they can check that
there is a new occurrence of a log entry after a command.
'tail' uses the line number as the starting line number. However,
this will include the last line of the log before the command.
To prevent races on the logs, and to avoid matching a log entry that
existed before the command (here or wherever this method is reused),
get the next line number of the log and use it as the starting line
for tail.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ovs-vswitchd uses an array of dp_meter structs to store
meter data, so at most 65536 meters (defined by MAX_METERS) can be
used. But in some cases, for example in an edge gateway, at least
200,000 meters are needed for per-IP-address bandwidth limitation,
since every IP address uses two meters, one for its rx path and one
for its tx path [1]. Moreover, ovs-vswitchd should support meter
offload (the rte_mtr_xxx API introduced by DPDK), and some hardware,
such as the Mellanox ConnectX-6, supports more than 65536 meters.
This patch uses a cmap instead of the array to manage the meters.
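The key change is that meters are looked up by id in a hash map instead of being indexed into a fixed-size array, so the id range is no longer capped by the array size. The sketch below swaps in a plain single-threaded chained hash table to show that idea; it is not OVS's cmap, which is a concurrent hash map that also lets datapath readers run alongside a writer. struct meter, meter_set(), meter_lookup() and the bucket count are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical meter record: only the id and one rate are kept. */
struct meter {
    uint32_t id;
    uint32_t rate_kbps;
    struct meter *next;         /* Bucket chain. */
};

#define METER_BUCKETS 1024u     /* Power of two; chains absorb overflow. */

struct meter_table {
    struct meter *buckets[METER_BUCKETS];
};

static size_t
meter_bucket(uint32_t id)
{
    /* Multiplicative hash; the top 10 bits select one of 1024 buckets. */
    return (size_t) ((id * 2654435761u) >> 22) & (METER_BUCKETS - 1);
}

static struct meter *
meter_lookup(const struct meter_table *t, uint32_t id)
{
    for (struct meter *m = t->buckets[meter_bucket(id)]; m; m = m->next) {
        if (m->id == id) {
            return m;
        }
    }
    return NULL;
}

/* Inserts or updates meter 'id'.  Returns 0, or -1 if allocation fails. */
static int
meter_set(struct meter_table *t, uint32_t id, uint32_t rate_kbps)
{
    struct meter *m = meter_lookup(t, id);

    if (!m) {
        size_t b = meter_bucket(id);

        m = malloc(sizeof *m);
        if (!m) {
            return -1;
        }
        m->id = id;
        m->next = t->buckets[b];
        t->buckets[b] = m;
    }
    m->rate_kbps = rate_kbps;
    return 0;
}

static void
meter_table_destroy(struct meter_table *t)
{
    for (size_t i = 0; i < METER_BUCKETS; i++) {
        struct meter *m = t->buckets[i];

        while (m) {
            struct meter *next = m->next;
            free(m);
            m = next;
        }
        t->buckets[i] = NULL;
    }
}
```

With this layout, 200,000 meters (ids well beyond 65535) fit without any compile-time limit; memory grows only with the number of meters actually configured.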
* Insertion performance: adding 1000+ meters with ovs-ofctl add-meter
takes about 4000 ms with the cmap, the same as with the previous
implementation.
* Lookup performance in the datapath: add 1000+ meters with a 10 Gbps
rate limit (the NICs are 10 Gbps, so the netdev datapath will not drop
packets), and a flow that only forwards packets from p0 to p1 with a
meter action [2]. On another machine, pktgen-dpdk generates 64-byte
packets towards p0.
The forwarding performance is consistently 1324 Kpps on my server,
which has an Intel E5-2650 CPU at 2.00 GHz.
[1].
$ in_port=p0,ip,ip_dst=1.1.1.x action=meter:n,output:p1
$ in_port=p1,ip,ip_src=1.1.1.x action=meter:m,output:p0
[2].
$ in_port=p0 action=meter:100,output:p1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A new appctl command, 'dpdk/get-malloc-stats', is implemented to get the
result of the 'rte_malloc_dump_stats()' function.
It can be used for debugging.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Salem Sol <salems@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When a decap action is applied to an NSH header encapsulating an
Ethernet packet, a redundant set MAC address action is programmed
into the datapath.
Fixes: f839892a206a ("OF support and translation of generic encap and decap")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The normal action is replaced with output to the GRE port for sending
L3 packets over the GRE tunnel, since the normal action cannot be used
with L3 packets.
Fixes: d03d0cf2b71b ("tests: Extend PTAP unit tests with decap action")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This optimization caused the FLOW_TNL_F_UDPIF flag not to be used in
the hash calculation for Geneve tunnels when revalidating flows, which
resulted in different cache hash values and incorrect behaviour.
Add a test to prevent regressions.
CC: Jesse Gross <jesse@nicira.com>
Fixes: 6728d578f64e ("dpif-netdev: Translate Geneve options per-flow, not per-packet.")
Reported-at: https://github.com/vmware-tanzu/antrea/issues/897
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
As reported by Wang Liang, the way packets are passed to the ipf module
doesn't allow them to be used later on in reassembly. Such packets may
get released anyway, for example during cleanup of tx processing.
Because the ipf module lacks a way of forcing the dp_packet to be
retained, it may later reuse a packet that has already been released.
Instead, just clone the packet and let the ipf queue own the copy until
the queue is destroyed.
After this change, there are no more in-tree users of the batch
'do_not_steal' flag. Thus, we remove it as well.
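The ownership rule can be sketched with a toy packet type: the fragment queue stores its own deep copy, so the caller may free the original at any time, and only the queue frees the copies it holds. struct pkt, pkt_clone(), struct frag_queue and frag_queue_add() are hypothetical simplifications, not the dp_packet or ipf APIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal packet: an owned byte buffer. */
struct pkt {
    unsigned char *data;
    size_t size;
};

/* Deep-copies 'src'.  Returns NULL on allocation failure. */
static struct pkt *
pkt_clone(const struct pkt *src)
{
    struct pkt *p = malloc(sizeof *p);

    if (!p) {
        return NULL;
    }
    p->data = malloc(src->size ? src->size : 1);
    if (!p->data) {
        free(p);
        return NULL;
    }
    memcpy(p->data, src->data, src->size);
    p->size = src->size;
    return p;
}

static void
pkt_delete(struct pkt *p)
{
    if (p) {
        free(p->data);
        free(p);
    }
}

#define FRAG_QUEUE_MAX 16

/* Hypothetical fragment queue that owns every packet it holds. */
struct frag_queue {
    struct pkt *frags[FRAG_QUEUE_MAX];
    size_t n;
};

/* Stores a private copy of 'pkt'; the caller keeps ownership of the
 * original.  Returns 0 on success, -1 if full or out of memory. */
static int
frag_queue_add(struct frag_queue *q, const struct pkt *pkt)
{
    struct pkt *copy;

    if (q->n == FRAG_QUEUE_MAX || !(copy = pkt_clone(pkt))) {
        return -1;
    }
    q->frags[q->n++] = copy;
    return 0;
}

/* Frees all copies; the queue is the only owner of them. */
static void
frag_queue_destroy(struct frag_queue *q)
{
    for (size_t i = 0; i < q->n; i++) {
        pkt_delete(q->frags[i]);
    }
    q->n = 0;
}
```

The cost is one allocation and copy per queued fragment, in exchange for never depending on the caller's batch keeping its packets alive.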
Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Fixes: 0b3ff31d35f5 ("dp-packet: Add 'do_not_steal' packet batch flag.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
Reported-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
We don't need to continue parsing if the actions are already oversized.
This is not very important, but the fuzzer times out while parsing a
very long list of actions.
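The early exit can be sketched on a toy action encoding, assuming each action is a 2-byte big-endian length (covering the header) followed by its body, and that the output size grows by the same amount. parse_actions() and the encoding are hypothetical; the point is only that the loop stops at the first action that pushes the total over the limit instead of walking the rest of a possibly huge list.

```c
#include <stddef.h>
#include <stdint.h>

enum parse_result { PARSE_OK, PARSE_BAD_LEN, PARSE_TOO_BIG };

/* Walks the actions in 'buf', stopping as soon as the accumulated output
 * size exceeds 'max_out'.  '*n_parsed' counts the accepted actions. */
static enum parse_result
parse_actions(const uint8_t *buf, size_t len, size_t max_out,
              size_t *n_parsed)
{
    size_t out = 0, off = 0;

    *n_parsed = 0;
    while (off < len) {
        if (len - off < 2) {
            return PARSE_BAD_LEN;
        }
        size_t alen = ((size_t) buf[off] << 8) | buf[off + 1];
        if (alen < 2 || alen > len - off) {
            return PARSE_BAD_LEN;
        }
        out += alen;            /* Toy model: output size == input size. */
        if (out > max_out) {
            return PARSE_TOO_BIG;   /* Already oversized: stop here. */
        }
        off += alen;
        (*n_parsed)++;
    }
    return PARSE_OK;
}
```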
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29190
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
As described in commit [1], it's possible that the remote IP is backed
by a load balancer, and reconnecting to the same IP will lead to a
connection to a different server. This case is supported by the C
version of the IDL and should be supported in the same way by the
Python implementation.
[1] ca367fa5f8bb ("ovsdb-idl.c: Allows retry even when using a single remote.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
The symptom of this issue is that the OVS bridge loses its IP address
on restart.
Simple reproducer:
0. start ovsdb-server and ovs-vswitchd
1. ovs-vsctl add-br br0
2. ifconfig br0 10.0.0.1 up
3. ovs-appctl -t ovs-vswitchd exit
4. start ovs-vswitchd again.
After step #3, ovs-vswitchd is down, but the br0 interface exists and
has a configured IP address. After step #4, there is no IP address
on the port br0.
What happened:
1. ovsdb-cs connects to the database via ovsdb-idl and requests the
database lock.
--> get_schema for _Server database
--> lock request
2. ovsdb-cs receives the schema for the _Server database and sends a
monitor request.
<-- schema for _Server
--> monitor_cond for _Server
3. ovsdb-cs receives the lock reply.
<-- locked
At this point, ovsdb-cs generates an OVSDB_CS_EVENT_TYPE_LOCKED
event and passes it to ovsdb-idl, which increases change_seqno.
4. ovsdb_idl_has_ever_connected() is 'true' now, because change_seqno
is not zero.
5. ovs-vswitchd decides that it is connected to the database and has
all the initial data, and therefore starts configuring the bridges.
bridge_run():ovsdb_idl_has_ever_connected() --> true
6. Since the monitor request for the Open_vSwitch database has not
even been sent yet, the database is empty. This leads to the removal
of all the ports and all other resources.
7. When the data is finally received, ovs-vswitchd re-creates the
bridges and ports, but the IP addresses cannot be restored.
While splitting ovsdb-cs out of ovsdb-idl, one part of the logic
was lost. In particular, before the split, ovsdb-idl updated
change_seqno only in the MONITORING state.
Restore the logic by updating change_seqno only if a transaction may
be sent, i.e. the lock is ours and ovsdb-cs is in the MONITORING
state. This matches the main purpose of increasing change_seqno
at this point, i.e. to force the client to retry the transaction.
With this change, ovsdb_idl_has_ever_connected() remains 'false'
until the first monitor reply with the actual data is received.
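The restored condition can be modelled as a small state machine. This is a simplified sketch under the assumption that "has ever connected" reduces to change_seqno being non-zero, as described in step 4 above. The states, struct idl_sketch and the helper names are hypothetical and do not reflect the real ovsdb-cs or ovsdb-idl code.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of client-sync states. */
enum sk_state { SK_SCHEMA_REQUESTED, SK_MONITOR_REQUESTED, SK_MONITORING };

struct idl_sketch {
    enum sk_state state;
    bool has_lock;
    unsigned int change_seqno;
};

/* A transaction can only be sent with the lock held while monitoring. */
static bool
may_send_transaction(const struct idl_sketch *idl)
{
    return idl->has_lock && idl->state == SK_MONITORING;
}

/* "Lock acquired" event: bump change_seqno, which makes the client retry
 * its transaction, only if a transaction could actually be sent. */
static void
on_locked(struct idl_sketch *idl)
{
    idl->has_lock = true;
    if (may_send_transaction(idl)) {
        idl->change_seqno++;
    }
}

static bool
has_ever_connected(const struct idl_sketch *idl)
{
    return idl->change_seqno != 0;
}
```

In the sequence from the report, the lock reply arrives before the Open_vSwitch monitor is even requested, so on_locked() leaves change_seqno at zero and has_ever_connected() stays false; the same event while monitoring does bump it.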
This issue was reported several times during the last couple of weeks.
Reported-at: https://bugzilla.redhat.com/1968445
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383512.html
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-June/051222.html
Fixes: 1c337c43ac1c ("ovsdb-idl: Break into two layers.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Add missing comma.
Signed-off-by: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Cc: Joe Stringer <joe@ovn.org>
This syntax error caused the script to crash. With the correct
argument in the function call, it runs and prints the expected
output.
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This is primarily to be able to test the recording of client
connections. A unit test is added accordingly.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
The current version of the replay engine doesn't handle time-based
internal events that result in stream sends/receives. Disable jsonrpc
inactivity probes for now, so as not to block the process waiting for
a probe to be sent.
The proper solution would be to implement correct record/replay
of time, probably by recording time and using time warping.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
The current version of the replay engine doesn't correctly handle
internal time-based events that end up in stream events. For example,
database status updates, which happen every 2.5 seconds, result in
updates on client monitors. Disable these updates for now if the replay
engine is active. The very first update is kept to store the initial
information about the server.
The proper solution would be to record time and replay it, probably
with time warping or in some other way.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
This change adds support for the stream record/replay functionality
to ovsdb-server.
Since the current replay engine doesn't work well with time-based
events generated locally, it will work only with standalone databases
for now (raft heavily depends on time).
To use this functionality run:
Recording:
# create a directory for replay files.
mkdir replay_dir
# copy current db for later use by replay
cp my_db ./replay_dir/my_db
ovsdb-server --record=./replay_dir <OVSDB_ARGS> my_db
# connect some clients and run some ovsdb transactions
ovs-appctl -t ovsdb-server exit
Replay:
# restore db from the copy
cp ./replay_dir/my_db my_db.for_replay
ovsdb-server --replay=./replay_dir <OVSDB_ARGS> my_db.for_replay
At this point ovsdb-server should execute all the same commands
and transactions. Since the last command was 'exit' via unixctl,
ovsdb-server will exit in the end.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
This is required for the stream record/replay functionality of
ovsdb-server. With record/replay of UUIDs, we can record all
incoming transactions and replay them later while being sure
that ovsdb-server will generate exactly the same UUIDs for all the
data updates.
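The principle is that UUID generation becomes a recorded input: in record mode each freshly generated value is logged, and in replay mode the log is consumed in order instead of generating new values. The sketch below uses an in-memory log in place of a replay file and rand() as a stand-in random source; it does not produce standards-conformant version 4 UUIDs. struct uuid_sk, struct uuid_log and uuid_generate_sk() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 128-bit value standing in for a UUID. */
struct uuid_sk {
    uint32_t parts[4];
};

#define UUID_LOG_MAX 64

/* In-memory stand-in for a replay file. */
struct uuid_log {
    struct uuid_sk entries[UUID_LOG_MAX];
    size_t n;       /* Number of recorded UUIDs. */
    size_t pos;     /* Next UUID to return during replay. */
    bool replay;    /* false: record mode; true: replay mode. */
};

/* Record mode: generate a value and append it to the log.
 * Replay mode: return the next logged value, so the replayed run sees
 * exactly the same UUIDs.  Returns false if the log is full/exhausted. */
static bool
uuid_generate_sk(struct uuid_log *log, struct uuid_sk *out)
{
    if (log->replay) {
        if (log->pos >= log->n) {
            return false;
        }
        *out = log->entries[log->pos++];
        return true;
    }
    if (log->n >= UUID_LOG_MAX) {
        return false;
    }
    for (int i = 0; i < 4; i++) {
        out->parts[i] = ((uint32_t) rand() << 16) ^ (uint32_t) rand();
    }
    log->entries[log->n++] = *out;
    return true;
}
```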
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
For debugging purposes, it is useful to be able to record all the
incoming transactions and commands and replay them locally under a
debugger or with additional logging enabled. This patch introduces the
ability to record all the incoming stream data and replay it via a new
stream provider named 'stream-replay'. During the record phase, all
the incoming stream data is written to special replay_* files in the
application rundir. During the replay phase, instead of opening real
streams, the application opens the replay_* files and reads all the
incoming data directly from them.
If enabled for ovsdb-server, for example, this allows recording all
the connections and transactions from a big setup and replaying them
locally afterwards to debug the behaviour or test performance.
To start the application in recording mode, use the --record
command-line option. --replay replays previously recorded streams.
The current version doesn't work well with time-based stream events like
inactivity probes or any other events generated internally. This is
a point for further improvement.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
This library provides interfaces to open replay files and
read/write records. It will be used later for the stream record/replay
functionality, i.e. to record all the incoming connections and
data and replay them later for debugging and performance analysis
purposes.
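One simple way to store records in a replay file is length-prefixed framing, sketched below with stdio. The 4-byte big-endian length header and the replay_write()/replay_read() helpers are assumptions for illustration; the actual on-disk format of the replay library may differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes one record: a 4-byte big-endian length, then the payload.
 * Returns 0 on success, -1 on a short write. */
static int
replay_write(FILE *f, const void *data, uint32_t len)
{
    unsigned char hdr[4] = {
        (unsigned char) (len >> 24), (unsigned char) (len >> 16),
        (unsigned char) (len >> 8), (unsigned char) len,
    };

    if (fwrite(hdr, 1, sizeof hdr, f) != sizeof hdr) {
        return -1;
    }
    if (len && fwrite(data, 1, len, f) != len) {
        return -1;
    }
    return 0;
}

/* Reads the next record into 'buf' of capacity 'cap'.  Returns its length,
 * or -1 at end of file, on a truncated record, or if it does not fit. */
static long
replay_read(FILE *f, void *buf, uint32_t cap)
{
    unsigned char hdr[4];

    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) {
        return -1;
    }
    uint32_t len = ((uint32_t) hdr[0] << 24) | ((uint32_t) hdr[1] << 16)
                   | ((uint32_t) hdr[2] << 8) | hdr[3];
    if (len > cap || (len && fread(buf, 1, len, f) != len)) {
        return -1;
    }
    return (long) len;
}
```

Framing each chunk of received data as its own record lets a replay reproduce not only the bytes but also how they were split across reads, which matters for code that processes input incrementally.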
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
"ovs-vsctl get Bridge "$1" protocols" prints something like this:
[OpenFlow12, OpenFlow13]
The code in ovs-save didn't parse it properly. This fixes the
problem.
Signed-off-by: linhuang <linhuang@ruijie.com.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In Autotest, [xyz] just expands to xyz. To get [xyz] in output, we
need [[xyz]] in input.
I spotted this based on "expr" reporting an error in testsuite output.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
The "ldd" call here didn't work if libtool was involved and would print
an error message. We could fix that, but the check is only needed for
glibc earlier than 2.11. glibc 2.11 was released in 2009, so it should
be safe to expect that testers are running it or a newer version.
This is a crossport of a patch originally applied to OVN as
commit 2870efff89337298.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>