Previously the kernel did not provide a netlink interface to flush/list
only conntrack entries matching a specific zone. With [1] and [2] it is now
possible to flush and list conntrack entries filtered by zone. Older
kernels not yet supporting this feature will ignore the filter.
For the list request that means just returning all entries (which we can
then filter in userspace as before).
For the flush request that means deleting all conntrack entries.
The implementation is now identical to the windows one, so we combine
them.
These significantly improves the performance of flushing conntrack zones
when the conntrack table is large. Since flushing a conntrack zone is
normally triggered via an openflow command it blocks the main ovs thread
and thereby also blocks new flows from being applied. Using this new
feature we can reduce the flushing time for zones by around 93%.
In combination with OVN the creation of a Logical_Router (which causes
the flushing of a ct zone) could block other operations, e.g. the
failover of Logical_Routers (as they cause new flows to be created).
This is visible from a user perspective as a ovn-controller that is idle
(as it waits for vswitchd) and vswitchd reporting:
"blocked 1000 ms waiting for main to quiesce" (potentially with ever
increasing times).
The following performance tests where run in a qemu vm with 500.000
conntrack entries distributed evenly over 500 ct zones using `ovstest
test-netlink-conntrack flush zone=<zoneid>`.
| flush zone with 1000 entries | flush zone with no entry |
+---------------------+----------+---------------------+----------|
| with the patch | without | with the patch | without |
+----------+----------+----------+----------+----------+----------|
| v6.8-rc4 | v6.7.1 | v6.8-rc4 | v6.8-rc4 | v6.7.1 | v6.8-rc4 |
+---------+----------+----------+----------+----------+----------+----------|
| Min | 0.260 | 3.946 | 3.497 | 0.228 | 3.462 | 3.212 |
| Median | 0.319 | 4.237 | 4.349 | 0.298 | 4.460 | 4.010 |
| 90%ile | 0.335 | 4.367 | 4.522 | 0.325 | 4.662 | 4.572 |
| 99%ile | 0.348 | 4.495 | 4.773 | 0.340 | 4.931 | 6.003 |
| Max | 0.362 | 4.543 | 5.054 | 0.348 | 5.390 | 6.396 |
| Mean | 0.320 | 4.236 | 4.331 | 0.296 | 4.430 | 4.071 |
| Total | 80.02 | 1058 | 1082 | 73.93 | 1107 | 1017 |
[1]: eff3c558bb
[2]: fa173a1b4e
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Co-Authored-By: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Co-Authored-By: Max Lamprecht <max.lamprecht@mail.schwarz>
Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The SCTP protocol ports were excluded from the netlink
encoding. In that case the nl_ct_flush_tuple() would
return EOPNOTSUPP, that could result in some CT entries
not being properly flushed if we would hit SCTP entry earlier
than others.
This at the same time allows to flush SCTP on its own
in during partial match. This should still be considered
a bug, because OvS currently supports SCTP CT entries,
and it should also support partial flush for them the same
way it supports partial flush for TCP/UDP.
Reported-at: https://bugzilla.redhat.com/2228037
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some of the CTA_PROTOINFO_TCP nested attributes are not always
included in the received message, but the parsing logic considers them
as required, failing in case they are not found.
This was observed while monitoring some connections by reading the
events sent by conntrack:
./ovstest test-netlink-conntrack monitor
[...]
2022-08-04T09:39:02Z|00007|netlink_conntrack|ERR|Could not parse nested TCP protoinfo
options. Possibly incompatible Linux kernel version.
2022-08-04T09:39:02Z|00008|netlink_notifier|WARN|unexpected netlink message contents
[...]
All the TCP DELETE/DESTROY events fail to parse with the message
above.
Fix it by turning the relevant attributes to optional.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch first defines the dpif interface for a datapath to support
adding, deleting, getting and dumping conntrack timeout policy.
The timeout policy is identified by a 4 bytes unsigned integer in
datapath, and it currently support timeout for TCP, UDP, and ICMP
protocols.
Moreover, this patch provides the implementation for Linux kernel
datapath in dpif-netlink.
In Linux kernel, the timeout policy is maintained per L3/L4 protocol,
and it is identified by 32 bytes null terminated string. On the other
hand, in vswitchd, the timeout policy is a generic one that consists of
all the supported L4 protocols. Therefore, one of the main task in
dpif-netlink is to break down the generic timeout policy into 6
sub policies (ipv4 tcp, udp, icmp, and ipv6 tcp, udp, icmp),
and push down the configuration using the netlink API in
netlink-conntrack.c.
This patch also adds missing symbols in the windows datapath so
that the build on windows can pass.
Appveyor CI:
* https://ci.appveyor.com/project/YiHungWei/ovs/builds/26387754
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Currently, only the netlink datapath supports SCTP connection tracking,
but at least this removes the warning message that will pop up when
running something like:
ovs-appctl dpctl/dump-conntrack
This doesn't impact any conntrack functionality, just the display.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The zone Netlink attribute is supposed to be in network-byte order, but
the Windows code for deleting conntrack entries was treating it as
host-byte order.
Found by inspection.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple, and provides the implementation in dpif-netlink.
The implementation of dpif-netlink in the linux datapath utilizes the
NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in
nf_conntrack. Future patches will add support for the userspace and
Windows datapaths.
VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
With the command:
ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.
By using a threshold:
ovs-appctl dpctl/ct-bkts gt=N
for each bucket shows the number of connections when they
are greater than N.
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
No point littering the logs with messages on an unsupported protocol,
so change the log to debug level.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Conntrack update events include labels only if they have changed.
Record the presence of labels in the netlink message to OVS internal
representation, so that the user may keep the old labels when an
update does not modify them.
Fixes: 6830a0c0e6 ("netlink-conntrack: New module.")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
From the connection tracker perspective, an ICMP connection is a tuple
identified by source ip address, destination ip address and ICMP id.
While this allows basic ICMP traffic (pings) to work, it doesn't take
into account the icmp type: the connection tracker will allow
requests/replies in any directions.
This is improved by making the ICMP type and code part of the connection
tuple. An ICMP echo request packet from A to B, will create a
connection that matches ICMP echo request from A to B and ICMP echo
replies from B to A. The same is done for timestamp and info
request/replies, and for ICMPv6.
A new modules conntrack-icmp is implemented, to allow only "request"
types to create new connections.
Also, since they're tracked in both userspace and kernel
implementations, ICMP type and code are always printed in ct-dpif (a few
testcase are updated as a consequence).
Reported-by: Subramani Paramasivam <subramani.paramasivam@wipro.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
Windows datapath lacked support for different Netlink Family protocols.
Now that Windows supports different Netlink protocol, revert the change to
override NETLINK_NETFILTER to use NETLINK_GENERIC.
Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
The flags and state sent by Windows datapath are currently in the
userspace format. So prevent further translation.
Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Modify dpif-netlink.c and netlink-conntrack.c to send down dump and flush command
to Windows datapath. Include netlink-conntrack.c and netlink-conntrack.h
in automake.mk for Windows binaries.
Windows currently supports only NETLINK_GENERIC port. In order to support
the NETLINK_NETFILTER messages, the port id is being overwritten to
NETLINK_GENERIC on Windows and datapath has been updated to support the
new message format.
Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Paul-Daniel Boca <pboca@cloudbasesolutions.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
This module uses the netlink interface provide by the Linux kernel
connection tracker to provide some visibility into the conntrack tables.
The module provides functions to:
* Convert a netlink representation of a connection into a
struct 'ct_dpif_entry'.
* Dump all the connections.
* Flush all the connections.
* Listen for updates by registering a netlink notifier.
It will be used by dpif-netlink to implement the interface required by
the ct-dpif module.
Based on original work by Jarno Rajahalme
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>