The ipv4 fragmentation check is broken and allows fragments through.
There were fragile and poorly maintainable checks in extract_l3_ipv*
designed to save a few cycles. The checks make assumptions about what
sanity checks may have been done and could be skipped based on inferring
from the value of another paramater that should be unrelated (l4
pointer needing assignment). Since the benefit is minimal, remove
the special checks and always do sanity checks.
Four tests are added to better maintain fragmentation support.
This needs backporting to 2.9.
Fixes: c8b1ad49da68("conntrack: Reorder sanity checks in extract_l3_ipvx().")
Fixes: a489b16854b5("conntrack: New userspace connection tracker.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple in dpif-netdev.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
The ovs_assert() macro always evaluates its argument, even when NDEBUG is
defined so that failure is ignored. This behavior wasn't documented, and
thus a lot of code didn't rely on it. This commit documents the behavior
and simplifies bits of code that heretofore didn't rely on it.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
This supports using the ct_clear action in the kernel datapath. To
preserve compatibility with current ct_clear behavior on old kernels, we
only pass this action down to the datapath if a probe reveals the
datapath actually supports it.
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
The functions extract_l3_ipv4 and extract_l3_ipv6 check for
unsupported ip fragments and return early. The checks were after
an assignment that would not be needed when early return happens.
This is slightly inefficient, but mostly reads poorly.
Hence, reorder the ip fragment checks before the assignments.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Fix up some instances where variable declarations were not close
enough to their use, as these were missed before. This is the
preferred art in OVS code and flagged heavily in code reviews.
This is highly desirable due to code clarity reasons.
There are also some cases where newlines were not needed by prior art
and some cases where they were needed but missed.
There was one case where there was a missing space after "}".
There were a few cases where for loop index declarations could be
folded into the loop.
One function was missing some const qualifiers.
There were a few instances where a local variable for conn_key_hash
could be eliminated.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In order to support more algs with different requirements,
expectation handling is allowed to handle more cases, such as
a wildcard source ip as in the case of SIP. NAT can also be
skipped in some alg cases.
Expectation_create() was otherwise simplified in the process.
Some renaming was done to support the above changes.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Presently, alg expectations are removed by being time expired.
This was intended to happen before the control connections and
was intended to minimize the extra work involved for tracking and
removing the expectations. This is not the best option since it
should be possible to remove expectations when a control connection
is removed and a new api is in the works to do this. Also, conceptually
an expectation should not exist without a control connection context
and it can be argued that this should be a strict requirement.
The approach is changed to remove the expectations when the control
connections are removed. The previous code to expire the expectations
is removed at the same time.
Fixes: bd5e81a0e ("Userspace Datapath: Add ALG infra and FTP.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341683.html
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
A get command is added for number of conntrack connections.
This command is only supported in the userspace datapath
at this time.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Get and set dpctl commands are added for conntrack maxconns.
These commands are only supported in the userspace
datapath at this time.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
An address sanity check is done on icmp error packets to
check that the icmp error payload makes sense w.r.t. the
packet itself.
The sanity check was partially incorrect since it tried
to verify the source address of the error packet against the
original destination, which does not makes since the error
can be generated by any intermediate node.
Reported-by: wangzhike <wangzhike@jd.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341609.html
Fixes: a489b1685 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: wangzhike <wangzhike@jd.com>
Co-authored-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Presently, alg processing is enabled by default to better exercise code.
This is similar to kernels before 4.7 as well. The recommended default
behavior in the newer kernels is to only process algs if a helper is
supplied in a conntrack rule. The behavior is changed to match the
later kernels.
A test is extended to check that the control connection is still
created in such a case.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Algs can use variable control port numbers for servers.
The main use case is a kind of feeble security measure; the
thinking being by some is that it obscures the alg traffic.
It is really not very effective, but the kernel has this
capability. This patch mimics the capability.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Upcoming requirements for new algs make it desirable to split out
alg helpers more cleanly.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Close a theoretical race delete/create corner case for alg
reverse conns and add debugging around this that may point to
an intentional exploit, unintentional problem or just a rare
condition. The solution is to keep track of reverse conn via
nat_conn_keys and avoid deleting the reverse conn when it has been
recreated.
Fixes: bd5e81a0e5 ("Userspace Datapath: Add ALG infra and FTP.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Create a separate function from existing code, so the
code can be reused in a subsequent patch; no change
in functionality.
Acked-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Conn should be removed from the connection expiry list when
the connection tracker experiences NAT resource exhaustion
and the connection needing NAT mapping cannot get it.
If this is not done, the connection tracker can crash during
cleanup of expired connections by the clean thread.
This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
table=1,in_port=2,ip,ct_state=+est,actions=1
table=1,in_port=1,ip,ct_state=+est,actions=2"
Fixes: bd5e81a0e5 ("Userspace Datapath: Add ALG infra and FTP.")
Signed-off-by: Lili Huang <huanglili.huang@huawei.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
ALG infra and FTP (both V4 and V6) support is added to the userspace
datapath. Also, NAT support is included.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
A new function conn_key_cmp() is introduced and used to replace
memcmp of conn_keys. Given that OVS runs on with many compilers and
on many architectures, it seems prudent to avoid memcmp in case
existing and future holes in conn_key are not handled by a given
compiler for a given architecture.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
With the command:
ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.
By using a threshold:
ovs-appctl dpctl/ct-bkts gt=N
for each bucket shows the number of connections when they
are greater than N.
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The 'nat' portion of 'nat_resources_lock' is dropped as
this lock will be used by ALGs in a subsequent patch.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The conntrack context flag 'related' is changed to 'icmp_related'
to disambiguate usage w.r.t. ALGs which are added in a subsequent
patch.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Fixes some lines exceeding 80 chars and a couple of typos.
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Un-nat conns have no nat_info as do default conns.
However, un-nat conns are originally templated from the
corresponding default conns and therefore need to
have their nat_info explicitly nulled. This
otherwise exposes a double free if conntrack_destroy()
were to be used to destroy the connection tracker. This
would apply to cleaning the datapath after testing.
Fixes: 286de27299 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
The function conn_key_hash() is updated to include
a call to hash_finish() and also to make use of a
new hash abstraction - ct_endpoint_hash_add().
Fixes: a489b16854 ("conntrack: New userspace connection tracker.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Part of the hash input for nat_range_hash() was accidentally
omitted, so this fixes the problem. Also, add a missing call to
hash_finish().
Fixes: 286de27299 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds orig tuple checking and context
recovery; NAT interactions are factored in.
Orig tuple support exists to better handle policy
changes.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch includes more complete support
for icmp4 and icmp6 related NAT handling.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch introduces NAT support for the userspace datapath.
Most conntrack module changes are in this patch, with the
exception of icmp related handling and recent orig tuple
support.
The per packet scope of lookups for NAT and un_NAT is at
the bucket level rather than global. One hash table is
introduced to support create/delete handling. The create/delete
events may be further optimized, if the need becomes clear.
Some NAT options with limited utility (persistent, random) are
not supported yet, but will be supported in a later patch.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Packet batch sorting is removed for three reasons:
1) The following patches for NAT change the locking
marshalling so batching loses benefit.
2) For real mixtures of flows either in hypervisors
or gateways, the batch sorting won't provide benefit
and will just be a tax.
3) Code clarity.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as prepration for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.
The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow verion 1.5.
The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:
enum ofp_header_type_namespaces {
OFPHTN_ONF = 0, /* ONF namespace. */
OFPHTN_ETHERTYPE = 1, /* ns_type is an Ethertype. */
OFPHTN_IP_PROTO = 2, /* ns_type is a IP protocol number. */
OFPHTN_UDP_TCP_PORT = 3, /* ns_type is a TCP or UDP port. */
OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */
};
The lower 16 bits specify the actual type in the context of the name space.
Only name spaces 0 and 1 will be supported for now.
For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.
In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
whith the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.
The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the the dl_type field of struct flow.
This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.
New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Otherwise a malformed packet could cause a read up to about 40 bytes past
the end of the packet. The packet would still likely be dropped because
of checksum verification.
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
ICMP error packets (e.g. destination unreachable messages) are
considered 'related' to another connection and are treated as part of
that.
However:
* We shouldn't create new entries in the connection table if the
original connection is not found. This is consistent with what the
kernel does.
* We certainly shouldn't call valid_new() on the packet, because
valid_new() assumes the packet l4 type (might be TCP, UDP or ICMP)
to be consistent with the conn_key nw_proto type.
Found by inspection.
Fixes: a489b16854b5("conntrack: New userspace connection tracker.")
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Darrell Ball <dlu998@gmail.com>