commit_set_icmp_action() should do its job only if the packet is ICMP,
otherwise there will be two problems:
* A set ICMP action will be inserted in the ODP actions and the flow
will be slow pathed.
* The tp_src and tp_dst field will be unwildcarded.
Normal TCP or UDP packets won't be impacted, because
commit_set_icmp_action() is called after commit_set_port_action() and it
will see the fields as already committed (TCP/UCP transport ports and ICMP
code/type are stored in the same members in struct flow).
MPLS packets though will hit the bug, causing a nonsensical set action
(which will end up zeroing the transport source port) and an invalid
mask to be generated.
The commit also alters an MPLS testcase to trigger the bug.
We should match on the transport ports only if the tunnel has a UDP
header. It doesn't make sense to match on transport port for GRE
tunnels.
Also, to match on fragment bits we should use FLOW_NW_FRAG_MASK instead
of 0xFF. FLOW_NW_FRAG_MASK is what we get if we convert to the ODP
netlink format and back.
Adding the correct masks in the tunnel router classifier helps in making
sure that the translation generates masks that respect prerequisites.
If the mask has some fields that do not respect prerequisites, the flow
will get deleted by revalidation, because translating to ODP format and
back will generate a more generic mask, which will be perceived as too
generic (compared with the one generated by the translation).
If there's no actual packet (e.g. during revalidation),
execute_controller_action() exits right away, without calling
xlate_commit_actions().
xlate_commit_actions() might have an influence on slow_path reason
(which is included in the generated ODP actions), meaning that the
revalidation will not generate the same actions than the original
translation.
Fix the problem by making execute_controller_action() call
xlate_commit_actions() even without a packet.
Previously this was only done when connlabels were enabled in the kernel
config, even if the functions didn't exist. Fix the compile error.
Fixes: d70a6ff5d40d ("datapath: Define nf_connlabels_{put,get}.")
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
If userspace executes ct(zone=1), and the connection tracker determines
that the packet is invalid, then the ct_zone flow key field is populated
with the default zone rather than the zone that was specified. Even
though connection tracking failed, this field should be updated with the
value that userspace specified. Fix the issue.
Fixes: a94ebc3999 ("datapath: Add conntrack action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Document the use of the Fixes header to refer to a commit that
introduced a bug being fixed.
Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
The OVN symbol table contained outdated mappings between connection
states and the corresponding bit in the ct_state field. This patch
updates the symbol table with the proper values as defined in
lib/packets.h.
Signed-off-by: Russell Bryant <russell@ovn.org>
Fixes: 63bc9fb1c6 ("packets: Reorder CS_* flags to remove gap.")
Acked-by: Joe Stringer <joe@ovn.org>
Having a coverage counter tracking the value of the internal seq_next
should help in debugging.
Suggested-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
On Hyper-V, we currently don't validate a flow to see if datapath can
indeed execute all the actions specified or not. While support for it
gets implemented, an ASSERT seems too strong. I'm working on the support
for actions validation. Here's a workaround in the meantime to help
debugging.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Upstream commit:
When looking for outer IP header, use the actual socket address family, not
the address family of the default destination which is not set for metadata
based interfaces (and doesn't have to match the address family of the
received packet even if it was set).
Fix also the misleading comment.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: ce212d0f6f5 ("vxlan: interpret IP headers for ECN correctly")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
Commit 3511494ce2f3d ("vxlan: Group Policy extension") changed definition of
VXLAN_HF_RCO from 0x00200000 to BIT(24). This is obviously incorrect. It's
also in violation with the RFC draft.
Fixes: 3511494ce2f3d ("vxlan: Group Policy extension")
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: c5fb8caaf91 ("vxlan: fix incorrect RCO bit in VXLAN header")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
After 614732eaa12d, no refcount is maintained for the vport-vxlan module.
This allows the userspace to remove such module while vport-vxlan
devices still exist, which leads to later oops.
v1 -> v2:
- move vport 'owner' initialization in ovs_vport_ops_register()
and make such function a macro
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 83e4bf7a74 ("openvswitch: properly refcount vport-vxlan
module").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 131753030("openvswitch: fix hangup on vxlan/gre/geneve device
deletion").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Fixes hang of 'ovs-appctl ofproto/tnl-push-pop' when an invalid
argument passed.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The NAT validation is similar (and based on) the existing conntrack
validation: when a dpif backer is created, we try to install a flow with
the ct_state NAT bits set. If the flow setup fails we assume that the
backer doesn't support NAT and we reject OpenFlow flows with a NAT
action or a match on the ct_state NAT bits.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
FTP NAT system tests fail if the corresponding modules are not loaded.
Add a probe for nf_nat_ftp module to make sure it is loaded before the
tests.
Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
seq values are 64-bit, and storing them to a 32-bit variable causes
the stored value never to match actual seq value after the seq value
gets big enough.
This is a likely cause of OVS main thread using 100% CPU in a system
using bonds after some runtime.
VMware-BZ: #1564993
Reported-by: Hiram Bayless <hbayless@vmware.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Only the names of the fields were supposed to be bold here, but omitting
the "fR" from "\fR" made everything between the field names bold too,
which looked funny.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
For lswitch ports with known IPs, ARP is responded directly from
local ovn-controller to avoid flooding.
Signed-off-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Based on IPv4 tests, test tunnels over IPv6. In order to do that, add
netdev-dummy/ip6addr command for dummy bridges, and get_in6 support for
netdev-dummy as well.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
With this patch, it is possible to set the IPv6 source and destination address
in flow-based tunnels.
$ ovs-ofctl add-flow br0 "in_port=LOCAL actions=set_field:2001:cafe::92->tun_ipv6_dst"
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Co-authored-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
tnl_arp_lookup is not used anymore. All users have been converted to
IPv4-mapped addresses. New users need to use IPv4-mapped addresses and use
tnl_neigh_lookup.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When doing push/pop and building tunnel header, do IPv6 route lookups and send
Neighbor Solicitations if needed.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Cc: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This allows code to be written more naturally in some cases.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
FreeBSD needs to include netinet/in.h to define struct in6_addr.
Signed-off-by: Kevin Lo <kevlo@FreeBSD.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
would use a sequence to see if anything changed in NB database to
compute and notify the SB database, and vice versa. This logic got
dropped with the refactor, causing a testsuite failure in the ovn-sbctl
test. Reintroduce the IDL sequence number checking.
Fixes: 331e7aefe1 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
functions")
Suggested-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Some recent features have more stringent requirements for kernel
versions than the FAQ describes. Add an entry to be more explicit on
which features work with which versions of the upstream kernel.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.
This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.
Upstream: 740dbc289155 ("openvswitch: Scrub skb between namespaces")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper
to be used for this connection. If no helper is specified, then helpers
will be automatically applied as per the sysctl configuration of
net.netfilter.nf_conntrack_helper.
The helper may be specified as part of the conntrack action, eg:
ct(helper=ftp). Initial packets for related connections should be
committed to allow later packets for the flow to be considered
established.
Example ovs-ofctl flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(recirc)
in_port=2,tcp,ct_state=+trk-new+est,action=1
in_port=2,tcp,ct_state=+trk+rel,action=1
Upstream: cae3a26 "openvswitch: Allow attaching helpers to ct action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
Upstream: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.
Upstream: 182e304 "openvswitch: Allow matching on conntrack mark"
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.
Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.
When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.
IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.
IP frag handling contributed by Andy Zhou.
Based on original design by Justin Pettit.
Upstream: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However,the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.
Upstream: 8e2fed1 "openvswitch: Serialize acts with original netlink len"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This will allow the ovs-conntrack code to reuse these macros.
Upstream: be26b9a "openvswitch: Move MASKED* macros to datapath.h"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Backport IPv6 fragment reassembly from upstream commits in the Linux 4.3
development tree.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
IPv6 fragmentation functionality is not exported by most kernels, so
backport this code from the upstream 4.3 development tree.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Backport IPv4 reassembly from the upstream commit caaecdd3d3f8 ("inet:
frags: remove INET_FRAG_EVICTED and use list_evictor for the test").
This is necessary because kernels prior to upstream commit d6b915e29f4a
("ip_fragment: don't forward defragmented DF packet") would not always
track the maximum received unit size during ip_defrag(). Without the
MRU, refragmentation cannot occur so reassembled packets are dropped.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Most kernels provide some form of ip fragmentation. However, until
recently many of them would always send ICMP responses for over_MTU
packets, even when operating in bridge mode. Backport the check to
ensure this doesn't occur.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>