Found by libfuzzer.
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
When nothing matched, the code would loop forever.
Found with libfuzzer.
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
scan_u128() should return 0 on an error but it actually returned an errno
value in some cases, so a command like this:
ovs-appctl dpctl/add-flow 'ct_label(1/55555555555555555555555555)' ''
could cause a buffer overread.
This bug is not as severe as it may sound because the string form of ODP
flows is not used over OpenFlow or OVSDB, only through the appctl interface
that is normally used just by local system administrators and not exposed
over a network.
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Make ovs_flow_key_attr_lens() public to be reused by other modules.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
- Fix 2 incorrect length checks
- Remove unnecessary limit of MD length to 16 bytes
- Remove incorrect comments stating MD2 was not supported
- Pad metadata in encap_nsh with zeroes if not multiple of 4 bytes
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If we find that we need to change from a SET to SET_MASKED action,
then we write the mask to the actions opfbuf. But if there was netlink
pad added to the buffer when writing the key, mask won't follow the
key data as per SET_MASKED spec.
Fix that by removing the padding before writing the mask, and
readding it if needed for alignment.
Fixes: 6d670e7f0d45 ("lib/odp: Masked set action execution and printing.")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
If tcp_flags value is 0 it isn't put to netlink, even if mask
isn't zero. Fix that so we can have matching on value 0.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
This commit adds translation and netdev datapath support for generic
encap and decap actions for the NSH MD1 header. The generic encap and
decap actions are mapped to specific encap_nsh and decap_nsh actions
in the datapath.
The translation follows that general scheme that decap() of an NSH
packet triggers recirculation after decapsulation, while encap(nsh)
just modifies struct flow and sets the ctx->pending_encap flag to
generate the encap_nsh action at the next commit to be able to include
subsequent set_field actions for NSH headers.
Support for the flexible MD2 format using TLV properties is foreseen
in encap(nsh), but not yet fully implemented.
The CLI syntax for encap of NSH is
encap(nsh(md_type=1))
encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds support for NSH packet header fields to the OVS
control plane and the userspace datapath. Initially we support the
fields of the NSH base header as defined in
https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt
and the fixed context headers specified for metadata format MD1.
The variable length MD2 format is parsed but the TLV context headers
are not yet available for matching.
The NSH fields are modelled as experimenter fields with the dedicated
experimenter class 0x005ad650 proposed for NSH in ONF. The following
fields are defined:
NXOXM code ofctl name Size Comment
=====================================================================
NXOXM_NSH_FLAGS nsh_flags 8 Bits 2-9 of 1st NSH word
(0x005ad650,1)
NXOXM_NSH_MDTYPE nsh_mdtype 8 Bits 16-23
(0x005ad650,2)
NXOXM_NSH_NEXTPROTO nsh_np 8 Bits 24-31
(0x005ad650,3)
NXOXM_NSH_SPI nsh_spi 24 Bits 0-23 of 2nd NSH word
(0x005ad650,4)
NXOXM_NSH_SI nsh_si 8 Bits 24-31
(0x005ad650,5)
NXOXM_NSH_C1 nsh_c1 32 Maskable, nsh_mdtype==1
(0x005ad650,6)
NXOXM_NSH_C2 nsh_c2 32 Maskable, nsh_mdtype==1
(0x005ad650,7)
NXOXM_NSH_C3 nsh_c3 32 Maskable, nsh_mdtype==1
(0x005ad650,8)
NXOXM_NSH_C4 nsh_c4 32 Maskable, nsh_mdtype==1
(0x005ad650,9)
Co-authored-by: Johnson Li <johnson.li@intel.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Don't print frag parsing error if mask is zero,
instead just don't print it.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Shadowing is when a variable with a given name in an inner scope hides a
different variable with the same name in a surrounding scope. This is
generally undesirable because it can confuse programmers. This commit
eliminates most of it.
Found with -Wshadow=local in GCC 7. The repo is not really ready to enable
this option by default because of a few cases that are harder to fix, and
harmless, such as nested use of CMAP_FOR_EACH.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
This commit adds support for the OpenFlow actions generic encap
and decap (as specified in ONF EXT-382) to the OVS control plane.
CLI syntax for encap action with properties:
encap(<header>)
encap(<header>(<prop>=<value>,<tlv>(<class>,<type>,<value>),...))
For example:
encap(ethernet)
encap(nsh(md_type=1))
encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
CLI syntax for decap action:
decap()
decap(packet_type(ns=<pt_ns>,type=<pt_type>))
For example:
decap()
decap(packet_type(ns=0,type=0xfffe))
decap(packet_type(ns=1,type=0x894f))
The first header supported for encap and decap is "ethernet" to convert
packets between packet_type (1,Ethertype) and (0,0).
This commit also implements a skeleton for the translation of generic
encap and decap actions in ofproto-dpif and adds support to encap and
decap an Ethernet header.
In general translation of encap commits pending actions and then rewrites
struct flow in accordance with the new packet type and header. In the
case of encap(ethernet) it suffices to change the packet type from
(1, Ethertype) to (0,0) and set the dl_type accordingly. A new
pending_encap flag in xlate ctx is set to mark that an corresponding
datapath encap action must be triggered at the next commit. In the
case of encap(ethernet) ofproto generetas a push_eth action.
The general case for translation of decap() is to emit a datapath action
to decap the current outermost header and then recirculate the packet
to reparse the inner headers. In the special case of an Ethernet packet,
decap() just changes the packet type from (0,0) to (1, dl_type) without
a need to recirculate. The emission of the pop_eth action for the
datapath is postponed to the next commit.
Hence encap(ethernet) and decap() on an Ethernet packet are OF octions
that only incur a cost in the dataplane when a modifed packet is
actually committed, e.g. because it is sent out. They can freely be
used for normalizing the packet type in the OF pipeline without
degrading performance.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Change type from uint16_t to 'enum ovs_key_attr' so that the compiler
will warn the unhandled cases.
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
The optimization logic in odp_key_to_dp_packet() used to be useful if the
number of wanted key attributes are small. However, as the expected key
attributes increase, and the optimization logic need to check all the
netlink attributes if one of the wanted key attributes is missing, the
benefit of the optimization logic is minimal. Therefore, this patch removes
the optimization.
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Previously, odp_key_to_dp_packet() may fail to get ct_orig_tuple
from ODP flow key. This patch fixes the issue.
VMWare-BZ: #1920903
Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Previously, odp_key_to_dp_packet() may fail to get ct_state, ct_zone,
ct_mark, and ct_labels from ODP flow key. This patch fixes the issue.
VMWare-BZ: #1920903
Fixes: 07659514c3c1 ("Add support for connection tracking.")
Fixes: 8e53fe8cf7a1 ("Add connection tracking mark support.")
Fixes: 9daf23484fb1 ("Add connection tracking label support.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Checking whether an ODP mask is all-0s or all-1s is a little more
complicated than one might expect because the structures sometimes have
trailing padding. The function odp_mask_is_exact() was fairly careful
about this, but odp_mask_attr_is_wildcard() didn't take padding into
consideration at all, which caused test failures on Travis and on some
machines because of uninitialized padding.
This commit fixes the problem by unifying the two different functions so
that both of them are careful about checking only significant bytes. It
also adds support for the ct_orig_tuples for IPv4 and IPv6, which also
have trailing padding but weren't special cased before.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
This special case isn't actually necessary. Commit 48954dab23ee
("odp-util: Remove last use of odp_tun_key_from_attr for formatting.")
retained it "as a safety measure" but that isn't really needed.
This makes an upcoming change more straightforward.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
The way this function was written seemed really funny to me, so this commit
rewrites it. There should be no behavioral change.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
odp_flow_format() passes masks to odp_mask_attr_is_wildcard() without
first checking that they are the correct length. This is OK for the
moment because odp_mask_attr_is_wildcard() doesn't care that the length
is correct. An upcoming commit will change odp_mask_attr_is_wildcard()
to make it pickier, so this prepares for that change.
This adds a few comments to make it a little harder to get length
validation wrong in the future.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
The 'max_len' parameters to these functions are actually the maximum type
values, not the maximum length of anything.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Support for extracting original direction 5 tuple fields from the
connection tracking module may differ on some platforms between the IPv4
original tuple fields vs. IPv6. Detect IPv6 original tuple support
separately and reflect this support up to the OpenFlow layer.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
When we use the 'ovs-appctl ofproto/trace' to send packets,
which include the 'vlan' field, but exclude the 'encap',
the ovs-vswitchd will crash. We should check 'encap' field
in parse_8021q_onward(), before using it.
ovs-appctl ofproto/trace ovs-system \
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),
eth_type(0x8100),vlan(vid=99,pcp=0)'
#0 nl_attr_get_size (nla=nla@entry=0x0) at lib/netlink.c:567
#1 parse_8021q_onward (src_flow=0x7ffd0ec77540, key_len=40,
key=0x1207e00, flow=0x7ffd0ec77540, expected_attrs=<optimized out>,
out_of_range_attr=0, present_attrs=120, attrs=0x7ffd0ec77170)
at lib/odp-util.c:5359
#2 odp_flow_key_to_flow__ (key=0x1207e00, key_len=40,
flow=flow@entry=0x7ffd0ec77540, src_flow=src_flow@entry=0x7ffd0ec77540)
at lib/odp-util.c:5520
#3 odp_flow_key_to_flow (key=<optimized out>, key_len=<optimized out>,
flow=flow@entry=0x7ffd0ec77540) at lib/odp-util.c:5555
#4 parse_flow_and_packet (argc=3, argv=0x12b2220,
ofprotop=ofprotop@entry=0x7ffd0ec77510, flow=flow@entry=0x7ffd0ec77540,
packetp=packetp@entry=0x7ffd0ec77518)
at ofproto/ofproto-dpif-trace.c:211
#5 ofproto_unixctl_trace (conn=0x1268c20, argc=<optimized out>,
argv=<optimized out>, aux=<optimized out>) at ofproto/ofproto-dpif-trace.c:309
#6 process_command (request=<optimized out>, conn=0x1268c20) at lib/unixctl.c:313
#7 run_connection (conn=0x1268c20) at lib/unixctl.c:347
#8 unixctl_server_run (server=0x1180970) at lib/unixctl.c:400
#9 main (argc=5, argv=0x7ffd0ec779c8) at vswitchd/ovs-vswitchd.c:120
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
Allow packet type namespace OFPHTN_ETHERTYPE as alternative pre-requisite
for matching L3 protocols (MPLS, IP, IPv6, ARP etc).
Change the meta-flow definition of packet_type field to use the new
custom format MFS_PACKET_TYPE representing "(NS,NS_TYPE)".
Parsing routine for MFS_PACKET_TYPE added to meta-flow.c. Formatting
routine for field packet_type extracted from match_format() and moved to
flow.c to be used from meta-flow.c for formatting MFS_PACKET_TYPE.
Updated the ovs-fields man page source meta-flow.xml with documentation
for packet-type-aware bridges and added documentation for field packet_type.
Added packet_type to the matching properties in tests/ofproto.at.
If dl_type is unwildcarded due to later packet modification, make sure it
is cleared again if the original packet_type was not PT_ETH.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until now, ODP output only showed port names for in_port matches. This
commit shows them in other places port numbers appear.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Until now, printing names in "ovs-dpctl dump-flows" was tied to the overall
output verbosity, which in practice meant that to see port names a user had
to see a distracting amount of verbosity. This decouples names from
verbosity.
I'd like to make showing names the default for interactive usage, but so
far names aren't accepted in input so that would frustrate cut-and-paste,
which is an important use of "ovs-dpctl dump-flows" output.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Using the new netdev flow api operate will now try and
offload flows to the relevant netdev of the input port.
Other operate methods flows will come in later patches.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
nl_msg_put_unspec_uninit() can return a pointer that is only 4-byte
aligned.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Use the helpers in appropriate places. In most cases, this fixes a
misaligned reference, since ovs_be128 and ovs_u128 require 8-byte alignment
but Netlink only guarantees 4-byte.
Found by GCC -fsanitize=undefined.
Reported-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Ports have a new layer3 attribute if they send/receive L3 packets.
The packet_type included in structs dp_packet and flow is considered in
ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.
A dummy ethernet header is pushed to L3 packets received from L3 ports
before the the pipeline processing starts. The ethernet header is popped
before sending a packet to a L3 port.
For datapath ports that can receive L2 or L3 packets, the packet_type
becomes part of the flow key for datapath flows and is handled
appropriately in dpif-netdev.
In the 'else' branch in flow_put_on_pmd() function, the additional check
flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls
lookup is sufficient to uniquely identify a flow and b) it caused false
negatives because the flow in netdev->flow may not properly masked.
In dpif_netdev_flow_put() we now use the same method for constructing the
netdev_flow_key as the one used when adding the flow to the dplcs to make sure
these always match. The function netdev_flow_key_from_flow() used so far was
not only inefficient but sometimes caused mismatches and subsequent flow
update failures.
The kernel datapath does not support the packet_type match field.
Instead it encodes the packet type implictly by the presence or absence of
the Ethernet attribute in the flow key and mask.
This patch filters the PACKET_TYPE attribute out of netlink flow key and
mask to be sent to the kernel datapath.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
I know of two reasons to mark a structure as "packed". The first is
because the structure must match some defined interface and therefore
compiler-inserted padding between or after members would cause its layout
to diverge from that interface. This is not a problem in a structure that
follows the general alignment rules that are seen in ABIs for all the
architectures that OVS cares about: basically, that a struct member needs
to be aligned on a boundary that is a multiple of the member's size.
The second reason is because instances of the struct tend to be at
misaligned addresses.
struct eth_header and struct vlan_eth_header are normally aligned on
16-bit boundaries (at least), and they contain only 16-bit members, so
there's no need to pack them. This commit removes the packed annotation.
This commit also removes the packed annotation from struct llc_header.
Since that struct only contains 8-bit members, I don't know of any benefit
to packing it, period.
This commit also removes a few more packed annotations that are much less
important.
When these packed annotations were removed, it caused a few warnings
related to casts from 'uint8_t *' to more strictly aligned pointer types,
related to struct ovs_action_push_tnl. That's because that struct had a
trailing member used to store packet headers, that was declared as
a uint8_t[]. Before, when this was cast to 'struct eth_header *', there
was no change in alignment since eth_header was packed; now that
eth_header is not packed, the compiler considers it suspicious. This
commit avoids that problem by changing the member from uint8_t[] to
uint32_t[], which assures the compiler that it is properly aligned.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Add support for actions push_eth and pop_eth to the netdev datapath and
the supporting libraries. This patch relies on the support for these actions
in the kernel datapath to be present.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
I was about to add another complete list of all the connection states but
this eliminates the need.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
Various printf() format specifiers in the tree had minor technical issues
which the Mac OS build reported, e.g. here:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/208718342/log.txt
These tend to fall into two categories of harmless warnings:
1. Wrong width for types that are all promoted to 'int'. For example,
both uint8_t and uint16_t are both promoted to 'int' as part of a call
to printf(), but using PRIu8 for a uint16_t causes a warning.
2. Wrong format specifier for type promoted to 'int' due to arithmetic.
For example, if 'x' is a uint8_t, then x >> 1 has type 'int' due to
C's promotion rules, so the correct format specifier is %d and using
PRIu8 will cause a warning.
This commit fixes the warnings. I didn't see anything that rose to the
level of a bug.
These warnings only showed up on Mac OS X because of differences in the
format specifiers that Mac OS uses for PRI*.
Reported-by: Shu Shen <shu.shen@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Flow key handling changes:
- Add VLAN header array in struct flow, to record multiple 802.1q VLAN
headers.
- Add dpif multi-VLAN capability probing. If datapath supports
multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
Refactor VLAN handling in dpif-xlate:
- Introduce 'xvlan' to track VLAN stack during flow processing.
- Input and output VLAN translation according to the xbundle type.
Push VLAN action support:
- Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
- Support push_vlan on dot1q packets.
Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs
that can be matched. This allows us to preserve backwards compatibility.
Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN
handling
Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Co-authored-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Upstream commit:
commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
Author: Jarno Rajahalme <jarno@ovn.org>
Date: Thu Feb 9 11:21:59 2017 -0800
openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch squashes in minimal amount of OVS userspace code to not
break the build. Later patches contain the full userspace support.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Add DPIF-level infrastructure for meters. Allow meter_set to modify
the meter configuration (e.g. set the burst size if unspecified).
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Upstream commit:
commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a
Author: Jiri Benc <jbenc@redhat.com>
Date: Thu Nov 10 16:28:23 2016 +0100
openvswitch: add Ethernet push and pop actions
It's not allowed to push Ethernet header in front of another Ethernet
header.
It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer notes]
Fix build with the upstream commit by folding in the required switch
case enum handlers.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
When adding userspace datapath clone action, the corresponding odp
actions parser and unit tests were missing. This patch adds them.
Reported-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Add support for userspace datapath clone action. The clone action
provides an action envelope to enclose an action list.
For example, with actions A, B, C and D, and an action list:
A, clone(B, C), D
The clone action will ensure that:
- D will see the same packet, and any meta states, such as flow, as
action B.
- D will be executed regardless whether B, or C drops a packet. They
can only drop a clone.
- When B drops a packet, clone will skip all remaining actions
within the clone envelope. This feature is useful when we add
meter action later: The meter action can be implemented as a
simple action without its own envolop (unlike the sample action).
When necessary, the flow translation layer can enclose a meter action
in clone.
The clone action is very similar with the OpenFlow clone action.
This is by design to simplify vswitchd flow translation logic.
Without datapath clone, vswitchd simulate the effect by inserting
datapath actions to "undo" clone actions. The above flow will be
translated into A, B, C, -C, -B, D.
However, there are two issues:
- The resulting datapath action list may be longer without using
clone.
- Some actions, such as NAT may not be possible to reverse.
This patch implements clone() simply with packet copy. The performance
can be improved with later patches, for example, to delay or avoid
packet copy if possible. It seems datapath should have enough context
to carry out such optimization without the userspace context.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Code is simplified when the ODP keys use the same type as the struct
flow for the IPv6 addresses. As the change is facilitated by
extract-odp-netlink-h, this change only affects the userspace. We
already do the same for the ethernet addresses.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
This patch fixes problems with MPLS handling related to patch ports
and group buckets.
If a group bucket or a peer bridge across a patch port pushes MPLS
headers to a non-MPLS packet and outputs, the flow translation after
returning from the group bucket or patch port would undo the packet
transformations so that the processing could continue with the packet
as it was before entering the patch port. There were two problems
with this:
1. As part of the first MPLS push on a non-MPLS packet, the flow
translation would first clear the L3/4 headers of the 'flow' to mark
those fields invalid. Later, when committing 'flow' changes to
datapath actions before output, the necessary datapath MPLS actions
are created and the corresponding changes updated to the 'base flow'.
This was done using the same flow_push_mpls() function that clears
the L2/3 headers, so also the 'base flow' L2/3 headers were cleared.
Then, when translation returns from a patch port or group bucket, the
original 'flow' is restored, now showing no sign of the MPLS labels.
Since the 'base flow' now has the MPLS labels, following translations
know to issue MPLS POP actions before any output actions. However, as
part of checking for changes to IP headers we test that the IP
protocol type was not changed. But now the 'base flow's 'nw_proto'
field is zero and an assert fail crashes OVS.
This is solved by not clearing the L3/4 fields of the 'base
flow'. This allows the processing after the patch port to continue
with L3/4 fields as if no MPLS was done, after first issuing the
necessary MPLS POP actions.
2. IP header updates were done before the MPLS POP actions were
issued. This caused incorrect packet output after, e.g., group action
or patch port. For example, with actions:
group 1234: all bucket=push_mpls,output:LOCAL
ip actions=group:1234,dec_ttl,output:LOCAL,output:LOCAL
the dec_ttl would only be executed before the last output to LOCAL,
since at the time of committing IP changes after the group action the
packet was still an MPLS packet.
This is solved by checking the dl_type of both 'flow' and 'base flow'
and issuing MPLS actions if they can transform the packet from an MPLS
packet to a non-MPLS packet. For an IP packet the change in ttl can
then be correctly committed before the last two output actions.
Two test cases are added to prevent future regressions.
Reported-by: Thomas Morin <thomas.morin@orange.com>
Suggested-by: Takashi YAMAMOTO <yamamoto@ovn.org>
Fixes: 8bfd0fdac ("Enhance userspace support for MPLS, for up to 3 labels.")
Fixes: 1b035ef20 ("mpls: Allow l3 and l4 actions to prior to a push_mpls action")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: YAMAMOTO Takashi <yamamoto@ovn.org>
Before Open vSwitch 2.5.90, IPFIX reports from Open vSwitch didn't include
whether the packet was ingressing or egressing the switch. Starting in
OVS 2.5.90, this information was available but only accurate if the action
included a port number that indicated a tunnel. Conflating these two does
not always make sense (not every packet involves a tunnel!), so this patch
makes it possible for the sample action to simply say whether it's for
ingress or egress.
This is difficult to test, since the "tests" directory of OVS does not have
a proper IPFIX listener. This passes those tests, plus a couple that just
verify that the actions are properly parsed and formatted. Benli did test
it end-to-end in a VMware use case.
Requested-by: Benli Ye <daniely@vmware.com>
Tested-by: Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Simon Horman <simon.horman@netronome.com>
This helper is a little tidier than the alternative. Use it treewide.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Simon Horman <simon.horman@netronome.com>
When using tunnel TLVs (at the moment, this means Geneve options), a
controller must first map the class and type onto an appropriate OXM
field so that it can be used in OVS flow operations. This table is
managed using OpenFlow extensions.
The original code that added support for TLVs made the mapping table
global as a simplification. However, this is not really logically
correct as the OpenFlow management commands are operating on a per-bridge
basis. This removes the original limitation to make the table per-bridge.
One nice result of this change is that it is generally clearer whether
the tunnel metadata is in datapath or OpenFlow format. Rather than
allowing ad-hoc format changes and trying to handle both formats in the
tunnel metadata functions, the format is more clearly separated by function.
Datapaths (both kernel and userspace) use datapath format and it is not
changed during the upcall process. At the beginning of action translation,
tunnel metadata is converted to OpenFlow format and flows and wildcards
are translated back at the end of the process.
As an additional benefit, this change improves performance in some flow
setup situations by keeping the tunnel metadata in the original packet
format in more cases. This helps when copies need to be made as the amount
of data touched is only what is present in the packet rather than the
maximum amount of metadata supported.
Co-authored-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Upstream commit:
commit b46f6ded906ef0be52a4881ba50a084aeca64d7e
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
libnl: nla_put_be64(): align on a 64-bit area
nla_data() is now aligned on a 64-bit area.
A temporary version (nla_put_be64_32bit()) is added for nla_put_net64().
This function is removed in the next patch.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>