STT and LISP tunnel types were deprecated and marked for removal in
the following commits in the OVS 3.5 release:
3b37a6154a59 ("netdev-vport: Deprecate STT tunnel port type.")
8d7ac031c03d ("netdev-vport: Deprecate LISP tunnel port type.")
Main reasons were that STT was rejected in upstream kernel and the
LISP was never upstreamed as well and doesn't really have a supported
implementation. Both protocols also appear to have lost their former
relevance.
Removing both now. While at it, also fixing some small documentation
issues and comments.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Alin Serdean <aserdean@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add support for parsing and formatting the new action.
Also, flag OVS_ACTION_ATTR_SAMPLE as requiring datapath assistance if it
contains a nested OVS_ACTION_ATTR_PSAMPLE. The reason is that the
sampling rate from the parent "sample" is made available to the nested
"psample" by the kernel.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Changed sFlowRcvrTimeout to a uint32_t to avoid time_t warnings
reported by Coverity. A uint32_t is more than large enough as
this is a (seconds) tick counter and OVS is not even using this.
Fixes: c72e245a0e2c ("Add InMon's sFlow Agent library to the build system.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
This is prep for adding a different OVS_ACTION_ATTR_ enum value. This
action, OVS_ACTION_ATTR_DEC_TTL, is not actually implemented. However,
to make -Werror happy we must add a case to all existing switches.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Eric Garver <eric@garver.life>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, the netdev's speed is being calculated by taking the link's
feature bits (using netdev_get_features()) and transforming them into
bps.
This mechanism can be both inaccurate and difficult to maintain, mainly
because we currently use the feature bits supported by OpenFlow which
would have to be extended to support all new feature bits of all netdev
implementations while keeping the OpenFlow API intact.
In order to expose the link speed accurately for all current and future
hardware, add a new netdev API call that allows the implementations to
provide the current and maximum link speeds in Mbps.
Internally, the logic to get the maximum supported speed still relies on
feature bits so it might still get out of sync in the future. However,
the maximum configurable speed is not used as much as the current speed
and these feature bits are not exposed through the netdev interface so
it should be easier to add more.
Use this new function instead of netdev_get_features() where the link
speed is needed.
As a consequence of this patch, link speeds of cards is properly
reported (internally in OVSDB) even if not supported by OpenFlow.
A test verifies this behavior using a tap device.
Also, in order to avoid using the old, this patch adds a checkpatch.py
warning if the old API is used.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Clustered OVSDB allows to use DNS names as addresses of raft members.
However, if DNS resolution fails during the initial database read,
this causes a fatal failure and exit of the ovsdb-server process.
Also, if DNS name of a joining server is not resolvable for one of the
followers, this follower will reject append requests for a new server
to join until the name is successfully resolved. This makes a follower
effectively non-functional while DNS is unavailable.
To fix the problem relax the address verification. Allowing validation
to pass if only name resolution failed and the address is valid
otherwise. This will allow addresses to be added to the database, so
connections could be established later when the DNS is available.
Additionally fixing missed initialization of the dns-resolve module.
Without it, DNS requests are blocking. This causes unexpected delays
in runtime.
Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving")
Reported-at: https://bugzilla.redhat.com/2055097
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Upstream commit:
commit 1926407a4ab0e59d5a27bed7b82029b356d80fa0
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Wed Mar 9 23:20:33 2022 +0100
net: openvswitch: fix uAPI incompatibility with existing user space
Few years ago OVS user space made a strange choice in the commit [1]
to define types only valid for the user space inside the copy of a
kernel uAPI header. '#ifndef __KERNEL__' and another attribute was
added later.
This leads to the inevitable clash between user space and kernel types
when the kernel uAPI is extended. The issue was unveiled with the
addition of a new type for IPv6 extension header in kernel uAPI.
When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the
older user space application, application tries to parse it as
OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as
malformed. Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with
every IPv6 packet that goes to the user space, IPv6 support is fully
broken.
Fixing that by bringing these user space attributes to the kernel
uAPI to avoid the clash. Strictly speaking this is not the problem
of the kernel uAPI, but changing it is the only way to avoid breakage
of the older user space applications at this point.
These 2 types are explicitly rejected now since they should not be
passed to the kernel. Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved
out from the '#ifdef __KERNEL__' as there is no good reason to hide
it from the userspace. And it's also explicitly rejected now, because
it's for in-kernel use only.
Comments with warnings were added to avoid the problem coming back.
(1 << type) converted to (1ULL << type) to avoid integer overflow on
OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now.
[1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
Fixes: 28a3f0601727 ("net: openvswitch: IPv6: Add IPv6 extension header support")
Link: https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8a5f@nvidia.com
Link: beb75a40fd
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20220309222033.3018976-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not adding OVS_KEY_ATTR_IPV6_EXTHDRS in this commit as this is not
necessary. Will be added along with the actual userspace
implementation.
This change should help avoiding incompatibility issues in the future.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The encap & decap actions are extended to support MPLS packet type.
Encap & decap actions adds and removes MPLS header at start of the
packet.
The existing PUSH MPLS & POP MPLS actions inserts & removes MPLS
header between ethernet header and the IP header. Though this behaviour
is fine for L3 VPN where an IP packet is encapsulated inside a MPLS
tunnel, it does not suffice the L2 VPN requirements. In L2 VPN the
ethernet packets must be encapsulated inside MPLS tunnel.
In this change the encap & decap actions are extended to support MPLS
packet type. The encap & decap adds and removes MPLS header at the
start of packet as depicted below.
Encapsulation:
Actions - encap(mpls),encap(ethernet)
Incoming packet -> | ETH | IP | Payload |
1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847]
Outgoing packet -> | MPLS | ETH | Payload|
2 Actions - encap(ethernet) [ Datapath action - push_eth ]
Outgoing packet -> | ETH | MPLS | ETH | Payload|
Decapsulation:
Incoming packet -> | ETH | MPLS | ETH | IP | Payload |
Actions - decap(),decap(packet_type(ns=0,type=0))
1 Actions - decap() [Datapath action - pop_eth)
Outgoing packet -> | MPLS | ETH | IP | Payload|
2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558]
Outgoing packet -> | ETH | IP | Payload|
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
sFlow agent may not exist while calling dpif_sflow_received. The sFlow
may be deleted in another thread, eg, by user, which will cause crash.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The new term is "member".
Most of these changes should not change user-visible behavior. One
place where they do is in "ovs-ofctl dump-flows", which will now output
"members:..." inside "bundle" actions instead of "slaves:...". I don't
expect this to cause real problems in most systems. The old syntax
is still supported on input for backward compatibility.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Problem:
In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:
1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.
2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.
Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.
Proposed optimization:
This proposal introduces a new load-balancing output action instead of
recirculation.
Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.
Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.
Instead, we will now generate action as
'lb_output(<bond id>)'
This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).
Example:
Current scheme:
With 8 UDP flows (with random UDP src port):
flow-dump from pmd on cpu core: 2
recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)
recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1
New scheme:
We can do with a single flow entry (for any number of new flows):
in_port(7),<...> actions:lb_output(1)
A new CLI has been added to dump datapath bond cache as given below.
# ovs-appctl dpif-netdev/bond-show [dp]
Bond cache:
bond-id 1 :
bucket 0 - slave 2
bucket 1 - slave 1
bucket 2 - slave 2
bucket 3 - slave 1
Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently OVS maintains explicit packet drop/error counters only on port
level. Packets that are dropped as part of normal OpenFlow processing
are counted in flow stats of “drop” flows or as table misses in table
stats. These can only be interpreted by controllers that know the
semantics of the configured OpenFlow pipeline. Without that knowledge,
it is impossible for an OVS user to obtain e.g. the total number of
packets dropped due to OpenFlow rules.
Furthermore, there are numerous other reasons for which packets can be
dropped by OVS slow path that are not related to the OpenFlow pipeline.
The generated datapath flow entries include a drop action to avoid
further expensive upcalls to the slow path, but subsequent packets
dropped by the datapath are not accounted anywhere.
Finally, the datapath itself drops packets in certain error situations.
Also, these drops are today not accounted for.This makes it difficult
for OVS users to monitor packet drop in an OVS instance and to alert a
management system in case of a unexpected increase of such drops.
Also OVS trouble-shooters face difficulties in analysing packet drops.
With this patch we implement following changes to address the issues
mentioned above.
1. Identify and account all the silent packet drop scenarios
2. Display these drops in ovs-appctl coverage/show
Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Co-authored-by: Keshav Gupta <keshugupta1@gmail.com>
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Signed-off-by: Keshav Gupta <keshugupta1@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds a new action 'check_pkt_larger' which checks if the
packet is larger than the given size and stores the result in the
destination register.
Usage: check_pkt_larger(len)->REGISTER
Eg. match=...,actions=check_pkt_larger(1442)->NXM_NX_REG0[0],next;
This patch makes use of the new datapath action - 'check_pkt_len'
which was recently added in the commit [1].
At the start of ovs-vswitchd, datapath is probed for this action.
If the datapath action is present, then 'check_pkt_larger'
makes use of this datapath action.
Datapath action 'check_pkt_len' takes these nlattrs
* OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
* OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
to apply if the packet length is greater than the specified 'pkt_len'
* OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
actions to apply if the packet length is lesser or equal to the
specified 'pkt_len'.
Let's say we have these flows added to an OVS bridge br-int
table=0, priority=100 in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1)
table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3
table=1, priority=100,in_port=1,ip,actions=output:4
Then the action 'check_pkt_larger' will be translated as
- check_pkt_len(size=100,gt(3),le(4))
datapath will check the packet length and if the packet length is greater than 100,
it will output to port 3, else it will output to port 4.
In case, datapath doesn't support 'check_pkt_len' action, the OVS action
'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added.
This OVS action is intended to be used by OVN to check the packet length
and generate an ICMP packet with type 3, code 4 and next hop mtu
in the logical router pipeline if the MTU of the physical interface
is lesser than the packet length. More information can be found here [2]
[1] - 4d5ec89fc8
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Ben Pfaff <blp@ovn.org>
CC: Gregory Rose <gvrose8192@gmail.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
As a side effect, this also reduces a lot of log messages' severities from
ERR to WARN. They just didn't seem like messages that in general reported
anything that would prevent functioning.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching copying and setting ARP fields. In IPv6 ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.
The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these user
cannot implement all ICMPv6 ND procedures for IPv6 support.
This commit adds additional OXM fields to OVS for ICMPv6 “ND option type“
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Co-authored-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com>
Signed-off-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When 'make check' is called by the mock rpm build (which disables networking),
the test "ovn-nbctl: LBs - daemon" fails when it runs the command
"ovn-nbctl lb-add lb0 30.0.0.1a 192.168.10.10:80,192.168.10.20:80". ovn-nbctl
extracts the vip by calling the socket util function 'inet_parse_active()',
and this function blocks when libunbound function ub_resolve() is called
further down. ub_resolve() is a blocking function without timeout and all the
ovs/ovn utilities use this function.
As reported by Timothy Redaelli, the issue can also be reproduced by running
the below commands
$ sudo unshare -mn -- sh -c 'ip addr add dev lo 127.0.0.1 && \
mount --bind /dev/null /etc/resolv.conf && runuser $SUDO_USER'
$ make sandbox SANDBOXFLAGS="--ovn"
$ ovn-nbctl -vsocket_util:off lb-add lb0 30.0.0.1a \
192.168.10.10:80,192.168.10.20:80
To address this issue, this patch adds a new bool argument 'resolve_host' to
the function inet_parse_active() to resolve the host only if it is 'true'.
ovn-nbctl/ovn-northd will pass 'false' when it calls this function to parse
the load balancer values.
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1641672
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If an agent address is not provided, OVS tries to choose a source
address based on the source IP that would be used to connect to the
sFlow collector. The code previously set the agent address to the
collector's address instead of using the calculated source address.
This patch properly uses the source address.
Reported-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
It's always been OVS coding style to use spaces rather than tabs for
indentation, but some tabs have snuck in over time. This commit converts
them to spaces.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
It makes OVS native tunneling honor tunnel-specified source addresses,
in the same way that Linux kernel tunneling honors them.
This patch made valid tun_src specified by flow-action can be used for
tunnel_src of packet. add a "local" property for a route entry and enhance
the priority of local route higher than user route.
Like the kernel space when lookup the route, if there are tun_src specified
by flow-action or port options. Check the tun_src wheather is a local
address, then lookup the route.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: frank.zeng <frank.zeng@ucloud.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
At least for testing purposes, and perhaps in production, it is useful to
be able to fix the agent IP address directly, rather that indirecting it
through a device name or the routing table.
This commit uses this feature to fix the agent IP address used in the unit
tests. This will be particularly useful in an upcoming commit that
disables the use of the system routing table in the unit tests, to make
the tests' results independent of the host's routes.
CC: Neil McKee <neil.mckee@inmon.com>
Tested-By: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until now, dpif_sflow_read_actions() has ignored actions inside clone.
This means that sflow missed tnl_push actions inside clone, which OVS
now uses to avoid tx recirculation. This commit fixes the problem
by making dpif_sflow_read_actions() recursively process actions inside
clone.
In addition, some sflow data needs to be stored and restored in
ofproto-dpif-xlate when native_tunnel_output() is invoked. Otherwise the
output action of underlay bridge is getting counted too when sFlow is set
on the overlay bridge.
Both bugs are connected to sflows and were introduced by the commit in
the "Fixes:" tag below.
Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
This supports using the ct_clear action in the kernel datapath. To
preserve compatibility with current ct_clear behavior on old kernels, we
only pass this action down to the datapath if a probe reveals the
datapath actually supports it.
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
This patch changes OVS_KEY_ATTR_NSH
to nested attribute and adds three new NSH sub attribute keys:
OVS_NSH_KEY_ATTR_BASE: for length-fixed NSH base header
OVS_NSH_KEY_ATTR_MD1: for length-fixed MD type 1 context
OVS_NSH_KEY_ATTR_MD2: for length-variable MD type 2 metadata
Its intention is to align to NSH kernel implementation.
NSH match fields, set and PUSH_NSH action all use the below
nested attribute format:
OVS_KEY_ATTR_NSH begin
OVS_NSH_KEY_ATTR_BASE
OVS_NSH_KEY_ATTR_MD1
OVS_KEY_ATTR_NSH end
or
OVS_KEY_ATTR_NSH begin
OVS_NSH_KEY_ATTR_BASE
OVS_NSH_KEY_ATTR_MD2
OVS_KEY_ATTR_NSH end
In addition, NSH encap and decap actions are renamed as push_nsh
and pop_nsh to meet action naming convention.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit adds translation and netdev datapath support for generic
encap and decap actions for the NSH MD1 header. The generic encap and
decap actions are mapped to specific encap_nsh and decap_nsh actions
in the datapath.
The translation follows that general scheme that decap() of an NSH
packet triggers recirculation after decapsulation, while encap(nsh)
just modifies struct flow and sets the ctx->pending_encap flag to
generate the encap_nsh action at the next commit to be able to include
subsequent set_field actions for NSH headers.
Support for the flexible MD2 format using TLV properties is foreseen
in encap(nsh), but not yet fully implemented.
The CLI syntax for encap of NSH is
encap(nsh(md_type=1))
encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds support for NSH packet header fields to the OVS
control plane and the userspace datapath. Initially we support the
fields of the NSH base header as defined in
https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt
and the fixed context headers specified for metadata format MD1.
The variable length MD2 format is parsed but the TLV context headers
are not yet available for matching.
The NSH fields are modelled as experimenter fields with the dedicated
experimenter class 0x005ad650 proposed for NSH in ONF. The following
fields are defined:
NXOXM code ofctl name Size Comment
=====================================================================
NXOXM_NSH_FLAGS nsh_flags 8 Bits 2-9 of 1st NSH word
(0x005ad650,1)
NXOXM_NSH_MDTYPE nsh_mdtype 8 Bits 16-23
(0x005ad650,2)
NXOXM_NSH_NEXTPROTO nsh_np 8 Bits 24-31
(0x005ad650,3)
NXOXM_NSH_SPI nsh_spi 24 Bits 0-23 of 2nd NSH word
(0x005ad650,4)
NXOXM_NSH_SI nsh_si 8 Bits 24-31
(0x005ad650,5)
NXOXM_NSH_C1 nsh_c1 32 Maskable, nsh_mdtype==1
(0x005ad650,6)
NXOXM_NSH_C2 nsh_c2 32 Maskable, nsh_mdtype==1
(0x005ad650,7)
NXOXM_NSH_C3 nsh_c3 32 Maskable, nsh_mdtype==1
(0x005ad650,8)
NXOXM_NSH_C4 nsh_c4 32 Maskable, nsh_mdtype==1
(0x005ad650,9)
Co-authored-by: Johnson Li <johnson.li@intel.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Using the correct type reduces the need for type conversions.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Reviewed-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Ports have a new layer3 attribute if they send/receive L3 packets.
The packet_type included in structs dp_packet and flow is considered in
ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.
A dummy ethernet header is pushed to L3 packets received from L3 ports
before the the pipeline processing starts. The ethernet header is popped
before sending a packet to a L3 port.
For datapath ports that can receive L2 or L3 packets, the packet_type
becomes part of the flow key for datapath flows and is handled
appropriately in dpif-netdev.
In the 'else' branch in flow_put_on_pmd() function, the additional check
flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls
lookup is sufficient to uniquely identify a flow and b) it caused false
negatives because the flow in netdev->flow may not properly masked.
In dpif_netdev_flow_put() we now use the same method for constructing the
netdev_flow_key as the one used when adding the flow to the dplcs to make sure
these always match. The function netdev_flow_key_from_flow() used so far was
not only inefficient but sometimes caused mismatches and subsequent flow
update failures.
The kernel datapath does not support the packet_type match field.
Instead it encodes the packet type implictly by the presence or absence of
the Ethernet attribute in the flow key and mask.
This patch filters the PACKET_TYPE attribute out of netlink flow key and
mask to be sent to the kernel datapath.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add support for actions push_eth and pop_eth to the netdev datapath and
the supporting libraries. This patch relies on the support for these actions
in the kernel datapath to be present.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Flow key handling changes:
- Add VLAN header array in struct flow, to record multiple 802.1q VLAN
headers.
- Add dpif multi-VLAN capability probing. If datapath supports
multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
Refactor VLAN handling in dpif-xlate:
- Introduce 'xvlan' to track VLAN stack during flow processing.
- Input and output VLAN translation according to the xbundle type.
Push VLAN action support:
- Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
- Support push_vlan on dot1q packets.
Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs
that can be matched. This allows us to preserve backwards compatibility.
Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN
handling
Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Co-authored-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Expose existing netdev stats via sFlow.
Export sFlow ETHERNET structure with available counters.
Map existing stats to counters in the GENERIC INTERFACE
sFlow structure.
Adjust unit test to accommodate these new counters.
Signed-off-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Acked-by: Neil McKee <neil.mckee@inmon.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Upstream commit:
commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
Author: Jarno Rajahalme <jarno@ovn.org>
Date: Thu Feb 9 11:21:59 2017 -0800
openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch squashes in minimal amount of OVS userspace code to not
break the build. Later patches contain the full userspace support.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Add DPIF-level infrastructure for meters. Allow meter_set to modify
the meter configuration (e.g. set the burst size if unspecified).
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Upstream commit:
commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a
Author: Jiri Benc <jbenc@redhat.com>
Date: Thu Nov 10 16:28:23 2016 +0100
openvswitch: add Ethernet push and pop actions
It's not allowed to push Ethernet header in front of another Ethernet
header.
It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer notes]
Fix build with the upstream commit by folding in the required switch
case enum handlers.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
OVS router is basically partial copy of linux kernel FIB.
kernel routing table uses skb-mark along with usual routing
parameters. Following patch brings in support for skb-mark
to ovs-router so that we can lookup route for given skb-mark.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Add support for userspace datapath clone action. The clone action
provides an action envelope to enclose an action list.
For example, with actions A, B, C and D, and an action list:
A, clone(B, C), D
The clone action will ensure that:
- D will see the same packet, and any meta states, such as flow, as
action B.
- D will be executed regardless whether B, or C drops a packet. They
can only drop a clone.
- When B drops a packet, clone will skip all remaining actions
within the clone envelope. This feature is useful when we add
meter action later: The meter action can be implemented as a
simple action without its own envolop (unlike the sample action).
When necessary, the flow translation layer can enclose a meter action
in clone.
The clone action is very similar with the OpenFlow clone action.
This is by design to simplify vswitchd flow translation logic.
Without datapath clone, vswitchd simulate the effect by inserting
datapath actions to "undo" clone actions. The above flow will be
translated into A, B, C, -C, -B, D.
However, there are two issues:
- The resulting datapath action list may be longer without using
clone.
- Some actions, such as NAT may not be possible to reverse.
This patch implements clone() simply with packet copy. The performance
can be improved with later patches, for example, to delay or avoid
packet copy if possible. It seems datapath should have enough context
to carry out such optimization without the userspace context.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
This helper is a little tidier than the alternative. Use it treewide.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Simon Horman <simon.horman@netronome.com>
OVS GRE IPsec tunnel support has multiple issues, Therefore
it was deprecated in OVS 2.6.
Following patch removes support for GRE IPsec and allows external
IPsec tunnel management for any type of tunnel not just GRE.
e.g. user can encrypt Geneve or VxLan traffic.
It can be done by using openflow pipeline to set skb-mark
and using IPsec keying daemons to implement IPsec tunnels.
This packet can be matched for the skb-mark to encrypt
selective tunnel traffic.
VMware-BZ: 1710701
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ansis Atteka <aatteka@ovn.org>
When using tunnel TLVs (at the moment, this means Geneve options), a
controller must first map the class and type onto an appropriate OXM
field so that it can be used in OVS flow operations. This table is
managed using OpenFlow extensions.
The original code that added support for TLVs made the mapping table
global as a simplification. However, this is not really logically
correct as the OpenFlow management commands are operating on a per-bridge
basis. This removes the original limitation to make the table per-bridge.
One nice result of this change is that it is generally clearer whether
the tunnel metadata is in datapath or OpenFlow format. Rather than
allowing ad-hoc format changes and trying to handle both formats in the
tunnel metadata functions, the format is more clearly separated by function.
Datapaths (both kernel and userspace) use datapath format and it is not
changed during the upcall process. At the beginning of action translation,
tunnel metadata is converted to OpenFlow format and flows and wildcards
are translated back at the end of the process.
As an additional benefit, this change improves performance in some flow
setup situations by keeping the tunnel metadata in the original packet
format in more cases. This helps when copies need to be made as the amount
of data touched is only what is present in the packet rather than the
maximum amount of metadata supported.
Co-authored-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Implementation of 'nullable_string_is_equal()' moved to util.c and
reused inside dpif-netdev.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The patch adds a new action to support packet truncation. The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).
One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload. Example
use case is below as well as shown in the testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
# ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to output action itself. The following
packet size of output:1 and output:2 will be intact.
# ovs-ofctl add-flow br0 \
'actions=output(port=1,max_len=100),output:1,output:2'
- The Datapath actions shows:
# Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>