mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-31 14:25:26 +00:00

Author	SHA1	Message	Date
Ilya Maximets	8842fdf1b3	netdev-offload: Use dpif type instead of class. There is no real difference between the 'class' and 'type' in the context of common lookup operations inside netdev-offload module because it only checks the value of pointers without using the value itself. However, 'type' has some meaning and can be used by offload providers on the initialization phase to check if this type of Flow API in pair with the netdev type could be used in particular datapath type. For example, this is needed to check if Linux flow API could be used for current tunneling vport because it could be used only if tunneling vport belongs to system datapath, i.e. has backing linux interface. This is needed to unblock tunneling offloads in userspace datapath with DPDK flow API. Acked-by: Eli Britstein <elibr@mellanox.com> Acked-by: Roni Bar Yanai <roniba@mellanox.com> Acked-by: Ophir Munk <ophirmu@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-07-08 19:07:21 +02:00
Vishal Deep Ajmera	9df65060cf	userspace: Avoid dp_hash recirculation for balance-tcp bond mode. Problem: In OVS, flows with output over a bond interface of type “balance-tcp” get translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8-bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways: 1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially Megaflow classifier (DPCLS). This is the biggest cost factor. 2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups. 3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads. Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same source MAC address. Proposed optimization: This proposal introduces a new load-balancing output action instead of recirculation. Maintain one table per-bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from ofproto layer. Use this table to load-balance flows as part of output action processing. Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port. 
Instead, we will now generate action as 'lb_output(<bond id>)' This combines hash computation (only if needed, else re-use RSS hash) and inline load-balancing over the bond. This action is used only for balance-tcp bonds in userspace datapath (the OVS kernel datapath remains unchanged). Example: Current scheme: With 8 UDP flows (with random UDP src port): flow-dump from pmd on cpu core: 2 recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1 New scheme: We can do with a single flow entry (for any number of new flows): in_port(7),<...> actions:lb_output(1) A new CLI has been added to dump datapath bond cache as given below. # ovs-appctl dpif-netdev/bond-show [dp] Bond cache: bond-id 1 : bucket 0 - slave 2 bucket 1 - slave 1 bucket 2 - slave 2 bucket 3 - slave 1 Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Tested-by: Matteo Croce <mcroce@redhat.com> Tested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-06-22 13:11:51 +02:00
Ilya Maximets	342b8904ab	dpif: Fix dp_extra_info leak by reworking the allocation scheme. dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't fit the dump filter while executing dpctl/dump-flows and also while executing dpctl/get-flow. This is already a 3rd attempt to fix all the leaks and incorrect usage of this string that definitely indicates poor initial design of the feature. Flow dump/get documentation clearly states that the caller does not own the data provided in dpif_flow. Datapath still owns all the data and promises to not free/modify it until the next quiescent period, however we're requesting the caller to free 'dp_extra_info' and this obviously breaks the rules. This patch fixes the issue by storing 'dp_extra_info' within 'struct dp_netdev_flow' making datapath to own it. 'dp_netdev_flow' is RCU-protected, so it will be valid until the next quiescent period. Fixes: `0e8f5c6a38` ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Tested-by: Emma Finn <emma.finn@intel.com> Acked-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-01-27 21:20:01 +01:00
Ilya Maximets	d7b55c5c94	dpif: Fix leak and usage of uninitialized dp_extra_info. 'dpif_probe_feature'/'revalidate' doesn't free the 'dp_extra_info' string. Also, all the implementations of dpif_flow_get() should initialize the value to avoid printing/freeing of random memory. 30 bytes in 1 blocks are definitely lost in loss record 323 of 889 at 0x483AD19: realloc (vg_replace_malloc.c:836) by 0xDDAD89: xrealloc (util.c:149) by 0xCE1609: ds_reserve (dynamic-string.c:63) by 0xCE1A90: ds_put_format_valist (dynamic-string.c:161) by 0xCE19B9: ds_put_format (dynamic-string.c:142) by 0xCCCEA9: dp_netdev_flow_to_dpif_flow (dpif-netdev.c:3170) by 0xCCD2DD: dpif_netdev_flow_get (dpif-netdev.c:3278) by 0xCCEA0A: dpif_netdev_operate (dpif-netdev.c:3868) by 0xCDF81B: dpif_operate (dpif.c:1361) by 0xCDEE93: dpif_flow_get (dpif.c:1002) by 0xCDECF9: dpif_probe_feature (dpif.c:962) by 0xC635D2: check_recirc (ofproto-dpif.c:896) by 0xC65C02: check_support (ofproto-dpif.c:1567) by 0xC63274: open_dpif_backer (ofproto-dpif.c:818) by 0xC65E3E: construct (ofproto-dpif.c:1605) by 0xC4D436: ofproto_create (ofproto.c:549) by 0xC3931A: bridge_reconfigure (bridge.c:877) by 0xC3FEAC: bridge_run (bridge.c:3324) by 0xC4551D: main (ovs-vswitchd.c:127) CC: Emma Finn <emma.finn@intel.com> Fixes: `0e8f5c6a38` ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2020-01-20 17:51:16 +01:00
Ilya Maximets	7a5e0ee7cc	dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash. Current implementation of dpif_flow_hash() doesn't depend on datapath interface and only complicates the callers by forcing them to figure out what is their current 'dpif'. If we'll need different hashing for different 'dpif's we'll implement an API for dpif-providers and each dpif implementation will be able to use their local function directly without calling it via dpif API. This change will allow us to not store 'dpif' pointer in the userspace datapath implementation which is broken and will be removed in next commits. This patch moves dpif_flow_hash() to odp-util module and replaces unused odp_flow_key_hash() by it, along with removing of unused 'dpif' argument. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2020-01-08 16:02:37 +01:00
Anju Thomas	a13a020975	userspace: Improved packet drop statistics. Currently OVS maintains explicit packet drop/error counters only on port level. Packets that are dropped as part of normal OpenFlow processing are counted in flow stats of “drop” flows or as table misses in table stats. These can only be interpreted by controllers that know the semantics of the configured OpenFlow pipeline. Without that knowledge, it is impossible for an OVS user to obtain e.g. the total number of packets dropped due to OpenFlow rules. Furthermore, there are numerous other reasons for which packets can be dropped by OVS slow path that are not related to the OpenFlow pipeline. The generated datapath flow entries include a drop action to avoid further expensive upcalls to the slow path, but subsequent packets dropped by the datapath are not accounted anywhere. Finally, the datapath itself drops packets in certain error situations. Also, these drops are today not accounted for. This makes it difficult for OVS users to monitor packet drop in an OVS instance and to alert a management system in case of an unexpected increase of such drops. Also OVS trouble-shooters face difficulties in analysing packet drops. With this patch we implement following changes to address the issues mentioned above. 1. Identify and account all the silent packet drop scenarios 2. Display these drops in ovs-appctl coverage/show Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Co-authored-by: Keshav Gupta <keshugupta1@gmail.com> Signed-off-by: Anju Thomas <anju.thomas@ericsson.com> Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Signed-off-by: Keshav Gupta <keshugupta1@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-01-07 17:01:42 +01:00
Paul Blakey	dcdcad68c6	dpif: Add support to set user features This enables user features on the kernel datapath via the DP_CMD_SET command, and also retrieves them to check for actual support and not just an older kernel ignoring the requested features. This will be used in next patch to enable recirc_id sharing with tc. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-12-22 11:54:40 +01:00
Ilya Maximets	f87c135706	vswitchd: Always cleanup userspace datapath. 'netdev' datapath is implemented within ovs-vswitchd process and can not exist without it, so it should be gracefully terminated with a full cleanup of resources upon ovs-vswitchd exit. This change forces dpif cleanup for 'netdev' datapath regardless of passing '--cleanup' to 'ovs-appctl exit'. Such a solution allows to not pass this additional option every time for userspace datapath installations and also allows to not terminate system datapath in setups where both datapaths run at the same time. The main part is that dpif_port_del() will lead to netdev_close() and subsequent netdev_class->destroy(dev) which will stop HW NICs and free their resources. For vhost-user interfaces it will invoke vhost driver unregistering with a properly closed vhost-user connection. For upcoming AF_XDP netdev this will allow to gracefully destroy xdp sockets and unload xdp programs from linux interfaces. Another important thing is that port deletion will also trigger flushing of flows offloaded to HW NICs. Exception made for 'internal' ports that could have user ip/route configuration. These ports will not be removed without '--cleanup'. This change fixes OVS disappearing from the DPDK point of view (keeping HW NICs improperly configured, sudden closing of vhost-user connections) and will help with linux devices clearing with upcoming AF_XDP netdev support. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ben Pfaff <blp@ovn.org>	2019-07-02 12:24:47 +03:00
Numan Siddique	5b34f8fc3b	Add a new OVS action check_pkt_larger This patch adds a new action 'check_pkt_larger' which checks if the packet is larger than the given size and stores the result in the destination register. Usage: check_pkt_larger(len)->REGISTER Eg. match=...,actions=check_pkt_larger(1442)->NXM_NX_REG0[0],next; This patch makes use of the new datapath action - 'check_pkt_len' which was recently added in the commit [1]. At the start of ovs-vswitchd, datapath is probed for this action. If the datapath action is present, then 'check_pkt_larger' makes use of this datapath action. Datapath action 'check_pkt_len' takes these nlattrs * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions to apply if the packet length is greater than the specified 'pkt_len' * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested actions to apply if the packet length is lesser or equal to the specified 'pkt_len'. Let's say we have these flows added to an OVS bridge br-int table=0, priority=100 in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1) table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3 table=1, priority=100,in_port=1,ip,actions=output:4 Then the action 'check_pkt_larger' will be translated as - check_pkt_len(size=100,gt(3),le(4)) datapath will check the packet length and if the packet length is greater than 100, it will output to port 3, else it will output to port 4. In case, datapath doesn't support 'check_pkt_len' action, the OVS action 'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added. This OVS action is intended to be used by OVN to check the packet length and generate an ICMP packet with type 3, code 4 and next hop mtu in the logical router pipeline if the MTU of the physical interface is lesser than the packet length. 
More information can be found here [2] [1] - `4d5ec89fc8` [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> CC: Ben Pfaff <blp@ovn.org> CC: Gregory Rose <gvrose8192@gmail.com> Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-04-22 12:56:50 -07:00
John Hurley	608ff46aaf	ovs-tc: offload datapath rules matching on internal ports Rules applied to OvS internal ports are not represented in TC datapaths. However, it is possible to support rules matching on internal ports in TC. The start_xmit ndo of OvS internal ports directs packets back into the OvS kernel datapath where they are rematched with the ingress port now being that of the internal port. Due to this, rules matching on an internal port can be added as TC filters to an egress qdisc for these ports. Allow rules applied to internal ports to be offloaded to TC as egress filters. Rules redirecting to an internal port are also offloaded. These are supported by the redirect ingress functionality applied in an earlier patch. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-04-10 13:55:59 +02:00
Flavio Leitner	9da3207af7	Revert "ofproto-dpif: Let the dpif report when a port is a duplicate." This reverts commit `7521e0cf9e`. This patch introduced a regression in OSP environments using internal ports in other netns. Their networking configuration is lost when the service is restarted because the ports are recreated now. Before the patch it checked using netlink if the port with a specific "name" was already there. The check is a lookup in all ports attached to the DP regardless of the port's netns. After the patch it relies on the kernel to identify that situation. Unfortunately the only protection there is register_netdevice() which fails only if the port with that name exists in the current netns. If the port is in another netns, it will get a new dp_port and because of that userspace will delete the old port. At this point the original port is gone from the other netns and there is a fresh port in the current netns. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-01-25 13:20:13 -08:00
Ilya Maximets	1270b6e52c	treewide: Wider use of packet batch APIs. This patch replaces most of direct accesses to the dp_packet_batch internal components by appropriate APIs. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-12-11 20:18:26 +00:00
Sriharsha Basavapatna	e6530a8d5d	dpif: Restore a few lines with form feed characters A few lines with form feed characters (ASCII: ^L) were accidentally deleted by a recent commit to support rebalancing of offloaded flows. This patch reverts those lines. Fixes: `57924fc91c` ("revalidator: Rebalance offloaded flows") Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-10-31 13:30:39 -07:00
Sriharsha Basavapatna via dev	57924fc91c	revalidator: Rebalance offloaded flows based on the pps rate This is the third patch in the patch-set to support dynamic rebalancing of offloaded flows. The dynamic rebalancing functionality is implemented in this patch. The ukeys that are not scheduled for deletion are obtained and passed as input to the rebalancing routine. The rebalancing is done in the context of revalidation leader thread, after all other revalidator threads are done with gathering rebalancing data for flows. For each netdev that is in OOR state, a list of flows - both offloaded and non-offloaded (pending) - is obtained using the ukeys. For each netdev that is in OOR state, the flows are grouped and sorted into offloaded and pending flows. The offloaded flows are sorted in descending order of pps-rate, while pending flows are sorted in ascending order of pps-rate. The rebalancing is done in two phases. In the first phase, we try to offload all pending flows and if that succeeds, the OOR state on the device is cleared. If some (or none) of the pending flows could not be offloaded, then we start replacing an offloaded flow that has a lower pps-rate than a pending flow, until there are no more pending flows with a higher rate than an offloaded flow. The flows that are replaced from the device are added into kernel datapath. A new OVS configuration parameter "offload-rebalance", is added to ovsdb. The default value of this is "false". To enable this feature, set the value of this parameter to "true", which provides packets-per-second rate based policy to dynamically offload and un-offload flows. Note: This option can be enabled only when 'hw-offload' policy is enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow offload errors (specifically ENOSPC error this feature depends on) reported by an offloaded device are supressed by TC-Flower kernel module. 
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:52 +02:00
Ben Pfaff	769b50349f	dpif: Remove support for multiple queues per port. Commit `69c51582ff` ("dpif-netlink: don't allocate per thread netlink sockets") removed dpif-netlink support for multiple queues per port. No remaining dpif provider supports multiple queues per port, so remove infrastructure for the feature. CC: Matteo Croce <mcroce@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-09-26 15:57:06 -07:00
Gavi Teitz	a692410af0	dpctl: Expand the flow dump type filter Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-09-13 16:56:25 +02:00
Justin Pettit	8101f03fcd	dpif: Don't pass in '*meter_id' to meter_set commands. The original intent of the API appears to be that the underlying DPIF implementation would choose a local meter id. However, neither of the existing datapath meter implementations (userspace or Linux) implemented that; they expected a valid meter id to be passed in, otherwise they returned an error. This commit follows the existing implementations and makes the API somewhat cleaner. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-08-16 10:20:52 -07:00
Justin Pettit	6508c845ad	dpif: Move common meter checks into the dpif layer. Another dpif provider will soon add support for meters, so move some of the common sanity checks up into the dpif layer so that each provider doesn't need to re-implement them. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-30 13:00:49 -07:00
Justin Pettit	494a74557a	Revert "dpctl: Expand the flow dump type filter" Commit `ab15e70eb5` ("dpctl: Expand the flow dump type filter") had a number of issues with style, build breakage, and failing unit tests. The patch is being reverted so that they can be addressed. This reverts commit `ab15e70eb5`. CC: Gavi Teitz <gavi@mellanox.com> CC: Simon Horman <simon.horman@netronome.com> CC: Roi Dayan <roid@mellanox.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-25 14:17:36 -07:00
Gavi Teitz	ab15e70eb5	dpctl: Expand the flow dump type filter Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-07-25 18:16:27 +02:00
Ben Pfaff	7521e0cf9e	ofproto-dpif: Let the dpif report when a port is a duplicate. The port_add() function checks whether the port about to be added to the dpif is already present and adds it only if it is not. This duplicates a check also present (and necessary) in each dpif and races with it as well. When a dpif has a large number of ports, the check can be expensive (it is not efficiently implemented). It would be nice to make the check cheaper, but it also seems reasonable to do as done in this patch and just let the dpif report the duplication. Reported-by: Haifeng Lin <haifeng.lin@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-07-05 14:59:53 -07:00
Ben Pfaff	5a0e4aec1a	treewide: Convert leading tabs to spaces. It's always been OVS coding style to use spaces rather than tabs for indentation, but some tabs have snuck in over time. This commit converts them to spaces. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2018-06-11 15:32:00 -07:00
Ben Pfaff	fa37affad3	Embrace anonymous unions. Several OVS structs contain embedded named unions, like this: struct { ... union { ... } u; }; C11 standardized a feature that many compilers already implemented anyway, where an embedded union may be unnamed, like this: struct { ... union { ... }; }; This is more convenient because it allows the programmer to omit "u." in many places. OVS already used this feature in several places. This commit embraces it in several others. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Tested-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>	2018-05-25 13:36:05 -07:00
Darrell Ball	7d7ded7af7	odp-execute: Rename 'may_steal' to 'should_steal'. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-23 11:36:47 -07:00
William Tu	c6d8720137	tunnel: make tun_key_to_attr aware of tunnel type. When there is a flow rule which forwards a packet from geneve port to another tunnel port, ex: gre, the tun_metadata carried from the geneve port might affect the outgoing port. For example, the datapath action from geneve port output to gre port (1) shows: set(tunnel(tun_id=0x7b,dst=2.2.2.2,ttl=64, geneve({class=0xffff,type=0,len=4,0x123}),flags(df\|key))),1 Where the geneve(...) should not exist. When using kernel's tunnel port, this triggers an error saying: "Multiple metadata blocks provided", when there is a rule forwarding the geneve packet to vxlan/erspan tunnel port. A userspace test case using geneve and gre also demonstrates the issue. The patch makes the tun_key_to_attr aware of the tunnel type. So only the relevant output tunnel's options are set. Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com> Signed-off-by: William Tu <u9012063@gmail.com> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-14 16:21:03 -07:00
Ben Pfaff	0d71302e36	ofp-util, ofp-parse: Break up into many separate modules. ofp-util had been far too large and monolithic for a long time. This commit breaks it up into units that make some logical sense. It also moves the pieces of ofp-parse that were specific to each unit into the relevant unit. Most of this commit is just moving code around. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-02-13 10:43:13 -08:00
Eric Garver	1fe178d251	dpif: Add support for OVS_ACTION_ATTR_CT_CLEAR This supports using the ct_clear action in the kernel datapath. To preserve compatibility with current ct_clear behavior on old kernels, we only pass this action down to the datapath if a probe reveals the datapath actually supports it. Signed-off-by: Eric Garver <e@erig.me> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Justin Pettit <jpettit@ovn.org>	2018-01-20 11:16:37 -08:00
Yi Yang	f59cb331c4	nsh: rework NSH netlink keys and actions This patch changes OVS_KEY_ATTR_NSH to nested attribute and adds three new NSH sub attribute keys: OVS_NSH_KEY_ATTR_BASE: for length-fixed NSH base header OVS_NSH_KEY_ATTR_MD1: for length-fixed MD type 1 context OVS_NSH_KEY_ATTR_MD2: for length-variable MD type 2 metadata Its intention is to align to NSH kernel implementation. NSH match fields, set and PUSH_NSH action all use the below nested attribute format: OVS_KEY_ATTR_NSH begin OVS_NSH_KEY_ATTR_BASE OVS_NSH_KEY_ATTR_MD1 OVS_KEY_ATTR_NSH end or OVS_KEY_ATTR_NSH begin OVS_NSH_KEY_ATTR_BASE OVS_NSH_KEY_ATTR_MD2 OVS_KEY_ATTR_NSH end In addition, NSH encap and decap actions are renamed as push_nsh and pop_nsh to meet action naming convention. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-08 13:19:14 -08:00
Yifeng Sun	962044bf24	dpif: Fix memory leak Valgrind complains in test 2322 (ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR): 31,584 (26,496 direct, 5,088 indirect) bytes in 48 blocks are definitely lost in loss record 422 of 427 by 0x5165F4: xmalloc (util.c:120) by 0x466194: dp_packet_new (dp-packet.c:138) by 0x466194: dp_packet_new_with_headroom (dp-packet.c:148) by 0x46621B: dp_packet_clone_data_with_headroom (dp-packet.c:210) by 0x46621B: dp_packet_clone_with_headroom (dp-packet.c:170) by 0x49DD46: dp_packet_batch_clone (dp-packet.h:789) by 0x49DD46: odp_execute_clone (odp-execute.c:616) by 0x49DD46: odp_execute_actions (odp-execute.c:795) by 0x471663: dpif_execute_with_help (dpif.c:1296) by 0x473795: dpif_operate (dpif.c:1411) by 0x473E20: dpif_execute.part.21 (dpif.c:1320) by 0x428D38: packet_execute (ofproto-dpif.c:4682) by 0x41EB51: ofproto_packet_out_finish (ofproto.c:3540) by 0x41EB51: handle_packet_out (ofproto.c:3581) by 0x4233DA: handle_openflow__ (ofproto.c:8044) by 0x4233DA: handle_openflow (ofproto.c:8219) by 0x4514AA: ofconn_run (connmgr.c:1437) by 0x4514AA: connmgr_run (connmgr.c:363) by 0x41C8B5: ofproto_run (ofproto.c:1813) by 0x40B103: bridge_run__ (bridge.c:2919) by 0x4103B3: bridge_run (bridge.c:2977) by 0x406F14: main (ovs-vswitchd.c:119) the parameter dp_packet_batch is leaked when 'may_steal' is true. When dpif_execute_helper_cb is passed with a true 'may_steal', it is supposed to take the ownership of dp_packet_batch and release it when done. Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-29 13:59:15 -08:00
Ashish Varma	97459c2f01	netdev, dpif: fix the crash/assert on port delete a crash is seen in "netdev_ports_remove" when an interface is deleted and added back in the system and when the interface is part of a bridge configuration. e.g. steps: create a tap0 interface using "ip tuntap add.." add the tap0 interface to br0 using "ovs-vsctl add-port.." delete the tap0 interface from system using "ip tuntap del.." add the tap0 interface back in system using "ip tuntap add.." (this changes the ifindex of the interface) delete tap0 from br0 using "ovs-vsctl del-port.." In the function "netdev_ports_insert", two hmap entries were created for mapping "portnum -> netdev" and "ifindex -> portnum". When the interface is deleted from the system, the "netdev_ports_remove" function is not getting called and the old ifindex entry is not getting cleaned up from the "ifindex_to_port" hmap. As part of the fix, added function "dpif_port_remove" which will call "netdev_ports_remove" in the path where the interface deletion from the system is detected. Also, in "netdev_ports_remove", added the code where the "ifindex_to_port_data" (ifindex -> portnum map node) is getting freed when the ifindex is not available any more. (as the interface is already deleted.) VMware-BZ: #1975788 Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-13 11:05:31 -08:00
Xiao Liang	fd016ae3fb	lib: Move lib/poll-loop.h to include/openvswitch Poll-loop is the core to implement main loop. It should be available in libopenvswitch. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-03 10:47:55 -07:00
Kaige Fu	a0439037dd	dpif: Remove duplicated word in comment for dpif_recv() Signed-off-by: Kaige Fu <fukaige@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>	2017-10-30 16:47:20 -07:00
Roi Dayan	71d90be10d	dpif: Fix cleanup of netdev_ports map Executing dpctl commands from userspace also calls to dpif_open()/dpif_close() but not really creating another dpif but using a clone. As for netdev_ports map is global we avoid adding duplicate entries but also need to make sure we are not removing needed entries. With this commit we make sure only the last dpif close should clean the netdev_ports map. Fixes: `6595cb95a4` ("dpif: Clean up netdev_ports map on dpif_close().") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-08-18 13:36:40 -07:00
Joe Stringer	6595cb95a4	dpif: Clean up netdev_ports map on dpif_close(). Commit 32b77c316d9982("dpif: Save added ports in a port map.") introduced tracking of all dpif ports by taking a reference on each available netdev when the dpif is opened, but it failed to clear out and release references to these netdevs when the dpif is closed. One of the problems introduced by this was that upon clean exit of ovs-vswitchd via "ovs-appctl exit --cleanup", the "ovs-netdev" device was not deleted. This could cause problems in subsequent start up. Commit `5119e258da` ("dpif: Fix cleanup of userspace datapath.") fixed this particular problem by not adding such devices to the netdev_ports map, but the referencing/unreferencing upon dpif_open()/dpif_close() is still not balanced. Balance the referencing of netdevs by clearing these during dpif_close(). Fixes: 32b77c316d9982("dpif: Save added ports in a port map.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>	2017-08-09 15:57:21 -07:00
Jan Scheurich	1fc11c5948	Generic encap and decap support for NSH This commit adds translation and netdev datapath support for generic encap and decap actions for the NSH MD1 header. The generic encap and decap actions are mapped to specific encap_nsh and decap_nsh actions in the datapath. The translation follows the general scheme that decap() of an NSH packet triggers recirculation after decapsulation, while encap(nsh) just modifies struct flow and sets the ctx->pending_encap flag to generate the encap_nsh action at the next commit to be able to include subsequent set_field actions for NSH headers. Support for the flexible MD2 format using TLV properties is foreseen in encap(nsh), but not yet fully implemented. The CLI syntax for encap of NSH is encap(nsh(md_type=1)) encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...])) Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-08-07 11:26:17 -07:00
Roi Dayan	dfaf79ddd9	dpif: Refactor obj type from void pointer to dpif_class It's basically what is being passed today and passing a specific type adds a compiler type check. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-07-27 10:17:46 +02:00
Marcelo Leitner	57459d0de0	dpif: fix warn msg when failed to open netdev Currently it is using the datapath name/type but what has actually failed was the netdev. Fix it by using netdev name/type instead and also log why it failed. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-07-05 02:34:52 -07:00
Darrell Ball	5119e258da	dpif: Fix cleanup of userspace datapath. Hardware offload introduced extra tracking of netdev ports. This included ovs-netdev, which is really for internal infra usage for the userspace datapath. This breaks cleanup of the userspace datapath. One effect is that all userspace datapath system tests fail except for the first one run. There is no need to do this extra tracking of tap devices for the hardware offload effort. Hence, the approach taken is to filter both internal device and tap device types for hardware offload. Internal devices are 'internal' from the kernel datapath perspective and tap devices are 'internal' from the userspace datapath perspective. Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-06-27 11:07:09 -07:00
Ben Pfaff	0722f34109	odp-util: Use port names in output in more places. Until now, ODP output only showed port names for in_port matches. This commit shows them in other places port numbers appear. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>	2017-06-23 16:28:42 +08:00
Roi Dayan	eff1e5b091	dpif: Refactor flow logging functions to be used by other modules To be reused by other modules. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:57:36 +02:00
Paul Blakey	7e8b719938	dpctl: Add an option to dump only certain kinds of flows Usage: # to dump all datapath flows (default): ovs-dpctl dump-flows # to dump only flows that are in the kernel datapath: ovs-dpctl dump-flows type=ovs # to dump only flows that are offloaded: ovs-dpctl dump-flows type=offloaded Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:53:06 +02:00
Paul Blakey	32b77c316d	dpif: Save added ports in a port map for netdev flow api use To use netdev flow offloading api, dpifs need to iterate over added ports. This addition inserts the added dpif ports in a hash map. The map will also be used to translate dpif ports to netdevs. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:41 +02:00
Jan Scheurich	beb75a40fd	userspace: Switching of L3 packets in L2 pipeline Ports have a new layer3 attribute if they send/receive L3 packets. The packet_type included in structs dp_packet and flow is considered in ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite. A dummy ethernet header is pushed to L3 packets received from L3 ports before the pipeline processing starts. The ethernet header is popped before sending a packet to an L3 port. For datapath ports that can receive L2 or L3 packets, the packet_type becomes part of the flow key for datapath flows and is handled appropriately in dpif-netdev. In the 'else' branch in flow_put_on_pmd() function, the additional check flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls lookup is sufficient to uniquely identify a flow and b) it caused false negatives because the flow in netdev->flow may not be properly masked. In dpif_netdev_flow_put() we now use the same method for constructing the netdev_flow_key as the one used when adding the flow to the dpcls to make sure these always match. The function netdev_flow_key_from_flow() used so far was not only inefficient but sometimes caused mismatches and subsequent flow update failures. The kernel datapath does not support the packet_type match field. Instead it encodes the packet type implicitly by the presence or absence of the Ethernet attribute in the flow key and mask. This patch filters the PACKET_TYPE attribute out of netlink flow key and mask to be sent to the kernel datapath. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 10:15:20 -07:00
Jan Scheurich	2482b0b0c8	userspace: Add packet_type in dp_packet and flow This commit adds a packet_type attribute to the structs dp_packet and flow to explicitly carry the type of the packet as preparation for the introduction of the so-called packet type-aware pipeline (PTAP) in OVS. The packet_type is a big-endian 32 bit integer with the encoding as specified in OpenFlow version 1.5. The upper 16 bits contain the packet type name space. Pre-defined values are defined in openflow-common.h: enum ofp_header_type_namespaces { OFPHTN_ONF = 0, /* ONF namespace. */ OFPHTN_ETHERTYPE = 1, /* ns_type is an Ethertype. */ OFPHTN_IP_PROTO = 2, /* ns_type is a IP protocol number. */ OFPHTN_UDP_TCP_PORT = 3, /* ns_type is a TCP or UDP port. */ OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */ }; The lower 16 bits specify the actual type in the context of the name space. Only name spaces 0 and 1 will be supported for now. For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet). This is the default packet_type in OVS and the only one supported so far. Packets of type (OFPHTN_ONF, 0) are called Ethernet packets. In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet. A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet with the Ethernet header (and any VLAN tags) removed to expose the L3 (or L2.5) payload of the packet. These will simply be called L3 packets. The Ethernet address fields dl_src and dl_dst in struct flow are not applicable for an L3 packet and must be zero. However, to maintain compatibility with the large code base, we have chosen to copy the Ethertype of an L3 packet into the dl_type field of struct flow. This does not mean that it will be possible to match on dl_type for L3 packets with PTAP later on. Matching must be done on packet_type instead. New dp_packets are initialized with packet_type Ethernet. Ports that receive L3 packets will have to explicitly adjust the packet_type.
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-03 16:56:40 -07:00
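The packet_type layout described in the commit above (name space in the upper 16 bits, type in the lower 16 bits) can be sketched as follows. This is a minimal illustration, not the code added by the commit: the `example_pt_*` helpers are hypothetical names, only the two name spaces the message says are supported are listed, and values are kept in host byte order, so converting to the big-endian form the message describes is left out.

```c
#include <assert.h>
#include <stdint.h>

/* The two name spaces the commit message lists as supported for now. */
enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,        /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1   /* ns_type is an Ethertype. */
};

/* Hypothetical helper: packs a name space and a type into one 32-bit
 * packet_type value, in host byte order. */
static inline uint32_t
example_pt_make(uint16_t ns, uint16_t ns_type)
{
    return ((uint32_t) ns << 16) | ns_type;
}

/* Hypothetical helper: extracts the name space (upper 16 bits). */
static inline uint16_t
example_pt_ns(uint32_t packet_type)
{
    return (uint16_t) (packet_type >> 16);
}

/* Hypothetical helper: extracts the type (lower 16 bits). */
static inline uint16_t
example_pt_ns_type(uint32_t packet_type)
{
    return (uint16_t) (packet_type & 0xffff);
}

/* Hypothetical helper: true for an Ethernet packet, i.e. (OFPHTN_ONF, 0). */
static inline int
example_pt_is_ethernet(uint32_t packet_type)
{
    return packet_type == example_pt_make(OFPHTN_ONF, 0);
}

/* Round-trip check for an L3 packet carrying IPv4 (Ethertype 0x0800). */
static inline void
example_pt_self_check(void)
{
    uint32_t pt = example_pt_make(OFPHTN_ETHERTYPE, 0x0800);

    assert(example_pt_ns(pt) == OFPHTN_ETHERTYPE);
    assert(example_pt_ns_type(pt) == 0x0800);
    assert(!example_pt_is_ethernet(pt));
}
```

Under this encoding an Ethernet packet has packet_type 0, which fits the message's statement that new dp_packets default to Ethernet.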
Jarno Rajahalme	b701bce9c7	dpif: Log packet metadata on execute. Debug log output for execute operations is missing the packet metadata, which can be instrumental in tracing what the datapath should be executing. No reason to not have the metadata on the debug output, so add it there. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2017-04-17 12:28:17 -07:00
Andy Zhou	b52ac6592f	ofproto: Probe for sample nesting level. Add logic to detect the max level of nesting allowed by the sample action implemented in the datapath. Future patch allows xlate code to generate different odp actions based on this information. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2017-03-10 17:35:49 -08:00
Andy Zhou	bb71c96ef2	dpif: Refactor dpif_probe_feature() Allow actions to be part of the probe. No functional changes. Future patch will make use of this new API. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2017-03-10 17:35:26 -08:00
Jarno Rajahalme	076caa2fb0	ofproto: Meter translation. Translate OpenFlow METER instructions to datapath meter actions. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>	2017-03-08 13:09:44 -08:00
Jarno Rajahalme	5dddf96065	dpif: Meter framework. Add DPIF-level infrastructure for meters. Allow meter_set to modify the meter configuration (e.g. set the burst size if unspecified). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>	2017-03-08 13:09:43 -08:00
Yang, Yi Y	6fcecb85ab	datapath: add Ethernet push and pop actions Upstream commit: commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:23 2016 +0100 openvswitch: add Ethernet push and pop actions It's not allowed to push Ethernet header in front of another Ethernet header. It's not allowed to pop Ethernet header if there's a vlan tag. This preserves the invariant that L3 packet never has a vlan tag. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> [Committer notes] Fix build with the upstream commit by folding in the required switch case enum handlers. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-03-02 15:51:39 -08:00

260 Commits