2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-31 14:25:26 +00:00
Commit Graph

260 Commits

Author SHA1 Message Date
Ilya Maximets
8842fdf1b3 netdev-offload: Use dpif type instead of class.
There is no real difference between the 'class' and 'type' in the
context of common lookup operations inside netdev-offload module
because it only checks the value of pointers without using the
value itself.  However, 'type' has some meaning and can be used by
offload provides on the initialization phase to check if this type
of Flow API in pair with the netdev type could be used in particular
datapath type.  For example, this is needed to check if Linux flow
API could be used for current tunneling vport because it could be
used only if tunneling vport belongs to system datapath, i.e. has
backing linux interface.

This is needed to unblock tunneling offloads in userspace datapath
with DPDK flow API.

Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-08 19:07:21 +02:00
Vishal Deep Ajmera
9df65060cf userspace: Avoid dp_hash recirculation for balance-tcp bond mode.
Problem:

In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:

1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.

2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.

3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.

Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.

Proposed optimization:

This proposal introduces a new load-balancing output action instead of
recirculation.

Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.

Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.

Instead, we will now generate action as
    'lb_output(<bond id>)'

This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).

Example:
Current scheme:

With 8 UDP flows (with random UDP src port):

  flow-dump from pmd on cpu core: 2
  recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)

  recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1

New scheme:
We can do with a single flow entry (for any number of new flows):

  in_port(7),<...> actions:lb_output(1)

A new CLI has been added to dump datapath bond cache as given below.

 # ovs-appctl dpif-netdev/bond-show [dp]

   Bond cache:
     bond-id 1 :
       bucket 0 - slave 2
       bucket 1 - slave 1
       bucket 2 - slave 2
       bucket 3 - slave 1

Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-06-22 13:11:51 +02:00
Ilya Maximets
342b8904ab dpif: Fix dp_extra_info leak by reworking the allocation scheme.
dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't
fit the dump filter while executing dpctl/dump-flows and also while
executing dpctl/get-flow.

This is already a 3rd attempt to fix all the leaks and incorrect usage
of this string that definitely indicates poor initial design of the
feature.

Flow dump/get documentation clearly states that the caller does not own
the data provided in dpif_flow.  Datapath still owns all the data and
promises to not free/modify it until the next quiescent period, however
we're requesting the caller to free 'dp_extra_info' and this obviously
breaks the rules.

This patch fixes the issue by by storing 'dp_extra_info' within
'struct dp_netdev_flow' making datapath to own it.  'dp_netdev_flow'
is RCU-protected, so it will be valid until the next quiescent period.

Fixes: 0e8f5c6a38 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
Tested-by: Emma Finn <emma.finn@intel.com>
Acked-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-27 21:20:01 +01:00
Ilya Maximets
d7b55c5c94 dpif: Fix leak and usage of uninitialized dp_extra_info.
'dpif_probe_feature'/'revalidate' doesn't free the 'dp_extra_info'
string.  Also, all the implementations of dpif_flow_get() should
initialize the value to avoid printing/freeing of random memory.

 30 bytes in 1 blocks are definitely lost in loss record 323 of 889
    at 0x483AD19: realloc (vg_replace_malloc.c:836)
    by 0xDDAD89: xrealloc (util.c:149)
    by 0xCE1609: ds_reserve (dynamic-string.c:63)
    by 0xCE1A90: ds_put_format_valist (dynamic-string.c:161)
    by 0xCE19B9: ds_put_format (dynamic-string.c:142)
    by 0xCCCEA9: dp_netdev_flow_to_dpif_flow (dpif-netdev.c:3170)
    by 0xCCD2DD: dpif_netdev_flow_get (dpif-netdev.c:3278)
    by 0xCCEA0A: dpif_netdev_operate (dpif-netdev.c:3868)
    by 0xCDF81B: dpif_operate (dpif.c:1361)
    by 0xCDEE93: dpif_flow_get (dpif.c:1002)
    by 0xCDECF9: dpif_probe_feature (dpif.c:962)
    by 0xC635D2: check_recirc (ofproto-dpif.c:896)
    by 0xC65C02: check_support (ofproto-dpif.c:1567)
    by 0xC63274: open_dpif_backer (ofproto-dpif.c:818)
    by 0xC65E3E: construct (ofproto-dpif.c:1605)
    by 0xC4D436: ofproto_create (ofproto.c:549)
    by 0xC3931A: bridge_reconfigure (bridge.c:877)
    by 0xC3FEAC: bridge_run (bridge.c:3324)
    by 0xC4551D: main (ovs-vswitchd.c:127)

CC: Emma Finn <emma.finn@intel.com>
Fixes: 0e8f5c6a38 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
2020-01-20 17:51:16 +01:00
Ilya Maximets
7a5e0ee7cc dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.
Current implementation of dpif_flow_hash() doesn't depend on datapath
interface and only complicates the callers by forcing them to figure
out what is their current 'dpif'.  If we'll need different hashing
for different 'dpif's we'll implement an API for dpif-providers
and each dpif implementation will be able to use their local function
directly without calling it via dpif API.

This change will allow us to not store 'dpif' pointer in the userspace
datapath implementation which is broken and will be removed in next
commits.

This patch moves dpif_flow_hash() to odp-util module and replaces
unused odp_flow_key_hash() by it, along with removing of unused 'dpif'
argument.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2020-01-08 16:02:37 +01:00
Anju Thomas
a13a020975 userspace: Improved packet drop statistics.
Currently OVS maintains explicit packet drop/error counters only on port
level.  Packets that are dropped as part of normal OpenFlow processing
are counted in flow stats of “drop” flows or as table misses in table
stats. These can only be interpreted by controllers that know the
semantics of the configured OpenFlow pipeline.  Without that knowledge,
it is impossible for an OVS user to obtain e.g. the total number of
packets dropped due to OpenFlow rules.

Furthermore, there are numerous other reasons for which packets can be
dropped by OVS slow path that are not related to the OpenFlow pipeline.
The generated datapath flow entries include a drop action to avoid
further expensive upcalls to the slow path, but subsequent packets
dropped by the datapath are not accounted anywhere.

Finally, the datapath itself drops packets in certain error situations.
Also, these drops are today not accounted for.This makes it difficult
for OVS users to monitor packet drop in an OVS instance and to alert a
management system in case of a unexpected increase of such drops.
Also OVS trouble-shooters face difficulties in analysing packet drops.

With this patch we implement following changes to address the issues
mentioned above.

1. Identify and account all the silent packet drop scenarios
2. Display these drops in ovs-appctl coverage/show

Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Co-authored-by: Keshav Gupta <keshugupta1@gmail.com>
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Signed-off-by: Keshav Gupta <keshugupta1@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-07 17:01:42 +01:00
Paul Blakey
dcdcad68c6 dpif: Add support to set user features
This enables user features on the kernel datapath via the DP_CMD_SET
command, and also retrieves them to check for actual support and
not just an older kernel ignoring the requested features.

This will be used in next patch to enable recirc_id sharing with tc.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
Ilya Maximets
f87c135706 vswitchd: Always cleanup userspace datapath.
'netdev' datapath is implemented within ovs-vswitchd process and can
not exist without it, so it should be gracefully terminated with a
full cleanup of resources upon ovs-vswitchd exit.

This change forces dpif cleanup for 'netdev' datapath regardless of
passing '--cleanup' to 'ovs-appctl exit'. Such solution allowes to
not pass this additional option everytime for userspace datapath
installations and also allowes to not terminate system datapath in
setups where both datapaths runs at the same time.

The main part is that dpif_port_del() will lead to netdev_close()
and subsequent netdev_class->destroy(dev) which will stop HW NICs
and free their resources. For vhost-user interfaces it will invoke
vhost driver unregistering with a properly closed vhost-user
connection. For upcoming AF_XDP netdev this will allow to gracefully
destroy xdp sockets and unload xdp programs from linux interfaces.
Another important thing is that port deletion will also trigger
flushing of flows offloaded to HW NICs.

Exception made for 'internal' ports that could have user ip/route
configuration. These ports will not be removed without '--cleanup'.

This change fixes OVS disappearing from the DPDK point of view
(keeping HW NICs improperly configured, sudden closing of vhost-user
connections) and will help with linux devices clearing with upcoming
AF_XDP netdev support.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2019-07-02 12:24:47 +03:00
Numan Siddique
5b34f8fc3b Add a new OVS action check_pkt_larger
This patch adds a new action 'check_pkt_larger' which checks if the
packet is larger than the given size and stores the result in the
destination register.

Usage: check_pkt_larger(len)->REGISTER
Eg. match=...,actions=check_pkt_larger(1442)->NXM_NX_REG0[0],next;

This patch makes use of the new datapath action - 'check_pkt_len'
which was recently added in the commit [1].
At the start of ovs-vswitchd, datapath is probed for this action.
If the datapath action is present, then 'check_pkt_larger'
makes use of this datapath action.

Datapath action 'check_pkt_len' takes these nlattrs
      * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
        to apply if the packet length is greater than the specified 'pkt_len'
      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
        actions to apply if the packet length is lesser or equal to the
        specified 'pkt_len'.

Let's say we have these flows added to an OVS bridge br-int

table=0, priority=100 in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1)
table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3
table=1, priority=100,in_port=1,ip,actions=output:4

Then the action 'check_pkt_larger' will be translated as
  - check_pkt_len(size=100,gt(3),le(4))

datapath will check the packet length and if the packet length is greater than 100,
it will output to port 3, else it will output to port 4.

In case, datapath doesn't support 'check_pkt_len' action, the OVS action
'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added.

This OVS action is intended to be used by OVN to check the packet length
and generate an ICMP packet with type 3, code 4 and next hop mtu
in the logical router pipeline if the MTU of the physical interface
is lesser than the packet length. More information can be found here [2]

[1] - 4d5ec89fc8
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Ben Pfaff <blp@ovn.org>
CC: Gregory Rose <gvrose8192@gmail.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-22 12:56:50 -07:00
John Hurley
608ff46aaf ovs-tc: offload datapath rules matching on internal ports
Rules applied to OvS internal ports are not represented in TC datapaths.
However, it is possible to support rules matching on internal ports in TC.
The start_xmit ndo of OvS internal ports directs packets back into the OvS
kernel datapath where they are rematched with the ingress port now being
that of the internal port. Due to this, rules matching on an internal port
can be added as TC filters to an egress qdisc for these ports.

Allow rules applied to internal ports to be offloaded to TC as egress
filters. Rules redirecting to an internal port are also offloaded. These
are supported by the redirect ingress functionality applied in an earlier
patch.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-04-10 13:55:59 +02:00
Flavio Leitner
9da3207af7 Revert "ofproto-dpif: Let the dpif report when a port is a duplicate."
This reverts commit 7521e0cf9e.

This patch introduced a regression in OSP environments using internal
ports in other netns. Their networking configuration is lost when
the service is restarted because the ports are recreated now.

Before the patch it checked using netlink if the port with a specific
"name" was already there. The check is a lookup in all ports attached
to the DP regardless of the port's netns.

After the patch it relies on the kernel to identify that situation.
Unfortunately the only protection there is register_netdevice() which
fails only if the port with that name exists in the current netns.

If the port is in another netns, it will get a new dp_port and because
of that userspace will delete the old port. At this point the original
port is gone from the other netns and there a fresh port in the current
netns.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-01-25 13:20:13 -08:00
Ilya Maximets
1270b6e52c treewide: Wider use of packet batch APIs.
This patch replaces most of direct accesses to the dp_packet_batch
internal components by appropriate APIs.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-12-11 20:18:26 +00:00
Sriharsha Basavapatna
e6530a8d5d dpif: Restore a few lines with form feed characters
A few lines with form feed characters (ASCII: ^L) were accidentally
deleted by a recent commit to support rebalancing of offloaded flows.
This patch reverts those lines.

Fixes: 57924fc91c ("revalidator: Rebalance offloaded flows")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-10-31 13:30:39 -07:00
Sriharsha Basavapatna via dev
57924fc91c revalidator: Rebalance offloaded flows based on the pps rate
This is the third patch in the patch-set to support dynamic rebalancing
of offloaded flows.

The dynamic rebalancing functionality is implemented in this patch. The
ukeys that are not scheduled for deletion are obtained and passed as input
to the rebalancing routine. The rebalancing is done in the context of
revalidation leader thread, after all other revalidator threads are
done with gathering rebalancing data for flows.

For each netdev that is in OOR state, a list of flows - both offloaded
and non-offloaded (pending) - is obtained using the ukeys. For each netdev
that is in OOR state, the flows are grouped and sorted into offloaded and
pending flows.  The offloaded flows are sorted in descending order of
pps-rate, while pending flows are sorted in ascending order of pps-rate.

The rebalancing is done in two phases. In the first phase, we try to
offload all pending flows and if that succeeds, the OOR state on the device
is cleared. If some (or none) of the pending flows could not be offloaded,
then we start replacing an offloaded flow that has a lower pps-rate than
a pending flow, until there are no more pending flows with a higher rate
than an offloaded flow. The flows that are replaced from the device are
added into kernel datapath.

A new OVS configuration parameter "offload-rebalance", is added to ovsdb.
The default value of this is "false". To enable this feature, set the
value of this parameter to "true", which provides packets-per-second
rate based policy to dynamically offload and un-offload flows.

Note: This option can be enabled only when 'hw-offload' policy is enabled.
It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow
offload errors (specifically ENOSPC error this feature depends on) reported
by an offloaded device are supressed by TC-Flower kernel module.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-10-19 11:27:52 +02:00
Ben Pfaff
769b50349f dpif: Remove support for multiple queues per port.
Commit 69c51582ff ("dpif-netlink: don't allocate per thread netlink
sockets") removed dpif-netlink support for multiple queues per port.
No remaining dpif provider supports multiple queues per port, so
remove infrastructure for the feature.

CC: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-09-26 15:57:06 -07:00
Gavi Teitz
a692410af0 dpctl: Expand the flow dump type filter
Added new types to the flow dump filter, and allowed multiple filter
types to be passed at once, as a comma separated list. The new types
added are:
 * tc - specifies flows handled by the tc dp
 * non-offloaded - specifies flows not offloaded to the HW
 * all - specifies flows of all types

The type list is now fully parsed by the dpctl, and a new struct was
added to dpif which enables dpctl to define which types of dumps to
provide, rather than passing the type string and having dpif parse it.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-09-13 16:56:25 +02:00
Justin Pettit
8101f03fcd dpif: Don't pass in '*meter_id' to meter_set commands.
The original intent of the API appears to be that the underlying DPIF
implementaion would choose a local meter id.  However, neither of the
existing datapath meter implementations (userspace or Linux) implemented
that; they expected a valid meter id to be passed in, otherwise they
returned an error.  This commit follows the existing implementations and
makes the API somewhat cleaner.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-08-16 10:20:52 -07:00
Justin Pettit
6508c845ad dpif: Move common meter checks into the dpif layer.
Another dpif provider will soon add support for meters, so move
some of the common sanity checks up into the dpif layer so that each
provider doesn't need to re-implement them.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-07-30 13:00:49 -07:00
Justin Pettit
494a74557a Revert "dpctl: Expand the flow dump type filter"
Commit ab15e70eb5 ("dpctl: Expand the flow dump type filter") had a
number of issues with style, build breakage, and failing unit tests.
The patch is being reverted so that they can addressed.

This reverts commit ab15e70eb5.

CC: Gavi Teitz <gavi@mellanox.com>
CC: Simon Horman <simon.horman@netronome.com>
CC: Roi Dayan <roid@mellanox.com>
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-07-25 14:17:36 -07:00
Gavi Teitz
ab15e70eb5 dpctl: Expand the flow dump type filter
Added new types to the flow dump filter, and allowed multiple filter
types to be passed at once, as a comma separated list. The new types
added are:
 * tc - specifies flows handled by the tc dp
 * non-offloaded - specifies flows not offloaded to the HW
 * all - specifies flows of all types

The type list is now fully parsed by the dpctl, and a new struct was
added to dpif which enables dpctl to define which types of dumps to
provide, rather than passing the type string and having dpif parse it.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-07-25 18:16:27 +02:00
Ben Pfaff
7521e0cf9e ofproto-dpif: Let the dpif report when a port is a duplicate.
The port_add() function checks whether the port about to be added to the
dpif is already present and adds it only if it is not.  This duplicates a
check also present (and necessary) in each dpif and races with it as well.
When a dpif has a large number of ports, the check can be expensive (it is
not efficiently implemented).  It would be nice to made the check cheaper,
but it also seems reasonable to do as done in this patch and just let the
dpif report the duplication.

Reported-by: Haifeng Lin <haifeng.lin@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-07-05 14:59:53 -07:00
Ben Pfaff
5a0e4aec1a treewide: Convert leading tabs to spaces.
It's always been OVS coding style to use spaces rather than tabs for
indentation, but some tabs have snuck in over time.  This commit converts
them to spaces.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2018-06-11 15:32:00 -07:00
Ben Pfaff
fa37affad3 Embrace anonymous unions.
Several OVS structs contain embedded named unions, like this:

struct {
    ...
    union {
        ...
    } u;
};

C11 standardized a feature that many compilers already implemented
anyway, where an embedded union may be unnamed, like this:

struct {
    ...
    union {
        ...
    };
};

This is more convenient because it allows the programmer to omit "u."
in many places.  OVS already used this feature in several places.  This
commit embraces it in several others.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Tested-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
2018-05-25 13:36:05 -07:00
Darrell Ball
7d7ded7af7 odp-execute: Rename 'may_steal' to 'should_steal'.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-23 11:36:47 -07:00
William Tu
c6d8720137 tunnel: make tun_key_to_attr aware of tunnel type.
When there is a flow rule which forwards a packet from geneve
port to another tunnel port, ex: gre, the tun_metadata carried
from the geneve port might affect the outgoing port.  For example,
the datapath action from geneve port output to gre port (1) shows:
  set(tunnel(tun_id=0x7b,dst=2.2.2.2,ttl=64,
    geneve({class=0xffff,type=0,len=4,0x123}),flags(df|key))),1
Where the geneve(...) should not exist.

When using kernel's tunnel port, this triggers an error saying:
"Multiple metadata blocks provided", when there is a rule forwarding
the geneve packet to vxlan/erspan tunnel port.  A userspace test case
using geneve and gre also demonstrates the issue.

The patch makes the tun_key_to_attr aware of the tunnel type. So only
the relevant output tunnel's options are set.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-14 16:21:03 -07:00
Ben Pfaff
0d71302e36 ofp-util, ofp-parse: Break up into many separate modules.
ofp-util had been far too large and monolithic for a long time.  This
commit breaks it up into units that make some logical sense.  It also
moves the pieces of ofp-parse that were specific to each unit into the
relevant unit.

Most of this commit is just moving code around.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-02-13 10:43:13 -08:00
Eric Garver
1fe178d251 dpif: Add support for OVS_ACTION_ATTR_CT_CLEAR
This supports using the ct_clear action in the kernel datapath. To
preserve compatibility with current ct_clear behavior on old kernels, we
only pass this action down to the datapath if a probe reveals the
datapath actually supports it.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2018-01-20 11:16:37 -08:00
Yi Yang
f59cb331c4 nsh: rework NSH netlink keys and actions
This patch changes OVS_KEY_ATTR_NSH
to nested attribute and adds three new NSH sub attribute keys:

    OVS_NSH_KEY_ATTR_BASE: for length-fixed NSH base header
    OVS_NSH_KEY_ATTR_MD1:  for length-fixed MD type 1 context
    OVS_NSH_KEY_ATTR_MD2:  for length-variable MD type 2 metadata

Its intention is to align to NSH kernel implementation.

NSH match fields, set and PUSH_NSH action all use the below
nested attribute format:

OVS_KEY_ATTR_NSH begin
    OVS_NSH_KEY_ATTR_BASE
    OVS_NSH_KEY_ATTR_MD1
OVS_KEY_ATTR_NSH end

or

OVS_KEY_ATTR_NSH begin
    OVS_NSH_KEY_ATTR_BASE
    OVS_NSH_KEY_ATTR_MD2
OVS_KEY_ATTR_NSH end

In addition, NSH encap and decap actions are renamed as push_nsh
and pop_nsh to meet action naming convention.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-08 13:19:14 -08:00
Yifeng Sun
962044bf24 dpif: Fix memory leak
Valgrind complains in test 2322 (ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR):

31,584 (26,496 direct, 5,088 indirect) bytes in 48 blocks are definitely
lost in loss record 422 of 427
   by 0x5165F4: xmalloc (util.c:120)
   by 0x466194: dp_packet_new (dp-packet.c:138)
   by 0x466194: dp_packet_new_with_headroom (dp-packet.c:148)
   by 0x46621B: dp_packet_clone_data_with_headroom (dp-packet.c:210)
   by 0x46621B: dp_packet_clone_with_headroom (dp-packet.c:170)
   by 0x49DD46: dp_packet_batch_clone (dp-packet.h:789)
   by 0x49DD46: odp_execute_clone (odp-execute.c:616)
   by 0x49DD46: odp_execute_actions (odp-execute.c:795)
   by 0x471663: dpif_execute_with_help (dpif.c:1296)
   by 0x473795: dpif_operate (dpif.c:1411)
   by 0x473E20: dpif_execute.part.21 (dpif.c:1320)
   by 0x428D38: packet_execute (ofproto-dpif.c:4682)
   by 0x41EB51: ofproto_packet_out_finish (ofproto.c:3540)
   by 0x41EB51: handle_packet_out (ofproto.c:3581)
   by 0x4233DA: handle_openflow__ (ofproto.c:8044)
   by 0x4233DA: handle_openflow (ofproto.c:8219)
   by 0x4514AA: ofconn_run (connmgr.c:1437)
   by 0x4514AA: connmgr_run (connmgr.c:363)
   by 0x41C8B5: ofproto_run (ofproto.c:1813)
   by 0x40B103: bridge_run__ (bridge.c:2919)
   by 0x4103B3: bridge_run (bridge.c:2977)
   by 0x406F14: main (ovs-vswitchd.c:119)

the parameter dp_packet_batch is leaked when 'may_steal' is true.

When dpif_execute_helper_cb is passed with a true 'may_steal', it
is supposed to take the ownership of dp_packet_batch and release
it when done.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-29 13:59:15 -08:00
Ashish Varma
97459c2f01 netdev, dpif: fix the crash/assert on port delete
a crash is seen in "netdev_ports_remove" when an interface is deleted and added
back in the system and when the interface is part of a bridge configuration.
e.g. steps:
  create a tap0 interface using "ip tuntap add.."
  add the tap0 interface to br0 using "ovs-vsctl add-port.."
  delete the tap0 interface from system using "ip tuntap del.."
  add the tap0 interface back in system using "ip tuntap add.."
                       (this changes the ifindex of the interface)
  delete tap0 from br0 using "ovs-vsctl del-port.."

In the function "netdev_ports_insert", two hmap entries were created for
mapping "portnum -> netdev" and "ifindex -> portnum".
When the interface is deleted from the system, the "netdev_ports_remove"
function is not getting called and the old ifindex entry is not getting
cleaned up from the "ifindex_to_port" hmap.

As part of the fix, added function "dpif_port_remove" which will call
"netdev_ports_remove" in the path where the interface deletion from the system
is detected.
Also, in "netdev_ports_remove", added the code where the "ifindex_to_port_data"
(ifindex -> portnum map node) is getting freed when the ifindex is not
available any more. (as the interface is already deleted.)

VMware-BZ: #1975788
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-13 11:05:31 -08:00
Xiao Liang
fd016ae3fb lib: Move lib/poll-loop.h to include/openvswitch
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 10:47:55 -07:00
Kaige Fu
a0439037dd dpif: Remove duplicated word in comment for dpif_recv()
Signed-off-by: Kaige Fu <fukaige@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-10-30 16:47:20 -07:00
Roi Dayan
71d90be10d dpif: Fix cleanup of netdev_ports map
Executing dpctl commands from userspace also calls to
dpif_open()/dpif_close() but not really creating another dpif
but using a clone.
As for netdev_ports map is global we avoid adding duplicate entries
but also need to make sure we are not removing needed entries.
With this commit we make sure only the last dpif close should clean
the netdev_ports map.

Fixes: 6595cb95a4 ("dpif: Clean up netdev_ports map on dpif_close().")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-08-18 13:36:40 -07:00
Joe Stringer
6595cb95a4 dpif: Clean up netdev_ports map on dpif_close().
Commit 32b77c316d9982("dpif: Save added ports in a port map.")
introduced tracking of all dpif ports by taking a reference on each
available netdev when the dpif is opened, but it failed to clear out and
release references to these netdevs when the dpif is closed.

One of the problems introduced by this was that upon clean exit of
ovs-vswitchd via "ovs-appctl exit --cleanup", the "ovs-netdev" device
was not deleted. This which could cause problems in subsequent start up.
Commit 5119e258da ("dpif: Fix cleanup of userspace datapath.") fixed
this particular problem by not adding such devices to the netdev_ports
map, but the referencing/unreferencing upon dpif_open()/dpif_close() is
still not balanced.

Balance the referencing of netdevs by clearing these during dpif_close().

Fixes: 32b77c316d9982("dpif: Save added ports in a port map.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
2017-08-09 15:57:21 -07:00
Jan Scheurich
1fc11c5948 Generic encap and decap support for NSH
This commit adds translation and netdev datapath support for generic
encap and decap actions for the NSH MD1 header. The generic encap and
decap actions are mapped to specific encap_nsh and decap_nsh actions
in the datapath.

The translation follows that general scheme that decap() of an NSH
packet triggers recirculation after decapsulation, while encap(nsh)
just modifies struct flow and sets the ctx->pending_encap flag to
generate the encap_nsh action at the next commit to be able to include
subsequent set_field actions for NSH headers.

Support for the flexible MD2 format using TLV properties is foreseen
in encap(nsh), but not yet fully implemented.

The CLI syntax for encap of NSH is
encap(nsh(md_type=1))
encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-07 11:26:17 -07:00
Roi Dayan
dfaf79ddd9 dpif: Refactor obj type from void pointer to dpif_class
It's basically what is being passed today and passing a specific
type adds a compiler type check.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-07-27 10:17:46 +02:00
Marcelo Leitner
57459d0de0 dpif: fix warn msg when failed to open netdev
Currently it is using the datapath name/type but what has actually
failed was the netdev.

Fix it by using netdev name/type instead and also log why it failed.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-05 02:34:52 -07:00
Darrell Ball
5119e258da dpif: Fix cleanup of userspace datapath.
Hardware offload introduced extra tracking of netdev ports.  This
included ovs-netdev, which is really for internal infra usage for
the userpace datapath.  This breaks cleanup of the userspace
datapath.  One effect is that all userspace datapath system tests
fail except for the first one run. There is no need to do this
extra tracking of tap devices for the hardware offload effort.
Hence, the approach taken is to filter both internal device
and tap device types for hardware offload. Internal devices are
'internal' from the kernel datapath perspective and tap devices
are 'internal' from the userpace datapath perspective.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-06-27 11:07:09 -07:00
Ben Pfaff
0722f34109 odp-util: Use port names in output in more places.
Until now, ODP output only showed port names for in_port matches.  This
commit shows them in other places port numbers appear.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
2017-06-23 16:28:42 +08:00
Roi Dayan
eff1e5b091 dpif: Refactor flow logging functions to be used by other modules
To be reused by other modules.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-06-15 11:57:36 +02:00
Paul Blakey
7e8b719938 dpctl: Add an option to dump only certain kinds of flows
Usage:
    # to dump all datapath flows (default):
    ovs-dpctl dump-flows

    # to dump only flows that in kernel datapath:
    ovs-dpctl dump-flows type=ovs

    # to dump only flows that are offloaded:
    ovs-dpctl dump-flows type=offloaded

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-06-15 11:53:06 +02:00
Paul Blakey
32b77c316d dpif: Save added ports in a port map for netdev flow api use
To use netdev flow offloading api, dpifs needs to iterate over
added ports. This addition inserts the added dpif ports in a hash map,
The map will also be used to translate dpif ports to netdevs.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-06-15 11:39:41 +02:00
Jan Scheurich
beb75a40fd userspace: Switching of L3 packets in L2 pipeline
Ports have a new layer3 attribute if they send/receive L3 packets.

The packet_type included in structs dp_packet and flow is considered in
ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.

A dummy ethernet header is pushed to L3 packets received from L3 ports
before the the pipeline processing starts. The ethernet header is popped
before sending a packet to a L3 port.

For datapath ports that can receive L2 or L3 packets, the packet_type
becomes part of the flow key for datapath flows and is handled
appropriately in dpif-netdev.

In the 'else' branch in flow_put_on_pmd() function, the additional check
flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls
lookup is sufficient to uniquely identify a flow and b) it caused false
negatives because the flow in netdev->flow may not properly masked.

In dpif_netdev_flow_put() we now use the same method for constructing the
netdev_flow_key as the one used when adding the flow to the dplcs to make sure
these always match. The function netdev_flow_key_from_flow() used so far was
not only inefficient but sometimes caused mismatches and subsequent flow
update failures.

The kernel datapath does not support the packet_type match field.
Instead it encodes the packet type implictly by the presence or absence of
the Ethernet attribute in the flow key and mask.
This patch filters the PACKET_TYPE attribute out of netlink flow key and
mask to be sent to the kernel datapath.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-02 10:15:20 -07:00
Jan Scheurich
2482b0b0c8 userspace: Add packet_type in dp_packet and flow
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as prepration for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.

The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow verion 1.5.

The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:

enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,             /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1,       /* ns_type is an Ethertype. */
    OFPHTN_IP_PROTO = 2,        /* ns_type is a IP protocol number. */
    OFPHTN_UDP_TCP_PORT = 3,    /* ns_type is a TCP or UDP port. */
    OFPHTN_IPV4_OPTION = 4,     /* ns_type is an IPv4 option number. */
};

The lower 16 bits specify the actual type in the context of the name space.

Only name spaces 0 and 1 will be supported for now.

For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.

In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
whith the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.

The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the the dl_type field of struct flow.

This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.

New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.

Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-03 16:56:40 -07:00
Jarno Rajahalme
b701bce9c7 dpif: Log packet metadata on execute.
Debug log output for execute operations is missing the packet
metadata, which can be instrumental in tracing what the datapath
should be executing.  No reason to not have the metadata on the debug
output, so add it there.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-04-17 12:28:17 -07:00
Andy Zhou
b52ac6592f ofproto: Probe for sample nesting level.
Add logics to detect the max level of nesting allowed by the
sample action implemented in the datapath.

Future patch allows xlate code to generate different odp actions
based on this information.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-03-10 17:35:49 -08:00
Andy Zhou
bb71c96ef2 dpif: Refactor dpif_probe_feature()
Allow actions to be part of the probe. No functional changes.
Future patch will make use this new API.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-03-10 17:35:26 -08:00
Jarno Rajahalme
076caa2fb0 ofproto: Meter translation.
Translate OpenFlow METER instructions to datapath meter actions.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-03-08 13:09:44 -08:00
Jarno Rajahalme
5dddf96065 dpif: Meter framework.
Add DPIF-level infrastructure for meters.  Allow meter_set to modify
the meter configuration (e.g. set the burst size if unspecified).

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-03-08 13:09:43 -08:00
Yang, Yi Y
6fcecb85ab datapath: add Ethernet push and pop actions
Upstream commit:
    commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a
    Author: Jiri Benc <jbenc@redhat.com>
    Date:   Thu Nov 10 16:28:23 2016 +0100

    openvswitch: add Ethernet push and pop actions

    It's not allowed to push Ethernet header in front of another Ethernet
    header.

    It's not allowed to pop Ethernet header if there's a vlan tag. This
    preserves the invariant that L3 packet never has a vlan tag.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab <lojakab@cisco.com>
    Signed-off-by: Simon Horman <simon.horman@netronome.com>
    Signed-off-by: Jiri Benc <jbenc@redhat.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

[Committer notes]

Fix build with the upstream commit by folding in the required switch
case enum handlers.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-03-02 15:51:39 -08:00