mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

295 Commits

Author SHA1 Message Date
Mike Pattrick
2276c3a2c6 userspace: Support GRE TSO.
This patch extends the userspace datapath's tunnel TSO support from
only VXLAN and Geneve to also cover GRE tunnels. There
is also a software fallback for cases where the egress netdev does not
support this feature.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-01-17 00:20:48 +01:00
Mike Pattrick
82c1028e37 Userspace: Software fallback for UDP encapsulated TCP segmentation.
When sending packets that are flagged as requiring segmentation to an
interface that does not support this feature, send the packet to the TSO
software fallback instead of dropping it.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2024-09-11 15:36:27 +02:00
Dexia Li
084c808729 userspace: Support VXLAN and GENEVE TSO.
For the userspace datapath, this patch provides VXLAN and Geneve tunnel
TSO. Only userspace VXLAN and Geneve tunnels are supported, along with
tunnel outer and inner checksum offload. If the netdev does not support
the offload features, there is a software fallback. If the netdev does
not support VXLAN and Geneve TSO, packets will be dropped. Front-end
devices can also disable offload features with ethtool.

Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-01-17 22:06:45 +01:00
Flavio Leitner
8b5fe2dc60 userspace: Add Generic Segmentation Offloading.
This provides a software implementation in the case
the egress netdev doesn't support segmentation in hardware.

The challenge here is to guarantee packet ordering in the
original batch that may be full of TSO packets. Each TSO
packet can go up to ~64kB, so with segment size of 1440
that means about 44 packets for each TSO. Each batch has
32 packets, so the total batch amounts to 1408 normal
packets.

The segmentation estimates the total number of packets
and then the total number of batches. Then allocate
enough memory and finally do the work.

Finally each batch is sent in order to the netdev.
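
The estimate described above can be sketched as follows. This is only an
illustration of the arithmetic, not the OVS implementation: the function
names and the 63360-byte payload (44 segments of 1440 bytes, matching the
"about 44" figure) are assumptions, while the batch size of 32 comes from
the text.

```python
BATCH_SIZE = 32  # packets per batch, as stated in the commit message


def estimate_segments(payload_len, seg_size):
    """Number of segments needed to carry payload_len bytes (ceiling)."""
    return (payload_len + seg_size - 1) // seg_size


def estimate_batches(payload_lens, seg_size, batch_size=BATCH_SIZE):
    """Total segments and output batches for one input batch of TSO packets."""
    total = sum(estimate_segments(n, seg_size) for n in payload_lens)
    return total, (total + batch_size - 1) // batch_size


# Hypothetical worst case: a full batch of 32 TSO packets, each carrying
# 63360 bytes of payload (44 * 1440), segmented at 1440 bytes.
total_segments, total_batches = estimate_batches([63360] * 32, 1440)
print(total_segments, total_batches)  # 1408 44
```

Estimating both totals up front is what allows the memory to be allocated
once before any segmentation work starts.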

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-12-02 01:33:37 +01:00
Adrian Moreno
6240c0b4c8 netdev: Add netdev_get_speed() to netdev API.
Currently, the netdev's speed is being calculated by taking the link's
feature bits (using netdev_get_features()) and transforming them into
bps.

This mechanism can be both inaccurate and difficult to maintain, mainly
because we currently use the feature bits supported by OpenFlow which
would have to be extended to support all new feature bits of all netdev
implementations while keeping the OpenFlow API intact.

In order to expose the link speed accurately for all current and future
hardware, add a new netdev API call that allows the implementations to
provide the current and maximum link speeds in Mbps.

Internally, the logic to get the maximum supported speed still relies on
feature bits so it might still get out of sync in the future. However,
the maximum configurable speed is not used as much as the current speed
and these feature bits are not exposed through the netdev interface so
it should be easier to add more.

Use this new function instead of netdev_get_features() where the link
speed is needed.

As a consequence of this patch, link speeds of cards are properly
reported (internally in OVSDB) even if not supported by OpenFlow.
A test verifies this behavior using a tap device.

Also, in order to avoid using the old API, this patch adds a checkpatch.py
warning if the old API is used.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:32 +02:00
Mike Pattrick
3337e6d91c userspace: Enable L4 checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulating a packet with that flag, set the checksum
of the inner L4 header since offloading it is not yet supported.

Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.

Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:50:30 +02:00
Mike Pattrick
5d11c47d3e userspace: Enable IP checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulating a packet with that flag, set the checksum
of the inner IP header since offloading it is not yet supported.

Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.

Linux devices don't support IP checksum offload alone, so the
support is not enabled.
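
The software path for an egress device without IP checksum offload can be
sketched as below. This is a minimal illustration, not the OVS code: the
function names and the boolean 'egress_offloads_ip_csum' parameter are
hypothetical, and the checksum itself is the standard RFC 1071
one's-complement sum.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """RFC 1071 checksum over an IPv4 header.

    The checksum field (bytes 10-11) must be zero in the input.
    """
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        # Fold carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def finalize_ip_checksum(header: bytes, egress_offloads_ip_csum: bool) -> bytes:
    """Fill in the checksum in software only if the egress device cannot."""
    if egress_offloads_ip_csum:
        return header  # Left for the device to compute.
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    csum = ipv4_header_checksum(zeroed)
    return header[:10] + struct.pack("!H", csum) + header[12:]


# A 20-byte example header with a zeroed checksum field.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)))  # 0xb861
```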

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:49:51 +02:00
Mike Pattrick
4433cc6860 dpif-netdev: Show netdev offloading flags.
This patch modifies netdev_get_status to include information about
checksum offload status by port, allowing the user to gain insight into
where checksum offloading is active.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 15:44:57 +02:00
Eli Britstein
76ab364ea8 netdev-offload: Set 'miss_api_supported' to be under netdev.
Cited commit introduced a flag in dpif-netdev level, to optimize
performance and avoid hw_miss_packet_recover() for devices with no such
support.
However, there is a race condition between traffic processing and
assigning a 'flow_api' object to the netdev. In such case, EOPNOTSUPP is
returned by netdev_hw_miss_packet_recover() in netdev-offload.c layer
because 'flow_api' is not yet initialized. As a result, the flag is
falsely disabled, and subsequent packets won't be recovered, though they
should.

In order to fix it, move the flag to be in netdev-offload layer, to
avoid that race.

Fixes: 6e50c1651869 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-10-25 21:35:51 +02:00
Tao Liu
378b51c6b0 netdev: Clear auto_classified if netdev reopened with the type specified.
When a netdev is first opened by netdev_open(..., NULL, ...), its
netdev_class is set to system by default and auto_classified is set to true.

If the netdev is reopened by netdev_open(..., "system", ...), auto_classified
should be cleared.  This will be used in the next patch to fix a lag issue.

Fixes: 8c2c225e481d ("netdev: Fix netdev_open() to track and recreate classless interfaces")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-26 12:41:41 +02:00
Yong Xu
c2567e533f add port-based ingress policing based packet-per-second rate-limiting
OVS has support for using policing to enforce a rate limit in
kilobits per second. This is configured using OVSDB. For example:

$ ovs-vsctl set interface tap0 ingress_policing_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_burst=100

This patch adds a related feature, allowing policing to enforce a rate
limit in kilo-packets per second. This is also configured using OVSDB.

$ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100

The kilo-bit and kilo-packet rate limits may be used separately or in
combination.
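
How the two limits combine can be sketched with a pair of generic token
buckets. This is an illustrative model only, not the TC policer or the OVS
code: the class and function names are hypothetical, and unlike the chained
TC actions described below, this sketch charges a packet only when every
configured limit allows it.

```python
class TokenBucket:
    """Generic token bucket (illustrative, not the TC policer)."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now


def police(now, pkt_bits, bit_bucket=None, pkt_bucket=None):
    """Return True to pass the packet, False to drop it."""
    charges = []
    if bit_bucket is not None:
        charges.append((bit_bucket, pkt_bits))  # kilobit-based limit
    if pkt_bucket is not None:
        charges.append((pkt_bucket, 1))         # packet-based limit
    for bucket, _ in charges:
        bucket.refill(now)
    if any(bucket.tokens < cost for bucket, cost in charges):
        return False
    for bucket, cost in charges:
        bucket.tokens -= cost
    return True


# ingress_policing_rate=1000 (kbps) and ingress_policing_burst=100 (kb),
# ingress_policing_kpkts_rate=1000 and ingress_policing_kpkts_burst=100.
kbps = TokenBucket(rate_per_sec=1000 * 1000, burst=100 * 1000)
kpps = TokenBucket(rate_per_sec=1000 * 1000, burst=100 * 1000)
print(police(0.0, 12000, bit_bucket=kbps, pkt_bucket=kpps))  # True
```

Passing only one of the two buckets models using the limits separately.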

Add separate action for BPS and PPS in netlink message.

Revise the code and change the action result to pipe so that
traffic can pipe into the second action.

This patch implements the feature for:
* OVSDB (northbound API)
* TC policer when used both with and without TC offload (kernel API)

Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-07-01 20:44:07 +02:00
Ilya Maximets
48c1ab5d74 netdev: Allow storing dpif type into netdev structure.
Storing the dpif type of the owning datapath interface will allow
us to easily distinguish, for example, userspace tunneling ports from
the system ones.  This is required for HW offloading to avoid
offloading userspace flows to kernel interfaces that don't belong
to the userspace datapath but have the same dpif_port names.

Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-08 19:07:21 +02:00
William Tu
5bfc519fee netdev-afxdp: Add interrupt mode netdev class.
The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp
interrupt mode. This is similar to 'type=afxdp', except that the
is_pmd field is set to false. As a result, the packet processing
is handled by main thread, not pmd thread. This avoids burning
the CPU to always 100% when there is no traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-04-28 17:58:31 +02:00
Flavio Leitner
35b5586ba7 userspace TSO: SCTP checksum offload optional.
Ideally SCTP checksum offload needs to be advertised by the
NIC when userspace TSO is enabled. However, very few drivers
do that and it's not a widely used protocol. So, this patch
enables SCTP checksum offload if available, otherwise userspace
TSO can still be enabled but SCTP packets will be dropped on
NICs without support.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
8c5163fe81 userspace TSO: Include UDP checksum offload.
Virtio doesn't expose flags to control which protocols' checksum
offload needs to be enabled or disabled. This patch checks if the
NIC supports UDP checksum offload and activates it when TSO is enabled.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
29cf9c1b3b userspace: Add TCP Segmentation Offload support
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 22:27:25 +00:00
Yanqin Wei
3343f8d6cf netdev: use acquire-release semantics for change_seq in netdev
"rxq_enabled" of netdev is writen in the vhost thread and read by pmd
thread once it observes 'change_seq' is updated. This patch is to keep
order on aarch64 or other weak memory model CPU to ensure 'rxq_enabled' is
observed before 'change_seq'.

Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-12-02 14:48:14 -08:00
Darrell Ball
594570ea1c conntrack: Optimize recirculations.
Cache the 'conn' context and use it when it is valid.  The cached 'conn'
context will get reset if it is not expected to be valid; the cost to do
this is negligible.  Besides being most optimal, this also handles corner
cases, such as decapsulation leading to the same tuple, as in tunnel VPN
cases.  A negative test is added to check the resetting of the cached
'conn'.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-09-25 08:58:11 -07:00
William Tu
0de1b42596 netdev-afxdp: add new netdev type for AF_XDP.
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It aims to have comparable
performance to DPDK but cooperate better with the existing kernel networking
stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, bypassing a couple of the Linux kernel's subsystems.
As a result, an AF_XDP socket shows much better performance than AF_PACKET.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-19 17:42:06 +03:00
David Marchand
35c91567c8 dpif-netdev: Only poll enabled vhost queues.
We currently poll all available queues based on the max queue count
exchanged with the vhost peer and rely on the vhost library in DPDK to
check the vring status beneath.
This can lead to some overhead when we have a lot of unused queues.

To enhance the situation, we can skip the disabled queues.
On rxq notifications, we make use of the netdev's change_seq number so
that the pmd thread main loop can cache the queue state periodically.

$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 1:
  isolated : true
  port: dpdk0             queue-id:  0 (enabled)   pmd usage:  0 %
pmd thread numa_id 0 core_id 2:
  isolated : true
  port: vhost1            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost3            queue-id:  0 (enabled)   pmd usage:  0 %
pmd thread numa_id 0 core_id 15:
  isolated : true
  port: dpdk1             queue-id:  0 (enabled)   pmd usage:  0 %
pmd thread numa_id 0 core_id 16:
  isolated : true
  port: vhost0            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost2            queue-id:  0 (enabled)   pmd usage:  0 %

$ while true; do
  ovs-appctl dpif-netdev/pmd-rxq-show |awk '
  /port: / {
    tot++;
    if ($5 == "(enabled)") {
      en++;
    }
  }
  END {
    print "total: " tot ", enabled: " en
  }'
  sleep 1
done

total: 6, enabled: 2
total: 6, enabled: 2
...

 # Started vm, virtio devices are bound to kernel driver which enables
 # F_MQ + all queue pairs
total: 6, enabled: 2
total: 66, enabled: 66
...

 # Unbound vhost0 and vhost1 from the kernel driver
total: 66, enabled: 66
total: 66, enabled: 34
...

 # Configured kernel bound devices to use only 1 queue pair
total: 66, enabled: 34
total: 66, enabled: 19
total: 66, enabled: 4
...

 # While rebooting the vm
total: 66, enabled: 4
total: 66, enabled: 2
...
total: 66, enabled: 66
...

 # After shutting down the vm
total: 66, enabled: 66
total: 66, enabled: 2
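
The caching scheme described above can be sketched as follows. This is a
single-threaded illustration with hypothetical class and method names, not
the dpif-netdev code; it ignores the memory-ordering concerns of the real
multi-threaded implementation.

```python
class Netdev:
    """Hypothetical stand-in for a vhost netdev with per-queue state."""

    def __init__(self, n_rxq):
        self.rxq_enabled = [False] * n_rxq
        self.change_seq = 1

    def set_rxq_enabled(self, qid, enabled):
        # An rxq notification updates the state, then bumps change_seq.
        self.rxq_enabled[qid] = enabled
        self.change_seq += 1


class PmdPoller:
    """Caches the enabled queues and refreshes when change_seq moves."""

    def __init__(self, netdev):
        self.netdev = netdev
        self.seen_seq = 0
        self.enabled = []

    def queues_to_poll(self):
        if self.netdev.change_seq != self.seen_seq:
            self.seen_seq = self.netdev.change_seq
            self.enabled = [q for q, en in enumerate(self.netdev.rxq_enabled)
                            if en]
        return self.enabled


dev = Netdev(n_rxq=4)
pmd = PmdPoller(dev)
dev.set_rxq_enabled(0, True)
dev.set_rxq_enabled(2, True)
print(pmd.queues_to_poll())  # [0, 2]
```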

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-06-26 18:43:39 +01:00
Ilya Maximets
4f746d526d netdev-offload: Rename offload providers.
Flow API providers renamed to be consistent with parent module
'netdev-offload' and look more like each other.

'_rte_' replaced with more convenient '_dpdk_'.

We'll have following structure:

  Common code:
    lib/netdev-offload-provider.h
    lib/netdev-offload.c
    lib/netdev-offload.h

  Providers:
    lib/netdev-offload-tc.c
    lib/netdev-offload-dpdk.c

'netdev-offload-dummy' still resides inside netdev-dummy, but it
does not make much sense to move it out of there.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Ilya Maximets
b6cabb8f8f netdev: Split up netdev offloading to separate module.
New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.

Flow API providers further encapsulated.

The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Ilya Maximets
5fc5c50f3d netdev: Dynamic per-port Flow API.
Current issues with Flow API:

* OVS calls offloading functions regardless of successful
  flow API initialization. (ex. on init_flow_api failure)
* Static initialization of Flow API for a netdev_class forbids
  having different offloading types for different instances
  of netdev with the same netdev_class. (ex. different vports in
  'system' and 'netdev' datapaths at the same time)

Solution:

* Move Flow API from the netdev_class to netdev instance.
* Make Flow API dynamic, i.e. probe the APIs and choose the
  suitable one.

Side effects:

* Flow API providers are as localized as possible in their modules.
* Now we have an ability to make runtime checks. For example,
  we could check if particular device supports features we
  need, like if dpdk device supports RSS+MARK action.
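
The per-instance probing described in the solution can be sketched as
below. All names here are hypothetical and the provider's 'init' check is
reduced to a type lookup; it is an illustration of the idea, not the
netdev-offload code.

```python
import errno


class FlowApiProvider:
    """Hypothetical offload provider; init() returns 0 on success."""

    def __init__(self, name, supported_types):
        self.name = name
        self.supported_types = set(supported_types)

    def init(self, netdev):
        # A real provider could run runtime feature checks here.
        if netdev["type"] in self.supported_types:
            return 0
        return errno.EOPNOTSUPP


def assign_flow_api(netdev, providers):
    """Probe providers in order and bind the first one that initializes."""
    for provider in providers:
        if provider.init(netdev) == 0:
            netdev["flow_api"] = provider
            return 0
    netdev["flow_api"] = None
    return errno.EOPNOTSUPP


def flow_put(netdev, flow):
    """Offload calls are refused unless a Flow API was bound."""
    if netdev.get("flow_api") is None:
        return errno.EOPNOTSUPP
    return 0


providers = [FlowApiProvider("tc", ["system"]),
             FlowApiProvider("dpdk", ["dpdk"])]
port = {"name": "dpdk0", "type": "dpdk"}
assign_flow_api(port, providers)
print(port["flow_api"].name)  # dpdk
```

Keeping the binding on the netdev instance is what lets two ports of the
same class end up with different providers, or with none.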

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Ilya Maximets
f7e4685015 treewide: Clean up inclusions of netdev-dpdk header.
'netdev-dpdk.h' provides only 'netdev_dpdk_register' and
'free_dpdk_buf' which are not used in these files and should
not be used.
Leftovers from the already removed code.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-14 08:45:05 +00:00
Ilya Maximets
a47e2db209 dp-packet: Refactor offloading API.
1. No reason to have mbuf related APIs in a generic code.
2. Not only RSS/checksums should be invalidated in case of tunnel
   decapsulation or sending to 'ring' ports.

In order to fix two above issues, new function
'dp_packet_reset_offload' introduced. In order to clean up/unify
the code and simplify addition of new offloading features to non-DPDK
version of dp_packet, introduced 'ol_flags' bitmask. Additionally
reduced code complexity in 'dp_packet_clone_with_headroom' by using
already existent generic APIs.

Unfortunately, we still need to have a special case for mbuf
initialization inside 'dp_packet_init__()'.
'dp_packet_init_specific()' introduced for this purpose as a generic
API for initialization of the implementation-specific fields.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-13 09:51:30 +00:00
Daniel Alvarez
3bdf8b620b netdev: Add comment to allow removing a workaround in the future
This patch [0] in glibc fixes an issue which is right now worked around
in OVS by [1]. I'm adding a comment to indicate that from glibc 2.28
and beyond, the workaround is not needed so that we can eventually
remove it.

[0] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4
[1] 3434d30686

Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-12-12 10:49:05 -08:00
Sriharsha Basavapatna via dev
57924fc91c revalidator: Rebalance offloaded flows based on the pps rate
This is the third patch in the patch-set to support dynamic rebalancing
of offloaded flows.

The dynamic rebalancing functionality is implemented in this patch. The
ukeys that are not scheduled for deletion are obtained and passed as input
to the rebalancing routine. The rebalancing is done in the context of
revalidation leader thread, after all other revalidator threads are
done with gathering rebalancing data for flows.

For each netdev that is in OOR state, a list of flows - both offloaded
and non-offloaded (pending) - is obtained using the ukeys. For each netdev
that is in OOR state, the flows are grouped and sorted into offloaded and
pending flows.  The offloaded flows are sorted in descending order of
pps-rate, while pending flows are sorted in ascending order of pps-rate.

The rebalancing is done in two phases. In the first phase, we try to
offload all pending flows and if that succeeds, the OOR state on the device
is cleared. If some (or none) of the pending flows could not be offloaded,
then we start replacing an offloaded flow that has a lower pps-rate than
a pending flow, until there are no more pending flows with a higher rate
than an offloaded flow. The flows that are replaced from the device are
added into kernel datapath.
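
The two phases can be sketched as follows. This is a simplified model, not
the revalidator code: the function name is hypothetical, flows are plain
(id, pps) tuples, and a fixed 'capacity' stands in for the device returning
ENOSPC.

```python
def rebalance(offloaded, pending, capacity):
    """Return (offloaded, pending, still_oor) after rebalancing.

    offloaded, pending: lists of (flow_id, pps_rate) tuples.
    capacity: hypothetical number of flows the device can hold.
    """
    offloaded = sorted(offloaded, key=lambda f: f[1], reverse=True)
    pending = sorted(pending, key=lambda f: f[1])

    # Phase 1: try to offload every pending flow.
    left = []
    for flow in pending:
        if len(offloaded) < capacity:
            offloaded.append(flow)
        else:
            left.append(flow)
    if not left:
        return offloaded, [], False  # OOR state cleared

    # Phase 2: replace the slowest offloaded flow with the fastest pending
    # flow while the pending one has the higher rate.
    offloaded.sort(key=lambda f: f[1], reverse=True)
    while left and offloaded and left[-1][1] > offloaded[-1][1]:
        evicted = offloaded.pop()
        promoted = left.pop()
        offloaded.append(promoted)
        offloaded.sort(key=lambda f: f[1], reverse=True)
        left.append(evicted)  # goes back to the kernel datapath
        left.sort(key=lambda f: f[1])
    return offloaded, left, True


hw, sw, oor = rebalance([("a", 100), ("b", 10)],
                        [("c", 50), ("d", 5)], capacity=2)
print(hw, sw, oor)  # [('a', 100), ('c', 50)] [('d', 5), ('b', 10)] True
```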

A new OVS configuration parameter "offload-rebalance", is added to ovsdb.
The default value of this is "false". To enable this feature, set the
value of this parameter to "true", which provides packets-per-second
rate based policy to dynamically offload and un-offload flows.

Note: This option can be enabled only when 'hw-offload' policy is enabled.
It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow
offload errors (specifically ENOSPC error this feature depends on) reported
by an offloaded device are suppressed by TC-Flower kernel module.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-10-19 11:27:52 +02:00
Sriharsha Basavapatna via dev
738c785ff1 dpif-netlink: Detect Out-Of-Resource condition on a netdev
This is the first patch in the patch-set to support dynamic rebalancing
of offloaded flows.

The patch detects OOR condition on a netdev port when ENOSPC error is
returned by TC-Flower while adding a flow rule. A new structure is added
to the netdev called "netdev_hw_info", to store OOR related information
required to perform dynamic offload-rebalancing.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-10-19 11:27:45 +02:00
Ben Pfaff
63cf14cd7a netdev: Properly clear 'details' when iterating in NETDEV_QOS_FOR_EACH.
The function comment for netdev_queue_dump_next() said that it cleared its
'detail' argument, but it didn't actually do that, which meant that details
could be incorrectly carried along from one queue to the next.

Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2018-10-03 14:08:17 -07:00
Daniel Alvarez
3434d30686 netdev: Retry getting interfaces on inconsistent dumps from kernel
This patch in glibc [0] is fixing a bug where we may be getting
inconsistent dumps from the kernel when listing interfaces due to
a race condition.

This could happen if we try to retrieve them while interfaces are
being added/removed from the system at the same time.
For systems running against old glibc versions, this patch is retrying
the operation up to 3 times and then proceeding by logging a
warning.

Note that 3 times should be enough to not delay the operation much
and since it's unlikely that we hit the race condition 3 times in
a row. Still, if this happened, this patch is not changing the
current behavior.
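
The retry policy can be sketched as below. This is an illustration only:
the 'dump' callback and its (interfaces, consistent) return value are
hypothetical stand-ins for however the inconsistent dump is detected; only
the limit of 3 attempts and the final warning come from the text.

```python
import logging

log = logging.getLogger("netdev")
MAX_TRIES = 3  # retry limit stated in the commit message


def get_interfaces(dump):
    """Call dump() up to MAX_TRIES times until it reports a consistent list.

    dump() is a hypothetical callable returning (interfaces, consistent).
    """
    interfaces = []
    for _ in range(MAX_TRIES):
        interfaces, consistent = dump()
        if consistent:
            return interfaces
    # Same behavior as before the patch: proceed, but leave a trace.
    log.warning("interface dump still inconsistent after %d tries",
                MAX_TRIES)
    return interfaces


results = iter([(["eth0"], False), (["eth0", "eth1"], True)])
print(get_interfaces(lambda: next(results)))  # ['eth0', 'eth1']
```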

[0] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4

Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Co-authored-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-08-15 13:38:09 -07:00
John Hurley
88dcf2aa82 netdev-provider: add class op to get block_id
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.

Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:51:47 +02:00
Gavi Teitz
d63ca5329f dpctl: Properly reflect a rule's offloaded to HW state
Previously, any rule that is offloaded via a netdev, not necessarily
to the HW, would be reported as "offloaded". This patch fixes this
misalignment, and introduces the 'dp' state, as follows:

rule is in HW via TC offload  -> offloaded=yes dp:tc
rule is not in HW over TC DP  -> offloaded=no  dp:tc
rule is not in HW over OVS DP -> offloaded=no  dp:ovs

To achieve this, the flow's 'offloaded' flag was encapsulated in a new
attrs struct, which contains the offloaded state of the flow and the
DP layer the flow is handled in, and instead of setting the flow's
'offloaded' state based solely on the type of dump it was acquired
via, for netdev flows it now sends the new attrs struct to be
collected along with the rest of the flow via the netdev, allowing
it to be set per flow.

For TC offloads, the offloaded state is set based on the 'in_hw' and
'not_in_hw' flags received from the TC as part of the flower. If no
such flag was received, due to lack of kernel support, it defaults
to true.
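
The resulting mapping can be sketched as a small function. The function
name and its parameters are hypothetical; only the three output states and
the default of true for TC flows without either flag come from the text.

```python
def flow_state(dp_layer, in_hw=False, not_in_hw=False):
    """Return the reported state string for a flow.

    dp_layer: "tc" or "ovs". in_hw / not_in_hw model the flags received
    from TC; both False models a kernel that reports neither flag.
    """
    if dp_layer != "tc":
        offloaded = False
    elif in_hw:
        offloaded = True
    elif not_in_hw:
        offloaded = False
    else:
        offloaded = True  # no flag received: defaults to true
    return "offloaded=%s dp:%s" % ("yes" if offloaded else "no", dp_layer)


print(flow_state("tc", in_hw=True))      # offloaded=yes dp:tc
print(flow_state("tc", not_in_hw=True))  # offloaded=no dp:tc
print(flow_state("ovs"))                 # offloaded=no dp:ovs
```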

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
[simon: resolved conflict in lib/dpctl.man]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-18 09:57:37 +02:00
William Tu
754f8acb45 netdev-native-tnl: refactor the tunnel push header.
The patch adds additional 'struct netdev *' to the
native tunnel's push_header() interface.  This is used
for later GRE sequence number support.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-21 20:33:30 -07:00
Jan Scheurich
8492adc270 netdev: Add optional qfill output parameter to rxq_recv()
If the caller provides a non-NULL qfill pointer and the netdev
implementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantially slow down the recv() operation.

A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.

This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Darrell Ball
762ceb66b2 netdev: If MTU set fails, issue warn log.
Recently, an issue was debugged that was thought to be a bond
failover triggered issue.  It turned out to be a VLAN interface MTU set issue
that had nothing to do with bonding or most other likely possibilities.
Besides the effect of not setting the MTU to the desired value, this can
result in increased netlink traffic and processing with associated wasted
work. Let us flag a configuration issue at warn level (rather than dbg) to
catch the problem early.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-04-18 11:09:58 -07:00
Ben Pfaff
d2a60e57a8 netdev: Fix typos in comment.
Fixes: ee4776b8bce1 ("netdev: New function netdev_get_ip_by_name().")
Suggested-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-04-17 08:33:41 -07:00
Ben Pfaff
ee4776b8bc netdev: New function netdev_get_ip_by_name().
This is like netdev_get_in4_by_name() but accepts any IP address instead
of just an IPv4 address.

It will acquire its first user in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
2018-04-16 14:53:27 -07:00
Ben Pfaff
93b7faf1c7 socket-util: Add more functions for IPv[46] sockaddr and sockaddr_storage.
The existing functions for working with sockaddr_storage that contain an
IPv4 or IPv6 address are useful.  This commit adds more functions for
working with them, as well as a parallel set of functions for struct
sockaddr.

This also adds an initial user for some of the new sockaddr functions in
netdev.c.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
2018-04-16 14:53:27 -07:00
Ben Pfaff
dfc77282c5 ofp-print: Move much of the printing code into message-specific files.
Until now, the ofp-print code has had a lot of logic specific to
individual messages.  This code is better put with the other code specific
to those messages, so this commit starts to migrate it.

There is more work of a similar type to do, but this is a reasonable start.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2018-03-14 11:41:22 -07:00
Justin Pettit
e883448e3f dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-02-28 14:53:27 -08:00
Michal Weglicki
971f4b394c netdev: Custom statistics.
- A new get_custom_stats interface function is added to netdev. It
  allows a particular netdev implementation to expose custom
  counters in dictionary format (counter name/counter value).
- The new statistics are retrieved using experimenter code and
  are printed in the output of ofctl dump-ports.
- The new counters are available for OpenFlow 1.4+.
- The new statistics are printed by ofctl only if they are
  present in the reply message.
- The new statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for the dpdk-physical
  port type.
- The DPDK-physical implementation uses xstats to collect statistics.
  Only dropped and error counters are exposed.

Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-10 15:29:13 -08:00
Ben Pfaff
34944e81f0 Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD
2018-01-02 07:45:17 -08:00
Ben Pfaff
b2befd5bb2 sparse: Add guards to prevent FreeBSD-incompatible #include order.
FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and
that <netinet/in.h> be included before <arpa/inet.h>.  This adds guards to
the "sparse" headers to yield a warning if this order is violated.  This
commit also adjusts the order of many #includes to suit this requirement.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2017-12-22 12:58:02 -08:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
No longer needed because 'may_steal' is already handled in the
dpif-netdev layer and is always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Yifeng Sun
59b1e023ae netdev: netdev_get_etheraddr is not functioning as advertised.
netdev_get_etheraddr claims to clear 'mac' on error, but it fails to do so.
When looking further into both netdev_windows_get_etheraddr() and
netdev_linux_get_etheraddr(), 'mac' is also not cleared. This will lead to
usage of uninitialised ofputil_phy_port.hw_addr.

v1 -> v2: fixed a bug in v1 found by Ben, thanks Ben.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-30 13:36:29 -08:00
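The contract described in the netdev_get_etheraddr commit above (clear 'mac' on error) can be sketched as a wrapper that zeroes the output whenever the backend fails. This is a minimal illustration under assumed names: example_eth_addr, example_backend_get_etheraddr, and example_get_etheraddr are hypothetical and do not reproduce the OVS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an Ethernet address type. */
struct example_eth_addr {
    uint8_t ea[6];
};

/* Hypothetical backend: on failure it returns an errno value and leaves
 * 'mac' untouched, which is the situation the commit describes. */
static int
example_backend_get_etheraddr(bool fail, struct example_eth_addr *mac)
{
    static const uint8_t addr[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

    if (fail) {
        return ENODEV;
    }
    memcpy(mac->ea, addr, sizeof mac->ea);
    return 0;
}

/* Wrapper that enforces the documented contract: on error, 'mac' is
 * cleared so that callers never read stale or uninitialized bytes. */
static int
example_get_etheraddr(bool fail, struct example_eth_addr *mac)
{
    int error;

    assert(mac != NULL);
    error = example_backend_get_etheraddr(fail, mac);
    if (error) {
        memset(mac, 0, sizeof *mac);
    }
    return error;
}
```

Clearing in the common wrapper rather than in each backend means a new backend cannot forget to do it.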
Ben Pfaff
fa54741ea5 netdev: Eliminate redundant ifindex mapping.
Until now, the code for mapping ODP port number to ifindexes and vice versa
has maintained two completely separate data structures, one for each
direction.  It was possible for the two mappings to become out of sync
with each other since either one could change independently.  This commit
merges them into a single data structure (with two indexes), which at least
means that if one is removed then the other is as well.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
2017-11-15 10:57:49 -08:00
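The idea in the ifindex-mapping commit above, one data structure reachable through two indexes, can be sketched as a node that carries both keys and is linked into two bucket arrays. This is a simplified illustration using plain linked lists rather than the OVS hmap; every name here (example_port, example_port_map, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_N_BUCKETS 16

/* One node holds both keys and is linked into both indexes, so removing
 * it from one direction necessarily removes it from the other. */
struct example_port {
    int port_no;
    int ifindex;
    struct example_port *next_by_port;
    struct example_port *next_by_ifindex;
};

struct example_port_map {
    struct example_port *by_port[EXAMPLE_N_BUCKETS];
    struct example_port *by_ifindex[EXAMPLE_N_BUCKETS];
};

static size_t
example_bucket(int key)
{
    return (unsigned int) key % EXAMPLE_N_BUCKETS;
}

static struct example_port *
example_lookup_by_port(const struct example_port_map *map, int port_no)
{
    struct example_port *p = map->by_port[example_bucket(port_no)];

    while (p && p->port_no != port_no) {
        p = p->next_by_port;
    }
    return p;
}

static struct example_port *
example_lookup_by_ifindex(const struct example_port_map *map, int ifindex)
{
    struct example_port *p = map->by_ifindex[example_bucket(ifindex)];

    while (p && p->ifindex != ifindex) {
        p = p->next_by_ifindex;
    }
    return p;
}

/* Returns 0 on success, -1 if either key is already present or if
 * allocation fails. */
static int
example_insert(struct example_port_map *map, int port_no, int ifindex)
{
    struct example_port **head;
    struct example_port *node;

    if (example_lookup_by_port(map, port_no)
        || example_lookup_by_ifindex(map, ifindex)) {
        return -1;
    }
    node = malloc(sizeof *node);
    if (!node) {
        return -1;
    }
    node->port_no = port_no;
    node->ifindex = ifindex;

    head = &map->by_port[example_bucket(port_no)];
    node->next_by_port = *head;
    *head = node;

    head = &map->by_ifindex[example_bucket(ifindex)];
    node->next_by_ifindex = *head;
    *head = node;
    return 0;
}

/* Unlinks the node from both indexes and frees it.  Returns 0 on success,
 * -1 if 'port_no' is not present. */
static int
example_remove(struct example_port_map *map, int port_no)
{
    struct example_port *node = example_lookup_by_port(map, port_no);
    struct example_port **pp;

    if (!node) {
        return -1;
    }
    /* Both indexes must agree on which node owns this ifindex. */
    assert(example_lookup_by_ifindex(map, node->ifindex) == node);

    pp = &map->by_port[example_bucket(node->port_no)];
    while (*pp != node) {
        pp = &(*pp)->next_by_port;
    }
    *pp = node->next_by_port;

    pp = &map->by_ifindex[example_bucket(node->ifindex)];
    while (*pp != node) {
        pp = &(*pp)->next_by_ifindex;
    }
    *pp = node->next_by_ifindex;

    free(node);
    return 0;
}
```

Because there is only one removal routine and it unlinks the node from both lists, a stale entry cannot survive in one direction after the other has been removed.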
Ben Pfaff
8639555322 netdev: Indentation and style fixes.
White space changes only.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-11-15 10:57:30 -08:00
Ben Pfaff
0d8efdc9ca netdev: Change macro to function.
There was no reason that this should have been a macro.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-11-14 10:14:18 -08:00
Ashish Varma
97459c2f01 netdev, dpif: fix the crash/assert on port delete
A crash is seen in "netdev_ports_remove" when an interface that is part of a
bridge configuration is deleted from the system and then added back.
Steps to reproduce:
  create a tap0 interface using "ip tuntap add.."
  add the tap0 interface to br0 using "ovs-vsctl add-port.."
  delete the tap0 interface from system using "ip tuntap del.."
  add the tap0 interface back in system using "ip tuntap add.."
                       (this changes the ifindex of the interface)
  delete tap0 from br0 using "ovs-vsctl del-port.."

In the function "netdev_ports_insert", two hmap entries are created, mapping
"portnum -> netdev" and "ifindex -> portnum".
When the interface was deleted from the system, "netdev_ports_remove" was not
called, so the old ifindex entry was never cleaned up from the
"ifindex_to_port" hmap.

As part of the fix, this adds the function "dpif_port_remove", which calls
"netdev_ports_remove" in the path where deletion of the interface from the
system is detected.
In addition, "netdev_ports_remove" now frees the "ifindex_to_port_data"
(ifindex -> portnum map node) when the ifindex is no longer available because
the interface has already been deleted.

VMware-BZ: #1975788
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-13 11:05:31 -08:00
Ilya Maximets
14f137ba1f netdev: Remove EOPNOTSUPP related comment for netdev_send().
Since 57eebbb4c315, the caller must make sure that 'netdev' supports
sending. This is mentioned at the start of the comment.

Fixes: 57eebbb4c315 ("dpif-netdev: Don't try to output on a device without txqs.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-11-09 14:11:30 -08:00