mirror of https://github.com/openvswitch/ovs synced 2025-08-23 02:17:42 +00:00

351 Commits

Author SHA1 Message Date
Flavio Leitner
8c5163fe81 userspace TSO: Include UDP checksum offload.
Virtio doesn't expose flags to control which protocols checksum
offload needs to be enabled or disabled. This patch checks if the
NIC supports UDP checksum offload and activates it when TSO is enabled.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
73858f9dbe netdev-linux: Prepend the std packet in the TSO packet
Usually TSO packets are close to 50k or 60k bytes long, so to
copy fewer bytes when receiving a packet from the kernel,
change the approach. Instead of extending the MTU-sized
packet received and appending the remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-02-06 11:37:23 -08:00
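The headroom approach described in 73858f9dbe can be illustrated with a minimal sketch. The struct hbuf type and its helpers are hypothetical names made up for this example; they are not the OVS dp_packet API. The idea shown is only that the large payload is written once into a buffer with spare room in front, and the small MTU-sized part is then copied into that spare room.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with reserved headroom (not the OVS dp_packet type). */
struct hbuf {
    unsigned char *base;   /* Start of the allocation. */
    size_t allocated;      /* Total bytes allocated. */
    size_t data_ofs;       /* Offset of the first valid byte. */
    size_t size;           /* Number of valid bytes. */
};

/* Allocates room for 'tailroom' bytes of data preceded by 'headroom' spare
 * bytes.  Returns 0 on success, -1 if the allocation fails. */
int
hbuf_init(struct hbuf *b, size_t headroom, size_t tailroom)
{
    b->base = malloc(headroom + tailroom);
    if (!b->base) {
        return -1;
    }
    b->allocated = headroom + tailroom;
    b->data_ofs = headroom;
    b->size = 0;
    return 0;
}

/* Appends 'n' bytes after the current data (the large payload goes here). */
void
hbuf_put(struct hbuf *b, const void *p, size_t n)
{
    assert(b->data_ofs + b->size + n <= b->allocated);
    memcpy(b->base + b->data_ofs + b->size, p, n);
    b->size += n;
}

/* Copies 'n' bytes into the headroom, directly in front of the current
 * data.  Only these 'n' bytes are copied; the payload is not moved. */
void
hbuf_push(struct hbuf *b, const void *p, size_t n)
{
    assert(n <= b->data_ofs);
    b->data_ofs -= n;
    b->size += n;
    memcpy(b->base + b->data_ofs, p, n);
}

/* Returns a pointer to the first valid byte. */
const unsigned char *
hbuf_data(const struct hbuf *b)
{
    return b->base + b->data_ofs;
}
```

With this layout the cost of joining the two parts is proportional to the small part rather than to the 50k to 60k byte payload.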
William Tu
105cf8df82 netdev-linux: Detect numa node id.
The patch detects the numa node id from the name of the netdev,
by reading the '/sys/class/net/<devname>/device/numa_node'.
If not available, ex: virtual device, or any error happens,
return numa id 0.  Currently only the afxdp netdev type uses it,
other linux netdev types are disabled due to no use case.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-18 01:42:22 +01:00
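A minimal sketch of the lookup described in 105cf8df82, assuming only the sysfs path quoted in the commit message. The function names are hypothetical and this is not the code from the patch. Treating a negative value as "not available" is an assumption based on sysfs commonly reporting -1 for devices without NUMA affinity.

```c
#include <assert.h>
#include <stdio.h>

/* Builds the sysfs path quoted in the commit message for 'devname'.
 * Returns 0 on success, -1 if 'buf' is too small. */
int
numa_node_path(char *buf, size_t size, const char *devname)
{
    int n;

    assert(buf != NULL && devname != NULL);
    n = snprintf(buf, size, "/sys/class/net/%s/device/numa_node", devname);
    return n >= 0 && (size_t) n < size ? 0 : -1;
}

/* Reads a NUMA node id from 'path'.  Falls back to 0 if the file is
 * missing (e.g. a virtual device), cannot be parsed, or holds a negative
 * value. */
int
numa_node_from_file(const char *path)
{
    FILE *f;
    int node;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f) {
        return 0;
    }
    if (fscanf(f, "%d", &node) != 1 || node < 0) {
        node = 0;
    }
    fclose(f);
    return node;
}
```

A caller would first build the path with numa_node_path() and then pass it to numa_node_from_file(); every failure along the way collapses to node 0, matching the fallback the commit message describes.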
Flavio Leitner
29cf9c1b3b userspace: Add TCP Segmentation Offload support
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 22:27:25 +00:00
Yi Yang
2109841b79 Use batch process recv for tap and raw socket in netdev datapath
Currently, netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
each receive only a single packet per call, which is very inefficient.
My test case adds two tap ports or veth ports to an OVS bridge
(datapath_type=netdev) and uses iperf3 to measure throughput
between the two ports (which are placed in different network namespaces).

The result is as below:

  tap:  295 Mbits/sec
  veth: 207 Mbits/sec

After I changed netdev_linux_rxq_recv_tap and
netdev_linux_rxq_recv_sock to use batch processing, the performance
was boosted by about 7 times. Here is the result:

  tap:  1.96 Gbits/sec
  veth: 1.47 Gbits/sec

Undoubtedly this is a huge improvement although it can't match
OVS kernel datapath yet.

FYI: here is the result for OVS kernel datapath:

  tap:  37.2 Gbits/sec
  veth: 36.3 Gbits/sec

Note: performance results depend heavily on the test machine,
so you shouldn't expect the same results on yours.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-01-09 09:48:49 -08:00
Paul Blakey
acdd544c4c tc: Introduce tcf_id to specify a tc filter
Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
William Tu
7bf075d95a netdev-afxdp: Enable libbpf logging to OVS.
libbpf has pr_warn, pr_info, and pr_debug. The patch registers
these print functions, integrating the libbpf logs to OVS log.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2019-11-21 09:20:10 -08:00
Eelco Chaudron
52b5a5c0a3 netdev-afxdp: add afxdp specific maximum MTU check
Drivers natively supporting AF_XDP will check that a configured MTU size
will not exceed the allowed size for AF_XDP. However, when the skb
compatibility mode is used there is no check and any value is accepted.
This, for example, is the case when using the TAP interface.

This fix adds a check to make sure only AF_XDP valid values are accepted.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2019-11-19 11:20:49 -08:00
Ilya Maximets
d560bc1baa netdev-afxdp: Convert AFXDP_DEBUG to custom stats.
These are valid statistics of a network interface and should be
exposed via custom stats.

The same MACRO trick as in vswitchd/bridge.c is used to reduce code
duplication and easily add new stats if necessary in the future.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-24 19:22:05 +03:00
Ilya Maximets
f627cf1dd9 netdev-afxdp: Fix use of unconfigured device.
In case of failure of 'xsk_configure_all()', 'n_rxq' and 'xdpmode'
will remain in a new state. This will result in successful
reconfiguration (immediate return, because configuration is already
applied) if 'netdev_reconfigure()' will be called again.

Same issue was fixed previously for netdev-dpdk using 'dev->started'
flag in commit:
606f66507250 ("netdev-dpdk: Don't use PMD driver if not configured successfully")

Let's use similar approach with checking the 'dev->xsks' which only
exists if configuration was successful.

Additionally implemented 'netdev_afxdp_construct()' function to
explicitly initialize all the specific fields and request the
reconfiguration.

CC: William Tu <u9012063@gmail.com>
Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-23 10:35:29 +03:00
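The failure mode described in f627cf1dd9 can be sketched as follows. All names here (struct toy_afxdp, toy_reconfigure() and the two configure stand-ins) are hypothetical and only echo the fields mentioned in the commit message. The point illustrated is the early-return test: "already applied" is only trusted when the object created by a successful configure step exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state; field names echo the commit message only. */
struct toy_afxdp {
    int n_rxq;             /* Value written by the last reconfigure attempt. */
    int requested_n_rxq;   /* Value the user asked for. */
    void *xsks;            /* Non-NULL only after a successful configure. */
    int attempts;          /* Counts configure attempts, for illustration. */
};

/* Stand-in for a configure step that fails and leaves 'xsks' unset. */
int
toy_configure_fail(struct toy_afxdp *dev)
{
    dev->attempts++;
    dev->xsks = NULL;
    return -1;
}

/* Stand-in for a configure step that succeeds. */
int
toy_configure_ok(struct toy_afxdp *dev)
{
    static int sockets;    /* Dummy object for 'xsks' to point at. */

    dev->attempts++;
    dev->xsks = &sockets;
    return 0;
}

/* Applies the requested configuration.  The immediate return is taken only
 * if the settings match AND a previous configure succeeded; without the
 * 'xsks' test, a failed attempt would look like an applied configuration
 * on the next call. */
int
toy_reconfigure(struct toy_afxdp *dev, int (*configure)(struct toy_afxdp *))
{
    assert(dev != NULL && configure != NULL);
    if (dev->xsks && dev->n_rxq == dev->requested_n_rxq) {
        return 0;
    }
    dev->n_rxq = dev->requested_n_rxq;
    return configure(dev);
}
```

Calling toy_reconfigure() twice with a failing configure step runs the step twice, while calling it twice with a succeeding one runs it once.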
William Tu
0de1b42596 netdev-afxdp: add new netdev type for AF_XDP.
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It aims to have comparable
performance to DPDK but cooperate better with the existing kernel's networking
stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, bypassing a couple of the Linux kernel's subsystems.
As a result, an AF_XDP socket shows much better performance than AF_PACKET.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-19 17:42:06 +03:00
Ilya Maximets
5fc5c50f3d netdev: Dynamic per-port Flow API.
Current issues with Flow API:

* OVS calls offloading functions regardless of successful
  flow API initialization. (ex. on init_flow_api failure)
* Static initialization of Flow API for a netdev_class forbids
  having different offloading types for different instances
  of netdev with the same netdev_class. (ex. different vports in
  'system' and 'netdev' datapaths at the same time)

Solution:

* Move Flow API from the netdev_class to netdev instance.
* Make Flow API dynamic, i.e. probe the APIs and choose the
  suitable one.

Side effects:

* Flow API providers are localized as much as possible in their modules.
* Now we have an ability to make runtime checks. For example,
  we could check if particular device supports features we
  need, like if dpdk device supports RSS+MARK action.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
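The "probe the APIs and choose the suitable one" idea from 5fc5c50f3d can be sketched like this. Only the name init_flow_api comes from the commit message; the toy_ types, the provider functions and the is_dpdk flag are assumptions made up for illustration and do not reflect the real OVS structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_netdev;

/* Hypothetical provider table entry. */
struct toy_flow_api {
    const char *type;
    int (*init_flow_api)(struct toy_netdev *);   /* Returns 0 on success. */
};

/* Hypothetical netdev instance; the chosen API is stored per instance
 * rather than per class. */
struct toy_netdev {
    const char *name;
    int is_dpdk;                           /* Stand-in for a capability. */
    const struct toy_flow_api *flow_api;   /* NULL if no provider fits. */
};

/* Example providers: each one accepts only the devices it can handle. */
int
toy_tc_init(struct toy_netdev *dev)
{
    return dev->is_dpdk ? -1 : 0;
}

int
toy_dpdk_init(struct toy_netdev *dev)
{
    return dev->is_dpdk ? 0 : -1;
}

/* Probes each provider in order and binds the first whose init succeeds.
 * Returns 0 if a provider was bound, -1 otherwise.  On failure 'flow_api'
 * stays NULL, so callers can skip offload calls for this device. */
int
toy_assign_flow_api(struct toy_netdev *dev,
                    const struct toy_flow_api *apis, size_t n)
{
    size_t i;

    assert(dev != NULL);
    dev->flow_api = NULL;
    for (i = 0; i < n; i++) {
        if (!apis[i].init_flow_api(dev)) {
            dev->flow_api = &apis[i];
            return 0;
        }
    }
    return -1;
}
```

Because the binding lives in the instance, two devices of the same class can end up with different providers, and a device whose probing failed never has offload functions called on it.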
Tonghao Zhang
718be50dae netdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload
When tc-offload is enabled, we should add coverage counters for netdev_set_policing.

Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Cc: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-22 10:06:41 -07:00
Flavio Leitner
23fa50f64b netlink linux: fix to append the netnsid netlink attr.
The attribute was being prepended to the netlink buffer, but
the function  nl_sock_transact_multiple__() expects to find the
netlink header as first to update the length, seq and pid fields.

This patch fixes to append the attribute instead of prepending it.

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-16 15:48:59 -07:00
Flavio Leitner
b43762a5ad netlink linux: account for the netnsid netlink attr.
The buffer needs to be reallocated and data copied when
the netnsid netlink attribute is included, so avoid that
by accounting the attribute when the buffer is initially
allocated.

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-16 15:48:50 -07:00
John Hurley
608ff46aaf ovs-tc: offload datapath rules matching on internal ports
Rules applied to OvS internal ports are not represented in TC datapaths.
However, it is possible to support rules matching on internal ports in TC.
The start_xmit ndo of OvS internal ports directs packets back into the OvS
kernel datapath where they are rematched with the ingress port now being
that of the internal port. Due to this, rules matching on an internal port
can be added as TC filters to an egress qdisc for these ports.

Allow rules applied to internal ports to be offloaded to TC as egress
filters. Rules redirecting to an internal port are also offloaded. These
are supported by the redirect ingress functionality applied in an earlier
patch.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-04-10 13:55:59 +02:00
John Hurley
95255018a8 ovs-tc: allow offloading TC rules to egress qdiscs
Offloading rules to a TC datapath only allows the creating of ingress hook
qdiscs and the application of filters to these. However, there may be
certain situations where an egress qdisc is more applicable (e.g. when
offloading to TC rules applied to OvS internal ports).

Extend the TC API in OvS to allow the creation of egress qdiscs and to add
or interact with flower filters applied to these.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-04-09 17:34:07 +02:00
Sharon K
2f564bb153 netdev-linux: netem QoS support
Signed-off-by: Sharon Krendel <thekafkaf@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-03-14 19:20:19 -07:00
Roi Dayan
cae643534e netdev-linux: Remove ingress qdisc before trying to add shared block
Adding a shared ingress block when an ingress qdisc already exists results
in a failure. So remove the ingress qdisc first.
Also, while at it, log the slave name.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-03-12 10:18:29 +01:00
Pieter Jansen van Vuuren
e7f6ba220e lib/tc: add ingress ratelimiting support for tc-offload
Firstly this patch introduces the notion of reserved priority, as the
filter implementing ingress policing would require the highest priority.
Secondly it allows setting rate limiters while tc-offloads has been
enabled. Lastly it installs a matchall filter that matches all traffic
and then applies a police action, when configuring an ingress rate
limiter.

An example of what to expect:

OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100

Resulting TC filter:
filter protocol ip pref 1 matchall chain 0
filter protocol ip pref 1 matchall chain 0 handle 0x1
  not_in_hw
        action order 1:  police 0x1 rate 5Mbit burst 125Kb mtu 64Kb action drop/continue overhead 0b
        ref 1 bind 1 installed 3 sec used 3 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.200 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    60.13       4.49

ovs-vsctl list interface <netdev_name>
_uuid               : 2ca774e8-8b95-430f-a2c2-f8f742613ab1
admin_state         : up
...
ingress_policing_burst: 100
ingress_policing_rate: 5000
...
type                : ""

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-03-04 17:22:34 +01:00
Ben Pfaff
61265c03f0 netdev-linux: Fix function argument order in sfq_tc_load().
sfq_install__() takes quantum before perturb.

Acked-by: Justin Pettit <jpettti@ovn.org>
Reported-by: shaoke xi <xishaoke.xsk@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-01-17 16:31:44 -08:00
Ben Pfaff
64ed99ffbc netdev-linux: Don't include <net/if_packet.h>.
This header only defines sockaddr_pkt, which this source file doesn't use.

This was the only user of net/if_packet.h, so also remove the
configure-time test for it (which netdev-linux wasn't using anyway).

Reported-by: Andre McCurdy <armccurdy@gmail.com>
Reported-at: https://github.com/openvswitch/ovs/pull/253
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-10-03 16:55:55 -07:00
Andre McCurdy
b24751fff8 netdev-linux: use unsigned int for ifi_flags temporary variables
ifi_flags in struct netdev_linux is an unsigned int, therefore use
unsigned int for variables which will hold ifi_flags values.

Signed-off-by: Andre McCurdy <armccurdy@gmail.com>
2018-10-02 15:39:35 -07:00
Ben Pfaff
89c09c1cd1 netdev: Clean up class initialization.
The macros are hard to read.  This makes it a little more readable.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-27 17:48:23 +01:00
Ben Pfaff
1bab4901c4 netdev-linux: Avoid division by 0 if kernel reports bad scheduler data.
If the kernel reported a value of 0 for the second value in
/proc/net/psched, it would cause a division-by-zero fault in
read_psched().  I don't know of a kernel that would actually do that, but
it's still better to be safe.

Found by clang static analyzer.

Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-08-20 09:30:11 -07:00
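The guard described in 1bab4901c4 can be sketched on a single line of text. The commit message only says that a zero in the second field would cause a division by zero; the four-hex-field layout and the a * c / b formula below are assumptions for illustration, and ticks_per_second() is a hypothetical name, not the read_psched() code itself.

```c
#include <assert.h>
#include <stdio.h>

/* Parses four hexadecimal fields from 'line' and stores a * c / b in
 * '*ticks'.  Returns 0 on success, or -1 if the line cannot be parsed or
 * the second field is 0 (which would otherwise divide by zero). */
int
ticks_per_second(const char *line, double *ticks)
{
    unsigned int a, b, c, d;

    assert(line != NULL && ticks != NULL);
    if (sscanf(line, "%x %x %x %x", &a, &b, &c, &d) != 4) {
        return -1;
    }
    if (b == 0) {
        return -1;
    }
    *ticks = (double) a * c / b;
    return 0;
}
```

On failure the caller is expected to fall back to some default instead of using '*ticks', which is left untouched.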
Tiago Lam
e3b5d7c536 netdev-linux: Fix segfault in update_lag().
A bisect shows that commit d22f892 ("netdev-linux: monitor and offload
LAG slaves to TC") introduced netdev_linux_update_lag(), which is now
triggering a crash in the "datapath - ping over bond" test in
system-userspace-testsuite:

  (gdb) bt
  #0  0x00000000009762e7 in netdev_linux_update_lag (change=0x7ffdff013750) at lib/netdev-linux.c:728
  728                 if (is_netdev_linux_class(master_netdev->netdev_class)) {

This fixes the crash by simply returning in case netdev_from_name()
returns NULL, as this should indicate the master is not attached to the
bridge.

Additionally, netdev_linux_update_lag() isn't "clearing" the netdev
reference it gets from netdev_from_name(), meaning its ref_cnt is
incremented but never decremented. Thus, also call netdev_close() before
returning.

CC: John Hurley <john.hurley@netronome.com>
Fixes: d22f8927 ("netdev-linux: monitor and offload LAG slaves to TC")
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-07-05 14:05:21 -07:00
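Both fixes in e3b5d7c536 (return early when the lookup yields NULL, and release the reference before returning) follow a common pattern that can be sketched with a toy reference-counted table. struct toy_dev and the toy_ functions are hypothetical stand-ins; they are not netdev_from_name() or netdev_close().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference-counted device entry. */
struct toy_dev {
    const char *name;
    int ref_cnt;
    int is_linux;
};

/* Looks up 'name' and takes a reference.  Returns NULL if the device is
 * not present, e.g. because it is not attached to the bridge. */
struct toy_dev *
toy_dev_from_name(struct toy_dev *devs, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!strcmp(devs[i].name, name)) {
            devs[i].ref_cnt++;
            return &devs[i];
        }
    }
    return NULL;
}

/* Drops a reference taken by toy_dev_from_name(). */
void
toy_dev_close(struct toy_dev *dev)
{
    if (dev) {
        assert(dev->ref_cnt > 0);
        dev->ref_cnt--;
    }
}

/* Returns 1 if the named master exists and is a "linux" device, else 0. */
int
toy_master_is_linux(struct toy_dev *devs, size_t n, const char *master)
{
    struct toy_dev *dev = toy_dev_from_name(devs, n, master);
    int result;

    if (!dev) {
        return 0;          /* Nothing to dereference, nothing to release. */
    }
    result = dev->is_linux;
    toy_dev_close(dev);    /* Balance the reference taken by the lookup. */
    return result;
}
```

After any number of calls the reference count of an existing entry is unchanged, and a missing entry is never dereferenced.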
John Hurley
d22f8927c3 netdev-linux: monitor and offload LAG slaves to TC
A LAG slave cannot be added directly to an OvS bridge, nor can an OvS
bridge port be added to a LAG dev. However, LAG masters can be added to
OvS.

Use TC blocks to indirectly offload slaves when their master is attached
as a linux-netdev to an OvS bridge. In the kernel TC datapath, blocks link
together netdevs in a similar way to LAG devices. For example, if a filter
is added to a block then it is added to all block devices, or if stats are
incremented on 1 device then the stats on the entire block are incremented.
This mimics LAG devices in that if a rule is applied to the LAG master
then it should be applied to all slaves etc.

Monitor LAG slaves via the netlink socket in netdev-linux and, if their
master is attached to the OvS bridge and has a block id, add the slave's
qdisc to the same block. Similarly, if a slave is freed from a master,
remove the qdisc from the master's block.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:57:47 +02:00
John Hurley
25db83be5a netdev-linux: assign LAG devs to tc blocks
Assign block ids to LAG masters that are added to OvS as linux-netdevs and
offloaded via offload API calls. Only LAG masters are assigned to blocks.

To ensure uniqueness, the block ids are determined by the netdev ifindex.
Implement a get_block_id op for linux netdevs to achieve this.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:57:44 +02:00
John Hurley
3d9c99ab8a netdev-linux: indicate if netdev is a LAG master
If a linux netdev is added to OvS that is a LAG master (for example, a
bond or team netdev) then record this in bool form in the dev struct. Use
the link info extracted from rtnetlink calls to determine this.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:57:40 +02:00
John Hurley
88dcf2aa82 netdev-provider: add class op to get block_id
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.

Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:51:47 +02:00
John Hurley
093c9458fb tc: allow offloading of block ids
Blocks, in tc classifiers, allow the grouping of multiple qdiscs with an
associated block id. Whenever a filter is added to/removed from this
block, the filter is added to/removed from all associated qdiscs.

Extend TC offload functions to take a block id as a parameter. If the id
is zero then the qdisc is not considered part of a block.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:33:59 +02:00
Flavio Leitner
19aac14ae4 tap: flag as present after opening it.
Assume the device is present if it can be opened.

Reported-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
2018-06-14 16:50:28 -07:00
Flavio Leitner
629e1476b1 linux: Assume it is local if no API is available.
If the 'openvswitch' kernel module is not loaded, the API is not
available and the userspace will keep retrying. This approach is
not ideal for the netdev datapath type.

This patch disables network netns support if the error code returned
indicates that the API is not available.

Reported-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
2018-06-14 16:06:11 -07:00
Flavio Leitner
3dbcbfe4a9 linux: disable netns support for tap.
Tap device is not added to the kernel datapath, so there is
no way to get netns information.

Reported-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
2018-06-14 15:57:14 -07:00
Jan Scheurich
8492adc270 netdev: Add optional qfill output parameter to rxq_recv()
If the caller provides a non-NULL qfill pointer and the netdev
implementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantially slow down the recv() operation.

A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.

This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
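The optional output parameter described in 8492adc270 can be sketched as below. struct toy_rxq and toy_rxq_recv() are hypothetical; the real rxq_recv() signature is not shown in this log. The sketch assumes that the fill level, or -ENOTSUP when it cannot be read, is reported through the qfill pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive queue. */
struct toy_rxq {
    int queued;            /* Packets currently waiting. */
    int can_report_fill;   /* Nonzero if the fill level can be read. */
};

/* Receives up to 'max' packets and returns how many were taken.  If
 * 'qfill' is non-NULL, stores the number of packets still queued after the
 * burst, or -ENOTSUP if this queue cannot report its fill level.  Callers
 * that do not care pass NULL and pay nothing extra. */
int
toy_rxq_recv(struct toy_rxq *rxq, int max, int *qfill)
{
    int n;

    assert(rxq != NULL && max >= 0);
    n = rxq->queued < max ? rxq->queued : max;
    rxq->queued -= n;
    if (qfill) {
        *qfill = rxq->can_report_fill ? rxq->queued : -ENOTSUP;
    }
    return n;
}
```

Keeping the parameter optional means existing callers only need to pass NULL, while a metrics collector can pass a pointer and must be prepared to see -ENOTSUP.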
Flavio Leitner
e0e2410d52 netdev-linux: fail ops not supporting remote netns.
When the netdev is in another namespace and the operation doesn't
support network namespaces, return the correct error.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:39 -07:00
Flavio Leitner
cf114a7fce netlink linux: enable listening to all nsids
Internal ports may be moved to another network namespace
and when that happens, the vswitch stops receiving netlink
notifications.

This patch enables the vswitch to listen to all network
namespaces that have a nsid assigned into the network
namespace where the socket has been opened.

It requires kernel 4.2 or newer.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:36 -07:00
Flavio Leitner
756819ddd7 netdev-linux: use netlink to update netdev.
The ioctl interface doesn't support network namespaces, so
try updating the netdev using netlink message instead.

To provide backwards compatibility, fall back to the previous
method if netlink isn't supported or fails.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:34 -07:00
Flavio Leitner
bfda523979 netnsid: update device only if netnsid matches.
Recent kernels provide the network namespace ID of a port,
so use that to discover where the port currently is.

A network device in another network namespace could have the
same name, so once the socket starts listening to other network
namespaces, it is necessary to confirm the netnsid.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:33 -07:00
Flavio Leitner
a86bd14ec9 netlink: provide network namespace id from a msg.
The netlink notification's ancillary data contains the network
namespace id (netnsid) needed to identify the device correctly.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-03-31 12:48:31 -07:00
Justin Pettit
e883448e3f dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-02-28 14:53:27 -08:00
Justin Pettit
7ed58d4a0d Don't shadow global VLOG "rl" definition.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-02-28 14:53:19 -08:00
Tonghao Zhang
e8e1a40974 netdev-linux: Report netdev change events when mac changed.
When mac addr of ports on bridge has been changed, for example,

$ ip link set dev eth0 address 00:11:22:33:44:55

we should reconfigure the datapath id and mac addr of local port.
But currently Open vSwitch doesn't do that as expected.

A simple example of how to reproduce it:

$ ovs-vsctl add-br br0
$ ifconfig br0 			# for example, mac is c6:c6:d7:46:b4:4b
$ ip link set dev br0 address 00:11:22:33:44:55
$ ifconfig br0 			# mac of br0 will be 00:11:22:33:44:55

then repeat:
$ ip link set dev br0 address 00:11:22:33:44:55
$ ifconfig br0 			# mac of br0 will be c6:c6:d7:46:b4:4b

This patch reports the mac changed event when ports changed, then
openvswitch will reconfigure the datapath id and mac addr of local
port.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-02-05 09:21:36 -08:00
Flavio Leitner
22dcb53449 netdev-linux: do not send packets to down tap ifaces.
Today OVS pushes packets to the TAP interface ignoring its
current state. That works because the kernel will return -EIO
when it's not UP and OVS will just ignore that as it is not
an OVS issue.

However, it causes a huge impact when broadcasts happen when
using userspace datapath accelerated with DPDK (e.g.: action
NORMAL).  This patch improves the situation by checking the
TAP's interface state before issuing any syscall.

However, there might be use-cases moving interfaces to other
networking namespaces and in that case, OVS can't retrieve
the iface state (sets it to DOWN). That would stop the traffic
breaking the use-case. This patch relies on netlink notifications
to find out if the device is local or not. When it's local, the
device state is checked otherwise it will behave as before.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-22 10:28:17 -08:00
Michal Weglicki
971f4b394c netdev: Custom statistics.
- New get_custom_stats interface function is added to netdev. It
  allows particular netdev implementation to expose custom
  counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
  port type.
- DPDK-physical implementation uses xstats to collect statistics.
  Only dropped and error counters are exposed.

Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-10 15:29:13 -08:00
Ben Pfaff
34944e81f0 Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD
2018-01-02 07:45:17 -08:00
Ben Pfaff
b2befd5bb2 sparse: Add guards to prevent FreeBSD-incompatible #include order.
FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and
that <netinet/in.h> be included before <arpa/inet.h>.  This adds guards to
the "sparse" headers to yield a warning if this order is violated.  This
commit also adjusts the order of many #includes to suit this requirement.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2017-12-22 12:58:02 -08:00
Ilya Maximets
ad8b0b4fe7 netdev: Remove useless cutlen.
Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Justin Pettit
8a7903c632 Update mailing list archive pointers to the current server.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-11-27 14:59:46 -08:00