mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

Author	SHA1	Message	Date
Yong Xu	d2e97030ed	netdev-linux: fix compile error in nl_msg_put_act_police Use 'memset' to init memory to 0. This resolves a build problem with clang on Ubuntu 16.04 on ARM (in Travis): libtool: compile: clang -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -Wthread-safety -fno-strict-aliasing -Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument -Wshift-negative-value -Qunused-arguments -Wshadow -Warray-bounds-pointer-arithmetic -Werror -Werror -g -O2 -Wno-error=unused-command-line-argument -DHAVE_AVX512F -MT lib/netlink-conntrack.lo -MD -MP -MF lib/.deps/netlink-conntrack.Tpo -c lib/netlink-conntrack.c -o lib/netlink-conntrack.o lib/netdev-linux.c:2638:38: error: missing field 'action' initializer [-Werror,-Wmissing-field-initializers] struct tc_police null_police = {0}; ^ 1 error generated. make[2]: * [lib/netdev-linux.lo] Error 1 make[2]: * Waiting for unfinished jobs.... Fixes: c2567e533 ("add port-based ingress policing based packet-per-second rate-limiting") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Yong Xu <yong.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2021-07-13 13:32:17 +02:00
Eelco Chaudron	b0d289bb5a	netdev-linux: Ignore TSO packets when TSO is not enabled for userspace. When TSO is disabled from a userspace forwarding datapath perspective, but TSO has been wrongly enabled on the kernel side, log a warning message, and drop the packet. With the current implementation, OVS will crash. [i.maximets]: The call stack looks like this: 0 dp_packet_set_size (b=0x0, b=0x0, v=13028) at lib/dp-packet.h:578 1 netdev_linux_batch_rxq_recv_sock at lib/netdev-linux.c:1310 2 netdev_linux_rxq_recv at lib/netdev-linux.c 3 netdev_rxq_recv at lib/netdev.c 4 dp_netdev_process_rxq_port at lib/dpif-netdev.c The problem is that the code assumes that (mmsgs[i].msg_len > std_len) can only be true if userspace-tso is enabled and additional buffers are provided to the kernel. However, since recvmmsg() is called with MSG_TRUNC, the resulting msg_len reflects the original packet size before truncation, and it can be larger than the buffer if TSO / GRO is enabled on the network interface. If TSO support for user space is not enabled in OVS, the aux_bufs are not allocated and are left NULL, resulting in a crash. Fixes: 73858f9dbe83 ("netdev-linux: Prepend the std packet in the TSO packet") Fixes: 2109841b7984 ("Use batch process recv for tap and raw socket in netdev datapath") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-09 21:33:41 +02:00
Yong Xu	c2567e533f	add port-based ingress policing based packet-per-second rate-limiting OVS has support for using policing to enforce a rate limit in kilobits per second. This is configured using OVSDB. f.e. $ ovs-vsctl set interface tap0 ingress_policing_rate=1000 $ ovs-vsctl set interface tap0 ingress_policing_burst=100 This patch adds a related feature, allowing policing to enforce a rate limit in kilo-packets per second. This is also configured using OVSDB. $ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000 $ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100 The kilo-bit and kilo-packet rate limits may be used separately or in combination. Add separate action for BPS and PPS in netlink message. Revise code and change action result to pipe to allow traffic pipe into second action. This patch implements the feature for: * OVSDB (northbound API) * TC policer when used both with and without TC offload (kernel API) Signed-off-by: Yong Xu <yong.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2021-07-01 20:44:07 +02:00
Ilya Maximets	577b9a8169	netdev-linux: Fix use of uninitialized LAG master name. 'if_indextoname' may fail leaving the 'master_name' uninitialized: Conditional jump or move depends on uninitialised value(s) at 0x4C34329: strlen (vg_replace_strmem.c:459) by 0x51C638: hash_string (hash.h:342) by 0x51C638: hash_name (shash.c:28) by 0x51CC51: shash_find (shash.c:231) by 0x51CD38: shash_find_data (shash.c:245) by 0x4A797F: netdev_from_name (netdev.c:2013) by 0x544148: netdev_linux_update_lag (netdev-linux.c:676) by 0x544148: netdev_linux_run (netdev-linux.c:769) by 0x4A5997: netdev_run (netdev.c:186) by 0x40752B: main (ovs-vswitchd.c:129) Uninitialised value was created by a stack allocation at 0x543AFA: netdev_linux_run (netdev-linux.c:722) Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Mark D. Gray <mark.d.gray@redhat.com>	2021-05-24 20:21:56 +02:00
Michal Kazior	d90b4f2928	rtnetlink: ignore IFLA_WIRELESS events. Some older wireless drivers - ones relying on the old and long deprecated wireless extension ioctl system - can generate quite a bit of IFLA_WIRELESS events depending on their configuration and runtime conditions. These are delivered as RTNLGRP_LINK via RTM_NEWLINK messages. These tend to be relatively easily identifiable because they report the change mask being 0. This isn't guaranteed but in practice it shouldn't be a problem. None of the wireless events that I ever observed actually carry any unique information about netdev states that ovs-vswitchd is interested in. Hence ignoring these shouldn't cause any problems. These events can be responsible for a significant CPU churn as ovs-vswitchd attempts to do plenty of work for each and every netlink message regardless of what that message carries. On low-end devices such as consumer-grade routers these can lead to a lot of CPU cycles being wasted, adding up to heat output and reducing performance. It could be argued that wireless drivers in question should be fixed, but that isn't exactly a trivial thing to do. Patching ovs seems far more viable while still making sense. Signed-off-by: Michal Kazior <michal@plume.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-20 00:00:22 +02:00
Yong.Xu	67e0e0bc15	netdev-linux: correct unit of burst parameter Correct calculation of burst parameter used when configuring TC policer action for ingress port-based policing in the case where TC offload is in use. This now matches the value calculated for the case where TC offload is not in use. The division by 8 is to convert from bits to bytes. It's unclear why 64 was previously used. Fixes: e7f6ba220 ("lib/tc: add ingress ratelimiting support for tc-offload") Signed-off-by: Yong Xu <yong.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>	2021-04-13 15:36:35 +02:00
William Tu	e7df370cff	netdev-linux: Fix indentation. Remove one extra space. No actual code logic changed. Fixes: 2109841b79845 ("Use batch process recv for tap and raw socket in netdev datapath") Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-02-19 18:40:25 +01:00
Ben Pfaff	91fc374a9c	Eliminate use of term "slave" in bond, LACP, and bundle contexts. The new term is "member". Most of these changes should not change user-visible behavior. One place where they do is in "ovs-ofctl dump-flows", which will now output "members:..." inside "bundle" actions instead of "slaves:...". I don't expect this to cause real problems in most systems. The old syntax is still supported on input for backward compatibility. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>	2020-10-21 11:28:24 -07:00
Yi-Hung Wei	fd4d477760	netdev-linux: Fix broken build on Ubuntu 14.04 Patch 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") uses __virtio16 which is defined in kernel 3.19. Ubuntu 14.04 is using 3.13 kernel that lacks the virtio_types definition. This patch fixes that. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>	2020-07-08 11:51:15 -07:00
Aaron Conole	7a076a5371	netdev-linux: Update LAG in all cases. In some cases, when processing a netlink change event, it's possible for an alternate part of OvS (like the IPv6 endpoint processing) to hold an active netdev interface. This creates a race-condition, where sometimes the OvS change processing will take the normal path. This doesn't work because the netdev device object won't actually be enslaved to the ovs-system (for instance, a linux bond) and ingress qdisc entries will be missing. To address this, we update the LAG information in ALL cases where LAG information could come in. Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC") Cc: Marcelo Leitner <mleitner@redhat.com> Cc: John Hurley <john.hurley@netronome.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-05-16 13:15:50 +02:00
William Tu	5119cfe32d	netdev-afxdp: Fix missing init. When introducing the interrupt mode for netdev-afxdp, the netdev init function is accidentally removed. Fix it by adding it back. Fixes: 5bfc519fee499 ("netdev-afxdp: Add interrupt mode netdev class.") Acked-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>	2020-05-04 16:13:29 -07:00
Jiang Lidong	b9f825a544	netdev-linux: remove sum of vport stats and kernel netdev stats. When a kernel veth is used as an OVS interface, the drop counter shows a doubled value when the veth drops packets due to traffic overrun. netdev_linux_get_stats reads both vport stats and kernel netdev stats, in case retrieving the vport stats fails. If both succeed, the error counters are added together to include errors from different layers. However, the implementation of ovs_vport_get_stats in the kernel datapath already includes kernel netdev stats by calling dev_get_stats. When a drop or other error counter is not zero, its value is therefore doubled by netdev_linux_get_stats. This change removes the addition of kernel netdev stats to vport stats, since vport stats already include all of the kernel netdev stats information. Signed-off-by: Jiang Lidong <jianglidong3@jd.com> Signed-off-by: William Tu <u9012063@gmail.com>	2020-04-30 07:35:54 -07:00
William Tu	5bfc519fee	netdev-afxdp: Add interrupt mode netdev class. The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp interrupt mode. This is similar to 'type=afxdp', except that the is_pmd field is set to false. As a result, the packet processing is handled by main thread, not pmd thread. This avoids burning the CPU to always 100% when there is no traffic. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-04-28 17:58:31 +02:00
Usman Ansari	fed4282c53	netdev-linux.c: Fix coverity unreachable code warning Coverity reports unreachable code in "?" statement Fixed by removing code segment and unused variables & defines Signed-off-by: Usman Ansari <ua1422@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>	2020-04-06 11:58:35 -07:00
Flavio Leitner	6211ad5708	netdev-linux: Enable TSO in the TAP device. Use ioctl TUNSETOFFLOAD if kernel supports to enable TSO offloading in the tap device. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: "Yi Yang (杨燚)-云服务集团" <yangyi01@inspur.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: William Tu <u9012063@gmail.com>	2020-03-02 09:50:17 -08:00
Flavio Leitner	35b5586ba7	userspace TSO: SCTP checksum offload optional. Ideally SCTP checksum offload needs be advertised by the NIC when userspace TSO is enabled. However, very few drivers do that and it's not a widely used protocol. So, this patch enables SCTP checksum offload if available, otherwise userspace TSO can still be enabled but SCTP packets will be dropped on NICs without support. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Flavio Leitner	8c5163fe81	userspace TSO: Include UDP checksum offload. Virtio doesn't expose flags to control which protocols checksum offload needs to be enabled or disabled. This patch checks if the NIC supports UDP checksum offload and activates it when TSO is enabled. Reported-by: Ilya Maximets <i.maximets@ovn.org> Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-02-26 15:24:15 +01:00
Flavio Leitner	73858f9dbe	netdev-linux: Prepend the std packet in the TSO packet Usually TSO packets are close to 50k, 60k bytes long, so to copy fewer bytes when receiving a packet from the kernel, change the approach. Instead of extending the MTU sized packet received and appending the remaining TSO data from the TSO buffer, allocate a TSO packet with enough headroom to prepend the std packet data. Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>	2020-02-06 11:37:23 -08:00
William Tu	105cf8df82	netdev-linux: Detect numa node id. The patch detects the numa node id from the name of the netdev, by reading the '/sys/class/net/<devname>/device/numa_node'. If not available, ex: virtual device, or any error happens, return numa id 0. Currently only the afxdp netdev type uses it, other linux netdev types are disabled due to no use case. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-01-18 01:42:22 +01:00
Flavio Leitner	29cf9c1b3b	userspace: Add TCP Segmentation Offload support Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC reducing the per packet CPU overhead. A guest using vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums. It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside virtual switch. If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation. It is recommended to check if NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Ciara Loftus <ciara.loftus.intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-17 22:27:25 +00:00
Yi Yang	2109841b79	Use batch process recv for tap and raw socket in netdev datapath Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock just receive single packet, that is very inefficient, per my test case which adds two tap ports or veth ports into OVS bridge (datapath_type=netdev) and use iperf3 to do performance test between two ports (they are set into different network name space). The result is as below: tap: 295 Mbits/sec veth: 207 Mbits/sec After I change netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock to use batch process, the performance is boosted by about 7 times, here is the result: tap: 1.96 Gbits/sec veth: 1.47 Gbits/sec Undoubtedly this is a huge improvement although it can't match OVS kernel datapath yet. FYI: here is the result for OVS kernel datapath: tap: 37.2 Gbits/sec veth: 36.3 Gbits/sec Note: performance result is highly related with your test machine, you shouldn't expect the same results on your test machine. Signed-off-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2020-01-09 09:48:49 -08:00
Paul Blakey	acdd544c4c	tc: Introduce tcf_id to specify a tc filter Move all that is needed to identify a tc filter to a new structure, tcf_id. This removes a lot of duplication in accessing/creating tc filters. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-12-22 11:54:40 +01:00
William Tu	7bf075d95a	netdev-afxdp: Enable libbpf logging to OVS. libbpf has pr_warn, pr_info, and pr_debug. The patch registers these print functions, integrating the libbpf logs to OVS log. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com>	2019-11-21 09:20:10 -08:00
Eelco Chaudron	52b5a5c0a3	netdev-afxdp: add afxdp specific maximum MTU check Drivers natively supporting AF_XDP will check that a configured MTU size will not exceed the allowed size for AF_XDP. However, when the skb compatibility mode is used there is no check and any value is accepted. This, for example, is the case when using the TAP interface. This fix adds a check to make sure only AF_XDP valid values are accepted. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: William Tu <u9012063@gmail.com>	2019-11-19 11:20:49 -08:00
Ilya Maximets	d560bc1baa	netdev-afxdp: Convert AFXDP_DEBUG to custom stats. These are valid statistics of a network interface and should be exposed via custom stats. The same MACRO trick as in vswitchd/bridge.c is used to reduce code duplication and easily add new stats if necessary in the future. Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>	2019-07-24 19:22:05 +03:00
Ilya Maximets	f627cf1dd9	netdev-afxdp: Fix use of unconfigured device. In case of failure of 'xsk_configure_all()', 'n_rxq' and 'xdpmode' will remain in a new state. This will result in successful reconfiguration (immediate return, because configuration is already applied) if 'netdev_reconfigure()' will be called again. Same issue was fixed previously for netdev-dpdk using 'dev->started' flag in commit: 606f66507250 ("netdev-dpdk: Don't use PMD driver if not configured successfully") Let's use similar approach with checking the 'dev->xsks' which only exists if configuration was successful. Additionally implemented 'netdev_afxdp_construct()' function to explicitly initialize all the specific fields and request the reconfiguration. CC: William Tu <u9012063@gmail.com> Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.") Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>	2019-07-23 10:35:29 +03:00
William Tu	0de1b42596	netdev-afxdp: add new netdev type for AF_XDP. The patch introduces experimental AF_XDP support for OVS netdev. AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket type built upon the eBPF and XDP technology. It aims to have comparable performance to DPDK but cooperate better with existing kernel's networking stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program attached to the netdev, by-passing a couple of Linux kernel's subsystems. As a result, AF_XDP socket shows much better performance than AF_PACKET. For more details about AF_XDP, please see linux kernel's Documentation/networking/af_xdp.rst. Note that by default, this feature is not compiled in. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>	2019-07-19 17:42:06 +03:00
Ilya Maximets	5fc5c50f3d	netdev: Dynamic per-port Flow API. Current issues with Flow API: * OVS calls offloading functions regardless of successful flow API initialization. (ex. on init_flow_api failure) * Static initialization of Flow API for a netdev_class forbids having different offloading types for different instances of netdev with the same netdev_class. (ex. different vports in 'system' and 'netdev' datapaths at the same time) Solution: * Move Flow API from the netdev_class to netdev instance. * Make Flow API dynamic, i.e. probe the APIs and choose the suitable one. Side effects: * Flow API providers localized as possible in their modules. * Now we have an ability to make runtime checks. For example, we could check if particular device supports features we need, like if dpdk device supports RSS+MARK action. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Tonghao Zhang	718be50dae	netdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload When enable tc-offload, we should add coverage counters for netdev_set_policing. Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload") Cc: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-04-22 10:06:41 -07:00
Flavio Leitner	23fa50f64b	netlink linux: fix to append the netnsid netlink attr. The attribute was being prepended to the netlink buffer, but the function nl_sock_transact_multiple__() expects to find the netlink header as first to update the length, seq and pid fields. This patch fixes to append the attribute instead of prepending it. Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-04-16 15:48:59 -07:00
Flavio Leitner	b43762a5ad	netlink linux: account for the netnsid netlink attr. The buffer needs to be reallocated and data copied when the netnsid netlink attribute is included, so avoid that by accounting the attribute when the buffer is initially allocated. Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-04-16 15:48:50 -07:00
John Hurley	608ff46aaf	ovs-tc: offload datapath rules matching on internal ports Rules applied to OvS internal ports are not represented in TC datapaths. However, it is possible to support rules matching on internal ports in TC. The start_xmit ndo of OvS internal ports directs packets back into the OvS kernel datapath where they are rematched with the ingress port now being that of the internal port. Due to this, rules matching on an internal port can be added as TC filters to an egress qdisc for these ports. Allow rules applied to internal ports to be offloaded to TC as egress filters. Rules redirecting to an internal port are also offloaded. These are supported by the redirect ingress functionality applied in an earlier patch. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-04-10 13:55:59 +02:00
John Hurley	95255018a8	ovs-tc: allow offloading TC rules to egress qdiscs Offloading rules to a TC datapath only allows the creating of ingress hook qdiscs and the application of filters to these. However, there may be certain situations where an egress qdisc is more applicable (e.g. when offloading to TC rules applied to OvS internal ports). Extend the TC API in OvS to allow the creation of egress qdiscs and to add or interact with flower filters applied to these. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-04-09 17:34:07 +02:00
Sharon K	2f564bb153	netdev-linux: netem QoS support Signed-off-by: Sharon Krendel <thekafkaf@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-03-14 19:20:19 -07:00
Roi Dayan	cae643534e	netdev-linux: Remove ingress qdisc before trying to add shared block Adding shared ingress block with ingress qdisc already exists results in a failure. So remove the ingress qdisc first. Also while at it log the slave name. Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-03-12 10:18:29 +01:00
Pieter Jansen van Vuuren	e7f6ba220e	lib/tc: add ingress ratelimiting support for tc-offload Firstly this patch introduces the notion of reserved priority, as the filter implementing ingress policing would require the highest priority. Secondly it allows setting rate limiters while tc-offloads has been enabled. Lastly it installs a matchall filter that matches all traffic and then applies a police action, when configuring an ingress rate limiter. An example of what to expect: OvS CLI: ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000 ovs-vsctl set interface <netdev_name> ingress_policing_burst=100 Resulting TC filter: filter protocol ip pref 1 matchall chain 0 filter protocol ip pref 1 matchall chain 0 handle 0x1 not_in_hw action order 1: police 0x1 rate 5Mbit burst 125Kb mtu 64Kb action drop/continue overhead 0b ref 1 bind 1 installed 3 sec used 3 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.200 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 60.13 4.49 ovs-vsctl list interface <netdev_name> _uuid : 2ca774e8-8b95-430f-a2c2-f8f742613ab1 admin_state : up ... ingress_policing_burst: 100 ingress_policing_rate: 5000 ... type : "" Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2019-03-04 17:22:34 +01:00
Ben Pfaff	61265c03f0	netdev-linux: Fix function argument order in sfq_tc_load(). sfq_install__() takes quantum before perturb. Acked-by: Justin Pettit <jpettti@ovn.org> Reported-by: shaoke xi <xishaoke.xsk@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-01-17 16:31:44 -08:00
Ben Pfaff	64ed99ffbc	netdev-linux: Don't include <net/if_packet.h>. This header only defines sockaddr_pkt, which this source file doesn't use. This was the only user of net/if_packet.h, so also remove the configure-time test for it (which netdev-linux wasn't using anyway). Reported-by: Andre McCurdy <armccurdy@gmail.com> Reported-at: https://github.com/openvswitch/ovs/pull/253 Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-10-03 16:55:55 -07:00
Andre McCurdy	b24751fff8	netdev-linux: use unsigned int for ifi_flags temporary variables ifi_flags in struct netdev_linux is an unsigned int, therefore use unsigned int for variables which will hold ifi_flags values. Signed-off-by: Andre McCurdy <armccurdy@gmail.com>	2018-10-02 15:39:35 -07:00
Ben Pfaff	89c09c1cd1	netdev: Clean up class initialization. The macros are hard to read. This makes it a little more readable. Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-27 17:48:23 +01:00
Ben Pfaff	1bab4901c4	netdev-linux: Avoid division by 0 if kernel reports bad scheduler data. If the kernel reported a value of 0 for the second value in /proc/net/psched, it would cause a division-by-zero fault in read_psched(). I don't know of a kernel that would actually do that, but it's still better to be safe. Found by clang static analyzer. Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-08-20 09:30:11 -07:00
Tiago Lam	e3b5d7c536	netdev-linux: Fix segfault in update_lag(). A bisect shows that commit d22f892 ("netdev-linux: monitor and offload LAG slaves to TC") introduced netdev_linux_update_lag(), which is now triggering a crash in the "datapath - ping over bond" test in system-userspace-testsuite: (gdb) bt #0 0x00000000009762e7 in netdev_linux_update_lag (change=0x7ffdff013750) at lib/netdev-linux.c:728 728 if (is_netdev_linux_class(master_netdev->netdev_class)) { This fixes the crash by simply returning in case netdev_from_name() returns NULL, as this should indicate the master is not attached to the bridge. Additionally, netdev_linux_update_lag() isn't "clearing" the netdev reference it gets from netdev_from_name(), meaning its ref_cnt is incremented but never decremented. Thus, also call netdev_close() before returning. CC: John Hurley <john.hurley@netronome.com> Fixes: d22f8927 ("netdev-linux: monitor and offload LAG slaves to TC") Signed-off-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-07-05 14:05:21 -07:00
John Hurley	d22f8927c3	netdev-linux: monitor and offload LAG slaves to TC A LAG slave cannot be added directly to an OvS bridge, nor can a OvS bridge port be added to a LAG dev. However, LAG masters can be added to OvS. Use TC blocks to indirectly offload slaves when their master is attached as a linux-netdev to an OvS bridge. In the kernel TC datapath, blocks link together netdevs in a similar way to LAG devices. For example, if a filter is added to a block then it is added to all block devices, or if stats are incremented on 1 device then the stats on the entire block are incremented. This mimics LAG devices in that if a rule is applied to the LAG master then it should be applied to all slaves etc. Monitor LAG slaves via the netlink socket in netdev-linux and, if their master is attached to the OvS bridge and has a block id, add the slave's qdisc to the same block. Similarly, if a slave is freed from a master, remove the qdisc from the masters block. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:57:47 +02:00
John Hurley	25db83be5a	netdev-linux: assign LAG devs to tc blocks Assign block ids to LAG masters that are added to OvS as linux-netdevs and offloaded via offload API calls. Only LAG masters are assigned to blocks. To ensure uniqueness, the block ids are determined by the netdev ifindex. Implement a get_block_id op for linux netdevs to achieve this. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:57:44 +02:00
John Hurley	3d9c99ab8a	netdev-linux: indicate if netdev is a LAG master If a linux netdev is added to OvS that is a LAG master (for example, a bond or team netdev) then record this in bool form in the dev struct. Use the link info extracted from rtnetlink calls to determine this. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:57:40 +02:00
John Hurley	88dcf2aa82	netdev-provider: add class op to get block_id Add a new class op for netdevs to get the block_id if one exists. The block_id is used in offload ops to group multiple qdiscs together. Stub calls are made to the new class op (implementation to follow in further patches). The default block_id of 0 (no block) will be used in these cases. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:51:47 +02:00
John Hurley	093c9458fb	tc: allow offloading of block ids Blocks, in tc classifiers, allow the grouping of multiple qdiscs with an associated block id. Whenever a filter is added to/removed from this block, the filter is added to/removed from all associated qdiscs. Extend TC offload functions to take a block id as a parameter. If the id is zero then the qdisc is not considered part of a block. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:33:59 +02:00
Flavio Leitner	19aac14ae4	tap: flag as present after opening it. Assume the device is present if it can be opened. Reported-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Eelco Chaudron <echaudro@redhat.com>	2018-06-14 16:50:28 -07:00
Flavio Leitner	629e1476b1	linux: Assume it is local if no API is available. If the 'openvswitch' kernel module is not loaded, the API is not available and the userspace will keep retrying. This approach is not ideal for the netdev datapath type. This patch disables network netns support if the error code returned indicates that the API is not available. Reported-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Eelco Chaudron <echaudro@redhat.com>	2018-06-14 16:06:11 -07:00
Flavio Leitner	3dbcbfe4a9	linux: disable netns support for tap. Tap device is not added to the kernel datapath, so there is no way to get netns information. Reported-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Eelco Chaudron <echaudro@redhat.com>	2018-06-14 15:57:14 -07:00
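
Commit d2e97030ed above swaps a `{0}` initializer for `memset` so that clang's -Wmissing-field-initializers (promoted to an error by -Werror) no longer fires. The pattern can be sketched roughly as follows. `struct police_cfg` and `police_cfg_init()` are hypothetical stand-ins invented for this illustration; they are not the kernel's `struct tc_police` or any OVS function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a multi-field kernel struct such as
 * struct tc_police; the field names are illustrative only. */
struct police_cfg {
    uint32_t index;
    int action;
    uint32_t burst;
};

/* Zero the whole object with memset instead of writing `= {0}`,
 * which some clang versions flag under -Wmissing-field-initializers. */
void
police_cfg_init(struct police_cfg *cfg)
{
    assert(cfg != NULL);
    memset(cfg, 0, sizeof *cfg);
}
```

A caller would declare `struct police_cfg cfg;` and call `police_cfg_init(&cfg);` before filling in individual fields.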
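
Commit 577b9a8169 above notes that `if_indextoname` may fail and leave its output buffer uninitialized. One defensive way to wrap the call is sketched below. `ifindex_to_name()` is a hypothetical helper written for this illustration and is not necessarily how the OVS fix is implemented; only `if_indextoname` and `IF_NAMESIZE` come from `<net/if.h>`.

```c
#include <assert.h>
#include <net/if.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: resolves an interface index to its name.
 * On failure it returns false and stores an empty string in 'name',
 * so callers never hash or compare uninitialized bytes. */
bool
ifindex_to_name(unsigned int ifindex, char name[IF_NAMESIZE])
{
    assert(name != NULL);
    if (!if_indextoname(ifindex, name)) {
        name[0] = '\0';
        return false;
    }
    return true;
}
```

Callers are expected to check the return value and skip any lookup by name when it is false.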
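
Commit 105cf8df82 above describes detecting the NUMA node by reading '/sys/class/net/<devname>/device/numa_node' and returning 0 when the file is unavailable or an error occurs. A minimal sketch of that lookup follows. `netdev_numa_id()` is a hypothetical name chosen for this example, and mapping a negative value in the file to 0 is an assumption made here, not something the commit message states.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the NUMA node id for 'devname', or 0 if the
 * sysfs file is missing (e.g. a virtual device) or cannot be parsed. */
int
netdev_numa_id(const char *devname)
{
    char path[256];
    int node = 0;
    int n;
    FILE *f;

    assert(devname != NULL);
    n = snprintf(path, sizeof path,
                 "/sys/class/net/%s/device/numa_node", devname);
    if (n < 0 || (size_t) n >= sizeof path) {
        return 0;               /* Name too long for the path buffer. */
    }

    f = fopen(path, "r");
    if (!f) {
        return 0;               /* No such file: fall back to node 0. */
    }
    /* Assumption: treat a parse failure or a negative value as node 0. */
    if (fscanf(f, "%d", &node) != 1 || node < 0) {
        node = 0;
    }
    fclose(f);
    return node;
}
```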

367 Commits