mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-30 13:58:14 +00:00

Author	SHA1	Message	Date
Mike Pattrick	d7e77143fb	tunnel: Allow UDP zero checksum with IPv6 tunnels. This patch adopts the proposed RFC 6935 by allowing null UDP checksums even if the tunnel protocol is IPv6. This is already supported by Linux through the udp6zerocsumtx tunnel option. It is disabled by default and IPv6 tunnels are flagged as requiring a checksum, but this patch enables the user to set csum=false on IPv6 tunnels. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-12 11:54:45 +02:00
Nobuhiro MIKI	701c2dbfb8	userspace: Add new option srv6_flowlabel in SRv6 tunnel. It supports flowlabel based load balancing by controlling the flowlabel of outer IPv6 header, which is already implemented in Linux kernel as seg6_flowlabel sysctl [1]. [1]: https://docs.kernel.org/networking/seg6-sysctl.html Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 17:08:32 +02:00
Ilya Maximets	ce8828a372	netdev-vport: RCU-fy tunnel config. Tunnel config can be accessed by multiple threads at the same time and it is supposed to be protected by the netdev_vport mutex. However, many functions are getting direct access to it via netdev API without taking the mutex, creating a potential for various race conditions. Fix that by protecting the tunnel config with RCU. The whole structure is replaced on configuration changes. Individual fields are never updated and the structure itself is constant. This way it can be safely used by different threads within RCU grace period. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:35 +02:00
Ilya Maximets	be6f096fbe	netdev-vport: Fix unsafe handling of GRE sequence number. GRE sequence number is maintained as part of the tunnel config. This triggers tunnel reconfiguration every time set_tunnel_config() is called, because memset over tunnel config will never be equal to the new config constructed from database options. And sequence number incremented non-atomically without holding a mutex on tunnel push, that may lead to corruption if multiple threads are sending packets to the same tunnel. Fix that by moving sequence number to the netdev_vport structure instead and using an atomic counter. Fixes: `0ffff49753` ("userspace: add gre sequence number support.") Fixes: `7dc18ae96d` ("userspace: add erspan tunnel support.") Fixes: `3c6d05a02e` ("userspace: Add GTP-U support.") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-05-25 15:45:08 +02:00
Nobuhiro MIKI	03fc1ad785	userspace: Add SRv6 tunnel support. SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulation and decapsulation the inner packets with IPv6 header and an extended header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in userspace datapath. It uses `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 22:16:04 +02:00
Yong Xu	c2567e533f	add port-based ingress policing based packet-per-second rate-limiting OVS has support for using policing to enforce a rate limit in kilobits per second. This is configured using OVSDB. f.e. $ ovs-vsctl set interface tap0 ingress_policing_rate=1000 $ ovs-vsctl set interface tap0 ingress_policing_burst=100 This patch adds a related feature, allowing policing to enforce a rate limit in kilo-packets per second. This is also configured using OVSDB. $ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000 $ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100 The kilo-bit and kilo-packet rate limits may be used separately or in combination. Add separate action for BPS and PPS in netlink message. Revise code and change action result to pipe to allow traffic pipe into second action. This patch implements the feature for: * OVSDB (northbound API) * TC policer when used both with and without TC offload (kernel API) Signed-off-by: Yong Xu <yong.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2021-07-01 20:44:07 +02:00
Martin Varghese	ebe0e518b0	tunnel: Bareudp Tunnel Support. There are various L3 encapsulation standards using UDP being discussed to leverage the UDP based load balancing capability of different networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them. The Bareudp tunnel provides a generic L3 encapsulation support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP tunnel. An example to create bareudp device to tunnel MPLS traffic is given $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \ type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \ options:payload_type=0x8847 options:dst_port=6635 The bareudp device supports special handling for MPLS & IP as they can have multiple ethertypes. MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6). The bareudp device to tunnel L3 traffic with multiple ethertypes (MPLS & IP) can be created by passing the L3 protocol name as string in the field payload_type. An example to create bareudp device to tunnel MPLS unicast & multicast traffic is given below.:: $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \ type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \ options:payload_type=mpls options:dst_port=6635 Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-By: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-12-22 12:51:22 +01:00
Ilya Maximets	48c1ab5d74	netdev: Allow storing dpif type into netdev structure. Storing of the dpif type of the owning datapath interface will allow us to easily distinguish, for example, userspace tunneling ports from the system ones. This is required in terms of HW offloading to avoid offloading of userspace flows to kernel interfaces that doesn't belong to userspace datapath, but have same dpif_port names. Acked-by: Eli Britstein <elibr@mellanox.com> Acked-by: Roni Bar Yanai <roniba@mellanox.com> Acked-by: Ophir Munk <ophirmu@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-07-08 19:07:21 +02:00
David Marchand	35c91567c8	dpif-netdev: Only poll enabled vhost queues. We currently poll all available queues based on the max queue count exchanged with the vhost peer and rely on the vhost library in DPDK to check the vring status beneath. This can lead to some overhead when we have a lot of unused queues. To enhance the situation, we can skip the disabled queues. On rxq notifications, we make use of the netdev's change_seq number so that the pmd thread main loop can cache the queue state periodically. $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 0 core_id 1: isolated : true port: dpdk0 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 2: isolated : true port: vhost1 queue-id: 0 (enabled) pmd usage: 0 % port: vhost3 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 15: isolated : true port: dpdk1 queue-id: 0 (enabled) pmd usage: 0 % pmd thread numa_id 0 core_id 16: isolated : true port: vhost0 queue-id: 0 (enabled) pmd usage: 0 % port: vhost2 queue-id: 0 (enabled) pmd usage: 0 % $ while true; do ovs-appctl dpif-netdev/pmd-rxq-show \|awk ' /port: / { tot++; if ($5 == "(enabled)") { en++; } } END { print "total: " tot ", enabled: " en }' sleep 1 done total: 6, enabled: 2 total: 6, enabled: 2 ... # Started vm, virtio devices are bound to kernel driver which enables # F_MQ + all queue pairs total: 6, enabled: 2 total: 66, enabled: 66 ... # Unbound vhost0 and vhost1 from the kernel driver total: 66, enabled: 66 total: 66, enabled: 34 ... # Configured kernel bound devices to use only 1 queue pair total: 66, enabled: 34 total: 66, enabled: 19 total: 66, enabled: 4 ... # While rebooting the vm total: 66, enabled: 4 total: 66, enabled: 2 ... total: 66, enabled: 66 ... # After shutting down the vm total: 66, enabled: 66 total: 66, enabled: 2 Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-06-26 18:43:39 +01:00
Ilya Maximets	b6cabb8f8f	netdev: Split up netdev offloading to separate module. New module 'netdev-offload' created to manage different flow API implementations. All the generic and provider independent code moved there from the 'netdev' module. Flow API providers further encapsulated. The only function that was changed is 'netdev_any_oor'. Now it uses offloading related hmap instead of common 'netdev_shash'. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Sriharsha Basavapatna via dev	57924fc91c	revalidator: Rebalance offloaded flows based on the pps rate This is the third patch in the patch-set to support dynamic rebalancing of offloaded flows. The dynamic rebalancing functionality is implemented in this patch. The ukeys that are not scheduled for deletion are obtained and passed as input to the rebalancing routine. The rebalancing is done in the context of revalidation leader thread, after all other revalidator threads are done with gathering rebalancing data for flows. For each netdev that is in OOR state, a list of flows - both offloaded and non-offloaded (pending) - is obtained using the ukeys. For each netdev that is in OOR state, the flows are grouped and sorted into offloaded and pending flows. The offloaded flows are sorted in descending order of pps-rate, while pending flows are sorted in ascending order of pps-rate. The rebalancing is done in two phases. In the first phase, we try to offload all pending flows and if that succeeds, the OOR state on the device is cleared. If some (or none) of the pending flows could not be offloaded, then we start replacing an offloaded flow that has a lower pps-rate than a pending flow, until there are no more pending flows with a higher rate than an offloaded flow. The flows that are replaced from the device are added into kernel datapath. A new OVS configuration parameter "offload-rebalance", is added to ovsdb. The default value of this is "false". To enable this feature, set the value of this parameter to "true", which provides packets-per-second rate based policy to dynamically offload and un-offload flows. Note: This option can be enabled only when 'hw-offload' policy is enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow offload errors (specifically ENOSPC error this feature depends on) reported by an offloaded device are supressed by TC-Flower kernel module. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:52 +02:00
Sriharsha Basavapatna via dev	738c785ff1	dpif-netlink: Detect Out-Of-Resource condition on a netdev This is the first patch in the patch-set to support dynamic rebalancing of offloaded flows. The patch detects OOR condition on a netdev port when ENOSPC error is returned by TC-Flower while adding a flow rule. A new structure is added to the netdev called "netdev_hw_info", to store OOR related information required to perform dynamic offload-rebalancing. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:45 +02:00
Eli Britstein	d9677a1f0e	netdev-tc-offloads: TC csum option is not matched with tunnel configuration Tunnels (gre, geneve, vxlan) support 'csum' option (true/false), default is false. Generated encap TC rule will now be configured as the tunnel configuration. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-16 09:28:30 +02:00
Yuanhan Liu	241bad15d9	dpif-netdev: associate flow with a mark id Most modern NICs have the ability to bind a flow with a mark, so that every packet matches such flow will have that mark present in its descriptor. The basic idea of doing that is, when we receives packets later, we could directly get the flow from the mark. That could avoid some very costly CPU operations, including (but not limiting to) miniflow_extract, emc lookup, dpcls lookup, etc. Thus, performance could be greatly improved. Thus, the major work of this patch is to associate a flow with a mark id (an uint32_t number). The association in netdev datapath is done by CMAP, while in hardware it's done by the rte_flow MARK action. One tricky thing in OVS-DPDK is, the flow tables is per-PMD. For the case there is only one phys port but with 2 queues, there could be 2 PMDs. In other words, even for a single mega flow (i.e. udp,tp_src=1000), there could be 2 different dp_netdev flows, one for each PMD. That could results to the same mega flow being offloaded twice in the hardware, worse, we may get 2 different marks and only the last one will work. To avoid that, a megaflow_to_mark CMAP is created. An entry will be added for the first PMD that wants to offload a flow. For later PMDs, it will see such megaflow is already offloaded, then the flow will not be offloaded to HW twice. Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is what the mark_to_flow CMAP is for. When the first PMD wants to offload a flow, it allocates a new mark and performs the flow offload by reusing the ->flow_put method. When it succeeds, a "mark to flow" entry will be added. For later PMDs, it will get the corresponding mark by above megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added. Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-06 10:32:52 +01:00
John Hurley	88dcf2aa82	netdev-provider: add class op to get block_id Add a new class op for netdevs to get the block_id if one exists. The block_id is used in offload ops to group multiple qdiscs together. Stub calls are made to the new class op (implementation to follow in further patches). The default block_id of 0 (no block) will be used in these cases. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-29 14:51:47 +02:00
Gavi Teitz	d63ca5329f	dpctl: Properly reflect a rule's offloaded to HW state Previously, any rule that is offloaded via a netdev, not necessarily to the HW, would be reported as "offloaded". This patch fixes this misalignment, and introduces the 'dp' state, as follows: rule is in HW via TC offload -> offloaded=yes dp:tc rule is in not HW over TC DP -> offloaded=no dp:tc rule is in not HW over OVS DP -> offloaded=no dp:ovs To achieve this, the flows's 'offloaded' flag was encapsulated in a new attrs struct, which contains the offloaded state of the flow and the DP layer the flow is handled in, and instead of setting the flow's 'offloaded' state based solely on the type of dump it was acquired via, for netdev flows it now sends the new attrs struct to be collected along with the rest of the flow via the netdev, allowing it to be set per flow. For TC offloads, the offloaded state is set based on the 'in_hw' and 'not_in_hw' flags received from the TC as part of the flower. If no such flag was received, due to lack of kernel support, it defaults to true. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> [simon: resolved conflict in lib/dpctl.man] Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-18 09:57:37 +02:00
Greg Rose	068794b43f	erspan: Add flow-based erspan options The patch add supports for flow-based erspan options. The erspan_ver, erspan_idx, erspan_dir, and erspan_hwid can be set as "flow" so that its value is set by the openflow rule, instead of statically configured at port creation time. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
William Tu	7dc18ae96d	userspace: add erspan tunnel support. ERSPAN is a tunneling protocol based on GRE tunnel. The patch add erspan tunnel support for ovs-vswitchd with userspace datapath. Configuring erspan tunnel is similar to gre tunnel, but with additional erspan's parameters. Matching a flow on erspan's metadata is also supported, see ovs-fields for more details. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
William Tu	0ffff49753	userspace: add gre sequence number support. The patch adds support for gre sequence number. Default is disable. When enable with 'options:seq=true', the outgoing gre packet will have its sequence number incremented by one. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
Jan Scheurich	8492adc270	netdev: Add optional qfill output parameter to rxq_recv() If the caller provides a non-NULL qfill pointer and the netdev implemementation supports reading the rx queue fill level, the rxq_recv() function returns the remaining number of packets in the rx queue after reception of the packet burst to the caller. If the implementation does not support this, it returns -ENOTSUP instead. Reading the remaining queue fill level should not substantilly slow down the recv() operation. A first implementation is provided for ethernet and vhostuser DPDK ports in netdev-dpdk.c. This output parameter will be used in the upcoming commit for PMD performance metrics to supervise the rx queue fill level for DPDK vhostuser ports. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Ben Pfaff	ee4776b8bc	netdev: New function netdev_get_ip_by_name(). This is like netdev_get_in4_by_name() but accepts any IP address instead of just an IPv4 address. It will acquire its first user in an upcoming commit. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>	2018-04-16 14:53:27 -07:00
Michal Weglicki	971f4b394c	netdev: Custom statistics. - New get_custom_stats interface function is added to netdev. It allows particular netdev implementation to expose custom counters in dictionary format (counter name/counter value). - New statistics are retrieved using experimenter code and are printed as a result to ofctl dump-ports. - New counters are available for OpenFlow 1.4+. - New statistics are printed to output via ofctl only if those are present in reply message. - New statistics definition is added to include/openflow/intel-ext.h. - Custom statistics are implemented only for dpdk-physical port type. - DPDK-physical implementation uses xstats to collect statistics. Only dropped and error counters are exposed. Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-10 15:29:13 -08:00
Ilya Maximets	b30896c969	netdev: Remove unused may_steal. Not needed anymore because 'may_steal' already handled on dpif-netdev layer and always true. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com	2017-12-20 21:07:46 +00:00
Bhanuprakash Bodireddy	fea6740fe2	netdev: Reorder elements in netdev_tunnel_config structure. By reordering elements in netdev_tunnel_config structure, sum holes and pad bytes can be reduced. Before: structure size: 96, sum holes: 17, pad bytes: 4, cachelines:2 After : structure size: 80, sum holes: 5, pad bytes: 0, cachelines:2 Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-03 12:54:28 -07:00
Roi Dayan	dfaf79ddd9	dpif: Refactor obj type from void pointer to dpif_class It's basically what is being passed today and passing a specific type adds a compiler type check. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-07-27 10:17:46 +02:00
Ben Pfaff	875ab13020	userspace: Handling of versatile tunnel ports In netdev_gre_build_header(), GRE protocol and VXLAN next_potocol is set based on packet_type of flow. If it's about an Ethernet packet, it is set to ETP_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set according to the name space type. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-27 17:28:30 -04:00
Paul Blakey	6c34398480	dpif-netlink: Use netdev flow get api to query a flow Search all datapath added netdevs for a given flow using netdev flow api and parse it back to dpif flow. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:49:26 +02:00
Paul Blakey	0335a89ced	dpif-netlink: Use netdev flow del api to delete a flow If a flow was offloaded to a netdev we delete it using netdev flow api. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:48:22 +02:00
Paul Blakey	f2280b4198	dpif-netlink: Dump netdevs flows on flow dump While dumping flows, dump flows that were offloaded to netdev and parse them back to dpif flow. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:51 +02:00
Paul Blakey	f7dde6df70	dpif-netlink: Flush added ports using netdev flow api If netdev flow offloading is enabled, flush all added ports using netdev flow api. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:45 +02:00
Paul Blakey	32b77c316d	dpif: Save added ports in a port map for netdev flow api use To use netdev flow offloading api, dpifs needs to iterate over added ports. This addition inserts the added dpif ports in a hash map, The map will also be used to translate dpif ports to netdevs. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:41 +02:00
Paul Blakey	53611f7b05	other-config: Add hw-offload switch to control netdev flow offloading Add a new configuration option - hw-offload that enables netdev flow api. Enabling this option will allow offloading flows using netdev implementation instead of the kernel datapath. This configuration option defaults to false - disabled. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-14 10:13:25 +02:00
Paul Blakey	18ebd48cfb	netdev: Adding a new netdev API to be used for offloading flows Add a new API interface for offloading dpif flows to netdev. The API consist on the following: flow_put - offload a new flow flow_get - query an offloaded flow flow_del - delete an offloaded flow flow_flush - flush all offloaded flows flow_dump_* - dump all offloaded flows In upcoming commits we will introduce an implementation of this API for netdev-linux. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-14 10:12:30 +02:00
Jan Scheurich	beb75a40fd	userspace: Switching of L3 packets in L2 pipeline Ports have a new layer3 attribute if they send/receive L3 packets. The packet_type included in structs dp_packet and flow is considered in ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite. A dummy ethernet header is pushed to L3 packets received from L3 ports before the the pipeline processing starts. The ethernet header is popped before sending a packet to a L3 port. For datapath ports that can receive L2 or L3 packets, the packet_type becomes part of the flow key for datapath flows and is handled appropriately in dpif-netdev. In the 'else' branch in flow_put_on_pmd() function, the additional check flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls lookup is sufficient to uniquely identify a flow and b) it caused false negatives because the flow in netdev->flow may not properly masked. In dpif_netdev_flow_put() we now use the same method for constructing the netdev_flow_key as the one used when adding the flow to the dplcs to make sure these always match. The function netdev_flow_key_from_flow() used so far was not only inefficient but sometimes caused mismatches and subsequent flow update failures. The kernel datapath does not support the packet_type match field. Instead it encodes the packet type implictly by the presence or absence of the Ethernet attribute in the flow key and mask. This patch filters the PACKET_TYPE attribute out of netlink flow key and mask to be sent to the kernel datapath. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 10:15:20 -07:00
Pravin B Shelar	bf4bbd0d12	tunnel: Add support to configure ptk_mark Today packet mark action is broken for Tunnel ports with tunnel monitoring. User can write a flow to set pkt-mark for any tunnel traffic, but there is no way to set the packet mark for corresponding BFD traffic. Following patch introduces new option in OVSDB tunnel configuration so that user can set skb-mark for given tunnel endpoint. OVS would set the mark according to the skb-mark option for all tunnel traffic including packets generated by vSwitchd like tunnel monitoring BFD packet. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-28 12:16:34 -08:00
Daniele Di Proietto	57eebbb4c3	dpif-netdev: Don't try to output on a device without txqs. Tunnel devices have 0 txqs and don't support netdev_send(). While netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed on output, and that might be confused by devices with no txqs. It seems better to have different structures in the fast path for ports that support netdev_{push,pop}_header (tunnel devices), and ports that support netdev_send. With this we can also remove a branch in netdev_send(). This is also necessary for a future commit, which starts DPDK devices without txqs. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Stephen Finucane	7c9afefd0a	doc: Populate 'topics' section There are many docs that don't need to kept at the top level, along with many more hidden in random folders. Move them all. This also allows us to add the '-W' flag to Sphinx, ensuring unindexed docs result in build failures. Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-12-12 08:57:06 -08:00
Pravin B Shelar	2b02d770c4	openvswitch: Allow external IPsec tunnel management. OVS GRE IPsec tunnel support has multiple issues, Therefore it was deprecated in OVS 2.6. Following patch removes support for GRE IPsec and allows external IPsec tunnel management for any type of tunnel not just GRE. e.g. user can encrypt Geneve or VxLan traffic. It can be done by using openflow pipeline to set skb-mark and using IPsec keying daemons to implement IPsec tunnels. This packet can be matched for the skb-mark to encrypt selective tunnel traffic. VMware-BZ: 1710701 Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Ansis Atteka <aatteka@ovn.org>	2016-09-27 11:06:09 -07:00
Daniele Di Proietto	3a414a0a4f	ofproto: Honor mtu_request even for internal ports. By default Open vSwitch tries to configure internal interfaces MTU to match the bridge minimum, overriding any attempt by the user to configure it through standard system tools, or the database. While this works in many simple cases (there are probably many users that rely on this) it may create problems for more advanced use cases (like any overlay networks). This commit allows the user to override the default behavior by providing an explict MTU in the mtu_request column in the Interface table. This means that Open vSwitch will now treat differently database MTU requests from standard system tools MTU requests (coming from `ip link` or `ifconfig`), but this seems the best way to remain compatible with old users while providing a more powerful interface. Suggested-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Tested-by: Joe Stringer <joe@ovn.org>	2016-09-02 16:01:12 -07:00
Daniele Di Proietto	4124cb1254	netdev: Make netdev_set_mtu() netdev parameter non-const. Every provider silently drops the const attribute when converting the parameter to the appropriate subclass. Might as well drop the const attribute from the parameter, since this is a "set" function. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-08-12 19:32:12 -07:00
Ilya Maximets	324c837485	dpif-netdev: XPS (Transmit Packet Steering) implementation. If CPU number in pmd-cpu-mask is not divisible by the number of queues and in a few more complex situations there may be unfair distribution of TX queue-ids between PMD threads. For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask such distribution is possible: <------------------------------------------------------------------------> pmd thread numa_id 0 core_id 13: port: vhost-user1 queue-id: 1 port: dpdk0 queue-id: 3 pmd thread numa_id 0 core_id 14: port: vhost-user1 queue-id: 2 pmd thread numa_id 0 core_id 16: port: dpdk0 queue-id: 0 pmd thread numa_id 0 core_id 17: port: dpdk0 queue-id: 1 pmd thread numa_id 0 core_id 12: port: vhost-user1 queue-id: 0 port: dpdk0 queue-id: 2 pmd thread numa_id 0 core_id 15: port: vhost-user1 queue-id: 3 <------------------------------------------------------------------------> As we can see above dpdk0 port polled by threads on cores: 12, 13, 16 and 17. By design of dpif-netdev, there is only one TX queue-id assigned to each pmd thread. This queue-id's are sequential similar to core-id's. And thread will send packets to queue with exact this queue-id regardless of port. In previous example: pmd thread on core 12 will send packets to tx queue 0 pmd thread on core 13 will send packets to tx queue 1 ... pmd thread on core 17 will send packets to tx queue 5 So, for dpdk0 port after truncating in netdev-dpdk: core 12 --> TX queue-id 0 % 4 == 0 core 13 --> TX queue-id 1 % 4 == 1 core 16 --> TX queue-id 4 % 4 == 0 core 17 --> TX queue-id 5 % 4 == 1 As a result only 2 of 4 queues used. To fix this issue some kind of XPS implemented in following way: * TX queue-ids are allocated dynamically. * When PMD thread first time tries to send packets to new port it allocates less used TX queue for this port. * PMD threads periodically performes revalidation of allocated TX queue-ids. If queue wasn't used in last XPS_TIMEOUT_MS milliseconds it will be freed while revalidation. * XPS is not working if we have enough TX queues. Reported-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
Ben Pfaff	258e42fa2c	netdev: Fix typo in comment. The name of the macro was wrong. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>	2016-06-03 13:08:31 -07:00
Pravin B Shelar	4975aa3ee6	netdev-native-tnl: Introduce ip_build_header() The native tunneling build tunnel header code is spread across two different modules, it makes pretty hard to follow the code. Following patch refactors the code to move all code to netdev-ative-tnl module. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-23 20:27:14 -07:00
Daniele Di Proietto	050c60bfb5	netdev-dpdk: Use ->reconfigure() call to change rx/tx queues. This introduces in dpif-netdev and netdev-dpdk the first use for the newly introduce reconfigure netdev call. When a request to change the number of queues comes, netdev-dpdk will remember this and notify the upper layer via netdev_request_reconfigure(). The datapath, instead of periodically calling netdev_set_multiq(), can detect this and call reconfigure(). This mechanism can also be used to: * Automatically match the number of rxq with the one provided by qemu via the new_device callback. * Provide a way to change the MTU of dpdk devices at runtime. * Move a DPDK vhost device to the proper NUMA socket. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	790fb3b745	netdev: Add reconfigure request mechanism. A netdev provider, especially a PMD provider (like netdev DPDK) might not be able to change some of its parameters (such as MTU, or number of queues) without stopping everything and restarting. This commit introduces a mechanism that allows a netdev provider to request a restart (netdev_request_reconfigure()). The upper layer can be notified via netdev_wait_reconf_required() and netdev_is_reconf_required(). After closing all the rxqs the upper layer can finally call netdev_reconfigure(), to make sure that the new configuration is in place. This will be used by next commit to reconfigure rx and tx queues in netdev-dpdk. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-05-23 10:27:42 -07:00
Pravin B Shelar	9235b4793e	dpif-netdev: Fix memory leak in tunnel header pop action. The tunnel header pop action can leak batch of packet in case of error. Following patch fixex the error code path. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1895cc8dbb	dpif-netdev: create batch object DPDK datapath operate on batch of packets. To pass the batch of packets around we use packets array and count. Next patch needs to associate meta-data with each batch of packets. So Introducing a batch structure to make handling the metadata easier. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1c8f98d96a	netdev: Return number of packet from netdev_pop_header() Current tunnel-pop API does not allow the netdev implementation retain a packet but STT can keep a packet from batch of packets during TCP reassembly processing. To return exact count of valid packet STT need to pass this number of packet parameter as a reference. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Ben Warren	b129cc9834	Break netdev.h into private and public parts Public (struct definitions and some prototypes) go in include/openvswitch Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 13:43:22 -07:00
Pravin B Shelar	6b6e13293e	netdev: remove netdev_get_in4() Since netdev can have multiple IP address use generic api netdev_get_addr_list(). This also make it easier to handle IPv4 and IPv6 address across vswitchd layers. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-03-24 09:30:57 -07:00

1 2 3

149 Commits