mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

Author	SHA1	Message	Date
Sriharsha Basavapatna via dev	57924fc91c	revalidator: Rebalance offloaded flows based on the pps rate This is the third patch in the patch-set to support dynamic rebalancing of offloaded flows. The dynamic rebalancing functionality is implemented in this patch. The ukeys that are not scheduled for deletion are obtained and passed as input to the rebalancing routine. The rebalancing is done in the context of revalidation leader thread, after all other revalidator threads are done with gathering rebalancing data for flows. For each netdev that is in OOR state, a list of flows - both offloaded and non-offloaded (pending) - is obtained using the ukeys. For each netdev that is in OOR state, the flows are grouped and sorted into offloaded and pending flows. The offloaded flows are sorted in descending order of pps-rate, while pending flows are sorted in ascending order of pps-rate. The rebalancing is done in two phases. In the first phase, we try to offload all pending flows and if that succeeds, the OOR state on the device is cleared. If some (or none) of the pending flows could not be offloaded, then we start replacing an offloaded flow that has a lower pps-rate than a pending flow, until there are no more pending flows with a higher rate than an offloaded flow. The flows that are replaced from the device are added into kernel datapath. A new OVS configuration parameter, "offload-rebalance", is added to ovsdb. The default value of this is "false". To enable this feature, set the value of this parameter to "true", which provides packets-per-second rate based policy to dynamically offload and un-offload flows. Note: This option can be enabled only when 'hw-offload' policy is enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow offload errors (specifically the ENOSPC error this feature depends on) reported by an offloaded device are suppressed by the TC-Flower kernel module.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:52 +02:00
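A minimal sketch of the two-phase policy described in the commit above, not the OVS implementation. The names `sketch_flow` and `sketch_rebalance` are hypothetical, the device is modelled as a plain count of free slots rather than a real ENOSPC return from TC-Flower, and the order in which pending flows are tried (highest rate first) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flow record for illustration; not an OVS structure. */
struct sketch_flow {
    unsigned int pps;   /* Measured packets-per-second rate. */
    bool offloaded;     /* True while the flow sits in hardware. */
};

static int
cmp_pps_asc(const void *a_, const void *b_)
{
    const struct sketch_flow *a = a_, *b = b_;
    return (a->pps > b->pps) - (a->pps < b->pps);
}

static int
cmp_pps_desc(const void *a_, const void *b_)
{
    return cmp_pps_asc(b_, a_);
}

/* Returns true if every pending flow fit in phase 1, i.e. the
 * out-of-resource (OOR) state could be cleared. */
bool
sketch_rebalance(struct sketch_flow *offloaded, size_t n_off,
                 struct sketch_flow *pending, size_t n_pend,
                 size_t free_slots)
{
    assert(offloaded || !n_off);
    assert(pending || !n_pend);

    /* Offloaded: descending pps.  Pending: ascending pps. */
    if (n_off) {
        qsort(offloaded, n_off, sizeof *offloaded, cmp_pps_desc);
    }
    if (n_pend) {
        qsort(pending, n_pend, sizeof *pending, cmp_pps_asc);
    }

    /* Phase 1: offload pending flows while the device has room
     * (assumed order: highest rate first, from the array's tail). */
    size_t n_rem = n_pend;
    while (n_rem > 0 && free_slots > 0) {
        pending[--n_rem].offloaded = true;
        free_slots--;
    }
    if (!n_rem) {
        return true;
    }

    /* Phase 2: swap the slowest offloaded flow with the fastest
     * remaining pending flow while the pending one is faster. */
    size_t i = n_off, j = n_rem;
    while (i > 0 && j > 0 && pending[j - 1].pps > offloaded[i - 1].pps) {
        offloaded[--i].offloaded = false;   /* Falls back to the kernel datapath. */
        pending[--j].offloaded = true;
    }
    return false;
}
```

With no free slots, offloaded rates {10, 500, 50} and pending rates {1000, 5, 100}, the sketch keeps 500 in hardware, evicts 50 and 10, and offloads 1000 and 100.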
Sriharsha Basavapatna via dev	738c785ff1	dpif-netlink: Detect Out-Of-Resource condition on a netdev This is the first patch in the patch-set to support dynamic rebalancing of offloaded flows. The patch detects OOR condition on a netdev port when ENOSPC error is returned by TC-Flower while adding a flow rule. A new structure is added to the netdev called "netdev_hw_info", to store OOR related information required to perform dynamic offload-rebalancing. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:45 +02:00
Eli Britstein	d9677a1f0e	netdev-tc-offloads: TC csum option is not matched with tunnel configuration Tunnels (gre, geneve, vxlan) support 'csum' option (true/false), default is false. Generated encap TC rule will now be configured as the tunnel configuration. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-16 09:28:30 +02:00
Matteo Croce	790a437229	dpif-netlink: Fix null pointer. In dpif_netlink_port_add__(), socksp could be NULL, because vport_socksp_to_pids() would allocate a new array and return a single zero element. Following vport_socksp_to_pids() removal, a NULL pointer can happen when dpif_netlink_port_add__() is called and dpif->handlers is 0. Restore the old behaviour of using a zero pid when dpif->handlers is 0. Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets") Reported-by: Flavio Leitner <fbl@redhat.com> Reported-by: Guru Shetty <guru@ovn.org> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-10-08 08:17:31 -07:00
Ben Pfaff	769b50349f	dpif: Remove support for multiple queues per port. Commit 69c51582ff78 ("dpif-netlink: don't allocate per thread netlink sockets") removed dpif-netlink support for multiple queues per port. No remaining dpif provider supports multiple queues per port, so remove infrastructure for the feature. CC: Matteo Croce <mcroce@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-09-26 15:57:06 -07:00
Matteo Croce	69c51582ff	dpif-netlink: don't allocate per thread netlink sockets When using the kernel datapath, OVS allocates a pool of sockets to handle netlink events. The number of sockets is: ports * n-handler-threads, where n-handler-threads is user configurable and defaults to 3/4*number of cores. This is because vswitchd starts n-handler-threads threads, each one with a netlink socket for every port of the switch. Every thread then starts listening for events on its set of sockets with epoll(). On setups with a lot of CPUs and ports, the number of sockets easily hits the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE. Change the number of allocated sockets to just one per port by moving the socket array from a per handler structure to a per datapath one, and let all the handlers share the same sockets by using the EPOLLEXCLUSIVE epoll flag, which avoids duplicate events, on systems that support it. The patch was tested on a 56 core machine running Linux 4.18 and latest Open vSwitch. A bridge was created with 2000+ ports, some of them being veth interfaces with the peer outside the bridge. The latency of the upcall is measured by setting a single 'action=controller,local' OpenFlow rule to force all the packets going to the slow path and then to the local port. A tool[1] injects some packets to the veth outside the bridge, and measures the delay until the packet is captured on the local port. The rx timestamp is taken from the socket ancillary data in the attribute SO_TIMESTAMPNS, to avoid having the scheduler delay in the measured time. The first test measures the average latency for an upcall generated from a single port. To measure it 100k packets, one every msec, are sent to a single port and the latencies are measured. The second test is meant to check latency fairness among ports, namely if latency is equal between ports or if some ports have lower priority.
The previous test is repeated for every port, the average of the average latencies and the standard deviation between averages is measured. The third test serves to measure responsiveness under load. Heavy traffic is sent through all ports, latency and packet loss is measured on a single idle port. The fourth test is all about fairness. Heavy traffic is injected in all ports but one, latency and packet loss is measured on the single idle port.

This is the test setup:

    # nproc
    56
    # ovs-vsctl show |grep -c Port
    2223
    # ovs-ofctl dump-flows ovs_upc_br
     cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0, actions=CONTROLLER:65535,LOCAL
    # uname -a
    Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

And these are the results of the tests:

                                              Stock OVS            Patched
    netlink sockets in use by vswitchd
    lsof -p $(pidof ovs-vswitchd) \
        |grep -c GENERIC                      91187                2227

    Test 1
    one port latency
    min/avg/max/mdev (us)                     2.7/6.6/238.7/1.8    1.6/6.8/160.6/1.7

    Test 2
    all port
    avg latency/mdev (us)                     6.51/0.97            6.86/0.17

    Test 3
    single port latency under load
    avg/mdev (us)                             7.5/5.9              3.8/4.8
    packet loss                               95 %                 62 %

    Test 4
    idle port latency under load
    min/avg/max/mdev (us)                     0.8/1.5/210.5/0.9    1.0/2.1/344.5/1.2
    packet loss                               94 %                 4 %

CPU and RAM usage seems not to be affected, the resource usage of vswitchd idle with 2000+ ports is unchanged:

    # ps u $(pidof ovs-vswitchd)
    USER       PID %CPU %MEM     VSZ    RSS TTY   STAT START TIME COMMAND
    openvsw+  5430 54.3  0.3 4263964 510968 pts/1 RLl+ 16:20 0:50 ovs-vswitchd

Additionally, to check if vswitchd is thread safe with this patch, the following test was run for circa 48 hours: on a 56 core machine, a bridge with kernel datapath is filled with 2200 dummy interfaces and 22 veth, then 22 traffic generators are run in parallel piping traffic into the veths peers outside the bridge.
To generate as many upcalls as possible, all packets were forced to the slowpath with an openflow rule like 'action=controller,local' and packet size was set to 64 byte. Also, to avoid overflowing the FDB early and slowing down the upcall processing, generated mac addresses were restricted to a small interval. vswitchd ran without problems for 48+ hours, obviously with all the handler threads with almost 99% CPU usage. [1] https://github.com/teknoraver/network-tools/blob/master/weed.c Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org>	2018-09-25 14:52:20 -07:00
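A minimal sketch of the sharing scheme described in the commit above, assuming Linux. Each handler thread gets its own epoll instance and registers the same per-port socket with `EPOLLEXCLUSIVE`. The function name `sketch_handler_epoll` is hypothetical and this is not the dpif-netlink code. Where the flag is unavailable at build time, the sketch falls back to plain `EPOLLIN`, reflecting "on systems that support it".

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll instance for one handler thread and registers the
 * shared socket 'sock_fd' on it.  Returns the epoll fd, or -1 on error. */
int
sketch_handler_epoll(int sock_fd)
{
    assert(sock_fd >= 0);

    int epoll_fd = epoll_create1(0);
    if (epoll_fd < 0) {
        return -1;
    }

    struct epoll_event event = { .events = EPOLLIN };
#ifdef EPOLLEXCLUSIVE
    /* Ask the kernel to wake only some of the waiters attached to
     * this fd instead of every handler thread. */
    event.events |= EPOLLEXCLUSIVE;
#endif
    event.data.fd = sock_fd;

    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock_fd, &event) < 0) {
        close(epoll_fd);
        return -1;
    }
    return epoll_fd;
}
```

Calling it once per handler with the same `sock_fd` gives one socket per port shared by all handlers, instead of one socket per port per handler.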
Gavi Teitz	a692410af0	dpctl: Expand the flow dump type filter Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-09-13 16:56:25 +02:00
Justin Pettit	60ebc04d12	dpif-netlink: Prevent abort in probe_broken_meters(). Commit 92d0d515d ("dpif-netlink: Probe for broken Linux meter implementations.") introduced a deadlock on the 'once' structure declared in probe_broken_meters() with the following callstack: probe_broken_meters() -> probe_broken_meters__() -> dpif_netlink_meter_set() -> probe_broken_meters(). This commit introduces a modified version of dpif_netlink_meter_set() that sets a meter without calling the probe. Reported-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-08-17 13:10:01 -07:00
Yi-Hung Wei	906ff9d229	dpif-netlink: Implement conntrack zone limit This patch provides the implementation of conntrack zone limit in dpif-netlink. It basically utilizes the netlink API to communicate with OVS kernel module to set, delete, and get conntrack zone limit. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>	2018-08-17 09:46:56 -07:00
Yi-Hung Wei	cd015a11c2	dpif: Support conntrack zone limit. This patch defines the dpif interface to support conntrack per zone limit. Basically, OVS users can use this interface to set, delete, and get the conntrack per zone limit for various dpif interfaces. The following patch will make use of the proposed interface to implement the feature. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>	2018-08-17 09:30:55 -07:00
Justin Pettit	92d0d515d6	dpif-netlink: Probe for broken Linux meter implementations. Meter support was introduced in Linux 4.15. In some versions of Linux 4.15, 4.16, and 4.17, there was a bug that never set the id when the meter was created, so all meters essentially had an id of zero. This commit adds a probe to check for that condition and disable meters on those kernels. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-08-16 10:20:52 -07:00
Justin Pettit	8101f03fcd	dpif: Don't pass in '*meter_id' to meter_set commands. The original intent of the API appears to be that the underlying DPIF implementation would choose a local meter id. However, neither of the existing datapath meter implementations (userspace or Linux) implemented that; they expected a valid meter id to be passed in, otherwise they returned an error. This commit follows the existing implementations and makes the API somewhat cleaner. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-08-16 10:20:52 -07:00
Andy Zhou	80738e5f93	dpif-netlink: Add meter support. To work with kernel datapath that supports meter. Signed-off-by: Andy Zhou <azhou@ovn.org> Co-authored-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-30 13:00:51 -07:00
Justin Pettit	494a74557a	Revert "dpctl: Expand the flow dump type filter" Commit ab15e70eb587 ("dpctl: Expand the flow dump type filter") had a number of issues with style, build breakage, and failing unit tests. The patch is being reverted so that they can be addressed. This reverts commit ab15e70eb5878b46f8f84da940ffc915b6d74cad. CC: Gavi Teitz <gavi@mellanox.com> CC: Simon Horman <simon.horman@netronome.com> CC: Roi Dayan <roid@mellanox.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-25 14:17:36 -07:00
Gavi Teitz	ab15e70eb5	dpctl: Expand the flow dump type filter Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-07-25 18:16:27 +02:00
Jianbo Liu	f9885dc594	Add support to offload QinQ double VLAN headers match Currently the inner VLAN header is ignored when using the TC data-path. As TC flower supports QinQ, now we can offload the rules to match on both outer and inner VLAN headers. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-07-25 18:15:52 +02:00
Gavi Teitz	d63ca5329f	dpctl: Properly reflect a rule's offloaded to HW state Previously, any rule that is offloaded via a netdev, not necessarily to the HW, would be reported as "offloaded". This patch fixes this misalignment, and introduces the 'dp' state, as follows: rule is in HW via TC offload -> offloaded=yes dp:tc; rule is not in HW, over TC DP -> offloaded=no dp:tc; rule is not in HW, over OVS DP -> offloaded=no dp:ovs. To achieve this, the flow's 'offloaded' flag was encapsulated in a new attrs struct, which contains the offloaded state of the flow and the DP layer the flow is handled in, and instead of setting the flow's 'offloaded' state based solely on the type of dump it was acquired via, for netdev flows it now sends the new attrs struct to be collected along with the rest of the flow via the netdev, allowing it to be set per flow. For TC offloads, the offloaded state is set based on the 'in_hw' and 'not_in_hw' flags received from the TC as part of the flower. If no such flag was received, due to lack of kernel support, it defaults to true. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> [simon: resolved conflict in lib/dpctl.man] Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-06-18 09:57:37 +02:00
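A minimal sketch of the state mapping described in the commit above. The function names are hypothetical, the output string follows the wording of the commit message rather than any verified dpctl output, and letting 'in_hw' win when both flags are set is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Derives the offloaded state of a TC flower rule from its flags.
 * With neither flag present (older kernels), it defaults to true. */
bool
sketch_tc_offloaded(bool in_hw, bool not_in_hw)
{
    if (in_hw) {
        return true;
    }
    if (not_in_hw) {
        return false;
    }
    return true;
}

/* Formats the state as in the commit message, e.g. "offloaded=yes dp:tc".
 * Returns the snprintf() result. */
int
sketch_format_attrs(char *buf, size_t size, bool offloaded,
                    const char *dp_layer)
{
    assert(buf && dp_layer);
    return snprintf(buf, size, "offloaded=%s dp:%s",
                    offloaded ? "yes" : "no", dp_layer);
}
```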
Ben Pfaff	fa37affad3	Embrace anonymous unions. Several OVS structs contain embedded named unions, like this: struct { ... union { ... } u; }; C11 standardized a feature that many compilers already implemented anyway, where an embedded union may be unnamed, like this: struct { ... union { ... }; }; This is more convenient because it allows the programmer to omit "u." in many places. OVS already used this feature in several places. This commit embraces it in several others. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Tested-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>	2018-05-25 13:36:05 -07:00
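A minimal sketch of the difference the commit above describes, using made-up struct names rather than any OVS type. With the C11 anonymous union, the "u." qualifier disappears at every access site.

```c
#include <assert.h>
#include <stdint.h>

/* Named embedded union: members are reached through 'u'. */
struct sketch_named {
    int kind;
    union {
        uint32_t ipv4;
        uint64_t id;
    } u;
};

/* C11 anonymous union: members are reached directly. */
struct sketch_anon {
    int kind;
    union {
        uint32_t ipv4;
        uint64_t id;
    };
};

uint32_t
sketch_named_ipv4(const struct sketch_named *n)
{
    assert(n);
    return n->u.ipv4;
}

uint32_t
sketch_anon_ipv4(const struct sketch_anon *a)
{
    assert(a);
    return a->ipv4;     /* No "u." needed. */
}
```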
Greg Rose	1c385f4972	lib/dpif-netlink: Fix miscompare of gre ports In netdev_to_ovs_vport_type() it checks for netdev types matching "gre" with a strstr(). This makes it match ip6gre as well and return OVS_VPORT_TYPE_GRE, which is clearly wrong. Move the usage of strstr() after all the exact matches with strcmp() to avoid the problem permanently because when I added the ip6gre type I ran into a very difficult to detect bug. Cc: Ben Pfaff <blp@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>	2018-05-21 20:33:30 -07:00
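A minimal sketch of the ordering problem described in the commit above. The enum and function names are hypothetical stand-ins, not the real netdev_to_ovs_vport_type(). It shows why the substring test has to come after the exact matches.

```c
#include <assert.h>
#include <string.h>

enum sketch_vport_type {
    SKETCH_VPORT_UNSPEC,
    SKETCH_VPORT_GRE,
    SKETCH_VPORT_IP6GRE
};

/* Buggy order: strstr() runs first, so "ip6gre" is reported as GRE. */
enum sketch_vport_type
sketch_type_buggy(const char *type)
{
    assert(type);
    if (strstr(type, "gre")) {
        return SKETCH_VPORT_GRE;
    } else if (!strcmp(type, "ip6gre")) {
        return SKETCH_VPORT_IP6GRE;     /* Never reached. */
    }
    return SKETCH_VPORT_UNSPEC;
}

/* Fixed order: exact strcmp() matches first, substring match last. */
enum sketch_vport_type
sketch_type_fixed(const char *type)
{
    assert(type);
    if (!strcmp(type, "ip6gre")) {
        return SKETCH_VPORT_IP6GRE;
    } else if (strstr(type, "gre")) {
        return SKETCH_VPORT_GRE;
    }
    return SKETCH_VPORT_UNSPEC;
}
```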
Greg Rose	3b10ceeed1	ip6gre: Add ip6gre vport type Add handlers for OVS_VPORT_TYPE_IP6GRE Cc: Ben Pfaff <blp@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>	2018-05-21 20:33:30 -07:00
William Tu	98514eea21	erspan: add kernel datapath support pass check, check-kernel (4.16-rc4), check-system-userspace Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
William Tu	7dc18ae96d	userspace: add erspan tunnel support. ERSPAN is a tunneling protocol based on GRE tunnel. The patch adds erspan tunnel support for ovs-vswitchd with userspace datapath. Configuring erspan tunnel is similar to gre tunnel, but with additional erspan's parameters. Matching a flow on erspan's metadata is also supported, see ovs-fields for more details. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-21 20:33:30 -07:00
Greg Rose	c387d8177f	compat: Add ipv6 GRE and IPV6 Tunneling This patch backports upstream ipv6 GRE and tunneling into the OVS OOT (Out of Tree) datapath drivers. The primary reason for this is to support the ERSPAN feature. Because there is no previous history of ipv6 GRE and tunneling it is not possible to exactly reproduce the history of all the files in the patch. The two newly added files - ip6_gre.c and ip6_tunnel.c - are cut from whole cloth out of the upstream Linux 4.15 kernel and then modified as necessary with compatibility layer fixups. These two files already included parts of several other upstream commits that also touched other upstream files. As such, this patch may incorporate parts or all of the following commits: d350a82 net: erspan: create erspan metadata uapi header c69de58 net: erspan: use bitfield instead of mask and offset b423d13 net: erspan: fix use-after-free 214bb1c net: erspan: remove md NULL check afb4c97 ip6_gre: fix potential memory leak in ip6erspan_rcv 50670b6 ip_gre: fix potential memory leak in erspan_rcv a734321 ip6_gre: fix error path when ip6erspan_rcv failed dd8d5b8 ip_gre: fix error path when erspan_rcv failed 293a199 ip6_gre: fix a pontential issue in ip6erspan_rcv d91e8db5 net: erspan: reload pointer after pskb_may_pull ae3e133 net: erspan: fix wrong return value c05fad5 ip_gre: fix wrong return value of erspan_rcv 94d7d8f ip6_gre: add erspan v2 support f551c91 net: erspan: introduce erspan v2 for ip_gre 1d7e2ed net: erspan: refactor existing erspan code ef7baf5 ip6_gre: add ip6 erspan collect_md mode 5a963eb ip6_gre: Add ERSPAN native tunnel support ceaa001 openvswitch: Add erspan tunnel support. 
f192970 ip_gre: check packet length and mtu correctly in erspan tx c84bed4 ip_gre: erspan device should keep dst c122fda ip_gre: set tunnel hlen properly in erspan_tunnel_init 5513d08 ip_gre: check packet length and mtu correctly in erspan_xmit 935a974 ip_gre: get key from session_id correctly in erspan_rcv 1a66a83 gre: add collect_md mode to ERSPAN tunnel 84e54fe gre: introduce native tunnel support for ERSPAN In cases where the listed commits also touched other source code files then the patches are also listed separately within this patch series. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>	2018-05-21 20:33:29 -07:00
Chris Mi	00a0a011d3	netdev-tc-offloads: Add offloading of multiple outputs Currently, we support offloading of one output port. Remove that limitation by use of mirred mirror action for all output ports, except that the last one is mirred redirect action. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-04-12 11:08:34 +02:00
Flavio Leitner	e0e2410d52	netdev-linux: fail ops not supporting remote netns. When the netdev is in another namespace and the operation doesn't support network namespaces, return the correct error. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-03-31 12:48:39 -07:00
Flavio Leitner	bfda523979	netnsid: update device only if netnsid matches. Recent kernels provide the network namespace ID of a port, so use that to discover where the port currently is. A network device in another network namespace could have the same name, so once the socket starts listening to other network namespaces, it is necessary to confirm the netnsid. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-03-31 12:48:33 -07:00
Flavio Leitner	a86bd14ec9	netlink: provide network namespace id from a msg. The netlink notification's ancillary data contains the network namespace id (netnsid) needed to identify the device correctly. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-03-31 12:48:31 -07:00
Ben Pfaff	0d71302e36	ofp-util, ofp-parse: Break up into many separate modules. ofp-util had been far too large and monolithic for a long time. This commit breaks it up into units that make some logical sense. It also moves the pieces of ofp-parse that were specific to each unit into the relevant unit. Most of this commit is just moving code around. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-02-13 10:43:13 -08:00
Darrell Ball	875075b362	dpctl conntrack: Add get number of connections. A get command is added for number of conntrack connections. This command is only supported in the userspace datapath at this time. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-09 11:17:44 -08:00
Darrell Ball	c92339ad19	dpctl conntrack: Add get and set maxconns command. Get and set dpctl commands are added for conntrack maxconns. These commands are only supported in the userspace datapath at this time. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-09 11:16:44 -08:00
Yi-Hung Wei	817a76577f	ct-dpif,dpif-netlink: Support conntrack flush by ct 5-tuple This patch adds support of flushing a conntrack entry specified by the conntrack 5-tuple, and provides the implementation in dpif-netlink. The implementation of dpif-netlink in the linux datapath utilizes the NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in nf_conntrack. Future patches will add support for the userspace and Windows datapaths. VMWare-BZ: #1983178 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>	2017-12-07 13:49:40 -08:00
Xiao Liang	fd016ae3fb	lib: Move lib/poll-loop.h to include/openvswitch Poll-loop is the core to implement main loop. It should be available in libopenvswitch. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-03 10:47:55 -07:00
Antonio Fischetti	ded30c74b1	dpctl: Add new 'ct-bkts' command. With the command: ovs-appctl dpctl/ct-bkts shows the number of connections per bucket. By using a threshold: ovs-appctl dpctl/ct-bkts gt=N for each bucket shows the number of connections when they are greater than N. Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-08-02 10:18:55 -07:00
Roi Dayan	d52ef4ebe2	dpif-netlink: Fix log level for error message Since it's an error but also will always occur in older kernels log the message with level warning instead of info. Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-08-01 16:46:29 -07:00
Roi Dayan	dfaf79ddd9	dpif: Refactor obj type from void pointer to dpif_class It's basically what is being passed today and passing a specific type adds a compiler type check. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-07-27 10:17:46 +02:00
Joe Stringer	a8a3eee492	dpif-netlink: For non-Ethernet, use Ethertype from packet_type. For non-Ethernet flows, when fixing up the netlink message we need to make sure to pass down a valid Ethertype. The kernel does not understand packet_type so it's implicitly encoded by the absence of _ETHERNET and exact match of _ETHERTYPE. Without this change match_validate in the kernel complains when trying to match packets from L3 tunnels. e.g. openvswitch: netlink: Unexpected mask (mask=110088, allowed=3d9804c) The mask used to always be set in xlate_wc_init() and xlate_wc_finish(), but that changed for non-Ethernet frames with the commit listed below. Fixes: 3d4b2e6eb74e ("userspace: Add OXM field MFF_PACKET_TYPE") Signed-off-by: Joe Stringer <joe@ovn.org> Co-authored-by: Eric Garver <e@erig.me> Acked-by: Eric Garver <e@erig.me>	2017-07-19 14:34:18 -07:00
Joe Stringer	1ca5b61bfe	dpif-netlink: Use netlink helpers for packet_type. Rather than open-coding access to netlink attribute pointers in put_exclude_packet_type(), make use of the netlink attribute helpers. This simplifies the following bugfix. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Eric Garver <e@erig.me>	2017-07-19 14:34:18 -07:00
Roi Dayan	3cd9988619	dpif-netlink: Use dpif logging functions Remove redundant logging functions and reuse the exposed dpif logging functions. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:57:51 +02:00
Paul Blakey	4742003c74	dpctl: Indicate if flow is offloaded when dumping flows of all types When verbosity is requested on dump-flows (-m) indicate which flows are offloaded. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:55:00 +02:00
Paul Blakey	7e8b719938	dpctl: Add an option to dump only certain kinds of flows Usage: # to dump all datapath flows (default): ovs-dpctl dump-flows # to dump only flows that in kernel datapath: ovs-dpctl dump-flows type=ovs # to dump only flows that are offloaded: ovs-dpctl dump-flows type=offloaded Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:53:06 +02:00
Paul Blakey	6c34398480	dpif-netlink: Use netdev flow get api to query a flow Search all datapath added netdevs for a given flow using netdev flow api and parse it back to dpif flow. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:49:26 +02:00
Paul Blakey	0335a89ced	dpif-netlink: Use netdev flow del api to delete a flow If a flow was offloaded to a netdev we delete it using netdev flow api. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:48:22 +02:00
Paul Blakey	8b668ee3f0	dpif-netlink: Use netdev flow put api to insert a flow Using the new netdev flow api operate will now try and offload flows to the relevant netdev of the input port. Other operate methods flows will come in later patches. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:41:51 +02:00
Paul Blakey	f2280b4198	dpif-netlink: Dump netdevs flows on flow dump While dumping flows, dump flows that were offloaded to netdev and parse them back to dpif flow. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:51 +02:00
Paul Blakey	f7dde6df70	dpif-netlink: Flush added ports using netdev flow api If netdev flow offloading is enabled, flush all added ports using netdev flow api. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:45 +02:00
Ben Pfaff	ab79d262e1	netlink: Introduce helpers for 128-bit integer attributes. Use the helpers in appropriate places. In most cases, this fixes a misaligned reference, since ovs_be128 and ovs_u128 require 8-byte alignment but Netlink only guarantees 4-byte. Found by GCC -fsanitize=undefined. Reported-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Lance Richardson <lrichard@redhat.com>	2017-06-14 12:34:36 -07:00
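A minimal sketch of the alignment-safe access pattern the commit above refers to. The type `sketch_u128` and the helper name are hypothetical, not the helpers added by the commit. Copying through memcpy() avoids dereferencing a pointer that is only 4-byte aligned as a type that needs 8-byte alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit integer type that requires 8-byte alignment. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} sketch_u128;

/* Reads a 128-bit value from an attribute payload that may only be
 * 4-byte aligned.  A direct '*(const sketch_u128 *) payload' would be
 * a misaligned access, which -fsanitize=undefined reports. */
sketch_u128
sketch_get_u128(const void *payload)
{
    sketch_u128 value;

    assert(payload);
    memcpy(&value, payload, sizeof value);
    return value;
}
```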
Jan Scheurich	beb75a40fd	userspace: Switching of L3 packets in L2 pipeline Ports have a new layer3 attribute if they send/receive L3 packets. The packet_type included in structs dp_packet and flow is considered in ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite. A dummy ethernet header is pushed to L3 packets received from L3 ports before the pipeline processing starts. The ethernet header is popped before sending a packet to a L3 port. For datapath ports that can receive L2 or L3 packets, the packet_type becomes part of the flow key for datapath flows and is handled appropriately in dpif-netdev. In the 'else' branch in flow_put_on_pmd() function, the additional check flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls lookup is sufficient to uniquely identify a flow and b) it caused false negatives because the flow in netdev->flow may not be properly masked. In dpif_netdev_flow_put() we now use the same method for constructing the netdev_flow_key as the one used when adding the flow to the dpcls to make sure these always match. The function netdev_flow_key_from_flow() used so far was not only inefficient but sometimes caused mismatches and subsequent flow update failures. The kernel datapath does not support the packet_type match field. Instead it encodes the packet type implicitly by the presence or absence of the Ethernet attribute in the flow key and mask. This patch filters the PACKET_TYPE attribute out of netlink flow key and mask to be sent to the kernel datapath. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 10:15:20 -07:00
Ben Pfaff	aa5c021607	dpif-netlink: Fix multiple-free and fd leak on error path. This function attempts to open a bunch of new handlers. If it fails, it attempts to close all the handlers that have already been opened. Unfortunately, the loop to close the opened handlers used the wrong array index: 'i' instead of 'j'. This fixes the problem. Found by Coverity. Reported-at: https://scan3.coverity.com/reports.htm#v16889/p10449/fileInstanceId=14762827&defectInstanceId=4305351&mergedDefectId=180429 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2017-06-01 16:39:24 -07:00
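A minimal sketch of the error-path pattern the commit above fixes, with hypothetical names throughout; this is not the dpif-netlink function. The unwind loop has to index with its own variable 'j'. Indexing with 'i' would act on the failed slot each time and leave the earlier handles open. The fake open/close functions exist only to make the sketch exercisable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int sketch_open_fn(size_t index, int *handle);
typedef void sketch_close_fn(int handle);

/* Opens 'n' handles.  On failure, closes the ones already opened and
 * returns the error; returns 0 on success. */
int
sketch_open_all(int *handles, size_t n,
                sketch_open_fn *open_one, sketch_close_fn *close_one)
{
    assert(handles && open_one && close_one);

    for (size_t i = 0; i < n; i++) {
        int error = open_one(i, &handles[i]);
        if (error) {
            /* Unwind slots [0, i) using 'j', not 'i'. */
            for (size_t j = 0; j < i; j++) {
                close_one(handles[j]);
            }
            return error;
        }
    }
    return 0;
}

/* Fakes for demonstration only: the fourth open fails with EMFILE. */
int sketch_closed_sum;

int
sketch_fake_open(size_t index, int *handle)
{
    if (index == 3) {
        return EMFILE;
    }
    *handle = 100 + (int) index;
    return 0;
}

void
sketch_fake_close(int handle)
{
    sketch_closed_sum += handle;
}
```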
Eric Garver	921c370a9d	dpif-netlink: Probe for out-of-tree tunnels, decides used interface On dpif init, probe for whether tunnels are created using in-tree (upstream linux) or out-of-tree (OVS). This is done by probing for the existence of "ovs_geneve" via rtnetlink. This is used to determine how to create the tunnel devices. For out-of-tree tunnels, only try genetlink/compat. For in-tree kernel tunnels, try rtnetlink then fallback to genetlink. Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-05-19 12:51:58 -07:00
Eric Garver	c4e087530e	dpif-netlink: Support rtnetlink port creation. In order to be able to add those tunnels, we need to add code to create the tunnels and add them as NETDEV vports. And when there is no support to create them, we need to fallback to compatibility code and add them as tunnel vports. When removing those tunnels, we need to remove the interfaces as well, and detecting the right type might be important, at least to distinguish the tunnel vports that we should remove and the interfaces that we shouldn't. Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-05-19 12:51:57 -07:00

98 Commits