mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

Author	SHA1	Message	Date
Xiao Liang	fd016ae3fb	lib: Move lib/poll-loop.h to include/openvswitch Poll-loop is the core to implement main loop. It should be available in libopenvswitch. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-03 10:47:55 -07:00
William Tu	d2a5f17093	netdev: Fix memory leak on error path. Instead of freeing in the error path, move the allocation after it. Found by inspection. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-10-12 14:50:03 -07:00
Joe Stringer	c8d0f32a60	netdev: Free ifidx mapping in netdev_ports_remove(). Previously, netdev_ports_insert() would allocate and insert an ifindex->odp_port mapping, but netdev_ports_remove() would never remove the mapping or free the mapping structure. This patch fixes these up. Fixes: 32b77c316d9982("dpif: Save added ports in a port map.") Reported-by: Andy Zhou <azhou@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>	2017-08-11 09:46:54 -07:00
Daniel Alvarez	0ac01021a9	netdev: check for NULL fields in netdev_get_addrs When the interfaces list is retrieved through getifaddrs(), there might be elements with ifa_name set to NULL. This patch checks that ifa_name is not NULL before comparing it to the actual device name in the loop that calculates how many interfaces exist with that same name. Also, this patch checks that ifa_netmask is not NULL for coherence with the existing code so that it doesn't allocate more memory than needed if this field is NULL. Note that these checks are already done later in the function, so they should be done in both places. Signed-off-by: Daniel Alvarez <dalvarez@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Lance Richardson <lrichard@redhat.com>	2017-08-08 16:45:18 -07:00
Eelco Chaudron	8c2c225e48	netdev: Fix netdev_open() to track and recreate classless interfaces Due to commit 67ac844 an existing issue with OVS persistent ports surfaced. If we revert the commit we no longer get the error, and basic traffic will flow. However the wrong netdev class is used, hence the wrong callbacks get called. The main issue is with netdev_open() being called with type = NULL before the interface is actually configured in the system. This patch tracks these "auto" generated interfaces, and once netdev_open() gets called with a valid type, re-configures (re-creates) it. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-08-02 13:49:10 -07:00
Roi Dayan	dfaf79ddd9	dpif: Refactor obj type from void pointer to dpif_class It's basically what is being passed today and passing a specific type adds a compiler type check. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-07-27 10:17:46 +02:00
Haifeng Lin	f70ad93492	netdev: Fix crash when ifa_netmask is null. glibc sometimes doesn't initialize the ifa_netmask and ifa_addr fields, if the ioctl to fetch them fails. Check ifa_name also just for paranoia. Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-07-12 08:50:25 -07:00
Zoltán Balogh	429be0eeed	netdev: Fix crash when interface option is changed to invalid value. When trying to modify an interface option (e.g. remote IP of a GRE port) to an invalid value, the vswitchd does crash. For instance: ovs-vsctl add-br br0 ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre \ options:remote_ip=10.0.0.2 ovs-vsctl set interface gre0 options:remote_ip=9.9.9 The bug is caused by trying to dereference a NULL pointer. It was introduced by the commit 9fff138ec3a6. Before that, the NULL pointer was handled by the VLOG_WARN_BUF macro. Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> CC: Daniele Di Proietto <diproiettod@vmware.com> Fixes: 9fff138ec3a6 ("netdev: Add 'errp' to set_config().") Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-07-11 17:35:29 -07:00
Ben Pfaff	875ab13020	userspace: Handling of versatile tunnel ports In netdev_gre_build_header(), GRE protocol and VXLAN next_protocol is set based on packet_type of flow. If it's about an Ethernet packet, it is set to ETH_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set according to the name space type. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-27 17:28:30 -04:00
Ben Pfaff	81765c00a1	openvswitch.h: Use odp_port_t for port numbers in userspace-only structs. Using the correct type reduces the need for type conversions. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Reviewed-by: nickcooper-zhangtonghao <nic@opencloud.tech>	2017-06-20 07:35:49 +08:00
Paul Blakey	1067cf93ac	netdev: Init flow api on already added ports on offload enable Ports already added to a switch are not being initialized for offloading so when enabling offload we need to go over those ports. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:56:01 +02:00
Paul Blakey	6c34398480	dpif-netlink: Use netdev flow get api to query a flow Search all datapath added netdevs for a given flow using netdev flow api and parse it back to dpif flow. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:49:26 +02:00
Paul Blakey	0335a89ced	dpif-netlink: Use netdev flow del api to delete a flow If a flow was offloaded to a netdev we delete it using netdev flow api. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:48:22 +02:00
Paul Blakey	f2280b4198	dpif-netlink: Dump netdevs flows on flow dump While dumping flows, dump flows that were offloaded to netdev and parse them back to dpif flow. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:51 +02:00
Paul Blakey	f7dde6df70	dpif-netlink: Flush added ports using netdev flow api If netdev flow offloading is enabled, flush all added ports using netdev flow api. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:45 +02:00
Paul Blakey	32b77c316d	dpif: Save added ports in a port map for netdev flow api use To use netdev flow offloading api, dpifs need to iterate over added ports. This addition inserts the added dpif ports in a hash map. The map will also be used to translate dpif ports to netdevs. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:41 +02:00
Paul Blakey	691d20cbdc	other-config: Add tc-policy switch to control tc flower flag Add a new configuration tc-policy option that controls tc flower flag. Possible options are none, skip_sw, skip_hw. The default is none which is to insert the rule both to sw and hw. This option is only relevant if hw-offload is enabled. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:39:40 +02:00
Paul Blakey	53611f7b05	other-config: Add hw-offload switch to control netdev flow offloading Add a new configuration option - hw-offload that enables netdev flow api. Enabling this option will allow offloading flows using netdev implementation instead of the kernel datapath. This configuration option defaults to false - disabled. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-14 10:13:25 +02:00
Paul Blakey	18ebd48cfb	netdev: Adding a new netdev API to be used for offloading flows Add a new API interface for offloading dpif flows to netdev. The API consists of the following: flow_put - offload a new flow; flow_get - query an offloaded flow; flow_del - delete an offloaded flow; flow_flush - flush all offloaded flows; flow_dump_* - dump all offloaded flows. In upcoming commits we will introduce an implementation of this API for netdev-linux. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-14 10:12:30 +02:00
Eelco Chaudron	67ac844b55	netdev: Fix netdev_open() to adhere to class type if given When trying to configure a system port as type=internal it could start an infinite port creation loop. When this happens you will see the following log messages: 2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists 2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) 2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2 2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2 2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists 2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) 2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3 2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3 2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists 2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists) 2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4 ... ... This is how to replicate it: ip link add name ve01_1 type veth peer name ve01_2 ovs-vsctl add-br bzb ovs-vsctl add-port bzb ve01_1 ovs-vsctl set interface ve01_1 type=internal ip link set dev ve01_1 up ip link set dev ve01_2 up When changing the type to internal, the async configuration logic gets triggered and because the type has changed it will delete the interface and the ofproto port. Next it will call iface_do_create() to re-create the interface as internal. Because we just deleted the interface netdev_open() will try to recreate it as internal. 
However this will fail with EEXIST as a system interface already exists with the name. Up till here all is fine... Now some ipv6 route change comes along for the ve01_1 interface, and the route infrastructure will call netdev_open(). This will create the interface of type system. Next the configuration verify process gets triggered due to if_notifier_changed() being true. We now retry the above, but because the interface exists (although in the system class) it will use it, and create the interface successfully. This triggers another if notification, causing yet another config update, and because system != internal, reconfiguration happens and it starts from the top... So the fix as presented below causes netdev_open() to return the existing device only for the class type requested (if the type is specified). Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-01 15:23:00 -07:00
Thadeu Lima de Souza Cascardo	33d80cf955	netdev: get device type from vport prefix if it uses one If the device name uses a vport prefix, then use that vport type. Since these names are reserved, we can assume this is the right type. This is important when we are querying the datapath right after vswitch has started and using the right type will be even more important when we add support to creating tunnel ports with rtnetlink. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-05-19 09:46:39 -07:00
Ben Pfaff	909153f143	netdev: Reject empty names in netdev_open(). The empty string is not a valid name for a network device. I would have expected that each of the netdev provider implementations would reject an empty string, but there was a special case for Linux tap devices where they instead caused unexpected behavior. This commit should fix the problem for those devices and every other kind. Reported-by: Gabor Locsei <gabor.locsei@ericsson.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-February/043613.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com> Acked-by: Andy Zhou <azhou@ovn.org>	2017-02-03 14:12:27 -08:00
Andy Zhou	72c84bc2db	dp-packet: Enhance packet batch APIs. One common use case of 'struct dp_packet_batch' is to process all packets in the batch in order. Add an iterator for this use case to simplify the logic of calling sites. Another common use case is to drop packets in the batch, by reading all packets, but writing back pointers of fewer packets. Add macros to support this use case. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-26 17:35:29 -08:00
Daniele Di Proietto	7f381c2e54	netdev-dpdk: Start also dpdkr devices only once on port-add. Since commit 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"), we don't call rte_eth_start() from netdev_open() anymore, we only call it from netdev_reconfigure(). This commit does that also for 'dpdkr' devices, and removes some useless code. Calling rte_eth_start() also from netdev_open() was unnecessary and wasteful. Not doing it reduces code duplication and makes adding a port faster (~900ms before the patch, ~400ms after). Another reason why this is useful is that some DPDK drivers might have problems with reconfiguration. For example, until DPDK commit 8618d19b52b1("net/vmxnet3: reallocate shared memzone on re-config"), vmxnet3 didn't support being restarted with a different number of queues. Technically, the netdev interface changed because before opening rxqs or calling netdev_send() the user must check if reconfiguration is required. This patch also documents that, even though no change to the userspace datapath (the only user) is required. Lastly, this patch makes sure the errors returned by ofproto_port_add (which includes the first port reconfiguration) are reported back to the database. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	57eebbb4c3	dpif-netdev: Don't try to output on a device without txqs. Tunnel devices have 0 txqs and don't support netdev_send(). While netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed on output, and that might be confused by devices with no txqs. It seems better to have different structures in the fast path for ports that support netdev_{push,pop}_header (tunnel devices), and ports that support netdev_send. With this we can also remove a branch in netdev_send(). This is also necessary for a future commit, which starts DPDK devices without txqs. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	9fff138ec3	netdev: Add 'errp' to set_config(). Since 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"), set_config() is used to identify a DPDK device, so it's better to report its detailed error message to the user. Tunnel devices and patch ports rely a lot on set_config() as well. This commit adds a param to set_config() that can be used to return an error message and makes use of that in netdev-dpdk and netdev-vport. Before this patch: $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set configuration (Invalid argument). See ovs-vswitchd log for details. ovs-vsctl: The default log directory is "/var/log/openvswitch/". $ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch ovs-vsctl: Error detected while setting up 'p+': p+: could not set configuration (Invalid argument). See ovs-vswitchd log for details. ovs-vsctl: The default log directory is "/var/log/openvswitch/". $ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set configuration (Invalid argument). See ovs-vswitchd log for details. ovs-vsctl: The default log directory is "/var/log/openvswitch/". After this patch: $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing 'options:dpdk-devargs'. The old 'dpdk<port_id>' names are not supported. See ovs-vswitchd log for details. ovs-vsctl: The default log directory is "/var/log/openvswitch/". $ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires valid 'peer' argument. See ovs-vswitchd log for details. ovs-vsctl: The default log directory is "/var/log/openvswitch/". 
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type requires valid 'remote_ip' argument. See ovs-vswitchd log for details. ovs-vsctl: The default log directory is "/var/log/openvswitch/". CC: Ciara Loftus <ciara.loftus@intel.com> CC: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com>	2017-01-11 18:29:39 -08:00
Sugesh Chandran	1a2bb11817	netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports. Add Rx checksum offloading feature support on DPDK physical ports. By default, the Rx checksum offloading is enabled if the NIC supports it. However, the checksum offloading can be turned OFF either while adding a new DPDK physical port to OVS or at runtime. The rx checksum offloading can be turned off by setting the parameter to 'false'. For example, to disable the rx checksum offloading when adding a port, 'ovs-vsctl add-port br0 dpdk0 -- \ set Interface dpdk0 type=dpdk options:rx-checksum-offload=false' OR (to disable at run time after the port has been added to OVS) 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false' Similarly, to turn ON rx checksum offloading at run time, 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true' The Tx checksum offloading support is not implemented due to the following reasons. 1) Checksum offloading and vectorization are mutually exclusive in the DPDK poll mode driver. Vector packet processing is turned OFF when checksum offloading is enabled, which causes a significant performance drop at the Tx side. 2) Normally, OVS generates the checksum for tunnel packets in software at the 'tunnel push' operation, where the tunnel headers are created. However, enabling Tx checksum offloading involves marking every packet for tx checksum offloading at 'tunnel_push' and recirculating, and then, at the time of xmit, validating the same flag and instructing the NIC to do the checksum calculation. In case the NIC doesn't support Tx checksum offloading, the checksum calculation has to be done in software before sending out the packets. No significant performance improvement was noticed with Tx checksum offloading due to the overhead of additional validations + non vector packet processing. In some test scenarios, it introduces a performance drop too. 
Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling decapsulation even though the SSE vector Rx function is disabled in DPDK poll mode driver. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Acked-by: Jesse Gross <jesse@kernel.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2017-01-04 01:10:35 -08:00
Joe Stringer	048318e072	netdev: Count ports within mutex. netdev_get_vports() previously counted the number of ports outside the mutex, allocated enough memory for that number, then grabbed the mutex to iterate through them and filled the array with the pointers. This is logically wrong; in theory the number of ports could change between allocating the memory and grabbing the mutex. In practice, only the main thread manages these so there is no chance for a segfault. Fix it up anyway. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-11-16 11:53:50 -08:00
Pravin B Shelar	32b3c5338a	netdev: Fix sockaddr cast warning. Following warning was reported by Travis:- lib/netdev.c:1916:19: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align] sin = (struct sockaddr_in *) ifa->ifa_netmask; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ lib/netdev.c:1924:20: error: cast from 'struct sockaddr *' to 'struct sockaddr_in6 *' increases required alignment from 2 to 4 [-Werror,-Wcast-align] sin6 = (struct sockaddr_in6 *) ifa->ifa_netmask; Fixes: 3f31aded6 ("netdev: fix netmask in netdev_get_addrs"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>	2016-11-16 11:18:56 -08:00
Thadeu Lima de Souza Cascardo	3f31aded62	netdev: fix netmask in netdev_get_addrs When iterating on getifaddrs result, ifa_netmask is dereferenced, but it's already a pointer to struct sockaddr. This would result in wrong masks being used when comparing addresses while calculating the source address given a destination address at the routing code. For example, the mask ::ffff:116.85.0.0 would be used, causing 172.16.100.0/24 to match 172.16.101.1, though they should not match. This will not happen when using a dummy netdev, as netdev_get_addrs is not used by it. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2016-11-15 12:31:19 -08:00
Huanle Han	1388f0b7a4	netdev: Avoid leaking seq in netdev_open() error path. Signed-off-by: Huanle Han <hanxueluo@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-09-20 08:52:51 -07:00
Daniele Di Proietto	3a414a0a4f	ofproto: Honor mtu_request even for internal ports. By default Open vSwitch tries to configure internal interfaces MTU to match the bridge minimum, overriding any attempt by the user to configure it through standard system tools, or the database. While this works in many simple cases (there are probably many users that rely on this) it may create problems for more advanced use cases (like any overlay networks). This commit allows the user to override the default behavior by providing an explicit MTU in the mtu_request column in the Interface table. This means that Open vSwitch will now treat database MTU requests differently from standard system tools MTU requests (coming from `ip link` or `ifconfig`), but this seems the best way to remain compatible with old users while providing a more powerful interface. Suggested-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Tested-by: Joe Stringer <joe@ovn.org>	2016-09-02 16:01:12 -07:00
Thadeu Lima de Souza Cascardo	74d46929c6	Revert "netdev: do not allow devices to be opened with conflicting types" This reverts commit d2fa6c676a13e86acc7f17261b2d87484f625d45. When doing a restart, the routing table will open ports as system, which prevents internal ports to be opened with the right type. That causes failures in creating the ports. We should revisit this patch after finding a proper fix on the routing table layer. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-16 10:37:20 -07:00
Daniele Di Proietto	1c33f0c35e	netdev: Pass 'netdev_class' to ->run() and ->wait(). This will allow run() and wait() methods to be shared between different classes and still perform class-specific work. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>	2016-08-15 11:07:37 -07:00
Daniele Di Proietto	4124cb1254	netdev: Make netdev_set_mtu() netdev parameter non-const. Every provider silently drops the const attribute when converting the parameter to the appropriate subclass. Might as well drop the const attribute from the parameter, since this is a "set" function. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-08-12 19:32:12 -07:00
Mark Kavanagh	c90e4d9c95	netdev-provider: fix comments for netdev_rxq_recv Commit 64839cf43 applies batch objects to netdev-providers, but some comments were not updated accordingly. Fix these: - replace 'pkts' with 'batch' - replace '*cnt' with 'batch->count' - replace MAX_RX_BATCH with NETDEV_MAX_BURST - remove superfluous whitespace Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 16:16:05 -07:00
Thadeu Lima de Souza Cascardo	d2fa6c676a	netdev: do not allow devices to be opened with conflicting types When a device is already opened, netdev_open should verify that the types match, or else return an error. Otherwise, users might expect to open a device with a certain type and get a handle belonging to a different type. This also prevents certain conflicting configurations that would have a port of a certain type in the database and one of a different type on the system. For example, when adding an interface with a type other than system, and there is already a system interface with the same name, as the routing table will hold a reference to that system interface, some conflicts will arise. The netdev will be opened with the incorrect type and that will make vswitchd remove it, but adding it again will fail as it already exists. Failing earlier prevents some vswitchd loops in reconfiguring the interface. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 14:48:24 -07:00
Ilya Maximets	324c837485	dpif-netdev: XPS (Transmit Packet Steering) implementation. If CPU number in pmd-cpu-mask is not divisible by the number of queues and in a few more complex situations there may be unfair distribution of TX queue-ids between PMD threads. For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask such distribution is possible: <------------------------------------------------------------------------> pmd thread numa_id 0 core_id 13: port: vhost-user1 queue-id: 1 port: dpdk0 queue-id: 3 pmd thread numa_id 0 core_id 14: port: vhost-user1 queue-id: 2 pmd thread numa_id 0 core_id 16: port: dpdk0 queue-id: 0 pmd thread numa_id 0 core_id 17: port: dpdk0 queue-id: 1 pmd thread numa_id 0 core_id 12: port: vhost-user1 queue-id: 0 port: dpdk0 queue-id: 2 pmd thread numa_id 0 core_id 15: port: vhost-user1 queue-id: 3 <------------------------------------------------------------------------> As we can see above, the dpdk0 port is polled by threads on cores: 12, 13, 16 and 17. By design of dpif-netdev, there is only one TX queue-id assigned to each pmd thread. These queue-ids are sequential, similar to core-ids, and a thread will send packets to the queue with exactly this queue-id regardless of port. In previous example: pmd thread on core 12 will send packets to tx queue 0 pmd thread on core 13 will send packets to tx queue 1 ... pmd thread on core 17 will send packets to tx queue 5 So, for dpdk0 port after truncating in netdev-dpdk: core 12 --> TX queue-id 0 % 4 == 0 core 13 --> TX queue-id 1 % 4 == 1 core 16 --> TX queue-id 4 % 4 == 0 core 17 --> TX queue-id 5 % 4 == 1 As a result only 2 of 4 queues are used. To fix this issue some kind of XPS is implemented in the following way: * TX queue-ids are allocated dynamically. * When a PMD thread first tries to send packets to a new port, it allocates the least used TX queue for this port. * PMD threads periodically perform revalidation of allocated TX queue-ids. 
If a queue wasn't used in the last XPS_TIMEOUT_MS milliseconds, it will be freed during revalidation. * XPS is not active if we have enough TX queues. Reported-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
Terry Wilson	ee89ea7b47	json: Move from lib to include/openvswitch. To easily allow both in- and out-of-tree building of the Python wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to include/openvswitch. This also requires moving lib/{hmap,shash}.h. Both hmap.h and shash.h were #include-ing "util.h" even though the headers themselves did not use anything from there, but rather from include/openvswitch/util.h. Fixing that required including util.h in several C files mostly due to OVS_NOT_REACHED and things like xmalloc. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-07-22 17:09:17 -07:00
William Tu	64839cf432	netdev-provider: Apply batch object to netdev provider. Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces batch process functions and 'struct dp_packet_batch' to associate with batch-level metadata. This patch applies the packet batch object to the netdev provider interface (dummy, Linux, BSD, and DPDK) so that batch APIs can be used in providers. With batch metadata visible in providers, optimizations can be introduced at per-batch level instead of per-packet. Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-21 16:46:32 -07:00
William Tu	aaca4fe0ce	ofp-actions: Add truncate action. The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m). One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases: - Output to port 1 with max_len 100 bytes. - The output packet size on port 1 will be MIN(original_packet_size, 100). # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)' - The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact. # ovs-ofctl add-flow br0 \ 'actions=output(port=1,max_len=100),output:1,output:2' - The Datapath actions shows: # Datapath actions: trunc(100),1,1,2 Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2016-06-24 09:17:00 -07:00
Pravin B Shelar	4975aa3ee6	netdev-native-tnl: Introduce ip_build_header() The native tunneling build tunnel header code is spread across two different modules, which makes it pretty hard to follow the code. Following patch refactors the code to move all code to netdev-native-tnl module. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-23 20:27:14 -07:00
Daniele Di Proietto	050c60bfb5	netdev-dpdk: Use ->reconfigure() call to change rx/tx queues. This introduces in dpif-netdev and netdev-dpdk the first use for the newly introduced reconfigure netdev call. When a request to change the number of queues comes, netdev-dpdk will remember this and notify the upper layer via netdev_request_reconfigure(). The datapath, instead of periodically calling netdev_set_multiq(), can detect this and call reconfigure(). This mechanism can also be used to: * Automatically match the number of rxq with the one provided by qemu via the new_device callback. * Provide a way to change the MTU of dpdk devices at runtime. * Move a DPDK vhost device to the proper NUMA socket. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	790fb3b745	netdev: Add reconfigure request mechanism. A netdev provider, especially a PMD provider (like netdev DPDK) might not be able to change some of its parameters (such as MTU, or number of queues) without stopping everything and restarting. This commit introduces a mechanism that allows a netdev provider to request a restart (netdev_request_reconfigure()). The upper layer can be notified via netdev_wait_reconf_required() and netdev_is_reconf_required(). After closing all the rxqs the upper layer can finally call netdev_reconfigure(), to make sure that the new configuration is in place. This will be used by next commit to reconfigure rx and tx queues in netdev-dpdk. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-05-23 10:27:42 -07:00
Pravin B Shelar	9235b4793e	dpif-netdev: Fix memory leak in tunnel header pop action. The tunnel header pop action can leak a batch of packets in case of error. Following patch fixes the error code path. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1895cc8dbb	dpif-netdev: create batch object DPDK datapath operate on batch of packets. To pass the batch of packets around we use packets array and count. Next patch needs to associate meta-data with each batch of packets. So Introducing a batch structure to make handling the metadata easier. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1c8f98d96a	netdev: Return number of packet from netdev_pop_header() Current tunnel-pop API does not allow the netdev implementation retain a packet but STT can keep a packet from batch of packets during TCP reassembly processing. To return exact count of valid packet STT need to pass this number of packet parameter as a reference. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Ciara Loftus	d640a7e100	netdev: Initialise DPDK netdev classes only once DPDK netdev classes were being initialised twice, resulting in warning logs like so: netdev|WARN|attempted to register duplicate netdev provider: dpdk This commit removes one of the initialisation calls. Fixes: 0692257923fe ("netdev: Fix potential deadlock.") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-05-17 19:15:08 -07:00
Ben Pfaff	0692257923	netdev: Fix potential deadlock. Until now, netdev_class_mutex and route_table_mutex could be taken in either order: * netdev_run() takes netdev_class_mutex, then netdev_vport_run() calls route_table_run(), which takes route_table_mutex. * route_table_init() takes route_table_mutex and then eventually calls netdev_open(), which takes netdev_class_mutex. This commit fixes the problem by converting the netdev_classes hmap, protected by netdev_class_mutex, into a cmap protected on the read side by RCU. Only a very small amount of code actually writes to the cmap in question, so it's a lot easier to understand the locking rules at that point. In particular, there's no need to take netdev_class_mutex from either netdev_run() or netdev_open(), so neither of the code paths above determines a lock ordering any longer. Reported-by: William Tu <u9012063@gmail.com> Reported-at: http://openvswitch.org/pipermail/discuss/2016-February/020216.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Tested-by: William Tu <u9012063@gmail.com>	2016-05-09 16:42:57 -07:00
mweglicx	d6e3feb57c	Add support for extended netdev statistics based on RFC 2819. Implementation of new statistics extension for DPDK ports: - Add new counters definition to netdev struct and open flow, based on RFC2819. - Initialize netdev statistics as "filtered out" before passing it to particular netdev implementation (because of that change, statistics which are not collected are reported as filtered out, and some unit tests were modified in this respect). - New statistics are retrieved using experimenter code and are printed as a result to ofctl dump-ports. - New counters are available for OpenFlow 1.4+. - Add new vendor id: INTEL_VENDOR_ID. - New statistics are printed to output via ofctl only if those are present in reply message. - Add new file header: include/openflow/intel-ext.h which contains new statistics definition. - Extended statistics are implemented only for dpdk-physical and dpdk-vhost port types. - Dpdk-physical implementation uses xstats to collect statistics. - Dpdk-vhost implements only part of statistics (RX packet sized based counters). Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> [blp@ovn.org made software devices more consistent] Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-05-06 15:28:56 -07:00
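
The truncate semantics described in commit aaca4fe0ce can be illustrated with a small model. This is only a sketch in Python, not OVS code: the function name `apply_actions` and the `(port, max_len)` tuple encoding are hypothetical, chosen for illustration. It shows the two properties the commit message states: the emitted size is MIN(original_size, max_len), and max_len applies only to the output action that carries it.

```python
def apply_actions(packet_len, actions):
    """Model a list of output actions (hypothetical encoding, not the OVS API).

    Each action is a (port, max_len) tuple; max_len is None for a plain
    'output:N' action. Returns a list of (port, emitted_len) pairs.
    """
    emitted = []
    for port, max_len in actions:
        if max_len is None:
            # Plain output: the packet is sent at its original size.
            emitted.append((port, packet_len))
        else:
            # output(port=N,max_len=M): size is MIN(original_size, M),
            # and the limit does not carry over to later actions.
            emitted.append((port, min(packet_len, max_len)))
    return emitted


# Models 'actions=output(port=1,max_len=100),output:1,output:2'
# applied to an assumed 150-byte packet.
actions = [(1, 100), (1, None), (2, None)]
print(apply_actions(150, actions))
```

With the assumed 150-byte packet, only the first output is shortened to 100 bytes; the two plain outputs still carry the full packet, matching the "will be intact" note in the commit message.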

295 Commits