mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

Author	SHA1	Message	Date
Jesse Gross	8d8ab6c2d5	tun-metadata: Manage tunnel TLV mapping table on a per-bridge basis. When using tunnel TLVs (at the moment, this means Geneve options), a controller must first map the class and type onto an appropriate OXM field so that it can be used in OVS flow operations. This table is managed using OpenFlow extensions. The original code that added support for TLVs made the mapping table global as a simplification. However, this is not really logically correct as the OpenFlow management commands are operating on a per-bridge basis. This removes the original limitation to make the table per-bridge. One nice result of this change is that it is generally clearer whether the tunnel metadata is in datapath or OpenFlow format. Rather than allowing ad-hoc format changes and trying to handle both formats in the tunnel metadata functions, the format is more clearly separated by function. Datapaths (both kernel and userspace) use datapath format and it is not changed during the upcall process. At the beginning of action translation, tunnel metadata is converted to OpenFlow format and flows and wildcards are translated back at the end of the process. As an additional benefit, this change improves performance in some flow setup situations by keeping the tunnel metadata in the original packet format in more cases. This helps when copies need to be made as the amount of data touched is only what is present in the packet rather than the maximum amount of metadata supported. Co-authored-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-09-19 09:52:22 -07:00
Daniele Di Proietto	e98d0cb3ac	netdev-dummy: Add dummy-internal class. "internal" netdevs are treated specially in OVS (e.g. for MTU), but the dummy datapath remaps both "system" and "internal" devices to the same "dummy" netdev class, so there's no way to discern those in tests. This commit adds a new "dummy-internal" netdev type, which will be used by the dummy datapath for internal ports, so that other parts of the code can understand which ports are internal just by looking at the netdev object. The alternative solution, using the original interface type ("internal") instead of the translated netdev type ("dummy"), is harder to implement, because in so many places only the netdev object is available. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>	2016-08-15 11:07:42 -07:00
Daniele Di Proietto	84dbfb2b69	dpif-netdev: Fix -Wformat warning on 32-bit build. Use the appropriate format specifier for size_t, otherwise the 32-bit build fails. Reported-at: https://travis-ci.org/openvswitch/ovs/jobs/151938383 Fixes: 3453b4d62a98("dpif-netdev: dpcls per in_port with sorted subtables") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>	2016-08-12 17:56:43 -07:00
Jan Scheurich	3453b4d62a	dpif-netdev: dpcls per in_port with sorted subtables The user-space datapath (dpif-netdev) consists of a first level "exact match cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient and the OVS forwarding performance is determined by the megaflow classifier. The megaflow classifier (dpcls) consists of a variable number of hash tables (aka subtables), each containing megaflow entries with the same mask of packet header and metadata fields to match upon. A dpcls lookup matches a given packet against all subtables in sequence until it hits a match. As megaflow cache entries are by construction non-overlapping, the first match is the only match. Today the order of the subtables in the dpcls is essentially random so that on average a dpcls lookup has to visit N/2 subtables for a hit, when N is the total number of subtables. Even though every single hash-table lookup is fast, the performance of the current dpcls degrades when there are many subtables. How does the patch address this issue: In reality there is often a strong correlation between the ingress port and a small subset of subtables that have hits. The entire megaflow cache typically decomposes nicely into partitions that are hit only by packets entering from a range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM -> Phy). Therefore, maintaining a separate dpcls instance per ingress port with its subtable vector sorted by frequency of hits reduces the average number of subtable lookups in the dpcls to a minimum, even if the total number of subtables gets large. This is possible because megaflows always have an exact match on in_port, so every megaflow belongs to a unique dpcls instance. For thread safety, the PMD thread needs to block out revalidators during the periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD. To monitor the effectiveness of the patch we have enhanced the ovs-appctl dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups per hit" to report the average number of subtable lookups needed for a megaflow match. Ideally, this should be close to 1 and in almost all cases much smaller than N/2. The PMD tests have been adjusted to the additional line in pmd-stats-show. We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh. With pure L3 tenant traffic between VMs on different nodes the resulting netdev dpcls contains N=4 subtables. Each packet traversing the OVS datapath is subject to dpcls lookup twice due to the tunnel termination. Disabling the EMC, we have measured a baseline performance (in+out) of ~1.45 Mpps (64 bytes, 10K L4 packet flows). The average number of subtable lookups per dpcls match is 2.5. With the patch the average number of subtable lookups per dpcls match is reduced to 1 and the forwarding performance grows by ~50% to 2.13 Mpps. Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4 flows) and 34% (for 50K+ L4 flows). As the actual number of subtables will often be higher in reality, we can assume that this is at the lower end of the speed-up one can expect from this optimization. Just running a parallel ping between the VXLAN tunnel endpoints increases the number of subtables and hence the average number of subtable lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput to 1.2 Mpps. With the patch the parallel ping has no impact on average number of subtable lookups and performance. The performance gain is then ~75%. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-12 14:38:15 -07:00
Jarno Rajahalme	da9cfca6e2	Revert "pvector: Expose non-concurrent priority vector." This reverts commit 8bdfe1313894047d44349fa4cf4402970865950f. I failed to see that lib/dpif-netdev.c actually needs the concurrency provided by pvector prior to this change. More specifically, when a subtable is removed, concurrent lookups may skip over another subtable swapped in to the place of the removed subtable in the vector. Since this was the only use of the non-concurrent pvector, it is cleaner to revert the whole patch. Reported-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-10 14:58:51 -07:00
Fischetti, Antonio	5b1c9c789d	dpcls_lookup: added comments. This patch adds some comments to the dpcls_lookup() function, which is one of the most important places where the Userspace wildcard matching happens. The purpose is to give some more explanations on its design and also on how it works. Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2016-08-05 13:48:38 -07:00
Ilya Maximets	9f7a3035d2	dpif-netdev: Fix xps revalidation. Revalidation should work in case of 'dynamic_txqs == true'. Fixes: 324c8374852a ("dpif-netdev: XPS (Transmit Packet Steering) implementation.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-29 18:03:09 -07:00
Jarno Rajahalme	8bdfe13138	pvector: Expose non-concurrent priority vector. PMD threads use pvectors but do not need the overhead of the concurrent version. Expose the non-concurrent version for that use. Note that struct pvector is renamed as struct cpvector (for concurrent priority vector), and the former struct pvector_impl is now struct pvector. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-07-29 11:12:08 -07:00
Daniele Di Proietto	66e4ad8aa4	conntrack: Add 'dl_type' parameter to conntrack_execute(). Now that dpif_execute has a 'flow' member, it's pretty easy to access the flow (or the matching megaflow) in dp_execute_cb(). This means that it's no longer necessary for the connection tracker to reextract 'dl_type' from the packet, it can be passed as a parameter. This change means that we have to slightly complicate test-conntrack to group the packets by dl_type before passing them to the connection tracker. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>	2016-07-27 18:53:29 -07:00
Daniele Di Proietto	5d9cbb4cb8	dpif-netdev: Implement conntrack flush interface. New functions are implemented in the conntrack module to support this. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2016-07-27 18:52:13 -07:00
Daniele Di Proietto	4d4e68ed20	dpif-netdev: Implement conntrack dump functions. New functions are implemented in the conntrack module to support this. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2016-07-27 18:52:13 -07:00
Daniele Di Proietto	5cf3edb311	dpif-netdev: Execute conntrack action. This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev. To allow ofproto-dpif to detect the conntrack feature, flow_put will not discard anymore flows with ct_* fields set. We still shouldn't allow flows with NAT bits set, since there is no support for NAT. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>	2016-07-27 18:52:13 -07:00
Thadeu Lima de Souza Cascardo	a3e8437a18	dpif-netdev: use the open_type when creating the local port Instead of using the internal type, use the port_open_type when creating the local port. That makes sure that whenever dpif_port_query is used, the netdev open_type is returned instead of the "internal" type. For other ports, that is already the case, as the netdev type is used when creating the dp_netdev_port. That changes the output of dpctl when showing the local port, and also when trying to change its type. So, corresponding tests are fixed. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 14:48:24 -07:00
Ilya Maximets	3eb67853c4	dpif-netdev: Introduce pmd-rxq-affinity. New 'other_config:pmd-rxq-affinity' field for Interface table to perform manual pinning of RX queues to desired cores. This functionality is required to achieve maximum performance because all kinds of ports have different cost of rx/tx operations and only user can know about expected workload on different ports. Example: # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \ other_config:pmd-rxq-affinity="0:3,1:7,3:8" Queue #0 pinned to core 3; Queue #1 pinned to core 7; Queue #2 not pinned. Queue #3 pinned to core 8; It's decided to automatically isolate cores that have rxq explicitly assigned to them because it's useful to keep constant polling rate on some performance critical ports while adding/deleting other ports without explicit pinning of all ports. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
Ilya Maximets	a6a426d69a	dpif-netdev: Add reconfiguration request to dp_netdev. Next patches will add new conditions when reconfiguration will be required. It'll be simpler to have common way to request reconfiguration. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
Ilya Maximets	91364d18de	bridge: Pass interface's configuration to datapath. This commit adds functionality to pass value of 'other_config' column of 'Interface' table to datapath. This may be used to pass not directly connected with netdev options and configure behaviour of the datapath for different ports. For example: pinning of rx queues to polling threads in dpif-netdev. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
Ilya Maximets	324c837485	dpif-netdev: XPS (Transmit Packet Steering) implementation. If CPU number in pmd-cpu-mask is not divisible by the number of queues and in a few more complex situations there may be unfair distribution of TX queue-ids between PMD threads. For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask such distribution is possible: <------------------------------------------------------------------------> pmd thread numa_id 0 core_id 13: port: vhost-user1 queue-id: 1 port: dpdk0 queue-id: 3 pmd thread numa_id 0 core_id 14: port: vhost-user1 queue-id: 2 pmd thread numa_id 0 core_id 16: port: dpdk0 queue-id: 0 pmd thread numa_id 0 core_id 17: port: dpdk0 queue-id: 1 pmd thread numa_id 0 core_id 12: port: vhost-user1 queue-id: 0 port: dpdk0 queue-id: 2 pmd thread numa_id 0 core_id 15: port: vhost-user1 queue-id: 3 <------------------------------------------------------------------------> As we can see above dpdk0 port polled by threads on cores: 12, 13, 16 and 17. By design of dpif-netdev, there is only one TX queue-id assigned to each pmd thread. These queue-ids are sequential, similar to core-ids. And thread will send packets to queue with exactly this queue-id regardless of port. In previous example: pmd thread on core 12 will send packets to tx queue 0 pmd thread on core 13 will send packets to tx queue 1 ... pmd thread on core 17 will send packets to tx queue 5 So, for dpdk0 port after truncating in netdev-dpdk: core 12 --> TX queue-id 0 % 4 == 0 core 13 --> TX queue-id 1 % 4 == 1 core 16 --> TX queue-id 4 % 4 == 0 core 17 --> TX queue-id 5 % 4 == 1 As a result only 2 of 4 queues used. To fix this issue some kind of XPS implemented in following way: * TX queue-ids are allocated dynamically. * When PMD thread first time tries to send packets to new port it allocates less used TX queue for this port. * PMD threads periodically perform revalidation of allocated TX queue-ids. If queue wasn't used in last XPS_TIMEOUT_MS milliseconds it will be freed during revalidation. * XPS is not working if we have enough TX queues. Reported-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
Ilya Maximets	aacf18c3a7	util: Expose function nullable_string_is_equal. Implementation of 'nullable_string_is_equal()' moved to util.c and reused inside dpif-netdev. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-25 18:39:29 -07:00
Terry Wilson	ee89ea7b47	json: Move from lib to include/openvswitch. To easily allow both in- and out-of-tree building of the Python wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to include/openvswitch. This also requires moving lib/{hmap,shash}.h. Both hmap.h and shash.h were #include-ing "util.h" even though the headers themselves did not use anything from there, but rather from include/openvswitch/util.h. Fixing that required including util.h in several C files mostly due to OVS_NOT_REACHED and things like xmalloc. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-07-22 17:09:17 -07:00
Flavio Leitner	9dede5cff5	dpif-netdev: Remove PMD latency on seq_mutex The PMD thread needs to keep processing RX queues in order to achieve maximum throughput. It also needs to sweep emc cache and quiesce which use seq_mutex. That mutex can eventually block the PMD thread causing latency spikes and affecting the throughput. Since there is no requirement for running those tasks at a specific time, this patch extend seq API to allow tentative locking instead. Reported-by: Karl Rister <krister@redhat.com> Co-authored-by: Karl Rister <krister@redhat.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-08 16:17:20 -07:00
Ilya Maximets	81acebdaaf	netdev-dpdk: Obtain number of queues for vhost ports from attached virtio. Currently, there are few inconsistencies in ways to configure number of queues for netdev device: * dpif-netdev can't know about exact number of queues allocated inside netdev. This leads to constant mapping of queue-ids to 'real' ones. * We are able to configure 'n_rxq' for vhost-user devices, but there is only one sane number of rx queues which must be used and configured manually (number of queues that allocated in QEMU). This patch disables configuration of 'n_rxq' for DPDK vHost devices. Configuration of rx and tx queues now automatically applied from connected virtio device. Standard reconfiguration mechanism was used to apply this changes. Also, now 'n_txq' and 'n_rxq' are always the real numbers of queues in the device. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-08 15:27:21 -07:00
William Tu	6ee9536320	Fix dead assignments. Found by Clang. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-07-02 21:19:22 -07:00
Ben Pfaff	2225c0b935	util: New function nullable_xstrdup(). It's a pretty common pattern so create a function for it. Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-06-26 20:31:28 -07:00
William Tu	aaca4fe0ce	ofp-actions: Add truncate action. The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m). One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases: - Output to port 1 with max_len 100 bytes. - The output packet size on port 1 will be MIN(original_packet_size, 100). # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)' - The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact. # ovs-ofctl add-flow br0 \ 'actions=output(port=1,max_len=100),output:1,output:2' - The Datapath actions shows: # Datapath actions: trunc(100),1,1,2 Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2016-06-24 09:17:00 -07:00
Jesse Gross	9044f2c11f	dpif-netdev: Print installed flows in dpif format. When debug logging is enabled, dpif-netdev can print each flow as it is installed, which it currently does using OpenFlow match formatting. Compared to ODP formatting, there generally isn't too much difference since the fields are largely the same but it is inconsistent with other logging in dpif-netdev as well as the analogous functions that deal with the kernel. However, in some cases there is a difference between the two formats, such as in the cases of input port or tunnel metadata. For input port, datapath format helped detect that the generated masks were incorrect. As for tunnels, at the moment, it's possible to convert between the two formats on demand as we have a global metadata table. In the future, though this won't be possible as the metadata table becomes per-bridge which the datapath won't have access to. Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-13 13:28:43 -07:00
Jesse Gross	098d2a9777	odp-util: Remove odp_in_port from struct odp_flow_key_parms. When calling odp_flow_key_from_flow (or _mask), the in_port included as part of the flow is ignored and must be explicitly passed as a separate parameter. This is because the assumption was that the flow's version would often be in OFP format, rather than ODP. However, at this point all flows that are ready for serialization in netlink format already have their in_port properly set to ODP format. As a result, every caller needs to explicitly initialize the extra parameter to the value that is in the flow. This switches to just use the value in the flow to simplify things and avoid the possibility of forgetting to initialize the extra parameter. Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-13 13:28:39 -07:00
Daniele Di Proietto	6930c7e01c	ovs-numa: Introduce function to set current thread affinity. This commit moves the code that sets the pmd threads affinity from netdev-dpdk to ovs-numa. There's one small part left in netdev-dpdk, to set the lcore_id. Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when starting a pmd thread. This change will allow having a dummy implementation of the set affinity call, for testing purposes. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-06-07 11:15:01 -07:00
Ilya Maximets	c673049c79	dpctl: Implement dpctl/flow-get for dpif-netdev. Currently 'dpctl/flow-get' doesn't work for flows installed by PMD threads. Fix that by implementing search across all PMD threads. Will be returned flow from first PMD thread with match. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-06 18:10:22 -07:00
Daniele Di Proietto	050c60bfb5	netdev-dpdk: Use ->reconfigure() call to change rx/tx queues. This introduces in dpif-netdev and netdev-dpdk the first use for the newly introduced reconfigure netdev call. When a request to change the number of queues comes, netdev-dpdk will remember this and notify the upper layer via netdev_request_reconfigure(). The datapath, instead of periodically calling netdev_set_multiq(), can detect this and call reconfigure(). This mechanism can also be used to: * Automatically match the number of rxq with the one provided by qemu via the new_device callback. * Provide a way to change the MTU of dpdk devices at runtime. * Move a DPDK vhost device to the proper NUMA socket. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	dc36593cf4	dpif-netdev: Handle errors in reconfigure_pmd_threads(). Errors returned by netdev_set_multiq() and netdev_rxq_open() weren't handled properly in reconfigure_pmd_threads(). In case of error now we remove the port from the datapath. Also, part of the code is moved in a new function, port_reconfigure(). Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	6e3c6fa4ad	dpif-netdev: Change pmd thread configuration in dpif_netdev_run(). Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	e9985d6aa4	dpif-netdev: Use hmap for ports. netdev objects are hard to use with RCU, because it's not possible to split removal and reclamation. Postponing the removal means that the port is not removed and cannot be readded immediately. Waiting for reclamation means introducing a quiescent state, and that may introduce subtle bugs, due to the RCU model we use in userspace. This commit changes the port container from cmap to hmap. 'port_mutex' must be held by readers and writers. This shouldn't have performance impact, as readers in the fast path use a thread local cache. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:35 -07:00
Daniele Di Proietto	d0cca6c344	dpif-netdev: Add pmd thread local port cache for transmission. A future commit will stop using RCU for 'dp->ports' and use a mutex for reading/writing them. To avoid taking a mutex in dp_execute_cb(), which is called in the fast path, this commit introduces a pmd thread local cache of ports. The downside is that every port add/remove now needs to synchronize with every pmd thread. Among the advantages, keeping a per thread port mapping could allow greater control over the txq assignment. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:10:23 -07:00
Daniele Di Proietto	d42f9307a0	dpif-netdev: Fix race condition in pmd thread initialization. The pmds and the main threads are synchronized using a condition variable. The main thread writes a new configuration, then it waits on the condition variable. A pmd thread reads the new configuration, then it calls signal() on the condition variable. To make sure that the pmds and the main thread have a consistent view, each signal() should be backed by a wait(). Currently the first signal() doesn't have a corresponding wait(). If the pmd thread takes a long time to start and the signal() is received by a later wait, the threads will have an inconsistent view. The commit fixes the problem by removing the first signal() from the pmd thread. This is hardly a problem on current master, because the main thread will call the first wait() a long time after the creation of a pmd thread. It becomes a problem with the next commits. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:10:23 -07:00
Daniele Di Proietto	b68872d8bb	dpif-netdev: Add functions to modify rxq without reloading pmd threads. This commit introduces some functions to add/remove rxqs from pmd threads without reloading them. They will be used by next commits. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:10:23 -07:00
Daniele Di Proietto	b8d2925224	dpif-netdev: Factor out port_create() from do_add_port(). Instead of performing every operation inside do_port_add() it seems clearer to introduce port_create(), since we already have port_destroy(). No functional change. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:10:23 -07:00
Daniele Di Proietto	0087346303	dpif-netdev: Remove unused 'index' in dp_netdev_pmd_thread. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:10:23 -07:00
Daniele Di Proietto	3186ea467b	dpif-netdev: Destroy 'port_mutex' in dp_netdev_free(). Found by inspection. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:10:23 -07:00
Daniele Di Proietto	36d8de17ff	dpif-netdev: Initialize packet RSS hash in dpif_netdev_execute(). The datapath code expects the RSS hash to always be initialized. This is enforced by checking in emc_processing() that the hash is valid, and eventually by computing a new one. Unfortunately, there is another entry point to the datapath, dpif_netdev_execute(). A packet generated by OVS (BFD frame, packet-out from controller) doesn't have a valid RSS hash and so is allowed to enter the datapath with an uninitialized hash value. This commit recomputes the hash (if not valid) in dpif_netdev_execute(). The only place where we would use an invalid hash is netdev-vport, in push_udp_header(). This caused an uninitialized memory read, and a random value to be assigned to the outer tunnel header source port. Reported-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>	2016-05-20 11:08:23 -07:00
Pravin B Shelar	66525ef3b8	dpif-netdev: Refactor userspace action Large segment support need to use this refactored function to send individual segments. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	a260d96638	dpif-netdev: Refactor fast path process function. Once datapath support large packets, we need to segment packet before sending it to upcall. Refactoring this code make it bit cleaner. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	4c74279641	dpif-netdev: Fix memory leak in tunnel header push action. In case of an error from netdev_push_header(), the batch of packets was not freed. The following patch fixes this issue. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	9235b4793e	dpif-netdev: Fix memory leak in tunnel header pop action. The tunnel header pop action can leak a batch of packets in case of error. The following patch fixes the error code path. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1895cc8dbb	dpif-netdev: create batch object DPDK datapath operate on batch of packets. To pass the batch of packets around we use packets array and count. Next patch needs to associate meta-data with each batch of packets. So Introducing a batch structure to make handling the metadata easier. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	f7ce48110d	dpif-netdev: rename packet_batch Next patch introduces new structure named packet_batch. So I am renaming it to packet_batch_per_flow. This does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1c8f98d96a	netdev: Return number of packet from netdev_pop_header() Current tunnel-pop API does not allow the netdev implementation retain a packet but STT can keep a packet from batch of packets during TCP reassembly processing. To return exact count of valid packet STT need to pass this number of packet parameter as a reference. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Justin Pettit	2ff8484bbf	util: Pass 128-bit arguments directly instead of using pointers. Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.) introduced the ovs_u128_and() function. It directly takes ovs_u128 values as arguments instead of pointers to them. As this is a bit more direct way to deal with 128-bit values, modify the other utility functions to do the same. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2016-05-08 09:26:19 -07:00
Daniele Di Proietto	13f7d46a40	dpif-netdev: Fix dp_netdev_pmd_remove_flow(). After removing a flow from the dpcls classifier there might still be readers who have access to the flow, until the next grace period. Setting flow->cr.mask to NULL can cause concurrent readers to crash, so this commit avoids doing it. The crash can be reproduced, for example, by invoking an operation that cause datapath flows to be deleted (such as `ovs-appctl upcall/enable-megaflows`) while traffic is running. I think the assignment was intended just as a safety measure to catch race conditions, and it should be safe to remove. Here's a stack trace of a possible crash: Program terminated with signal SIGSEGV, Segmentation fault. rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156 4156 if (OVS_UNLIKELY((value & maskp++) != keyp++)) { (gdb) bt rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156 rules=0x7f3afa3f2e40, cnt=<optimized out>) at ../lib/dpif-netdev.c:4225 (pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420, cnt=cnt@entry=32, keys=keys@entry=0x7f3afa3f6428, batches=batches@entry=0x7f3afa3f4118, n_batches=n_batches@entry=0x7f3afa3fa3b0) at ../lib/dpif-netdev.c:3483 (pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420, cnt=<optimized out>, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at ../lib/dpif-netdev.c:3625 cnt=<optimized out>, packets=0x7f3afa3fa420, pmd=0x7f3afa3fc010) at ../lib/dpif-netdev.c:3642 rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at ../lib/dpif-netdev.c:2574 ../lib/dpif-netdev.c:2693 ../lib/ovs-thread.c:340 pthread_create.c:312 ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Fixes: 361d808dd9e4("flow: Split miniflow's map.") CC: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2016-05-04 19:19:54 -07:00
Ben Warren	25d436fbd4	Move lib/ofp-print.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 16:38:32 -07:00
Ben Warren	e29747e440	Move lib/match.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 13:46:49 -07:00
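
The queue arithmetic in the message of commit 324c837485 (XPS) can be checked with a small C sketch. It models only the static mapping that the commit describes as the problem: each PMD thread has one sequential TX queue-id, which is truncated modulo the port's real queue count. The names `static_tx_qid` and `count_used_txqs` are hypothetical and do not come from the OVS tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the static mapping described in the commit message.
 * A PMD thread's sequential TX queue-id is truncated with a modulo to the
 * number of TX queues the port really has. */
static int
static_tx_qid(int pmd_tx_qid, int n_txq)
{
    assert(n_txq > 0 && pmd_tx_qid >= 0);
    return pmd_tx_qid % n_txq;
}

/* Counts how many distinct TX queues of one port are used by the given
 * PMD threads under the static mapping. */
static int
count_used_txqs(const int *pmd_tx_qids, size_t n_pmds, int n_txq)
{
    int used = 0;

    for (int q = 0; q < n_txq; q++) {
        for (size_t i = 0; i < n_pmds; i++) {
            if (static_tx_qid(pmd_tx_qids[i], n_txq) == q) {
                used++;
                break;
            }
        }
    }
    return used;
}
```

With the queue-ids 0, 1, 4 and 5 from the commit's example (cores 12, 13, 16 and 17 polling dpdk0) and 4 TX queues, `count_used_txqs` reports 2, matching the "only 2 of 4 queues used" observation that motivated dynamic allocation.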

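The effect described in commit 3453b4d62a (sorting a per-port subtable vector by hit frequency) can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions: `struct subtable_stat` and the functions below are hypothetical, a lookup is assumed to probe subtables in array order and stop at the first match, and the real patch keeps its ordering in OVS's pvector rather than calling `qsort()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dpcls subtable: only its hit counter. */
struct subtable_stat {
    unsigned long hits;
};

/* Average number of subtables visited per hit when subtables are probed in
 * array order and a lookup stops at the first match. */
static double
avg_lookups_per_hit(const struct subtable_stat *st, size_t n)
{
    unsigned long total_hits = 0, total_lookups = 0;

    for (size_t i = 0; i < n; i++) {
        total_hits += st[i].hits;
        total_lookups += st[i].hits * (i + 1);
    }
    return total_hits ? (double) total_lookups / total_hits : 0.0;
}

/* qsort() comparator: most frequently hit subtable first. */
static int
compare_hits_desc(const void *a_, const void *b_)
{
    const struct subtable_stat *a = a_, *b = b_;

    return (a->hits < b->hits) - (a->hits > b->hits);
}

/* Reorders the subtables so that the hottest one is probed first. */
static void
sort_by_hits(struct subtable_stat *st, size_t n)
{
    assert(st != NULL || n == 0);
    qsort(st, n, sizeof *st, compare_hits_desc);
}
```

For four subtables with hit counts 10, 10, 10 and 70 in that probe order, the model gives 3.4 lookups per hit; after `sort_by_hits` the hot subtable is probed first and the average drops to 1.6. The figures are illustrative only and are not the benchmark numbers quoted in the commit message.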

466 Commits
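
The `pmd-rxq-affinity` value shown in commit 3eb67853c4 ("0:3,1:7,3:8") is a comma-separated list of `<queue>:<core>` pairs. The C sketch below shows one way such a string could be turned into a queue-to-core table. It is not the parser used in OVS; `parse_rxq_affinity` and `CORE_UNPINNED` are hypothetical names, and the error handling is an assumption.

```c
#include <assert.h>
#include <stdio.h>

#define CORE_UNPINNED -1   /* Hypothetical marker for "queue not pinned". */

/* Parses "<queue>:<core>,<queue>:<core>,..." into cores[queue].
 * Queues that are not mentioned stay CORE_UNPINNED.
 * Returns 0 on success, -1 on a malformed or out-of-range entry. */
static int
parse_rxq_affinity(const char *spec, int *cores, int n_rxq)
{
    const char *p = spec;

    assert(spec != NULL && cores != NULL && n_rxq >= 0);
    for (int q = 0; q < n_rxq; q++) {
        cores[q] = CORE_UNPINNED;
    }

    while (*p != '\0') {
        int queue, core, consumed;

        /* %n records how many characters the pair occupied. */
        if (sscanf(p, "%d:%d%n", &queue, &core, &consumed) != 2
            || queue < 0 || queue >= n_rxq || core < 0) {
            return -1;
        }
        cores[queue] = core;
        p += consumed;
        if (*p == ',') {
            p++;
        } else if (*p != '\0') {
            return -1;
        }
    }
    return 0;
}
```

Parsing "0:3,1:7,3:8" for a port with 4 RX queues yields cores 3, 7, unpinned, 8 for queues 0 to 3, which matches the pinning listed in the commit message.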