mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

Author	SHA1	Message	Date
Pravin B Shelar	1895cc8dbb	dpif-netdev: create batch object DPDK datapath operate on batch of packets. To pass the batch of packets around we use packets array and count. Next patch needs to associate meta-data with each batch of packets. So Introducing a batch structure to make handling the metadata easier. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	f7ce48110d	dpif-netdev: rename packet_batch Next patch introduces new structure named packet_batch. So I am renaming it to packet_batch_per_flow. This does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Pravin B Shelar	1c8f98d96a	netdev: Return number of packet from netdev_pop_header() Current tunnel-pop API does not allow the netdev implementation retain a packet but STT can keep a packet from batch of packets during TCP reassembly processing. To return exact count of valid packet STT need to pass this number of packet parameter as a reference. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Justin Pettit	2ff8484bbf	util: Pass 128-bit arguments directly instead of using pointers. Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.) introduced the ovs_u128_and() function. It directly takes ovs_u128 values as arguments instead of pointers to them. As this is a bit more direct way to deal with 128-bit values, modify the other utility functions to do the same. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2016-05-08 09:26:19 -07:00
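The by-value style this commit describes can be sketched as follows. This is a minimal illustration, not the OVS code: `u128_sketch`, `u128_sketch_and()` and `u128_sketch_equals()` are hypothetical names, and the two-word layout is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves (assumed layout). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_sketch;

/* Takes both operands by value and returns the result by value, so callers
 * need neither temporaries nor address-of operators. */
static inline u128_sketch
u128_sketch_and(u128_sketch a, u128_sketch b)
{
    u128_sketch dst = { a.lo & b.lo, a.hi & b.hi };
    return dst;
}

/* Compares two values directly instead of through pointers. */
static inline bool
u128_sketch_equals(u128_sketch a, u128_sketch b)
{
    return a.lo == b.lo && a.hi == b.hi;
}
```

Because the functions return values, calls compose, e.g. `u128_sketch_equals(u128_sketch_and(a, b), b)`.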
Daniele Di Proietto	13f7d46a40	dpif-netdev: Fix dp_netdev_pmd_remove_flow(). After removing a flow from the dpcls classifier there might still be readers who have access to the flow, until the next grace period. Setting flow->cr.mask to NULL can cause concurrent readers to crash, so this commit avoids doing it. The crash can be reproduced, for example, by invoking an operation that cause datapath flows to be deleted (such as `ovs-appctl upcall/enable-megaflows`) while traffic is running. I think the assignment was intended just as a safety measure to catch race conditions, and it should be safe to remove. Here's a stack trace of a possible crash: Program terminated with signal SIGSEGV, Segmentation fault. rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156 4156 if (OVS_UNLIKELY((value & maskp++) != keyp++)) { (gdb) bt rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156 rules=0x7f3afa3f2e40, cnt=<optimized out>) at ../lib/dpif-netdev.c:4225 (pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420, cnt=cnt@entry=32, keys=keys@entry=0x7f3afa3f6428, batches=batches@entry=0x7f3afa3f4118, n_batches=n_batches@entry=0x7f3afa3fa3b0) at ../lib/dpif-netdev.c:3483 (pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420, cnt=<optimized out>, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at ../lib/dpif-netdev.c:3625 cnt=<optimized out>, packets=0x7f3afa3fa420, pmd=0x7f3afa3fc010) at ../lib/dpif-netdev.c:3642 rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at ../lib/dpif-netdev.c:2574 ../lib/dpif-netdev.c:2693 ../lib/ovs-thread.c:340 pthread_create.c:312 ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Fixes: 361d808dd9e4("flow: Split miniflow's map.") CC: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2016-05-04 19:19:54 -07:00
Ben Warren	25d436fbd4	Move lib/ofp-print.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 16:38:32 -07:00
Ben Warren	e29747e440	Move lib/match.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 13:46:49 -07:00
Daniele Di Proietto	62453dada9	dpif-netdev: Do not keep refcount for ports. Only the main thread will delete ports after pausing every other thread. There's no need to keep count. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-04-07 18:59:49 -07:00
Daniele Di Proietto	71f634cfea	dpif-netdev: Remove useless dpif-dummy/delete-port appctl. It is only used in the testsuite and it can be replaced by a dpctl command. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-04-07 18:59:45 -07:00
Daniele Di Proietto	490e82afe1	dpif-netdev: Keep count of elements in port->rxq[]. This will ease deleting a port with no open rxqs. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-04-07 18:59:41 -07:00
Daniele Di Proietto	d17f4f082c	dpif-netdev: Proper error handling in do_add_port(). This fixes multiple error path mistakes in do_add_port, none of which has been a problem in practice so far. This change will make it easier for a following commit to return in case of error. Also, this removes an unneeded special case for tunnel ports. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-04-07 18:59:37 -07:00
Panu Matilainen	27955e9852	dpif-netdev: report numa node number on pmd thread create failure Since PMD threads are placed on the NUMA node of the port regardless of a possible pmd-cpu-mask setting, this can lead to a somewhat confusing "out of unpinned cores" message - there might be plenty of available cores in the mask but they cannot be used if the port is on different NUMA node than the cores. Report the NUMA node number to help diagnosing the issue. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1295952 Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-06 11:38:32 -07:00
Ben Warren	64c967795b	Move lib/ofpbuf.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Acked-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-30 13:10:18 -07:00
Ben Warren	417e7e66e1	list: Rename all functions in list.h with ovs_ prefix. This attempts to prevent namespace collisions with other list libraries Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-30 13:04:32 -07:00
Ben Warren	b19bab5b20	list: Remove lib/list.h completely. All code is now in include/openvswitch/list.h. Signed-off-by: Ben Warren <ben@skyportsystems.com> Acked-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-30 13:01:21 -07:00
Ben Warren	3e8a2ad145	Move lib/dynamic-string.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-19 10:02:12 -07:00
Ben Pfaff	d79a39fece	dpif-netdev: Fix typo in comment. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2016-03-07 20:43:09 -08:00
Ilya Maximets	7daedce44a	dpif-netdev: Fix double inclusion of cmap.h Also, all headers sorted lexicographically. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2016-02-25 16:09:29 -08:00
Ilya Maximets	cc245ce87d	dpif-netdev: Move rxq management into functions. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-22 18:13:04 -08:00
Ilya Maximets	762d146ab7	dpif-netdev: Reload each thread only once in do_add_port. While adding of pmd interface with multiple queues several queues may be assigned to one thread and this thread will be reloaded one time for each added queue. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-22 18:12:52 -08:00
Ilya Maximets	ce179f1163	dpif-netdev: Add dpif-netdev/pmd-rxq-show appctl command. This command can be used to check the port/rxq assignment to pmd threads. For each pmd thread of the datapath shows list of queue-ids with port names. Additionally log message from pmd_thread_main() extended with queue-id, and type of this message changed from INFO to DBG. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-22 16:37:17 -08:00
Ilya Maximets	a14b8947fd	dpif-netdev: Allow different numbers of rx queues for different ports. Currently, all of the PMD netdevs can only have the same number of rx queues, which is specified in other_config:n-dpdk-rxqs. Fix that by introducing of new option for PMD interfaces: 'n_rxq', which specifies the maximum number of rx queues to be created for this interface. Example: ovs-vsctl set Interface dpdk0 options:n_rxq=8 Old 'other_config:n-dpdk-rxqs' deleted. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-04 17:10:45 -08:00
Daniele Di Proietto	400486f7bb	dpif-netdev: Correctly update 'key' in emc_processing(). The 'key' pointer must point at the first unused element in the key array. Fixes: b89c678b7a26 ("dpif-netdev: optmizing emc_processing()") CC: Andy Zhou <azhou@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Andy Zhou <azhou@nicira.com>	2016-02-03 10:54:56 -08:00
Daniele Di Proietto	d916785ce9	dpif-netdev: Fix improper use of CMAP_FOR_EACH. It is ok to iterate a cmap with CMAP_FOR_EACH and remove elements with cmap_remove(), but having quiescent states inside the loop might create problems, since some of the postponed cleanup done inside the cmap might be executed, freeing the memory that the iterator is using. We had several of these errors in dpif-netdev, because when we rearrange ports or threads we often need to wait on a condition variable (which implies a quiescent state). This problem caused iterations to skip elements or to list them twice, resulting in the main thread waiting on a condition without anyone else to signal. Fix these cases by moving the possible quiescent states outside CMAP_FOR_EACH loops. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2016-02-02 18:50:12 -08:00
Daniele Di Proietto	a90ed02611	dpif-netdev: Delay packets' metadata initialization. When a group of packets arrives from a port, we loop through them to initialize metadata and then we loop through them again to extract the flow and perform the exact match classification. This commit combines the two loops into one, and initializes packet->md in emc_processing() to improve performance. Since emc_processing() might also be called after recirculation (in which case the metadata is already valid), an extra parameter is added to support both cases. This commits also implements simple prefetching of packet metadata, to further improve performance. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Andy Zhou <azhou@ovn.org> Acked-by: Chandran, Sugesh <sugesh.chandran@intel.com>	2016-02-02 18:25:00 -08:00
Andy Zhou	b89c678b7a	dpif-netdev: optmizing emc_processing() Commit d262ac2c60ce1da7b477737f70e8efd38b32502d introduced a slight performance drop for the fast path, where every packets hits the emc cache. This patch removes that performance drop by only reloading the key pointer on emc cache miss. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-02 12:01:14 -08:00
Andy Zhou	5a2fed4866	dpif-netdev: Load packet pointer only once in emc_processing() On the machines I have access to, reloading the same pointer from memory seems to inhibit compiler optimization somewhat. In emc_processing(), using a single packet pointer, instead of reloading it from memory with packets[i], improves performance by 0.3 Mpps (tested with a 10G NIC pushing 64-byte packets, against a baseline of 12.2 Mpps). Besides improving performance, this patch should also improve code readability. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-02 12:01:14 -08:00
Ben Pfaff	d262ac2c60	dpif-netdev: Avoid copying netdev_flow_key in emc_processing(). Before this commit, emc_processing() copied a netdev_flow_key if there was no exact-match cache (EMC) hit. This commit eliminates the copy by constructing the netdev_flow_key in the place it would be copied. Found by inspection. Shahbaz (CCed) reports that this reduces the cost of an EMC miss by 72 cycles in his test case in which the EMC is disabled. Presumably this is similarly valuable in cases where the EMC merely has few hits. For the original version of this patch, which was against a slightly earlier version of OVS, Daniele reported that: - With EMC disabled, this increases throughput from 4.8 Mpps to 5.4 Mpps. - With EMC enabled, this decreases throughput from 12.4 to 12.0 Mpps. CC: Muhammad Shahbaz <mshahbaz@cs.princeton.edu> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-01 09:28:02 -08:00
Andy Zhou	f9fe365b82	dpif-netdev: Reduce code duplication Code clean up to reduce code duplication. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-01-27 17:25:45 -08:00
Andy Zhou	d1aa0b94d8	dpif-netdev: drop swapping emc_processing() moves all the missed packets towards the beginning of packet array; matched packets are queued up into flow queues. Since the remaining of the packet array is not used anymore, don't bother swap packet pointers to save cycles and simplify logic. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-01-27 17:25:31 -08:00
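The "no swap" compaction described in this commit might look roughly like the sketch below. `struct pkt_sketch` and `compact_misses()` are hypothetical names, and the `cached` flag stands in for the result of the exact-match cache lookup; in the real code, hits are queued into per-flow batches rather than merely counted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet; 'cached' marks an exact-match hit. */
struct pkt_sketch {
    int id;
    bool cached;
};

/* Moves the packets that missed the cache to the front of 'pkts' and returns
 * how many there are.  Entries at indexes >= the return value are left stale
 * on purpose, since the caller never reads them again. */
static size_t
compact_misses(struct pkt_sketch *pkts[], size_t cnt, size_t *n_hits)
{
    size_t n_missed = 0;

    *n_hits = 0;
    for (size_t i = 0; i < cnt; i++) {
        if (pkts[i]->cached) {
            (*n_hits)++;                  /* Would be queued to a flow batch. */
        } else {
            pkts[n_missed++] = pkts[i];   /* Overwrite instead of swapping. */
        }
    }
    return n_missed;
}
```

Since `n_missed <= i` always holds, the write never clobbers a packet that has not been visited yet.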
Andy Zhou	3d88a620f1	dpif-netdev: properly maintain exact match cache hit counter Current logic counts dropped packet as cache hit which is not correct. This patch removes dropped packet to improve accuracy. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-01-27 17:23:02 -08:00
Ilya Maximets	347ba9bb9b	dpif-netdev: Unique and sequential tx_qids. Currently tx_qid is equal to pmd->core_id. This leads to unexpected behavior if pmd-cpu-mask different from '/(0)(1\|3\|7)?(f)/', e.g. if core_ids are not sequential, or doesn't start from 0, or both. Example: starting 2 pmd threads with 1 port, 2 rxqs per port, pmd-cpu-mask = 00000014 and let dev->real_n_txq = 2 It that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and txq_needs_locking = true (if device hasn't ovs_numa_get_n_cores()+1 queues). In that case, after truncating in netdev_dpdk_send__(): 'qid = qid % dev->real_n_txq;' pmd_1: qid = 2 % 2 = 0 pmd_2: qid = 4 % 2 = 0 So, both threads will call dpdk_queue_pkts() with same qid = 0. This is unexpected behavior if there is 2 tx queues in device. Queue #1 will not be used and both threads will lock queue #0 on each send. Fix that by using sequential tx_qids. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-01-26 11:40:45 -08:00
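The arithmetic in this commit message can be checked with a small sketch. Only the modulo expression comes from the message; `truncate_qid()` and `distinct_queues()` are hypothetical helpers written for this illustration.

```c
/* Mirrors the truncation quoted in the commit message:
 * 'qid = qid % dev->real_n_txq;'. */
static int
truncate_qid(int tx_qid, int real_n_txq)
{
    return tx_qid % real_n_txq;
}

/* Returns how many distinct tx queues 'n_threads' threads end up using,
 * given each thread's tx_qid.  Assumes real_n_txq <= 64. */
static int
distinct_queues(const int tx_qids[], int n_threads, int real_n_txq)
{
    unsigned long long used = 0;
    int n = 0;

    for (int i = 0; i < n_threads; i++) {
        used |= 1ULL << truncate_qid(tx_qids[i], real_n_txq);
    }
    for (; used; used &= used - 1) {
        n++;                      /* Clear the lowest set bit each round. */
    }
    return n;
}
```

With core-based ids `{2, 4}` and two tx queues, both threads map to queue 0; with sequential ids `{0, 1}`, each thread gets its own queue.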
Ilya Maximets	ae7ad0a15e	dpif-netdev: Rework of rx queue management. Current rx queue management model is buggy and will not work properly without additional barriers and other synchronization between PMD threads and main thread. Known BUGS of current model: * While reloading, two PMD threads, one already reloaded and one not yet reloaded, can poll same queue of the same port. This behavior may lead to dpdk driver failure, because they are not thread-safe. * Same bug as fixed in commit e4e74c3a2b ("dpif-netdev: Purge all ukeys when reconfigure pmd.") but reproduced while only reconfiguring of pmd threads without restarting, because addition may change the sequence of other ports, which is important in time of reconfiguration. Introducing the new model, where distribution of queues is made by the main thread with minimal synchronization and without data races between pmd threads. Also, this model should work faster, because only needed threads will be interrupted for reconfiguration and total computational complexity of reconfiguration is less. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-01-26 11:40:35 -08:00
Jesse Gross	3f32cfebcb	dpif-netdev: Avoid using uninitialized memory with tunnel options. When handling an upcall with the userspace datapath, it's currently possible for a flow from a packet with no tunnel options to come back with matches on the options. If that happens, dpif-netdev will attempt to translate the wildcards provided by ofproto into the format used by dpif. The translation requires use of the original wildcards from the flow, which since they didn't exist, is uninitalized memory. Matching on fields which don't actually exist is itself a bug. However, this can occur when we attempt to set a tunnel option on the packet - ofproto generates a match on the field in the original packet. This is being fixed separately. In other situations where we have a match on an unexpected field, we simply ignore it. This happens with tunnel options with the kernel datapath, non-tunnel fields that don't exist in the packet, and even with Geneve where we do have some options but not the particular one that was matched on. This brings the same behavior for this case and avoids the possibility of accessing uninitialized memory. Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-12-22 16:20:57 -08:00
Daniele Di Proietto	a0f7b6d525	ct-dpif: Add ct_dpif_flush(). This function will flush the connection tracking tables of a specific datapath. It simply calls a function pointer in the dpif_class. No dpif currently implements the required interface. The next commits will provide an implementation in dpif-netlink. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>	2015-12-21 17:23:43 -08:00
Daniele Di Proietto	b77d9629ad	ct-dpif: Add ct_dpif_dump_{start,next,done}(). These function can be used to dump conntrack entries from a datapath. They simply call a function pointer in the dpif_class. No dpif currently implements the interface. The next commits will provide an implementation in dpif-netlink. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>	2015-12-21 17:23:32 -08:00
Mengke Liu	4e548ad9e6	geneve-map-rename: rename geneve-map to tlv-map. This patch renames the command name related with geneve-map to a more generic name as following: add-geneve-map -> add-tlv-map del-geneve-map -> del-tlv-map dump-geneve-map -> dump-tlv-map It also renames the Geneve_table to tlv_table. By doing this renaming, the NSH variable context header (the same TLV format as Geneve) or other protocol can reuse the field tun_metadata<N> in the future. Signed-off-by: Mengke Liu <mengke.liu@intel.com> Signed-off-by: Ricky Li <ricky.li@intel.com> Signed-off-by: Jesse Gross <jesse@kernel.org>	2015-12-15 13:06:11 -08:00
Daniele Di Proietto	ca8d344271	odp-util: Return exact mask if netlink mask attribute is missing. In the ODP context an empty mask netlink attribute usually means that the flow should be an exact match. odp_flow_key_to_mask{,_udpif}() instead return a struct flow_wildcards with matches only on recirc_id and vlan_tci. A more appropriate behavior is to handle a missing (zero length) netlink mask specially (like we do in userspace and Linux datapath) and create an exact match flow_wildcards from the original flow. This fixes a bug in revalidate_ukey(): every flow created with megaflows disabled would be revalidated away, because the mask would seem too generic. (Another possible fix would be to handle the special case of a missing mask in revalidate_ukey(), but this seems a more generic solution).	2015-12-10 17:38:23 -08:00
Daniele Di Proietto	4d8f90b1b1	dpif-netdev: Initialize match.tun_md in various places. This solves a crash in dp_netdev_flow_add(), when log level is debug.	2015-12-10 17:38:23 -08:00
Thadeu Lima de Souza Cascardo	53902038ab	tnl-arp-cache: Rename module and functions to tnl-neigh-cache. Since we don't distinguish between IPv4 and IPv6 lookups, consolidate ARP and ND cache into neighbor cache. Other references to ARP related to the ARP cache but that are not really about ARP have been renamed as well. tnl_arp_lookup is kept for lookups using IPv4 instead of IPv4-mapped addresses, but that is going to be removed in a later patch. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2015-11-30 10:27:51 -08:00
YAMAMOTO Takashi	65d43fdca9	dpif_dummy_override: Allow overriding a non-existing provider This allows --enable-dummy=system with a userland-only build. It's useful for testsuite. Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com> Acked-by: Ben Pfaff <blp@ovn.org>	2015-11-26 18:37:24 +09:00
Joe Stringer	9daf23484f	Add connection tracking label support. This patch adds a new 128-bit metadata field to the connection tracking interface. When a label is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_label" field in the flow. For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a label with those connections: table=0,priority=1,action=drop table=0,arp,action=normal table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2 table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1) table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1 Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:16 -07:00
Joe Stringer	8e53fe8cf7	Add connection tracking mark support. This patch adds a new 32-bit metadata field to the connection tracking interface. When a mark is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_mark" field in the flow. For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a mark with those connections: table=0,priority=1,action=drop table=0,arp,action=normal table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2 table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1) table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1 Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:15 -07:00
Joe Stringer	07659514c3	Add support for connection tracking. This patch adds a new action and fields to OVS that allow connection tracking to be performed. This support works in conjunction with the Linux kernel support merged into the Linux-4.3 development cycle. Packets have two possible states with respect to connection tracking: Untracked packets have not previously passed through the connection tracker, while tracked packets have previously been through the connection tracker. For OpenFlow pipeline processing, untracked packets can become tracked, and they will remain tracked until the end of the pipeline. Tracked packets cannot become untracked. Connections can be unknown, uncommitted, or committed. Packets which are untracked have unknown connection state. To know the connection state, the packet must become tracked. Uncommitted connections have no connection state stored about them, so it is only possible for the connection tracker to identify whether they are a new connection or whether they are invalid. Committed connections have connection state stored beyond the lifetime of the packet, which allows later packets in the same connection to be identified as part of the same established connection, or related to an existing connection - for instance ICMP error responses. The new 'ct' action transitions the packet from "untracked" to "tracked" by sending this flow through the connection tracker. The following parameters are supported initally: - "commit": When commit is executed, the connection moves from uncommitted state to committed state. This signals that information about the connection should be stored beyond the lifetime of the packet within the pipeline. This allows future packets in the same connection to be recognized as part of the same "established" (est) connection, as well as identifying packets in the reply (rpl) direction, or packets related to an existing connection (rel). 
- "zone=[u16\|NXM]": Perform connection tracking in the zone specified. Each zone is an independent connection tracking context. When the "commit" parameter is used, the connection will only be committed in the specified zone, and not in other zones. This is 0 by default. - "table=NUMBER": Fork pipeline processing in two. The original instance of the packet will continue processing the current actions list as an untracked packet. An additional instance of the packet will be sent to the connection tracker, which will be re-injected into the OpenFlow pipeline to resume processing in the specified table, with the ct_state and other ct match fields set. If the table is not specified, then the packet is submitted to the connection tracker, but the pipeline does not fork and the ct match fields are not populated. It is strongly recommended to specify a table later than the current table to prevent loops. When the "table" option is used, the packet that continues processing in the specified table will have the ct_state populated. The ct_state may have any of the following flags set: - Tracked (trk): Connection tracking has occurred. - Reply (rpl): The flow is in the reply direction. - Invalid (inv): The connection tracker couldn't identify the connection. - New (new): This is the beginning of a new connection. - Established (est): This is part of an already existing connection. - Related (rel): This connection is related to an existing connection. For more information, consult the ovs-ofctl(8) man pages. 
Below is a simple example flow table to allow outbound TCP traffic from port 1 and drop traffic from port 2 that was not initiated by port 1: table=0,priority=1,action=drop table=0,arp,action=normal table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2 table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1) table=1,in_port=2,ct_state=+trk+est,tcp,action=1 table=1,in_port=2,ct_state=+trk+new,tcp,action=drop Based on original design by Justin Pettit, contributions from Thomas Graf and Daniele Di Proietto. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:15 -07:00
Jarno Rajahalme	449b813113	dpif-netdev: Exact match non-presence of vlans. The Netlink encoding of datapath flow keys cannot express wildcarding the presence of a VLAN tag. Instead, a missing VLAN tag is interpreted as exact match on the fact that there is no VLAN. This makes reading datapath flow dumps confusing, since for everything else, a missing key value means that the corresponding key was wildcarded. Unless we refactor a lot of code that translates between Netlink and struct flow representations, we have to do the same in the userspace datapath. This makes at least the flow install logs show that the vlan_tci field is matched to zero. However, the datapath flow dumps remain as they were before, as they are performed using the netlink format. Add a test to verify that packet with a vlan will not match a rule that may seem wildcarding the presence of the vlan tag. Applying this test without the userspace datapath modification showed that the userspace datapath failed to create a new datapath flow for the VLAN packet before this patch. Reported-by: Tony van der Peet <tony.vanderpeet@gmail.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-09-18 17:47:37 -07:00
Daniele Di Proietto	f2f44f5da0	dpif-netdev: Check for PKT_RX_RSS_HASH flag. DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is set in 'ol_flags'. Otherwise the hash is garbage and doesn't relate to the packet. This fixes an issue with vhost, which, being a virtual NIC, doesn't compute the hash. Reported-by: Dongjun <dongj@dtdream.com> Suggested-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-09-11 17:43:39 +01:00
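A minimal sketch of the check this commit describes is shown below. The flag value, the `mbuf_sketch` layout and the fallback to a software hash are all assumptions made for illustration; they are not the DPDK definitions or the actual OVS fix.

```c
#include <stdint.h>

/* Hypothetical flag bit, for illustration only. */
#define SKETCH_RX_RSS_HASH (UINT64_C(1) << 1)

/* Hypothetical packet buffer layout. */
struct mbuf_sketch {
    uint64_t ol_flags;   /* Offload flags set by the driver. */
    uint32_t rss_hash;   /* Meaningful only if SKETCH_RX_RSS_HASH is set. */
    uint32_t flow_key;   /* Stand-in for the parsed packet headers. */
};

/* Placeholder software hash (a multiplicative mix), used when the NIC did
 * not supply a hash. */
static uint32_t
software_hash(uint32_t key)
{
    return key * 2654435761u;
}

/* Trusts the hardware hash only when the flag says it is valid. */
static uint32_t
packet_hash(const struct mbuf_sketch *m)
{
    return (m->ol_flags & SKETCH_RX_RSS_HASH)
           ? m->rss_hash
           : software_hash(m->flow_key);
}
```

For a virtual NIC such as vhost the flag stays clear, so whatever happens to be in `rss_hash` is ignored.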
Pravin B Shelar	7f9b850474	tnl-ports: Add destination IP and MAC address to the match. Currently the tnl-port table wildcards the destination IP and MAC addresses for a given tunnel packet. That could result in accepting tunnel packets destined for other hosts. The following patch adds support for matching on the IP and MAC address. IP address updates to the tnl-port table are piggybacked on ovs-router updates. Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-09-08 16:24:35 -07:00
Alex Wang	e4e74c3a2b	dpif-netdev: Purge all ukeys when reconfigure pmd. When dpdk configuration changes, all pmd threads are recreated and rx queues of each port are reloaded. After this process, rx queue could be mapped to a different pmd thread other than the one before reconfiguration. However, this is totally transparent to ofproto layer modules. So, if the ofproto-dpif-upcall module still holds ukeys generated before pmd thread recreation, this old ukey will collide with the ukey for the new upcalls from same traffic flow, causing flow installation failure. To fix the bug, this commit adds a new call-back function in dpif layer for notifying upper layer the purging of datapath (e.g. pmd thread deletion in dpif-netdev). So, the ofproto-dpif-upcall module can react properly with deleting the ukeys and with collecting flows' last stats. Reported-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Alex Wang <ee07b291@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joestringer@nicira.com>	2015-09-02 05:57:59 +00:00
Jarno Rajahalme	5fcff47b0b	flow: Add struct flowmap. Struct miniflow is now sometimes used just as a map. Define a new struct flowmap for that purpose. The flowmap is defined as an array of maps, and it is automatically sized according to the size of struct flow, so it will be easier to maintain in the future. It would have been tempting to use the existing struct bitmap for this purpose. The main reason this is not feasible at the moment is that some flowmap algorithms are simpler when it can be assumed that no struct flow member requires more bits than can fit to a single map unit. The tunnel member already requires more than 32 bits, so the map unit needs to be 64 bits wide. Performance critical algorithms enumerate the flowmap array units explicitly, as it is easier for the compiler to optimize, compared to the normal iterator. Without this optimization a classifier lookup without wildcard masks would be about 25% slower. With this more general (and maintainable) algorithm the classifier lookups are about 5% slower, when the struct flow actually becomes big enough to require a second map. This negates the performance gained in the "Pre-compute stage masks" patch earlier in the series. Requested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-08-26 15:37:22 -07:00
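The "array of maps" idea from this commit can be illustrated with the sketch below. All names are hypothetical, and the field count of 100 is an arbitrary assumption; the commit sizes the real structure from struct flow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N_FIELDS 100   /* Hypothetical number of map positions. */
#define SKETCH_UNIT_BITS 64   /* Each map unit is 64 bits wide. */
#define SKETCH_N_UNITS \
    ((SKETCH_N_FIELDS + SKETCH_UNIT_BITS - 1) / SKETCH_UNIT_BITS)

/* The unit count follows automatically from the field count. */
struct flowmap_sketch {
    uint64_t bits[SKETCH_N_UNITS];
};

static void
flowmap_sketch_set(struct flowmap_sketch *fm, size_t idx)
{
    fm->bits[idx / SKETCH_UNIT_BITS] |= UINT64_C(1) << (idx % SKETCH_UNIT_BITS);
}

static bool
flowmap_sketch_is_set(const struct flowmap_sketch *fm, size_t idx)
{
    return (fm->bits[idx / SKETCH_UNIT_BITS] >> (idx % SKETCH_UNIT_BITS)) & 1;
}

/* Counts set bits by enumerating the units explicitly, the access pattern
 * the commit message says is easier for the compiler to optimize. */
static unsigned int
flowmap_sketch_count(const struct flowmap_sketch *fm)
{
    unsigned int n = 0;

    for (size_t u = 0; u < SKETCH_N_UNITS; u++) {
        for (uint64_t w = fm->bits[u]; w; w &= w - 1) {
            n++;
        }
    }
    return n;
}
```

With 100 positions the macro yields two units, so index 70 lands in bit 6 of the second unit.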
Alex Wang	fbe0962b28	coverage: Add coverage_try_clear() for performance-critical threads. For performance-critical threads like pmd threads, we currently make them never call coverage_clear() to avoid contention over the global mutex 'coverage_mutex'. So, even though pmd thread still keeps updating their thread-local coverage count, the count is never attributed to the global total. But it is useful to have them available. This commit makes this happen by implementing a non-contending version of the clear function, coverage_try_clear(). The function will use the ovs_mutex_trylock() and return immediately if the mutex cannot be acquired. Since threads like pmd thread are always busy-looping, the lock will eventually be acquired. Requested-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@nicira.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-08-25 16:21:36 -07:00
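The try-lock pattern described in this commit can be sketched as below. This is not the OVS implementation: a C11 `atomic_flag` stands in for the mutex so the sketch needs no threading library, and `counter_try_clear()`, `global_total` and `local_count` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag total_lock = ATOMIC_FLAG_INIT;  /* Stand-in for the mutex. */
static unsigned long long global_total;            /* Guarded by total_lock. */
static _Thread_local unsigned long long local_count;

/* Folds this thread's count into the global total if the lock is free.
 * Returns false immediately when the lock is busy, so a busy-looping thread
 * never blocks and simply retries on a later iteration. */
static bool
counter_try_clear(void)
{
    if (atomic_flag_test_and_set_explicit(&total_lock, memory_order_acquire)) {
        return false;
    }
    global_total += local_count;
    local_count = 0;
    atomic_flag_clear_explicit(&total_lock, memory_order_release);
    return true;
}
```

A failed attempt leaves `local_count` untouched, so nothing is lost; the count is simply attributed to the total on a later successful call.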

473 Commits