mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-31 14:25:26 +00:00

Author	SHA1	Message	Date
Ilya Maximets	e180c431b9	tests: classifier: Add a stress test for prefixes reconfiguration. This test is reusing the benchmark infrastructure, but it has some pre-defined parameters, so it's easier to run in the test suite. The benchmark code is adjusted to start another thread that does prefix updates continuously in a loop and the lookup threads are updated to be able to enter quiescent state periodically, so the reconfiguration can proceed. This test is a reproducer for the crashes fixed in the previous commit. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-05-26 17:38:52 +02:00
Ilya Maximets	6a61a70fcb	classifier: Fix race for prefix tree configuration. The thread fence in the classifier is supposed to ensure that when the subtable->trie_plen is updated, the actual prefix tree is ready to be used. On the write side in trie_init(), the fence is between the tree configuration and the 'trie_plen' update. On the reader's side however, the fence is at the beginning of the classifier_lookup__(), and both reads of the 'trie_plen' and the accesses to the tree itself are happening afterwards. And since both types of the reads are on the same side of the fence, the fence is kind of pointless and doesn't guarantee any memory ordering. So, readers can be accessing partially initialized prefix trees. Another problem with the configuration is that cls->n_tries is updated without any synchronization as well. The comment on the fence says that it also synchronizes for the cls->n_tries, but that doesn't make a lot of sense. In practice, cls->n_tries is read multiple times throughout the classifier_lookup__() and each of these reads may give a different value if there is a concurrent update, causing the reader to access trees that are not initialized or in the middle of being destroyed, leading to OVS crashes while the user updates the flow table prefixes. First thing that needs to be fixed here is to only read cls->n_tries once to avoid obvious crashes with access to uninitialized trie_ctx[] entries. The second thing is that we need a proper memory synchronization that will guarantee that our prefix trees are fully initialized when readers access them. In the current logic we would need to issue a thread fence after every read of a subtable->trie_plen value, i.e., we'd need a fence per subtable lookup. This would be very expensive and wasteful, considering the prefix tree configuration normally happens only once somewhere at startup. 
What we can do instead is to convert cls->n_tries into atomic and use it as a synchronization point:
Writer (classifier_set_prefix_fields):
1. Before making any changes, set cls->n_tries to zero. Relaxed memory order can be used here, because we'll have a full memory barrier at the next step.
2. ovsrcu_synchronize() to wait for all threads to stop using tries.
3. Update tries while nobody is using them.
4. Set cls->n_tries to a new value with memory_order_release.
Reader (classifier_lookup):
1. Read the cls->n_tries with the memory_order_acquire.
2. Use that once read value throughout.
RCU in this scenario will ensure that every thread no longer uses the prefix trees when we're about to change them. The acquire-release semantics on the cls->n_tries just saves us from calling the ovsrcu_synchronize() the second time once we're done with the whole reconfiguration. We're just updating the number and making all the previous changes visible on CPUs that acquire it.
Alternative solution might be to go full RCU and make the array of trees itself RCU-protected. This way we would not need to do any extra RCU synchronization or managing the memory ordering. However, that would mean having multiple layers of RCU with trees and rules in them potentially surviving multiple grace periods, which I would like to avoid, if possible.
Previous code was also trying to be smart and not disable prefix tree lookups for prefixes that are not changing. We're sacrificing this functionality in the name of simpler code. Attempt to make that work would either require a full conversion to RCU or a per-subtable synchronization. Lookups can be done without the prefix match optimizations for a brief period of time. This doesn't affect correctness of the resulted datapath flows. 
In the actual implementation instead of dropping cls->n_tries to zero at step one, we keep the access to the first N tries that are not going to change by setting the cls->n_tries to the index of the first trie that will be updated. So, we'll not be disabling all the prefix match optimizations completely. There was an attempt to solve this problem already in commit: `a611705990` ("classifier: Prevent tries vs n_tries race leading to NULL dereference.") But it was focused on one particular crash and didn't take into account a wider issue with the memory ordering on these trees in general. The changes made in that commit are mostly reverted as not needed anymore. Fixes: `f358a2cb2e` ("lib/classifier: RCUify prefix trie code.") Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2025-April/422765.html Reported-by: Numan Siddique <numans@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-05-26 17:38:22 +02:00
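The writer/reader protocol described in this commit message can be sketched with plain C11 atomics. This is a minimal illustration, not the OVS code: `toy_cls`, `toy_trie`, `toy_set_prefixes()`, `toy_lookup()` and `toy_wait_for_readers()` are hypothetical names, the "trie" is reduced to a single integer, and the RCU grace period is only a placeholder where real code would call ovsrcu_synchronize().

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_MAX_TRIES 3              /* Hypothetical limit for this sketch. */

struct toy_trie {
    int plen;                        /* Stand-in for a whole prefix tree. */
};

struct toy_cls {
    atomic_uint n_tries;             /* Synchronization point. */
    struct toy_trie tries[TOY_MAX_TRIES];
};

/* Placeholder for ovsrcu_synchronize(): real code would block here until
 * every reader has passed through a quiescent state. */
static void
toy_wait_for_readers(void)
{
}

/* Writer: reconfigures tries [first_changed, new_n).  The caller is assumed
 * to pass a 'first_changed' no larger than the current number of tries. */
static void
toy_set_prefixes(struct toy_cls *cls, unsigned int first_changed,
                 unsigned int new_n, const int plens[])
{
    assert(first_changed <= new_n && new_n <= TOY_MAX_TRIES);

    /* 1. Hide every trie that is about to change.  Relaxed is enough,
     *    because the next step acts as a full barrier in real code. */
    atomic_store_explicit(&cls->n_tries, first_changed, memory_order_relaxed);

    /* 2. Wait until no reader can still be using the hidden tries. */
    toy_wait_for_readers();

    /* 3. Update the tries while nobody is looking at them. */
    for (unsigned int i = first_changed; i < new_n; i++) {
        cls->tries[i].plen = plens[i];
    }

    /* 4. Publish the new count; release makes step 3 visible to any
     *    reader that acquires this value. */
    atomic_store_explicit(&cls->n_tries, new_n, memory_order_release);
}

/* Reader: loads the count exactly once and uses only that value. */
static int
toy_lookup(struct toy_cls *cls)
{
    unsigned int n = atomic_load_explicit(&cls->n_tries, memory_order_acquire);
    int total = 0;

    for (unsigned int i = 0; i < n; i++) {
        total += cls->tries[i].plen;
    }
    return total;
}
```

Reading the count once is what keeps the reader from indexing past the tries it was allowed to see, even if the writer changes the count in the middle of a lookup.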
Ilya Maximets	9234b9b40f	tests: classifier: Fix the rule number check during trie verification. Same rule can be in multiple prefix trees and so it is possible that the total number of rules in all trees exceeds the total number of rules in the classifier. But the number of rules in a single prefix tree still can't exceed the total number of rules in the classifier. Move the check accordingly. Note: checkpatch complains about usage of the assert(), but it is everywhere in this file and so, not changing in just this one place. Fixes: `f358a2cb2e` ("lib/classifier: RCUify prefix trie code.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-05-26 17:38:17 +02:00
Nobuhiro MIKI	7b514aba0e	ofproto-dpif-trace: Improve conjunctive match tracing. A conjunctive flow consists of two or more flows with conjunction actions. When input to the ofproto/trace command matches a conjunctive flow, it outputs flows of all dimensions. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-11-16 18:41:07 +01:00
Ben Pfaff	8205fbc8f5	Eliminate "whitelist" and "blacklist" terms. There is one remaining use under datapath. That change should happen upstream in Linux first according to our usual policy. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>	2020-10-16 19:22:24 -07:00
Eiichi Tsukata	a611705990	classifier: Prevent tries vs n_tries race leading to NULL dereference. Currently classifier tries and n_tries are not updated atomically, so there is a race condition which can lead to a NULL dereference. The race can happen when the main thread updates the classifier tries and n_tries in classifier_set_prefix_fields() while a revalidator or handler thread tries to look them up in classifier_lookup__(). Such a race can be triggered when the user changes the prefixes of a flow_table. Race (user changes flow_table prefixes: ip_dst,ip_src => none):
 [main thread]             [revalidator/handler thread]
 ===========================================================
                           /* cls->n_tries == 2 */
                           for (int i = 0; i < cls->n_tries; i++) {
 trie_init(cls, i, NULL);
 /* n_tries == 0 */
 cls->n_tries = n_tries;
                           /* cls->tries[i]->field is NULL */
                           trie_ctx_init(&trie_ctx[i],&cls->tries[i]);
                           /* trie->field is NULL */
                           ctx->be32ofs = trie->field->flow_be32ofs;
To prevent the race, instead of re-introducing the internal mutex removed in the commit `fccd7c092e` ("classifier: Remove internal mutex."), this patch makes the trie field RCU protected and checks it after read. Fixes: `fccd7c092e` ("classifier: Remove internal mutex.") Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-05-28 11:46:39 +02:00
Justin Pettit	396d492cfa	Don't shadow variables. Rename the remaining variables that were shadowing another definition. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-02-28 15:02:44 -08:00
Ben Pfaff	0d71302e36	ofp-util, ofp-parse: Break up into many separate modules. ofp-util had been far too large and monolithic for a long time. This commit breaks it up into units that make some logical sense. It also moves the pieces of ofp-parse that were specific to each unit into the relevant unit. Most of this commit is just moving code around. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-02-13 10:43:13 -08:00
Ben Pfaff	46ab60bfe5	classifier: Refactor interface for classifier_remove(). Until now, classifier_remove() returned either null or the classifier rule passed to it, which is an unusual interface. This commit changes it to return true if it succeeds or false on failure. In addition, most of classifier_remove()'s callers know ahead of time that it must succeed, even though most of them didn't bother with an assertion, so this commit adds a classifier_remove_assert() function as a helper. Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-01-31 11:19:21 -08:00
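The interface change described in this commit (a boolean result plus an asserting helper for callers that know removal must succeed) can be illustrated with a toy container. This is only a sketch under stated assumptions: `toy_classifier`, `toy_insert()`, `toy_remove()` and `toy_remove_assert()` are hypothetical names, and the fixed-size array stands in for the real classifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 4                 /* Hypothetical size for this sketch. */

struct toy_classifier {
    const void *rules[TOY_CAPACITY];   /* NULL marks an empty slot. */
};

/* Stores 'rule' in the first free slot.  Returns false if the toy
 * classifier is full. */
static bool
toy_insert(struct toy_classifier *cls, const void *rule)
{
    for (size_t i = 0; i < TOY_CAPACITY; i++) {
        if (!cls->rules[i]) {
            cls->rules[i] = rule;
            return true;
        }
    }
    return false;
}

/* Removes 'rule'.  Returns true if it was present, false otherwise,
 * instead of returning either NULL or the rule pointer itself. */
static bool
toy_remove(struct toy_classifier *cls, const void *rule)
{
    if (!rule) {
        return false;
    }
    for (size_t i = 0; i < TOY_CAPACITY; i++) {
        if (cls->rules[i] == rule) {
            cls->rules[i] = NULL;
            return true;
        }
    }
    return false;
}

/* Helper for callers that know ahead of time that removal must succeed. */
static void
toy_remove_assert(struct toy_classifier *cls, const void *rule)
{
    bool removed = toy_remove(cls, rule);

    assert(removed);
    (void) removed;                    /* Avoids a warning under NDEBUG. */
}
```

A caller that cannot be sure the rule is present checks the boolean; a caller that is sure uses the asserting helper, so a violated expectation fails loudly.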
Ben Pfaff	134fefa4de	types: New macros ETH_ADDR_C and ETH_ADDR64_C. These macros expand to constants of type struct eth_addr and struct eth_addr64, respectively, and make it more convenient to initialize or assign to an Ethernet address object. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>	2017-11-29 13:27:16 -08:00
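One way such a constant macro could look is sketched below. The names `toy_eth_addr` and `TOY_ETH_ADDR_C` are hypothetical, the struct layout is an assumption made for illustration, and the compound-literal form is one possible design, not necessarily how the real ETH_ADDR_C is defined.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct eth_addr: the same six bytes viewed
 * either as bytes or as three 16-bit units. */
struct toy_eth_addr {
    union {
        uint8_t ea[6];
        uint16_t be16[3];
    };
};

/* Builds a struct toy_eth_addr from six hex byte tokens without the "0x"
 * prefix.  Because it expands to a compound literal, it can be used both to
 * initialize a local variable and on the right-hand side of an assignment. */
#define TOY_ETH_ADDR_C(A, B, C, D, E, F)                                  \
    ((struct toy_eth_addr) { { { 0x##A, 0x##B, 0x##C,                     \
                                 0x##D, 0x##E, 0x##F } } })
```

For example, `struct toy_eth_addr mac = TOY_ETH_ADDR_C(00,11,22,aa,bb,cc);` initializes a local variable, and `mac = TOY_ETH_ADDR_C(ff,ff,ff,ff,ff,ff);` later overwrites it with the broadcast address.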
Ben Pfaff	71f21279f6	Eliminate most shadowing for local variable names. Shadowing is when a variable with a given name in an inner scope hides a different variable with the same name in a surrounding scope. This is generally undesirable because it can confuse programmers. This commit eliminates most of it. Found with -Wshadow=local in GCC 7. The repo is not really ready to enable this option by default because of a few cases that are harder to fix, and harmless, such as nested use of CMAP_FOR_EACH. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>	2017-08-02 15:03:35 -07:00
Eric Garver	f0fb825a37	Add support for 802.1ad (QinQ tunneling) Flow key handling changes: - Add VLAN header array in struct flow, to record multiple 802.1q VLAN headers. - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP. Refactor VLAN handling in dpif-xlate: - Introduce 'xvlan' to track VLAN stack during flow processing. - Input and output VLAN translation according to the xbundle type. Push VLAN action support: - Allow ethertype 0x88a8 in VLAN headers and push_vlan action. - Support push_vlan on dot1q packets. Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs that can be matched. This allows us to preserve backwards compatibility. Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN handling Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Co-authored-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-03-16 15:18:40 -07:00
Ryan Moats	1f4a7252d9	Add read-only option to ovs-dpctl and ovs-ofctl commands. ovs-dpctl and ovs-ofctl lack a read-only option to prevent running of commands that perform read-write operations. Add it and the necessary scaffolding to each. Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-08-15 17:26:15 -07:00
Jarno Rajahalme	da9cfca6e2	Revert "pvector: Expose non-concurrent priority vector." This reverts commit `8bdfe13138`. I failed to see that lib/dpif-netdev.c actually needs the concurrency provided by pvector prior to this change. More specifically, when a subtable is removed, concurrent lookups may skip over another subtable swapped in to the place of the removed subtable in the vector. Since this was the only use of the non-concurrent pvector, it is cleaner to revert the whole patch. Reported-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-10 14:58:51 -07:00
Jarno Rajahalme	44e0c35d98	lib: Separate versioning to its own module. Separate rule versioning to lib/versions.h to make it easier to use versioning for other data types. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-07-29 16:52:01 -07:00
Jarno Rajahalme	8bdfe13138	pvector: Expose non-concurrent priority vector. PMD threads use pvectors but do not need the overhead of the concurrent version. Expose the non-concurrent version for that use. Note that struct pvector is renamed as struct cpvector (for concurrent priority vector), and the former struct pvector_impl is now struct pvector. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-07-29 11:12:08 -07:00
Jarno Rajahalme	5e27fe9781	classifier: Fix race condition leading to NULL dereference. Addition of table versioning exposed struct cls_rule member 'cls_match' to RCU readers and made it possible for 'cls_match' to become NULL while being accessed by an RCU reader, but we failed to check for this condition. This may have resulted in NULL pointer dereference and ovs-vswitchd crash. Fix this by making the 'cls_match' member an RCU pointer and checking the value whenever it is potentially read by an RCU reader. In these instances we use ovsrcu_get(), whereas functions accessible only by the exclusive writers use ovsrcu_get_protected() and do not need to check the result. VMware-BZ: 1643642 Fixes: `2b7b1427` ("classifier: Support table versioning") Signed-off-by: Jarno Rajahalme <jarno@ovn.org>	2016-04-17 08:51:21 -07:00
Ben Warren	f424833659	Move lib/ofp-util.h to include/openvswitch directory This commit also adds several #include directives in source files in order to make the 'ofp-util.h' move possible Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 13:48:25 -07:00
Ben Warren	417e7e66e1	list: Rename all functions in list.h with ovs_ prefix. This attempts to prevent namespace collisions with other list libraries Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-30 13:04:32 -07:00
William Tu	efe2037e15	test-classifier: Use `in_port.ofp_port`, instead of `in_port`. The test uses 16-bit ofp_port_t, however the struct flow member `in_port` is 32-bit, causing a memcpy to read uninitialized data. We should restrict the test to the `ofp_port` member of the `in_port` union Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Co-authored-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2015-12-22 23:05:44 -08:00
Jarno Rajahalme	74ff3298c8	userspace: Define and use struct eth_addr. Define struct eth_addr and use it instead of a uint8_t array for all ethernet addresses in OVS userspace. The struct is always the right size, and it can be assigned without an explicit memcpy, which makes code more readable. "struct eth_addr" is a good type name for this as many utility functions are already named accordingly. struct eth_addr can be accessed as bytes as well as ovs_be16's, which makes the struct 16-bit aligned. All use seems to be 16-bit aligned, so some algorithms on the ethernet addresses can be made a bit more efficient making use of this fact. As the struct fits into a register (in 64-bit systems) we pass it by value when possible. This patch also changes the few uses of Linux specific ETH_ALEN to OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no longer needed. This work stemmed from a desire to make all struct flow members assignable for unrelated exploration purposes. However, I think this might be a nice code readability improvement by itself. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-08-28 14:55:11 -07:00
Jarno Rajahalme	5fcff47b0b	flow: Add struct flowmap. Struct miniflow is now sometimes used just as a map. Define a new struct flowmap for that purpose. The flowmap is defined as an array of maps, and it is automatically sized according to the size of struct flow, so it will be easier to maintain in the future. It would have been tempting to use the existing struct bitmap for this purpose. The main reason this is not feasible at the moment is that some flowmap algorithms are simpler when it can be assumed that no struct flow member requires more bits than can fit to a single map unit. The tunnel member already requires more than 32 bits, so the map unit needs to be 64 bits wide. Performance critical algorithms enumerate the flowmap array units explicitly, as it is easier for the compiler to optimize, compared to the normal iterator. Without this optimization a classifier lookup without wildcard masks would be about 25% slower. With this more general (and maintainable) algorithm the classifier lookups are about 5% slower, when the struct flow actually becomes big enough to require a second map. This negates the performance gained in the "Pre-compute stage masks" patch earlier in the series. Requested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-08-26 15:37:22 -07:00
Jarno Rajahalme	1add079ee2	test-classifier: Add benchmark. Add a benchmark command for classifier lookup performance testing. Running the test-classifier without arguments or with "--help" will print the following usage: usage: ovstest test-classifier benchmark <n_rules> <n_priorities> <n_subtables> <n_threads> <n_lookups> where: <n_rules> - The number of rules to install for lookups. More rules make misses less likely. <n_priorities> - How many different priorities to use. Using only 1 priority will force lookups to continue through all subtables. <n_subtables> - Number of subtables to use. Normally a classifier has rules with different kinds of masks, resulting in multiple subtables (one per mask). However, in some special cases a table may consist of only one kind of rules, so there will be only one subtable. <n_threads> - How many lookup threads to use. Using one thread should give less variance across runs, but classifier scaling can be tested with multiple threads. <n_lookups> - How many lookups each thread should perform. For testing the classifier is filled with <n_rules> rules using <n_subtables> different mask patterns and <n_priorities> different priorities. A random set of lookup flows are created, and <n_threads> lookup threads are spawned to perform <n_lookups> lookups each. The count of hits and misses, as well as the overall execution time is reported. Example run: $ tests/ovstest test-classifier benchmark 1000 1 30 1 3800000 Benchmarking with: 1000 rules with 1 priorities in 30 tables, 1 threads doing 3800000 lookups each Without wildcards: hits: 461520, misses: 3338480 classifier lookups: 386 ms, 9844559 lookups/sec With wildcards: hits: 461520, misses: 3338480 classifier lookups: 866 ms, 4387990 lookups/sec Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-08-21 12:49:46 -07:00
Jarno Rajahalme	b30001c7fd	classifier: Simplify minimask_hash(). minimask_hash() can be simplified as each value is known to be non-zero. Move miniflow_hash() into test-classifier.c as miniflow_hash__() as it is no longer needed elsewhere. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com>	2015-08-12 16:13:14 -07:00
Jarno Rajahalme	a851eb9492	flow: Eliminate miniflow_clone() and minimask_clone(). miniflow_clone() and minimask_clone() are no longer used, remove them from the API. Now that miniflow data is always inlined, it makes sense to rename miniflow_clone_inline() miniflow_clone(). Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-15 13:17:10 -07:00
Jarno Rajahalme	8fd4792403	flow: Always inline miniflows. Now that performance critical code already inlines miniflows and minimasks, we can simplify struct miniflow by always dynamically allocating miniflows and minimasks to the correct size. This changes the struct minimatch to always contain pointers to its miniflow and minimask. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-15 13:17:10 -07:00
Jarno Rajahalme	bd53aa1723	classifier: Make versioning more explicit. Now that struct cls_match has 'add_version' the 'version' in cls_match was largely redundant. Remove 'version' from struct cls_rule, and add it to function prototypes that need it. This makes versioning more explicit (or less indirect) in the API. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-06 11:46:34 -07:00
Jarno Rajahalme	18721c4a4d	classifier: Simplify versioning. After all, there are some cases in which both the insertion version and removal version of a rule need to be considered. This makes the cls_match a bit bigger, but makes classifier versioning much simpler to understand. Also, avoid using type larger than int in an enum, as it is not portable C. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-12 16:12:56 -07:00
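Tracking both an insertion version and a removal version makes the visibility test a simple range check. The sketch below is an assumption about how such a check could be written, not the OVS implementation: `toy_rule`, `toy_rule_visible()` and the `TOY_*` constants are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_version_t;

#define TOY_MIN_VERSION ((toy_version_t) 0)
#define TOY_MAX_VERSION (UINT64_MAX - 1)   /* Highest version usable in lookups. */
#define TOY_NOT_REMOVED UINT64_MAX         /* Removal version of a live rule. */

struct toy_rule {
    toy_version_t add_version;      /* First version in which the rule exists. */
    toy_version_t remove_version;   /* First version in which it is gone. */
};

/* A rule is visible in 'version' if it was added at or before that version
 * and has not yet been removed in it. */
static bool
toy_rule_visible(const struct toy_rule *rule, toy_version_t version)
{
    return rule->add_version <= version && version < rule->remove_version;
}
```

With this layout, scheduling a removal only requires lowering `remove_version`; lookups pinned to an older version keep seeing the rule.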
Jarno Rajahalme	3bbe9a1fda	test-classifier: Test versioning features. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-11 15:53:43 -07:00
Jarno Rajahalme	8f8023b3ee	classifier: Make traversing identical rules robust. The traversal of the list of identical rules from the lookup threads is fragile if the list head is removed during the list traversal. This patch simplifies the implementation of that list by making the list a NULL-terminated, singly linked, RCU-protected list. By having the NULL at the end there is no longer a possibility of missing the point when the list wraps around. This is significant when there can be multiple elements with the same priority in the list. This change also decreases the size of the struct cls_match back to its pre-'visibility' attribute size. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-11 15:53:42 -07:00
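The benefit of a NULL-terminated list is that a reader only needs to follow `next` pointers until it sees NULL, so there is no wrap-around point to miss. A minimal sketch, assuming C11 atomics in place of the OVS RCU pointer helpers; `toy_match`, `toy_match_init()` and `toy_count_priority()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_match {
    int priority;
    _Atomic(struct toy_match *) next;   /* NULL marks the end of the list. */
};

/* Initializes 'm' before it is made reachable by any reader. */
static void
toy_match_init(struct toy_match *m, int priority, struct toy_match *next)
{
    m->priority = priority;
    atomic_init(&m->next, next);
}

/* Counts the entries with the given priority.  The loop terminates at the
 * first NULL 'next' pointer, regardless of which node used to be the head. */
static size_t
toy_count_priority(struct toy_match *head, int priority)
{
    size_t n = 0;

    for (struct toy_match *m = head; m;
         m = atomic_load_explicit(&m->next, memory_order_acquire)) {
        if (m->priority == priority) {
            n++;
        }
    }
    return n;
}
```

Memory reclamation is deliberately left out here: in an RCU scheme, a removed node would only be freed after a grace period, so a reader standing on it can still follow its `next` pointer safely.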
Jarno Rajahalme	2b7b1427c1	classifier: Support table versioning This patch allows classifier rules to become visible and invisible in specific versions. A 'version' is defined as a positive monotonically increasing integer, which never wraps around. The new 'visibility' attribute replaces the prior 'to_be_removed' and 'visible' attributes. When versioning is not used, the 'version' parameter should be passed as 'CLS_MIN_VERSION' when creating rules, and 'CLS_MAX_VERSION' when looking up flows. This feature enables the support for atomic OpenFlow bundles without significant performance penalty on 64-bit systems. There is a performance decrease in 32-bit systems due to 64-bit atomics used. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-10 16:17:47 -07:00
Russell Bryant	1636c76112	command-line: add ovs_cmdl_context I started working on a new command line utility that used this shared code. I wanted the ability to pass some data from common initialization code to all of the commands. You can find a similar pattern in ovs-vsctl. This patch updates the command handler to take a new struct, ovs_cmdl_context, instead of argc and argv directly. It includes argc and argv, but also includes an opaque type (void *), where the user of this API can attach its custom data it wants passed along to command handlers. This patch affected the ovstest sub-programs, as well. The patch includes a bit of an odd hack to OVSTEST_REGISTER() to avoid making the main() function of the sub-programs take a ovs_cmdl_context. The test main() functions still receive argc and argv directly, as that seems more natural. The test-subprograms themselves are able to make use of a context internally, though. Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2015-03-17 08:15:57 -07:00
Russell Bryant	5f38375100	command-line: add ovs_cmdl_ prefix The coding style guidelines include the following: - Pick a unique name prefix (ending with an underscore) for each module, and apply that prefix to all of that module's externally visible names. Names of macro parameters, struct and union members, and parameters in function prototypes are not considered externally visible for this purpose. This patch adds the new prefix to the externally visible names. This makes it a bit more obvious what code is coming from common command line handling code. Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2015-03-16 13:42:52 -07:00
Ben Pfaff	18080541d2	classifier: Add support for conjunctive matches. A "conjunctive match" allows higher-level matches in the flow table, such as set membership matches, without causing a cross-product explosion for multidimensional matches. Please refer to the documentation that this commit adds to ovs-ofctl(8) for a better explanation, including an example. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-01-11 13:25:24 -08:00
Jarno Rajahalme	d70e8c28f9	miniflow: Use 64-bit data. So far the compressed flow data in struct miniflow has been in 32-bit words with a 63-bit map, allowing for a maximum size of struct flow of 252 bytes. With the forthcoming Geneve options this is not sufficient any more. This patch solves the problem by changing the miniflow data to 64-bit words, doubling the flow max size to 504 bytes. Since the word size is doubled, there is some loss in compression efficiency. To counter this some of the flow fields have been reordered to keep related fields together (e.g., the source and destination IP addresses share the same 64-bit word). This change should speed up flow data processing on 64-bit CPUs, which may help counterbalance the impact of making the struct flow bigger in the future. Classifier lookup stage boundaries are also changed to 64-bit alignment, as the current algorithm depends on each miniflow word to not be split between ranges. This has resulted in new padding (part of the 'mpls_lse' field). The 'dp_hash' field is also moved to packet metadata to eliminate otherwise needed padding there. This allows the L4 to fit into one 64-bit word, and also makes matches on 'dp_hash' more efficient as misses can be found already on stage 1. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-01-06 14:47:30 -08:00
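The size limits quoted in this commit message follow directly from the map width: each of the 63 map bits stands for one data word, so the largest representable flow is 63 words. A small sketch of that arithmetic, with the hypothetical names `TOY_MAP_BITS` and `toy_max_flow_bytes()`:

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAP_BITS = 63 };   /* One map bit per data word. */

/* Largest flow size, in bytes, that a 63-bit map can describe when each
 * bit covers one word of 'word_size' bytes. */
static size_t
toy_max_flow_bytes(size_t word_size)
{
    return TOY_MAP_BITS * word_size;
}
```

With 32-bit words, `toy_max_flow_bytes(sizeof(uint32_t))` gives 63 × 4 = 252 bytes; with 64-bit words, `toy_max_flow_bytes(sizeof(uint64_t))` gives 63 × 8 = 504 bytes, matching the figures in the message.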
Jarno Rajahalme	802f84ffd7	classifier: Defer pvector publication. This patch adds new functions classifier_defer() and classifier_publish(), which control when the classifier modifications are made available to lookups. By default, all modifications are made available to lookups immediately. Modifications made after a classifier_defer() call MAY be 'deferred' for later 'publication'. A call to classifier_publish() will both publish any deferred modifications, and cause subsequent changes to be published immediately. Currently any deferring is limited to the visibility of the subtable vector changes. pvector now processes modifications mostly in a working copy, which needs to be explicitly published with pvector_publish(). pvector_publish() sorts the working copy and removes gaps before publishing it. This change helps avoid O(n**2) memory behavior in corner cases, where a large number of rules with different masks are inserted or deleted. VMware-BZ: #1322017 Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-14 16:00:46 -08:00
Jarno Rajahalme	fccd7c092e	classifier: Remove internal mutex. Almost all classifier users already exclude concurrent modifications, or are single-threaded, hence the classifier internal mutex can be removed. Due to this change, ovs-router.c and tnl-ports.c need new mutexes, which are added. As noted by Ben in review, ovs_router_flush() should also free the entries it removes from the classifier. It now calls ovsrcu_postpone() to that effect. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-14 15:58:09 -08:00
Jarno Rajahalme	de4ad4a215	classifier: Lockless and robust classifier iteration. Previously, accurate iteration required writers to be excluded during iteration. This patch adds an rculist to struct cls_subtable, and a corresponding list node to struct cls_rule, which makes iteration more straightforward, and allows the iterators to remain ignorant of the internals of the cls_match. This new list allows iteration of rules in the classifier by traversing the RCU-friendly subtables vector, and the rculist of rules in each subtable. Classifier modifications may be performed concurrently, but whether or not the concurrent iterator sees those changes depends on the timing of the change. More specifically, a concurrent iterator: - May or may not see a rule that is being inserted or removed. - Will see either the new or the old version of a rule that is replaced. - Will see all the other rules (that are not being modified). Finally, the subtable's rculist also allows making classifier_rule_overlaps() lockless, which this patch also does. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-14 15:55:44 -08:00
Jarno Rajahalme	f47eef15b7	classifier: Do not insert duplicate rules in indices. There is no point in adding duplicate information into prefix tries. Also, since the lower-priority duplicate rules are not visible to lookups, they do not need to be in staged lookup indices directly either (the head rule is). Finally, now that cmap operations return the number of elements in the cmap, subtable's 'n_rules' member is not needed any more. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-14 15:55:44 -08:00
Jarno Rajahalme	dfea28b3b4	classifier: Constify RCU pointers. Returning const struct cls_rule pointers from the classifier API helps callers to remember that they should not modify the rules returned. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-06 14:55:29 -08:00
Jarno Rajahalme	712b4d2470	test-classifier: Ensure priority is not INT_MIN. Classifier reserves the priority value INT_MIN for its own use. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-10-31 16:28:58 -07:00
Jarno Rajahalme	c501b42702	classifier: Use rculist. The list of identical, but lower priority rules is not currently used in classifier lookup. A later patch introducing conjunctive matches needs to access the list during lookups, so we must make the list RCU. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-10-31 16:22:23 -07:00
Ben Pfaff	eb391b76af	classifier: Change type used for priorities from 'unsigned int' to 'int'. OpenFlow has priorities in the 16-bit unsigned range, from 0 to 65535. In the classifier, it is sometimes useful to be able to have values below and above this range. With the 'unsigned int' type used for priorities until now, there were no values below the range, so some code worked around it by converting priorities to 64-bit signed integers. This didn't seem so great to me given that a plain 'int' also had the needed range. This commit therefore changes the type used for priorities to int. The interesting parts of this change are in pvector.h and classifier.c, where one can see the elimination of the use of int64_t. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-10-30 17:42:58 -07:00
Jarno Rajahalme	3f636c7e22	ovs_assert, tests: Support NDEBUG. ./configure accepts --enable-ndebug option. Make ovs_assert() honor it, and make sure all test programs disable it. The order of include files in test programs is also made uniform: 1. #include <config.h> 2. #undef NDEBUG 3. Include file of the test subject (to make sure it itself has sufficient include directives). 4. System includes in alphabetical order. 5. OVS includes in alphabetical order. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-10-30 09:14:46 -07:00
Jarno Rajahalme	42767cce72	tests/test-classifier: Properly use ovsrcu_postpone. Following patches add stricter checks of RCU memory management of rules removed from a classifier. This patch properly postpones freeing of 'struct cls_rule's that have been removed from a classifier. Also remove all the rules from classifier before destructing it in test_rule_replacement(). Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-10-29 09:59:57 -07:00
Alex Wang	451de37e7f	command-line: Add function to print command usage. This commit adds a new variable in 'struct command' for recording the command usage. Also, a new function is added to print the usage given the array of defined commands. A later patch will use the output in a bash command-line completion script. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-10-28 18:43:11 -07:00
Jarno Rajahalme	38c449e0c5	lib/classifier: Add lib/classifier-private.h. tests/test-classifier.c used to include lib/classifier.c to gain access to the internal data structures and some utility functions. This was confusing, so this patch splits the relevant groups of classifier internal definitions to a new file (lib/classifier-private.h), which is included by both lib/classifier.c and tests/test-classifier.c. Other use of the new file is discouraged. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-10-24 13:22:24 -07:00
Wang Sheng-Hui	3bd0fd39eb	Use magic ETH_ADDR_LEN instead of 6 for Ethernet address length. ETH_ADDR_LEN is defined in lib/packets.h, valued 6. Use this macro instead of magic number 6 to represent the length of eth mac address. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-10-22 08:46:52 -07:00
Ben Pfaff	78c8df129c	cmap, classifier: Avoid unsafe aliasing in iterators. CMAP_FOR_EACH and CLS_FOR_EACH and their variants tried to use void ** as a "pointer to any kind of pointer". That is a violation of the aliasing rules in ISO C which technically yields undefined behavior. With GCC 4.1, it causes both warnings and actual misbehavior. One option would be to add -fno-strict-aliasing to the compiler flags, but that would only help with GCC; who knows whether this can be worked around with other compilers. Instead, this commit rewrites the iterators to avoid disallowed pointer aliasing. VMware-BZ: #1287651 Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-07-21 21:00:04 -07:00
Jarno Rajahalme	e48eccd133	lib/classifier: Unify struct classifier and cls_classifier. Now that it is clear that struct cls_classifier itself does not need RCU indirection and pvector is defined in its own header, it is possible to get rid of the indirection from struct classifier to struct cls_classifier. Suggested-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-07-18 02:24:26 -07:00

137 Commits