Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets are the same. However,
as additional information is added, this is not necessarily desirable,
as in the case of pointers.
This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.
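A minimal sketch of the shape of this change (the wrapper name is an
assumption here, not taken from the text above):
struct ovs_tunnel_info {
	struct ovs_key_ipv4_tunnel tunnel;
	/* Future members, e.g. pointers to per-tunnel options, can be
	 * added here without becoming part of the matched flow key. */
};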
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
As new protocols are added, the size of the flow key tends to
increase although few protocols care about all of the fields. In
order to optimize this for hashing and matching, OVS uses a variable
length portion of the key. However, when fields are extracted from
the packet we must still zero out the entire key.
This is no longer necessary now that OVS implements masking. Any
fields (or holes in the structure) which are not part of a given
protocol will be by definition not part of the mask and zeroed out
during lookup. Furthermore, since masking already uses variable
length keys this zeroing operation automatically benefits as well.
In principle, the only thing that needs to be done at this point
is to remove the memset() at the beginning of flow extraction. However,
some fields assume that they are initialized to zero, which must now be
done explicitly. In addition, in the event of an error we must also
zero out the corresponding fields to signal that there is no valid data
present. These changes increase the total amount of code but very little
of it is executed in non-error situations.
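As a rough illustration of the error-path zeroing (helper and field names
are assumptions based on the description above):
	if (unlikely(!tcphdr_ok(skb))) {
		/* No valid TCP header: clear the transport fields
		 * explicitly now that the key is no longer pre-zeroed. */
		memset(&key->tp, 0, sizeof(key->tp));
	} else {
		const struct tcphdr *tcp = tcp_hdr(skb);

		key->tp.src = tcp->source;
		key->tp.dst = tcp->dest;
		key->tp.flags = TCP_FLAGS_BE16(tcp);
	}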
Removing the memset() reduces the profile of ovs_flow_extract()
from 0.64% to 0.56% when tested with large packets on a 10G link.
Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').
This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
Bug #1219516.
Reported-by: Len Gao <leng@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
In ovs_flow_stats_get(), using ovsl_dereference() was wrong, since
flow dumps call it with only the RCU read lock held.
ovs_flow_stats_clear() is always called with ovs_mutex, so can use
ovsl_dereference().
Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Remove unnecessary locking from functions that are always called with
appropriate locking.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Minimize padding in sw_flow_key and move 'tp' into the main struct.
These changes simplify code when accessing the transport port numbers
and the TCP flags, and make sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes). These changes also make the keys for IPv4
packets fit in one cache line.
There is a valid concern about the safety of packing struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 *, which could result in unaligned access
on some systems. However:
- sw_flow_key itself is 64-bit aligned, so the tun_id within is always
64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force every
second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key,
it is on the stack (in tunnel input functions), where the compiler has
full control of the alignment.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
We already extract the TCP flags for the key, might as well use that
for stats.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The kernel starts out its "jiffies" timer as 5 minutes below zero, as
shown in include/linux/jiffies.h:
/*
 * Have the 32 bit jiffies value wrap 5 minutes after boot
 * so jiffies wrap bugs show up earlier.
 */
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
The loop in ovs_flow_stats_get() starts out with 'used' set to 0, then
takes any "later" time. This means that for the first five minutes after
boot, flows will always be reported as never used, since in jiffies terms
0 compares as later than any time already seen.
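A sketch of the comparison involved (the surrounding loop is an assumption;
only the guard for the initial value is the point):
		/* Before: with 'used' starting at 0, time_after() treats 0
		 * as later than the pre-wrap jiffies values seen during the
		 * first five minutes, so 'used' is never updated. */
		if (time_after(stats->used, *used))
			*used = stats->used;

		/* After: treat 0 as "not set yet" rather than as a time. */
		if (!*used || time_after(stats->used, *used))
			*used = stats->used;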
Bug #1192516.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Keep kernel flow stats for each NUMA node rather than each (logical)
CPU. This avoids using the per-CPU allocator, removes most of the
kernel-side OVS locking overhead otherwise at the top of perf reports,
and allows OVS to scale better with a higher number of threads.
With 9 handlers and 4 revalidators, the netperf TCP_CRR flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master. Tested with a
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace. The IP
addresses are still wildcarded, so the kernel flows are not considered
exact-match 5-tuple flows. This type of flow can be expected to
appear in large numbers as a result of the more effective wildcarding
made possible by improvements in the OVS userspace flow classifier.
Perf results for this test (master):
Events: 305K cycles
+ 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
+ 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.03% swapper [kernel.kallsyms] [k] intel_idle
+ 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
+ 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
+ 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
+ 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
...
And after this patch:
Events: 356K cycles
+ 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
+ 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 1.47% swapper [kernel.kallsyms] [k] intel_idle
+ 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
+ 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
+ 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
+ 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
+ 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
+ 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
...
There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in netperf TCP_CRR test performance
(maybe a ~1% performance drop, hard to tell exactly due to variance in
the test results) when testing kernel module throughput (with no
userspace activity and a handful of kernel flows).
On flow setup, a single stats instance is allocated (for NUMA node
0). As CPUs from multiple NUMA nodes start updating stats, new
node-specific stats instances are allocated. This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency. If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA node are updating the preallocated
stats instance, no additional stats instances are allocated. This
eliminates the need to pre-allocate stats instances that will not be
used, and also relieves the stats reader from the burden of reading stats
that are never used.
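A condensed sketch of that allocation policy (identifiers follow common
kernel idioms and are assumptions here, not quotes from the patch):
	int node = numa_node_id();
	struct flow_stats *stats = rcu_dereference(flow->stats[node]);

	if (likely(stats)) {
		spin_lock(&stats->lock);
	} else {
		/* Fall back to the preallocated node-0 instance. */
		stats = rcu_dereference(flow->stats[0]);
		spin_lock(&stats->lock);

		if (unlikely(flow->stats_last_writer != node)) {
			/* A second NUMA node is now writing: try to allocate
			 * a node-local instance, but never block and never
			 * dip into emergency reserves. */
			struct flow_stats *new_stats;

			new_stats = kmem_cache_alloc_node(flow_stats_cache,
							  GFP_THISNODE |
							  __GFP_NOMEMALLOC,
							  node);
			if (likely(new_stats)) {
				new_stats->used = jiffies;
				new_stats->packet_count = 1;
				new_stats->byte_count = skb->len;
				new_stats->tcp_flags = tcp_flags;
				spin_lock_init(&new_stats->lock);
				rcu_assign_pointer(flow->stats[node], new_stats);
				goto unlock;	/* accounted in the new instance */
			}
			flow->stats_last_writer = node;
		}
	}
	stats->used = jiffies;
	stats->packet_count++;
	stats->byte_count += skb->len;
	stats->tcp_flags |= tcp_flags;
unlock:
	spin_unlock(&stats->lock);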
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch. Remove it first to make the changes easier to
grasp.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Only the first IP fragment can have a TCP header; check for this.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Use the percpu allocator for stats due to objections to the stats array.
However, the percpu allocator is not designed for high-churn allocation/
deallocation, so we need to avoid allocating percpu stats for
short-lived flows. One cheap way to detect such flows is to check
whether the 5-tuple fields used in RSS are masked or not: if any one of
them is masked, the flow is likely shared across CPUs, where percpu
stats should be more scalable, and such a flow should also be relatively
long lived.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit numbers
written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in flags must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
The TCP protocol currently defines 9 flag bits, and an additional 3
bits are reserved (must be transmitted as zero); see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Window Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
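For example, under this encoding tcp_flags=0x002/0x012 matches packets
that have SYN set and ACK clear, ignoring all other flag bits.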
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
With the mega flow implementation an OVS flow can be shared between
multiple CPUs, which makes stats updates a highly contended
operation. The following patch allocates separate stats for each
CPU to make stats updates scalable.
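A sketch of the resulting layout (field names are assumptions taken from
the description above):
struct flow_stats {
	u64 packet_count;	/* Number of packets matched. */
	u64 byte_count;		/* Number of bytes matched. */
	unsigned long used;	/* Last used time (in jiffies). */
	spinlock_t lock;	/* Lock protecting this instance. */
	__be16 tcp_flags;	/* Union of seen TCP flags. */
};
/* In struct sw_flow, a per-CPU pointer replaces the single shared
 * instance:
 *	struct flow_stats __percpu *stats;
 */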
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Over time, datapath.c and flow.c have become pretty large files.
The following patch restructures the functionality into three
different components:
flow.c: contains flow extract.
flow_netlink.c: netlink flow API.
flow_table.c: flow table API.
The diffstat shows a misleading count; this patch mostly restructures code
without changing logic.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
In function __parse_flow_nlattrs(), we check for the condition
(type > OVS_KEY_ATTR_MAX) and, if true, print an error, but we do
not return from this function as in the other checks. It seems this
has been forgotten; otherwise, we could access beyond the end of
ovs_key_lens, which has OVS_KEY_ATTR_MAX + 1 elements.
Hence, a maliciously prepared nla_type from user space could access
beyond this upper limit.
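A sketch of the missing bail-out (the error-reporting macro mirrors what
the file already uses; treat the exact line as an assumption):
		if (type > OVS_KEY_ATTR_MAX) {
			OVS_NLERR("Key type %d is out of range max %d\n",
				  type, OVS_KEY_ATTR_MAX);
			return -EINVAL;
		}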
Introduced by 03f0d916a ("openvswitch: Mega flow implementation").
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparisons in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.
CC: Andy Zhou <azhou@nicira.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.
This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, the original patch only optimizes for architectures that
support unaligned machine word access. This patch optimizes for all
architectures.
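A minimal sketch of a word-sized key comparison under those constraints
(function and argument names are assumptions):
static bool cmp_key(const struct sw_flow_key *key1,
		    const struct sw_flow_key *key2,
		    int key_start, int key_end)
{
	const long *cp1 = (const long *)((const u8 *)key1 + key_start);
	const long *cp2 = (const long *)((const u8 *)key2 + key_start);
	long diffs = 0;
	int i;

	/* Both keys are long-aligned and the boundaries are rounded to
	 * long-sized chunks, so no byte-wise tail handling is needed. */
	for (i = key_start; i < key_end; i += sizeof(long))
		diffs |= *cp1++ ^ *cp2++;

	return diffs == 0;
}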
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Only parse the encap key field if eth_type is 802.1Q and the
VLAN_TAG_PRESENT bit is set. Add a few more error checks and logs.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
'key_end' is a better name for describing the ending boundary than
'key_len'. Rename those variables to make it less confusing.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.
Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
A bit shift operation is using the value '11' instead of '1' as the
starting value. This only makes validation weaker than it should be,
so unless userspace is trying to install an invalid flow there will
be no effect.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
The external symbols in the OVS kernel module are prefixed with
'ovs_' with the exception of ipv4_tun_to/from_nlattr(). This adds
the prefix and makes the out of tree version consistent with
upstream.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
A tunnel value attribute is not allowed to have an empty IP destination
address, but this is legal for masks. This drops both the checks when
serializing masks and the sanity checks on them.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
The intention is clearer than if we rederive it in every location.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
When key.eth_type is absent it is interpreted to be 802.2, which is
represented by a special value. In order to prevent inadvertent matches
on this opaque value, the mask is forced to be either fully wildcarded
or fully exact.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Pre mega flow, netlink allowed the in_port key attribute
to be missing. A missing in_port is interpreted as DP_MAX_PORTS.
For backward compatibility, the mega flow implementation will always allow
the mask of in_port to be specified, as if the in_port key attribute
were always specified.
To prevent accidental matches on DP_MAX_PORTS, whose value is opaque to
user space, we always force the mask to be an exact match,
regardless of the value supplied in the netlink message. A missing
in_port mask continues to mean a wildcarded match, the same as for other masks.
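Roughly, the netlink parsing then looks like this (macro and field names
are assumptions following the megaflow code's conventions):
	if (attrs & (1ULL << OVS_KEY_ATTR_IN_PORT)) {
		u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]);

		if (is_mask)
			in_port = 0xffffffff;	/* Always exact match in_port. */
		else if (in_port >= DP_MAX_PORTS)
			return -EINVAL;

		SW_FLOW_KEY_PUT(match, phy.in_port, in_port, is_mask);
	} else if (!is_mask) {
		SW_FLOW_KEY_PUT(match, phy.in_port, DP_MAX_PORTS, is_mask);
	}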
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
A netlink message usually only accepts a mask when there is a
corresponding key attribute. The tunnel mask and eth_type are the
only two exceptions so far.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Handling of missing attributes in netlink can be tricky and turns out
to be error prone. The value (savings in netlink bandwidth) does not
seem to be significant enough to justify allowing them. This patch
series makes both the kernel and userspace always export the priority and
skb_mark attributes. There will be follow-on patches in the
direction of making all attributes explicit.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
A missing VLAN netlink attribute should be interpreted as an exact match
on no VLAN tag, instead of a wildcarded match on all VLAN tags.
Bug #18736.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This bug will cause mask values to corrupt the flow key value. So far
the bug has not shown up because we don't write a mask value when
there are no mask netlink attributes. However, it needs to be fixed for
the next and future commits, where we will start to set default
values for the key and mask for missing netlink attributes.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Flow table destroy is done in RCU callback context. Therefore
there is no need to use the RCU variant of hlist_del().
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
A flex array is used to allocate the hash buckets, which are of type
struct hlist_head, but we use `struct hlist_head *` to calculate
the array element size. Since hlist_head is the size of a pointer, it
happens to work fine. The following patch uses the correct type.
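The change itself is small; roughly (the allocator call is the existing
one, the surrounding variable names are assumptions):
	/* Element size must be that of the bucket head itself, not of a
	 * pointer to it; the two only happen to coincide today. */
	buckets = flex_array_alloc(sizeof(struct hlist_head),
				   n_buckets, GFP_KERNEL);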
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
After a mask is assigned to a flow, it will not change for the life of
the flow. Since flow access is protected by RCU lock, access to
flow->mask after getting a flow is always safe.
Suggested-by: Jesse Gross <jesse@nicira.com>
Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
A mega flow matches when the masked key matches and the mask applied
is the same as the mask used to create the mega flow.
This patch adds the implementation of the second match condition
mentioned above. Without this fix, mega flow lookup may result in a
false match.
Bug #18584
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
It is important to validate flow actions to ensure that they do
not try to write off the end of a packet. The mechanism to do this
is to ensure that a flow is precise enough to describe valid vs.
invalid packets and to only allow actions on valid flows.
The introduction of megaflows broke this by using a narrow base
flow but a potentially wide match. This meant that while the
original flow was properly validated, later packets might not
conform to that flow and could be truncated. This switches to
using the masked flow instead, effectively requiring that all
possible matching packets be valid in order for a flow's actions
to be accepted.
This change only affects the flow setup path - executed packets
have always used the flow extracted from the packet and therefore
were properly validated.
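A sketch of the resulting setup-path logic (function names are
assumptions; the use of the masked key is the point here):
	/* Validate actions against every packet the megaflow can match,
	 * not just the packet that triggered the flow setup. */
	ovs_flow_mask_key(&masked_key, &key, &mask);
	error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
					  &masked_key, 0, &acts);
	if (error)
		goto error;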
Signed-off-by: Jesse Gross <jesse@nicira.com>
There is no default value for the tunnel TTL, so it must be
specified when setting up a new flow. However, the flow rejection
log message indicates that the TTL must be non-zero, which is not
true.
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
There is no default value for the tunnel TTL so it must always be
included in flow keys sent from userspace to kernel. The kernel
should also respect this convention when sending flows to userspace
by always including the TTL in tunnel flows.
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
To indicate that a flow is not associated with any particular in port,
userspace may omit the IN_PORT attribute, which the kernel translates
internally to the special value DP_MAX_PORTS. After the megaflows
changes, this was no longer being done, resulting in it using port 0
(the internal port).
This also adopts a wildcarding scheme similar to that for 802.2 packets,
where a mask can be specified for this non-existent key attribute but
must be either completely wildcarded or a completely exact match.
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
When the kernel rejects a netlink message, it usually returns the EINVAL
error code to userspace. The actual reason for rejecting the
netlink message is not available, making it harder to debug netlink
issues. This patch adds kernel log messages, with reasons, whenever a
netlink message is rejected. The messages are logged at the info level,
and only once per message, to keep the kernel log noise
level down. Reload the kernel module to re-enable already logged
messages.
The messages are meant to help developers debug userspace and kernel
integration issues. The actual messages may change or be removed over time.
These messages are not expected to show up in a production environment.
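A sketch of the logging helper this implies (the exact macro is an
assumption; pr_info_once() resets its guard when the module is reloaded,
matching the behaviour described above):
#define OVS_NLERR(fmt, ...) \
	pr_info_once("netlink: " fmt, ##__VA_ARGS__)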
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>