openvswitch

mirror of https://github.com/openvswitch/ovs synced 2025-10-19 14:37:21 +00:00

Author	SHA1	Message	Date
Joe Stringer	c05e20946d	datapath: Backport conntrack fixes. Backport the following fixes for conntrack from upstream. 9723e6abc70a openswitch: fix typo CONFIG_NF_CONNTRACK_LABEL 0d5cdef8d5dd openvswitch: Fix conntrack compilation without mark. 982b52700482 openvswitch: Fix mask generation for nested attributes. cc5706056baa openvswitch: Fix IPv6 exthdr handling with ct helpers. 33db4125ec74 openvswitch: Rename LABEL->LABELS b8f2257069f1 openvswitch: Fix skb leak in ovs_fragment() ec0d043d05e6 openvswitch: Ensure flow is valid before executing ct 6f225952461b openvswitch: Reject ct_state unsupported bits fbccce5965a5 openvswitch: Extend ct_state match field to 32 bits ab38a7b5a449 openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT 9e384715e9e7 openvswitch: Reject ct_state masks for unknown bits 4f0909ee3d8e openvswitch: Mark connections new when not confirmed. e754ec69ab69 openvswitch: Serialize nested ct actions if provided 74c16618137f openvswitch: Fix double-free on ip_defrag() errors 6f5cadee44d8 openvswitch: Fix skb leak using IPv6 defrag Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-12-03 17:17:25 -08:00
Joe Stringer	038e34abaa	datapath: Allow matching on conntrack label Allow matching and setting the ct_label field. As with ct_mark, this is populated by executing the CT action. The label field may be modified by specifying a label and mask nested under the CT action. It is stored as metadata attached to the connection. Label modification occurs after lookup, and will only persist when the conntrack entry is committed by providing the COMMIT flag to the CT action. Labels are currently fixed to 128 bits in size. Upstream: c2ac667 "openvswitch: Allow matching on conntrack label" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-12-03 17:17:25 -08:00
Joe Stringer	372ce9737d	datapath: Allow matching on conntrack mark Allow matching and setting the ct_mark field. As with ct_state and ct_zone, these fields are populated when the CT action is executed. To write to this field, a value and mask can be specified as a nested attribute under the CT action. This data is stored with the conntrack entry, and is executed after the lookup occurs for the CT action. The conntrack entry itself must be committed using the COMMIT flag in the CT action flags for this change to persist. Upstream: 182e304 "openvswitch: Allow matching on conntrack mark" Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-12-03 17:17:25 -08:00
Joe Stringer	a94ebc3999	datapath: Add conntrack action Expose the kernel connection tracker via OVS. Userspace components can make use of the CT action to populate the connection state (ct_state) field for a flow. This state can be subsequently matched. Exposed connection states are OVS_CS_F_*: - NEW (0x01) - Beginning of a new connection. - ESTABLISHED (0x02) - Part of an existing connection. - RELATED (0x04) - Related to an established connection. - INVALID (0x20) - Could not track the connection for this packet. - REPLY_DIR (0x40) - This packet is in the reply direction for the flow. - TRACKED (0x80) - This packet has been sent through conntrack. When the CT action is executed by itself, it will send the packet through the connection tracker and populate the ct_state field with one or more of the connection state flags above. The CT action will always set the TRACKED bit. When the COMMIT flag is passed to the conntrack action, this specifies that information about the connection should be stored. This allows subsequent packets for the same (or related) connections to be correlated with this connection. Sending subsequent packets for the connection through conntrack allows the connection tracker to consider the packets as ESTABLISHED, RELATED, and/or REPLY_DIR. The CT action may optionally take a zone to track the flow within. This allows connections with the same 5-tuple to be kept logically separate from connections in other zones. If the zone is specified, then the "ct_zone" match field will be subsequently populated with the zone id. IP fragments are handled by transparently assembling them as part of the CT action. The maximum received unit (MRU) size is tracked so that refragmentation can occur during output. IP frag handling contributed by Andy Zhou. Based on original design by Justin Pettit. Upstream: 7f8a436 "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-12-03 17:17:25 -08:00
Joe Stringer	c3bb15b38a	datapath: Serialize acts with original netlink len Previously, we used the kernel-internal netlink actions length to calculate the size of messages to serialize back to userspace. However,the sw_flow_actions may not be formatted exactly the same as the actions on the wire, so store the original actions length when de-serializing and re-use the original length when serializing. Upstream: 8e2fed1 "openvswitch: Serialize acts with original netlink len" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-12-03 17:17:25 -08:00
Pravin B Shelar	e23775f20e	datapath: Add support for lwtunnel Following patch adds support for lwtunnel to OVS datapath. With this change OVS datapath detect lwtunnel support and make use of new APIs if available. On older kernel where the support is not there the backported tunnel modules are used. These backported tunnel devices acts as lwtunnel devices. I tried to keep backported module same as upstream for easier bug-fix backport. Since STT and LISP are not upstream OVS always needs to use respective modules from tunnel compat layer. To make it work on kernel 4.3 I have converted STT and LISP modules to lwtunnel API model. lwtunnel make use of skb-dst to pass tunnel information to the tunnel module. On older kernel this is not possible. So the in case of old kernel metadata ref is stored in OVS_CB and direct call to tunnel transmit function is made by respective tunnel vport modules. Similarly on receive side tunnel recv directly call netdev-vport-receive to pass the skb to OVS. Major backported components include: Geneve, GRE, VXLAN, ip_tunnel, udp-tunnels GRO. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2015-12-03 16:30:21 -08:00
Joe Stringer	a0fb56c1b2	datapath: Whitespace fixes. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2015-07-30 16:42:07 -07:00
Joe Stringer	bc619e29df	datapath: Add support for unique flow IDs. Previously, flows were manipulated by userspace specifying a full, unmasked flow key. This adds significant burden onto flow serialization/deserialization, particularly when dumping flows. This patch adds an alternative way to refer to flows using a variable-length "unique flow identifier" (UFID). At flow setup time, userspace may specify a UFID for a flow, which is stored with the flow and inserted into a separate table for lookup, in addition to the standard flow table. Flows created using a UFID must be fetched or deleted using the UFID. All flow dump operations may now be made more terse with OVS_UFID_F_* flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to omit the flow key from a datapath operation if the flow has a corresponding UFID. This significantly reduces the time spent assembling and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags enabled, the datapath only returns the UFID and statistics for each flow during flow dump, increasing ovs-vswitchd revalidator performance by 40% or more. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 10:58:37 -08:00
Thomas Graf	4b1632249f	datapath: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() Backport of upstream commit: openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() Also factors out Geneve validation code into a new separate function validate_and_copy_geneve_opts(). A subsequent patch will introduce VXLAN options. Rename the existing GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic tunnel metadata options. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream: d91641d ("openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()") Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-02-03 21:56:51 +01:00
Pravin B Shelar	7d16c8478e	datapath: fix coding style. Kernel datapath code has diverged from upstream code. This makes porting patches between these two code bases harder than it needs to be. Following patch fixes this by fixing coding style issues on this branch. Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-09 20:03:33 -08:00
Jarno Rajahalme	9233cef706	datapath: Add support for OVS_FLOW_ATTR_PROBE. This new flag is useful for suppressing error logging while probing for datapath features using flow commands. For backwards compatibility reasons the commands are executed normally, but error logging is suppressed. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-10-03 13:31:07 -07:00
Thomas Graf	f1f60b8583	datapath: Constify various function arguments Help produce better optimized code. Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-23 14:47:58 -07:00
Wenyu Zhang	8b7ea2d480	Extend OVS IPFIX exporter to export tunnel headers Extend IPFIX exporter to export tunnel headers when both input and output of the port. Add three other_config options in IPFIX table: enable-input-sampling, enable-output-sampling and enable-tunnel-sampling, to control whether sampling tunnel info, on which direction (input or output). Insert sampling action before output action and the output tunnel port is sent to datapath in the sampling action. Make datapath collect output tunnel info and send it back to userpace in upcall message with a new additional optional attribute. Add a tunnel ports map to make the tunnel port lookup faster in sampling upcalls in IPFIX exporter. Make the IPFIX exporter generate IPFIX template sets with enterprise elements for the tunnel info, save the tunnel info in IPFIX cache entries, and send IPFIX DATA with tunnel info. Add flowDirection element in IPFIX templates. Signed-off-by: Wenyu Zhang <wenyuz@vmware.com> Acked-by: Romain Lenglet <rlenglet@vmware.com> Acked-by: Ben Pfaff <blp@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-18 01:01:10 -07:00
Andy Zhou	e0a42ae29a	datapath: Update flow key before recirc When flow key becomes invalid due to push or pop actions, current implementation leaves it as invalid, only rebuild the flow key used for recirculation. This works, but is less efficient in case of multiple recirc actions. Each recirc action will have to re-extract its own flow keys. This patch update the original flow key as soon as the first recirc action is encountered, avoiding expensive flow extract call for any future recirc actions as long as the flow key remains valid. Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-12 10:35:08 -07:00
Pravin B Shelar	fb66fbd15b	datapath: Use tun_info only for egress tunnel path. Currently tun_info is used for passing tunnel information on ingress and egress path, this cause confusion. Following patch removes its use on ingress path make it egress only parameter. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-08-06 22:12:25 -07:00
Pravin B Shelar	7f45215aec	datapath: Avoid using wrong metadata for recic action. Recirc action needs to extract flow key from packet, it uses tun_info from OVS_CB for setting tunnel meta data in flow key. But tun_info can be overwritten by tunnel send action. This would result in wrong flow key for the recirculation. Following patch copies flow-key meta data from OVS_CB packet key itself thus avoids this bug. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-08-06 18:03:19 -07:00
Pravin B Shelar	c135bba1a9	datapath: refactor ovs flow extract API. OVS flow extract is called on packet receive or packet execute code path. Following patch defines separate API for extracting flow-key in packet execute code path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2014-08-06 18:03:17 -07:00
Simon Horman	ccf4378615	datapath: Add basic MPLS support to kernel Allow datapath to recognize and extract MPLS labels into flow keys and execute actions which push, pop, and set labels on packets. Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer. Cc: Ravi K <rkerur@gmail.com> Cc: Leo Alterman <lalterman@nicira.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-06-24 16:02:02 -07:00
Jesse Gross	c1fc1411d2	datapath: Add support for Geneve tunneling. This adds support for Geneve - Generic Network Virtualization Encapsulation. The protocol is documented at http://tools.ietf.org/html/draft-gross-geneve-00 The kernel implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to setup flows on this array. Userspace currently implements only support for basic version of Geneve. It can work with the base header (including the VNI) and is capable of parsing options but does not currently support any particular option definitions. Over time, the intention is to allow options to be matched through OpenFlow without requiring explicit support in OVS userspace. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-20 15:19:35 -07:00
Jesse Gross	f0cd669f19	datapath: Wrap struct ovs_key_ipv4_tunnel in a new structure. Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets is the same. However, as additional information is added this is not necessarily desirable, as in the case of pointers. This adds a new structure for tunnel metadata which currently contains only the existing struct. This change is purely internal to the kernel since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version of OVS_KEY_ATTR_TUNNEL that is translated at flow setup. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-19 18:33:28 -07:00
Pravin B Shelar	d49fc3ff53	datapath: Convert mask list in mask array. mask caches index of mask in mask_list. On packet recv OVS need to traverse mask-list to get cached mask. Therefore array is better for retrieving cached mask. This also allows better cache replacement algorithm by directly checking mask's existence. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com>	2014-04-29 10:43:50 -07:00
Andy Zhou	a605908001	datapath: add recirc action Recirculation implementation for Linux kernel data path. Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2014-04-21 11:06:55 -07:00
Andy Zhou	7804df205f	datapath: add hash action Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2014-04-21 11:06:55 -07:00
Ben Pfaff	a5b8d49bc6	datapath: Fix tracking of flags seen in TCP flows. Flow statistics need to take into account the TCP flags from the packet currently being processed (in 'key'), not the TCP flags matched by the flow found in the kernel flow table (in 'flow'). This bug made the Open vSwitch userspace fin_timeout action have no effect in many cases. Bug #1219516. Reported-by: Len Gao <leng@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2014-04-08 15:39:18 -07:00
Jarno Rajahalme	4bb90bea0c	datapath/flow: Fix ovs_flow_stats_get/clear RCU dereference. For ovs_flow_stats_get() using ovsl_dereference() was wrong, since flow dumps call this with RCU read lock. ovs_flow_stats_clear() is always called with ovs_mutex, so can use ovsl_dereference(). Also, make the ovs_flow_stats_get() 'flow' argument const to make later patches cleaner. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-04-02 11:14:58 -07:00
Jarno Rajahalme	708fb4c50a	datapath: Compact sw_flow_key. Minimize padding in sw_flow_key and move 'tp' top the main struct. These changes simplify code when accessing the transport port numbers and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit systems (128->120 bytes). These changes also make the keys for IPv4 packets to fit in one cache line. There is a valid concern for safety of packing the struct ovs_key_ipv4_tunnel, as it would be possible to take the address of the tun_id member as a __be64 * which could result in unaligned access in some systems. However: - sw_flow_key itself is 64-bit aligned, so the tun_id within is always 64-bit aligned. - We never make arrays of ovs_key_ipv4_tunnel (which would force every second tun_key to be misaligned). - We never take the address of the tun_id in to a __be64 *. - Whereever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key, it is in stack (on tunnel input functions), where compiler has full control of the alignment. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-03-24 10:45:47 -07:00
Jarno Rajahalme	9ac56358de	datapath: Per NUMA node flow stats. Keep kernel flow stats for each NUMA node rather than each (logical) CPU. This avoids using the per-CPU allocator and removes most of the kernel-side OVS locking overhead otherwise on the top of perf reports and allows OVS to scale better with higher number of threads. With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup rate doubles on a server with two hyper-threaded physical CPUs (16 logical cores each) compared to the current OVS master. Tested with non-trivial flow table with a TCP port match rule forcing all new connections with unique port numbers to OVS userspace. The IP addresses are still wildcarded, so the kernel flows are not considered as exact match 5-tuple flows. This type of flows can be expected to appear in large numbers as the result of more effective wildcarding made possible by improvements in OVS userspace flow classifier. Perf results for this test (master): Events: 305K cycles + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range + 2.03% swapper [kernel.kallsyms] [k] intel_idle + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6 + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock ... And after this patch: Events: 356K cycles + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner + 1.47% swapper [kernel.kallsyms] [k] intel_idle + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock ... There is a small increase in kernel spinlock overhead due to the same spinlock being shared between multiple cores of the same physical CPU, but that is barely visible in the netperf TCP_CRR test performance (maybe ~1% performance drop, hard to tell exactly due to variance in the test results), when testing for kernel module throughput (with no userspace activity, handful of kernel flows). On flow setup, a single stats instance is allocated (for the NUMA node 0). As CPUs from multiple NUMA nodes start updating stats, new NUMA-node specific stats instances are allocated. This allocation on the packet processing code path is made to never block or look for emergency memory pools, minimizing the allocation latency. If the allocation fails, the existing preallocated stats instance is used. Also, if only CPUs from one NUMA-node are updating the preallocated stats instance, no additional stats instances are allocated. This eliminates the need to pre-allocate stats instances that will not be used, also relieving the stats reader from the burden of reading stats that are never used. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-02-18 09:56:55 -08:00
Jarno Rajahalme	df65fec117	datapath: Remove 5-tuple optimization. The 5-tuple optimization becomes unnecessary with a later per-NUMA node stats patch. Remove it first to make the changes easier to grasp. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-18 09:07:03 -08:00
Pravin B Shelar	b0f3a2feef	datapath: Use percpu allocator for flow-stats. Use percpu allocator for stats due to objection to stats array. But percpu allocator is not designed for high churn allocation/ deallcation. so we need to avoid allocating percpu flow for short lived flows. One cheaper way to detect flow is by checking if 5-tuple used in RSS are masked or not. if any one of them is masked, flow is likely shared across CPU where percpu stat should be more scalable. And that flow should be relatively long lived flow. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-12-03 08:57:56 -08:00
Ben Pfaff	8c19b83f86	datapath: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit). We won't normally have a ton of flow masks but using a size_t to store values no bigger than sizeof(struct sw_flow_key) seems excessive. This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but sw_flow_mask only by 8 bytes due to padding. Compile tested only. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2013-11-18 14:59:26 -08:00
Jarno Rajahalme	dc235f7fbc	TCP flags matching support. tcp_flags=flags/mask Bitwise match on TCP flags. The flags and mask are 16-bit num‐ bers written in decimal or in hexadecimal prefixed by 0x. Each 1-bit in mask requires that the corresponding bit in port must match. Each 0-bit in mask causes the corresponding bit to be ignored. TCP protocol currently defines 9 flag bits, and additional 3 bits are reserved (must be transmitted as zero), see RFCs 793, 3168, and 3540. The flag bits are, numbering from the least significant bit: 0: FIN No more data from sender. 1: SYN Synchronize sequence numbers. 2: RST Reset the connection. 3: PSH Push function. 4: ACK Acknowledgement field significant. 5: URG Urgent pointer field significant. 6: ECE ECN Echo. 7: CWR Congestion Windows Reduced. 8: NS Nonce Sum. 9-11: Reserved. 12-15: Not matchable, must be zero. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-10-29 09:43:59 -07:00
Jarno Rajahalme	a66733a8bc	Widen TCP flags handling. Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t). The kernel interface remains at 8 bits, which makes no functional difference now, as none of the higher bits is currently of interest to the userspace. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-10-29 09:40:19 -07:00
Pravin B Shelar	b0b906ccf4	datapath: Per cpu flow stats. With mega flow implementation ovs flow can be shared between multiple CPUs which makes stats updates highly contended operation. Following patch allocates separate stats for each CPU to make stats update scalable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-10-21 08:42:20 -07:00
Pravin B Shelar	a097c0b230	datapath: Restructure datapath.c and flow.c Over the time datapath.c and flow.c has became pretty large files. Following patch restructures functionality of component into three different components: flow.c: contains flow extract. flow_netlink.c: netlink flow api. flow_table.c: flow table api. Diffstat is showing wrong count. This patch mostly restructures code without changing logic. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-10-01 17:11:16 -07:00
Jesse Gross	9a6fe4c1ae	datapath: Fix alignment of struct sw_flow_key. sw_flow_key alignment was declared as " __aligned(__alignof__(long))". However, this breaks on the m68k architecture where long is 32 bit in size but 16 bit aligned by default. This aligns to the size of a long to ensure that we can always do comparsions in full long-sized chunks. It also adds an additional build check to catch any reduction in alignment. CC: Andy Zhou <azhou@nicira.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-09-05 13:03:51 -07:00
Andy Zhou	c2dd5e999f	datapath: optimize flow compare and mask functions Make sure the sw_flow_key structure and valid mask boundaries are always machine word aligned. Optimize the flow compare and mask operations using machine word size operations. This patch improves throughput on average by 15% when CPU is the bottleneck of forwarding packets. This patch is inspired by ideas and code from a patch submitted by Peter Klausler titled "replace memcmp() with specialized comparator". However, The original patch only optimizes for architectures support unaligned machine word access. This patch optimizes for all architectures. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-27 12:53:12 -07:00
Andy Zhou	cc611f66a6	datapath: Rename key_len to key_end Key_end is a better name describing the ending boundary than key_len. Rename those variables to make it less confusing. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-22 11:39:53 -07:00
Joe Stringer	10f72e3da9	datapath: Add SCTP support This patch adds support for rewriting SCTP src,dst ports similar to the functionality already available for TCP/UDP. Rewriting SCTP ports is expensive due to double-recalculation of the SCTP checksums; this is performed to ensure that packets traversing OVS with invalid checksums will continue to the destination with any checksum corruption intact. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com>	2013-08-22 09:29:39 -07:00
Pravin B Shelar	adda018cb8	datapath: move tunnel-init function to flow.h This makes ovs-module more in-sync with upstream ovs-module. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-08-13 00:13:54 -07:00
Jesse Gross	54cc3ec63a	datapath: Add 'ovs_' prefix to extern symbols. The external symbols in the OVS kernel module are prefixed with 'ovs_' with the exception of ipv4_tun_to/from_nlattr(). This adds the prefix and makes the out of tree version consistent with upstream. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2013-08-09 12:47:43 -07:00
Andy Zhou	3d09706410	datapath: remove RCU annotation from flow->mask After a mask is assigned to a flow, it will not change for the life of the flow. Since flow access is protected by RCU lock, access to flow->mask after getting a flow is always safe. Suggested-by: Jesse Gross <jesse@nicira.com> Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-07-22 15:24:42 -07:00
Jesse Gross	529db6351a	datapath: Use masked flow when validating actions. It is important to validate flow actions to ensure that they do not try to write off the end of a packet. The mechanism to do this is to ensure that a flow is precise enough to describe valid vs. invalid packets and only allowing actions on valid flows. The introduction of megaflows broke this by using a narrow base flow but a potentially wide match. This meant that while the original flow was properly validated, later packets might not conform to that flow and could be truncated. This switches to using the masked flow instead, effectively requiring that all possible matching packets be valid in order for a flow's actions to be accepted. This change only affects the flow setup path - executed packets have always used the flow extracted from the packet and therefore were properly validated. Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-07-16 15:37:24 -07:00
Andy Zhou	ede77a461f	datapath: Fix a kernel crash caused by corrupted mask list. When flow table is copied, the mask list from the old table is not properly copied into the new table. The corrupted mask list in the new table will lead to kernel crash. This patch fixes this bug. Bug #18110 Reported-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-06-21 23:12:12 -07:00
Pravin B Shelar	5ebaf571f9	gre: Restructure tunneling. Following patch restructures ovs tunneling and gre vport implementation to make ovs tunneling more in sync with upstream kernel tunneling. Doing this tunneling code is simplified as most of protocol processing on send and recv is pushed to kernel tunneling. For external ovs module the code is moved to kernel compatibility code. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-06-20 23:04:29 -07:00
Andy Zhou	a1c564be1e	datapath: Mega flow implementation Add wildcarded flow support in kernel datapath. Wildcarded flow can improve OVS flow set up performance by avoid sending matching new flows to the user space program. The exact performance boost will largely dependent on wildcarded flow hit rate. In case all new flows hits wildcard flows, the flow set up rate is within 5% of that of linux bridge module. Pravin has made significant contributions to this patch. Including API clean ups and bug fixes. Co-authored-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> [jesse: Additional documentation, fix memory leak, and improve validation.] Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-06-20 10:03:36 -07:00
Thomas Graf	0afa23732d	datapath: Refine Netlink message size calculation and kill FLOW_BUFSIZE Kills the FLOW_BUFSIZE constant which needs to be calculated manually and replaces it with key_attr_size() based on nla_total_size(). Calculates the size of datapath messages instead of relying on NLMSG_DEFAULT_SIZE and moves the existing message size calculations into own functions for clarity. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-03-29 18:18:58 -07:00
Pravin B Shelar	85c9de194b	Tunnel: Cleanup old tunnel infrastructure. Since userspace flow based tunneling code is checked in, the kernel port based tunneling code can be removed. Patch removes following components: - tunnel ports hash table and moved tunnel ports list to individual vports. - Cleaned per tnl-port config. - OVS_KEY_ATTR_TUN_ID action is removed. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #15078	2013-03-04 13:00:25 -08:00
Pravin B Shelar	9ccb22ec5f	datapath: Increase maximum allocation size of action list. The switch to flow based tunneling increased the size of each output action in the flow action list. In extreme cases, this can result in the action list exceeding the maximum buffer size. This doubles the maximum buffer size to compensate for the increase in action size. Action list is recieved from netlink callback which is allocating linear-skb, therefore allocating another multi-page buffer would not increase probability of the allocation-failure a lot. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #15203	2013-02-28 17:52:52 -08:00
Pravin B Shelar	ba4004356c	Revert "datapath: Increase maximum allocation size of action list." This reverts commit `82b0d75509`. This patch introduced bug by calling vfree() from interrupt context. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-02-28 17:52:34 -08:00
Pravin B Shelar	82b0d75509	datapath: Increase maximum allocation size of action list. The switch to flow based tunneling increased the size of each output action in the flow action list. In extreme cases, this can result in the action list exceeding the maximum buffer size. This doubles the maximum buffer size to compensate for the increase in action size. In the common case, most allocations will be less than a page and those uses kmalloc. Therefore, for the majority of situations, this will have no impact. Bug #15203 Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2013-02-22 17:16:11 -08:00

1 2 3

112 Commits