mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

Author	SHA1	Message	Date
Jan Scheurich	beb75a40fd	userspace: Switching of L3 packets in L2 pipeline Ports have a new layer3 attribute if they send/receive L3 packets. The packet_type included in structs dp_packet and flow is considered in ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite. A dummy ethernet header is pushed to L3 packets received from L3 ports before the pipeline processing starts. The ethernet header is popped before sending a packet to a L3 port. For datapath ports that can receive L2 or L3 packets, the packet_type becomes part of the flow key for datapath flows and is handled appropriately in dpif-netdev. In the 'else' branch in flow_put_on_pmd() function, the additional check flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls lookup is sufficient to uniquely identify a flow and b) it caused false negatives because the flow in netdev->flow may not be properly masked. In dpif_netdev_flow_put() we now use the same method for constructing the netdev_flow_key as the one used when adding the flow to the dpcls to make sure these always match. The function netdev_flow_key_from_flow() used so far was not only inefficient but sometimes caused mismatches and subsequent flow update failures. The kernel datapath does not support the packet_type match field. Instead it encodes the packet type implicitly by the presence or absence of the Ethernet attribute in the flow key and mask. This patch filters the PACKET_TYPE attribute out of netlink flow key and mask to be sent to the kernel datapath. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 10:15:20 -07:00
Jan Scheurich	88fc528162	userspace: Support for push_eth and pop_eth actions Add support for actions push_eth and pop_eth to the netdev datapath and the supporting libraries. This patch relies on the support for these actions in the kernel datapath to be present. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-08 10:31:50 -04:00
Zong Kai LI	86d46f3c18	lib: rename ovs_nd_opt to ovs_nd_lla_opt Since ovs_nd_mtu_opt and ovs_nd_prefix_opt have been introduced, rename ovs_nd_opt to ovs_nd_lla_opt to make clear that it is the Source/Target Link-layer Address Option. Signed-off-by: Zongkai LI <zealokii@gmail.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-04 16:44:11 -07:00
Jan Scheurich	2482b0b0c8	userspace: Add packet_type in dp_packet and flow This commit adds a packet_type attribute to the structs dp_packet and flow to explicitly carry the type of the packet as preparation for the introduction of the so-called packet type-aware pipeline (PTAP) in OVS. The packet_type is a big-endian 32 bit integer with the encoding as specified in OpenFlow version 1.5. The upper 16 bits contain the packet type name space. Pre-defined values are defined in openflow-common.h:
enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,             /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1,       /* ns_type is an Ethertype. */
    OFPHTN_IP_PROTO = 2,        /* ns_type is an IP protocol number. */
    OFPHTN_UDP_TCP_PORT = 3,    /* ns_type is a TCP or UDP port. */
    OFPHTN_IPV4_OPTION = 4,     /* ns_type is an IPv4 option number. */
};
The lower 16 bits specify the actual type in the context of the name space. Only name spaces 0 and 1 will be supported for now. For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet). This is the default packet_type in OVS and the only one supported so far. Packets of type (OFPHTN_ONF, 0) are called Ethernet packets. In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet. A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet with the Ethernet header (and any VLAN tags) removed to expose the L3 (or L2.5) payload of the packet. These will simply be called L3 packets. The Ethernet address fields dl_src and dl_dst in struct flow are not applicable for an L3 packet and must be zero. However, to maintain compatibility with the large code base, we have chosen to copy the Ethertype of an L3 packet into the dl_type field of struct flow. This does not mean that it will be possible to match on dl_type for L3 packets with PTAP later on. Matching must be done on packet_type instead. New dp_packets are initialized with packet_type Ethernet. Ports that receive L3 packets will have to explicitly adjust the packet_type.
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-03 16:56:40 -07:00
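The packet_type layout described in this commit (name space in the upper 16 bits, type in the lower 16 bits, big-endian on the wire) can be illustrated with a minimal sketch. The helper names `pt_pack`, `pt_ns`, `pt_ns_type`, `pt_to_wire` and `pt_is_ethernet`, and the shortened `NS_*` constants, are hypothetical and are not taken from the OVS sources; only the two name space values and the bit layout come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two name spaces the commit message says are supported for now. */
enum { NS_ONF = 0, NS_ETHERTYPE = 1 };

/* Hypothetical helper: combine (name space, type) into a 32-bit value in
 * host byte order. Upper 16 bits: name space. Lower 16 bits: type. */
static uint32_t pt_pack(uint16_t ns, uint16_t ns_type)
{
    return ((uint32_t) ns << 16) | ns_type;
}

static uint16_t pt_ns(uint32_t pt)      { return (uint16_t) (pt >> 16); }
static uint16_t pt_ns_type(uint32_t pt) { return (uint16_t) (pt & 0xffff); }

/* Store the value as four big-endian bytes, independent of host order. */
static void pt_to_wire(uint32_t pt, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t) (pt >> 24);
    out[1] = (uint8_t) (pt >> 16);
    out[2] = (uint8_t) (pt >> 8);
    out[3] = (uint8_t) pt;
}

/* (NS_ONF, 0) is the Ethernet packet type, the default per the commit. */
static int pt_is_ethernet(uint32_t pt)
{
    return pt == pt_pack(NS_ONF, 0);
}
```

Under these assumptions, an IPv4 L3 packet would carry `pt_pack(NS_ETHERTYPE, 0x0800)`, while a classical Ethernet frame carries the all-zero value.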
Jarno Rajahalme	c30b4ceafa	datapath: Add original direction conntrack tuple to sw_flow_key. Upstream commit: commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc Author: Jarno Rajahalme <jarno@ovn.org> Date: Thu Feb 9 11:21:59 2017 -0800 openvswitch: Add original direction conntrack tuple to sw_flow_key. Add the fields of the conntrack original direction 5-tuple to struct sw_flow_key. The new fields are initially marked as non-existent, and are populated whenever a conntrack action is executed and either finds or generates a conntrack entry. This means that these fields exist for all packets that were not rejected by conntrack as untrackable. The original tuple fields in the sw_flow_key are filled from the original direction tuple of the conntrack entry relating to the current packet, or from the original direction tuple of the master conntrack entry, if the current conntrack entry has a master. Generally, expected connections of connections having an assigned helper (e.g., FTP), have a master conntrack entry. The main purpose of the new conntrack original tuple fields is to allow matching on them for policy decision purposes, with the premise that the admissibility of tracked connections reply packets (as well as original direction packets), and both direction packets of any related connections may be based on ACL rules applying to the master connection's original direction 5-tuple. This also makes it easier to make policy decisions when the actual packet headers might have been transformed by NAT, as the original direction 5-tuple represents the packet headers before any such transformation. When using the original direction 5-tuple the admissibility of return and/or related packets need not be based on the mere existence of a conntrack entry, allowing separation of admission policy from the established conntrack state. 
While existence of a conntrack entry is required for admission of the return or related packets, policy changes can render connections that were initially admitted to be rejected or dropped afterwards. If the admission of the return and related packets was based on mere conntrack state (e.g., connection being in an established state), a policy change that would make the connection rejected or dropped would need to find and delete all conntrack entries affected by such a change. When using the original direction 5-tuple matching the affected conntrack entries can be allowed to time out instead, as the established state of the connection would not need to be the basis for packet admission any more. It should be noted that the directionality of related connections may be the same or different than that of the master connection, and neither the original direction 5-tuple nor the conntrack state bits carry this information. If needed, the directionality of the master connection can be stored in master's conntrack mark or labels, which are automatically inherited by the expected related connections. The fact that neither ARP nor ND packets are trackable by conntrack allows mutual exclusion between ARP/ND and the new conntrack original tuple fields. Hence, the IP addresses are overlaid in union with ARP and ND fields. This allows the sw_flow_key to not grow much due to this patch, but it also means that we must be careful to never use the new key fields with ARP or ND packets. ARP is easy to distinguish and keep mutually exclusive based on the ethernet type, but ND being an ICMPv6 protocol requires a bit more attention. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> This patch squashes in minimal amount of OVS userspace code to not break the build. Later patches contain the full userspace support. 
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2017-03-08 17:22:47 -08:00
Jarno Rajahalme	5dddf96065	dpif: Meter framework. Add DPIF-level infrastructure for meters. Allow meter_set to modify the meter configuration (e.g. set the burst size if unspecified). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>	2017-03-08 13:09:43 -08:00
Sugesh Chandran	911b7e9b08	odp-execute: Apply clone action on batch of packets instead of one by one. Clone action is optimized by cloning a batch of packets together instead of executing independently on every packet in a batch. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-03-07 17:05:29 -08:00
Yang, Yi Y	6fcecb85ab	datapath: add Ethernet push and pop actions Upstream commit: commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:23 2016 +0100 openvswitch: add Ethernet push and pop actions It's not allowed to push Ethernet header in front of another Ethernet header. It's not allowed to pop Ethernet header if there's a vlan tag. This preserves the invariant that L3 packet never has a vlan tag. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> [Committer notes] Fix build with the upstream commit by folding in the required switch case enum handlers. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-03-02 15:51:39 -08:00
Andy Zhou	70a0573d0a	odp: Fix sample action in userspace User space implementation of the sample action is not consistent with kernel datapath. In kernel datapath, the side effects of actions within the sample actions are not visible to the subsequent actions. Current user space handling does not follow the same logic. This patch makes them consistent. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2017-02-03 14:43:15 -08:00
Andy Zhou	72c84bc2db	dp-packet: Enhance packet batch APIs. One common use case of 'struct dp_packet_batch' is to process all packets in the batch in order. Add an iterator for this use case to simplify the logic of call sites. Another common use case is to drop packets in the batch, by reading all packets, but writing back pointers of fewer packets. Add macros to support this use case. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-26 17:35:29 -08:00
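The "read all packets, write back fewer pointers" pattern mentioned in this commit can be sketched as an in-place filter. This is a simplified illustration, not the macros added by the commit: `struct batch`, `struct pkt`, `BATCH_MAX`, `batch_filter` and `keep_nonempty` are hypothetical names, and the capacity of 32 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 32            /* Illustrative capacity (assumption). */

struct pkt { int len; };

struct batch {
    size_t count;
    struct pkt *packets[BATCH_MAX];
};

/* Reads every packet in order but writes back only the pointers of the
 * packets that keep() accepts, preserving their relative order.
 * Returns the number of packets dropped from the batch. */
static size_t batch_filter(struct batch *b, int (*keep)(const struct pkt *))
{
    size_t n = b->count;
    size_t kept = 0;

    assert(n <= BATCH_MAX);
    for (size_t i = 0; i < n; i++) {
        struct pkt *p = b->packets[i];
        if (keep(p)) {
            b->packets[kept++] = p;   /* kept <= i, so this never clobbers
                                       * a packet that is still unread. */
        }
    }
    b->count = kept;
    return n - kept;
}

/* Example predicate: drop zero-length packets. */
static int keep_nonempty(const struct pkt *p)
{
    return p->len > 0;
}
```

The write index never overtakes the read index, which is why the filter can reuse the same array instead of allocating a second batch.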
Andy Zhou	535e3acfa7	dpif-netdev: Add clone action Add support for userspace datapath clone action. The clone action provides an action envelope to enclose an action list. For example, with actions A, B, C and D, and an action list: A, clone(B, C), D The clone action will ensure that: - D will see the same packet, and any meta states, such as flow, as action B. - D will be executed regardless of whether B or C drops a packet. They can only drop a clone. - When B drops a packet, clone will skip all remaining actions within the clone envelope. This feature is useful when we add meter action later: The meter action can be implemented as a simple action without its own envelope (unlike the sample action). When necessary, the flow translation layer can enclose a meter action in clone. The clone action is very similar to the OpenFlow clone action. This is by design to simplify vswitchd flow translation logic. Without datapath clone, vswitchd simulates the effect by inserting datapath actions to "undo" clone actions. The above flow will be translated into A, B, C, -C, -B, D. However, there are two issues: - The resulting datapath action list may be longer without using clone. - Some actions, such as NAT, may not be possible to reverse. This patch implements clone() simply with packet copy. The performance can be improved with later patches, for example, to delay or avoid packet copy if possible. It seems datapath should have enough context to carry out such optimization without the userspace context. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-23 22:58:34 -08:00
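The three guarantees listed for `A, clone(B, C), D` can be illustrated with a toy model in which the clone envelope simply runs its action list on a copy of the packet, as the commit says the first implementation does. Everything here (`struct pkt`, `action_fn`, `run`, `run_clone`, `dec_ttl`, `drop`) is a hypothetical stand-in, not code from dpif-netdev.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy packet: one mutable header field plus a drop marker. */
struct pkt { int ttl; bool dropped; };

typedef void action_fn(struct pkt *);

static void dec_ttl(struct pkt *p) { p->ttl--; }
static void drop(struct pkt *p)    { p->dropped = true; }

/* Runs actions in order and stops as soon as one of them drops the packet,
 * so the remaining actions in the list are skipped. */
static void run(struct pkt *p, action_fn *const acts[], int n)
{
    assert(n >= 0);
    for (int i = 0; i < n && !p->dropped; i++) {
        acts[i](p);
    }
}

/* clone(...): executes the enclosed list on a copy. The caller's packet is
 * untouched, so later actions see it exactly as it was before the clone. */
static struct pkt run_clone(const struct pkt *p, action_fn *const acts[], int n)
{
    struct pkt copy = *p;
    run(&copy, acts, n);
    return copy;
}
```

With B = `drop` and C = `dec_ttl`, the copy is dropped before C runs, and D still executes on the original packet.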
Jarno Rajahalme	932c96b7b0	odp: Use struct in6_addr for IPv6 addresses. Code is simplified when the ODP keys use the same type as the struct flow for the IPv6 addresses. As the change is facilitated by extract-odp-netlink-h, this change only affects the userspace. We already do the same for the ethernet addresses. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2017-01-04 16:31:06 -08:00
Zoltán Balogh	fc05230631	odp-execute: Optimize IP header modification in OVS datapath I measured the packet processing cost of OVS DPDK datapath for different OpenFlow actions. I configured OVS to use a single pmd thread and measured the packet throughput in a phy-to-phy setup. I used 10G interfaces bound to the DPDK driver and overloaded the vSwitch with 64 byte packets through one of the 10G interfaces. The processing cost of the dec_ttl action seemed to be gratuitously high compared with other actions. I looked into the code and saw that dec_ttl is encoded as a masked nested attribute in OVS_ACTION_ATTR_SET_MASKED(OVS_KEY_ATTR_IPV4). That way, OVS datapath can modify several IP header fields (TTL, TOS, source and destination IP addresses) by a single invocation of packet_set_ipv4() in the odp_set_ipv4() function in the lib/odp-execute.c file. The packet_set_ipv4() function takes the new TOS, TTL and IP addresses as arguments, compares them with the actual ones and updates the fields if needed. This means that even if only TTL needs to be updated, each of the four IP header fields is passed to the callee and is compared to the actual field for each packet. The odp_set_ipv4() caller function possesses information about the fields that need to be updated in the 'mask' structure. The idea is to spare invocation of the packet_set_ipv4() function but use its code parts directly. So the 'mask' can be used to decide which IP header fields need to be updated. In addition, a faster packet processing can be achieved if the values of local variables are calculated right before their usage.
       | T | T | I | I |               ||
       | T | O | P | P |  Vanilla OVS  ||  + new patch
       | L | S | s | d | (nsec/packet) || (nsec/packet)
-------+---+---+---+---+---------------++---------------
output |   |   |   |   |     67.19     ||     67.19
       | X |   |   |   |     74.48     ||     68.78
       |   | X |   |   |     74.42     ||     70.07
       |   |   | X |   |     84.62     ||     78.03
       |   |   |   | X |     84.25     ||     77.94
       |   |   | X | X |     97.46     ||     91.86
       | X |   | X | X |    100.42     ||     96.00
       | X | X | X | X |    102.80     ||    100.73
The table shows the average processing cost of packets in nanoseconds for the following actions: output; output + dec_ttl; output + mod_nw_tos; output + mod_nw_src; output + mod_nw_dst and some of their combinations. I ran each test five times. The values are the mean of the readings obtained. I added OVS_LIKELY to the 'if' condition for the TTL field, since as far as I know, this field will typically be decremented when any field of the IP header is modified. Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-12-22 17:55:13 -08:00
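The idea of letting the mask decide which IP header fields are touched can be sketched as follows. This is a simplified model under stated assumptions: `struct ipv4_fields` and `set_ipv4_masked` are hypothetical names, the fields are plain integers rather than a real IPv4 header, and checksum maintenance (which a real datapath must do) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the four modifiable IPv4 header fields. */
struct ipv4_fields {
    uint32_t src, dst;
    uint8_t tos, ttl;
};

/* Applies key under mask, skipping every field whose mask is zero, so a
 * TTL-only update never reads or compares the addresses or TOS.
 * Returns the number of fields written. Checksums are not modelled. */
static int set_ipv4_masked(struct ipv4_fields *hdr,
                           const struct ipv4_fields *key,
                           const struct ipv4_fields *mask)
{
    int written = 0;

    assert(hdr != NULL && key != NULL && mask != NULL);
    if (mask->src) {
        hdr->src = (key->src & mask->src) | (hdr->src & ~mask->src);
        written++;
    }
    if (mask->dst) {
        hdr->dst = (key->dst & mask->dst) | (hdr->dst & ~mask->dst);
        written++;
    }
    if (mask->tos) {
        hdr->tos = (uint8_t) ((key->tos & mask->tos) | (hdr->tos & ~mask->tos));
        written++;
    }
    if (mask->ttl) {
        hdr->ttl = (uint8_t) ((key->ttl & mask->ttl) | (hdr->ttl & ~mask->ttl));
        written++;
    }
    return written;
}
```

A dec_ttl-style update would pass a mask with only `ttl` set, which corresponds to the second row of the table above.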
Jesse Gross	8d8ab6c2d5	tun-metadata: Manage tunnel TLV mapping table on a per-bridge basis. When using tunnel TLVs (at the moment, this means Geneve options), a controller must first map the class and type onto an appropriate OXM field so that it can be used in OVS flow operations. This table is managed using OpenFlow extensions. The original code that added support for TLVs made the mapping table global as a simplification. However, this is not really logically correct as the OpenFlow management commands are operating on a per-bridge basis. This removes the original limitation to make the table per-bridge. One nice result of this change is that it is generally clearer whether the tunnel metadata is in datapath or OpenFlow format. Rather than allowing ad-hoc format changes and trying to handle both formats in the tunnel metadata functions, the format is more clearly separated by function. Datapaths (both kernel and userspace) use datapath format and it is not changed during the upcall process. At the beginning of action translation, tunnel metadata is converted to OpenFlow format and flows and wildcards are translated back at the end of the process. As an additional benefit, this change improves performance in some flow setup situations by keeping the tunnel metadata in the original packet format in more cases. This helps when copies need to be made as the amount of data touched is only what is present in the packet rather than the maximum amount of metadata supported. Co-authored-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-09-19 09:52:22 -07:00
William Tu	aaca4fe0ce	ofp-actions: Add truncate action. The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m). One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
  # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact.
  # ovs-ofctl add-flow br0 \
        'actions=output(port=1,max_len=100),output:1,output:2'
- The Datapath actions shows:
  # Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2016-06-24 09:17:00 -07:00
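The scoping rule in this commit (max_len applies only to the output action it belongs to) can be modelled on the datapath action list `trunc(100),1,1,2` shown above. The sketch below is a toy interpreter with hypothetical names (`struct act`, `run_actions`); treating a max_len of 0 as "no truncation pending" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum act_type { ACT_TRUNC, ACT_OUTPUT };

/* arg is max_len for ACT_TRUNC and the port number for ACT_OUTPUT. */
struct act { enum act_type type; unsigned arg; };

/* Records the packet size emitted by each output action in out_sizes and
 * returns the number of outputs. A trunc affects only the next output. */
static size_t run_actions(unsigned pkt_size, const struct act *acts, size_t n,
                          unsigned *out_sizes, size_t cap)
{
    unsigned pending = 0;       /* 0: no truncation pending (assumption). */
    size_t outs = 0;

    for (size_t i = 0; i < n; i++) {
        if (acts[i].type == ACT_TRUNC) {
            pending = acts[i].arg;
        } else {
            unsigned size = pkt_size;

            assert(outs < cap);
            if (pending && pending < size) {
                size = pending;         /* MIN(original_size, max_len) */
            }
            out_sizes[outs++] = size;
            pending = 0;                /* Scope ends with this output. */
        }
    }
    return outs;
}
```

For a 1500-byte packet the three outputs would be 100, 1500 and 1500 bytes, matching the statement that output:1 and output:2 stay intact.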
Pravin B Shelar	1895cc8dbb	dpif-netdev: create batch object The DPDK datapath operates on batches of packets. To pass a batch of packets around we use a packet array and a count. The next patch needs to associate metadata with each batch of packets, so introduce a batch structure to make handling the metadata easier. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>	2016-05-18 19:39:18 -07:00
Simon Horman	31a9a58452	packets: use flow protocol when recalculating ipv6 checksums When using masked actions the ipv6_proto field of an action to set IPv6 fields may be zero rather than the prevailing protocol which will result in skipping checksum recalculation. This patch resolves the problem by relying on the protocol in the packet rather than that in the set field action. A similar fix for the kernel datapath has been accepted into David Miller's 'net' tree as b4f70527f052 ("openvswitch: use flow protocol when recalculating ipv6 checksums"). Cc: Jarno Rajahalme <jrajahalme@nicira.com> Fixes: 6d670e7f0d45 ("lib/odp: Masked set action execution and printing.") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Ben Pfaff <blp@ovn.org>	2016-04-23 14:41:49 +10:00
Justin Pettit	f6ecf944a9	vswitchd: Allow modifying ICMP type and code. Signed-off-by: Justin Pettit <jpettit@nicira.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2015-11-09 15:01:50 -08:00
Joe Stringer	9daf23484f	Add connection tracking label support. This patch adds a new 128-bit metadata field to the connection tracking interface. When a label is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_label" field in the flow. For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a label with those connections:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:16 -07:00
Joe Stringer	8e53fe8cf7	Add connection tracking mark support. This patch adds a new 32-bit metadata field to the connection tracking interface. When a mark is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_mark" field in the flow. For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a mark with those connections:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:15 -07:00
Joe Stringer	07659514c3	Add support for connection tracking. This patch adds a new action and fields to OVS that allow connection tracking to be performed. This support works in conjunction with the Linux kernel support merged into the Linux-4.3 development cycle. Packets have two possible states with respect to connection tracking: Untracked packets have not previously passed through the connection tracker, while tracked packets have previously been through the connection tracker. For OpenFlow pipeline processing, untracked packets can become tracked, and they will remain tracked until the end of the pipeline. Tracked packets cannot become untracked. Connections can be unknown, uncommitted, or committed. Packets which are untracked have unknown connection state. To know the connection state, the packet must become tracked. Uncommitted connections have no connection state stored about them, so it is only possible for the connection tracker to identify whether they are a new connection or whether they are invalid. Committed connections have connection state stored beyond the lifetime of the packet, which allows later packets in the same connection to be identified as part of the same established connection, or related to an existing connection - for instance ICMP error responses. The new 'ct' action transitions the packet from "untracked" to "tracked" by sending this flow through the connection tracker. The following parameters are supported initially: - "commit": When commit is executed, the connection moves from uncommitted state to committed state. This signals that information about the connection should be stored beyond the lifetime of the packet within the pipeline. This allows future packets in the same connection to be recognized as part of the same "established" (est) connection, as well as identifying packets in the reply (rpl) direction, or packets related to an existing connection (rel).
- "zone=[u16\|NXM]": Perform connection tracking in the zone specified. Each zone is an independent connection tracking context. When the "commit" parameter is used, the connection will only be committed in the specified zone, and not in other zones. This is 0 by default. - "table=NUMBER": Fork pipeline processing in two. The original instance of the packet will continue processing the current actions list as an untracked packet. An additional instance of the packet will be sent to the connection tracker, which will be re-injected into the OpenFlow pipeline to resume processing in the specified table, with the ct_state and other ct match fields set. If the table is not specified, then the packet is submitted to the connection tracker, but the pipeline does not fork and the ct match fields are not populated. It is strongly recommended to specify a table later than the current table to prevent loops. When the "table" option is used, the packet that continues processing in the specified table will have the ct_state populated. The ct_state may have any of the following flags set: - Tracked (trk): Connection tracking has occurred. - Reply (rpl): The flow is in the reply direction. - Invalid (inv): The connection tracker couldn't identify the connection. - New (new): This is the beginning of a new connection. - Established (est): This is part of an already existing connection. - Related (rel): This connection is related to an existing connection. For more information, consult the ovs-ofctl(8) man pages. 
Below is a simple example flow table to allow outbound TCP traffic from port 1 and drop traffic from port 2 that was not initiated by port 1:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
table=1,in_port=2,ct_state=+trk+est,tcp,action=1
table=1,in_port=2,ct_state=+trk+new,tcp,action=drop
Based on original design by Justin Pettit, contributions from Thomas Graf and Daniele Di Proietto. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:15 -07:00
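Match expressions such as `ct_state=+trk+est` and `ct_state=-trk` in the flow table above can be read as a value/mask pair over the ct_state flag bits. The sketch below shows that reading; the `CT_*` names, the bit positions and the helpers `ct_match_make` and `ct_matches` are illustrative assumptions, not definitions copied from OVS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits; the actual bit positions are an assumption. */
enum {
    CT_NEW = 1 << 0,    /* new: beginning of a new connection */
    CT_EST = 1 << 1,    /* est: part of an existing connection */
    CT_REL = 1 << 2,    /* rel: related to an existing connection */
    CT_RPL = 1 << 3,    /* rpl: reply direction */
    CT_INV = 1 << 4,    /* inv: connection could not be identified */
    CT_TRK = 1 << 5     /* trk: connection tracking has occurred */
};

struct ct_match { uint32_t value, mask; };

/* "+flag" requires the bit to be set, "-flag" requires it to be clear,
 * and flags that are not mentioned are wildcarded. */
static struct ct_match ct_match_make(uint32_t plus, uint32_t minus)
{
    struct ct_match m;

    assert((plus & minus) == 0);    /* A flag cannot be both + and -. */
    m.value = plus;
    m.mask = plus | minus;
    return m;
}

static int ct_matches(uint32_t state, struct ct_match m)
{
    return (state & m.mask) == m.value;
}
```

Under this reading, `+trk+est` still matches a tracked, established packet in the reply direction, because the rpl bit is not part of the mask.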
Jarno Rajahalme	74ff3298c8	userspace: Define and use struct eth_addr. Define struct eth_addr and use it instead of a uint8_t array for all ethernet addresses in OVS userspace. The struct is always the right size, and it can be assigned without an explicit memcpy, which makes code more readable. "struct eth_addr" is a good type name for this as many utility functions are already named accordingly. struct eth_addr can be accessed as bytes as well as ovs_be16's, which makes the struct 16-bit aligned. All use seems to be 16-bit aligned, so some algorithms on the ethernet addresses can be made a bit more efficient making use of this fact. As the struct fits into a register (in 64-bit systems) we pass it by value when possible. This patch also changes the few uses of Linux specific ETH_ALEN to OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no longer needed. This work stemmed from a desire to make all struct flow members assignable for unrelated exploration purposes. However, I think this might be a nice code readability improvement by itself. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-08-28 14:55:11 -07:00
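The properties this commit attributes to struct eth_addr (fixed six-byte size, assignment without memcpy, access as bytes or as 16-bit words, pass by value) can be demonstrated with a small stand-in type. `struct eth_addr_sketch` and its helpers are hypothetical; the sketch uses plain `uint16_t` where OVS would use its own big-endian type, and it relies on C11 anonymous unions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the type described in the commit: the same six bytes are
 * reachable as bytes or as three 16-bit words, which also gives the struct
 * 16-bit alignment. */
struct eth_addr_sketch {
    union {
        uint8_t ea[6];
        uint16_t be16[3];
    };
};

/* "The struct is always the right size." */
static_assert(sizeof(struct eth_addr_sketch) == 6, "unexpected padding");

/* Passed by value; compared as three 16-bit words instead of six bytes. */
static int eth_addr_sketch_equals(struct eth_addr_sketch a,
                                  struct eth_addr_sketch b)
{
    return a.be16[0] == b.be16[0]
           && a.be16[1] == b.be16[1]
           && a.be16[2] == b.be16[2];
}

static int eth_addr_sketch_is_zero(struct eth_addr_sketch a)
{
    return !(a.be16[0] | a.be16[1] | a.be16[2]);
}
```

Copying an address is then a plain assignment (`b = a;`), which is the readability gain the commit message points to.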
Jesse Gross	6728d578f6	dpif-netdev: Translate Geneve options per-flow, not per-packet. The kernel implementation of Geneve options stores the TLV option data in the flow exactly as received, without any further parsing. This is then translated to known options for the purposes of matching on flow setup (which will then install a datapath flow in the form the kernel is expecting). The userspace implementation behaves a little bit differently - it looks up known options as each packet is received. The reason for this is there is a much tighter coupling between datapath and flow translation and the representation is generally expected to be the same. This works but it incurs work on a per-packet basis that could be done per-flow instead. This introduces a small translation step for Geneve packets between datapath and flow lookup for the userspace datapath in order to allow the same kind of processing that the kernel does. A side effect of this is that unknown options are now shown when flows dumped via ovs-appctl dpif/dump-flows, similar to the kernel. There is a second benefit to this as well: for some operations it is preferable to keep the options exactly as they were received on the wire, which this enables. One example is that for packets that are executed from ofproto-dpif-upcall to the datapath, this avoids the translation of Geneve metadata. Since this conversion is potentially lossy (for unknown options), keeping everything in the same format removes the possibility of dropping options if the packet comes back up to userspace and the Geneve option translation table has changed. To help with these types of operations, most functions can understand both formats of data and seamlessly do the right thing. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-08-05 20:26:48 -07:00
Thomas F. Herbert	d694339457	Add support functions for 802.1ad push and pop vlan. Changes to allow the tpid to be specified and all vlan tpid checking to be generalized. Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2015-06-07 11:35:21 -07:00
Joe Stringer	db8bb9a51e	odp-execute: Refactor determining dpif assistance. To be more explicit about which actions require datapath assistance, split this out into a separate function. While this is fairly trivial currently, there will be more special cases for the upcoming conntrack changes. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-05-29 15:27:58 -07:00
Daniele Di Proietto	2bc1bbd27d	dp-packet: Rename 'dp_hash' in 'rss_hash'. We already have the 'dp_hash' embedded in the metadata. This caused confusion in the code. With this commit it should be clear that 'rss_hash' is the packet hash used for internal purposes, while 'md.dp_hash' is part of the flow, computed during the execution of certain actions. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-20 12:49:41 -07:00
Pravin B Shelar	cf62fa4c70	dp-packet: Remove ofpbuf dependency. Currently dp-packet makes use of ofpbuf for managing packet buffers. That complicates ofpbuf. By making dp-packet independent of ofpbuf, both libraries can be optimized for their own use case. This avoids the mapping operation between ofpbuf and dp_packet in datapath upcalls. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-03-03 13:37:37 -08:00
Pravin B Shelar	e14deea0bd	dpif_packet: Rename to dp_packet dp_packet is short and better name for datapath packet structure. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-03-03 13:37:34 -08:00
Sharo, Randall A CIV SPAWARSYSCEN-ATLANTIC, 55200	e60e935b1f	Implement set-field for IPv6 ND fields (nd_target, nd_sll, and nd_tll). This patch adds set-field operations for nd_target, nd_sll, and nd_tll fields, with and without masks, using Nicira extensions and OpenFlow 1.2 protocol. Signed-off-by: Randall A Sharo <randall.sharo at navy.mil> Signed-off-by: Ben Pfaff <blp@nicira.com>	2015-01-13 16:22:44 -08:00
Alex Wang	1368af0c85	FreeBSD: Fix build failure. This commit fixes an include dependency for header ip6.h, on FreeBSD. Without this commit, the gmake of ovs master on FreeBSD will result in the following error. /usr/include/netinet/ip6.h:82: error: field 'ip6_src' has incomplete type /usr/include/netinet/ip6.h:83: error: field 'ip6_dst' has incomplete type Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-01-04 21:00:15 -08:00
Pravin B Shelar	a36de779d7	openvswitch: Userspace tunneling. The following patch adds support for userspace tunneling. Tunneling needs three more components: the first is a routing table, which is configured by caching kernel routes; the second is an ARP cache, which is built automatically by snooping ARP; and the third is a tunnel protocol table, which lists all listening protocols and is populated by vswitchd as tunnel ports are added. GRE and VXLAN protocol support is added in this patch. Tunneling works as follows: on packet receive, vswitchd checks whether the packet is targeted to a tunnel port. If it is, vswitchd inserts a tunnel pop action, which pops the header and sends the packet to the tunnel port. On packet xmit, rather than generating a Set tunnel action, it generates a tunnel push action, which carries the tunnel header data. The datapath can use the tunnel-push action data to generate the header for each packet and forward the packet to the output port. Since the tunnel-push action contains most of the packet header, vswitchd needs to look up the routing table and ARP table to build this action. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-12 15:08:33 -08:00
Jarno Rajahalme	b8778a0d0b	Fix setting transport ports with frags. Packets with 'LATER' fragment do not have a transport header, so it is not possible to either match on or set transport ports on such packets. Matching is prevented by augmenting mf_are_prereqs_ok() with a nw_frag 'LATER' bit check. Setting the transport headers on such packets is prevented in three ways: 1. Flows with an explicit match on nw_frag, where the LATER bit is 1: existing calls to the modified mf_are_prereqs_ok() prohibit using transport header fields (port numbers) in OXM/NXM actions (set_field, move). SET_TP_* actions need a new check on the LATER bit. 2. Flows that wildcard the nw_frag LATER bit: At flow translation time, add calls to mf_are_prereqs_ok() to make sure that we do not use transport ports in flows that do not have them. 3. At action execution time, do not set transport ports, if the packet does not have a full transport header. This ensures that we never call the packet_set functions, that require a valid transport header, with packets that do not have them. For example, if the flow was created with a IPv6 first fragment that had the full TCP header, but the next packet's first fragment is missing them. 3 alone would suffice for correct behavior, but 1 and 2 seem like a right thing to do, anyway. Currently, if we are setting port numbers, we will also match them, due to us tracking the set fields with the same flow_wildcards as the matched fields. Hence, if the incoming port number was not zero, the flow would not match any packets with missing or truncated transport headers. However, relying on no packets having zero port numbers would not be very robust. Also, we may separate the tracking of set and matched fields in the future, which would allow some flows that blindly set port numbers to not match on them at all. For TCP in case 3 we use ofpbuf_get_tcp_payload() that requires the whole (potentially variable size) TCP header to be present. 
However, when parsing a flow, we only require the fixed size portion of the TCP header to be present, which would be enough to set the port numbers and fix the TCP checksum. Finally, we add tests testing the new behavior. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-11-10 13:40:03 -08:00
Pravin B Shelar	41ccaa249c	netdev-dpif: Add metadata to dpif-packet. Today dpif-netdev has a single metadata for a given batch, since one batch belongs to one port, but soon packets from a single tunnel port can belong to different ports, so we need to have per-packet metadata. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-10-09 14:12:11 -07:00
Daniele Di Proietto	1164afb6cc	odp-execute: Refactor odp_execute_{actions, sample}() Firstly, with this change, the 'more_actions' parameter is removed and is integrated into 'steal'. Then, every function that receives a batch of packets with 'steal' set to true is responsible for freeing the packets. Finally, odp_execute_actions() and odp_execute_actions__() can be merged. This also fixes a memory leak in odp_execute_sample(), when the subactions are not executed. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-10-03 15:04:15 -07:00
Daniele Di Proietto	0057762a2f	odp-execute: Fix memory leak on recirc action If odp_execute_actions() has been called with 'steal' set to true and OVS_ACTION_ATTR_RECIRC as last action, it should allow dp_execute_cb() to steal the packet. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-10-03 15:04:15 -07:00
Daniele Di Proietto	f7c2f97d57	lib/odp-execute: Use dpif_packet_set_dp_hash() instead of ->dp_hash When building with DPDK support, 'struct dpif_packet' won't have 'dp_hash' member. dpif_packet_set_dp_hash() and dpif_packet_get_dp_hash() should be used. Furthermore, the masked set action shouldn't read 'md->dp_hash' (which is shared in a batch), but should use dpif_packet_get_dp_hash() to get each packet private hash. This commit fixes the build with DPDK. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-09-09 14:21:41 -07:00
Jarno Rajahalme	6d670e7f0d	lib/odp: Masked set action execution and printing. Add a new action type OVS_ACTION_ATTR_SET_MASKED, and support for parsing, printing, and committing them. Masked set actions add a mask, immediately following the netlink attribute data, within the netlink attribute itself. Thus the key attribute size for a masked set action is exactly double of the non-masked set action. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-09-08 14:57:08 -07:00
Daniele Di Proietto	61a2647e15	packet-dpif: Add dpif_packet_{get, set}_hash() These functions are used to store the packet hash. 'netdev-dpdk' automatically sets this value to the RSS hash returned by the NIC. Other 'netdev's set it to 0 (which is an invalid hash value), so that callers can compute the hash on their own. If DPDK support is enabled, struct dpif_packet's member 'dp_hash' is removed and 'pkt.hash.rss' from DPDK mbuf is used. This commit also configures DPDK devices to compute RSS hash for UDP and IPv6 packets. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-29 16:32:21 -07:00
Ben Pfaff	837eefc76b	Do not seemingly #include Linux-specific headers on other platforms. Until now, the OVS source tree has had a whole maze of header files that make "#include <linux/openvswitch.h>" work OK regardless of platform, but this confuses everyone new to the tree, at first glance, and is difficult to understand at second glance too. This commit renames include/linux/openvswitch.h to datapath/linux/compat/include/linux/openvswitch.h without other change, then modifies the userspace build to generate a header that makes sense in portable Open vSwitch userspace from that header. It then removes all the remaining include/linux/* files since they are now unused. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2014-08-04 11:11:40 -07:00
Pravin B Shelar	e381def971	lib: Rename ofp to buf. dpif-packet contains ofpbuf which points to packet data. Here buf is better name rather than ofp. Following patch renames all remaining instances of ofp variable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>	2014-06-25 09:28:42 -07:00
Daniele Di Proietto	8cbf4f479b	dpif-netdev: batch packet processing This change in dpif-netdev allows faster packet processing for devices which implement batching (netdev-dpdk currently). Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-23 14:41:15 -07:00
Daniele Di Proietto	910885540a	dpif-netdev: use dpif_packet structure for packets This commit introduces a new data structure used for receiving packets from netdevs and passing them to dpifs. The purpose of this change is to allow storing some private data for each packet. The subsequent commits make use of it. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-23 14:41:12 -07:00
Andy Zhou	c6bf49f3fa	dpif: Fix slow action handling for DP_HASH and RECIRC In case DP_HASH and RECIRC actions need to be executed in slow path, current implementation simply doesn't handle them -- vswitchd simply crashes. This patch fixes them by supplying an implementation for them. RECIRC will be handled by the datapath, same as the output action. DP_HASH, on the other hand, is handled in the user space. Although the resulting hash values may not match those computed by the datapath, it is less expensive; current use case (bonding) does not require a strict match to work properly. Reported-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>	2014-06-04 14:06:40 -07:00
Andy Zhou	347bf289b3	dpif-netdev: Move hash function out of the recirc action, into its own action Currently recirculation action can optionally compute hash. This patch adds a hash action that is independent of the recirc action, which no longer computes hash. For megaflow bond with recirc, the output to a bond port action will look like: hash(hash_l4(0)), recirc(<recirc_id>) Obviously, a recirculation application that does not depend on the hash value can just use the recirc action alone. Signed-off-by: Andy Zhou <azhou@nicira.com> Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-04-16 15:30:42 -07:00
Andy Zhou	adcf00ba35	ofproto/bond: Implement bond megaflow using recirculation Infrastructure to enable megaflow support for bond ports using recirculation. This patch adds the following features: * Generate RECIRC action when bond can benefit from recirculation. * Populate post recirculation rules in a hidden table. Currently table 254. * Uses post recirculation rules for bond rebalancing * A recirculation implementation in dpif-netdev. The goal of this patch is to be able to megaflow bond outputs and thus greatly improve performance. However, this patch does not actually improve the megaflow generation. It is left for a later commit. Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-04-07 19:55:30 -07:00
Jarno Rajahalme	cf3b753866	ofpbuf: Abstract 'l2' pointer and document usage conventions. Rename 'l2' to 'frame' and add new ofpbuf_set_frame() and ofpbuf_l2(). ofpbuf_set_frame() also resets all the layer offsets. ofpbuf_l2() returns NULL if the packet has no Ethernet header, as indicated either by unset l3 offset or NULL frame pointer. Callers of ofpbuf_l2() are supposed to check the return value, unless they can otherwise be sure that the packet has a valid Ethernet header. The recent commit 437d0d22 made some assumptions that were not valid regarding the use of the 'l2' pointer in rconn module and by compose_rarp(). This is now fixed as follows: rconn now relies on the fact that once OpenFlow messages are given to rconn for transport, the frame pointer is no longer needed to refer to the OpenFlow header; and compose_rarp() now sets the frame pointer and offsets as expected. In addition to storing network frames, ofpbufs are also used for handling OpenFlow messages and action lists. lib/ofpbuf.h now has a comment documenting the current usage conventions and invariants. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-04-03 11:51:59 -07:00
Jarno Rajahalme	6b8c377a6e	ofpbuf: Rename trivial _get_ functions without the "get". Code reads better without the "get", for example "ofpbuf_l3()" vs. "ofpbuf_get_l3()". L4 payload access functions still use the "get" (e.g., "ofpbuf_get_tcp_payload()"). Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-04-03 11:51:54 -07:00
Jarno Rajahalme	437d0d22ab	lib/ofpbuf: Compact This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit systems, or from 52 to 36 bytes on 32-bit systems (counting in the 'l7' removal from an earlier patch). This may help contribute to cache efficiency, and will speed up initializing, copying and manipulating ofpbufs. This is potentially important for the DPDK datapath, but the rest of the code base may also see a little benefit. Changes are: - Remove 'l7' pointer (previous patch). - Use offsets instead of layer pointers for l2_5, l3, and l4 using 'l2' as basis. Usually 'data' is the same as 'l2', but this is not always the case (e.g., when parsing or constructing a packet), so it can not be easily used as the offset basis. Also, packet parsing is faster if we do not need to maintain the offsets each time we pull data from the ofpbuf. - Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for largest possible messages/packets. - Use packed enum for 'source'. - Rearrange to avoid unnecessary padding. - Remove 'private_p', which was used only in two cases, both of which had the invariant ('l2' == 'data'), so we can temporarily use 'l2' as a private pointer. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-03-29 17:22:19 -07:00
Andy Zhou	572f732ab0	dpif-netdev: user space datapath recirculation Add basic recirculation infrastructure and user space data path support for it. The following bond mega flow patch will make use of this infrastructure. Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-03-25 13:24:39 -07:00
Pravin	df1e5a3bc7	netdev: Extend rx_recv to pass multiple packets. DPDK can receive multiple packets but the current netdev API does not allow that. The following patch allows dpif-netdev to receive a batch of packets in a rx_recv() call for any netdev port. This will be used by dpdk-netdev. Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-03-21 11:48:28 -07:00


66 Commits