ofpbuf was complicated due to its wide usage across all
layers of OVS, Now we have introduced independent dp_packet
which can be used for datapath packet, we can simplify ofpbuf.
Following patch removes DPDK mbuf and access API of ofpbuf
members.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
dp_packet is short and better name for datapath packet
structure.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This patch adds set-field operations for nd_target, nd_sll, and nd_tll
fields, with and without masks, using Nicira extensions and OpenFlow 1.2
protocol.
Signed-off-by: Randall A Sharo <randall.sharo at navy.mil>
Signed-off-by: Ben Pfaff <blp@nicira.com>
struct list is a common name and can't be used in public headers.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.
Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Ethernet frames may contain padding after the IP payload. When
parsing IP packets, check the IP total size (IPv4) or IP payload size
(IPv6) to detect the size of l2 padding. The l2 padding size is
stored in the ofpbuf to prevent ofpbuf_pull from entering the padding,
as well as to allow ofpbuf_l4_size() to return the size of the IP
payload without the l2 padding.
This helps avoiding parsing truncated transport headers, for example.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The return type of ofpbuf_tail() and ofpbuf_end() is pointer, not byte.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.
This commit resolves the issue with the following changes:
- Every non pmd thread has the same lcore_id (0, for management reasons), which
is not shared with any pmd thread (lcore_id for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
threads: therefore this commit partially revert 143859ec63. Now packets
are copied for upcall processing. We can remove the extra memcpy by
processing upcalls in the pmd thread itself.
With the introduction of the extra locking, the packet throughput will be lower
in the following cases:
- When using internal (tap) devices with DPDK devices on the same datapath.
Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
can be avoided by handling the upcalls directly in pmd threads (a change that
has already been proposed by Ryan Wilson)
Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
mempools
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
When a bridge of datatype type netdev receives a packet, it
copies the packet from the NIC to a buffer in userspace.
Currently, when making an upcall, the packet is again copied
to the upcall's buffer. However, this extra copy is not
necessary when the datapath exists in userspace as the upcall
can directly access the packet data.
This patch eliminates this extra copy of the packet data in
most cases. In cases where the packet may still be used later
by callers of dp_netdev_execute_actions, making a copy of the
packet data is still necessary.
This patch also adds a dpdk_buf field to 'struct ofpbuf' when
using DPDK. This field holds a pointer to the allocated DPDK
buffer in the rte_mempool. Thus, an upcall packet ofpbuf
allocated on the stack can now share data and free memory of
a rte_mempool allocated ofpbuf.
Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Direct use of 'data', 'base', and 'size' will break DPDK builds. Try
to wean us off the habit by renaming the fields.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
Rename 'l2' to 'frame' and add new ofpbuf_set_frame() and ofpbuf_l2().
ofpbuf_set_frame() alse resets all the layer offsets. ofpbuf_l2()
returns NULL if the packet has no Ethernet header, as indicated either
by unset l3 offset or NULL frame pointer. Callers of ofpbuf_l2() are
supposed to check the return value, unless they can otherwise be sure
that the packet has a valid Ethernet header.
The recent commit 437d0d22 made some assumptions that were not valid
regarding the use of the 'l2' pointer in rconn module and by
compose_rarp(). This is now fixed as follows: rconn now relies on the
fact that once OpenFlow messages are given to rconn for transport, the
frame pointer is no longer needed to refer to the OpenFlow header; and
compose_rarp() now sets the frame pointer and offsets as expected.
In addition to storing network frames, ofpbufs are also used for
handling OpenFlow messages and action lists. lib/ofpbuf.h now has a
comment documenting the current usage conventions and invariants.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Code reads better without the "get", for example "ofpbuf_l3()"
v.s. "ofpbuf_get_l3()". L4 payoad access functions still use the
"get" (e.g., "ofpbuf_get_tcp_payload()").
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
On DPDK packet recv, ovs is given pointer to mbuf which has
information about a packet, for example pointer to data and size.
By moving mbuf to ofpbuf we can let dpdk allocate ofpbuf and
pass that to ovs for processing the packet.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
netdev-dpdk uses this pointer to store dpdk mbuf. This patch fixes
compilation error in dpdk.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit
systems, or from 52 to 36 bytes on 32-bit systems (counting in the
'l7' removal from an earlier patch). This may help contribute to
cache efficiency, and will speed up initializing, copying and
manipulating ofpbufs. This is potentially important for the DPDK
datapath, but the rest of the code base may also see a little benefit.
Changes are:
- Remove 'l7' pointer (previous patch).
- Use offsets instead of layer pointers for l2_5, l3, and l4 using
'l2' as basis. Usually 'data' is the same as 'l2', but this is not
always the case (e.g., when parsing or constructing a packet), so it
can not be easily used as the offset basis. Also, packet parsing is
faster if we do not need to maintain the offsets each time we pull
data from the ofpbuf.
- Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for
largest possible messages/packets.
- Use packed enum for 'source'.
- Rearrange to avoid unnecessary padding.
- Remove 'private_p', which was used only in two cases, both of which
had the invariant ('l2' == 'data'), so we can temporarily use 'l2'
as a private pointer.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Now that we don't need to parse TCP flags from the packet after
extraction, we usually do not need the 'l7' pointer any more. When
needed, ofpbuf_get_tcp|udp|sctp|icmp_payload() or ofpbuf_get_l4_size()
can be used instead.
Removal of 'l7' was requested by Pravin for the DPDK datapath work, as
it simplifies packet parsing a bit.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Inline the most trivial ofpbuf functions to allow for better optimization.
Also inline the most often used ofpbuf_pull() and ofpbuf_try_pull(), which
should help streamline packet parsing.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Following patch adds DPDK netdev-class to userspace datapath. Now
OVS can use DPDK port for IO by just configuring DPDK port and then
adding dpdk type port to userspace datapath.
Refer to INSTALL.DPDK doc for further info.
This is based a patch from Gerald Rogers.
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Allowing the packet to be modified by execution allows less data
copying for userspace action execution. Some users of the
dpif_execute already expect that the packet may be modified. This
patch makes this behavior uniform and makes the userspace datapath and
the execution helpers modify the packet as it is being executed.
Userspace action now steals the packet if given permission, as the
packet is normally not needed after it. The only exception is the
sample action, and this is accounted for my keeping track of any
actions that could be following the userspace action.
The packet in dpif_upcall is changed from a pointer to a struct,
allowing the packet to be honest about it's headroom. After this
change the packet can safely be pushed on over the precarious 4 byte
limit earlier allowed by the netlink data preceding the packet.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch implements use-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Poping a single MPLS label
* Modifying an MPLS lable using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com>
Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
My original intent for ofpbufs initialized with ofpbuf_use_stack() was that
the caller was providing enough space on the stack for the common case,
with dynamic allocation as a fallback. But in practice, none of the
clients actually do this. Instead, all of them actually know that the
stack-allocated buffer is big enough and, since they don't want to bother
with having to call ofpbuf_delete(), they instead assert that the buffer
wasn't reallocated.
Since this is a bit of a pain, this commit changes the semantics of
ofpbuf_use_stack() to be that the stack-allocated buffer cannot be
reallocated at all. This is more convenient for the existing clients.
This new function is a simple helper that creates a new ofpbuf with some
initial contents plus a caller-specified amount of headroom.
This will be used in upcoming commits.
Acked-by: Jesse Gross <jesse@nicira.com>
This new function is useful in a situation where a small stack-allocated
buffer is usually appropriate but occasionally it must be expanded.
Acked-by: Jesse Gross <jesse@nicira.com>
ovs_queue doesn't seem very useful; it's just a singly-linked list. It's
more generally useful to use a general-purpose "struct list" for lists of
packets, so this commit adds such a member to "struct ofpbuf" and shifts
the existing users to use it.