2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

58 Commits

Author SHA1 Message Date
Andy Zhou
70a0573d0a odp: Fix sample action in userspace
User space implementation of the sample action is not consistent with
kernel datapath. In kernel datapath, the side effects of actions
within the sample actions are not visible to the subsequent actions.
Current user space handling does not follow the same logic. This patch
makes them consistent.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-02-03 14:43:15 -08:00
Andy Zhou
72c84bc2db dp-packet: Enhance packet batch APIs.
One common use case of 'struct dp_packet_batch' is to process all
packets in the batch in order. Add an iterator for this use case
to simplify the logic of calling sites,

Another common use case is to drop packets in the batch, by reading
all packets, but writing back pointers of fewer packets. Add macros
to support this use case.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2017-01-26 17:35:29 -08:00
Andy Zhou
535e3acfa7 dpif-netdev: Add clone action
Add support for userspace datapath clone action.  The clone action
provides an action envelope to enclose an action list.
For example, with actions A, B, C and D,  and an action list:
      A, clone(B, C), D

The clone action will ensure that:

- D will see the same packet, and any meta states, such as flow, as
  action B.

- D will be executed regardless whether B, or C drops a packet. They
  can only drop a clone.

- When B drops a packet, clone will skip all remaining actions
  within the clone envelope. This feature is useful when we add
  meter action later:  The meter action can be implemented as a
  simple action without its own envolop (unlike the sample action).
  When necessary, the flow translation layer can enclose a meter action
  in clone.

The clone action is very similar with the OpenFlow clone action.
This is by design to simplify vswitchd flow translation logic.

Without datapath clone, vswitchd simulate the effect by inserting
datapath actions to "undo" clone actions. The above flow will be
translated into   A, B, C, -C, -B, D.

However, there are two issues:
- The resulting datapath action list may be longer without using
  clone.

- Some actions, such as NAT may not be possible to reverse.

This patch implements clone() simply with packet copy. The performance
can be improved with later patches, for example, to delay or avoid
packet copy if possible.  It seems datapath should have enough context
to carry out such optimization without the userspace context.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2017-01-23 22:58:34 -08:00
Jarno Rajahalme
932c96b7b0 odp: Use struct in6_addr for IPv6 addresses.
Code is simplified when the ODP keys use the same type as the struct
flow for the IPv6 addresses.  As the change is facilitated by
extract-odp-netlink-h, this change only affects the userspace.  We
already do the same for the ethernet addresses.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-01-04 16:31:06 -08:00
Zoltán Balogh
fc05230631 odp-execute: Optimize IP header modification in OVS datapath
I measured the packet processing cost of OVS DPDK datapath for different
OpenFlow actions. I configured OVS to use a single pmd thread and
measured the packet throughput in a phy-to-phy setup. I used 10G
interfaces bounded to DPDK driver and overloaded the vSwitch with 64
byte packets through one of the 10G interfaces.

The processing cost of the dec_ttl action seemed to be gratuitously high
compared with other actions.

I looked into the code and saw that dec_ttl is encoded as a masked
nested attribute in OVS_ACTION_ATTR_SET_MASKED(OVS_KEY_ATTR_IPV4).
That way, OVS datapath can modify several IP header fields (TTL, TOS,
source and destination IP addresses) by a single invocation of
packet_set_ipv4() in the odp_set_ipv4() function in the
lib/odp-execute.c file. The packet_set_ipv4() function takes the new
TOS, TTL and IP addresses as arguments, compares them with the actual
ones and updates the fields if needed. This means, that even if only TTL
needs to be updated, each of the four IP header fields is passed to the
callee and is compared to the actual field for each packet.

The odp_set_ipv4() caller function possesses information about the
fields that need to be updated in the 'mask' structure. The idea is to
spare invocation of the packet_set_ipv4() function but use its code
parts directly. So the 'mask' can be used to decide which IP header
fields need to be updated. In addition, a faster packet processing can
be achieved if the values of local variables are
calculated right before their usage.

       | T | T | I | I |
       | T | O | P | P |  Vanilla OVS  ||  + new patch
       | L | S | s | d | (nsec/packet) || (nsec/packet)
-------+---+---+---+---+---------------++---------------
output |   |   |   |   |    67.19      ||    67.19
       | X |   |   |   |    74.48      ||    68.78
       |   | X |   |   |    74.42      ||    70.07
       |   |   | X |   |    84.62      ||    78.03
       |   |   |   | X |    84.25      ||    77.94
       |   |   | X | X |    97.46      ||    91.86
       | X |   | X | X |   100.42      ||    96.00
       | X | X | X | X |   102.80      ||   100.73

The table shows the average processing cost of packets in nanoseconds
for the following actions:
output; output + dec_ttl; output + mod_nw_tos; output + mod_nw_src;
output + mod_nw_dst and some of their combinations.
I ran each test five times. The values are the mean of the readings
obtained.

I added OVS_LIKELY to the 'if' condition for the TTL field, since as far
as I know, this field will typically be decremented when any field of
the IP header is modified.

Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-12-22 17:55:13 -08:00
Jesse Gross
8d8ab6c2d5 tun-metadata: Manage tunnel TLV mapping table on a per-bridge basis.
When using tunnel TLVs (at the moment, this means Geneve options), a
controller must first map the class and type onto an appropriate OXM
field so that it can be used in OVS flow operations. This table is
managed using OpenFlow extensions.

The original code that added support for TLVs made the mapping table
global as a simplification. However, this is not really logically
correct as the OpenFlow management commands are operating on a per-bridge
basis. This removes the original limitation to make the table per-bridge.

One nice result of this change is that it is generally clearer whether
the tunnel metadata is in datapath or OpenFlow format. Rather than
allowing ad-hoc format changes and trying to handle both formats in the
tunnel metadata functions, the format is more clearly separated by function.
Datapaths (both kernel and userspace) use datapath format and it is not
changed during the upcall process. At the beginning of action translation,
tunnel metadata is converted to OpenFlow format and flows and wildcards
are translated back at the end of the process.

As an additional benefit, this change improves performance in some flow
setup situations by keeping the tunnel metadata in the original packet
format in more cases. This helps when copies need to be made as the amount
of data touched is only what is present in the packet rather than the
maximum amount of metadata supported.

Co-authored-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-09-19 09:52:22 -07:00
William Tu
aaca4fe0ce ofp-actions: Add truncate action.
The patch adds a new action to support packet truncation.  The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).

One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload.  Example
use case is below as well as shown in the testcases:

    - Output to port 1 with max_len 100 bytes.
    - The output packet size on port 1 will be MIN(original_packet_size, 100).
    # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

    - The scope of max_len is limited to output action itself.  The following
      packet size of output:1 and output:2 will be intact.
    # ovs-ofctl add-flow br0 \
            'actions=output(port=1,max_len=100),output:1,output:2'
    - The Datapath actions shows:
    # Datapath actions: trunc(100),1,1,2

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 09:17:00 -07:00
Pravin B Shelar
1895cc8dbb dpif-netdev: create batch object
DPDK datapath operate on batch of packets. To pass the batch of
packets around we use packets array and count.  Next patch needs
to associate meta-data with each batch of packets. So Introducing
a batch structure to make handling the metadata easier.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-05-18 19:39:18 -07:00
Simon Horman
31a9a58452 packets: use flow protocol when recalculating ipv6 checksums
When using masked actions the ipv6_proto field of an action
to set IPv6 fields may be zero rather than the prevailing protocol
which will result in skipping checksum recalculation.

This patch resolves the problem by relying on the protocol
in the packet rather than that in the set field action.

A similar fix for the kernel datapath has been accepted into David Miller's
'net' tree as b4f70527f052 ("openvswitch: use flow protocol when
recalculating ipv6 checksums").

Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Fixes: 6d670e7f0d45 ("lib/odp: Masked set action execution and printing.")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-04-23 14:41:49 +10:00
Justin Pettit
f6ecf944a9 vswitchd: Allow modifying ICMP type and code.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2015-11-09 15:01:50 -08:00
Joe Stringer
9daf23484f Add connection tracking label support.
This patch adds a new 128-bit metadata field to the connection tracking
interface. When a label is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_label" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a label with
those connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:16 -07:00
Joe Stringer
8e53fe8cf7 Add connection tracking mark support.
This patch adds a new 32-bit metadata field to the connection tracking
interface. When a mark is specified as part of the ct action and the
connection is committed, the value is saved with the current connection.
Subsequent ct lookups with the table specified will expose this metadata
as the "ct_mark" field in the flow.

For example, to allow new TCP connections from port 1->2 and only allow
established connections from port 2->1, and to associate a mark with those
connections:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
    table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
    table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00
Joe Stringer
07659514c3 Add support for connection tracking.
This patch adds a new action and fields to OVS that allow connection
tracking to be performed. This support works in conjunction with the
Linux kernel support merged into the Linux-4.3 development cycle.

Packets have two possible states with respect to connection tracking:
Untracked packets have not previously passed through the connection
tracker, while tracked packets have previously been through the
connection tracker. For OpenFlow pipeline processing, untracked packets
can become tracked, and they will remain tracked until the end of the
pipeline. Tracked packets cannot become untracked.

Connections can be unknown, uncommitted, or committed. Packets which are
untracked have unknown connection state. To know the connection state,
the packet must become tracked. Uncommitted connections have no
connection state stored about them, so it is only possible for the
connection tracker to identify whether they are a new connection or
whether they are invalid. Committed connections have connection state
stored beyond the lifetime of the packet, which allows later packets in
the same connection to be identified as part of the same established
connection, or related to an existing connection - for instance ICMP
error responses.

The new 'ct' action transitions the packet from "untracked" to
"tracked" by sending this flow through the connection tracker.
The following parameters are supported initally:

- "commit": When commit is executed, the connection moves from
  uncommitted state to committed state. This signals that information
  about the connection should be stored beyond the lifetime of the
  packet within the pipeline. This allows future packets in the same
  connection to be recognized as part of the same "established" (est)
  connection, as well as identifying packets in the reply (rpl)
  direction, or packets related to an existing connection (rel).
- "zone=[u16|NXM]": Perform connection tracking in the zone specified.
  Each zone is an independent connection tracking context. When the
  "commit" parameter is used, the connection will only be committed in
  the specified zone, and not in other zones. This is 0 by default.
- "table=NUMBER": Fork pipeline processing in two. The original instance
  of the packet will continue processing the current actions list as an
  untracked packet. An additional instance of the packet will be sent to
  the connection tracker, which will be re-injected into the OpenFlow
  pipeline to resume processing in the specified table, with the
  ct_state and other ct match fields set. If the table is not specified,
  then the packet is submitted to the connection tracker, but the
  pipeline does not fork and the ct match fields are not populated. It
  is strongly recommended to specify a table later than the current
  table to prevent loops.

When the "table" option is used, the packet that continues processing in
the specified table will have the ct_state populated. The ct_state may
have any of the following flags set:

- Tracked (trk): Connection tracking has occurred.
- Reply (rpl): The flow is in the reply direction.
- Invalid (inv): The connection tracker couldn't identify the connection.
- New (new): This is the beginning of a new connection.
- Established (est): This is part of an already existing connection.
- Related (rel): This connection is related to an existing connection.

For more information, consult the ovs-ofctl(8) man pages.

Below is a simple example flow table to allow outbound TCP traffic from
port 1 and drop traffic from port 2 that was not initiated by port 1:

    table=0,priority=1,action=drop
    table=0,arp,action=normal
    table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
    table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
    table=1,in_port=2,ct_state=+trk+est,tcp,action=1
    table=1,in_port=2,ct_state=+trk+new,tcp,action=drop

Based on original design by Justin Pettit, contributions from Thomas
Graf and Daniele Di Proietto.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-10-13 15:34:15 -07:00
Jarno Rajahalme
74ff3298c8 userspace: Define and use struct eth_addr.
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace.  The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.

"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.

struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned.  All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.

As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.

This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.

This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes.  However, I think this
might be a nice code readability improvement by itself.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-28 14:55:11 -07:00
Jesse Gross
6728d578f6 dpif-netdev: Translate Geneve options per-flow, not per-packet.
The kernel implementation of Geneve options stores the TLV option
data in the flow exactly as received, without any further parsing.
This is then translated to known options for the purposes of matching
on flow setup (which will then install a datapath flow in the form
the kernel is expecting).

The userspace implementation behaves a little bit differently - it
looks up known options as each packet is received. The reason for this
is there is a much tighter coupling between datapath and flow translation
and the representation is generally expected to be the same. This works
but it incurs work on a per-packet basis that could be done per-flow
instead.

This introduces a small translation step for Geneve packets between
datapath and flow lookup for the userspace datapath in order to
allow the same kind of processing that the kernel does. A side effect
of this is that unknown options are now shown when flows dumped via
ovs-appctl dpif/dump-flows, similar to the kernel.

There is a second benefit to this as well: for some operations it is
preferable to keep the options exactly as they were received on the wire,
which this enables. One example is that for packets that are executed from
ofproto-dpif-upcall to the datapath, this avoids the translation of
Geneve metadata. Since this conversion is potentially lossy (for unknown
options), keeping everything in the same format removes the possibility
of dropping options if the packet comes back up to userspace and the
Geneve option translation table has changed. To help with these types of
operations, most functions can understand both formats of data and seamlessly
do the right thing.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-05 20:26:48 -07:00
Thomas F. Herbert
d694339457 Add support functions for 8021.ad push and pop vlan.
Changes to allow the tpid to be specified and all vlan tpid checking to be
generalized.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-06-07 11:35:21 -07:00
Joe Stringer
db8bb9a51e odp-execute: Refactor determining dpif assistance.
To be more explicit about which actions require datapath assistance,
split this out into a separate function. While this is fairly trivial
currently, there will be more special cases for the upcoming conntrack
changes.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-05-29 15:27:58 -07:00
Daniele Di Proietto
2bc1bbd27d dp-packet: Rename 'dp_hash' in 'rss_hash'.
We already have the 'dp_hash' embedded in the metadata.  This caused
confusion in the code.  With this commit it should be clear that
'rss_hash' is the packet hash used for internal purposes, while
'md.dp_hash' is part of the flow, computed during the execution of
certain actions.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-20 12:49:41 -07:00
Pravin B Shelar
cf62fa4c70 dp-packet: Remove ofpbuf dependency.
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-03 13:37:37 -08:00
Pravin B Shelar
e14deea0bd dpif_packet: Rename to dp_packet
dp_packet is short and better name for datapath packet
structure.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-03-03 13:37:34 -08:00
Sharo, Randall A CIV SPAWARSYSCEN-ATLANTIC, 55200
e60e935b1f Implement set-field for IPv6 ND fields (nd_target, nd_sll, and nd_tll).
This patch adds set-field operations for nd_target, nd_sll, and nd_tll
fields, with and without masks, using Nicira extensions and OpenFlow 1.2
protocol.

Signed-off-by: Randall A Sharo <randall.sharo at navy.mil>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-01-13 16:22:44 -08:00
Alex Wang
1368af0c85 FreeBSD: Fix build failure.
This commit fixes an include dependency for header ip6.h, on
FreeBSD.  Without this commit, the gmake of ovs master on
FreeBSD will result in the following error.

/usr/include/netinet/ip6.h:82: error: field 'ip6_src' has incomplete type
/usr/include/netinet/ip6.h:83: error: field 'ip6_dst' has incomplete type

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-01-04 21:00:15 -08:00
Pravin B Shelar
a36de779d7 openvswitch: Userspace tunneling.
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.

Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-12 15:08:33 -08:00
Jarno Rajahalme
b8778a0d0b Fix setting transport ports with frags.
Packets with 'LATER' fragment do not have a transport header, so it is
not possible to either match on or set transport ports on such
packets.  Matching is prevented by augmenting mf_are_prereqs_ok() with
a nw_frag 'LATER' bit check.  Setting the transport headers on such
packets is prevented in three ways:

1. Flows with an explicit match on nw_frag, where the LATER bit is 1:
   existing calls to the modified mf_are_prereqs_ok() prohibit using
   transport header fields (port numbers) in OXM/NXM actions
   (set_field, move).  SET_TP_* actions need a new check on the LATER
   bit.

2. Flows that wildcard the nw_frag LATER bit: At flow translation
   time, add calls to mf_are_prereqs_ok() to make sure that we do not
   use transport ports in flows that do not have them.

3. At action execution time, do not set transport ports, if the packet
   does not have a full transport header.  This ensures that we never
   call the packet_set functions, that require a valid transport
   header, with packets that do not have them.  For example, if the
   flow was created with a IPv6 first fragment that had the full TCP
   header, but the next packet's first fragment is missing them.

3 alone would suffice for correct behavior, but 1 and 2 seem like a
right thing to do, anyway.

Currently, if we are setting port numbers, we will also match them,
due to us tracking the set fields with the same flow_wildcards as the
matched fields.  Hence, if the incoming port number was not zero, the
flow would not match any packets with missing or truncated transport
headers.  However, relying on no packets having zero port numbers
would not be very robust.  Also, we may separate the tracking of set
and matched fields in the future, which would allow some flows that
blindly set port numbers to not match on them at all.

For TCP in case 3 we use ofpbuf_get_tcp_payload() that requires the
whole (potentially variable size) TCP header to be present.  However,
when parsing a flow, we only require the fixed size portion of the TCP
header to be present, which would be enough to set the port numbers
and fix the TCP checksum.

Finally, we add tests testing the new behavior.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-10 13:40:03 -08:00
Pravin B Shelar
41ccaa249c netdev-dpif: Add metadata to dpif-packet.
Today dpif-netdev has single metadat for given batch, since one
batch belongs to one port, but soon packets fro single tunnel ports
can belong to different ports, so we need to have per packet metadata.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-09 14:12:11 -07:00
Daniele Di Proietto
1164afb6cc odp-execute: Refactor odp_execute_{actions, sample}()
Firstly, with this change, the 'more_actions' parameter is removed and
is integrated into 'steal'. Then, every function that receives a batch
of packets with 'steal' set to true is responsible for freeing the
packets. Finally, odp_execute_actions() and odp_execute_actions__()
can be be merged.

This also fixes a memory leak in odp_execute_sample(), when the
subactions are not executed

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-03 15:04:15 -07:00
Daniele Di Proietto
0057762a2f odp-execute: Fix memory leak on recirc action
If odp_execute_actions() has been called with 'steal' set to true and
OVS_ACTION_ATTR_RECIRC as last action, it should allow dp_execute_cb()
to steal the packet.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-10-03 15:04:15 -07:00
Daniele Di Proietto
f7c2f97d57 lib/odp-execute: Use dpif_packet_set_dp_hash() instead of ->dp_hash
When building with DPDK support, 'struct dpif_packet' won't have 'dp_hash'
member. dpif_packet_set_dp_hash() and dpif_packet_get_dp_hash() should be used.

Furthermore, the masked set action shouldn't read 'md->dp_hash' (which is
shared in a batch), but should use dpif_packet_get_dp_hash() to get each packet
private hash.

This commit fixes the build with DPDK.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-09-09 14:21:41 -07:00
Jarno Rajahalme
6d670e7f0d lib/odp: Masked set action execution and printing.
Add a new action type OVS_ACTION_ATTR_SET_MASKED, and support for
parsing, printing, and committing them.

Masked set actions add a mask, immediately following the netlink
attribute data, within the netlink attribute itself.  Thus the key
attribute size for a masked set action is exactly double of the
non-masked set action.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-09-08 14:57:08 -07:00
Daniele Di Proietto
61a2647e15 packet-dpif: Add dpif_packet_{get, set}_hash()
These function are used to stored the packet hash. 'netdev-dpdk'
automatically set this value to the RSS hash returned by the
NIC. Other 'netdev's set it to 0 (which is an invalid hash
value), so that callers can compute the hash on their own.

If DPDK support is enabled, struct dpif_packet's member
'dp_hash' is removed and 'pkt.hash.rss' from DPDK mbuf is used

This commit also configure DPDK devices to compute RSS hash
for UDP and IPv6 packets

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-29 16:32:21 -07:00
Ben Pfaff
837eefc76b Do not seemingly #include Linux-specific headers on other platforms.
Until now, the OVS source tree has had a whole maze of header files that
make "#include <linux/openvswitch.h>" work OK regardless of platform, but
this confuses everyone new to the tree, at first glance, and is difficult
to understand at second glance too.

This commit renames include/linux/openvswitch.h to
datapath/linux/compat/include/linux/openvswitch.h without other change,
then modifies the userspace build to generate a header that makes sense in
portable Open vSwitch userspace from that header.

It then removes all the remaining include/linux/* files since they are now
unused.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2014-08-04 11:11:40 -07:00
Pravin B Shelar
e381def971 lib: Rename ofp to buf.
dpif-packet contains ofpbuf which points to packet data.  Here buf
is better name rather than ofp.
Following patch renames all remaining instances of ofp variable.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-06-25 09:28:42 -07:00
Daniele Di Proietto
8cbf4f479b dpif-netdev: batch packet processing
This change in dpif-netdev allows faster packet processing for devices which
implement batching (netdev-dpdk currently).

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-23 14:41:15 -07:00
Daniele Di Proietto
910885540a dpif-netdev: use dpif_packet structure for packets
This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-23 14:41:12 -07:00
Andy Zhou
c6bf49f3fa dpif: Fix slow action handling for DP_HASH and RECIRC
In case DP_HASH and RECIRC actions need to be executed in slow path,
current implementation simply don't handle them -- vswitchd simply
crashes. This patch fixes them by supply an implementation for them.

RECIRC will be handled by the datapath, same as the output action.

DP_HASH, on the other hand, is handled in the user space. Although the
resulting hash values may not match those computed by the datapath, it
is less expensive; current use case (bonding) does not require a strict
match to work properly.

Reported-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
2014-06-04 14:06:40 -07:00
Andy Zhou
347bf289b3 dpif-netdev: Move hash function out of the recirc action, into its own action
Currently recirculation action can optionally compute hash. This patch
adds a hash action that is independent of the recirc action, which
no longer computes hash.  For megaflow bond with recirc, the output
to a bond port action will look like:

    hash(hash_l4(0)), recirc(<recirc_id>)

Obviously, when a recirculation application that does not depend on
hash value can just use the recirc action alone.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Pravin B Shelar <pshelar@nicira.com
2014-04-16 15:30:42 -07:00
Andy Zhou
adcf00ba35 ofproto/bond: Implement bond megaflow using recirculation
Infrastructure to enable megaflow support for bond ports using
recirculation. This patch adds the following features:
* Generate RECIRC action when bond can benefit from recirculation.
* Populate post recirculation rules in a hidden table. Currently table 254.
* Uses post recirculation rules for bond rebalancing
* A recirculation implementation in dpif-netdev.

The goal of this patch is to be able to megaflow bond outputs and
thus greatly improve performance. However, this patch does not
actually improve the megaflow generation. It is left for a later commit.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-04-07 19:55:30 -07:00
Jarno Rajahalme
cf3b753866 ofpbuf: Abstract 'l2' pointer and document usage conventions.
Rename 'l2' to 'frame' and add new ofpbuf_set_frame() and ofpbuf_l2().
ofpbuf_set_frame() alse resets all the layer offsets.  ofpbuf_l2()
returns NULL if the packet has no Ethernet header, as indicated either
by unset l3 offset or NULL frame pointer.  Callers of ofpbuf_l2() are
supposed to check the return value, unless they can otherwise be sure
that the packet has a valid Ethernet header.

The recent commit 437d0d22 made some assumptions that were not valid
regarding the use of the 'l2' pointer in rconn module and by
compose_rarp().  This is now fixed as follows: rconn now relies on the
fact that once OpenFlow messages are given to rconn for transport, the
frame pointer is no longer needed to refer to the OpenFlow header; and
compose_rarp() now sets the frame pointer and offsets as expected.

In addition to storing network frames, ofpbufs are also used for
handling OpenFlow messages and action lists.  lib/ofpbuf.h now has a
comment documenting the current usage conventions and invariants.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-04-03 11:51:59 -07:00
Jarno Rajahalme
6b8c377a6e ofpbuf: Rename trivial _get_ functions without the "get".
Code reads better without the "get", for example "ofpbuf_l3()"
v.s. "ofpbuf_get_l3()".  L4 payoad access functions still use the
"get" (e.g., "ofpbuf_get_tcp_payload()").

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-04-03 11:51:54 -07:00
Jarno Rajahalme
437d0d22ab lib/ofpbuf: Compact
This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit
systems, or from 52 to 36 bytes on 32-bit systems (counting in the
'l7' removal from an earlier patch).  This may help contribute to
cache efficiency, and will speed up initializing, copying and
manipulating ofpbufs.  This is potentially important for the DPDK
datapath, but the rest of the code base may also see a little benefit.

Changes are:

- Remove 'l7' pointer (previous patch).
- Use offsets instead of layer pointers for l2_5, l3, and l4 using
  'l2' as basis.  Usually 'data' is the same as 'l2', but this is not
  always the case (e.g., when parsing or constructing a packet), so it
  can not be easily used as the offset basis.  Also, packet parsing is
  faster if we do not need to maintain the offsets each time we pull
  data from the ofpbuf.
- Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for
  largest possible messages/packets.
- Use packed enum for 'source'.
- Rearrange to avoid unnecessary padding.
- Remove 'private_p', which was used only in two cases, both of which
  had the invariant ('l2' == 'data'), so we can temporarily use 'l2'
  as a private pointer.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-03-29 17:22:19 -07:00
Andy Zhou
572f732ab0 dpif-netdev: user space datapath recirculation
Add basic recirculation infrastructure and user space
data path support for it. The following bond mega flow patch will
make use of this infrastructure.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-25 13:24:39 -07:00
Pravin
df1e5a3bc7 netdev: Extend rx_recv to pass multiple packets.
DPDK can receive multiple packets but current netdev API does
not allow that.  Following patch allows dpif-netdev receive batch
of packet in a rx_recv() call for any netdev port.  This will be
used by dpdk-netdev.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-03-21 11:48:28 -07:00
Simon Horman
1bf02876a4 lib: Add tpid parameter to eth_push_vlan()
This is in preparation for pushing vlan tags
using the TPID provided by the kernel via auxdata.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-16 14:38:21 -08:00
Jarno Rajahalme
758c456df5 dpif: Use explicit packet metadata.
This helps reduce confusion about when a flow is a flow and when it is
just metadata.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2013-12-30 16:52:43 -08:00
Jarno Rajahalme
09f9da0bca odp-execute: Consolidate callbacks.
Use one callback instead of many, helps in adding new functionality
later on.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2013-12-30 15:58:58 -08:00
Harold Lim
428b2eddc9 Rename NOT_REACHED to OVS_NOT_REACHED
This allows other libraries to use util.h that has already
defined NOT_REACHED.

Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-12-17 13:16:39 -08:00
Jarno Rajahalme
da546e0764 dpif: Allow execute to modify the packet.
Allowing the packet to be modified by execution allows less data
copying for userspace action execution.  Some users of the
dpif_execute already expect that the packet may be modified.  This
patch makes this behavior uniform and makes the userspace datapath and
the execution helpers modify the packet as it is being executed.
Userspace action now steals the packet if given permission, as the
packet is normally not needed after it.  The only exception is the
sample action, and this is accounted for my keeping track of any
actions that could be following the userspace action.

The packet in dpif_upcall is changed from a pointer to a struct,
allowing the packet to be honest about it's headroom.  After this
change the packet can safely be pushed on over the precarious 4 byte
limit earlier allowed by the netlink data preceding the packet.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2013-12-16 08:14:52 -08:00
Jarno Rajahalme
dc235f7fbc TCP flags matching support.
tcp_flags=flags/mask
        Bitwise  match on TCP flags.  The flags and mask are 16-bit num‐
        bers written in decimal or in hexadecimal prefixed by 0x.   Each
        1-bit  in  mask requires that the corresponding bit in port must
        match.  Each 0-bit in mask causes the corresponding  bit  to  be
        ignored.

        TCP  protocol  currently  defines  9 flag bits, and additional 3
        bits are reserved (must be transmitted as zero), see  RFCs  793,
        3168, and 3540.  The flag bits are, numbering from the least
	significant bit:

        0: FIN No more data from sender.

        1: SYN Synchronize sequence numbers.

        2: RST Reset the connection.

        3: PSH Push function.

        4: ACK Acknowledgement field significant.

        5: URG Urgent pointer field significant.

        6: ECE ECN Echo.

        7: CWR Congestion Windows Reduced.

        8: NS  Nonce Sum.

        9-11:  Reserved.

        12-15: Not matchable, must be zero.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2013-10-29 09:43:59 -07:00
Alex Wang
c51876c3af odp-execute: Fix possible segfault.
In current code, the odp_execute_actions() function does not check
the value of "userspace" function pointer before invoking it.  This
patch adds a check for it.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-10-21 15:54:19 -07:00
Ben Pfaff
f6c8a6b163 Add software switch support for modifying ARP headers in OpenFlow.
This support is added through the userspace slow path, because we don't
judge that this is important enough to require permanent support in the
Linux kernel ABI.

Bug #19259.
CC: Teemu Koponen <koponen@nicira.com>
CC: Pankaj Thakkar <thakkar@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-10-09 17:37:30 -07:00