mirror of https://github.com/openvswitch/ovs synced 2025-10-25 15:07:05 +00:00
Commit Graph

81 Commits

Author SHA1 Message Date
Ben Pfaff
fea393b1d6 datapath: Describe policy for extending flow key, implement needed changes.
When the datapath was converted to use Netlink attributes for describing
flow keys, I had a vague idea of how it could be smoothly extensible, but
I didn't actually implement extensibility or carefully think it through.
This commit adds a document that describes how flow keys can be extended
in a compatible fashion and adapts the existing interface to match what
it says.

This commit doesn't actually implement extensibility.  I already have a
separate patch series out for that.  This patch series borrows from that
one heavily, but the extensibility series will need to be reworked
somewhat once this one is in.

This commit is only lightly tested because I don't have a good test setup
for VLANs.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-14 16:52:51 -08:00
Ben Pfaff
34118caede datapath: Allow flow key Netlink attributes to appear in any order.
This is a more conventional use of Netlink.

For upstreaming, 'u64 attrs' can be changed to u32 and the uses of 1ULL
can be changed to 1.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-14 15:09:01 -08:00
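The "u64 attrs" bitmap mentioned in 34118caede can be sketched roughly as
follows.  This is a minimal userspace illustration, not the datapath code:
the attribute types, "struct attr", and collect_attrs() are hypothetical
names, and rejecting unknown or repeated attributes is an assumption made
here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types, for illustration only. */
enum key_attr {
    KEY_ATTR_IN_PORT = 1,
    KEY_ATTR_ETHERNET,
    KEY_ATTR_IPV4,
    KEY_ATTR_MAX = KEY_ATTR_IPV4
};

/* Simplified stand-in for a Netlink attribute header. */
struct attr {
    uint16_t type;
    uint16_t len;
};

/* Accepts attributes in any order and records each type seen as one bit
 * in '*seen'.  Returns 0 on success, -1 on an unknown or repeated type
 * (both rejections are assumptions of this sketch). */
static int collect_attrs(const struct attr *attrs, size_t n, uint64_t *seen)
{
    uint64_t bits = 0;
    size_t i;

    assert(seen != NULL);
    for (i = 0; i < n; i++) {
        uint16_t type = attrs[i].type;

        if (type == 0 || type > KEY_ATTR_MAX)
            return -1;
        if (bits & (1ULL << type))
            return -1;
        bits |= 1ULL << type;
    }
    *seen = bits;
    return 0;
}
```

Because each attribute type maps to one bit, the caller can check for
required attributes with a single mask test, regardless of the order in
which they arrived.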
Ben Pfaff
a1bf209f58 datapath: Rearrange ovs_key_lens.
This seems clearer to me.  It should not cause any behavioral change.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-12 12:03:21 -08:00
Ansis Atteka
58828b08f0 datapath: Kernel flow metadata parsing should be less restrictive
The function flow_metadata_from_nlattrs() is very restrictive
about the ordering and type of metadata attributes that it receives.
This patch will change flow_metadata_from_nlattrs() behavior by
ignoring attributes that it does not understand and allowing them
to be passed in arbitrary order.

Issue #8167

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-10 11:03:10 -08:00
Justin Pettit
60258dcba6 datapath: Rename ipv6_tos to ipv6_tclass.
IPv6 uses the term "traffic class" for what IPv4 calls
"type-of-service".  This commit renames the "ipv6_tos" field to
"ipv6_tclass" in the "ovs_key_ipv6" struct to be more consistent with
the IPv6 terminology.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-09 13:24:52 -08:00
Justin Pettit
a61680c6d1 Support matching and modifying IP TTL.
Add support for matching the IPv4 TTL and IPv6 hop limit fields.  This
commit also adds support for modifying the IPv4 TTL.  Modifying the IPv6
hop limit isn't currently supported, since we don't support modifying
IPv6 headers.

We will likely want to change the user-space interface, since basic
matching and setting the TTL are not generally useful.  We will probably
want the ability to match on extraordinary events (such as TTL of 0 or 1)
and a decrement action.

Feature #8024

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-09 13:24:52 -08:00
Justin Pettit
530180fd5a Support matching and modifying IP ECN bits.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-09 10:47:59 -08:00
Justin Pettit
9e44d71563 Don't overload IP TOS with the frag matching bits.
This will be useful later when we add support for matching the ECN bits
within the TOS field.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-09 10:37:57 -08:00
Justin Pettit
fa8223b7fd Support matching IPv6 flow label.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-11-09 10:37:55 -08:00
Jesse Gross
cdb1a85bba datapath: Renumber non-upstreamable interfaces.
The interfaces related to tunneling aren't finalized enough to be
sent upstream but we also still want to retain them in the OVS
repository.  Since userspace should be compatible with both versions
of the kernel, this renumbers the tunnel interfaces to high numbers
so that we can continue to add new interfaces without conflict.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-07 18:24:36 -08:00
Pravin B Shelar
6455100f38 datapath: Fix coding style issues.
Most of the issues were reported by checkpatch.pl.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7771
2011-11-07 15:53:01 -08:00
Jesse Gross
a7d7f493ba datapath: Drop useless WARN_ON_ONCE during flow conversion.
This checks whether key_len is not zero but we set the key length
at the beginning of the function, so I don't see this as a useful
check.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-02 16:00:11 -07:00
Jesse Gross
515c382daf datapath: Add IPv6 to list of parsed EtherTypes.
The kernel can parse IPv6, so if it receives a flow with an IPv6
EtherType then it expects to get IPv6 information as well.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-11-02 16:00:10 -07:00
Pravin B Shelar
abff858b5a datapath: Convert kernel priority actions into match/set.
This patch adds skb->priority to the flow key, so userspace knows what
the priority was when the packet arrived and we can remove the pop/reset
priority action.  It's no longer necessary to have a special action for
pop that is based on the kernel remembering the original skb->priority.
Userspace can just emit a set priority action with the original value.

Since the priority field is a match field with just a normal set action,
we can convert it into the new model for actions that are based on
matches.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7715
2011-11-01 10:13:16 -07:00
Ben Pfaff
7257b535ab Implement new fragment handling policy.
Until now, OVS has handled IP fragments more awkwardly than necessary.  It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present.  This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.

This commit improves the situation significantly.  It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset.  This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers.  In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.

This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG.  Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments".  It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
2011-10-21 15:07:36 -07:00
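The three-way distinction that 7257b535ab describes (not a fragment, a
"first fragment" with offset 0, or a later fragment) can be sketched from
the IPv4 flags/fragment-offset field.  This is an illustration only: the
enum and function names are hypothetical, and the field is assumed to be
already converted to host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* IPv4 flags/fragment offset field, host byte order (RFC 791). */
#define FRAG_MORE        0x2000u  /* "more fragments" flag */
#define FRAG_OFFSET_MASK 0x1fffu  /* 13-bit fragment offset */

static_assert((FRAG_MORE & FRAG_OFFSET_MASK) == 0,
              "flag and offset bits must not overlap");

/* Hypothetical names for the fragment types. */
enum frag_type { FRAG_NONE, FRAG_FIRST, FRAG_LATER };

static enum frag_type classify_fragment(uint16_t frag_off)
{
    if (frag_off & FRAG_OFFSET_MASK)
        return FRAG_LATER;      /* nonzero offset: no L4 header present */
    if (frag_off & FRAG_MORE)
        return FRAG_FIRST;      /* offset 0, more to come: L4 header present */
    return FRAG_NONE;           /* ordinary unfragmented packet */
}

/* L4 fields can be matched for everything except later fragments. */
static int l4_headers_matchable(enum frag_type type)
{
    return type != FRAG_LATER;
}
```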
Pravin B Shelar
4edb9ae90e datapath: Refactor actions in terms of match fields.
Almost all current actions can be expressed in the form of
push/pop/set <field>, where field is one of the match fields.  We can
create three base actions that each take a field.  This gives a nice
symmetry and avoids inconsistencies such as being able to match on the
VLAN TPID but not set it.
This patch converts all actions to this new format.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #7115
2011-10-21 14:38:54 -07:00
Pravin B Shelar
e9141eec24 datapath: Remove RT kernel support.
This patch removes RT kernel support, which allows us to clean up
the loop detection.
In addition, BH is now disabled while running execute_actions()
for packets from userspace.
As a result, we can simplify the stats code because the entire send and
receive path runs in BH context on all supported platforms.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7621
2011-10-06 21:52:39 -07:00
Pravin Shelar
3544358aa5 datapath: Improve kernel hash table
Currently OVS uses its own hashing implementation for hash tables,
which has some problems, e.g. the error case in the deletion code.
This patch replaces it with an hlist-based hash table, which is
consistent with other kernel hash tables.  As Jesse suggested, flex-array
is used for allocating hash buckets, so that we can have a large
hash table without large contiguous kernel memory.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-09-09 19:09:47 -07:00
Ben Pfaff
18886b60bc datapath: Allow a packet with no input port to omit OVS_KEY_ATTR_IN_PORT.
When ovs-vswitchd executes actions on a synthesized packet, that is, on a
packet that is not being forwarded from any particular port but is being
generated by ovs-vswitchd itself or by an OpenFlow controller (using an
OFPT_PACKET_OUT message with an in_port of OFPP_NONE), there is no good
choice for the in_port to pass to the kernel in the flow in the
OVS_PACKET_CMD_EXECUTE message.  This commit allows ovs-vswitchd to omit
the in_port entirely in this case.

This fixes a bug in OFPT_PACKET_OUT: using an in_port of OFPP_NONE would
cause the packet to be dropped by the kernel, since that's an invalid
input port.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Reported-by: Aaron Rosen <arosen@clemson.edu>
2011-09-08 16:30:20 -07:00
Justin Pettit
df2c07f433 datapath: Use "OVS_*" as opposed to "ODP_*" for user<->kernel interactions.
The prefix "ODP_*" is not overly descriptive in the context of the
larger Linux tree.  This commit changes the prefix to "OVS_*" for the
userspace to kernel interactions.  The userspace libraries still use
"ODP_" in many of their interfaces since it is more descriptive in the
OVS oeuvre.

Feature #6904

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-08-19 22:48:23 -07:00
Jesse Gross
28bad4735c datapath: Remove redundant nw_ prefix from fields in flow key.
The fields of the kernel flow key are now grouped by protocol rather
than using generic names.  The containing structures describe the
category, so it is no longer necessary to use prefixes.  Most of
these prefixes have been removed but nw_proto and nw_tos have
retained them.  This renames the fields for consistency and brevity.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-06-08 13:57:55 -07:00
Jesse Gross
5f4d087c76 datapath: IP fragments should include L4 header in flow length.
If we can't parse a header because it is invalid or not present due to
fragmentation, we still need to include the length of that header when
comparing the flow key.  The value of the field will be zero to
indicate that header was not present, rather than effectively
wildcarding the value.  However, this was not done with fragments on
flow extract but is effectively done on flow setup.  Since the flow
length also changes the hash, it caused all fragments to miss the
hash table and be sent to userspace.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Tested-by: Ben Pfaff <blp@nicira.com>
2011-06-08 13:56:19 -07:00
Ben Pfaff
80e5eed9c2 datapath: Get packet metadata from userspace in odp_packet_cmd_execute().
Until now, the tun_id and in_port have been lost when a packet is sent from
the kernel to userspace and then back to the kernel.  I didn't think that
this was a problem, but recent behavior made me look closer and see that
it makes a difference if sFlow is turned on or if an
ODP_ATTR_ACTION_CONTROLLER action is present.  We could possibly kluge
around those, but for future-proofing it seems better to pass the packet
metadata from userspace to the kernel.  That is what this commit does.

This commit introduces a user-kernel protocol break.  We could avoid that,
if it is desirable, by making ODP_PACKET_ATTR_KEY optional for
ODP_PACKET_CMD_EXECUTE commands.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-06-01 13:39:51 -07:00
Andrew Evans
76abe283ba datapath: Hash and compare only the part of sw_flow_key actually used.
Currently the whole flow key struct is hashed on every packet received from the
network or userspace. The whole struct is also compared byte-for-byte when
doing flow table lookups. This consumes a fair percentage of CPU time, and most
of the time part of the structure is unused (e.g. the IPv6 fields when handling
IPv4 traffic; the IPv4 fields when handling Ethernet frames).

This commit reorders the fields in the flow key struct to put the least
commonly used elements at the end and changes the hash and comparison functions
to look only at the part that contains data.

Signed-off-by: Andrew Evans <aevans@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-05-18 11:30:07 -07:00
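The idea in 76abe283ba can be sketched as follows: keep the rarely used
fields at the end of the key, track how many leading bytes actually hold
data, and hash and compare only that prefix.  The struct layout, the
function names, and the FNV-1a hash are assumptions chosen for this
example, not what the datapath uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key, most commonly used fields first. */
struct flow_key {
    struct { uint8_t src[6], dst[6]; uint16_t type; } eth;
    struct { uint32_t src, dst; } ipv4;
    struct { uint8_t src[16], dst[16]; } ipv6;
};

/* Hashes only the first 'key_len' bytes (FNV-1a, for illustration). */
static uint32_t key_hash(const struct flow_key *key, size_t key_len)
{
    const uint8_t *p = (const uint8_t *)key;
    uint32_t h = 2166136261u;
    size_t i;

    assert(key_len <= sizeof *key);
    for (i = 0; i < key_len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Two keys match if they use the same length and agree on that prefix. */
static int key_equal(const struct flow_key *a, size_t a_len,
                     const struct flow_key *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

For an Ethernet-only flow, the caller would pass
offsetof(struct flow_key, ipv4) as the length, so whatever sits in the
IPv4 and IPv6 parts is never read.  The keys must be zeroed before being
filled in so that padding inside the compared prefix is deterministic.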
Jesse Gross
2db65bf72c datapath: Pull data into linear area only on demand.
We currently always pull 64 bytes of data (if it exists) into the
skb linear data area when parsing flows.  The theory behind this
is that the data should always be there and it's enough to parse
common flows.  However, this causes a number of problems in
different situations.  The first is that it is not enough to handle
IPv6 so we must pull additional data anyways.  However, the main
problem is that GRO typically allocates a new skb and puts just the
headers in there.  For a typical TCP/IPv4 packet there are 54 bytes
of headers, which means that we must possibly reallocate and copy
on every packet.  In addition, GRO creates frag_lists with this
specific geometry in order to allow later segmentation if the packet
is forwarded to a device that does not support frag_lists.  When
we pull additional data it changes the geometry and causes later
problems for the device.  This patch instead incrementally pulls
data, which avoids these problems.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
2011-05-11 11:37:17 -07:00
Ben Pfaff
6f04e94a26 datapath: Avoid freeing wild pointer in corner case.
In odp_flow_cmd_new_or_set(), if flow_actions_alloc() fails in the "new
flow" case, then flow_put() will kfree() the new flow's 'sf_acts' pointer,
but nothing has initialized that pointer.  Initialize the pointer to NULL
to avoid the problem.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-04-29 10:47:10 -07:00
Jesse Gross
bfef471742 datapath: Update IPv6 parsing code for kernel style.
Fixes a number of minor elements in the IPv6 extraction and
parsing code to better conform to kernel style.  Examples include
using kernel types/functions, adding line breaks, and using
unlikely() macros.  There is no functional change.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-03-03 12:17:40 -08:00
Jesse Gross
e977ba19df datapath: Allow jumbograms through IPv6 parsing.
Currently we stop parsing packets that are IPv6 jumbograms.  While
it isn't possible to send such large packets to userspace, it's better
to drop them at that point rather than prematurely in the IPv6 code.
IPv6 does make some use of the payload length field but we can just as
easily use skb->len, which is what all other parsing uses.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-03-03 12:13:13 -08:00
Justin Pettit
18c4363158 datapath: Better calculate max nlattr-formatted flow size.
Both userspace and the kernel allocate space based on the max size of a
nlattr-formatted flow.  It was easy to change the max size of a flow
definition and cause crashes by forgetting to update one or both of
those definitions.  This commit attempts to make that harder by
providing a better description of how the max size is calculated and a
build check to look for a common indication that it may have changed.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-02-07 15:40:35 -08:00
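A rough sketch of the approach 18c4363158 describes: derive the maximum
buffer size from the per-attribute sizes, and add a build-time check that
fails when the flow definition changes size.  The struct, the macro
names, and the attribute list are hypothetical; only the Netlink
convention of a 4-byte attribute header with 4-byte alignment is taken as
given.

```c
#include <assert.h>
#include <stdint.h>

/* Netlink attributes carry a 4-byte header and are padded to 4 bytes. */
#define ATTR_HDRLEN 4u
#define ATTR_ALIGN(n) (((n) + 3u) & ~3u)
#define ATTR_SPACE(payload) ATTR_ALIGN(ATTR_HDRLEN + (payload))

/* Hypothetical flow definition; a real one has many more fields. */
struct sketch_flow {
    uint32_t in_port;
    uint8_t eth_src[6];
    uint8_t eth_dst[6];
    uint16_t eth_type;
    uint16_t pad;
};

/* Worst case: in_port (4 bytes), Ethernet addresses (12 bytes), and
 * EtherType (2 bytes) are all present, each with header and padding. */
#define MAX_KEY_BUFSIZE (ATTR_SPACE(4) + ATTR_SPACE(12) + ATTR_SPACE(2))

/* If the flow definition changes size, the build stops here as a
 * reminder to revisit MAX_KEY_BUFSIZE. */
static_assert(sizeof(struct sketch_flow) == 20,
              "struct sketch_flow changed: revisit MAX_KEY_BUFSIZE");
```

The size check is only a heuristic: a change that keeps the struct the
same size would slip past it, which matches the commit's description of
it as a "common indication" rather than a guarantee.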
Jesse Gross
6ce3921345 datapath: Use vlan acceleration for vlan operations.
Using the kernel vlan acceleration has a number of benefits:
it enables hardware tagging, allows usage of TSO and checksum
offloading, and is generally easier to manipulate.  This switches
the vlan actions to use skb->vlan_tci field for any necessary
changes.  In places that do not support vlan acceleration in a way
that we can use (in particular kernels before 2.6.37) we perform
any necessary conversions, such as tagging and GSO before the
packet leaves Open vSwitch.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-07 13:49:01 -08:00
Ben Pfaff
535d6987a7 Zero padding bytes in odp_key_ipv4, odp_key_arp.
This is a potential security issue for the kernel.  In userspace it just
provokes false-positive valgrind warnings (which is how I found it).

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-02-03 14:55:28 -08:00
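The concern in 535d6987a7 is that bytes never written by the code keep
whatever was previously in memory, which in the kernel could leak data
when the structure is copied out.  A minimal sketch of the usual remedy,
clearing the whole structure before filling it in, is below.  The struct
layout and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout with explicit trailing pad bytes. */
struct key_ipv4 {
    uint32_t src;
    uint32_t dst;
    uint8_t proto;
    uint8_t tos;
    uint8_t pad[2];
};

static_assert(sizeof(struct key_ipv4) == 12, "unexpected key_ipv4 layout");

/* Clears every byte first so the pad bytes never carry stale data. */
static void key_ipv4_init(struct key_ipv4 *key, uint32_t src, uint32_t dst,
                          uint8_t proto, uint8_t tos)
{
    memset(key, 0, sizeof *key);
    key->src = src;
    key->dst = dst;
    key->proto = proto;
    key->tos = tos;
}
```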
Justin Pettit
685a51a5b8 nicira-ext: Support matching IPv6 Neighbor Discovery messages.
IPv6 uses Neighbor Discovery messages in a similar manner to how IPv4
uses ARP.  This commit adds support for matching deeper into the
payloads of Neighbor Solicitation (NS) and Neighbor Advertisement (NA)
messages.  Currently, the matching fields include:

    - NS and NA Target (nd_target)
    - NS Source Link Layer Address (nd_sll)
    - NA Target Link Layer Address (nd_tll)

When defining IPv6 Neighbor Discovery rules, the Nicira Extensible Match
(NXM) extension to OVS must be used.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-02 13:22:34 -08:00
Justin Pettit
d31f1109f1 nicira-ext: Support matching IPv6 traffic.
Provides ability to match over IPv6 traffic in the same manner as IPv4.
Currently, the matching fields include:

    - IPv6 source and destination addresses (ipv6_src and ipv6_dst)
    - Traffic Class (nw_tos)
    - Next Header (nw_proto)
    - ICMPv6 Type and Code (icmp_type and icmp_code)
    - TCP and UDP Ports over IPv6 (tp_src and tp_dst)

When defining IPv6 rules, the Nicira Extensible Match (NXM) extension to
OVS must be used.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-02 12:53:26 -08:00
Justin Pettit
bad68a9965 nicira-ext: Support matching ARP source and target hardware addresses.
OpenFlow 1.0 doesn't allow matching on the ARP source and target
hardware address.  This has caused us to introduce hacks such as the
Drop Spoofed ARP action.  Now that we have extensible match, we can
match on more fields within ARP:

    - Source Hardware Address (arp_sha)
    - Target Hardware Address (arp_tha)

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-02 12:42:40 -08:00
Jesse Gross
ec58547a81 datapath: Fix flow time used computation.
The current reporting of flow last used time has two issues that
cause it to incorrectly report the system monotonic time when the
flow was last used.

The first is that it simply converts the stored jiffies value to
milliseconds by scaling with a constant.  This does not work because
jiffies is not zero based and can wrap around on 32-bit platforms.

The second is that there is no guarantee that jiffies advances at the
same rate as the RTC based monotonic time that userspace uses.
A variety of factors can cause differences, including system suspend
and clock drift.  These are not too important for relatively short
time periods such as the duration of the flow (nor is the flow timing
precision of extreme importance).  However, when the time being
measured is the duration since system boot (assuming that the above
issues had been addressed) the difference can become significant.

This addresses both issues by restoring behavior similar to the
previous method of computing the flow used time, though in a
slightly different form to reflect the needs of the Netlink code.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-01-31 14:56:44 -08:00
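One way to address both issues in ec58547a81 is to measure only the idle
interval in jiffies and subtract it from the current monotonic time,
instead of converting an absolute jiffies value.  The sketch below is a
userspace illustration under that assumption; the function name, the
32-bit jiffies type, and the HZ value are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 250u  /* hypothetical tick rate for this sketch */

/* Returns the monotonic time, in ms, at which the flow was last used.
 * 'now_ms' is the current monotonic time; the jiffies values are raw
 * tick counters that may have wrapped between the two samples. */
static uint64_t flow_used_time_ms(uint64_t now_ms, uint32_t now_jiffies,
                                  uint32_t flow_jiffies)
{
    /* Unsigned subtraction gives the right interval across a wrap. */
    uint32_t idle_jiffies = now_jiffies - flow_jiffies;
    uint64_t idle_ms = (uint64_t)idle_jiffies * 1000u / HZ;

    assert(idle_ms <= now_ms);
    return now_ms - idle_ms;
}
```

Only the short idle interval depends on the jiffies rate, so drift
between jiffies and the monotonic clock accumulates over the idle period
rather than over the time since boot.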
Ben Pfaff
37a1300c3c datapath: Convert ODP_FLOW_* commands to use AF_NETLINK socket layer.
This completes the transition to the Generic Netlink interface, and
so this commit restores support for Linux 2.6.18 and later.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 15:34:30 -08:00
Ben Pfaff
d656937779 datapath: Convert datapath operations to use Netlink framing.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:40 -08:00
Ben Pfaff
856081f683 datapath: Report kernel's flow key when passing packets up to userspace.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other.  To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.

This commit takes one step in that direction by making the kernel report
its idea of the flow that a packet belongs to whenever it passes a packet
up to userspace.  This means that userspace can intelligently figure out
what to do:

   - If userspace's notion of the flow for the packet matches the kernel's,
     then nothing special is necessary.

   - If the kernel has a more specific notion for the flow than userspace,
     for example if the kernel decoded IPv6 headers but userspace stopped
     at the Ethernet type (because it does not understand IPv6), then again
     nothing special is necessary: userspace can still set up the flow in
     the usual way.

   - If userspace has a more specific notion for the flow than the kernel,
     for example if userspace decoded an IPv6 header but the kernel
     stopped at the Ethernet type, then userspace can forward the packet
     manually, without setting up a flow in the kernel.  (This case is
     bad from a performance point of view, but at least it is correct.)

This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, although userspace does now
have enough information to do that intelligently.  This will have to wait
for later commits.

This commit is bigger than it would otherwise be because it is rolled
together with changing "struct odp_msg" to a sequence of Netlink
attributes.  The alternative, to do each of those changes in a separate
patch, seemed like overkill because it meant that either we would have to
introduce and then kill off Netlink attributes for in_port and tun_id, if
Netlink conversion went first, or shove yet another variable-length header
into the stuff already after odp_msg, if adding the flow key to odp_msg
went first.

This commit will slow down performance of checksumming packets sent up to
userspace.  I'm not entirely pleased with how I did it.  I considered a
couple of alternatives, but none of them seemed that much better.
Suggestions welcome.  Not changing anything wasn't an option,
unfortunately.  At any rate some slowdown will become unavoidable when OVS
actually starts using Netlink instead of just Netlink framing.

(Actually, I thought of one option where we could avoid that: make
userspace do the checksum instead, by passing csum_start and csum_offset as
part of what goes to userspace.  But that's not perfect either.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:36 -08:00
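The three cases listed in 856081f683 amount to a small decision
procedure.  The sketch below models it very loosely by reducing each
side's flow key to how deep it parsed the packet; the enums and function
names are hypothetical and this is not how userspace compares keys.

```c
#include <assert.h>

/* Hypothetical measure of how far a parser decoded the packet. */
enum parse_depth { DEPTH_ETHERNET = 1, DEPTH_NETWORK, DEPTH_TRANSPORT };

enum key_fit {
    FIT_EXACT,                    /* both sides agree */
    FIT_KERNEL_MORE_SPECIFIC,     /* kernel decoded more than userspace */
    FIT_USERSPACE_MORE_SPECIFIC   /* userspace decoded more than kernel */
};

static enum key_fit compare_keys(enum parse_depth kernel,
                                 enum parse_depth userspace)
{
    assert(kernel >= DEPTH_ETHERNET && userspace >= DEPTH_ETHERNET);
    if (kernel == userspace)
        return FIT_EXACT;
    return kernel > userspace ? FIT_KERNEL_MORE_SPECIFIC
                              : FIT_USERSPACE_MORE_SPECIFIC;
}

/* Userspace can set up a kernel flow in the first two cases; in the
 * third it forwards the packet itself, which is slow but correct. */
static int can_install_kernel_flow(enum key_fit fit)
{
    return fit != FIT_USERSPACE_MORE_SPECIFIC;
}
```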
Ben Pfaff
36956a7d33 datapath: Convert odp_flow_key to use Netlink attributes instead.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other.  To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
In turn, that means that flow keys must become variable-length.  This
commit makes that change using Netlink attribute sequences.

This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, because userspace doesn't yet
have enough information to do that intelligently.  Upcoming commits will
fix that.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:35 -08:00
Ben Pfaff
84c17d988c datapath: Consistently parenthesize the operand of 'sizeof'.
This is proper kernel style.

Kernel style also encourages using a type name instead of an expression as
sizeof's operand, but this patch doesn't make any of those changes.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-17 15:01:43 -08:00
Ben Pfaff
f632c8fc81 datapath: Get rid of compat.h, compat26.h in favor of modern approach.
I had completely forgotten that we had a top-level compat.h and compat26.h.
It's better to distribute their contents to individual compat headers, so
this commit does so and deletes them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-13 12:26:15 -08:00
Simon Horman
875244c7a9 datapath: Treat GSO skbs as if they were fragments
In dp_output_control() UDP GSO skbs are split into fragments which are
passed to userspace.  So the resulting flow set up by the controller
(I am using ovs-vswitchd) is created based on a fragment.  This means
that the UDP source and destination ports of the flow are zero.

In order for the datapath to match the resulting flow flow_extract() needs
to treat UDP GSO skbs as if they are fragments.  That is, set the UDP
source and destination port to 0.

A flow established for a UDP GSO skb with this change won't match any
subsequent non-GSO skbs; they will need to be passed to the controller and
a new flow established.  But without this change no UDP GSO skbs will ever
match any flow.

I noticed this while using KVM using virtio with VhostNet and netperf's
UDP_STREAM test. The result was that the test sent ~5Gbit/s but only a
small fraction of that was received by the other side. Much less than the
1Gbit/s available on the physical link between the host (and guest) and the
machine running netserver. 100% of one of the host's CPUs was consumed, 50%
for the host and 50% for the guest.  The host consumption was contributed
to largely by ovs-vswitchd.

With this change I get a much nicer result of a fraction under 1Gbit/s sent
and almost all packets ending up at the other end.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-01-05 19:09:39 -08:00
Jesse Gross
39872c70e2 datapath: Add casts for direct freeing of RCU data.
There are a few places where we have two levels of RCU protected
data to allow the second level to change independently of the
first.  Although the two pieces are independent, they have the
same users and the lifetime of the first level always exceeds that
of the second level.  This means that we can directly free the
second level when it is safe to free the first.  This implies
that we directly access RCU-protected data, which is generally
not allowed.  There are no locks to check, so none of the normal
RCU functions apply.  Instead, this adds an explicit cast.

Found with sparse.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-29 10:31:11 -08:00
Jesse Gross
8dda8c9b63 datapath: Correct byte order annotations.
We have generally been using the byte order specific data types
(i.e. __be32 instead of u32) in most places.  This corrects a
declaration and adds a few needed casts.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-13 13:40:20 -08:00
Ben Pfaff
cdee00fd63 datapath: Replace "struct odp_action" by Netlink attributes.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel.  In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits.  So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-10 11:13:32 -08:00
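To illustrate why cdee00fd63 makes 64-bit arguments possible: a Netlink
attribute is a small header (16-bit length, 16-bit type) followed by a
payload of any length, padded to a 4-byte boundary.  The sketch below
appends such attributes to a buffer.  The header layout follows the
Netlink convention; "struct attr_hdr", put_attr(), and the type numbers
used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as a Netlink attribute header: length, then type. */
struct attr_hdr {
    uint16_t len;   /* header plus payload, excluding padding */
    uint16_t type;
};

#define ATTR_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

/* Appends one attribute at offset 'off' in 'buf' (capacity 'cap').
 * Returns the offset of the next attribute, or 0 if it does not fit. */
static size_t put_attr(uint8_t *buf, size_t cap, size_t off, uint16_t type,
                       const void *data, size_t data_len)
{
    struct attr_hdr hdr;
    size_t total = sizeof hdr + data_len;
    size_t padded = ATTR_ALIGN(total);

    assert(buf != NULL);
    if (total > UINT16_MAX || off > cap || padded > cap - off)
        return 0;

    hdr.len = (uint16_t)total;
    hdr.type = type;
    memcpy(buf + off, &hdr, sizeof hdr);
    memcpy(buf + off + sizeof hdr, data, data_len);
    memset(buf + off + total, 0, padded - total);  /* zero the padding */
    return off + padded;
}
```

An action with a 64-bit argument is then just an attribute with an
8-byte payload, and an action with no argument is a bare header.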
Jesse Gross
33b38b63e4 datapath: Use static where possible.
Mark functions and global variables used only in a single file as
static.

Found with sparse.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-09 17:43:36 -08:00
Jesse Gross
83e3e75ba6 datapath: Use __read_mostly annotations where appropriate.
Variables which are changed only infrequently should be annotated
with __read_mostly, which will group them together in a special
linker section.  This prevents them from sharing cache lines with
data on the hot path.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-02 17:10:15 -08:00
Jesse Gross
eb88521dd6 datapath: Drop obsolete comment.
The comment above flow_extract() refers to setting OVS_CB(skb)->is_frag
but that member no longer exists.  The correct way to set is_frag is
already documented, so just drop the incorrect comment.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-02 17:10:14 -08:00
Ben Pfaff
26233bb461 datapath: Combine dl_vlan and dl_vlan_pcp.
This allows eliminating padding from odp_flow_key, although actually doing
that is postponed until the next commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-10-11 13:31:43 -07:00
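As a sketch of what combining the two fields in 26233bb461 can look like,
assume the combined value follows the 802.1Q tag control information
layout: the 3-bit priority (PCP) in the top bits and the 12-bit VLAN ID
in the low bits.  The helper names are hypothetical and values are in
host byte order.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_VID_MASK  0x0fffu  /* low 12 bits: VLAN ID */
#define VLAN_PCP_SHIFT 13       /* top 3 bits: priority code point */

/* Packs a VLAN ID and priority into one 16-bit value. */
static uint16_t vlan_tci_make(uint16_t vid, uint8_t pcp)
{
    assert(vid <= VLAN_VID_MASK && pcp <= 7);
    return (uint16_t)((pcp << VLAN_PCP_SHIFT) | vid);
}

static uint16_t vlan_tci_vid(uint16_t tci)
{
    return tci & VLAN_VID_MASK;
}

static uint8_t vlan_tci_pcp(uint16_t tci)
{
    return (uint8_t)(tci >> VLAN_PCP_SHIFT);
}
```

One 16-bit field in place of a 16-bit VLAN ID plus an 8-bit priority is
what lets the padding go away.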
Jesse Gross
b7a31ec13d datapath: Move is_frag out of struct ovs_skb_cb.
is_frag is only used for communication between two functions, which
means that it doesn't really need to be in the SKB CB.  This wouldn't
necessarily be a problem except that there are also a number of other
paths that lead to this being uninitialized.  This isn't a problem
now but uninitialized memory seems dangerous and there isn't much
upside.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
2010-09-22 13:43:01 -07:00