mirror of https://github.com/openvswitch/ovs synced 2025-10-25 15:07:05 +00:00
57 Commits

Author SHA1 Message Date
Jesse Gross
2db65bf72c datapath: Pull data into linear area only on demand.
We currently always pull 64 bytes of data (if it exists) into the
skb linear data area when parsing flows.  The theory behind this
is that the data should always be there and it's enough to parse
common flows.  However, this causes a number of problems in
different situations.  The first is that it is not enough to handle
IPv6 so we must pull additional data anyway.  However, the main
problem is that GRO typically allocates a new skb and puts just the
headers in there.  For a typical TCP/IPv4 packet there are 54 bytes
of headers, which means that we must possibly reallocate and copy
on every packet.  In addition, GRO creates frag_lists with this
specific geometry in order to allow later segmentation if the packet
is forwarded to a device that does not support frag_lists.  When
we pull additional data it changes the geometry and causes later
problems for the device.  This patch instead incrementally pulls
data, which avoids these problems.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
2011-05-11 11:37:17 -07:00
Ben Pfaff
6f04e94a26 datapath: Avoid freeing wild pointer in corner case.
In odp_flow_cmd_new_or_set(), if flow_actions_alloc() fails in the "new
flow" case, then flow_put() will kfree() the new flow's 'sf_acts' pointer,
but nothing has initialized that pointer.  Initialize the pointer to NULL
to avoid the problem.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-04-29 10:47:10 -07:00
Jesse Gross
bfef471742 datapath: Update IPv6 parsing code for kernel style.
Fixes a number of minor elements in the IPv6 extraction and
parsing code to better conform to kernel style.  Examples include
using kernel types/functions, adding line breaks, and using
unlikely() macros.  There is no functional change.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-03-03 12:17:40 -08:00
Jesse Gross
e977ba19df datapath: Allow jumbograms through IPv6 parsing.
Currently we stop parsing packets that are IPv6 jumbograms.  While
it isn't possible to send such large packets to userspace, it's better
to drop them at that point rather than prematurely in the IPv6 code.
IPv6 does make some use of the payload length field but we can just as
easily use skb->len, which is what all other parsing uses.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-03-03 12:13:13 -08:00
Justin Pettit
18c4363158 datapath: Better calculate max nlattr-formatted flow size.
Both userspace and the kernel allocate space based on the max size of a
nlattr-formatted flow.  It was easy to change the max size of a flow
definition and cause crashes by forgetting to update one or both of
those definitions.  This commit attempts to make that harder by
providing a better description of how the max size is calculated and a
build check to look for a common indication that it may have changed.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-02-07 15:40:35 -08:00
Jesse Gross
6ce3921345 datapath: Use vlan acceleration for vlan operations.
Using the kernel vlan acceleration has a number of benefits:
it enables hardware tagging, allows usage of TSO and checksum
offloading, and is generally easier to manipulate.  This switches
the vlan actions to use skb->vlan_tci field for any necessary
changes.  In places that do not support vlan acceleration in a way
that we can use (in particular kernels before 2.6.37) we perform
any necessary conversions, such as tagging and GSO before the
packet leaves Open vSwitch.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-07 13:49:01 -08:00
Ben Pfaff
535d6987a7 Zero padding bytes in odp_key_ipv4, odp_key_arp.
This is a potential security issue for the kernel.  In userspace it just
provokes false-positive valgrind warnings (which is how I found it).

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-02-03 14:55:28 -08:00
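The padding fix in 535d6987a7 can be illustrated with a small userspace sketch. The struct below is hypothetical (it is not the real odp_key_ipv4 layout), and the padding is written as an explicit member so the example is deterministic. The point is only that the whole object is cleared before the fields are filled in, so no stale bytes leak out or confuse byte-wise comparisons.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout for illustration; NOT the real odp_key_ipv4. */
struct sketch_key_ipv4 {
    uint32_t src;
    uint32_t dst;
    uint8_t proto;
    uint8_t tos;
    uint8_t pad[2];   /* padding that must not carry stale data */
};

/* Fill in a key, zeroing every byte (padding included) first. */
static void sketch_key_init(struct sketch_key_ipv4 *k, uint32_t src,
                            uint32_t dst, uint8_t proto, uint8_t tos)
{
    assert(k != NULL);
    memset(k, 0, sizeof(*k));   /* clears pad[] along with the fields */
    k->src = src;
    k->dst = dst;
    k->proto = proto;
    k->tos = tos;
}
```

Two keys built this way from the same field values compare equal with memcmp() regardless of what the memory held beforehand.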
Justin Pettit
685a51a5b8 nicira-ext: Support matching IPv6 Neighbor Discovery messages.
IPv6 uses Neighbor Discovery messages in a similar manner to how IPv4
uses ARP.  This commit adds support for matching deeper into the
payloads of Neighbor Solicitation (NS) and Neighbor Advertisement (NA)
messages.  Currently, the matching fields include:

    - NS and NA Target (nd_target)
    - NS Source Link Layer Address (nd_sll)
    - NA Target Link Layer Address (nd_tll)

When defining IPv6 Neighbor Discovery rules, the Nicira Extensible Match
(NXM) extension to OVS must be used.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-02 13:22:34 -08:00
Justin Pettit
d31f1109f1 nicira-ext: Support matching IPv6 traffic.
Provides ability to match over IPv6 traffic in the same manner as IPv4.
Currently, the matching fields include:

    - IPv6 source and destination addresses (ipv6_src and ipv6_dst)
    - Traffic Class (nw_tos)
    - Next Header (nw_proto)
    - ICMPv6 Type and Code (icmp_type and icmp_code)
    - TCP and UDP Ports over IPv6 (tp_src and tp_dst)

When defining IPv6 rules, the Nicira Extensible Match (NXM) extension to
OVS must be used.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-02 12:53:26 -08:00
Justin Pettit
bad68a9965 nicira-ext: Support matching ARP source and target hardware addresses.
OpenFlow 1.0 doesn't allow matching on the ARP source and target
hardware address.  This has caused us to introduce hacks such as the
Drop Spoofed ARP action.  Now that we have extensible match, we can
match on more fields within ARP:

    - Source Hardware Address (arp_sha)
    - Target Hardware Address (arp_tha)

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-02-02 12:42:40 -08:00
Jesse Gross
ec58547a81 datapath: Fix flow time used computation.
The current reporting of flow last used time has two issues that
cause it to incorrectly report the system monotonic time when the
flow was last used.

The first is that it simply converts the stored jiffies value to
milliseconds by scaling with a constant.  This does not work because
jiffies is not zero based and can wrap around on 32-bit platforms.

The second is there is no guarantee that jiffies advances at the
same rate as the RTC based monotonic time that userspace uses.
A variety of factors can cause differences, including system suspend
and clock drift.  These are not too important for relatively short
time periods such as the duration of the flow (nor is the flow timing
precision of extreme importance).  However, when the time being
measured is the duration since system boot (assuming that the above
issues had been addressed) the difference can become significant.

This addresses both issues by restoring behavior similar to the
previous method of computing the flow used time, though in a
slightly different form to reflect the needs of the Netlink code.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2011-01-31 14:56:44 -08:00
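The two issues described in ec58547a81 suggest one possible shape for the computation, sketched below in userspace C. This is an assumption about the approach, not the commit's actual code: the tick counter is modeled as a 32-bit value, SKETCH_HZ is a made-up tick rate, and the function names are hypothetical. Only the short interval since the flow was last used is measured in ticks, and that interval is subtracted from the current monotonic time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick rate; the real value depends on kernel configuration. */
#define SKETCH_HZ 250u

/* Milliseconds between two 32-bit tick counts.  Unsigned subtraction is
 * modulo 2^32, so the result stays correct across one counter wraparound. */
static uint64_t sketch_ticks_to_ms(uint32_t now_ticks, uint32_t used_ticks)
{
    uint32_t delta = now_ticks - used_ticks;
    return (uint64_t)delta * 1000u / SKETCH_HZ;
}

/* Monotonic time (ms) at which the flow was last used: the current
 * monotonic time minus the idle interval measured in ticks. */
static uint64_t sketch_flow_used_ms(uint64_t now_mono_ms,
                                    uint32_t now_ticks, uint32_t used_ticks)
{
    uint64_t idle_ms = sketch_ticks_to_ms(now_ticks, used_ticks);
    assert(idle_ms <= now_mono_ms);   /* flow cannot predate the clock */
    return now_mono_ms - idle_ms;
}
```

Because only the idle interval passes through the tick counter, any drift between the two clocks is limited to that interval rather than accumulating since boot.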
Ben Pfaff
37a1300c3c datapath: Convert ODP_FLOW_* commands to use AF_NETLINK socket layer.
This completes the transition to the Generic Netlink interface, and
so this commit restores support for Linux 2.6.18 and later.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 15:34:30 -08:00
Ben Pfaff
d656937779 datapath: Convert datapath operations to use Netlink framing.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:40 -08:00
Ben Pfaff
856081f683 datapath: Report kernel's flow key when passing packets up to userspace.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other.  To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.

This commit takes one step in that direction by making the kernel report
its idea of the flow that a packet belongs to whenever it passes a packet
up to userspace.  This means that userspace can intelligently figure out
what to do:

   - If userspace's notion of the flow for the packet matches the kernel's,
     then nothing special is necessary.

   - If the kernel has a more specific notion for the flow than userspace,
     for example if the kernel decoded IPv6 headers but userspace stopped
     at the Ethernet type (because it does not understand IPv6), then again
     nothing special is necessary: userspace can still set up the flow in
     the usual way.

   - If userspace has a more specific notion for the flow than the kernel,
     for example if userspace decoded an IPv6 header but the kernel
     stopped at the Ethernet type, then userspace can forward the packet
     manually, without setting up a flow in the kernel.  (This case is
     bad from a performance point of view, but at least it is correct.)

This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, although userspace does now
have enough information to do that intelligently.  This will have to wait
for later commits.

This commit is bigger than it would otherwise be because it is rolled
together with changing "struct odp_msg" to a sequence of Netlink
attributes.  The alternative, to do each of those changes in a separate
patch, seemed like overkill because it meant that either we would have to
introduce and then kill off Netlink attributes for in_port and tun_id, if
Netlink conversion went first, or shove yet another variable-length header
into the stuff already after odp_msg, if adding the flow key to odp_msg
went first.

This commit will slow down performance of checksumming packets sent up to
userspace.  I'm not entirely pleased with how I did it.  I considered a
couple of alternatives, but none of them seemed that much better.
Suggestions welcome.  Not changing anything wasn't an option,
unfortunately.  At any rate some slowdown will become unavoidable when OVS
actually starts using Netlink instead of just Netlink framing.

(Actually, I thought of one option where we could avoid that: make
userspace do the checksum instead, by passing csum_start and csum_offset as
part of what goes to userspace.  But that's not perfect either.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:36 -08:00
Ben Pfaff
36956a7d33 datapath: Convert odp_flow_key to use Netlink attributes instead.
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other.  To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
In turn, that means that flow keys must become variable-length.  This
commit makes that change using Netlink attribute sequences.

This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, because userspace doesn't yet
have enough information to do that intelligently.  Upcoming commits will
fix that.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 21:08:35 -08:00
Ben Pfaff
84c17d988c datapath: Consistently parenthesize the operand of 'sizeof'.
This is proper kernel style.

Kernel style also encourages using a type name instead of an expression as
sizeof's operand, but this patch doesn't make any of those changes.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-17 15:01:43 -08:00
Ben Pfaff
f632c8fc81 datapath: Get rid of compat.h, compat26.h in favor of modern approach.
I had completely forgotten that we had a top-level compat.h and compat26.h.
It's better to distribute their contents to individual compat headers, so
this commit does so and deletes them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-13 12:26:15 -08:00
Simon Horman
875244c7a9 datapath: Treat GSO skbs as if they were fragments
In dp_output_control() UDP GSO skbs are split into fragments which are
passed to userspace.  So the resulting flow set up by the controller
(I am using ovs-vswitchd) is created based on a fragment.  This means
that the UDP source and destination port of the flow is zero.

In order for the datapath to match the resulting flow flow_extract() needs
to treat UDP GSO skbs as if they are fragments.  That is, set the UDP
source and destination port to 0.

A flow established for a UDP GSO skb with this change won't match any
subsequent non-GSO skbs; they will need to be passed to the controller and
a new flow established. But without this change no UDP GSO skbs will ever
match any flow.

I noticed this while using KVM using virtio with VhostNet and netperf's
UDP_STREAM test. The result was that the test sent ~5Gbit/s but only a
small fraction of that was received by the other side. Much less than the
1Gbit/s available on the physical link between the host (and guest) and the
machine running netserver. 100% of one of the host's CPUs was consumed, 50%
for the host and 50% for the guest.  The host consumption was contributed
to largely by ovs-vswitchd.

With this change I get a much nicer result of a fraction under 1Gbit/s sent
and almost all packets ending up at the other end.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-01-05 19:09:39 -08:00
Jesse Gross
39872c70e2 datapath: Add casts for direct freeing of RCU data.
There are a few places where we have two levels of RCU protected
data to allow the second level to change independently of the
first.  Although the two pieces are independent, they have the
same users and lifetime of the first level always exceeds that
of the second level.  This means that we can directly free the
second level when it is safe to free the first.  This implies
that we directly access RCU-protected data, which is generally
not allowed.  There are no locks to check, so none of the normal
RCU functions apply.  Instead, this adds an explicit cast.

Found with sparse.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-29 10:31:11 -08:00
Jesse Gross
8dda8c9b63 datapath: Correct byte order annotations.
We have generally been using the byte order specific data types
(i.e. __be32 instead of u32) in most places.  This corrects a
declaration and adds a few needed casts.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-13 13:40:20 -08:00
Ben Pfaff
cdee00fd63 datapath: Replace "struct odp_action" by Netlink attributes.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel.  In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits.  So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-10 11:13:32 -08:00
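To see why Netlink attributes lift the 48-bit limit mentioned in cdee00fd63, here is a minimal sketch of the attribute framing: a 4-byte header (16-bit length, 16-bit type) followed by a payload padded to a 4-byte boundary. The helper names and the attribute type numbers used with them are hypothetical; this is not the OVS or kernel API, just the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NLA_HDRLEN 4u   /* 16-bit length + 16-bit type */

/* Round a length up to the 4-byte Netlink attribute alignment. */
static size_t sk_nla_align(size_t len)
{
    return (len + 3u) & ~(size_t)3u;
}

/* Append one attribute at offset 'off' in 'buf' (capacity 'cap').
 * Returns the offset just past the padded attribute, or 0 if it
 * does not fit. */
static size_t sk_nla_put(uint8_t *buf, size_t cap, size_t off,
                         uint16_t type, const void *data, uint16_t len)
{
    uint16_t nla_len;
    size_t total;

    assert(buf != NULL && len <= 0xffffu - SK_NLA_HDRLEN);
    nla_len = (uint16_t)(SK_NLA_HDRLEN + len);   /* unpadded length */
    total = sk_nla_align(nla_len);
    if (off > cap || cap - off < total)
        return 0;

    memset(buf + off, 0, total);                 /* zero the padding too */
    memcpy(buf + off, &nla_len, sizeof(nla_len));    /* host byte order */
    memcpy(buf + off + 2, &type, sizeof(type));
    if (len)
        memcpy(buf + off + SK_NLA_HDRLEN, data, len);
    return off + total;
}
```

With this framing a 64-bit argument occupies 12 bytes and a 6-byte argument is padded from 10 to 12 bytes, so actions of different sizes can share one buffer.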
Jesse Gross
33b38b63e4 datapath: Use static where possible.
Mark functions and global variables used only in a single file as
static.

Found with sparse.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-09 17:43:36 -08:00
Jesse Gross
83e3e75ba6 datapath: Use __read_mostly annotations where appropriate.
Variables which are changed only infrequently should be annotated
with __read_mostly, which will group them together in a special
linker section.  This prevents them from sharing cache lines with
data on the hot path.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-02 17:10:15 -08:00
Jesse Gross
eb88521dd6 datapath: Drop obsolete comment.
The comment above flow_extract() refers to setting OVS_CB(skb)->is_frag
but that member no longer exists.  The correct way to set is_frag is
already documented, so just drop the incorrect comment.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2010-12-02 17:10:14 -08:00
Ben Pfaff
26233bb461 datapath: Combine dl_vlan and dl_vlan_pcp.
This allows eliminating padding from odp_flow_key, although actually doing
that is postponed until the next commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-10-11 13:31:43 -07:00
Jesse Gross
b7a31ec13d datapath: Move is_frag out of struct ovs_skb_cb.
is_frag is only used for communication between two functions, which
means that it doesn't really need to be in the SKB CB.  This wouldn't
necessarily be a problem except that there are also a number of other
paths that lead to this being uninitialized.  This isn't a problem
now but uninitialized memory seems dangerous and there isn't much
upside.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
2010-09-22 13:43:01 -07:00
Jesse Gross
fb8c93473e datapath: Add ref counting for flows.
Currently flows are only used within the confines of one
rcu_read_lock()/rcu_read_unlock() session.  However, with the
addition of header caching we will need to hold references to flows
for longer periods of time.  This adds support for that by adding
refcounts to flows.  RCU is still used for normal packet handling
to avoid a performance impact from constantly updating the refcount.
However, instead of directly freeing the flow after a grace period
we simply decrement the refcount.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
2010-09-22 13:43:01 -07:00
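The reference-counting scheme from fb8c93473e can be sketched in userspace as follows. C11 atomics stand in for the kernel's primitives, and all names are hypothetical. In the datapath, the final step described in the commit message (decrementing instead of freeing after a grace period) would correspond to calling the put function from the RCU callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a flow; only the refcount matters here. */
struct sketch_flow {
    atomic_int refcnt;
};

/* Allocate a flow holding one initial reference (the table's). */
static struct sketch_flow *sketch_flow_alloc(void)
{
    struct sketch_flow *f = malloc(sizeof(*f));
    if (f)
        atomic_init(&f->refcnt, 1);
    return f;
}

/* Take an extra reference, e.g. for a long-lived cache entry. */
static void sketch_flow_hold(struct sketch_flow *f)
{
    atomic_fetch_add(&f->refcnt, 1);
}

/* Drop a reference.  Returns 1 if this was the last one and the
 * flow was freed, 0 otherwise. */
static int sketch_flow_put(struct sketch_flow *f)
{
    int old = atomic_fetch_sub(&f->refcnt, 1);
    assert(old > 0);
    if (old == 1) {
        free(f);
        return 1;
    }
    return 0;
}
```

Ordinary packet processing never touches the counter in this scheme; only holders that outlive a single read-side section pay for the atomic update.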
Jesse Gross
560e802229 datapath: Move flow allocation into a function.
As the process to allocate a flow becomes more involved it becomes
more cumbersome for the code to be mixed in with the general
datapath so split it out into a new function.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
2010-09-22 13:43:01 -07:00
Ben Pfaff
722d19c504 datapath: Increase maximum number of actions per flow.
Until now the number of actions in a flow has been limited to what fits in
a page.  Each action is 8 bytes, and on 32-bit architectures there is a
12-byte header, so with 4-kB pages that limits flows to 510 actions.  We
and Citrix have noticed that OVS stops working properly after about 509
VIFs are added to a bridge.  According to log messages this is the reason:
at this point it is no longer possible to flood a packet to all ports.

This commit should help, by increasing the maximum number of actions in a
flow.  In the long term, though, we should adopt use of port groups or
otherwise reduce the number of actions needed to flood a packet.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #3573.
NIC-234.
2010-09-14 13:37:44 -07:00
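The 510-action limit quoted in 722d19c504 follows directly from the sizes given in the message. The helper below is a hypothetical illustration of that arithmetic, not code from the datapath.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-size actions that fit in one page after a header. */
static size_t sketch_max_actions(size_t page_size, size_t header_size,
                                 size_t action_size)
{
    assert(action_size > 0 && header_size <= page_size);
    return (page_size - header_size) / action_size;
}
```

Calling sketch_max_actions(4096, 12, 8) gives (4096 - 12) / 8 = 510 once the remainder is discarded, which matches the figure in the commit message.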
Joe Perches
d295e8e97a treewide: Remove trailing whitespace
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-30 13:23:08 -07:00
Ben Pfaff
ca78c6b69c datapath: Avoid accesses past the end of skbuff data in actions.
Some of the flow actions that modify skbuff data did not check that the
skbuff was long enough before doing so.  This commit fixes that problem.

Previously, the strategy for avoiding this was to only indicate the layer-3
nw_proto field in the flow if the corresponding layer-4 header was fully
present, so that if, for example, nw_proto was IPPROTO_TCP, this meant
that a TCP header was present.  The original motivation for this patch was
to add corresponding code to only indicate a layer-2 dl_type if the
corresponding layer-3 header was fully present.  But I'm now convinced that
this approach is conceptually wrong, because the meaning of a layer-N
header should not be affected by the meaning of a layer-(N+1) header.

This commit switches to a new approach.  Now, when a header is missing, its
fields in the flow are simply zeroed and have no effect on the "type" field
for the outer header.  Responsibility for ensuring that a header is fully
present is now shifted to the actions that wish to modify that header.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-27 12:42:39 -07:00
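The new division of responsibility in ca78c6b69c, where each action checks that the header it modifies is fully present, might look roughly like the sketch below. The function name and parameters are hypothetical, the packet is modeled as a flat byte buffer rather than an skbuff, and checksum updates are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrite a big-endian 16-bit field inside a header, but only if the
 * whole header lies within the packet.  Returns 0 on success, -1 if
 * the header is truncated (the packet is left untouched). */
static int sketch_set_field16(uint8_t *pkt, size_t pkt_len,
                              size_t hdr_off, size_t hdr_len,
                              size_t field_off, uint16_t value)
{
    assert(pkt != NULL && field_off + 2 <= hdr_len);
    if (hdr_off > pkt_len || pkt_len - hdr_off < hdr_len)
        return -1;
    pkt[hdr_off + field_off] = (uint8_t)(value >> 8);
    pkt[hdr_off + field_off + 1] = (uint8_t)(value & 0xff);
    return 0;
}
```

The flow key plays no part in the check: the decision depends only on the packet length at the moment the action runs.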
Ben Pfaff
59a18f80dd datapath: Fix default value of skb transport_header.
This commit started out as simply better documenting flow_extract(),
but then I realized that nothing cares about transport_header in the
non-IP case, so don't bother with it at all.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-27 12:41:00 -07:00
Ben Pfaff
7d0ab001db datapath: Avoid pskb_may_pull() checks where not needed.
These calls to pskb_may_pull() can be reduced to checks on skb->len because
in these contexts those headers will already have been pulled into the
skb linear area if it is there at all.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-27 12:32:07 -07:00
Ben Pfaff
4c1ad23312 datapath: Report memory allocation errors in flow_extract().
Until now flow_extract() has simply returned a bogus flow when memory
allocation errors occurred.  This fixes the problem by propagating the
error to the caller.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-27 12:32:05 -07:00
Ben Pfaff
401eeb92d3 Add Nicira extension to OpenFlow for dropping spoofed ARP packets.
"ARP spoofing" is when a host claims an incorrect association between an
IP address and a MAC address for deceptive purposes.  OpenFlow by itself
can prevent a host from sending out ARP replies from an incorrect MAC
address in the Ethernet L2 header, but it cannot control the MAC addresses
inside the ARP L3 packet.  This commit adds a new action that can be used
to drop these spoofed packets.

CC: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26 10:56:20 -07:00
Ben Pfaff
769f8ccd5f datapath: Free up flow_extract() return value for reporting errors.
flow_extract() can fail due to memory allocation errors in pskb_may_pull().
Currently it doesn't return those properly, instead just reporting a bogus
flow to the caller.  But its return value is currently in use for reporting
whether the packet was an IPv4 fragment.  This commit switches to reporting
that in the skb itself so that the return value can be reused to report
errors.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26 09:15:42 -07:00
Ben Pfaff
a31e0e31c6 datapath: Remove skb->len >= ETH_HLEN check from flow_extract().
The callers ensure that this is already the case.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26 09:15:42 -07:00
Ben Pfaff
e819fb4762 datapath: Use 'bool' instead of 'int' where appropriate.
'bool' is better modern kernel style.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26 09:15:42 -07:00
Ben Pfaff
d9fce1ca23 datapath: Use min() instead of open-coding it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26 09:15:42 -07:00
Ben Pfaff
50f06e1642 datapath: Fix handling of 802.1Q and SNAP headers.
The kernel and user datapaths have code that assumes that 802.1Q headers
are used only inside Ethernet II frames, not inside SNAP-encapsulated
frames.  But the kernel and user flow_extract() implementations would
interpret 802.1Q headers inside SNAP headers as being valid VLANs.  This
would cause packet corruption if any VLAN-related actions were to be taken,
so change the two flow_extract() implementations only to accept 802.1Q as
an Ethernet II frame type, not as a SNAP-encoded frame type.

802.1Q-2005 says that this is correct anyhow:

    Where the ISS instance used to transmit and receive tagged frames is
    provided by a media access control method that can support Ethernet
    Type encoding directly (e.g., is an IEEE 802.3 or IEEE 802.11 MAC) or
    is media access method independent (e.g., 6.6), the TPID is Ethernet
    Type encoded, i.e., is two octets in length and comprises solely the
    assigned Ethernet Type value.

    Where the ISS instance is provided by a media access method that
    cannot directly support Ethernet Type encoding (e.g., is an IEEE
    802.5 or FDDI MAC), the TPID is encoded according to the rule for
    a Subnetwork Access Protocol (Clause 10 of IEEE Std 802) that
    encapsulates Ethernet frames over LLC, and comprises the SNAP
    header (AA-AA-03) followed by the SNAP PID (00-00-00) followed by
    the two octets of the assigned Ethernet Type value.

All of the media that OVS handles support Ethernet Type fields, so to me
that means that we don't have to handle 802.1Q-inside-SNAP.

On the other hand, we *do* have to handle SNAP-inside-802.1Q, because this
is actually allowed by the standards.  So this commit also adds that
support.

I verified that, with this change, both SNAP and Ethernet packets are
properly recognized both with and without 802.1Q encapsulation.

I was a bit surprised to find out that Linux does not accept
SNAP-encapsulated IP frames on Ethernet.

Here's a summary of how frames are handled before and after this commit:

Common cases
------------

       Ethernet
    +------------+
1.  |dst|src|TYPE|
    +------------+

       Ethernet       LLC         SNAP
    +------------+ +--------+ +-----------+
2.  |dst|src| len| |aa|aa|03| |000000|TYPE|
    +------------+ +--------+ +-----------+

       Ethernet       802.1Q
    +------------+ +---------+
3.  |dst|src|8100| |VLAN|TYPE|
    +------------+ +---------+

       Ethernet       802.1Q      LLC         SNAP
    +------------+ +---------+ +--------+ +-----------+
4.  |dst|src|8100| |VLAN| LEN| |aa|aa|03| |000000|TYPE|
    +------------+ +---------+ +--------+ +-----------+

Unusual cases
-------------

       Ethernet       LLC         SNAP         802.1Q
    +------------+ +--------+ +-----------+ +---------+
5.  |dst|src| len| |aa|aa|03| |000000|8100| |VLAN|TYPE|
    +------------+ +--------+ +-----------+ +---------+

       Ethernet       LLC
    +------------+ +--------+
6.  |dst|src| len| |xx|xx|xx|
    +------------+ +--------+

       Ethernet       LLC         SNAP
    +------------+ +--------+ +-----------+
7.  |dst|src| len| |aa|aa|03| |xxxxxx|xxxx|
    +------------+ +--------+ +-----------+

       Ethernet       802.1Q      LLC
    +------------+ +---------+ +--------+
8.  |dst|src|8100| |VLAN| LEN| |xx|xx|xx|
    +------------+ +---------+ +--------+

       Ethernet       802.1Q      LLC         SNAP
    +------------+ +---------+ +--------+ +-----------+
9.  |dst|src|8100| |VLAN| LEN| |aa|aa|03| |xxxxxx|xxxx|
    +------------+ +---------+ +--------+ +-----------+

Behavior
--------

   ---------------  ---------------  -------------------------------------
       Before           After
     this commit      this commit
   dl_type dl_vlan  dl_type dl_vlan  Notes
   ------- -------  ------- -------  -------------------------------------
1.   TYPE    ffff     TYPE    ffff   no change
2.   TYPE    ffff     TYPE    ffff   no change
3.   TYPE    VLAN     TYPE    VLAN   no change
4.    LEN    VLAN     TYPE    VLAN   proposal fixes behavior
5.   TYPE    VLAN     8100    ffff   802.1Q says this is invalid framing
6.   05ff    ffff     05ff    ffff   no change
7.   05ff    ffff     05ff    ffff   no change
8.    LEN    VLAN     05ff    VLAN   proposal fixes behavior
9.    LEN    VLAN     05ff    VLAN   proposal fixes behavior

Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-10 11:35:46 -07:00
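The "After this commit" column of the table in 50f06e1642 can be reproduced by a small parser. The sketch below is a userspace illustration with hypothetical names, not the flow_extract() implementation: 802.1Q is recognized only directly after the Ethernet header, and a SNAP header is accepted only after LLC aa-aa-03 with a zero OUI. It assumes dl_vlan holds the 12-bit VLAN ID and that ffff means "no VLAN", as in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_TYPE_VLAN 0x8100u
#define SK_ETH_TYPE_MIN  0x0600u   /* smaller values are 802.3 lengths */
#define SK_DL_TYPE_NONE  0x05ffu   /* "05ff" in the table */
#define SK_DL_VLAN_NONE  0xffffu   /* "ffff" in the table */

struct sk_l2_key {
    uint16_t dl_type;
    uint16_t dl_vlan;
};

static uint16_t sk_rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if shorter than an Ethernet header. */
static int sk_parse_l2(const uint8_t *f, size_t len, struct sk_l2_key *key)
{
    static const uint8_t llc_snap[6] = { 0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00 };
    size_t off = 14;
    uint16_t type;

    assert(f != NULL && key != NULL);
    key->dl_type = SK_DL_TYPE_NONE;
    key->dl_vlan = SK_DL_VLAN_NONE;
    if (len < 14)
        return -1;

    type = sk_rd16(f + 12);
    if (type == SK_ETH_TYPE_VLAN) {      /* cases 3, 4, 8, 9 */
        if (len < 18)
            return 0;
        key->dl_vlan = sk_rd16(f + 14) & 0x0fff;
        type = sk_rd16(f + 16);
        off = 18;
    }
    if (type >= SK_ETH_TYPE_MIN) {       /* Ethernet II type: cases 1, 3 */
        key->dl_type = type;
        return 0;
    }
    /* 802.3 length: accept only LLC aa-aa-03 plus SNAP OUI 00-00-00.
     * The SNAP type is taken as-is, so 8100 is not a VLAN (case 5). */
    if (len >= off + 8 && memcmp(f + off, llc_snap, 6) == 0) {
        uint16_t snap_type = sk_rd16(f + off + 6);
        if (snap_type >= SK_ETH_TYPE_MIN)
            key->dl_type = snap_type;
    }
    return 0;
}
```

Feeding it a case 4 frame yields the SNAP type together with the VLAN ID, while a case 5 frame yields dl_type 8100 with no VLAN, as in the table.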
Ben Pfaff
9d2094938d datapath: Inline flow_cast().
This function is both trivial and on the packet processing fast path, so
expand it inline.
2010-08-02 20:16:32 -07:00
Ben Pfaff
abfec86556 datapath: Don't track IP TOS value two different ways.
Originally, the datapath didn't care about IP TOS at all.  Then, to support
NetFlow, we made it keep track of the last-seen IP TOS value on a per-flow
basis.  Then, to support OpenFlow 1.0, we added a nw_tos field to
odp_flow_key.  We don't need both methods, so this commit drops the
NetFlow-specific tracking.

This introduces a small kernel ABI break: upgrading the kernel module
without upgrading the OVS userspace will mean that NetFlow records will
all show an IP TOS value of 0.  I don't consider that to be a serious
problem.
2010-08-02 20:16:32 -07:00
Jesse Gross
b4a7d61582 datapath: Remove dead code.
Several blocks of code were either no longer being called or had
been "#if 0"'d out for a long time.  This removes them.
2010-07-30 13:41:59 -07:00
Jesse Gross
6bfafa55fb datapath: Don't query time for every packet.
Rather than actually query the time every time a packet comes through,
just store the current jiffies and convert it to actual time when
requested.  GRE is the primary beneficiary of this because the traffic
travels through the datapath twice.  This change reduces CPU utilization
3-4% with GRE.
2010-07-26 14:39:59 -07:00
Jesse Gross
c73814a3e6 timeval: Use monotonic time where appropriate.
Most of the timekeeping needs of OVS are simply to measure intervals,
which means that it is sensitive to changes in the clock.  This commit
replaces the existing clocks with monotonic timers.  An additional set
of wall clock timers are added and used in locations that need absolute
time.

Bug #1858
2010-06-08 18:01:25 -07:00
Jesse Gross
8d5ebd839b datapath: Genericize hash table.
Currently the flow hash table assumes that it is storing flows.
However, we will need additional types of hash tables in the
future so remove assumptions about flows and convert the datapath
to use the new table.
2010-04-19 09:11:57 -04:00
Jesse Gross
f2459fe7d9 datapath: Add generic virtual port layer.
Currently the datapath directly accesses devices through their
Linux functions.  Obviously this doesn't work for virtual devices
that are not backed by an actual Linux device.  This creates a
new virtual port layer which handles all interaction with devices.

The existing support for Linux devices was then implemented on top
of this layer as two device types.  It splits out and renames dp_dev
to internal_dev.  There were several places where datapath devices
had to be handled in a special manner and this cleans that up by putting
all the special casing in a single location.
2010-04-19 09:11:57 -04:00
Jesse Gross
659586efcf tunneling: Add support for tunnel ID.
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel).  Also
add an action which allows the tunnel ID to be set for outgoing
packets.  At this point there aren't any tunnel implementations so
these fields don't have any effect.

The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID.  ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentionally undocumented to avoid it being used without
a full understanding of the consequences.
2010-04-19 09:11:51 -04:00
Jesse Gross
3c5f6de385 datapath: Validate ToS when flow is added.
Check that the ToS is valid when the flow is added, not every time
it is used.
2010-03-15 15:44:41 -04:00
Jesse Gross
f5e86186f3 datapath: Use constants instead of actual values.
Use the appropriate constants instead of the values for masks, shifts,
etc.
2010-03-15 15:44:40 -04:00