Mark functions and global variables used only in a single file as
static.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Variables which are changed only infrequently should be annotated
with __read_mostly, which will group them together in a special
linker section. This prevents them from sharing cache lines with
data on the hot path.
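A rough userspace illustration of the idea, assuming GCC or Clang on an ELF target. The macro, section name, and variables below are stand-ins chosen for this sketch, not the kernel's own definitions:

```c
/* Userspace stand-in for the kernel's __read_mostly annotation.
 * Assumption: GCC or Clang targeting ELF; the section name is only
 * illustrative and may differ from what a given kernel uses. */
#define READ_MOSTLY_SKETCH __attribute__((__section__(".data..read_mostly")))

/* Hypothetical setting: written once at setup, read for every packet. */
static int max_ports_sketch READ_MOSTLY_SKETCH = 256;

/* Hypothetical hot-path counter: stays in the ordinary data section, so
 * its frequent writes do not dirty the cache line holding the setting. */
static unsigned long packets_seen_sketch;

int port_is_valid_sketch(int port_no)
{
    packets_seen_sketch++;                              /* written often */
    return port_no >= 0 && port_no < max_ports_sketch;  /* only read */
}
```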
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The comment above flow_extract() refers to setting OVS_CB(skb)->is_frag
but that member no longer exists. The correct way to set is_frag is
already documented, so just drop the incorrect comment.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This allows eliminating padding from odp_flow_key, although actually doing
that is postponed until the next commit.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
is_frag is only used for communication between two functions, which
means that it doesn't really need to be in the SKB CB. This wouldn't
necessarily be a problem except that there are also a number of other
paths that lead to this being uninitialized. This isn't a problem
now but uninitialized memory seems dangerous and there isn't much
upside.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
Currently flows are only used within the confines of one
rcu_read_lock()/rcu_read_unlock() session. However, with the
addition of header caching we will need to hold references to flows
for longer periods of time. This adds support for that by adding
refcounts to flows. RCU is still used for normal packet handling
to avoid a performance impact from constantly updating the refcount.
However, instead of directly freeing the flow after a grace period
we simply decrement the refcount.
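The scheme can be sketched in userspace C11 roughly as follows. All names are hypothetical, and the RCU grace period is only modeled by an ordinary function call, so this shows the refcount handling rather than the actual datapath code:

```c
#include <stdatomic.h>
#include <stdlib.h>

static int flows_freed_sketch;   /* demo aid: counts real frees */

/* Hypothetical, heavily simplified flow. */
struct flow_sketch {
    atomic_int refcnt;           /* number of outstanding references */
};

struct flow_sketch *flow_alloc_sketch(void)
{
    struct flow_sketch *flow = malloc(sizeof *flow);

    if (flow)
        atomic_init(&flow->refcnt, 1);   /* reference owned by the table */
    return flow;
}

/* Taken by long-lived users, e.g. a header cache. */
void flow_hold_sketch(struct flow_sketch *flow)
{
    atomic_fetch_add(&flow->refcnt, 1);
}

void flow_put_sketch(struct flow_sketch *flow)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last. */
    if (atomic_fetch_sub(&flow->refcnt, 1) == 1) {
        flows_freed_sketch++;
        free(flow);
    }
}

/* Stand-in for the callback run after a grace period: instead of freeing
 * the flow outright, it drops the table's reference. */
void flow_deferred_destroy_sketch(struct flow_sketch *flow)
{
    flow_put_sketch(flow);
}
```

Packet processing under rcu_read_lock() never touches refcnt in this model; only holders that outlive the read-side critical section pay for the atomic operations.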
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
As the process to allocate a flow becomes more involved it becomes
more cumbersome for the code to be mixed in with the general
datapath so split it out into a new function.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
Until now the number of actions in a flow has been limited to what fits in
a page. Each action is 8 bytes, and on 32-bit architectures there is a
12-byte header, so with 4-kB pages that limits flows to 510 actions. We
and Citrix have noticed that OVS stops working properly after about 509
VIFs are added to a bridge. According to log messages this is the reason:
at this point it is no longer possible to flood a packet to all ports.
This commit should help, by increasing the maximum number of actions in a
flow. In the long term, though, we should adopt use of port groups or
otherwise reduce the number of actions needed to flood a packet.
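The 510-action figure follows directly from the sizes quoted above. A small helper (hypothetical name) makes the arithmetic explicit:

```c
#include <stddef.h>

/* Number of fixed-size actions that fit in one page after the header.
 * With the figures above: (4096 - 12) / 8 = 510 (integer division). */
size_t max_actions_sketch(size_t page_size, size_t header_size,
                          size_t action_size)
{
    if (action_size == 0 || header_size > page_size)
        return 0;
    return (page_size - header_size) / action_size;
}
```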
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #3573.
NIC-234.
Some of the flow actions that modify skbuff data did not check that the
skbuff was long enough before doing so. This commit fixes that problem.
Previously, the strategy for avoiding this was to only indicate the layer-3
nw_proto field in the flow if the corresponding layer-4 header was fully
present, so that if, for example, nw_proto was IPPROTO_TCP, this meant
that a TCP header was present. The original motivation for this patch was
to add corresponding code to only indicate a layer-2 dl_type if the
corresponding layer-3 header was fully present. But I'm now convinced that
this approach is conceptually wrong, because the meaning of a layer-N
header should not be affected by the meaning of a layer-(N+1) header.
This commit switches to a new approach. Now, when a header is missing, its
fields in the flow are simply zeroed and have no effect on the "type" field
for the outer header. Responsibility for ensuring that a header is fully
present is now shifted to the actions that wish to modify that header.
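As a sketch of what "responsibility shifted to the action" means, here is a hypothetical action that rewrites the TCP destination port on a flat buffer. The names, the flat-buffer model, and the choice to return -1 on a short packet are assumptions of this example; checksum updates are omitted:

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_HEADER_LEN_SKETCH 20   /* minimum TCP header, no options */

/* Rewrites the TCP destination port (bytes 2-3 of the TCP header).
 * The action itself verifies that a full header is present rather than
 * trusting nw_proto in the flow.  Returns 0 on success, or -1 without
 * touching the packet if it is too short. */
int set_tp_dst_sketch(uint8_t *pkt, size_t len, size_t l4_offset,
                      uint16_t new_port)
{
    if (l4_offset > len || len - l4_offset < TCP_HEADER_LEN_SKETCH)
        return -1;
    pkt[l4_offset + 2] = (uint8_t)(new_port >> 8);
    pkt[l4_offset + 3] = (uint8_t)(new_port & 0xff);
    return 0;
}
```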
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit started out as simply better documenting flow_extract(),
but then I realized that nothing cares about transport_header in the
non-IP case, so don't bother with it at all.
Signed-off-by: Ben Pfaff <blp@nicira.com>
These calls to pskb_may_pull() can be reduced to checks on skb->len because
in these contexts those headers will already have been pulled into the
skb linear area if they are present at all.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now flow_extract() has simply returned a bogus flow when memory
allocation errors occurred. This fixes the problem by propagating the
error to the caller.
Signed-off-by: Ben Pfaff <blp@nicira.com>
"ARP spoofing" is when a host claims an incorrect association between an
IP address and a MAC address for deceptive purposes. OpenFlow by itself
can prevent a host from sending out ARP replies from an incorrect MAC
address in the Ethernet L2 header, but it cannot control the MAC addresses
inside the ARP L3 packet. This commit adds a new action that can be used
to drop these spoofed packets.
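The check such an action needs to make can be sketched as below for an untagged Ethernet/IPv4 ARP frame. The function name, the offsets as macros, and the decision to let short or non-ARP frames through are assumptions of this sketch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN_SKETCH    14  /* dst(6) + src(6) + type(2) */
#define ETH_ALEN_SKETCH     6
#define ARP_SHA_OFF_SKETCH  8  /* sender hardware address within ARP */
#define ARP_LEN_SKETCH     28  /* ARP payload for Ethernet/IPv4 */

/* Returns 1 if the frame is an ARP packet whose sender hardware address
 * differs from the Ethernet source address, 0 otherwise. */
int arp_is_spoofed_sketch(const uint8_t *frame, size_t len)
{
    const uint8_t *eth_src, *arp_sha;

    if (len < ETH_HLEN_SKETCH + ARP_LEN_SKETCH)
        return 0;                       /* too short to judge */
    if (frame[12] != 0x08 || frame[13] != 0x06)
        return 0;                       /* Ethernet type is not ARP */
    eth_src = frame + ETH_ALEN_SKETCH;
    arp_sha = frame + ETH_HLEN_SKETCH + ARP_SHA_OFF_SKETCH;
    return memcmp(eth_src, arp_sha, ETH_ALEN_SKETCH) != 0;
}
```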
CC: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
flow_extract() can fail due to memory allocation errors in pskb_may_pull().
Currently it doesn't return those properly, instead just reporting a bogus
flow to the caller. But its return value is currently in use for reporting
whether the packet was an IPv4 fragment. This commit switches to reporting
that in the skb itself so that the return value can be reused to report
errors.
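The resulting calling convention looks roughly like this. The packet structure, the simulate_enomem flag (standing in for a failing pskb_may_pull()), and the function name are all hypothetical; the buffer is assumed to start at the IPv4 header:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pkt_sketch {
    const uint8_t *data;   /* starts at the IPv4 header, for simplicity */
    size_t len;
    bool is_frag;          /* filled in by extract_sketch() */
    bool simulate_enomem;  /* stand-in for an allocation failure */
};

/* Returns 0 on success or a negative errno value.  Whether the packet is
 * a fragment is reported in the packet itself, not in the return value. */
int extract_sketch(struct pkt_sketch *pkt)
{
    uint16_t frag_off;

    pkt->is_frag = false;
    if (pkt->simulate_enomem)
        return -ENOMEM;
    if (pkt->len < 20)
        return 0;          /* no full IPv4 header: nothing to report */
    frag_off = (uint16_t)(pkt->data[6] << 8 | pkt->data[7]);
    /* MF flag (0x2000) or a non-zero offset (0x1fff) marks a fragment. */
    pkt->is_frag = (frag_off & 0x3fff) != 0;
    return 0;
}
```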
Signed-off-by: Ben Pfaff <blp@nicira.com>
The kernel and user datapaths have code that assumes that 802.1Q headers
are used only inside Ethernet II frames, not inside SNAP-encapsulated
frames. But the kernel and user flow_extract() implementations would
interpret 802.1Q headers inside SNAP headers as being valid VLANs. This
would cause packet corruption if any VLAN-related actions were to be taken,
so change the two flow_extract() implementations only to accept 802.1Q as
an Ethernet II frame type, not as a SNAP-encoded frame type.
802.1Q-2005 says that this is correct anyhow:
    Where the ISS instance used to transmit and receive tagged frames is
    provided by a media access control method that can support Ethernet
    Type encoding directly (e.g., is an IEEE 802.3 or IEEE 802.11 MAC) or
    is media access method independent (e.g., 6.6), the TPID is Ethernet
    Type encoded, i.e., is two octets in length and comprises solely the
    assigned Ethernet Type value.

    Where the ISS instance is provided by a media access method that
    cannot directly support Ethernet Type encoding (e.g., is an IEEE
    802.5 or FDDI MAC), the TPID is encoded according to the rule for
    a Subnetwork Access Protocol (Clause 10 of IEEE Std 802) that
    encapsulates Ethernet frames over LLC, and comprises the SNAP
    header (AA-AA-03) followed by the SNAP PID (00-00-00) followed by
    the two octets of the assigned Ethernet Type value.
All of the media that OVS handles support Ethernet Type fields, so to me
that means that we don't have to handle 802.1Q-inside-SNAP.
On the other hand, we *do* have to handle SNAP-inside-802.1Q, because this
is actually allowed by the standards. So this commit also adds that
support.
I verified that, with this change, both SNAP and Ethernet packets are
properly recognized both with and without 802.1Q encapsulation.
I was a bit surprised to find out that Linux does not accept
SNAP-encapsulated IP frames on Ethernet.
Here's a summary of how frames are handled before and after this commit:
Common cases
------------

      Ethernet
   +------------+
1. |dst|src|TYPE|
   +------------+

      Ethernet       LLC         SNAP
   +------------+ +--------+ +-----------+
2. |dst|src| len| |aa|aa|03| |000000|TYPE|
   +------------+ +--------+ +-----------+

      Ethernet      802.1Q
   +------------+ +---------+
3. |dst|src|8100| |VLAN|TYPE|
   +------------+ +---------+

      Ethernet      802.1Q       LLC         SNAP
   +------------+ +---------+ +--------+ +-----------+
4. |dst|src|8100| |VLAN| LEN| |aa|aa|03| |000000|TYPE|
   +------------+ +---------+ +--------+ +-----------+
Unusual cases
-------------

      Ethernet       LLC         SNAP        802.1Q
   +------------+ +--------+ +-----------+ +---------+
5. |dst|src| len| |aa|aa|03| |000000|8100| |VLAN|TYPE|
   +------------+ +--------+ +-----------+ +---------+

      Ethernet       LLC
   +------------+ +--------+
6. |dst|src| len| |xx|xx|xx|
   +------------+ +--------+

      Ethernet       LLC         SNAP
   +------------+ +--------+ +-----------+
7. |dst|src| len| |aa|aa|03| |xxxxxx|xxxx|
   +------------+ +--------+ +-----------+

      Ethernet      802.1Q       LLC
   +------------+ +---------+ +--------+
8. |dst|src|8100| |VLAN| LEN| |xx|xx|xx|
   +------------+ +---------+ +--------+

      Ethernet      802.1Q       LLC         SNAP
   +------------+ +---------+ +--------+ +-----------+
9. |dst|src|8100| |VLAN| LEN| |aa|aa|03| |xxxxxx|xxxx|
   +------------+ +---------+ +--------+ +-----------+
Behavior
--------

   ---------------   ---------------   -------------------------------------
       Before             After
     this commit       this commit
   dl_type dl_vlan   dl_type dl_vlan   Notes
   ------- -------   ------- -------   -------------------------------------
1. TYPE    ffff      TYPE    ffff      no change
2. TYPE    ffff      TYPE    ffff      no change
3. TYPE    VLAN      TYPE    VLAN      no change
4. LEN     VLAN      TYPE    VLAN      proposal fixes behavior
5. TYPE    VLAN      8100    ffff      802.1Q says this is invalid framing
6. 05ff    ffff      05ff    ffff      no change
7. 05ff    ffff      05ff    ffff      no change
8. LEN     VLAN      05ff    VLAN      proposal fixes behavior
9. LEN     VLAN      05ff    VLAN      proposal fixes behavior
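The "After this commit" columns can be reproduced with a short parser like the one below. This is only a sketch over a flat byte buffer with hypothetical names, not the flow_extract() code itself. It assumes 0x0600 as the boundary between length and type values, and it reduces dl_vlan to the 12-bit VLAN ID:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_TYPE_VLAN_SKETCH   0x8100
#define ETH_TYPE_CUTOFF_SKETCH 0x0600  /* smaller values are lengths */
#define DL_TYPE_NONE_SKETCH    0x05ff  /* "05ff" in the table */
#define DL_VLAN_NONE_SKETCH    0xffff  /* "ffff" in the table */

struct l2_key_sketch {
    uint16_t dl_type;
    uint16_t dl_vlan;
};

static uint16_t get_be16_sketch(const uint8_t *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Returns 0 on success, -1 if the frame lacks a full Ethernet header. */
int parse_l2_sketch(const uint8_t *frame, size_t len,
                    struct l2_key_sketch *key)
{
    static const uint8_t snap_prefix[6] = { 0xaa, 0xaa, 0x03, 0, 0, 0 };
    size_t off = 12;                 /* offset of the type/length field */
    uint16_t type;

    key->dl_type = DL_TYPE_NONE_SKETCH;
    key->dl_vlan = DL_VLAN_NONE_SKETCH;
    if (len < 14)
        return -1;

    type = get_be16_sketch(frame + off);
    off += 2;
    /* 802.1Q is honored only as an Ethernet II type (cases 3, 4, 8, 9),
     * never when it appears inside SNAP (case 5). */
    if (type == ETH_TYPE_VLAN_SKETCH) {
        if (len < off + 4)
            return 0;
        key->dl_vlan = get_be16_sketch(frame + off) & 0x0fff;
        type = get_be16_sketch(frame + off + 2);
        off += 4;
    }
    if (type >= ETH_TYPE_CUTOFF_SKETCH) {
        key->dl_type = type;         /* Ethernet II (cases 1 and 3) */
        return 0;
    }
    /* Length field: accept LLC/SNAP with OUI 00-00-00 (cases 2, 4, 5);
     * anything else keeps dl_type at 05ff (cases 6 to 9). */
    if (len >= off + 8 && !memcmp(frame + off, snap_prefix, sizeof snap_prefix))
        key->dl_type = get_be16_sketch(frame + off + 6);
    return 0;
}
```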
Signed-off-by: Ben Pfaff <blp@nicira.com>
Originally, the datapath didn't care about IP TOS at all. Then, to support
NetFlow, we made it keep track of the last-seen IP TOS value on a per-flow
basis. Then, to support OpenFlow 1.0, we added a nw_tos field to
odp_flow_key. We don't need both methods, so this commit drops the
NetFlow-specific tracking.
This introduces a small kernel ABI break: upgrading the kernel module
without upgrading the OVS userspace will mean that NetFlow records will
all show an IP TOS value of 0. I don't consider that to be a serious
problem.
Rather than actually query the time every time a packet comes through,
just store the current jiffies and convert it to actual time when
requested. GRE is the primary beneficiary of this because the traffic
travels through the datapath twice. This change reduces CPU utilization
3-4% with GRE.
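In outline, the hot path only copies a tick counter and the conversion happens on request. The names and the tick rate below are assumptions of this sketch; the real conversion in the datapath may differ:

```c
#include <stdint.h>

#define HZ_SKETCH 250   /* assumed tick rate, ticks per second */

struct flow_stats_sketch {
    uint64_t used_ticks;   /* raw tick counter when the flow was last hit */
};

/* Per-packet path: a plain store, no clock query. */
void flow_used_sketch(struct flow_stats_sketch *stats, uint64_t now_ticks)
{
    stats->used_ticks = now_ticks;
}

/* On request only: convert the stored ticks to milliseconds. */
uint64_t flow_used_msec_sketch(const struct flow_stats_sketch *stats)
{
    return stats->used_ticks * 1000 / HZ_SKETCH;
}
```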
Most of the timekeeping needs of OVS are simply to measure intervals,
which means that OVS is sensitive to changes in the clock. This commit
replaces the existing clocks with monotonic timers. An additional set
of wall clock timers is added and used in locations that need absolute
time.
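The distinction maps onto the two POSIX clocks. A minimal sketch with hypothetical helper names, assuming a system that provides clock_gettime():

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Monotonic milliseconds: unaffected by settimeofday() or NTP steps, so
 * suitable for measuring intervals and timeouts. */
long long mono_msec_sketch(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wall-clock milliseconds since the epoch: for timestamps that must be
 * meaningful outside this process, such as log or export records. */
long long wall_msec_sketch(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
```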
Bug #1858
Currently the flow hash table assumes that it is storing flows.
However, we will need additional types of hash tables in the
future so remove assumptions about flows and convert the datapath
to use the new table.
Currently the datapath directly accesses devices through their
Linux functions. Obviously this doesn't work for virtual devices
that are not backed by an actual Linux device. This creates a
new virtual port layer which handles all interaction with devices.
The existing support for Linux devices was then implemented on top
of this layer as two device types. It splits out and renames dp_dev
to internal_dev. There were several places where datapath devices
had to be handled in a special manner and this cleans that up by putting
all the special casing in a single location.
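The shape of such a layer is a table of operations per port type, with the datapath calling only through the table. Everything below (structure names, the single send operation, the two type strings) is a hypothetical reduction for illustration:

```c
#include <stddef.h>

struct vport_sketch;

/* Operations every port type must provide (reduced to one here). */
struct vport_ops_sketch {
    const char *type;
    int (*send)(struct vport_sketch *vport, size_t len);
};

struct vport_sketch {
    const struct vport_ops_sketch *ops;
    unsigned long tx_bytes;
};

/* Port backed by a real Linux device (transmit is only counted here). */
static int netdev_send_sketch(struct vport_sketch *vport, size_t len)
{
    vport->tx_bytes += len;
    return (int)len;
}

/* Port that hands packets to the local network stack. */
static int internal_send_sketch(struct vport_sketch *vport, size_t len)
{
    vport->tx_bytes += len;
    return (int)len;
}

const struct vport_ops_sketch netdev_ops_sketch = {
    .type = "netdev", .send = netdev_send_sketch,
};
const struct vport_ops_sketch internal_ops_sketch = {
    .type = "internal", .send = internal_send_sketch,
};

/* The datapath needs no per-type special cases: it always calls this. */
int vport_send_sketch(struct vport_sketch *vport, size_t len)
{
    return vport->ops->send(vport, len);
}

const char *vport_get_type_sketch(const struct vport_sketch *vport)
{
    return vport->ops->type;
}
```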
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel). Also
add an action which allows the tunnel ID to be set for outgoing
packets. At this point there aren't any tunnel implementations so
these fields don't have any effect.
The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID. ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentionally undocumented to avoid it being used without
a full understanding of the consequences.
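The overloading amounts to simple bit manipulation on the 64-bit cookie. The helper names are hypothetical, and the cookie is assumed to have been converted to host byte order already:

```c
#include <stdint.h>

/* Tunnel ID carried in the high 32 bits of the cookie. */
uint32_t cookie_to_tun_id_sketch(uint64_t cookie)
{
    return (uint32_t)(cookie >> 32);
}

/* Replaces the high 32 bits with tun_id, keeping the low 32 bits. */
uint64_t cookie_with_tun_id_sketch(uint64_t cookie, uint32_t tun_id)
{
    return ((uint64_t)tun_id << 32) | (cookie & UINT64_C(0xffffffff));
}
```

This also shows the consequence the text warns about: with the special behavior enabled, only the low 32 bits remain available as an actual cookie.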
OpenFlow 1.0 adds support for matching on IP ToS/DSCP bits.
NOTE: OVS at this point is not wire-compatible with OpenFlow 1.0 until
the final commit in this OpenFlow 1.0 set.
Starting in OpenFlow 0.9, it is possible to match on the VLAN PCP
(priority) field and rewrite the IP ToS/DSCP bits. This check-in
provides that support and bumps the wire protocol number to 0x98.
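For reference, the two fields sit in well-known bit positions: PCP is the top three bits of the 802.1Q TCI, and DSCP is the top six bits of the IP ToS byte. The helpers below are hypothetical and operate on host-order values:

```c
#include <stdint.h>

/* VLAN priority (PCP): bits 15-13 of the 16-bit tag control field. */
uint8_t vlan_tci_to_pcp_sketch(uint16_t tci)
{
    return (uint8_t)(tci >> 13);
}

/* DSCP: upper six bits of the ToS byte; the low two bits are ECN. */
uint8_t ip_tos_to_dscp_sketch(uint8_t tos)
{
    return (uint8_t)(tos >> 2);
}

/* Rewrites DSCP while leaving the two ECN bits untouched. */
uint8_t ip_tos_set_dscp_sketch(uint8_t tos, uint8_t dscp)
{
    return (uint8_t)(((dscp & 0x3f) << 2) | (tos & 0x03));
}
```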
NOTE: The wire changes come together over the set of OpenFlow 0.9 commits,
so OVS will not be OpenFlow-compatible with any official release between
this commit and the one that completes the set.
The ability to match the IP addresses in ARP packets allows for fine-grained
control of ARP processing. Some forthcoming changes to allow in-band
control to operate over L3 require this support if we don't want to
allow overly broad rules regarding ARPs to always be white-listed.
Unfortunately, OpenFlow does not support this sort of processing yet, so
we must treat OpenFlow ARP rules as having wildcarded those L3 fields.