Our update_csum() function was exactly the same as
inet_proto_csum_replace4() with the one exception that it uses our
checksum status fields on older kernels that need it. Unfortunately,
we can't completely move the code to the compat directory because it
relies on fields in OVS CB but we can at least exile it to checksum.h.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The update_csum() function that we currently use to update
checksums on actions is really intended for L4 checksums. In
particular, if the packet has a partial checksum and the field
is not in the pseudo header, it doesn't do anything at all.
This doesn't make sense for the IP header because Linux doesn't
use hardware offload for it, so we always need to recompute the
checksum. Instead, we can use the kernel function csum_replace4(),
which will always do the right thing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Checksum offloading has changed quite a bit across different kernel
and Xen versions. Since it is part of the skb data structure it is
unfortunately difficult to separate out into compatibility code.
This consolidates all of the checksum code in one place which makes
it easier read and remove as we prepare for upstreaming. On newer
kernels it also puts everything in inline functions, eliminating the
need to run through the compat code or make extra function calls.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
After the previous commit, which changed the datapath to always create and
attach a vport at the same time, and to always detach and delete a vport
at the same time, there is no longer any real distinction between a dp_port
and a vport. This commit, therefore, merges the two together to simplify
code. It might even improve performance, although I have not checked.
I wasn't sure at first whether the merged structure should be "struct
dp_port" or "struct vport". I went with the latter since the "v" prefix
sounds cool.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
There's no need to have a mask in this action, because both parts of the
TCI are part of the flow structure.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
These functions run 99% of the time in atomic context and the benefit of
passing along the 'gfp' argument for the other 1% doesn't seem to outweigh
the cost.
Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This allows eliminating padding from odp_flow_key, although actually doing
that is postponed until the next commit.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The "port group" concept seems like a good one, but it has not been
used very much in userspace so far, so before we commit ourselves to
a frozen API that we must maintain forever, remove it. We can always
add it back in later as a new kind of vport.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Some of the flow actions that modify skbuff data did not check that the
skbuff was long enough before doing so. This commit fixes that problem.
Previously, the strategy for avoiding this was to only indicate the layer-3
nw_proto field in the flow if the corresponding layer-4 header was fully
present, so that if, for example, nw_proto was IPPROTO_TCP, this meant
that a TCP header was present. The original motivation for this patch was
to add corresponding code to only indicate a layer-2 dl_type if the
corresponding layer-3 header was fully present. But I'm now convinced that
this approach is conceptually wrong, because the meaning of a layer-N
header should not be affected by the meaning of a layer-(N+1) header.
This commit switches to a new approach. Now, when a header is missing, its
fields in the flow are simply zeroed and have no effect on the "type" field
for the outer header. Responsibility for ensuring that a header is fully
present is now shifted to the actions that wish to modify that header.
Signed-off-by: Ben Pfaff <blp@nicira.com>
"ARP spoofing" is when a host claims an incorrect association between an
IP address and a MAC address for deceptive purposes. OpenFlow by itself
can prevent a host from sending out ARP replies from an incorrect MAC
address in the Ethernet L2 header, but it cannot control the MAC addresses
inside the ARP L3 packet. This commit adds a new action that can be used
to drop these spoofed packets.
CC: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
On vport ingress we already check for shared SKBs but then later
warn in several other places. In a similar vein, we check every
packet to see if it is LRO but only certain vports can produce
these packets. Remove and consolidate checks to the places where
they are needed.
Currently the flow key is updated to match an action that is applied
to a packet but these field are never looked at again. Not only is
this a waste of time it also makes optimizations involving caching
the flow key more difficult.
In some places we would put the return type on the same line as
the rest of the function definition and other places we wouldn't.
Reformat everything to match kernel style.
We currently check for packets that are over the MTU in do_output().
There is a one-to-one correlation between this function and
vport_send() so move the MTU check there for consistency with
other error checking.
If vswitch_skb_checksum_setup() returns an error, the checksum
pointers probably haven't been set correctly which could cause
a crash later. We should give up immediately on error.
ovs-vswitchd doesn't declare its QoS capabilities in the database yet,
so the controller has to know what they are. We can add that later.
The linux-htb QoS class has been tested to the extent that I can see that
it sets up the queues I expect when I run "tc qdisc show" and "tc class
show". I haven't tested that the effects on flows are what we expect them
to be. I am sure that there will be problems in that area that we will
have to fix.
Some kernels don't copy the checksum offload state in the skb
header when doing different types of copies. Xen adds even more
fields, which are also not consistently copied. The result is
uninitialized memory and random outcomes. This adds a function to
consistently copy these bits across all kernel versions.
Currently the datapath directly accesses devices through their
Linux functions. Obviously this doesn't work for virtual devices
that are not backed by an actual Linux device. This creates a
new virtual port layer which handles all interaction with devices.
The existing support for Linux devices was then implemented on top
of this layer as two device types. It splits out and renames dp_dev
to internal_dev. There were several places where datapath devices
had to handled in a special manner and this cleans that up by putting
all the special casing in a single location.
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel). Also
add an action which allows the tunnel ID to be set for outgoing
packets. At this point there aren't any tunnel implementations so
these fields don't have any effect.
The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID. ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentially undocumented to avoid it being used without
a full understanding of the consequences.
After executing an action that changes a packet sometimes we update
the flow key and sometimes we don't. This is potentially problematic
because we sometimes use the key for checks later on. This consistently
maintains the key.
The checksum computed by hardware on receive stored in skb->csum
when skb->ip_summed == CHECKSUM_COMPLETE is supposed to reflect
the contents of the packet starting at skb->data, which includes
the VLAN tag if there is one. However, when we manipulate the
VLAN tag we don't update the checksum. This leads to all kinds
of nasty warnings about broken hardware, not to mention we can't
take advantage of the checksum that was already computed.
This also fixes some issues with our private checksum type value
on some different kernels and after GSO.
When adding a VLAN tag it is necessary for us to setup checksum
pointers for offloaded packets manually. However, this process
clobbers some of the fields that other components need to determine
the current status. Here we mark the packet with its status upon
ingress in our own format that does not get clobbered and is
consistent across kernel versions.
Bug #2436
Starting in OpenFlow 0.9, it is possible to match on the VLAN PCP
(priority) field and rewrite the IP ToS/DSCP bits. This check-in
provides that support and bumps the wire protocol number to 0x98.
NOTE: The wire changes come together over the set of OpenFlow 0.9 commits,
so OVS will not be OpenFlow-compatible with any official release between
this commit and the one that completes the set.
On older kernels we would not correctly update partial checksums
because it was difficult to determine the type of checksum. This
uses some hints to infer the correct type of checksum so that it
can be updated. It also allows us to correctly define
CHECKSUM_PARTIAL, which is important for other components.
While updating the checksum after changing the transport port,
we indicated that this was a change to the psuedoheader. Since
this is not the case, it could produce an incorrect checksum.
On older kernels (< 2.6.19) CHECKSUM_HW can mean either that the
checksum has already been computed by hardware or that the checksum
needs to be computed by hardware, depending on whether we are on
the transmit or receive path. Unfortunately since we are in the
middle of these two paths it is impossible to tell which is the
case. Code after us assumes that CHECKSUM_HW means that the
checksum needs to be computed and will panic if there already is
a checksum. On these kernels we mark these packets as CHECKSUM_NONE
before handing them off.
Without this change using certain NICs will cause panics.
According to Neil McKee, in an email archived at
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-January/000934.html:
The containment rule is that a given sflow-datasource (sampler or
poller) should be scoped within only one sflow-agent (or
sub-agent). So the issue arrises when you have two
switches/datapaths defined on the same host being managed with
the same IP address: each switch is a separate sub-agent, so they
can run independently (e.g. with their own sequence numbers) but
they can't both claim to speak for the same sflow-datasource.
Specifically, they can't both represent the <ifindex>:0
data-source. This containment rule is necessary so that the
sFlow collector can scale and combine the results accurately.
One option would be to stick with the <ifindex>:0 data-source but
elevate it to be global across all bridges, with a global
sample_pool and a global sflow_agent. Not tempting. Better to
go the other way and allow each interface to have it's own
sampler, just as it already has it's own poller. The ifIndex
numbers are globally unique across all switches/datapaths on the
host, so the containment is now clean. Datasource <ifindex>:5
might be on one switch, whille <ifindex>:7 can be on another.
Other benefits are that 1) you can support the option of
overriding the default sampling-rate on an interface-by-interface
basis, and 2) this is how most sFlow implementations are coded,
so there will be no surprises or interoperability issues with any
sFlow collectors out there.
This commit implements the approach suggested by Neil.
This commit uses an atomic_t to represent the sampling pool. This is
because we do want access to it to be atomic, but we expect that it will
"mostly" be accessed from a single CPU at a time. Perhaps this is a bad
assumption; we can always switch to another form of synchronization later.
CC: Neil McKee <neil.mckee@inmon.com>
Commit 5ef800a69 "datapath: Copy Xen's checksumming fields when doing
skb_copy" should copy proto_data_valid between sk_buffs when that field
is present. However the check for CONFIG_XEN plus kernel version 2.6.18
isn't sufficient, because SLES 11 kernels are version 2.6.27 but do have
this field.
This commit adds a configure-time check for the presence of the member
instead of attempting to guess based on the kernel version.
Thanks to Ian Campbell for reporting this problem.
CC: <Ian.Campbell@citrix.com>
If we need to copy an sk_buff in order to make it writable, allow
the minimum headroom to be specified. This ensures that if we
need to add additional data, such as a VLAN tag, we will not have
to make a second copy.
Solves bug #2197 in certain situations.
Two fields that control checksumming were added to sk_buff in
Xen: proto_data_valid and proto_csum_blank. These fields are copied
when doing a skb_clone but not in other functions such as skb_copy,
which can lead to checksum errors in TCP and UDP when offloading is
enabled in the guest. To fix this we manually copy these fields,
though ideally this should be fixed upstream in Xen.
Bug #2299
When the set_tp_src or set_tp_dst action is used, the calculation for
where the checksum is located was wrong. This caused the checksum to
not be updated and packet corruption in the bad offset.
For newer kernels the checksum setup is done at the point the skb is
received in netback or netfront so there is no more need to sprinkle
skb_checksum_setup calls throughout the kernel.