'port' is a kernel-space copy of the odp_port and modifying it is useless.
'portp' is the userspace copy; modifying it is useful.
None of our current userspace users care about the port number and so we
never noticed.
Found by sparse (http://sparse.wiki.kernel.org/).
The newly updated network namespace compatibility layer contained
an issue in the pernet automatic storage allocation feature that
prematurely freed memory. This caused an OOPS if the module was
unloaded.
The IPTUNNEL_XMIT macro was split into different versions based on
the kernel. This adds the compatibility code to allow a single
copy to work on all kernel versions, making it easier to maintain.
The MTU of the local port should be no larger than the minimum of
the MTUs of the ports attached to the bridge, overwise packets may be
dropped. We already prevent changes to the MTU that would violate
this constraint but don't actuallly proactively set the MTU. This
changes makes everything consistent and matches the behavior of
the bridge.
Some ports of Xen (such as Debian Lenny's) use the old style
Xen checksumming fields on newer kernels. Normally the code that
deals with those fields isn't used at all on newer kernels. This
updates the checksumming pointer code with some changes from Lenny
Xen since it is cleaner and works well with our existing compatibility
layer.
CC:pspreadborough@comcast.net
On older kernels we would not correctly update partial checksums
because it was difficult to determine the type of checksum. This
uses some hints to infer the correct type of checksum so that it
can be updated. It also allows us to correctly define
CHECKSUM_PARTIAL, which is important for other components.
While updating the checksum after changing the transport port,
we indicated that this was a change to the psuedoheader. Since
this is not the case, it could produce an incorrect checksum.
On older kernels (< 2.6.19) CHECKSUM_HW can mean either that the
checksum has already been computed by hardware or that the checksum
needs to be computed by hardware, depending on whether we are on
the transmit or receive path. Unfortunately since we are in the
middle of these two paths it is impossible to tell which is the
case. Code after us assumes that CHECKSUM_HW means that the
checksum needs to be computed and will panic if there already is
a checksum. On these kernels we mark these packets as CHECKSUM_NONE
before handing them off.
Without this change using certain NICs will cause panics.
No conflicts, but lib/dpif.c needed a few changes since struct dpif's
member "class" was renamed to "dpif_class" in master since sflow was
branched off.
vswitch_skb_checksum_setup() can be defined in datapath.h as a no-op
when defined(CONFIG_XEN) && defined(HAVE_PROTO_DATA_VALID) is false.
Also, skb_checksum_setup(), which was defined similarly, can be dropped
now, since it was unused.
According to Neil McKee, in an email archived at
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-January/000934.html:
The containment rule is that a given sflow-datasource (sampler or
poller) should be scoped within only one sflow-agent (or
sub-agent). So the issue arrises when you have two
switches/datapaths defined on the same host being managed with
the same IP address: each switch is a separate sub-agent, so they
can run independently (e.g. with their own sequence numbers) but
they can't both claim to speak for the same sflow-datasource.
Specifically, they can't both represent the <ifindex>:0
data-source. This containment rule is necessary so that the
sFlow collector can scale and combine the results accurately.
One option would be to stick with the <ifindex>:0 data-source but
elevate it to be global across all bridges, with a global
sample_pool and a global sflow_agent. Not tempting. Better to
go the other way and allow each interface to have it's own
sampler, just as it already has it's own poller. The ifIndex
numbers are globally unique across all switches/datapaths on the
host, so the containment is now clean. Datasource <ifindex>:5
might be on one switch, whille <ifindex>:7 can be on another.
Other benefits are that 1) you can support the option of
overriding the default sampling-rate on an interface-by-interface
basis, and 2) this is how most sFlow implementations are coded,
so there will be no surprises or interoperability issues with any
sFlow collectors out there.
This commit implements the approach suggested by Neil.
This commit uses an atomic_t to represent the sampling pool. This is
because we do want access to it to be atomic, but we expect that it will
"mostly" be accessed from a single CPU at a time. Perhaps this is a bad
assumption; we can always switch to another form of synchronization later.
CC: Neil McKee <neil.mckee@inmon.com>
The first change is to not propagate the IP DF bit from the inner
packet to the outer packet. Large TCP packets can get segmented
first which will set the DF bit. However these segmented packets
might still be too large after the GRE header is added, requiring
fragmentation.
The second change is to raise the MTU of the GRE tunnel device.
This prevents packets from being dropped in the datapath before
they can be fragmented. Since the datapath is layer 2 it does not
do any fragmentation and drops any packets that are too large.
Both of these are temporary workarounds that need to be addressed
more carefully in the future.
Bug #2379
dp_dev_recv() is called from do_output(), which can be called from
execute_actions(), which can be called from process context with
preemption enabled, so it needs to disabled preemption before it can
access per-CPU data.
Build tested on a few different kernels.
Bug #2316.
Reported-by: John Galgay <john@galgay.net>
CC: Dan Wendlandt <dan@nicira.com>
This implements the kernel portion of GRE on Linux. It consists
of a backported module that provides the GRE capabilities of 2.6.32
plus bug fixes to kernels 2.6.18+.
The upcoming GRE kernel module compiles on a range (2.6.18+) of
Linux kernel versions. The module expects the kernel headers to
look like newer versions. Where older and newer versions of the
kernel differ this commit implements shims to paper over the changes.
The purpose of the non-empty variant of vswitch_skb_checksum_setup is to
synchronise the proto_data_valid and proto_csum_blank fields into the
standard skb csum/ip_summed fields, therefore it is more correct to key
off of HAVE_PROTO_DATA_VALID.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
The Linux 'min' macro checks that its arguments have the same type, and
if not the compiler reports a message about incompatible pointer types.
On pre-2.6.24 kernels skb_headroom() returns int, so this code was
firing a warning:
unsigned headroom = max(min_headroom, skb_headroom(skb));
This commit makes skb_headroom() return an unsigned int regardless of
kernel version.
Commit 5ef800a69 "datapath: Copy Xen's checksumming fields when doing
skb_copy" should copy proto_data_valid between sk_buffs when that field
is present. However the check for CONFIG_XEN plus kernel version 2.6.18
isn't sufficient, because SLES 11 kernels are version 2.6.27 but do have
this field.
This commit adds a configure-time check for the presence of the member
instead of attempting to guess based on the kernel version.
Thanks to Ian Campbell for reporting this problem.
CC: <Ian.Campbell@citrix.com>
If we need to copy an sk_buff in order to make it writable, allow
the minimum headroom to be specified. This ensures that if we
need to add additional data, such as a VLAN tag, we will not have
to make a second copy.
Solves bug #2197 in certain situations.
Two fields that control checksumming were added to sk_buff in
Xen: proto_data_valid and proto_csum_blank. These fields are copied
when doing a skb_clone but not in other functions such as skb_copy,
which can lead to checksum errors in TCP and UDP when offloading is
enabled in the guest. To fix this we manually copy these fields,
though ideally this should be fixed upstream in Xen.
Bug #2299
Recent Debian kernel-header packages divide kernel headers into two
directories: the "common" headers that are not architecture-specific,
which go in a directory named like
/usr/src/kernel-headers-2.6.31-1-common,
and architecture-specific headers in a directory named, e.g.
/usr/src/kernel-headers-2.6.31-1-686.
OVS needs to look at the ones in the "common" directory as part of its
configuration process, but the build directory provided on --with-l26 is
the architecture-specific directory. We also need the
architecture-specific directory, since it is the one that we use as part
of the "make", so we can't simply make the user specify the common
directory on --with-l26. Furthermore, there is no easy-to-see link
between the two directories, except as part of the text in a Makefile,
which is not the easiest language to parse.
This commit attempts to kluge around the problem by using the Debian
directory naming. If the build directory does not contain the headers,
then we replace the last component of its name by "-common" and check
for the headers there. This is not ideal, but it does solve the actual
problem at hand.
Tested with Debian's linux-headers-2.6.31-1-686 and with a few older
sets of headers that do not use this scheme.
When the set_tp_src or set_tp_dst action is used, the calculation for
where the checksum is located was wrong. This caused the checksum to
not be updated and packet corruption in the bad offset.
When querying flow stats allow the TCP flags to be reset. Since
the datapath ORs together all flags that have previously been
seen it is otherwise impossible to determine the set of flags from
after a particular time.
When querying flow stats allow the TCP flags to be reset. Since
the datapath ORs together all flags that have previously been
seen it is otherwise impossible to determine the set of flags from
after a particular time.
This merge took a little bit of care due to two issues:
- Crossport of "interface-reconfigure" fixes from master back to
citrix that had happened and needed to be canceled out of the merge.
- New script "refresh-xs-network-uuids" added on citrix branch that
needed to be moved from /root/vswitch/scripts to
/usr/share/vswitch/scripts.
In Linux 2.6.30, the rtnl_notify() return type was changed from int to
void along with the following commit message:
This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.
Thus there's no point in checking the return value, even in older versions
of the kernel, and so this commit changes our code to ignore it, even
on older kernel versions. We also update the rtnl_notify() wrapper macros
to make the return type void on older kernel versions.
This has not been tested, just built.
Thanks to Mikio for spurring me to try building with Linux 2.6.29 and
2.6.30.