I've been reluctant in the past to make wholesale changes to openflow.h
because it would be a divergence from upstream that would make comparisons
and merges more difficult. But, in practice, no one does such comparisons
and no merges happen (because OpenFlow 1.0 is not changing). I'd still be
inclined to resist, except that in this series I'm adding actual checking
for byte order conventions (as opposed to just documentation).
I looked at almost every uint<N>_t in the tree to determine whether it was
really in network byte order, and converted the ones that were.
The only remaining ones, modulo my mistakes, are in openflow.h. I'm not
sure whether we should convert those, because there might be some value
in remaining close to upstream for this header.
ofproto_send_packet() seems very confused about whether vlan_tci is in
host or network byte order. But the callers always pass 0 anyway, so we
might as well remove it.
GCC and glibc conspire to allow struct sockaddr_in * to be passed in
place of struct sockaddr *, but that's non-standard and we're better
off not taking advantage of it.
Found by sparse.
ovs-openflowd outputs a number of log messages that we don't want to
suppress. We do want to know if it logs anything that we don't expect.
So this commit starts checking the log output, discarding any normal,
expected messages.
Reviewed-by: Simon Horman <horms@verge.net.au>
Only a privileged process can open a raw AF_PACKET socket, so netdev-linux
will fail to initialize if run as non-root and you get a cascade of error
messages, like this:
netdev_linux|ERR|failed to create packet socket: Operation not permitted
netdev|ERR|failed to initialize system network device class: Operation not permitted
netdev|ERR|failed to initialize internal network device class: Operation not permitted
netdev|ERR|failed to initialize tap network device class: Operation not permitted
But in fact the AF_PACKET socket is not needed for most operations (only
for sending packets) and it is never needed for testing with the "dummy"
datapath and network device, so we can avoid logging all of these errors
by opening the packet socket only on demand, as this commit does.
Reviewed-by: Simon Horman <horms@verge.net.au>
For a long time, Open vSwitch has "normalized" OpenFlow 1.0 flows in a
funny way: it tries to change fields that are wildcarded into fields
that are exact-match. For example, the normalize_match() function
knows that if dl_type is wildcarded, then all of the L3 and L4 fields
will always be extracted as 0, so it sets those fields to exact-match
and their values to 0.
The reason for this was originally that exact-match flows were much
cheaper for Open vSwitch to implement, because they could be implemented
with a hash table, whereas other kinds of flows had to be implemented
with an expensive linear search. But these days Open vSwitch has a
smarter classifier in which wildcarded flows have minimal cost. Also,
it is no longer possible for OpenFlow 1.0 to specify truly exact-match
flows, because Open vSwitch supports fields for which OpenFlow 1.0
cannot specify values and therefore will always be wildcarded.
Now, it no longer makes sense to do this transformation, so this commit
removes it. Presumably, this will be less surprising for users.
Reviewed-by: Simon Horman <horms@verge.net.au>
The normalize_match() function does more work than really needed. It goes
to some trouble to zero out fields that are wildcarded. This is not
necessary, because cls_rule_from_match() will take care of it later.
Also make normalize_match() private to ofp-util.c, since it has no other
users now and I don't expect more later.
Reviewed-by: Simon Horman <horms@verge.net.au>
OpenFlow 1.0 uses a 6-bit field to express the number of wildcarded bits
in the nw_src and nw_dst field. Any value 32 or greater in these fields
(binary 1xxxxx) means that all of the bits are wildcarded. That means
that there are 32 different ways to express a wildcarded nw_src or nw_dst.
At least two of those seem sensible (100000 and 111111) so we shouldn't
warn about one of them.
This fixes the problem by ORing with 100000 instead of 111111, so that any
already-correct wildcarded mask won't be affected.
This fix allows us to update some tests.
Reviewed-by: Simon Horman <horms@verge.net.au>
This code issues a warning if obtaining a lock takes even 1 millisecond.
That's far too aggressive. There's no need to warn if we have to wait
a few milliseconds. This function already warns elsewhere if locking takes
more than 1 second, which is much more reasonable.
This change allows us to test ovsdb-server stderr output more carefully.
Before now, the tests had to ignore what ovsdb-server writes to stderr
because sometimes it would log a warning that locking took 1 ms (or so).
Reviewed-by: Simon Horman <horms@verge.net.au>
It's better to check output than to ignore it, because ignoring
output can fail to detect real bugs later if the output changes.
Reviewed-by: Simon Horman <horms@verge.net.au>
Until now, when the poll_loop module's log level was turned up to "debug",
it would log a backtrace of the call stack for the event that caused poll()
to wake up in poll_block(). This was pretty useful from time to time to
find out why ovs-vswitchd was using more CPU than expected, because we
could find out what was causing it to wake up.
But there were some issues. One is simply that the backtrace was printed
as a series of hexadecimal numbers, so GDB or another debugger was needed
to translate it into human-readable format. Compiler optimizations meant
that even the human-readable backtrace wasn't, in my experience, as helpful
as it could have been. And, of course, one needed to have the binary to
interpret the backtrace. When the backtrace couldn't be interpreted or
wasn't meaningful, there was essentially nothing to fall back on.
This commit changes the way that "debug" logging for poll_block() wakeups
works. Instead of logging a backtrace, it logs the source code file name
and line number of the call to a poll_loop function, using __FILE__ and
__LINE__. This is by itself much more meaningful than a sequence of
hexadecimal numbers, since no additional interpretation is necessary. It
can be useful even if the Open vSwitch version is only approximately known.
In addition to the file and line, this commit adds, for wakeups caused by
file descriptors, information about the file descriptor itself: what kind
of file it is (regular file, directory, socket, etc.), the name of the file
(on Linux only), and the local and remote endpoints for socket file
descriptors.
Here are a few examples of the new output format:
932-ms timeout at ../ofproto/in-band.c:507
[POLLIN] on fd 20 (192.168.0.20:35388<->192.168.0.3:6633) at ../lib/stream-fd.c:149
[POLLIN] on fd 7 (FIFO pipe:[48049]) at ../lib/fatal-signal.c:168
The backtrace_capture() implementation only worked properly with GNU C on
systems that have a simple stack frame with a frame pointer. Notably,
the x86-64 ABI by default has no frame pointer, so this failed on x86-64.
However, glibc has a function named backtrace() that does what we want.
This commit tests for this function and uses it when it is present, fixing
x86-64 backtraces.
According to the 802.1ag specification, reception of a CCM from an
unexpected source should trigger a fault. This patch causes the CFM
module to simply warn instead. There are several reasons for this
change outlined below.
- Faults can cause controllers to make potentially expensive
changes to the network topology.
- Faults can be maliciously triggered by crafting invalid CCMs.
- With this patch, cfm->fault and rmp->fault are only updated in
cfm_run() making the code easier to debug and reason about.
According to the 802.1ag specification, when a CCM is received
which advertises a misconfigured transmission interval, a fault
should be triggered. This patch goes against the spec by simply
warning when this happens. This is done for several reasons.
- Faults can cause controllers to make potentially expensive
changes in the network topology.
- Faults can be maliciously triggered by crafting invalid CCMs.
- Reducing the number of places in the code where rmp->fault and
cfm->fault are changed makes the code easier to debug and
reason about.
In some circumstances the bridge can't find a stable physical Ethernet
address to use, so until now it has just picked a random Ethernet address.
In these circumstances, therefore, the bridge Ethernet address would change
from one ovs-vswitchd run to another. But OVS does have a stable
identifier for a bridge: its UUID. This commit changes to use that as the
default bridge Ethernet address.
The datapath ID is sometimes derived from the bridge Ethernet address, so
this change also makes the bridge Ethernet address more stable.
CC: Natasha Gude <natasha@nicira.com>
Bug #5594.
We currently always pull 64 bytes of data (if it exists) into the
skb linear data area when parsing flows. The theory behind this
is that the data should always be there and it's enough to parse
common flows. However, this causes a number of problems in
different situations. The first is that it is not enough to handle
IPv6 so we must pull additional data anyways. However, the main
problem is that GRO typically allocates a new skb and puts just the
headers in there. For a typical TCP/IPv4 packet there are 54 bytes
of headers, which means that we must possibly reallocate and copy
on every packet. In addition, GRO creates frag_lists with this
specific geometry in order to allow later segmentation if the packet
is forwarded to a device that does not support frag_lists. When
we pull additional data it changes the geometry and causes later
problems for the device. This patch instead incrementally pulls
data, which avoids these problems.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Commit daf2ebb (xenserver: Use xe-switch-network-stack in RPM spec
file.) changed the spec file to use xe-switch-network-backend instead of
directly modifying "/etc/xensource/network.conf". It incorrectly
assumed that the command was in the search path. It also didn't take
into account that the command will remove the "openvswitch" service with
chkconfig. This commit fixes those errors.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Previously, if --private-key or another option that requires SSL support
was used, but OVS was built without OpenSSL support, then OVS would fail
with an error message that the specified option was not supported. This
confused users because it made them think that the option had been removed:
http://openvswitch.org/pipermail/discuss/2011-April/005034.html
This commit improves the error message: OVS will now report that it was
built without SSL support. This should be make the problem clear to users.
Reported-by: Aaron Rosen <arosen@clemson.edu>
Feature #5325.
It doesn't make sense to create a QoS object without any queues.
Before this patch, OVS would configure the QoS object and as a
result drop all traffic going through the affected interface. With
this patch, OVS will simply clear QoS configuration on the
interface.
Bug #5583.
In some cases, when a facet's actions change because it resubmits
into a different rule, it will account all packets it as accrued
in the datapath to the new rule. Due to the algorithm we are
using, it is acceptable for a facet to miscount at most 1 second
worth of packets in this manner. This patch implements the proper
behavior.
Generally speaking, when a facet is facet_put__() into the
datapath, the kernel returns the old flow's statistics so they may
be accounted for in user space. These statistics are generally
pushed down to the relevant facet's resubmit children. Before this
patch, facet_put__() did not compensate for the fact that many of
the statistics in the datapath may have been already pushed.
Thus the entire packet count stored in the datapath would be pushed
to its children instead of simply the packets which have accrued
since the last accounting. This patch fixes the behavior by
subtracting already accounted for packets from the statistics
returned by the datapath.
This commit creates a new heartbeat mode for LACP. This mode
treats LACP as a protocol simply for monitoring link status. It
strips out most of the sanity checks built into the protocol.
Addition of this mode makes "lacp-force-aggregatable" and
"lacp-strict" options obsolete so they are removed.
Stable bonding mode needs an ID to guarantee consistent slave
selection decisions across ovs-vswitchd instances. Before this
patch, we used the lacp-port-id for this purpose. However, LACP
places restrictions on how lacp-port-ids can be allocated which may
be inconvenient. This patch creates a special purpose
bond-stable-id other_config setting which allows users to tweak
this value directly.
The proper way to switch the networking back-end is to use the
"xe-switch-network-stack" command rather than directly modifying
"/etc/xensource/network.conf". Use that method in the spec file.
network.dbcache was introduced by Open vSwitch for its own purposes, but
it has now migrated into the base install of XenServer, which uses it
whether Open vSwitch is installed or not, so we should no longer remove it
on package uninstall.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Bob Ball <bob.ball@citrix.com>
On XenServer, depmod.conf causes modules in /lib/modules/$(uname -r)/extra
to take priority over standard modules. Unfortunately, we were installing
our modules in /lib/modules/$(uname -r)/kernel/extra, which isn't special.
This commit fixes the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Bob Ball <bob.ball@citrix.com>
Quite a few changes to LACP and bonding have gone in recently which
allowed additional other_config parameters on ports and interfaces.
These changes should have updated the vswitch.ovsschema version
number.
Requested-by: Jeremy Stribling <strib@nicira.com>
An upcoming commit will introduce another function that needs to convert
between rtnl_link_stats64 and netdev_stats, so it seemed best to just add
functions to do the conversion.
Split existing pmtud tunnel option's functionality into three. Existing pmtud
option still exists, but now governs only whether datapath sends ICMP frag
needed messages. New df_inherit option controls whether DF bit is copied from
packet inner header to outer tunnel header. New df_default option controls
whether DF bit is set if inner packet isn't IP or if df_inherit is disabled.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andrew Evans <aevans@nicira.com>
Feature #5456.
When ports are deleted from the datapath they need to be added to
an LRU list maintained in dpif-linux so they may be reallocated.
When using vswitchd to delete the ports this happens automatically.
However, if a port is deleted directly from the datapath it is
never reclaimed by dpif-linux. If this happens often, eventually
no ports will be available for allocation and dpif-linux will fall
back to using the old, kernel implemented, allocation strategy.
This commit fixes the problem by automatically reclaiming ports
missing from the datapath whenever the list of ports in the
datapath is dumped.
Bug #2140.