Occasionally in the unit tests the following race can happen:
1. ovs-vsctl updates database
2. ovs-vswitchd reconfigures, notifies ovs-vsctl that it is complete
3. ovs-appctl ofproto/trace fails to see newly added port
4. ovs-vswitchd main loop calls ofproto's ->type_run(), making the
new port visible to translation.
This race may be seen in the failures of tests 5 and 624 here:
https://launchpadlibrarian.net/151884888/buildlog_ubuntu-precise-amd64.openvswitch_2.0~201309300804-1ppa1~precise_FAILEDTOBUILD.txt.gz
Reported-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Putting "static inline" on a function definition in a .c file does not help
the compiler and does suppress warnings for unused functions that one would
want, should the function ever become unused.
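To illustrate the point, here is a minimal sketch. The function names are
hypothetical and not taken from the tree. With GCC and -Wall, for example,
the plain "static" helper would draw a -Wunused-function warning if its
caller were deleted, while the "static inline" one would not.

```c
/* Hypothetical helpers, for illustration only. */

/* Plain "static": if every caller goes away, GCC's -Wunused-function
 * (part of -Wall) points at this definition. */
static int
double_it(int x)
{
    return 2 * x;
}

/* "static inline": in the same situation GCC stays silent, so the dead
 * code can linger unnoticed. */
static inline int
triple_it(int x)
{
    return 3 * x;
}

/* The only caller of both helpers. */
int
scale(int x)
{
    return double_it(x) + triple_it(x);
}
```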
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Commit 348f01e3e3 (cfm: Eight byte MPIDs in extended mode.) allows
eight byte MPIDs when running in extended mode. This commit explains
this change in the vswitch.conf.db.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
There's no particular reason for the function controlling the number
of threads to be bound up with dpif_recv_set(). This patch breaks
them up, but as a side effect means threads will run doing nothing
when datapath upcall receiving is disabled. By doing this, the udpif
thread creation API becomes a bit easier to reason about once there
are multiple types of thread introduced in future patches.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
They don't really make sense in a multithreaded architecture. Once
flow miss batches are dispensed with, they will be even less useful.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Previously, we iterated through all interfaces in instant_stats_run(),
grabbing up-to-date information about device and port status. After
assembling all of this information for all interfaces, we would
determine whether anything changed and only send an update to
ovsdb-server if something changed.
This patch uses the new global connectivity_seq to determine whether
there have been any changes before polling all interfaces, which reduces
unnecessary processing in the average case. In a test environment of
5000 internal ports and 50 tunnel ports with bfd, this reduces average
CPU usage of the main thread from about 15% to about 5%. When ports
change status more often than every 100ms, CPU usage is expected to
increase to previous rates.
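The idea can be sketched roughly as follows. This is not the code from the
patch: the counter and both function names are hypothetical stand-ins for
connectivity_seq and its users, and the real sequence number is shared
between threads, which this single-threaded sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the global connectivity sequence number. */
static uint64_t connectivity_seqno;

/* Called whenever any port or interface changes status. */
void
connectivity_changed(void)
{
    connectivity_seqno++;
}

/* Returns true if something changed since the caller last looked, in
 * which case the caller should poll all interfaces.  '*last_seen' is the
 * sequence number the caller observed on its previous call. */
bool
instant_stats_needed(uint64_t *last_seen)
{
    if (*last_seen == connectivity_seqno) {
        return false;           /* Nothing changed: skip the expensive poll. */
    }
    *last_seen = connectivity_seqno;
    return true;
}
```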
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This greatly simplifies the reconfiguration code, making it much easier
to understand and modify. The old multi-pass configuration had the
property that it didn't block packet processing for as long, but that's
not much of a worry anymore now that latency-critical activities have
been moved outside the main thread.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Add a prefix tree (trie) structure for tracking the used address
space, enabling skipping classifier tables containing longer masks
than necessary for an address field value in a packet header being
classified. This enables less unwildcarding for datapath flows in
parts of the address space without host routes.
Trie lookup is interwoven with the staged lookup, so that a trie is
searched only when the configured trie field becomes relevant
for the lookup. The trie lookup results are retained so that each
trie is checked at most once for each classifier lookup.
This implementation tracks the number of rules at each address prefix
for the whole classifier. More aggressive table skipping would be
possible by maintaining lists of tables that have prefixes at the
lengths encountered on tree traversal, or by maintaining separate
tries for subsets of rules separated by metadata fields.
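As a rough illustration of the prefix tracking, the sketch below keeps a
count of rules at each address prefix in an uncompressed binary trie over
32-bit addresses. It is a simplification under stated assumptions, not the
classifier's implementation: the names are hypothetical, and it does not
attempt path compression or the integration with staged lookup. Tables whose
mask is longer than the returned length could be skipped for that address.

```c
#include <stdint.h>
#include <stdlib.h>

/* One node per prefix bit (hypothetical, uncompressed layout). */
struct trie_node {
    struct trie_node *child[2];
    unsigned int n_rules;       /* Rules whose prefix ends exactly here. */
};

/* Returns bit 'i' of 'addr', counting from the most significant bit. */
static int
addr_bit(uint32_t addr, int i)
{
    return (addr >> (31 - i)) & 1;
}

/* Records one rule that matches the first 'plen' bits of 'addr'.
 * Returns 0 on success, -1 if memory allocation fails. */
int
trie_insert(struct trie_node *root, uint32_t addr, int plen)
{
    struct trie_node *node = root;

    for (int i = 0; i < plen; i++) {
        int b = addr_bit(addr, i);

        if (!node->child[b]) {
            node->child[b] = calloc(1, sizeof *node->child[b]);
            if (!node->child[b]) {
                return -1;
            }
        }
        node = node->child[b];
    }
    node->n_rules++;
    return 0;
}

/* Returns the longest prefix length at which some rule matches 'addr',
 * or -1 if no rule matches at any length. */
int
trie_longest_match(const struct trie_node *root, uint32_t addr)
{
    const struct trie_node *node = root;
    int best = node->n_rules ? 0 : -1;

    for (int i = 0; i < 32; i++) {
        node = node->child[addr_bit(addr, i)];
        if (!node) {
            break;
        }
        if (node->n_rules) {
            best = i + 1;
        }
    }
    return best;
}
```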
Prefix tracking is configured via OVSDB. A new column "prefixes" is
added to the database table "Flow_Table". "prefixes" is a set of
string values listing the field names for which prefix lookup should
be used.
As of now, the fields for which prefix lookup can be enabled are:
- tun_id, tun_src, tun_dst
- nw_src, nw_dst (or aliases ip_src and ip_dst)
- ipv6_src, ipv6_dst
There is a maximum number of fields that can be enabled for any one
flow table. Currently this limit is 3.
Examples:
ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
--id=@N1 create Flow_Table name=table0
ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
--id=@N1 create Flow_Table name=table1
ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
ovs-vsctl set Flow_Table table1 prefixes=[]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently, we refresh STP status (id, state, role) alongside
statistics (rx, tx, errors), all within instant_stats_run(). This
patch splits statistics out, and refreshes them with the 5 second
stats instead. This paves the way to reducing execution of
instant_stats_run().
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Relocating bond.[ch] to allow bond.c to make ofproto calls.
This is needed for upcoming patches that enable megaflow support
for bond ports.
Signed-off-by: Andy Zhou <azhou@nicira.com>
This should behave the same as before but the code reads more naturally to
me this way.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The "targets" column in IPFIX had a min=1 constraint, so OVSDB
implicitly adds an empty string "" into that column if no value is
given. No connection can be opened to a target with address "", so
the whole IPFIX exporter for that row was disabled until that ""
target was removed by users. That behavior is correct but proved to
be unintuitive to users.
This patch removes the min=1 constraint, to avoid the trouble for
users who insert IPFIX rows with no targets: it eliminates the log
messages due to failed connections to target "", and eliminates the
need to manually remove the "" target after row insertion.
This doesn't impact the behavior for any existing row, whether it has
a "" target or not.
Signed-off-by: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Guard any access to an IPFIX row referenced from
Flow_Sample_Collector_Set by a test that the reference is not NULL.
Signed-off-by: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new key "flap_count" to "bfd_status" to count
the number of bfd "forwarding" flag flaps. A flap is any change of
the "forwarding" flag value.
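The counting rule can be sketched as below. The struct and function names
are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session state, for illustration only. */
struct bfd_sketch {
    bool forwarding;            /* Current "forwarding" flag value. */
    uint64_t flap_count;        /* Number of changes seen so far. */
};

/* Updates the "forwarding" flag, counting a flap only when the value
 * actually changes. */
void
bfd_set_forwarding(struct bfd_sketch *bfd, bool forwarding)
{
    if (bfd->forwarding != forwarding) {
        bfd->forwarding = forwarding;
        bfd->flap_count++;
    }
}
```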
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Add member is_layer3 to struct ofport_dpif to mark layer 3 ports. Set
it to "true" for the only layer 3 port we support for now: lisp.
Additionally, prevent flooding to layer 3 ports. A later patch will
also prevent MAC learning.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit bdebeece5 (lacp: Require successful LACP negotiations when
configured.) makes successful LACP negotiation mandatory for the
bond to come UP. This patch provides a configuration option to
bring up the bond by falling back to active-backup mode on LACP
negotiation failure.
Several of the physical switches that support LACP block all traffic
for ports that are configured to use LACP, until LACP is negotiated
with the host. When configuring an LACP bond on an OVS host
(e.g. XenServer), this means that there will be an interruption of the
network connectivity between the time the ports on the physical
switch and the bond on the OVS host are configured. The interruption
may be relatively long, if different people are responsible for
managing the switches and the OVS host.
Such network connectivity failure can be avoided if LACP can be
configured on the OVS host before configuring the physical switch,
and having the OVS host fall back to a bond mode (active-backup) until
the physical switch LACP configuration is complete. An option
"lacp-fallback-ab" is introduced with this patch to provide such
behavior on openvswitch.
Signed-off-by: Ravi Kondamuru <Ravi.Kondamuru@citrix.com>
Signed-off-by: Dominic Curran <Dominic.Curran@citrix.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
A couple of controller vendors have mentioned to me that they would like to
have some part of the OpenFlow port number space reserved for the
controller to use. This commit reserves 32768 and up (roughly the upper
half of the OF1.0 port range) to the controller.
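A sketch of the resulting split of the port number space follows. The macro
and function names are hypothetical. OFPP_MAX (0xff00) is the OpenFlow 1.0
limit on regular port numbers; treating it as an exclusive upper bound here
is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROLLER_PORT_BASE 32768      /* First port left to the controller. */
#define OF10_PORT_MAX        0xff00     /* OFPP_MAX in OpenFlow 1.0. */

/* Returns true if 'port' lies in the range reserved for the controller,
 * so that the switch should not pick it automatically. */
bool
port_is_controller_reserved(uint16_t port)
{
    return port >= CONTROLLER_PORT_BASE && port < OF10_PORT_MAX;
}
```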
Bug #18753.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new ovsdb column "cfm_flap_count". It counts the
number of cfm fault flaps since boot.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Too many users have incorrectly assumed that ovs-controller is a necessary
or desirable part of an Open vSwitch deployment. This commit should fix
the problem by renaming it test-controller and removing it from the
default install and from packaging.
Signed-off-by: Ben Pfaff <blp@nicira.com>
This update improves the BFD documentation in a few ways:
- Demand mode is now supported.
- Wordsmithing, spelling, etc.
- Attempt to better explain decay_min_rx, forwarding_if_rx, and
cpath_down.
- Break into subgroups for configuration and status, to better explain
which party sets which fields.
- Reindents to match the rest of vswitch.xml.
Because of the reindentation, this patch may be easier to view with spacing
changes suppressed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The variable OVSDB_DOT_DIAGRAM_ARG describes the vswitch dot file,
so use the name VSWITCH_DOT_DIAGRAM_ARG to prevent confusion in the
generated makefile.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When ovsdb-dot generates diagrams for use in the manpages, the dot2pic
postprocessor makes nicer output if the arrowheads are omitted (dot2pic
adds the arrowheads itself). But for other uses that don't go through
the postprocessor, we generally want the arrowheads. So this commit adds
an option. On the principle that the default should be the least
surprising to a naive user, arrowheads are included by default.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
These are auto-generated files, so it would be better not to keep them
inside the Open vSwitch repository.
Before this patch, if the dot tool was not present on the system,
ovs-vswitchd.conf.db.5 used the pre-generated vswitch.pic file that
was checked into the git repository. After this patch,
ovs-vswitchd.conf.db.5 will simply not have a dot diagram if dot was
not present at the time Open vSwitch was built.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
The OVS code has always made a distinction between the unencrypted (TCP)
and SSL port numbers for the OpenFlow and OVSDB protocols. The default
port numbers for both protocols have changed, and there continues to be
no distinction between the unencrypted and SSL versions. This
commit removes the distinction in port numbers. A future patch will
recognize the change in default port number.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit prevents cfm from raising an 'interval' fault when demand
mode is enabled on only one end of the link.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit makes vswitchd clear the 'bfd_status' column
in ovsdb when bfd is disabled or not supported.
Reported-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit fixes a place in bridge.c where smap_destroy() is not
always called after smap_init(). Although this causes no memory leak
today, fixing it prevents a leak in the future if smap_init() is ever
changed to allocate dynamic memory.
Reported-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The default number of miss handlers should be the number of processors
minus two to account for the dispatcher and main threads.
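The arithmetic, as a small sketch. The function name is hypothetical, and
the floor of one handler on machines with two or fewer processors is an
assumption of this sketch, not something the commit message states.

```c
/* Returns the default number of miss handler threads for a machine with
 * 'n_cores' processors: two are set aside for the dispatcher and main
 * threads.  The minimum of 1 is an assumption made for this sketch. */
unsigned int
default_n_handlers(unsigned int n_cores)
{
    return n_cores > 2 ? n_cores - 2 : 1;
}
```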
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit removes the CACHE_TIME scheme from the timeval module,
eliminating lock contention over the read/write lock that protected
the cached time. To get the time, a thread now directly makes the
'clock_gettime()' system call.
As a side effect, the timer can only be warped after it has been
stopped by the 'appctl time/stop' command.
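Reading the clock directly, with no cached value and no lock, might look
like the sketch below. The function name is hypothetical and the choice of
CLOCK_MONOTONIC is an assumption made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Returns the current monotonic time in milliseconds by calling
 * clock_gettime() directly, or -1 if the call fails. */
long long
time_msec_sketch(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
        return -1;
    }
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
```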
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We have a call chain like this:
iface_configure_qos() calls
netdev_dump_queues(), which calls
netdev_linux_dump_queues(), which calls back through 'cb' to
qos_unixctl_show_cb(), which calls
netdev_delete_queue(), which calls
netdev_linux_delete_queue().
Both netdev_dump_queues() and netdev_linux_delete_queue() take the same
mutex in the same netdev, which deadlocks.
This commit fixes the problem by getting rid of the callback.
netdev_linux_dump_queue_stats() would benefit from the same treatment but
it's less urgent because I don't see any callbacks from that function that
call back into a netdev function.
Bug #19319.
Reported-by: Scott Hendricks <shendricks@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new boolean option "forwarding_if_rx" to bfd.
When forwarding_if_rx is true, the interface is considered capable
of packet I/O as long as packets are being received on it. This
matters because consecutive BFD control packets can be lost when a
link becomes temporarily congested. With forwarding_if_rx, receipt
of non-control packets on the interface prevents link failover in
that case.
Signed-off-by: Alex Wang <alexw@nicira.com>
Implement a per-exporter flow cache with active timeout expiration.
Add columns "cache_active_timeout" and "cache_max_flows" into table
"IPFIX" to configure each cache.
Add per-flow elements "octetDeltaSumOfSquares",
"minimumIpTotalLength", and "maximumIpTotalLength" to replace
"ethernetTotalLength". Add per-flow element "flowEndReason" to
indicate whether a flow has expired because of an active timeout, the
cache size limit being reached, or the exporter being stopped.
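A rough sketch of how one cache entry could accumulate these per-flow
elements as packets arrive. The struct layout and function name are
hypothetical, and timeout handling and export are left out.

```c
#include <stdint.h>

/* Hypothetical per-flow cache entry holding the aggregated elements. */
struct flow_cache_entry_sketch {
    uint64_t packet_delta_count;
    uint64_t octet_delta_count;
    uint64_t octet_delta_sum_of_squares;
    uint64_t minimum_ip_total_length;
    uint64_t maximum_ip_total_length;
};

/* Folds one packet with the given IP total length into 'e'.  'e' must be
 * zero-initialized before the first packet. */
void
flow_cache_entry_add(struct flow_cache_entry_sketch *e,
                     uint64_t ip_total_length)
{
    if (!e->packet_delta_count
        || ip_total_length < e->minimum_ip_total_length) {
        e->minimum_ip_total_length = ip_total_length;
    }
    if (ip_total_length > e->maximum_ip_total_length) {
        e->maximum_ip_total_length = ip_total_length;
    }
    e->packet_delta_count++;
    e->octet_delta_count += ip_total_length;
    e->octet_delta_sum_of_squares += ip_total_length * ip_total_length;
}
```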
Signed-off-by: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When there is no incoming data traffic at the interface for a period,
BFD decay allows the bfd session to increase the min_rx. This helps
because some interfaces may be idle for long periods, and CPU
consumption can be reduced by processing fewer bfd control packets.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
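The behavior this relies on can be seen in a small sketch using plain POSIX
calls. The function name is hypothetical. With an error-checking mutex, a
second lock attempt from the owning thread returns an error (EDEADLK)
instead of hanging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>

/* Locks an error-checking mutex twice from the same thread and returns
 * the error code from the second attempt. */
int
relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int error;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);
    error = pthread_mutex_lock(&mutex);     /* Self-deadlock attempt. */

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return error;
}
```

A caller that treats any nonzero return from a lock operation as fatal would
then abort at the point of the bug rather than hang.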
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit adds a new column "n-handler-threads" to the Open_vSwitch
table. This is used to set the number of upcall handler threads created by
the ofproto-dpif-upcall module.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit changes the code such that arguments to thread-safety
macros are not ampersanded.
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The current situation is that whenever any packet enters
userspace, bfd_should_process_flow() looks at the UDP destination
port to figure out whether it is a BFD packet. This means that
the UDP destination port cannot be wildcarded for any other flow
either.
To optimize BFD for megaflows, we introduce a new
'bfd:bfd_dst_mac' field in the database. Whenever this field is set
by a controller, it is assumed that all the BFD packets to/from
this interface will have the destination mac address set as the one
specified in the bfd:bfd_dst_mac field. If this field is set, we
first look at the destination mac address of a packet and if it
does not match the mac address set in bfd:bfd_dst_mac, we do not
process that packet as bfd. If the field does match, we go ahead
and look at the UDP destination port too.
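The order of the checks can be sketched as follows. The structs and the
function name are hypothetical and much simpler than the real flow
representation. 3784 is the standard BFD control port.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BFD_DEST_PORT 3784      /* Standard BFD control port. */
#define MAC_LEN 6

/* Hypothetical, simplified view of the packet fields involved. */
struct pkt_sketch {
    uint8_t dl_dst[MAC_LEN];    /* Destination MAC address. */
    uint16_t udp_dst;           /* UDP destination port, host byte order. */
};

/* Hypothetical per-interface BFD configuration. */
struct bfd_cfg_sketch {
    bool dst_mac_set;           /* Is bfd:bfd_dst_mac configured? */
    uint8_t dst_mac[MAC_LEN];
};

/* Returns true if 'pkt' should be processed as BFD.  When a destination
 * MAC is configured and does not match, the UDP port is never examined,
 * so it can stay wildcarded for that flow. */
bool
bfd_should_process(const struct bfd_cfg_sketch *cfg,
                   const struct pkt_sketch *pkt)
{
    if (cfg->dst_mac_set && memcmp(cfg->dst_mac, pkt->dl_dst, MAC_LEN)) {
        return false;
    }
    return pkt->udp_dst == BFD_DEST_PORT;
}
```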
Also, change the default BFD destination mac address to
"00:23:20:00:00:01".
Feature #18850.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Until now, the async append interface has required async_append_enable()
to be called while the process was still single-threaded, with the
rationale being that async_append_enable() could race with
async_append_write() on some existing async_append object. This was a
difficult problem when the async append interface was introduced, because
at the time Open vSwitch did not have any infrastructure for inter-thread
synchronization.
Now it is easy to solve, by introducing synchronization into the
async append module. However, that's more or less wasted, because the
client is already required to serialize access to async append objects.
Moreover, vlog, the only existing client, needs to serialize access for
other reasons, so it wouldn't even be possible to just drop the client's
synchronization.
This commit therefore takes another approach. It drops the
async_append_enable() interface entirely. Now any existing async_append
object is always enabled. The responsibility for "enabling", then, now
rests in whether the client creates and uses an async_append object, and
so vlog now takes care of that by itself. Also, since vlog now has to
deal with sometimes having an async_append and sometimes not having one,
we might as well allow creating an async_append to fail, thereby slightly
simplifying the "no async I/O" implementation from "write synchronously"
to "always fail creating an async_append".
Reported-by: Shih-Hao Li <shihli@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Certain platforms like XenServer do not have the latest
python libraries that are needed by ovsdb-doc (which in turn
creates ovs-vswitchd.conf.db.5). When we run 'make dist' and
copy the tarball over to the XenServer DDK environment, we
already include ovs-vswitchd.conf.db.5. But the absence of
ovsdb-doc results in an attempt to regenerate ovs-vswitchd.conf.db.5,
and that fails because of the missing python libraries.
Instead of producing ovsdb-doc from ovsdb-doc.in dynamically, we
statically provide ovsdb-doc and pass on the version information
to it through the command line option --version.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds annotations for thread safety checking. The
check can be run by using the -Wthread-safety flag in clang.
Co-authored-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>