As of now, we are using the process subsystem in
ovsdb-server to handle the "--run" command line
option. That particular option is not used often
and till deemed necessary, make it unsupported on
Windows platform.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 96ed775f resizes all userspace metadata to be 8 bytes minimum.
Fix the upcall size checks accordingly.
Signed-off-by: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
There are two different MPLS ethertypes, 0x8847 and 0x8848 and a push MPLS
action applied to an MPLS packet may cause the ethertype to change from one
to the other. To ensure that this happens update the ethertype in
push_mpls() regardless of if the packet is already MPLS or not.
Test based on a similar test by Joe Stringer.
Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
If ovs-vsctl has to wait for ovs-vswitchd to reconfigure itself
according to the new database, then sometimes ovs-vsctl could
end up stuck in the event loop if OVSDB connection was dropped
while ovs-vsctl was still running.
This patch fixes this problem by letting ovs-vsctl to reconnect
to the OVSDB, if it has to wait cur_cfg field to be updated.
Issue: 1191997
Reported-by: Spiro Kourtessis <spiro@nicira.com>
Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Make set_ethertype() static as it is not used outside of packet.c
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This configuration item was introduced to assist testing of upcall
handling behaviour with and without facets. Facets were removed in
commit e79a6c833e, so this patch removes the configuration item.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Correct pop MPLS ethtype consistency check to verify that
the packet has an MPLS ethtype before the pop action rather than after:
an MPLS ethtype is a pre-condition but not a post-condition of pop MPLS.
With this change the consistency check in ofpact_check__()
becomes consistent with that in ofpact_from_nxast().
This was found using Ryu tests via the new make check-ryu target.
It allows all of the "POP_MPLS"[1] tests to pass where they previously
failed.
[1] http://osrg.github.io/ryu-certification/switch/ovs
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tweak our configuration to match with Ryu tests' single-bridge assumption.
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Simon Horman <horms@verge.net.au>
OpenFlow 1.3.4 is going to revert the tag order change made by OpenFlow
1.3.
MPLS BoS matching is implemented.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Keep kernel flow stats for each NUMA node rather than each (logical)
CPU. This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead otherwise on the top of perf reports
and allows OVS to scale better with higher number of threads.
With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master. Tested with
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace. The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows. This type of flows can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in OVS userspace flow classifier.
Perf results for this test (master):
Events: 305K cycles
+ 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
+ 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.03% swapper [kernel.kallsyms] [k] intel_idle
+ 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
+ 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
+ 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
+ 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
...
And after this patch:
Events: 356K cycles
+ 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
+ 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 1.47% swapper [kernel.kallsyms] [k] intel_idle
+ 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
+ 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
+ 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
+ 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
+ 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
+ 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
...
There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).
On flow setup, a single stats instance is allocated (for the NUMA node
0). As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated. This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency. If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated. This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch. Remove it first to make the changes easier to
grasp.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This fixes crash when userspace does "ovs-dpctl add-dp dev" where dev is
existing non-dp netdevice.
Introduced by:
commit 94358dcffb
"openvswitch: Drop user features if old user space attempted to create datapath"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jesse Gross <jesse@nicira.com>
The Ryu controller comes with an extensive library of OpenFlow tests, but
it doesn't seem so easy to me to run all of them against a development
version of Open vSwitch. This commit introduces a Makefile target so that
one can run all the Ryu tests with a simple "make check-ryu".
This commit adds documentation for the new target to INSTALL. It also
moves the documentation for the "check-oftest" target and the
--enable-coverage configure option into INSTALL.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
This lets us call ovs_lasterror_to_string() and not having
to do an extra call of LocalFree() on the returned string.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Start with a simple implementation that does not support
symbolic links.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Start with not supporting symbolic links for windows.
This is useful for an upcoming commit.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Stats critical sections are held for very short time, so it makes
sense to spin on them for a while rather than sleeping immediately.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ovs_vport_cmd_dump() did rcu_read_lock() only after getting the
datapath, which could have been deleted in between. Resolved by
taking rcu_read_lock() before the get_dp() call.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Rule's 'used' timestamp is updated at the same time with the other
stats. So far the 'used' has been updated without proper protection,
which may lead to 'tearing' in 32-bit architectures, resulting in an
incorrect 'used' timestamp. While in practice this is highly
improbable, it is still better to handle this correctly.
This is resolved by moving the 'used' field to the provider's stats,
so that whatever protection is used for updating packet and byte
counts, can be also used for both reading and writing of the 'used'
timestamp.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When the target flow exists already as intended, only the 'modified'
time needs to be updated. Allow modification without taking the
'ofproto_mutex' by always using the 'rule->mutex' when accessing the
'modified' time.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reading the hmap count for determining if it is empty or not is thread
safe, so avoid locking when not necessary.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
netdev-dummy will mostly be used for testing and debugging over fairly
reliable connection. Reduce reconnect back off timeout in case the first
connect attempt failed.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
The netdev-dummy unit test ran into the reconnect condition on Jarno's
machine. With his test environment, we found and fixed the bugs in
handling reconnect.
Co-authored-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
There are 12 possible ways to specify a tunnel (2 * 2 * 3 == 12):
- Specific in_key or flow-based (2 choices).
- Specific ip_dst or flow-based (2 choices).
- Specific ip_src, wildcarded, or flow-based (3 choices).
Until now, only 6 of the 12 possibilities have been supported. We
have had a couple of requests to add another. This commit adds all the
possibilities, so that we won't have to add the other 6 one by one.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Requested-by: Thomas Morin <thomas.morin@orange.com>
Acked-by: pritesh <pritesh.kothari@cisco.com>
Before this commit, ovs randomly selects a slave for unassigned
bond entry. If the selected slave is not enabled, the active slave
is chosen instead. In this commit, the slave is selected from the
list of all enabled slaves in a round-robin fashion. This helps
improve the consistency of bond behavior when new flows are added.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The "userspace" MPLS test case was checking the same things as the
"drop" test case, rather than checking to see that packets were being
sent to userspace. This patch makes the testsuite consistent with itself.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OpenFlow 1.3.3 spec (and earlier) specify that the default value for an
MPLS label should be copied from the outer header.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit creates events and through poll_fd_wait_event()
associates them with socket file descriptors to get woken up
from poll_block().
Some other changes:
* Windows does not have sys/fcntl.h but has a fcntl.h
On Linux, there is fctnl.h too.
* include <openssl/applink.c> to handle different C-Runtime linking
of OVS and openssl libraries as suggested at
https://www.openssl.org/support/faq.html#PROG2
The above include will not be needed if we compile Open vSwitch with
/MD compiler option.
* SHUT_RDWR is equivalent to SD_BOTH on Windows.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
For winsock2 functions, error number has to be converted to string
using FormatMessage().
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This is helpful if we want to wait either on 'fd' for POSIX or
events for Windows.
For Windows, if both 'fd' and 'wevent' is specified, we associate
that event with the 'fd' using WSAEventSelect() in poll_block().
So any 'events' on that 'fd' will wake us up from WaitForMultipleObjects().
WSAEventSelect() does not understand POLLIN, POLLOUT etc. Instead the
macros it understands are FD_READ, FD_ACCEPT, FD_CONNECT, FD_CLOSE etc.
So we need to make that transition.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
ICMPv4 and ICMPv6 have 8-bit "type" and "code" fields. struct flow
uses the low 8 bits of the 16-bit tp_src and tp_dst members to
represent these fields. The datapath interface, on the other hand,
represents them with just 8 bits each. This means that if the high 8
bits of the masks for these fields somehow become set (meaning to
match on the nonexistent "high bits" of these fields) during
translation, then they will get chopped off by a round trip through
the datapath, and revalidation will spot that as an inconsistency and
delete the flow. This commit avoids the problem by making sure that
only the low 8 bits of either field can be unwildcarded for ICMP.
This seems like the minimal fix for this problem, appropriate for
backporting to earlier branches. The root of the issue is that these high
bits can get set in the match at all. I have some leads on that, but they
require more invasive changes elsewhere.
Bug #23320.
Reported-by: Krishna Miriyala <miriyalak@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Until now, the tcp_stream() code has ignored any TCP packets that don't
correspond to a stream that began with a TCP SYN. This commit changes the
code so that any TCP packet that contains a payload starts a new stream.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Vasu Dasari <vdasari@gmail.com>
This should assist testing of datapath performance, as it allows us to
skip "warming up" the flow limit value.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Only the first IP fragment can have a TCP header, check for this.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
This would cause testsuite failures if someone runs the testsuite
without strace installed.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
netdev-dummy is mostly useed over reliable link for testing. Periodic
probing over those links seem overkill.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>