When executing actions, it's possible a recirculation will occur
causing dp_netdev_input() to be called multiple times. If the batch
pointers embedded in dp_netdev_flow aren't cleared, it's possible
packets after the recirculation will be reinserted into a batch
associated with the original lookup. This could be very bad.
This patch fixes the problem by zeroing out flow batch pointers before
calling packet_batch_execute(). This probably has a slightly negative
performance impact, though I haven't measured it.
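
As an illustration only (a toy Python model, not the dp_netdev code), the
fix can be sketched as below. Flow, process(), lookup() and execute() are
hypothetical names; 'batch' stands in for the batch pointer embedded in
dp_netdev_flow, and execute() re-enters process() to imitate a
recirculation.

```python
class Flow:
    """Stand-in for dp_netdev_flow; 'batch' models the embedded batch pointer."""
    def __init__(self, name):
        self.name = name
        self.batch = None


def process(packets, lookup, execute):
    """Model of one input pass: group packets by flow, then execute batches."""
    batches = []
    for pkt in packets:
        flow = lookup(pkt)
        if flow.batch is None:
            flow.batch = {"flow": flow, "packets": []}
            batches.append(flow.batch)
        flow.batch["packets"].append(pkt)

    # Zero the per-flow pointers before executing anything, so a nested
    # call triggered by recirculation cannot append to these batches.
    for batch in batches:
        batch["flow"].batch = None

    for batch in batches:
        execute(batch)


flow_a = Flow("a")
executed = []


def lookup(pkt):
    return flow_a  # every packet hits the same flow in this toy example


def execute(batch):
    executed.append(list(batch["packets"]))
    fresh = [p for p in batch["packets"] if not p.startswith("recirc:")]
    if fresh:
        # Recirculation: the packets re-enter the input path.
        process(["recirc:" + p for p in fresh], lookup, execute)


process(["p1", "p2"], lookup, execute)
```

In this model, skipping the clearing loop would make the nested call
append the recirculated packets to the batch that is already being
executed, so they would never be executed themselves.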
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This patch simplifies Rx/Tx NIC configuration by removing
custom values and using the defaults provided by the DPDK
PMDs. This also enables Rx vectorisation which improves
performance.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Prior to this commit, the number of possible entries in the Exact
Match Cache stood at 1024 per thread, equating to 0.18 MB. A typical
server system will have 2.5 MB of cache per core, meaning a larger EMC
will comfortably fit. This patch increases the number of entries to 8192
per thread (1.4 MB), which in turn yields improved throughput when
processing multiple flows of traffic.
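
The sizes above can be cross-checked with a little arithmetic. This is a
rough sketch that assumes 1 MB = 2^20 bytes and that the cache size
scales linearly with the entry count; the per-entry size is derived from
the numbers in this message, not read from the code.

```python
MIB = 1024 * 1024

old_entries = 1024
old_size_mb = 0.18          # figure quoted above
cache_per_core_mb = 2.5     # figure quoted above

# Derived estimate, roughly 184 bytes per entry.
bytes_per_entry = old_size_mb * MIB / old_entries

new_entries = 8192
new_size_mb = new_entries * bytes_per_entry / MIB   # about 1.44

fits_in_cache = new_size_mb < cache_per_core_mb
```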
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The documentation says it is required to use bpf ioctls on both
NetBSD and FreeBSD. It causes a compile time failure on FreeBSD 10.
Signed-off-by: Dan McGregor <dan.mcgregor@usask.ca>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The MAX_PKT_BURST and NETDEV_MAX_RX_BATCH macros had a confusing
relationship. They basically purport to do the same thing, making it
unclear which is the source of truth.
Furthermore, while NETDEV_MAX_RX_BATCH was 256, MAX_PKT_BURST was 32,
meaning we never process a batch larger than 32 packets, further adding
to the confusion.
This patch resolves the issue by removing MAX_PKT_BURST completely,
and shrinking the new NETDEV_MAX_BURST macro to only 32. This should
have no change in the execution path except shrinking a couple of
structs and memory allocations (can't hurt).
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The output of the 'ovs-ofctl dump-flows' command prints recirc_id in
decimal in the action parts of the output, but in hex in the matching
parts of the same output.
This patch fixes the inconsistency by always printing recirc_id
values in decimal.
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When containers are running inside VMs and the OpenFlow flows
are added in the hypervisor, the physical to logical translation
(and vice versa) needs to handle the VLAN tags that the packet
comes with.
Signed-off-by: Gurucharan Shetty <shettyg@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Until now the exact match cache processing was able to handle only four
megaflows. The rest of the packets were passed to the megaflow
classifier.
The limit was arbitrarily set to four also because the algorithm used to
group packets in output batches didn't perform well with a lot of
megaflows.
After changing the algorithm and after some performance testing it seems
much better just to share the same output batches between the exact
match cache and the megaflow classifier.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The userspace datapath
1. receives a batch of packets.
2. finds a 'netdev_flow' (megaflow) for each packet.
3. groups the packets in output batches based on the 'netdev_flow'.
Until now the grouping (3) was done using a simple algorithm with an
O(N^2) runtime, where N is the number of distinct megaflows of the packets
in the incoming batch. This could quickly become a bottleneck, even with
a small number of megaflows.
With this commit the datapath simply stores in the 'netdev_flow' (the
megaflow) a pointer to the output batch, if one has been created for the
current input batch. The pointer will be cleared when the output batch
is sent.
In a simple phy2phy test with 128 megaflows the throughput is more than
doubled.
The reason that stopped us from doing this change was that the
'netdev_flow' memory was shared between multiple threads: this is no
longer the case with the per-thread classifier.
Also, this commit reorders struct dp_netdev_flow to group together the
members used in the fastpath.
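
The difference between the two grouping strategies can be sketched in
Python as follows. This is a simplified model, not the datapath code:
Megaflow, group_linear_search() and group_cached_pointer() are
hypothetical names, packets are represented by their index in the input
batch, and 'flows' is the per-packet result of the lookup step.

```python
class Megaflow:
    """Stand-in for 'netdev_flow'; 'batch' models the stored batch pointer."""
    def __init__(self, name):
        self.name = name
        self.batch = None


def group_linear_search(flows):
    """Old approach: for each packet, scan the output batches created so far."""
    batches = []
    for i, flow in enumerate(flows):
        for batch_flow, pkts in batches:
            if batch_flow is flow:
                pkts.append(i)
                break
        else:
            batches.append((flow, [i]))
    return batches


def group_cached_pointer(flows):
    """New approach: each flow remembers its output batch for this input batch."""
    batches = []
    for i, flow in enumerate(flows):
        if flow.batch is None:
            flow.batch = (flow, [])
            batches.append(flow.batch)
        flow.batch[1].append(i)
    # Model "sending" the output batches: the pointers are cleared.
    for flow, _ in batches:
        flow.batch = None
    return batches


a, b = Megaflow("a"), Megaflow("b")
per_packet_flows = [a, b, a, a, b]
old_result = group_linear_search(per_packet_flows)
new_result = group_cached_pointer(per_packet_flows)
```

Both functions produce the same grouping; the second one does a constant
amount of work per packet instead of a scan over the existing batches.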
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Initializing a struct pkt_metadata for every packet can be surprisingly
expensive. It's much faster to keep a copy for each port and copy it
to each packet.
Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
DPDK buf_len is only 16 bits wide ('allocated' was 32 bits), but it should
be enough to store the number of allocated bytes.
This will reduce 'struct dp_packet' size.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
In 'struct ofpbuf' the 'frame' pointer was used to parse different kinds of
data (Ethernet, OpenFlow, Netlink attributes). For Ethernet packets the
'frame' pointer was supposed to have the same value as the 'data'
pointer.
Since 'struct dp_packet' is only used for Ethernet packets, there's no
need for a separate 'frame' pointer: we can use the 'data' pointer
instead.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The 'list' member is only used (two users) in the slow path.
This commit removes it to reduce the struct size.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Fix a memory leak that was introduced in commit 834fe5cb99 (ofproto:
Additional simplifications.). We used to unref the flow
asynchronously, but forgot to do it when the support for asynchronous
operations was removed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
CHECK /home/pravin/ovs/w8/datapath/linux/flow_table.c
/home/pravin/ovs/w8/datapath/linux/flow_table.c:536:6: warning: symbol
'ovs_flow_cmp_unmasked_key' was not declared. Should it be static?
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
When a desired flow is different than the installed flow,
we should update its actions based on the desired flow.
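
A minimal sketch of the intended reconciliation, assuming flows are
modeled as a mapping from a match string to an actions string.
plan_updates() and the "add"/"modify"/"delete" operation names are
hypothetical and only illustrate the rule stated above.

```python
def plan_updates(installed, desired):
    """Return the operations needed to make 'installed' equal 'desired'."""
    ops = []
    for match, actions in desired.items():
        if match not in installed:
            ops.append(("add", match, actions))
        elif installed[match] != actions:
            # Same match, different actions: take the desired actions.
            ops.append(("modify", match, actions))
    for match in installed:
        if match not in desired:
            ops.append(("delete", match, None))
    return ops


installed = {"in_port=1": "output:2", "in_port=3": "drop"}
desired = {"in_port=1": "output:5", "in_port=4": "output:1"}
ops = plan_updates(installed, desired)
```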
Signed-off-by: Gurucharan Shetty <shettyg@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The last merge from master broke ovs-sandbox OVN support. The rungdb
function now takes an additional argument for whether or not the
daemon should be automatically started under gdb.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The correct group ID to avoid requiring any particular output group when
removing a flow is OFPG_ANY. OFPG_ALL just caused the OFPFC_DELETE_STRICT
commands to be ignored because no OVN flows output to OFPG_ALL.
Before this patch, ofctrl wasn't deleting flows when logical ports were
deleted; this fixes the problem.
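
The effect of the out_group filter can be modeled roughly as follows.
The constant values are the ones defined by OpenFlow 1.1 and later;
delete_matches() is a hypothetical helper, not an OVS function, and a
flow is reduced to the set of groups it outputs to.

```python
OFPG_ALL = 0xfffffffc  # reserved group number: all groups
OFPG_ANY = 0xffffffff  # wildcard: no restriction on output group


def delete_matches(flow_out_groups, out_group):
    """True if a flow passes the out_group filter of a delete request."""
    if out_group == OFPG_ANY:
        return True
    return out_group in flow_out_groups


# A flow that does not output to any group, like the OVN flows above.
no_groups = set()
deleted_with_any = delete_matches(no_groups, OFPG_ANY)
deleted_with_all = delete_matches(no_groups, OFPG_ALL)
```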
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
Additional configuration is required if you want to run ovs-vswitchd
with DPDK backend inside a QEMU virtual machine. This happens because,
by default, the virtio NIC provided to the guest doesn't support multiple
TX queues, which are required by ovs-vswitchd/dpdk. This commit updates
INSTALL.DPDK.md to provide guidelines on how to enable support for
multiple TX queues using QEMU command line and Libvirt config file.
Signed-off-by: Oleg Strikov <oleg.strikov@canonical.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The max allowed burst size for a single vhost enqueue is 32.
This code allows sending more packets than the burst size
to the vhost interface by adding a retry loop
and calling vhost enqueue multiple times. As this could
potentially block, a timeout is added.
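
The retry loop can be sketched in Python as below. This is a simplified
model under stated assumptions, not the netdev-dpdk code: send_all(),
enqueue_burst and the timeout handling (give up once no packet has been
accepted for longer than timeout_s) are hypothetical, and
fake_enqueue() stands in for the vhost enqueue call.

```python
import time

MAX_BURST = 32  # largest burst accepted by one enqueue call, per the text above


def send_all(packets, enqueue_burst, timeout_s=0.001, clock=time.monotonic):
    """Send 'packets' in bursts; return how many were accepted."""
    sent = 0
    stalled_since = None
    while sent < len(packets):
        accepted = enqueue_burst(packets[sent:sent + MAX_BURST])
        if accepted > 0:
            sent += accepted
            stalled_since = None
            continue
        # Nothing was accepted: start or check the stall timer.
        now = clock()
        if stalled_since is None:
            stalled_since = now
        elif now - stalled_since > timeout_s:
            break  # give up; the remaining packets are not sent
    return sent


ring = []
burst_sizes = []


def fake_enqueue(burst):
    """Toy receiver that accepts every packet it is offered."""
    burst_sizes.append(len(burst))
    ring.extend(burst)
    return len(burst)


sent = send_all(list(range(100)), fake_enqueue)
```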
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Change phy rx burst size from 192 to 32. This aligns the
burst size with the other dpdk interfaces and significantly
improves performance when forwarding to dpdk vhost ports.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
When doing OVS performance testing, it's important to have both
realistic traffic traces and OpenFlow pipelines on which to evaluate
prospective changes. As a first step in this direction, this patch
adds a python script which generates an OpenFlow pipeline intended to
simulate typical network virtualization workloads.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This is necessary to allow it to work.
This bug was introduced during the review process.
Reported-by: Gurucharan Shetty <shettyg@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
OpenFlow 1.3 says:
If a switch cannot add the incoming group entry due to restrictions
(hardware or otherwise) limiting the number of group buckets, it must
refuse to add the group entry and must send an ofp_error_msg with
OFPET_GROUP_MOD_FAILED type and OFPGMFC_OUT_OF_BUCKETS code.
This indicates that OFPGMFC_OUT_OF_BUCKETS is appropriate for an indirect
group with the wrong number of buckets, but OVS was using a different
error. This fixes the problem.
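
As a sketch of the check described above: an indirect group is defined
by OpenFlow to have exactly one bucket. GroupModError and
check_group_buckets() are hypothetical names used for illustration, and
the error is represented by its code name only.

```python
class GroupModError(Exception):
    """Models an ofp_error_msg of type OFPET_GROUP_MOD_FAILED."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code


def check_group_buckets(group_type, buckets):
    """Reject an indirect group that does not have exactly one bucket."""
    if group_type == "indirect" and len(buckets) != 1:
        raise GroupModError("OFPGMFC_OUT_OF_BUCKETS")


try:
    check_group_buckets("indirect", ["bucket1", "bucket2"])
    error_code = None
except GroupModError as e:
    error_code = e.code
```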
ONF-JIRA: EXT-546
Reported-by: Mrinmoy Das <mrdas@ixiacom.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
OVS defines skb_gso_segment() to handle MPLS and VLAN segmentation
correctly. But OVS also uses __skb_gso_segment() in some cases. This
patch defines a compat __skb_gso_segment() to handle all segmentation
cases.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The convention in OVSDB is to use singular names for database tables,
but Bindings was plural.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
When dpctl commands are used to inspect a userspace datapath, but OVS
also has built-in support for the kernel datapath, an error message is
reported if the kernel module is not loaded. This commit suppresses the
message.
Suggested-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit introduces dps_for_each() which calls a callback for each
datapath of each registered type.
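
The iteration pattern is roughly the following. This is a Python model
only: the registry layout, the callback signature, and the example
datapath names are assumptions made for illustration and do not reflect
the C interface added by this commit.

```python
def dps_for_each(registry, callback):
    """Call callback(dp_type, dp_name) for each datapath of each type."""
    for dp_type, dp_names in registry.items():
        for dp_name in dp_names:
            callback(dp_type, dp_name)


# Hypothetical registry: datapath type -> names of datapaths of that type.
registry = {"system": ["ovs-system"], "netdev": ["ovs-netdev", "dp2"]}

seen = []
dps_for_each(registry, lambda dp_type, dp_name: seen.append((dp_type, dp_name)))
```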
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Clang-3.7 generates warnings such as the following:
../lib/ovs-lldp.c:394:19: error: address of array 'hardware->h_ifname'
will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
This value is fetched from a netdev, which as far as I can tell must
always have a non-NULL name. Simplify this code.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Dennis Flynn <drflynn@avaya.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Fixes passing variable data as a printf() format string.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Unlike system interfaces, DPDK-enabled interfaces must have their interface
type explicitly set when used to create ports. Mention this in relevant parts
of the documentation and add references to INSTALL.DPDK.md, where there are many
examples.
Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
GCC 4.6.1 complained about the match structure not being properly
initialized when using the MATCH_CATCHALL_INITIALIZER macro.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
ovn-controller updates the chassis column of the Bindings table in
OVN_Southbound when a logical port appears on the local switch. A
logical port that has a parent will never appear on a switch managed
by ovn-controller. When a parent port appears, all child container
ports should be updated as being on that chassis, as well.
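
The binding rule can be sketched as follows. This is a simplified
model, not the ovn-controller code: claim_ports() is a hypothetical
helper, and each Bindings row is modeled as a dict whose keys
(logical_port, parent_port, chassis) are assumed names for the relevant
columns.

```python
def claim_ports(bindings, local_ports, chassis):
    """Set 'chassis' on rows whose port, or whose parent port, is local."""
    for row in bindings:
        is_local = row["logical_port"] in local_ports
        parent_is_local = row.get("parent_port") in local_ports
        if is_local or parent_is_local:
            row["chassis"] = chassis


bindings = [
    {"logical_port": "vm1", "parent_port": None, "chassis": None},
    {"logical_port": "ctr1", "parent_port": "vm1", "chassis": None},
    {"logical_port": "vm2", "parent_port": None, "chassis": None},
]
# Only the parent port 'vm1' appears on the local switch.
claim_ports(bindings, local_ports={"vm1"}, chassis="hv1")
```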
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
This last piece allows us to start testing and debugging a complete OVN
installation. A previous version of this patch was tested in a VM
environment, but this exact version has not been.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
This implementation is really simple, but it seems effective enough in my
minimal testing.
We still need code to generate flows for logical-to-physical and
physical-to-logical translation. With that, plus code to set up tunnels,
we should be able to start end-to-end testing.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>