Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.
This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Some point-to-point devices like TUN devices will not have an address, and while
iterating over ifaddrs, its ifa_addr will be NULL. This patch fixes a crash when
starting ovs-vswitchd on a system with such a device.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Fixes: a8704b502785 ("tunneling: Handle multiple ip address for given device.")
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This attempts to prevent namespace collisions with other list libraries
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
All code is now in include/openvswitch/list.h.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since netdev can have multiple IP address use
generic api netdev_get_addr_list(). This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Made to simplify creation of derived classes.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The comment was incomplete in some ways and simply wrong in others.
Also ensure that *cnt is set to 0 if an error is encountered. It's nice
when callers can rely on this.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
This command can be used to check the port/rxq assignment to
pmd threads. For each pmd thread of the datapath shows list
of queue-ids with port names.
Additionally log message from pmd_thread_main() extended with
queue-id, and type of this message changed from INFO to DBG.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This manifested as a memory leak in test 898 "ofproto-dpif - sFlow packet
sampling - tunnel set", which included an output to a tunnel vport that
doesn't have an implementation of netdev_send().
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-February/065873.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.
Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.
Example:
ovs-vsctl set Interface dpdk0 options:n_rxq=8
Old 'other_config:n-dpdk-rxqs' deleted.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
The addition of Geneve options to packet metadata significantly
expanded its size. It was reported that this can decrease performance
for DPDK ports by up to 25% since we need to initialize the whole
structure on each packet receive.
It is not really necessary to zero out the entire structure because
miniflow_extract() only copies the tunnel metadata when particular
fields indicate that it is valid. Therefore, as long as we zero out
these fields when the metadata is initialized and ensure that the
rest of the structure is correctly set in the presence of a tunnel,
we can avoid touching the tunnel fields on packet reception.
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Otherwise, if netdev_unregister_provider() is called before any other
netdev function, netdev_class_mutex is not initialized and the attempt to
lock it aborts.
This doesn't fix an existing bug but with the following commit
--enable-dummy=system will make netdev_unregister_provider() the first
netdev function to be called.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser.
A new dpdkvhostuser port will create a unix domain socket which
when provided to QEMU is used to facilitate communication between
the virtio-net device on the VM and the OVS port on the host.
vhost-cuse ('dpdkvhost') ports are still available as 'dpdkvhostcuse'
ports and will be enabled if vhost-cuse support is detected in the
DPDK build specified during compilation of the switch. Otherwise,
vhost-user ports are enabled.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This commit changes the semantics of 'netdev_set_multiq()' to allow OVS
DPDK to run on device with limited multi queue support.
* If a netdev doesn't have the requested number of rxqs it can simply
inform the datapath without failing.
* If a netdev doesn't have the requested number of txqs it should try
to create as many as possible and use locking.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
If we receive a packet with an invalid tunnel header, we
should drop the packet without further processing. Currently
we do this by removing any parsed tunnel metadata. However,
this is not sufficient to stop processing - this only results
in the packet getting dropped by chance when something
usually runs across part of the packet that does not make
sense. Since both the packet and its metadata are in an
inconsistent state, it's also possible that the result is
an ovs-vswitchd crash or forwarding of a mangled packet.
Rather than clear the metadata, an alternate solution is to
remove all of the packet data. This guarantees that the
packet gets dropped during the next round of processing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The userspace tunneling API for pushing and popping tunnel headers
is currently based on processing batches of packets. However, there
is no obvious way to take advantage of batching for these operations
and so each tunnel operation has a pair of loops to process the
batch. This changes the API to operate on single packets to enable
better code reuse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
e.g. Set tunnel id for encapsulated VxLAN packet (out_key=flow):
ovs-vsctl add-port int-br vxlan0 -- set interface vxlan0 \
type=vxlan options:remote_ip=172.168.1.2 options:out_key=flow
ovs-ofctl add-flow int-br in_port=LOCAL, icmp,\
actions=set_tunnel:3, output:1 (1 is the port# of vxlan0)
Output tunnel ID should be modified to 3 with this patch.
Signed-off-by: Ricky Li <ricky.li@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This patch adds support for a new port type to userspace datapath
called dpdkvhost. This allows KVM (QEMU) to offload the servicing
of virtio-net devices to its associated dpdkvhost port. Instructions
for use are in INSTALL.DPDK.
This has been tested on Intel multi-core platforms and with clients
that have virtio-net interfaces.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
dp_packet is short and better name for datapath packet
structure.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
struct list is a common name and can't be used in public headers.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OVS router depends on tnl_conf_seq and all tunnel related
components should be initialized before registering dpif
implementations.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.
Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
In this patch, we add a lib/netdev-windows.c which mostly contains stub
code and in subsequent patches, would use the netlink interface to query
netdev information for a vport.
The code implements netdev functionality for "internal" and "system"
types of vports.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Daniel Badea <daniel.badea@windriver.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to configure the number of tx queues and rx queues
of 'netdev'. Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
There are couple of reasons to remove this support:
* This is used in very old OVS use-case. It is much better
to read stats directly from OVS.
* Forthcoming commit will remove support for setting stats
for vport. The stats update depends on stats-set.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds new variable n_txq to 'struct netdev' for recording
the number of tx queues. Correspondingly, the send_*() functions are
extended to accept queue id as input argument.
All 'netdev-*' implementation will ignore the queue id since having
multiple tx queues is not supported. Upcomping patches will start
using it and create multiple tx queues for dpdk netdev.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to query the numa node id the 'netdev' is on.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
All access to struct netdev_registered_class ref_cnt member was done
with netdev_class_mutex held, so it does not need to be an atomic
variable.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Shared memory ring patch
This patch enables the client dpdk rings within the netdev-dpdk. It adds
a new dpdk device called dpdkr (other naming suggestions?). This allows
for the use of shared memory to communicate with other dpdk applications,
on the host or within a virtual machine. Instructions for use are in
INSTALL.DPDK.
This has been tested on Intel multi-core platforms and with the client
application within the host.
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The netdev_send function has been modified to accept multiple packets, to
allow netdev providers to amortize locking and queuing costs.
This is especially true for netdev-dpdk.
Later commits exploit the new API.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
In most cases, tunnel ports specify a dpif name to act as the backing
port in the datapath. However, in the case of UDP tunnels the type is
used with the port number appended. This is potentially a problem for
IPsec tunnels because they have different types but should have the
same backing port. The hasn't been a problem in practice though because
no UDP tunnels are currently used with IPsec.
This switches to use the dpif_port in all cases plus a port number if
necessary. It does this by making the names short enough to accomodate
ports, which also makes the naming more consistent.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Store the error condition of a failed port configuration in a new
column 'error' in the Interface table.
Example:
$ ovs-vsctl add-port br0 test -- \
set Interface test type=vxlan options:unknown=1
ovs-vsctl: Error detected while setting up 'test'. [...]
$ ovs-vsctl list Interface test | grep error
error : "test: could not set configuration (Invalid argument)"
Fixing the error will clear the error column:
$ ovs-vsctl set Interface test options:remote_ip=1.1.1.1
$ ovs-vsctl list Interface test | grep error
error : []
$
For now, the high level error messages when opening and configuring
the netdev are used. Further patches can extend passing the error
pointer into the individual netdev implementations to allow for more
fine grained error messages to be stored.
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When the user changes port type (i.e. changing p0 from type 'internal' to
'gre'), the netdev must first be deleted, then re-created with the new type.
Deleting the netdev requires there exist no more references to the netdev.
However, the xlate cache holds references to netdevs and the cache is only
invalidated by revalidator threads. Thus, if cache is not invalidated prior to
the netdev being re-created, the netdev will not be able to be re-created and
the configuration change will fail.
This patch always removes the netdev from the global netdev shash when the
user changes port type. This ensures that the new netdev can always be created
while handler and revalidator threads can retain references to the old netdev
until they are finished.
Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
netdev_open() would previously increment a netdev's refcount without
holding a lock for it. This commit shifts the locking to protect it.
Found by inspection.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Ben Pfaff <blp@nicira.com>
netdev_rxq_open() open-codes much of netdev_ref(), so re-use that
function instead.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 3e912ffcbb (netdev: Add 'change_seq' back to netdev.) added per-
netdev change number for indicating status change. Future commits used
this change number to optimize the netdev status update to database.
However, the work also introduced the bug in the following scenario:
- assume interface eth0 has address 1.2.3.4, eth1 has adddress 10.0.0.1.
- assume tunnel port p1 is set with remote_ip=10.0.0.5.
- after setup, 'ovs-vsctl list interface p1 status' should show the
'tunnel_egress_iface="eth1"'.
- now if the address of eth1 is change to 0 via 'ifconfig eth1 0'.
- expectedly, after change, 'ovs-vsctl list interface p1 status' should
show the 'tunnel_egress_iface="eth0"'
However, 'tunnel_egress_iface' will not be updated on current master.
This is in that, the 'netdev-vport' module corresponding to p1 does
not react to routing related changes.
To fix the bug, this commit adds a change sequence number in the route-
table module and makes netdev-vport check the sequence number for
tunnel status update.
Bug #1240626
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 05bf6d3c62e1d (ovs-thread: Add checking for mutex and
rwlock initialization.) helps find an use of uninitialized
mutex (netdev_class_mutex) during upgrade. The assertion
check aborts the ovs.
This commit fixes the issue by adding the proper initialization.
Bug #1239914.
Bug #1240598.
Bug #1240626.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This code path currently does not initialize
netdev_class_mutex.
dummy_enable
->netdev_dummy_register
->netdev_register_provider
->ovs_mutex_lock(&netdev_class_mutex)
ovsdb-server on windows crashes without it.
This commit adds a new initialization function.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit can be seen as a partial revert of commit
da4a619179d (netdev: Globally track port status changes)
by adding the 'change_seq' to 'struct netdev'.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Following patch adds DPDK netdev-class to userspace datapath. Now
OVS can use DPDK port for IO by just configuring DPDK port and then
adding dpdk type port to userspace datapath.
Refer to INSTALL.DPDK doc for further info.
This is based a patch from Gerald Rogers.
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
new netdev type like DPDK can support multi-queue IO. Following
patch Adds support for same.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Preparation for multi queue netdev IO. There are no functional changes
in this patch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>