If netdev flow offloading is enabled, flush all
added ports using netdev flow api.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
To use netdev flow offloading api, dpifs needs to iterate over
added ports. This addition inserts the added dpif ports in a hash map,
The map will also be used to translate dpif ports to netdevs.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add a new configuration tc-policy option that controls tc
flower flag. Possible options are none, skip_sw, skip_hw.
The default is none which is to insert the rule both to sw and hw.
This option is only relevant if hw-offload is enabled.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add a new configuration option - hw-offload that enables netdev
flow api. Enabling this option will allow offloading flows
using netdev implementation instead of the kernel datapath.
This configuration option defaults to false - disabled.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add a new API interface for offloading dpif flows to netdev.
The API consist on the following:
flow_put - offload a new flow
flow_get - query an offloaded flow
flow_del - delete an offloaded flow
flow_flush - flush all offloaded flows
flow_dump_* - dump all offloaded flows
In upcoming commits we will introduce an implementation of this
API for netdev-linux.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
When trying to configure a system port as type=internal it could start
an infinite port creation loop. When this happens you will see the
following log messages:
2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2
2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2
2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3
2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3
2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4
...
...
This is how to replicate it:
ip link add name ve01_1 type veth peer name ve01_2
ovs-vsctl add-br bzb
ovs-vsctl add-port bzb ve01_1
ovs-vsctl set interface ve01_1 type=internal
ip link set dev ve01_1 up
ip link set dev ve01_2 up
When changing the type to internal, the async configuration logic get
triggered and because the type has changed it will delete the
interface and the ofproto port. Next it will call iface_do_create() to
re-create the interface as internal. Because we just deleted the
interface netdev_open() will try to recreate it as internal.
However this will fail with EEXIST as a system interface already
exists withe the name.
Up till here all is fine...
Now some ipv6 route change comes along for the ve01_1 interface, and
the route infrastructure will call netdev_open(). This will create the
interface of type system.
Next the configuration verify process gets triggered due to
if_notifier_changed() being true. We now retry the above, but because
the interface exists (although in the system class) it will use it,
and create the interface successfully.
This triggers another if notification, causing yet another config
update, and because the system != internal reconfiguration happens and
it start from the top...
So the fix as presented below is causing netdev_open() only to return
the existing device for the class type requested (if the type is
specified).
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If the device name uses a vport prefix, then use that vport type.
Since these names are reserved, we can assume this is the right type.
This is important when we are querying the datapath right after vswitch has
started and using the right type will be even more important when we add support
to creating tunnel ports with rtnetlink.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
The empty string is not a valid name for a network device. I would have
expected that each of the netdev provider implementations would reject an
empty string, but there was a special case for Linux tap devices where they
instead caused unexpected behavior. This commit should fix the problem for
those devices and every other kind.
Reported-by: Gabor Locsei <gabor.locsei@ericsson.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-February/043613.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Andy Zhou <azhou@ovn.org>
One common use case of 'struct dp_packet_batch' is to process all
packets in the batch in order. Add an iterator for this use case
to simplify the logic of calling sites,
Another common use case is to drop packets in the batch, by reading
all packets, but writing back pointers of fewer packets. Add macros
to support this use case.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Since commit 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
we don't call rte_eth_start() from netdev_open() anymore, we only call
it from netdev_reconfigure(). This commit does that also for 'dpdkr'
devices, and remove some useless code.
Calling rte_eth_start() also from netdev_open() was unnecessary and
wasteful. Not doing it reduces code duplication and makes adding a port
faster (~900ms before the patch, ~400ms after).
Another reason why this is useful is that some DPDK driver might have
problems with reconfiguration. For example, until DPDK commit
8618d19b52b1("net/vmxnet3: reallocate shared memzone on re-config"),
vmxnet3 didn't support being restarted with a different number of
queues.
Technically, the netdev interface changed because before opening rxqs or
calling netdev_send() the user must check if reconfiguration is
required. This patch also documents that, even though no change to the
userspace datapath (the only user) is required.
Lastly, this patch makes sure the errors returned by ofproto_port_add
(which includes the first port reconfiguration) are reported back to the
database.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Tunnel devices have 0 txqs and don't support netdev_send(). While
netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed
on output, and that might be confused by devices with no txqs.
It seems better to have different structures in the fast path for ports
that support netdev_{push,pop}_header (tunnel devices), and ports that
support netdev_send. With this we can also remove a branch in
netdev_send().
This is also necessary for a future commit, which starts DPDK devices
without txqs.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Since 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
set_config() is used to identify a DPDK device, so it's better to report
its detailed error message to the user. Tunnel devices and patch ports
rely a lot on set_config() as well.
This commit adds a param to set_config() that can be used to return
an error message and makes use of that in netdev-dpdk and netdev-vport.
Before this patch:
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
After this patch:
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing
'options:dpdk-devargs'. The old 'dpdk<port_id>' names are not
supported. See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires
valid 'peer' argument. See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type
requires valid 'remote_ip' argument. See ovs-vswitchd log for
details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Add Rx checksum offloading feature support on DPDK physical ports. By default,
the Rx checksum offloading is enabled if NIC supports. However,
the checksum offloading can be turned OFF either while adding a new DPDK
physical port to OVS or at runtime.
The rx checksum offloading can be turned off by setting the parameter to
'false'. For eg: To disable the rx checksum offloading when adding a port,
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
OR (to disable at run time after port is being added to OVS)
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
Similarly to turn ON rx checksum offloading at run time,
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
The Tx checksum offloading support is not implemented due to the following
reasons.
1) Checksum offloading and vectorization are mutually exclusive in DPDK poll
mode driver. Vector packet processing is turned OFF when checksum offloading
is enabled which causes significant performance drop at Tx side.
2) Normally, OVS generates checksum for tunnel packets in software at the
'tunnel push' operation, where the tunnel headers are created. However
enabling Tx checksum offloading involves,
*) Mark every packets for tx checksum offloading at 'tunnel_push' and
recirculate.
*) At the time of xmit, validate the same flag and instruct the NIC to do the
checksum calculation. In case NIC doesnt support Tx checksum offloading,
the checksum calculation has to be done in software before sending out the
packets.
No significant performance improvement noticed with Tx checksum offloading
due to the e overhead of additional validations + non vector packet processing.
In some test scenarios, it introduces performance drop too.
Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling
decapsulation even though the SSE vector Rx function is disabled in DPDK poll
mode driver.
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
netdev_get_vports() previously counted the number of ports outside the
mutex, allocated enough memory for that number, then grabbed the mutex
to iterate through them and filled the array with the pointers.
This is logically wrong; in theory the number of ports could change
between allocating the memory and grabbing the mutex. In practice, only
the main thread manages these so there is no chance for a segfault. Fix
it up anyway.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Following warning was reported by Travis:-
lib/netdev.c:1916:19: error: cast from 'struct sockaddr *' to 'struct
sockaddr_in *' increases required alignment from 2 to 4
[-Werror,-Wcast-align]
sin = (struct sockaddr_in *) ifa->ifa_netmask;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/netdev.c:1924:20: error: cast from 'struct sockaddr *' to 'struct
sockaddr_in6 *' increases required alignment from 2 to 4
[-Werror,-Wcast-align]
sin6 = (struct sockaddr_in6 *) ifa->ifa_netmask;
Fixes: 3f31aded6 ("netdev: fix netmask in netdev_get_addrs").
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
When iterating on getifaddrs result, ifa_netmask is dereferenced, but it's
already a pointer to struct sockaddr. This would result in wrong masks being
used when comparing addresses while calculating the source address given a
destination address at the routing code.
For example, the mask ::ffff:116.85.0.0 would be used, causing 172.16.100.0/24
to match 172.16.101.1, though they should not match.
This will not happen when using a dummy netdev, as netdev_get_addrs is not used
by it.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
By default Open vSwitch tries to configure internal interfaces MTU to
match the bridge minimum, overriding any attempt by the user to
configure it through standard system tools, or the database.
While this works in many simple cases (there are probably many users
that rely on this) it may create problems for more advanced use cases
(like any overlay networks).
This commit allows the user to override the default behavior by
providing an explict MTU in the mtu_request column in the Interface
table.
This means that Open vSwitch will now treat differently database MTU
requests from standard system tools MTU requests (coming from `ip link`
or `ifconfig`), but this seems the best way to remain compatible with
old users while providing a more powerful interface.
Suggested-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Tested-by: Joe Stringer <joe@ovn.org>
This reverts commit d2fa6c676a13e86acc7f17261b2d87484f625d45.
When doing a restart, the routing table will open ports as system, which
prevents internal ports to be opened with the right type. That causes failures
in creating the ports.
We should revisit this patch after finding a proper fix on the routing table
layer.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass. Might as well drop the const
attribute from the parameter, since this is a "set" function.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Commit 64839cf43 applies batch objects to netdev-providers, but
some comments were not updated accordingly. Fix these:
- replace 'pkts' with 'batch'
- replace '*cnt' with 'batch->count'
- replace MAX_RX_BATCH with NETDEV_MAX_BURST
- remove superfluous whitespace
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When a device is already opened, netdev_open should verify that the types match,
or else return an error.
Otherwise, users might expect to open a device with a certain type and get a
handle belonging to a different type.
This also prevents certain conflicting configurations that would have a port of
a certain type in the database and one of a different type on the system.
For example, when adding an interface with a type other than system, and there
is already a system interface with the same name, as the routing table will hold
a reference to that system interface, some conflicts will arise. The netdev will
be opened with the incorrect type and that will make vswitchd remove it, but
adding it again will fail as it already exists. Failing earlier prevents some
vswitchd loops in reconfiguring the interface.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
If CPU number in pmd-cpu-mask is not divisible by the number of queues and
in a few more complex situations there may be unfair distribution of TX
queue-ids between PMD threads.
For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
port: vhost-user1 queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1 queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1 queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1 queue-id: 3
<------------------------------------------------------------------------>
As we can see above dpdk0 port polled by threads on cores:
12, 13, 16 and 17.
By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. This queue-id's are sequential similar to core-id's. And
thread will send packets to queue with exact this queue-id regardless
of port.
In previous example:
pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5
So, for dpdk0 port after truncating in netdev-dpdk:
core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1
As a result only 2 of 4 queues used.
To fix this issue some kind of XPS implemented in following way:
* TX queue-ids are allocated dynamically.
* When PMD thread first time tries to send packets to new port
it allocates less used TX queue for this port.
* PMD threads periodically performes revalidation of
allocated TX queue-ids. If queue wasn't used in last
XPS_TIMEOUT_MS milliseconds it will be freed while revalidation.
* XPS is not working if we have enough TX queues.
Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata. This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers. With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The patch adds a new action to support packet truncation. The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).
One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload. Example
use case is below as well as shown in the testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
# ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to output action itself. The following
packet size of output:1 and output:2 will be intact.
# ovs-ofctl add-flow br0 \
'actions=output(port=1,max_len=100),output:1,output:2'
- The Datapath actions shows:
# Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
The native tunneling build tunnel header code is spread across
two different modules, it makes pretty hard to follow the code.
Following patch refactors the code to move all code to
netdev-ative-tnl module.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.
When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().
The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().
This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.
This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()). The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required(). After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.
This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
The tunnel header pop action can leak batch of packet
in case of error. Following patch fixex the error code path.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
DPDK datapath operate on batch of packets. To pass the batch of
packets around we use packets array and count. Next patch needs
to associate meta-data with each batch of packets. So Introducing
a batch structure to make handling the metadata easier.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Current tunnel-pop API does not allow the netdev implementation
retain a packet but STT can keep a packet from batch of packets
during TCP reassembly processing. To return exact count of
valid packet STT need to pass this number of packet parameter
as a reference.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
DPDK netdev classes were being initialised twice, resulting in warning
logs like so:
netdev|WARN|attempted to register duplicate netdev provider: dpdk
This commit removes one of the initialisation calls.
Fixes: 0692257923fe ("netdev: Fix potential deadlock.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until now, netdev_class_mutex and route_table_mutex could be taken in
either order:
* netdev_run() takes netdev_class_mutex, then netdev_vport_run() calls
route_table_run(), which takes route_table_mutex.
* route_table_init() takes route_table_mutex and then eventually calls
netdev_open(), which takes netdev_class_mutex.
This commit fixes the problem by converting the netdev_classes hmap,
protected by netdev_class_mutex, into a cmap protected on the read
side by RCU. Only a very small amount of code actually writes to the
cmap in question, so it's a lot easier to understand the locking rules
at that point. In particular, there's no need to take netdev_class_mutex
from either netdev_run() or netdev_open(), so neither of the code paths
above determines a lock ordering any longer.
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-February/020216.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Tested-by: William Tu <u9012063@gmail.com>
Implementation of new statistics extension for DPDK ports:
- Add new counters definition to netdev struct and open flow,
based on RFC2819.
- Initialize netdev statistics as "filtered out"
before passing it to particular netdev implementation
(because of that change, statistics which are not
collected are reported as filtered out, and some
unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of statistics (RX packet sized
based counters).
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.
This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Some point-to-point devices like TUN devices will not have an address, and while
iterating over ifaddrs, its ifa_addr will be NULL. This patch fixes a crash when
starting ovs-vswitchd on a system with such a device.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Fixes: a8704b502785 ("tunneling: Handle multiple ip address for given device.")
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This attempts to prevent namespace collisions with other list libraries
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
All code is now in include/openvswitch/list.h.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since netdev can have multiple IP address use
generic api netdev_get_addr_list(). This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Made to simplify creation of derived classes.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The comment was incomplete in some ways and simply wrong in others.
Also ensure that *cnt is set to 0 if an error is encountered. It's nice
when callers can rely on this.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
This command can be used to check the port/rxq assignment to
pmd threads. For each pmd thread of the datapath shows list
of queue-ids with port names.
Additionally log message from pmd_thread_main() extended with
queue-id, and type of this message changed from INFO to DBG.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This manifested as a memory leak in test 898 "ofproto-dpif - sFlow packet
sampling - tunnel set", which included an output to a tunnel vport that
doesn't have an implementation of netdev_send().
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-February/065873.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.
Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.
Example:
ovs-vsctl set Interface dpdk0 options:n_rxq=8
Old 'other_config:n-dpdk-rxqs' deleted.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
The addition of Geneve options to packet metadata significantly
expanded its size. It was reported that this can decrease performance
for DPDK ports by up to 25% since we need to initialize the whole
structure on each packet receive.
It is not really necessary to zero out the entire structure because
miniflow_extract() only copies the tunnel metadata when particular
fields indicate that it is valid. Therefore, as long as we zero out
these fields when the metadata is initialized and ensure that the
rest of the structure is correctly set in the presence of a tunnel,
we can avoid touching the tunnel fields on packet reception.
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>