This is the third patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The dynamic rebalancing functionality is implemented in this patch. The
ukeys that are not scheduled for deletion are obtained and passed as input
to the rebalancing routine. The rebalancing is done in the context of the
revalidation leader thread, after all other revalidator threads have
finished gathering rebalancing data for flows.
For each netdev that is in the out-of-resources (OOR) state, a list of
flows, both offloaded and non-offloaded (pending), is obtained using the
ukeys. These flows are grouped into offloaded and pending flows and then
sorted: offloaded flows in descending order of pps-rate, pending flows in
ascending order of pps-rate.
The rebalancing is done in two phases. In the first phase, we try to
offload all pending flows and if that succeeds, the OOR state on the device
is cleared. If some or all of the pending flows could not be offloaded,
then we start replacing an offloaded flow that has a lower pps-rate than
a pending flow, until there are no more pending flows with a higher rate
than an offloaded flow. The flows that are removed from the device are
added to the kernel datapath.
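The two phases can be sketched roughly as follows. This is only an
illustration, not the code in the patch: 'struct sketch_flow', the
'free_slots' counter that stands in for the device returning ENOSPC, and
the choice to try the highest-rate pending flow first are all assumptions
made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-flow record; the patch itself works on ukeys. */
struct sketch_flow {
    unsigned long long pps;   /* Packets-per-second rate. */
    bool offloaded;           /* Currently in hardware? */
};

/* qsort() comparator: descending pps-rate (for offloaded flows). */
static int
sketch_cmp_desc(const void *a_, const void *b_)
{
    const struct sketch_flow *a = a_, *b = b_;
    return (a->pps < b->pps) - (a->pps > b->pps);
}

/* qsort() comparator: ascending pps-rate (for pending flows). */
static int
sketch_cmp_asc(const void *a_, const void *b_)
{
    return sketch_cmp_desc(b_, a_);
}

/* Rebalances one device.  'free_slots' models how many more flows the
 * device accepts before it would report ENOSPC (an assumption of this
 * sketch).  Returns true if the OOR state can be cleared. */
static bool
sketch_rebalance(struct sketch_flow *off, size_t n_off,
                 struct sketch_flow *pend, size_t n_pend,
                 size_t free_slots)
{
    assert(off != NULL && pend != NULL);
    qsort(off, n_off, sizeof *off, sketch_cmp_desc);
    qsort(pend, n_pend, sizeof *pend, sketch_cmp_asc);

    /* Phase 1: try to offload every pending flow, highest rate first. */
    size_t n_left = n_pend;
    while (n_left > 0 && free_slots > 0) {
        pend[--n_left].offloaded = true;
        free_slots--;
    }
    if (n_left == 0) {
        return true;
    }

    /* Phase 2: swap the slowest offloaded flow for the fastest pending
     * flow for as long as the pending flow has the higher rate. */
    size_t i = n_off;
    while (i > 0 && n_left > 0 && pend[n_left - 1].pps > off[i - 1].pps) {
        off[--i].offloaded = false;       /* Goes to the kernel datapath. */
        pend[--n_left].offloaded = true;
    }
    return false;
}
```

With the two sort orders described above, the best replacement candidates
sit at the tail of each array, so the second phase only has to compare the
two tails.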
A new OVS configuration parameter, "offload-rebalance", is added to ovsdb.
The default value of this is "false". To enable this feature, set the
value of this parameter to "true", which provides a packets-per-second
rate based policy to dynamically offload and un-offload flows.
Note: This option can be enabled only when the 'hw-offload' policy is
enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise,
flow offload errors (specifically the ENOSPC error this feature depends
on) reported by an offloaded device are suppressed by the TC-Flower
kernel module.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This is the first patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The patch detects an OOR condition on a netdev port when an ENOSPC error
is returned by TC-Flower while adding a flow rule. A new structure called
"netdev_hw_info" is added to the netdev to store the OOR related
information required to perform dynamic offload-rebalancing.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The function comment for netdev_queue_dump_next() said that it cleared its
'detail' argument, but it didn't actually do that, which meant that details
could be incorrectly carried along from one queue to the next.
Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
A glibc patch [0] fixes a bug where we may get inconsistent dumps from
the kernel when listing interfaces, due to a race condition.
This could happen if we try to retrieve them while interfaces are
being added/removed from the system at the same time.
For systems running against old glibc versions, this patch retries
the operation up to 3 times and then proceeds after logging a
warning.
Note that 3 tries should be few enough not to delay the operation much,
and it is unlikely that we hit the race condition 3 times in
a row. Still, if this happens, this patch does not change the
current behavior.
[0] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4
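The bounded retry described above might look roughly like this. The
callback type, the function names and the use of EAGAIN to signal an
inconsistent dump are assumptions made for illustration; the patch itself
wraps the real interface listing call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define SKETCH_MAX_TRIES 3   /* Retry budget taken from the text above. */

/* Hypothetical dump callback: returns 0 for a consistent dump, otherwise
 * a positive errno value. */
typedef int sketch_dump_fn(void *aux);

/* Calls 'dump' up to SKETCH_MAX_TRIES times.  If every try fails, logs a
 * warning and returns the last error, so that the caller proceeds as it
 * would have without the retries. */
static int
sketch_dump_with_retry(sketch_dump_fn *dump, void *aux)
{
    assert(dump != NULL);

    int error = 0;
    for (int i = 0; i < SKETCH_MAX_TRIES; i++) {
        error = dump(aux);
        if (!error) {
            return 0;
        }
    }
    fprintf(stderr, "warning: interface dump still inconsistent after "
            "%d tries, proceeding anyway\n", SKETCH_MAX_TRIES);
    return error;
}

/* Stand-in for the racy operation: fails '*aux' times, then succeeds. */
static int
sketch_flaky_dump(void *aux)
{
    int *failures_left = aux;
    if (*failures_left > 0) {
        (*failures_left)--;
        return EAGAIN;
    }
    return 0;
}
```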
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Co-authored-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.
Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Previously, any rule that is offloaded via a netdev, not necessarily
to the HW, would be reported as "offloaded". This patch fixes this
misalignment, and introduces the 'dp' state, as follows:
rule is in HW via TC offload -> offloaded=yes dp:tc
rule is not in HW over TC DP -> offloaded=no dp:tc
rule is not in HW over OVS DP -> offloaded=no dp:ovs
To achieve this, the flow's 'offloaded' flag was encapsulated in a new
attrs struct, which contains the offloaded state of the flow and the
DP layer the flow is handled in, and instead of setting the flow's
'offloaded' state based solely on the type of dump it was acquired
via, for netdev flows it now sends the new attrs struct to be
collected along with the rest of the flow via the netdev, allowing
it to be set per flow.
For TC offloads, the offloaded state is set based on the 'in_hw' and
'not_in_hw' flags received from the TC as part of the flower. If no
such flag was received, due to lack of kernel support, it defaults
to true.
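As a rough sketch of the rule in the last paragraph, with made-up flag
constants and a made-up attrs struct standing in for the real ones:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the 'in_hw' and 'not_in_hw' flower flags. */
enum {
    SKETCH_IN_HW = 1 << 0,
    SKETCH_NOT_IN_HW = 1 << 1
};

/* Hypothetical attrs struct: offloaded state plus the DP layer. */
struct sketch_flow_attrs {
    bool offloaded;        /* Is the rule in hardware? */
    const char *dp_layer;  /* "tc" or "ovs". */
};

/* Derives the attrs of a flow dumped from TC.  When neither flag is
 * present (no kernel support), 'offloaded' defaults to true. */
static struct sketch_flow_attrs
sketch_tc_flow_attrs(unsigned int flags)
{
    /* This sketch assumes the two flags are never reported together. */
    assert((flags & (SKETCH_IN_HW | SKETCH_NOT_IN_HW))
           != (SKETCH_IN_HW | SKETCH_NOT_IN_HW));

    struct sketch_flow_attrs attrs;
    attrs.dp_layer = "tc";
    attrs.offloaded = !(flags & SKETCH_NOT_IN_HW);
    return attrs;
}
```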
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
[simon: resolved conflict in lib/dpctl.man]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The patch adds additional 'struct netdev *' to the
native tunnel's push_header() interface. This is used
for later GRE sequence number support.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If the caller provides a non-NULL qfill pointer and the netdev
implementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantially slow down the recv() operation.
A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.
This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.
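The contract of the optional output parameter can be illustrated with a
toy receive function. The queue model and all names here are hypothetical;
in particular, this sketch returns the number of packets received rather
than an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy rx queue: only a packet count and a capability bit. */
struct sketch_rxq {
    int queued;             /* Packets waiting in the queue. */
    bool can_report_fill;   /* Can this implementation read the level? */
};

/* Receives up to 'max_burst' packets and returns how many were taken.
 * If 'qfill' is non-NULL, stores the number of packets left in the queue
 * after the burst, or -ENOTSUP if the fill level cannot be read. */
static int
sketch_rxq_recv(struct sketch_rxq *rxq, int max_burst, int *qfill)
{
    assert(rxq != NULL && max_burst > 0);

    int n = rxq->queued < max_burst ? rxq->queued : max_burst;
    rxq->queued -= n;
    if (qfill) {
        *qfill = rxq->can_report_fill ? rxq->queued : -ENOTSUP;
    }
    return n;
}
```

Callers that do not care about the fill level pass NULL and pay nothing
for the extra bookkeeping.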
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Recently, an issue was debugged that was thought to be triggered by a
bond failover. It turned out to be a VLAN interface MTU setting issue
that had nothing to do with bonding or most other likely possibilities.
Besides the effect of not setting the MTU to the desired value, this can
result in increased netlink traffic and processing with associated wasted
work. Let us flag a configuration issue at warn level (rather than dbg) to
catch the problem early.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Fixes: ee4776b8bce1 ("netdev: New function netdev_get_ip_by_name().")
Suggested-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This is like netdev_get_in4_by_name() but accepts any IP address instead
of just an IPv4 address.
It will acquire its first user in an upcoming commit.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
The existing functions for working with sockaddr_storage that contain an
IPv4 or IPv6 address are useful. This commit adds more functions for
working with them, as well as a parallel set of functions for struct
sockaddr.
This also adds an initial user for some of the new sockaddr functions in
netdev.c.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
Until now, the ofp-print code has had a lot of logic specific to
individual messages. This code is better put with the other code specific
to those messages, so this commit starts to migrate it.
There is more work of a similar type to do, but this is a reasonable start.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
- A new get_custom_stats interface function is added to netdev. It
allows a particular netdev implementation to expose custom
counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
port type.
- DPDK-physical implementation uses xstats to collect statistics.
Only dropped and error counters are exposed.
Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and
that <netinet/in.h> be included before <arpa/inet.h>. This adds guards to
the "sparse" headers to yield a warning if this order is violated. This
commit also adjusts the order of many #includes to suit this requirement.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Not needed anymore because 'may_steal' is already handled in the
dpif-netdev layer and is always true.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
netdev_get_etheraddr claims to clear 'mac' on error, but it fails to do so.
When looking further into both netdev_windows_get_etheraddr() and
netdev_linux_get_etheraddr(), 'mac' is also not cleared. This will lead to
usage of uninitialised ofputil_phy_port.hw_addr.
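The intended behavior, clearing 'mac' whenever the provider fails, can be
sketched like this. The types and the provider callback are hypothetical
stand-ins for the per-platform get_etheraddr implementations.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 6-byte Ethernet address type. */
struct sketch_eth_addr {
    uint8_t ea[6];
};

/* Hypothetical provider callback: 0 on success, else a positive errno. */
typedef int sketch_get_mac_fn(struct sketch_eth_addr *mac);

/* Fetches the address through 'provider'.  On any error '*mac' is zeroed
 * so that callers never see stale or uninitialized bytes. */
static int
sketch_get_etheraddr(sketch_get_mac_fn *provider,
                     struct sketch_eth_addr *mac)
{
    assert(mac != NULL);

    int error = provider ? provider(mac) : EOPNOTSUPP;
    if (error) {
        memset(mac, 0, sizeof *mac);
    }
    return error;
}

/* Provider that scribbles on the output before failing. */
static int
sketch_failing_provider(struct sketch_eth_addr *mac)
{
    mac->ea[0] = 0xaa;
    return ENODEV;
}
```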
v1 -> v2: fixed a bug in v1 found by Ben, thanks Ben.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until now, the code for mapping ODP port number to ifindexes and vice versa
has maintained two completely separate data structures, one for each
direction. It was possible for the two mappings to become out of sync
with each other since either one could change independently. This commit
merges them into a single data structure (with two indexes), which at least
means that if one is removed then the other is as well.
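The idea of one record reachable through two lookups can be shown with a
deliberately simplified table. The names are hypothetical and the linear
scans stand in for the two hash indexes; the point is only that a single
removal makes the entry disappear from both directions at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 16

/* One record holds both keys, so the mappings cannot diverge. */
struct sketch_port_entry {
    bool in_use;
    unsigned int port_no;   /* Datapath port number. */
    int ifindex;            /* Kernel interface index. */
};

static struct sketch_port_entry sketch_ports[SKETCH_MAX_PORTS];

static struct sketch_port_entry *
sketch_find_by_port(unsigned int port_no)
{
    for (size_t i = 0; i < SKETCH_MAX_PORTS; i++) {
        if (sketch_ports[i].in_use && sketch_ports[i].port_no == port_no) {
            return &sketch_ports[i];
        }
    }
    return NULL;
}

static struct sketch_port_entry *
sketch_find_by_ifindex(int ifindex)
{
    for (size_t i = 0; i < SKETCH_MAX_PORTS; i++) {
        if (sketch_ports[i].in_use && sketch_ports[i].ifindex == ifindex) {
            return &sketch_ports[i];
        }
    }
    return NULL;
}

/* Returns false if 'port_no' is already present or the table is full. */
static bool
sketch_port_insert(unsigned int port_no, int ifindex)
{
    assert(ifindex >= 0);
    if (sketch_find_by_port(port_no)) {
        return false;
    }
    for (size_t i = 0; i < SKETCH_MAX_PORTS; i++) {
        if (!sketch_ports[i].in_use) {
            sketch_ports[i].in_use = true;
            sketch_ports[i].port_no = port_no;
            sketch_ports[i].ifindex = ifindex;
            return true;
        }
    }
    return false;
}

/* Removing the record removes it from both lookups. */
static bool
sketch_port_remove(unsigned int port_no)
{
    struct sketch_port_entry *e = sketch_find_by_port(port_no);
    if (!e) {
        return false;
    }
    e->in_use = false;
    return true;
}
```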
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
White space changes only.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
A crash is seen in "netdev_ports_remove" when an interface is deleted and added
back in the system and when the interface is part of a bridge configuration.
e.g. steps:
create a tap0 interface using "ip tuntap add.."
add the tap0 interface to br0 using "ovs-vsctl add-port.."
delete the tap0 interface from system using "ip tuntap del.."
add the tap0 interface back in system using "ip tuntap add.."
(this changes the ifindex of the interface)
delete tap0 from br0 using "ovs-vsctl del-port.."
In the function "netdev_ports_insert", two hmap entries were created for
mapping "portnum -> netdev" and "ifindex -> portnum".
When the interface is deleted from the system, the "netdev_ports_remove"
function is not getting called and the old ifindex entry is not getting
cleaned up from the "ifindex_to_port" hmap.
As part of the fix, added function "dpif_port_remove" which will call
"netdev_ports_remove" in the path where the interface deletion from the system
is detected.
Also, in "netdev_ports_remove", added the code where the "ifindex_to_port_data"
(ifindex -> portnum map node) is getting freed when the ifindex is not
available any more. (as the interface is already deleted.)
VMware-BZ: #1975788
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since 57eebbb4c315, the caller must make sure that 'netdev' supports
sending. This is mentioned at the start of the comment.
Fixes: 57eebbb4c315 ("dpif-netdev: Don't try to output on a device without txqs.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Poll-loop is the core needed to implement a main loop. It should be
available in libopenvswitch.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Instead of freeing in the error path, move the allocation
after it. Found by inspection.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Previously, netdev_ports_insert() would allocate and insert an
ifindex->odp_port mapping, but netdev_ports_remove() would never remove
the mapping or free the mapping structure. This patch fixes these up.
Fixes: 32b77c316d9982 ("dpif: Save added ports in a port map.")
Reported-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
When the interfaces list is retrieved through getifaddrs(), there
might be elements with ifa_name set to NULL.
This patch checks that ifa_name is not NULL before comparing it to the
actual device name in the loop that calculates how many interfaces
exist with that same name.
Also, this patch checks that ifa_netmask is not NULL, for coherence
with the existing code, so that it doesn't allocate more memory
than needed if this field is NULL.
Note that these checks are already done later in the function,
so they should be done in both places.
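A counting loop with these guards might look like the sketch below. The
function name is made up, and the additional ifa_addr check is an
assumption of this sketch rather than something this patch describes.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stddef.h>
#include <string.h>

/* Counts the entries in 'list' that belong to device 'name' and carry
 * both an address and a netmask.  Entries with a NULL ifa_name, ifa_addr
 * or ifa_netmask are skipped instead of being dereferenced. */
static size_t
sketch_count_addrs(const struct ifaddrs *list, const char *name)
{
    assert(name != NULL);

    size_t n = 0;
    for (const struct ifaddrs *ifa = list; ifa; ifa = ifa->ifa_next) {
        if (ifa->ifa_name && ifa->ifa_addr && ifa->ifa_netmask
            && !strcmp(ifa->ifa_name, name)) {
            n++;
        }
    }
    return n;
}
```

Using the same predicate for the counting pass and for the later copying
pass keeps the allocation size and the number of copied entries in
agreement.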
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Due to commit 67ac844, an existing issue with OVS persistent ports
surfaced. If we revert the commit we no longer get the error, and
basic traffic will flow. However, the wrong netdev class is used, hence
the wrong callbacks get called.
The main issue is with netdev_open() being called with type = NULL
before the interface is actually configured in the system. This patch
tracks these "auto" generated interfaces, and once netdev_open() gets
called with a valid type, reconfigures (re-creates) it.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
It's basically what is being passed today and passing a specific
type adds a compiler type check.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
glibc sometimes doesn't initialize the ifa_netmask and ifa_addr fields, if
the ioctl to fetch them fails. Check ifa_name also just for paranoia.
Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When trying to modify an interface option (e.g. remote IP of a GRE port) to
an invalid value, vswitchd crashes. For instance:
ovs-vsctl add-br br0
ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre \
options:remote_ip=10.0.0.2
ovs-vsctl set interface gre0 options:remote_ip=9.9.9
The bug is caused by trying to dereference a NULL pointer. It was introduced
by the commit 9fff138ec3a6. Before that, the NULL pointer was handled by the
VLOG_WARN_BUF macro.
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
CC: Daniele Di Proietto <diproiettod@vmware.com>
Fixes: 9fff138ec3a6 ("netdev: Add 'errp' to set_config().")
Signed-off-by: Ben Pfaff <blp@ovn.org>
In netdev_gre_build_header(), the GRE protocol and VXLAN next_protocol are
set based on the packet_type of the flow. For an Ethernet packet, it is set
to ETH_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set
according to the name space type.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Using the correct type reduces the need for type conversions.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Reviewed-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Ports already added to a switch are not being initialized for offloading
so when enabling offload we need to go over those ports.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Search all datapath added netdevs for a given flow
using netdev flow api and parse it back to dpif flow.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
If a flow was offloaded to a netdev we delete it using netdev
flow api.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
While dumping flows, dump flows that were offloaded to
netdev and parse them back to dpif flow.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
If netdev flow offloading is enabled, flush all
added ports using netdev flow api.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
To use the netdev flow offloading api, dpifs need to iterate over
added ports. This addition inserts the added dpif ports in a hash map.
The map will also be used to translate dpif ports to netdevs.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add a new configuration tc-policy option that controls tc
flower flag. Possible options are none, skip_sw, skip_hw.
The default is none which is to insert the rule both to sw and hw.
This option is only relevant if hw-offload is enabled.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add a new configuration option - hw-offload that enables netdev
flow api. Enabling this option will allow offloading flows
using netdev implementation instead of the kernel datapath.
This configuration option defaults to false - disabled.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add a new API interface for offloading dpif flows to netdev.
The API consists of the following:
flow_put - offload a new flow
flow_get - query an offloaded flow
flow_del - delete an offloaded flow
flow_flush - flush all offloaded flows
flow_dump_* - dump all offloaded flows
In upcoming commits we will introduce an implementation of this
API for netdev-linux.
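One way to picture such an API is a table of function pointers with thin
dispatch wrappers. Everything below is a hypothetical sketch: the
signatures are simplified to an integer flow id, the flow_dump_* calls are
left out, and returning EOPNOTSUPP for a missing callback is an assumption
of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_netdev;

/* Simplified callback table; a provider may leave entries NULL. */
struct sketch_flow_api {
    int (*flow_put)(struct sketch_netdev *, int flow_id);
    int (*flow_get)(struct sketch_netdev *, int flow_id);
    int (*flow_del)(struct sketch_netdev *, int flow_id);
    int (*flow_flush)(struct sketch_netdev *);
};

struct sketch_netdev {
    const struct sketch_flow_api *api;  /* NULL if offload unsupported. */
    int n_offloaded;
};

/* Dispatch wrappers: EOPNOTSUPP when the provider lacks the callback. */
static int
sketch_flow_put(struct sketch_netdev *dev, int flow_id)
{
    assert(dev != NULL);
    return dev->api && dev->api->flow_put
           ? dev->api->flow_put(dev, flow_id) : EOPNOTSUPP;
}

static int
sketch_flow_flush(struct sketch_netdev *dev)
{
    assert(dev != NULL);
    return dev->api && dev->api->flow_flush
           ? dev->api->flow_flush(dev) : EOPNOTSUPP;
}

/* Toy provider that only counts flows. */
static int
sketch_toy_put(struct sketch_netdev *dev, int flow_id)
{
    (void) flow_id;
    dev->n_offloaded++;
    return 0;
}

static int
sketch_toy_flush(struct sketch_netdev *dev)
{
    dev->n_offloaded = 0;
    return 0;
}

static const struct sketch_flow_api sketch_toy_api = {
    .flow_put = sketch_toy_put,
    .flow_flush = sketch_toy_flush,
};
```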
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
When trying to configure a system port as type=internal it could start
an infinite port creation loop. When this happens you will see the
following log messages:
2017-06-01T09:00:17.900Z|02813|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.900Z|02814|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.907Z|02815|bridge|INFO|bridge bzb: added interface ve01_1 on port 2
2017-06-01T09:00:17.909Z|02816|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 2
2017-06-01T09:00:17.914Z|02817|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.914Z|02818|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.921Z|02819|bridge|INFO|bridge bzb: added interface ve01_1 on port 3
2017-06-01T09:00:17.923Z|02820|bridge|INFO|bridge bzb: deleted interface ve01_1 on port 3
2017-06-01T09:00:17.929Z|02821|dpif|WARN|system@ovs-system: failed to add ve01_1 as port: File exists
2017-06-01T09:00:17.929Z|02822|bridge|WARN|could not add network device ve01_1 to ofproto (File exists)
2017-06-01T09:00:17.936Z|02823|bridge|INFO|bridge bzb: added interface ve01_1 on port 4
...
...
This is how to replicate it:
ip link add name ve01_1 type veth peer name ve01_2
ovs-vsctl add-br bzb
ovs-vsctl add-port bzb ve01_1
ovs-vsctl set interface ve01_1 type=internal
ip link set dev ve01_1 up
ip link set dev ve01_2 up
When changing the type to internal, the async configuration logic gets
triggered and, because the type has changed, it will delete the
interface and the ofproto port. Next it will call iface_do_create() to
re-create the interface as internal. Because we just deleted the
interface, netdev_open() will try to recreate it as internal.
However, this will fail with EEXIST as a system interface already
exists with that name.
Up till here all is fine...
Now some ipv6 route change comes along for the ve01_1 interface, and
the route infrastructure will call netdev_open(). This will create the
interface of type system.
Next the configuration verify process gets triggered due to
if_notifier_changed() being true. We now retry the above, but because
the interface exists (although in the system class) it will use it,
and create the interface successfully.
This triggers another if notification, causing yet another config
update, and because the types differ (system != internal),
reconfiguration happens and it starts again from the top...
So the fix presented below makes netdev_open() return the existing
device only if it is of the class type requested (if a type is
specified).
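The lookup rule can be sketched as follows. The table, the names and the
exact error codes are assumptions made for illustration; the real function
works on the global netdev registry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical open device: just a name and a class type. */
struct sketch_netdev {
    const char *name;
    const char *type;
};

/* Looks up 'name' among 'devs'.  An existing device is returned only if
 * no type was requested (NULL or "") or the requested type matches the
 * class the device was created with; otherwise EEXIST is returned.
 * Returns ENODEV if no device has that name. */
static int
sketch_netdev_open(struct sketch_netdev *devs, size_t n_devs,
                   const char *name, const char *type,
                   struct sketch_netdev **devp)
{
    assert(name != NULL && devp != NULL);

    *devp = NULL;
    for (size_t i = 0; i < n_devs; i++) {
        if (strcmp(devs[i].name, name)) {
            continue;
        }
        if (type && type[0] && strcmp(devs[i].type, type)) {
            return EEXIST;   /* Same name, different class. */
        }
        *devp = &devs[i];
        return 0;
    }
    return ENODEV;
}
```

With this rule, a request for ve01_1 as "internal" no longer silently
reuses the "system" device that the route code opened, which is what kept
the loop going.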
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If the device name uses a vport prefix, then use that vport type.
Since these names are reserved, we can assume this is the right type.
This is important when we are querying the datapath right after vswitch has
started, and using the right type will be even more important when we add
support for creating tunnel ports with rtnetlink.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
The empty string is not a valid name for a network device. I would have
expected that each of the netdev provider implementations would reject an
empty string, but there was a special case for Linux tap devices where they
instead caused unexpected behavior. This commit should fix the problem for
those devices and every other kind.
Reported-by: Gabor Locsei <gabor.locsei@ericsson.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-February/043613.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Andy Zhou <azhou@ovn.org>
One common use case of 'struct dp_packet_batch' is to process all
packets in the batch in order. Add an iterator for this use case
to simplify the logic of calling sites.
Another common use case is to drop packets in the batch, by reading
all packets, but writing back pointers of fewer packets. Add macros
to support this use case.
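Both use cases can be illustrated on a toy batch of integers. The macro
and function names below are invented for this sketch and are not the ones
added by the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH_MAX 32

/* Toy batch: integers stand in for packet pointers. */
struct sketch_batch {
    size_t count;
    int packets[SKETCH_BATCH_MAX];
};

/* Visits every packet in 'BATCH' in order, assigning each to 'PKT'. */
#define SKETCH_BATCH_FOR_EACH(PKT, BATCH)                            \
    for (size_t i_ = 0;                                              \
         i_ < (BATCH)->count && ((PKT) = (BATCH)->packets[i_], 1);   \
         i_++)

/* Drops negative "packets": reads all of them, but writes back only the
 * ones to keep, compacting the batch in place.  Returns the number of
 * dropped packets. */
static size_t
sketch_batch_drop_negative(struct sketch_batch *b)
{
    assert(b != NULL && b->count <= SKETCH_BATCH_MAX);

    size_t n_in = b->count;
    b->count = 0;                           /* Restart writing at 0. */
    for (size_t i = 0; i < n_in; i++) {
        int pkt = b->packets[i];
        if (pkt >= 0) {
            b->packets[b->count++] = pkt;   /* Refill with a keeper. */
        }
    }
    return n_in - b->count;
}
```

The write index never overtakes the read index, so the compaction is safe
to do in place.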
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Since commit 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming"),
we don't call rte_eth_start() from netdev_open() anymore, we only call
it from netdev_reconfigure(). This commit does that also for 'dpdkr'
devices, and removes some useless code.
Calling rte_eth_start() also from netdev_open() was unnecessary and
wasteful. Not doing it reduces code duplication and makes adding a port
faster (~900ms before the patch, ~400ms after).
Another reason why this is useful is that some DPDK drivers might have
problems with reconfiguration. For example, until DPDK commit
8618d19b52b1 ("net/vmxnet3: reallocate shared memzone on re-config"),
vmxnet3 didn't support being restarted with a different number of
queues.
Technically, the netdev interface changed because before opening rxqs or
calling netdev_send() the user must check if reconfiguration is
required. This patch also documents that, even though no change to the
userspace datapath (the only user) is required.
Lastly, this patch makes sure the errors returned by ofproto_port_add
(which includes the first port reconfiguration) are reported back to the
database.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Tunnel devices have 0 txqs and don't support netdev_send(). While
netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed
on output, and that might be confused by devices with no txqs.
It seems better to have different structures in the fast path for ports
that support netdev_{push,pop}_header (tunnel devices), and ports that
support netdev_send. With this we can also remove a branch in
netdev_send().
This is also necessary for a future commit, which starts DPDK devices
without txqs.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Since 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming"),
set_config() is used to identify a DPDK device, so it's better to report
its detailed error message to the user. Tunnel devices and patch ports
rely a lot on set_config() as well.
This commit adds a param to set_config() that can be used to return
an error message and makes use of that in netdev-dpdk and netdev-vport.
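The shape of such an error-message parameter is sketched below for a
patch-port style check. The function name and signature are hypothetical;
only the message text follows the output quoted in this commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Validates a hypothetical patch port configuration.  On failure returns
 * EINVAL and, if 'errp' is non-NULL, stores a malloc()'d message there
 * that the caller must free().  On success stores NULL in '*errp'. */
static int
sketch_set_config(const char *name, const char *peer, char **errp)
{
    assert(name != NULL);

    if (errp) {
        *errp = NULL;
    }
    if (!peer || !peer[0]) {
        if (errp) {
            size_t len = strlen(name) + 64;
            *errp = malloc(len);
            if (*errp) {
                snprintf(*errp, len, "%s: patch type requires valid "
                         "'peer' argument", name);
            }
        }
        return EINVAL;
    }
    return 0;
}
```

Returning the text to the caller, rather than only logging it, is what
lets the message reach the database and be shown to the user.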
Before this patch:
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
After this patch:
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing
'options:dpdk-devargs'. The old 'dpdk<port_id>' names are not
supported. See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires
valid 'peer' argument. See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type
requires valid 'remote_ip' argument. See ovs-vswitchd log for
details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>