Linux ``No operation'' qos type is used to inform the vswitch that the
traffic control for the port is managed externally. Any configuration values
set for this type will have no effect.
This patch provides a solution suggested in this mail -
http://openvswitch.org/pipermail/discuss/2015-May/017687.html
Signed-off-by: Babu Shanmugam <bschanmu@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.
When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().
The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().
This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.
This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()). The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required(). After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.
This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Implementation of new statistics extension for DPDK ports:
- Add new counters definition to netdev struct and open flow,
based on RFC2819.
- Initialize netdev statistics as "filtered out"
before passing it to particular netdev implementation
(because of that change, statistics which are not
collected are reported as filtered out, and some
unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of statistics (RX packet sized
based counters).
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
The tc_police structure was filled with a value calculated in bits
instead of bytes while bytes were expected. This led the setting
of an x8 higher burst value.
Documentation and defaults have been corrected accordingly to minimize
nuisances on users sticking to the defaults.
The suggested burst value is now 80% of policing rate to make sure
TCP works correctly.
Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
Tested-by: Miguel Angel Ajo <majopela@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Valgrind reports "Conditional jump or move depends on uninitialised value"
and "Use of uninitialised value" at case 2016 ovn -- 3 HVs, 1 LS, 3
lports/HV. It is caused by 1) assigning an uninitialized value to 'key->hash'
at emc_processing(). Due to uninit rss_hash_valid, dp_packet_rss_valid() might
return true and undefined hash value is returned, and 2) at emc_lookup, the
'current_entry->key.hash' could be uninitialized due to dp_packet_clone().
The patch fixes the two and as a result, a couple of calls to
dp_packet_rss_invalidate() become redundant and thus are removed.
Call stacks:
- Connditional jump or move depends on uninitialised value(s)
dpif_netdev_packet_get_rss_hash (dpif-netdev.c:3334)
emc_processing (dpif-netdev.c:3455)
dp_netdev_input__ (dpif-netdev.c:3639)
and,
- Use of uninitialised value of size 8
emc_lookup (dpif-netdev.c:1785)
emc_processing (dpif-netdev.c:3457)
dp_netdev_input__ (dpif-netdev.c:3639)
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Since netdev can have multiple IP address use
generic api netdev_get_addr_list(). This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Made to simplify creation of derived classes.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Listen to RTNLGRP_IPV6_IFINFO to get IPv6 address change
notification.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Instead of reading
"error receiving Ethernet packet on Permission denied: ens3",
it should read
"error receiving Ethernet packet on ens3: Permission denied".
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Handle advertised and supported flags for the following speeds:
* 1G base KX
* 10G base KX4, KR, R
* 40G base KR4, CR4, SR4, LR4
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This includes bits for:
* Backplane
* 1000 baseKX (full duplex)
* All speeds of 10Gbit and above other than 10000 baseT (full duplex).
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While splitting netdev_linux_rx_recv() into netdev_linux_rx_recv_sock()
and netdev_linux_rx_recv_tap() in commit
b73c85181df9 ("netdev-linux: Read packet auxdata to obtain vlan_tid")
error handling part was copied 'as is' to both functions.
But in case of netdev_linux_rx_recv_tap(), according to POSIX, the
number of bytes read shall never be greater than 'size'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When there are PtP TUN devices on the system or SIT devices, tests will fail
because of a warning that it was not possible to get their Ethernet addresses.
That call comes from the route code adding tunnel ports.
Make that warning an informational message and filter that out during tests.
Also, return EINVAL when trying to get those interface Ethernet addresses, which
will prevent them from being added to the tunnel ports pool and will properly
fail in other places as well.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
set in 'ol_flags'. Otherwise the hash is garbage and doesn't
relate to the packet.
This fixes an issue with vhost, which, being a virtual NIC, doesn't
compute the hash.
Reported-by: Dongjun <dongj@dtdream.com>
Suggested-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Check if ethtool flags is already set on a netdev, before trying to set it.
This patch works around issues with some older verison of ethernet drivers,
which tend to reset the NIC when call to disable LRO is made, even if LRO is
already disable on that NIC. NIC reset is not desirable in OVS upgrade scenario
as it causes extended downtime.
Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
This commit makes netdev_linux_set_in4() cache the result of previous
reading of in4 address (successful or not).
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The get_in6() API defined in netdev-provider.h requires the return
of error values when the 'netdev' has no assigned IPv6 address or
the 'netdev' does not support IPv6. However, the netdev_linux_get_in6()
implementation does not follow this (always return 0). And this
causes the a bug in deleting vlan interfaces created for vlan
splinter.
This commit makes netdev_linux_get_in6() conform to the API definition
and returns the specified error value when failing to get IPv6 address.
VMware-BZ: #1485521
Reported-by: Ronald Lee <ronaldlee@vmware.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
and RTNLGRP_IPV6_IFADDR.
Currently the netdev_linux_notify_sock only joins multicast group
RTNLGRP_LINK for link status change notification. Some ovs features
also require the detection of ip addresses changes and update of the
netdev-linux's cache. To achieve this, we need to make
netdev_linux_notify_sock join the multicast group RTNLGRP_IPV4_IFADDR
and RTNLGRP_IPV6_IFADDR.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
RTNLGRP_IPV6_IFADDR.
This commit renames the rtnetlink-link.{c,h} to rtnetlink.{c,h}
and extends the module to support RTNLGRP_IPV4_IFADDR and
RTNLGRP_IPV4_IFADDR multicast groups. A later patch will start
using this module to react to interface address changes.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When there is no vport for a given netdev, dpif_netlink_vport_get might return
ENODEV. Do not warn a failure to get port stats when that's the case.
This happens when the userspace switch is used.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We already have the 'dp_hash' embedded in the metadata. This caused
confusion in the code. With this commit it should be clear that
'rss_hash' is the packet hash used for internal purposes, while
'md.dp_hash' is part of the flow, computed during the execution of
certain actions.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Currently, ovs uses hardcoded rate2quantum = 10 for each htb qdisc.
When qdisc class's rate is small, the resulting quantum (calculated
by min_rate / rate2quantum) will be smaller than MTU. This is not
recommended and tc will keep complaining the following in syslog.
localhost kernel: HTB: quantum of class 10003 is small. Consider r2q change.
localhost kernel: HTB: quantum of class 10004 is small. Consider r2q change.
localhost kernel: HTB: quantum of class 10005 is small. Consider r2q change.
localhost kernel: HTB: quantum of class 10006 is small. Consider r2q change.
To fix the issue, this commit makes ovs always use htb quantum no less
than the MTU.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 677d9158fc0a (netdev-linux: Support for SFQ, FQ_CoDel and CoDel
qdiscs.) added support for new qdiscs. The commit uses TCA_CODEL_* and
TCA_FQ_CODEL_* not in old kernel headers, causing a build failure against
such headers. This commit should fix the problem by defining these values
ourselves. (I haven't tested it against old headers, so I might have
missed something, but it's a straightforward change and at worst won't do
harm.)
It appears that sfq (also added by the same commit) was in Linux before
2.6.32, so it seems unlikely that we need any compatibility code there.
CC: Jonathan Vestin <jonavest@kau.se>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch adds support for SFQ, CoDel and FQ_CoDel classless qdiscs to Open vSwitch. It also removes the requirement for a QoS to have at least one Queue (as this makes no sense when using classless qdiscs). I have also not implemented class_{get,set,delete,get_stats,dump_stats} because they are meant for qdiscs with classes.
Signed-off-by: Jonathan Vestin <jonavest@kau.se>
[blp@nicira.com mostly applied stylistic changes]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Otherwise we can't detect classless qdiscs.
This has no useful effect for the currently supported qdiscs, which all
have classes, but it makes it possible to add support for new classless
qdiscs.
This suddenly makes netdev-linux complain about qdiscs it doesn't know
about (e.g. pfifo_fast), which isn't too useful, so this commit also
demotes that INFO message to DBG level.
Reported-by: Jonathan Vestin <jonavest@kau.se>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Otherwise the policing limits could make no sense if large rates were
specified.
Reported-by: Zhangguanghui <zhang.guanghui@h3c.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
dp_packet is short and better name for datapath packet
structure.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.
Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We have to define our own with some kernel headers, so we might as well do
it everywhere, especially since there seems to be a problem with detecting
the presence of the definition with at least some kernels.
Reported-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
The patch contains the necessary modifications to compile and also to run
under MSVC.
Added the files to the build system and also changed dpif_linux to be under
a more generic name dpif_windows.
Added a TODO under the windows part in case we want to implement another
counterpart for epoll functions.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to configure the number of tx queues and rx queues
of 'netdev'. Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
There are couple of reasons to remove this support:
* This is used in very old OVS use-case. It is much better
to read stats directly from OVS.
* Forthcoming commit will remove support for setting stats
for vport. The stats update depends on stats-set.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds new variable n_txq to 'struct netdev' for recording
the number of tx queues. Correspondingly, the send_*() functions are
extended to accept queue id as input argument.
All 'netdev-*' implementation will ignore the queue id since having
multiple tx queues is not supported. Upcomping patches will start
using it and create multiple tx queues for dpdk netdev.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to query the numa node id the 'netdev' is on.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
These function are used to stored the packet hash. 'netdev-dpdk'
automatically set this value to the RSS hash returned by the
NIC. Other 'netdev's set it to 0 (which is an invalid hash
value), so that callers can compute the hash on their own.
If DPDK support is enabled, struct dpif_packet's member
'dp_hash' is removed and 'pkt.hash.rss' from DPDK mbuf is used
This commit also configure DPDK devices to compute RSS hash
for UDP and IPv6 packets
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
tc_fill_rate() takes a 64bit int, casting kbits_rate from int
to uint64_t avoids a possible overflow when translating from
kbits to bytes.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
'miimon_cnt' and the actual device miimon configuration is only
loosely coupled, so we can use the relaxed atomic_count for it.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Open vSwitch userspace uses special types to indicate that a particular
object may not be naturally aligned. Netlink is one source of such
problems: in Netlink, 64-bit integers are often aligned only on 32-bit
boundaries. This commit changes the odp-netlink.h that is transformed from
<linux/openvswitch.h> to use these types to make it harder to accidentally
access a misaligned 64-bit member.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The netdev_send function has been modified to accept multiple packets, to
allow netdev providers to amortize locking and queuing costs.
This is especially true for netdev-dpdk.
Later commits exploit the new API.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This commit introduces a new data structure used for receiving packets from
netdevs and passing them to dpifs.
The purpose of this change is to allow storing some private data for each
packet. The subsequent commits make use of it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
I think these were leftovers from the removal of %z for MSVC that happened
some time ago.
VMware-BZ: 1265762
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>