2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 04:47:49 +00:00

282 Commits

Author SHA1 Message Date
William Tu
48c6733cb5 bridge: Prohibit "default" and "all" bridge name.
Under Linux, when users create bridge named "default" or "all", although
ovs-vsctl fails but vswitchd in the background will keep retrying it,
causing the systemd-udev to reach 100% cpu utilization. The patch prevents
any attempt to create or open a netdev named "default" or "all" because
these two names are reserved on Linux due to
/proc/sys/net/ipv4/conf/ always contains directories by these names.

The reason for high CPU utilization is due to frequent calls into kernel's
register_netdevice function, which will invoke several kernel elements who
has registered on the netdevice notifier chain.  And due to creation failed,
OVS wakes up and re-recreate the device, which ends up as a high CPU loop.

VMWare-BZ: #1842388
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
2017-05-01 10:08:26 -07:00
Andy Zhou
72c84bc2db dp-packet: Enhance packet batch APIs.
One common use case of 'struct dp_packet_batch' is to process all
packets in the batch in order. Add an iterator for this use case
to simplify the logic of calling sites,

Another common use case is to drop packets in the batch, by reading
all packets, but writing back pointers of fewer packets. Add macros
to support this use case.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2017-01-26 17:35:29 -08:00
Eric Garver
1ebdc7eb81 netdev-linux: double tagged packets should use 0x88a8
We need to check if a packet is double tagged. If so make sure to push
0x88a8 instead of 0x8100. Without this a simple port redirect of 802.1ad
frames means the outer tag gets translated from 0x88a8 to 0x8100 by the
userspace datapath.

This only affected kernels that don't use TP_STATUS_VLAN_TPID_VALID,
which is kernels < 3.14.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-10-04 08:47:18 -07:00
David Hill
9120cfc05c netdev-linux: Use ethtool when miimon fails.
Some network drivers might return true to SIOCGMIIPHY and an error on
SIOCGMIIREG when using MII to query phy state. Fall back to ethtool if this
happens to allow failover to work when using such nics.

Reported-at: http://openvswitch.org/pipermail/dev/2016-August/078800.html
Signed-off-by: David Hill <dhill@redhat.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2016-09-27 11:38:18 -07:00
Daniele Di Proietto
1c33f0c35e netdev: Pass 'netdev_class' to ->run() and ->wait().
This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-08-15 11:07:37 -07:00
Daniele Di Proietto
4124cb1254 netdev: Make netdev_set_mtu() netdev parameter non-const.
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-08-12 19:32:12 -07:00
Ben Pfaff
13c1637f5b smap: New function smap_get_ullong().
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
2016-08-08 11:00:37 -07:00
Daniele Di Proietto
efe179e041 netdev-*: Do not use dp_packet_pad() in recv() functions.
All the netdevs used by dpif-netdev (except for netdev-dpdk) have a
dp_packet_pad() call in the receive function, probably because the
userspace datapath couldn't handle properly short packets.

This doesn't appear to be the case anymore.

This commit removes the call to have a more consistent behavior with the
kernel datapath.

All the testsuite changes in this commit adjust the expectations for
packet lengths in flow dumps and other stats.  There's only one fix in
ovn.at: one of the test_ip() functions generated an incomplete udp
packet, which was not a problem until now, because of the padding.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-29 14:08:10 -07:00
Ilya Maximets
324c837485 dpif-netdev: XPS (Transmit Packet Steering) implementation.
If CPU number in pmd-cpu-mask is not divisible by the number of queues and
in a few more complex situations there may be unfair distribution of TX
queue-ids between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
        port: vhost-user1       queue-id: 1
        port: dpdk0     queue-id: 3
pmd thread numa_id 0 core_id 14:
        port: vhost-user1       queue-id: 2
pmd thread numa_id 0 core_id 16:
        port: dpdk0     queue-id: 0
pmd thread numa_id 0 core_id 17:
        port: dpdk0     queue-id: 1
pmd thread numa_id 0 core_id 12:
        port: vhost-user1       queue-id: 0
        port: dpdk0     queue-id: 2
pmd thread numa_id 0 core_id 15:
        port: vhost-user1       queue-id: 3
<------------------------------------------------------------------------>

As we can see above dpdk0 port polled by threads on cores:
	12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. This queue-id's are sequential similar to core-id's. And
thread will send packets to queue with exact this queue-id regardless
of port.

In previous example:

	pmd thread on core 12 will send packets to tx queue 0
	pmd thread on core 13 will send packets to tx queue 1
	...
	pmd thread on core 17 will send packets to tx queue 5

So, for dpdk0 port after truncating in netdev-dpdk:

	core 12 --> TX queue-id 0 % 4 == 0
	core 13 --> TX queue-id 1 % 4 == 1
	core 16 --> TX queue-id 4 % 4 == 0
	core 17 --> TX queue-id 5 % 4 == 1

As a result only 2 of 4 queues used.

To fix this issue some kind of XPS implemented in following way:

	* TX queue-ids are allocated dynamically.
	* When PMD thread first time tries to send packets to new port
	  it allocates less used TX queue for this port.
	* PMD threads periodically performes revalidation of
	  allocated TX queue-ids. If queue wasn't used in last
	  XPS_TIMEOUT_MS milliseconds it will be freed while revalidation.
        * XPS is not working if we have enough TX queues.

Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-27 12:56:04 -07:00
Terry Wilson
ee89ea7b47 json: Move from lib to include/openvswitch.
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.

Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-22 17:09:17 -07:00
William Tu
64839cf432 netdev-provider: Apply batch object to netdev provider.
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata.  This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers.  With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-21 16:46:32 -07:00
Daniele Di Proietto
29736cc076 netdev-linux: Do not log a warning if the device is down.
In the userspace datapath we use tap devices as internal netdev.  The
datapath doesn't consider whether a device is up or down before sending
to it, and so far this hasn't been a problem.

Since Linux upstream commit 1bd4978a88ac("tun: honor IFF_UP in
tun_get_user()"), included in 4.4, writing to a tap device that is not
up sets errno to EIO.  This commit avoids printing a warning in this
case.

This fixes a failures in the system-userspace-testsuites.

Reported-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-06 18:14:25 -07:00
William Tu
aaca4fe0ce ofp-actions: Add truncate action.
The patch adds a new action to support packet truncation.  The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).

One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload.  Example
use case is below as well as shown in the testcases:

    - Output to port 1 with max_len 100 bytes.
    - The output packet size on port 1 will be MIN(original_packet_size, 100).
    # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

    - The scope of max_len is limited to output action itself.  The following
      packet size of output:1 and output:2 will be intact.
    # ovs-ofctl add-flow br0 \
            'actions=output(port=1,max_len=100),output:1,output:2'
    - The Datapath actions shows:
    # Datapath actions: trunc(100),1,1,2

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 09:17:00 -07:00
bschanmu@redhat.com
6cf888b821 netdev-linux: Add new QoS type linux-noop.
Linux ``No operation'' qos type is used to inform the vswitch that the
traffic control for the port is managed externally. Any configuration values
set for this type will have no effect.

This patch provides a solution suggested in this mail -
http://openvswitch.org/pipermail/discuss/2015-May/017687.html

Signed-off-by: Babu Shanmugam <bschanmu@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-06-23 16:51:11 -07:00
Daniele Di Proietto
050c60bfb5 netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.

When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().

The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().

This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
  via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-05-23 10:27:42 -07:00
Daniele Di Proietto
790fb3b745 netdev: Add reconfigure request mechanism.
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.

This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()).  The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required().  After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.

This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2016-05-23 10:27:42 -07:00
mweglicx
d6e3feb57c Add support for extended netdev statistics based on RFC 2819.
Implementation of new statistics extension for DPDK ports:
- Add new counters definition to netdev struct and open flow,
  based on RFC2819.
- Initialize netdev statistics as "filtered out"
  before passing it to particular netdev implementation
  (because of that change, statistics which are not
  collected are reported as filtered out, and some
  unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
  contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
  and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of statistics (RX packet sized
  based counters).

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-05-06 15:28:56 -07:00
Daniele Di Proietto
4ec3d7c757 hmap: Add HMAP_FOR_EACH_POP.
Makes popping each member of the hmap a bit easier.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-04-26 23:28:59 -07:00
Miguel Angel Ajo
79abacc880 netdev-linux: Fix ingress policing burst rate configuration via tc
The tc_police structure was filled with a value calculated in bits
instead of bytes while bytes were expected. This led the setting
of an x8 higher burst value.

Documentation and defaults have been corrected accordingly to minimize
nuisances on users sticking to the defaults.

The suggested burst value is now 80% of policing rate to make sure
TCP works correctly.

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
Tested-by: Miguel Angel Ajo <majopela@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-04-21 14:48:23 -07:00
William Tu
91644f45c6 dp-packet: Fix use of uninitialised value at emc_lookup.
Valgrind reports "Conditional jump or move depends on uninitialised value"
and "Use of uninitialised value" at case 2016 ovn -- 3 HVs, 1 LS, 3
lports/HV.  It is caused by 1) assigning an uninitialized value to 'key->hash'
at emc_processing(). Due to uninit rss_hash_valid, dp_packet_rss_valid() might
return true and undefined hash value is returned, and 2) at emc_lookup, the
'current_entry->key.hash' could be uninitialized due to dp_packet_clone().
The patch fixes the two and as a result, a couple of calls to
dp_packet_rss_invalidate() become redundant and thus are removed.

Call stacks:
- Connditional jump or move depends on uninitialised value(s)
    dpif_netdev_packet_get_rss_hash (dpif-netdev.c:3334)
    emc_processing (dpif-netdev.c:3455)
    dp_netdev_input__ (dpif-netdev.c:3639)
and,
- Use of uninitialised value of size 8
    emc_lookup (dpif-netdev.c:1785)
    emc_processing (dpif-netdev.c:3457)
    dp_netdev_input__ (dpif-netdev.c:3639)

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-04-06 19:32:34 -07:00
Ben Warren
64c967795b Move lib/ofpbuf.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 13:10:18 -07:00
Pravin B Shelar
6b6e13293e netdev: remove netdev_get_in4()
Since netdev can have multiple IP address use
generic api netdev_get_addr_list().  This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-03-24 09:30:57 -07:00
Pravin B Shelar
a8704b5027 tunneling: Handle multiple ip address for given device.
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-03-24 09:30:57 -07:00
Ben Warren
3e8a2ad145 Move lib/dynamic-string.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-19 10:02:12 -07:00
Ilya Maximets
118c77b1a8 netdev: New field 'is_pmd' in netdev_class.
Made to simplify creation of derived classes.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-03-16 17:03:07 -07:00
Pravin B Shelar
989d713599 netdev-linux: Fix netdev ipv6 notification
Listen to RTNLGRP_IPV6_IFINFO to get IPv6 address change
notification.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-03-10 18:53:50 -08:00
Thadeu Lima de Souza Cascardo
7c6401ca2b netdev-linux: Fix warning message.
Instead of reading

"error receiving Ethernet packet on Permission denied: ens3",

it should read

"error receiving Ethernet packet on ens3: Permission denied".

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-02-16 09:34:38 -08:00
Simon Horman
67bed84cb2 netdev-linux: Handle flags for 10G and 40G speeds
Handle advertised and supported flags for the following speeds:

* 1G base KX
* 10G base KX4, KR, R
* 40G base KR4, CR4, SR4, LR4

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-29 16:09:20 -08:00
Simon Horman
0c61535603 netdev-linux: correctly detect port speed bits beyond 16bit
This includes bits for:
* Backplane
* 1000 baseKX (full duplex)
* All speeds of 10Gbit and above other than 10000 baseT (full duplex).

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-29 16:05:21 -08:00
Ilya Maximets
bc85690660 netdev-linux: Remove unreachable code in netdev_linux_rx_recv_tap().
While splitting netdev_linux_rx_recv() into netdev_linux_rx_recv_sock()
and netdev_linux_rx_recv_tap() in commit
b73c85181df9 ("netdev-linux: Read packet auxdata to obtain vlan_tid")
error handling part was copied 'as is' to both functions.
But in case of netdev_linux_rx_recv_tap(), according to POSIX, the
number of bytes read shall never be greater than 'size'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-25 21:58:28 -08:00
Thadeu Lima de Souza Cascardo
c9697f354e Prevent test failures when there are non Ethernet devices on the system.
When there are PtP TUN devices on the system or SIT devices, tests will fail
because of a warning that it was not possible to get their Ethernet addresses.
That call comes from the route code adding tunnel ports.

Make that warning an informational message and filter that out during tests.

Also, return EINVAL when trying to get those interface Ethernet addresses, which
will prevent them from being added to the tunnel ports pool and will properly
fail in other places as well.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2015-11-23 10:18:28 -08:00
Daniele Di Proietto
f2f44f5da0 dpif-netdev: Check for PKT_RX_RSS_HASH flag.
DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
set in 'ol_flags'.  Otherwise the hash is garbage and doesn't
relate to the packet.

This fixes an issue with vhost, which, being a virtual NIC, doesn't
compute the hash.

Reported-by: Dongjun <dongj@dtdream.com>
Suggested-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2015-09-11 17:43:39 +01:00
Anoob Soman
ad2dade565 netdev-linux: Don't set ethtool flags if flag is already set on netdev
Check if ethtool flags is already set on a netdev, before trying to set it.

This patch works around issues with some older verison of ethernet drivers,
which tend to reset the NIC when call to disable LRO is made, even if LRO is
already disable on that NIC. NIC reset is not desirable in OVS upgrade scenario
as it causes extended downtime.

Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2015-09-03 12:51:26 -07:00
Jarno Rajahalme
74ff3298c8 userspace: Define and use struct eth_addr.
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace.  The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.

"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.

struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned.  All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.

As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.

This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.

This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes.  However, I think this
might be a nice code readability improvement by itself.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-28 14:55:11 -07:00
Alex Wang
49af9a3d44 netdev-linux: Cache the result of previous reading of in4 address.
This commit makes netdev_linux_set_in4() cache the result of previous
reading of in4 address (successful or not).

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-28 09:29:14 -07:00
Alex Wang
7df6932e0f netdev-linux: Make netdev_linux_get_in6() conform to API definition.
The get_in6() API defined in netdev-provider.h requires the return
of error values when the 'netdev' has no assigned IPv6 address or
the 'netdev' does not support IPv6.  However, the netdev_linux_get_in6()
implementation does not follow this (always return 0).  And this
causes the a bug in deleting vlan interfaces created for vlan
splinter.

This commit makes netdev_linux_get_in6() conform to the API definition
and returns the specified error value when failing to get IPv6 address.

VMware-BZ: #1485521
Reported-by: Ronald Lee <ronaldlee@vmware.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-28 09:27:42 -07:00
Alex Wang
d6384a3a20 netdev-linux: Make netdev_linux_notify_sock join RTNLGRP_IPV4_IFADDR
and RTNLGRP_IPV6_IFADDR.

Currently the netdev_linux_notify_sock only joins multicast group
RTNLGRP_LINK for link status change notification.  Some ovs features
also require the detection of ip addresses changes and update of the
netdev-linux's cache.  To achieve this, we need to make
netdev_linux_notify_sock join the multicast group RTNLGRP_IPV4_IFADDR
and RTNLGRP_IPV6_IFADDR.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-28 09:27:09 -07:00
Alex Wang
7e9dcc0f99 rtnetlink: Extend rtnetlink to support RTNLGRP_IPV4_IFADDR and
RTNLGRP_IPV6_IFADDR.

This commit renames the rtnetlink-link.{c,h} to rtnetlink.{c,h}
and extends the module to support RTNLGRP_IPV4_IFADDR and
RTNLGRP_IPV4_IFADDR multicast groups.  A later patch will start
using this module to react to interface address changes.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-28 09:27:06 -07:00
Thadeu Lima de Souza Cascardo
bb13fe5ee9 netdev-linux: do not warn when getting stats for netdev with no vport
When there is no vport for a given netdev, dpif_netlink_vport_get might return
ENODEV. Do not warn a failure to get port stats when that's the case.

This happens when the userspace switch is used.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-07-16 10:58:13 -07:00
Daniele Di Proietto
2bc1bbd27d dp-packet: Rename 'dp_hash' in 'rss_hash'.
We already have the 'dp_hash' embedded in the metadata.  This caused
confusion in the code.  With this commit it should be clear that
'rss_hash' is the packet hash used for internal purposes, while
'md.dp_hash' is part of the flow, computed during the execution of
certain actions.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-04-20 12:49:41 -07:00
Alex Wang
4f631ccd51 netdev-linux: Make htb quantum always no less than mtu.
Currently, ovs uses hardcoded rate2quantum = 10 for each htb qdisc.
When qdisc class's rate is small, the resulting quantum (calculated
by min_rate / rate2quantum) will be smaller than MTU.  This is not
recommended and tc will keep complaining the following in syslog.

localhost kernel: HTB: quantum of class 10003 is small. Consider r2q change.
localhost kernel: HTB: quantum of class 10004 is small. Consider r2q change.
localhost kernel: HTB: quantum of class 10005 is small. Consider r2q change.
localhost kernel: HTB: quantum of class 10006 is small. Consider r2q change.

To fix the issue, this commit makes ovs always use htb quantum no less
than the MTU.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-27 13:48:39 -07:00
Ben Pfaff
2f4298ce26 netdev-linux: Fix build with old kernel headers.
Commit 677d9158fc0a (netdev-linux: Support for SFQ, FQ_CoDel and CoDel
qdiscs.) added support for new qdiscs.  The commit uses TCA_CODEL_* and
TCA_FQ_CODEL_* not in old kernel headers, causing a build failure against
such headers.  This commit should fix the problem by defining these values
ourselves.  (I haven't tested it against old headers, so I might have
missed something, but it's a straightforward change and at worst won't do
harm.)

It appears that sfq (also added by the same commit) was in Linux before
2.6.32, so it seems unlikely that we need any compatibility code there.

CC: Jonathan Vestin <jonavest@kau.se>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-03-24 14:06:46 -07:00
Jonathan Vestin
677d9158fc netdev-linux: Support for SFQ, FQ_CoDel and CoDel qdiscs.
This patch adds support for SFQ, CoDel and FQ_CoDel classless qdiscs to Open vSwitch. It also removes the requirement for a QoS to have at least one Queue (as this makes no sense when using classless qdiscs). I have also not implemented class_{get,set,delete,get_stats,dump_stats} because they are meant for qdiscs with classes.

Signed-off-by: Jonathan Vestin <jonavest@kau.se>
[blp@nicira.com mostly applied stylistic changes]
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-03-23 13:55:30 -07:00
Ben Pfaff
ac3e3aaa0a netdev-linux: Avoid RTM_GETQDISC bug workaround on new-enough kernels.
Otherwise we can't detect classless qdiscs.

This has no useful effect for the currently supported qdiscs, which all
have classes, but it makes it possible to add support for new classless
qdiscs.

This suddenly makes netdev-linux complain about qdiscs it doesn't know
about (e.g. pfifo_fast), which isn't too useful, so this commit also
demotes that INFO message to DBG level.

Reported-by: Jonathan Vestin <jonavest@kau.se>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-03-18 08:28:18 -07:00
Ben Pfaff
c7952afb12 netdev-linux: Be more careful about integer overflow in policing.
Otherwise the policing limits could make no sense if large rates were
specified.

Reported-by: Zhangguanghui <zhang.guanghui@h3c.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
2015-03-17 12:35:50 -07:00
Pravin B Shelar
cf62fa4c70 dp-packet: Remove ofpbuf dependency.
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-03 13:37:37 -08:00
Pravin B Shelar
e14deea0bd dpif_packet: Rename to dp_packet
dp_packet is short and better name for datapath packet
structure.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-03-03 13:37:34 -08:00
Thomas Graf
e6211adce4 lib: Move vlog.h to <openvswitch/vlog.h>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:19 +01:00
Pravin B Shelar
a36de779d7 openvswitch: Userspace tunneling.
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.

Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-12 15:08:33 -08:00
Ben Pfaff
fa373af463 netdev-linux: Avoid depending on kernel definition of rtnl_link_stats64.
We have to define our own with some kernel headers, so we might as well do
it everywhere, especially since there seems to be a problem with detecting
the presence of the definition with at least some kernels.

Reported-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2014-10-30 12:47:01 -07:00