2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

217 Commits

Author SHA1 Message Date
Pravin B Shelar
32b3c5338a netdev: Fix sockaddr cast warning.
Following warning was reported by Travis:-

lib/netdev.c:1916:19: error: cast from 'struct sockaddr *' to 'struct
sockaddr_in *' increases required alignment from 2 to 4
[-Werror,-Wcast-align]
            sin = (struct sockaddr_in *) ifa->ifa_netmask;
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/netdev.c:1924:20: error: cast from 'struct sockaddr *' to 'struct
sockaddr_in6 *' increases required alignment from 2 to 4
[-Werror,-Wcast-align]
            sin6 = (struct sockaddr_in6 *) ifa->ifa_netmask;

Fixes: 3f31aded6 ("netdev: fix netmask in netdev_get_addrs").
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
2016-11-16 11:18:56 -08:00
Thadeu Lima de Souza Cascardo
3f31aded62 netdev: fix netmask in netdev_get_addrs
When iterating on getifaddrs result, ifa_netmask is dereferenced, but it's
already a pointer to struct sockaddr. This would result in wrong masks being
used when comparing addresses while calculating the source address given a
destination address at the routing code.

For example, the mask ::ffff:116.85.0.0 would be used, causing 172.16.100.0/24
to match 172.16.101.1, though they should not match.

This will not happen when using a dummy netdev, as netdev_get_addrs is not used
by it.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-11-15 12:31:19 -08:00
Huanle Han
1388f0b7a4 netdev: Avoid leaking seq in netdev_open() error path.
Signed-off-by: Huanle Han <hanxueluo@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-09-20 08:52:51 -07:00
Daniele Di Proietto
3a414a0a4f ofproto: Honor mtu_request even for internal ports.
By default Open vSwitch tries to configure internal interfaces MTU to
match the bridge minimum, overriding any attempt by the user to
configure it through standard system tools, or the database.

While this works in many simple cases (there are probably many users
that rely on this) it may create problems for more advanced use cases
(like any overlay networks).

This commit allows the user to override the default behavior by
providing an explict MTU in the mtu_request column in the Interface
table.

This means that Open vSwitch will now treat differently database MTU
requests from standard system tools MTU requests (coming from `ip link`
or `ifconfig`), but this seems the best way to remain compatible with
old users while providing a more powerful interface.

Suggested-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Tested-by: Joe Stringer <joe@ovn.org>
2016-09-02 16:01:12 -07:00
Thadeu Lima de Souza Cascardo
74d46929c6 Revert "netdev: do not allow devices to be opened with conflicting types"
This reverts commit d2fa6c676a13e86acc7f17261b2d87484f625d45.

When doing a restart, the routing table will open ports as system, which
prevents internal ports to be opened with the right type. That causes failures
in creating the ports.

We should revisit this patch after finding a proper fix on the routing table
layer.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-16 10:37:20 -07:00
Daniele Di Proietto
1c33f0c35e netdev: Pass 'netdev_class' to ->run() and ->wait().
This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-08-15 11:07:37 -07:00
Daniele Di Proietto
4124cb1254 netdev: Make netdev_set_mtu() netdev parameter non-const.
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-08-12 19:32:12 -07:00
Mark Kavanagh
c90e4d9c95 netdev-provider: fix comments for netdev_rxq_recv
Commit 64839cf43 applies batch objects to netdev-providers, but
some comments were not updated accordingly. Fix these:
   - replace 'pkts' with 'batch'
   - replace '*cnt' with 'batch->count'
   - replace MAX_RX_BATCH with NETDEV_MAX_BURST
   - remove superfluous whitespace

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-27 16:16:05 -07:00
Thadeu Lima de Souza Cascardo
d2fa6c676a netdev: do not allow devices to be opened with conflicting types
When a device is already opened, netdev_open should verify that the types match,
or else return an error.

Otherwise, users might expect to open a device with a certain type and get a
handle belonging to a different type.

This also prevents certain conflicting configurations that would have a port of
a certain type in the database and one of a different type on the system.

For example, when adding an interface with a type other than system, and there
is already a system interface with the same name, as the routing table will hold
a reference to that system interface, some conflicts will arise. The netdev will
be opened with the incorrect type and that will make vswitchd remove it, but
adding it again will fail as it already exists. Failing earlier prevents some
vswitchd loops in reconfiguring the interface.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-27 14:48:24 -07:00
Ilya Maximets
324c837485 dpif-netdev: XPS (Transmit Packet Steering) implementation.
If CPU number in pmd-cpu-mask is not divisible by the number of queues and
in a few more complex situations there may be unfair distribution of TX
queue-ids between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
        port: vhost-user1       queue-id: 1
        port: dpdk0     queue-id: 3
pmd thread numa_id 0 core_id 14:
        port: vhost-user1       queue-id: 2
pmd thread numa_id 0 core_id 16:
        port: dpdk0     queue-id: 0
pmd thread numa_id 0 core_id 17:
        port: dpdk0     queue-id: 1
pmd thread numa_id 0 core_id 12:
        port: vhost-user1       queue-id: 0
        port: dpdk0     queue-id: 2
pmd thread numa_id 0 core_id 15:
        port: vhost-user1       queue-id: 3
<------------------------------------------------------------------------>

As we can see above dpdk0 port polled by threads on cores:
	12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. This queue-id's are sequential similar to core-id's. And
thread will send packets to queue with exact this queue-id regardless
of port.

In previous example:

	pmd thread on core 12 will send packets to tx queue 0
	pmd thread on core 13 will send packets to tx queue 1
	...
	pmd thread on core 17 will send packets to tx queue 5

So, for dpdk0 port after truncating in netdev-dpdk:

	core 12 --> TX queue-id 0 % 4 == 0
	core 13 --> TX queue-id 1 % 4 == 1
	core 16 --> TX queue-id 4 % 4 == 0
	core 17 --> TX queue-id 5 % 4 == 1

As a result only 2 of 4 queues used.

To fix this issue some kind of XPS implemented in following way:

	* TX queue-ids are allocated dynamically.
	* When PMD thread first time tries to send packets to new port
	  it allocates less used TX queue for this port.
	* PMD threads periodically performes revalidation of
	  allocated TX queue-ids. If queue wasn't used in last
	  XPS_TIMEOUT_MS milliseconds it will be freed while revalidation.
        * XPS is not working if we have enough TX queues.

Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-27 12:56:04 -07:00
Terry Wilson
ee89ea7b47 json: Move from lib to include/openvswitch.
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.

Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-22 17:09:17 -07:00
William Tu
64839cf432 netdev-provider: Apply batch object to netdev provider.
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata.  This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers.  With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-21 16:46:32 -07:00
William Tu
aaca4fe0ce ofp-actions: Add truncate action.
The patch adds a new action to support packet truncation.  The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).

One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload.  Example
use case is below as well as shown in the testcases:

    - Output to port 1 with max_len 100 bytes.
    - The output packet size on port 1 will be MIN(original_packet_size, 100).
    # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

    - The scope of max_len is limited to output action itself.  The following
      packet size of output:1 and output:2 will be intact.
    # ovs-ofctl add-flow br0 \
            'actions=output(port=1,max_len=100),output:1,output:2'
    - The Datapath actions shows:
    # Datapath actions: trunc(100),1,1,2

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 09:17:00 -07:00
Pravin B Shelar
4975aa3ee6 netdev-native-tnl: Introduce ip_build_header()
The native tunneling build tunnel header code is spread across
two different modules, it makes pretty hard to follow the code.
Following patch refactors the code to move all code to
netdev-ative-tnl module.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-05-23 20:27:14 -07:00
Daniele Di Proietto
050c60bfb5 netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.

When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().

The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().

This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
  via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-05-23 10:27:42 -07:00
Daniele Di Proietto
790fb3b745 netdev: Add reconfigure request mechanism.
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.

This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()).  The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required().  After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.

This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2016-05-23 10:27:42 -07:00
Pravin B Shelar
9235b4793e dpif-netdev: Fix memory leak in tunnel header pop action.
The tunnel header pop action can leak batch of packet
in case of error. Following patch fixex the error code path.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-05-18 19:39:18 -07:00
Pravin B Shelar
1895cc8dbb dpif-netdev: create batch object
DPDK datapath operate on batch of packets. To pass the batch of
packets around we use packets array and count.  Next patch needs
to associate meta-data with each batch of packets. So Introducing
a batch structure to make handling the metadata easier.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-05-18 19:39:18 -07:00
Pravin B Shelar
1c8f98d96a netdev: Return number of packet from netdev_pop_header()
Current tunnel-pop API does not allow the netdev implementation
retain a packet but STT can keep a packet from batch of packets
during TCP reassembly processing. To return exact count of
valid packet STT need to pass this number of packet parameter
as a reference.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
2016-05-18 19:39:18 -07:00
Ciara Loftus
d640a7e100 netdev: Initialise DPDK netdev classes only once
DPDK netdev classes were being initialised twice, resulting in warning
logs like so:

netdev|WARN|attempted to register duplicate netdev provider: dpdk

This commit removes one of the initialisation calls.

Fixes: 0692257923fe ("netdev: Fix potential deadlock.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-05-17 19:15:08 -07:00
Ben Pfaff
0692257923 netdev: Fix potential deadlock.
Until now, netdev_class_mutex and route_table_mutex could be taken in
either order:

    * netdev_run() takes netdev_class_mutex, then netdev_vport_run() calls
      route_table_run(), which takes route_table_mutex.

    * route_table_init() takes route_table_mutex and then eventually calls
      netdev_open(), which takes netdev_class_mutex.

This commit fixes the problem by converting the netdev_classes hmap,
protected by netdev_class_mutex, into a cmap protected on the read
side by RCU.  Only a very small amount of code actually writes to the
cmap in question, so it's a lot easier to understand the locking rules
at that point.  In particular, there's no need to take netdev_class_mutex
from either netdev_run() or netdev_open(), so neither of the code paths
above determines a lock ordering any longer.

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-February/020216.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Tested-by: William Tu <u9012063@gmail.com>
2016-05-09 16:42:57 -07:00
mweglicx
d6e3feb57c Add support for extended netdev statistics based on RFC 2819.
Implementation of new statistics extension for DPDK ports:
- Add new counters definition to netdev struct and open flow,
  based on RFC2819.
- Initialize netdev statistics as "filtered out"
  before passing it to particular netdev implementation
  (because of that change, statistics which are not
  collected are reported as filtered out, and some
  unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
  contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
  and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of statistics (RX packet sized
  based counters).

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-05-06 15:28:56 -07:00
Aaron Conole
bab6940971 netdev-dpdk: Convert initialization from cmdline to db
Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.

This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-04-29 15:07:39 -07:00
Thadeu Lima de Souza Cascardo
3e6dc8b7a8 netdev: Verify ifa_addr is not NULL when iterating over getifaddrs.
Some point-to-point devices like TUN devices will not have an address, and while
iterating over ifaddrs, its ifa_addr will be NULL. This patch fixes a crash when
starting ovs-vswitchd on a system with such a device.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Fixes: a8704b502785 ("tunneling: Handle multiple ip address for given device.")
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 16:59:48 -07:00
Ben Warren
417e7e66e1 list: Rename all functions in list.h with ovs_ prefix.
This attempts to prevent namespace collisions with other list libraries

Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 13:04:32 -07:00
Ben Warren
b19bab5b20 list: Remove lib/list.h completely.
All code is now in include/openvswitch/list.h.

Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 13:01:21 -07:00
Pravin B Shelar
6b6e13293e netdev: remove netdev_get_in4()
Since netdev can have multiple IP address use
generic api netdev_get_addr_list().  This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-03-24 09:30:57 -07:00
Pravin B Shelar
a8704b5027 tunneling: Handle multiple ip address for given device.
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-03-24 09:30:57 -07:00
Ben Warren
3e8a2ad145 Move lib/dynamic-string.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-19 10:02:12 -07:00
Ilya Maximets
118c77b1a8 netdev: New field 'is_pmd' in netdev_class.
Made to simplify creation of derived classes.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-03-16 17:03:07 -07:00
Ben Pfaff
e175dc83c5 netdev: Improve comments on netdev_rxq_recv().
The comment was incomplete in some ways and simply wrong in others.

Also ensure that *cnt is set to 0 if an error is encountered.  It's nice
when callers can rely on this.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2016-03-07 20:43:17 -08:00
Ilya Maximets
ce179f1163 dpif-netdev: Add dpif-netdev/pmd-rxq-show appctl command.
This command can be used to check the port/rxq assignment to
pmd threads. For each pmd thread of the datapath shows list
of queue-ids with port names.

Additionally log message from pmd_thread_main() extended with
queue-id, and type of this message changed from INFO to DBG.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-02-22 16:37:17 -08:00
Ben Pfaff
7dbd520ed9 netdev: Free packets in netdev_send() for devices that don't support send.
This manifested as a memory leak in test 898 "ofproto-dpif - sFlow packet
sampling - tunnel set", which included an output to a tunnel vport that
doesn't have an implementation of netdev_send().

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-February/065873.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-02-08 22:42:59 -08:00
Ilya Maximets
a14b8947fd dpif-netdev: Allow different numbers of rx queues for different ports.
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.

Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.

Example:
	ovs-vsctl set Interface dpdk0 options:n_rxq=8

Old 'other_config:n-dpdk-rxqs' deleted.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-02-04 17:10:45 -08:00
Jarno Rajahalme
74ff3298c8 userspace: Define and use struct eth_addr.
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace.  The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.

"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.

struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned.  All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.

As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.

This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.

This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes.  However, I think this
might be a nice code readability improvement by itself.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-08-28 14:55:11 -07:00
Jesse Gross
35303d715b tunnels: Don't initialize unnecessary packet metadata.
The addition of Geneve options to packet metadata significantly
expanded its size. It was reported that this can decrease performance
for DPDK ports by up to 25% since we need to initialize the whole
structure on each packet receive.

It is not really necessary to zero out the entire structure because
miniflow_extract() only copies the tunnel metadata when particular
fields indicate that it is valid. Therefore, as long as we zero out
these fields when the metadata is initialized and ensure that the
rest of the structure is correctly set in the presence of a tunnel,
we can avoid touching the tunnel fields on packet reception.

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-07-01 15:24:04 -07:00
Justin Pettit
2d34dbd9e1 Merge remote-tracking branch 'origin/master' into ovn4 2015-06-18 22:02:55 -07:00
Ben Pfaff
7a82d30569 netdev: Initialize at the beginning of netdev_unregister_provider().
Otherwise, if netdev_unregister_provider() is called before any other
netdev function, netdev_class_mutex is not initialized and the attempt to
lock it aborts.

This doesn't fix an existing bug but with the following commit
--enable-dummy=system will make netdev_unregister_provider() the first
netdev function to be called.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2015-06-16 08:21:33 -07:00
Ciara Loftus
7d1ced0177 netdev-dpdk: add dpdk vhost-user ports
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser.

A new dpdkvhostuser port will create a unix domain socket which
when provided to QEMU is used to facilitate communication between
the virtio-net device on the VM and the OVS port on the host.

vhost-cuse ('dpdkvhost') ports are still available as 'dpdkvhostcuse'
ports and will be enabled if vhost-cuse support is detected in the
DPDK build specified during compilation of the switch. Otherwise,
vhost-user ports are enabled.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-06-14 20:36:52 -07:00
Daniele Di Proietto
a0cb2d66f5 netdev-dpdk: Adapt the requested number of tx and rx queues.
This commit changes the semantics of 'netdev_set_multiq()' to allow OVS
DPDK to run on device with limited multi queue support.

* If a netdev doesn't have the requested number of rxqs it can simply
  inform the datapath without failing.
* If a netdev doesn't have the requested number of txqs it should try
  to create as many as possible and use locking.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-05-22 11:28:19 -07:00
Jesse Gross
e62d903bca tunneling: Invalid packets should be cleared.
If we receive a packet with an invalid tunnel header, we
should drop the packet without further processing. Currently
we do this by removing any parsed tunnel metadata. However,
this is not sufficient to stop processing - this only results
in the packet getting dropped by chance when something
usually runs across part of the packet that does not make
sense. Since both the packet and its metadata are in an
inconsistent state, it's also possible that the result is
an ovs-vswitchd crash or forwarding of a mangled packet.

Rather than clear the metadata, an alternate solution is to
remove all of the packet data. This guarantees that the
packet gets dropped during the next round of processing.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-04-09 14:29:08 -07:00
Jesse Gross
d625fbd13e tunneling: Convert tunnel push/pop functions to act on single packets.
The userspace tunneling API for pushing and popping tunnel headers
is currently based on processing batches of packets. However, there
is no obvious way to take advantage of batching for these operations
and so each tunnel operation has a pair of loops to process the
batch. This changes the API to operate on single packets to enable
better code reuse.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-04-09 14:29:08 -07:00
Ricky Li
c876a4bb9b netdev: Fix user space tunneling for set_tunnel action.
e.g. Set tunnel id for encapsulated VxLAN packet (out_key=flow):

ovs-vsctl add-port int-br vxlan0 -- set interface vxlan0 \
    type=vxlan options:remote_ip=172.168.1.2 options:out_key=flow

ovs-ofctl add-flow int-br in_port=LOCAL, icmp,\
    actions=set_tunnel:3, output:1 (1 is the port# of vxlan0)

Output tunnel ID should be modified to 3 with this patch.

Signed-off-by: Ricky Li <ricky.li@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-03-26 18:56:12 -07:00
Kevin Traynor
58397e6c1e netdev-dpdk: add dpdk vhost-cuse ports
This patch adds support for a new port type to userspace datapath
called dpdkvhost. This allows KVM (QEMU) to offload the servicing
of virtio-net devices to its associated dpdkvhost port. Instructions
for use are in INSTALL.DPDK.

This has been tested on Intel multi-core platforms and with clients
that have virtio-net interfaces.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-03-19 20:26:03 -07:00
Pravin B Shelar
cf62fa4c70 dp-packet: Remove ofpbuf dependency.
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-03 13:37:37 -08:00
Pravin B Shelar
e14deea0bd dpif_packet: Rename to dp_packet
dp_packet is short and better name for datapath packet
structure.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-03-03 13:37:34 -08:00
Thomas Graf
e6211adce4 lib: Move vlog.h to <openvswitch/vlog.h>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:19 +01:00
Thomas Graf
ca6ba70092 list: Rename struct list to struct ovs_list
struct list is a common name and can't be used in public headers.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:12 +01:00
Pravin B Shelar
36f29fb1a7 dpif: Fix initialization order.
OVS router depends on tnl_conf_seq and all tunnel related
components should be initialized before registering dpif
implementations.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-11-24 17:12:20 -08:00
Pravin B Shelar
a36de779d7 openvswitch: Userspace tunneling.
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.

Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-12 15:08:33 -08:00