mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00

155 Commits

Author SHA1 Message Date
Mark Michelson
b6e840aed0 pcap-file: Add nanosecond resolution pcap support.
PCAP header magic numbers are different for microsecond and nanosecond
resolution timestamps. This patch adds support for understanding the
difference and reporting the time correctly with ovs_pcap_read().

When writing pcap files, OVS will always use microsecond resolution, so
no new calculations were added to those functions.
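
As a rough illustration of the distinction described above (not the code
from this patch), a reader could classify the file by its magic number. The
four magic values are the standard pcap ones; the helper names
pcap_ts_resolution() and pcap_subsec_to_nsec() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Standard pcap magic numbers as seen in the reader's native byte order.
 * The "swapped" forms appear when the file was written on a host with the
 * opposite endianness. */
#define PCAP_MAGIC_USEC         0xa1b2c3d4u
#define PCAP_MAGIC_NSEC         0xa1b23c4du
#define PCAP_MAGIC_USEC_SWAPPED 0xd4c3b2a1u
#define PCAP_MAGIC_NSEC_SWAPPED 0x4d3cb2a1u

enum ts_resolution { TS_UNKNOWN, TS_USEC, TS_NSEC };

/* Hypothetical helper: maps a file magic number to its timestamp
 * resolution. */
static enum ts_resolution
pcap_ts_resolution(uint32_t magic)
{
    switch (magic) {
    case PCAP_MAGIC_USEC:
    case PCAP_MAGIC_USEC_SWAPPED:
        return TS_USEC;
    case PCAP_MAGIC_NSEC:
    case PCAP_MAGIC_NSEC_SWAPPED:
        return TS_NSEC;
    default:
        return TS_UNKNOWN;
    }
}

/* Hypothetical helper: converts the sub-second field of a record header to
 * nanoseconds, whatever the file's resolution is. */
static uint64_t
pcap_subsec_to_nsec(enum ts_resolution res, uint32_t subsec)
{
    assert(res != TS_UNKNOWN);
    return res == TS_USEC ? (uint64_t) subsec * 1000 : subsec;
}
```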

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-10-05 17:35:07 -07:00
Ben Pfaff
89c09c1cd1 netdev: Clean up class initialization.
The macros are hard to read.  This makes it a little more readable.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-27 17:48:23 +01:00
John Hurley
88dcf2aa82 netdev-provider: add class op to get block_id
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.

Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:51:47 +02:00
Ben Pfaff
fa37affad3 Embrace anonymous unions.
Several OVS structs contain embedded named unions, like this:

struct {
    ...
    union {
        ...
    } u;
};

C11 standardized a feature that many compilers already implemented
anyway, where an embedded union may be unnamed, like this:

struct {
    ...
    union {
        ...
    };
};

This is more convenient because it allows the programmer to omit "u."
in many places.  OVS already used this feature in several places.  This
commit embraces it in several others.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Tested-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
2018-05-25 13:36:05 -07:00
Jan Scheurich
8492adc270 netdev: Add optional qfill output parameter to rxq_recv()
If the caller provides a non-NULL qfill pointer and the netdev
implementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantially slow down the recv() operation.
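
The calling convention can be sketched with a toy queue. Everything named
toy_* below is hypothetical; the real rxq_recv() signature and the DPDK
implementation are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_BURST 4

/* Hypothetical receive queue: 'pending' packets are waiting, and
 * 'can_report_fill' says whether the device can report its fill level. */
struct toy_rxq {
    int pending;
    int can_report_fill;
};

/* Receives up to TOY_BURST packets and returns how many were taken.  If
 * 'qfill' is non-NULL, stores the number of packets still queued, or
 * -ENOTSUP if the device cannot report it.  Callers that pass NULL pay
 * nothing for the feature. */
static int
toy_rxq_recv(struct toy_rxq *rxq, int *qfill)
{
    assert(rxq != NULL);

    int n = rxq->pending < TOY_BURST ? rxq->pending : TOY_BURST;
    rxq->pending -= n;

    if (qfill) {
        *qfill = rxq->can_report_fill ? rxq->pending : -ENOTSUP;
    }
    return n;
}
```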

A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.

This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Justin Pettit
e883448e3f dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-02-28 14:53:27 -08:00
Ben Pfaff
6f06837989 flow: Add some L7 payload data to most L4 protocols that accept it.
This makes traffic generated by flow_compose() look slightly more
realistic.  It requires lots of updates to tests, but at least the tests
themselves should be slightly more realistic too.

At the same time, add --l7 and --l7-len options to ofproto/trace to allow
users to specify the amount or contents of payloads that they want.

Suggested-by: Brad Cowie <brad@cowie.nz>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-01-27 08:58:31 -08:00
Ben Pfaff
ae9f2ce7c5 netdev-dummy: Lock mutex when retrieving custom stats.
Found by Clang.

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-01-10 16:07:53 -08:00
Michal Weglicki
971f4b394c netdev: Custom statistics.
- New get_custom_stats interface function is added to netdev. It
  allows particular netdev implementation to expose custom
  counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
  port type.
- DPDK-physical implementation uses xstats to collect statistics.
  Only dropped and error counters are exposed.

Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-10 15:29:13 -08:00
Ilya Maximets
ad8b0b4fe7 netdev: Remove useless cutlen.
Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Bhanuprakash Bodireddy
7a385993a6 netdev-dummy: Reorder elements in dummy_packet_stream structure.
By reordering elements in dummy_packet_stream structure, sum holes
can be reduced, thus saving a cache line.

Before: structure size: 784, sum holes: 56, cachelines:13
After : structure size: 768, sum holes: 40, cachelines:12
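
The effect is easy to reproduce on a much smaller, hypothetical struct. The
exact sizes depend on the ABI, so the sketch only relies on the reordered
layout never being larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: on a typical 64-bit ABI each uint8_t is
 * followed by 7 bytes of padding so that 'counter' and the struct size stay
 * 8-byte aligned. */
struct padded {
    uint8_t flag_a;
    uint64_t counter;
    uint8_t flag_b;
};

/* Same members, largest first: the two flags share one padded slot. */
struct reordered {
    uint64_t counter;
    uint8_t flag_a;
    uint8_t flag_b;
};

/* Bytes saved by reordering on the current ABI (8 on typical 64-bit
 * targets, but this is not guaranteed by the C standard). */
static size_t
bytes_saved(void)
{
    assert(sizeof(struct reordered) <= sizeof(struct padded));
    return sizeof(struct padded) - sizeof(struct reordered);
}
```

A tool such as pahole reports the same information (size, holes,
cachelines) for real structures.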

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 12:52:09 -07:00
Xiao Liang
fd016ae3fb lib: Move lib/poll-loop.h to include/openvswitch
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 10:47:55 -07:00
Yifeng Sun
dac0fb811e netdev-dummy: Avoid double-free in netdev_dummy_ip4addr().
netdev_dummy_ip6addr() calls netdev_close() twice though it increases
netdev's reference only once from netdev_from_name(). As a result, Valgrind
test 788 (tunnel_push_pop - action) reports the error below:

==20465== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
 Invalid read of size 8
    at 0x493FE0: netdev_get_name (netdev.c:911)
    by 0x5125D3: tnl_port_map_delete_ipdev (tnl-ports.c:470)
    by 0x4E551C: __rt_entry_delete (ovs-router.c:252)
    by 0x4E64AA: ovs_router_flush (ovs-router.c:478)
    by 0x475CA8: call_hooks.part.2 (fatal-signal.c:254)
    by 0x5E53FF7: __run_exit_handlers (exit.c:82)
    by 0x5E54044: exit (exit.c:104)
    by 0x5E3A836: (below main) (libc-start.c:325)
  Address 0x65ea680 is 0 bytes inside a block of size 640 free'd
    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x492BA2: netdev_unref (netdev.c:572)
    by 0x41646E: ofport_destroy__ (ofproto.c:2516)
    by 0x41FD58: ofproto_destroy (ofproto.c:1645)
    by 0x40B96B: bridge_destroy (bridge.c:3273)
    by 0x410238: bridge_exit (bridge.c:506)
    by 0x40700E: main (ovs-vswitchd.c:135)
  Block was alloc'd at
    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x516A82: xcalloc (util.c:103)
    by 0x48D74D: netdev_dummy_alloc (netdev-dummy.c:661)
    by 0x4931D1: netdev_open.part.12 (netdev.c:406)
    by 0x40A985: iface_do_create (bridge.c:1784)
    by 0x40A985: iface_create (bridge.c:1837)
    by 0x40A985: bridge_add_ports__ (bridge.c:931)
    by 0x40C7EA: bridge_add_ports (bridge.c:947)
    by 0x40C7EA: bridge_reconfigure (bridge.c:663)
    by 0x410485: bridge_run (bridge.c:2998)
    by 0x406F64: main (ovs-vswitchd.c:119)

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-02 13:55:51 -07:00
Joe Stringer
df3a6d503e netdev-dummy: Fix minor style variation.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-08-09 16:56:30 -07:00
Ben Pfaff
360990eb1d netdev-dummy: Close pcap files when dummy device is closed.
Fixes a fd leak.

Reported-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
2017-08-08 16:54:04 -07:00
Ben Pfaff
a61a289119 dp-packet: New function dp_packet_get_send_len().
This function is useful in a few places for representing the packet's
length minus the cutlen.

Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 18:58:10 -07:00
Ben Pfaff
71f21279f6 Eliminate most shadowing for local variable names.
Shadowing is when a variable with a given name in an inner scope hides a
different variable with the same name in a surrounding scope.  This is
generally undesirable because it can confuse programmers.  This commit
eliminates most of it.

Found with -Wshadow=local in GCC 7.  The repo is not really ready to enable
this option by default because of a few cases that are harder to fix, and
harmless, such as nested use of CMAP_FOR_EACH.
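
A minimal, hypothetical example of the kind of bug that shadowing can hide,
together with the usual fix of renaming the inner variable:

```c
#include <assert.h>

/* Buggy on purpose: the inner 'total' shadows the outer one, so the outer
 * variable is never updated.  GCC 7 and later warn about this with
 * -Wshadow=local. */
static int
sum_shadowed(const int *values, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int total = values[i];      /* Shadows the outer 'total'. */
        (void) total;
    }
    return total;                   /* Still 0. */
}

/* Fixed: the inner variable has its own name. */
static int
sum_fixed(const int *values, int n)
{
    assert(n >= 0);

    int total = 0;
    for (int i = 0; i < n; i++) {
        int value = values[i];
        total += value;
    }
    return total;
}
```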

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
2017-08-02 15:03:35 -07:00
Andy Zhou
bc0f51765d flow: Refactor flow_compose() API.
Currently, flow_compose_size() is only supposed to be called after
flow_compose(). I find this API to be unintuitive.

Change flow_compose() API to take the 'size' argument, and
returns 'true' if the packet can be created, 'false' otherwise.

This change also improves error detection and reporting when
'size' is unreasonably small.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2017-07-27 15:22:39 -07:00
Ilya Maximets
1e2eecbbf7 netdev-dummy: Fix setting length in receive command.
Currently, if the '--len' option is passed to the 'netdev-dummy/receive'
command, only the 'size' field of the dp_packet changes.

This is incorrect behaviour, because memory for that size is not
allocated and the packet headers are not fixed up to reflect the new size.
This leads to flow_extract() failure, because it checks
'ip->tot_len' and stops further parsing if it doesn't match
dp_packet_size(). As a result, packets created while processing the
'receive' command can't be parsed to the same flow.
Additionally, this may lead to invalid memory accesses if someone
tries to read or modify the packet data.

Fix that by creating right packets using recently introduced
'flow_compose_size()'.

CC: Andy Zhou <azhou@ovn.org>
Fixes: d8ada2368cbe ("netdev-dummy: Add --len option for netdev-dummy/receive command")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-07-25 14:42:11 -07:00
Ben Pfaff
875ab13020 userspace: Handling of versatile tunnel ports
In netdev_gre_build_header(), the GRE protocol and VXLAN next_protocol are set
based on the packet_type of the flow. For an Ethernet packet, it is set to
ETH_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERTYPE, it is set
according to the name space type.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-27 17:28:30 -04:00
Paul Blakey
18ebd48cfb netdev: Adding a new netdev API to be used for offloading flows
Add a new API interface for offloading dpif flows to netdev.
The API consists of the following:
  flow_put - offload a new flow
  flow_get - query an offloaded flow
  flow_del - delete an offloaded flow
  flow_flush - flush all offloaded flows
  flow_dump_* - dump all offloaded flows

In upcoming commits we will introduce an implementation of this
API for netdev-linux.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-06-14 10:12:30 +02:00
Jan Scheurich
2482b0b0c8 userspace: Add packet_type in dp_packet and flow
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as preparation for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.

The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow version 1.5.

The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:

enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,             /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1,       /* ns_type is an Ethertype. */
    OFPHTN_IP_PROTO = 2,        /* ns_type is a IP protocol number. */
    OFPHTN_UDP_TCP_PORT = 3,    /* ns_type is a TCP or UDP port. */
    OFPHTN_IPV4_OPTION = 4,     /* ns_type is an IPv4 option number. */
};

The lower 16 bits specify the actual type in the context of the name space.

Only name spaces 0 and 1 will be supported for now.
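
The 16/16-bit split described above can be sketched as follows. The helper
names pt_pack(), pt_ns() and pt_ns_type() are hypothetical and are not the
macros OVS itself defines; the sketch only assumes the layout stated in this
message (name space in the upper 16 bits, type in the lower 16 bits, value
stored big-endian).

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>          /* htonl(), ntohl() */

/* Local stand-ins for OFPHTN_ONF and OFPHTN_ETHERTYPE. */
enum { NS_ONF = 0, NS_ETHERTYPE = 1 };

/* Builds a big-endian packet_type from a name space and a type. */
static uint32_t
pt_pack(uint16_t ns, uint16_t ns_type)
{
    return htonl(((uint32_t) ns << 16) | ns_type);
}

/* Extracts the name space (upper 16 bits). */
static uint16_t
pt_ns(uint32_t pt_be)
{
    return (uint16_t) (ntohl(pt_be) >> 16);
}

/* Extracts the type within the name space (lower 16 bits). */
static uint16_t
pt_ns_type(uint32_t pt_be)
{
    return (uint16_t) (ntohl(pt_be) & 0xffff);
}

/* True for (OFPHTN_ONF, 0), i.e. an Ethernet packet. */
static int
pt_is_ethernet(uint32_t pt_be)
{
    assert(NS_ONF == 0);
    return pt_be == pt_pack(NS_ONF, 0);
}
```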

For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.

In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
with the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.

The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the dl_type field of struct flow.

This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.

New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.

Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-03 16:56:40 -07:00
Andy Zhou
72c84bc2db dp-packet: Enhance packet batch APIs.
One common use case of 'struct dp_packet_batch' is to process all
packets in the batch in order. Add an iterator for this use case
to simplify the logic of call sites.

Another common use case is to drop packets in the batch, by reading
all packets, but writing back pointers of fewer packets. Add macros
to support this use case.
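
Both use cases can be sketched on a toy batch type. The toy_* names and the
TOY_BATCH_FOR_EACH macro are hypothetical and much simpler than the real
'struct dp_packet_batch' helpers; they only illustrate the in-order iterator
and the "read all, write back fewer" pattern.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH_MAX 32

struct toy_packet {
    size_t len;
};

struct toy_batch {
    size_t count;
    struct toy_packet *packets[TOY_BATCH_MAX];
};

/* Visits every packet of BATCH in order, binding each one to PKT. */
#define TOY_BATCH_FOR_EACH(IDX, PKT, BATCH)                           \
    for (size_t IDX = 0;                                              \
         IDX < (BATCH)->count && ((PKT) = (BATCH)->packets[IDX], 1);  \
         IDX++)

/* Sums the lengths of all packets using the iterator. */
static size_t
toy_batch_total_len(const struct toy_batch *batch)
{
    const struct toy_packet *pkt;
    size_t total = 0;

    TOY_BATCH_FOR_EACH (i, pkt, batch) {
        total += pkt->len;
    }
    return total;
}

/* Drops packets shorter than 'min_len' by reading every pointer and writing
 * back only the ones that are kept.  Returns the number of packets dropped.
 * (A real implementation would also free the dropped packets.) */
static size_t
toy_batch_drop_short(struct toy_batch *batch, size_t min_len)
{
    assert(batch->count <= TOY_BATCH_MAX);

    size_t orig = batch->count;
    size_t kept = 0;

    for (size_t i = 0; i < orig; i++) {
        struct toy_packet *pkt = batch->packets[i];
        if (pkt->len >= min_len) {
            batch->packets[kept++] = pkt;
        }
    }
    batch->count = kept;
    return orig - kept;
}
```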

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2017-01-26 17:35:29 -08:00
Andy Zhou
d8ada2368c netdev-dummy: Add --len option for netdev-dummy/receive command
Currently, there is no way to specify the packet size when injecting
a packet via "netdev-dummy/receive" with a flow specification. Thus
far, packet size is not important for testing OVS features, but it
becomes useful in writing unit tests for the future patches.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2017-01-26 15:02:50 -08:00
Daniele Di Proietto
9fff138ec3 netdev: Add 'errp' to set_config().
Since 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
set_config() is used to identify a DPDK device, so it's better to report
its detailed error message to the user.  Tunnel devices and patch ports
rely a lot on set_config() as well.

This commit adds a param to set_config() that can be used to return
an error message and makes use of that in netdev-dpdk and netdev-vport.

Before this patch:

$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set
    configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: could not set
    configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set
    configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

After this patch:

$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing
    'options:dpdk-devargs'. The old 'dpdk<port_id>' names are not
    supported.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires
    valid 'peer' argument.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type
    requires valid 'remote_ip' argument.  See ovs-vswitchd log for
    details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
2017-01-11 18:29:39 -08:00
nickcooper-zhangtonghao
bf9f6f80c0 netdev-dummy: Limits the number of tx/rx queues.
This patch prevents ovs_rcu from reporting a WARN, caused by blocking
for a long time, when ovs-vswitchd processes a port with many
rx/tx queues. The chosen limit on tx/rx queues per port should be
appropriate, because DPDK uses it as a default maximum.

Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2017-01-10 18:53:34 -08:00
nickcooper-zhangtonghao
cce57f8daa netdev-dummy: Uses the NR_QUEUE instead of magic numbers.
NR_QUEUE is defined in "lib/dpif-netdev.h", and netdev-dpdk
uses it instead of a magic number. netdev-dummy should do
the same.

Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2017-01-08 18:06:39 -08:00
nickcooper-zhangtonghao
56edfb185b datapath: Checks the MTU for netdev-dummy ports.
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-12-12 17:00:36 -08:00
Ilya Maximets
2a21e75796 netdev: Set the default number of queues at removal from the database
Expected behavior for attribute removal from the database is
resetting it to default value. Currently this doesn't work for
n_rxq/n_txq options of pmd netdevs (last requested value used):

	# ovs-vsctl set interface dpdk0 options:n_rxq=4
	# ovs-vsctl remove interface dpdk0 options n_rxq
	# ovs-appctl dpif/show | grep dpdk0
	  <...>
	  dpdk0 1/1: (dpdk: configured_rx_queues=4, <...> \
	                    requested_rx_queues=4,  <...>)

Fix that by using NR_QUEUE or 1 as a default value for 'smap_get_int'.

Fixes: a14b8947fd13 ("dpif-netdev: Allow different numbers of
                      rx queues for different ports.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-12-09 18:15:51 -08:00
Daniele Di Proietto
ae59d13433 tests: Add a new MTU test.
Also, netdev-dummy needs to call netdev_change_seq_changed() in
set_mtu().

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-08-15 11:07:47 -07:00
Daniele Di Proietto
e98d0cb3ac netdev-dummy: Add dummy-internal class.
"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.

This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.

The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-08-15 11:07:42 -07:00
Daniele Di Proietto
1c33f0c35e netdev: Pass 'netdev_class' to ->run() and ->wait().
This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-08-15 11:07:37 -07:00
Daniele Di Proietto
4124cb1254 netdev: Make netdev_set_mtu() netdev parameter non-const.
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-08-12 19:32:12 -07:00
Daniele Di Proietto
efe179e041 netdev-*: Do not use dp_packet_pad() in recv() functions.
All the netdevs used by dpif-netdev (except for netdev-dpdk) have a
dp_packet_pad() call in the receive function, probably because the
userspace datapath couldn't properly handle short packets.

This doesn't appear to be the case anymore.

This commit removes the call to have a more consistent behavior with the
kernel datapath.

All the testsuite changes in this commit adjust the expectations for
packet lengths in flow dumps and other stats.  There's only one fix in
ovn.at: one of the test_ip() functions generated an incomplete udp
packet, which was not a problem until now, because of the padding.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-29 14:08:10 -07:00
Ilya Maximets
324c837485 dpif-netdev: XPS (Transmit Packet Steering) implementation.
If the number of CPUs in pmd-cpu-mask is not divisible by the number of
queues, and in a few more complex situations, TX queue-ids may be
distributed unfairly between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
        port: vhost-user1       queue-id: 1
        port: dpdk0     queue-id: 3
pmd thread numa_id 0 core_id 14:
        port: vhost-user1       queue-id: 2
pmd thread numa_id 0 core_id 16:
        port: dpdk0     queue-id: 0
pmd thread numa_id 0 core_id 17:
        port: dpdk0     queue-id: 1
pmd thread numa_id 0 core_id 12:
        port: vhost-user1       queue-id: 0
        port: dpdk0     queue-id: 2
pmd thread numa_id 0 core_id 15:
        port: vhost-user1       queue-id: 3
<------------------------------------------------------------------------>

As we can see above, the dpdk0 port is polled by threads on cores:
	12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. These queue-ids are sequential, similar to core-ids, and a
thread will send packets to the queue with exactly this queue-id regardless
of port.

In previous example:

	pmd thread on core 12 will send packets to tx queue 0
	pmd thread on core 13 will send packets to tx queue 1
	...
	pmd thread on core 17 will send packets to tx queue 5

So, for dpdk0 port after truncating in netdev-dpdk:

	core 12 --> TX queue-id 0 % 4 == 0
	core 13 --> TX queue-id 1 % 4 == 1
	core 16 --> TX queue-id 4 % 4 == 0
	core 17 --> TX queue-id 5 % 4 == 1

As a result, only 2 of 4 queues are used.
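
The arithmetic above can be checked with a few lines of C. The function
name distinct_tx_queues() is hypothetical; it simply applies the
'qid % n_txq' truncation to each static queue-id and counts how many
different queues are hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TXQ 64

/* Counts how many distinct TX queues are used when every static queue-id in
 * 'qids' is truncated with 'qid % n_txq'. */
static int
distinct_tx_queues(const int *qids, size_t n, int n_txq)
{
    bool used[MAX_TXQ] = { false };
    int distinct = 0;

    assert(n_txq > 0 && n_txq <= MAX_TXQ);
    for (size_t i = 0; i < n; i++) {
        int q = qids[i] % n_txq;
        if (!used[q]) {
            used[q] = true;
            distinct++;
        }
    }
    return distinct;
}
```

For the dpdk0 example, the static queue-ids of cores 12, 13, 16 and 17 are
0, 1, 4 and 5, and with 4 TX queues the function reports 2 distinct queues.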

To fix this issue, some kind of XPS is implemented in the following way:

	* TX queue-ids are allocated dynamically.
	* When a PMD thread first tries to send packets to a new port,
	  it allocates the least used TX queue for this port.
	* PMD threads periodically perform revalidation of
	  allocated TX queue-ids. If a queue wasn't used in the last
	  XPS_TIMEOUT_MS milliseconds, it will be freed during revalidation.
	* XPS is not used if we have enough TX queues.
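
The "least used queue" rule in the list above can be sketched like this.
The toy_* names are hypothetical, and locking and the XPS_TIMEOUT_MS timer
are left out, so this is only an outline of the allocation policy, not the
dpif-netdev code.

```c
#include <assert.h>

#define TOY_N_TXQ 4

/* Hypothetical port: counts how many threads currently use each TX queue. */
struct toy_port {
    int txq_users[TOY_N_TXQ];
};

/* Picks the TX queue with the fewest users (lowest id wins ties) and
 * records the new user. */
static int
toy_xps_get_tx_qid(struct toy_port *port)
{
    int best = 0;

    for (int q = 1; q < TOY_N_TXQ; q++) {
        if (port->txq_users[q] < port->txq_users[best]) {
            best = q;
        }
    }
    port->txq_users[best]++;
    return best;
}

/* Releases a queue, as revalidation would do once the queue has not been
 * used for the timeout period. */
static void
toy_xps_put_tx_qid(struct toy_port *port, int qid)
{
    assert(qid >= 0 && qid < TOY_N_TXQ);
    assert(port->txq_users[qid] > 0);
    port->txq_users[qid]--;
}
```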

Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-27 12:56:04 -07:00
Terry Wilson
ee89ea7b47 json: Move from lib to include/openvswitch.
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.

Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-22 17:09:17 -07:00
Lance Richardson
7778360b0b netdev-dummy: fix crash with more than one passive connection
Investigation found that some of the occasional failures in the
"ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS" test case are caused
by ovs-vswitchd crashing with SIGSEGV. It turns out that the
crash occurs when the number of netdev-dummy passive connections
transitions from 1 to 2.  When xrealloc() copies the array of
dummy_packet_stream structures from the original buffer to a
newly allocated one, the struct ovs_list txq member of the structure
becomes corrupt (e.g. if ovs_list_is_empty() would have returned
true before the copy, it will return false after the copy, which
will lead to a crash when the bogus packet buffer on the list is
dereferenced).

Fix by taking a hint from David Wheeler and adding a level of
indirection.
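
The underlying problem is that a circular list head stores pointers to
itself, so a byte-wise copy of the enclosing struct leaves those pointers
aimed at the old buffer. A sketch of the indirection fix, using hypothetical
toy_* types rather than the real dummy_packet_stream code: the growable
array holds pointers, so reallocating it moves only the pointers and never
the structs that embed a list head.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list head, in the style of ovs_list. */
struct toy_list {
    struct toy_list *prev, *next;
};

static void
toy_list_init(struct toy_list *list)
{
    list->prev = list->next = list;
}

/* An empty list head points to itself. */
static int
toy_list_is_empty(const struct toy_list *list)
{
    return list->next == list;
}

struct toy_stream {
    struct toy_list txq;            /* Self-referential when empty. */
};

struct toy_conn {
    struct toy_stream **streams;    /* Array of pointers, not of structs. */
    size_t n_streams;
};

/* Adds a stream.  realloc() may move the pointer array, but each stream
 * stays at its original address, so its 'txq' remains valid. */
static struct toy_stream *
toy_conn_add_stream(struct toy_conn *conn)
{
    struct toy_stream *s = malloc(sizeof *s);
    struct toy_stream **grown = realloc(conn->streams,
                                        (conn->n_streams + 1)
                                        * sizeof(struct toy_stream *));
    if (!s || !grown) {
        abort();                    /* Out of memory: give up, as xmalloc()
                                     * style helpers do. */
    }
    toy_list_init(&s->txq);
    grown[conn->n_streams++] = s;
    conn->streams = grown;
    return s;
}
```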

Signed-off-by: Lance Richardson <lrichard@redhat.com>
[blp@ovn.org folded in an additional bug fix]
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-22 15:16:35 -07:00
William Tu
64839cf432 netdev-provider: Apply batch object to netdev provider.
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata.  This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers.  With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-21 16:46:32 -07:00
Ilya Maximets
d66e4a5e7e netdev-dummy: Add n_txq option.
Will be used for testing with different numbers of TX queues.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-08 15:27:37 -07:00
Daniele Di Proietto
d537e73a42 netdev-dummy: Allow configuring the numa_id for testing purposes.
This commit introduces an (undocumented) option for dummy Interfaces to
specify a dummy numa_id, to which the device belongs.  It will be used
to test the pmd threads in dpif-netdev.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-06-24 14:15:04 -07:00
William Tu
aaca4fe0ce ofp-actions: Add truncate action.
The patch adds a new action to support packet truncation.  The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).
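
In terms of arithmetic, the truncation amounts to the following. The helper
names toy_cutlen() and toy_send_len() are hypothetical; the sketch only
restates MIN(original_size, m) as "size minus the number of bytes cut".

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes to cut so that at most 'max_len' bytes are sent. */
static uint32_t
toy_cutlen(uint32_t size, uint32_t max_len)
{
    return size > max_len ? size - max_len : 0;
}

/* Length actually sent: the packet size minus the bytes cut, which equals
 * MIN(size, max_len) when 'cutlen' comes from toy_cutlen(). */
static uint32_t
toy_send_len(uint32_t size, uint32_t cutlen)
{
    assert(cutlen <= size);
    return size - cutlen;
}
```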

One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload.  Example
use case is below as well as shown in the testcases:

    - Output to port 1 with max_len 100 bytes.
    - The output packet size on port 1 will be MIN(original_packet_size, 100).
    # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

    - The scope of max_len is limited to output action itself.  The following
      packet size of output:1 and output:2 will be intact.
    # ovs-ofctl add-flow br0 \
            'actions=output(port=1,max_len=100),output:1,output:2'
    - The Datapath actions shows:
    # Datapath actions: trunc(100),1,1,2

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 09:17:00 -07:00
Daniele Di Proietto
f8cf65022b netdev-dummy: Introduce sched_yield() in rxq_recv() for pmd devices.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-06-07 11:15:01 -07:00
Ilya Maximets
9a81a63728 netdev-dummy: Add multiqueue support to dummy-pmd.
All previous multi-open logic preserved for rx queues.
Also, added new optional parameter '--qid' for 'netdev-dummy/receive'
in order to allow user to choose id of rx queue to which packet will
be sent.

Ex.:
	ovs-appctl netdev-dummy/receive p1 --qid 3 'in_port(1) ...'

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-06-06 18:13:49 -07:00
Ilya Maximets
f9176a3a7f netdev-dummy: Add dummy-pmd class.
'dummy-pmd' is a new dummy class,
created for the purpose of testing PMD interfaces.

Ex.:
	ovs-vsctl set interface <iface> type=dummy-pmd

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-06-06 18:10:17 -07:00
Daniele Di Proietto
050c60bfb5 netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduced reconfigure netdev call.

When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().

The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().

This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
  via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-05-23 10:27:42 -07:00
Daniele Di Proietto
790fb3b745 netdev: Add reconfigure request mechanism.
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.

This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()).  The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required().  After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.
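
The request/apply split can be outlined with a toy device. All toy_* names
are hypothetical; a plain flag stands in for whatever notification
mechanism the real netdev_request_reconfigure() uses, and no threads or
rxqs are modelled.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_netdev {
    int n_rxq;              /* Configuration currently in effect. */
    int requested_n_rxq;    /* Configuration asked for by the user. */
    bool reconf_required;   /* Set by the provider, cleared on apply. */
};

/* Provider side: remembers the new value and requests a restart instead of
 * changing the running device. */
static void
toy_set_n_rxq(struct toy_netdev *dev, int n_rxq)
{
    assert(n_rxq > 0);
    if (dev->requested_n_rxq != n_rxq) {
        dev->requested_n_rxq = n_rxq;
        dev->reconf_required = true;
    }
}

/* Upper layer: polls for a pending request. */
static bool
toy_is_reconf_required(const struct toy_netdev *dev)
{
    return dev->reconf_required;
}

/* Upper layer: called once all rxqs are closed, applies the request. */
static void
toy_reconfigure(struct toy_netdev *dev)
{
    dev->n_rxq = dev->requested_n_rxq;
    dev->reconf_required = false;
}
```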

This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2016-05-23 10:27:42 -07:00
mweglicx
d6e3feb57c Add support for extended netdev statistics based on RFC 2819.
Implementation of new statistics extension for DPDK ports:
- Add new counter definitions to netdev struct and OpenFlow,
  based on RFC2819.
- Initialize netdev statistics as "filtered out"
  before passing it to particular netdev implementation
  (because of that change, statistics which are not
  collected are reported as filtered out, and some
  unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
  contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
  and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of the statistics (RX packet
  size-based counters).

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-05-06 15:28:56 -07:00
Ben Warren
25d436fbd4 Move lib/ofp-print.h to include/openvswitch directory
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-04-14 16:38:32 -07:00
William Tu
91644f45c6 dp-packet: Fix use of uninitialised value at emc_lookup.
Valgrind reports "Conditional jump or move depends on uninitialised value"
and "Use of uninitialised value" at case 2016 ovn -- 3 HVs, 1 LS, 3
lports/HV.  It is caused by 1) assigning an uninitialized value to 'key->hash'
at emc_processing(). Due to uninit rss_hash_valid, dp_packet_rss_valid() might
return true and undefined hash value is returned, and 2) at emc_lookup, the
'current_entry->key.hash' could be uninitialized due to dp_packet_clone().
The patch fixes the two and as a result, a couple of calls to
dp_packet_rss_invalidate() become redundant and thus are removed.

Call stacks:
- Conditional jump or move depends on uninitialised value(s)
    dpif_netdev_packet_get_rss_hash (dpif-netdev.c:3334)
    emc_processing (dpif-netdev.c:3455)
    dp_netdev_input__ (dpif-netdev.c:3639)
and,
- Use of uninitialised value of size 8
    emc_lookup (dpif-netdev.c:1785)
    emc_processing (dpif-netdev.c:3457)
    dp_netdev_input__ (dpif-netdev.c:3639)

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-04-06 19:32:34 -07:00