2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

552 Commits

Author SHA1 Message Date
Bhanuprakash Bodireddy
37eabc706e dpif-netdev: Remove 'cnt' in dp_netdev_input__().
There is little use of 'cnt' variable in dp_netdev_input__(). Get rid of
it and use dp_packet_batch_size() to initialize PKT_ARRAY_SIZE.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:16:05 -07:00
Bhanuprakash Bodireddy
31c82130fc dpif-netdev: Use DP_PACKET_BATCH_FOR_EACH in fast_path_processing.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:11:17 -07:00
Bhanuprakash Bodireddy
79c81260c2 dpif-netdev: Use DP_PACKET_BATCH_FOR_EACH in dp_netdev_run_meter.
Use DP_PACKET_BATCH_FOR_EACH macro in dp_netdev_run_meter().

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:07:53 -07:00
Fischetti, Antonio
bde94613e6 dpif-netdev: Avoid reading RSS hash when EMC is disabled.
When EMC is disabled the reading of RSS hash is skipped.
Also, for packets that are not recirculated it retrieves
the hash value without considering the recirc id.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 01:56:28 -07:00
Ben Pfaff
a5781d3270 Merge branch 'dpdk_merge' of https://github.com/darball/ovs. 2017-09-12 07:12:53 -07:00
Ben Pfaff
4ee87ad31e dpif-netdev: Avoid side-effect in argument of atomic_store_relaxed().
Some of the implementations of atomic_store_relaxed() evaluate their
first argument more than once, so arguments with side effects cause
strange behavior.  This fixes a problem observed on 64-bit Windows.

Reported-by: Alin Serdean <aserdean@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Serdean <aserdean@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
2017-09-10 10:44:30 -07:00
Kevin Traynor
280802762b dpif-netdev: Fix a couple of coding style issues.
A couple of trivial fixes for a ternery operator placement
and pointer declaration.

Fixes: 655856ef39b9 ("dpif-netdev: Change rxq_scheduling to use rxq processing cycles.")
Fixes: a2ac666d5265 ("dpif-netdev: Change definitions of 'idle' & 'processing' cycles")
Cc: ciara.loftus@intel.com
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Cian Ferriter
45df9fef60 dpif-netdev: Rename "size" variable to "cnt".
Commit 72c84bc (dp-packet: Enhance packet batch APIs.) changed how the amount
of packets to be processed is retrieved. In the process, the patch used "size"
as the variable holding the amount of packets rather than "cnt". Change this
back to match with the "emc_processing()" comment.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Fischetti, Antonio
0230552087 dpif-netdev: Fix comments in function headers.
Fix comments for emc_processing and dp_netdev_input__
regarding md_is_valid.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Ilya Maximets
435c2797d0 dpif-netdev: Fix per packet cycles statistics.
DP_STAT_LOOKUP_HIT statistics used mistakenly for calculation
of total number of packets. This leads to completely wrong
per packet cycles statistics.

For example:

	emc hits:0
	megaflow hits:253702308
	avg. subtable lookups per hit:1.50
	miss:0
	lost:0
	avg cycles per packet: 248.32 (157498766585/634255770)

	In this case 634255770 total_packets value used for avg
	per packet calculation:

	  total_packets = 'megaflow hits' + 'megaflow hits' * 1.5

	The real value should be 524.38 (157498766585/253702308)

Fix that by summing only stats that reflect match/not match.
It's decided to make direct summing of required values instead of
disabling some stats in a loop to make calculations more clear and
avoid similar issues in the future.

CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Kevin Traynor
cd995c739a dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.
Rxqs consumed processing cycles are used to improve the balance
of how rxqs are assigned to pmds. Currently some reconfiguration
is needed to perform a reassignment.

Add an ovs-appctl command to perform a new assignment in order
to balance based on the latest rxq processing cycle information.

Note: Jan requested this for testing purposes.

Suggested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:54:26 -07:00
Kevin Traynor
79da1e411b dpif-netdev: Change pmd selection order.
Up to his point rxqs are sorted by processing cycles they
consumed and assigned to pmds in a round robin manner.

Ian pointed out that on wrap around the most loaded pmd will be
the next one to be assigned an additional rxq and that it would be
better to reverse the pmd order when wraparound occurs.

In other words, change from assigning by rr to assigning in a forward
and reverse cycle through pmds.

Also, now that the algorithm has finalized, document an example.

Suggested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:51:18 -07:00
Kevin Traynor
655856ef39 dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
Previously rxqs were assigned to pmds by round robin in
port/queue order.

Now that we have the processing cycles used for existing rxqs,
use that information to try and produced a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:48:01 -07:00
Kevin Traynor
4809891b2e dpif-netdev: Count the rxq processing cycles for an rxq.
Count the cycles used for processing an rxq during the
pmd rxq interval. As this is an in flight counter and
pmds run independently, also store the total cycles used
during the last full interval.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:44:25 -07:00
Kevin Traynor
c59e759f33 dpif-netdev: Add rxq processing cycle counters.
Add counters to dp_netdev_rxq which will later be used for storing the
processing cycles of an rxq. Processing cycles will be stored in reference
to a defined time interval. We will store the cycles of the current in progress
interval, a number of completed intervals and the sum of the completed
intervals.

cycles_count_intermediate was used to count cycles for a pmd. With some small
additions we can also use it to count the cycles used for processing an rxq.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:42:06 -07:00
Kevin Traynor
922b28d435 dpif-netdev: Change polled_queue to use dp_netdev_rxq.
Soon we will want to store processing cycle counts in the dp_netdev_rxq,
so use that as a basis for the polled_queue that pmd_thread_main uses.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:39:40 -07:00
Fischetti, Antonio
94053e66e3 conntrack: pass current time to conntrack_execute.
Current time is passed to conntrack_execute so it doesn't have
to recompute it again.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-24 22:23:33 -07:00
Jan Scheurich
1fc11c5948 Generic encap and decap support for NSH
This commit adds translation and netdev datapath support for generic
encap and decap actions for the NSH MD1 header. The generic encap and
decap actions are mapped to specific encap_nsh and decap_nsh actions
in the datapath.

The translation follows that general scheme that decap() of an NSH
packet triggers recirculation after decapsulation, while encap(nsh)
just modifies struct flow and sets the ctx->pending_encap flag to
generate the encap_nsh action at the next commit to be able to include
subsequent set_field actions for NSH headers.

Support for the flexible MD2 format using TLV properties is foreseen
in encap(nsh), but not yet fully implemented.

The CLI syntax for encap of NSH is
encap(nsh(md_type=1))
encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-07 11:26:17 -07:00
Ben Pfaff
71f21279f6 Eliminate most shadowing for local variable names.
Shadowing is when a variable with a given name in an inner scope hides a
different variable with the same name in a surrounding scope.  This is
generally undesirable because it can confuse programmers.  This commit
eliminates most of it.

Found with -Wshadow=local in GCC 7.  The repo is not really ready to enable
this option by default because of a few cases that are harder to fix, and
harmless, such as nested use of CMAP_FOR_EACH.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
2017-08-02 15:03:35 -07:00
Bhanuprakash Bodireddy
ca62bb16ab dpif-netdev: Reorder elements in dp_netdev_port structure.
By reordering the elements in dp_netdev_port structure, pad bytes can be
reduced there by saving a cache line. Marginal performance improvement
is also observed with this change.

Before: structure size: 136, holes: 7, sum padbytes:7, cachelines:3
After : structure size: 128, holes: 6, sum padbytes:0, cachelines:2

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:18:56 -07:00
Antonio Fischetti
ded30c74b1 dpctl: Add new 'ct-bkts' command.
With the command:
 ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.

By using a threshold:
 ovs-appctl dpctl/ct-bkts gt=N
for each bucket shows the number of connections when they
are greater than N.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:18:55 -07:00
Billy O'Mahony
c37813fdb0 dpif-netdev: Assign ports to pmds on non-local numa node.
Previously if there is no available (non-isolated) pmd on the numa node
for a port then the port is not polled at all. This can result in a
non-operational system until such time as nics are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:56 -07:00
Ilya Maximets
e215018b0b dpif-netdev: Don't uninit emc on reload.
There are many reasons for reloading of pmd threads:
	* reconfiguration of one of the ports.
	* Adjusting of static_tx_qid.
	* Adding new tx/rx ports.

In many cases EMC is still useful after reload and uninit
will only lead to unnecessary upcalls/classifier lookups.

Such behaviour slows down the datapath. Uninit itself slows
down the reload path. All this factors leads to additional
unexpected latencies/drops on events not directly connected
to current PMD thread.

Lets not uninitialize emc cache on reload path.
'emc_cache_slow_sweep()' and replacements should free all
the old/unwanted entries.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:55 -07:00
Ilya Maximets
140dd69946 dpif-netdev: Incremental addition/deletion of PMD threads.
Currently, change of 'pmd-cpu-mask' is very heavy operation.
It requires destroying of all the PMD threads and creating
them back. After that, all the threads will sleep until
ports' redistribution finished.

This patch adds ability to not stop the datapath while
adjusting number/placement of PMD threads. All not affected
threads will forward traffic without any additional latencies.

id-pool created for static tx queue ids to keep them sequential
in a flexible way. non-PMD thread will always have
static_tx_qid = 0 as it was before.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:50 -07:00
Justin Pettit
2575df07f6 dpif-netdev: Indicate support for various ct features.
The userspace datapath uses a structure to indicate supported features
that affects debug output.  This commit updates that structure to
indicate that "ct_state_nat", "ct_orig_tuple", and "ct_orig_tuple6" are
supported.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
2017-07-19 22:15:54 -07:00
Sugesh Chandran
7c12dfc527 tunneling: Avoid datapath-recirc by combining recirc actions at xlate.
This patch set removes the recirculation of encapsulated tunnel packets
if possible. It is done by computing the post tunnel actions at the time of
translation. The combined nested action set are programmed in the datapath
using CLONE action.

The following test results shows the performance improvement offered by
this optimization for tunnel encap.

          +-------------+
      dpdk0 |             |
         -->o    br-in    |
            |             o--> gre0
            +-------------+

                   --> LOCAL
            +-----------o-+
            |             | dpdk1
            |    br-p1    o-->
            |             |
            +-------------+

Test result on OVS master with DPDK 16.11.2 (Without optimization):

 # dpdk0

 RX packets         : 7037641.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 7730632.90  / sec
 RX rate            : 402.69 MB/sec

 # dpdk1

 TX packets         : 7037641.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 657.73 MB/sec
 TX processing cost per TX packets in nsec : 142.09

Test result on OVS master + DPDK 16.11.2 (With optimization):

 # dpdk0

 RX packets         : 9386809.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 5381496.40  / sec
 RX rate            : 537.11 MB/sec

 # dpdk1

 TX packets         : 9386809.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 877.29 MB/sec
 TX processing cost per TX packets in nsec : 106.53

The offered performance gain is approx 30%.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-19 14:34:20 -07:00
Justin Pettit
b2f4b622dd dpif-netdev: Initialize 'tun_md' member of match.
Found by valgrind.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-07-15 13:43:09 -07:00
Ilya Maximets
85a4f23811 dpif-netdev: Fix few comments.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-13 15:32:46 -07:00
Ilya Maximets
a3aa871111 dpif-netdev: Remove useless port checking.
Since commit ff073a71f9bb ("dpif-netdev: Use hmap instead of
list+array for tracking ports."), 'is_valid_port_number()' is
equal to 'port_no != ODPP_NONE', and the expression below will
never be true.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
2017-07-11 13:08:21 -07:00
Ciara Loftus
656238ee92 dpif-netdev: Fix insertion probability
emc_conditional_insert uses pmd->last_cycles and the packet's RSS hash
to generate a random number used to determine whether or not an emc
entry should be inserted. This works for single-packet bursts as
last_cycles is updated for each burst. However, for bursts > 1 packet,
where the packets in the batch generate the same RSS hash,
pmd->last_cycles remains constant for the entire burst also, and thus
cannot be used as a random number for each packet in the burst.

This commit replaces the use of pmd->last_cycles with random_uint32()
for this purpose and subsequently fixes the behavior of the
emc_insert_inv_prob setting for high-throughput (large bursts)
single-flow cases.

Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-11 13:03:08 -07:00
Antonio Fischetti
1401f6deb6 Fix coding style and some typos.
Fixes some lines exceeding 80 chars and a couple of typos.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-11 12:41:21 -07:00
Ciara Loftus
a2ac666d52 dpif-netdev: Change definitions of 'idle' & 'processing' cycles
Instead of counting all polling cycles as processing cycles, only count
the cycles where packets were received from the polling.

Signed-off-by: Georg Schmuecking <georg.schmuecking@ericsson.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Co-authored-by: Georg Schmuecking <georg.schmuecking@ericsson.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-06 16:49:50 -07:00
Ben Pfaff
0722f34109 odp-util: Use port names in output in more places.
Until now, ODP output only showed port names for in_port matches.  This
commit shows them in other places port numbers appear.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
2017-06-23 16:28:42 +08:00
Bhanuprakash Bodireddy
1cc1b5f6b0 dpif-netdev: Skip invoking qsort on empty list.
sorted_poll_list() returns the sorted list of rxqs mapped to PMD thread
along with the rxq count. Skip sorting the list if there are no rxqs
mapped to the PMD thread. This can be reproduced with manual pinning and
'dpif-netdev/pmd-rxq-show' command.

Also Clang reports that null argument is passed to qsort in this case.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-20 10:33:46 +08:00
Ben Pfaff
81765c00a1 openvswitch.h: Use odp_port_t for port numbers in userspace-only structs.
Using the correct type reduces the need for type conversions.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Reviewed-by: nickcooper-zhangtonghao <nic@opencloud.tech>
2017-06-20 07:35:49 +08:00
Paul Blakey
7e8b719938 dpctl: Add an option to dump only certain kinds of flows
Usage:
    # to dump all datapath flows (default):
    ovs-dpctl dump-flows

    # to dump only flows that in kernel datapath:
    ovs-dpctl dump-flows type=ovs

    # to dump only flows that are offloaded:
    ovs-dpctl dump-flows type=offloaded

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-06-15 11:53:06 +02:00
Darrell Ball
4cddb1f0d8 dpdk: Parse NAT netlink for userspace datapath.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-02 15:07:16 -07:00
Jan Scheurich
beb75a40fd userspace: Switching of L3 packets in L2 pipeline
Ports have a new layer3 attribute if they send/receive L3 packets.

The packet_type included in structs dp_packet and flow is considered in
ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.

A dummy ethernet header is pushed to L3 packets received from L3 ports
before the the pipeline processing starts. The ethernet header is popped
before sending a packet to a L3 port.

For datapath ports that can receive L2 or L3 packets, the packet_type
becomes part of the flow key for datapath flows and is handled
appropriately in dpif-netdev.

In the 'else' branch in flow_put_on_pmd() function, the additional check
flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls
lookup is sufficient to uniquely identify a flow and b) it caused false
negatives because the flow in netdev->flow may not properly masked.

In dpif_netdev_flow_put() we now use the same method for constructing the
netdev_flow_key as the one used when adding the flow to the dplcs to make sure
these always match. The function netdev_flow_key_from_flow() used so far was
not only inefficient but sometimes caused mismatches and subsequent flow
update failures.

The kernel datapath does not support the packet_type match field.
Instead it encodes the packet type implictly by the presence or absence of
the Ethernet attribute in the flow key and mask.
This patch filters the PACKET_TYPE attribute out of netlink flow key and
mask to be sent to the kernel datapath.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-02 10:15:20 -07:00
Ben Pfaff
f582b6df9e dpif-netdev: Fix use-after-free error in reconfigure_datapath().
Found by Coverity.

Reported-at: https://scan3.coverity.com/reports.htm#v16889/p10449/fileInstanceId=14762915&defectInstanceId=4305352&mergedDefectId=180430
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2017-06-01 16:20:53 -07:00
Eelco Chaudron
34d8e04bec dpif-netdev: The pmd-*-show commands will show info in core order
The "ovs-appctl dpif-netdev/pmd-rxq-show" and "ovs-appctl
dpif-netdev/pmd-stats-show" commands show their output per core_id,
sorted on the hash location. My OCD was kicking in when using these
commands, hence this change to display them in natural core_id order.

In addition I had to change a test case that would fail if the cores
where not in order in the hash list. This is due to OVS assigning
queues to cores based on the order in the hash list. The test case now
checks if any core has the set of queues in the given order.

Manually tested this on my setup, and ran clang-analyze.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-18 15:40:33 -07:00
Bhanuprakash Bodireddy
1859876c04 dpif-netdev: Fix comments for dp_netdev_pmd_thread struct.
The sorted subtable ranking patch introduced a classifier instance per
ingress port with its subtables ranked on the frequency of hits. The PMD
thread can have more classifier instances now and solely depends on the
number of ingress ports currently handled by the pmd thread.

Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables")
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-18 14:17:38 -07:00
Bhanuprakash Bodireddy
65dcf3da40 dpif-netdev: Reorder elements in dp_netdev structure.
'emc_insert_min' variable is made to align on a 64-byte boundary and this
introduces a 24 byte hole.

This patch moves the emc_insert_min member variable slightly higher in
the order to remove the hole and thus saves a cache line with the new
ordering.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Georg Schmuecking <georg.schmuecking@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2017-05-18 14:13:06 -07:00
Bhanuprakash Bodireddy
f79b1ddb84 dpif-netdev: Skip EMC lookup when EMC is disabled.
Conditional EMC insert patch gives the flexibility to configure the
probability of flow insertion in to EMC. This also allows an option to
entirely disable EMC by setting 'emc-insert-inv-prob=0' which can be
useful at large number of parallel flows.

This patch skips EMC lookup when EMC is disabled. This is useful to
avoid wasting CPU cycles and also improve performance considerably.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Georg Schmuecking <georg.schmuecking@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Darrell Ball dlu998@gmail.com
2017-05-18 14:09:28 -07:00
Kevin Traynor
47a45d868f dpif-netdev/netdev-dpdk: Fix line lengths.
Fix line lengths to be <= 79 as per coding style and so that checkpatch
will not show up existing warnings on these files.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-18 13:48:38 -07:00
Joe Stringer
f5f64552ec Revert "tunneling: Avoid recirculation on datapath."
This reverts commit f1dac5128ce6db2e493f0d1c7a8b53fb9f34476f. When this
commit was introduced, it broke the 'make check-system-userspace'
testsuite. It appears that the new translation fails to modify the flow
in a way that would represent the flow as an encapsulated flow when the
traffic is patched through to the second bridge. As such, rather than
matching on, for example, "ip,proto=47" for gre, it would use the inner
packet's flow headers. It also results in problems reporting statistics,
as the tunnel's header is not reflected in subsequent statistics and
truncation is not properly applied during translation.

While a refreshed approach to solving the above problem is formed,
revert this patch.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/331972.html
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
2017-05-10 12:14:02 -07:00
Jan Scheurich
2482b0b0c8 userspace: Add packet_type in dp_packet and flow
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as prepration for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.

The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow verion 1.5.

The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:

enum ofp_header_type_namespaces {
    OFPHTN_ONF = 0,             /* ONF namespace. */
    OFPHTN_ETHERTYPE = 1,       /* ns_type is an Ethertype. */
    OFPHTN_IP_PROTO = 2,        /* ns_type is a IP protocol number. */
    OFPHTN_UDP_TCP_PORT = 3,    /* ns_type is a TCP or UDP port. */
    OFPHTN_IPV4_OPTION = 4,     /* ns_type is an IPv4 option number. */
};

The lower 16 bits specify the actual type in the context of the name space.

Only name spaces 0 and 1 will be supported for now.

For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.

In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
whith the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.

The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the the dl_type field of struct flow.

This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.

New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.

Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-05-03 16:56:40 -07:00
Andy Zhou
333ad77dbd ofproto-dpif: Add 'meter_ids' to backer
Add 'meter_ids', an id-pool object to manage datapath meter id, i.e.
provider_meter_id.

Currently, only userspace datapath supports meter, and it implements
the provider_meter_id management. Moving this function to 'backer'
allows other datapath implementation to share the same logic.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
2017-04-28 14:22:03 -07:00
Jarno Rajahalme
8e83854cf2 datapath: Add eventmask support to CT action.
Upstream commit:

    commit 120645513f55a4ac5543120d9e79925d30a0156f
    Author: Jarno Rajahalme <jarno@ovn.org>
    Date:   Fri Apr 21 16:48:06 2017 -0700

    openvswitch: Add eventmask support to CT action.

    Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
    which can be used in conjunction with the commit flag
    (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
    conntrack events (IPCT_*) should be delivered via the Netfilter
    netlink multicast groups.  Default behavior depends on the system
    configuration, but typically a lot of events are delivered.  This can be
    very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
    types of events are of interest.

    Netfilter core init_conntrack() adds the event cache extension, so we
    only need to set the ctmask value.  However, if the system is
    configured without support for events, the setting will be skipped due
    to extension not being found.

    Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Joe Stringer <joe@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-04-27 10:34:42 -07:00
Sugesh Chandran
f1dac5128c tunneling: Avoid recirculation on datapath.
Open vSwitch datapath recirculates packets for tunneling, i.e. the
incoming packets are encapsulated at first pass.  Further actions are
applied on encapsulated packets on the second pass after
recirculating.  The proposed patch compute and append the post tunnel
actions at the time of translation itself instead of recirculating at
datapath. These actions are solely depends on tunnel attributes so
there is no need of datapath recirculation.  By avoiding the
recirculation at datapath, the patch offers up to 30% performance
improvement for VXLAN tunneling in our testing.  The action execution
logic is using the new CLONE action to define the packet cloning when
the actions are combined.  The length in the CLONE action specifies
the size of nested action set.

It also fixing the testsuite failures that are introduced by nested
CLONE action in tunneling.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-04-21 15:57:38 -07:00
Eric Garver
f0fb825a37 Add support for 802.1ad (QinQ tunneling)
Flow key handling changes:
 - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
   headers.
 - Add dpif multi-VLAN capability probing. If datapath supports
   multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.

Refactor VLAN handling in dpif-xlate:
 - Introduce 'xvlan' to track VLAN stack during flow processing.
 - Input and output VLAN translation according to the xbundle type.

Push VLAN action support:
 - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
 - Support push_vlan on dot1q packets.

Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs
that can be matched. This allows us to preserve backwards compatibility.

Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN
handling

Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Co-authored-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-03-16 15:18:40 -07:00