mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

576 Commits

Author SHA1 Message Date
Ben Pfaff
34944e81f0 Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD 2018-01-02 07:45:17 -08:00
Ben Pfaff
b2befd5bb2 sparse: Add guards to prevent FreeBSD-incompatible #include order.
FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and
that <netinet/in.h> be included before <arpa/inet.h>.  This adds guards to
the "sparse" headers to yield a warning if this order is violated.  This
commit also adjusts the order of many #includes to suit this requirement.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
2017-12-22 12:58:02 -08:00
Ilya Maximets
cc4891f39d dpif-netdev: Count sent packets and batches.
New statistics for 'pmd-stats-show' command:
average number of packets per output batch.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Ilya Maximets
009e0033dc dpif-netdev: Output packet batching.
While processing incoming batch of packets they are scattered
across many per-flow batches and sent separately.

This becomes an issue while using more than a few flows.

For example if we have balanced-tcp OvS bonding with 2 ports
there will be 256 datapath internal flows for each dp_hash
pattern. This will lead to scattering of a single received
batch across all of those 256 per-flow batches and invoking
send for each packet separately. This behaviour greatly degrades
overall performance of netdev_send because of inability to use
advantages of vectorized transmit functions.
But the half (if 2 ports in bonding) of datapath flows will
have the same output actions. This means that we can collect
them in a single place back and send at once using single call
to netdev_send. This patch introduces per-port packet batch
for output packets for that purpose.

'output_pkts' batch is thread local and located in send port cache.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
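The per-port output batching described in 009e0033dc can be illustrated with a minimal sketch. This is not the OVS implementation: `struct pkt`, `struct tx_port`, `enqueue_output()`, `flush_port()` and `BATCH_MAX` are hypothetical names chosen for the example, and the "send" is only simulated with counters where the real code would call netdev_send().

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 32                /* Hypothetical per-port batch capacity. */

struct pkt { int id; };             /* Hypothetical stand-in for a packet. */

/* Hypothetical cached send port holding its own output batch. */
struct tx_port {
    struct pkt *pkts[BATCH_MAX];    /* Packets collected for this port. */
    size_t n_pkts;                  /* Packets currently queued. */
    size_t n_sends;                 /* Number of simulated send calls. */
    size_t n_sent_pkts;             /* Packets handed to those calls. */
};

/* Sends everything queued on 'port' with a single simulated send call. */
static void
flush_port(struct tx_port *port)
{
    assert(port != NULL);
    if (port->n_pkts) {
        port->n_sends++;
        port->n_sent_pkts += port->n_pkts;
        port->n_pkts = 0;
    }
}

/* Appends the 'n' packets of one per-flow batch to the output batch of
 * 'port', flushing first whenever the output batch is full. */
static void
enqueue_output(struct tx_port *port, struct pkt **pkts, size_t n)
{
    assert(port != NULL);
    for (size_t i = 0; i < n; i++) {
        if (port->n_pkts == BATCH_MAX) {
            flush_port(port);
        }
        port->pkts[port->n_pkts++] = pkts[i];
    }
}
```

In this sketch, many single-packet per-flow batches that share an output port end up in one send call instead of one call per packet, which is the effect the commit message describes.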
Ilya Maximets
b010be1760 dpif-netdev: Keep latest measured time for PMD thread.
In current implementation 'now' variable updated once on each
receive cycle and passed through the whole datapath via function
arguments. It'll be better to keep this variable inside PMD
thread structure to be able to get it at any time. Such solution
will save the stack memory and simplify possible modifications
in current logic.

This patch introduces new structure 'dp_netdev_pmd_thread_ctx'
contained by 'struct dp_netdev_pmd_thread' to store any processing
context of this PMD thread. For now, only time and cycles moved to
that structure. Can be extended in the future.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Darrell Ball
bd7d93f8b4 conntrack: Allow specified alg port numbers.
Algs can use variable control port numbers for servers.
The main use case is a kind of feeble security measure; the
thinking being by some is that it obscures the alg traffic.
It is really not very effective, but the kernel has this
capability. This patch mimics the capability.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
2017-12-11 14:14:11 -08:00
Ben Pfaff
f0aa3801f1 dpif-netdev: Avoid "sparse" warning.
"sparse" warns when odp_port_t is used directly in an inequality
comparison.  This avoids the warning.

CC: Kevin Traynor <ktraynor@redhat.com>
Fixes: a130f1a89bd8 ("dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
2017-12-11 13:42:53 -08:00
Ilya Maximets
d9d73f84ea Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."
This reverts commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.

Padding and aligning of dp_netdev_pmd_thread structure members
is useless, broken in several ways and only greatly degrades
maintainability and extensibility of the structure.

Issues:

    1. It's not working because all the instances of struct
       dp_netdev_pmd_thread allocated only by usual malloc. All the
       memory is not aligned to cachelines -> structure almost never
       starts at aligned memory address. This means that any further
       paddings and alignments inside the structure are completely
       useless. For example:

       Breakpoint 1, pmd_thread_main
       (gdb) p pmd
       $49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
       (gdb) p &pmd->cacheline1
       $51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
       (gdb) p &pmd->cacheline0
       $52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
       (gdb) p &pmd->flow_cache
       $53 = (struct emc_cache *) 0x1b1afe0

       All of the above addresses shifted from cacheline start by 32B.

       Can we fix it properly? NO.
       OVS currently doesn't have appropriate API to allocate aligned
       memory. The best candidate is 'xmalloc_cacheline()' but it
       clearly states that "The memory returned will not be at the
       start of a cache line, though, so don't assume such alignment".
       And also, this function will never return aligned memory on
       Windows or MacOS.

    2. CACHE_LINE_SIZE is not constant. Different architectures have
       different cache line sizes, but the code assumes that
       CACHE_LINE_SIZE is always equal to 64 bytes. All the structure
       members are grouped by 64 bytes and padded to CACHE_LINE_SIZE.
       This leads to huge holes in structures if CACHE_LINE_SIZE
       differs from 64. This is opposite to portability. If I want
       good performance of cmap I need to have CACHE_LINE_SIZE equal
       to the real cache line size, but I will have huge holes in the
       structures. If you take a look at struct rte_mbuf from DPDK
       you'll see that it uses 2 defines: RTE_CACHE_LINE_SIZE and
       RTE_CACHE_LINE_MIN_SIZE to avoid holes in mbuf structure.

    3. Sizes of system/libc defined types are not constant for all the
       systems. For example, sizeof(pthread_mutex_t) == 48 on my
       ARMv8 machine, but only 40 on x86. The difference could be
       much bigger on Windows or MacOS systems. But the code assumes
       that sizeof(struct ovs_mutex) is always 48 bytes. This may lead
       to broken alignment/big holes in case of padding/wrong comments
       about amount of free pad bytes.

    4. Sizes of many fields in the structure depend on defines like
       DP_N_STATS, PMD_N_CYCLES, EM_FLOW_HASH_ENTRIES and so on.
       Any change in these defines or any change in any structure
       contained by thread should lead to the not so simple
       refactoring of the whole dp_netdev_pmd_thread structure. This
       greatly reduces maintainability and complicates development of
       new features.

    5. There is no reason to align flow_cache member because it's
       too big and we usually access random entries by single thread
       only.

So, the padding/alignment only creates some visibility of performance
optimization but does nothing useful in reality. It only complicates
maintenance and adds huge holes for non-x86 architectures and non-Linux
systems. Performance improvement stated in the original commit message
should be random and not valuable. I see no performance difference.

Most of the above issues are also true for some other padded/aligned
structures like 'struct netdev_dpdk'. They will be treated separately.

CC: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
CC: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
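The "shifted by 32B" observation in issue 1 of d9d73f84ea can be checked with a few lines of arithmetic. The sketch below assumes a 64-byte cache line; `ASSUMED_CACHE_LINE`, `cacheline_offset()` and `check_quoted_addresses()` are hypothetical helpers, and the addresses are the ones printed by gdb in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_CACHE_LINE 64   /* Assumption: 64-byte cache lines. */

/* Returns how many bytes 'addr' lies past the start of its cache line. */
static unsigned int
cacheline_offset(uintptr_t addr)
{
    return (unsigned int) (addr % ASSUMED_CACHE_LINE);
}

/* Checks the addresses quoted from the gdb session in the commit message. */
static void
check_quoted_addresses(void)
{
    assert(cacheline_offset(0x1b1af20) == 32);  /* pmd, &pmd->cacheline0 */
    assert(cacheline_offset(0x1b1af60) == 32);  /* &pmd->cacheline1 */
    assert(cacheline_offset(0x1b1afe0) == 32);  /* &pmd->flow_cache */
}
```

Because the base pointer itself starts 32 bytes into a cache line, every member that was padded to a multiple of 64 bytes from the start of the structure inherits the same 32-byte offset.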
Yifeng Sun
d1ce9c2033 dpif-netdev: Fix memory leak
Valgrind complains in test 1019 (dpctl - add-if set-if del-if):

4,850,896 (4,850,240 direct, 656 indirect) bytes in 1 blocks are
definitely lost in loss record 364 of 364
   by 0x517062: xcalloc (util.c:103)
   by 0x46CBBC: dp_netdev_set_nonpmd (dpif-netdev.c:4498)
   by 0x46CBBC: create_dp_netdev (dpif-netdev.c:1299)
   by 0x46CBBC: dpif_netdev_open (dpif-netdev.c:1337)
   by 0x472CB0: do_open (dpif.c:350)
   by 0x472E6F: dpif_create (dpif.c:404)
   by 0x472E6F: dpif_create_and_open (dpif.c:417)
   by 0x430EBC: open_dpif_backer (ofproto-dpif.c:727)
   by 0x430EBC: construct (ofproto-dpif.c:1411)
   by 0x41B714: ofproto_create (ofproto.c:539)
   by 0x40C84E: bridge_reconfigure (bridge.c:647)
   by 0x4104C5: bridge_run (bridge.c:2998)
   by 0x406FA4: main (ovs-vswitchd.c:119)

The reference count wasn't released at this earlier return.

This fix passes the test 'make check'.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Kevin Traynor
8368866efb dpif-netdev: Calculate rxq cycles prior to compare_rxq_cycles calls.
compare_rxq_cycles sums the latest cycles from each queue for
comparison with each other. While each comparison correctly
gets the latest cycles, the cycles could change between calls
to compare_rxq_cycle. In order to use consistent values through
each call of compare_rxq_cycles, sum the cycles before qsort is
called.

Requested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Kevin Traynor
cc131ac184 dpif-netdev: Rename rxq_cycle_sort to compare_rxq_cycles.
This function is used for comparison between queues
as part of the sort. It does not do the sort itself.
As such, give it a more appropriate name.

Suggested-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Billy O'Mahony
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Kevin Traynor
a130f1a89b dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.
rxq_cycle_sort is used to compare rx queues by their measured number
of cycles. In the event that they are equal, 0 could be returned.
However, it is observed that returning 0 results in a different sort
order on Windows/Linux. This is ok in practice but it causes a unit
test failure for
"1007: PMD - pmd-cpu-mask/distribution of rx queues" when running
on different OS's.

In order to have a consistent sort result across multiple OS's,
introduce a tiebreaker of port/queue.

Fixes: 655856ef39b9 ("dpif-netdev: Change rxq_scheduling to use rxq processing cycles.")
Reported-by: Alin Gabriel Serdean <aserdean@ovn.org>
Tested-by: Alin Gabriel Serdean <aserdean@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
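A comparator with a port/queue tiebreaker, as motivated in a130f1a89b, might look like the sketch below. It is an illustration under assumptions rather than the OVS code: `struct rxq_info`, `compare_rxq()` and `sort_rxqs()` are hypothetical, the cycle count is assumed to be summed before sorting (the approach of 8368866efb), and ports are compared by a plain number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified view of an rx queue for sorting purposes. */
struct rxq_info {
    uint64_t cycles;    /* Processing cycles, summed before the sort. */
    uint32_t port_no;   /* Port the queue belongs to. */
    int queue_id;       /* Queue index within the port. */
};

/* qsort() comparator: most cycles first; on equal cycles, lower port
 * number first, then lower queue id.  It never returns 0 for two distinct
 * queues, so the order does not depend on the qsort() implementation. */
static int
compare_rxq(const void *a_, const void *b_)
{
    const struct rxq_info *a = a_;
    const struct rxq_info *b = b_;

    if (a->cycles != b->cycles) {
        return a->cycles > b->cycles ? -1 : 1;
    }
    if (a->port_no != b->port_no) {
        return a->port_no < b->port_no ? -1 : 1;
    }
    return (a->queue_id > b->queue_id) - (a->queue_id < b->queue_id);
}

/* Sorts 'n' queues in place using the comparator above. */
static void
sort_rxqs(struct rxq_info *rxqs, size_t n)
{
    assert(rxqs != NULL || n == 0);
    if (n > 1) {
        qsort(rxqs, n, sizeof *rxqs, compare_rxq);
    }
}
```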
Yi-Hung Wei
817a76577f ct-dpif,dpif-netlink: Support conntrack flush by ct 5-tuple
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple, and provides the implementation in dpif-netlink.
The implementation of dpif-netlink in the linux datapath utilizes the
NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in
nf_conntrack.  Future patches will add support for the userspace and
Windows datapaths.

VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2017-12-07 13:49:40 -08:00
Kevin Traynor
64bf452e68 dpif-netdev: Rename rxq_interval.
rxq_interval was added before there were other #defines
and code related to rxq intervals.

Rename to rxq_next_cycles_store in order to make it more intuitive.

Requested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Kevin Traynor
d9f79b6a5c dpif-netdev: Remove unnecessary resets on new rxqs.
Commit 38259bd7eb21 (dpif-netdev: Initialize new rxqs in
port_reconfigure().) added a memset for the dp_netdev_rxq of new rxq's
to remove a valgrind warning for an index field in that struct.  With
the addition of that memset, it also means there are some existing
resets on other fields in that struct that are no longer needed and
gives the opportunity to simplify by removing them.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-12 14:44:12 -08:00
Guoshuai Li
3f9d3836d6 dpif-netdev: Set MAX_RECIRC_DEPTH to 6.
In an ovn gateway node with DPDK, the RECIRC_DEPTH may be greater than 5.

Scenarios:
VM pings its own floating IP, or
VM pings the floating IP of VMs on the same network.

It needs to process UNDNAT SNAT in LRouter egress and
UNSNAT DNAT in LRouter ingress, and
output to the geneve tunnel also needs recirc.

This produces a WARN:
dpif_netdev(pmd36)|WARN|Packet dropped. Max recirculation depth exceeded.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 14:29:39 -07:00
Bhanuprakash Bodireddy
a807c15796 dpif_netdev: Refactor dp_netdev_pmd_thread structure.
This commit introduces below changes to dp_netdev_pmd_thread
structure.

- Mark cachelines and in this process reorder few members to avoid
  holes.
- Align emc_cache to a cacheline.
- Maintain the grouping of related member variables.
- Add comment on the information on pad bytes wherever appropriate so
  that new member variables may be introduced to fill the holes in future.

  Below is how the structure looks with this commit.

              Member                    size

     OVS_CACHE_LINE_MARKER cacheline0;
         struct dp_netdev * dp;          8
         struct cmap_node node;          8
         pthread_cond_t cond;           48

     OVS_CACHE_LINE_MARKER cacheline1;
         struct ovs_mutex cond_mutex;   48
         pthread_t  thread;              8
         unsigned int core_id;           4
         int        numa_id;             4

     OVS_CACHE_LINE_MARKER cacheline2;
         struct emc_cache flow_cache;   4849672

     ###cachelineX: 64 bytes, 0 pad bytes####
         struct cmap flow_table;         8
         ....

     ###cachelineY: 59 bytes, 5 pad bytes####
       struct dp_netdev_pmd_stats stats 40
         ....

     ###cachelineZ: 48 bytes, 16 pad bytes###
         struct ovs_mutex port_mutex;   48
         ....

This change also improves the performance marginally.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 13:36:14 -07:00
Bhanuprakash Bodireddy
ee42dd70dc dpif-netdev: Reorder elements in dp_netdev_rxq structure.
By reordering elements in dp_netdev_rxq structure, pad bytes and a hole
can be removed.

Before: structure size: 104, sum holes: 1, sum padbytes:4, cachelines:2
After : structure size:  96, sum holes: 0, sum padbytes:0, cachelines:2

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 12:56:22 -07:00
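The effect of member reordering described in ee42dd70dc can be reproduced on a toy structure. The two layouts below are hypothetical and unrelated to the real dp_netdev_rxq; exact sizes depend on the ABI, so the sketch only relies on the reordered layout not being larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a wide member sits between two narrow ones, so the
 * compiler typically inserts padding after each narrow member. */
struct before {
    uint8_t flag_a;
    uint64_t counter;
    uint8_t flag_b;
};

/* Same members, widest first, so the narrow members can share one slot. */
struct after {
    uint64_t counter;
    uint8_t flag_a;
    uint8_t flag_b;
};

/* Returns the number of bytes saved by the reordering on this platform. */
static size_t
bytes_saved(void)
{
    assert(sizeof(struct before) >= sizeof(struct after));
    return sizeof(struct before) - sizeof(struct after);
}
```

On a common 64-bit ABI one would expect 24 bytes before and 16 bytes after, but that is an assumption about the target, not something the C standard guarantees.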
Xiao Liang
fd016ae3fb lib: Move lib/poll-loop.h to include/openvswitch
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 10:47:55 -07:00
Ben Pfaff
38259bd7eb dpif-netdev: Initialize new rxqs in port_reconfigure().
valgrind reported use of uninitialized data in port_reconfigure(), which
was due to xrealloc() not initializing the newly added data, combined with
dp_netdev_rxq_set_intrvl_cycles() reading 'intrvl_idx' from the added data.
This avoids the warning.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2017-10-27 10:01:33 -07:00
Andy Zhou
66a396d4ff dpif-netdev: Use portable error code for zero rate meter band
'EBADRQC' is only defined on the Linux platform. Without this fix,
the Travis macOS build fails. Switch to EDOM, which is more
portable.

Fixes: 2029ce9ac3a601 (dpif-netdev: Fix a zero-rate bug for meter)
CC: Ali Volkan ATLI <volkan.atli@argela.com.tr>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
2017-09-29 12:35:59 -07:00
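The two meter fixes (66a396d4ff and 2029ce9ac3) amount to rejecting a zero rate before it is ever used as a divisor, and reporting that with a portable errno value. A minimal sketch, assuming a simplified band structure: `struct band`, `validate_band()` and `ms_for_volume()` are hypothetical names, while EDOM is the standard C errno constant mentioned in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified meter band. */
struct band {
    uint32_t rate;          /* Units per second; must not be zero. */
    uint32_t burst_size;
};

/* Returns 0 if 'band' is usable, or EDOM if its rate is zero. */
static int
validate_band(const struct band *band)
{
    assert(band != NULL);
    return band->rate == 0 ? EDOM : 0;
}

/* Stores in '*ms' the time in milliseconds needed to pass 'volume' units
 * at the band's rate.  Returns 0 on success or EDOM for a zero-rate band,
 * in which case no division is attempted. */
static int
ms_for_volume(const struct band *band, uint64_t volume, uint64_t *ms)
{
    int error = validate_band(band);

    if (error) {
        return error;
    }
    *ms = volume * 1000 / band->rate;
    return 0;
}
```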
Ali Volkan ATLI
2029ce9ac3 dpif-netdev: Fix a zero-rate bug for meter
Open vSwitch daemon crashes (by receiving signal SIGFPE,
Arithmetic exception) when a controller tries to send
a meter-mod message with zero rate.

Signed-off-by: Ali Volkan ATLI <volkan.atli@argela.com.tr>
Signed-off-by: Andy Zhou <azhou@ovn.org>
2017-09-27 10:35:28 -07:00
Bhanuprakash Bodireddy
899363ed03 dpif-netdev: Fix comments for pmd_load_cached_ports.
Commit 57eebbb4c315 replaces thread local 'pmd->port_cache' with
'pmd->tnl_port_cache' and 'pmd->send_port_cache' maps. Update the
comments accordingly.

Fixes: 57eebbb4c315 ("Don't try to output on a device without txqs")
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:19:59 -07:00
Bhanuprakash Bodireddy
37eabc706e dpif-netdev: Remove 'cnt' in dp_netdev_input__().
There is little use of 'cnt' variable in dp_netdev_input__(). Get rid of
it and use dp_packet_batch_size() to initialize PKT_ARRAY_SIZE.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:16:05 -07:00
Bhanuprakash Bodireddy
31c82130fc dpif-netdev: Use DP_PACKET_BATCH_FOR_EACH in fast_path_processing.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:11:17 -07:00
Bhanuprakash Bodireddy
79c81260c2 dpif-netdev: Use DP_PACKET_BATCH_FOR_EACH in dp_netdev_run_meter.
Use DP_PACKET_BATCH_FOR_EACH macro in dp_netdev_run_meter().

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:07:53 -07:00
Fischetti, Antonio
bde94613e6 dpif-netdev: Avoid reading RSS hash when EMC is disabled.
When EMC is disabled the reading of RSS hash is skipped.
Also, for packets that are not recirculated it retrieves
the hash value without considering the recirc id.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 01:56:28 -07:00
Ben Pfaff
a5781d3270 Merge branch 'dpdk_merge' of https://github.com/darball/ovs. 2017-09-12 07:12:53 -07:00
Ben Pfaff
4ee87ad31e dpif-netdev: Avoid side-effect in argument of atomic_store_relaxed().
Some of the implementations of atomic_store_relaxed() evaluate their
first argument more than once, so arguments with side effects cause
strange behavior.  This fixes a problem observed on 64-bit Windows.

Reported-by: Alin Serdean <aserdean@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Serdean <aserdean@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
2017-09-10 10:44:30 -07:00
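The hazard fixed in 4ee87ad31e is the usual one for function-like macros that expand an argument more than once. The sketch below uses a hypothetical macro, `STORE_EVAL_TWICE`, that deliberately evaluates its destination twice; it is not the OVS atomic_store_relaxed() implementation, only a stand-in with the same property.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro that evaluates DST twice, as some implementations of
 * atomic_store_relaxed() are said to do with their first argument. */
#define STORE_EVAL_TWICE(DST, SRC) \
    (*(DST) = (SRC), (void) *(DST))

/* Unsafe: the increment is inside the macro argument, so it runs twice.
 * Returns the index after the store. */
static int
store_unsafe(int *slots, int idx, int value)
{
    assert(slots != NULL);
    STORE_EVAL_TWICE(&slots[idx++], value);
    return idx;
}

/* Safe: the side effect happens once, before the macro is invoked. */
static int
store_safe(int *slots, int idx, int value)
{
    int *dst;

    assert(slots != NULL);
    dst = &slots[idx++];
    STORE_EVAL_TWICE(dst, value);
    return idx;
}
```

With this macro, `store_unsafe()` advances the index by two while `store_safe()` advances it by one, which is why the fix moves the side effect out of the argument.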
Kevin Traynor
280802762b dpif-netdev: Fix a couple of coding style issues.
A couple of trivial fixes for a ternary operator placement
and pointer declaration.

Fixes: 655856ef39b9 ("dpif-netdev: Change rxq_scheduling to use rxq processing cycles.")
Fixes: a2ac666d5265 ("dpif-netdev: Change definitions of 'idle' & 'processing' cycles")
Cc: ciara.loftus@intel.com
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Cian Ferriter
45df9fef60 dpif-netdev: Rename "size" variable to "cnt".
Commit 72c84bc (dp-packet: Enhance packet batch APIs.) changed how the amount
of packets to be processed is retrieved. In the process, the patch used "size"
as the variable holding the amount of packets rather than "cnt". Change this
back to match with the "emc_processing()" comment.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Fischetti, Antonio
0230552087 dpif-netdev: Fix comments in function headers.
Fix comments for emc_processing and dp_netdev_input__
regarding md_is_valid.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
Ilya Maximets
435c2797d0 dpif-netdev: Fix per packet cycles statistics.
DP_STAT_LOOKUP_HIT statistics used mistakenly for calculation
of total number of packets. This leads to completely wrong
per packet cycles statistics.

For example:

	emc hits:0
	megaflow hits:253702308
	avg. subtable lookups per hit:1.50
	miss:0
	lost:0
	avg cycles per packet: 248.32 (157498766585/634255770)

	In this case 634255770 total_packets value used for avg
	per packet calculation:

	  total_packets = 'megaflow hits' + 'megaflow hits' * 1.5

	The real value should be 620.80 (157498766585/253702308)

Fix that by summing only stats that reflect match/not match.
It's decided to make direct summing of required values instead of
disabling some stats in a loop to make calculations more clear and
avoid similar issues in the future.

CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
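The calculation corrected by 435c2797d0 can be sketched as a direct sum of the counters that classify a packet as matched or not matched, with lookup counts left out. The structure and function names below are hypothetical, and the choice of exactly these three counters is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of per-PMD counters: each packet is counted in
 * exactly one of these. */
struct pmd_counts {
    uint64_t emc_hits;
    uint64_t megaflow_hits;
    uint64_t miss;
};

/* Total packets: a direct sum of the match / no-match counters only. */
static uint64_t
total_packets(const struct pmd_counts *c)
{
    assert(c != NULL);
    return c->emc_hits + c->megaflow_hits + c->miss;
}

/* Average cycles per packet, or 0 when no packets were processed. */
static double
avg_cycles_per_packet(uint64_t cycles, const struct pmd_counts *c)
{
    uint64_t n = total_packets(c);

    return n ? (double) cycles / (double) n : 0.0;
}
```

Feeding in the figures from the example output (157498766585 cycles, 253702308 megaflow hits, all other counters zero) reproduces the quotient of the two numbers quoted in the message.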
Kevin Traynor
cd995c739a dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.
Rxqs consumed processing cycles are used to improve the balance
of how rxqs are assigned to pmds. Currently some reconfiguration
is needed to perform a reassignment.

Add an ovs-appctl command to perform a new assignment in order
to balance based on the latest rxq processing cycle information.

Note: Jan requested this for testing purposes.

Suggested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:54:26 -07:00
Kevin Traynor
79da1e411b dpif-netdev: Change pmd selection order.
Up to this point rxqs are sorted by processing cycles they
consumed and assigned to pmds in a round robin manner.

Ian pointed out that on wrap around the most loaded pmd will be
the next one to be assigned an additional rxq and that it would be
better to reverse the pmd order when wraparound occurs.

In other words, change from assigning by rr to assigning in a forward
and reverse cycle through pmds.

Also, now that the algorithm has finalized, document an example.

Suggested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:51:18 -07:00
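The forward-and-reverse assignment described in 79da1e411b can be sketched as follows. The function `assign_forward_reverse()` is hypothetical and works on plain indices; it assumes the rxqs are already sorted by processing cycles in descending order and ignores details such as NUMA locality and isolated pmds.

```c
#include <assert.h>
#include <stddef.h>

/* For each of 'n_rxqs' queues (already sorted, busiest first), stores in
 * 'pmd_of[i]' the index of the pmd it is assigned to.  Pmds are walked
 * forward on even passes and backward on odd passes, so the pmd that got
 * the lightest queue of one pass gets the first queue of the next. */
static void
assign_forward_reverse(size_t n_rxqs, size_t n_pmds, size_t *pmd_of)
{
    assert(n_pmds > 0);
    assert(pmd_of != NULL || n_rxqs == 0);

    for (size_t i = 0; i < n_rxqs; i++) {
        size_t pass = i / n_pmds;
        size_t pos = i % n_pmds;

        pmd_of[i] = pass % 2 == 0 ? pos : n_pmds - 1 - pos;
    }
}
```

With three pmds, seven sorted rxqs are assigned to pmds 0, 1, 2, 2, 1, 0, 0, whereas plain round robin would give 0, 1, 2, 0, 1, 2, 0 and hand the fourth-busiest queue to the pmd that already holds the busiest one.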
Kevin Traynor
655856ef39 dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
Previously rxqs were assigned to pmds by round robin in
port/queue order.

Now that we have the processing cycles used for existing rxqs,
use that information to try and produce a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:48:01 -07:00
Kevin Traynor
4809891b2e dpif-netdev: Count the rxq processing cycles for an rxq.
Count the cycles used for processing an rxq during the
pmd rxq interval. As this is an in flight counter and
pmds run independently, also store the total cycles used
during the last full interval.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:44:25 -07:00
Kevin Traynor
c59e759f33 dpif-netdev: Add rxq processing cycle counters.
Add counters to dp_netdev_rxq which will later be used for storing the
processing cycles of an rxq. Processing cycles will be stored in reference
to a defined time interval. We will store the cycles of the current in progress
interval, a number of completed intervals and the sum of the completed
intervals.

cycles_count_intermediate was used to count cycles for a pmd. With some small
additions we can also use it to count the cycles used for processing an rxq.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:42:06 -07:00
Kevin Traynor
922b28d435 dpif-netdev: Change polled_queue to use dp_netdev_rxq.
Soon we will want to store processing cycle counts in the dp_netdev_rxq,
so use that as a basis for the polled_queue that pmd_thread_main uses.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 00:39:40 -07:00
Fischetti, Antonio
94053e66e3 conntrack: pass current time to conntrack_execute.
Current time is passed to conntrack_execute so it doesn't have
to recompute it again.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-24 22:23:33 -07:00
Jan Scheurich
1fc11c5948 Generic encap and decap support for NSH
This commit adds translation and netdev datapath support for generic
encap and decap actions for the NSH MD1 header. The generic encap and
decap actions are mapped to specific encap_nsh and decap_nsh actions
in the datapath.

The translation follows the general scheme that decap() of an NSH
packet triggers recirculation after decapsulation, while encap(nsh)
just modifies struct flow and sets the ctx->pending_encap flag to
generate the encap_nsh action at the next commit to be able to include
subsequent set_field actions for NSH headers.

Support for the flexible MD2 format using TLV properties is foreseen
in encap(nsh), but not yet fully implemented.

The CLI syntax for encap of NSH is
encap(nsh(md_type=1))
encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-07 11:26:17 -07:00
Ben Pfaff
71f21279f6 Eliminate most shadowing for local variable names.
Shadowing is when a variable with a given name in an inner scope hides a
different variable with the same name in a surrounding scope.  This is
generally undesirable because it can confuse programmers.  This commit
eliminates most of it.

Found with -Wshadow=local in GCC 7.  The repo is not really ready to enable
this option by default because of a few cases that are harder to fix, and
harmless, such as nested use of CMAP_FOR_EACH.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
2017-08-02 15:03:35 -07:00
Bhanuprakash Bodireddy
ca62bb16ab dpif-netdev: Reorder elements in dp_netdev_port structure.
By reordering the elements in dp_netdev_port structure, pad bytes can be
reduced thereby saving a cache line. Marginal performance improvement
is also observed with this change.

Before: structure size: 136, holes: 7, sum padbytes:7, cachelines:3
After : structure size: 128, holes: 6, sum padbytes:0, cachelines:2

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:18:56 -07:00
Antonio Fischetti
ded30c74b1 dpctl: Add new 'ct-bkts' command.
With the command:
 ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.

By using a threshold:
 ovs-appctl dpctl/ct-bkts gt=N
for each bucket shows the number of connections when they
are greater than N.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:18:55 -07:00
Billy O'Mahony
c37813fdb0 dpif-netdev: Assign ports to pmds on non-local numa node.
Previously if there is no available (non-isolated) pmd on the numa node
for a port then the port is not polled at all. This can result in a
non-operational system until such time as nics are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:56 -07:00
Ilya Maximets
e215018b0b dpif-netdev: Don't uninit emc on reload.
There are many reasons for reloading of pmd threads:
	* reconfiguration of one of the ports.
	* Adjusting of static_tx_qid.
	* Adding new tx/rx ports.

In many cases EMC is still useful after reload and uninit
will only lead to unnecessary upcalls/classifier lookups.

Such behaviour slows down the datapath. Uninit itself slows
down the reload path. All these factors lead to additional
unexpected latencies/drops on events not directly connected
to current PMD thread.

Let's not uninitialize emc cache on reload path.
'emc_cache_slow_sweep()' and replacements should free all
the old/unwanted entries.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:55 -07:00
Ilya Maximets
140dd69946 dpif-netdev: Incremental addition/deletion of PMD threads.
Currently, change of 'pmd-cpu-mask' is a very heavy operation.
It requires destroying of all the PMD threads and creating
them back. After that, all the threads will sleep until
ports' redistribution finished.

This patch adds ability to not stop the datapath while
adjusting number/placement of PMD threads. All not affected
threads will forward traffic without any additional latencies.

id-pool created for static tx queue ids to keep them sequential
in a flexible way. non-PMD thread will always have
static_tx_qid = 0 as it was before.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:50 -07:00
Justin Pettit
2575df07f6 dpif-netdev: Indicate support for various ct features.
The userspace datapath uses a structure to indicate supported features
that affects debug output.  This commit updates that structure to
indicate that "ct_state_nat", "ct_orig_tuple", and "ct_orig_tuple6" are
supported.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
2017-07-19 22:15:54 -07:00
Sugesh Chandran
7c12dfc527 tunneling: Avoid datapath-recirc by combining recirc actions at xlate.
This patch set removes the recirculation of encapsulated tunnel packets
if possible. It is done by computing the post tunnel actions at the time of
translation. The combined nested action set are programmed in the datapath
using CLONE action.

The following test results show the performance improvement offered by
this optimization for tunnel encap.

            +-------------+
      dpdk0 |             |
         -->o    br-in    |
            |             o--> gre0
            +-------------+

                   --> LOCAL
            +-----------o-+
            |             | dpdk1
            |    br-p1    o-->
            |             |
            +-------------+

Test result on OVS master with DPDK 16.11.2 (Without optimization):

 # dpdk0

 RX packets         : 7037641.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 7730632.90  / sec
 RX rate            : 402.69 MB/sec

 # dpdk1

 TX packets         : 7037641.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 657.73 MB/sec
 TX processing cost per TX packets in nsec : 142.09

Test result on OVS master + DPDK 16.11.2 (With optimization):

 # dpdk0

 RX packets         : 9386809.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 5381496.40  / sec
 RX rate            : 537.11 MB/sec

 # dpdk1

 TX packets         : 9386809.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 877.29 MB/sec
 TX processing cost per TX packets in nsec : 106.53

The offered performance gain is approx 30%.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
2017-07-19 14:34:20 -07:00