Commit 72c84bc (dp-packet: Enhance packet batch APIs.) changed how the amount
of packets to be processed is retrieved. In the process, the patch used "size"
as the variable holding the amount of packets rather than "cnt". Change this
back to match with the "emc_processing()" comment.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
DP_STAT_LOOKUP_HIT statistics used mistakenly for calculation
of total number of packets. This leads to completely wrong
per packet cycles statistics.
For example:
emc hits:0
megaflow hits:253702308
avg. subtable lookups per hit:1.50
miss:0
lost:0
avg cycles per packet: 248.32 (157498766585/634255770)
In this case 634255770 total_packets value used for avg
per packet calculation:
total_packets = 'megaflow hits' + 'megaflow hits' * 1.5
The real value should be 524.38 (157498766585/253702308)
Fix that by summing only stats that reflect match/not match.
It's decided to make direct summing of required values instead of
disabling some stats in a loop to make calculations more clear and
avoid similar issues in the future.
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: 3453b4d62a ("dpif-netdev: dpcls per in_port with sorted subtables")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
After ovs-vswitchd reboots, vhost user client port status is displayed as
LINK DOWN though the traffic is OK.
The problem is that the port may be udpated while the vhost_reconfigured
is false. Then the vhost_reconfigured is updated to true. As a result,
the vhost port status is kept as LINK-DOWN.
Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
The current implementation lacked an upper bound of number of entries in
the system. Set the size to ~2M (2^21) for the time being.
Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Shashank Ram <rams@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
The commit 7bc1aae71e ("rhel: make the selinux policy intermediate")
broke the centos RPM builds. This commit ensures that the centos rpmbuild
will first create the openvswitch-custom.te file, and then create the
final policy files.
Fixes: 7bc1aae71e ("rhel: make the selinux policy intermediate")
Reported-by: Ansis Atteka <aatteka@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
The selinux documentation mentions to check the selinux/openvswitch.te file
for any permissions that might need to be added. However, the commit
7bc1aae71e ("rhel: make the selinux policy intermediate") moved this
file to be generated from intermediate file selinux/openvswitch.te.in
instead.
Correct the documentation, so that users won't be trying to edit a generated
file.
Also, add a gitignore for the autogenerated file.
Fixes: 7bc1aae71e ("rhel: make the selinux policy intermediate")
Reported-by: Ansis Atteka <aatteka@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
A last-minute change to the selinux policy caught by testing
incorrectly omitted moving a definition from non-dpdk to dpdk.
This moves the chr_file definition to a non-dpdk enabled permission,
which should allow non-dpdk enabled builds to work.
Fixes: 84d2723305 ("selinux: update policy to reflect non-root and dpdk support")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
The selinux policy that exists in the repository did not specify access to
all of the resources needed for Open vSwitch to properly function with
an enforcing selinux policy. This update allows Open vSwitch to operate
with selinux set to Enforcing mode, even while running as a non-root user.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ansis Atteka <aatteka@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Jean Hsiao <jhsiao@redhat.com>
When building the openvswitch-custom.te file, it is important to have the
ability to filter out dpdk blocks depending on whether the system has been
configured with dpdk or not. This allows using all the standard .in file
blocks, as well as the dpdkstrip blocks, when constructing the selinux
policy file.
Additionally, this means any .in files which might want to change based on
configuration to exclude blocks based on dpdk can do so.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ansis Atteka <aatteka@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Jean Hsiao <jhsiao@redhat.com>
This will be used by an upcoming commit to have @begin_ and @end_ dpdk
blocks to keep dpdk specific policy decisions only active when dpdk is
used.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ansis Atteka <aatteka@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Jean Hsiao <jhsiao@redhat.com>
This feature landed late in 2.8 and the NSH wire protocol itself is not
completely stable.
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This removes n_true_cnd from struct ovsdb_monitor_session_condition.
It was an "optimization" that is not part of any inner loop, but
make the code harder to reason about.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Liran Schour <lirans@il.ibm.com>
The current implementation of ovsdb-server caches only non-conditional
monitors, that is, monitors for every table row, not those that monitor
only rows that match some condition. To figure out which monitors are
conditional, the code track the number of tables that have conditions that
are uniformly true (cond->n_true_cnd) and compares that against the number
of tables in the condition (shash_count(&cond->tables)). If they are the
same, then every table has (effectively) no condition, and so
cond->conditional is set to false.
However, the implementation was buggy. The function that adds a new
table condition, ovsdb_monitor_table_condition_create(), only updated
cond->conditional if the table condition being added was true. This is
wrong; only adding a non-true condition can actually change
cond->conditional. This commit fixes the problem by always recalculating
cond->conditional.
The most visible side effect of cond->conditional being true when it
should be false, as caused by this bug, was that conditional monitors were
being mixed with unconditional monitors for the purpose of caching. This
meant that, if a client requested a conditional monitor that was the
same as an unconditional one, except for the condition, then the client
would receive the cached data previously sent for the unconditional one.
This commit fixes the problem.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Acked-by: Liran Schour <lirans@il.ibm.com>
The upgrade from older Open vSwitch versions on RHEL will try, as much as
possible, to preserve the system. This means no new users or groups are
created. As an effect, it's possible for the chown to fail, because the
hugetlbfs group may not exist. While it did on my systems, it was not
there on others.
This change allows the ExecStartPre commands to fail. In the case that the
user doesn't use DPDK, it won't matter anyway.
Fixes: e3e738a3d0 ('redhat: allow dpdk to also run as non-root user')
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Russell Bryant <russell@ovn.org>
Operator '*' will be executed prior to operator '>>',
but we expect operator '>>' is executed prior to '*',
this patch fixed the issue.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In netdev_dpdk_vhost_get_stats, '+=' was used in a few places
where '=' was expected.
Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Force 64-bit arithmetic to be used when converting uint32_t rate
and burst parameters from kilobits per second to bytes per second,
avoiding incorrect behavior for rates exceeding UINT_MAX bits
per second.
Reported-by: "王志克" <wangzhike@jd.com>
Fixes: 9509913aa7 ("netdev-dpdk.c: Add ingress-policing functionality.")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-By: Mark Michelson <mmichels@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Conn should be removed from the connection expiry list when
the connection tracker experiences NAT resource exhaustion
and the connection needing NAT mapping cannot get it.
If this is not done, the connection tracker can crash during
cleanup of expired connections by the clean thread.
This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
table=1,in_port=2,ip,ct_state=+est,actions=1
table=1,in_port=1,ip,ct_state=+est,actions=2"
Fixes: bd5e81a0e5 ("Userspace Datapath: Add ALG infra and FTP.")
Signed-off-by: Lili Huang <huanglili.huang@huawei.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Rxqs consumed processing cycles are used to improve the balance
of how rxqs are assigned to pmds. Currently some reconfiguration
is needed to perform a reassignment.
Add an ovs-appctl command to perform a new assignment in order
to balance based on the latest rxq processing cycle information.
Note: Jan requested this for testing purposes.
Suggested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Up to his point rxqs are sorted by processing cycles they
consumed and assigned to pmds in a round robin manner.
Ian pointed out that on wrap around the most loaded pmd will be
the next one to be assigned an additional rxq and that it would be
better to reverse the pmd order when wraparound occurs.
In other words, change from assigning by rr to assigning in a forward
and reverse cycle through pmds.
Also, now that the algorithm has finalized, document an example.
Suggested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Previously rxqs were assigned to pmds by round robin in
port/queue order.
Now that we have the processing cycles used for existing rxqs,
use that information to try and produced a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.
The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Count the cycles used for processing an rxq during the
pmd rxq interval. As this is an in flight counter and
pmds run independently, also store the total cycles used
during the last full interval.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Add counters to dp_netdev_rxq which will later be used for storing the
processing cycles of an rxq. Processing cycles will be stored in reference
to a defined time interval. We will store the cycles of the current in progress
interval, a number of completed intervals and the sum of the completed
intervals.
cycles_count_intermediate was used to count cycles for a pmd. With some small
additions we can also use it to count the cycles used for processing an rxq.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Soon we will want to store processing cycle counts in the dp_netdev_rxq,
so use that as a basis for the polled_queue that pmd_thread_main uses.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
memcpy replaces the several single copies inside
dp_packet_clone_with_headroom().
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
DPDK uses dp-packet pools and manages the mbuf portion of
each packet. When a pool is created, partial initialization is
also done on the OVS portion (i.e. non-mbuf). Since packet
memory is reused, this is not very useful for transient
fields and is also misleading. Furthermore, some of these
transient fields are properly initialized for DPDK packets
entering OVS anyways, which is the only reasonable way to do this.
Another field, cutlen, is initialized in this manner in the pool
and intended to be reset when cutlen is applied on sending the
packet out. However, if cutlen context is set but the packet is
not sent out for some reason, then the packet header would be
corrupted in the memory pool. It is better to just reset the
cutlen in the packets when received. I did not detect a
degradation in performance, however, I would be willing to
have some degradation, since this is a proper way to handle
this. In addition to initializing cutlen in received packets,
the other OVS transient fields are removed from the DPDK pool
initialization.
Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
The DPDK introductory documentation has various references to
pmd-cpu-mask, including a section devoted to it. These parts of
the documentation seemed to have been written at different times
and look like they were individually ported from other sources.
They all include an example command which gets repeated several times.
Here, we consolidate those referenes to make the documentation
easier to maintain. At the same time, create linkages to the
pmd-cpu-mask section from other sections to provide some level of
coherence.
Reviewed-by: Greg rose <gvrose8192@gmail.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
The bfd_calculate_chassis function calculates gateway's peer datapaths
to figure out which tunnel's BFD should be enabled to from the current chassis.
Existing algorithm only calculats peer datapaths at one hop, but multiple
logical switches and E/W routers could be in the path, making several hops
which were not considered on the calculation.
It may disable BFD on some gw's tunnel ports. Then a port on a remote ovs
cannot send packet out because it believes all remote gateways are down.
This patch will go through whole graph and visit all datapath's port
which has connection with gateways.
Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com>
Acked-by: Venkata Anil Kommaddi <vkommadi@redhat.com>
Tested-by: Venkata Anil Kommaddi <vkommadi@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
When a system traffic is skipped due to 'HAVE_FTP = no' or
'HAVE_TFTP = no', it takes some effort to figure out it is due to
missing the required python library. Add some comments around the
find_l7_lib(), so that user can figure that out by
$ git grep HAVE_FTP.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
RHEL 7.4 introduces netdev_master_upper_dev_link_rh() that breaks the
backport of OVS kernel module on RHEL 7.4. This patch fixes that issue.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Moves function OvsCreateNewNBLsFromMultipleNBs() to BufferMgmt.c
to facilitate consumption from outside PacketIO.c.
Signed-off-by: Shashank Ram <rams@vmware.com>
Acked-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
The function poll from poller should return a list of tuples
containing the events and their types.
On Windows the event type is not returned at the moment.
Instead of returning zero all the time, we check to see
the type of event and we set it accordingly before returning
the list.
This is used only for debugging purposes inside the function
"__log_wakeup" later on.
Signed-off-by: Alin Balutoiu <abalutoiu@cloudbasesolutions.com>
Acked-by: Russell Bryant <russell@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Packet and Connection state is only available to the processing path
that follows the "recirc_table" argument of the ct() action. The
previous behavior made these states available until the end of the
pipeline. This commit changes the behavior so that the Packet and
Connection state are cleared for the current processing path whenever
ct() is called (in addition to reaching the end of the pipeline.)
A future commit will remove the behavior that a "send to controller"
action causes all packets for that flow to be handled via the slow-path.
The current behavior of connection tracking state makes that difficult
due to datapath actions containing multiple OpenFlow rules that may
contain different connection tracking states. This change will make
that future commit possible.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
The coding style states that BSD-style brace placement should be used,
and even single statements should be enclosed. Add checks to checkpatch
for this, particularly for 'else' statements.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
This patch adds support for a "requested-chassis" option for logical
switch ports. If set, the only chassis that will claim this port is the
chassis identfied by this option; if already bound by another chassis,
it will be released.
The primary benefit of this enhancement is allowing a CMS to prevent
"thrashing" in the southbound database during live migration by keeping
the original chassis from attempting to re-bind a port that is in the
process of migrating.
This would also allow (with some additional work) RBAC to be applied
to the Port_Binding table for additional security.
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
Doing dump-flows also altering the netdev ports list.
So doing it pre the actual test is adding a check to
make sure we don't break the that list.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Executing dpctl commands from userspace also calls to
dpif_open()/dpif_close() but not really creating another dpif
but using a clone.
As for netdev_ports map is global we avoid adding duplicate entries
but also need to make sure we are not removing needed entries.
With this commit we make sure only the last dpif close should clean
the netdev_ports map.
Fixes: 6595cb95a4 ("dpif: Clean up netdev_ports map on dpif_close().")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Fix double encoding/decoding on data, caused by
'get_decoded_buffer' and 'get_encoded_buffer'.
The functions 'get_decoded_buffer' and 'get_encoded_buffer'
from winutils have been removed. They are no longer
necessary since the buffers received/returned are already
in the right form.
The necessary encoding has been moved before any sending
function (this also includes named pipes send on Windows).
Co-authored-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Alin Balutoiu <abalutoiu@cloudbasesolutions.com>
Signed-off-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
During SNAT/DNAT, we should not be updating the port field of ct_endpoint
struct, as ICMP packets do not have port information. Since port and
icmp_id are overlapped in ct_endpoint struct, icmp_id gets changed.
As a result, NAT look up fails to find a matching entry.
This patch addresses this issue by not modifying icmp_id field during
SNAT/DNAT only for ICMP traffic
The current NAT module doesn't take the ICMP type/code into account
during the lookups. Fix this to make it similar with the other conntrack
module.
Acked-by: Shashank Ram <rams@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>