Until now, the rconn timers have been precise only to the nearest second.
This increases them to millisecond precision, which seems cleaner these
days.
Signed-off-by: Ben Pfaff <blp@ovn.org>
The rconn connection timer measures time on the granularity of seconds,
which means that sometimes the actual timeout can be just a millsecond or
so, which led to occasional immediate connection failures from ovs-ofctl.
VMware-BZ: #2295760
Fixes: 476d2551ab ("rconn: Introduce new invariant to fix assertion failure in corner case.")
Reported-by: Ken Ajiro <ken-ajiro@xr.jp.nec.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Recent assertion failure fix changed rconn workflow for unreliable
connections (such as connections from ovs-ofctl) from
|rconn|DBG|br-int<->unix#151: entering ACTIVE
|rconn|DBG|br-int<->unix#151: connection closed by peer
|rconn|DBG|br-int<->unix#151: entering DISCONNECTED
To
|rconn|DBG|br-int<->unix#200: entering CONNECTING
|rconn|INFO|br-int<->unix#200: connected
|rconn|DBG|br-int<->unix#200: entering ACTIVE
|rconn|DBG|br-int<->unix#200: connection closed by peer
|rconn|DBG|br-int<->unix#200: entering DISCONNECTED
Many monitoring/configuring tools (ex. ovs-neutron-agent) uses
ovs-ofctl frequently to check the statuses of installed flows.
This produces a lot of "connected" logs, that are useless in general.
Fix that by changing the log level to DBG for unreliable connections.
Suggested-by: Ben Pfaff <blp@ovn.org>
Fixes: c9a9b9b00b ("rconn: Introduce new invariant to fix assertion failure in corner case.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Until now, rconn_get_version() has only reported the OpenFlow version in
use when the rconn is actually connected. This makes sense, but it has a
harsh consequence. Consider code like this:
if (rconn_is_connected(rconn) && rconn_get_version(rconn) >= 0) {
for (int i = 0; i < 2; i++) {
struct ofpbuf *b = ofputil_encode_echo_request(
rconn_get_version(rconn));
rconn_send(rconn, b, NULL);
}
}
Maybe not the smartest code in the world, and probably no one would write
this exact code in any case, but it doesn't look too risky or crazy.
But it is. The second trip through the loop can assert-fail inside
ofputil_encode_echo_request() because rconn_get_version(rconn) returns -1
instead of a valid OpenFlow version. That happens if the first call to
rconn_send() encounters an error while sending the message and therefore
destroys the underlying vconn and disconnects so that rconn_get_version()
doesn't have a vconn to query for its version.
In a case like this where all the code to send the messages is close by, we
could just check rconn_get_version() in each loop iteration. We could even
go through the tree and convince ourselves that individual bits of code are
safe, or be conservative and check rconn_get_version() >= 0 in the iffy
cases. But this seems to me like an ongoing source of risk and a way to
get things wrong in corner cases.
This commit takes a different approach. It introduces a new invariant: if
an rconn has ever been connected, then it returns a valid OpenFlow version
from rconn_get_version(). In addition, if an rconn is currently connected,
then the OpenFlow version it returns is the correct one (that may be
obvious, but there were corner cases before where it returned -1 even
though rconn_is_connected() returned true).
With this commit, the code above would work OK. If the first call to
rconn_send() encounters an error sending the message, then
rconn_get_version() in the second iteration will return the same value as
in the first iteration. The message passed to rconn_send() will end up
being discarded, but that's much better than either an assertion failure or
having to carefully analyze a lot of our code to deal with one unusual
corner case.
Reported-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou8@ebay.com>
Most of the tree now uses "encode" as the verb for making an OpenFlow
message, so adopt it here in this very old code as well.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Rconn provides useful features over vconn. Make it available to library
users.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If an rconn peer fails to send a hello message, the version number doesn't
get set. Later, if the peer delays long enough, the rconn attempts to send
an echo request but assert-fails instead because it doesn't know what
version to use. This fixes the problem.
To reproduce this problem:
make sandbox
ovs-vsctl add-br br0
ovs-vsctl set-controller br0 ptcp:12345
nc 127.0.0.1 12345
and wait 10 seconds for ovs-vswitchd to die. (Then exit the sandbox.)
Reported-by: 张东亚 <fortitude.zhang@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
It is meaningful for user to check the stats of IPFIX.
Using IPFIX stats, user can know how much flows the system
can support. It is also can be used for performance check
of IPFIX.
IPFIX stats is added for per IPFIX exporter. If bridge IPFIX is
enabled on the bridge, the whole bridge will have one exporter.
For flow IPFIX, the system keeps per id (column in
Flow_Sample_Collector_Set) per exporter.
1) Add 'ovs-ofctl dump-ipfix-bridge SWITCH' to export IPFIX stats of
the bridge which enable bridge IPFIX. The output format:
NXST_IPFIX_BRIDGE reply (xid=0x2):
bridge ipfix: flows=0, current flows=0, sampled pkts=0, \
ipv4 ok=0, ipv6 ok=0, tx pkts=0
pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0
2) Add 'ovs-ofctl dump-ipfix-flow SWITCH' to export IPFIX stats of
the bridge which enable flow IPFIX. The output format:
NXST_IPFIX_FLOW reply (xid=0x2): 2 ids
id 1: flows=4, current flows=4, sampled pkts=14, ipv4 ok=13, \
ipv6 ok=0, tx pkts=0
pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0
id 2: flows=0, current flows=0, sampled pkts=0, ipv4 ok=0, \
ipv6 ok=0, tx pkts=0
pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0
flows: the number of total flow records, including those exported.
current flows: the number of current flow records cached.
sampled pkts: Successfully sampled packet count.
ipv4 ok: successfully sampled IPv4 flow packet count.
ipv6 ok: Successfully sampled IPv6 flow packet count.
tx pkts: the count of IPFIX exported packets sent to the collector(s).
pkts errs: count of packets failed when sampling, maybe not supported or other error.
ipv4 errs: Count of IPV4 flow packet in the error packets.
ipv6 errs: Count of IPV6 flow packet in the error packets.
tx errs: the count of IPFIX exported packets failed when sending to the collector(s).
Signed-off-by: Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
There are four sessions established from ovn-controller to the following:
OVN Southbound — JSONRPC based
Local ovsdb — JSONRPC based
Local vswitchd — openflow based from ofctrl
Local vswitchd — openflow based from pinctrl
All of these sessions have their own probe_interval, For the last
two connections, they do not need probe_timer as they are over unix domain
socket. This patch takes care of that.
This change has been tested putting logs in several places like in
ovn-controller.c, lib/rconn.c to make sure the probe_timer is
disabled. Also, by making sure from ovn-controller's
log file that there is no more reconnect happening due to probe
under heavy load.
Signed-off-by: Nirapada Ghosh <nghosh@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit also adds several #include directives in source files in
order to make the 'ofp-util.h' move possible
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This attempts to prevent namespace collisions with other list libraries
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
On change in a table state, the controller needs to be informed with
the OFPT_TABLE_STATUS message. The message is sent with reason
OFPTR_VACANCY_DOWN or OFPTR_VACANCY_UP in case of change in remaining
space eventually crossing any one of the threshold.
Signed-off-by: Saloni Jain <saloni.jain@tcs.com>
Co-authored-by: Rishi Bamba <rishi.bamba@tcs.com>
Signed-off-by: Rishi Bamba <rishi.bamba@tcs.com>
[blp@ovn.org added vacancy event initialization and tests
and updated NEWS]
Signed-off-by: Ben Pfaff <blp@ovn.org>
One purpose of OpenFlow packet-in messages is to allow a controller to
interpose on the path of a packet through the flow tables. If, for
example, the controller needs to modify a packet in some way that the
switch doesn't directly support, the controller should be able to
program the switch to send it the packet, then modify the packet and
send it back to the switch to continue through the flow table.
That's the theory. In practice, this doesn't work with any but the
simplest flow tables. Packet-in messages simply don't include enough
context to allow the flow table traversal to continue. For example:
* Via "resubmit" actions, an Open vSwitch packet can have an
effective "call stack", but a packet-in can't describe it, and
so it would be lost.
* A packet-in can't preserve the stack used by NXAST_PUSH and
NXAST_POP actions.
* A packet-in can't preserve the OpenFlow 1.1+ action set.
* A packet-in can't preserve the state of Open vSwitch mirroring
or connection tracking.
This commit introduces a solution called "continuations". A continuation
is the state of a packet's traversal through OpenFlow flow tables. A
"controller" action with the "pause" flag, which is newly implemented in
this commit, generates a continuation and sends it to the OpenFlow
controller in a packet-in asynchronous message (only NXT_PACKET_IN2
supports continuations, so the controller must configure them with
NXT_SET_PACKET_IN_FORMAT). The controller processes the packet-in,
possibly modifying some of its data, and sends it back to the switch with
an NXT_RESUME request, which causes flow table traversal to continue. In
principle, a single packet can be paused and resumed multiple times.
Another way to look at it is:
- "pause" is an extension of the existing OFPAT_CONTROLLER
action. It sends the packet to the controller, with full
pipeline context (some of which is switch implementation
dependent, and may thus vary from switch to switch).
- A continuation is an extension of OFPT_PACKET_IN, allowing for
implementation dependent metadata.
- NXT_RESUME is an extension of OFPT_PACKET_OUT, with the
semantics that the pipeline processing is continued with the
original translation context from where it was left at the time
it was paused.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
This patch renames the command name related with geneve-map to a more
generic name as following:
add-geneve-map -> add-tlv-map
del-geneve-map -> del-tlv-map
dump-geneve-map -> dump-tlv-map
It also renames the Geneve_table to tlv_table.
By doing this renaming, the NSH variable context header (the same TLV
format as Geneve) or other protocol can reuse the field tun_metadata<N>
in the future.
Signed-off-by: Mengke Liu <mengke.liu@intel.com>
Signed-off-by: Ricky Li <ricky.li@intel.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
This patch adds support for Openflow1.4 Group & meter change notification
messages. In a multi controller environment, when a controller modifies the
state of group and meter table, the request that successfully modifies this
state is forwarded to other controllers. Other controllers are informed with
the OFPT_REQUESTFORWARD message. Request forwarding is enabled on a per
controller channel basis using the Set Asynchronous Configuration Message.
Signed-off-by: Niti Rohilla <niti.rohilla@tcs.com>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In order to work with Geneve options, we need to maintain a mapping
table between an option (defined by <class, type, length>) and
an NXM field that can be operated on for the purposes of matches,
actions, etc. This mapping must be explicitly specified by the
user.
Conceptually, this table could be communicated using either OpenFlow
or OVSDB. Using OVSDB requires less code and definition of extensions
than OpenFlow but introduces the possibility that mapping table
updates and flow modifications are desynchronized from each other.
This is dangerous because the mapping table signifcantly impacts the
way that flows using Geneve options are installed and processed by
OVS. Therefore, the mapping table is maintained using OpenFlow commands
instead, which opens the possibility of using synchronization between
table changes and flow modifications through barriers, bundles, etc.
There are two primary groups of OpenFlow messages that are introduced
as Nicira extensions: modification commands (add, delete, clear mappings)
and table status request/reply to dump the current table along with switch
information.
Note that mappings should not be changed while they are in active use by
a flow. The result of doing so is undefined.
This only adds the OpenFlow infrastructure but doesn't actually
do anything with the information yet after the messages have been
decoded.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
rconn_get_connection_seqno() is documented to change only when an rconn
connects or disconnnects, but in fact it was also changing whenever an
rconn went into or out of the "idle" state (following sending an echo
request). This fixes the problem.
rconn_get_connection_seqno() didn't have any existing users, but an
upcoming commit adds one.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
ofpbuf was complicated due to its wide usage across all
layers of OVS, Now we have introduced independent dp_packet
which can be used for datapath packet, we can simplify ofpbuf.
Following patch removes DPDK mbuf and access API of ofpbuf
members.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
These functions had no callers, so remove them and the data maintained
just to implement them.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Use of OF 1.4 bundle messages by a controller should indicate that the
controller has decided to use the switch, hence make is_admitted_msg()
return 'true' for them.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Also moves definitions for struct vconn and pvconn to the public
header. The provider interface is kept private.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
struct list is a common name and can't be used in public headers.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
On Windows, when a peer terminates without calling a close
on socket fd, the server ends up printing "connection dropped"
warning messages. We probably don't want those warning messages
when the error is WSAECONNRESET.
(In OVS unit tests on Windows, anytime a client like ovs-ofctl
calls a ovs_fatal without clean close(fd) on the socket, the
server like ovs-vswitchd prints warnings that cause unit tests
to fail.)
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
An rconn has a human-readable name that typically designates both endpoints
of the connection. For a "reliable" rconn, that automatically reconnects,
the name remains constant regardless of whether the rconn is currently
connected. Until now, though, an "unreliable" rconn, that cannot
automatically reconnect, kept its name only until disconnection occurred.
This is OK for the uses currently in the OVS tree, which only use the name
of a rconn while it is connected, but an upcoming commit will add a final
log message following disconnection in some cases, and it makes the log
messages less useful if unreliable rconns just report "void" in that case.
This commit, therefore, modifies the rconn code so that unreliable rconns
preserve their names past disconnection, just like reliable ones.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
rconn objects do not cache IP address and port information any longer.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
This is only the communication part of the bundles functionality.
The actual message pre-validation and commits are not implemented.
We also enable OF1.4 for all the tests.
Signed-off-by: Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
[blp@nicira.com made ofputil_decode_bundle_add() more obviously correct]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Rename 'l2' to 'frame' and add new ofpbuf_set_frame() and ofpbuf_l2().
ofpbuf_set_frame() alse resets all the layer offsets. ofpbuf_l2()
returns NULL if the packet has no Ethernet header, as indicated either
by unset l3 offset or NULL frame pointer. Callers of ofpbuf_l2() are
supposed to check the return value, unless they can otherwise be sure
that the packet has a valid Ethernet header.
The recent commit 437d0d22 made some assumptions that were not valid
regarding the use of the 'l2' pointer in rconn module and by
compose_rarp(). This is now fixed as follows: rconn now relies on the
fact that once OpenFlow messages are given to rconn for transport, the
frame pointer is no longer needed to refer to the OpenFlow header; and
compose_rarp() now sets the frame pointer and offsets as expected.
In addition to storing network frames, ofpbufs are also used for
handling OpenFlow messages and action lists. lib/ofpbuf.h now has a
comment documenting the current usage conventions and invariants.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit
systems, or from 52 to 36 bytes on 32-bit systems (counting in the
'l7' removal from an earlier patch). This may help contribute to
cache efficiency, and will speed up initializing, copying and
manipulating ofpbufs. This is potentially important for the DPDK
datapath, but the rest of the code base may also see a little benefit.
Changes are:
- Remove 'l7' pointer (previous patch).
- Use offsets instead of layer pointers for l2_5, l3, and l4 using
'l2' as basis. Usually 'data' is the same as 'l2', but this is not
always the case (e.g., when parsing or constructing a packet), so it
can not be easily used as the offset basis. Also, packet parsing is
faster if we do not need to maintain the offsets each time we pull
data from the ofpbuf.
- Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for
largest possible messages/packets.
- Use packed enum for 'source'.
- Rearrange to avoid unnecessary padding.
- Remove 'private_p', which was used only in two cases, both of which
had the invariant ('l2' == 'data'), so we can temporarily use 'l2'
as a private pointer.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
rconn_recv() calls vconn_recv(), which will report a connection error and
cause rconn_recv() to disconnect the rconn. Most rconn users regularly
call rconn_recv(), so that disconnection happens promptly. However,
the lswitch code only calls rconn_recv() after the connection negotiates an
OpenFlow version, so a connection that failed before negotiation would
never be detected as failed. This commit fixes the problem by making
rconn_run() also detect and handle connection errors.
The lswitch code is only used by the test-controller program, so this is
not an important bug fix.
Reported-by: Vasu Dasari <vdasari@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
These functions don't have any ultimate users. The in-band control code
used to use them, but not anymore, so we might as well delete them all.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This should make sending OFPT_FLOW_REMOVED and NXST_FLOW_MONITOR safe from
miss handler threads.
Bug #20271.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Added infrastructure to support Openflow OFPT_TABLE_MOD message. This patch
does not include the flexible table miss handling code that is necessary to
support the semantics specified in OFPT_TABLE_MOD messages.
Current flow miss behavior continues to conform to Openflow 1.0. Future
commits to add more flexible table miss support are needed to fully support
OPFT_TABLE_MOD for Openflow-1.1+.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>