The ability to retrieve and set MAC addresses on vports is only
necessary for tunnel ports (the addresses for actual devices can be
retrieved through direct Linux mechanisms). Tunnel ports only used
the information for the purpose of generating path MTU discovery
packets, which has now been removed. Current userspace code already
reflects these changes, so this drops the functionality from the
kernel.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Messages about packets being lost are logged at level WARN, but when
they were generated at a high rate, those consolidated messages were
logged at ERR. This changes to consolidated messages to be logged at
WARN, too.
Thanks to Ben Pfaff for quickly suggesting the culprit.
Bug #14783
Reported-by: James Schmidt <jschmidt@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Commit 78a2d59c (dpif-linux.c: Let the kernel pick a port number if one
not requested.) changed the logic for port assignment, but didn't
properly handle some error conditions. An attempt to add a tunnel port
that already exists would lead to a segfault. This commit fixes the
logic to stop processing and return an error.
Reported-by: Gurucharan Shetty <shettyg@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Userspace-kernel communication is a possible bottleneck when OVS is
receiving a large number of flow set up requests. To help prevent a bad
actor from consuming too much of this resource, we introduced channels
to segegrate traffic. Previously, we created 17 channels and
round-robin assigned ports to one of 16 channels (the 17th was reserved
for use by the system). This meant if there were more than 16 ports,
sharing of channels would occur.
This commit creates a new channel for each port, so that there is no
more sharing and better isolation. The special system port uses the
"ovs-system"'s channel (port 0), since it is not heavily loaded.
This also fixes an issue introduced in commit acf60855 (ofproto-dpif:
Use a single underlying datapath across multiple bridges.) where ports
that were added at run-time were given the special system channel.
Issue #12073
Signed-off-by: Justin Pettit <jpettit@nicira.com>
When adding a port, the code previously logged the requested port number
(which is generally UINT32_MAX) instead of the assigned port number.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
This patch adds support for skb mark matching and set action.
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Depending on the port and type of datapath, a port may need to be opened
as a different type of device than it's configured. For example, an
"internal" port on a "dummy" datapath should opened as a "dummy" port.
This commit adds the ability for a dpif to provide this information to a
caller. It will be used in a future commit.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
With the single datapath change, we no longer depend on the kernel to
make sure that we don't reuse OpenFlow port numbers, since the ofproto
library now picks them. Remove the code that contained that logic.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Provide the ability to determine whether a port exists in a datapath
without having to deal with a "dpif_port" structure as with
dpif_port_query_by_name(). A future patch will use this function.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
The current code has a simple mapping between datapath and OpenFlow port
numbers (the port numbers were the same other than OFPP_LOCAL which maps
to datapath port 0). Since the translation was know at compile time,
this allowed different layers to easily translate between the two, so
the translation often occurred late.
A future commit will break this simple mapping, so this commit draws a
line between where datapath and OpenFlow port numbers are used. The
ofproto-dpif layer will be responsible for the translations. Callers
above will use OpenFlow port numbers. Providers below will use
datapath port numbers.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Most of the code referred to datapath ports as 32-bit values, but a few
places still used 16-bit references.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Soon the kernel will begin supplying the information about the outer
IP header for tunneled packets and userspace will need to be able to
track it as part of the flow. For the time being this is only used
internally by OVS and not exposed outwards to OpenFlow. As a result,
this threads the information throughout userspace but simply stores
the existing tun_id in it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Packet loss is recoverable so it doesn't warrant an ERR.
Bug #12920.
Reported-by: Scott Hendricks <shendricks@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Casts are sometimes necessary. One common reason that they are necessary
is for discarding a "const" qualifier. However, this can impede
maintenance: if the type of the expression being cast changes, then the
presence of the cast can hide a necessary change in the code that does the
cast. Using CONST_CAST, instead of a bare cast, makes these changes
visible.
Inspired by my own work elsewhere:
http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/cast.h#n80
Signed-off-by: Ben Pfaff <blp@nicira.com>
The datapath allows requesting a specific port number for a port, but
the dpif interface didn't expose it. This commit adds that support.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
When DPIF_OP_FLOW_PUT or DPIF_OP_FLOW_DEL operations failed, they left
their 'stats' outputs uninitialized. For DPIF_OP_FLOW_DEL, this meant that
the caller would read indeterminate data:
Conditional jump or move depends on uninitialised value(s)
at 0x805C1EB: subfacet_reset_dp_stats (ofproto-dpif.c:4410)
by 0x80637D2: expire_batch (ofproto-dpif.c:3471)
by 0x8066114: run (ofproto-dpif.c:3513)
by 0x8059DF4: ofproto_run (ofproto.c:1035)
by 0x8052E17: bridge_run (bridge.c:2005)
by 0x8053F74: main (ovs-vswitchd.c:108)
It's unusual for a delete operation to fail. The most common reason is an
administrator running "ovs-dpctl del-flows".
The only user of DPIF_OP_FLOW_PUT did not request stats, so this doesn't
fix an actual bug for that case.
Bug #11797.
Reported-by: James Schmidt <jschmidt@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This fixes the following warning on my system. "format '%d' expects
argument of type 'int', but argument 5 has type 'long int'"
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Until now, when a packet was dropped in the kernel-to-user buffers, we
logged the occurrence but nothing that would allow a person reading the
log after the fact to learn why it was dropped. This commit adds details
that identify the major sources of packets in the buffer, which should
help.
Signed-off-by: Ben Pfaff <blp@nicira.com>
An initial attempt also replaced the 'uint32_t ready_mask' in struct
dpif_linux by a 'bool ready' in each struct dpif_channel, but I wasn't
happy with the result (the ready_mask bitmap works out really well) and so
I dropped that part.
Signed-off-by: Ben Pfaff <blp@nicira.com>
When a kernel-to-user Netlink buffer overflows, the kernel reports
ENOBUFS without passing along an actual message. When it does this,
we should immediately try again, because we know that there is a
message waiting, instead of reporting the error to the caller.
This improves the OVS response rate to "hping3 --flood" traffic by
a few percentage points in my testing.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Until now, packets for these special protocols have been mixed with general
traffic in the kernel-to-userspace queues. This means that a big-enough
storm of new flows in these queues can cause packets for these special
protocols to be dropped at this interface, fooling userspace into believing
that, say, no CFM packets have been received even though they are arriving
at the expected rate.
This commit moves special protocols to a dedicated kernel-to-userspace
queue to avoid the problem.
Bug #7550.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Typically an nl_sock client can stack-allocate the buffer for receiving
a Netlink message, which provides a performance boost.
Signed-off-by: Ben Pfaff <blp@nicira.com>
The kernel will report a vport with the given name in any datapath, but
userspace only wants a vport with the given name in a specific datapath.
Receiving information on a vport in an unexpected datapath yields bizarre
and hard-to-debug problems.
Bug #9889.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Use hash table to store ports of datapath. Allow 64K ports per switch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #2462
Until now, a "flow put" has represented its parameters in two different
ways, depending on whether it was coming from dpif_flow_put() or from
dpif_operate(), and similarly for an "execute" operation. This commit
adopts the operation struct consistently within the dpif provider
interface, which seems cleaner.
This commit also factors out logging for flow puts and executes, which
is useful in the following commit.
This doesn't change the dpif client interface, since the two forms are
more convenient for clients than always filling out an operation struct.
Signed-off-by: Ben Pfaff <blp@nicira.com>
I'd like to change ->dpif_flow_put() and ->dpif_execute() in the dpif
provider to take the structures of the same names as parameters, instead of
passing them discrete parameters, because this seems like a more sensible
way to do things internally than to have two different ways to pass the
parameters. It might even simplify code slightly. But ->flow_put() and
->execute() wouldn't want the 'type' (because it's implied by the function
being called) or 'error' (because it would be the same as the return
value). Although of course they could just ignore those members, it seems
slightly cleaner to omit them entirely, as this change allows.
Signed-off-by: Ben Pfaff <blp@nicira.com>
At one point in the past, there were three separate queues between the
kernel module and OVS userspace, each of which corresponded to a Netlink
socket (or, before that, to a character device). It made sense to allow
each of these to be enabled or disabled separately, hence the "listen mask"
concept in the dpif layer.
These days, the concept is much less clear-cut. Queuing is no longer on
the basis of different classes of packets but instead striped across a
collection of sockets based on input port. It doesn't really make sense
to enable receiving packets on the basis of the kind of packet anymore.
Accordingly, this commit simplifies the "listen_mask" to just a bool that
either enables or disables receiving packets.
It could be useful to enable or disable receiving packets on a per-vport
basis, but the rest of the code isn't ready to make use of that so this
commit doesn't generalize this much.
Based on this discussion on ovs-dev:
http://openvswitch.org/pipermail/dev/2011-October/012044.html
Signed-off-by: Ben Pfaff <blp@nicira.com>
Valgrind points out correctly that there are uninitialized bytes in the
'event' structure. That's OK, but it doesn't hurt to suppress the warning
by zeroing all of the bytes.
This doesn't fix a real bug.
Signed-off-by: Ben Pfaff <blp@nicira.com>
epoll appears to be much more efficient than poll() at least for
static file descriptor sets. I can't otherwise explain why this
patch increases netperf CRR performance by 20% above the previous
commit, which is also about a 19% overall improvement versus
the baseline from before the poll_fd_woke() optimization was
removed.
Using poll() internally in dpif_linux_recv(), instead of relying
on the results of the main loop poll() call, brings netperf CRR
performance back within 1% of par versus the code base before the
poll_fd_woke() optimizations were introduced. It also increases
the ovs-benchmark results by about 5% versus that baseline, too.
My theory is that this is because the main loop takes long enough
that a significant number of packets can arrive during the main
loop itself, so this reduces the time before OVS gets to those
packets.
This optimization on its own provided about 37% benefit against a
load of a single netperf CRR test, but at the same time it penalized
ovs-benchmark by about 11%. We can get back the CRR performance
loss, and more, other ways, so the first step is to revert this
patch, temporarily accepting the performance loss.
Following patch adds skb-priority to flow key. So userspace will know
what was priority when packet arrived and we can remove the pop/reset
priority action. It's no longer necessary to have a special action for
pop that is based on the kernel remembering original skb->priority.
Userspace can just emit a set priority action with the original value.
Since the priority field is a match field with just a normal set action,
we can convert it into the new model for actions that are based on
matches.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7715
Currently we hard code the versions of our GENL families to 1 but it's
nicer to have symbolic constants.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Until now, OVS has handled IP fragments more awkwardly than necessary. It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present. This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.
This commit improves the situation significantly. It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset. This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers. In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.
This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments". It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
A fair number of datapath flow operations optionally report back results
to the requester based on whether NLM_F_ECHO is set in the request. When
userspace isn't going to use those results anyway, it wastes memory to
store them and a system call to retrieve them.
This commit omits the NLM_F_ECHO bit in cases where the caller isn't going
to use the results.
(NLM_F_ECHO has no effect on operations whose entire purpose is to retrieve
data, e.g. "get" and "dump" operations, so we need not bother to set it
for those.)
This improves "ovs-benchmark rate" results in my testing by about 4%.
Most of the enum tags in this file are lowercased versions of the uppercase
enum prefixes (or slightly less abbreviated versions, e.g. "dp" becomes
"datapath"). This commit fixes up the others for consistency.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This increases consistency with the OVS_ACTION_ATTR_USERSPACE action, which
also requires an explicit pid.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Commit b063d9f06 "datapath: Use unicast Netlink sockets for upcalls" that
switched from multicast to unicast Netlink for sending upcalls added a
Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions
within the flow as target.
This commit drops this per-flow PID in favor of a per-action PID, because
that is more flexible. It does not yet make use of this additional
flexibility, so behavior should not change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
Commit b063d9f0 "datapath: Use unicast Netlink sockets for upcalls" that
introduced an 'upcall_pid' member into struct dpif_linux_vport, struct
dpif_linux_dp, and struct dpif_linux_flow neglected to do so only if the
member was nonzero. This caused every datapath, vport, and flow operation
to supply an upcall_pid. In particular, the netdev_set_config() called at
startup when a vport already existed caused the upcall_pid for that vport
to be reset to 0, which in turn caused all packets received on the vport to
be dropped instead of forwarded to ovs-vswitchd.
Reported-by: Shih-Hao Li <shli@nicira.com>
Bug #7714.
Following patch removes ifIndex attribute of vport which is not
used in userspace.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7114
Over time we wish to reduce the number of datapath-protocol.h definitions
used directly outside of Linux-specific code. This commit removes use of
"struct ovs_dp_stats" from platform-independent code.
Bug #7559.