mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

169 Commits

Author SHA1 Message Date
Ben Pfaff
3aa30359b1 netdev-vport: Don't return static data in netdev_vport_get_dpif_port().
Returning a static data buffer makes code more brittle and definitely
not thread-safe, so this commit switches to using a caller-provided
buffer instead.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-06-06 16:54:46 -07:00
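
The change described in the commit above can be illustrated with a minimal sketch. The function names, the buffer size, and the "_system" suffix are hypothetical and chosen only for illustration; they are not the actual netdev_vport_get_dpif_port() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PORT_NAME_BUFSIZE 32    /* Illustrative size, an assumption. */

/* Brittle form: every caller shares one static buffer, so a second call
 * overwrites the result of the first, and concurrent calls race. */
static const char *
port_name_static(const char *type)
{
    static char buf[PORT_NAME_BUFSIZE];

    snprintf(buf, sizeof buf, "%s_system", type);
    return buf;
}

/* Caller-provided buffer: each caller owns its storage, so results stay
 * valid independently of later calls. */
static const char *
port_name_r(const char *type, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    snprintf(buf, bufsize, "%s_system", type);
    return buf;
}
```

With the static version, two results obtained back to back alias the same memory; with the caller-provided version, each result lives in the caller's own array.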
Ben Pfaff
d64e176c35 dpif-linux: Make dummy_action const in dpif_linux_init_flow_put().
This makes this code more obviously thread-safe.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-05-03 13:29:46 -07:00
Ben Pfaff
48f6fbe197 dpif-linux: Close channel Netlink sockets when a port number gets recycled.
When ovs-vswitchd deletes a port with dpif_linux_port_del(), that function
uses del_channel() to delete the corresponding channel, including closing
its Netlink socket fd.  However, if the vport gets removed by some other
process (e.g. "ip link delete" for veths) then this function never gets
called and thus the channel never gets deleted.

This commit partially fixes the problem.  Now, if a port number gets
reused, add_channel() closes the old Netlink socket assigned to that port
before it installs the new one.

Bug #16784.
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-05-02 10:10:42 -07:00
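
A rough sketch of the idea in the commit above, assuming a simple table that maps port numbers to file descriptors. The struct, the function names, and the table size are hypothetical; the real add_channel() manages Netlink sockets and more state than this.

```c
#include <assert.h>
#include <unistd.h>

#define N_PORTS 8   /* Illustrative; not the real MAX_PORTS. */

/* Hypothetical table: one socket fd per port number, -1 when unused. */
struct channel_table {
    int fds[N_PORTS];
};

static void
channel_table_init(struct channel_table *t)
{
    for (int i = 0; i < N_PORTS; i++) {
        t->fds[i] = -1;
    }
}

/* Installs 'new_fd' for 'port_no'.  If a previous owner of the same port
 * number left an fd behind, closes it first so it does not leak. */
static void
channel_table_add(struct channel_table *t, int port_no, int new_fd)
{
    assert(port_no >= 0 && port_no < N_PORTS);
    if (t->fds[port_no] >= 0) {
        close(t->fds[port_no]);
    }
    t->fds[port_no] = new_fd;
}
```

As the commit message notes, this only helps once a port number is actually reused; a port removed by another process still keeps its fd open until then.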
Ben Pfaff
12d7685905 dpif-linux: Use MAX_PORTS instead of hard-coded 65535.
MAX_PORTS is currently USHRT_MAX (also 65535).  I think that's a
coincidence; I don't remember MAX_PORTS being mentioned when the new
dpif_channel code was written.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-05-02 10:06:35 -07:00
Ethan Jackson
fa717215da dpif-linux: Reset epoll() on channel deletion.
The list of epoll events contains references to channels which may
be stale when one of those channels is deleted.  The safest thing
to do is simply refresh epoll() whenever a channel is deleted.

Bug #16057.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
2013-04-10 13:05:04 -07:00
Pravin B Shelar
85c9de194b Tunnel: Cleanup old tunnel infrastructure.
Since userspace flow based tunneling code is checked in, the kernel
port based tunneling code can be removed.

Patch removes following components:
 - tunnel ports hash table and moved tunnel ports list to individual
   vports.
 - Cleaned per tnl-port config.
 - OVS_KEY_ATTR_TUN_ID action is removed.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #15078
2013-03-04 13:00:25 -08:00
Lorand Jakab
a6ae068b7b Add support for LISP tunneling
LISP is an experimental layer 3 tunneling protocol, described in RFC
6830.  This patch adds support for LISP tunneling.  Since LISP
encapsulated packets do not carry an Ethernet header, it is removed
before encapsulation, and added with hardcoded source and destination
MAC addresses after decapsulation.  The hardcoded MAC chosen for this
purpose is the locally administered address 02:00:00:00:00:00.  Flow
actions can be used to rewrite this MAC for correct reception.  As such,
this patch is intended to be used for static network configurations, or
with a LISP capable controller.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-02-25 15:55:46 -08:00
Pravin B Shelar
09538fdc57 datapath: Remove CAPWAP tunneling support.
The CAPWAP implementation is just the encapsulation format and
therefore really not the full protocol.  While there were some
uses of it (primarily hardware support and UDP transport), these
are most likely better provided by VXLAN.

Following patch removes CAPWAP tunneling support.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2013-02-19 12:45:57 -08:00
Ben Pfaff
e995e3df57 Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.
Until now, the optional OVS_USERSPACE_ATTR_USERDATA attribute had to be
exactly 64 bits long, if it was present.  However, 64 bits is not enough
space to associate as much information with a flow as would be convenient
for some userspace features now under development.  This commit generalizes
the attribute, allowing it to be any length.

This generalization is backward-compatible: if userspace only uses 64-bit
attributes, then it will not see any change in behavior.

CC: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2013-02-15 16:48:32 -08:00
Ben Pfaff
7e2d8aeaf5 dpif-linux: Fix byte-swapping direction in nl_msg_put_u16() call.
OVS_TUNNEL_ATTR_DST_PORT expects a u16, tnl_cfg->dst_port is a be16, so
we want ntohs() instead of htons().

In practice htons() and ntohs() perform the same operation, so this does
not fix a real bug.

Found by sparse.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-02-15 13:23:07 -08:00
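
The claim in the commit above, that htons() and ntohs() perform the same operation in practice, can be checked exhaustively. The helper name below is hypothetical; only the standard htons() and ntohs() calls are used.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if htons() and ntohs() produce the same result for every
 * 16-bit value on this host.  Both either swap the two bytes (little-endian
 * hosts) or do nothing (big-endian hosts), so they agree either way. */
static int
byte_swaps_agree(void)
{
    for (uint32_t v = 0; v <= UINT16_MAX; v++) {
        if (htons((uint16_t) v) != ntohs((uint16_t) v)) {
            return 0;
        }
    }
    return 1;
}
```

The distinction still matters to a type checker such as sparse, because the two functions document opposite conversion directions between host and network byte order.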
Kyle Mestery
26508d9a55 Modify dpif_linux_port_add() to set the destination port for VXLAN ports.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-02-14 10:01:05 -08:00
Ethan Jackson
c060c4cf83 netdev-vport: Build on all platforms.
This patch removes the final bit of linux specific code which
prevents building netdev-vport everywhere.  With this, other
platforms automatically get access to patch ports, and (if their
datapath supports it), flow based tunneling.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
2013-01-28 19:09:58 -08:00
Ethan Jackson
b9ad7294a5 lib: Switch to flow based tunneling.
With this patch, ovs-vswitchd uses flow based tunneling
exclusively.  I.E. each kind of tunnel shares a single tunnel
backer in the datapath.  Tunnel headers are set by userspace using
the ipv4_tunnel datapath action.  And, the configuration of
individual tunnels is now a userspace responsibility, so
netdev-vport no longer marshals and unmarshals Netlink attributes
for tunnel configuration, instead only storing the configuration
internally.  There are still some significant pieces of work to do,
but the basic building blocks are there to begin testing.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Co-authored-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-01-28 19:09:58 -08:00
Ethan Jackson
de28115365 netdev: New function netdev_get_dpif_port().
In future patches, a netdev's datapath port name may not
necessarily be the same as its device name. This patch prepares for
this by making the distinction in the netdev and dpif layers.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
2013-01-28 19:09:58 -08:00
Jesse Gross
1fc7083dc4 datapath: Remove vport MAC address configuration.
The ability to retrieve and set MAC addresses on vports is only
necessary for tunnel ports (the addresses for actual devices can be
retrieved through direct Linux mechanisms).  Tunnel ports only used
the information for the purpose of generating path MTU discovery
packets, which has now been removed.  Current userspace code already
reflects these changes, so this drops the functionality from the
kernel.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
2013-01-28 10:26:32 -08:00
Justin Pettit
8d675c5aee dpif-linux: Report dropped lost messages at WARN level.
Messages about packets being lost are logged at level WARN, but when
they were generated at a high rate, those consolidated messages were
logged at ERR.  This changes the consolidated messages to be logged at
WARN, too.

Thanks to Ben Pfaff for quickly suggesting the culprit.

Bug #14783

Reported-by: James Schmidt <jschmidt@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
2013-01-25 14:37:39 -08:00
Justin Pettit
2510ba7cfd dpif-linux: Fix segfault when a port already exists.
Commit 78a2d59c (dpif-linux.c: Let the kernel pick a port number if one
not requested.) changed the logic for port assignment, but didn't
properly handle some error conditions.  An attempt to add a tunnel port
that already exists would lead to a segfault.  This commit fixes the
logic to stop processing and return an error.

Reported-by: Gurucharan Shetty <shettyg@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
2013-01-16 17:35:51 -08:00
Ben Pfaff
cb22974d77 Replace most uses of assert by ovs_assert.
This is a straight search-and-replace, except that I also removed #include
<assert.h> from each file where there were no assert calls left.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2013-01-16 16:03:37 -08:00
Justin Pettit
989fd54803 dpif-linux: Give each port its own userspace-kernel channel.
Userspace-kernel communication is a possible bottleneck when OVS is
receiving a large number of flow set up requests.  To help prevent a bad
actor from consuming too much of this resource, we introduced channels
to segregate traffic.  Previously, we created 17 channels and
round-robin assigned ports to one of 16 channels (the 17th was reserved
for use by the system).  This meant if there were more than 16 ports,
sharing of channels would occur.

This commit creates a new channel for each port, so that there is no
more sharing and better isolation.  The special system port uses the
"ovs-system"'s channel (port 0), since it is not heavily loaded.

This also fixes an issue introduced in commit acf60855 (ofproto-dpif:
Use a single underlying datapath across multiple bridges.) where ports
that were added at run-time were given the special system channel.

Issue #12073

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2013-01-10 14:38:42 -08:00
Justin Pettit
f205882ae9 dpif-linux: Log the correct port-PID mapping.
When adding a port, the code previously logged the requested port number
(which is generally UINT32_MAX) instead of the assigned port number.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2013-01-10 14:38:31 -08:00
Ethan Jackson
552e20d02a netdev-vport: Remove the ability to send packets.
The only user of netdev_send() is dpif-netdev which doesn't support
vports.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
2012-12-26 12:56:09 -08:00
Ansis Atteka
72e8bf28bb datapath: add skb mark matching and set action
This patch adds support for skb mark matching and set action.

Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
2012-11-21 16:19:30 -08:00
Justin Pettit
0aeaabc8db Add functions to determine how port should be opened based on type.
Depending on the port and type of datapath, a port may need to be opened
as a different type of device than it's configured.  For example, an
"internal" port on a "dummy" datapath should be opened as a "dummy" port.
This commit adds the ability for a dpif to provide this information to a
caller.  It will be used in a future commit.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-11-16 12:35:55 -08:00
Justin Pettit
78a2d59c1c dpif-linux.c: Let the kernel pick a port number if one not requested.
With the single datapath change, we no longer depend on the kernel to
make sure that we don't reuse OpenFlow port numbers, since the ofproto
library now picks them.  Remove the code that contained that logic.

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-11-16 12:35:55 -08:00
Justin Pettit
4afba28d55 dpif: Add new dpif_port_exists() function.
Provide the ability to determine whether a port exists in a datapath
without having to deal with a "dpif_port" structure as with
dpif_port_query_by_name().  A future patch will use this function.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-11-01 22:54:27 -07:00
Justin Pettit
ddbfda8462 Use ODP ports in dpif layer and below.
The current code has a simple mapping between datapath and OpenFlow port
numbers (the port numbers were the same other than OFPP_LOCAL which maps
to datapath port 0).  Since the translation was known at compile time,
this allowed different layers to easily translate between the two, so
the translation often occurred late.

A future commit will break this simple mapping, so this commit draws a
line between where datapath and OpenFlow port numbers are used.  The
ofproto-dpif layer will be responsible for the translations.  Callers
above will use OpenFlow port numbers.  Providers below will use
datapath port numbers.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-11-01 22:54:27 -07:00
Justin Pettit
9b56fe137d Always treat datapath ports as 32 bits.
Most of the code referred to datapath ports as 32-bit values, but a few
places still used 16-bit references.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-11-01 22:54:27 -07:00
Jesse Gross
296e07ace0 flow: Extend struct flow to contain tunnel outer header.
Soon the kernel will begin supplying the information about the outer
IP header for tunneled packets and userspace will need to be able to
track it as part of the flow.  For the time being this is only used
internally by OVS and not exposed outwards to OpenFlow.  As a result,
this threads the information throughout userspace but simply stores
the existing tun_id in it.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-10-03 10:04:10 -07:00
Ben Pfaff
8d0abb5ef5 dpif-linux: Report packet loss as WARN instead of ERR.
Packet loss is recoverable so it doesn't warrant an ERR.

Bug #12920.
Reported-by: Scott Hendricks <shendricks@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-09-05 13:36:35 -07:00
Ben Pfaff
ebc56baa41 util: New macro CONST_CAST.
Casts are sometimes necessary.  One common reason that they are necessary
is for discarding a "const" qualifier.  However, this can impede
maintenance: if the type of the expression being cast changes, then the
presence of the cast can hide a necessary change in the code that does the
cast.  Using CONST_CAST, instead of a bare cast, makes these changes
visible.

Inspired by my own work elsewhere:
http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/cast.h#n80

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-08-03 13:33:13 -07:00
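
A minimal sketch in the spirit of the commit above. The macro name CONST_CAST_SKETCH and the two helper functions are hypothetical, and the macro body is an assumption about one way to get the described effect, not a copy of the OVS definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Casts POINTER to TYPE, but first compares POINTER against the cast result
 * inside an unevaluated sizeof.  If POINTER's type later changes to something
 * unrelated to TYPE, the comparison draws a compiler diagnostic, whereas a
 * bare cast would silently keep compiling. */
#define CONST_CAST_SKETCH(TYPE, POINTER) \
    ((void) sizeof ((int) ((POINTER) == (TYPE) (POINTER))), (TYPE) (POINTER))

/* Stand-in for a legacy API that takes a non-const pointer but does not
 * modify the string. */
static size_t
legacy_length(char *s)
{
    return strlen(s);
}

static size_t
length_of(const char *s)
{
    return legacy_length(CONST_CAST_SKETCH(char *, s));
}
```

Because the comparison sits inside sizeof, it is never evaluated at run time; it exists only so the compiler checks that the two pointer types are compatible.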
Justin Pettit
232dfa4aa3 dpif: Allow the port number to be requested when adding an interface.
The datapath allows requesting a specific port number for a port, but
the dpif interface didn't expose it.  This commit adds that support.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-07-30 20:54:16 -07:00
Ben Pfaff
cfceb2b57a dpif-linux: Zero 'stats' outputs of dpif_operate() ops on failure.
When DPIF_OP_FLOW_PUT or DPIF_OP_FLOW_DEL operations failed, they left
their 'stats' outputs uninitialized.  For DPIF_OP_FLOW_DEL, this meant that
the caller would read indeterminate data:

Conditional jump or move depends on uninitialised value(s)
   at 0x805C1EB: subfacet_reset_dp_stats (ofproto-dpif.c:4410)
    by 0x80637D2: expire_batch (ofproto-dpif.c:3471)
    by 0x8066114: run (ofproto-dpif.c:3513)
    by 0x8059DF4: ofproto_run (ofproto.c:1035)
    by 0x8052E17: bridge_run (bridge.c:2005)
    by 0x8053F74: main (ovs-vswitchd.c:108)

It's unusual for a delete operation to fail.  The most common reason is an
administrator running "ovs-dpctl del-flows".

The only user of DPIF_OP_FLOW_PUT did not request stats, so this doesn't
fix an actual bug for that case.

Bug #11797.
Reported-by: James Schmidt <jschmidt@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-06-25 16:47:21 -07:00
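
The fix in the commit above amounts to giving the 'stats' output a defined value on the failure path. A minimal sketch, assuming a hypothetical flow_del() whose 'found' argument stands in for whether the datapath still had the flow; the struct layout and the success values are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct flow_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Deletes a flow.  On failure, zeroes '*stats' (if requested) so that the
 * caller never reads indeterminate data. */
static int
flow_del(int found, struct flow_stats *stats)
{
    if (!found) {
        if (stats) {
            memset(stats, 0, sizeof *stats);
        }
        return ENOENT;
    }
    if (stats) {
        stats->n_packets = 7;   /* Placeholder values for the sketch. */
        stats->n_bytes = 420;
    }
    return 0;
}
```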
Ethan Jackson
aecfb4af7e dpif-linux: Fix invalid format specifier.
This fixes the following warning on my system. "format '%d' expects
argument of type 'int', but argument 5 has type 'long int'"

Signed-off-by: Ethan Jackson <ethan@nicira.com>
2012-06-06 11:29:33 -07:00
Ben Pfaff
14b4d2f99f dpif-linux: Log details when a packet is lost.
Until now, when a packet was dropped in the kernel-to-user buffers, we
logged the occurrence but nothing that would allow a person reading the
log after the fact to learn why it was dropped.  This commit adds details
that identify the major sources of packets in the buffer, which should
help.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-06-01 17:40:35 -04:00
Ben Pfaff
fe3d61b373 dpif-linux: Slightly refactor internal data structures.
An initial attempt also replaced the 'uint32_t ready_mask' in struct
dpif_linux by a 'bool ready' in each struct dpif_channel, but I wasn't
happy with the result (the ready_mask bitmap works out really well) and so
I dropped that part.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-06-01 17:40:35 -04:00
Ben Pfaff
e222833e39 dpif-linux: Avoid pessimal behavior when kernel-to-user buffers overflow.
When a kernel-to-user Netlink buffer overflows, the kernel reports
ENOBUFS without passing along an actual message.  When it does this,
we should immediately try again, because we know that there is a
message waiting, instead of reporting the error to the caller.

This improves the OVS response rate to "hping3 --flood" traffic by
a few percentage points in my testing.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-06-01 17:40:34 -04:00
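
The retry behavior described in the commit above could look roughly like this. The callback type, the fake socket, and the retry cap are all assumptions made for a self-contained sketch; the real code receives from a Netlink socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive callback: returns 0 and fills '*msg' on success,
 * otherwise a positive errno value. */
typedef int recv_fn(void *aux, int *msg);

/* Receives one message, retrying immediately while the receiver reports
 * ENOBUFS, since an overflow report means a message is still waiting.
 * 'max_tries' bounds the loop so a persistent overflow cannot spin forever. */
static int
recv_skip_overflow(recv_fn *recv_cb, void *aux, int *msg, int max_tries)
{
    int error = ENOBUFS;

    for (int i = 0; i < max_tries && error == ENOBUFS; i++) {
        error = recv_cb(aux, msg);
    }
    return error;
}

/* Demo stub: reports a fixed number of overflows, then delivers 42. */
struct fake_sock {
    int overflows_left;
};

static int
fake_recv(void *aux, int *msg)
{
    struct fake_sock *s = aux;

    if (s->overflows_left > 0) {
        s->overflows_left--;
        return ENOBUFS;
    }
    *msg = 42;
    return 0;
}
```

Any error other than ENOBUFS is returned to the caller unchanged, so only the overflow case is absorbed by the loop.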
Ben Pfaff
625b07205a ofproto-dpif: Segregate CFM, LACP, and STP traffic into separate queues.
Until now, packets for these special protocols have been mixed with general
traffic in the kernel-to-userspace queues.  This means that a big-enough
storm of new flows in these queues can cause packets for these special
protocols to be dropped at this interface, fooling userspace into believing
that, say, no CFM packets have been received even though they are arriving
at the expected rate.

This commit moves special protocols to a dedicated kernel-to-userspace
queue to avoid the problem.

Bug #7550.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-05-09 12:58:55 -07:00
Raju Subramanian
e0edde6fee Global replace of Nicira Networks.
Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.

Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-05-02 17:08:02 -07:00
Ben Pfaff
90a7c55e56 dpif: Make caller of dpif_recv() provide buffer space.
This improves performance under heavy flow setup loads.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-04-18 20:28:51 -07:00
Ben Pfaff
72d32ac0b3 netlink-socket: Make caller provide message receive buffers.
Typically an nl_sock client can stack-allocate the buffer for receiving
a Netlink message, which provides a performance boost.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-04-18 20:28:48 -07:00
Ben Pfaff
eabe7c680d dpif-linux: Avoid malloc() in dpif_linux_operate().
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-04-18 20:28:44 -07:00
Ben Pfaff
b99d3ceeed ofproto-dpif: Batch flow uninstallations due to expiration.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-04-18 20:28:12 -07:00
Ben Pfaff
33db159244 dpif-linux: Make dpif_linux_port_query_by_name() query only one datapath.
The kernel will report a vport with the given name in any datapath, but
userspace only wants a vport with the given name in a specific datapath.
Receiving information on a vport in an unexpected datapath yields bizarre
and hard-to-debug problems.

Bug #9889.
Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-02-27 18:42:17 -08:00
Pravin B Shelar
95b1d73a4a datapath: Increase maximum number of datapath ports.
Use hash table to store ports of datapath. Allow 64K ports per switch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

Bug #2462
2012-02-16 17:12:36 -08:00
Ben Pfaff
89625d1efb dpif: Change provider interface to consistently use operation structs.
Until now, a "flow put" has represented its parameters in two different
ways, depending on whether it was coming from dpif_flow_put() or from
dpif_operate(), and similarly for an "execute" operation.  This commit
adopts the operation struct consistently within the dpif provider
interface, which seems cleaner.

This commit also factors out logging for flow puts and executes, which
is useful in the following commit.

This doesn't change the dpif client interface, since the two forms are
more convenient for clients than always filling out an operation struct.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-16 13:37:27 -08:00
Ben Pfaff
c2b565b54e dpif: Factor 'type' and 'error' out of individual dpif_op members.
I'd like to change ->dpif_flow_put() and ->dpif_execute() in the dpif
provider to take the structures of the same names as parameters, instead of
passing them discrete parameters, because this seems like a more sensible
way to do things internally than to have two different ways to pass the
parameters.  It might even simplify code slightly.  But ->flow_put() and
->execute() wouldn't want the 'type' (because it's implied by the function
being called) or 'error' (because it would be the same as the return
value).  Although of course they could just ignore those members, it seems
slightly cleaner to omit them entirely, as this change allows.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-16 13:35:21 -08:00
Ben Pfaff
a12b3eadc6 dpif: Simplify the "listen mask" concept.
At one point in the past, there were three separate queues between the
kernel module and OVS userspace, each of which corresponded to a Netlink
socket (or, before that, to a character device).  It made sense to allow
each of these to be enabled or disabled separately, hence the "listen mask"
concept in the dpif layer.

These days, the concept is much less clear-cut.  Queuing is no longer on
the basis of different classes of packets but instead striped across a
collection of sockets based on input port.  It doesn't really make sense
to enable receiving packets on the basis of the kind of packet anymore.
Accordingly, this commit simplifies the "listen_mask" to just a bool that
either enables or disables receiving packets.

It could be useful to enable or disable receiving packets on a per-vport
basis, but the rest of the code isn't ready to make use of that so this
commit doesn't generalize this much.

Based on this discussion on ovs-dev:
http://openvswitch.org/pipermail/dev/2011-October/012044.html

Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-12 17:09:22 -08:00
Ben Pfaff
733c8d13d7 dpif-linux: Avoid valgrind warning in epoll_ctl() call.
Valgrind points out correctly that there are uninitialized bytes in the
'event' structure.  That's OK, but it doesn't hurt to suppress the warning
by zeroing all of the bytes.

This doesn't fix a real bug.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2011-12-12 14:47:10 -08:00
Ben Pfaff
50f80534f2 dpif-linux: Use "epoll" instead of poll().
epoll appears to be much more efficient than poll() at least for
static file descriptor sets.  I can't otherwise explain why this
patch increases netperf CRR performance by 20% above the previous
commit, which is also about a 19% overall improvement versus
the baseline from before the poll_fd_woke() optimization was
removed.
2011-11-28 09:31:07 -08:00
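
For readers unfamiliar with the epoll interface mentioned in the commit above, here is a minimal Linux-only sketch of registering a file descriptor once and then waiting on it. The helper names are hypothetical and this is not the dpif-linux code; it uses only epoll_create(), epoll_ctl(), and epoll_wait().

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll set watching 'fd' for readability.  Returns the epoll
 * fd, or -1 on error.  Registration happens once, which is where epoll can
 * save work over poll() for a static set of descriptors. */
static int
epoll_watch_readable(int fd)
{
    struct epoll_event event;
    int epfd = epoll_create(1);

    if (epfd < 0) {
        return -1;
    }
    memset(&event, 0, sizeof event);    /* Leave no uninitialized bytes. */
    event.events = EPOLLIN;
    event.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Returns a ready fd from the set, or -1 if none becomes ready (or an error
 * occurs) within 'timeout_ms' milliseconds. */
static int
epoll_next_ready(int epfd, int timeout_ms)
{
    struct epoll_event event;
    int n = epoll_wait(epfd, &event, 1, timeout_ms);

    return n == 1 ? event.data.fd : -1;
}
```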
Ben Pfaff
8522ba0996 dpif-linux: Use poll() internally in dpif_linux_recv().
Using poll() internally in dpif_linux_recv(), instead of relying
on the results of the main loop poll() call, brings netperf CRR
performance back within 1% of par versus the code base before the
poll_fd_woke() optimizations were introduced.  It also increases
the ovs-benchmark results by about 5% versus that baseline, too.

My theory is that this is because the main loop takes long enough
that a significant number of packets can arrive during the main
loop itself, so this reduces the time before OVS gets to those
packets.
2011-11-28 09:29:18 -08:00