Since netdev can have multiple IP address use
generic api netdev_get_addr_list(). This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
There is check to disable IPv6 tunneling. Following patch
removes it and reintroduces the tunneling automake tests.
This reverts mostly commit 250bd94d1e500a89c76cac944e660bd9c07ac364.
There are couple of new autotests and updated documentation
related to ipv6 tunneling added in this patch.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Made to simplify creation of derived classes.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
NetBSD requires <netinet/in.h> to be included before <netinit/ip6.h>.
Without this fix we have:
In file included from lib/netdev-vport.c:25:0:
/usr/include/netinet/ip6.h:82:18: error: field 'ip6_src' has incomplete type
/usr/include/netinet/ip6.h:83:18: error: field 'ip6_dst' has incomplete type
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
There are multiple issues in IPv6 userspace tunnel
implementation. Even the kernel module that ships with
2.5 does not support IPv6 tunneling. There is not
enough time to get all fixes in branch-2.5. So it make
sense to disable the support on 2.5.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Allow configuration of IPv6 tunnel endpoints.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This adds support for IPv6 in ovs-router and route-table. IPv4 is stored in
ovs-router using IPv4-mapped addresses.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Currently, when doing userspace tunneling we don't perform much in
the way of integrity checks on the incoming IP header. The case of
tunneling is different from the usual case of switching since we are
acting as the endpoint here and should not allow invalid packets to
pass.
This adds checks for IP checksum, version, total length, and options and
drops packets that don't pass.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
GRE64 was introduced to extend gre key from 32-bit to 64-bit using
gre-key and sequence number field. But GRE64 is not standard
protocol. There are not many users of this protocol. Therefore we
have decided to remove it.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The kernel implementation of Geneve options stores the TLV option
data in the flow exactly as received, without any further parsing.
This is then translated to known options for the purposes of matching
on flow setup (which will then install a datapath flow in the form
the kernel is expecting).
The userspace implementation behaves a little bit differently - it
looks up known options as each packet is received. The reason for this
is there is a much tighter coupling between datapath and flow translation
and the representation is generally expected to be the same. This works
but it incurs work on a per-packet basis that could be done per-flow
instead.
This introduces a small translation step for Geneve packets between
datapath and flow lookup for the userspace datapath in order to
allow the same kind of processing that the kernel does. A side effect
of this is that unknown options are now shown when flows dumped via
ovs-appctl dpif/dump-flows, similar to the kernel.
There is a second benefit to this as well: for some operations it is
preferable to keep the options exactly as they were received on the wire,
which this enables. One example is that for packets that are executed from
ofproto-dpif-upcall to the datapath, this avoids the translation of
Geneve metadata. Since this conversion is potentially lossy (for unknown
options), keeping everything in the same format removes the possibility
of dropping options if the packet comes back up to userspace and the
Geneve option translation table has changed. To help with these types of
operations, most functions can understand both formats of data and seamlessly
do the right thing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
The addition of Geneve options to packet metadata significantly
expanded its size. It was reported that this can decrease performance
for DPDK ports by up to 25% since we need to initialize the whole
structure on each packet receive.
It is not really necessary to zero out the entire structure because
miniflow_extract() only copies the tunnel metadata when particular
fields indicate that it is valid. Therefore, as long as we zero out
these fields when the metadata is initialized and ensure that the
rest of the structure is correctly set in the presence of a tunnel,
we can avoid touching the tunnel fields on packet reception.
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently the userspace datapath only supports Geneve in a
basic mode - without options - since the rest of userspace
previously didn't support options either. This enables the
userspace datapath to send and receive options as well.
The receive path for extracting the tunnel options isn't entirely
optimal because it does a lookup on the options on a per-packet
basis, rather than per-flow like the kernel does. This is not
as straightforward to do in the userspace datapath since there
is no translation step between packet formats used in packet vs.
flow lookup. This can be optimized in the future and in the
meantime option support is still useful for testing and simulation.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently dont-fragment and TTL are initialized to zero, but
those are not default config for tunnel ports. dpctl
does not show default config of a port. So by setting these
values to default we can get cleaner `dpctl show` output.
% ovs-dpctl show
system@ovs-system:
port 0: ovs-system (internal)
port 1: br0 (internal)
port 4: gre_sys (gre: df_default=false, ttl=0)
% ovs-dpctl show # After initializing default values.
system@ovs-system:
port 0: ovs-system (internal)
port 1: br0 (internal)
port 4: gre_sys (gre)
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The Stateless TCP Tunnel (STT) protocol encapsulates traffic in
IPv4/TCP packets.
STT uses TCP segmentation offload available in most of NIC. On
packet xmit STT driver appends STT header along with TCP header
to the packet. For GSO packet GSO parameters are set according
to tunnel configuration and packet is handed over to networking
stack. This allows use of segmentation offload available in NICs
The protocol is documented at
http://www.ietf.org/archive/id/draft-davie-stt-06.txt
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
We already have the 'dp_hash' embedded in the metadata. This caused
confusion in the code. With this commit it should be clear that
'rss_hash' is the packet hash used for internal purposes, while
'md.dp_hash' is part of the flow, computed during the execution of
certain actions.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The userspace tunneling API for pushing and popping tunnel headers
is currently based on processing batches of packets. However, there
is no obvious way to take advantage of batching for these operations
and so each tunnel operation has a pair of loops to process the
batch. This changes the API to operate on single packets to enable
better code reuse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Kernel based OVS recently added the ability to support checksums
for UDP based tunnels (Geneve and VXLAN). This adds similar support
for the userspace datapath to bring feature parity.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This adds basic userspace dataplane support for the Geneve
tunneling protocol. The rest of userspace only has the ability
to handle Geneve without options and this follows that pattern
for the time being. However, when the rest of userspace is updated
it should be easy to extend the dataplane as well.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Currently, the userspace VXLAN implementation contains the code
for generating and parsing both the UDP and VXLAN headers. This
pulls out the UDP portion for better layering and to make it
easier to support additional UDP based tunnels and features.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The VNI is always present in the VXLAN header, so we should
set the FLOW_TNL_F_KEY flag to indicate this. However, the
userspace implementation of VXLAN currently does not.
Signed-off-by: Jesse Gross <jesse@nicira.com>
The indication to calculate the GRE checksum is currently the port
config rather than the tunnel flow. Currently there is a one to one
mapping between the two so there is no difference. However, the
kernel datapath must use the flow and it is also potentially more
flexible, so this switches how we decide whether to calculate the
checksum.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The GRE checksum is a 16 bit field stored in a 32 bit option (the
rest is reserved). The current code treats the checksum as a 32-bit
field and places it in the right place for little endian systems but
not big endian. This fixes the problem by storing the 16 bit field
directly.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
On receive, the userspace GRE code doesn't check the protocol
field. Since OVS only understands Ethernet packets, this adds a
check that the inner protocol is Ethernet and discards other types
of packets.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The IP TTL is currently omitted in the extracted tunnel information
that is stored in the flow for userspace tunneling. This includes it
so that the same logic used by the kernel also applies.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
When there is any update from ovsdb, ovs will call netdev_set_config()
for every vport. Even though the change is not related to vport, the
current implementation will always increment the per-netdev sequence
number. Subsequently this could cause even more unwanted effects,
e.g. the recreation of 'struct tnl_port' in ofproto level.
This commit fixes the issue by only updating the netdev when there
is actual configuration change.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
e.g. Set tunnel id for encapsulated VxLAN packet (out_key=flow):
ovs-vsctl add-port int-br vxlan0 -- set interface vxlan0 \
type=vxlan options:remote_ip=172.168.1.2 options:out_key=flow
ovs-ofctl add-flow int-br in_port=LOCAL, icmp,\
actions=set_tunnel:3, output:1 (1 is the port# of vxlan0)
Output tunnel ID should be modified to 3 with this patch.
Signed-off-by: Ricky Li <ricky.li@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
The kernel module can already support outer UDP checksums for
Geneve and VXLAN using the standard checksum flag in tunnel
metadata. This makes userspace aware of the capability so that
users can enable it on tunnel ports.
There is a complication in that there is no way for userspace to
probe or detect if the kernel does not support this capability
in order to warn the user. In this case, connectivity will appear
to function normally but packets will not be checksum protected.
This is mainly an issue for VXLAN which has existed in the kernel
for a some time without checksum support - while there are also
a few kernel versions that support Geneve only without checksums,
they are much less common.
There isn't a particularly good solution to the compatibility
issue without introducing a larger capabilities structure. However,
UDP checksums are likely to be used only rarely at this point in
time and the VXLAN spec (where the main problem lies) recommends
against them. Therefore, this is considered to be an advanced user
feature and we settle for just documenting the issue.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
dp_packet is short and better name for datapath packet
structure.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Supports a new "exts" field in the tunnel configuration which takes a
comma separated list of enabled extensions.
The only extension supported so far is GBP but this can be used to
enable RCO and possibly others as soon as the OVS datapath supports
them.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Since dpif registering for routing table at initialization
there is no need to unregister it. Following patch removes
support for turning routing table notifications on and off.
Due to this change OVS always listens for these
notifications.
Reported-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Following patch adds support for userspace tunneling. Tunneling
needs three more component first is routing table which is configured by
caching kernel routes and second is ARP cache which build automatically
by snooping arp. And third is tunnel protocol table which list all
listening protocols which is populated by vswitchd as tunnel ports
are added. GRE and VXLAN protocol support is added in this patch.
Tunneling works as follows:
On packet receive vswitchd check if this packet is targeted to tunnel
port. If it is then vswitchd inserts tunnel pop action which pops
header and sends packet to tunnel port.
On packet xmit rather than generating Set tunnel action it generate
tunnel push action which has tunnel header data. datapath can use
tunnel-push action data to generate header for each packet and
forward this packet to output port. Since tunnel-push action
contains most of packet header vswitchd needs to lookup routing
table and arp table to build this action.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Rather than using hmap for storing routing entries we can directly use
classifier which has support for priority and wildcard entries.
This makes route lookup lockless. This help when we use route lookup
for native tunneling.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to configure the number of tx queues and rx queues
of 'netdev'. Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
There are couple of reasons to remove this support:
* This is used in very old OVS use-case. It is much better
to read stats directly from OVS.
* Forthcoming commit will remove support for setting stats
for vport. The stats update depends on stats-set.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit adds a new API to the 'struct netdev_class' which
allows user to query the numa node id the 'netdev' is on.
Currently, only netdev-dpdk module implements this function.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
We can't unlock the netdev's mutex after close the netdev, because closing
the netdev might destroy the mutex.
VMware-BZ: #1275187
Signed-off-by: Ben Pfaff <blp@nicira.com>
This adds support for Geneve - Generic Network Virtualization
Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-00
The kernel implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.
Userspace currently implements only support for basic version of
Geneve. It can work with the base header (including the VNI) and
is capable of parsing options but does not currently support any
particular option definitions. Over time, the intention is to
allow options to be matched through OpenFlow without requiring
explicit support in OVS userspace.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
In most cases, tunnel ports specify a dpif name to act as the backing
port in the datapath. However, in the case of UDP tunnels the type is
used with the port number appended. This is potentially a problem for
IPsec tunnels because they have different types but should have the
same backing port. The hasn't been a problem in practice though because
no UDP tunnels are currently used with IPsec.
This switches to use the dpif_port in all cases plus a port number if
necessary. It does this by making the names short enough to accomodate
ports, which also makes the naming more consistent.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit 3e912ffcbb (netdev: Add 'change_seq' back to netdev.) added per-
netdev change number for indicating status change. Future commits used
this change number to optimize the netdev status update to database.
However, the work also introduced the bug in the following scenario:
- assume interface eth0 has address 1.2.3.4, eth1 has adddress 10.0.0.1.
- assume tunnel port p1 is set with remote_ip=10.0.0.5.
- after setup, 'ovs-vsctl list interface p1 status' should show the
'tunnel_egress_iface="eth1"'.
- now if the address of eth1 is change to 0 via 'ifconfig eth1 0'.
- expectedly, after change, 'ovs-vsctl list interface p1 status' should
show the 'tunnel_egress_iface="eth0"'
However, 'tunnel_egress_iface' will not be updated on current master.
This is in that, the 'netdev-vport' module corresponding to p1 does
not react to routing related changes.
To fix the bug, this commit adds a change sequence number in the route-
table module and makes netdev-vport check the sequence number for
tunnel status update.
Bug #1240626
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit can be seen as a partial revert of commit
da4a619179d (netdev: Globally track port status changes)
by adding the 'change_seq' to 'struct netdev'.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We do not have pidfiles in Windows. And we do not yet have support
for ipsec tunnels. This lets us move forward with compilation.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Previously, we tracked status changes for ofports on a per-device basis.
Each time in the main thread's loop, we would inspect every ofport
to determine whether the status had changed for corresponding devices.
This patch replaces the per-netdev change_seq with a global 'struct seq'
which tracks status change for all ports. In the average case where
ports are not constantly going up or down, this allows us to check the
sequence once per main loop and not poll any ports. In the worst case,
execution is expected to be similar to how it is currently.
In a test environment of 5000 internal ports and 50 tunnel ports with
bfd, this reduces average CPU usage of the main thread from about 40% to
about 35%.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Add member is_layer3 to struct ofport_dpif to mark layer 3 ports. Set
it to "true" for the only layer 3 port we support for now: lisp.
Additionally, prevent flooding to layer 3 ports. A later patch will
also prevent MAC learning.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>