VPORT_F_TUN_ID is the last remaining flag; once we remove it, the flags
field can be removed from vport-ops. Since removing it does not complicate
much code, we decided to remove this flag.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The vport->init() and vport->exit() functions are defined only by the gre
and netdev vports, and both can be moved to the first port create.
The following patch does that: it moves vport init to the respective vport
create and gets rid of the vport->init() and vport->exit() functions.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The only user is get_dpifindex(), no need to redirect via the port
operations.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Currently OVS uses a combination of the genl and rtnl locks to protect
datapath state. This was done because of networking stack locking,
but it has complicated locking, and there are a few lock ordering
issues with new tunneling protocols.
The following patch simplifies locking by introducing a new ovs mutex,
which is now used to protect the entire ovs state.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Header caching used to store a precomputed flow along with the skb
but no longer exists. There were a few remaining checks for those
flows, which this removes. It simplifies the code slightly and brings
us closer to upstream.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Use strlcpy where possible to ensure the string is \0 terminated.
Always use sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN
and custom defines.
Use snprintf instead of sprintf.
Remove unnecessary inits of ->fw_version
Remove unnecessary inits of drvinfo struct.
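
As a rough illustration of the bounded-copy pattern described above (not
part of the patch), the following userspace sketch fills fixed-size fields
using sizeof(field) rather than a literal length. The struct and function
names are hypothetical stand-ins, and snprintf() is used in place of the
kernel's strlcpy() so that the sketch builds with a plain C library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct ethtool_drvinfo. */
struct drvinfo_sketch {
	char driver[32];
	char bus_info[32];
};

/* Userspace substitute for strlcpy(): writes at most size - 1 bytes of
 * src and always NUL-terminates dst. */
static void copy_bounded(char *dst, const char *src, size_t size)
{
	assert(size > 0);
	snprintf(dst, size, "%s", src);
}

static void fill_drvinfo(struct drvinfo_sketch *info, const char *driver,
			 int port)
{
	/* sizeof(field) follows the array size, unlike a hard-coded 32. */
	copy_bounded(info->driver, driver, sizeof(info->driver));
	/* snprintf() is bounded; sprintf() would not be. */
	snprintf(info->bus_info, sizeof(info->bus_info), "port-%d", port);
}
```

An over-long driver name is truncated to fit, and the field stays
NUL-terminated even if the struct was not zeroed beforehand.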
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
The ability to retrieve and set MAC addresses on vports is only
necessary for tunnel ports (the addresses for actual devices can be
retrieved through direct Linux mechanisms). Tunnel ports only used
the information for the purpose of generating path MTU discovery
packets, which has now been removed. Current userspace code already
reflects these changes, so this drops the functionality from the
kernel.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Currently brcompat does not work on master due to recent
datapath changes. We have decided to remove it as it is
not used very widely.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
bonus: if we ever are to use IFF_LIVE_ADDR_CHANGE for
anything further than to check availability in eth_mac_addr(),
Open vSwitch will be ready for that.
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
It's possible that packets that are sent on internal devices (from
the OVS perspective) have already traversed the local IP stack.
After they go through the internal device, they will again travel
through the IP stack which may get confused by the presence of
existing information in the skb. The problem can be observed
when switching between namespaces. This clears out that information
to avoid problems but deliberately leaves other metadata alone.
This is to provide maximum flexibility in chaining together OVS
and other Linux components.
Bug #10995
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Use eth_hw_addr_random() instead of calling random_ether_addr()
to set addr_assign_type correctly to NET_ADDR_RANDOM.
Reset the state to NET_ADDR_PERM as soon as the MAC gets
changed via .ndo_set_mac_address.
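
The state transition described above can be modelled in userspace roughly
as follows. This is only a sketch: the struct, enum and function names are
hypothetical, rand() stands in for the kernel's random source, and the real
helpers operate on struct net_device.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Hypothetical mirror of the kernel's NET_ADDR_PERM / NET_ADDR_RANDOM. */
enum addr_assign_sketch { ADDR_PERM_SKETCH, ADDR_RANDOM_SKETCH };

struct netdev_sketch {
	uint8_t dev_addr[ETH_ALEN_SKETCH];
	enum addr_assign_sketch addr_assign_type;
};

/* Models the random-address helper: pick a random unicast, locally
 * administered address and record that it was randomly assigned. */
static void hw_addr_random_sketch(struct netdev_sketch *dev)
{
	int i;

	for (i = 0; i < ETH_ALEN_SKETCH; i++)
		dev->dev_addr[i] = (uint8_t)(rand() & 0xff);
	dev->dev_addr[0] &= 0xfe; /* clear the multicast bit */
	dev->dev_addr[0] |= 0x02; /* set the locally administered bit */
	dev->addr_assign_type = ADDR_RANDOM_SKETCH;
}

/* Models the set-MAC handler: an explicitly configured address is no
 * longer considered random. */
static void set_mac_address_sketch(struct netdev_sketch *dev,
				   const uint8_t *addr)
{
	assert(addr != NULL);
	memcpy(dev->dev_addr, addr, ETH_ALEN_SKETCH);
	dev->addr_assign_type = ADDR_PERM_SKETCH;
}
```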
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
[jesse: add backporting to older kernels]
Signed-off-by: Jesse Gross <jesse@nicira.com>
The following patch adds support for Linux network namespaces. Now we can
have an independent OVS instance in each net-ns.
Namespace support requires a 2.6.32 or newer kernel, as the per-net-ns
genl-sock is not available in earlier kernels.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7821
OVS has quite a few global symbols that should be scoped with a
prefix to prevent collisions with other modules in the kernel.
Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Many of our kernel copyright messages make reference to code being
copied from the Linux kernel, which is a bit odd for code in the
kernel. This changes them to use the standard GNU GPL boilerplate
instead. It does not change the actual license, which continues to
be GPLv2.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Starting with the 2.6.39 kernel, netdev features are set using the
set_features and fix_features APIs. Since internal-dev does not need any
special checks when setting features, there is no need to define
set_features or fix_features. Only hw_features needs to be set to the
features that are supported by internal-dev.
The following patch does that and drops the discrete offload setting ops on
newer kernels.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7772
We currently set netdev->flags to IFF_BROADCAST | IFF_MULTICAST
but this is unnecessary because it's already done by ether_setup().
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Linux 3.1 adds a flag to check whether it's OK for shared skbs to
be transmitted on devices. This generally isn't a problem for
hardware devices but software devices such as OVS that hold state
in the skb need to clear the flag, which is enabled by default.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Linux 3.1 drops the symbol HAVE_NET_DEVICE_OPS that lets us know
whether struct net_device_ops is present. As a result, we need to
replace it with an explicit version check.
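
A version check of this kind might look roughly like the sketch below. The
macro SKETCH_HAVE_NET_DEVICE_OPS is hypothetical, the 2.6.29 threshold is
an assumption about when the ops structure appeared, and the fallback
definitions exist only so that the sketch compiles outside a kernel tree
(in the kernel they come from <linux/version.h>).

```c
#include <assert.h>

/* Fallbacks for building outside a kernel tree. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(3, 1, 0) /* assumed target */
#endif

static_assert(KERNEL_VERSION(2, 6, 29) < KERNEL_VERSION(3, 1, 0),
	      "version codes must order numerically");

/* Explicit check instead of relying on a symbol the kernel exports. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 29)
#define SKETCH_HAVE_NET_DEVICE_OPS 1
#else
#define SKETCH_HAVE_NET_DEVICE_OPS 0
#endif

static int have_net_device_ops(void)
{
	return SKETCH_HAVE_NET_DEVICE_OPS;
}
```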
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The following patch removes RT kernel support. This allows us to clean up
the loop detection.
Along with this, BH is now disabled while running execute_actions()
for packets from userspace.
As a result we can simplify the stats code, as the entire send and receive
path runs in BH context on all supported platforms.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7621
With CONFIG_PREEMPT_RCU, an RCU grace period waits only for RCU
read-side critical sections that are delimited by rcu_read_lock() and
rcu_read_unlock(). internal_dev_xmit() is called in
rcu_read_lock_bh context. Therefore we need to explicitly take the RCU
read lock to prevent a race with call_rcu() in the PREEMPT_RCU case.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Currently ovs uses device stats for Linux devices and counts packets
itself in other situations. This leads to overlap with hardware stats,
inconsistencies, etc. It's much better to just always count the packets
flowing through the switch and let userspace do any merging that it wants.
The following patch removes the vport->get_stats() interface. vport-stat is
changed to use the new `struct ovs_vport_stat` rather than rtnl_link_stats64.
The definition of rtnl_link_stats64 is removed from OVS. dpif_port->stat is
also removed, as aggregate stats are only available at the netdev layer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Currently the kernel automatically sets the MTU of any internal
interfaces to the minimum of all attached interfaces because the Linux
bridge does this. Userspace can do this with more knowledge and
flexibility.
Feature #7323
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Currently OVS uses its own hashing implementation for hash tables,
which has some problems, e.g. an error case in the deletion code.
The following patch replaces that with an hlist-based hash table, which is
consistent with other kernel hash tables. As Jesse suggested, flex-array
is used for allocating hash buckets, so that we can have a large
hash table without large contiguous kernel memory.
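
The idea can be sketched in userspace as a chained hash table whose bucket
array is split into fixed-size parts, so that no single allocation grows
with the table size. This is not the kernel's hlist or flex-array code: all
names below are hypothetical, the part size is arbitrary, and the key is
used directly as the hash.

```c
#include <assert.h>
#include <stdlib.h>

#define BUCKETS_PER_PART 64 /* hypothetical part size */

/* Intrusive list node; a caller would embed this in its own object. */
struct node_sketch {
	struct node_sketch *next;
	unsigned int key;
};

struct table_sketch {
	unsigned int n_buckets;      /* must be a power of two */
	unsigned int n_parts;
	struct node_sketch ***parts; /* each part: BUCKETS_PER_PART heads */
};

/* Frees the bucket storage only; nodes belong to the caller. */
static void table_destroy(struct table_sketch *t)
{
	unsigned int i;

	if (!t)
		return;
	if (t->parts)
		for (i = 0; i < t->n_parts; i++)
			free(t->parts[i]);
	free(t->parts);
	free(t);
}

static struct table_sketch *table_create(unsigned int n_buckets)
{
	struct table_sketch *t;
	unsigned int i;

	assert(n_buckets && (n_buckets & (n_buckets - 1)) == 0);
	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->n_buckets = n_buckets;
	t->n_parts = (n_buckets + BUCKETS_PER_PART - 1) / BUCKETS_PER_PART;
	t->parts = calloc(t->n_parts, sizeof(*t->parts));
	if (!t->parts) {
		table_destroy(t);
		return NULL;
	}
	for (i = 0; i < t->n_parts; i++) {
		t->parts[i] = calloc(BUCKETS_PER_PART, sizeof(**t->parts));
		if (!t->parts[i]) {
			table_destroy(t);
			return NULL;
		}
	}
	return t;
}

static struct node_sketch **find_bucket(struct table_sketch *t,
					unsigned int hash)
{
	unsigned int idx = hash & (t->n_buckets - 1);

	return &t->parts[idx / BUCKETS_PER_PART][idx % BUCKETS_PER_PART];
}

static void table_insert(struct table_sketch *t, struct node_sketch *node)
{
	struct node_sketch **head = find_bucket(t, node->key);

	node->next = *head;
	*head = node;
}

static struct node_sketch *table_lookup(struct table_sketch *t,
					unsigned int key)
{
	struct node_sketch *n;

	for (n = *find_bucket(t, key); n; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

/* Unlinks node; returns 1 if it was in the table, 0 otherwise. */
static int table_remove(struct table_sketch *t, struct node_sketch *node)
{
	struct node_sketch **pp;

	for (pp = find_bucket(t, node->key); *pp; pp = &(*pp)->next) {
		if (*pp == node) {
			*pp = node->next;
			return 1;
		}
	}
	return 0;
}
```

Because the nodes are intrusive, insertion and removal only relink
pointers and need no allocation of their own.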
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Remove iflink from the vport interface. iflink is not used anywhere in
OVS, so there is no need to have iflink as a vport attribute.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The prefix "ODP_*" is not overly descriptive in the context of the
larger Linux tree. This commit changes the prefix to "OVS_*" for the
userspace to kernel interactions. The userspace libraries still use
"ODP_" in many of their interfaces since it is more descriptive in the
OVS oeuvre.
Feature #6904
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The internal dev vport really needs hardirq.h but doesn't depend
directly on it and has relied on it being included from other sources.
Recent kernels broke this, so explicitly add the header.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Older kernels (those before 2.6.22) rely on implicit assumptions
to determine checksum offloading status. These assumptions tend
to break down when doing switching because it sits in the middle
of the transmit and receive path. Newer kernels deal with this
problem by adding more explicit information about how to checksum.
This replicates that behavior by mirroring the state from newer
kernels in private OVS storage on the kernels that lack it. On
ingress and egress we then map that state onto the appropriate
location for the given kernel and can consistently manipulate it
within OVS. Some of this was already done for the checksum type
but this makes it more robust and expands it to the checksum start
and offset as well.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Although it is generally best to configure vlans directly through
Open vSwitch, enabling vlan acceleration on internal devices can
avoid some issues and hardware limitations if Linux vlan devices
are used. It is only used on kernels that support modern vlan
data structures, which are 2.6.27 and later.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We currently call vport_free() for internal devices after the
device is unregistered. This takes care of callers that use
either RTNL or RCU but not ones that have only a device reference.
In particular, if stats are requested while a datapath is being
unregistered we can try to use the vport data structures which
have already been freed.
Bug #4736
Reported-by: Brad Hall <brad@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Using the kernel vlan acceleration has a number of benefits:
it enables hardware tagging, allows usage of TSO and checksum
offloading, and is generally easier to manipulate. This switches
the vlan actions to use skb->vlan_tci field for any necessary
changes. In places that do not support vlan acceleration in a way
that we can use (in particular kernels before 2.6.37) we perform
any necessary conversions, such as tagging and GSO before the
packet leaves Open vSwitch.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Kernels prior to 2.6.27 did not have a vlan_tci field in struct
sk_buff for vlan acceleration. It's very convenient to use this
field for manipulating vlan tags, so we would like to use it as
the primary mechanism. To enable this, this commit adds similar
infrastructure to the OVS_CB on the kernels that need it and a
set of functions to use the correct location.
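
The accessor approach can be sketched as follows: one pair of functions
hides whether the tag lives in the packet structure itself or in private
control-block storage. Everything here is a hypothetical userspace model
(struct names, the SKETCH_HAVE_SKB_VLAN_TCI switch and the _sketch
functions), not the actual OVS or kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set to 1 to model a kernel whose packet struct has its own field. */
#ifndef SKETCH_HAVE_SKB_VLAN_TCI
#define SKETCH_HAVE_SKB_VLAN_TCI 0
#endif

/* Models the private per-packet storage (the OVS_CB role). */
struct cb_sketch {
	uint16_t vlan_tci;
};

/* Models struct sk_buff. */
struct skb_sketch {
#if SKETCH_HAVE_SKB_VLAN_TCI
	uint16_t vlan_tci;
#endif
	struct cb_sketch cb;
};

/* The only place that knows where the tag is actually stored. */
static uint16_t *vlan_tci_location_sketch(struct skb_sketch *skb)
{
	assert(skb != NULL);
#if SKETCH_HAVE_SKB_VLAN_TCI
	return &skb->vlan_tci;
#else
	return &skb->cb.vlan_tci;
#endif
}

static void vlan_set_tci_sketch(struct skb_sketch *skb, uint16_t tci)
{
	*vlan_tci_location_sketch(skb) = tci;
}

static uint16_t vlan_get_tci_sketch(struct skb_sketch *skb)
{
	return *vlan_tci_location_sketch(skb);
}
```

Callers use only the get and set functions, so the same code works with
either storage location.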
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The vport_mutex really only protects the vport dev_table, which isn't very
much. By getting rid of it we take one step toward simplifying the vswitch
locking, which will necessarily have to be based mainly around the Generic
Netlink genl_mutex once we switch to Generic Netlink.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
I plan to make the vport type part of the standard header stuck on each
Netlink message related to a vport. As such, it is more convenient to use
an integer than a string. In addition, by being fundamentally different
from strings, using an integer may reduce the confusion we've had in the
past over the differences in userspace and kernel names for network device
and vport types.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
I introduced this a long time ago as an efficient way for userspace to find
out whether and where an internal device was attached, but I've always
considered it an ugly kluge. Now that ODP_VPORT_QUERY can fetch a vport's
info regardless of datapath, it is no longer necessary. This commit
stops using Ethtool for this purpose and drops the feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
unregister_netdevice() contains a call to synchronize_rcu(), so there
is no need to directly call it ourselves immediately beforehand.
We were relying on the call during unregistration anyway to stop
packets from being transmitted on the device, so our version was
both misleading and had a performance penalty.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The vports are now attached and ready to go when they are allocated,
so we don't have to worry about future changes. As a result, we can
directly store the pointer in the internal dev's netdevice private
space before it is registered. The registration process will handle
the necessary write memory barriers and anyone who has a reference
to the netdev will have done the read side barriers, so we don't need
to use RCU at all.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Checksum offloading has changed quite a bit across different kernel
and Xen versions. Since it is part of the skb data structure it is
unfortunately difficult to separate out into compatibility code.
This consolidates all of the checksum code in one place which makes
it easier to read and remove as we prepare for upstreaming. On newer
kernels it also puts everything in inline functions, eliminating the
need to run through the compat code or make extra function calls.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
These steps are sequentially in lockstep, so we might as well combine them.
This expands the region over which the vport_lock is held. I didn't
carefully verify that this was OK.
This also eliminates the synchronize_rcu() call from destruction of tunnel
vports, since they didn't appear to me to need it.
It should be possible to eliminate the synchronize_rcu() from the netdev,
patch, and internal_dev vports, but this commit does not do that.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
After the previous commit, which changed the datapath to always create and
attach a vport at the same time, and to always detach and delete a vport
at the same time, there is no longer any real distinction between a dp_port
and a vport. This commit, therefore, merges the two together to simplify
code. It might even improve performance, although I have not checked.
I wasn't sure at first whether the merged structure should be "struct
dp_port" or "struct vport". I went with the latter since the "v" prefix
sounds cool.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Upcoming commits will keep needing to pass more information to the vport
'create' member function. It's annoying to have to modify a dozen pieces
of code every time just to do this, so this commit encapsulates all of the
parameters in a new struct and passes that instead.
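
A minimal sketch of the pattern, assuming hypothetical struct and field
names (the real parameter struct and 'create' signature may differ): adding
a parameter later means adding a field, without touching every create
implementation's prototype.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bundle of everything a 'create' function needs. */
struct vport_parms_sketch {
	const char *name;
	int type;
	unsigned int port_no;
};

struct vport_sketch {
	char name[16];
	int type;
	unsigned int port_no;
};

/* Takes one pointer instead of a growing argument list. Returns NULL
 * on allocation failure; the caller frees the result with free(). */
static struct vport_sketch *
vport_create_sketch(const struct vport_parms_sketch *parms)
{
	struct vport_sketch *vport;

	assert(parms != NULL && parms->name != NULL);
	vport = calloc(1, sizeof(*vport));
	if (!vport)
		return NULL;
	snprintf(vport->name, sizeof(vport->name), "%s", parms->name);
	vport->type = parms->type;
	vport->port_no = parms->port_no;
	return vport;
}
```

With designated initializers, callers that do not care about a newly added
field simply leave it zeroed.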
Acked-by: Jesse Gross <jesse@nicira.com>
We can already receive packets with a frag list due to reassembly
in CAPWAP tunneling. Since we can handle it, we might as well open
it up to internal devices as well to prevent linearization.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
dev->last_rx is used for rebalancing in Linux bonding. However,
on an SMP machine it quickly becomes a very hot cacheline. On
kernels 2.6.29 and later the networking core will update last_rx
only if bonding is in use, so drivers do not need to set it at all.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
vport_ops, tunnel_ops, and ethtool_ops should not change at runtime.
Therefore, mark them as const to keep them out of the hotpath and to
prevent them from getting trampled.
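
For illustration only, a const ops table might be declared as in the sketch
below; the struct and function names are hypothetical. Marking the table
const typically lets the toolchain place it in read-only data, and the
compiler rejects any attempt to assign to its members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced ops structure. */
struct ops_sketch {
	const char *type;
	int (*send)(int len);
};

static int internal_send_sketch(int len)
{
	return len; /* pretend every byte was sent */
}

/* const: initialized once at build time and never written afterwards. */
static const struct ops_sketch internal_ops_sketch = {
	.type = "internal",
	.send = internal_send_sketch,
};

/* Users hold a pointer-to-const, so they cannot modify the table. */
static int vport_send_sketch(const struct ops_sketch *ops, int len)
{
	assert(ops != NULL && ops->send != NULL);
	return ops->send(len);
}
```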
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We currently call skb_reset_mac_header() in a few places when a
packet is received. However, this is not needed because flow_extract()
will set all of the protocol headers during parsing and nothing needs
the packet headers before that time.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When transmitting on a device, dev_hard_start_xmit() always provides
a private clone. The skb_share_check() in internal_dev_xmit() is
therefore unnecessary, so remove it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>