Until now, tunnel vports have had a specific MTU, in the same way that
ordinary network devices have an MTU, but treating them this way does not
always make sense. For example, consider a datapath that has three ports:
the local port, a GRE tunnel to another host, and a physical port. If
the physical port is configured with a jumbo MTU, it should be possible to
send jumbo packets across the tunnel: the tunnel can do fragmentation or
the physical port traversed by the tunnel might have a jumbo MTU.
However, until now, tunnels always had a 1500-byte MTU by default. It
could be adjusted using ODP_VPORT_MTU_SET, but nothing actually did this.
One alternative would be to make ovs-vswitchd able to set the vport's MTU.
This commit, however, takes a different approach, of dropping the concept
of MTU entirely for tunnel vports. This also solves the problem described
above, without making any additional work for anyone.
I tested that, without this change, I could not send 1600-byte "pings"
between two machines whose NICs had 2000-byte MTUs that were connected to
vswitches that were in turn connected over GRE tunnels with the default
1500-byte MTU. With this change, it worked OK, regardless of the MTU of
the network traversed by the GRE tunnel.
This patch also makes "patch" ports MTU-less.
It might make sense to remove vport_set_mtu() and the associated callback
now, since ordinary network devices are the only vports that support it
now.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Suggested-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #3728.
This gives network device implementations the opportunity to fetch an
existing device's configuration and store it as their arguments, so that
netdev clients can find out how an existing device is configured.
So far netdev-vport is the only implementation that needs to use this.
The next commit will add use by clients.
Reviewed by Justin Pettit.
This commit removes the tunnel_egress_iface column from the
interface table and moves it's data to the status column. In the
process it reverts the database to version 1.0.0.
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath. Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath. Similarly, a vport
could be detached and deleted separately.
After some study, I think I understand why this distinction exists. It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them. However, changing it to create them before it tries to
open them is not difficult, so this commit does this.
The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Until now, the collection of coverage counters supported by a given OVS
program was not specific to that program. That means that, for example,
even though ovs-dpctl does not have anything to do with mac_learning, it
still has a coverage counter for it. This is confusing, at best.
This commit fixes the problem on some systems, in particular on ones that
use GCC and the GNU linker. It uses the feature of the GNU linker
described in its manual as:
If an orphaned section's name is representable as a C identifier then
the linker will automatically see PROVIDE two symbols: __start_SECNAME
and __end_SECNAME, where SECNAME is the name of the section. These
indicate the start address and end address of the orphaned section
respectively.
Systems that don't support these features retain the earlier behavior.
This commit also fixes the annoyance that files that include coverage
counters must be listed on COVERAGE_FILES in lib/automake.mk.
This commit also fixes the annoyance that modifying any source file that
includes a coverage counter caused all programs that link against
libopenvswitch.a to relink, even programs that the source file was not
linked into. For example, modifying ofproto/ofproto.c (which includes
coverage counters) caused tests/test-aes128 to relink, even though
test-aes128 does not link again ofproto.o.
Currently netdev_get_carrier() returns both a carrier status and
an error code. However, usage of the error code was inconsistent:
most callers either ignored it or didn't perform their task if an
error occured, which prevented bond rebalancing. This makes the
handling consistent by translating an error into a down status in
the netdev library.
Bug #3959
The only real difference between netdev-patch and netdev-tunnel is in their
parse_config() implementation. That's a lot of extra code to maintain, for
questionable benefit. This commit merges them into the netdev-vport code,
which was heretofore merely a collection of helper functions.
Currently we print a warning if a user tries to configure a
netdev that is not in the list that userspace knows about.
However, it is possible that a given netdev maybe be enabled but
when it tries to create a device it finds out that it can't
(not supported by kernel module, hardware not present, etc.).
This makes the behavior the same in both cases.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Adding a macro to define the vlog module in use adds a level of
indirection, which makes it easier to change how the vlog module must be
defined. A followup commit needs to do that, so getting these widespread
changes out of the way first should make that commit easier to review.
ovs-vswitchd doesn't declare its QoS capabilities in the database yet,
so the controller has to know what they are. We can add that later.
The linux-htb QoS class has been tested to the extent that I can see that
it sets up the queues I expect when I run "tc qdisc show" and "tc class
show". I haven't tested that the effects on flows are what we expect them
to be. I am sure that there will be problems in that area that we will
have to fix.
The most recent revision of the netdev library added may_create
and may_open flags to explicitly state the intent of the caller as
to whether the device should already be in use. This was simply
a sanity check for users of the netdev library and the configuration.
At this point the netdev library and its users are well behaved and
should no longer need to be checked. Additional checks have also
been added for incorrect configuration that mean the netdev library
is no longer the primary line of defense.
These flags themselves create problems because it is not always
easy for a library to know what the state of devices should be.
This is particularly a problem for ovs-openflowd, which expects
ports to be added by ovs-dpctl. Fixing this either requires that
the checks are so permissive to be useless or ugly hacks to get
around them. Since they are no longer needed, just remove the
checks.
This commit restores the previous behavior of ovs-openflowd to
not require that ports be specified on the command line or
cleaned up after use.
Bug #2652
CC: Natasha Gude <natasha@nicira.com>
CC: Jean Tourrilhes <jt@hpl.hp.com>
CC: 蒲彦 <yan.p.bjtu@gmail.com>
Now that we have a new patch implementation, remove the veth driver
and its userspace components. Then rename 'patchnew' to 'patch'.
The new implementation is a drop-in replacement for the old one.
Add a netdev to talk to the 'patch' vport in the kenerl. Since
there is currently a 'patch' implementation using the veth driver,
this one is temporarily called 'patchnew'.
The new GRE implementation provides a complete drop in replacement
for the old Linux based implementation. Therefore, remove the
old implementation and rename "grenew" to "gre".
Allow netdev providers to set get_ifindex and get_features it
null if they would always return EOPNOTSUPP. This is particuarly
useful for virtual devices.
When receiving a change notification from rtnetlink we checked whether
a netdev of that name existed and if so tried to handle it. This also
checks that the type of the device is one handled by netdev-linux.
Add netdev_is_open(), which checks to see if a given netdev is
currently open. It will be used to assist in cleaning up old ports
that are no longer in use.
This commit introduces a new netdev type called "patch". A patch is a
pair of interfaces, in which frames sent through one of the devices
pop out of the other. This is useful for linking together datapaths.
A patch's only argument on creation is "peer", which specifies the other
side of the patch. A patch must be created in pairs, so a second netdev
must be created with the "name" and "peer" values reversed.
The current implementation is built using veth devices. Further, it's
limited to the veth devices which support configuration through sysfs.
This limits the ability to use a "patch" on 2.6.18 kernels using the
veth device we include (read: flavors of XenServer 5.5). In the not too
distant future, the implementation will be modified to use the new
kernel port abstraction introduced by Jesse Gross's forthcoming GRE
work. At that point, patch devices will work on any Linux platform
supported by OVS.
If an error occured while opening a netdev it would decrement the
refcount, even though it was never incremented. Depending on
the timing this could result in either an error message or an
assertion failure. This workaround simply always increments
the refcount before openning a device. A more complete fix
already exists in the netdev overhaul in the 'next' branch.
NIC-59
No conflicts, but lib/dpif.c needed a few changes since struct dpif's
member "class" was renamed to "dpif_class" in master since sflow was
branched off.
We only reconfigure netdevs if the arguments have changed, which
was previously detected based on a hash. This stores and compares
the full argument list to avoid any chance of missing changes due
to collisions.
We previously maintained a list of open devices inside of the
linux netdev. Since the netdev library now maintains this list,
it is better to use that list instead of our own.
Until now, fatal_signal_fork() has simply disabled all the fatal signal
callback hooks. This worked fine, because a daemon process forked only
once and the parent didn't do much before it exited.
But upcoming commits will introduce a --monitor option, which requires
processes to fork multiple times. Sometimes the parent process will fork,
then run for a while, then fork again. It's not good to disable the
hooks in the child process in such a case, because that prevents e.g.
pidfiles from being removed at the child's exit.
So this commit changes the semantics of fatal_signal_fork() to just
clearing out hooks. After hooks are cleared, new hooks can be added and
will be executed on process termination in the usual way.
This commit also introduces a cancellation callback function so that a
canceled hook can free resources.
This builds on earlier work that implemented netdev object refcounting.
However, rather than requiring explicit create and destroy calls,
these operations are now performed automatically based on the referenece
count. This is important because in certain situations it is not
possible to know whether a netdev has already been created. A
workaround existed (which looked fairly similar to this paradigm) but
introduced it's own issues. This simplifies and unifies the API.
Rather than running signal hooks directly from the actual signal
handler, simply record the fact that the signal occured and run
the hook next time around the poll loop. This allows significantly
more freedom as to what can actually be done in the signal hooks.