Do not ask the kernel to transfer more opts than we really need.
When we send fds with flags, we ask the kernel to copy the whole
struct scm_fdset::opts array, as if we were sending CR_SCM_MAX_FD fds,
even if we are really transmitting only one fd.
send_fds() does not initialize the rest of the array memory, but the kernel
transmits this garbage anyway. Also, recv_msg() does not return it to userspace.
This patch stops the kernel from transmitting uninitialized garbage.
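The idea can be sketched as sizing the iovec by the number of fds actually sent rather than by the full array. This is only an illustration: struct sketch_fdset, struct sketch_fd_opts, SKETCH_SCM_MAX_FD (including its value) and sketch_set_opts_iov() are hypothetical stand-ins, not the real CRIU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-in for CR_SCM_MAX_FD; the value is an assumption. */
#define SKETCH_SCM_MAX_FD 252

/* Hypothetical per-fd options; the real layout may differ. */
struct sketch_fd_opts {
	char flags;
	int owner;
};

/* Hypothetical stand-in for struct scm_fdset, reduced to the opts array. */
struct sketch_fdset {
	struct sketch_fd_opts opts[SKETCH_SCM_MAX_FD];
};

/*
 * Point the iovec at the first nr_fds entries only, so that the kernel
 * copies nr_fds * sizeof(opts[0]) bytes instead of the whole array.
 */
void sketch_set_opts_iov(struct iovec *iov, struct sketch_fdset *fdset, int nr_fds)
{
	assert(nr_fds > 0 && nr_fds <= SKETCH_SCM_MAX_FD);
	iov->iov_base = fdset->opts;
	iov->iov_len = (size_t)nr_fds * sizeof(fdset->opts[0]);
}
```

With one fd the iovec covers a single entry, so the uninitialized tail of the array never reaches sendmsg().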
travis-ci: success for pie: Optimize send_fds() and recv_fds() with opts
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The need to mess with binfmt_misc super-blocks exists only
in the OpenVZ kernel and troubles all the other users. So make
this code compiled out by default.
In VZ builds, BINFMT_MISC_VIRTUALIZED should be put into
the .config file before running make.
https://github.com/xemul/criu/issues/235
travis-ci: success for Don't compile in binfmt_misc dumping code by default (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
In this file one can add options with which to build CRIU.
Each line is (for now) expanded into CONFIG_$(TEXT) macros
defined in config.h that can be tested later in the code.
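A minimal sketch of how such a macro could be tested in the code, assuming a .config line BINFMT_MISC_VIRTUALIZED becomes CONFIG_BINFMT_MISC_VIRTUALIZED in config.h. The function name is hypothetical, and the generated config.h is not included here, so the sketch compiles in the default (option off) state.

```c
#include <stdbool.h>

/*
 * In a real build this would be preceded by an include of the generated
 * config.h; it is left out so that the sketch stands alone.
 */

/* Hypothetical helper: reports whether the optional code was compiled in. */
bool binfmt_misc_dump_compiled_in_sketch(void)
{
#ifdef CONFIG_BINFMT_MISC_VIRTUALIZED
	return true;
#else
	return false;
#endif
}
```

Building with -DCONFIG_BINFMT_MISC_VIRTUALIZED mimics what a .config entry would do.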
v2: Add .config to .gitignore
v3: Don't check that make mrproper removes .config
https://github.com/xemul/criu/issues/235
travis-ci: success for Don't compile in binfmt_misc dumping code by default (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
When pages are swapped out we can't detect their presence
with mincore.
Pavel found that lseek(SEEK_DATA, SEEK_HOLE) can show which
pages are used.
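A rough sketch of the lseek() walk, not the actual shmem dumping code: count_data_bytes_sketch() is a hypothetical helper that sums the data extents of a file descriptor. The fallback SEEK_DATA/SEEK_HOLE values are the Linux ones and are only used if the libc headers do not provide them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only when the libc headers lack them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Walk the file as a sequence of [data, hole) extents and return the
 * total number of bytes that are backed by data, or -1 on error.
 */
off_t count_data_bytes_sketch(int fd)
{
	off_t total = 0, data, hole = 0;

	for (;;) {
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break; /* no data at or after this offset */
			return -1;
		}
		/* There is always an implicit hole at the end of the file. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		total += hole - data;
	}
	return total;
}
```

Unlike mincore(), this walk does not depend on the pages being resident in memory.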
travis-ci: success for shmem: use lseek(SEEK_DATA) instead of mincore
Cc: Eugene Batalov <eabatalov89@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Looks-good-to: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It is required in order not to dump the content of the root mount in dump_one_fd().
travis-ci: success for Fix a few issues to dump/restore Docker containers with userns
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Superblock flags can be changed only by an owner of the global CAP_SYS_ADMIN.
But it is possible to mount tmpfs with any flags.
travis-ci: success for Fix a few issues to dump/restore Docker containers with userns
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for Fix a few issues to dump/restore Docker containers with userns
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Part-of: Fix a few issues to dump/restore Docker containers with userns
travis-ci: success for Fix a few issues to dump/restore Docker containers with userns
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This adds the description of --external option for all the supported
cases, both for dump and restore.
References: https://criu.org/CLI/opt/--external
travis-ci: success for Add/fix description of --external and --inherit-fd
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
First, minor/major are separated by a slash, not a semicolon.
Second, use NAME not VAL.
travis-ci: success for Add/fix description of --external and --inherit-fd
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Add that --inherit-fd may also accept a file_path argument, such as
in this example from the wiki (see
https://criu.org/Inheriting_FDs_on_restore#Regular_files):
$ ./test.sh > /tmp/old &
<pid>
$ sudo criu dump -j -t <pid>
$ sudo criu restore -d -j --inherit-fd 'fd[7]:tmp/old' 7> /tmp/new
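To make the 'fd[7]:tmp/old' form concrete, here is a small parsing sketch. It is not CRIU's own option parser; parse_inherit_fd_sketch() is a hypothetical helper that splits the argument into the descriptor number and the part after the colon.

```c
#include <stdio.h>

/*
 * Split "fd[N]:KEY" into N and KEY. On success *key points into arg.
 * Returns 0 on success, -1 if the argument does not have that shape.
 */
int parse_inherit_fd_sketch(const char *arg, int *fd, const char **key)
{
	int consumed = 0;

	/* %n is stored only if everything before it matched, "]:" included. */
	if (sscanf(arg, "fd[%d]:%n", fd, &consumed) != 1 || consumed == 0)
		return -1;
	if (*fd < 0 || arg[consumed] == '\0')
		return -1; /* negative descriptor or empty key */
	*key = arg + consumed;
	return 0;
}
```

For the example above this yields descriptor 7 and the key tmp/old.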
travis-ci: success for Add/fix description of --external and --inherit-fd
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This patch describes the correct syntax of --inherit-fd.
travis-ci: success for Add/fix description of --external and --inherit-fd
CC: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Remove the following options (obsoleted by --external):
--ext-unix-sk
--veth-pair
--ext-mount-map
--enable-external-masters
--enable-external-sharing
travis-ci: success for Add/fix description of --external and --inherit-fd
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If userns_restore_one_link() is called outside of usernsd,
it switches into the criu namespace and switches back before exiting.
v2: get rid of the include of linux/net_namespace.h in criu/include/net.h,
as well as the associated defines and feature checks
travis-ci: success for net: simplify restore of macvlan-s (rev2)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It's like switch_ns, but it gets a namespace file descriptor instead of pid.
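A sketch of what such a helper might look like, built on setns(2). The names switch_ns_by_fd_sketch() and restore_ns_sketch() and the self_path parameter are hypothetical and do not claim to match the real signature. Actually switching namespaces needs the appropriate privileges.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Enter the namespace referred to by nsfd. If rst is not NULL, first
 * open the caller's current namespace (e.g. "/proc/self/ns/net") and
 * hand that descriptor back so the caller can return later.
 */
int switch_ns_by_fd_sketch(int nsfd, int nstype, const char *self_path, int *rst)
{
	int old = -1;

	if (rst) {
		old = open(self_path, O_RDONLY | O_CLOEXEC);
		if (old < 0)
			return -1;
	}
	if (setns(nsfd, nstype) < 0) {
		if (old >= 0)
			close(old);
		return -1;
	}
	if (rst)
		*rst = old;
	return 0;
}

/* Go back to the namespace saved by switch_ns_by_fd_sketch(). */
int restore_ns_sketch(int rst, int nstype)
{
	int ret = setns(rst, nstype);

	close(rst);
	return ret;
}
```

A caller would pair the two calls around the work that has to happen in the other namespace.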
travis-ci: success for net: simplify restore of macvlan-s (rev2)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for zdtm/cr_veth: use the --clean alias of the cleanup action
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The syntax for --ext-mount-map auto is
--external mnt[]{:ms}
where optional 'm' means --enable-external-masters and optional
's' means --enable-external-sharing.
travis-ci: success for mnt: Deprecate --ext-mount-map for --external
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Make --external support --ext-mount-map. The syntax is
--ext-mount-map KEY:VAL == --external mnt[KEY]:VAL
Old option is kept for backward compatibility.
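The mapping between the two spellings can be sketched as a string translation. This is an illustration only, not the option handling in CRIU: ext_mount_map_to_external_sketch() is a hypothetical helper, and splitting at the first colon is an assumption. It also covers the "auto" case, which maps to mnt[].

```c
#include <stdio.h>
#include <string.h>

/*
 * Translate an --ext-mount-map argument into the --external form:
 *   "KEY:VAL" -> "mnt[KEY]:VAL"
 *   "auto"    -> "mnt[]"
 * Returns 0 on success, -1 on a malformed argument or a short buffer.
 */
int ext_mount_map_to_external_sketch(const char *old_arg, char *out, size_t len)
{
	const char *sep;
	int n;

	if (strcmp(old_arg, "auto") == 0) {
		n = snprintf(out, len, "mnt[]");
	} else {
		/* Assumption: the key ends at the first colon. */
		sep = strchr(old_arg, ':');
		if (!sep || sep == old_arg || sep[1] == '\0')
			return -1;
		n = snprintf(out, len, "mnt[%.*s]:%s",
			     (int)(sep - old_arg), old_arg, sep + 1);
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```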
travis-ci: success for mnt: Deprecate --ext-mount-map for --external
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Note, this depends on Pavel's patch here:
https://lists.openvz.org/pipermail/criu/2016-October/032499.html which is
not yet applied.
travis-ci: success for test: use .pid.inprogress file for macvlan test (rev2)
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Inspired by Tycho's macvlan test, here's the same thing for
--external veth option. In master we still have the --veth-pair
one, but the plan is to move this all under the --external opt.
v2:
* Travis doesn't have /usr/bin/sed
* Added .checkskip hook for older environments
v3:
* Delete bridge hanging around after previous flavor
* Wait for host veth end to die after dump
v4:
* Get the pid of task to move veth into from .pid.inprogress file
v5:
* Wait for host veth end to die after test stop too :\
Travised-by: https://travis-ci.org/xemul/criu/builds/170726663
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Note that this test doesn't run in uns mode, even though we have support
for that. Without a full container engine, I couldn't think of a nice way
to pass a macvlan device into the zdtm "container" when in UNS mode.
v2: use the nsid_manip feature flag
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This header was only introduced in 2015, so we need to build without it.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
While this is in principle similar to how veths are handled, we have to do
things in two different ways depending on whether or not there is a user
namespace involved, because there is no way to ask the kernel to attach a
macvlan NIC to a device in a net ns that we don't have CAP_NET_ADMIN in.
So we do it in two ways:
a. If we are in a user namespace, we create the device in usernsd and use
IFLA_NET_NS_FD to set the netns which it should be created in (saving
us a "move into this netns" step).
b. If we aren't in a user namespace, we could still be in a net namespace,
so we use IFLA_LINK_NETNSID to set the namespace that the i/o device will be
in. Then we open a netlink socket from criu's netns and use
IFLA_NET_NS_FD to tell the kernel to create the macvlan device in the
target's namespace.
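The attributes involved in the two cases can be sketched by filling in an RTM_NEWLINK request. This is a simplified illustration and not the code from this patch: the struct and function names are hypothetical, the nested IFLA_LINKINFO part that selects the macvlan kind is omitted, and nothing is sent to the kernel. A negative link_netnsid stands for case (a), where IFLA_LINK_NETNSID is not needed.

```c
#include <sys/socket.h>
#include <linux/rtnetlink.h>
#include <string.h>

/* Hypothetical request buffer: headers plus room for attributes. */
struct newlink_req_sketch {
	struct nlmsghdr h;
	struct ifinfomsg i;
	char buf[256];
};

/* Append one attribute to the message, keeping netlink alignment. */
static int addattr_sketch(struct nlmsghdr *n, size_t maxlen, int type,
			  const void *data, size_t alen)
{
	size_t len = RTA_LENGTH(alen);
	struct rtattr *rta;

	if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
		return -1;
	rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = len;
	memcpy(RTA_DATA(rta), data, alen);
	n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
	return 0;
}

/*
 * lower_ifindex: index of the device the macvlan sits on (IFLA_LINK)
 * target_ns_fd:  netns the new device is created in (IFLA_NET_NS_FD)
 * link_netnsid:  nsid of the lower device's netns, or < 0 to skip it
 */
int populate_macvlan_req_sketch(struct newlink_req_sketch *req,
				int lower_ifindex, int target_ns_fd,
				int link_netnsid)
{
	memset(req, 0, sizeof(*req));
	req->h.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->h.nlmsg_type = RTM_NEWLINK;
	req->h.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
	req->i.ifi_family = AF_UNSPEC;

	if (addattr_sketch(&req->h, sizeof(*req), IFLA_LINK,
			   &lower_ifindex, sizeof(lower_ifindex)))
		return -1;
	if (addattr_sketch(&req->h, sizeof(*req), IFLA_NET_NS_FD,
			   &target_ns_fd, sizeof(target_ns_fd)))
		return -1;
	if (link_netnsid >= 0 &&
	    addattr_sketch(&req->h, sizeof(*req), IFLA_LINK_NETNSID,
			   &link_netnsid, sizeof(link_netnsid)))
		return -1;
	return 0;
}
```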
v2: * s/CLONE_NEWNET/CLONE_NEWUSER
* Don't bother to dump IFLA_LINK and IFLA_LINK_NETNSID. Although we
need to provide these on restore, there's no kernel interface that
persists these. To populate IFLA_LINK, we require users pass
--macvlan-pair, and we create a NETNSID relation as needed and pass
that in for macvlan links (although this infrastructure could be used
elsewhere for links that need it in the future, since it is in the
hoisted populate_newlink_req()).
* use new external command instead of creating a --macvlan-pair option
v3: add a feature check for linux/net_namespace.h, since not every arch in
travis has this (new-ish) header
v4: * include sys/types.h instead of linux/if.h to get IFF_UP flag
* remove old doc addition about --macvlan-pair option
v5: define IFLA_LINK_NETNSID and RTM_NEWNSID if they don't exist
v6: define IFLA_MACVLAN_FLAGS and bump the size of IFLA_MACVLAN_MAX when
necessary
v7: * remove unused struct macvlan_pair
* split feature test for linux/net_namespace.h into separate patch
* move IFLA_INFO_MAX testing in dump_one_netdev to the right patch
* add documentation for newlink_extras fields
* split changeflags into separate patch
* use existing netnsid if we get EEXIST
* move macvlan code to a helper function
* use netnsid to restore in userns case, and not pid
v8: * define RTM_GETNSID since we use that too now :)
* don't bother with IFLA_MACVLAN_MAX; we only understand things up to
IFLA_MACVLAN_FLAGS, so let's just use that as our max instead. The
problem with using macros here is that IFLA_MACVLAN_MAX is defined as
a macro with an enum expansion in it, so we get bitten by the enum
not being available at preprocessing time, and implicit zero coercion
when testing against its value for stuff. Yeesh.
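The preprocessor pitfall described in v8 can be reproduced with a toy enum. The names below are made up for the illustration and only mirror the "macro that expands to an enum expression" pattern; they are not the kernel's definitions.

```c
/* Toy attribute list in the same style: the last enumerator marks the end. */
enum sketch_attr {
	SKETCH_ATTR_UNSPEC,
	SKETCH_ATTR_MODE,
	SKETCH_ATTR_FLAGS,
	SKETCH_ATTR_END
};
#define SKETCH_ATTR_MAX (SKETCH_ATTR_END - 1)

/*
 * In #if the enumerator is an unknown identifier and is replaced by 0,
 * so the preprocessor evaluates (0 - 1) < 2 and takes this branch even
 * though the compiler later sees SKETCH_ATTR_MAX as 2.
 */
#if SKETCH_ATTR_MAX < 2
#define SKETCH_PP_THINKS_TOO_SMALL 1
#else
#define SKETCH_PP_THINKS_TOO_SMALL 0
#endif

/* What the preprocessor concluded about "MAX < 2". */
int preprocessor_thinks_max_too_small(void)
{
	return SKETCH_PP_THINKS_TOO_SMALL;
}

/* What the compiler computes for the same macro. */
int compiler_sees_max(void)
{
	return SKETCH_ATTR_MAX;
}
```

A check written this way silently takes the wrong branch, which is why using a fixed known attribute as the upper bound is the simpler route.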
v10: * add some comments about when we set up NET_NS_FD and why we use
IFLA_LINK and IFLA_NET_NS_ID
* use the socket opened in restore_links() instead of opening one in
restore_one_macvlan()
* split the new argument to restore_one_link into its own patch
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We'll use this socket to restore macvlan interfaces.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We'll use this struct in the next patch to set some top level IFLA_ members
that we need for restoring macvlan devices.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We'll use this in the next patch to set some macvlan flags.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We'll use this in the next patch to find the ifindex for a macvlan bridge
in the host's net ns.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
For macvlan we need to restore in different ways depending on whether we're
inside or outside a user namespace. We want to share the code that does the
building of the base request, so let's split it out into a populate()
function.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We'll use this later in the series to get specific information that macvlan
links need.
v2: pass the IFLA_LINKINFO instead of the whole attribute buffer, since
that's all we expect the info functions to need, and all we allow
them to populate on restore
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
0e869bf82f30ff6bce3d7cdc66779d8b642c82af introduced this bug, which chops
off the last character of the external veth name, and then subsequent
move_veth_to_bridge() calls fail:
(01.012478) Error (criu/net.c:1758): Can't get index of veth69A67O: No such device
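The faulty line itself is not quoted here, so the following only illustrates one common way such a bug arises, under the assumption that a name of known length is copied out of a longer string. Both helpers and the "veth0:br0" input are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Off by one: snprintf() counts the terminating NUL in its size
 * argument, so passing name_len keeps only name_len - 1 characters.
 * Expects name_len >= 1.
 */
void copy_name_buggy_sketch(char *dst, const char *src, size_t name_len)
{
	snprintf(dst, name_len, "%s", src);
}

/* Correct: copy name_len characters and terminate explicitly. */
int copy_name_fixed_sketch(char *dst, size_t dst_size, const char *src,
			   size_t name_len)
{
	if (name_len + 1 > dst_size)
		return -1;
	memcpy(dst, src, name_len);
	dst[name_len] = '\0';
	return 0;
}
```

Copying the first 5 characters of "veth0:br0" gives "veth" with the buggy helper and "veth0" with the fixed one, which matches the kind of lookup failure in the log line above.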
travis-ci: success for veth: fix off by one error
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The latter option should be hidden; the official API is --external.
This patch tests the option, thus completing the deprecation.
The legacy -x|--ext-unix-sk test is still in zdtm/static/socket-ext.
travis-ci: success for unix: Test the --external instead of --ext-unix-sk
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
As it should be built anyway, it will not increase build time
significantly.
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We're using the old cgns prefix length in allocating dirnew, so let's not
update it before that.
travis-ci: success for cgroup: update cgns prefix *after* using it
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The pid is set on ctl and in the next patch I'll use one
more field.
travis-ci: success for Don't get task regs twice
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
It doesn't matter much how late we check this, but in the
new place we already have the parasite_ctl that I will need in the
next patch.
travis-ci: success for Don't get task regs twice
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>