validate_mounts() prints ->mnt_id in hex when it reports the failure.
This complicates the understanding because this ->mnt_id is printed as
decimal elsewhere, including /proc/$pid/mountinfo.
parse_mountinfo() adds "0x" at least and this is just pr_info(), but
lets change it too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When an image of a certian type is not found, CRIU sometimes
fails, sometimes ignores this fact. I propose to ignore this
fact always and treat absent images and those containing no
objects inside (i.e. -- empty). If the latter code flow will
_need_ objects, then criu will fail later.
Why object will be explicitly required? For example, due to
restoring code reading the image with pb_read_one, w/o the
_eof suffix thus required the object to be in the image.
Another example is objects dependencies. E.g. fdinfo objects
require various files objects. So missing image files will
result in non-resolved searches later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fedora bind-mounts a part of the root mount to itself. Currently we
don't allow to mount children of a shared mount, if other mount from
this shared group are not mounted.
This patch adds an exclusion for cases, when a child has the same
group. We allow to mount a child, if wider mounts are mounted.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we connect roots of sub-namespaces to the root of the root
mount namespace. And we get problems, if the root of the root mntns is
shared, because all children of a shared mount must be propagated to
other mounts in this group.
Actually we mount tmpfs in mnt_roots and here is nothing wrong to add it
in a tree.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
And use the expression for it, it's quite short. This
makes the amount of variables in the code fit into brains.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Do paths conversions and checks step-by-step and add many comments
what we do in each step and why.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
When we check whether a submount of a mount is visible in another
mount (shared peer of the latter), we can and should use the new
issubpath helper.
Should because the used strncmp may scan beyond ct_mpnt_rpath if
its length is smaller (no checks for this in the code).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
These are constant for given m, so calculate them outside
of the loop. Also rename them to reflect what they are.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
The path lenght is zero for the "/" one and strlen(path)
for all the others. This is done so to make it possible
to use this length to get tail-paths: if path_1 starts
with path_2 and both are absolute, then
path_1 + path_length(path_2)
would give the tail of the tail of path_1 relative to
path_2 even if the path_2 is just "/".
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
A problem which is solved in this path is that some children can be
unaccessiable (unvisiable) for non-root bind-mounts
root mount point
-------------------
/ /a (shared:1)
/ /a/x
/ /a/x/y
/ /a/z
/x /b (shared:1)
/ /b/y
/b is a non-root bind-mount of /a
/y is visiable to both mounts
/z is vidiable only for /a
Before this patch we checked that the set of children is the same for
all mount in a shared group. Now we check that a visiable set of mounts
is the same for all mounts in a shared group.
Now we take the next mount in the shared group, which is wider or equal
to current and compare children between them.
Before this patch validate_shared(m) validates the m->parent mount.
Now it validates the "m" mount. So you can find following lines in the
patch:
- if (m->parent->shared_id && validate_shared(m))
+ if (m->shared_id && validate_shared(m))
We doesn't support shared mounts with different set of children.
Here is an example of such case can be created:
mount tmpfs a /a
mount --make-shared /a
mkdir /a/b
mount tmpfs b /a/b
mount --bind /a /c
In this case /c doesn't have the /b child. To support such cases,
we need to sort all shared mounts accoding with a set of children.
v2: If root is equal to "/", its len should be zero. We expect that the
last symbol in a path is not "/".
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we validate the mount tree not to have overmounts we need to
check one path to be the sub-path of another. Here's a helper for
this.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
When we create a new mntns in a userns, all inhereted mounts are marked
as locked. pivot_root() returns EINVAL if a new root is locked.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Introduced by eb214be2, the empty mnt_share list cannot
produce the list_first_entry element :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
This is for two reasons. First, validation can meet external mount
and will call plugins, which is not correct on pre-dump and actually
crashes on uninitilized plugins lists. Second, even if on pre-dump
mount tree is not "supported" this can be a temporary situation (yes,
yes, unlikely, but still).
On the other hand, it's better to fail earlier, but that's another
story.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
In case if we meet virtualized devtmpfs on dump
(which means its s_dev is different from one obtained
during mountpoints dump procedure) we should dump it
with tar help. Thus on restore it get filled from the
image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need devtmpfs as well so make it general.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
So we need to create a temporary private mount for the old root.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This reverts commit 21d1b2fdb97facf88436d083896311367e4398af.
pivot_root() can move the current root to a non-shared mount. So we are
going to create a temporary private mount in put_old.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We don't support posix mqueues at the moment
but in case if this fs is simply mounted and
not used lets proceed without errors.
In case if someone is using it we detect it
because fs won't be empty and refuse to dump.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I found this solution in the LXC code. We can open the old root, call
pivot_root(".", "."), call fchdir to the old root and call umount(".").
Now restore will not fail, if the root is read-only.
In addition it's a bit faster.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plan file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.
This will also help if (when?) we will want to do more
complex changes with images, e.g. store them all in one
file or send them directly to the network.
For now the cr_img just contains one int _fd variable.
This patch chages the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
We currently have all mouninfo-s from all mnt namespaces collected
in one big list. On dump we scan through it to find the namespaces
we need to dump.
This can be optimized by walking the list of namespaces instead.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Currently here is a bug, because when we see criu's mount namespace,
we go to the "out" mark and don't validate mounts.
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
$ cat /proc/self/mountinfo
...
1 1 0:2 / / rw - rootfs rootfs rw,size=373396k,nr_inodes=93349
...
You can see that mnt_id and parent_mnt_id are equals here.
This patch interpretes this case as a root mount in a tree.
0'th mount is rootfs, which is mounted in init_mount_tree().
We don't see it in cases when system makes chroot, because of
static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt)
...
/* mountpoints outside of chroot jail will give SEQ_SKIP on this */
err = seq_path_root(m, &mnt_path, &root, " \t\n\\");
Cc: beproject criu <beprojectcriu@gmail.com>
Cc: Christopher Covington <cov@codeaurora.org>
Reported-by: beproject criu <beprojectcriu@gmail.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
mntinfo contains mounts from all namespaces, so we can validate it only
once after collecting mounts.
v2: add a fake comment about goto
v3: add a real comment about goto
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we stript options only one of brothers, but
mount_equal() thinks that two brothers should have the same options.
Execute zdtm/live/static/mountpoints
./mountpoints --pidfile=mountpoints.pid --outfile=mountpoints.out
Dump 2737
WARNING: mountpoints returned 1 and left running for debug needs
Test: zdtm/live/static/mountpoints, Result: FAIL
==================================== ERROR ====================================
Test: zdtm/live/static/mountpoints, Namespace:
Dump log : /root/git/criu/test/dump/static/mountpoints/2737/1/dump.log
--------------------------------- grep Error ---------------------------------
(00.146444) Error (mount.c:399): Two shared mounts 50, 67 have different sets of children
(00.146460) Error (mount.c:402): 67:./zdtm_mpts/dev/share-1 doesn't have a proper point for 54:./zdtm_mpts/dev/share-3/test.mnt.share
(00.146820) Error (cr-dump.c:1921): Dumping FAILED.
------------------------------------- END -------------------------------------
================================= ERROR OVER =================================
Reported-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Tested-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
"continue" is called by mistake, so we skip a few checks for shared
mounts without siblings.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Here we define new api to be used in plugins.
- Plugin should provide a descriptor with help of
CR_PLUGIN_REGISTER macro, or in case if plugin require
no init/exit functions -- with CR_PLUGIN_REGISTER_DUMMY.
- Plugin should define a plugin hook with help of
CR_PLUGIN_REGISTER_HOOK macro.
- Now init/exit functions of plugins takes @stage
argument which tells plugin which stage of criu
it's been called on dump/restore. For exit it
also takes @ret which allows plugin to know if
something went wrong and it needs to cleanup
own resources.
The idea behind is to not limit plugins authors with names
of functions they might need to use for particular hook.
Such new API deprecates olds plugins structure but to keep
backward compatibility we will provide a tiny layer of
additional code to support old plugins for at least a couple
of release cycles.
For example a trivial plugin might look like
| #include <sys/types.h>
| #include <sys/stat.h>
| #include <fcntl.h>
| #include <libgen.h>
| #include <errno.h>
|
| #include <sys/socket.h>
| #include <linux/un.h>
|
| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <unistd.h>
|
| #include "criu-plugin.h"
| #include "criu-log.h"
|
| static int dump_ext_file(int fd, int id)
| {
| pr_info("dump_ext_file: fd %d id %d\n", fd, id);
| return 0;
| }
|
| CR_PLUGIN_REGISTER_DUMMY("trivial")
| CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__DUMP_EXT_FILE, dump_ext_file)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The AUFS support code handles the "bad" information that we get from
the kernel in /proc/<pid>/map_files and /proc/<pid>/mountinfo files.
For details see comments in sysfs_parse.c.
The main motivation for this work was dumping and restoring Docker
containers which by default use the AUFS graph driver. For dump,
--aufs-root <container_root> should be added to the command line options.
For restore, there is no need for AUFS-specific command line options
but the container's AUFS filesystem should already be set up before
calling criu restore.
[ xemul: With AUFS files sometimes, in particular -- in case of a
mapping of an executable file (likekely the one created at elf load),
in the /proc/pid/map_files/xxx link target we see not the path
by which the file is seen in AUFS, but the path by which AUFS
accesses this file from one of its "branches". In order to fix
the path we get the info about branches from sysfs and when we
meet such a file, we cut the branch part of the path. ]
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the root task is forked in a new pidns, it can't use its pid for
accessing /proc, because this proc belongs to the source pidns.
v2: don't copy a static string.
v3: take a bright part of Tycho's patch
Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It looks like criu constantly postpones external bind mounts. I'm trying to resolve
when we manage to break this (when I did ext-mount-map they for some reason didn't).
Meanwhile, this patch fixes it back.
Reported-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>