Finally, add the --enable-fs option, which takes a comma-separated list of
filesystem names that should be treated as FSTYPE__AUTO.
Note: this option is obviously not safe, use it at your own risk. "dump"
will always succeed if the mountpoint is auto, but "restore" can fail or
do something wrong if mount(src, mountpoint, flags, options) cannot
actually "just work" as the FSTYPE__AUTO logic expects.
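A minimal sketch of how a filesystem name could be matched against the comma-separated list given to --enable-fs. The helper name fsname_in_list() and its signature are assumptions made for illustration, not the code added by this patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if fsname appears as a whole
 * element of the comma-separated list (e.g. "hugetlbfs,debugfs").
 */
bool fsname_in_list(const char *fsname, const char *list)
{
	size_t len = strlen(fsname);

	if (!list || !len)
		return false;

	while (*list) {
		const char *end = strchr(list, ',');
		size_t n = end ? (size_t)(end - list) : strlen(list);

		/* Compare whole elements only, so "tmp" does not match "tmpfs" */
		if (n == len && !strncmp(list, fsname, len))
			return true;

		if (!end)
			break;
		list = end + 1;
	}

	return false;
}
```

Matching whole elements rather than substrings keeps a short name from accidentally enabling a longer one.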
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add the new mnt_entry->fsname member and change dump_one_mountpoint()
to save pm->fstype->name if fstype == FSTYPE__AUTO.
Change collect_mnt_from_image() to pass this ->fsname to decode_fstype()
which falls back to __find_fstype_by_name(fsname, true) if FSTYPE__AUTO.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Simple preparation to simplify the review of the next patch. Turn
find_fstype_by_name(name) into __find_fstype_by_name(name, force_auto)
and reimplement find_fstype_by_name() as a trivial wrapper on top.
This allows "restore" to specify that this particular fsname was treated
as FSTYPE__AUTO by "dump".
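The wrapper arrangement described above might look roughly like the sketch below. The names find_by_name(), find_by_name_forced(), name_is_auto(), the table size and the CODE_* values are all hypothetical stand-ins, not the identifiers or values used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative codes only; the real values come from the image format */
enum { CODE_UNSUPPORTED = 0, CODE_PROC = 1, CODE_AUTO = 100 };

struct fstype {
	const char *name;
	int code;
};

/* Spare slots are zero-initialized, so their ->name is NULL */
static struct fstype table[16] = {
	{ "unsupported", CODE_UNSUPPORTED },
	{ "proc", CODE_PROC },
};

#define TABLE_SIZE (sizeof(table) / sizeof(table[0]))

/* Stand-in for the user-controlled policy; always "no" in this sketch */
static bool name_is_auto(const char *name)
{
	(void)name;
	return false;
}

struct fstype *find_by_name_forced(const char *name, bool force_auto)
{
	size_t i;

	for (i = 1; i < TABLE_SIZE; i++) {
		struct fstype *fs = &table[i];
		char *copy;

		if (fs->name) {
			if (!strcmp(fs->name, name))
				return fs;
			continue;
		}

		/* First free slot reached: the name is unknown so far */
		if (!force_auto && !name_is_auto(name))
			break;

		copy = malloc(strlen(name) + 1);
		if (!copy)
			break;
		strcpy(copy, name);
		fs->name = copy;
		fs->code = CODE_AUTO;
		return fs;
	}

	return &table[0];
}

/* The old entry point becomes a trivial wrapper */
struct fstype *find_by_name(const char *name)
{
	return find_by_name_forced(name, false);
}
```

With this split, the dump side keeps calling the wrapper, while the restore side can pass true for names that the image marks as auto.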
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The comment in find_fstype_by_name() says:
	just mounting anything is wrong
and this is true in general, but:
	almost every fs has its own features
this is not true, in the sense that a lot of supported filesystems do not
need any special processing: FSTYPE__PROC, FSTYPE__SYSFS, and more. More
importantly, this logic does not allow the user to specify on the command
line that (say) the currently unsupported hugetlbfs can "just work";
do_new_mount() should only pass the right name/options.
This patch adds the new FSTYPE__AUTO code; find_fstype_by_name(name) adds
the new entry if fsname_is_auto(name) returns true. We do not care that
different fstypes can have the same FSTYPE__AUTO code: fstype->code has
no meaning unless we need to do something special with this fs, and in
that case it should not be FSTYPE__AUTO by definition.
Note: currently find_fstype_by_name() just returns true, it is obviously
pointless to "dump" until we teach "restore" to handle FSTYPE__AUTO.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Preparation. Enlarge fstypes[] to make it possible to add the new
fstype's dynamically.
This means that find_fstype_by_name() and decode_fstype() need the
additional ->name == NULL check to terminate the search.
Also change them to start with "i == 1", we rely on the fact that
fstypes[0] is FSTYPE__UNSUPPORTED anyway.
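A sketch of the search loop this describes, under the assumption of a small illustrative table. The struct layout, the table size of 32, the FS_* codes and the name lookup_fstype() are made up for the example.

```c
#include <stddef.h>
#include <string.h>

enum { FS_UNSUPPORTED = 0, FS_PROC = 1, FS_SYSFS = 2 };

struct fstype {
	const char *name;
	int code;
};

/* Enlarged table: unused slots are zero-filled, so ->name == NULL */
static struct fstype fstypes[32] = {
	{ "unsupported", FS_UNSUPPORTED },
	{ "proc", FS_PROC },
	{ "sysfs", FS_SYSFS },
};

#define NR_FSTYPES (sizeof(fstypes) / sizeof(fstypes[0]))

struct fstype *lookup_fstype(const char *name)
{
	size_t i;

	/* Start at 1: slot 0 is the "unsupported" fallback returned below */
	for (i = 1; i < NR_FSTYPES && fstypes[i].name; i++)
		if (!strcmp(fstypes[i].name, name))
			return &fstypes[i];

	return &fstypes[0];
}
```

The NULL check stops the scan at the first free slot, which is also where a dynamically added entry would go.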
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
1. If a fuse connection is present, but there are no fuse mounts of that type
in the mount namespace, don't refuse to dump.
2. If there are mounts of that type in the container but they are external,
we're going to bind them anyway, so there are no fuse-specific things that
need to be done, and it is safe to dump.
v2: check that the fstype is fuse as well
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the rest of this series we need to walk all the namespaces to autodetect
which mounts are master/shared/private bind mounts, so we need the information
from criu's namespace in the case when the namespaces are not the same.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently if /tmp does not exist, CRIU fails because it will not be
able to create a temporary directory there. But when checkpointing
and restoring containers, we cannot rely on the existence of /tmp.
For such containers, we should use root (/). The temporary directory
will be removed after CRIU is done.
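A minimal sketch of the fallback, assuming the temporary directory is later created with mkdtemp(3) from the generated template and removed with rmdir(2). The function name make_tmp_template() and the ".criu.mntns.XXXXXX" suffix are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static bool is_directory(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper: build a mkdtemp() template under "preferred"
 * (normally "/tmp"), or under "/" if "preferred" is not a directory.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int make_tmp_template(char *buf, size_t size, const char *preferred)
{
	const char *parent = is_directory(preferred) ? preferred : "/";
	int n;

	/* Avoid a double slash when the parent is the root itself */
	if (!strcmp(parent, "/"))
		parent = "";

	n = snprintf(buf, size, "%s/.criu.mntns.XXXXXX", parent);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```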
Signed-off-by: Saied Kazemi <saied@google.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
setns(fd, CLONE_NEWNS) resets cwd and root, so we need to
restore them back.
Without this patch stats-dump isn't saved in the work dir:
-rw-r--r-- 1 root root 32 Apr 2 14:21 /stats-dump
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Preparation.
1. Add the new "bool for_dump" arg to collect/parse_mntinfo().
2. Introduce "struct collect_mntns_arg" to pass the additional
"bool for_dump" field to collect_mntinfo() and change it to
pass this boolean to collect_mntinfo()->parse_mountinfo() path.
3. Change other callers of collect_mntinfo() to pass "false".
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
validate_mounts() prints ->mnt_id in hex when it reports the failure.
This complicates the understanding because this ->mnt_id is printed as
decimal elsewhere, including /proc/$pid/mountinfo.
parse_mountinfo() at least adds "0x", and this is just pr_info(), but
let's change it too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When an image of a certain type is not found, CRIU sometimes
fails and sometimes ignores this fact. I propose to always ignore
it and treat absent images as images containing no objects
(i.e. empty). If the subsequent code flow _needs_ objects,
then criu will fail later.
Why would an object be explicitly required? For example, because the
restoring code reads the image with pb_read_one, without the
_eof suffix, and thus requires the object to be in the image.
Another example is object dependencies. E.g. fdinfo objects
require various file objects, so missing image files will
result in unresolved searches later.
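The proposed behaviour can be sketched as follows. This is an illustration under simplifying assumptions: struct image, image_open(), image_read_one_eof() and image_read_one() are hypothetical names, and objects are plain unsigned ints rather than protobuf messages.

```c
#include <errno.h>
#include <stdio.h>

/* fp == NULL means "image file is absent, treat it as empty" */
struct image {
	FILE *fp;
};

/* A missing file is not an error; any other open failure is */
int image_open(struct image *img, const char *path)
{
	img->fp = fopen(path, "rb");
	if (!img->fp && errno != ENOENT)
		return -1;
	return 0;
}

/* Returns 1 if an object was read, 0 on end of image, -1 on error */
int image_read_one_eof(struct image *img, unsigned int *obj)
{
	if (!img->fp)
		return 0;
	if (fread(obj, sizeof(*obj), 1, img->fp) == 1)
		return 1;
	return feof(img->fp) ? 0 : -1;
}

/* Strict variant: the object must be there, so "empty" is a failure */
int image_read_one(struct image *img, unsigned int *obj)
{
	return image_read_one_eof(img, obj) == 1 ? 0 : -1;
}
```

Opening never fails on a missing file, so the failure moves to the first reader that actually requires an object.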
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fedora bind-mounts a part of the root mount to itself. Currently we
don't allow mounting children of a shared mount if other mounts from
this shared group are not mounted yet.
This patch adds an exception for the case when a child has the same
group: we allow mounting the child if the wider mounts are mounted.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we connect the roots of sub-namespaces to the root of the root
mount namespace. We get problems if the root of the root mntns is
shared, because all children of a shared mount must be propagated to
the other mounts in this group.
Actually we mount tmpfs in mnt_roots, and there is nothing wrong with
adding it to the tree.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
And use the expression for it, it's quite short. This
makes the amount of variables in the code fit into brains.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Do path conversions and checks step-by-step and add comments on
what we do in each step and why.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
When we check whether a submount of a mount is visible in another
mount (shared peer of the latter), we can and should use the new
issubpath helper.
Should, because the strncmp used there may scan beyond ct_mpnt_rpath if
its length is smaller (there are no checks for this in the code).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
These are constant for given m, so calculate them outside
of the loop. Also rename them to reflect what they are.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
The path length is zero for the "/" one and strlen(path)
for all the others. This is done to make it possible
to use this length to get tail-paths: if path_1 starts
with path_2 and both are absolute, then
	path_1 + path_length(path_2)
would give the tail of path_1 relative to
path_2 even if path_2 is just "/".
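A small sketch of the rule above. path_length() follows the description in this message; path_tail() is a hypothetical wrapper added only to show the pointer arithmetic.

```c
#include <stddef.h>
#include <string.h>

/* Zero for "/", strlen() for every other path */
size_t path_length(const char *path)
{
	size_t len = strlen(path);

	return (len == 1 && path[0] == '/') ? 0 : len;
}

/*
 * Tail of path_1 relative to path_2. Assumes both are absolute,
 * have no trailing slash and that path_1 starts with path_2.
 */
const char *path_tail(const char *path_1, const char *path_2)
{
	return path_1 + path_length(path_2);
}
```

For instance, the tail of "/a/x/y" relative to "/a" is "/x/y", and the tail of "/a" relative to "/" is still "/a", so the result always keeps its leading slash.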
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
A problem solved by this patch is that some children can be
inaccessible (invisible) for non-root bind-mounts:
root    mount point
-------------------
/       /a (shared:1)
/       /a/x
/       /a/x/y
/       /a/z
/x      /b (shared:1)
/       /b/y
/b is a non-root bind-mount of /a
/y is visible to both mounts
/z is visible only to /a
Before this patch we checked that the set of children is the same for
all mounts in a shared group. Now we check that the visible set of mounts
is the same for all mounts in a shared group.
To do so, we take the next mount in the shared group that is wider than or
equal to the current one and compare children between them.
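The visibility rule from the table can be sketched like this. The helper visible_tail() is hypothetical: it takes the path of a child relative to its parent's filesystem and the root of a peer mount, and treats only paths strictly below that root as visible. This is an illustration, not the comparison code from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Zero for "/", strlen() otherwise, so tails keep their leading slash */
static size_t root_length(const char *root)
{
	size_t len = strlen(root);

	return (len == 1 && root[0] == '/') ? 0 : len;
}

/*
 * If the child is visible through a peer whose root is peer_root,
 * return its path relative to that root, otherwise return NULL.
 */
const char *visible_tail(const char *child_rel, const char *peer_root)
{
	size_t n = root_length(peer_root);

	if (strncmp(child_rel, peer_root, n) != 0)
		return NULL;
	/* Require a path component boundary right after the root */
	if (child_rel[n] != '/')
		return NULL;
	return child_rel + n;
}
```

In the example above the child at /a/x/y has the relative path "/x/y", which maps to "/y" under the peer with root "/x" (that is /b/y), while "/z" has no counterpart there.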
Before this patch validate_shared(m) validated the m->parent mount.
Now it validates the "m" mount. So you can find the following lines in
the patch:
	- if (m->parent->shared_id && validate_shared(m))
	+ if (m->shared_id && validate_shared(m))
We don't support shared mounts with different sets of children.
Here is an example of how such a case can be created:
	mount -t tmpfs a /a
	mount --make-shared /a
	mkdir /a/b
	mount -t tmpfs b /a/b
	mount --bind /a /c
In this case /c doesn't have the /b child. To support such cases,
we need to sort all shared mounts according to their sets of children.
v2: If root is equal to "/", its len should be zero. We expect that the
last symbol in a path is not "/".
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we validate the mount tree not to have overmounts we need to
check one path to be the sub-path of another. Here's a helper for
this.
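A sketch of what such a helper could look like. The name is_sub_path() and the argument order are assumptions; this is not necessarily the implementation added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if "path" equals "base" or lies below it. Both paths
 * are assumed to be absolute and without a trailing slash (except
 * for "/" itself).
 */
bool is_sub_path(const char *path, const char *base)
{
	size_t n = strlen(base);

	/* Treat "/" as an empty prefix so every absolute path is below it */
	if (n == 1 && base[0] == '/')
		n = 0;

	if (strncmp(path, base, n) != 0)
		return false;

	/* The match must end on a component boundary: "/ab" is not under "/a" */
	return path[n] == '/' || path[n] == '\0';
}
```

Checking the character right after the prefix is what separates a real sub-path from a sibling that merely shares a prefix.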
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
When we create a new mntns in a userns, all inherited mounts are marked
as locked. pivot_root() returns EINVAL if the new root is locked.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Introduced by eb214be2, the empty mnt_share list cannot
produce the list_first_entry element :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
This is for two reasons. First, validation can meet an external mount
and will call plugins, which is not correct on pre-dump and actually
crashes on uninitialized plugin lists. Second, even if the mount tree
is not "supported" on pre-dump, this can be a temporary situation (yes,
yes, unlikely, but still).
On the other hand, it's better to fail earlier, but that's another
story.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
If we meet a virtualized devtmpfs on dump
(which means its s_dev is different from the one obtained
during the mountpoints dump procedure), we should dump it
with the help of tar. Thus on restore it gets filled from the
image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need devtmpfs as well so make it general.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
So we need to create a temporary private mount for the old root.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This reverts commit 21d1b2fdb9.
pivot_root() can move the current root to a non-shared mount. So we are
going to create a temporary private mount in put_old.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We don't support posix mqueues at the moment,
but if this fs is simply mounted and
not used, let's proceed without errors.
If someone is using it, we detect that
because the fs won't be empty, and refuse to dump.
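The emptiness check could be sketched as below. dir_is_empty() is a hypothetical helper that takes the path where the fs is mounted; it is not the function used in the patch.

```c
#include <dirent.h>
#include <string.h>

/*
 * Return 1 if the directory has no entries besides "." and "..",
 * 0 if it has at least one, -1 if it cannot be opened.
 */
int dir_is_empty(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	int empty = 1;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		empty = 0;
		break;
	}

	closedir(dir);
	return empty;
}
```

A dump path built on this would proceed when the result is 1 and refuse when it is 0.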
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>