Also use the is_root_mount() helper instead of the open-coded
strcmp("./", m->mountpoint), and use -Ex error codes in
ERR_PTR().
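A minimal sketch of both idioms, assuming mountpoints are stored with a leading "." (so the root mount is "./") and kernel-style ERR_PTR()/IS_ERR()/PTR_ERR() macros. The trimmed mount_info struct, the MAX_ERRNO value and the find_root() lookup are illustrative assumptions, not CRIU's actual definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Kernel-style error pointers (assumed, modelled on linux/err.h). */
#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(long)(err))
#define PTR_ERR(ptr)	((long)(ptr))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical, trimmed-down mount_info. */
struct mount_info {
	const char *mountpoint;	/* e.g. "./" or "./proc" */
};

/* One named helper instead of open-coded strcmp("./", m->mountpoint). */
static inline bool is_root_mount(const struct mount_info *mi)
{
	return strcmp(mi->mountpoint, "./") == 0;
}

/* Hypothetical lookup: encode a real -Ex code in the pointer, not -1. */
static struct mount_info *find_root(struct mount_info *tab, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (is_root_mount(&tab[i]))
			return &tab[i];
	return ERR_PTR(-ENOENT);
}
```

Callers can then report PTR_ERR() directly, because the pointer carries a meaningful errno value.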
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's quite unclean that this structure lives
in proc_parse.h: proc_parse only has to fill this
structure on procfs read, while the real handling
is inside mount.c. Move it accordingly.
At the same time, the ext_mount structure should be moved
into a header as well, with the saner @list name
used instead of @l.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It is required to dump uids and gids from this userns.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is here only to support Linux kernels between versions
3.18 and 4.2. Later kernels do not need this workaround,
but it works properly on kernels both with and without the bug.
The bug is that when a process has a file open in an OverlayFS directory,
the information in /proc/<pid>/fd/<fd> and /proc/<pid>/fdinfo/<fd>
is wrong, so we grab that information from the mountinfo table instead.
This is done every time fill_fdlink() is called.
We first check to see if the mnt_id and st_dev numbers currently match
some entry in the mountinfo table. If so, we already have the correct mnt_id
and no fixup is needed.
Then we check whether there are any OverlayFS mounts
in the mountinfo table. If so, we concatenate the mountpoint with the
name of the file and stat the resulting path to check whether it has the
expected device id and inode number. If it does, we update the
mount id and link variables with the correct values.
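The concatenate-and-stat check can be sketched as below. overlay_path_matches() is a hypothetical helper: it only answers "does mountpoint + name refer to the same device id and inode number", and leaves updating the mount id and link to the caller.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: join an OverlayFS mountpoint with the file name
 * taken from the fd link, stat() the result, and report whether it is
 * the same object (device id and inode number) the fd refers to.
 * On a match, the joined path is left in buf for the caller.
 */
static int overlay_path_matches(const char *mountpoint, const char *name,
				dev_t dev, ino_t ino, char *buf, size_t len)
{
	struct stat st;
	int n;

	n = snprintf(buf, len, "%s/%s", mountpoint, name);
	if (n < 0 || (size_t)n >= len)
		return 0;	/* path would not fit: treat as no match */

	if (stat(buf, &st) < 0)
		return 0;	/* nothing there: not this mount */

	return st.st_dev == dev && st.st_ino == ino;
}
```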
Signed-off-by: Gabriel Guimaraes <gabriellimaguimaraes@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise the root yard can be propagated into the host mount namespace
and remain there, and criu will fail because it cannot
remove the root yard.
This occurs if we give a shared mount as root to "criu restore" and
criu converts it into a slave mount.
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we have dumped a read-only tmpfs, restoring it fails
because it is mounted with the ro flag. Let's mount it rw,
restore the content, and then remount it ro.
upd (by xemul@): any fs with a restore method is likely to
need rw permission on restore.
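The flag handling can be sketched as two pure helpers. Both names are hypothetical; the actual remount (mount(2) with MS_REMOUNT | MS_RDONLY after the content is restored) is left out so the logic can be checked without privileges.

```c
#include <stdbool.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: flags for the first mount(2). If the filesystem
 * has a content-restore method, drop MS_RDONLY so the content can be
 * written back.
 */
static unsigned long initial_mount_flags(unsigned long flags, bool has_restore)
{
	if (has_restore)
		flags &= ~(unsigned long)MS_RDONLY;
	return flags;
}

/* Hypothetical helper: must we remount read-only once restore is done? */
static bool needs_ro_remount(unsigned long flags, bool has_restore)
{
	return has_restore && (flags & MS_RDONLY);
}
```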
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When open_image() was modified to return a pointer rather than an int
in commit 295090c1, these two checks were overlooked and never fixed.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When using pr_perror(), format string should not end with \n,
as it is added by the macro itself.
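Why a trailing "\n" is wrong: a pr_perror()-style macro appends ": <strerror(errno)>\n" to the caller's format. The format_perror() macro below is a simplified stand-in (assumed shape, writing to a buffer instead of the log) that makes the doubled newline visible.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified, hypothetical stand-in for a pr_perror()-style macro:
 * it appends ": %s\n" with strerror(errno) itself, so fmt (a string
 * literal) must not end with "\n".
 */
#define format_perror(buf, len, fmt, ...) \
	snprintf(buf, len, fmt ": %s\n", ##__VA_ARGS__, strerror(errno))
```

With a trailing "\n" in fmt, the error text lands on its own line after a stray newline.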
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Contrary to what I naively thought, the contents of fsauto_names
are undefined if asprintf(&fsauto_names, ...) fails, and this was fixed by
a052e0b60a "check return code of asprintf".
But we can simplify this code a bit. If we rely on the return value from
asprintf(), we can simply nullify fsauto_names on failure and avoid
the asymmetrical "return false".
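A minimal sketch of the simplified pattern. The function name mirrors the text, but its body here is an assumption, not the patch itself. Because asprintf() leaves its output pointer undefined on failure, the pointer is reset to NULL rather than read. Checking the return value this way also satisfies warn_unused_result.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *fsauto_names;	/* comma-separated fs names, or NULL */

/* Sketch: append names to the list; on failure, drop the list. */
static bool add_fsname_auto(const char *names)
{
	char *old = fsauto_names;

	if (!old)
		fsauto_names = strdup(names);
	else if (asprintf(&fsauto_names, "%s,%s", old, names) < 0)
		fsauto_names = NULL;	/* contents undefined: don't trust them */

	free(old);
	return fsauto_names != NULL;
}
```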
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On ubuntu (gcc 4.9.2), I get:
mount.c: In function ‘add_fsname_auto’:
mount.c:1414:3: error: ignoring return value of ‘asprintf’, declared with attribute warn_unused_result [-Werror=unused-result]
asprintf(&fsauto_names, "%s,%s", old, names);
^
cc1: all warnings being treated as errors
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Note that if the root is unbindable then restore will fail because
cr_pivot_root() tries to bind mount the put dir. If this is a case we want to
support, we may want to rearrange how this code is called.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I am stupid. fsname_is_auto() can't use strtok(), the 2nd call will
see zeroes instead of commas in fsauto_names.
Add the css_contains() helper and change fsname_is_auto() to use it.
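A sketch of a strtok()-free lookup in a comma-separated string, under the assumption that css_contains() takes the list and the name to look for. It only reads the list, so repeated calls see the original commas.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Does the comma-separated string css contain str as a whole element?
 * Unlike strtok(), this never writes into css.
 */
static bool css_contains(const char *css, const char *str)
{
	size_t len = strlen(str);
	const char *cur;

	if (!len)
		return false;

	for (cur = css; (cur = strstr(cur, str)) != NULL; cur++) {
		/* the match must start the list or follow a comma ... */
		if (cur > css && cur[-1] != ',')
			continue;
		/* ... and must end the list or precede a comma */
		if (cur[len] != '\0' && cur[len] != ',')
			continue;
		return true;
	}
	return false;
}
```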
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Andrey reported this issue and it took me a while to figure out exactly what
might cause it. I think the comment describes it accurately, as with that
example I end up with mountinfo on the host like:
47 23 253:1 /root/bind1/subdir /root/bind2 rw,relatime shared:1 - ext4 /dev/disk/by-uuid/6c5a78e0-95fa-49a8-aa91-a8093d295e58 rw,data=ordered
48 23 253:1 /root/bind1 /root/bind3 rw,relatime shared:1 - ext4 /dev/disk/by-uuid/6c5a78e0-95fa-49a8-aa91-a8093d295e58 rw,data=ordered
Reported-by: Andrew Vagin <avagin@odin.com>
CC: Andrew Vagin <avagin@odin.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a mount like:
96 95 0:21 /cgmanager /sys/fs/cgroup/cgmanager rw master:9 - tmpfs tmpfs rw,mode=755
is present in the container and the host has a similar bind mount, e.g.
46 27 0:21 /cgmanager /sys/fs/cgroup/cgmanager rw shared:9 - tmpfs tmpfs rw,mode=755
then both the best-match mount's root path /and/ the target mountpoint contain
part of the path; we should cut the shared piece of the path and just
concatenate the non-duplicate pieces.
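The "cut the shared piece" idea can be illustrated on plain strings with a hypothetical helper. When trailing components of the first path repeat at the start of the second, they are emitted only once. Which two paths CRIU actually joins is not modelled here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join a and b, dropping the longest run of
 * trailing components of a that b repeats at its start, e.g.
 * "/sys/fs/cgroup/cgmanager" + "/cgmanager" -> "/sys/fs/cgroup/cgmanager".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int join_dedup_path(char *out, size_t len, const char *a, const char *b)
{
	size_t alen = strlen(a);
	const char *rest = b;
	int ret;

	/* the leftmost '/' whose tail is a prefix of b gives the longest overlap */
	for (size_t i = 0; i < alen; i++) {
		size_t n = alen - i;

		if (a[i] != '/')
			continue;
		if (strncmp(a + i, b, n) == 0 && (b[n] == '\0' || b[n] == '/')) {
			rest = b + n;
			break;
		}
	}

	ret = snprintf(out, len, "%s%s", a, rest);
	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```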
Reported-by: Andrew Vagin <avagin@odin.com>
CC: Andrew Vagin <avagin@odin.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We only malloc() size bytes of space, so we shouldn't snprintf() past that.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently this doesn't matter correctness-wise (with or without the
previous changes), but imho collect_mntinfo() needs a cleanup. We
should not return with ->mntinfo_list pointing to the freed memory
on failure, even if currently this failure is fatal and nobody will
ever use this pointer.
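A sketch of the cleanup, assuming a simple singly linked list. It builds the list in a local variable and publishes it only on success. The types and collect_mntinfo_sketch() (including its fail_at parameter, which only simulates a parse error) are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down types. */
struct mount_info {
	int mnt_id;
	struct mount_info *next;
};

struct ns_id {
	struct mount_info *mntinfo_list;
};

static void free_mntinfo(struct mount_info *mi)
{
	while (mi) {
		struct mount_info *next = mi->next;

		free(mi);
		mi = next;
	}
}

/*
 * Collect nr entries; fail_at simulates a parse error at that index
 * (-1: none). On failure, ns->mntinfo_list is reset to NULL instead of
 * being left pointing at freed memory.
 */
static int collect_mntinfo_sketch(struct ns_id *ns, int nr, int fail_at)
{
	struct mount_info *head = NULL;

	for (int i = 0; i < nr; i++) {
		struct mount_info *mi;

		if (i == fail_at)
			goto err;
		mi = malloc(sizeof(*mi));
		if (!mi)
			goto err;
		mi->mnt_id = i;
		mi->next = head;	/* prepend */
		head = mi;
	}

	ns->mntinfo_list = head;
	return 0;
err:
	free_mntinfo(head);
	ns->mntinfo_list = NULL;
	return -1;
}
```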
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This check was added by commit aebfabb5 "mnt: add --ext-mount-map
auto option", but unless I am totally confused it actually belongs
to the (already reverted) 246367e4e4 "add walk_all flag to
walk_namespaces".
Remove it. It is no longer needed and it was very non-obvious.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We no longer need to populate ext_ns->mnt.mntinfo_list until
resolve_external_mounts(). We can rely on find_ext_ns_id() which
does collect_mntinfo() on demand.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we rely on the fact that ->mntinfo_list was already
collected by walk_namespaces(walk_all => true), but we are going
to change this.
This patch simply adds collect_mntinfo(ns) into find_ext_ns_id() if
->mntinfo_list == NULL. This is all we need for this ns_id if it was
not initialized by collect_mnt_namespaces().
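The on-demand collection can be sketched as follows. The ns_id layout, the is_criu_ns flag used to recognise criu's own namespace, and the stub collect_mntinfo() (which only counts calls) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ns_id. */
struct ns_id {
	bool is_criu_ns;	/* criu's own mount namespace? */
	void *mntinfo_list;	/* NULL until collected */
	struct ns_id *next;
};

static int nr_collects;
static int fake_mntinfo;

/* Stub collector: records the call and pretends to parse mountinfo. */
static int collect_mntinfo(struct ns_id *ns)
{
	nr_collects++;
	ns->mntinfo_list = &fake_mntinfo;
	return 0;
}

/*
 * Find criu's own mount namespace and collect its mount info only if
 * nobody has done so yet, instead of relying on an earlier walk over
 * all namespaces.
 */
static struct ns_id *find_ext_ns_id(struct ns_id *list)
{
	for (struct ns_id *ns = list; ns; ns = ns->next) {
		if (!ns->is_criu_ns)
			continue;
		if (!ns->mntinfo_list && collect_mntinfo(ns))
			return NULL;
		return ns;
	}
	return NULL;
}
```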
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Preparation. Extract the "search the criu's mount info" code from
resolve_external_mounts() into the new simple helper, find_ext_ns_id().
Also change resolve_external_mounts() to check ext_ns == NULL rather
than !opts.autodetect_ext_mounts. Cosmetic.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
do_new_mount() clears MS_SHARED but this is not enough. It should clear
all bits processed in restore_shared_options().
The patch also adds MS_UNBINDABLE to MS_CHANGE_TYPE_MASK even though it is
not currently used, just to match the kernel's do_change_type() check.
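A sketch of the mask, assuming the propagation bits are applied afterwards by a separate step (restore_shared_options() in the text). new_mount_flags() is a hypothetical name. The mask holds the same four bits the kernel tests before calling do_change_type().

```c
#include <sys/mount.h>

/* All propagation-type bits, including the currently unused MS_UNBINDABLE. */
#define MS_CHANGE_TYPE_MASK (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE)

/*
 * Hypothetical helper: flags for the initial mount(2) of a new fs.
 * Strip every propagation bit, not only MS_SHARED; they are restored
 * later by a separate step.
 */
static unsigned long new_mount_flags(unsigned long flags)
{
	return flags & ~(unsigned long)MS_CHANGE_TYPE_MASK;
}
```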
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
resolve_source() insists on kdev_major() == 0, and this makes sense.
However, at least FSTYPE__AUTO can try to use mi->source as a block
device and pray it will work.
[ Also, about this change, from Oleg:
Let me send another (last) functional change before the promised
cleanups we discussed.
To remind, without this patch I still can't dump/restore /home and
/boot on my testing machine. --enable-fs xfs "works" in a sense that
"dump" succeeds. But "restore" fails.
However, let's forget this for the moment. To me resolve_source() looks
just wrong. Sure, I agree, it is not safe to blindly use mi->source if
kdev_major() != 0. But this means that we should not have dumped this
mountpoint, simply because we can't restore it.
Yes, currently this works because fstypes[] contains only the diskless
filesystems, but still.
So this probably needs more cleanups too, and this patch doesn't make
this logic look better.
To me, we should do something like
static char *resolve_source(struct mount_info *mi)
{
	if (kdev_major(mi->s_dev) == 0)
		/*
		 * Anonymous block device. Kernel creates them for
		 * diskless mounts.
		 */
		return mi->source;

	if (mi->fstype->code != FSTYPE__AUTO) {
		pr_err("OOPS! something is wrong!!!\n");
		return NULL;
	}

	// OK, this is FSTYPE__AUTO, it should "just work"
	// by definition. Or the user should blame himself.
	struct stat st;

	if (stat(mi->source, &st) || !S_ISBLK(st.st_mode) ||
	    major(st.st_rdev) != kdev_major(mi->s_dev) ||
	    minor(st.st_rdev) != kdev_minor(mi->s_dev))
		pr_warn("Hmm, can't verify blkdev. Lets see if mount will work...\n");

	return mi->source;
}
But this patch only does a minimal change to make FSTYPE__AUTO work
with blkdev.
]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This option enables external (slave) bind mounts to be resolved.
v2: don't always assume that when the master id matches, the mounts match
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
With this flag, criu attempts to resolve external shared bind mounts
automatically.
v2: don't always assume that when the sharing matches, the mounts match
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When this option is specified and an external (private) bind mount is not
given via --ext-mount-map KEY:VAL, criu attempts to resolve it
automatically.
v2: introduce find_best_external_match, which looks for the best match based on
sharing/slave ids; don't try to resolve fsroot_mounted() mountpoints
v3: get rid of really_collect_self_mounts
v4: get rid of fsroot_mounted() check when autodetecting external mounts
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Finally, add the --enable-fs option to specify a comma-separated list of
filesystem names which should be treated as FSTYPE__AUTO.
Note: obviously this option is not safe; use it at your own risk. "dump"
will always succeed if the mountpoint is auto, but "restore" can fail or
do something wrong if mount(src, mountpoint, flags, options) cannot
actually "just work" as the FSTYPE__AUTO logic expects.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add the new mnt_entry->fsname member and change dump_one_mountpoint()
to save pm->fstype->name if fstype == FSTYPE__AUTO.
Change collect_mnt_from_image() to pass this ->fsname to decode_fstype()
which falls back to __find_fstype_by_name(fsname, true) if FSTYPE__AUTO.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Simple preparation to simplify the review of the next patch. Turn
find_fstype_by_name(name) into __find_fstype_by_name(name, force_auto)
and reimplement find_fstype_by_name() as a trivial wrapper on top.
This allows "restore" to specify that this particular fsname was treated
as FSTYPE__AUTO by "dump".
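The lookup described in this and the neighbouring patches can be sketched over a small enlarged table: NULL-named free slots, a scan that starts at index 1, a dynamic FSTYPE__AUTO entry, and force_auto to bypass the fsname_is_auto() check. The enum values, the table size, the stub fsname_is_auto() and returning &fstypes[0] for unknown names are assumptions. The sketch also stores the caller's name pointer rather than copying it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { FSTYPE__UNSUPPORTED = 0, FSTYPE__PROC, FSTYPE__SYSFS, FSTYPE__AUTO };

struct fstype {
	const char *name;	/* NULL marks a free slot */
	int code;
};

/* Static entries first; the zeroed tail is room for auto entries. */
static struct fstype fstypes[8] = {
	{ "unsupported", FSTYPE__UNSUPPORTED },
	{ "proc", FSTYPE__PROC },
	{ "sysfs", FSTYPE__SYSFS },
};

/* Stub for the --enable-fs check (hypothetical list). */
static bool fsname_is_auto(const char *name)
{
	return strcmp(name, "hugetlbfs") == 0;
}

static struct fstype *__find_fstype_by_name(const char *name, bool force_auto)
{
	/* fstypes[0] is FSTYPE__UNSUPPORTED, so start at 1 */
	for (size_t i = 1; i < sizeof(fstypes) / sizeof(fstypes[0]); i++) {
		struct fstype *fst = &fstypes[i];

		if (!fst->name) {
			/* end of known entries: add an auto one if allowed */
			if (!force_auto && !fsname_is_auto(name))
				break;
			fst->name = name;	/* sketch: real code would copy */
			fst->code = FSTYPE__AUTO;
			return fst;
		}
		if (strcmp(fst->name, name) == 0)
			return fst;
	}
	return &fstypes[0];
}

static struct fstype *find_fstype_by_name(const char *name)
{
	return __find_fstype_by_name(name, false);
}
```

On restore, passing force_auto = true lets an fsname that "dump" already treated as auto be accepted regardless of the current --enable-fs list.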
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The comment in find_fstype_by_name() says:
just mounting anything is wrong
and this is true in general, but:
almost every fs has its own features
this is not true in the sense that a lot of supported filesystems do not
need any special processing: FSTYPE__PROC, FSTYPE__SYSFS, and more. More
importantly, this logic does not allow one to specify from the command line
that (say) the currently unsupported hugetlbfs can "just work"; do_new_mount()
only needs to pass the right name/options.
This patch adds the new FSTYPE__AUTO code; find_fstype_by_name(name) adds
a new entry if fsname_is_auto(name) returns true. We do not care that
different fstypes can have the same FSTYPE__AUTO code: fstype->code has
no meaning unless we need to do something special with this fs, and in
that case it should not be FSTYPE__AUTO by definition.
Note: currently find_fstype_by_name() just returns true, it is obviously
pointless to "dump" until we teach "restore" to handle FSTYPE__AUTO.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Preparation. Enlarge fstypes[] to make it possible to add new
fstypes dynamically.
This means that find_fstype_by_name() and decode_fstype() need an
additional ->name == NULL check to terminate the search.
Also change them to start with "i == 1"; we rely on the fact that
fstypes[0] is FSTYPE__UNSUPPORTED anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
1. If a fuse connection is present but there are no fuse mounts of that type
in the mount namespace, don't refuse to dump.
2. If there are mounts of that type in the container but they are external,
we're going to bind-mount them anyway, so there are no fuse-specific things
to be done and it is safe to dump.
v2: check that the fstype is fuse as well
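The two rules reduce to: refuse only if some fuse mount would have to be recreated by criu itself. The struct, the helper names, and treating "fuse" and "fuse.<subtype>" (but not "fusectl") as fuse are assumptions in this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal view of one mount in the dumped namespace. */
struct mnt {
	const char *fstype;	/* e.g. "fuse.sshfs", "ext4", "fusectl" */
	bool external;		/* will simply be bind-mounted on restore */
};

/* Assumed rule: "fuse" or "fuse.<subtype>" counts as a fuse mount. */
static bool is_fuse(const char *fstype)
{
	return strcmp(fstype, "fuse") == 0 || strncmp(fstype, "fuse.", 5) == 0;
}

/* No fuse mounts at all, or only external ones: safe to dump. */
static bool fuse_dump_allowed(const struct mnt *mnts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (is_fuse(mnts[i].fstype) && !mnts[i].external)
			return false;
	return true;
}
```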
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the rest of this series we need to walk all the namespaces to autodetect
which mounts are master/shared/private bind mounts, so we need the information
from criu's namespace in the case when the namespaces are not the same.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently, if /tmp does not exist, CRIU fails because it cannot
create a temporary directory there. But when checkpointing
and restoring containers, we cannot rely on the existence of /tmp.
For such containers, we should use root (/) instead. The temporary directory
is removed after CRIU is done.
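A sketch of the fallback, assuming the caller passes candidate bases in order (for example "/tmp", then "/") and removes the directory when done. make_tmp_dir() and the ".criu.XXXXXX" pattern are illustrative, not CRIU's actual names.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>	/* rmdir() for the caller's cleanup */

/*
 * Hypothetical helper: create a private temporary directory under the
 * first base that works, so a missing /tmp is not fatal.
 * Returns 0 with the path in buf, or -1 if every base failed.
 */
static int make_tmp_dir(char *buf, size_t len,
			const char *const *bases, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		size_t blen = strlen(bases[i]);
		const char *sep = (blen && bases[i][blen - 1] == '/') ? "" : "/";
		int n = snprintf(buf, len, "%s%s.criu.XXXXXX", bases[i], sep);

		if (n < 0 || (size_t)n >= len)
			return -1;
		if (mkdtemp(buf))	/* replaces XXXXXX and creates the dir */
			return 0;
	}
	return -1;
}
```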
Signed-off-by: Saied Kazemi <saied@google.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
setns(fd, CLONE_NEWNS) resets cwd and root, so we need to
restore them back.
Without this patch stats-dump isn't saved in the work dir:
-rw-r--r-- 1 root root 32 Apr 2 14:21 /stats-dump
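The save-and-restore pattern can be sketched as follows, assuming the fds are taken in the namespace we will return to (for example criu's original one). switch_mntns_keep_dirs() is a hypothetical name. A real switch needs CAP_SYS_ADMIN for setns() and CAP_SYS_CHROOT for chroot(), so only the error path is exercised without privileges.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * setns(fd, CLONE_NEWNS) resets the caller's cwd and root, so keep both
 * as directory fds and re-enter them after the switch.
 */
static int switch_mntns_keep_dirs(int nsfd)
{
	int cwd_fd, root_fd, ret = -1;

	cwd_fd = open(".", O_RDONLY | O_DIRECTORY);
	if (cwd_fd < 0)
		return -1;
	root_fd = open("/", O_RDONLY | O_DIRECTORY);
	if (root_fd < 0)
		goto out_cwd;

	if (setns(nsfd, CLONE_NEWNS) < 0)
		goto out_root;

	/* Restore the root first (chroot into it), then the cwd. */
	if (fchdir(root_fd) < 0 || chroot(".") < 0 || fchdir(cwd_fd) < 0)
		goto out_root;

	ret = 0;
out_root:
	close(root_fd);
out_cwd:
	close(cwd_fd);
	return ret;
}
```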
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Preparation.
1. Add the new "bool for_dump" arg to collect/parse_mntinfo().
2. Introduce "struct collect_mntns_arg" to pass the additional
"bool for_dump" field to collect_mntinfo() and change it to
pass this boolean to collect_mntinfo()->parse_mountinfo() path.
3. Change other callers of collect_mntinfo() to pass "false".
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
validate_mounts() prints ->mnt_id in hex when it reports a failure.
This complicates understanding, because ->mnt_id is printed as
decimal elsewhere, including /proc/$pid/mountinfo.
parse_mountinfo() at least adds "0x", and this is just pr_info(), but
let's change it too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>