If a process is in a different pidns than the one /proc was mounted
in, the /proc/self link doesn't work.
(00.061569) Error (mount.c:558): Can't bind-mount
46:/zdtm/live/static/tempfs.test to /tmp/cr-tmpfs.gBVwTb: No such file
or directory
But since we've already switched to the mount namespace (with setns), we
can just open the path by its name.
Reported-by: Urgen Sherpa <urgen.sherpa@nepallink.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some filesystems do not provide open-by-handle functionality. For those,
we should abort fsnotifies dumping, not restoring.
The open_mount() changes are about opening mountpoints inside another
mount namespace.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a mount is a slave and also has a shared group, crtools must first
convert it to a slave, and only then can crtools make it shared.
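A minimal sketch of that ordering, assuming hypothetical master_id and
shared_id values taken from the master:N and shared:N tags in mountinfo
(0 meaning "absent"). The helper name is made up for illustration and it
only computes the flag sequence; it does not call mount(2) itself.

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: fill 'flags' with the propagation flags to apply,
 * in order, to a freshly bind-mounted entry. Returns the number of steps.
 */
static size_t propagation_steps(int master_id, int shared_id,
				unsigned long flags[2])
{
	size_t n = 0;

	if (master_id)
		flags[n++] = MS_SLAVE;	/* first turn the mount into a slave */
	if (shared_id)
		flags[n++] = MS_SHARED;	/* only then make it shared */
	return n;
}
```

Each returned flag would then be applied with a call of the form
mount(NULL, path, NULL, flag, NULL).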
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The expression in if () becomes quite complex and
deserves a helper with proper explanation of what's
going on.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All the entries with with_plugin set will be mounted by plugin.
The interesting case is when we do the pivot-root restore. In this
case we call restore callback very early (before we unmount the old
tree) and ask it to create the mountpoint at temporary location.
Later we move the mount to proper place.
The old_root argument of the callback is where it can find files
in the original mount namespace.
The is_file is a return argument. Since files and directories cannot be
bind-mounted onto each other, the callback should create the mountpoint
itself and report whether it created a file or a directory.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
External bind mounts are those with source sitting outside of the
current FS view. Such are detected in validate_mounts(), so we
just go ahead and call plugins.
The plugin is provided with the mountpoint to decide whether it's
his or not (what else does the guy need?) and an ID with which it
can identify the mountpoint in /proc. The same ID will be used at
restore time to find the needed restore info.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need images at hand while we do pivot_root (see further patches),
so prepare the images reading routine.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Remove whitespace at EOL (found by git grep ' $')
(the character before $ is real tab, typed in shell using Ctrl+V Tab)
To people using vim, I'd suggest adding the following code to ~/.vimrc:
let c_space_errors = 1
highlight FormatError ctermbg=darkred guibg=darkred
match FormatError /\s\+$\|\ \+\t\|\%80v.\|\ \{8\}/
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we specify a new root for restore, the old mounts get destroyed
with pivot_root + umount calls and the tree umount is omitted. In this
case the mi-s are leaked.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Validation means checking that we can _restore_ this tree.
The mounts read from proc can be in any state: we are only going to
umount them (that works in any case) and do path resolution (which
works for any knots as well).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Remove whitespace at EOL (found by git grep ' $')
To people using vim, I'd suggest adding the following code to ~/.vimrc:
let c_space_errors = 1
highlight FormatError ctermbg=darkred guibg=darkred
match FormatError /\s\+$\|\ \+\t\|\%80v.\|\ \{8\}/
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is more correct: if the st_dev == phys_dev check fails,
we have to treat phys_dev as a kdev for the device comparison
done by path resolution.
However, this is not the case for non-btrfs filesystems, and for
btrfs it doesn't change anything, as btrfs uses anonymous devices,
which are equal in the kdev and odev cases.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dumping a ghost file we put real device in its header,
not the (btrfs) virtual one. This is done since we put real
devices into fsnotify images (we get them from proc). That
said, on fsnotify ghost restore we don't need to do path
resolution, just compare devices.
And one more thing. When dumping device for ghost file for
_non_ btrfs case we have to convert stat dev_t into kernel
dev_t as all the other places in criu manipulate the latter
ones.
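A rough sketch of such a conversion, assuming the kernel's internal
encoding of (major << 20) | minor. The macro and function names below are
hypothetical and are not taken from the criu sources.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>

#define KDEV_MINORBITS	20	/* assumed to match the kernel's MINORBITS */

/* Convert a userspace dev_t (as returned by stat()) to the kernel encoding. */
static unsigned int stat_dev_to_kdev(dev_t dev)
{
	return (major(dev) << KDEV_MINORBITS) | minor(dev);
}
```

For anonymous devices (major 0) with small minors the two encodings give
the same number, which is why the btrfs case needs no conversion.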
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's used by phys_stat_resolve_dev (broken by c5d2386a)
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of scanning btrfs subvolumes (which can even be inaccessible
if the mount point lies on a directory instead of the subvolume itself)
we use the path resolving feature here, whenever we need to figure out
if some device number needs to be altered to match the mount point (as
we know, stat() called on a subvolume returns st_dev for the subvolume
itself, not the one associated with the superblock and shown in
/proc/self/mountinfo output).
This also implies that we need to check whether the device numbers of
ghost files are to be updated to match mountinfo, thus we use the
phys_stat_resolve_dev helper here.
After this patch the previously merged btrfs engine is no longer needed
(at least it seems so) and can be dropped.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This routine is aimed at finding the mount point on which
the path passed as an argument lies. We walk over
all mount points and see which one matches.
Once found (in the worst case it will be the root mount point,
so the function never fails) we check whether this is
btrfs and, if so, return the subvolume0 device id.
See commit 921cf873f30ad35df2b65602ed402f94695a6eb3
for details what the hell we're doing here.
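The lookup idea can be sketched as a longest-prefix match. This is a
simplified illustration over a flat array, not the actual
mount_resolve_path code (which walks the mount tree); struct mnt_entry and
both function names are hypothetical, and overmounts are ignored.

```c
#include <stddef.h>
#include <string.h>

struct mnt_entry {		/* hypothetical, reduced mount record */
	const char *mountpoint;
	unsigned int s_dev;
};

/* True when 'path' is 'mp' itself or lies underneath it. */
static int path_is_under(const char *path, const char *mp)
{
	size_t len = strlen(mp);

	if (strncmp(path, mp, len))
		return 0;
	if (len == 1 && mp[0] == '/')	/* root matches every absolute path */
		return 1;
	return path[len] == '\0' || path[len] == '/';
}

/* Return the entry with the longest mountpoint containing 'path'. */
static const struct mnt_entry *resolve_mount(const struct mnt_entry *tbl,
					     size_t n, const char *path)
{
	const struct mnt_entry *best = NULL;
	size_t best_len = 0, i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(tbl[i].mountpoint);

		if (!path_is_under(path, tbl[i].mountpoint))
			continue;
		if (!best || len > best_len) {
			best = &tbl[i];
			best_len = len;
		}
	}
	return best;
}
```

As long as the table contains "/", an absolute path always resolves to
some entry.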
v2: rewrite mount_resolve_path w/o recursion
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For path resolution we will need the mount tree to be parsed
and built, but it's not that simple: the current code
implies that once parsed, the tree must not be parsed
again, so we pass the @parse argument from the caller. If the task
we're restoring does not use a mount namespace, we should parse
the mount tree early; otherwise we defer this action until the mount
tree is read from the image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This variable should carry the root of the parsed
mount tree pointed to by @mntinfo. Thus if @mntinfo
gets destroyed, @mntinfo_root should be set to NULL.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need to parse btrfs stuff, but this one is not
in the supported list yet (as it's bound to hardware).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
This helper serves to hide fs specifics (in particular
btrfs) thus the caller won't need the details.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We allocate mount_info with xzalloc, no need for
additional NULL assignment.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There's ... a number of places where we want to do something
with a /proc/self/fd/%d path. Each time we guess a buffer size
that is enough for this. Make a standard constant for this,
save some space on the stack and drop args from some functions.
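One way such a constant could look, as a sketch only: the macro and helper
names here are hypothetical, and the size assumes a 32-bit int.

```c
#include <stdio.h>

/* Room for "/proc/self/fd/" plus any 32-bit int, including the NUL. */
#define PROC_SELF_FD_BUFSZ	sizeof("/proc/self/fd/-2147483648")

/* Format the path for 'fd' into 'buf'; returns what snprintf() returns. */
static int proc_self_fd_path(char buf[PROC_SELF_FD_BUFSZ], int fd)
{
	return snprintf(buf, PROC_SELF_FD_BUFSZ, "/proc/self/fd/%d", fd);
}
```

With the size fixed at compile time, callers can declare the buffer on the
stack and no longer need to pass its length around.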
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to replace the pid with an id in the names of image files.
The id is unique for each namespace, so it's more convenient if image
files are opened per namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case criu and the dumpee live in the same mount namespace there's no
need to get the ns' root from the init task. We can get it from criu and
(!) avoid the root == "/" check, required for the namespace case.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we check that all shared mounts have an identical set of
children and that each non-root mount has a proper root mount.
v2: check that nobody is overmounted
check a tree before trying to restore it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A non-root mount is bind-mounted from a proper root mount.
A non-root mount without a root mount is not supported yet.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The idea is simple. If a mount can't be mounted now, we will try to
mount it later.
v2: don't wait for slaves, they are unmounted anyway
v3: add a comment in do_bind_mount to explain restoring of shared groups
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A few sentences that are required for understanding this patch:
2a) A shared mount can be replicated to as many mountpoints and all the
replicas continue to be exactly same.
2b) A slave mount is like a shared mount except that mount and umount
events only propagate towards it.
2c) A private mount does not forward or receive propagation.
All the rules are in Documentation/filesystems/sharedsubtree.txt
If it's the first mount in a group, all other group members should be
bind-mounted from this one.
Each mount propagates to all members of the parent's group. The group can
contain a few slaves.
Mounts which have propagated to slaves are unmounted, because we can't
be sure that they propagated in real life. For example:
mount --bind --make-slave /share /slave1
mount --bind --make-slave /share /slave2
mount /share/test
umount /slave2/test
mount --make-shared /slave1/test
mount --bind --make-shared /slave1/test /slave2/test
41 40 0:33 / /share rw,relatime shared:28 - tmpfs xxx rw
42 40 0:33 / /slave1 rw,relatime master:28 - tmpfs xxx rw
43 40 0:33 / /slave2 rw,relatime master:28 - tmpfs xxx rw
44 41 0:34 / /share/test rw,relatime shared:29 - tmpfs xxx rw
46 42 0:34 / /slave1/test rw,relatime shared:30 master:29 - tmpfs xxx rw
45 43 0:34 / /slave2/test rw,relatime shared:30 master:29 - tmpfs xxx rw
/slave1/test and /slave2/test depend on each other and at least one of them
doesn't propagate from /share/test
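For reference, the shared:N and master:N tags in lines like the ones above
can be pulled out as follows. This is only a sketch; struct propagation and
parse_propagation() are hypothetical names, not the parser criu uses.

```c
#include <stdio.h>
#include <string.h>

struct propagation {
	int shared_id;	/* N from "shared:N", 0 if absent */
	int master_id;	/* N from "master:N", 0 if absent */
};

/* Parse the optional fields of one mountinfo line; 0 on success. */
static int parse_propagation(const char *line, struct propagation *p)
{
	char tok[64];
	int off = 0, n;

	p->shared_id = p->master_id = 0;

	/* Skip ID, parent ID, major:minor, root, mountpoint and options. */
	sscanf(line, "%*s %*s %*s %*s %*s %*s%n", &off);
	if (!off)
		return -1;
	line += off;

	while (sscanf(line, "%63s%n", tok, &n) == 1) {
		line += n;
		if (!strcmp(tok, "-"))
			return 0;	/* separator: optional fields are over */
		/* Tags other than shared: and master: are ignored. */
		if (sscanf(tok, "shared:%d", &p->shared_id) == 1)
			continue;
		sscanf(tok, "master:%d", &p->master_id);
	}
	return -1;	/* no "-" separator: malformed line */
}
```

Applied to the /slave1/test line this yields shared_id 30 and master_id 29,
i.e. a mount that is both shared and a slave.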
v2: use false and true for bool
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Try to restore mounts while the postpone list isn't empty and check
that each iteration makes some progress; otherwise fail, to
prevent infinite loops.
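The retry-with-progress-check idea, reduced to a sketch. It uses an array
and a single "depends on" index instead of the real lists of mounts;
struct pending and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct pending {		/* hypothetical stand-in for a mount entry */
	int dep;		/* index that must be done first, -1 if none */
	bool done;
};

static bool can_do(const struct pending *v, size_t i)
{
	return v[i].dep < 0 || v[v[i].dep].done;
}

/* Returns 0 when everything got done, -1 if a pass made no progress. */
static int process_with_postpone(struct pending *v, size_t n)
{
	size_t left = 0, i;

	for (i = 0; i < n; i++)
		if (!v[i].done)
			left++;

	while (left) {
		size_t before = left;

		for (i = 0; i < n; i++) {
			if (v[i].done || !can_do(v, i))
				continue;	/* postponed until a later pass */
			v[i].done = true;
			left--;
		}
		if (left == before)
			return -1;	/* nothing moved, stop instead of looping */
	}
	return 0;
}
```

A chain of dependencies finishes after a few passes, while a dependency
cycle is reported as an error on the first pass that changes nothing.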
v2: rework logic about postpone list
add more comments
v3: one more attempt to make it more readable
v4: Here is a master class from Pavel on how to write self-documented code.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
They are required for restoring shared and slave mounts
v2: use the same names of variables in image and in code
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All shared mounts from one group are linked into a circular list.
All slaves are added to the proper master's list.
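A small sketch of that grouping, assuming hypothetical field names and
plain pointers instead of the list helpers the real code uses. Peers with
the same shared_id end up on one ring; a slave is hung off the first mount
whose shared_id equals its master_id.

```c
#include <stddef.h>

struct mnt {			/* hypothetical, reduced mount record */
	int shared_id;		/* 0 if not shared */
	int master_id;		/* 0 if not a slave */
	struct mnt *next_peer;	/* circular ring of peers in one group */
	struct mnt *slaves;	/* head of this mount's slave list */
	struct mnt *next_slave;	/* link within a master's slave list */
};

static void collect_groups(struct mnt *v, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		v[i].next_peer = &v[i];	/* every mount starts as a ring of one */
		v[i].slaves = v[i].next_slave = NULL;
	}

	for (i = 0; i < n; i++) {
		/* Splice into the ring of the first earlier peer, if any. */
		for (j = 0; v[i].shared_id && j < i; j++) {
			if (v[j].shared_id != v[i].shared_id)
				continue;
			v[i].next_peer = v[j].next_peer;
			v[j].next_peer = &v[i];
			break;
		}
		/* Push a slave onto the list of the first master-group mount. */
		for (j = 0; v[i].master_id && j < n; j++) {
			if (v[j].shared_id != v[i].master_id)
				continue;
			v[i].next_slave = v[j].slaves;
			v[j].slaves = &v[i];
			break;
		}
	}
}
```

Walking next_peer from any mount visits its whole group and comes back to
the starting point, so no separate list head is needed.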
v2: change variable name and fix a bug about adding shared mounts in a
circular list.
v3: handle errors of collect_shared
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>