This prepares for the new mounts-v2 mount restore algorithm: we add an
alternative mountpoint to each mount, so that mounts placed at these
mountpoints are "plain": each mount sits in a separate sub-directory of
root_yard and mounts are mounted without a tree. Tree reconstruction
will be done in a separate step.
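A minimal sketch of how such a plain mountpoint could be built. The
function name plain_mountpoint and the "mnt-%010d" naming scheme are
assumptions made for illustration, they are not taken from this patch:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the "plain" location of a mount as a
 * separate sub-directory of the root yard, named after the mount id.
 * Returns 0 on success, -1 if the result does not fit into buf.
 */
static int plain_mountpoint(const char *root_yard, int mnt_id, char *buf, size_t size)
{
	int ret = snprintf(buf, size, "%s/mnt-%010d", root_yard, mnt_id);

	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```

With this layout no plain mountpoint is a sub-path of another one, so
mounts can be created in any order and moved into the tree later.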
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/5e6de171a
Changes: improve get_plain_mountpoint().
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We plan to switch to the Mounts-v2 engine for restoring mounts by default;
this option allows switching back to the old engine. This patch only adds
the option, there is no engine behind it yet.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/503f9ad2c
Changes: allow --mntns-compat-mode option only on restore and only if
MOVE_MOUNT_SET_GROUP is supported (this also requires change in
unittest/mock.c), change id in rpc criu_opts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This helper is useful for getting the mountpoint of the source path of
an external mount without parsing host mountinfo. When we restore a
mountpoint-external mount and need to copy sharing from the source via
MOVE_MOUNT_SET_GROUP, we have to pass the real mountpoint of the source
path to be able to copy the sharing group.
This uses the openat2 RESOLVE_NO_XDEV feature, which detects crossing a
mountpoint boundary, instead of potentially slow mountinfo parsing.
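A minimal sketch of the underlying primitive, not the helper added by
this patch. The names open_no_xdev, open_how_compat and the *_COMPAT
constant are local to the sketch; the struct mirrors the kernel's
struct open_how so that <linux/openat2.h> is not required:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumed: the number used on most architectures */
#endif
#ifndef O_PATH
#define O_PATH 010000000 /* assumed: the value used on most architectures */
#endif

/* Local copy of the layout of the kernel's struct open_how */
struct open_how_compat {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

#define RESOLVE_NO_XDEV_COMPAT 0x01 /* value of RESOLVE_NO_XDEV */

/*
 * Open path relative to dirfd while refusing to cross any mountpoint.
 * Returns an fd, or -1 with errno set. errno == EXDEV means that the
 * resolution would have crossed a mount boundary.
 */
static int open_no_xdev(int dirfd, const char *path)
{
	struct open_how_compat how = {
		.flags = O_PATH | O_CLOEXEC,
		.resolve = RESOLVE_NO_XDEV_COMPAT,
	};

	return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

A caller can probe path components with this and treat EXDEV as "a
mountpoint lies between dirfd and path". On kernels without openat2
the call fails with ENOSYS and a mountinfo based fallback is needed.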
v3: coverity CID 389209: close fd only when it was opened
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Will use this for cross mount namespace bindmounts.
Note: we don't need a separate kdat for mount-v2, as MOVE_MOUNT_SET_GROUP
was added much later than open_tree and all related fixups.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Mounts-v2 requires the new kernel feature MOVE_MOUNT_SET_GROUP to be able
to restore propagation between mounts correctly.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7da7f9a17
Changes: define move_mount syscall, check mainstream kernel
MOVE_MOUNT_SET_GROUP feature, use our "linux/mount.h" to overcome
possible problems of non-existing header on older kernels.
v3: coverity CID 389201: check ret of umount2 and rmdir at cleanup stage
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
As mounts-v2 mounts all mounts plain, without a tree, in the service mntns, we
can't just use a path relative to the mntns to find the remap. Make it relative
to the mount; this is also compatible with mounts-v1.
Also we don't need openat and unlinkat here, as we've opened rmntns_root
just before that; let's switch to the "non-at" variants.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/dc9ac0c80
Changes: rework to skip vz-specific hunks.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
As mounts-v2 would mount all mounts plain, without a tree, in the service
mntns, we can't just use a path relative to the mntns to find the remap.
Make it relative to the mount; this is also compatible with the current
mount engine.
Also handle the no-mntns case separately in nomntns_create_ghost.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/9cdf0b3e4
Changes: make gf->remap.rpath always relative else we get:
Error (criu/files-reg.c:779): Couldn't unlink remap
/tmp/.criu.mntns.BCurDL/13-0000000000 /zdtm/static/cwd02.test:
No such file or directory
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This getter should be used when we want to access the mount on the filesystem.
In the next patches we want to be able to change the location of the mount on
restore in the service mount namespace, while not changing the ->mountpoint
string.
All places where we don't want to access the mount, but instead want to
determine relations between mounts in the initial mount tree or just print a
path, should use ns_mountpoint.
This change effectively brings no change in behaviour; everything stays the
same for now.
Still leave ->mountpoint references for remap, cr_time and initialization, which
need to work with the exact variable.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/235c761e0
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
On dump ->mountpoint and ->ns_mountpoint are the same, but on restore
->mountpoint can be changed by mount tree yard setup and remap (and who
knows what else =) ). It is not good to use ->mountpoint for path
comparison between mounts unless we explicitly need to compare
"changed" paths. Imagine the remap change makes two mounts have
different prefixes in ->mountpoint: we would not be able to understand
that those mounts originally were subpaths.
This patch handles 2 simple cases:
a) These functions are called ONLY ON DUMP, so for them there is no effective
change: fixup_overlayfs, fusectl_dump, check_one_mark, __lookup_overlayfs,
mount_resolve_path, try_resolve_ext_mount, validate_mounts (first and third),
resolve_external_mounts, get_clean_mnt, __umount_children_overmounts,
__umount_overmounts, ns_open_mountpoint, open_mountpoint, dump_one_fs,
dump_one_mountpoint, clean_cr_time_mounts, collect_unix_bindmounts.
b) In these functions ONLY LOGS are changed, so there is no algorithm change:
always_fail, mnt_build_ids_tree, mnt_tree_show, unsupported_nfs_bindmounts,
unsupported_nfs_mount, unsupported_mount, validate_mounts (second),
__search_bindmounts, resolve_shared_mounts, mnt_tree_for_each, resolve_source,
propagate_siblings, propagate_mount, do_mount_one, get_mp_root,
collect_mnt_from_image, merge_mount_trees, ns_remount_writable,
__remount_readonly_mounts, parse_mountinfo.
All complex cases are handled in separate patches.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/4972888dd
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
At this point ns_mountpoint is equal to mountpoint.
Moreover, let's use the robust is_same_path helper in should_skip_mount so
that we don't need to rely on ->mountpoint + 1 hacks.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d4c4271a0
Changes: use is_same_path helper.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previous code did:
1) get rpath: mount's mountpoint relative to its parent mountpoint
2) get cut_root: parent's root relative to parent's slave root or vice
versa (will be "-" if parent's root is wider or "+" if narrower)
3) return parent's slave mountpoint +/- cut_root + rpath
This can be done more robustly with get_relative_path:
1) get rpath: mount's mountpoint relative to its parent mountpoint
2) get fsrpath: add rpath to parent's root (path relative to fs root)
3) get rpath: fsrpath relative to parent's slave root
4) return parent's slave mountpoint + rpath
In the latter approach we do not need to open-code workarounds for
consecutive slashes in paths (get_relative_path does this for us),
and we also do not need the complex logic with +/-.
While at it, let's also switch ->mountpoint to ->ns_mountpoint where
possible, as mountpoint can have unexpected prefixes.
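The four steps can be sketched as below. This is not the code of the
patch: rel_path is a simplified stand-in for get_relative_path, and
join_path, sibling_path and all parameter names are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *skip_slashes(const char *p)
{
	while (*p == '/')
		p++;
	return p;
}

/*
 * If path equals base or lies under it, return the remainder of path
 * without leading slashes ("" when equal), otherwise NULL. Runs of
 * consecutive slashes are treated as a single separator.
 */
static const char *rel_path(const char *path, const char *base)
{
	path = skip_slashes(path);
	base = skip_slashes(base);
	while (*base) {
		while (*base && *base != '/') {
			if (*path != *base)
				return NULL;
			path++;
			base++;
		}
		if (*path && *path != '/')
			return NULL; /* path component is longer than base's */
		path = skip_slashes(path);
		base = skip_slashes(base);
	}
	return path;
}

/* Join base and rel with exactly one slash between them */
static int join_path(char *out, size_t size, const char *base, const char *rel)
{
	size_t blen = strlen(base);
	int ret;

	while (blen > 1 && base[blen - 1] == '/')
		blen--;
	if (!*rel)
		ret = snprintf(out, size, "%.*s", (int)blen, base);
	else if (blen == 1 && base[0] == '/')
		ret = snprintf(out, size, "/%s", rel);
	else
		ret = snprintf(out, size, "%.*s/%s", (int)blen, base, rel);
	return (ret < 0 || (size_t)ret >= size) ? -1 : 0;
}

static int sibling_path(const char *mp, const char *parent_mp, const char *parent_root,
			const char *slave_mp, const char *slave_root, char *out, size_t size)
{
	char fsrpath[4096];
	const char *rpath;

	/* 1) mount's mountpoint relative to its parent mountpoint */
	rpath = rel_path(mp, parent_mp);
	if (!rpath)
		return -1;
	/* 2) add it to parent's root: path relative to the fs root */
	if (join_path(fsrpath, sizeof(fsrpath), parent_root, rpath))
		return -1;
	/* 3) that path relative to the root of the parent's slave */
	rpath = rel_path(fsrpath, slave_root);
	if (!rpath)
		return -1; /* location is not visible through the slave */
	/* 4) parent's slave mountpoint + rpath */
	return join_path(out, size, slave_mp, rpath);
}
```

For example, with a parent at /mnt/a showing /dir of the filesystem and
its slave at /mnt/b showing /, the sibling of /mnt/a/sub/x is
/mnt/b/dir/sub/x, and no "+/-" case split is needed to get there.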
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/0fd09f8571
Changes: rework mnt_get_sibling_path more.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We need to skip the root_yard_mp parent as it has no ns_mountpoint; it also
has no children overmounts, so we are safe. All others can be compared by
ns_mountpoints.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e5665c976
Changes: add mi->parent pre-check, reword commit message.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Fail root_path_from_parent if the parent is root_yard; we only want to
look up the root path in real parent mounts.
Now it is safe to use ns_mountpoint instead of mountpoint, as both
children and parent have it and they are relative.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e58a91883
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Function validate_children_collision is called both on dump and on
restore. On dump mountpoint and ns_mountpoint are the same. On restore
as we never call validate_children_collision on helper mounts
(root_yard_mp and cr_time are not in mntinfo list), for all other mounts
strcmp results would be the same with mountpoint and ns_mountpoint.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8f4fda5ac
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
There is no point in remapping ns root mounts: they can't overmount anybody.
This also allows us to switch mnt_needs_remap from ->mountpoint to
->ns_mountpoint for mount comparison in overmount detection.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/9475bf843
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Let's use ->ns_mountpoint in comparison as ->mountpoint can change (e.g.
see how we add ns root in get_mp_mountpoint and in do_remap_mount we can
change it again). We plan to get rid of ->mountpoint everywhere where we
can use unchanged ->ns_mountpoint.
Cherry-picked hunks from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e98e1456d
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Replace ->mountpoint with ->ns_mountpoint for determining relations
between mounts.
Also let's use get_relative_path in autofs_create_dentries as it is more
robust: before that we missed the case where the mountpoint of a child of
an autofs mount is a multilevel subdirectory of the parent mountpoint, and
always created it as a single-level subdirectory.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/5d5462202
Changes: skip children overmount as it does not need a subdirectory.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Put remounted_rw into it. This allows us to easily add more such
variables without allocating each one of them separately.
Due to the existence of shfree_last, a shmalloc'ed region can be inherited
from the previous caller, so it needs to be explicitly zero-initialized.
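A minimal sketch of why the explicit zeroing matters. The tiny pool
allocator below only imitates the "allocate, then free the last
allocation" behaviour; pool_alloc, pool_free_last, alloc_shared_state
and struct shared_mount_state are hypothetical names, not CRIU code:

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE  512
#define POOL_ALIGN 16

/* Stands in for the shared memory pool */
static _Alignas(POOL_ALIGN) unsigned char pool[POOL_SIZE];
static size_t pool_used, last_len;

static void *pool_alloc(size_t size)
{
	size_t len = (size + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
	void *p;

	if (len > POOL_SIZE - pool_used)
		return NULL;
	p = pool + pool_used;
	pool_used += len;
	last_len = len;
	return p;
}

/* Give back the most recent allocation; its bytes are NOT cleared */
static void pool_free_last(void)
{
	pool_used -= last_len;
	last_len = 0;
}

/* One structure for all per-mount state shared between processes */
struct shared_mount_state {
	int remounted_rw;
};

static struct shared_mount_state *alloc_shared_state(void)
{
	struct shared_mount_state *st = pool_alloc(sizeof(*st));

	/* The region may be recycled from a previous user: zero it */
	if (st)
		memset(st, 0, sizeof(*st));
	return st;
}
```

Without the memset, a structure placed on recycled memory would start
with whatever the previous user left there.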
Fixes: 0a2d380e6 ("ghost/mount: allocate remounted_rw in shmem to get
info from other processes")
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/6750e5793
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Expression (x && REMOUNTED_RW) is always the same as just (x).
It should've been (x & REMOUNTED_RW) to check if the mount is marked as
temporarily remounted writable and needs to be switched back.
By fixing this check we eliminate excess readonly remounts.
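A minimal sketch of the difference. The flag values and the second flag
name are assumptions made for illustration; only REMOUNTED_RW is named
in this message:

```c
#include <stdbool.h>

#define REMOUNTED_RW         1 /* assumed value */
#define REMOUNTED_RW_SERVICE 2 /* assumed second flag, for contrast */

/* Logical AND: true for any non-zero state, whichever bits are set */
static bool needs_ro_buggy(int state)
{
	return state && REMOUNTED_RW;
}

/* Bitwise AND: true only if the REMOUNTED_RW bit itself is set */
static bool needs_ro_fixed(int state)
{
	return (state & REMOUNTED_RW) != 0;
}
```

For a state where only the other flag is set, the buggy check still
asks for a readonly remount while the fixed one does not.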
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/167f8ac67
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Function mnt_is_overmounted is designed to detect if a mount is overmounted in
the current tree, comparing mountpoints of neighbour mounts for detection.
We want to get actual overmounts in the dumped tree; we don't expect that helper
mounts we add, or merging, will introduce new overmounts. So let's do overmount
detection earlier, before adding helpers.
Set is_overmounted = false for root yard and binfmt helper mounts.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e98e1456d
Changes: rename set_is_overmounted to prepare_is_overmounted, move it
just after collecting mounts from images to mount tree, handle helper
mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
There is no point in losing this information; having -1 everywhere in
mount images instead of the actual master id can be confusing.
Note that now need_master is true for bindmounts of root mounts with the
same master_id as the root mount, so now they are handled by common
code; we've added the can_receive_master_from_root check specially to
handle this case right. Also note that in propagate_mount we no longer set
->bind for this case, this is handled by mnt_ext_slave list related code.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/b3c9dc05e
Changes: took only the master_id related part of the original patch, added
preparatory patches before this one.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
If a mount has an external master_id it can inherit it as a bind of an
external mount, but it can also inherit it as a bind of the container root
mount, so let's add a similar condition to allow such mounts.
Note: need_master is still false for binds of the root mount which can
inherit master_id from root mounts; this will change in the next patch.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Root yard mount also has mnt_id == 0 so it will look better with a new
name. Let's explicitly initialize root yard mnt_id to HELPER_MNT_ID
for the sake of code readability.
Also in near future we might want to create additional mount helpers to support
mounts in CT with no fsroot mounted.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/45bf6f0ee
Changes: split umount hunk to previous patch, set HELPER_MNT_ID for root
yard.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
On dump, yes, mountpoint and ns_mountpoint are the same, but on restore
they are not, and putting something like "<root_yard>/binfmt_misc" into
ns_mountpoint is wrong. Let's leave ns_mountpoint NULL; this mount
should not be compared by ns_mountpoint with other mounts anyway.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Put our auxiliary binfmt_misc mount in "<root_yard>/binfmt_misc" instead
of "<root_yard>/<mntns>/proc/sys/fs/binfmt_misc". Thus we can restore
binfmt_misc without altering the actual mount tree, which looks much
safer.
For that we need to remove "fake top mount_info" handling from
add_cr_time_mount as now we intentionally add binfmt_misc mount as a
child of ("fake") root yard. On dump this does not change anything.
Also we need to create mountpoint for binfmt_misc in root yard.
As now mount is out of restored mount tree we don't need to umount it,
so remove corresponding CRTIME_MNT_ID umount hunk in do_new_mount.
Note: to make binfmt_misc c/r work, criu should be compiled with
CONFIG_BINFMT_MISC_VIRTUALIZED and binfmt_misc should actually be
virtualized, which is only done in the Virtuozzo kernel, per ve.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/2eb535843
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d79c7f441
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/34002bef4
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/45bf6f0ee
Changes: merge all fixups together to one consistent patch.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Before this change we didn't apply sb-flags when mounting the root mount of a
non-root mntns. There is no point in that: if we got to do_new_mount, this root
mount is not an external bind, so we won't change sb-flags on the host if we
change them for this mount. So we just lose sb-flags on some regular container
mount for no reason. Fix it.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e7ffe4c60
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This creates nested mntns and does pivot_root to tmpfs mount, so that
roots of original test mntns and in nested mntns are different.
Before the previous patch allowed nested mntnses with different roots,
this would fail.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Helper mnt_is_root_bind indicates that a mount can be bind-mounted from
the root mount (which in its turn comes from opts.root).
Use it in validate_mounts: we should skip unsupported mount from fsroot check
if we know it will be bindmounted from root mount, is_ns_root check was wrong.
Also fix root mount check in dump_one_fs, root mounts in non root mntns should
be dumped normally if they are not bind-mounts of root mount.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/25d078971
Changes: switch to mnt_bind_pick helper, export to mount.h, also add
mnt_get_root_bind helper for future use in mount-v2, remove excess root
yard hunk.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This test creates two mount namespaces, one "root" with external mount
at /mnt_ext_collision.test/dst and one "nested" with different internal
mount at /mnt_ext_collision.test/dst instead.
This case is important for nested containers, if we dump a container
with some external mount in /mnt we should not also replace mounts in
/mnt for nested containers with the external one. (One example is docker
containers inside Virtuozzo containers.)
Without previous patch which restricts external mounts resolution to
only root mntns of container this test fails as internal mount is
replaced by external one after migration.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We resolve mountpoint-external mounts on dump by mountpoint comparison,
so if we have another mount (another superblock, e.g. in a nested mntns)
with the same mountpoint we would also resolve this mount as external and
restore it as external, replacing it completely with a different mount.
That's wrong, so to make this interface more robust let's only resolve
mountpoint-external mounts in the root mntns of the container, not in all
mntnses as it was before.
Note: if actual external mount (bind of external) gets to nested mntns
it's ok not to resolve it as external as criu would bind it from the
resolved mount in root mntns. So external mounts in nested mntns are
still supported after this patch.
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/034498b28
Changes: apply mntns check only to mountpoint-external mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This test simply creates a) root external mount and b) "deeper"
bindmount for it (deeper in terms of mnt_depth). Our mount restore code
tries to mount (b) first and fails (without previous patch ordering
external mounts before their binds).
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d31954669
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The problem is that when we don't order these mounts, we can end up
mounting a non-external bind first via do_new_mount and fail c/r. For
instance, for tmpfs we would fail because there is no image to get the
contents from. See the test mnt_ext_root for more info.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/baf3f8db8
Changes: switch to mnt_bind_pick helper, export to mount.h, make check
in can_mount_now skip mounts with ->bind set.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Function dump_one_fs already has the mnt_is_external_bind check inside, so
there is no point in checking pm->external one more time.
Function check_bindmount is intended to check that a devpts bindmount's
master was opened in the right mount namespace, but if the bindmount is an
external mount there is no point in checking this. Let's also skip the
check for bindmounts of external mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We use mnt_is_external():
1) In validate_mounts() to skip fsroot existence check for mounts which
will be bind-mounted from external mounts.
2) In resolve_shared_mounts() to skip error on slave mounts without
master mount, if they can receive these master_id through external
mount.
3) In dump_one_fs to skip dump of mounts which will be bind-mounted from
external mounts.
Cases (1) and (3) are the same, but case (2) is quite different. Let's
split these cases, thus making things simpler.
Effectively this patch does not change criu's behaviour at all. While
I can't say that the old mnt_is_external was wrong, it was too complex and
hard to understand, so it's worth switching to a lookup across the
bindmounts list via the general mnt_bind_pick() helper. And now that it is
obvious that mnt_is_external looks for an external bindmount, let's also
change its name to mnt_is_external_bind.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/494b52ba8
Changes: use mnt_bind_pick helper, use is_sub_path helper to be more
robust, rename mnt_is_external to mnt_is_external_bind, fix
clang-format, export to mount.h, use mnt_is_nodev_external as we can not
inherit master from device-external mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>