Previous code did:
1) get rpath: mount's mountpoint relative to its parent mountpoint
2) get cut_root: parent's root relative to parent's slave root or vice
versa (will be "-" if parent's root is wider or "+" if thicker)
3) return parent's slave mountpoint +/- cut_root + rpath
It can be done more robustly with get_relative_path:
1) get rpath: mount's mountpoint relative to its parent mountpoint
2) get fsrpath: add rpath to parent's root (path relative to fs root)
3) get rpath: fsrpath relative to parent's slave root
4) return parent's slave mountpoint + rpath
In the latter approach we do not need to open-code workarounds for
consecutive slashes in paths (get_relative_path does this for us),
and we also do not need the complex +/- logic.
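As a rough illustration of the four steps above, here is a minimal
sketch. The helper rel() is a simplified stand-in for get_relative_path
(it only tolerates repeated slashes), and sibling_path() with its flat
string arguments is a hypothetical name, not the actual
mnt_get_sibling_path signature:

```c
#include <stdio.h>
#include <string.h>

/*
 * Simplified stand-in for get_relative_path(): compares path components
 * and ignores repeated '/'. Returns the tail of path, or NULL if sub is
 * not a prefix of path.
 */
static const char *rel(const char *path, const char *sub)
{
	for (;;) {
		size_t pl, sl;

		path += strspn(path, "/");
		sub += strspn(sub, "/");
		if (!*sub)
			return path;
		pl = strcspn(path, "/");
		sl = strcspn(sub, "/");
		if (pl != sl || memcmp(path, sub, pl))
			return NULL;
		path += pl;
		sub += sl;
	}
}

/* Hypothetical flat-argument version of the four steps. */
static int sibling_path(char *out, size_t size, const char *mp,
			const char *parent_mp, const char *parent_root,
			const char *slave_root, const char *slave_mp)
{
	char fsrpath[4096];
	const char *rpath;

	/* 1) mountpoint relative to the parent's mountpoint */
	rpath = rel(mp, parent_mp);
	if (!rpath)
		return -1;
	/* 2) add rpath to the parent's root: path relative to the fs root */
	if (snprintf(fsrpath, sizeof(fsrpath), "%s/%s", parent_root, rpath) >= (int)sizeof(fsrpath))
		return -1;
	/* 3) the same path relative to the parent's slave root */
	rpath = rel(fsrpath, slave_root);
	if (!rpath)
		return -1;
	/* 4) parent's slave mountpoint + rpath */
	if (snprintf(out, size, "%s/%s", slave_mp, rpath) >= (int)size)
		return -1;
	return 0;
}
```

E.g. for a mount at /a/b/c whose parent is mounted at /a with root /dir,
and a parent's slave mounted at /x with root /dir/b, this gives /x/c.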
While at it, let's also switch ->mountpoint to ->ns_mountpoint where
possible, as mountpoint can have unexpected prefixes.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/0fd09f8571
Changes: rework mnt_get_sibling_path more.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We need to skip root_yard_mp parent as it has no ns_mountpoint, it also
has no children overmounts so we are safe, all others can be compared by
ns_mountpoints.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e5665c976
Changes: add mi->parent pre-check, reword commit message.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Fail root_path_from_parent if parent is root_yard, we want to only
lookup root path in real parent mounts.
Now it is safe to use ns_mountpoint instead of mountpoint as both
children and parent have it and they are relative.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e58a91883
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Function validate_children_collision is both called on dump and on
restore. On dump mountpoint and ns_mountpoint are the same. On restore
as we never call validate_children_collision on helper mounts
(root_yard_mp and cr_time are not in mntinfo list), for all other mounts
strcmp results would be the same with mountpoint and ns_mountpoint.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8f4fda5ac
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
There is no point in remapping ns root mounts: they can't overmount anybody.
This also allows us to switch mnt_needs_remap from ->mountpoint to
->ns_mountpoint for mount comparison in overmount detection.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/9475bf843
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Let's use ->ns_mountpoint in comparison as ->mountpoint can change (e.g.
see how we add ns root in get_mp_mountpoint and in do_remap_mount we can
change it again). We plan to get rid of ->mountpoint everywhere where we
can use unchanged ->ns_mountpoint.
Cherry-picked hunks from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e98e1456d
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Replace ->mountpoint with ->ns_mountpoint for determining relations
between mounts.
Also let's use get_relative_path in autofs_create_dentries as it is more
robust, before that we've missed the case where mountpoint of child of
autofs mount is multilevel subdirectory of parent mountpoint, and always
created them as single level subdirectory.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/5d5462202
Changes: skip children overmount as it does not need a subdirectory.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Put remounted_rw to it. This allows us to easily add some more of such
variables without allocating each one of them separately.
Due to the existence of shfree_last, a shmalloc'ed region can be inherited from
the previous caller so it needs to be explicitly zero initialized.
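A toy illustration of why the explicit zeroing matters. The
toy_shmalloc/toy_shfree_last helpers below are hypothetical stand-ins
that only mimic a bump allocator with "free last" semantics; they are
not CRIU's shmalloc/shfree_last, and struct shared_flags is a made-up
name:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE 256

static unsigned char *pool;
static size_t pool_used, last_size;

static void *toy_shmalloc(size_t size)
{
	void *p;

	if (!pool && !(pool = malloc(POOL_SIZE)))
		return NULL;
	size = (size + 15) & ~(size_t)15; /* keep allocations aligned */
	if (pool_used + size > POOL_SIZE)
		return NULL;
	p = pool + pool_used;
	pool_used += size;
	last_size = size;
	return p;
}

/* Gives back only the most recent allocation; its contents are kept. */
static void toy_shfree_last(void *p)
{
	(void)p;
	pool_used -= last_size;
	last_size = 0;
}

struct shared_flags {
	int remounted_rw;
	int some_future_flag;
};

static struct shared_flags *shared_flags_alloc(void)
{
	struct shared_flags *f = toy_shmalloc(sizeof(*f));

	if (f)
		memset(f, 0, sizeof(*f)); /* region may be reused, so zero it */
	return f;
}
```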
Fixes: 0a2d380e6 ("ghost/mount: allocate remounted_rw in shmem to get
info from other processes")
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/6750e5793
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Expression (x && REMOUNTED_RW) is always same as just (x).
It should've been (x & REMOUNTED_RW) to check if mount is marked as
temporary remounted writable and requires to be switched back.
By fixing this check we eliminate excess readonly remounts.
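A small sketch of the difference. The flag values below are assumed for
illustration and the function names are hypothetical; only the
"&&" versus "&" contrast is taken from this patch:

```c
#include <stdbool.h>

/* Assumed flag values, for illustration only. */
#define REMOUNTED_RW	     1
#define REMOUNTED_RW_SERVICE 2

/* Buggy: logical AND with a non-zero constant is just (flags != 0). */
static bool needs_ro_remount_buggy(int flags)
{
	return flags && REMOUNTED_RW;
}

/* Fixed: bitwise AND tests the REMOUNTED_RW bit only. */
static bool needs_ro_remount_fixed(int flags)
{
	return flags & REMOUNTED_RW;
}
```

With flags == REMOUNTED_RW_SERVICE the buggy variant still reports true,
which is where the excess readonly remounts came from.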
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/167f8ac67
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Let's merge mount trees under root_yard just after reading from image.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8e8ecdfdc
Changes: split only root yard part as a separate patch, and put root
yard alloc into merge_mount_trees.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Function mnt_is_overmounted is designed to detect if mount is overmounted in
current tree using comparison of mountpoints of neighbour mounts for detection.
We want to get actual overmounts in dumped tree, we don't expect that helper
mounts we add or merging will introduce new overmounts. So let's do overmount
detection earlier before adding helpers.
Set is_overmounted = false for root yard and binfmt helper mounts.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e98e1456d
Changes: rename set_is_overmounted to prepare_is_overmounted, move it
just after collecting mounts from images to mount tree, handle helper
mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Will use it to find shared mount we can bind from and also can inherit
external slavery. Device-external can't give us external slavery.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/dcd952c4c
Changes: switch to mnt_bind_pick helper.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
There is no point in losing this information; having -1 everywhere in
mount images instead of the actual master id can be confusing.
Note that now need_master is true for bindmounts of root mounts with
same master_id as root mount, so now they are handled with a common
code, we've added can_receive_master_from_root check specially to handle
this case right. Also note that in propagate_mount we no longer set ->bind
for this case, this is handled by mnt_ext_slave list related code.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/b3c9dc05e
Changes: keep only the master_id related part of the original patch, add
preparatory patches before this one.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We need to put mounts which need to inherit master_id from external
mounts or from root mount into separate list, so that we can set ->bind
on them right in propagate_siblings.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/ea592cf6e
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
If mount has external master_id it can inherit it as a bind of external
mount, but also it can inherit it as a bind of container root mount, so
let's add similar condition to allow such mounts.
Note: need_master is still false for binds of the root mount which can
inherit master_id from it; this will change in the next patch.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Root yard mount also has mnt_id == 0 so it will look better with a new
name. Let's explicitly initialize root yard mnt_id to HELPER_MNT_ID
for the sake of code readability.
Also in near future we might want to create additional mount helpers to support
mounts in CT with no fsroot mounted.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/45bf6f0ee
Changes: split umount hunk to previous patch, set HELPER_MNT_ID for root
yard.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
On dump, yes, mountpoint and ns_mountpoint are the same, but on restore
they are not, and putting something like "<root_yard>/binfmt_misc" into
ns_mountpoint is wrong. Let's leave ns_mountpoint NULL; this mount
should not be compared by ns_mountpoint with other mounts anyway.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Put our auxiliary binfmt_misc mount in "<root_yard>/binfmt_misc" instead
of "<root_yard>/<mntns>/proc/sys/fs/binfmt_misc". Thus we can restore
binfmt_misc without altering actual mount tree, which looks much more
safe.
For that we need to remove "fake top mount_info" handling from
add_cr_time_mount as now we intentionally add binfmt_misc mount as a
child of ("fake") root yard. On dump this does not change anything.
Also we need to create mountpoint for binfmt_misc in root yard.
As now mount is out of restored mount tree we don't need to umount it,
so remove corresponding CRTIME_MNT_ID umount hunk in do_new_mount.
Note: to make binfmt_misc c/r work criu should be compiled with
CONFIG_BINFMT_MISC_VIRTUALIZED and binfmt_misc should be actually
virtualized and this is only done in Virtuozzo kernel per ve.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/2eb535843
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d79c7f441
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/34002bef4
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/45bf6f0ee
Changes: merge all fixups together to one consistent patch.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Before this change we didn't apply sb-flags if we mount the root mount of
non-root mntns. There is no point in it, if we got to do_new_mount this root
mount is not external bind, so we won't change sb-flags on host if we change it
for this mount. So we just lose sb-flags on some regular container mount for
no reason. Fix it.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e7ffe4c60
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This creates nested mntns and does pivot_root to tmpfs mount, so that
roots of original test mntns and in nested mntns are different.
Before allowing nested mntnses with different roots in previous patch
this would fail.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Only root in root-mntns is special (see rst_mnt_is_root); all other
mounts are mounted regularly, there is no difference between ns root and
any other mount or bind-mount.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/f41e41dd5
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Helper mnt_is_root_bind indicates that mount can be bind-mounted from
the root mount (which in its turn is from opts.root).
Use it in validate_mounts: we should skip unsupported mount from fsroot check
if we know it will be bindmounted from root mount, is_ns_root check was wrong.
Also fix root mount check in dump_one_fs, root mounts in non root mntns should
be dumped normally if they are not bind-mounts of root mount.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/25d078971
Changes: switch to mnt_bind_pick helper, export to mount.h, also add
mnt_get_root_bind helper for future use in mount-v2, remove excess root
yard hunk.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This test creates two mount namespaces, one "root" with external mount
at /mnt_ext_collision.test/dst and one "nested" with different internal
mount at /mnt_ext_collision.test/dst instead.
This case is important for nested containers, if we dump a container
with some external mount in /mnt we should not also replace mounts in
/mnt for nested containers with the external one. (One example is docker
containers inside Virtuozzo containers.)
Without previous patch which restricts external mounts resolution to
only root mntns of container this test fails as internal mount is
replaced by external one after migration.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We resolve mountpoint-external mounts on dump by mountpoint comparison,
so if we have other mount (other superblock e.g. in nested mntns) with
same mountpoint we would also resolve this mount as external and restore
it as external: replacing it completely with different mount... That's
wrong, so to make this interface more robust let's only resolve
mountpoint-external mounts in root mntns of container, not in all
mntnses as it was before.
Note: if actual external mount (bind of external) gets to nested mntns
it's ok not to resolve it as external as criu would bind it from the
resolved mount in root mntns. So external mounts in nested mntns are
still supported after this patch.
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/034498b28
Changes: apply mntns check only to mountpoint-external mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This test simply creates a) root external mount and b) "deeper"
bindmount for it (deeper in terms of mnt_depth). Our mount restore code
tries to mount (b) first and fails (without previous patch ordering
external mounts before their binds).
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d31954669
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The problem is that when we don't order these mounts we can get to mounting
non-external bind first via do_new_mount and fail c/r. For instance for
tmpfs we would fail on no image to get contents from. See the test
mnt_ext_root for more info.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/baf3f8db8
Changes: switch to mnt_bind_pick helper, export to mount.h, make check
in can_mount_now skip mounts with ->bind set.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Function dump_one_fs already has mnt_is_external_bind check inside, so
there is no point to check pm->external one more time.
Function check_bindmount is intended to check devpts bindmount's master
was opened in right mount namespace, but if bindmount is external mount
there is no point to check this. Let's also skip check for bindmounts of
external mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We use mnt_is_external():
1) In validate_mounts() to skip fsroot existence check for mounts which
will be bind-mounted from external mounts.
2) In resolve_shared_mounts() to skip error on slave mounts without
master mount, if they can receive these master_id through external
mount.
3) In dump_one_fs to skip dump of mounts which will be bind-mounted from
external mounts.
Cases (1) and (3) are the same, but case (2) is quite different. Let's
split these cases, thus making things simpler.
Effectively this patch does not change criu's behaviour at all. While
I can't say that old mnt_is_external was wrong, it was too complex and
hard to understand, so it's worth switching to lookup across the
bindmounts list via the general mnt_bind_pick() helper. And now that it is
obvious that mnt_is_external looks for an external bindmount, let's also
change its name to mnt_is_external_bind.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/494b52ba8
Changes: use mnt_bind_pick helper, use is_sub_path helper to be more
robust, rename mnt_is_external to mnt_is_external_bind, fix
clang-format, export to mount.h, use mnt_is_nodev_external as we can not
inherit master from device-external mounts.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Adding different pick functions we would be able to search different
things like mounted bind with wider root, or external bind, or external
bind with same sharing group and so on and so forth.
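The idea can be sketched roughly as follows. The reduced struct mnt, the
NULL-terminated bind list and the two pick functions are hypothetical
simplifications, not the actual mount_info layout or mnt_bind_pick
signature:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced mount descriptor. */
struct mnt {
	const char *root;
	bool external;
	struct mnt *next_bind; /* next mount of the same superblock, or NULL */
};

typedef bool (*mnt_pick_fn)(const struct mnt *mi, const struct mnt *bind);

/* Walk the binds and return the first one (other than mi) accepted by pick. */
static struct mnt *bind_pick(const struct mnt *mi, struct mnt *binds, mnt_pick_fn pick)
{
	struct mnt *b;

	for (b = binds; b; b = b->next_bind) {
		if (b == mi)
			continue;
		if (pick(mi, b))
			return b;
	}
	return NULL;
}

/* Pick any external bind. */
static bool pick_external(const struct mnt *mi, const struct mnt *bind)
{
	(void)mi;
	return bind->external;
}

/* Pick a bind with exactly the same root. */
static bool pick_same_root(const struct mnt *mi, const struct mnt *bind)
{
	return strcmp(mi->root, bind->root) == 0;
}
```

A new search criterion then only needs a new pick function, the list
walk itself stays in one place.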
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This is a smart way of getting relative paths:
1) Always returns relative path, no unexpected starting '/';
2) Detects subpath even if path formats are different, only real directory
and file names matter;
3) No path modification/allocation, returns shifted pointer to the
original path.
We have many places where we need to cut subpath from path. Different code
blocks doing this job spread widely across the codebase for instance see:
cut_root_for_bind and root_path_from_parent. But those implementations rely on
the fact that subpath's and path's formats are the same.
When we modify or concatenate paths we can accidentally get strange
path formats, paths given by user can have strange format, and the job
to manually maintain all paths in "simple" format everywhere is too
hard. So let's just add a tool to compare "strange" paths.
E.g.:
get_relative_path("./a////.///./b//././c", "///./a/b") == "c"
Note: ".." in path is not supported, and we just can't support it right
without full filesystem tree information to resolve paths like
"../../a", so we just treat ".." as a directory name which should work
in simple cases.
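For illustration, the comparison can be sketched as below. This is a
minimal sketch written for this description, not necessarily how the
helper in the patch is implemented; relative_path_sketch and skip_noise
are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Skip separators and "." components, which carry no information. */
static const char *skip_noise(const char *p)
{
	for (;;) {
		p += strspn(p, "/");
		if (p[0] == '.' && (p[1] == '/' || p[1] == '\0'))
			p++;
		else
			return p;
	}
}

/*
 * Returns a pointer into path, right after the sub_path prefix, or NULL
 * if sub_path is not a prefix of path. Only real directory and file
 * names are compared; ".." is treated as an ordinary name.
 */
static const char *relative_path_sketch(const char *path, const char *sub_path)
{
	for (;;) {
		size_t pl, sl;

		path = skip_noise(path);
		sub_path = skip_noise(sub_path);
		if (!*sub_path)
			return path;
		pl = strcspn(path, "/");
		sl = strcspn(sub_path, "/");
		if (pl != sl || memcmp(path, sub_path, pl))
			return NULL;
		path += pl;
		sub_path += sl;
	}
}
```

In this sketch equal paths yield an empty string and a mismatch yields
NULL, so "is sub path" and "is same path" checks can be built on top of
the same walk.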
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/73a771348
Changes: add other useful robust path comparison helpers is_sub_path and
is_same_path based on get_relative_path, fix clang-format.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Before this patch mnt_is_external() used non-populated mnt_bind list
when called from resolve_shared_mounts(), thus it could work not as
intended.
Let's add separate helper search_bindmounts() for populating mnt_bind
list, and add mnt_bind_is_populated to differentiate between
non-populated list and just empty populated list. This way we can add a
BUG_ON to mnt_is_external to catch such order problems in future.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e464c1c6d
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8b22b30d5
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/ca9de41e3
Changes: simplify commit message, merge fixups: search bindmounts
earlier so that we have bindmounts info as early as possible, rename
mnt_no_bind to mnt_bind_is_populated and simplify its logic a bit.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Fstype and source fields can be changed by resolve_external_mounts() or
by try_resolve_ext_mount() for external mounts, but we can have other
mounts from same superblock which are not detected as external, for
instance bind of subdirectory from device-external or bind of
mountpoint-external mount to other mountpoint. So we need to still be
able to find bindmounts between mounts with changed fstype or source and
unchanged mounts.
So let's make fstype/source checks in mounts_sb_equal ignored for
external mounts. Leave only fstype->sb_equal checks if have them.
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/fadc38d84
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/f9700cb12
Changes: merge two commits in one and rework, remove ":)", reword
commit-message to make patch self-sufficient.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previously only autodetected and mountpoint external mounts had
mount_info->external field set, let's fix this injustice so that we can
operate all external mounts in a similar manner.
Also:
Print info message when device external mount is detected similar to
mountpoint external mounts detection.
Add helper mnt_is_nodev_external to let do_mount_one, can_mount_now and
do_bind_mount handle device external mounts separately as it was before.
Handle device external mount right in get_mp_root to set ->external on
restore. (note: calling ext_mount_lookup is only meaningful for
mountpoint external mounts)
Add helper mnt_is_dev_external to use in resolve_source to make it more
clear that it is a device external mount restore path.
All other "if (mi->external)" checks now also handle device external
mounts, but they all look safe to do so and could've done it initially,
here is a list: fusectl_dump, mnt_is_external, dump_one_mountpoint,
propagate_mount.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/afd899539
Changes: cleanup commit message, add some helpers.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Device-external mounts are restored via do_new_mount(), but function
do_new_mount only allows creating mounts with root "/", as it does
simple mount (not bind) without any later root change. Restoring
non-root mounts via do_new_mount is just impossible.
So let's detect mounts as device-external only when they have fsroot
root, all other non-fsroot binds of this device would be restored as
bindmounts of fsroot ones.
This is a cosmetic change: though non-root mounts were detected as
device-external before this patch, they would not be created with
do_new_mount() anyway, because the fsroot/bind check in can_mount_now
orders them to be restored as binds.
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/afd899539
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Use this helper everywhere instead of manually adding mounts to the head
of the list, this way it is much easier to track all places where we do
add to mntinfo list.
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7bca9397b
Changes: skip hunk adding root_yard_mp to the list because root yard has
not fully initialized mountinfo structure (can break code which uses
mntinfo fallback in lookup_nsid_by_mnt_id), let's only have real mounts
in mntinfo list. Also skip cr_time mount from mntinfo list for the same
reason.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Before this change the on-host-"zdtm_auto_ext_mnt" mount with
mountpoint "/tmp/zdtm_ext_auto.XXXXXX" was private/shared depending on
its parent mount "/tmp". And e.g. on my setup the parent mount on
"/tmp" is private and our "host" mount becomes private too. So
in-container-"zdtm_auto_ext_mnt" external mount is also private but test
name hints it should be slave.
E.g. if I run mnt_ext_master before this patch, in mnt_ext_master
process mntns we see that our "external" mount is private but not slave:
[root@fedora criu]# grep zdtm_auto_ext_mnt /proc/167077/mountinfo
1239 1238 0:138 /test /ext_mounts rw,relatime - tmpfs zdtm_auto_ext_mnt rw,seclabel,inode64
After this patch:
[root@fedora criu]# grep zdtm_auto_ext_mnt /proc/166385/mountinfo
1239 1238 0:138 /test /ext_mounts rw,relatime master:413 - tmpfs zdtm_auto_ext_mnt rw,seclabel,inode64
                                              ^^^^^^^^^^
So we just explicitly make on-host-"zdtm_auto_ext_mnt" shared, and this
makes in-container-"zdtm_auto_ext_mnt" external mount slave.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/a1a221fe9
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389197:
CID 389197 (#1 of 1): Invalid printf format string (PRINTF_ARGS)
format_error: Length modifier L not applicable to conversion specifier in %Lu.
284 pr_err("Incompatible uffd API: expected %Lu, got %Lu\n", UFFD_API, uffdio_api.api);
According to the C11 standard "%Lu" is undefined, so we'd better not
use it, see:
"L Specifies that a following a, A, e, E, f, F, g, or G conversion
specifier applies to a long double argument."
http://port70.net/~nsz/c/c11/n1570.html#7.21.6.1p7
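One portable way to print a 64-bit unsigned value is the PRIu64 macro
from <inttypes.h>. The sketch below only illustrates that option; it
does not claim to be the exact change made here, and
format_api_mismatch is a hypothetical helper:

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats the message with a conversion that is defined for uint64_t. */
static int format_api_mismatch(char *buf, size_t size, uint64_t expected, uint64_t got)
{
	return snprintf(buf, size, "Incompatible uffd API: expected %" PRIu64 ", got %" PRIu64 "\n",
			expected, got);
}
```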
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
coverity CID 389191:
int unix_sk_id_add(unsigned int ino)
2327{
2328 char *e_str;
2329
1. alloc_fn: Storage is returned from allocation function malloc.
2. var_assign: Assigning: ___p = storage returned from malloc(20UL).
3. Condition !___p, taking false branch.
4. leaked_storage: Variable ___p going out of scope leaks the storage it points to.
5. var_assign: Assigning: e_str = ({...; ___p;}).
2330 e_str = xmalloc(20);
6. Condition !e_str, taking false branch.
2331 if (!e_str)
2332 return -1;
7. noescape: Resource e_str is not freed or pointed-to in snprintf.
2333 snprintf(e_str, 20, "unix[%u]", ino);
8. noescape: Resource e_str is not freed or pointed-to in add_external.
CID 389191 (#1 of 1): Resource leak (RESOURCE_LEAK)
9. leaked_storage: Variable e_str going out of scope leaks the storage it points to.
2334 return add_external(e_str);
2335}
We should free the e_str string after we finish using it in unix_sk_id_add;
easiest way to do it is to use cleanup_free attribute.
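A minimal sketch of the pattern, built on the GCC/Clang cleanup
attribute. The cleanup_free_ptr helper, the macro definition and the
add_external_stub function are written here for illustration and assume
the callee keeps its own copy of the string:

```c
#include <stdio.h>
#include <stdlib.h>

/* Called with a pointer to the annotated variable when it leaves scope. */
static void cleanup_free_ptr(void *p)
{
	free(*(void **)p);
}
#define cleanup_free __attribute__((cleanup(cleanup_free_ptr)))

static char last_external[32];

/* Hypothetical stand-in for add_external(): copies the key it is given. */
static int add_external_stub(const char *key)
{
	snprintf(last_external, sizeof(last_external), "%s", key);
	return 0;
}

static int unix_sk_id_add_sketch(unsigned int ino)
{
	cleanup_free char *e_str = NULL;

	e_str = malloc(20);
	if (!e_str)
		return -1;
	snprintf(e_str, 20, "unix[%u]", ino);
	/* e_str is freed automatically on every return path */
	return add_external_stub(e_str);
}
```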
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Modifications to support criu image streamer when using amdgpu_plugin.
When running with criu image streamer, fseek/lseek is not available so
we store the file size in the first 8-bytes of the actual file.
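A rough sketch of such a size-prefixed layout. The function names are
hypothetical and the size is assumed to be stored as a native-endian
uint64_t; the plugin's real helpers may differ:

```c
#include <stdint.h>
#include <stdio.h>

/* Write an 8-byte size header followed by the payload. */
static int write_sized(FILE *f, const void *data, uint64_t size)
{
	if (fwrite(&size, sizeof(size), 1, f) != 1)
		return -1;
	if (size && fwrite(data, 1, size, f) != size)
		return -1;
	return 0;
}

/* Read the header, so the reader knows the payload size without seeking. */
static int read_size(FILE *f, uint64_t *size)
{
	return fread(size, sizeof(*size), 1, f) == 1 ? 0 : -1;
}
```

The reader calls read_size() first and then reads exactly that many
bytes, so it never needs fseek/lseek to find the end of the file.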
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Store BO contents directly to file (1 per GPU) instead of using
protobuf.
Bug Fix:
Fixes an issue where we could not handle BOs bigger than 4GB because
protobuf has an internal limit of 4GB for the Bytes structure.
Performance Improvements:
This significantly reduces CR duration on multi-GPU systems as it allows
reading and writing to disk in parallel. During checkpoint, instead of
waiting for all the BO contents to be read from the one protobuf file,
we can now start writing the BO contents as soon as the first BO is read
from disk. During restore, we can start writing BO contents to disk
after the first BO from VRAM. This also reduces the peak amount of
system memory used as we only need to keep 1 BO content in memory per
GPU at a time instead of all the BO contents.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
This sets up the pytorch environment for BERT Transformers and also sets
up CRIU along with all its dependencies including amdgpu plugin for
supporting CR with AMDGPUs.
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
On newer kernels (> 5.13), the KFD & DRM drivers only allow mmap of
the VMAs through the /dev/renderD* file descriptors that were used
during the CRIU_RESTORE ioctl.
During restore, after opening /dev/renderD*, amdgpu_plugin keeps the
FDs opened and instead returns a copy of the FDs to CRIU. The same FDs
are then returned during the UPDATE_VMAMAP hooks so that they can be
used by CRIU to call mmap. Duplicated FDs created using dup are
references to the same struct file inside the kernel so they are also
allowed to mmap.
To prevent the opened FDs inside amdgpu_plugin from conflicting with
FDs used by the target restore application, we make sure that the
lowest-numbered FD that amdgpu_plugin will use is greater than the
highest-numbered FD that is used by the target application.
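One possible way to obtain a descriptor above a given number is
fcntl(F_DUPFD), which returns the lowest free descriptor greater than or
equal to its argument. This is only a sketch of the idea, not
necessarily the mechanism the plugin uses, and move_fd_above is a
hypothetical name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Move fd to the lowest free descriptor number >= min_fd. */
static int move_fd_above(int fd, int min_fd)
{
	int new_fd = fcntl(fd, F_DUPFD, min_fd);

	if (new_fd < 0)
		return -1;
	close(fd); /* the duplicate refers to the same struct file */
	return new_fd;
}
```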
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
AMD Radeon GPUs have special sDMA (system DMA engine) IPs that can be
used to speed up read and write operations on VRAM and GTT memory.
Depends on:
* The kernel mode driver (kfd) creating the dmabuf objects for the kfd
BOs in both checkpoint and restore operation.
* libdrm and libdrm_amdgpu libraries
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Libhsakmt (thunk) uses a shared memory file in /dev/shm/hsakmt_shared_mem
and its semaphore in /dev/shm/sem.hsakmt_semaphore. Add a check during
checkpoint to see if these two files exist. If they exist then the
plugin will try to restore them during restore.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Implement multi-threaded code to read and write contents of each GPU
VRAM BOs in parallel in order to speed up dumping process when using
multiple GPUs.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Adding unit tests for GPU remapping code when checkpointing and
restoring on different nodes with different topologies.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>