mir/criu - criu - Mike's Git repositories


mirror of https://github.com/checkpoint-restore/criu synced 2025-08-22 18:07:57 +00:00

Author	SHA1	Message	Date
Yanning Yang	bfb4a3d842	plugins/amdgpu: Implement parallel restore This patch implements the logic needed to offload the restoration of buffer object (BO) content. The goal is to move BO content restoration to the main CRIU process so that it can run in parallel with other restore work (mainly restoring the memory state in the restore blob, which is time-consuming) and speed up the restore phase. Restoring BO content usually takes a significant amount of time for GPU applications, so running it in parallel with other operations can reduce the overall restore time. The patch has three parts: the first replaces BO restoration in the target process with a parallel restore command sent to the main CRIU process; the second implements the POST_FORKING hook in the amdgpu plugin so that BO content is restored in the main CRIU process; the third stops the parallel thread in the RESUME_DEVICES_LATE hook. This optimization targets only the single-process case (the common case); in other scenarios, the plugin falls back to the original method. This is controlled by the new `parallel_disabled` flag. Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>	2025-05-17 13:36:36 -07:00
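A minimal sketch of the control flow described above, assuming a single background thread: it is started at the POST_FORKING point and joined at the RESUME_DEVICES_LATE point, with `parallel_disabled` selecting the serial path. The names `post_forking`, `resume_devices_late`, `struct bo_job` and `restore_bo_contents` are hypothetical stand-ins, not the plugin's real symbols, and for brevity the serial fallback runs in the same function here (in the patch, the original method restores BOs in the target process).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job description; not CRIU's real types. */
struct bo_job {
	int nr_bos;   /* number of buffer objects to restore */
	int restored; /* how many were restored (for illustration) */
};

static pthread_t restore_thread;
static bool thread_started;
static bool parallel_disabled; /* set when more than one process is restored */

/* Placeholder for the expensive copy of BO contents into VRAM. */
static void restore_bo_contents(struct bo_job *job)
{
	for (int i = 0; i < job->nr_bos; i++)
		job->restored++;
}

static void *restore_worker(void *arg)
{
	restore_bo_contents(arg);
	return NULL;
}

/* Assumed POST_FORKING point: start the copy in the background. */
int post_forking(struct bo_job *job)
{
	if (parallel_disabled) {
		restore_bo_contents(job); /* serial fallback (simplified) */
		return 0;
	}
	if (pthread_create(&restore_thread, NULL, restore_worker, job) != 0)
		return -1;
	thread_started = true;
	return 0;
}

/* Assumed RESUME_DEVICES_LATE point: wait for the background copy. */
int resume_devices_late(void)
{
	if (thread_started) {
		if (pthread_join(restore_thread, NULL) != 0)
			return -1;
		thread_started = false;
	}
	return 0;
}
```

The memory-state restore that the patch overlaps with the copy would run between the two calls.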
Radostin Stoyanov	adf2c5be96	images/inventory: add field for enabled plugins This patch extends the inventory image with a `plugins` field that contains an array of plugins which were used during checkpoint, for example, to save GPU state. In particular, the CUDA and AMDGPU plugins are added to this field only when the checkpoint contains GPU state. This makes it possible to disable unnecessary plugins during restore, show appropriate error messages if required CRIU plugins are missing, and migrate a process that does not use the GPU from a GPU-enabled system to a CPU-only environment. We use the `optional plugins_entry` for backwards compatibility. This entry allows us to distinguish between a missing field and an empty one: - When the field is missing, it indicates that the checkpoint was created with a previous version of CRIU, and all plugins should be enabled during restore. - When the field is empty, it indicates that no plugins were used during checkpointing. Thus, all plugins can be disabled during restore. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-10-26 22:18:22 -07:00
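The restore-side decision can be sketched as follows, with a `NULL` pointer modeling the missing field and an empty array modeling the empty one. `struct plugins_entry` and `plugin_enabled` are hypothetical; the real entry is decoded from the protobuf image, and the plugin names used with it are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded form of the inventory `plugins` entry. */
struct plugins_entry {
	size_t n_plugins;
	const char **plugins;
};

/* Decide whether the plugin called `name` should be enabled on restore. */
bool plugin_enabled(const struct plugins_entry *entry, const char *name)
{
	if (entry == NULL)
		return true; /* field missing: older CRIU, enable everything */
	for (size_t i = 0; i < entry->n_plugins; i++)
		if (strcmp(entry->plugins[i], name) == 0)
			return true;
	return false; /* empty list, or plugin not listed: it was not used */
}
```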
David Francis	60ee5ebd9d	plugins/amdgpu: Zero ib_info on initialization This struct was being used uninitialized, meaning it was filled with random garbage. Mea culpa. Signed-off-by: David Francis <David.Francis@amd.com>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	21ea718f9f	plugins/amdgpu: fix printf format specifiers Errors on aarch64: In file included from amdgpu_plugin_drm.h:10, from amdgpu_plugin.c:33: amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file': amdgpu_plugin_util.h:24:20: error: format '%lld' expects argument of type 'long long int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 | #define LOG_PREFIX "amdgpu_plugin: " | ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) | ^~~~~~~~~~ amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info' 1236 | pr_info("devices:%d bos:%d objects:%d priv_data:%lld\n", args.num_devices, args.num_bos, args.num_objects, | ^~~~~~~ cc1: all warnings being treated as errors Errors on ppc64: In file included from amdgpu_plugin_drm.h:10, from amdgpu_plugin.c:33: amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file': amdgpu_plugin_util.h:24:20: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 | #define LOG_PREFIX "amdgpu_plugin: " | ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) | ^~~~~~~~~~ amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info' 1236 | pr_info("devices:%u bos:%u objects:%u priv_data:%llu\n", | ^~~~~~~ cc1: all warnings being treated as errors In file included from amdgpu_plugin_util.c:38: amdgpu_plugin_util.c: In function 'print_kfd_bo_stat': amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 | #define LOG_PREFIX "amdgpu_plugin: " | ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) | ^~~~~~~~~~ amdgpu_plugin_util.c:196:17: note: in expansion of macro 'pr_info' 196 | pr_info("%s(), %d. KFD BO Addr: %llx \n", __func__, idx, bo->addr); | ^~~~~~~ amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 | #define LOG_PREFIX "amdgpu_plugin: " | ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) | ^~~~~~~~~~ amdgpu_plugin_util.c:197:17: note: in expansion of macro 'pr_info' 197 | pr_info("%s(), %d. KFD BO Size: %llx \n", __func__, idx, bo->size); | ^~~~~~~ amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 | #define LOG_PREFIX "amdgpu_plugin: " | ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) | ^~~~~~~~~~ amdgpu_plugin_util.c:198:17: note: in expansion of macro 'pr_info' 198 | pr_info("%s(), %d. KFD BO Offset: %llx \n", __func__, idx, bo->offset); | ^~~~~~~ amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 | #define LOG_PREFIX "amdgpu_plugin: " | ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) | ^~~~~~~~~~ amdgpu_plugin_util.c:199:17: note: in expansion of macro 'pr_info' 199 | pr_info("%s(), %d. KFD BO Restored Offset: %llx \n", __func__, idx, bo->restored_offset); | ^~~~~~~ cc1: all warnings being treated as errors Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
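The log shows `__u64` being `long unsigned int` on aarch64 and ppc64, so `%lld`, `%llu` and `%llx` do not match it. Two common portable fixes are sketched below with `uint64_t` standing in for `__u64`: cast the argument to `unsigned long long` (which also works for `__u64` directly), or use the `PRIx64` macro from `<inttypes.h>` (which is only guaranteed to match `uint64_t`). The helpers `format_bo_addr` and `format_bo_size` are hypothetical, and which approach the commit itself used is not stated in this log.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: explicit cast so the argument matches %llx on every ABI. */
int format_bo_addr(char *buf, size_t len, uint64_t addr)
{
	return snprintf(buf, len, "KFD BO Addr: %llx", (unsigned long long)addr);
}

/* Option 2: PRIx64 expands to the right conversion for uint64_t. */
int format_bo_size(char *buf, size_t len, uint64_t size)
{
	return snprintf(buf, len, "KFD BO Size: %" PRIx64, size);
}
```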
Radostin Stoyanov	c42b58f4fb	plugin: enable multiple plugins for the same hook CRIU provides two plugins for checkpoint/restore of GPU applications: amdgpu and cuda. Both plugins use the `RESUME_DEVICES_LATE` hook to enable restore: CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__RESUME_DEVICES_LATE, amdgpu_plugin_resume_devices_late) CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__RESUME_DEVICES_LATE, cuda_plugin_resume_devices_late) However, CRIU currently does not support running more than one plugin for the same hook. As a result, when both plugins are installed, the resume function for CUDA applications is not executed. To fix this, we need to make sure that both `plugin_resume_devices_late()` functions return `-ENOTSUP` when restore is not supported. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
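The dispatch rule the fix relies on can be sketched as a loop that keeps trying plugins while they return `-ENOTSUP` and stops at the first one that handles the request (or fails). This is a simplified model, not CRIU's plugin code; `resume_hook_t`, `run_resume_hooks` and the two example callbacks are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*resume_hook_t)(int pid);

/* Run each plugin's hook in order until one does not decline. */
int run_resume_hooks(resume_hook_t *hooks, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++) {
		int ret = hooks[i](pid);
		if (ret != -ENOTSUP)
			return ret; /* handled (or failed) by this plugin */
	}
	return -ENOTSUP; /* no plugin claimed the process */
}

/* Example callbacks: one declines everything, one handles even pids. */
static int declines_all(int pid)
{
	(void)pid;
	return -ENOTSUP;
}

static int handles_even(int pid)
{
	return pid % 2 == 0 ? 0 : -ENOTSUP;
}
```

If the first plugin returned 0 instead of `-ENOTSUP` for processes it does not own, the loop would stop early and the second plugin's hook would never run, which is the failure the commit describes.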
Radostin Stoyanov	a808f09bea	amdgpu_plugin: fix lint errors $ make lint ... # Do not append \n to pr_perror, pr_pwarn or fail ! git --no-pager grep -E '^\s*\<(pr_perror|pr_pwarn|fail)\>.*\\n"' plugins/amdgpu/amdgpu_plugin.c: pr_perror("%s(), Can't handle VMAs of input device\n", __func__); ! git --no-pager grep -En '^\s*\<pr_(err|warn|msg|info|debug)\>.*);$' | grep -v '\\n' plugins/amdgpu/amdgpu_plugin_drm.c:45: pr_err("Error in getting stat for: %s", path); plugins/amdgpu/amdgpu_plugin_util.c:77: pr_err("Unable to read file (read:%ld buf_len:%ld)", len_read, buf_len); plugins/amdgpu/amdgpu_plugin_util.c:89: pr_err("Unable to write file (wrote:%ld buf_len:%ld)", len_write, buf_len); plugins/amdgpu/amdgpu_plugin_util.c:120: pr_err("%s: Failed to open for %s", path, write ? "write" : "read"); plugins/amdgpu/amdgpu_plugin_util.c:126: pr_err("%s: Failed get pointer for %s", path, write ? "write" : "read"); plugins/amdgpu/amdgpu_plugin_util.c:136: pr_err("%s:Failed to access file size", path); plugins/amdgpu/amdgpu_plugin_util.c:152: pr_err("Cannot fopen %s", file_path); make: *** [Makefile:470: lint] Error 1 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Ramesh Errabolu	0d5923c95e	amdgpu_plugin: Refactor code used to implement Checkpoint Refactor code used to checkpoint DRM devices. The code is moved into the amdgpu_plugin_drm.c file, which hosts various methods to checkpoint and restore a workload. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>	2024-09-11 16:02:11 -07:00
Ramesh Errabolu	733ef96315	amdgpu_plugin: Refactor code in preparation to support C&R for DRM devices Add a new compilation unit to host the symbols and methods that will be needed to C&R DRM devices. Refactor the code that indicates support for C&R and checkpoints KFD and DRM devices. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>	2024-09-11 16:02:11 -07:00
Pavel Tikhomirov	b689a6710c	plugin/amdgpu: Also don't print 'plugin failed' in criu We already don't treat it as an error in the plugin itself, but after the RESUME_DEVICES_LATE hook returns -1, CRIU prints a debug message about the failed plugin, so let's return 0 instead. While at it, let's rename ret to exit_code. Fixes: a9cbdad76 ("plugin/amdgpu: Don't print error for "No such process" during resume") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2024-09-11 16:02:11 -07:00
David Francis	59599dacdd	plugin/amdgpu: Don't print error for "No such process" during resume During the late stages of restore, each process being resumed gets an ioctl call to KFD_CRIU_OP_RESUME. If the process has no kfd process info, this call will fail with -ESRCH. This is normal behaviour, so we shouldn't print an error message for it. Signed-off-by: David Francis <David.Francis@amd.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	28e854d662	amdgpu: fix clang warnings amdgpu_plugin.c:930:6: error: variable 'buffer' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (ret) { ^~~ amdgpu_plugin.c:988:8: note: uninitialized use occurs here xfree(buffer); Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2023-10-22 13:29:25 -07:00
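The warning describes a common C cleanup pattern: an early `goto` reaches a free of a pointer that was never assigned. A minimal sketch of the usual remedy, initializing the pointer to `NULL` so the shared exit path is safe on every branch, follows; `copy_with_cleanup` is hypothetical and uses `free()` in place of CRIU's `xfree()`. The exact change made in the commit is not shown in this log.

```c
#include <stdlib.h>
#include <string.h>

int copy_with_cleanup(const char *src, size_t len, int fail_early, char *out)
{
	char *buffer = NULL; /* defined value on every path to `exit` */
	int ret = 0;

	if (fail_early) {
		ret = -1;
		goto exit; /* buffer is still NULL here */
	}

	buffer = malloc(len);
	if (!buffer) {
		ret = -1;
		goto exit;
	}
	memcpy(buffer, src, len);
	memcpy(out, buffer, len);
exit:
	free(buffer); /* free(NULL) is a no-op */
	return ret;
}
```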
Andrei Vagin	aa38a59899	amdgpu: print an error if the dup syscall fails Signed-off-by: Andrei Vagin <avagin@gmail.com>	2023-10-22 13:29:25 -07:00
Andrei Vagin	940a05c0ba	amdgpu: don't leak fd on an error path in open_img_file Signed-off-by: Andrei Vagin <avagin@gmail.com>	2023-10-22 13:29:25 -07:00
Andrei Vagin	a68975c06d	plugins: the UPDATE_VMA_MAP callback returns fd with the full control This means CRIU has to close it when it is no longer needed. This is more logically correct and matches the behaviour of the RESTORE_EXT_FILE callback. Signed-off-by: Andrei Vagin <avagin@gmail.com>	2023-10-22 13:29:25 -07:00
David Francis	d06c9b5cda	criu/plugin: Add environment variable to cap size of buffers. The amdgpu plugin would create a memory buffer at the size of the largest VRAM bo (buffer object). On some systems, VRAM size exceeds RAM size, so the largest bo might be larger than the available memory. Add an environment variable KFD_MAX_BUFFER_SIZE, which caps the size of this buffer. By default, it is set to 0, and has no effect. When active, any bo larger than its value will be saved to/restored from file in multiple passes. Signed-off-by: David Francis <David.Francis@amd.com>	2023-10-22 13:29:25 -07:00
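A sketch of the multi-pass copy, assuming `KFD_MAX_BUFFER_SIZE` holds a byte count that `strtoull` can parse (an unset, `0` or unparsable value means no cap). `staging_size` and `copy_in_passes` are hypothetical, and `memcpy` stands in for the VRAM and file I/O.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Staging-buffer size for a BO of `bo_size` bytes, given the variable's value. */
uint64_t staging_size(uint64_t bo_size, const char *cap_env)
{
	uint64_t cap = cap_env ? strtoull(cap_env, NULL, 0) : 0;

	return (cap == 0 || cap >= bo_size) ? bo_size : cap;
}

/* Copy through a bounded staging buffer; returns passes used, or -1. */
int copy_in_passes(const uint8_t *src, uint8_t *dst, uint64_t bo_size)
{
	uint64_t chunk = staging_size(bo_size, getenv("KFD_MAX_BUFFER_SIZE"));
	uint8_t *staging = malloc(chunk ? chunk : 1);
	int passes = 0;

	if (!staging)
		return -1;
	for (uint64_t off = 0; off < bo_size; off += chunk) {
		uint64_t n = bo_size - off < chunk ? bo_size - off : chunk;

		memcpy(staging, src + off, n); /* e.g. read a slice from VRAM */
		memcpy(dst + off, staging, n); /* e.g. write it to the image */
		passes++;
	}
	free(staging);
	return passes;
}
```

With a cap of 4 bytes, a 10-byte BO is moved in three passes of 4, 4 and 2 bytes, so peak memory stays at the cap rather than the BO size.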
Radostin Stoyanov	a4b49c46fe	amdgpu_plugin: remove duplicated log prefix The log prefix "amdgpu_plugin:" is defined with `LOG_PREFIX` in `amdgpu_plugin.c`. However, the prefix is also included in each log message. As a result it appears duplicated in the log messages: (00.044324) amdgpu_plugin: amdgpu_plugin: devices:1 bos:58 objects:148 priv_data:45696 (00.045376) amdgpu_plugin: amdgpu_plugin: Thread[0x5589] started (00.167172) amdgpu_plugin: amdgpu_plugin: img_path = amdgpu-kfd-62.img (00.083739) amdgpu_plugin: amdgpu_plugin : amdgpu_plugin_dump_file() called for fd = 235 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2023-10-22 13:29:25 -07:00
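A minimal sketch of why the prefix appeared twice: `pr_info` pastes `LOG_PREFIX` in front of the format string by adjacent string-literal concatenation, as in the `log.h` definition quoted in the format-specifier commit, so a message that also spells out the prefix prints it twice. Formatting into `log_buf` with `snprintf` and the `has_duplicated_prefix` checker are illustrative stand-ins for CRIU's `print_on_level`; `##__VA_ARGS__` is the GNU extension CRIU's macro also uses.

```c
#include <stdio.h>
#include <string.h>

#define LOG_PREFIX "amdgpu_plugin: "

static char log_buf[256];

/* The prefix becomes part of the format string at compile time. */
#define pr_info(fmt, ...) \
	snprintf(log_buf, sizeof(log_buf), LOG_PREFIX fmt, ##__VA_ARGS__)

/* Returns nonzero if a formatted line starts with the prefix twice. */
static int has_duplicated_prefix(const char *line)
{
	size_t n = strlen(LOG_PREFIX);

	return strncmp(line, LOG_PREFIX, n) == 0 &&
	       strncmp(line + n, LOG_PREFIX, n) == 0;
}
```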
Radostin Stoyanov	46ec6749fa	ci: Fix code indent This patch contains auto-generated changes from `make indent` Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2022-06-22 10:20:33 -07:00
David Yat Sin	87d3735145	criu/plugin: Add support for criu image streamer Modifications to support criu image streamer when using amdgpu_plugin. When running with criu image streamer, fseek/lseek are not available, so we store the file size in the first 8 bytes of the actual file. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
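A sketch of the size-prefixed layout, written with stdio for brevity: the first 8 bytes hold the payload size, so a reader knows how much to read without `fseek`/`lseek`. The helpers `write_sized` and `read_sized` and the use of host byte order are assumptions, not the plugin's code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write an 8-byte size header followed by the payload. */
int write_sized(FILE *f, const void *data, uint64_t size)
{
	if (fwrite(&size, sizeof(size), 1, f) != 1)
		return -1;
	if (size && fwrite(data, size, 1, f) != 1)
		return -1;
	return 0;
}

/* Read the header, then exactly that many bytes; caller frees *out. */
int read_sized(FILE *f, void **out, uint64_t *size)
{
	if (fread(size, sizeof(*size), 1, f) != 1)
		return -1;
	*out = malloc(*size ? *size : 1);
	if (!*out)
		return -1;
	if (*size && fread(*out, *size, 1, f) != 1) {
		free(*out);
		*out = NULL;
		return -1;
	}
	return 0;
}
```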
David Yat Sin	55370b720e	criu/plugin: Store BO contents directly to file Store BO contents directly to file (1 per GPU) instead of using protobuf. Bug Fix: Fixes an issue where we could not handle BOs bigger than 4GB because protobuf has an internal limit of 4GB for the Bytes structure. Performance Improvements: This significantly reduces CR duration on multi-GPU systems as it allows reading and writing to disk in parallel. During restore, instead of waiting for all the BO contents to be read from the one protobuf file, we can now start writing the BO contents as soon as the first BO is read from disk. During checkpoint, we can start writing BO contents to disk after the first BO is read from VRAM. This also reduces the peak amount of system memory used, as we only need to keep 1 BO content in memory per GPU at a time instead of all the BO contents. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
David Yat Sin	2095de9f03	criu/plugin: Fix for FDs not allowed to mmap On newer kernels (> 5.13), KFD & DRM drivers will only allow the /dev/renderD* file descriptors that were used during the CRIU_RESTORE ioctl when calling mmap for the VMAs. During restore, after opening /dev/renderD*, amdgpu_plugin keeps the FDs open and instead returns a copy of the FDs to CRIU. The same FDs are then returned during the UPDATE_VMA_MAP hooks so that they can be used by CRIU to call mmap. Duplicated FDs created using dup are references to the same struct file inside the kernel, so they are also allowed to mmap. To prevent the FDs opened inside amdgpu_plugin from conflicting with FDs used by the target restore application, we make sure that the lowest-numbered FD that amdgpu_plugin will use is greater than the highest-numbered FD that is used by the target application. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
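`fcntl(fd, F_DUPFD, min_fd)` returns the lowest-numbered free descriptor greater than or equal to `min_fd`, which is one way to keep the plugin's descriptors above those of the target application. The helper name `move_fd_above` and the policy of closing the original descriptor are assumptions of this sketch; the caller picks `min_fd`, for example one above the application's highest descriptor. Like `dup`, the new descriptor refers to the same open file (the kernel's `struct file`) as the original.

```c
#include <fcntl.h>
#include <unistd.h>

/* Move `fd` to the lowest free descriptor >= min_fd; returns it or -1. */
int move_fd_above(int fd, int min_fd)
{
	int new_fd = fcntl(fd, F_DUPFD, min_fd);

	if (new_fd < 0)
		return -1;
	close(fd); /* the duplicate keeps the underlying file open */
	return new_fd;
}
```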
Rajneesh Bhardwaj	bd83330095	criu/plugin: Implement sDMA based buffer access AMD Radeon GPUs have dedicated sDMA (system DMA) engine IP blocks that can be used to speed up reads from and writes to VRAM and GTT memory. Depends on: * The kernel mode driver (KFD) creating the dmabuf objects for the KFD BOs in both checkpoint and restore operations. * libdrm and libdrm_amdgpu libraries Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
David Yat Sin	6d79266229	criu/plugin: Restore libhsakmt shared memory files Libhsakmt (thunk) uses a shared memory file in /dev/shm/hsakmt_shared_mem and its semaphore in /dev/shm/hsakmt_shared_mem. Add a check during checkpoint to see whether these two files exist. If they exist, the plugin will try to restore them during restore. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
David Yat Sin	a218fe0baa	criu/plugin: Read and write BO contents in parallel Implement multi-threaded code to read and write the contents of each GPU's VRAM BOs in parallel, in order to speed up the dumping process when using multiple GPUs. Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>	2022-04-28 17:53:52 -07:00
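The one-thread-per-GPU pattern can be sketched with POSIX threads. `struct gpu_work`, `dump_gpu_bos`, `dump_all_gpus` and the `MAX_GPUS` limit are hypothetical, and the worker only counts bytes instead of touching real VRAM or image files.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GPUS 16 /* arbitrary limit for this sketch */

struct gpu_work {
	int gpu_index;
	size_t nr_bos;
	uint64_t bytes_done; /* filled in by the worker */
	int ret;
};

/* Placeholder for "read every BO of this GPU and write it to its file". */
static void *dump_gpu_bos(void *arg)
{
	struct gpu_work *w = arg;

	for (size_t i = 0; i < w->nr_bos; i++)
		w->bytes_done += 4096; /* pretend each BO is one page */
	w->ret = 0;
	return NULL;
}

/* Start one thread per GPU, then wait for all; 0 if every GPU succeeded. */
int dump_all_gpus(struct gpu_work *work, size_t nr_gpus)
{
	pthread_t threads[MAX_GPUS];
	size_t started = 0;
	int ret = 0;

	if (nr_gpus > MAX_GPUS)
		return -1;
	for (; started < nr_gpus; started++) {
		if (pthread_create(&threads[started], NULL, dump_gpu_bos, &work[started]) != 0) {
			ret = -1;
			break;
		}
	}
	for (size_t i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		if (work[i].ret)
			ret = -1;
	}
	return ret;
}
```

Each thread owns one `struct gpu_work`, so no locking is needed; results are read only after `pthread_join`.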
David Yat Sin	4856e0d4d0	criu/plugin: Add parameters to override mapping Add optional parameters to override default behavior during restore. These parameters are passed in as environment variables before executing CRIU. List of parameters: KFD_FW_VER_CHECK - disable firmware version check KFD_SDMA_FW_VER_CHECK - disable SDMA firmware version check KFD_CACHES_COUNT_CHECK - disable caches count check KFD_NUM_GWS_CHECK - disable num_gws check KFD_VRAM_SIZE_CHECK - disable VRAM size check KFD_NUMA_CHECK - preserve NUMA regions KFD_CAPABILITY_CHECK - disable capability check Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
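A sketch of parsing one of these overrides. The accepted values (`0`/`no` to turn a check off, `1`/`yes` to turn it on, case-insensitive, anything else ignored) are an assumption, as is the helper `check_enabled`; a caller would pass the result of `getenv()`, for example `check_enabled(getenv("KFD_FW_VER_CHECK"), true)`.

```c
#include <ctype.h>
#include <stdbool.h>

/* Case-insensitive string equality. */
static bool eq_nocase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	return *a == *b;
}

/* Returns whether a check should run, given the variable's value (or NULL). */
bool check_enabled(const char *value, bool default_on)
{
	if (!value)
		return default_on; /* variable not set */
	if (eq_nocase(value, "0") || eq_nocase(value, "no"))
		return false;
	if (eq_nocase(value, "1") || eq_nocase(value, "yes"))
		return true;
	return default_on; /* unrecognized value: keep the default */
}
```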
David Yat Sin	72905c9c9b	criu/plugin: Remap GPUs on checkpoint restore The device topology on the restore node can be different from the topology on the checkpointed node. The GPUs on the restore node may have different gpu_ids or minor numbers, or some GPUs may have different properties from those on the checkpointed node. During restore, the CRIU plugin determines the target GPUs to avoid restore failures caused by trying to restore a process on a GPU that is different. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
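One simple way to choose target GPUs is a greedy first-fit match on a few properties. This is a sketch, not the plugin's algorithm: the fields in `struct gpu_props`, the compatibility rule (same `gfx_version`, at least as much VRAM, in arbitrary units) and `remap_gpus` are all hypothetical. A greedy match can miss a valid assignment that a full bipartite matching would find.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GPUS 16 /* arbitrary limit for this sketch */

struct gpu_props {
	uint32_t gpu_id;
	uint32_t gfx_version;
	uint64_t vram_size;
};

/* map[i] receives the local gpu_id for checkpointed GPU i; -1 if none fits. */
int remap_gpus(const struct gpu_props *ckpt, size_t n_ckpt,
	       const struct gpu_props *local, size_t n_local, uint32_t *map)
{
	bool used[MAX_GPUS] = { false };

	if (n_local > MAX_GPUS)
		return -1;
	for (size_t i = 0; i < n_ckpt; i++) {
		size_t j;

		for (j = 0; j < n_local; j++) {
			if (!used[j] && local[j].gfx_version == ckpt[i].gfx_version &&
			    local[j].vram_size >= ckpt[i].vram_size)
				break;
		}
		if (j == n_local)
			return -1; /* no compatible, unused local GPU */
		used[j] = true;
		map[i] = local[j].gpu_id;
	}
	return 0;
}
```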
David Yat Sin	6e99fea2fa	criu/plugin: Implement system topology parsing Parse the local system topology in /sys/class/kfd/kfd/topology/nodes/ and store the properties of each GPU in the CRIU image files. The GPU properties can then be used later during restore to make sure the process is restored on GPUs with similar properties. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
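A sketch of reading one GPU property, assuming each node directory under /sys/class/kfd/kfd/topology/nodes/ exposes a `properties` file made of `name value` lines and that the caller has already read it into a string. `topo_get_u64` is a hypothetical helper, and the sample values in any input are made up.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find `key` in "name value" lines; store its value and return true if found. */
bool topo_get_u64(const char *text, const char *key, uint64_t *value)
{
	const char *line = text;

	while (line && *line) {
		char name[64];
		unsigned long long v;

		if (sscanf(line, "%63s %llu", name, &v) == 2 && strcmp(name, key) == 0) {
			*value = v;
			return true;
		}
		line = strchr(line, '\n');
		if (line)
			line++; /* move to the start of the next line */
	}
	return false;
}
```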
David Yat Sin	c4e3ac7fef	criu/plugin: Adding check for kernel IOCTL version Add a check for the minimum kernel IOCTL version before attempting to checkpoint. Signed-off-by: David Yat Sin <david.yatsin@amd.com>	2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj	55a5993bc7	criu/plugin: Support AMD ROCm Checkpoint Restore with KFD To support Checkpoint Restore with AMDGPUs for ROCm workloads, introduce a new plugin to assist CRIU with the help of the AMD KFD kernel driver. This initial commit just provides the basic framework to build up further capabilities. Like CRIU, the amdgpu plugin also uses protobuf to serialize and save the amdkfd data, which is mostly VRAM contents with some metadata. We generate a data file "amdgpu-kfd-<id>.img" during the dump stage. On restore, this file is read and extracted to re-create the various types of buffer objects that belonged to the previously checkpointed process. Upon restore, the mmap page offset within a device file might change, so we use the new hook to update and adjust the mmap offsets for the newly created target process. This is needed for the sys_mmap call in the pie restorer phase. Support for queues and events is added in future patches of this series. With the current implementation (amdgpu_plugin), we support: - Only compute (non-Gfx) workloads - GPU visible inside a container - AMD GPU Gfx 9 Family - PyTorch benchmarks such as BERT Base The amdgpu plugin depends on libdrm and libdrm_amdgpu, which are typically installed with the libdrm-dev package. We build amdgpu_plugin only when the dependencies are met on the target system and when the user intends to install the amdgpu plugin, not by default with the criu build. Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Co-authored-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>	2022-04-28 17:53:52 -07:00
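The offset adjustment performed through the new hook amounts to a lookup from the checkpointed mmap offset to the one assigned on restore. The table layout and `update_vma_offset` below are hypothetical; the real hook's signature is not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset a VMA had at checkpoint time and the one assigned on restore. */
struct vma_remap {
	uint64_t old_offset;
	uint64_t new_offset;
};

/* Translate a checkpointed offset for the restorer's sys_mmap call. */
int update_vma_offset(const struct vma_remap *table, size_t n,
		      uint64_t old_offset, uint64_t *new_offset)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].old_offset == old_offset) {
			*new_offset = table[i].new_offset;
			return 0;
		}
	}
	return -1; /* offset not produced by a restored BO */
}
```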

28 Commits