mirror of https://github.com/checkpoint-restore/criu, synced 2025-08-22 01:51:51 +00:00
plugins/amdgpu: Update README.md and criu-amdgpu-plugin.txt
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
parent bfb4a3d842
commit 7c4bcdb2d4
@@ -15,6 +15,7 @@ Checkpoint / Restore inside a docker container
Pytorch
Tensorflow
Using CRIU Image Streamer
Parallel Restore

DESCRIPTION
-----------
@@ -3,7 +3,8 @@ Supporting ROCm with CRIU

_Felix Kuehling <Felix.Kuehling@amd.com>_<br>
_Rajneesh Bardwaj <Rajneesh.Bhardwaj@amd.com>_<br>
_David Yat Sin <David.YatSin@amd.com>_
_David Yat Sin <David.YatSin@amd.com>_<br>
_Yanning Yang <yangyanning@sjtu.edu.cn>_

# Introduction

@@ -224,6 +225,26 @@ to resume execution on the GPUs.
*This new plugin is enabled by the new hook `__RESUME_DEVICES_LATE` in our RFC
patch series.*

## Restoring BO content in parallel

Restoring the BO content is an important part in the restore of GPU state and
usually takes a significant amount of time. A possible location for this
procedure is the `cr_plugin_restore_file` hook. However, restoring in this hook
blocks the target process from performing other restore operations, which
hinders further optimization of the restore process.

Therefore, a new plugin hook that runs in the master restore process is
introduced, and it interacts with the `cr_plugin_restore_file` hook to complete
the restore of BO content. Specifically, the target process only needs to send
the relevant BOs to the master restore process, while this new hook handles all
the restore of buffer objects. Through this method, during the restore of the BO
content, the target process can perform other restore operations, thus
accelerating the restore procedure. This is an implementation of the gCROP
method proposed in the ACM SoCC'24 paper: [On-demand and Parallel
Checkpoint/Restore for GPU Applications](https://dl.acm.org/doi/10.1145/3698038.3698510).

*This optimization technique is enabled by the `__POST_FORKING` hook.*
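
The hand-off described in this section can be illustrated with a small, self-contained sketch. It is not the plugin's code: the wire format (`BO_REQUEST`), the function names, and the use of a Unix socket pair are all assumptions made for illustration, and the actual BO copy is replaced by appending to a list. The child process stands in for the target process, which only sends its BO descriptors and is then free to do other restore work. The parent stands in for the master restore process, which does all of the BO restoring.

```python
import os
import socket
import struct

# Hypothetical wire format for one buffer-object (BO) restore request:
# a 32-bit handle followed by a 64-bit size. Not the plugin's real protocol.
BO_REQUEST = struct.Struct("<IQ")


def target_process(sock, bos):
    """Target side: hand every BO to the master, then carry on restoring."""
    for handle, size in bos:
        sock.sendall(BO_REQUEST.pack(handle, size))
    sock.shutdown(socket.SHUT_WR)  # tell the master no more BOs will follow
    # Other restore work would run here, overlapping with the master's work.


def master_restore(sock):
    """Master side: receive requests and 'restore' each BO (simulated)."""
    restored = []
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed its write side: all requests received
            break
        buf += chunk
        while len(buf) >= BO_REQUEST.size:
            handle, size = BO_REQUEST.unpack_from(buf)
            buf = buf[BO_REQUEST.size:]
            restored.append((handle, size))  # stand-in for copying BO content
    return restored


def run_demo(bos):
    """Fork a stand-in target process and restore its BOs in the parent."""
    master_end, target_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child plays the target process
        try:
            master_end.close()
            target_process(target_end, bos)
        finally:
            os._exit(0)  # never fall back into the caller's code
    target_end.close()
    restored = master_restore(master_end)
    master_end.close()
    os.waitpid(pid, 0)
    return restored
```

Calling `run_demo([(1, 4096), (2, 8192)])` returns the same pairs in the order they were sent, since the stream socket preserves ordering. The point of the design is that the sender returns as soon as its requests are queued, so the slow part of the restore no longer blocks it.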

## Other CRIU changes

In addition to the new plugins, we need to make some changes to CRIU itself to