2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 06:15:24 +00:00
Yanning Yang bfb4a3d842 plugins/amdgpu: Implement parallel restore
This patch implements the entire logic to enable the offloading of
buffer object content restoration.

The goal of this patch is to offload the buffer object content
restoration to the main CRIU process so that this restoration can occur
in parallel with other restoration logic (mainly the restoration of
memory state in the restore blob, which is time-consuming) to speed up
the restore phase. The restoration of buffer object content usually
takes a significant amount of time for GPU applications, so
parallelizing it with other operations can reduce the overall restore
time.

It has three parts: the first replaces the restoration of buffer objects
in the target process by sending a parallel restore command to the main
CRIU process; the second implements the POST_FORKING hook in the amdgpu
plugin to enable buffer object content restoration in the main CRIU
process; the third stops the parallel thread in the RESUME_DEVICES_LATE
hook.

This optimization only focuses on the single-process situation (common
case). In other scenarios, it will turn to the original method. This is
achieved with the new `parallel_disabled` flag.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-05-17 13:36:36 -07:00
2025-04-01 19:19:18 -07:00
2025-04-04 08:29:52 -07:00
2023-10-22 13:29:25 -07:00
2025-05-17 13:36:36 -07:00
2025-03-15 11:59:19 +00:00
2025-04-24 13:18:51 -07:00
2023-10-22 13:29:25 -07:00
2023-04-15 21:17:21 -07:00
2021-09-03 10:31:00 -07:00
2024-09-11 16:02:11 -07:00
2016-08-11 16:18:43 +03:00
2012-07-30 13:52:37 +04:00
2024-09-21 16:34:14 -07:00
2024-09-11 16:02:11 -07:00

X86_64 GCC Test Docker Test Podman Test CircleCI

CRIU -- A project to implement checkpoint/restore functionality for Linux

CRIU (stands for Checkpoint and Restore in Userspace) is a utility to checkpoint/restore Linux tasks.

Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point it was frozen at. The distinctive feature of the CRIU project is that it is mainly implemented in user space. There are some more projects doing C/R for Linux, and so far CRIU appears to be the most feature-rich and up-to-date with the kernel.

CRIU project is (almost) the never-ending story, because we have to always keep up with the Linux kernel supporting checkpoint and restore for all the features it provides. Thus we're looking for contributors of all kinds -- feedback, bug reports, testing, coding, writing, etc. Please refer to CONTRIBUTING.md if you would like to get involved.

The project started as the way to do live migration for OpenVZ Linux containers, but later grew to more sophisticated and flexible tool. It is currently used by (integrated into) OpenVZ, LXC/LXD, Docker, and other software, project gets tremendous help from the community, and its packages are included into many Linux distributions.

The project home is at http://criu.org. This wiki contains all the knowledge base for CRIU we have. Pages worth starting with are:

Checkpoint and restore of simple loop process

Advanced features

As main usage for CRIU is live migration, there's a library for it called P.Haul. Also the project exposes two cool core features as standalone libraries. These are libcompel for parasite code injection and libsoccr for TCP connections checkpoint-restore.

Live migration

True live migration using CRIU is possible, but doing all the steps by hands might be complicated. The phaul sub-project provides a Go library that encapsulates most of the complexity. This library and the Go bindings for CRIU are stored in the go-criu repository.

Parasite code injection

In order to get state of the running process CRIU needs to make this process execute some code, that would fetch the required information. To make this happen without killing the application itself, CRIU uses the parasite code injection technique, which is also available as a standalone library called libcompel.

TCP sockets checkpoint-restore

One of the CRIU features is the ability to save and restore state of a TCP socket without breaking the connection. This functionality is considered to be useful by itself, and we have it available as the libsoccr library.

Licence

The project is licensed under GPLv2 (though files sitting in the lib/ directory are LGPLv2.1).

All files in the images/ directory are licensed under the Expat license (so-called MIT). See the images/LICENSE file.

Description
No description provided
Readme 81 MiB
Languages
C 86%
Python 6.1%
Java 2.6%
Shell 2.6%
Makefile 2%
Other 0.7%