Right now, this test fails with this error:
Error (criu/files-reg.c:1031): Can't dump ghost file
/criu/test/javaTests/omrvmem_000000626_Mlm48x of 2097152 size,
increase limit
Signed-off-by: Andrei Vagin <avagin@google.com>
cuda-checkpoint returns a positive CUDA error code when it runs into an issue,
and passing that along as the return value would cause errors to be ignored.
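A minimal sketch of the idea, written in Python for brevity (the plugin itself is C). The helper name `to_plugin_status` is hypothetical, and the sketch assumes that callers treat only negative values as failures:

```python
def to_plugin_status(tool_status: int) -> int:
    """Map a tool's status to a negative-on-error convention.

    Hypothetical helper: assumes callers only check for `status < 0`.
    """
    # Any non-zero status, including a positive CUDA error code, is a failure.
    return -1 if tool_status != 0 else 0
```

With this mapping, a positive error code from the tool can no longer be mistaken for success by a caller that only tests for negative values.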
Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
The vvar_vclock was introduced by [1]. Basically, the old vvar vma has
been split into two parts. In terms of C/R, these two VMAs can still be
treated as one.
[1] e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
Signed-off-by: Andrei Vagin <avagin@google.com>
Fix for the following error when building CRIU on Rocky Linux 8:
criu/pidfd.c: In function ‘pidfd_open’:
criu/pidfd.c:119:17: error: ‘__NR_pidfd_open’ undeclared (first use in this function); did you mean ‘pidfd_open’?
return syscall(__NR_pidfd_open, pid, flags);
^~~~~~~~~~~~~~~
pidfd_open
criu/pidfd.c:119:17: note: each undeclared identifier is reported only once for each function it appears in
criu/pidfd.c:120:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
criu/pidfd.c: At top level:
cc1: error: unrecognized command line option ‘-Wno-unknown-warning-option’ [-Werror]
cc1: error: unrecognized command line option ‘-Wno-dangling-pointer’ [-Werror]
cc1: all warnings being treated as errors
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
We need to dynamically calculate TASK_SIZE depending
on the MMU on RISC-V systems. (We use an analogous
approach on aarch64/ppc64le.)
This change was tested on a physical machine:
StarFive VisionFive 2
isa : rv64imafdc_zicntr_zicsr_zifencei_zihpm_zca_zcd_zba_zbb
mmu : sv39
uarch : sifive,u74-mc
mvendorid : 0x489
marchid : 0x8000000000000007
mimpid : 0x4210427
hart isa : rv64imafdc_zicntr_zicsr_zifencei_zihpm_zca_zcd_zba_zbb
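To illustrate why the value depends on the MMU mode, here is a rough Python sketch. It is not how CRIU detects TASK_SIZE; it only assumes that user space occupies the lower half of the virtual address space and that the mode can be read from an `mmu` line like the one above. The function names are hypothetical, and the kernel's actual TASK_SIZE may differ from this upper bound.

```python
import re

# Virtual address width of each RISC-V MMU mode.
VA_BITS = {"sv39": 39, "sv48": 48, "sv57": 57}


def mmu_mode(cpuinfo_text: str) -> str:
    """Extract the MMU mode from /proc/cpuinfo-style text."""
    match = re.search(r"^mmu\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'mmu' line found")
    return match.group(1).lower()


def user_address_space_size(mode: str) -> int:
    """Upper bound on the user address space: the lower half of the VA space."""
    return 1 << (VA_BITS[mode] - 1)
```

For the machine listed above (`sv39`), this gives 2^38 bytes, whereas an `sv48` system would allow 2^47 bytes, so a single compile-time constant cannot cover both.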
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
We don't need to have compel/arch/riscv64/plugins/std/syscalls/syscalls.S
tracked in git. It is autogenerated. We also need to update our .gitignore
to ignore autogenerated files with syscall tables.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
If a CUDA process is already in a "locked" or "checkpointed" state
during criu dump, the CUDA plugin currently fails with an error because
it attempts an unnecessary "lock" action using the cuda-checkpoint tool.
This patch extends the CUDA plugin to handle such cases by first
verifying the initial state of the CUDA processes and skipping
unnecessary "lock" and "checkpoint" actions when a process has been
locked or checkpointed before CRIU is invoked.
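The decision logic can be sketched roughly as follows (Python for brevity; the plugin is written in C). The function name is hypothetical, and the state name "running" is an assumption; only "locked" and "checkpointed" are taken from the description above.

```python
from typing import List


def actions_before_dump(initial_state: str) -> List[str]:
    """Return the actions still needed for a task in the given state."""
    if initial_state == "running":  # assumed name for the normal state
        return ["lock", "checkpoint"]
    if initial_state == "locked":
        # Already locked before CRIU was invoked: skip the "lock" action.
        return ["checkpoint"]
    if initial_state == "checkpointed":
        # Nothing left to do for this task.
        return []
    raise ValueError("unexpected CUDA task state: %r" % initial_state)
```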
In particular, CUDA tasks may already be in a "locked" or "checkpointed"
state to ensure consistent checkpoint/restore for distributed workloads,
such as model training, where multiple containers run across different
cluster nodes.
Another use case for this functionality is optimizing resource
utilization, where low-priority CUDA tasks are preempted
immediately to release GPU resources needed by high-priority
tasks, and the paused workloads are later resumed or migrated
to another node.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
We have multiple processes open a pidfd to a common dead process.
After C/R, we check whether the inode numbers of these pidfds are
equal.
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Currently, the `waitpid()` call on the tmp process can be made by a
process which is not its parent. This causes restore to fail.
This patch instead selects one process to create the tmp process and
open all the fds that point to it. These fds are sent to the correct
process(es).
Fixes: #2496
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
The check for `/dev/nvidiactl` to determine if the CUDA plugin can be
used is unreliable because in some cases the default path for driver
installation is different [1]. This patch changes the logic to check
if a GPU device is available in `/proc/driver/nvidia/gpus/`. This
approach is similar to `torch.cuda.is_available()` and it is a more
accurate indicator.
The subsequent check for support of the `cuda-checkpoint --action`
option would confirm if the driver supports checkpoint/restore.
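A minimal sketch of such a check, in Python rather than the plugin's C. The function name is hypothetical, and treating any entry in the directory as a GPU is an assumption of this sketch:

```python
import os


def nvidia_gpu_present(gpus_dir: str = "/proc/driver/nvidia/gpus") -> bool:
    """Return True if the directory exists and lists at least one entry."""
    try:
        return len(os.listdir(gpus_dir)) > 0
    except OSError:
        # Missing or unreadable directory: treat as "no GPU available".
        return False
```

Taking the directory as a parameter keeps the check easy to exercise against a temporary directory on machines without a GPU.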
[1] https://github.com/NVIDIA/gpu-operator
Fixes: #2509
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Container runtimes like CRI-O and containerd utilize the freezer cgroup
to create a consistent snapshot of container root filesystem (rootfs)
changes. In this case, the container is frozen before invoking CRIU.
After CRIU successfully completes, a copy of the container rootfs diff
is saved, and the container is then unfrozen.
However, the `cuda-checkpoint` tool is not able to perform a 'lock'
action on frozen threads. To support GPU checkpointing with these
container runtimes, we need to unfreeze the cgroup and return it to its
original state once the checkpointing is complete.
To reflect this new behavior, the following changes are applied:
- `dont_use_freeze_cgroup(void)` -> `set_compel_interrupt_only_mode(void)`
- `bool freeze_cgroup_disabled` -> `bool compel_interrupt_only_mode`
- `check_freezer_cgroup(void)` -> `prepare_freezer_for_interrupt_only_mode(void)`
Note that when `compel_interrupt_only_mode` is set to `true`,
`compel_interrupt_task()` is used instead of `freeze_processes()`
to prevent tasks from running during `criu dump`.
Fixes: #2508
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
When `check_freezer_cgroup()` returns a non-zero value, `goto err` jumps to
`return ret`. However, the value of `ret` has been set to `0` in the lines
above, so CRIU does not handle the error properly.
This problem is related to https://github.com/checkpoint-restore/criu/issues/2508
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
When restoring dumps in new mount + pid namespaces where multiple dumps
share the same network namespace, CRIU may fail due to conflicting
unix socket names. This happens because the service worker creates
sockets using a pattern that includes criu_run_id, but util_init()
is called after cr_service_work() starts.
The socket naming pattern "crtools-fd-%d-%d" uses the restore PID
and criu_run_id, however criu_run_id is always 0 when not initialized,
leading to conflicts when multiple restores run simultaneously either
in the same CRIU process or because of multiple CRIU processes
doing the same operation in different PID namespaces.
Fix this by:
- Moving util_init() before cr_service_work() starts
- Adding a second util_init() call in the service worker fork
to ensure unique IDs across multiple worker runs
- Making sure that dump and restore operations have util_init() called
early to generate unique socket names
With this fix, socket names always include the namespace ID, preventing
conflicts when multiple processes with the same pid share a network
namespace.
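The collision can be illustrated with a short Python sketch. Only the "crtools-fd-%d-%d" pattern comes from the description above; the function name and the sample values are hypothetical.

```python
def socket_name(pid: int, run_id: int) -> str:
    """Format a socket name from the restore PID and the run ID."""
    return "crtools-fd-%d-%d" % (pid, run_id)


# Two restores in different PID namespaces can both see PID 1. With an
# uninitialized run ID (0), both end up with the same socket name.
colliding = {socket_name(1, 0), socket_name(1, 0)}

# With distinct run IDs (sample values), the names no longer clash.
distinct = {socket_name(1, 0x1A2B), socket_name(1, 0x3C4D)}
```

Here `colliding` contains a single name, while `distinct` contains two.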
Fixes: #2499
[ avagin: minor code changes ]
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
After a fork, both the child and parent processes may trigger a page fault (#PF)
at the same virtual address, referencing the same position in the page image.
If deduplication is enabled, the last process to trigger the page fault will fail.
Therefore, deduplication should be disabled after a fork to prevent this issue.
Signed-off-by: Liu Hua <weldonliu@tencent.com>
This patch blocks SIGCHLD during temporary process creation to prevent a
race condition between kill() and waitpid() where sigchld_handler()
causes `criu restore` to fail with an error.
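The blocking pattern can be sketched in Python (CRIU does this in C; the function name is hypothetical). SIGCHLD stays blocked from before the temporary process is created until after it has been reaped, so no SIGCHLD handler can run in between:

```python
import os
import signal


def spawn_and_reap_tmp_process() -> bool:
    """Create, kill, and reap a short-lived child with SIGCHLD blocked."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        pid = os.fork()
        if pid == 0:
            # Child: sleep until it is killed by the parent.
            try:
                while True:
                    signal.pause()
            finally:
                os._exit(1)
        os.kill(pid, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)
        return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
    finally:
        # Restore the previous signal mask once the child has been reaped.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```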
Fixes: #2490
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch adds two test plugins to verify that CRIU plugins listed
in the inventory image are enabled, while those that are not listed
can be disabled.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch extends the inventory image with a `plugins` field that
contains an array of plugins which were used during checkpoint,
for example, to save GPU state. In particular, the CUDA and AMDGPU
plugins are added to this field only when the checkpoint contains
GPU state. This allows CRIU to disable unnecessary plugins during restore,
show appropriate error messages if required CRIU plugins are missing,
and migrate a process that does not use a GPU from a GPU-enabled system
to a CPU-only environment.
We use the `optional plugins_entry` for backwards compatibility. This
entry allows us to distinguish between a *missing* and an *empty* field:
- When the field is missing, it indicates that the checkpoint was
created with a previous version of CRIU, and all plugins should be
*enabled* during restore.
- When the field is empty, it indicates that no plugins were used during
checkpointing. Thus, all plugins can be *disabled* during restore.
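The two cases can be sketched in Python, modelling a missing field as `None` and an empty field as an empty list. The function name and the plugin names used with it are hypothetical:

```python
from typing import List, Optional


def plugins_to_enable(inventory_plugins: Optional[List[str]],
                      available: List[str]) -> List[str]:
    """Choose which available plugins to enable during restore."""
    if inventory_plugins is None:
        # Field missing: the image comes from an older CRIU, enable everything.
        return list(available)
    # Field present (possibly empty): enable only the plugins that were used.
    return [name for name in available if name in inventory_plugins]
```

An empty list therefore disables every plugin, while `None` keeps the previous behaviour.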
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch fixes the following errors reported by ruff:
lib/pycriu/images/pb2dict.py:307:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
|
305 | elif field.type in _basic_cast:
306 | cast = _basic_cast[field.type]
307 | if pretty and (cast == int):
| ^^^^^^^^^^^ E721
308 | if is_hex:
309 | # Fields that have (criu).hex = true option set
|
lib/pycriu/images/pb2dict.py:379:13: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
|
377 | elif field.type in _basic_cast:
378 | cast = _basic_cast[field.type]
379 | if (cast == int) and is_string(value):
| ^^^^^^^^^^^ E721
380 | if _marked_as_dev(field):
381 | return encode_dev(field, value)
|
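The pattern behind the fix, as a simplified sketch. The `_basic_cast` table here is a stand-in with made-up keys, not the real one from pb2dict.py:

```python
# Simplified stand-in: maps a field type to the Python type used for casting.
_basic_cast = {"int32": int, "string": str}


def is_int_cast(field_type: str) -> bool:
    cast = _basic_cast[field_type]
    # Identity comparison, as E721 asks for, instead of `cast == int`.
    return cast is int
```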
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
We open a pidfd to a thread using the `PIDFD_THREAD` flag and after C/R
ensure that we can send signals using it with `PIDFD_SIGNAL_THREAD`.
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
After C/R of pidfds that point to dead processes, their inodes might
change. But if two pidfds point to the same dead process, they should
continue to do so after C/R.
This test ensures that this happens by calling `statx()` on pidfds after
C/R and then comparing their inode numbers.
Support for comparing pidfds by using `statx()` and inode numbers was
introduced alongside pidfs. So if `f_type` of pidfd is not equal to
`PID_FS_MAGIC` then we skip this test.
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Validate that pidfds can be used to send signals to different
processes after C/R using the `pidfd_send_signal()` syscall.
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Process file descriptors (pidfds) were introduced to provide a stable
handle on a process. They solve the problem of pid recycling.
For a detailed explanation, see https://lwn.net/Articles/801319/ and
http://www.corsix.org/content/what-is-a-pidfd
Before Linux 6.9, anonymous inodes were used for the implementation of
pidfds. So, we detect them in a fashion similar to other fd types that
use anonymous inodes by calling `readlink()`.
Starting with Linux 6.9, pidfs (a file system for pidfds) was introduced.
In 6.9, `S_ISREG()` returned true for pidfds, but this changed again in
6.10.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285)
After this change, pidfs inodes have no file type in st_mode in
userspace.
We use `PID_FS_MAGIC` to detect pidfds on kernels >= 6.9.
Hence, the check for pidfds occurs before the check for regular files.
For pidfds that refer to dead processes, we lose the pid of the process
as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1.
So, we create a temporary process for each unique inode and open pidfds
that refer to this process. After all pidfds have been opened we kill
this temporary process.
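The grouping step can be sketched in Python. The function name and the fd-to-inode mapping are hypothetical; the sketch only shows how one temporary process per unique inode would be selected:

```python
from collections import defaultdict
from typing import Dict, List


def group_dead_pidfds(pidfd_inodes: Dict[int, int]) -> Dict[int, List[int]]:
    """Group pidfds (fd -> inode number) by the inode they refer to."""
    groups = defaultdict(list)
    for fd, ino in pidfd_inodes.items():
        groups[ino].append(fd)
    # One temporary process would be created per key of this mapping.
    return dict(groups)
```

For example, `group_dead_pidfds({3: 100, 4: 100, 5: 200})` yields two groups, so two temporary processes would be needed for three pidfds.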
This commit does not include support for pidfds that point to a specific
thread, i.e., pidfds opened with the `PIDFD_THREAD` flag.
Fixes: #2258
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
We only use the last pid from the list in the NSpid entry (from
/proc/<pid>/fdinfo/<pidfd>) while restoring pidfds.
The last pid refers to the pid of the process in the most deeply nested
pid namespace. Since CRIU does not currently support nested pid
namespaces, this entry is the one we want.
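A minimal Python sketch of picking that last entry. The function name is hypothetical, and the input is assumed to be the text of an fdinfo file containing an `NSpid:` line with whitespace-separated pids:

```python
def last_nspid(fdinfo_text: str) -> int:
    """Return the last pid listed on the NSpid line of an fdinfo file."""
    for line in fdinfo_text.splitlines():
        if line.startswith("NSpid:"):
            # The last field is the pid in the most deeply nested namespace.
            return int(line.split()[-1])
    raise ValueError("no NSpid entry found")
```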
Since Linux 6.9, inode numbers can be used to compare pidfds. pidfds
referring to the same process will have the same inode numbers. We use
inode numbers to restore pidfds that point to dead processes.
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
By default, CRIU uses the path "/usr/lib/criu" to install and load
plugins at runtime. This path is defined by the `PLUGINDIR` variable
in Makefile.install and `CR_PLUGIN_DEFAULT` in `criu/include/plugin.h`.
However, some distribution packages might install the CRIU plugins at
"/usr/lib64/criu" instead. This patch updates the makefile to align
the path defined by `CR_PLUGIN_DEFAULT` with the value of `PLUGINDIR`.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch fixes the following warnings that appear
when building an RPM package:
+ /usr/lib/rpm/redhat/brp-mangle-shebangs
*** WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.c is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.h is executable but has no shebang, removing executable bit
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Major changes:
* CUDA plugin to support checkpointing and restoring NVIDIA CUDA applications.
* Shadow stack support
* Pagemap cache: Added support for PAGEMAP_SCAN ioctl
The full changelog can be found here: https://criu.org/Download/criu/4.0.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The topology parsing assumed that all parameter names were
30 characters or fewer, but
recommended_sdma_engine_id_mask
is 31 characters.
Make the maximum length a macro, and set it to 64.
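A quick check of the numbers involved, in Python. The constant and function names are hypothetical; only the parameter name and the limits 30 and 64 come from the description above:

```python
OLD_MAX_NAME_LEN = 30  # previously assumed limit
NEW_MAX_NAME_LEN = 64  # limit after this change

PARAM = "recommended_sdma_engine_id_mask"


def fits(name: str, max_len: int) -> bool:
    """Return True if the parameter name fits within max_len characters."""
    return len(name) <= max_len
```

`len(PARAM)` is 31, so the name does not fit the old limit but fits the new one with room to spare.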
Signed-off-by: David Francis <David.Francis@amd.com>
The presence of /dev/nvidiactl indicates that the system has a
compatible NVIDIA GPU driver installed and that the GPU is accessible to
the operating system.
Signed-off-by: Andrei Vagin <avagin@google.com>
Some plugins (e.g., CUDA) may not function correctly when processes are
frozen using cgroups. This change introduces a mechanism to disable the
use of freeze cgroups during process seizing, even if explicitly
requested via the --freeze-cgroup option.
The CUDA plugin is updated to utilize this new mechanism to ensure
compatibility.
Signed-off-by: Andrei Vagin <avagin@google.com>
This patch fixes the following typos reported by codespell:
./test/others/bers/bers.c:394: dependin ==> depending, depend in
./criu/kerndat.c:837: hitted ==> hit
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The `uninstall_module.py` script is a wrapper for the `pip uninstall`
command that enables support for specifying installation prefix
(i.e., `--prefix`). When this functionality is used, we intentionally
set `sys.path` to include only search paths for the specified prefix
to avoid unintentional uninstallation of packages in system paths.
Since `importlib_metadata` version 8.1.0, the `Distribution.from_name()`
method has been modified [1] to perform additional pre-processing of
Distribution objects [2] that requires loading distribution metadata
and results in the following error:
File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 422, in <lambda>
buckets = bucket(dists, lambda dist: bool(dist.metadata))
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 454, in metadata
from . import _adapters
File "/usr/local/lib/python3.12/site-packages/importlib_metadata/_adapters.py", line 3, in <module>
import email.message
File "/usr/lib64/python3.12/email/message.py", line 11, in <module>
import quopri
ModuleNotFoundError: No module named 'quopri'
This error occurs because we have excluded system paths from the list
of search paths (`sys.path`).
However, this pre-processing is not required for our use case, as we
only use the discovery mechanism of importlib_metadata to resolve the
metadata directory path of the module being uninstalled.
To fix this problem, this patch updates `uninstall_module` to avoid the
`from_name()` method and use `discover(name=package_name)` directly.
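A rough sketch of the discovery-only approach, using the standard library's `importlib.metadata`, which is assumed here to offer the same `discover()` interface as the `importlib_metadata` package. The function name and the explicit `path` argument are illustrative and not taken from the script:

```python
from importlib.metadata import Distribution


def find_distribution(package_name, search_paths):
    """Return the first matching distribution, or None if none is found."""
    # discover() only locates metadata directories; it does not load the
    # metadata itself, so no extra modules have to be imported here.
    matches = Distribution.discover(name=package_name, path=search_paths)
    return next(iter(matches), None)
```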
[1] a65c29adc0
[2] https://github.com/python/importlib_metadata/blob/a65c29ad/importlib_metadata/__init__.py#L391
Fixes: #2468
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
When attempting to checkpoint a container with CUDA processes,
CRIU could fail with the following error:
Error (criu/cr-dump.c:1791): Timeout reached. Try to interrupt: 1
Error (cuda_plugin.c:143): cuda_plugin: Unable to read output of cuda-checkpoint: Interrupted system call
Error (cuda_plugin.c:384): cuda_plugin: PAUSE_DEVICES failed with
In this situation, the target process is locked, but CRIU fails due to
a timeout and exits with an error. We need to make sure that the target
PID is unlocked in such a case.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>