2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 05:18:00 +00:00

11541 Commits

Author SHA1 Message Date
Haorong Lu
bb29067de9 zdtm: add riscv64 support
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
6d970ed047 criu: add riscv64 support to parasite and restorer
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
1d028ef44e images: add riscv64 core image
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
95359a62aa compel: add riscv64 support
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
---
- rebased
- added a membarrier() to syscall table (fix authored by Cryolitia PukNgae)
Signed-off-by: PukNgae Cryolitia <Cryolitia@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
d8f93e7bac include: add common header files for riscv64
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
---
- rebased
- imported a page_size() type fix (authored by Cryolitia PukNgae)
Signed-off-by: PukNgae Cryolitia <Cryolitia@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
c49eb18f9f pidfd: block SIGCHLD during tmp process creation
This patch blocks SIGCHLD during temporary process creation to prevent a
race condition between kill() and waitpid() where sigchld_handler()
causes `criu restore` to fail with an error.

Fixes: #2490

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
5ca4400699 zdtm: add inventory test plugins
This patch adds two test plugins to verify that CRIU plugins listed
in the inventory image are enabled, while those that are not listed
can be disabled.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
5335b35f72 images/inventory: add field for enabled plugins
This patch extends the inventory image with a `plugins` field that
contains an array of plugins which were used during checkpoint,
for example, to save GPU state. In particular, the CUDA and AMDGPU
plugins are added to this field only when the checkpoint contains
GPU state. This allows to disable unnecessary plugins during restore,
show appropriate error messages if required CRIU plugin are missing,
and migrate a process that does not use GPU from a GPU-enabled system
to CPU-only environment.

We use the `optional plugins_entry` for backwards compatibility. This
entry allows us to distinguish between *unset* and *missing* field:

- When the field is missing, it indicates that the checkpoint was
  created with a previous version of CRIU, and all plugins should be
  *enabled* during restore.

- When the field is empty, it indicates that no plugins were used during
  checkpointing. Thus, all plugins can be *disabled* during restore.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
b524dab32f pycriu: fix lint errors
This patch fixes the following errors reported by ruff:

lib/pycriu/images/pb2dict.py:307:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
305 |     elif field.type in _basic_cast:
306 |         cast = _basic_cast[field.type]
307 |         if pretty and (cast == int):
    |                        ^^^^^^^^^^^ E721
308 |             if is_hex:
309 |                 # Fields that have (criu).hex = true option set
    |

lib/pycriu/images/pb2dict.py:379:13: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
377 |     elif field.type in _basic_cast:
378 |         cast = _basic_cast[field.type]
379 |         if (cast == int) and is_string(value):
    |             ^^^^^^^^^^^ E721
380 |             if _marked_as_dev(field):
381 |                 return encode_dev(field, value)
    |

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
88aa7e2c10 make/lint: use 'ruff check <path>'
The command `ruff <path>` has been deprecated and removed:
https://astral.sh/blog/ruff-v0.5.0#removed-deprecated-features

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
f29e655df9 zdtm: Check pidfd for thread is valid after C/R
We open a pidfd to a thread using `PIDFD_THREAD` flag and after C/R
ensure that we can send signals using it with `PIDFD_SIGNAL_THREAD`.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
7a64004dc8 zdtm: Check fd from pidfd_getfd is C/Red correctly
We get the read end of a pipe using `pidfd_getfd` and check if we can
read from it after C/R.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
2e6f348458 zdtm: Check dead pidfd is restored correctly
After, C/R of pidfds that point to dead processes their inodes might
change. But if two pidfds point to same dead process they should
continue to do so after C/R.

This test ensures that this happens by calling `statx()` on pidfds after
C/R and then comparing their inode numbers.

Support for comparing pidfds by using `statx()` and inode numbers was
introduced alongside pidfs. So if `f_type` of pidfd is not equal to
`PID_FS_MAGIC` then we skip this test.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
3f30ec0eda zdtm: Check pidfd can kill descendant processes
Validate that pidfds can been used to send signals to different
processes after C/R using the `pidfd_send_signal()` syscall.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
2899d46000 zdtm: Check pidfd can send signal after C/R
Ensure `pidfd_send_signal()` syscall works as expected after C/R.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
3096df9ea3 zdtm: Check pidfd fdinfo entry is consistent
Ensures that entries in /proc/<pid>/fdinfo/<pidfd> are same.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
1ce408ffa4 criu: Support C/R of pidfds
Process file descriptors (pidfds) were introduced to provide a stable
handle on a process. They solve the problem of pid recycling.

For a detailed explanation, see https://lwn.net/Articles/801319/ and
http://www.corsix.org/content/what-is-a-pidfd

Before Linux 6.9, anonymous inodes were used for the implementation of
pidfds. So, we detect them in a fashion similiar to other fd types that
use anonymous inodes by calling `readlink()`.
After 6.9, pidfs (a file system for pidfds) was introduced.
In 6.9 `S_ISREG()` returned true for pidfds, but this again changed with
6.10.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285)
After this change, pidfs inodes have no file type in st_mode in
userspace.
We use `PID_FS_MAGIC` to detect pidfds for kernel >= 6.9
Hence, check for pidfds occurs before the check for regular files.

For pidfds that refer to dead processes, we lose the pid of the process
as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1.
So, we create a temporary process for each unique inode and open pidfds
that refer to this process. After all pidfds have been opened we kill
this temporary process.

This commit does not include support for pidfds that point to a specific
thread, i.e pidfds opened with `PIDFD_THREAD` flag.

Fixes: #2258

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
3322d1e94c images: Add protobuf definition for pidfd
We only use the last pid from the list in NSpid entry (from
/proc/<pid>/fdinfo/<pidfd>) while restoring pidfds.
The last pid refers to the pid of the process in the most deeply nested
pid namespace. Since CRIU does not currently support nested pid
namespaces, this entry is the one we want.

After Linux 6.9, inode numbers can be used to compare pidfds. pidfds
referring to the same process will have the same inode numbers. We use
inode numbers to restore pidfds that point to dead processes.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
4f8f6f2883 Makefile.config: set CR_PLUGIN_DEFAULT variable
By default, CRIU uses the path "/usr/lib/criu" to install and load
plugins at runtime. This path is defined by the `PLUGINDIR` variable
in Makefile.install and `CR_PLUGIN_DEFAULT` in `criu/include/plugin.h`.
However, some distribution packages might install the CRIU plugins at
"/usr/lib64/criu" instead. This patch updates the makefile to align
the path defined by `CR_PLUGIN_DEFAULT` with the value of `PLUGINDIR`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
f1d465448f amdgpu: remove exec permissions on source files
This patch fixes the following warnings that appear
when building an RPM package:

+ /usr/lib/rpm/redhat/brp-mangle-shebangs
*** WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.c is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.h is executable but has no shebang, removing executable bit

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Andrei Vagin
c2b48ff423 criu: Version 4.0 (CRIUDA)
Major changes:
* CUDA plugin to support checkpointing and restoring NVIDIA CUDA applications.
* Shadow stack support
* Pagemap cache: Added support for PAGEMAP_SCAN ioctl

The full changelog can be found here: https://criu.org/Download/criu/4.0.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
v4.0
2024-09-21 16:34:14 -07:00
Andrei Vagin
a8cbe76d4f util: dump fsfd log messages
It should help to investigate errors of fsconfig, fsmount and etc.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
David Francis
096c1f7a4d plugins/amdgpu - Increase maximum parameter length
The topology parsing assumed that all parameter names were
30 characters or fewer, but

recommended_sdma_engine_id_mask

is 31 characters.

Make the maximum length a macro, and set it to 64.

Signed-off-by: David Francis <David.Francis@amd.com>
2024-09-19 15:23:42 -07:00
David Francis
60ee5ebd9d plugins/amdgpu: Zero ib_info on initialization
This struct was being used un-initialized, meaning it
was filled with random garbage.

Mea culpa.

Signed-off-by: David Francis <David.Francis@amd.com>
2024-09-19 15:23:42 -07:00
Andrei Vagin
6918998897 plugin/cuda: disable CUDA plugin if /dev/nvidiactl isn't present
The presence of /dev/nvidiactl indicates that the system has a
compatible NVIDIA GPU driver installed and that the GPU is accessible to
the operating system.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
Andrei Vagin
e1331a4b60 fault: allow to check dont_use_freeze_cgroup
Adds a new "fault" to call dont_use_freeze_cgroup.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
Andrei Vagin
651df375bd criu: Allow disabling freeze cgroups
Some plugins (e.g., CUDA) may not function correctly when processes are
frozen using cgroups. This change introduces a mechanism to disable the
use of freeze cgroups during process seizing, even if explicitly
requested via the --freeze-cgroup option.

The CUDA plugin is updated to utilize this new mechanism to ensure
compatibility.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
59f49c6276 codespell: fix typos
This patch fixes the following typos reported by codespell:

./test/others/bers/bers.c:394: dependin ==> depending, depend in
./criu/kerndat.c:837: hitted ==> hit

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
edb6fbb820 scripts/uninstall_module: fix package discovery
The `uninstall_module.py` script is a wrapper for the `pip uninstall`
command that enables support for specifying installation prefix
(i.e., `--prefix`). When this functionality is used, we intentionally
set `sys.path` to include only search paths for the specified prefix
to avoid unintentional uninstallation of packages in system paths.

Since `importlib_metadata` version 8.1.0, the `Distribution.from_name()`
method has been modified [1] to perform additional pre-processing of
Distribution objects [2] that requires loading distribution metadata
and results in the following error:

  File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 422, in <lambda>
    buckets = bucket(dists, lambda dist: bool(dist.metadata))
                                              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 454, in metadata
    from . import _adapters
  File "/usr/local/lib/python3.12/site-packages/importlib_metadata/_adapters.py", line 3, in <module>
    import email.message
  File "/usr/lib64/python3.12/email/message.py", line 11, in <module>
    import quopri
  ModuleNotFoundError: No module named 'quopri'

This error occurs because we have excluded system paths from the list
of search paths (`sys.path`).

However, this pre-processing is not required for our use case, as we
only use the discovery mechanism of importlib_metadata to resolve the
metadata directory path of the module being uninstalled.

To fix this problem, this patch updates `uninstall_module` to avoid the
`from_name()` method and use `discover(name=package_name)` directly.

[1] a65c29adc0
[2] https://github.com/python/importlib_metadata/blob/a65c29ad/importlib_metadata/__init__.py#L391

Fixes: #2468

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
b1b3c14b17 cuda: unlock on timeout error
When attempting to checkpoint a container with CUDA processes,
CRIU could fail with the following error:

	Error (criu/cr-dump.c:1791): Timeout reached. Try to interrupt: 1
	Error (cuda_plugin.c:143): cuda_plugin: Unable to read output of cuda-checkpoint: Interrupted system call
	Error (cuda_plugin.c:384): cuda_plugin: PAUSE_DEVICES failed with

In this situation, the target process is locked, but CRIU fails due to
a timeout and exits with an error. We need to make sure that the target
PID is unlocked in such case.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Adrian Reber
dbfa450246 ci: run aarch64 tests native via actuated
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
8beac656fc coredump: fail on unsupported architectures early
Currently coredump only works on x86_64. Fail early on any other
architecture.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
d44fc0de5a test: only run macvlan tests if macvlan devices can be created
Some test environments (Actuated runners for example) do not support
maclvan devices. Skip tests depending on it automatically.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
01c65732b6 test: better test for SELinux tools
Previously the check was just if /sys/fs/selinux is mounted. This
extends the check to see if all necessary tools are installed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
615ccf98cf crit: do not crash on aarch64 doing 'crit x ./ rss'
Running 'crit x ./ rss' on aarch64 crashes with:

    File "/home/criu/crit/crit/__main__.py", line 331, in explore_rss
      while vmas[vmi]['start'] < pme:
            ~~~~^^^^^
  IndexError: list index out of range

This adds an additional check to the while loop to do access indexes out
of range.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
21ea718f9f plugins/amdgpu: fix printf format specifiers
Errors on aarch64:

	In file included from amdgpu_plugin_drm.h:10,
			 from amdgpu_plugin.c:33:
	amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file':
	amdgpu_plugin_util.h:24:20: error: format '%lld' expects argument of type 'long long int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info'
	 1236 |         pr_info("devices:%d bos:%d objects:%d priv_data:%lld\n", args.num_devices, args.num_bos, args.num_objects,
	      |         ^~~~~~~
	cc1: all warnings being treated as errors

Errors on ppc64:

	In file included from amdgpu_plugin_drm.h:10,
			 from amdgpu_plugin.c:33:
	amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file':
	amdgpu_plugin_util.h:24:20: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info'
	 1236 |         pr_info("devices:%u bos:%u objects:%u priv_data:%llu\n",
	      |         ^~~~~~~
	cc1: all warnings being treated as errors
	In file included from amdgpu_plugin_util.c:38:
	amdgpu_plugin_util.c: In function 'print_kfd_bo_stat':
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:196:17: note: in expansion of macro 'pr_info'
	  196 |                 pr_info("%s(), %d. KFD BO Addr: %llx \n", __func__, idx, bo->addr);
	      |                 ^~~~~~~
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:197:17: note: in expansion of macro 'pr_info'
	  197 |                 pr_info("%s(), %d. KFD BO Size: %llx \n", __func__, idx, bo->size);
	      |                 ^~~~~~~
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:198:17: note: in expansion of macro 'pr_info'
	  198 |                 pr_info("%s(), %d. KFD BO Offset: %llx \n", __func__, idx, bo->offset);
	      |                 ^~~~~~~
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:199:17: note: in expansion of macro 'pr_info'
	  199 |                 pr_info("%s(), %d. KFD BO Restored Offset: %llx \n", __func__, idx, bo->restored_offset);
	      |                 ^~~~~~~
	cc1: all warnings being treated as errors

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
3e2ed18790 plugins/amdgpu: use C99-standard types
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
d68205e919 ci: enable cross compile testing for amdgpu-plugin
Skip cross-compilation on armv7 because, among many other errors,
it fails with the following:

	In file included from ../../include/common/lock.h:9,
			 from ../../criu/include/files.h:9,
			 from amdgpu_plugin.c:30:
	../../include/common/asm/atomic.h:60:2: error: #error ARM architecture version (CONFIG_ARMV*) not set or unsupported.
	   60 | #error ARM architecture version (CONFIG_ARMV*) not set or unsupported.
	      |  ^~~~~
	../../include/common/asm/atomic.h: In function 'atomic_add_return':
	../../include/common/asm/atomic.h:81:9: error: implicit declaration of function 'smp_mb' [-Werror=implicit-function-declaration]
	   81 |         smp_mb();
	      |         ^~~~~~

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
2ee5844411 plugins/amdgpu: fix cross-compilation
To enable cross-compile we need to use the CC definition from
criu/scripts/nmk/scripts/tools.mk:

CC := $(CROSS_COMPILE)$(HOSTCC)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Andrei Vagin
9a19cf34de scripts/ci: run tests with the mocked cuda-checkpoint tool
Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
de31abb970 criu/plugin: don't call plugin device hooks for non-alive tasks
Dead tasks don't hold any resources.

Fixes: 2465
Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
dea6305914 test/zdtm: allow to run tests with the mocked cuda-checkpoint tool
Here is an example how to run one test:
$ python test/zdtm.py run -t zdtm/static/env00 --ignore-taint --mocked-cuda-checkpoint

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
haozi007
67fe44e981 support user set remote mmap vma address
1. os auto assignment vma addr maybe conflict with vma in gpu living migrate scene;
2. so, we should give choice to user;

Signed-off-by: haozi007 <liuhao27@huawei.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
551cd92447 timer: fix printf specifiers for __suseconds64_t
New internal glibc types __timeval64 [1] and __suseconds64_t [2] have
been introduced as a solution for the Y2038 problem [3]. These 64-bit
types are used across all architectures. However, this change causes
the following build errors when cross-compiling on ARMv7 (armhf):

criu/timer.c:49:17: error: format '%ld' expects argument of type 'long int', but argument 5 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=]
   49 |         pr_info("Restored %s timer to %" PRId64 ".%ld -> %" PRId64 ".%ld\n", n,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
   50 |                 (int64_t)val->it_value.tv_sec, val->it_value.tv_usec,
      |                                                ~~~~~~~~~~~~~~~~~~~~~
      |                                                             |
      |                                                             __suseconds64_t {aka long long int}

criu/timer.c:49:17: error: format '%ld' expects argument of type 'long int', but argument 7 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=]
   49 |         pr_info("Restored %s timer to %" PRId64 ".%ld -> %" PRId64 ".%ld\n", n,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
   50 |                 (int64_t)val->it_value.tv_sec, val->it_value.tv_usec,
   51 |                 (int64_t)val->it_interval.tv_sec, val->it_interval.tv_usec);
      |                                                   ~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                   |
      |                                                                   __suseconds64_t {aka long long int}

ns.c:234:48: error: format '%ld' expects argument of type 'long int', but argument 5 has type 'time_t' {aka 'long long int'} [-Werror=format=]
  234 |         len = snprintf(buf, sizeof(buf), "%d %ld 0", clk_id, offset);
      |                                              ~~^             ~~~~~~
      |                                                |             |
      |                                                long int      time_t {aka long long int}
      |                                              %lld

msg.c:58:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=]
   58 |         off += sprintf(buf + off, ".%.3ld: ", tv.tv_usec / 1000);
      |                                     ~~~~^     ~~~~~~~~~~~~~~~~~
      |                                         |                |
      |                                         long int         __suseconds64_t {aka long long int}
      |                                     %.3lld

../lib/zdtmtst.h:137:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  137 |                 test_msg("ERR: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \
      |                          ^~~~~~~~~~~~~~
pthread_timers_h.c:72:17: note: in expansion of macro 'pr_perror'
   72 |                 pr_perror("wrong interval: %ld:%ld", itimerspec.it_interval.tv_sec, itimerspec.it_interval.tv_nsec);
      |                 ^~~~~~~~~

vdso00.c:22:32: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
   22 |         test_msg("%d time: %10li\n", getpid(), tv.tv_sec);
      |                            ~~~~^               ~~~~~~~~~
      |                                |                 |
      |                                long int          __time64_t {aka long long int}
      |                            %10lli

vdso00.c:29:32: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
   29 |         test_msg("%d time: %10li\n", getpid(), tv.tv_sec);
      |                            ~~~~^               ~~~~~~~~~
      |                                |                 |
      |                                long int          __time64_t {aka long long int}
      |                            %10lli

vdso01.c:357:42: error: format '%li' expects argument of type 'long int', but argument 2 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  357 |         test_msg("gettimeofday: tv_sec %li vdso_gettimeofday: tv_sec %li\n", tv1.tv_sec, tv2.tv_sec);
      |                                        ~~^                                   ~~~~~~~~~~
      |                                          |                                      |
      |                                          long int                               __time64_t {aka long long int}
      |                                        %lli

vdso01.c:357:72: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  357 |         test_msg("gettimeofday: tv_sec %li vdso_gettimeofday: tv_sec %li\n", tv1.tv_sec, tv2.tv_sec);
      |                                                                      ~~^                 ~~~~~~~~~~
      |                                                                        |                    |
      |                                                                        long int             __time64_t {aka long long int}
      |

vdso01.c:328:43: error: format '%li' expects argument of type 'long int', but argument 2 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  328 |         test_msg("clock_gettime: tv_sec %li vdso_clock_gettime: tv_sec %li\n", ts1.tv_sec, ts2.tv_sec);
      |                                         ~~^                                    ~~~~~~~~~~
      |                                           |                                       |
      |                                           long int                                __time64_t {aka long long int}
      |                                         %lli

vdso01.c:328:74: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  328 |         test_msg("clock_gettime: tv_sec %li vdso_clock_gettime: tv_sec %li\n", ts1.tv_sec, ts2.tv_sec);
      |                                                                        ~~^                 ~~~~~~~~~~
      |                                                                          |                    |
      |                                                                          long int             __time64_t {aka long long int}
      |

../lib/zdtmtst.h:144:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'time_t' {aka 'long long int'} [-Werror=format=]
  144 |                 test_msg("FAIL: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \
      |                          ^~~~~~~~~~~~~~~
mtime_mmap.c:80:17: note: in expansion of macro 'fail'
   80 |                 fail("mtime %ld wasn't updated on mmapped %s file", mtime_new, filename);
      |                 ^~~~

../lib/zdtmtst.h:144:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  144 |                 test_msg("FAIL: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \
      |                          ^~~~~~~~~~~~~~~
mtime_mmap.c:101:17: note: in expansion of macro 'fail'
  101 |                 fail("After migration, mtime changed to %ld", fst.st_mtime);
      |                 ^~~~

[1] https://sourceware.org/git/?p=glibc.git;h=504c98717062cb9bcbd4b3e59e932d04331ddca5
[2] https://sourceware.org/git/?p=glibc.git;h=3fced064f23562ec24f8312ffbc14950993969e6
[3] https://en.wikipedia.org/wiki/Year_2038_problem

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
a045c874cb ci: run tests with amdgpu and cuda plugins
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
2453ed69a2 zdtm: add option to run tests with criu plugins
By default, if the "CRIU_LIBS_DIR" environment variable is not set,
CRIU will load all plugins installed in `/usr/lib/criu`. This may
result in running the ZDTM tests with plugins for a different version
of CRIU (e.g., installed from a package).

This patch updates ZDTM to always set the "CRIU_LIBS_DIR" environment
variable and use a local "plugins" directory. This directory contains
copies of the plugin files built from source. In addition, this patch
adds the `--criu-plugin` option to the `zdtm.py run` command, allowing
tests to be run with specified CRIU plugins.

Example:

  - Run test only with AMDGPU plugin
    ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin amdgpu

  - Run test only with CUDA plugin
    ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin cuda

  - Run test with both AMDGPU and CUDA plugins
    ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin amdgpu cuda

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
ad66c27a11 cuda: fix launch cuda-checkpoint
When the cuda-checkpoint tool is not installed, execvp() is expected to
fail and return -1. In this case, we need to call exit() to terminate
the child process that was created earlier with fork().

Since CRIU can be used with applications that do not use CUDA, even
when the CUDA plugin is installed, this patch also updates the log
messages to show debug and warning (instead of error) when the
cuda-checkpoint tool is not found in $PATH.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
fde0b7ac69 cuda: don't leak fds to cuda-checkpoint
Leaking open file descriptors to third-party tools can lead
to security risks.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
4dde52a308 ci/podman: show mounts
Show information about mounts available on the host filesystem.
This is useful for debugging.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
9a85fb6382 ci/podman: show criu logs in case of error
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00