mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-22 01:51:51 +00:00

Author	SHA1	Message	Date
Radostin Stoyanov	adf2c5be96	images/inventory: add field for enabled plugins This patch extends the inventory image with a `plugins` field that contains an array of plugins which were used during checkpoint, for example, to save GPU state. In particular, the CUDA and AMDGPU plugins are added to this field only when the checkpoint contains GPU state. This allows disabling unnecessary plugins during restore, showing appropriate error messages if required CRIU plugins are missing, and migrating a process that does not use GPU from a GPU-enabled system to a CPU-only environment. We use the `optional plugins_entry` for backwards compatibility. This entry allows us to distinguish between an unset and a missing field: - When the field is missing, it indicates that the checkpoint was created with a previous version of CRIU, and all plugins should be enabled during restore. - When the field is empty, it indicates that no plugins were used during checkpointing. Thus, all plugins can be disabled during restore. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-10-26 22:18:22 -07:00
Radostin Stoyanov	87b5ac9d9f	pycriu: fix lint errors This patch fixes the following errors reported by ruff: lib/pycriu/images/pb2dict.py:307:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks \| 305 \| elif field.type in _basic_cast: 306 \| cast = _basic_cast[field.type] 307 \| if pretty and (cast == int): \| ^^^^^^^^^^^ E721 308 \| if is_hex: 309 \| # Fields that have (criu).hex = true option set \| lib/pycriu/images/pb2dict.py:379:13: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks \| 377 \| elif field.type in _basic_cast: 378 \| cast = _basic_cast[field.type] 379 \| if (cast == int) and is_string(value): \| ^^^^^^^^^^^ E721 380 \| if _marked_as_dev(field): 381 \| return encode_dev(field, value) \| Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-10-26 22:18:22 -07:00
Radostin Stoyanov	0e780145a5	make/lint: use 'ruff check <path>' The command `ruff <path>` has been deprecated and removed: https://astral.sh/blog/ruff-v0.5.0#removed-deprecated-features Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	b88d40e334	zdtm: Check pidfd for thread is valid after C/R We open a pidfd to a thread using `PIDFD_THREAD` flag and after C/R ensure that we can send signals using it with `PIDFD_SIGNAL_THREAD`. signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	98c49d0c4f	zdtm: Check fd from pidfd_getfd is C/Red correctly We get the read end of a pipe using `pidfd_getfd` and check if we can read from it after C/R. signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	643e160210	zdtm: Check dead pidfd is restored correctly After C/R of pidfds that point to dead processes, their inodes might change. But if two pidfds point to the same dead process, they should continue to do so after C/R. This test ensures that this happens by calling `statx()` on pidfds after C/R and then comparing their inode numbers. Support for comparing pidfds by using `statx()` and inode numbers was introduced alongside pidfs. So if `f_type` of pidfd is not equal to `PID_FS_MAGIC` then we skip this test. signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	032a822a28	zdtm: Check pidfd can kill descendant processes Validate that pidfds can be used to send signals to different processes after C/R using the `pidfd_send_signal()` syscall. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	487853ff21	zdtm: Check pidfd can send signal after C/R Ensure `pidfd_send_signal()` syscall works as expected after C/R. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	f9fdfcacdc	zdtm: Check pidfd fdinfo entry is consistent Ensures that entries in /proc/<pid>/fdinfo/<pidfd> are the same. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	99ec62028b	criu: Support C/R of pidfds Process file descriptors (pidfds) were introduced to provide a stable handle on a process. They solve the problem of pid recycling. For a detailed explanation, see https://lwn.net/Articles/801319/ and http://www.corsix.org/content/what-is-a-pidfd Before Linux 6.9, anonymous inodes were used for the implementation of pidfds. So, we detect them in a fashion similar to other fd types that use anonymous inodes by calling `readlink()`. After 6.9, pidfs (a file system for pidfds) was introduced. In 6.9 `S_ISREG()` returned true for pidfds, but this again changed with 6.10. (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285) After this change, pidfs inodes have no file type in st_mode in userspace. We use `PID_FS_MAGIC` to detect pidfds for kernel >= 6.9. Hence, the check for pidfds occurs before the check for regular files. For pidfds that refer to dead processes, we lose the pid of the process as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1. So, we create a temporary process for each unique inode and open pidfds that refer to this process. After all pidfds have been opened we kill this temporary process. This commit does not include support for pidfds that point to a specific thread, i.e., pidfds opened with the `PIDFD_THREAD` flag. Fixes: #2258 Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Bhavik Sachdev	a1db7627b9	images: Add protobuf definition for pidfd We only use the last pid from the list in NSpid entry (from /proc/<pid>/fdinfo/<pidfd>) while restoring pidfds. The last pid refers to the pid of the process in the most deeply nested pid namespace. Since CRIU does not currently support nested pid namespaces, this entry is the one we want. After Linux 6.9, inode numbers can be used to compare pidfds. pidfds referring to the same process will have the same inode numbers. We use inode numbers to restore pidfds that point to dead processes. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-10-26 22:18:22 -07:00
Radostin Stoyanov	9463371787	Makefile.config: set CR_PLUGIN_DEFAULT variable By default, CRIU uses the path "/usr/lib/criu" to install and load plugins at runtime. This path is defined by the `PLUGINDIR` variable in Makefile.install and `CR_PLUGIN_DEFAULT` in `criu/include/plugin.h`. However, some distribution packages might install the CRIU plugins at "/usr/lib64/criu" instead. This patch updates the makefile to align the path defined by `CR_PLUGIN_DEFAULT` with the value of `PLUGINDIR`. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-10-26 22:18:22 -07:00
Radostin Stoyanov	e9cceed87b	amdgpu: remove exec permissions on source files This patch fixes the following warnings that appear when building an RPM package: + /usr/lib/rpm/redhat/brp-mangle-shebangs * WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.c is executable but has no shebang, removing executable bit * WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.h is executable but has no shebang, removing executable bit Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-10-26 22:18:22 -07:00
Pengda Yang	810f52e443	limit the field width of 'scanf' Fixes: #2121 Signed-off-by: Pengda Yang <daz-3ux@proton.me>	2024-10-26 22:17:07 -07:00
Andrei Vagin	c2b48ff423	criu: Version 4.0 (CRIUDA) Major changes: * CUDA plugin to support checkpointing and restoring NVIDIA CUDA applications. * Shadow stack support * Pagemap cache: Added support for PAGEMAP_SCAN ioctl The full changelog can be found here: https://criu.org/Download/criu/4.0. Signed-off-by: Andrei Vagin <avagin@gmail.com> v4.0	2024-09-21 16:34:14 -07:00
Andrei Vagin	a8cbe76d4f	util: dump fsfd log messages It should help to investigate errors of fsconfig, fsmount, etc. Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-19 15:23:42 -07:00
David Francis	096c1f7a4d	plugins/amdgpu - Increase maximum parameter length The topology parsing assumed that all parameter names were 30 characters or fewer, but recommended_sdma_engine_id_mask is 31 characters. Make the maximum length a macro, and set it to 64. Signed-off-by: David Francis <David.Francis@amd.com>	2024-09-19 15:23:42 -07:00
David Francis	60ee5ebd9d	plugins/amdgpu: Zero ib_info on initialization This struct was being used un-initialized, meaning it was filled with random garbage. Mea culpa. Signed-off-by: David Francis <David.Francis@amd.com>	2024-09-19 15:23:42 -07:00
Andrei Vagin	6918998897	plugin/cuda: disable CUDA plugin if /dev/nvidiactl isn't present The presence of /dev/nvidiactl indicates that the system has a compatible NVIDIA GPU driver installed and that the GPU is accessible to the operating system. Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-19 15:23:42 -07:00
Andrei Vagin	e1331a4b60	fault: allow to check dont_use_freeze_cgroup Adds a new "fault" to call dont_use_freeze_cgroup. Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-19 15:23:42 -07:00
Andrei Vagin	651df375bd	criu: Allow disabling freeze cgroups Some plugins (e.g., CUDA) may not function correctly when processes are frozen using cgroups. This change introduces a mechanism to disable the use of freeze cgroups during process seizing, even if explicitly requested via the --freeze-cgroup option. The CUDA plugin is updated to utilize this new mechanism to ensure compatibility. Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	59f49c6276	codespell: fix typos This patch fixes the following typos reported by codespell: ./test/others/bers/bers.c:394: dependin ==> depending, depend in ./criu/kerndat.c:837: hitted ==> hit Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	edb6fbb820	scripts/uninstall_module: fix package discovery The `uninstall_module.py` script is a wrapper for the `pip uninstall` command that enables support for specifying installation prefix (i.e., `--prefix`). When this functionality is used, we intentionally set `sys.path` to include only search paths for the specified prefix to avoid unintentional uninstallation of packages in system paths. Since `importlib_metadata` version 8.1.0, the `Distribution.from_name()` method has been modified [1] to perform additional pre-processing of Distribution objects [2] that requires loading distribution metadata and results in the following error: File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 422, in <lambda> buckets = bucket(dists, lambda dist: bool(dist.metadata)) ^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 454, in metadata from . import _adapters File "/usr/local/lib/python3.12/site-packages/importlib_metadata/_adapters.py", line 3, in <module> import email.message File "/usr/lib64/python3.12/email/message.py", line 11, in <module> import quopri ModuleNotFoundError: No module named 'quopri' This error occurs because we have excluded system paths from the list of search paths (`sys.path`). However, this pre-processing is not required for our use case, as we only use the discovery mechanism of importlib_metadata to resolve the metadata directory path of the module being uninstalled. To fix this problem, this patch updates `uninstall_module` to avoid the `from_name()` method and use `discover(name=package_name)` directly. [1] `a65c29adc0` [2] https://github.com/python/importlib_metadata/blob/a65c29ad/importlib_metadata/__init__.py#L391 Fixes: #2468 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	b1b3c14b17	cuda: unlock on timeout error When attempting to checkpoint a container with CUDA processes, CRIU could fail with the following error: Error (criu/cr-dump.c:1791): Timeout reached. Try to interrupt: 1 Error (cuda_plugin.c:143): cuda_plugin: Unable to read output of cuda-checkpoint: Interrupted system call Error (cuda_plugin.c:384): cuda_plugin: PAUSE_DEVICES failed with In this situation, the target process is locked, but CRIU fails due to a timeout and exits with an error. We need to make sure that the target PID is unlocked in such case. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Adrian Reber	dbfa450246	ci: run aarch64 tests native via actuated Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-19 15:23:42 -07:00
Adrian Reber	8beac656fc	coredump: fail on unsupported architectures early Currently coredump only works on x86_64. Fail early on any other architecture. Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-19 15:23:42 -07:00
Adrian Reber	d44fc0de5a	test: only run macvlan tests if macvlan devices can be created Some test environments (Actuated runners for example) do not support macvlan devices. Skip tests depending on it automatically. Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-19 15:23:42 -07:00
Adrian Reber	01c65732b6	test: better test for SELinux tools Previously the check was just if /sys/fs/selinux is mounted. This extends the check to see if all necessary tools are installed. Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-19 15:23:42 -07:00
Adrian Reber	615ccf98cf	crit: do not crash on aarch64 doing 'crit x ./ rss' Running 'crit x ./ rss' on aarch64 crashes with: File "/home/criu/crit/crit/__main__.py", line 331, in explore_rss while vmas[vmi]['start'] < pme: ~~~~^^^^^ IndexError: list index out of range This adds an additional check to the while loop to avoid accessing indexes out of range. Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	21ea718f9f	plugins/amdgpu: fix printf format specifiers Errors on aarch64: In file included from amdgpu_plugin_drm.h:10, from amdgpu_plugin.c:33: amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file': amdgpu_plugin_util.h:24:20: error: format '%lld' expects argument of type 'long long int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 \| #define LOG_PREFIX "amdgpu_plugin: " \| ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 \| #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) \| ^~~~~~~~~~ amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info' 1236 \| pr_info("devices:%d bos:%d objects:%d priv_data:%lld\n", args.num_devices, args.num_bos, args.num_objects, \| ^~~~~~~ cc1: all warnings being treated as errors Errors on ppc64: In file included from amdgpu_plugin_drm.h:10, from amdgpu_plugin.c:33: amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file': amdgpu_plugin_util.h:24:20: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 \| #define LOG_PREFIX "amdgpu_plugin: " \| ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 \| #define pr_info(fmt, ...) 
print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) \| ^~~~~~~~~~ amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info' 1236 \| pr_info("devices:%u bos:%u objects:%u priv_data:%llu\n", \| ^~~~~~~ cc1: all warnings being treated as errors In file included from amdgpu_plugin_util.c:38: amdgpu_plugin_util.c: In function 'print_kfd_bo_stat': amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 \| #define LOG_PREFIX "amdgpu_plugin: " \| ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 \| #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) \| ^~~~~~~~~~ amdgpu_plugin_util.c:196:17: note: in expansion of macro 'pr_info' 196 \| pr_info("%s(), %d. KFD BO Addr: %llx \n", __func__, idx, bo->addr); \| ^~~~~~~ amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 \| #define LOG_PREFIX "amdgpu_plugin: " \| ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 \| #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) \| ^~~~~~~~~~ amdgpu_plugin_util.c:197:17: note: in expansion of macro 'pr_info' 197 \| pr_info("%s(), %d. KFD BO Size: %llx \n", __func__, idx, bo->size); \| ^~~~~~~ amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 \| #define LOG_PREFIX "amdgpu_plugin: " \| ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 \| #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) \| ^~~~~~~~~~ amdgpu_plugin_util.c:198:17: note: in expansion of macro 'pr_info' 198 \| pr_info("%s(), %d. 
KFD BO Offset: %llx \n", __func__, idx, bo->offset); \| ^~~~~~~ amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=] 24 \| #define LOG_PREFIX "amdgpu_plugin: " \| ^~~~~~~~~~~~~~~~~ ../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX' 47 \| #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__) \| ^~~~~~~~~~ amdgpu_plugin_util.c:199:17: note: in expansion of macro 'pr_info' 199 \| pr_info("%s(), %d. KFD BO Restored Offset: %llx \n", __func__, idx, bo->restored_offset); \| ^~~~~~~ cc1: all warnings being treated as errors Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	3e2ed18790	plugins/amdgpu: use C99-standard types Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	d68205e919	ci: enable cross compile testing for amdgpu-plugin Skip cross-compilation on armv7 because, among many other errors, it fails with the following: In file included from ../../include/common/lock.h:9, from ../../criu/include/files.h:9, from amdgpu_plugin.c:30: ../../include/common/asm/atomic.h:60:2: error: #error ARM architecture version (CONFIG_ARMV) not set or unsupported. 60 \| #error ARM architecture version (CONFIG_ARMV) not set or unsupported. \| ^~~~~ ../../include/common/asm/atomic.h: In function 'atomic_add_return': ../../include/common/asm/atomic.h:81:9: error: implicit declaration of function 'smp_mb' [-Werror=implicit-function-declaration] 81 \| smp_mb(); \| ^~~~~~ Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Radostin Stoyanov	2ee5844411	plugins/amdgpu: fix cross-compilation To enable cross-compile we need to use the CC definition from criu/scripts/nmk/scripts/tools.mk: CC := $(CROSS_COMPILE)$(HOSTCC) Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-19 15:23:42 -07:00
Andrei Vagin	9a19cf34de	scripts/ci: run tests with the mocked cuda-checkpoint tool Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-11 16:02:11 -07:00
Andrei Vagin	de31abb970	criu/plugin: don't call plugin device hooks for non-alive tasks Dead tasks don't hold any resources. Fixes: 2465 Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-11 16:02:11 -07:00
Andrei Vagin	dea6305914	test/zdtm: allow to run tests with the mocked cuda-checkpoint tool Here is an example of how to run one test: $ python test/zdtm.py run -t zdtm/static/env00 --ignore-taint --mocked-cuda-checkpoint Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-11 16:02:11 -07:00
haozi007	67fe44e981	support user-set remote mmap VMA address 1. The VMA address assigned automatically by the OS may conflict with an existing VMA in GPU live-migration scenarios; 2. so the user should be able to choose the address. Signed-off-by: haozi007 <liuhao27@huawei.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	551cd92447	timer: fix printf specifiers for __suseconds64_t New internal glibc types __timeval64 [1] and __suseconds64_t [2] have been introduced as a solution for the Y2038 problem [3]. These 64-bit types are used across all architectures. However, this change causes the following build errors when cross-compiling on ARMv7 (armhf): criu/timer.c:49:17: error: format '%ld' expects argument of type 'long int', but argument 5 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=] 49 \| pr_info("Restored %s timer to %" PRId64 ".%ld -> %" PRId64 ".%ld\n", n, \| ^~~~~~~~~~~~~~~~~~~~~~~~ 50 \| (int64_t)val->it_value.tv_sec, val->it_value.tv_usec, \| ~~~~~~~~~~~~~~~~~~~~~ \| \| \| __suseconds64_t {aka long long int} criu/timer.c:49:17: error: format '%ld' expects argument of type 'long int', but argument 7 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=] 49 \| pr_info("Restored %s timer to %" PRId64 ".%ld -> %" PRId64 ".%ld\n", n, \| ^~~~~~~~~~~~~~~~~~~~~~~~ 50 \| (int64_t)val->it_value.tv_sec, val->it_value.tv_usec, 51 \| (int64_t)val->it_interval.tv_sec, val->it_interval.tv_usec); \| ~~~~~~~~~~~~~~~~~~~~~~~~ \| \| \| __suseconds64_t {aka long long int} ns.c:234:48: error: format '%ld' expects argument of type 'long int', but argument 5 has type 'time_t' {aka 'long long int'} [-Werror=format=] 234 \| len = snprintf(buf, sizeof(buf), "%d %ld 0", clk_id, offset); \| ~~^ ~~~~~~ \| \| \| \| long int time_t {aka long long int} \| %lld msg.c:58:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=] 58 \| off += sprintf(buf + off, ".%.3ld: ", tv.tv_usec / 1000); \| ~~~~^ ~~~~~~~~~~~~~~~~~ \| \| \| \| long int __suseconds64_t {aka long long int} \| %.3lld ../lib/zdtmtst.h:137:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 137 \| test_msg("ERR: %s:%d: " 
format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \ \| ^~~~~~~~~~~~~~ pthread_timers_h.c:72:17: note: in expansion of macro 'pr_perror' 72 \| pr_perror("wrong interval: %ld:%ld", itimerspec.it_interval.tv_sec, itimerspec.it_interval.tv_nsec); \| ^~~~~~~~~ vdso00.c:22:32: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 22 \| test_msg("%d time: %10li\n", getpid(), tv.tv_sec); \| ~~~~^ ~~~~~~~~~ \| \| \| \| long int __time64_t {aka long long int} \| %10lli vdso00.c:29:32: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 29 \| test_msg("%d time: %10li\n", getpid(), tv.tv_sec); \| ~~~~^ ~~~~~~~~~ \| \| \| \| long int __time64_t {aka long long int} \| %10lli vdso01.c:357:42: error: format '%li' expects argument of type 'long int', but argument 2 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 357 \| test_msg("gettimeofday: tv_sec %li vdso_gettimeofday: tv_sec %li\n", tv1.tv_sec, tv2.tv_sec); \| ~~^ ~~~~~~~~~~ \| \| \| \| long int __time64_t {aka long long int} \| %lli vdso01.c:357:72: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 357 \| test_msg("gettimeofday: tv_sec %li vdso_gettimeofday: tv_sec %li\n", tv1.tv_sec, tv2.tv_sec); \| ~~^ ~~~~~~~~~~ \| \| \| \| long int __time64_t {aka long long int} \| vdso01.c:328:43: error: format '%li' expects argument of type 'long int', but argument 2 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 328 \| test_msg("clock_gettime: tv_sec %li vdso_clock_gettime: tv_sec %li\n", ts1.tv_sec, ts2.tv_sec); \| ~~^ ~~~~~~~~~~ \| \| \| \| long int __time64_t {aka long long int} \| %lli vdso01.c:328:74: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 328 \| 
test_msg("clock_gettime: tv_sec %li vdso_clock_gettime: tv_sec %li\n", ts1.tv_sec, ts2.tv_sec); \| ~~^ ~~~~~~~~~~ \| \| \| \| long int __time64_t {aka long long int} \| ../lib/zdtmtst.h:144:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'time_t' {aka 'long long int'} [-Werror=format=] 144 \| test_msg("FAIL: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \ \| ^~~~~~~~~~~~~~~ mtime_mmap.c:80:17: note: in expansion of macro 'fail' 80 \| fail("mtime %ld wasn't updated on mmapped %s file", mtime_new, filename); \| ^~~~ ../lib/zdtmtst.h:144:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type '__time64_t' {aka 'long long int'} [-Werror=format=] 144 \| test_msg("FAIL: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \ \| ^~~~~~~~~~~~~~~ mtime_mmap.c:101:17: note: in expansion of macro 'fail' 101 \| fail("After migration, mtime changed to %ld", fst.st_mtime); \| ^~~~ [1] https://sourceware.org/git/?p=glibc.git;h=504c98717062cb9bcbd4b3e59e932d04331ddca5 [2] https://sourceware.org/git/?p=glibc.git;h=3fced064f23562ec24f8312ffbc14950993969e6 [3] https://en.wikipedia.org/wiki/Year_2038_problem Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	a045c874cb	ci: run tests with amdgpu and cuda plugins Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	2453ed69a2	zdtm: add option to run tests with criu plugins By default, if the "CRIU_LIBS_DIR" environment variable is not set, CRIU will load all plugins installed in `/usr/lib/criu`. This may result in running the ZDTM tests with plugins for a different version of CRIU (e.g., installed from a package). This patch updates ZDTM to always set the "CRIU_LIBS_DIR" environment variable and use a local "plugins" directory. This directory contains copies of the plugin files built from source. In addition, this patch adds the `--criu-plugin` option to the `zdtm.py run` command, allowing tests to be run with specified CRIU plugins. Example: - Run test only with AMDGPU plugin ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin amdgpu - Run test only with CUDA plugin ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin cuda - Run test with both AMDGPU and CUDA plugins ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin amdgpu cuda Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	ad66c27a11	cuda: fix launch cuda-checkpoint When the cuda-checkpoint tool is not installed, execvp() is expected to fail and return -1. In this case, we need to call exit() to terminate the child process that was created earlier with fork(). Since CRIU can be used with applications that do not use CUDA, even when the CUDA plugin is installed, this patch also updates the log messages to show debug and warning (instead of error) when the cuda-checkpoint tool is not found in $PATH. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org> Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	fde0b7ac69	cuda: don't leak fds to cuda-checkpoint Leaking open file descriptors to third-party tools can lead to security risks. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	4dde52a308	ci/podman: show mounts Show information about mounts available on the host filesystem. This is useful for debugging. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	9a85fb6382	ci/podman: show criu logs in case of error Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
liuchao173	8437663cc6	delete redundant include header files restorer.h has been included in line 43. Fixes: 22963d282729 ("Hide asm/restorer.h from sources") Signed-off-by: liuchao173 <liuchao173@huawei.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	c42b58f4fb	plugin: enable multiple plugins for the same hook CRIU provides two plugins for checkpoint/restore of GPU applications: amdgpu and cuda. Both plugins use the `RESUME_DEVICES_LATE` hook to enable restore: CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__RESUME_DEVICES_LATE, amdgpu_plugin_resume_devices_late) CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__RESUME_DEVICES_LATE, cuda_plugin_resume_devices_late) However, CRIU currently does not support running more than one plugin for the same hook. As a result, when both plugins are installed, the resume function for CUDA applications is not executed. To fix this, we need to make sure that both `plugin_resume_devices_late()` functions return `-ENOTSUP` when restore is not supported. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	85050be66b	seize: fix pause-devices plugin hook The plugin hook "PAUSE_DEVICES" was recently introduced in the following commit. This hook was intended to execute the cuda-checkpoint tool before the process tree is frozen. However, the run_plugins() call has been placed immediately after freeze_processes(). This causes the cuda-checkpoint tool to hang indefinitely during the checkpointing of CUDA applications running in containers, eventually leading to its termination by the timeout alarm. a85f488595e0a3a6e6cc6ca7c94d4a00b1341aaf criu/plugin: Introduce new plugin hooks PAUSE_DEVICES and CHECKPOINT_DEVICES to be used during pstree collection This problem can be reproduced with the following example: sudo podman run -d --rm \ --device nvidia.com/gpu=all --security-opt=label=disable \ quay.io/radostin/cuda-counter sudo podman container checkpoint -l -e /tmp/checkpoint.tar Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Andrei Vagin	21108b40de	test/zdtm: mount a new tmpfs to the zdtm root /dev The current file system can be mounted with nodev. Fixes #2441 Signed-off-by: Andrei Vagin <avagin@google.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	fcbadfbdbf	plugins: set executable bit on .so files For historical reasons, some tools like rpm [1] or ldd [2,3] may expect the executable bit to be present for the correct identification of shared libraries. The executable bit on .so files is set by default by compilers (e.g., GCC). It is not strictly necessary but primarily a convention. [1] https://docs.fedoraproject.org/en-US/package-maintainers/CommonRpmlintIssues/#unstripped_binary_or_object [2] https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/ldd.bash.in;h=d6b640df;hb=HEAD#l154 [3] $ sudo ldd /usr/lib/criu/*.so /usr/lib/criu/amdgpu_plugin.so: ldd: warning: you do not have execution permission for `/usr/lib/criu/amdgpu_plugin.so' linux-vdso.so.1 (0x00007fd0a2a3e000) libdrm.so.2 => /lib64/libdrm.so.2 (0x00007fd0a29eb000) libdrm_amdgpu.so.1 => /lib64/libdrm_amdgpu.so.1 (0x00007fd0a29de000) libc.so.6 => /lib64/libc.so.6 (0x00007fd0a27fc000) /lib64/ld-linux-x86-64.so.2 (0x00007fd0a2a40000) /usr/lib/criu/cuda_plugin.so: ldd: warning: you do not have execution permission for `/usr/lib/criu/cuda_plugin.so' linux-vdso.so.1 (0x00007f1806e13000) libc.so.6 => /lib64/libc.so.6 (0x00007f1806c08000) /lib64/ld-linux-x86-64.so.2 (0x00007f1806e15000) Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	5783706d57	docs: update amdgpu-plugin man page This patch updates the dependencies section of the AMDGPU plugin man page to reflect that the plugin has been merged upstream and to fix a formatting issue. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
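
The `plugins` field semantics described in commit adf2c5be96 (a missing field means an older checkpoint, an empty field means no plugins were used) could be sketched roughly as below. This is an illustrative Python sketch, not CRIU's implementation: the function name `plugins_to_enable`, the dict-based view of the inventory image, and the plugin names used as set members are all assumptions.

```python
from typing import Optional, Set


def plugins_to_enable(inventory: dict, installed: Set[str]) -> Set[str]:
    """Decide which installed plugins to load at restore time.

    `inventory` is a hypothetical dict view of the inventory image; the
    key "plugins" stands in for the optional `plugins_entry`.
    """
    recorded: Optional[list] = inventory.get("plugins")
    if recorded is None:
        # Field missing: the checkpoint predates the field, enable everything.
        return set(installed)
    required = set(recorded)
    missing = required - installed
    if missing:
        # A plugin used at checkpoint time is not available for restore.
        raise RuntimeError(
            "missing required CRIU plugins: " + ", ".join(sorted(missing))
        )
    # Field present (possibly empty): enable only what the checkpoint used.
    return required
```

With an empty list the function returns an empty set, which corresponds to the case where a process without GPU state is restored on a CPU-only system.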
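
The E721 fix in commit 87b5ac9d9f replaces `cast == int` with an identity check. A minimal sketch of that pattern follows; the table contents and the helper `is_int_cast` are hypothetical stand-ins, and only the name `_basic_cast` and the `cast == int` comparison come from the commit message.

```python
# Hypothetical stand-in for a table mapping field types to cast functions.
_basic_cast = {"int32": int, "string": str, "bool": bool}


def is_int_cast(field_type: str) -> bool:
    """Return True when the cast registered for `field_type` is exactly `int`."""
    cast = _basic_cast.get(field_type)
    # `is` compares object identity, which is what ruff's E721 asks for
    # when a value is compared against a type object.
    return cast is int
```

Note that `bool` is a subclass of `int`, yet `bool is int` is false, so the identity check only matches the `int` type itself.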
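
Commits a1db7627b9 and 99ec62028b describe reading the `Pid` and `NSpid` fields from /proc/<pid>/fdinfo/<pidfd>, using the last `NSpid` entry, and treating -1 as a dead process. The following Python sketch parses such text under those assumptions; the function name `parse_pidfd_fdinfo` and the returned dict layout are hypothetical, and CRIU itself does this in C.

```python
def parse_pidfd_fdinfo(text: str) -> dict:
    """Extract `Pid` and the last `NSpid` entry from pidfd fdinfo text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Values are whitespace-separated; NSpid may hold several pids.
            fields[key.strip()] = value.split()
    pid = int(fields["Pid"][0])
    # Per the commit message, the last NSpid entry is the pid in the most
    # deeply nested pid namespace.
    nspid = int(fields["NSpid"][-1])
    # Pid and NSpid become -1 once the target process is dead.
    return {"pid": pid, "nspid": nspid, "dead": pid == -1}
```

For example, text containing `Pid:	1234` and `NSpid:	1234	7` yields `pid` 1234 and `nspid` 7.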
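
The crash fixed in commit 615ccf98cf comes from indexing past the end of the VMA list inside a `while` loop. A simplified sketch of the guarded loop is shown below; it is not the actual `explore_rss` code, and the helper name `vmas_overlapping` and its arguments are assumptions made for illustration.

```python
def vmas_overlapping(vmas: list, start: int, end: int) -> list:
    """Return the VMAs from a sorted list that overlap [start, end)."""
    vmi = 0
    # Skip VMAs that end before the range starts.
    while vmi < len(vmas) and vmas[vmi]["end"] <= start:
        vmi += 1
    hits = []
    # The `vmi < len(vmas)` guard stops the loop at the end of the list
    # instead of raising IndexError.
    while vmi < len(vmas) and vmas[vmi]["start"] < end:
        hits.append(vmas[vmi])
        vmi += 1
    return hits
```

Without the length guard, a range that extends beyond the last VMA would keep incrementing the index until the lookup fails.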
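
Commit c42b58f4fb relies on hooks returning `-ENOTSUP` so that another plugin registered for the same hook gets a chance to run. The sketch below models that idea in Python, assuming a dispatcher that moves on to the next plugin only when a hook returns `-ENOTSUP`; the function `run_hook_chain` is hypothetical, and CRIU's real dispatcher is written in C.

```python
import errno
from typing import Callable, Iterable


def run_hook_chain(hooks: Iterable[Callable[[], int]]) -> int:
    """Call hooks in order until one handles the request.

    A hook returns -ENOTSUP to say "not mine, try the next plugin"; any
    other value (0 for success, another negative errno for failure) ends
    the chain. If no hook handles the request, -ENOTSUP is returned.
    """
    ret = -errno.ENOTSUP
    for hook in hooks:
        ret = hook()
        if ret != -errno.ENOTSUP:
            break
    return ret
```

Under this model, if the first registered hook returned 0 unconditionally, a later hook for the same event would never run, which matches the symptom described in the commit message.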

11635 Commits