mir/criu - criu - Mike's Git repositories


mirror of https://github.com/checkpoint-restore/criu synced 2025-08-22 01:51:51 +00:00

Author	SHA1	Message	Date
Florian Weimer	089345f77a	Adjust to glibc __rseq_size semantic change In commit 2e456ccf0c34a056e3ccafac4a0c7effef14d918 ("Linux: Make __rseq_size useful for feature detection (bug 31965)") glibc 2.40 changed the meaning of __rseq_size slightly: it is now the size of the active/feature area (20 bytes initially), and not the size of the entire initially defined struct (32 bytes including padding). The reason for the change is that the size including padding does not allow detection of newly added features while previously unused padding is consumed. The prep_libc_rseq_info change in criu/cr-restore.c is not necessary on kernels which have full ptrace support for obtaining rseq information because the code is not used. On older kernels, it is a correctness fix because with size 20 (the new value), rseq registration would fail. The two other changes are required to make rseq unregistration work in tests. Signed-off-by: Florian Weimer <fweimer@redhat.com>	2024-09-11 16:02:11 -07:00
Bui Quang Minh	b9081ca56b	zdtm: make cgroup testcases run non-parallel cgroup testcases live in the same cgroup root zdtmtst and zdtmtst.defaultroot controller then create child subgroup for testing. This can cause problems when cgroup testcases run in parallel. For example, testcase A dumps the child subgroup of testcase B since it's in the cgroup root but in the middle of restoring of testcase A, testcase B completes and cleans up the subgroup directory. This causes error in testcase A restore. This commit adds excl flag to all cgroup testcases description so that these don't run parallel. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>	2024-09-11 16:02:11 -07:00
Andrei Vagin	4f45572fde	util: use close_range when it's supported close_range is faster than reading /proc/self/fd and closing descriptors one by one. Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	42b177da62	scripts/build: drop centos 7 targets The CI tests with CentOS 7 have been disabled and removed [1,2]. This patch removes the obsolete Makefile targets for these tests. [1] `24bc083653` [2] `f8466ca798` Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Andrei Vagin	1815838191	vdso: proxify the __vdso_clock_gettime64 function It was added in v5.3-rc1~211^2~4^2~10. Fixes #2390 Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Andrei Vagin	ac22aaf576	apparmor: get_suspend_policy must return NULL in error cases Before this fix, it could return MAP_FAILED which is ((void *) -1). Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Pavel Tikhomirov	71999d8883	cgroupd: unblock SIGTERM to make stop_cgroupd actually work Sometimes due to sigblockmask inheritance cgroupd can inherit SIGTERM blocked. That will lead to cgroupd ignoring SIGTERM from stop_cgroupd() and CRIU will get stuck waiting for a never-stopping cgroupd. I see this happening in lxc-checkpoint, also saw this in OpenVZ jenkins on cgroup_inotify00 test. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2024-09-11 16:02:11 -07:00
Liu Hua	daed6c3535	irmap: duplicate string in irmap_scan_path_add Duplicate string in irmap_scan_path_add, otherwise it will be freed before parsing the next configuration input. [ avagin: handle errors of xstrdup ] Signed-off-by: Liu Hua <weldonliu@tencent.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Andrei Vagin	b169e3b63d	plugins/cuda: fix crosscompilation Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Pratyush Yadav	ca971b7f8b	compel: fix build on Amazon Linux 2 due to missing PTRACE_ARCH_PRCTL Commit fc683cb01 ("compel: shstk: save CET state when CPU supports it") started using PTRACE_ARCH_PRCTL to query shadow stack status. While PTRACE_ARCH_PRCTL has existed in the kernel for a long time, it was only added to glibc in version 2.27. Amazon Linux 2 (AL2) has glibc 2.26, which does not have this definition. As a result, build on AL2 fails with the below error: compel/arch/x86/src/lib/infect.c: In function ‘get_task_xsave’: compel/arch/x86/src/lib/infect.c:276:14: error: ‘PTRACE_ARCH_PRCTL’ undeclared (first use in this function) 276 | if (ptrace(PTRACE_ARCH_PRCTL, pid, (unsigned long)&features, ARCH_SHSTK_STATUS)) { | ^~~~~~~~~~~~~~~~~ While the definition is present on the system via the kernel headers (in asm/ptrace-abi.h) which can be reached by including linux/ptrace.h, the comment in compel/include/uapi/ptrace.h says: We'd want to include both sys/ptrace.h and linux/ptrace.h, hoping that most definitions come from either one or another. Alas, on Alpine/musl both files declare struct ptrace_peeksiginfo_args, so there is no way they can be used together. Let's rely on libc one. Since including linux/ptrace.h is not an option, define PTRACE_ARCH_PRCTL if it doesn't already exist. An interesting point to note is that in sys/ptrace.h, PTRACE_ARCH_PRCTL is an enum value so the preprocessor doesn't know about it. PT_ARCH_PRCTL is the preprocessor symbol that matches the value of PTRACE_ARCH_PRCTL. So look for PT_ARCH_PRCTL to decide if PTRACE_ARCH_PRCTL is available or not. Another interesting point to note is that AL2 ships with GCC 7 by default, which does not support the -mshstk option, causing other build failures. Luckily, it also ships GCC 10 which does have the option. Using GCC 10 lets the build succeed. Fixes: fc683cb01 ("compel: shstk: save CET state when CPU supports it") Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>	2024-09-11 16:02:11 -07:00
Jesus Ramos	bf417dd050	criu/plugin: Add NVIDIA CUDA plugin Adding support for the NVIDIA cuda-checkpoint utility, requires the use of an r555 or higher driver along with the cuda-checkpoint binary. Signed-off-by: Jesus Ramos <jeramos@nvidia.com>	2024-09-11 16:02:11 -07:00
Jesus Ramos	5f486d5aee	criu/plugin: Introduce new plugin hooks PAUSE_DEVICES and CHECKPOINT_DEVICES to be used during pstree collection PAUSE_DEVICES is called before a process is frozen and is used by the CUDA plugin to place the process in a state that's ready to be checkpointed and quiesce any pending work. CHECKPOINT_DEVICES is called after all processes in the tree have been frozen and PAUSE'd and performs the actual checkpointing operation for CUDA applications. Signed-off-by: Jesus Ramos <jeramos@nvidia.com>	2024-09-11 16:02:11 -07:00
Jesus Ramos	1012e542e5	criu: Restore rseq_cs state slightly earlier in the restore sequence and run the plugin finalizer later in the dump sequence Restore rseq_cs state before calling RESUME_DEVICES_LATE as the CUDA plugin will temporarily unfreeze a thread during the plugin hook to assist with device restore. Run the plugin finalizer later in the dump sequence since the finalizer is used by the CUDA plugin to handle some process cleanup. Signed-off-by: Jesus Ramos <jeramos@nvidia.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	7ac4537069	readme: update link to FAQ page The current link opens a page with the following text: The MediaWiki FAQ can be found at: https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	4f15fe8c59	make: improve check for externally managed Python Move PYTHON_EXTERNALLY_MANAGED and PIP_BREAK_SYSTEM_PACKAGES into Makefile.install to avoid code duplication. In addition, add PIPFLAGS variable to enable specifying pip options during installation. This is particularly useful for packaging, where it is common for `pip install` to run in an environment with pre-installed dependencies and without internet access. In such an environment, we need to specify the following options: --no-build-isolation --no-index --no-deps Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Adrian Reber	fdf546dbd5	ci: upgrade to Fedora 40 Vagrant images (38 is EOL) Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-11 16:02:11 -07:00
Bhavik Sachdev	f171649264	test/dump-crash: check code path when dump crashes Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-09-11 16:02:11 -07:00
Bhavik Sachdev	a252a240c3	zdtm: Distinguish between fail and crash of dump Adds an exit_signal static method to criu_cli, criu_config and criu_rpc used to detect a crash. Fixes: #350 Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>	2024-09-11 16:02:11 -07:00
Adrian Reber	6feb57a840	ci: remove CentOS Stream 8 test (EOL) Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	1da29f27f6	zdtm: add support for LD_PRELOAD tests This commit adds a `--preload-libfault` option to ZDTM's run command. This option runs CRIU with LD_PRELOAD to intercept libc functions such as pread(). This method makes it possible to simulate special cases, for example, when a successful call to pread() transfers fewer bytes than requested. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Andrei Vagin	e7276cf63b	pagemap-cache: handle short reads It is possible for pread() to return fewer bytes than requested. In such a case, we need to repeat the read operation with the appropriate offset. Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Andrei Vagin	cc88b1e1ff	net: Fix TOCTOU race condition in unix_conf_op The unix_conf_op function reads the size of the sysctl entry array twice. gcc thinks that it can lead to a time-of-check to time-of-use (TOCTOU) race condition if the array size changes between the two reads. Fixes #2398 Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Alexander Mikhalitsyn	457bc6a8ff	criu: use proper format specifier to accommodate time_t 64-bit change See also: https://wiki.debian.org/ReleaseGoals/64bit-time Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>	2024-09-11 16:02:11 -07:00
Arnav Bhatt	95f66d13db	criu: move sigact dump/restore code into sigact.c Separate sigact dump/restore code from cr-restore.c and parasite-syscall.c into sigact.c Signed-off-by: Arnav Bhatt <arnav@ghativega.in>	2024-09-11 16:02:11 -07:00
Adrian Reber	9c8a6927aa	ci: update check for SELinux The rawhide tests run in a container. Containers always have SELinux disabled from the inside. Somehow /sys/fs/selinux is now mounted. We used the existence of that directory to check if SELinux is available. This seems to be no longer true. Signed-off-by: Adrian Reber <areber@redhat.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	b3c3422cd9	test/make: remove unused target A fault-injection test was introduced in commit [1] and later removed in commit [2]. This patch removes the obsolete Makefile target. [1] b95407e264fcf58f4f73f78abef6dac60436e7dd test: check, that parasite can rollback itself (v2) [2] 2cb4532e266d0c9f8e87839d5b5eb728a3e4d10d tests: remove zdtm.sh (v2) Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	30aa8dbe4d	mount: fix unbounded write Replace sprintf() with snprintf() and specify maximum length of characters to avoid potential overflow. Reported-by: GitHub CodeQL (https://codeql.github.com/) Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Juntong Deng	708f872a6d	sk-tcp: Add test cases for TCP_CORK and TCP_NODELAY socket options Currently there are no socket option test cases for TCP_CORK and TCP_NODELAY, this commit adds related test cases. The socket option test cases for TCP_KEEPCNT, TCP_KEEPIDLE, and TCP_KEEPINTVL already exist in socket-tcp_keepalive.c, so they are not included in this test case. Signed-off-by: Juntong Deng <juntong.deng@outlook.com>	2024-09-11 16:02:11 -07:00
Juntong Deng	9ba9aff77f	sk-tcp: Move TCP socket options from SkOptsEntry to TcpOptsEntry Currently some TCP socket option information is stored in SkOptsEntry, which is a little confusing. SkOptsEntry should only contain socket options that are common to all sockets. In this commit move the TCP-specific socket options from SkOptsEntry to TcpOptsEntry. Signed-off-by: Juntong Deng <juntong.deng@outlook.com>	2024-09-11 16:02:11 -07:00
Juntong Deng	1cb75c0b1e	sk-tcp: Move TCP socket options from TcpStreamEntry to TcpOptsEntry Currently some of the TCP socket option information is stored in the TcpStreamEntry, but the information in the TcpStreamEntry is only restored after the TCP socket has established connection, which results in these TCP socket options not being restored for unconnected TCP sockets. In this commit move the TCP socket options from TcpStreamEntry to TcpOptsEntry and add dump_tcp_opts() and restore_tcp_opts() for TCP socket options dump and restore. Signed-off-by: Juntong Deng <juntong.deng@outlook.com>	2024-09-11 16:02:11 -07:00
Kir Kolyshkin	13854a988c	criu: fix a fatal failure if nft doesn't work On some systems, nft binary might not be installed, or some kernel options might be unconfigured, resulting in something like this: sudo unshare -n nft create table inet CRIU Error: Could not process rule: Operation not supported create table inet CRIU ^^^^^^^^^^^^^^^^^^^^^^^ This is similar to what kerndat_has_nftables_concat() does, and if the outcome is the same, it returns an error to kerndat_init(), and an error from kerndat_init() is considered fatal. Let's relax the check, returning mere "feature not working" instead of a fatal error. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-09-11 16:02:11 -07:00
Pavel Tikhomirov	df178c7e53	sk-tcp: cleanup dump_tcp_conn_state error handling 1) In dump_tcp_conn_state, if return from libsoccr_save is >=0, we check that sizeof(struct libsoccr_sk_data) returned from libsoccr_save is equal to sizeof(struct libsoccr_sk_data) we see in dump_tcp_conn_state (probably to check if we use the right library version). And if sizes are different we go to err_r, which just returns ret, which can theoretically be 0 (if size in library is zero) and that would lead dump_one_tcp to treat this as success though it is an obvious error. 2) In case dump_opt or open_image fails we don't explicitly set ret and rely on sizeof(struct libsoccr_sk_data) previously set to ret not being 0. I don't really like it, it makes reading code too complex. 3) We have a lot of err_* labels which do exactly the same thing, there is no point in having all of them, also it is better to choose the name of the label based on what it really does. So let's refactor error handling to avoid these inconsistencies. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	4607b53566	mem: optimize debug logging of enqueued pages During restore, CRIU prints "Enqueue page-read" messages for each page-read request [1]. However, this message does not provide useful information, increases performance overhead during restore and the size of log file. $ ./zdtm.py run -t zdtm/static/maps06 -f h -k always $ grep 'Enqueue page-read' dump/zdtm/static/maps06/56/1/restore.log | wc -l 20493 This commit replaces these log messages with a single message that shows the number of enqueued page-read requests. $ grep 'enqueued' dump/zdtm/static/maps06/56/1/restore.log (00.061449) 56: nr_enqueued: 20493 [1] https://github.com/checkpoint-restore/criu/commit/91388fc Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	f4290868bb	ci/vdso01: fix typo Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	e68a06cfd1	ci: update actions/checkout to v4 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	5aaf450213	ci: update base OS to ubuntu 22.04 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Andrei Vagin	1c2a3d7faa	check: verify ino and dev of overlayfs files in /proc/pid/maps Check that the file device and inode shown in /proc/pid/maps match values returned by stat(2). Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Kir Kolyshkin	e07ffa04b0	Makefile.config: fix/improve feature warnings. 1. Tell which RPMs or DEBs are required in all cases. 2. Use $(info ...) everywhere. 3. Drop extra nested $(info), instead use (a document) a simpler kludge. 4. Simplify and unify the language, add missing periods. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-09-11 16:02:11 -07:00
Pavel Tikhomirov	af4058871e	timer: fix wrapping alignment in function declaration Currently we have tabs + spaces on the wrapped line but the wrapped part is not aligned to the opening bracket. Fixes: bbe26d1b7 ("timer: fix allignment in function definition") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2024-09-11 16:02:11 -07:00
Adrian Reber	0fc83a79b1	ci: silence CircleCI warning about deprecated image CircleCI currently prints out the following warning: This job is using a deprecated image 'ubuntu-2004:202010-01', please update to a newer image According to https://discuss.circleci.com/t/linux-image-deprecations-and-eol-for-2024/ the recommended image name is: "image: default" Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-11 16:02:11 -07:00
ccccrrrr	52623cca16	criu: move timers dump/restore code into separate file Fixes: #335 Signed-off-by: ccccrrrr <zcr1006@gmail.com>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	231ba0cd29	zdtm/sched_policy00: use reset-on-fork flag This patch extends the sched_policy00 test case to verify that the SCHED_RESET_ON_FORK flag is restored correctly. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Radostin Stoyanov	75fed59ef6	Add support for reset-on-fork scheduling flag This patch extends CRIU with support for SCHED_RESET_ON_FORK. When the SCHED_RESET_ON_FORK flag is set, the following rules apply for subsequently created children: - If the calling thread has a scheduling policy of SCHED_FIFO or SCHED_RR, the policy is reset to SCHED_OTHER in child processes. - If the calling process has a negative nice value, the nice value is reset to zero in child processes. (See 'man 7 sched') Fixes: #2359 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2024-09-11 16:02:11 -07:00
Artem Trushkin	8f0e200e66	mem: fix some VMAs being incorrectly mapped with PROT_WRITE A memory interval is a half-open interval, so the condition when pr->pe->vaddr == vma->e->end should not be interpreted as an intersection and should cause vma to be marked with VMA_NO_PROT_WRITE. Fixes: #2364 Signed-off-by: Artem Trushkin <at.120@ya.ru>	2024-09-11 16:02:11 -07:00
Adrian Reber	a2b018a188	ci: try to fix broken docker test Upgrade to 22.04 base image and use the existing version of docker. Signed-off-by: Adrian Reber <areber@redhat.com>	2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)	a48aa33eaa	restorer: shstk: implement shadow stack restore The restore of a task with shadow stack enabled adds these steps: * switch from the default shadow stack to a temporary shadow stack allocated in the premapped area * unmap CRIU mappings; nothing changed here, but it's important that CRIU mappings can be removed only after switching to a temporary shadow stack * create shadow stack VMA with map_shadow_stack() * restore shadow stack contents with wrss * switch to "real" shadow stack * lock shadow stack features Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>	2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)	7dd5830023	restore: add infrastructure to enable shadow stack There are several gotchas when restoring a task with shadow stack: * depending on the compiler options, glibc version and glibc tunables CRIU can run with or without shadow stack. * shadow stack VMAs are special, they must be created using a dedicated map_shadow_stack() system call and can be modified only by a special instruction (wrss) that is only available when shadow stack is enabled. * once shadow stack is enabled, it is not writable even with wrss; writes to shadow stack can be only enabled with ptrace() and only when shadow stack is enabled in the tracee. * if the shadow stack is enabled during restore rather than by glibc, calling retq after arch_prctl() that enables the shadow stack causes #CP, so the function that enables shadow stack can never return. Add the infrastructure required to cope with all of those: * modify the restore code to allow trampoline (arch_shstk_trampoline) that will enable shadow stack and call restore_task_with_children(). * add call to arch_shstk_unlock() right after the tasks are clone()ed; this will allow unlocking shadow stack features and making shadow stack writable. * add stubs for architectures that do not support shadow stacks * add implementation of arch_shstk_trampoline() and arch_shstk_unlock() for x86, but keep it disabled; it will be enabled along with addition of the code that will restore shadow stack in the restorer blob Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>	2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)	f47899c9ef	criu: kerndat: add kdat_has_shstk() Detect if CRIU runs with shadow stack enabled and store the result in kerndat. Unlike most kerndat knobs, kdat_has_shstk() does not check for availability of the shadow stack in the kernel, but rather checks if criu runs with shadow stack enabled. This depends on hardware availability, kernel and glibc support, compiler options and glibc tunables, so kdat_has_shstk() must be called every time CRIU starts and its result cannot be cached. The result will be used by the code that controls shadow stack enablement in the next commit. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>	2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)	2ebd1a4f0b	criu: shstk: prepare shadow stack parameters for restorer blob Shadow stacks must be populated using special WRSS instruction. This instruction is only available when shadow stack is enabled, calling it with disabled shadow stack causes #UD. Moreover, shadow stack VMAs cannot be mremap()ed and they must be created using map_shadow_stack() system call. This requires delaying the restore of shadow stacks to restorer blob after the CRIU mappings are cleared. Introduce rst_shstk_info structure to hold shadow stack parameters required in the restorer blob and populate this structure in arch_prepare_shstk() method. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrei Vagin <avagin@gmail.com>	2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)	4b6dda7ec0	criu: shstk: premap and prepopulate shadow stack VMAs Shadow stack VMAs cannot be mmap()ed, they must be created using map_shadow_stack() system call and populated using special wrss instruction available only when shadow stack is enabled. Premap them to reserve virtual address space and populate it to have their contents available for later copying after enabling shadow stack. Along with the space required by shadow stack VMAs also reserve an extra page that will be later used as a temporary shadow stack. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>	2024-09-11 16:02:11 -07:00
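
Commit 8f0e200e66 in the list above rests on memory ranges being half-open intervals: a page that starts exactly where a VMA ends does not intersect it. The following is a minimal sketch of that check, not the CRIU code. The names `struct range` and `ranges_intersect` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end); not a CRIU type. */
struct range {
	uint64_t start;
	uint64_t end; /* first address past the range */
};

/*
 * Two non-empty half-open ranges intersect only if each one starts
 * strictly before the other one ends. Ranges that merely touch
 * (a.end == b.start) share no address.
 */
bool ranges_intersect(struct range a, struct range b)
{
	assert(a.start < a.end && b.start < b.end);
	return a.start < b.end && b.start < a.end;
}
```

With a closed-interval comparison (`<=` instead of `<`), the touching case would be reported as an overlap, which is the kind of off-by-one the commit message describes.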
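
Commit e7276cf63b in the list above notes that pread() may return fewer bytes than requested and that the read must then be repeated at the appropriate offset. A minimal sketch of such a retry loop follows. It is an illustration under assumptions, not the pagemap-cache implementation, and the helper names `pread_full` and `read_prefix` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `count` bytes starting at `offset`,
 * retrying after short reads. Returns the number of bytes read
 * (less than `count` only at end of file) or -1 on error.
 */
ssize_t pread_full(int fd, void *buf, size_t count, off_t offset)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL || count == 0);

	while (done < count) {
		/* Continue where the previous, possibly short, read stopped. */
		ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted before any data arrived */
			return -1;
		}
		if (n == 0)
			break; /* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Usage sketch: read the first `count` bytes of the file at `path`. */
ssize_t read_prefix(const char *path, void *buf, size_t count)
{
	int fd = open(path, O_RDONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = pread_full(fd, buf, count, 0);
	close(fd);
	return ret;
}
```

Both the buffer pointer and the file offset advance by the number of bytes already read, so a short read never causes data to be re-read or skipped.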
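
Commit 30aa8dbe4d in the list above replaces sprintf() with snprintf() so that a write cannot run past the end of the destination buffer. The sketch below shows the general pattern, including a truncation check. The helper `format_path` and its arguments are hypothetical and are not taken from the CRIU mount code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<dir>/<name>" in buf without ever writing
 * more than `size` bytes. Returns 0 on success, -1 if the result did
 * not fit (buf then holds a truncated, NUL-terminated string) or if
 * snprintf() reported an error.
 */
int format_path(char *buf, size_t size, const char *dir, const char *name)
{
	int n;

	assert(buf != NULL && size > 0);

	n = snprintf(buf, size, "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

snprintf() returns the length the full string would have had, so comparing that value against the buffer size is enough to detect truncation.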


11635 Commits