mirror of https://github.com/checkpoint-restore/criu synced 2025-08-22 01:51:51 +00:00

11425 Commits

Author SHA1 Message Date
Jesus Ramos
bf417dd050 criu/plugin: Add NVIDIA CUDA plugin
Add support for the NVIDIA cuda-checkpoint utility. This requires an
r555 or higher driver along with the cuda-checkpoint binary.

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2024-09-11 16:02:11 -07:00
Jesus Ramos
5f486d5aee criu/plugin: Introduce new plugin hooks PAUSE_DEVICES and CHECKPOINT_DEVICES to be used during pstree collection
PAUSE_DEVICES is called before a process is frozen and is used by the CUDA
plugin to place the process in a state that's ready to be checkpointed and
quiesce any pending work

CHECKPOINT_DEVICES is called after all processes in the tree have been frozen
and PAUSE'd and performs the actual checkpointing operation for CUDA
applications

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2024-09-11 16:02:11 -07:00
Jesus Ramos
1012e542e5 criu: Restore rseq_cs state slightly earlier in the restore sequence and run the plugin finalizer later in the dump sequence
Restore rseq_cs state before calling RESUME_DEVICES_LATE as the CUDA plugin will
temporarily unfreeze a thread during the plugin hook to assist with device
restore

Run the plugin finalizer later in the dump sequence since the finalizer is used
by the CUDA plugin to handle some process cleanup

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
7ac4537069 readme: update link to FAQ page
The current link opens a page with the following text:

    The MediaWiki FAQ can be found at:
    https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
4f15fe8c59 make: improve check for externally managed Python
Move PYTHON_EXTERNALLY_MANAGED and PIP_BREAK_SYSTEM_PACKAGES
into Makefile.install to avoid code duplication. In addition, add
PIPFLAGS variable to enable specifying pip options during installation.
This is particularly useful for packaging, where it is common for `pip install`
to run in an environment with pre-installed dependencies and without internet
access. In such an environment, we need to specify the following options:

    --no-build-isolation --no-index --no-deps

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Adrian Reber
fdf546dbd5 ci: upgrade to Fedora 40 Vagrant images (38 is EOL)
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Bhavik Sachdev
f171649264 test/dump-crash: check code path when dump crashes
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2024-09-11 16:02:11 -07:00
Bhavik Sachdev
a252a240c3 zdtm: Distinguish between fail and crash of dump
Adds an exit_signal static method to criu_cli, criu_config and criu_rpc
used to detect a crash.

Fixes: #350

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2024-09-11 16:02:11 -07:00
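
One way to tell a failed dump from a crashed one is to look at how the child process ended. The sketch below is not the zdtm code: `exit_signal`, `classify` and `run_and_classify` are hypothetical standalone helpers, and they assume Python's `subprocess` convention that a return code of `-N` means the child was killed by signal `N`.

```python
import signal
import subprocess


def exit_signal(returncode):
    """Return the signal number that killed a child, or 0 if it exited normally.

    Assumes the subprocess convention: returncode == -N means "killed by signal N".
    """
    return -returncode if returncode < 0 else 0


def classify(returncode):
    """Map a return code to 'ok', 'fail' (non-zero exit) or 'crash' (killed by a signal)."""
    if exit_signal(returncode):
        return "crash"
    return "ok" if returncode == 0 else "fail"


def run_and_classify(argv):
    """Run a command and classify how it ended."""
    return classify(subprocess.run(argv).returncode)
```

A test harness could then report a `crash` result separately from an ordinary `fail`, which is the distinction this commit is after.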
Adrian Reber
6feb57a840 ci: remove CentOS Stream 8 test (EOL)
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
1da29f27f6 zdtm: add support for LD_PRELOAD tests
This commit adds a `--preload-libfault` option to ZDTM's run command.
This option runs CRIU with LD_PRELOAD to intercept libc functions
such as pread(). This makes it possible to simulate special cases,
for example, when a successful call to pread() transfers fewer
bytes than requested.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
e7276cf63b pagemap-cache: handle short reads
It is possible for pread() to return fewer bytes than
requested. In that case, we need to repeat the read operation
with the appropriate offset.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
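
The retry-on-short-read idea from the pagemap-cache commit above can be sketched as follows. This is an illustration only, not the CRIU implementation (which is C): `pread_full` is a hypothetical helper, and its `pread` parameter exists only so that a short-reading replacement can be injected, loosely mirroring what the LD_PRELOAD fault library does.

```python
import os


def pread_full(fd, length, offset, pread=os.pread):
    """Read up to `length` bytes at `offset`, retrying after short reads.

    Stops early only at end of file, where pread() returns b"".
    """
    chunks = []
    done = 0
    while done < length:
        # Ask only for what is still missing, at the advanced offset.
        chunk = pread(fd, length - done, offset + done)
        if not chunk:  # EOF: nothing more to read
            break
        chunks.append(chunk)
        done += len(chunk)
    return b"".join(chunks)
```

The key detail is that both the remaining length and the offset are adjusted by the number of bytes already received before each retry.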
Andrei Vagin
cc88b1e1ff net: Fix TOCTOU race condition in unix_conf_op
The unix_conf_op function reads the size of the sysctl entry array
twice. gcc thinks that it can lead to a time-of-check to time-of-use
(TOCTOU) race condition if the array size changes between the two reads.

Fixes #2398

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Alexander Mikhalitsyn
457bc6a8ff criu: use proper format specifier to accommodate time_t 64-bit change
See also:
https://wiki.debian.org/ReleaseGoals/64bit-time

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2024-09-11 16:02:11 -07:00
Arnav Bhatt
95f66d13db criu: move sigact dump/restore code into sigact.c
Separate sigact dump/restore code from cr-restore.c and parasite-syscall.c into sigact.c

Signed-off-by: Arnav Bhatt <arnav@ghativega.in>
2024-09-11 16:02:11 -07:00
Adrian Reber
9c8a6927aa ci: update check for SELinux
The rawhide tests run in a container. Containers always have SELinux
disabled from the inside. Somehow /sys/fs/selinux is now mounted. We
used the existence of that directory to check whether SELinux is
available. This no longer seems to be reliable.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
b3c3422cd9 test/make: remove unused target
A fault-injection test was introduced in commit [1] and later removed in
commit [2]. This patch removes the obsolete Makefile target.

[1] b95407e264fcf58f4f73f78abef6dac60436e7dd
    test: check, that parasite can rollback itself (v2)

[2] 2cb4532e266d0c9f8e87839d5b5eb728a3e4d10d
    tests: remove zdtm.sh (v2)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
30aa8dbe4d mount: fix unbounded write
Replace sprintf() with snprintf() and specify maximum length of
characters to avoid potential overflow.

Reported-by: GitHub CodeQL (https://codeql.github.com/)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Juntong Deng
708f872a6d sk-tcp: Add test cases for TCP_CORK and TCP_NODELAY socket options
Currently there are no socket option test cases for TCP_CORK and
TCP_NODELAY; this commit adds related test cases.

The socket option test cases for TCP_KEEPCNT, TCP_KEEPIDLE, and
TCP_KEEPINTVL already exist in socket-tcp_keepalive.c, so they are
not included in this test case.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
2024-09-11 16:02:11 -07:00
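
The set-then-verify half of such a test can be sketched as below. This is not the zdtm test itself (which is C and performs a checkpoint/restore between setting and checking); `roundtrip_tcp_opts` is a hypothetical helper that only shows the options being set on an unconnected socket and read back. `TCP_CORK` is Linux-specific, so it is used only where the `socket` module exposes it.

```python
import socket


def roundtrip_tcp_opts():
    """Set TCP_NODELAY (and TCP_CORK where available) on an unconnected socket and read them back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sk:
        names = ["TCP_NODELAY"]
        if hasattr(socket, "TCP_CORK"):  # Linux-specific option
            names.append("TCP_CORK")
        result = {}
        for name in names:
            opt = getattr(socket, name)
            sk.setsockopt(socket.IPPROTO_TCP, opt, 1)
            # getsockopt() returns an int; any non-zero value means "enabled".
            result[name] = bool(sk.getsockopt(socket.IPPROTO_TCP, opt))
        return result
```

In a real checkpoint/restore test the read-back would happen after restore, so that a lost option shows up as a mismatch.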
Juntong Deng
9ba9aff77f sk-tcp: Move TCP socket options from SkOptsEntry to TcpOptsEntry
Currently some TCP socket option information is stored in SkOptsEntry,
which is a little confusing.

SkOptsEntry should only contain socket options that are common to
all sockets.

In this commit move the TCP-specific socket options from SkOptsEntry
to TcpOptsEntry.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
2024-09-11 16:02:11 -07:00
Juntong Deng
1cb75c0b1e sk-tcp: Move TCP socket options from TcpStreamEntry to TcpOptsEntry
Currently some of the TCP socket option information is stored in the
TcpStreamEntry, but the information in the TcpStreamEntry is only
restored after the TCP socket has established connection, which
results in these TCP socket options not being restored for
unconnected TCP sockets.

In this commit move the TCP socket options from TcpStreamEntry to
TcpOptsEntry and add dump_tcp_opts() and restore_tcp_opts() for TCP
socket options dump and restore.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
2024-09-11 16:02:11 -07:00
Kir Kolyshkin
13854a988c criu: fix a fatal failure if nft doesn't work
On some systems, nft binary might not be installed, or some kernel
options might be unconfigured, resulting in something like this:

	sudo unshare -n nft create table inet CRIU
	Error: Could not process rule: Operation not supported
	create table inet CRIU
	^^^^^^^^^^^^^^^^^^^^^^^

This is similar to what kerndat_has_nftables_concat() does, and if the
outcome is the same, it returns an error to kerndat_init(), and an error
from kerndat_init() is considered fatal.

Let's relax the check, returning mere "feature not working" instead of
a fatal error.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
df178c7e53 sk-tcp: cleanup dump_tcp_conn_state error handling
1) In dump_tcp_conn_state, if return from libsoccr_save is >=0, we check
that sizeof(struct libsoccr_sk_data) returned from libsoccr_save is
equal to sizeof(struct libsoccr_sk_data) we see in dump_tcp_conn_state
(probably to check if we use the right library version). And if sizes
are different we go to err_r, which just returns ret, which can
theoretically be 0 (if size in library is zero) and that would lead
dump_one_tcp to treat this as success though it is an obvious error.

2) In case dump_opt or open_image fails we don't explicitly set ret
and rely on sizeof(struct libsoccr_sk_data) previously set to ret not
being 0. I don't really like it; it makes reading the code too complex.

3) We have a lot of err_* labels which do exactly the same thing, there
is no point in having all of them, also it is better to choose the name
of the label based on what it really does.

So let's refactor error handling to avoid these inconsistencies.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
4607b53566 mem: optimize debug logging of enqueued pages
During restore, CRIU prints "Enqueue page-read" messages for
each page-read request [1]. However, this message does not
provide useful information, increases performance overhead
during restore and the size of log file.

$ ./zdtm.py run -t zdtm/static/maps06 -f h -k always
$ grep 'Enqueue page-read' dump/zdtm/static/maps06/56/1/restore.log | wc -l
20493

This commit replaces these log messages with a single message
that shows the number of enqueued page-read requests.

$ grep 'enqueued' dump/zdtm/static/maps06/56/1/restore.log
(00.061449)     56: nr_enqueued:   20493

[1] https://github.com/checkpoint-restore/criu/commit/91388fc

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
f4290868bb ci/vdso01: fix typo
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
e68a06cfd1 ci: update actions/checkout to v4
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
5aaf450213 ci: update base OS to ubuntu 22.04
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
1c2a3d7faa check: verify ino and dev of overlayfs files in /proc/pid/maps
Check that the file device and inode shown in /proc/pid/maps match
values returned by stat(2).

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
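
The comparison this check performs can be sketched as follows. It is an illustration under assumptions, not the CRIU check: `parse_maps_line` and `matches_stat` are hypothetical helpers, the maps layout is taken to be `start-end perms offset major:minor inode [path]` with the device in hexadecimal, and paths with a ` (deleted)` suffix are not handled.

```python
import os


def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (major, minor, inode, path).

    Expected layout: "start-end perms offset major:minor inode [path]".
    """
    fields = line.split(None, 5)
    major, minor = (int(x, 16) for x in fields[3].split(":"))
    inode = int(fields[4])
    path = fields[5].strip() if len(fields) > 5 else ""
    return major, minor, inode, path


def matches_stat(line):
    """Return True if dev and inode from the maps line equal what stat(2) reports for the path."""
    major, minor, inode, path = parse_maps_line(line)
    st = os.stat(path)
    return (os.major(st.st_dev), os.minor(st.st_dev), st.st_ino) == (major, minor, inode)
```

`matches_stat` only makes sense for file-backed mappings; anonymous mappings have an empty path and inode 0.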
Kir Kolyshkin
e07ffa04b0 Makefile.config: fix/improve feature warnings.
1. Tell which RPMs or DEBs are required in all cases.

2. Use $(info ...) everywhere.

3. Drop extra nested $(info), instead use (and document) a simpler kludge.

4. Simplify and unify the language, add missing periods.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
af4058871e timer: fix wrapping alignment in function declaration
Currently we have tabs + spaces on the wrapped line but the wrapped part
is not aligned to the opening bracket.

Fixes: bbe26d1b7 ("timer: fix allignment in function definition")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
0fc83a79b1 ci: silence CircleCI warning about deprecated image
CircleCI currently prints out the following warning:

   This job is using a deprecated image 'ubuntu-2004:202010-01', please update to a newer image

According to https://discuss.circleci.com/t/linux-image-deprecations-and-eol-for-2024/
the recommended image name is: "image: default"

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
ccccrrrr
52623cca16 criu: move timers dump/restore code into separate file
Fixes: #335

Signed-off-by: ccccrrrr <zcr1006@gmail.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
231ba0cd29 zdtm/sched_policy00: use reset-on-fork flag
This patch extends the sched_policy00 test case to verify that
the SCHED_RESET_ON_FORK flag is restored correctly.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
75fed59ef6 Add support for reset-on-fork scheduling flag
This patch extends CRIU with support for SCHED_RESET_ON_FORK.
When the SCHED_RESET_ON_FORK flag is set, the following rules
apply for subsequently created children:

- If the calling thread has a scheduling policy of SCHED_FIFO or
SCHED_RR, the policy is reset to SCHED_OTHER in child processes.

- If the calling process has a negative nice value, the nice value
is reset to zero in child processes.

(See 'man 7 sched')

Fixes: #2359

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
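
On Linux the reset-on-fork flag is reported OR-ed into the policy value returned by sched_getscheduler(2), so saving and restoring it amounts to splitting and re-joining that value. The sketch below shows only this bit manipulation; `split_policy` and `join_policy` are hypothetical names, and how CRIU actually stores the flag in its images is not shown in this log.

```python
import os

# Linux value of SCHED_RESET_ON_FORK; os exposes it on Linux,
# otherwise fall back to the kernel constant.
SCHED_RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0x40000000)


def split_policy(raw):
    """Split a sched_getscheduler() value into (policy, reset_on_fork)."""
    return raw & ~SCHED_RESET_ON_FORK, bool(raw & SCHED_RESET_ON_FORK)


def join_policy(policy, reset_on_fork):
    """Rebuild the value to pass to sched_setscheduler() on restore."""
    return policy | SCHED_RESET_ON_FORK if reset_on_fork else policy
```

A dump side would call something like `split_policy(os.sched_getscheduler(pid))`, and the restore side would pass `join_policy(...)` back to `os.sched_setscheduler()`.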
Artem Trushkin
8f0e200e66 mem: fix some VMAs being incorrectly mapped with PROT_WRITE
A memory interval is a half-open interval, so the condition
when pr->pe->vaddr == vma->e->end should not be interpreted
as an intersection and should cause vma to be marked with VMA_NO_PROT_WRITE.

Fixes: #2364

Signed-off-by: Artem Trushkin <at.120@ya.ru>
2024-09-11 16:02:11 -07:00
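
The half-open interval rule behind this fix can be written as a two-comparison overlap test. The function below is a generic sketch with hypothetical names, not the CRIU code; it shows why a range that starts exactly at another range's end does not intersect it.

```python
def intersects(start_a, end_a, start_b, end_b):
    """Return True if half-open intervals [start_a, end_a) and [start_b, end_b) overlap."""
    # Strict comparisons: touching at an endpoint is not an overlap.
    return start_a < end_b and start_b < end_a
```

With a VMA covering `[0x1000, 0x2000)`, a page entry starting at `0x2000` is outside the VMA, while one starting at `0x1fff` is inside it.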
Adrian Reber
a2b018a188 ci: try to fix broken docker test
Upgrade to 22.04 base image and use the existing version of docker.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
a48aa33eaa restorer: shstk: implement shadow stack restore
The restore of a task with shadow stack enabled adds these steps:

* switch from the default shadow stack to a temporary shadow stack
  allocated in the premapped area
* unmap CRIU mappings; nothing changed here, but it's important that
  CRIU mappings can be removed only after switching to a temporary
  shadow stack
* create shadow stack VMA with map_shadow_stack()
* restore shadow stack contents with wrss
* switch to "real" shadow stack
* lock shadow stack features

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
7dd5830023 restore: add infrastructure to enable shadow stack
There are several gotchas when restoring a task with shadow stack:
* depending on the compiler options, glibc version and glibc tunables
  CRIU can run with or without shadow stack.
* shadow stack VMAs are special, they must be created using a dedicated
  map_shadow_stack() system call and can be modified only by a special
  instruction (wrss) that is only available when shadow stack is
  enabled.
* once shadow stack is enabled, it is not writable even with wrss;
  writes to shadow stack can be only enabled with ptrace() and only when
  shadow stack is enabled in the tracee.
* if the shadow stack is enabled during restore rather than by glibc,
  calling retq after arch_prctl() that enables the shadow stack causes
  #CP, so the function that enables shadow stack can never return.

Add the infrastructure required to cope with all of those:

* modify the restore code to allow trampoline (arch_shstk_trampoline)
  that will enable shadow stack and call restore_task_with_children().
* add call to arch_shstk_unlock() right after the tasks are clone()ed;
  this will allow unlocking shadow stack features and making shadow
  stack writable.
* add stubs for architectures that do not support shadow stacks
* add implementation of arch_shstk_trampoline() and arch_shstk_unlock()
  for x86, but keep it disabled; it will be enabled along with the addition
  of the code that will restore shadow stack in the restorer blob

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
f47899c9ef criu: kerndat: add kdat_has_shstk()
Detect if CRIU runs with shadow stack enabled and store the result in
kerndat.

Unlike most kerndat knobs, kdat_has_shstk() does not check for
availability of the shadow stack in the kernel, but rather checks if
criu runs with shadow stack enabled.

This depends on hardware availability, kernel and glibc support, compiler
options and glibc tunables, so kdat_has_shstk() must be called every
time CRIU starts and its result cannot be cached.

The result will be used by the code that controls shadow stack
enablement in the next commit.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
2ebd1a4f0b criu: shstk: prepare shadow stack parameters for restorer blob
Shadow stacks must be populated using special WRSS instruction. This
instruction is only available when shadow stack is enabled, calling it
with disabled shadow stack causes #UD.

Moreover, shadow stack VMAs cannot be mremap()ed and they must be
created using map_shadow_stack() system call. This requires delaying the
restore of shadow stacks to restorer blob after the CRIU mappings are
cleared.

Introduce rst_shstk_info structure to hold shadow stack parameters
required in the restorer blob and populate this structure in
arch_prepare_shstk() method.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
4b6dda7ec0 criu: shstk: premap and prepopulate shadow stack VMAs
Shadow stack VMAs cannot be mmap()ed, they must be created using
map_shadow_stack() system call and populated using special wrss
instruction available only when shadow stack is enabled.

Premap them to reserve virtual address space and populate it to have
their contents available for later copying after enabling shadow stack.

Along with the space required by shadow stack VMAs also reserve an extra
page that will be later used as a temporary shadow stack.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
17eda3ce57 criu: shstk: add VMA_AREA_SHSTK flag
The shadow stack VMAs require special care because they can only be
created and populated using special system calls.

Add VMA_AREA_SHSTK flag and set it for VMAs that are marked as "ss" in
/proc/pid/smaps

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
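
Finding the mappings that carry the "ss" marker can be sketched as a small parser over smaps text. This is an illustration, not the CRIU parser: `shstk_ranges` is a hypothetical helper, and it assumes each mapping starts with a `start-end ...` header line and lists its flags on a later `VmFlags:` line.

```python
def shstk_ranges(smaps_text):
    """Return (start, end) for every mapping whose VmFlags line contains "ss"."""
    ranges = []
    current = None
    for line in smaps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "VmFlags:":
            # Flags belong to the most recently seen mapping header.
            if current is not None and "ss" in fields[1:]:
                ranges.append(current)
        elif "-" in fields[0] and not fields[0].endswith(":"):
            start, _, end = fields[0].partition("-")
            try:
                current = (int(start, 16), int(end, 16))
            except ValueError:
                pass  # not a mapping header after all
    return ranges
```

Matching on whole whitespace-separated tokens avoids mistaking other flags that merely contain the letters "ss".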
Mike Rapoport (IBM)
0aba3dcfa1 compel: shstk: prepare shadow stack signal frame
When calling sigreturn with CET enabled, the kernel verifies that the
shadow stack has proper address of sa_restorer and a "restore token".
Normally, they are pushed to the shadow stack when signal processing is
started.

Since compel calls sigreturn directly, the shadow stack should be
updated to match the kernel expectations for sigreturn invocation.

Add parasite_setup_shstk() that sets up the shadow stack with the
address of __export_parasite_head_start as sa_restorer and with the
required restore token.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
63a45e1c8a compel: infect: prepare parasite_service() for addition of CET support
To support sigreturn with CET enabled, the parasite must rewind its stack
before calling sigreturn so that the shadow stack will be compatible with
the actual calling sequence.

In addition, calling sigreturn from top level routine
(__export_parasite_head_start) will significantly simplify the shadow
stack manipulations required to execute sigreturn.

For x86 make fini_sigreturn() return the stack pointer for the signal
frame that will be used by sigreturn and propagate that return value up
to __export_parasite_head_start.

In non-daemon mode parasite_trap_cmd() returns a non-positive value,
which makes it possible to distinguish daemon and non-daemon mode and
to properly stop at int3 in non-daemon mode.

Architectures other than x86 remain unchanged and will still call
sigreturn from fini_sigreturn().

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
6e491a19a3 compel: shstk: save CET state when CPU supports it
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
17f4dd0959 compel: always pass user_fpregs_struct_t to compel_get_task_regs()
All architectures create on-stack structure for floating point save area
in compel_get_task_regs() if the caller passes NULL rather than a valid
pointer.

The only place that calls compel_get_task_regs() with NULL for floating
point save area is parasite_start_daemon() and it is simpler to define
this structure on the stack of parasite_start_daemon().

The availability of floating point save data is required in
parasite_start_daemon() to detect shadow stack presence early during
parasite infection and will be used in later patches.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
0b8c51eaad compiler: add ALIGN_DOWN macro
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
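
The commit does not show the macro's definition, so the following is only the usual meaning of an align-down operation, under the assumption that the alignment is a power of two. `align_down` is a hypothetical Python stand-in for illustration, not the C macro.

```python
def align_down(value, alignment):
    """Round `value` down to a multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Clearing the low bits rounds toward zero for non-negative values.
    return value & ~(alignment - 1)
```

For example, aligning an address down to a 4096-byte page boundary yields the start of the page that contains it.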
Stepan Pieshkin
f590c2b638 zdtm/static: check that cgroup layout of threads is preserved
Co-developed-by: Stepan Pieshkin <stepanpieshkin@google.com>
Signed-off-by: Stepan Pieshkin <stepanpieshkin@google.com>
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Stepan Pieshkin
a0a6ec3dc0 cgroup: Add support for restoring a thread in a correct v1 cgroup
Currently we support checkpoint/restore only for cgroup v2 threaded
controllers. Threads originating in cgroup v1 environments will be
restored to the main thread's cgroup. This change extends the support
to cgroup v1.

Signed-off-by: Stepan Pieshkin <stepanpieshkin@google.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
835afb1b88 criu-ns: fix lint error
This patch fixes the following lint error:
scripts/criu-ns:219:16: E713 [*] Test for membership should be `not in`

The change in this patch is auto-generated with `ruff --fix`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
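
For readers unfamiliar with E713, the rule concerns how a negated membership test is spelled. The snippet below is a generic illustration with a hypothetical function name; it is not the code at scripts/criu-ns:219.

```python
def missing_keys(required, provided):
    """Return the required keys that are absent from `provided`."""
    # Flagged by E713:   if not key in provided
    # Preferred form:    if key not in provided
    return [key for key in required if key not in provided]
```

Both spellings evaluate the same way; the lint rule only asks for the more readable `not in` operator.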
Radostin Stoyanov
e0b74f558b make: replace flake8 with ruff
Ruff (https://github.com/astral-sh/ruff) is a Python linter
written in Rust, designed to replace Flake8. It is significantly
faster and actively maintained.

In addition to replacing flake8 with ruff, this patch also
creates separate makefile targets for ruff, shellcheck and
codespell, so that they can be tested independently.

RUFF_FLAGS can be used to specify options such as '--fix'.
Example:
	make lint
	make ruff RUFF_FLAGS=--fix

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00