mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 13:58:34 +00:00
11541 Commits

Author SHA1 Message Date
Pavel Tikhomirov
5cd7092fda sk-unix: make add_fake_unix_queuers earlier and rework find_queuer_for
Before this patch, if we had a unixsk with incoming scm packets (with
fds) and with the sender side fd closed, we got an error:

Error (criu/sk-unix.c:1125): unix: Can't find sender for 0x1e

The first part of the problem is that unix_note_scm_rights() expects to see
a "queuer" which would send scm packets to the unixsk, and there is none,
as the sender side is closed.

The second part of the problem is that we already have a "fake" queuers
feature, which creates a unix socket pair and leaves the other
end open for queuing packets later. But add_fake_unix_queuers()
is called after unix_note_scm_rights(), so there is no chance to find
a queuer at the point of failure.

The third part is that when we look for a queuer in find_queuer_for() we
actually look for a socket for which we are a queuer and not for the
socket which is a queuer for us, which is the opposite of what the name
says. For cases where both ends are alive, both are queuers for each other,
so this was not important, but for our closed-sender case it breaks.

So let's reorder add_fake_unix_queuers() before unix_note_scm_rights()
and make find_queuer_for() actually do what its name implies.
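
The difference between the two lookup directions can be illustrated with a small Python sketch. The data model here is hypothetical (a `UnixSk` record with a `queuer_ino` field, and the helper `find_queued_by`); it is not CRIU's actual structure layout, only a way to show why the old lookup found nothing when the real sender was closed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnixSk:
    ino: int
    # Hypothetical field: inode of the socket that queues packets into this one.
    queuer_ino: Optional[int] = None


def find_queued_by(socks: List[UnixSk], ino: int) -> Optional[UnixSk]:
    """Old direction: the socket for which `ino` acts as the queuer."""
    return next((s for s in socks if s.queuer_ino == ino), None)


def find_queuer_for(socks: List[UnixSk], ino: int) -> Optional[UnixSk]:
    """Name-aligned direction: the socket that is the queuer for `ino`."""
    me = next((s for s in socks if s.ino == ino), None)
    if me is None or me.queuer_ino is None:
        return None
    return next((s for s in socks if s.ino == me.queuer_ino), None)


# Receiver 0x1e lost its real sender; a fake queuer 0x99 was added for it.
socks = [UnixSk(ino=0x1E, queuer_ino=0x99), UnixSk(ino=0x99)]
```

With this layout the old direction returns nothing for 0x1e, because 0x1e is not a queuer for anyone, while the name-aligned lookup returns the fake queuer.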

This situation started to reproduce on Virtuozzo start/stop tests
with the unixsk belonging to systemd. We suppose that this state, where
the sender side fd is closed, happens rarely and only on systemd start/stop,
so we don't see it in regular suspend/resume of long-living containers.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Ashutosh Mehra
28358db13b Fix the check for mnt namespace in criu-ns
The criu-ns script incorrectly compares the pidns fd with the mntns fd.
Also reversed the condition in the is_my_namespace function to align it
with the function name.
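
As a rough illustration, a name-aligned check could look like the sketch below. The helper `same_inode` is an assumption added for this example, and the body of `is_my_namespace` is a guess at the intended behaviour rather than a copy of the script: it returns True only when the fd refers to the caller's own namespace of the given type.

```python
import os


def same_inode(fd, path):
    """True if the open fd and the path refer to the same filesystem object."""
    a, b = os.fstat(fd), os.stat(path)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def is_my_namespace(fd, ns):
    """True if fd refers to the current process's namespace of type `ns`.

    `ns` is a name under /proc/self/ns, such as "mnt" or "pid".
    """
    return same_inode(fd, "/proc/self/ns/%s" % ns)
```

A caller that holds a mount namespace fd has to pass "mnt" here; passing "pid" would reproduce the mismatch described above.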

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
295dc85ca0 github: use git-clang-format instead of make indent
This allows us to detect bad formatting only in PR changes and not in
the whole CRIU codebase.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
ced4ab4b0a zdtm: skip zdtm/static/shm-hugetlb when hugetlb is not supported
Reported-by: Mr. Jenkins (ppc64le)
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
c830643d86 Revert "ci: skip new hugetlb maps09/maps10 tests for pre-dump"
This reverts commit 37ea8c5fcf.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
b26e1fdbf7 mem: Skip pre-dumping on hugetlb mappings
As private hugetlb mappings are not pre-mapped, their content is restored
in the restorer, which cannot use page_read->read_pages. As a result, we
cannot recursively read the content of pre-dumped images in the parent directory;
we use preadv to read the content from the last dumped image only. Therefore,
restore may freeze when the content of a mapping is in a pre-dumped image
in the parent directory.

We need to skip pre-dumping on hugetlb mappings to resolve the issue.

Suggested-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
9066f87417 cr-dump: do not report success to logs if post-dump script failed
It can be confusing to see an error from the post-dump action script and a
non-zero return from criu while at the same time seeing "Dumping finished
successfully" in the log. I believe it is logical to consider the post-dump
action script a part of the "dump" process, so a failure in it means that the
whole dump failed.
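
The intended ordering can be sketched in Python as follows. The function `finish_dump`, its parameters and the error message are hypothetical names for illustration; only the "Dumping finished successfully" text comes from the message above.

```python
def finish_dump(run_post_dump, log):
    """Run the post-dump action and report success only if it passed.

    run_post_dump: callable returning the action script's exit status.
    log: callable taking one message string.
    """
    ret = run_post_dump()
    if ret != 0:
        # Hypothetical error text; the point is that no success line follows.
        log("Post-dump action script failed with %d" % ret)
        return ret
    log("Dumping finished successfully")
    return 0
```

The success line is emitted after the action script, so a failing script can never be followed by a success message in the same log.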

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
d46f40f4ff criu: Version 3.17.1
* Fixes for pre-dump read mode
* Fixes for mount-v2
* amdgpu plugin build and installation fixes
* Some minor CI related fixes

Signed-off-by: Adrian Reber <areber@redhat.com>
v3.17.1
2022-06-23 14:53:25 -07:00
Radostin Stoyanov
46ec6749fa ci: Fix code indent
This patch contains auto-generated changes from `make indent`

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
f29d51560e zdtm: add mnt_root_ext test
This test has one external mount [criumntns] /zdtm_root_ext.tmp ->
[testmntns] /mnt_root_ext.test, and it specifically gives '--external
mnt[MNT]:.zdtm_root_ext.tmp' option on restore without '/' to make
dirname on it return static '.' path (see glibc dirname() code) and
reproduce a segfault in resolve_mountpoint().

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
8a18faea09 util/mount-v2: fix resolve_mountpoint() to always return freeable pointer
Otherwise we get a segmentation fault in __move_mount_set_group() on
xfree(source_mp) if resolve_mountpoint() returned a statically allocated
path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
4cc8a18f3b zdtm: test multiple ext bindmounts with no common root and same master
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
229c5df5ce mount-v2: workaround for multiple external bindmounts with no common root
It is a problem when, while restoring a sharing group, we need to copy
sharing between two mounts with non-intersecting roots, because the kernel
does not allow it.

We have a case https://github.com/opencontainers/runc/pull/3442, where
runc adds different devtmpfs file-bindmounts to a container and there is
no fsroot mount in the container for this devtmpfs, thus mount-v2 faces the
above problem.

Luckily, for the case of external mounts which are in one sharing group
and which have non-intersecting roots, these mounts likely only have an
external master with no sharing, so we can just copy sharing from the
external source and make it a slave as a workaround.

https://github.com/checkpoint-restore/criu/issues/1886

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
f8c9e07e4f mount-v2: split out restore_one_sharing helper
This helper restores master_id and shared_id of first mount in the
sharing group. It first copies sharing from either external source or
internal parent sharing group and makes master_id from shared_id. Next
it creates new shared_id when needed.

All other mounts except first are just copied from the first one.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
a90a1d4827 amdgpu: Set PLUGINDIR to /usr/lib/criu
Building the criu packages for Ubuntu/Debian fails with:

	mkdir: cannot create directory '/var/lib/criu': Permission denied

This patch updates PLUGINDIR with the value /usr/lib/criu

Fixes: #1877

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
e6f292cb38 amdgpu/Makefile: Fix include path
When building packages for CRIU the source directory might have a
name different than 'criu'.

Fixes: #1877

Reported-by: @siris
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Andrei Vagin
6507ae5331 ci: test the read mode of pre-dump
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
f43dae720a page-xfer: refactoring analyze_iov and fill_userbuf
* handle unexpected errors of process_vm_readv
* adjust riovs in analyze_iov
* call handle_faulty_iov only if process_vm_readv returns EFAULT.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
efeedf3912 pre-dump: call vmsplice with SPLICE_F_GIFT
In this case, vmsplice attaches pages without copying them.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
b2bfb7745d page-xfer: adjust a buffer to a pipe size
Due to side effects of F_SETPIPE_SZ, the actual pipe size can be greater
than PIPE_MAX_SIZE.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
0df0a7dace page-xfer: use negative values for error codes
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
51533d98ac page-pipe: fix limiting a pipe size
But actually, 5a92f100b8 probably has to be reverted as a whole.
PIPE_MAX_SIZE is the hard limit to avoid PAGE_ALLOC_COSTLY_ORDER
allocations in the kernel. But F_SETPIPE_SZ rounds up a requested pipe
size to a power-of-2 number of pages. It means that when we request a
PIPE_MAX_SIZE that isn't a power-of-2 number, we actually request a pipe
size greater than PIPE_MAX_SIZE.
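
The rounding effect can be checked with a little arithmetic. The Python sketch below models the kernel's rounding as "round the byte size up to a power of two, with a floor of one page"; the 4096-byte page size and the 96-page request are assumptions picked for illustration and are not CRIU's actual PIPE_MAX_SIZE.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def round_pipe_size(size):
    """Model of F_SETPIPE_SZ rounding: next power of two, at least one page."""
    if size < PAGE_SIZE:
        return PAGE_SIZE
    return 1 << (size - 1).bit_length()


requested_pages = 96  # hypothetical limit that is not a power of two
actual_pages = round_pipe_size(requested_pages * PAGE_SIZE) // PAGE_SIZE
```

Under these assumptions a 96-page request yields a 128-page pipe, which is how a limit that is not a power of two ends up being exceeded.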

Fixes: 5a92f100b8 ("page-pipe: Resize up to PIPE_MAX_SIZE")

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
ff92731690 crit: Use same version as criu
Name collision with an abandoned project named 'crit' in pypi causes pip
to show crit (CRiu Image Tool) as outdated.  This patch updates crit to
use the same version and license as criu.

Fixes #1878

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
f522adec4a ci: Fix unsafe repository error
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Adrian Reber
4f8f295e57 criu: Version 3.17
Amongst a huge number of fixes all over the place this release introduces:

* mount-v2 engine
* support for MAP_HUGETLB mappings
* support for Linux Restartable Sequences
* support for SOCK_SEQPACKET unix sockets
* CRIU AMD GPU plugin
* setsockopt(SO_BUF_LOCK) support for tcp sockets

Signed-off-by: Adrian Reber <areber@redhat.com>
v3.17
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
991f27c841 ci: skip new hugetlb maps09/maps10 tests for pre-dump
This commit has to be reverted once we fix the issue.

Issue: https://github.com/checkpoint-restore/criu/issues/1868

Reported-by: Mr. Jenkins
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
0c1f0256ff kerndat: handle the case when hugetlb isn't supported
Currently we check memfd_hugetlb by doing memfd_create("", MFD_HUGETLB).
If we see EINVAL we report that it's not supported, but in that case we can
also get an ENOENT error from hugetlb_file_setup() while it is trying
to find a proper hugetlbfs mount.
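
A probe that treats both errno values as "not supported" might look like the Python sketch below. The function name `memfd_hugetlb_supported` and the injectable `create` parameter are assumptions made so the logic can be exercised without hugetlb support; the fallback value 0x0004 for MFD_HUGETLB is taken from the Linux uapi headers.

```python
import errno
import os

MFD_HUGETLB = getattr(os, "MFD_HUGETLB", 0x0004)


def memfd_hugetlb_supported(create=None):
    """Return False on EINVAL or ENOENT, True if the memfd was created."""
    if create is None:
        create = os.memfd_create  # Linux only, Python 3.8+
    try:
        fd = create("", MFD_HUGETLB)
    except OSError as e:
        # Both errors are treated as "hugetlb memfd is not available here".
        if e.errno in (errno.EINVAL, errno.ENOENT):
            return False
        raise
    os.close(fd)
    return True
```

Any other error is re-raised, so an unexpected failure is not silently reported as a missing feature.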

Reference:
https://github.com/torvalds/linux/blob/06fb4ecfeac/fs/hugetlbfs/inode.c#L1465

Fixes: 4245e6b02f ("check: Add a check for using memfd with hugetlb")

Reported-by: Mr. Jenkins (ppc64le)
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
17a19676cd zdtm: handle the case when hugetlb isn't supported
Fixes: e2e02bc83e ("zdtm: Add MAP_HUGETLB memory mapping test")

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
c1380c077a ci: workaround race between sit module loading and bridge test
https://github.com/checkpoint-restore/criu/issues/1866

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
550eafc5d8 ci: print kernel modules list
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Adrian Reber
f635b61f49 test: install criu in /usr
GitHub Actions comes with a pre-installed criu in /usr. Configure scripts
looking for CRIU will pick up the pre-installed version in /usr if we do
not also install the CI criu in /usr.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-05-05 12:42:14 -07:00
Radostin Stoyanov
2f0f128396 readme: Add badge links to workflows
This commit adds a link to the workflow runs for each badge.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Andrey Zhadchenko
d14dbb8c74 sk-unix: rework bind_on_deleted() return codes
The bind_on_deleted() return code is only used for setting errno for pr_perror().
This is mostly useless since a lot of syscalls already set it. All
non-syscall errors already print a message on failure.
Fix bind_on_deleted() always returning 0 and simplify error handling by
returning -1 in case of errors.

Fixes: #1771
Fixes: d0308e5ecc ("sk-unix: make criu respect existing files while restoring ghost unix socket fd")
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
5b872c7183 proc_parse: Fix parsing bpf map_extra
The map_extra field has been introduced in Linux Kernel release 5.16
and does not exist in older kernel versions. The current parsing
implementation fails when map_extra is missing.

In particular, it tries to parse the `memlock` field as `map_extra` and
fails but it does not exit with an error because map_extra is marked as
"optional". It then tries to parse the `map_id` field as `memlock` and
fails with an error because map_id is not optional:

Error (criu/proc_parse.c:2161): parse_fdinfo_pid_s: error parsing [map_type:\t2] for 19: Success'

To handle this correctly, when parsing of `map_extra` fails we should try
to parse the same line as the next field, without reading the next line from
the bpfmap.
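
The idea of retrying the same line against the next field can be sketched in Python. The field table and the function `parse_bpfmap_fdinfo` are illustrative assumptions, not the code in proc_parse.c, and only a subset of the fdinfo fields is modelled.

```python
# (name, optional) pairs in the order the fields are assumed to appear.
FIELDS = [
    ("map_flags", False),
    ("map_extra", True),  # missing on kernels older than 5.16
    ("memlock", False),
    ("map_id", False),
]


def parse_bpfmap_fdinfo(lines):
    """Parse 'name:\\tvalue' lines, skipping optional fields that are absent."""
    parsed = {}
    it = iter(lines)
    line = next(it, None)
    for name, optional in FIELDS:
        key = line.partition(":")[0] if line is not None else None
        if key != name:
            if optional:
                continue  # keep the same line and try the next field
            raise ValueError("error parsing %r as %s" % (line, name))
        parsed[name] = int(line.partition(":")[2].strip(), 0)
        line = next(it, None)  # consume a line only after a successful parse
    return parsed
```

A line is consumed only after it matched a field, so an absent optional field does not shift every later field by one line.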

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
d40b332cef bpf: update deprecated API
bpf_create_map_xattr() has been replaced with bpf_map_create()
https://github.com/libbpf/libbpf/commit/6cfb97c

DECLARE_LIBBPF_OPTS has been renamed to LIBBPF_OPTS
https://github.com/libbpf/libbpf/commit/ea6c242

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
f641e0c4ba ci: print mountinfo instead of mount cmd output
mountinfo contains more info than just "mount" output

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
5c0b4fbcda ci: criu-fault: skip inotify_irmap fault-injection on btrfs
It looks like we've got broken fhandles from fdinfo
for inotifies/fanotifies for btrfs. I will look into that.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
7ac85cab86 scripts/ci: fix ZDTM_OPTS variable passing
We have a separate target for alpine in scripts/ci/Makefile
which defines some extra opts for zdtm using the ZDTM_OPTIONS
variable, but it doesn't actually work. First of all, the variable
should be named ZDTM_OPTS, and we also have to specify
it directly in the CONTAINER_RUNTIME cmdline to make it work.

I've also changed the variable value to make it consistent
with the docker.env value, which was the one really used.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
ead227994b zdtm: temporarily disable rseq02 test
That's strange but rseq02 test fails with:
09:06:16.222:    51: exit 555f52082120 555f52082120
09:06:16.282:    51: exit 555f52082120 555f52082120
09:06:16.340:    51: exit 555f52082120 555f52082120
09:06:16.397:    51: exit 555f52082120 555f52082120
09:06:16.503:    51: exit 0 555f52082120
09:06:16.503:    51: FAIL: rseq02.c:235: Failed to increment per-cpu counter (errno = 2 (No such file or directory))
09:06:16.503:    51: FAIL: rseq02.c:246:  (errno = 16 (Device or resource busy))

It means that the rseq_cs pointer was cleaned up by the kernel despite the
NO_RESTART* flags. This is hard to reproduce and I will investigate it.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
db9ec13616 zdtm: add rseq02 transition test with NO_RESTART CS flag
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
1e0bed3d69 rseq: handle rseq/rseq_cs flags properly
Userspace may configure the rseq cs abort policy by
setting RSEQ_CS_FLAG_NO_RESTART_ON_* flags.

In ("cr-dump: fixup thread IP when inside rseq cs") we added support for
the case when a process was caught by CRIU during rseq cs execution by
fixing up IP to abort_ip. That's the common case, but there is a special flag
called RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL; in this case we have to leave the
process IP as it was before CRIU seized it. Unfortunately, that's not
all that we need here. We also must preserve the (struct rseq)->rseq_cs field.

You may ask: "Why do we need to preserve it by hand? CRIU dumps
all process memory and restores it." That's true, but it's not that easy. The
problem here is that the kernel clears this field when it notices that
the process has left the rseq cs. But during the dump/restore procedures we
execute the parasite/restorer from the process context. It means that the process
will leave the rseq cs in any case and (struct rseq)->rseq_cs will be cleared
by the kernel. So we need to restore this field by hand at the *last* stage
of restore, just before releasing the processes.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
13338dee5c Revert "test: disable rseq also on Archlinux"
This reverts commit f008f74041.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
064e9925a0 zdtm: add transition/rseq01 test for amd64
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
2d3354e7b6 cr-dump: fixup thread IP when inside rseq cs
If we caught the process when it's inside an rseq
critical section, we have to handle it properly.

From the kernel's point of view, if the process
is executing inside the rseq cs and gets a signal,
the rseq critical section execution will be interrupted
and, after the signal handler has run, we will proceed
to the rseq cs abort handler instead of continuing normal
rseq cs execution (if RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
isn't set).

When CRIU seizes a process, that's the same thing as
getting a signal from the rseq point of view. So we need
to fix up the instruction pointer to the rseq cs abort handler
address.
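
The decision can be written down as a small Python sketch. The dictionary layout and the function names are assumptions for illustration; the field names follow the kernel's struct rseq_cs, and the flag value (bit 1) is taken from the Linux uapi header.

```python
RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL = 1 << 1


def in_rseq_cs(ip, start_ip, post_commit_offset):
    """True if ip lies in [start_ip, start_ip + post_commit_offset)."""
    return 0 <= ip - start_ip < post_commit_offset


def fixup_ip(ip, cs):
    """Return the IP the thread should have after being seized.

    cs is None when no critical section is registered, otherwise a dict
    with the keys start_ip, post_commit_offset, abort_ip and flags.
    """
    if cs is None or not in_rseq_cs(ip, cs["start_ip"], cs["post_commit_offset"]):
        return ip  # not inside a critical section: nothing to fix
    if cs["flags"] & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL:
        return ip  # abort is disabled for signals: keep the IP
    return cs["abort_ip"]
```

For a section starting at 0x1000 with a post-commit offset of 0x20 and an abort handler at 0x2000, an IP of 0x1010 is moved to 0x2000 unless the flag is set.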

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
4c7ece0bb7 compel: add helpers to get/set instruction pointer
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
441310c260 zdtm/static/rseq00: fix rseq test when linking with a fresh Glibc
A fresh Glibc registers rseq by default. We need to unregister
rseq before registering our own.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
f70ddab24e pie/restorer: unregister (g)libc rseq before memory restoration
Fresh glibc does rseq registration by default during start_thread().
[ see https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=95e114a0919d844d8fe07839cb6538b7f5ee920e ]

This causes process crashes during the memory restore procedure, because
the memory which holds the struct rseq will be unmapped and overwritten
in __export_restore_task.

Let's perform rseq unregistration just before unmap_old_vmas(). To achieve
that we first need to determine the (struct rseq) address while we are still
in Glibc (we do that in prep_libc_rseq_info using symbols exported by Glibc).

See also
("nptl: Add public rseq symbols and <sys/rseq.h>")
https://sourceware.org/git?p=glibc.git;a=commit;h=c901c3e764d7c7079f006b4e21e877d5036eb4f5
("nptl: Add <thread_pointer.h> for defining __thread_pointer")
https://sourceware.org/git?p=glibc.git;a=commit;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5

TODO: do the same for musl-libc if it will start to register rseq by default

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
e1799e5305 include: add thread_pointer.h from Glibc
Let's take the thread_pointer() implementation from Glibc.
It will be useful in the future because Glibc stores
struct rseq in the TLS. The absolute address can be calculated
as __criu_thread_pointer() + __rseq_offset.
__rseq_offset is a symbol exported by Glibc itself.

We need to be able to determine where struct
rseq is stored in order to unregister it in CRIU during the restore
stage.

For a different libc, like musl-libc, we will have to handle
rseq separately, depending on how struct rseq is stored.

Right now that's not a problem because musl-libc has no
rseq support, so we don't need to unregister it.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=cb976fba4c51ede7bf8cee5035888527c308dfbc

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
267c1fdade ci: add Fedora Rawhide based test on Cirrus
We have the ability to use nested virtualization on
Cirrus, and already have the "Vagrant Fedora based test (no VDSO)"
test, so let's do the same for Fedora Rawhide to get a fresh kernel.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
03aff7e823 Revert "ci: disable glibc rseq support"
Let's see how rseq() C/R feature works

This reverts commit d99def7dcf.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00