Breakpoints are used to stop as close as possible to a target system call.
First, we don't need it after this point.
Second, PTRACE_CONT can't pass through a breakpoint on arm64.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
When delivering system call traps, set bit 7 in the signal number (i.e.,
deliver SIGTRAP|0x80). This makes it easy for the tracer to distinguish
normal traps from those caused by a system call.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
1. Rename CentOS 8 to CentOS Stream 8 (which it is).
2. Install junit_xml from the repo rather than via pip.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Mostly a copy-paste from the CentOS 8 task, with a few differences:
- Use dnf instead of yum
- Enable crb instead of powertools
- Different way of installing EPEL
- No need to switch to python3 as this is the default
- junit_xml is now available as an rpm
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.
This problem is unrelated to criu and has been reported in
https://github.com/moby/moby/issues/42900
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Let's use dynamic approach to detect built-in *libc rseq in all cases,
and "old" static approach as a fallback path if the user kernel
lacks support of ptrace_get_rseq_conf feature.
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Before this patch we assumed that CRIU is compiled against
the same GLibc as it runs with. But as we see from real
world examples like #1935 it's not always true.
The idea of this patch is to detect rseq configuration
for the main CRIU process and use it to unregister
rseq for all further child processes. It's correct,
because we restore pstree using clone*() syscalls,
don't use exec*() (!) syscalls, so rseq gets inherited
in the kernel and rseq configuration remains the same
for all children processes.
This will prevent issues like this:
https://github.com/checkpoint-restore/criu/issues/1935
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
The result of check_aa_ns_dumping() is stored in kdat. Instead of doing
the same check twice - once on kerndat_init(), and again in
check_apparmor_stacking(), we can check the stored value.
Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The feature check for AppArmor stacking was introduced in
commit:
8723e3f998d1ec5f125e6600436a96f7ff9c1631
check: add a feature test for apparmor_stacking
However, on systems that don't support AppArmour, this check always
fails. As a result, `criu check --all` shows the following message:
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
Reported-by: André Rösti (@andrej)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
In commits [1, 2] the version of containerd installed by default in the
GitHub CI virtual environment was replaced with the latest release from
GitHub as a workaround to a bug in containerd. This bug has been fixed
sometime ago and the current default version of containerd (1.6.6) does
not require this workaround. However, with the latest release, the
containerd binaries uploaded on GitHub have been built for Ubuntu 22.04
[3]. Our tests are still running on Ubuntu 20.04 and this results in the
following error:
/usr/bin/containerd: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /usr/bin/containerd)
/usr/bin/containerd: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /usr/bin/containerd)
[1] https://github.com/checkpoint-restore/criu/commit/046cad8
[2] https://github.com/checkpoint-restore/criu/commit/81a68ad
[3] https://github.com/containerd/containerd/commit/6b2dc9a37
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
There are several changes in glibc 2.36 that make sys/mount.h header
incompatible with kernel headers:
https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E
This patch removes conflicting includes for `<linux/mount.h>` and
updates the content of `criu/include/linux/mount.h` to match
`/usr/include/sys/mount.h`. In addition, inline definitions sys_*()
functions have been moved from "linux/mount.h" to "syscall.h" to
avoid conflicts with `uapi/compel/plugins/std/syscall.h` and
`<unistd.h>`. The include for `<linux/aio_abi.h>` has been replaced
with local include to avoid conflicts with `<sys/mount.h>`.
Fixes: #1949
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
We need to pass environment variables from the CI environment to
distinguish between CI environments. However, when `sudo -E` is
used to run Podman it results in the XDG_RUNTIME_DIR environment
variable being set incorrectly that prevents Podman from running.
This patch fixes the following error in the GitHub Action virtual
environment:
error running container: error from /usr/bin/crun creating
container for [/bin/sh -c /bin/prepare-for-fedora-rawhide.sh]:
sd-bus call: Connection reset by peer
Fixes: #1942
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
I've been contributing to CRIU for sometime and I'm hoping that my
familiarity with the project would be sufficient to self-nominate as a
maintainer. I would like to help with code reviews, submitting patches,
implementing new features, and maintaining the project in general.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
ghost_holes_large00 is a test which creates a large ghost sparse file with 1GiB
hole(pwrite can only handle 2GiB maximum on 32-bit system) and 8KiB data, criu
should be able to handle this kind of situation.
ghost_holes_large01 is a test which creates a large ghost sparse file with 1GiB
hole and 2MiB data, since 2MiB is larger than the default ghost_limit(1MiB),
criu should fail on this test.
v2: fix overflow on 32-bit arch.
Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
unlink_largefile test
In the past, the unlink_largefile test should be fail on large ghost file.
However, it used sparse file, it will pass in current criu, since the large
ghost sparse file issue was fixed.
So the crfail flag of this test should be removed.
Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
files-reg.c checks whether the file size is larger than ghost_limit with st_size
(in dump_ghost_remap), which can not deal with large ghost sparse file, since
its actual file size is not the same as what st_size shows.
Therefore, in this commit, I replace st_size with st_blocks, which shows the
actual file size. (1 block = 512B), thus criu can deal with large ghost sparse
file.
Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
This test specifically wants to create external bind-mount of "/" from
criu mntns to test mntns, and it wants "/" in criu mntns to be a shared
mount so that "external" mount in the test mntns is it's slave. This is
to triger specific dirname() resolution which happens only when sharing
restore is involved for external mounts, and only if rootfs is involved.
But initially I missed that when we create external mount in test's
temporary mntns it creates a propagation in criu mntns on top of root
mount. This mount may influence other tests restore as child mount in
root mount converts to locked child mount in criu service mntns (for uns
flavour) and when criu would restore root container mount it would fail
with EINVAL on non recursive bind with locked children.
To fix this mess we just need to prohibit propagating from tests
temporary mntns to criu mntns by making mounts slave.
Fixes: #1941
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
If root mount in criu mntns is slave, it would be slave of host mount
where criu is stored, so if someone mounts something in subdir of
{criu-dir}/test/ on host while tests are running this mount can
influence the test as it appears on top of root mount in criu mntns.
1) With mount-compat this mount can get into restored test mntns, which
means wrong restore, as this mount was not there on dump.
2) With mount-v2 this mount would just fail container restore, as root
container mount is mounted non-recursively to protect from unexpected
mounts appear after restore.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
On Arch Linux with 5.18.3-zen1-1-zen kernel, the vdso's size is 3 pages which
exceeds the current 2-page reserved buffer. This commit simply increases the
reserved buffer size to 4 pages.
Fixes: https://github.com/checkpoint-restore/criu/issues/1916
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Normally, vsyscall vma has VM_READ, VM_EXEC permission. However, when
CONFIG_LEGACY_VSYSCALL_XONLY=y, that vma only has VM_EXEC. This commit removes
the permission part when checking to skip vsyscall vma in x32 tests.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Error from:
./test/zdtm.py run -t zdtm/static/fpu00 --fault 134 -f h --norst
(00.003111) Dumping GP/FPU registers for 56
(00.003121) Error (compel/arch/x86/src/lib/infect.c:310): Corrupting fpuregs for 56, seed 1651766595
(00.003125) Error (compel/arch/x86/src/lib/infect.c:314): Can't set FPU registers for 56: Invalid argument
(00.003129) Error (compel/src/lib/infect.c:688): Can't obtain regs for thread 56
(00.003174) Error (criu/cr-dump.c:1564): Can't infect (pid: 56) with parasite
See also:
145e9e0d8c6 ("x86/fpu: Fail ptrace() requests that try to set invalid MXCSR values")
145e9e0d8c
We decided to move from mxcsr cleaning up scheme and use mxcsr mask
(0x0000ffbf) as kernel does. Thanks to Dmitry Safonov for pointing out.
Tested-on: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz
Reported-by: Mr. Jenkins
Suggested-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
1. For some reason, Marier distribution headers
not correctly define __GLIBC_HAVE_KERNEL_RSEQ
compile-time constant. It remains undefined,
but in fact header files provides corresponding
rseq types declaration which leads to conflict.
2. Another issue, is that they use uint*_t types
instead of __u* types as in original rseq.h.
This leads to compile time issues like this:
format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'uint64_t' {aka 'long unsigned int'}
and we can't even replace %llx to %PRIx64 because it will break
compilation on other distros (like Fedora) with analogical error:
error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long long unsigned int’}
Let's use our-own struct rseq copy fully equal to the kernel one,
it's safe because this structure is a part of Linux Kernel ABI.
Fixes#1934
Reported-by: Nikola Bojanic
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Add a simple test using tail to check that processes can't be restored
by default when the r/w/x mode of an open file changes, unless
--skip-file-rwx-check is used.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
A file's r/w/x changing between checkpoint and restore does
not necessarily imply that something is wrong. For example,
if a process opens a file having perms rw- for reading and
we change the perms to r--, the process can be restored and
will function as expected.
Therefore, this patch adds an option
--skip-file-rwx-check
to disable this check on restore. File validation is unaffected
and should still function as expected with respect to the content
of files.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
stopped03 check that stopped by SIGTSTP tasks are restored correctly.
stopped04 check that stopped by SIGSTOP tasks which have blocked SIGTSTP and
have SIGTSTP pending are restored correctly.
Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@openvz.org>
Add SIGTSTP signal dump and restore. Add a corresponding field
in the image, save it only if a task is in the stopped state.
Restore task state by sending desired stop signal if it is present
in the image. Fallback to SIGSTOP if it's absent.
Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@openvz.org>
Else we trigger BUG in task_reset_dirty_track():
Error (criu/mem.c:45): BUG at criu/mem.c:45
The check in kerndat_get_dirty_track() does not work right.
https://github.com/checkpoint-restore/criu/issues/1917
Reported-by: @mrc1119
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Currently, the content of anonymous private hugetlb mapping is dumped in 2
different images: memfd approach and normal private mapping dumping. In memfd
approach, we dump the content of the backing pseudo file (/anon_hugepage). This
is incorrect and redundant since the mapping is private, the content of backing
file may differ from the content of the mapping. With this commit, we remove the
redundant memfd approach dump and only do the normal private mapping dump on
anonymous hugetlb mapping.
Run zdtm.py run -f h --keep-img always -t zdtm/static/maps09, du -h in the
dumped image directory
Before this commit
13M test/dump/zdtm/static/maps09/55/1
After this commit
8.5M test/dump/zdtm/static/maps09/55/1
The reduction in size is approximately 4MB which is the size of anonymous
private hugetlb mapping in the test.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Before this patch, if we had a unixsk with incomming scm packets (with
fds) and with the sender side fd closed, we got an error:
Error (criu/sk-unix.c:1125): unix: Can't find sender for 0x1e
First part of the problem is that unix_note_scm_rights() expects to see
a "queuer" which would send scm packets to the unixsk, and there is no
as the sender side is closed.
Second part of the problem is that we already have "fake" queuers
feature so that it already creates a unix socket pair and leaves other
end open for later queuing packets. But function add_fake_unix_queuers()
is called after unix_note_scm_rights() thus there is no chance to find
queuer at the point of failure.
Third part is that when we look for a queuer in find_queuer_for() we
actually look for a socket for which we are a queuer and not for the
socket which is a queuer for us, which is opposite to the name. For
cases where both ends are alive both are queuers for each other so this
was not important, but for our closed sender case it breaks.
So let's reorder add_fake_unix_queuers() before unix_note_scm_rights()
and make find_queuer_for() actually do what it's name implies.
This situation is started to reproduce on Virtuozzo start/stop tests
with the unixsk belonging to systemd, we suppose that this state where
the sender fd side is closed happens rarely only on systemd start/stop,
so we don't see it in regular suspend resume of long-living containers.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
criu-ns script incorrectly compares the pidns fd with mntns fd.
Also reversed the condition in is_my_namespace function to align it
with the function name.
Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
As private hugetlb mappings are not pre-mapped, the content of them is restored
in the the restorer which cannot use page_read->read_pages. As a result, we
cannot recursively read the content of pre-dumped image in the parent directory
and use preadv to read the content from the last dumped image only. Therefore,
it may freeze while restoring when the content of mapping is in pre-dumped image
in parent directory.
We need to skip pre-dumping on hugetlb mappings to resolve the issue.
Suggested-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
It can be confusing to see error from post-dump action script and non
zero return from criu though at the same time see "Dumping finished
successfully" in log. I believe it is logical to consider post-dump
action script as a part of "dump" process so fail in it means that the
whole dump failed.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
* Fixes for pre-dump read mode
* Fixes for mount-v2
* amdgpu plugin build and installation fixes
* Some minor CI related fixes
Signed-off-by: Adrian Reber <areber@redhat.com>
This test has one external mount [criumntns] /zdtm_root_ext.tmp ->
[testmntns] /mnt_root_ext.test, and it specifically gives '--external
mnt[MNT]:.zdtm_root_ext.tmp' option on restore without '/' to make
dirname on it return static '.' path (see glibc dirname() code) and
reproduce a segfault in resolve_mountpoint().
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Else we have a Segmentation fault in __move_mount_set_group() on
xfree(source_mp) if resolve_mountpoint() returned statically allocated
path.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
It's a problem when while restoring sharing group we need to copy
sharing between two mounts with non-intersecting roots, because kernel
does not allow it.
We have a case https://github.com/opencontainers/runc/pull/3442, where
runc adds different devtmpfs file-bindmounts to container and there is
no fsroot mount in container for this devtmpfs, thus mount-v2 faces the
above problem.
Luckily for the case of external mounts which are in one sharing group
and which have non-intersecting roots, these mounts likely only have
external master with no sharing, so we can just copy sharing from
external source and make it slave as a workaround.
https://github.com/checkpoint-restore/criu/issues/1886
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This helper restores master_id and shared_id of first mount in the
sharing group. It first copies sharing from either external source or
internal parent sharing group and makes master_id from shared_id. Next
it creates new shared_id when needed.
All other mounts except first are just copied from the first one.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Building the criu packages for Ubuntu/Debian fails with:
mkdir: cannot create directory '/var/lib/criu': Permission denied
This patch updates PLUGINDIR with the value /usr/lib/criu
Fixes: #1877
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
When building packages for CRIU the source directory might have a
name different than 'criu'.
Fixes: #1877
Reported-by: @siris
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>