* handle unexpected errors of process_vm_readv
* adjust riovs in analyze_iov
* call handle_faulty_iov only if process_vm_readv returns EFAULT.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
But actually, 5a92f100b88e probably has to be reverted as a whole.
PIPE_MAX_SIZE is the hard limit to avoid PAGE_ALLOC_COSTLY_ORDER
allocations in the kernel. But F_SETPIPE_SZ rounds up a requested pipe
size to a power-of-2 pages. It means that when we request PIPE_MAX_SIZE
that isn't a power-of-2 number, we actually request a pipe size greater
than PIPE_MAX_SIZE.
Fixes: 5a92f100b88e ("page-pipe: Resize up to PIPE_MAX_SIZE")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Name collision with an abandoned project named 'crit' in pypi causes pip
to show crit (CRiu Image Tool) as outdated. This patch updates crit to
use the same version and license as criu.
Fixes#1878
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Amongst a huge number of fixes all over the place this release introduces:
* mount-v2 engine
* support for MAP_HUGETLB mappings
* support for Linux Restartable Sequences
* support for SOCK_SEQPACKET unix sockets
* CRIU AMD GPU plugin
* setsockopt(SO_BUF_LOCK) support for tcp sockets
Signed-off-by: Adrian Reber <areber@redhat.com>
Currently we check memfd_hugetlb by doing memfd_create("", MFD_HUGETLB).
If we see EINVAL we report that it's not supported, but we can also
get ENOENT error in such case in hugetlb_file_setup() while trying
to find proper hugetlbfs mount.
Reference:
https://github.com/torvalds/linux/blob/06fb4ecfeac/fs/hugetlbfs/inode.c#L1465
Fixes: 4245e6b02fa ("check: Add a check for using memfd with hugetlb")
Reported-by: Mr. Jenkins (ppc64le)
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
GitHub Actions comes with pre-installed criu in /usr. configure scripts
looking for CRIU will pickup the pre-installed version in /usr if we do
not install CI criu also in /usr.
Signed-off-by: Adrian Reber <areber@redhat.com>
bind_on_delete() return code is only used for setting errno for pr_perror()
This is mostly useless since a lot of syscalls already set it. All of
non-syscall errors already have prints in case of failure.
Fix bind_on_deleted() always returning 0 and simplify error juggling to
returning -1 in case of errors.
Fixes: #1771
Fixes: d0308e5ecc1c ("sk-unix: make criu respect existing files while restoring ghost unix socket fd")
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
The map_extra field has been introduced in Linux Kernel release 5.16
and does not exist in older kernel versions. The current parsing
implementation fails when map_extra is missing.
In particular, it tries to parse the `memlock` field as `map_extra` and
fails but it does not exit with an error because map_extra is marked as
"optional". It then tries to parse the `map_id` field as `memlock` and
fails with an error because map_id is not optional:
Error (criu/proc_parse.c:2161): parse_fdinfo_pid_s: error parsing [map_type:\t2] for 19: Success'
To correctly handle this, we should try to parse again the next field
when parsing of `map_extra` fails, without reading the next line from
the bpfmap.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
It looks like we've got broken fhandles from fdinfo
for inotifies/fanotifies for btrfs. I will look into that.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
We have a separate target for alpine in script/ci/Makefile
which defines some extra opts for zdtm using ZDTM_OPTIONS
variable. But really it doesn't work. First of all, variable
should be named as ZDTM_OPTS and also we have to specify
it directly in the CONTAINER_RUNTIME cmdline to make it work.
I've also changed variable value just to make it consistent
with docker.env value which was really used.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
That's strange but rseq02 test fails with:
09:06:16.222: 51: exit 555f52082120 555f52082120
09:06:16.282: 51: exit 555f52082120 555f52082120
09:06:16.340: 51: exit 555f52082120 555f52082120
09:06:16.397: 51: exit 555f52082120 555f52082120
09:06:16.503: 51: exit 0 555f52082120
09:06:16.503: 51: FAIL: rseq02.c:235: Failed to increment per-cpu counter (errno = 2 (No such file or directory))
09:06:16.503: 51: FAIL: rseq02.c:246: (errno = 16 (Device or resource busy))
It means that rseq_cs pointer was cleaned up by the kernel despite of
NO_RESTART* flags. That's a hardly reproducible and I will investigate that.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Userspace may configure rseq cs abort policy by
setting RSEQ_CS_FLAG_NO_RESTART_ON_* flags.
In ("cr-dump: fixup thread IP when inside rseq cs") we have supported
the case when process was caught by CRIU during rseq cs execution by
fixing up IP to abort_ip. Thats a common case, but there is special flag
called RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL, in this case we have to leave
process IP as it was before CRIU seized it. Unfortunately, that's not
all that we need here. We also must preserve (struct rseq)->rseq_cs field.
You may ask like "why we need to preserve it by hands? CRIU is dumping
all process memory and restores it". That's true. But not so easy. The problem
here is that the kernel performs this field cleanup when it realized that
the process gets out of rseq cs. But during dump/restore procedures we are
executing parasite/restorer from the process context. It means that process
will get out of rseq cs in any case and (struct rseq)->rseq_cs will be cleared
by the kernel. So we need to restore this field by hands at the *last* stage
of restore just before releasing processes.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
If we caught the process when it's inside rseq
critical section we have to handle it properly.
From the kernel side of view, if the process
is executing inside the rseq cs and gets a signal,
rseq critical section execution will be interrupted
and after signal handler execution, we will proceed
to rseq cs abort handler instead of continuing normal
rseq cs execution (if RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
isn't set).
When CRIU seizes processes that's the same thing as
getting signal from the rseq point of view. So we need
to fixup instruction pointer to rseq cs abort handler
address.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Fresh Glibc does rseq() register by default. We need to unregister
rseq before registering our own.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Let's take thread_pointer() implementation from Glibc.
It will be useful in the further because Glibc stores
struct rseq on the TLS. Absolute address can be calculated
as __criu_thread_pointer() + __rseq_offset.
__rseq_offset is an exported symbol from Glibc itself.
We need to have an ability to determine where struct
rseq is stored to unregister it in CRIU during the restore
stage.
For different libc like musl-libc we will have to handle
rseq separately depends on how struct rseq is stored.
Right now that's not a problem because musl-libc has no
rseq support, so we don't need to unregister it.
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=cb976fba4c51ede7bf8cee5035888527c308dfbc
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
We have ability to use nested virtualization on
Cirrus, and already have "Vagrant Fedora based test (no VDSO)"
test, let's do analogical for Fedora Rawhide to get fresh kernel.
Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Let's see how rseq() C/R feature works
This reverts commit d99def7dcfa938918368c91021f72a77f738bc61.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Here we just want to check that if rseq was registered before C/R
it remains registered after it.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
A lot of kernel versions lacks support for ptrace(PTRACE_GET_RSEQ_CONFIGURATION).
But the userspace may be fresh (for instance containers with fresh Fedora runs
on CentOS 7 host). Consider two scenarious:
- kernel has no ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support
1. there is a process which use rseq => fail dump
2. there is no process which use rseq => we can dump without any problems
But how to determine if process use rseq or not without get_rseq_conf feature?
Let's just try to do rseq registration from the parasite. If rseq is already
registered then we'll got EBUSY error. If not we'll success in registration.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Support basic rseq C/R scenario. Assume that:
- there are no processes with IP inside the rseq critical section (CS)
- kernel has ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support
On dump:
1. use ptrace(PTRACE_GET_RSEQ_CONFIGURATION) to get
struct rseq pointer, rseq size and signature from the kernel.
2. save to the image
On restore:
1. get rseq ptr, size, signature from the image
2. register it back using rseq() from the restorer parasite
Fixes: #1696
Reported-by: Radostin Stoyanov <radostin@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Add "get_rseq_conf" feature corresponding to the
ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
The code expected that the cgroup directory ends with a ',' and
unconditionally removes the last character. For the "unified" case this
resulted in the last 'd' being remove instead of the non existing comma.
This just adds a comma after "unified" so that the last removed
character is not the 'd'.
Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Those that codespell have a few variants for:
./soccr/soccr.c:219: thise ==> these, this
./soccr/soccr.c:444: sence ==> sense, since
./criu/net.c:665: ot ==> to, of, or
./criu/net.c:775: ot ==> to, of, or
./criu/files.c:1244: wan't ==> want, wasn't
./criu/kerndat.c:1141: happend ==> happened, happens, happen
./criu/mount-v2.c:781: carefull ==> careful, carefully
./test/zdtm/static/socket_aio.c:54: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen4v6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/sk-unix-dgram-ghost.c:201: childs ==> children, child's
./test/zdtm/static/sk-unix-dgram-ghost.c:205: childs ==> children, child's
./compel/arch/x86/src/lib/infect.c:297: automatical ==> automatically, automatic, automated
While at it, do some other minor fixes in the same lines.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I am not sure if this is going to bring any compatibility issues.
If yes, we need to remove this patch and add "useable" to the list of
ignored words instead.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>