The restore of a task with shadow stack enabled adds these steps:
* switch from the default shadow stack to a temporary shadow stack
allocated in the premmaped area
* unmap CRIU mappings; nothing changed here, but it's important that
CRIU mappings can be removed only after switching to a temporary
shadow stack
* create shadow stack VMA with map_shadow_stack()
* restore shadow stack contents with wrss
* switch to "real" shadow stack
* lock shadow stack features
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Note: Silently drops MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED as it's
not currently detectable. This is still better than silently dropping
all membarrier() registrations.
Signed-off-by: Michał Mirosław <emmir@google.com>
Will use this for cross mount namespace bindmounts.
Note: don't need separate kdat for mount-v2, as MOVE_MOUNT_SET_GROUP
were added much later than open_tree and all related fixups.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Mounts-v2 requires new kernel feature MOVE_MOUNT_SET_GROUP to be able to
restore propagation between mounts right.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7da7f9a17
Changes: define move_mount syscall, check mainstream kernel
MOVE_MOUNT_SET_GROUP feature, use our "linux/mount.h" to overcome
possible problems of non-existing header on older kernels.
v3: coverity CID 389201: check ret of umount2 and rmdir at cleanup stage
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
pidfd_getfd syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- check size written/read of val_a/val_b is correct
- return with error when val_a != val_b
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_open syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- make kerndat_has_pidfd_open void since 0 is always returned
- fix missing tabs in syscall tables
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
musl defines 'loff_t' in fcntl.h as 'off_t'.
This patch resolves the following error when running the compel tests
on Alpine Linux:
gcc -O2 -g -Wall -Werror -c -Wstrict-prototypes -fno-stack-protector -nostdlib -fomit-frame-pointer -ffreestanding -fpie -I ../../../compel/include/uapi -o parasite.o parasite.c
In file included from ../../../compel/include/uapi/compel/plugins/std/syscall.h:8,
from ../../../compel/include/uapi/compel/plugins/std.h:5,
from parasite.c:3:
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:19:66: error: unknown type name 'loff_t'; did you mean 'off_t'?
19 | extern long sys_pread (unsigned int fd, char *buf, size_t count, loff_t pos) ;
| ^~~~~~
| off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:46: error: unknown type name 'loff_t'; did you mean 'off_t'?
96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
| ^~~~~~
| off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:61: error: unknown type name 'loff_t'; did you mean 'off_t'?
96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
| ^~~~~~
| off_t
make[1]: *** [Makefile:32: parasite.o] Error 1
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Linux kernel 5.4 extends clone3() with set_tid to allow processes to
specify the PID of a newly created process. This introduces detection
of the clone3() syscall and if set_tid is supported.
This first implementation is X86_64 only.
Signed-off-by: Adrian Reber <areber@redhat.com>
I've mentioned the problem that after c/r each inotify receives one or
more unexpected events.
This happens because our algorithm mixes setting up an inotify watch on
the file with opening and closing it.
We mix inotify creation and watched file open/close because we need to
create the inotify watch on the file from another mntns (generally). And
we do a trick opening the file so that it can be referenced in current
mntns by /proc/<pid>/fd/<id> path.
Moreover if we have several inotifies on the same file, than queue gets
even more events than just one which happens in a simple case.
note: For now we don't have a way to c/r events in queue but we need to
at least leave the queue clean from events generated by our own.
These, still, looks harder to rewrite wd creation without this proc-fd
trick than to remove unexpected events from queues.
So just cleanup these events for each fdt-restorer process, for each of
its inotify fds _after_ restore stage (at CR_STATE_RESTORE_SIGCHLD).
These is a closest place where for an _alive_ process we know that all
prepare_fds() are done by all processes. These means we need to do the
cleanup in PIE code, so need to add sys_ppoll definitions for PIE and
divide process in two phases: first collect and transfer fds, second do
real cleanup.
note: We still do prepare_fds() for zombies. But zombies have no fds in
/proc/pid/fd so we will collect no in collect_fds() and therefore we
have no in prepare_fds(), thus there is no need to cleanup inotifies for
zombies.
v2: adopt to multiple unexpected events
v3: do not cleanup from fdt-receivers, done from fdt-restorer
v4: do without additional fds restore stage
v5: replace sys_poll with sys_ppoll and fix minor nits
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
use ppoll always and remove poll
This reduces memory usage if image files are stored on tmpfs.
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Commit 37e4c7bfc264 fixed arm, ppc, x86 (32bit),
while it made wrong definition of x86_64. Fix that.
Also, add commentary to raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It has two arguments "pos_l and "pos_h" instead of one "off". It is used
to handle 64-bit offsets on 32-bit kernels.
SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
https://github.com/checkpoint-restore/criu/issues/424
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The right order for all of our 4 archs is:
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
unsigned long, tls,
int __user *, child_tidptr)
See Linux kernel for the details.
Note, this is just a fix, and it's not connected with the second patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The objective is to only do parasite code linking once -- when we link
parasite objects with compel plugin(s). So, let's use ar (rather than
ld) here. This way we'll have a single ld invocation with the proper
flags (from compel ldflags) etc.
There are two tricks in doing it:
1. The order of objects while linking is important. Therefore, compel
plugins should be the last to add to ld command line.
2. Somehow ld doesn't want to include parasite-head.o in the output
(probably because no one else references it), so we have to force
it in with the modification to our linker scripts.
NB: compel makefiles are still a big mess, but I'll get there.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
And use it in CRIU directly instead:
- move syscalls into compel/arch/ARCH/plugins/std/syscalls
- drop old symlinks
- no build for 32bit on x86 as expected
- use std.built-in.o inside criu directly (compel_main stub)
- drop syscalls on x86 criu directory, I copied them already
in first compel commist, so we can't move them now, but
delete in place
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The prologue includes routines needed for parasite blob to work
and is always included with the std plugin.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The plugin provides basic features as string copying, syscalls, printing.
Not used on its own by now but will be shipping by default with other
plugins.
With great help from Dmitry Safonov.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>