The restore of a task with shadow stack enabled adds these steps:
* switch from the default shadow stack to a temporary shadow stack
allocated in the premmaped area
* unmap CRIU mappings; nothing changed here, but it's important that
CRIU mappings can be removed only after switching to a temporary
shadow stack
* create shadow stack VMA with map_shadow_stack()
* restore shadow stack contents with wrss
* switch to "real" shadow stack
* lock shadow stack features
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
To support sigreturn with CET enabled parasite must rewind its stack
before calling sigreturn so that shadow stack will be compatible with
actual calling sequence.
In addition, calling sigreturn from top level routine
(__export_parasite_head_start) will significantly simplify the shadow
stack manipulations required to execute sigreturn.
For x86 make fini_sigreturn() return the stack pointer for the signal
frame that will be used by sigreturn and propagate that return value up
to __export_parasite_head_start.
In non-daemon mode parasite_trap_cmd() returns non-positive value
which allows to distinguish daemon and non-daemon mode and properly stop
at int3 in non-daemon mode.
Architectures other than x86 remain unchanged and will still call
sigreturn from fini_sigreturn().
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Note: Silently drops MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED as it's
not currently detectable. This is still better than silently dropping
all membarrier() registrations.
Signed-off-by: Michał Mirosław <emmir@google.com>
I am not sure if this is going to bring any compatibility issues.
If yes, we need to remove this patch and add "useable" to the list of
ignored words instead.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Will use this for cross mount namespace bindmounts.
Note: don't need separate kdat for mount-v2, as MOVE_MOUNT_SET_GROUP
were added much later than open_tree and all related fixups.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Mounts-v2 requires new kernel feature MOVE_MOUNT_SET_GROUP to be able to
restore propagation between mounts right.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7da7f9a17
Changes: define move_mount syscall, check mainstream kernel
MOVE_MOUNT_SET_GROUP feature, use our "linux/mount.h" to overcome
possible problems of non-existing header on older kernels.
v3: coverity CID 389201: check ret of umount2 and rmdir at cleanup stage
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
pidfd_getfd syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- check size written/read of val_a/val_b is correct
- return with error when val_a != val_b
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_open syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- make kerndat_has_pidfd_open void since 0 is always returned
- fix missing tabs in syscall tables
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
musl defines 'loff_t' in fcntl.h as 'off_t'.
This patch resolves the following error when running the compel tests
on Alpine Linux:
gcc -O2 -g -Wall -Werror -c -Wstrict-prototypes -fno-stack-protector -nostdlib -fomit-frame-pointer -ffreestanding -fpie -I ../../../compel/include/uapi -o parasite.o parasite.c
In file included from ../../../compel/include/uapi/compel/plugins/std/syscall.h:8,
from ../../../compel/include/uapi/compel/plugins/std.h:5,
from parasite.c:3:
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:19:66: error: unknown type name 'loff_t'; did you mean 'off_t'?
19 | extern long sys_pread (unsigned int fd, char *buf, size_t count, loff_t pos) ;
| ^~~~~~
| off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:46: error: unknown type name 'loff_t'; did you mean 'off_t'?
96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
| ^~~~~~
| off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:61: error: unknown type name 'loff_t'; did you mean 'off_t'?
96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
| ^~~~~~
| off_t
make[1]: *** [Makefile:32: parasite.o] Error 1
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
We don't need to push 0 on the stack. This seems to be a remnant of the
initial commit of 2011 that had a `pushq $0`.
The 16 bytes %rsp alignment was added with commit 2a0cea29 in 2012.
This is no longer necessary as we already guarantee that %rsp is 16
bytes aligned. A BUG_ON() is added to enforce this guarantee.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Previously, __export_parasite_cmd was located in parasite-head.S, and
__export_parasite_args located exactly at the end of the parasite blob.
This is not ideal for various reasons:
1) These two variables work together. It would be preferrable to have
them in the same location
2) This prevent us from allocating another section betweeen the parasite
blob and the args area. We'll need this to allocate a GOT table
This commit changes the allocation of these symbols from assembly/linker
script to a C file.
Moreover, the assembly entry points that invoke parasite_service()
prepares arguments with hand crafted assembly. This is unecessary.
This commit rewrite this logic with regular C code.
Note: if it wasn't for the x86 compat mode, we could remove all
parasite-head.S files and directly jump to parasite_service() via
ptrace. An int3 architecture specific equivalent could be called at the
end of parasite_service() with an inline asm statement.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Linux kernel 5.4 extends clone3() with set_tid to allow processes to
specify the PID of a newly created process. This introduces detection
of the clone3() syscall and if set_tid is supported.
This first implementation is X86_64 only.
Signed-off-by: Adrian Reber <areber@redhat.com>
`pushq` sign-extends the value. Which is a bummer as the label's address
may be higher that 2Gb, which means that the sign-bit will be set.
As it long-jumps with ia32 selector, %r11 can be scratched.
Use %r11 register as a temporary to push the 32-bit address.
Complements: a9a760278c ("arch/x86: push correct eip on the stack
before lretq")
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
I've mentioned the problem that after c/r each inotify receives one or
more unexpected events.
This happens because our algorithm mixes setting up an inotify watch on
the file with opening and closing it.
We mix inotify creation and watched file open/close because we need to
create the inotify watch on the file from another mntns (generally). And
we do a trick opening the file so that it can be referenced in current
mntns by /proc/<pid>/fd/<id> path.
Moreover if we have several inotifies on the same file, than queue gets
even more events than just one which happens in a simple case.
note: For now we don't have a way to c/r events in queue but we need to
at least leave the queue clean from events generated by our own.
These, still, looks harder to rewrite wd creation without this proc-fd
trick than to remove unexpected events from queues.
So just cleanup these events for each fdt-restorer process, for each of
its inotify fds _after_ restore stage (at CR_STATE_RESTORE_SIGCHLD).
These is a closest place where for an _alive_ process we know that all
prepare_fds() are done by all processes. These means we need to do the
cleanup in PIE code, so need to add sys_ppoll definitions for PIE and
divide process in two phases: first collect and transfer fds, second do
real cleanup.
note: We still do prepare_fds() for zombies. But zombies have no fds in
/proc/pid/fd so we will collect no in collect_fds() and therefore we
have no in prepare_fds(), thus there is no need to cleanup inotifies for
zombies.
v2: adopt to multiple unexpected events
v3: do not cleanup from fdt-receivers, done from fdt-restorer
v4: do without additional fds restore stage
v5: replace sys_poll with sys_ppoll and fix minor nits
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
use ppoll always and remove poll
This reduces memory usage if image files are stored on tmpfs.
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Commit 37e4c7bfc264 fixed arm, ppc, x86 (32bit),
while it made wrong definition of x86_64. Fix that.
Also, add commentary to raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It has two arguments "pos_l and "pos_h" instead of one "off". It is used
to handle 64-bit offsets on 32-bit kernels.
SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
https://github.com/checkpoint-restore/criu/issues/424
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The right order for all of our 4 archs is:
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
unsigned long, tls,
int __user *, child_tidptr)
See Linux kernel for the details.
Note, this is just a fix, and it's not connected with the second patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The objective is to only do parasite code linking once -- when we link
parasite objects with compel plugin(s). So, let's use ar (rather than
ld) here. This way we'll have a single ld invocation with the proper
flags (from compel ldflags) etc.
There are two tricks in doing it:
1. The order of objects while linking is important. Therefore, compel
plugins should be the last to add to ld command line.
2. Somehow ld doesn't want to include parasite-head.o in the output
(probably because no one else references it), so we have to force
it in with the modification to our linker scripts.
NB: compel makefiles are still a big mess, but I'll get there.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It was never functional neither we plan to support
native ia32 mode, so drop these incomplete code
pieces out.
- Presumably we will need TASK_SIZE for compat
mode so I provide TASK_SIZE_IA32 for this sake
- 32 bit syscalls are remaining for a while
Acked-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is the difference between two commits
criu-dev/b0f6f293/Unify own memcpy/memset/memcmp
master/0367a1fe/Drop prefix from own memcpy/memset/memcmp
that makes criu-dev after rebase on master with latter commit
be the same as it was with former commit before rebase.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's a workaround to clang-3.4, which doesn't handle numbers
in asm macros rightly:
https://llvm.org/bugs/show_bug.cgi?id=21500
Which resulted in:
CC compel/arch/x86/plugins/std/parasite-head.o
<instantiation>:3:2: error: too few operands for instruction
pushq
^
compel/arch/x86/plugins/std/parasite-head.S:26:2: note: while in macro instantiation
PARASITE_ENTRY
^
Fixes: https://travis-ci.org/0x7f454c46/criu/jobs/186099057
travis-ci: success for 32-bit tests fixes
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Otherwise we'll try to set 32-bit register set to 64-bit task,
which is not possible with ptrace - it uses register set size,
according to processes mode. So we should set 32-bit regset
only to tasks those are in 32-bit mode already.
Please, see inline comment in the patch for more info.
travis-ci: success for 32-bit tests fixes
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
To drop the second parasite blob, create another entry in 64-bit
parasite.
Didn't remove parasite-head-compat.S - it we gonna support native 32-bit
buids, we gonna need it.
travis-ci: success for Rectify 32-bit compatible C/R on x86
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Remove compatible sigset structure: as it has the same size for both
32-bit and 64-bit, I didn't use it across the code, only for a size check.
The check is removed as we use now only k_rtsigset_t.
Wordsize for sigset is changed to 64-bit - as it's written in comment
for possible 32-bit native building.
If we ever going to support compat mode for other archs, we will
need to re-introduce compat_sigset_t type if it has for those archs
different sizes for compat/native builds.
But for a while, let's simplify this.
travis-ci: success for Compel/compat cleanups
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Providing infect functionality inside std plugin
doesn't look suite for me: the restorer has to define
dummy parasite_daemon_cmd/parasite_trap_cmd/parasite_cleanup
just to be able to compile with it.
So we have to define weak stubs right here in near future.
travis-ci: success for compel: The final infect move and install target
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
And use it in CRIU directly instead:
- move syscalls into compel/arch/ARCH/plugins/std/syscalls
- drop old symlinks
- no build for 32bit on x86 as expected
- use std.built-in.o inside criu directly (compel_main stub)
- drop syscalls on x86 criu directory, I copied them already
in first compel commist, so we can't move them now, but
delete in place
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Both std and criu will use it for syscalls sake.
Note I've to disable x86 compat mode for a while:
we have to provide native types there thus will
back once everything else is complete.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The prologue includes routines needed for parasite blob to work
and is always included with the std plugin.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Delete plugins/include/asm/std directory - let it be without plugin name.
Make symlinks to reuse criu's files, except those, which will
be deleted after libcompel from criu (like syscalls).
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The plugin provides basic features as string copying, syscalls, printing.
Not used on its own by now but will be shipping by default with other
plugins.
With great help from Dmitry Safonov.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>