Linux kernel 5.4 extends clone3() with set_tid to allow processes to
specify the PID of a newly created process. This patch adds detection
of the clone3() syscall and of whether set_tid is supported.
This first implementation is X86_64 only.
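A minimal sketch of how such a probe could look, not necessarily what this patch does. It assumes that a kernel with set_tid support rejects a non-zero set_tid with set_tid_size == 0 with EINVAL, that a kernel with clone3() but without set_tid rejects the larger struct with E2BIG, and that a kernel without clone3() returns ENOSYS. The struct, the function name and the fallback syscall number are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435 /* assumption: x86_64 syscall number */
#endif

/* Hypothetical local copy of the clone3() argument layout up to set_tid. */
struct demo_clone_args {
	uint64_t flags, pidfd, child_tid, parent_tid, exit_signal;
	uint64_t stack, stack_size, tls;
	uint64_t set_tid, set_tid_size;
};

/* Returns 1 if clone3() with set_tid looks supported, 0 if not, -1 if unclear. */
static int demo_probe_clone3_set_tid(void)
{
	struct demo_clone_args args;
	long ret;

	memset(&args, 0, sizeof(args));
	/* Deliberately invalid: set_tid is non-zero while set_tid_size is 0,
	 * so the call is expected to fail before any child is created. */
	args.set_tid = (uint64_t)-1;

	ret = syscall(SYS_clone3, &args, sizeof(args));
	if (ret == 0)
		_exit(0); /* unexpected child: leave immediately */
	if (ret == -1 && errno == EINVAL)
		return 1;
	if (ret == -1 && (errno == ENOSYS || errno == E2BIG))
		return 0;
	return -1; /* e.g. EPERM from a seccomp filter */
}
```

The probe never passes a valid request, so its only observable effect should be the errno value.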
Signed-off-by: Adrian Reber <areber@redhat.com>
Compiling 'criu-dev' on Fedora 31 gives two errors about wrong clobber
lists:
compel/include/uapi/compel/asm/sigframe.h:47:9: error: listing the stack pointer register ‘1’ in a clobber list is deprecated [-Werror=deprecated]
criu/arch/ppc64/include/asm/restore.h:14:2: error: listing the stack pointer register ‘1’ in a clobber list is deprecated [-Werror=deprecated]
There was also a bug report from Debian that CRIU does not build because
of this.
Each of these errors comes with the following note:
note: the value of the stack pointer after an ‘asm’ statement must be the same as it was before the statement
As far as I understand it, this should not be a problem in these cases, as
the code never returns anyway.
Running zdtm very rarely fails in 'zdtm/static/cgroup_ifpriomap' with a
double free or corruption. It happens so seldom that I cannot verify
whether it also happens without this patch, because CRIU does not build
without it.
Signed-off-by: Adrian Reber <areber@redhat.com>
SRCARCH is always equal to ARCH. There are no rules for when to use one
or the other, and architectures may forget to set one of them up.
There is no need for a second variable that means the same thing and
confuses people.
Remove it completely.
Self-correction [after some debugging]: SRCARCH was different in one
place, the zdtm Makefile, by mistake:
> ifeq ($(ARCH),arm64)
> ARCH ?= aarch64
> SRCARCH ?= aarch64
> endif
That was meant to be "ARCH := aarch64" because "?=" would never work
inside that ifeq. Fix up this part of the mess too.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
All those compel functions can fail for various reasons: the state of
the system, an interruption by the user, or anything else.
It is highly desirable to handle as many PIE-related errors as possible,
otherwise it is hard to analyze the status of the parasite/restorer
and of the C/R process.
At the very least a warning should be logged, or C/R should even be stopped.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
An unknown state means that the task may end up not being in the wanted
state. Return an error code.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
As a preparation for __must_check on compel_syscall(), check it on
close() too - maybe not as useful as with other syscalls, but why not.
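A small local analogue of the idea, as a sketch only: the real change concerns compel_syscall() executed in the remote task, while this wrapper just calls close() directly. The definition of __must_check shown here is an assumption, and demo_close() is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

#ifndef __must_check
/* Assumed definition: make the compiler warn when the result is ignored. */
#define __must_check __attribute__((__warn_unused_result__))
#endif

/* Hypothetical wrapper: 0 on success, negative errno on failure. */
static __must_check int demo_close(int fd)
{
	if (close(fd) < 0)
		return -errno;
	return 0;
}
```

With the attribute in place, a bare `demo_close(fd);` statement triggers a compiler warning, so every caller has to decide what a failed close() means for it.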
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Also, don't use the magic -2; return errno on failure instead.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
From man ptrace:
> On error, all requests return -1, and errno is set appropriately.
> Since the value returned by a successful PTRACE_PEEK* request may be
> -1, the caller must clear errno before the call, and then check
> it afterward to determine whether or not an error occurred.
FWIW: if ptrace_peek_area() is called with (errno != 0) it may
false-fail if the data is (-1).
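The pattern from the man page can be sketched as follows. demo_peek_word() is a hypothetical helper, not the actual ptrace_peek_area() code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Read one word from the tracee at addr.
 * Returns 0 and stores the word in *out, or -1 with errno set.
 */
static int demo_peek_word(pid_t pid, void *addr, long *out)
{
	long v;

	errno = 0; /* a stale errno would turn a legitimate -1 into a failure */
	v = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
	if (v == -1 && errno != 0)
		return -1;

	*out = v;
	return 0;
}
```

Without the `errno = 0` line, a word whose value happens to be -1 is indistinguishable from an error whenever an earlier call left errno set.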
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Before the 5.2 kernel, only fpu_state->fpu_state_64.xsave had to be
64-byte aligned. But starting with the 5.2 kernel, the same is required
for fpu_state->fpu_state_ia32.xsave.
The behavior was changed in:
c2ff9e9a3d9d ("x86/fpu: Merge the two code paths in __fpu__restore_sig()")
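As an illustration only, this is how an alignment attribute on the xsave type pushes the ia32 member to a 64-byte boundary. All struct names, sizes and the leading header field are hypothetical and do not reflect the real sigframe layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an XSAVE area; the real one has a defined layout. */
struct demo_xsave {
	uint8_t bytes[576];
} __attribute__((aligned(64)));

struct demo_fpu_state_64 {
	struct demo_xsave xsave;
};

struct demo_fpu_state_ia32 {
	uint32_t header[4];      /* arbitrary leading data */
	struct demo_xsave xsave; /* compiler pads up to the next 64-byte boundary */
};

/* Returns non-zero when ptr is aligned to 64 bytes. */
static int demo_is_aligned_64(const void *ptr)
{
	return ((uintptr_t)ptr & 63) == 0;
}
```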
Signed-off-by: Andrei Vagin <avagin@gmail.com>
`pushq` sign-extends the value. That is a problem because the label's
address may be higher than 2 GB, in which case the sign bit is set.
As it long-jumps with the ia32 selector, %r11 can be scratched.
Use the %r11 register as a temporary to push the 32-bit address.
Complements: a9a760278c1a ("arch/x86: push correct eip on the stack
before lretq")
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Right now we use pushq, but it pushes a sign-extended value, so if the
parasite code is placed higher than 2 GB, we will see something like
this:
0xf7efd5b0: pushq $0x23
0xf7efd5b2: pushq $0xfffffffff7efd5b9
=> 0xf7efd5b7: lretq
Actually we want to push 0xf7efd5b9 instead of 0xfffffffff7efd5b9.
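The arithmetic behind the bad value can be reproduced in plain C. This only models what ends up on the stack. It is not the assembly fix itself, and the function names are made up for the example.

```c
#include <stdint.h>

/* Models `pushq $imm32`: the 32-bit immediate is sign-extended to 64 bits.
 * The uint32_t -> int32_t conversion relies on the usual two's complement
 * wrap-around, as implemented by GCC and Clang. */
static uint64_t demo_pushq_imm32(uint32_t imm)
{
	return (uint64_t)(int64_t)(int32_t)imm;
}

/* Models going through a register first: the upper 32 bits stay zero. */
static uint64_t demo_push_zero_extended(uint32_t addr)
{
	return (uint64_t)addr;
}
```

For 0xf7efd5b9 the first helper yields 0xfffffffff7efd5b9 and the second 0xf7efd5b9. Addresses below 2 GB come out identical in both, which is why the bug only shows up for high mappings.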
Fixes: #398
Cc: Dmitry Safonov <dima@arista.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
I've mentioned the problem that after c/r each inotify receives one or
more unexpected events.
This happens because our algorithm mixes setting up an inotify watch on
the file with opening and closing it.
We mix inotify creation and watched file open/close because we need to
create the inotify watch on the file from another mntns (generally). And
we do a trick opening the file so that it can be referenced in current
mntns by /proc/<pid>/fd/<id> path.
Moreover, if we have several inotifies on the same file, then the queue
gets even more events than the single one seen in the simple case.
note: For now we don't have a way to c/r events in the queue, but we need
to at least leave the queue clean of events generated by ourselves.
Still, it looks harder to rewrite wd creation without this proc-fd
trick than to remove the unexpected events from the queues.
So just clean up these events for each fdt-restorer process, for each of
its inotify fds, _after_ the restore stage (at CR_STATE_RESTORE_SIGCHLD).
This is the closest place where, for an _alive_ process, we know that all
prepare_fds() are done by all processes. This means we need to do the
cleanup in PIE code, so we need to add sys_ppoll definitions for PIE and
divide the process into two phases: first collect and transfer the fds,
then do the real cleanup.
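The second phase boils down to "read and discard until nothing is queued". A user-space sketch, under these assumptions: it uses poll() with a zero timeout instead of the PIE sys_ppoll wrapper, it works on any readable fd, and demo_drain_fd() is a hypothetical name.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Read and discard everything currently queued on fd.
 * Returns the number of bytes dropped, or -1 on error.
 */
static long demo_drain_fd(int fd)
{
	char buf[4096]; /* large enough for at least one inotify event */
	long total = 0;

	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int ready = poll(&pfd, 1, 0); /* zero timeout: never block */
		ssize_t n;

		if (ready < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ready == 0 || !(pfd.revents & POLLIN))
			return total; /* nothing left in the queue */

		n = read(fd, buf, sizeof(buf));
		if (n < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return -1;
		}
		if (n == 0)
			return total;
		total += n;
	}
}
```

Polling before each read keeps the loop from blocking on an fd that was not opened with a non-blocking flag.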
note: We still do prepare_fds() for zombies. But zombies have no fds in
/proc/pid/fd, so we collect none in collect_fds() and therefore have
none in prepare_fds(); thus there is no need to clean up inotifies for
zombies.
v2: adapt to multiple unexpected events
v3: do not cleanup from fdt-receivers, done from fdt-restorer
v4: do without additional fds restore stage
v5: replace sys_poll with sys_ppoll and fix minor nits
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
use ppoll always and remove poll
Provide a way to set the gettimeofday() function for an infected task.
CRIU's parasite & restorer are very verbose, as more logs are better
than fewer when investigating bugs.
In all modern kernels there is a way to get the time without entering
the kernel: vdso. So, add a way to reduce the cost of logging without
making it less valuable.
[I'm not particularly fond of the std_log_set_gettimeofday() name, so
if someone can come up with a better name - I'm up for a change]
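The shape of such a hook, as a sketch: the logger stores a function pointer and calls it for each timestamp, so the caller can plug in a vdso-backed implementation. Everything prefixed with demo_ is hypothetical, and the fake clock exists only to exercise the hook.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical hook type with the classic gettimeofday() shape. */
typedef int (*demo_gettimeofday_t)(struct timeval *tv, struct timezone *tz);

static demo_gettimeofday_t demo_gtod; /* NULL: timestamps are disabled */

static void demo_log_set_gettimeofday(demo_gettimeofday_t fn)
{
	demo_gtod = fn;
}

/* Write a "(sec.usec) " prefix into buf. Returns its length,
 * or 0 if no hook is set or the hook failed. */
static int demo_log_timestamp(char *buf, size_t len)
{
	struct timeval tv;

	if (!demo_gtod || demo_gtod(&tv, NULL) != 0)
		return 0;
	return snprintf(buf, len, "(%02ld.%06ld) ",
			(long)tv.tv_sec, (long)tv.tv_usec);
}

/* Fixed clock used only to demonstrate the hook. */
static int demo_fake_clock(struct timeval *tv, struct timezone *tz)
{
	(void)tz;
	tv->tv_sec = 5;
	tv->tv_usec = 42;
	return 0;
}
```

Keeping the hook optional means logging still works, just without timestamps, when no fast time source has been provided.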
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Doesn't change the uapi, but makes it a bit friendlier and documents
which loglevel means what for external users.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
After patching code we need to flush the CPU cache, which is done with
__builtin___clear_cache(). As we don't link to libgcc, provide a helper
that wraps the ARM-specific syscall.
Fixes:
LINK criu/pie/restorer.built-in.o
ld: ./criu/arch/arm/vdso-pie.o: in function `insert_trampoline':
/root/criu/criu/arch/arm/vdso-pie.c:32: undefined reference to `__clear_cache'
Signed-off-by: Dmitry Safonov <dima@arista.com>
It's needed for PIEs, but not for the library.
It comes earlier than commit 61e6c01d0964, but I don't see the point.
Regardless, I'm a bit afraid of breaking s390; hopefully testing covers
the platform.
This and the next one "make: Move CR_NOGLIBC into CFLAGS_PIE" should be
reverted/dropped from criu-dev if they turn out to break something.
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
__ASSEMBLY__ is used to guard C-related code in headers from
asm-compatible defines. We actually want every .S file to be assembled
with -D__ASSEMBLY__ so that no C code ends up in an asm file.
Move __ASSEMBLY__ from all local asflags to top Makefile's AFLAGS.
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
In a function with return type bool, returning a non-zero value is
interpreted as returning true. In the error paths we want to return
false to indicate failure. Change -1 to false to fix this.
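The conversion rule is easy to demonstrate in isolation. Both functions below are made up for the example and are not taken from the patched code.

```c
#include <stdbool.h>

/* Buggy: -1 is non-zero, so it converts to true and the error is lost. */
static bool demo_check_buggy(int value)
{
	if (value < 0)
		return -1;
	return true;
}

/* Fixed: the error path reports false. */
static bool demo_check_fixed(int value)
{
	if (value < 0)
		return false;
	return true;
}
```

A caller testing `if (!demo_check_buggy(x))` never sees the failure, which is exactly the kind of silent error this patch removes.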
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
sizeof(sizeof(x)) is the size of size_t. Instead use the size of the
array to ensure the entire array is zeroed.
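A reduced example of the mistake and the fix, using a hypothetical struct in place of the real one:

```c
#include <stddef.h>
#include <string.h>

struct demo_regs {
	unsigned char data[64];
};

/* Buggy: sizeof(sizeof(...)) is sizeof(size_t), so only the first
 * few bytes of the array are cleared. Returns the length used. */
static size_t demo_clear_buggy(struct demo_regs *r)
{
	memset(r->data, 0, sizeof(sizeof(r->data)));
	return sizeof(sizeof(r->data));
}

/* Fixed: the length is the size of the array itself. */
static size_t demo_clear_fixed(struct demo_regs *r)
{
	memset(r->data, 0, sizeof(r->data));
	return sizeof(r->data);
}
```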
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Enable Hygon Dhyana, which can reuse most of the AMD CPU support code.
Signed-off-by: hygonsoc <hygonsoc@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
There are a few places where spaces have been used instead of tabs for
indentation. This patch converts the spaces to tabs for consistency
with the rest of the code base.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
It has a different alignment - rework ugly macro RT_SIGFRAME_UC_SIGMASK
into helpers.
Fixes: #666
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
With Android P's Clang version 6.0.2 and Android NDK's Clang version 8.0.2,
Clang reports the error below:
criu/compel/include/uapi/compel/sigframe-common.h:55:34: error: expected member name or ';' after declaration specifiers
int __unused[32 - (sizeof (k_rtsigset_t) / sizeof (int))];
~~~ ^
Clang treats __unused as an attribute, not a variable. Change it to _unused so that it compiles.
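For reference, a reduced version of the declaration with the renamed member. The likely cause is that some C libraries define __unused as an attribute macro, so it cannot be used as an identifier. The type and struct names here are hypothetical, and the mask is assumed to be 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for k_rtsigset_t: a 64-bit signal mask. */
typedef struct {
	uint64_t sig;
} demo_rtsigset_t;

struct demo_sigmask_slot {
	demo_rtsigset_t mask;
	/* Padding up to 32 ints in total; named _unused, not __unused. */
	int _unused[32 - (sizeof(demo_rtsigset_t) / sizeof(int))];
};
```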
Cc: Chen Hu <hu1.chen@intel.com>
Signed-off-by: Zhang Ning <ning.a.zhang@intel.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
In the Android NDK, <elf.h> doesn't define:
NT_X86_XSTATE
NT_PRSTATUS
so add these defines to make it compile.
NOTE: including <linux/elf.h> instead causes more build errors
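Guarded fallbacks of this kind could look like the sketch below. The numeric values are the ones used by the Linux ELF note definitions; where exactly the patch places its defines is not shown here.

```c
#include <elf.h>

/* Only define the note types when the libc header does not provide them. */
#ifndef NT_PRSTATUS
#define NT_PRSTATUS 1
#endif

#ifndef NT_X86_XSTATE
#define NT_X86_XSTATE 0x202
#endif
```

The #ifndef guards keep the build unchanged on systems whose <elf.h> already has both constants.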
Cc: Chen Hu <hu1.chen@intel.com>
Signed-off-by: Zhang Ning <ning.a.zhang@intel.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Removed return value assignment statements as they are not referenced or used
anywhere after the assignment is done.
Fixes #334: Removing Unneeded Assignments
Signed-off-by: Mitul Karnik <mitulkarnik.92@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
I received this patch from Jeff Law as a fix for build failures with the
upcoming GCC 9. The following is Jeff Law's description of the patch:
Attached you'll find the fix for criu. You'll see it's just a matter
of dropping the sp/esp clobber from the relevant asm statements. The
details:
criu has a macro which defines an asm which appears to want to set a new
stack pointer, then directly issue a sigreturn call to the kernel. Some
variants clobber sp (aarch64, arm, x86), others do not (ppc, s390)
While the asm does indeed set a new stack pointer, we never return from
a sigreturn syscall -- at least not in the normal way. We actually
return back to the point where the process was interrupted by the
signal. So while describing the effect of the asm on the stack pointer
is pedantically correct, it actually has no real effect and can just be
dropped to avoid the hard error from gcc-9.
Suggested-by: Jeff Law <law@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Not needed since e2bd7af94cb7b43e2c0bed0546ef83c2f3eea25c
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This avoids installing .gitignore file to compel includes.
While at it, describe why this .gitignore is needed.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
MAP_ANON has been deprecated, use MAP_ANONYMOUS instead.
https://lkml.org/lkml/2007/2/3/55
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This reverts commit 6b0292e12614.
handle_binary() is needed for preparing PIE header during compilation.
If we need to provide API for building another compel binary, probably
it should be a separate library from libcompel.{a,so}
IOW, there is no sense to have handle_binary() included into CRIU.
For CRIU it's needed only during the build to generate parasite header.
Move handle_elf() to compel binary from library, as it was previously
corrected in commit 64bb75a859cd ("compel: Drop off handle-elf routines
from library").
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Libcompel supplies PAGE_SIZE to the parasite during infection.
On ppc64/aarch64 PAGE_SIZE can differ when large pages are used, so
under the hood it resolves to __page_size.
At the moment __page_size is located in the criu binary, which
prevents using libcompel as a .so library in applications other
than criu on ppc64/aarch64.
Export __page_size and __page_shift in libcompel.
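A sketch of what "dynamic PAGE_SIZE" means in practice: the value is queried once at run time and the shift is derived from it. The demo_ names are hypothetical stand-ins for __page_size and __page_shift, and the initialization function is an assumption about how such variables could be filled in.

```c
#include <unistd.h>

static unsigned long demo_page_size;
static unsigned int demo_page_shift;

/* Query the page size once and derive the shift. Returns 0 on success. */
static int demo_page_size_init(void)
{
	long sz = sysconf(_SC_PAGESIZE);
	unsigned int shift = 0;

	/* A page size must be a positive power of two. */
	if (sz <= 0 || (sz & (sz - 1)))
		return -1;

	while ((1UL << shift) < (unsigned long)sz)
		shift++;

	demo_page_size = (unsigned long)sz;
	demo_page_shift = shift;
	return 0;
}
```

Because the variables live next to the code that uses them, a library built this way does not need the hosting binary to provide the symbols.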
Fixes: #572
Fixes: 2d965008d367 ("ppc64/aarch64: Dynamically define PAGE_SIZE")
Cc: Adrian Reber <areber@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Alexey Shabalin
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
for i in $(find . -name "*.[ch]" -type f); do
sed -i 's/\(pr_warn(".*[^n]\)\("[),]\)/\1\\n\2/' $i
done
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need to know the stack pointer of every thread to ensure that the
current stack page will not be treated as lazy.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
They were accidentally kept in obj-y.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
arm's cpu code is the same as aarch64's, so use
a symlink.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>