Will need them to mask some of the features from
command line options.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Tracking cpuid features is easier when sync'ed with kernel
source code. Note though that while in kernel feature bits
are not part of ABI, we're saving bits into an image so
as result make sure they are posted in proper place together
with keeping in mind the backward compatibility issue.
Here we also start using v2 of cpuinfo image with more
feature bits.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
To be close to the kernel code.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On fedora rawhide seccomp_metadata for some
reason is not defined (while in kernel it introduced
together with PTRACE_SECCOMP_GET_METADATA). So
lets do a trick for a while -- define own alias.
Once system headers get settled down we might find
more suitable solution. Because it's a part of kernel
API we're on the safe side.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There are a lot of lines, which are longer than 79:
(04.331172)pie: 1: Error (criu/pie/restorer.c:460): seccomp: Unexpected tid ->
(04.331172)pie: 1: 1 != 1
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We will use it to figure out if filter log target is used.
Metadata associated with seccomp filter is relatively new
feature which allows userspace to get and set it back.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
For architectures like aarch64/ppc64 it's needed to propagate the size
of page inside PIEs. For the parasite page size will be defined during
seizing, and for restorer during early initialization.
Afterward we can use PAGE_SIZE in PIEs like we did before.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This function is an analogue to vsprintf(), and is used in very much the
same way. The caller expects the modified string pointer to be pointing to
a null-terminated string.
Signed-off-by: Joel Nider <joeln@il.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As it's aligned to 16, all structures that contain it should be
also aligned to 16. In the kernel there is no such align as
there two separate definitions of i387_fxsave_struct:
one for ia32 and another for x86_64.
Fixes newly introduced align warning in gcc-8.1:
In file included from compel/include/uapi/compel/asm/sigframe.h:7,
from compel/plugins/std/infect.c:13:
compel/include/uapi/compel/asm/fpu.h:89:1: error: alignment 1 of 'struct xsave_struct_ia32' is less than 16 [-Werror=packed-not-aligned]
} __packed;
^
It doesn't change the current align of the struct, as containing
structures are __packed and it aligned already *by fact*.
It only affects the function users of the struct's local variables:
now they lay aligned on a stack.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Ugh, I've spent 25 mins at 4 A.M. to figure out why the tests are failing.
And the reason is stupied me, who defined a new flag after 0x8
as 0x16, not as 0x10. Simplify those definitions for such simple-minded
living creatures like Dima.
Signed-off-by: Dmitry Safonov <dima@arista.com>
On Skylake processors and kernel older than v4.14
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
may return not full xstate, ommiting FP part (that is XFEATURE_MASK_FP).
There is a patch which describes this bug:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1318800.html
Anyway, it's fixed in v4.14 kernel by (what we believe with Andrey) this:
https://patchwork.kernel.org/patch/9567939/
As we still support kernels from v3.10 and newer, we need to have a
workaround for this kernel bug on Skylake CPUs.
Big thanks to Shlomi for the reports, the effort and for providing an
Amazon VM to test this. I wish more bug reporters were like you.
Reported-by: Shlomi Matichin <shlomi@binaris.com>
Provided-test-env: Shlomi Matichin <shlomi@binaris.com>
Investigated-with: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Mere cleanup. For Skylake workaround I'll call one after another,
so it's better separate it in a small helpers.
Signed-off-by: Dmitry Safonov <dima@arista.com>
get_task_regs() needs to know if it needs to use workaround
for a Skylake ptrace() bug. The next patch will introduce a
new flag for that.
I also thought about making 3 versions of get_task_regs() and
adding them to ictx->get_task_regs() depending on the flags..
But get_task_regs() is a private function and infect_ctx is
a uapi.. So, let's just pass context flags to get_task_regs().
Signed-off-by: Dmitry Safonov <dima@arista.com>
As we anyway define save_regs_t for other purposes,
use it in the function declaration.
To unify infect_ctx style, add make_sigframe_t.
Mere cleanup.
Signed-off-by: Dmitry Safonov <dima@arista.com>
It has two arguments "pos_l and "pos_h" instead of one "off". It is used
to handle 64-bit offsets on 32-bit kernels.
SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
https://github.com/checkpoint-restore/criu/issues/424
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Except for several false positives done by:
find -type f -name "*.c" -not -path "./test/*" -exec sed -i
's/\(\<pr_err.*[^\][^n]\)\("[,)]\)/\1\\n\2/g' {} \;
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we use the "map_files/%p-%p" format, but actually it should
be "map_files/%lx-%lx".
The kernel could handle both formats, but recently Alexey Dobriyan fixed
the kernel and it accept only the second format.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Commit 37e4c7bfc264 fixed arm, ppc, x86 (32bit),
while it made wrong definition of x86_64. Fix that.
Also, add commentary to raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Regs are present in unsigned format so convert them
into signed first to provide results.
In particular if memfd_create syscall failed we won't
notice -ENOMEM error but rather treat it as unsigned
hex value
| (05.303002) Putting parasite blob into 0x7f1c6ffe0000->0xfffffff4
| (05.303234) Putting tsock into pid 42773
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Its declaration was changed in glibc headers and we don't want
to handle two version of itю
In file included from compel/include/uapi/compel/infect.h:6:0,
from compel/include/uapi/compel/compel.h:12,
from compel/include/log.h:4,
from compel/arch/aarch64/src/lib/infect.c:8:
compel/arch/aarch64/src/lib/infect.c: In function 'sigreturn_prep_regs_plain':
compel/include/uapi/compel/asm/sigframe.h:47:99: error: 'mcontext_t {aka struct <anonymous>}' has no member named '__reserved'; did you mean '__glibc_reserved1'?
#define RT_SIGFRAME_AUX_CONTEXT(rt_sigframe) ((struct aux_context*)&(rt_sigframe)->uc.uc_mcontext.__reserved)
^
compel/include/uapi/compel/asm/sigframe.h:48:41: note: in expansion of macro 'RT_SIGFRAME_AUX_CONTEXT'
#define RT_SIGFRAME_FPU(rt_sigframe) (&RT_SIGFRAME_AUX_CONTEXT(rt_sigframe)->fpsimd)
^~~~~~~~~~~~~~~~~~~~~~~
compel/arch/aarch64/src/lib/infect.c:34:34: note: in expansion of macro 'RT_SIGFRAME_FPU'
struct fpsimd_context *fpsimd = RT_SIGFRAME_FPU(sigframe);
^~~~~~~~~~~~~~~
make[1]: *** [/builddir/build/BUILD/criu-3.5/scripts/nmk/scripts/build.mk:209: compel/arch/aarch64/src/lib/infect.o] Error 1
make: *** [Makefile.compel:36: compel/libcompel.a] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.IExY9K (%build)
This seems to be related to the following glibc changes:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=4fa9b3bfe6759c82beb4b043a54a3598ca467289
Reported-by: Adrian Reber <adrian@lisas.de>
Tested-by: Adrian Reber <adrian@lisas.de>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add handeling of R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX.
They are not that old, so I provided ifdef-guards for them.
According to x86-64 ABI specification paper, they should be
generated instead of R_X86_64_GOTPCREL for cases when relaxation
is possible.
At this moment we can handle them the same way like R_X86_64_GOTPCREL.
[0] https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdfFixes: #397
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Dump and restore process with RI control block. Runtime instrumentation
allows to collect information about program execution as CPU data.
The RI control block is dumped and restored for all threads.
Ptrace kernel interface is provided by:
c122bc239b13 ("s390/ptrace: add runtime instrumention register get/set")
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Dump and restore tasks with GS control blocks. Guarded-storage is a new
s390 feature to improve garbage collecting languages like Java.
There are two control blocks in the CPU:
- GS control block
- GS broadcast control block
Both control blocks have to be dumped and restored for all threads.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is what we have:
> compel/src/lib/infect.c:1145:38: error: taking address of packed member
> 'uc_sigmask' of class or structure 'ucontext_ia32' may result in an
> unaligned pointer value [-Werror,-Waddress-of-packed-member]
> blk_sigset = RT_SIGFRAME_UC_SIGMASK(f);
> ~~~~~~~~~~~~~~~~~~~~~~~^~
> compel/include/uapi/asm/sigframe.h:133:4: note: expanded from macro
> 'RT_SIGFRAME_UC_SIGMASK'
> (&rt_sigframe->compat.uc.uc_sigmask))
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
Indeed this results in an unaligned pointer, but as this is intended and
well known (see commit dd6736bd "compel/x86/compat: pack ucontext_ia32"),
we need to silence the warning here.
For more details, see https://reviews.llvm.org/D20561
Originally found by Travis on Alpine Linux, reproduced on Ubuntu 17.10.
[v2: fix for non-x86]
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The right order for all of our 4 archs is:
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
unsigned long, tls,
int __user *, child_tidptr)
See Linux kernel for the details.
Note, this is just a fix, and it's not connected with the second patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The recent changes to user address space limits in the Linux kernel break
the assumption that TASK_SIZE is 128TB. For now, the maximal task size on
ppc64le is 512TB and we need to detect it in runtime for compatibility with
older kernels.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add linux/userfaultfd.h to criu sources. This header is a part
of the kernel API and I see nothing wrong to have in the repo.
Why we want to do this:
* to check that criu works correctly if a kernel doesn't
support userfaultfd.
* to check compilation of the userfaultfd part in travis-ci.
v2: remove UFFD from FEATURES_LIST
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
For dumping threads we execute parasite code before we collect the
floating point registers with get_task_regs().
The following describes the code path were this is done:
dump_task_threads()
for (i = 0; i < item->nr_threads; i++)
dump_task_thread()
parasite_dump_thread_seized()
compel_prepare_thread()
prepare_thread()
### Get general purpose registers ###
ptrace_get_regs(ctx->regs)
### parasite code is executed in thread ###
compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)
compel_get_thread_regs()
### Get FP and VX registers ###
get_task_regs()
Since on s390 gcc uses floating point registers to cache general
purpose without the -msoft-float option the floating point
registers would have been clobbered after compel_run_in_thread().
With this patch we first save all thread registers with task_get_regs() and
then call the parasite code.
We introduce a new function arch_set_thread_regs() that restores the saved
registers for all threads. This is necessary for criu dump --leave-running,
--check-only, and error handling.
Pre-dump does not require to save the registers for the threads because
because only the main thread (task) is used for parsite code execution.
The above changes allow us to use all register sets in the parasite
code - therefore we can remove -msoft-float on s390.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The entries in /proc/<pid>/maps look like the following:
3fffffdf000-40000000000 rw-p 00000000 00:00 0
The upper address is the first address that is *not* included in the
address range.
Our function max_mapped_addr() should return the last valid address
for a process, but currently returns the first invalid address.
This can lead to the following error message on kernel that have
kernel commit ee71d16d22bb:
Error (criu/proc_parse.c:694): Can't dump high memory region
1ffffffffff000-20000000000000 of task 24 because kernel commit ee71d16d22bb
is missing
Fix this and make max_mapped_addr() the last valid address (first invalid
address - 1).
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We currently define "CFLAGS += -msoft-float -fno-optimize-sibling-calls"
in Makefile.compel.
Makefile.compel is included into the toplevel Makefile and so changes
CFLAGS which are exported to all criu and zdtm Makefiles.
We must not use -msoft-float outside the compel files. E.g. otherwise for
zdtm we get the following build error:
uptime_grow.o: In function `main':
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__floatdidf'
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__muldf3'
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__floatdidf'
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__adddf3
Fix this and move the CFLAGS definition to the compel Makefiles.
We do this by defining a new flag CFLAGS_PIE that is only used for pie code.
Reported-by: Adrian Reber <areber@redhat.com>
Suggested-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add s390 parts to common code files.
Patch history
-------------
v2->v3:
* Add: s390: Consolidate -msoft-float into Makefile.compel
Reviewed-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This patch only adds the support but does not enable it for building.
Reviewed-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In case of success, we want to be silent when on default log level.
This is a time-honored UNIX tradition, who we are to break it?
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1. Commit 8b99809 ("compel: make plugins .a archives") changed the
suffix of compel plugins, so this test no longer compiles.
2. "compel plugins" can print auxiliary plugins now, let's use it.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As we have more than 1 working plugin right now, let's implement
the TODO item for "compel plugins" command.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Some get_status() methods may allocate data, because
not all of the fields in /proc/[pid]/status file
have the fixed size. For example, NSpid, which
size may vary.
Introduce new method free_status() in counterweight
for such type get_status() methods. it will be called
in case of we go to try_again and need to free allocated
data.
Also, introduce data parameter for a use in the future.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The goal of this function is to compare everything except caps,
but caps size is took to compare. It's wrong, there must be
used offsetof(struct proc_status_creds, cap_inh) instead.
Also, sigpnd may be different too.
v3: Move excluding sigpnd from comparation in this patch (was in another patch).
Reorder fields in seize_task_status().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This naming is left from the first compatible kernel patches.
At that time to return to 32-bit task rt_sigreturn was used with
a special flag.
Now it's not true anymore, the naming doesn't relate.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As for compiling syscalls.S needed syscall-aux.h header,
which is linked with making $(sys-asm-types), add it to deps.
Fixes:
> In file included from compel/arch/aarch64/plugins/std/syscalls/syscalls.S:2:0:
> compel/include/uapi/compel/plugins/std/syscall-codes.h:568:44: fatal error: compel/plugins/std/syscall-aux.h: No such file or directory
> #include <compel/plugins/std/syscall-aux.h>
> ^
> compilation terminated.
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Unlike pr_perror(), pr_err() does not append a newline.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>