v2: Follow the kerndat style, in which "features" are described
simply by global boolean variables.
v3: give NULL as a name to get EFAULT if memfd_create is supported
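A minimal sketch of the probe described in v3, assuming the raw syscall is available through SYS_memfd_create. The names memfd_is_supported, memfd_probe_classify() and memfd_probe() are hypothetical and only illustrate the idea; they are not taken from the patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical global flag in the kerndat style: one boolean per feature. */
static bool memfd_is_supported;

/*
 * Interpret the errno left by memfd_create(NULL, 0).
 * Returns 0 on a conclusive answer, -1 if the errno is unexpected.
 */
static int memfd_probe_classify(int err, bool *supported)
{
	switch (err) {
	case EFAULT:	/* the syscall exists and rejected the bad pointer */
		*supported = true;
		return 0;
	case ENOSYS:	/* the syscall is not implemented by this kernel */
		*supported = false;
		return 0;
	default:
		return -1;
	}
}

static int memfd_probe(void)
{
#ifdef SYS_memfd_create
	long ret = syscall(SYS_memfd_create, NULL, 0);

	if (ret >= 0) {	/* not expected with a NULL name */
		close((int)ret);
		return -1;
	}
	return memfd_probe_classify(errno, &memfd_is_supported);
#else
	/* Headers too old to know the syscall number: assume unsupported. */
	memfd_is_supported = false;
	return 0;
#endif
}
```

Passing NULL means no file descriptor is ever created on a supporting kernel, so the probe has nothing to clean up in the expected case.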
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
TASK_SIZE for AArch64 may be any of several options between (1 << 39)
and (1 << 48), inclusive, depending on kernel configuration. Go back
to just checking the most significant bit, as was done before commit
3f12d688ae563b91d8a738d68902ea2e23f2b59c was made to accommodate 32-bit
ARM (before mmap_seized got architecture-specific implementations).
This fixes the following error for AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y.
Error (parasite-syscall.c:1105): Can't allocate memory for parasite blob (pid: 104)
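As an illustration of the check described above, here is a sketch that treats a remote mmap() return value as an error when its most significant bit is set. The helper name remote_map_failed() is hypothetical and the 64-bit width is an assumption; the point is only that no compile-time TASK_SIZE constant is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a value returned by a remotely executed mmap()
 * is either a user-space address (top bit clear for any TASK_SIZE from
 * 1 << 39 to 1 << 48) or a negative errno (top bit set).
 */
static bool remote_map_failed(uint64_t ret)
{
	return (int64_t)ret < 0;
}
```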
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add missing implementations for ARM platforms.
Reported-by: Mr. Travis
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On Wed, Oct 01, 2014 at 05:51:09PM +0400, Pavel Emelyanov wrote:
> > Yes, what you've been expecting?
>
> if (!strcmp(argv[optind]))
> return cpu_cap_check()
>
> or smth like this.
updated. If it becomes confusing -- feel free to merge [1;9] and
ping me to resend the rest, or pick them up from the attachments.
>From 6af96ff63ac82f9566c3cba9c116dc67698c9797 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Tue, 30 Sep 2014 18:33:40 +0400
Subject: [PATCH] cpuinfo: Add "cpuinfo [dump|check]" commands
They make it possible to validate cpuinfo information
without running a complete dump/restore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On Wed, Oct 01, 2014 at 04:57:40PM +0400, Pavel Emelyanov wrote:
> On 10/01/2014 01:07 AM, Cyrill Gorcunov wrote:
> > On Tue, Sep 30, 2014 at 09:18:53PM +0400, Cyrill Gorcunov wrote:
> >> If a user requested criu to dump cpuinfo image then we
> >> write one on dump and verify on restore. At the moment
> >> we require all cpu feature bits to match the destination
> >> cpu in a sake of simplicity, but in future we need deps
> >> engine which would filer out bits and test if cpu we're
> >> restoring on is more capable than one we were dumping at
> >> allowing to proceed restore procedure.
> >>
> >> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> >
> > Updated to new img format
Something like attached?
>From 59272a9514311e6736cddee08d5f88aa95d49189 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Thu, 25 Sep 2014 16:04:10 +0400
Subject: [PATCH] cpuinfo: x86 -- Add dump and validation of cpuinfo image
If a user asks criu to dump the cpuinfo image, we write it
on dump and verify it on restore. For the sake of simplicity
we currently require all cpu feature bits to match those of
the destination cpu. In the future we need a deps engine
that filters out bits and tests whether the cpu we're
restoring on is more capable than the one we dumped on,
in which case the restore procedure may proceed.
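The two policies can be sketched as follows. This is only an illustration: the function names are hypothetical, the feature words are assumed to be plain 32-bit masks, and a real deps engine would have to do more than a bitwise superset test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Current policy: every feature word in the image must match exactly. */
static bool cpu_features_equal(const uint32_t *img, const uint32_t *cur, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (img[i] != cur[i])
			return false;
	return true;
}

/*
 * Possible relaxed policy (an assumption, not what the patch does):
 * the destination may have extra features, but none may be missing.
 */
static bool cpu_features_superset(const uint32_t *img, const uint32_t *cur, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (img[i] & ~cur[i])
			return false;
	return true;
}
```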
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of parsing procfs let's use native cpuid(), which is much faster.
The downside is that the kernel may disable some features via
boot command-line options even when the hardware has them, but for us
that's fine -- we will be testing the hardware cpu for features anyway.
The X86_FEATURE_ bits are gathered from two sources: the linux kernel
and the cpu specifications.
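A rough sketch of reading feature bits with cpuid rather than procfs, assuming a GCC or Clang toolchain that ships <cpuid.h>. The two-word layout and the SKETCH_FEATURE_* names are hypothetical and are not the X86_FEATURE_ table from the patch; only the CPUID leaf 1 bit positions are taken from the cpu specifications.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical encoding: word * 32 + bit; word 0 is EDX, word 1 is ECX. */
#define SKETCH_FEATURE_FPU	(0 * 32 + 0)	/* CPUID.1:EDX bit 0 */
#define SKETCH_FEATURE_SSE2	(0 * 32 + 26)	/* CPUID.1:EDX bit 26 */
#define SKETCH_FEATURE_XSAVE	(1 * 32 + 26)	/* CPUID.1:ECX bit 26 */

static bool sketch_test_feature(const uint32_t *words, unsigned int feature)
{
	return (words[feature / 32] >> (feature % 32)) & 1u;
}

/* Fill words[] from CPUID leaf 1; returns 0 on success, -1 otherwise. */
static int sketch_read_features(uint32_t words[2])
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	words[0] = edx;
	words[1] = ecx;
	return 0;
#else
	(void)words;
	return -1;	/* not an x86 build */
#endif
}
```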
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's simply a wrapper over cpu_has_feature,
so use cpu_has_feature directly instead.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fix compilation on ARM:
pie/restorer.c: In function ‘wait_helpers’:
pie/restorer.c:728:3: error: implicit declaration of function ‘sys_waitpid’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Unfortunately the kernel doesn't flush hw breakpoints when
ptrace detaches. If a breakpoint is triggered without ptrace attached,
the process will be killed by SIGTRAP.
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently CRIU traces syscalls to catch the moment when sigreturn() is
called. Right now we trace recv(cmd), close(logfd), close(cmdfd), sigreturn().
We can reduce the number of steps by using hw breakpoints. A breakpoint is
set before sigreturn, so we only need to trace that one call.
v2: In the first version the breakpoint was set after sigreturn. In that
case we have a problem with signals. If a process has pending signals,
it will start to process them after exiting from sigreturn(), but before
returning to userspace. So the breakpoint will not be triggered.
Finally, here are a few numbers on how we catch sigreturn.
Before this patch criu executes 36 syscalls and gets 12 signals.
With this patch criu executes 18 syscalls and gets 5 signals.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Needed for future user namespace support. Capabilities will have to be
dumped from the parasite, ie from inside the namespace since there is no
obvious way to 'translate' capabilities from the global namespace (unlike
with uids and gids, where the id mappings can be used for translation).
[ additional explanation from Andrew Vagin:
"capabilities" are not translated between namespaces. They can exist
only in one userns, where a process lives. If a process is created in a
new userns, it gets a full set of capabilities in this userns, and
loses all caps in a parent userns.
So if capabilities are not shown in /proc/pid/stat, we have no way to
get it except of using parasite code. ]
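For illustration, code running inside the target process could read its own capability sets with the capget syscall roughly like this. It is a sketch only: struct sketch_caps and sketch_dump_own_caps() are hypothetical names, and real parasite code cannot use libc's syscall() wrapper the way this standalone example does.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical container for the three capability sets. */
struct sketch_caps {
	uint32_t eff[_LINUX_CAPABILITY_U32S_3];
	uint32_t prm[_LINUX_CAPABILITY_U32S_3];
	uint32_t inh[_LINUX_CAPABILITY_U32S_3];
};

/* Read the caller's capabilities as seen in its own user namespace. */
static int sketch_dump_own_caps(struct sketch_caps *out)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0;	/* 0 means the calling thread */

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	for (int i = 0; i < _LINUX_CAPABILITY_U32S_3; i++) {
		out->eff[i] = data[i].effective;
		out->prm[i] = data[i].permitted;
		out->inh[i] = data[i].inheritable;
	}
	return 0;
}
```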
Signed-off-by: Sophie Blee-Goldman <ableegoldman@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This brings the changes made in the following commits to the
aarch64 copy of the code.
commit 7794f67f2055420c6b6c2967edfbe0c39a7cd744
Author: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Tue Aug 5 13:59:18 2014 +0400
vdso: x86 -- Fix missing ability to remap vDSO if only one zone present
commit 066add0de44f462e7482571763f303ded0b4762f
Author: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Tue Aug 5 13:07:00 2014 +0400
vdso: x86 -- Simplify vdso_proxify
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is required to support checkpoint and restore of timers
that notify via file descriptors on ARM and AArch64.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This modifies the x86 VDSO code to work on AArch64.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
While it duplicates hundreds of lines of code, this is the
short term strategy Cyrill and I have agreed to for supporting
VDSOs across multiple architectures [1]. With better
understanding of where things differ per-architecture, or even
improved consolidation in the kernel, we can hopefully move to
a more shared implementation in the future.
1. http://lists.openvz.org/pipermail/criu/2014-August/015218.html
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I accidentally broke the ability to do an in-place remap on
pre-3.16 kernels. Bring it back.
CID 1230182: Logically dead code (DEADCODE)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There is no need for a second if() statement; merge everything
into the previous one.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the latest linux-next the vdso zone is placed _after_ the vvar
zone, so we need to handle any combination of
the following cases
- no vvar zone
- vvar before vdso
- vvar after vdso
Here we address all of them.
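The three cases above could be told apart with a small classifier like the one below. It is a sketch under assumptions: SKETCH_BAD_ADDR as the "zone not present" marker and the enum names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical marker meaning "this zone was not found". */
#define SKETCH_BAD_ADDR ((uint64_t)-1)

enum sketch_layout {
	SKETCH_NO_VVAR,
	SKETCH_VVAR_BEFORE_VDSO,
	SKETCH_VVAR_AFTER_VDSO,
};

/* Classify the layout from the start addresses of the two zones. */
static enum sketch_layout sketch_classify(uint64_t vdso_start, uint64_t vvar_start)
{
	if (vvar_start == SKETCH_BAD_ADDR)
		return SKETCH_NO_VVAR;
	return vvar_start < vdso_start ? SKETCH_VVAR_BEFORE_VDSO
				       : SKETCH_VVAR_AFTER_VDSO;
}
```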
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since we might have several vDSO zones, let's hide
their handling in arch-specific routines.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In linux kernel 3.17 the vvar and vdso zones will most probably
be in reverse order, i.e. vvar first and vdso after it, so do an extended
test for these VMAs, which come in one bundle.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If pfn = 0 we have hit some very strange condition,
but rather than trigger BUG_ON here it is better
to exit with an error for future investigation.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise we have met a somehow corrupted mark and
must abort dumping.
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The new 3.16 kernel will have the old vDSO zone split into two vmas:
one for the vdso code itself and a second, named vvar, for the data
referenced from the vdso code.
Because I can't do the 'dump' and 'restore' parts of the code separately
(otherwise the tests would fail) the commit is a pretty big one and hard to
read, so here is a detailed explanation of what's going on.
1) When we start dumping we detect the vvar zone by reading /proc/pid/smaps
and looking for the "[vvar]" token. Note the vvar zone is mapped
by the kernel with PF/IO flags so we should not fail here.
Also it's assumed that at least for now the kernel won't change
much and the [vvar] zone always follows the [vdso] zone, otherwise
criu will print an error.
2) In previous commits we disabled dumping the vvar area contents so
the restorer code never tries to read vvar data, but we still need
to map the vvar zone, thus the vma entry remains in the image.
3) As with the previous vdso format we might have 2 cases
a) Dump and restore happen on the same kernel
b) Dump and restore are done on different kernels
To detect which case we have, we parse the vdso data from the image,
find the symbol offsets and compare their values with the runtime
symbols provided to us by the kernel. If they match and (!!!) the
size of the vvar zone is the same -- we simply remap both zones
from the runtime kernel into the positions the dumpee had at checkpoint
time. This is the so-called "inplace" remap (a).
If this happens the vdso_proxify() routine drops VMA_AREA_REGULAR
from the vvar area provided by the caller and the restorer won't try
to handle this vma. It looks somewhat strange and probably should
be reworked but for now I left it as is to minimize the patch.
In case of (b) we need to generate a proxy. We do that in the same
way as before, except that we include the vvar zone in the proxy and save
the vvar proxy address inside the vdso mark injected into the vdso area. Thus
on a subsequent checkpoint we can detect the proxy vvar zone and rip
it off the list of vmas to handle.
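The choice between cases (a) and (b) can be sketched as a comparison of two symbol tables. The struct layout, the symbol count and the function name below are hypothetical; the sketch only shows the rule that both the symbol offsets and the vvar size must match for an in-place remap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_SYMBOLS 4	/* hypothetical number of tracked symbols */

/* Hypothetical summary of one vdso/vvar pair. */
struct sketch_symtable {
	uint64_t vvar_size;
	uint64_t sym_offset[SKETCH_NR_SYMBOLS];
};

/*
 * Returns true for case (a): the image and the runtime kernel agree,
 * so both zones can simply be remapped in place. Returns false for
 * case (b), where a proxy has to be generated.
 */
static bool sketch_can_remap_inplace(const struct sketch_symtable *img,
				     const struct sketch_symtable *rt)
{
	if (img->vvar_size != rt->vvar_size)
		return false;
	for (size_t i = 0; i < SKETCH_NR_SYMBOLS; i++)
		if (img->sym_offset[i] != rt->sym_offset[i])
			return false;
	return true;
}
```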
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Because of the new vvar area we need to carry the
address of the vvar proxy inside the mark. Thus
add the members needed and update the routines.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A proxy vdso is removed from the vma_area_list list,
so vma_area_list->nr must be decremented.
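The invariant being restored here is that the element counter always matches the list contents. A minimal sketch, using a hypothetical singly linked list rather than the real vma_area_list structures:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vma list and its entries. */
struct sketch_vma {
	struct sketch_vma *next;
	int is_proxy_vdso;
};

struct sketch_vma_list {
	struct sketch_vma *head;
	unsigned int nr;	/* must always equal the number of entries */
};

/* Unlink every proxy vdso entry and keep the counter in sync. */
static void sketch_drop_proxy_vdso(struct sketch_vma_list *list)
{
	struct sketch_vma **pp = &list->head;

	while (*pp) {
		if ((*pp)->is_proxy_vdso) {
			*pp = (*pp)->next;	/* unlink the entry */
			list->nr--;		/* and account for it */
		} else {
			pp = &(*pp)->next;
		}
	}
}
```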
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
New vDSOs come in stripped format, so use dynamic
symbols instead of sectioned ones.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We're not sharing the code anymore so drop it.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we build the vDSO handling code for all archs provided
in the source code, with some "common" parts inside pie/vdso.c,
pie/vdso-stub.c, vdso-stub.c and vdso.c. This worked more or
less well, but in new linux kernels (presumably starting from 3.16)
the vDSO has been significantly reworked, so every architecture
must have its own vDSO handling engine (just like the kernel does).
So in this patch we make the vDSO code arch-specific, and because
aarch64 doesn't actually implement proxification yet due to
kernel restrictions -- we drop it. When there is
kernel support we will bring it back in a proper arch/aarch64
implementation.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* VDSO is never mapped with MAP_GROWSDOWN
* The first page of a growsdown vma may be a guard page, so any attempt
to read it ends in SIGBUS.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
aarch64: TLS register checkpoint/restore implementation by Christopher Covington.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This will make comparison with other ports easier.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the files arch/$ARCH/include/asm/int.h to
include/asm-generic/int.h and makes the types {u,s}{8,16,32}
aliases of the fixed-size integer types [u]int{8,16,32}_t.
This makes it possible to use a single set of integer typedefs
in all architectural ports.
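A sketch of what such a generic header might contain, covering only the types named above. The exact contents of include/asm-generic/int.h are not shown in this message, so treat this as an illustration.

```c
#include <stdint.h>

/* Unsigned aliases of the fixed-size integer types. */
typedef uint8_t		u8;
typedef uint16_t	u16;
typedef uint32_t	u32;

/* Signed aliases of the fixed-size integer types. */
typedef int8_t		s8;
typedef int16_t		s16;
typedef int32_t		s32;
```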
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso-pie.c into machine-dependent
and machine-independent parts by moving the routines vdso_fill_symtable(),
vdso_proxify(), and vdso_remap() to the file pie/vdso.c.
The ARM version of the routines is moved to the source pie/vdso-stub.c
to provide the vDSO proxy stub implementation for architectures
that don't provide the vDSO.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These variables are architecture-specific so they shouldn't appear
in the upcoming generic version of the routine vdso_fill_symtable().
At first glance it seems it's better to make these variables global
and reference them as external in the generic version of the routine.
However experiments with the vDSO restore routine for AArch64 showed
that the AArch64 compiler uses the GOT to access such variables
rendering our blobs unrelocatable.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch makes the name of the vDSO symbol name table
a bit more appropriate for the generic version of
the routine vdso_fill_symtable().
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the enum VDSO_SYMBOL_* and macros VDSO_SYMBOL_*_NAME
to the x86 specific header since different architectures export
different symbols from their vDSOs.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso.c into machine-independent
and machine-dependent parts by moving the routine vdso_init()
from the file vdso.c. The routine seems to be suitable for all
architectures supporting the vDSO.
The ARM version of the routine is moved to the source vdso-stub.c
that is supposed to be the vDSO proxy stub implementation for
architectures that don't provide the vDSO. The build scripts are
adjusted as well to enable selection between the full-fledged
and stub vDSO proxy implementations.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Supported machine architectures provide TLS storage of different sizes:
the TLS storage is 24 bytes on x86-64, 4 bytes on ARM
and 8 bytes on the upcoming AArch64. This means every supported architecture
needs a specific type to store the value of the TLS register.
This patch reworks the interface of the routines arch_get_tls()
and restore_tls(), passing them the TLS storage by pointer
rather than by value to simplify the TLS stub for x86.
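The by-pointer interface could look roughly like the sketch below. The sketch_tls_t definitions only mirror the sizes quoted above, and the "register" is a plain static variable standing in for the real TLS register; none of the names are taken from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-architecture TLS storage types. */
#if defined(__aarch64__)
typedef uint64_t sketch_tls_t;				/* 8 bytes */
#elif defined(__arm__)
typedef uint32_t sketch_tls_t;				/* 4 bytes */
#else
typedef struct { uint64_t slot[3]; } sketch_tls_t;	/* 24 bytes */
#endif

/* Stand-in for the hardware TLS register, for illustration only. */
static sketch_tls_t sketch_fake_tls_register;

/* The callers never copy the storage, whatever its size is. */
static void sketch_get_tls(sketch_tls_t *ptls)
{
	memcpy(ptls, &sketch_fake_tls_register, sizeof(*ptls));
}

static void sketch_restore_tls(const sketch_tls_t *ptls)
{
	memcpy(&sketch_fake_tls_register, ptls, sizeof(*ptls));
}
```

With pointers, generic code can hold an opaque TLS value without knowing whether it is a scalar or a structure.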
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
arch/x86/crtools.c: In function ‘arch_alloc_thread_info’:
arch/x86/crtools.c:271:6: error: ‘with_xsave’ may be used uninitialized in this function [-Werror=uninitialized]
Actually with_xsave depends on with_fpu, but some gcc
versions can't figure that out :\
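One common way to silence such a warning is to initialize the dependent variable at its declaration. The sketch below shows the pattern with a hypothetical function; whether the actual fix looks like this is an assumption, since the diff is not part of this message.

```c
#include <stdbool.h>

/*
 * Hypothetical reduction of the pattern: with_xsave is only assigned
 * and only read when with_fpu is true, which some compilers fail to
 * prove. Returns a bitmask: bit 0 for fpu, bit 1 for xsave.
 */
static int sketch_thread_info_flags(bool cpu_has_fpu, bool cpu_has_xsave)
{
	bool with_fpu, with_xsave = false;	/* explicit init, no warning */
	int flags = 0;

	with_fpu = cpu_has_fpu;
	if (with_fpu)
		with_xsave = cpu_has_xsave;

	if (with_fpu) {
		flags |= 1;
		if (with_xsave)
			flags |= 2;
	}
	return flags;
}
```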
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>