An mmaped file is opened O_RDONLY or O_RDWR depending on the permissions
on the first vma dump_task_mm() encounters mapping that file. This
causes two problems:
1. If a file has multiple MAP_SHARED mappings, some of which are
read-only and some of which are read-write, and the first encountered
mapping happens to be read-only, the file will be opened O_RDONLY
during restore, and mmap(PROT_WRITE) will fail with EACCES, causing
the restore to fail.
2. If a file is opened read-write and mapped read-only, it will be
opened O_RDONLY during restore, so restore will succeed, but
mprotect(PROT_WRITE) on the read-only mapping after restore will
fail.
To fix both of these, record open flags per-vma based on the presence of
VM_MAYWRITE in smaps.
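
A minimal sketch of the per-vma decision, assuming the "VmFlags:" line of
the vma's smaps entry is already available as a string ("mw" is the smaps
mnemonic for VM_MAYWRITE). The helper names and the extra "shared"
parameter are hypothetical and are not taken from the actual patch; the
sketch assumes a private mapping never needs a writable descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the two-letter mnemonic appears as a whole token in a
 * "VmFlags: rd mr mw me sd" style line. */
static bool vmflags_has(const char *line, const char *mnemonic)
{
	const char *p = strchr(line, ':');

	assert(strlen(mnemonic) == 2);
	p = p ? p + 1 : line;

	while (*p) {
		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, mnemonic, 2) == 0 &&
		    (p[2] == ' ' || p[2] == '\n' || p[2] == '\0'))
			return true;
		/* skip the rest of the current token */
		while (*p && *p != ' ' && *p != '\t')
			p++;
	}
	return false;
}

/* Hypothetical helper: pick open(2) flags for the file backing one vma.
 * Assumption: only shared mappings ever write through to the file. */
static int vma_open_flags(const char *vmflags_line, bool shared)
{
	if (!shared)
		return O_RDONLY;
	return vmflags_has(vmflags_line, "mw") ? O_RDWR : O_RDONLY;
}
```

With this, a shared mapping that is currently read-only but carries "mw"
is still reopened O_RDWR, so a later mprotect(PROT_WRITE) can succeed.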
Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
prepare_mappings() uses the return value of map_private_vma() for the
size of the mapped vma. Unfortunately the return value of
map_private_vma() is an int, resulting in breakage when the size exceeds
31 bits. Change map_private_vma() to return only an error code, and
mutate addr in-place.
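
The calling convention described above can be sketched as follows. The
function map_one_vma() is a hypothetical stand-in, not the real
map_private_vma(); it only shows why the size no longer travels through
an int return value (a 3 GiB size does not fit into 31 bits).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for map_private_vma(): "maps" size bytes at
 * *addr, returns 0 or a negative errno, and advances *addr past the
 * mapping instead of returning the size. */
static int map_one_vma(uint64_t *addr, uint64_t size)
{
	assert(addr != NULL);

	if (size == 0 || *addr > UINT64_MAX - size)
		return -EINVAL;

	/* a real implementation would mmap()/mremap() here */
	*addr += size;
	return 0;
}
```

The caller checks only the error code and reads the new position from
addr, so sizes of any width supported by the address type work.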
Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
O_PATH is known to be buggy on the 3.11 kernel, so use the
direct link reading procedure here.
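
One way to read a link directly, without an O_PATH open, is to call
readlink() on the /proc/self/fd entry. This is only a sketch under the
assumption that /proc is mounted; read_fd_link() is a hypothetical name
and may differ from what the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve what fd points to by reading the
 * /proc/self/fd/<fd> symlink. Returns the length, or -1 on error. */
static ssize_t read_fd_link(int fd, char *buf, size_t size)
{
	char link[64];
	ssize_t len;

	assert(size > 0);
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);

	len = readlink(link, buf, size - 1);
	if (len < 0)
		return -1;
	buf[len] = '\0';	/* readlink() does not NUL-terminate */
	return len;
}
```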
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The post-restore script can fail, so it can't be called after network-unlock.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is the responsibility of this command.
All service FDs are closed before we call execvp(). Standard output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.
Two directions were researched in order to integrate CRIU and LXC:
1. We tell CRIU that after restoring the container it should execve()
lxc, properly explaining to it that there's a new container hanging
around.
2. We make LXC set itself as child subreaper, then fork() criu and ask
it to detach (-d) from the restored container afterwards. Being a subreaper,
it should get the container's init into its child list after that.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We don't need it. Also add DEFINES into try-cc:
after all, we define a number of things in
this variable, and it's better to pass it
to tests for conditional compilation.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Ruslan Kuprieiv <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It takes only two arguments. Note that this is not an error, since
we don't even reference a third argument here; the change is
just to be consistent and clear.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Ruslan Kuprieiv <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we mark all mounts as private before restoring the mntns. We do
this to avoid problems with pivot_root.
This is wrong, because the root mount can be a slave of an external shared
group. The root mount is not mounted by CRIU, so there is nothing wrong with that.
Now look at the pivot_root code in the kernel:
	if (IS_MNT_SHARED(old_mnt) ||
	    IS_MNT_SHARED(new_mnt->mnt_parent) ||
	    IS_MNT_SHARED(root_mnt->mnt_parent))
		goto out4;
So we don't need to change options for all mounts. We only need to remount
/ and the parent of the new root. It's safe, because we are already in another
mntns.
v2: simplify code
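
The pivot_root condition quoted in this message can be modelled with a
toy structure to see why unrelated mounts don't matter. struct toy_mnt
and pivot_root_refused() are illustrative names only; this is not kernel
or CRIU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a mount: only the fields the quoted check looks at. */
struct toy_mnt {
	bool shared;
	struct toy_mnt *parent;
};

/* Mirrors the quoted condition: pivot_root() is refused if any of
 * these three mounts is shared. */
static bool pivot_root_refused(const struct toy_mnt *old_mnt,
			       const struct toy_mnt *new_mnt,
			       const struct toy_mnt *root_mnt)
{
	assert(new_mnt->parent != NULL && root_mnt->parent != NULL);

	return old_mnt->shared ||
	       new_mnt->parent->shared ||
	       root_mnt->parent->shared;
}
```

Only the three mounts named in the check decide the outcome; a shared
mount elsewhere in the tree does not make pivot_root fail.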
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The root mount isn't always private. For example it is mounted
as a slave in LXC 1.0 containers. So we need to execute the
propagation logic for the root mount too.
v2: move all logic about the root mount in a separate function
v3: make code more readable
v4: do_mount_root() looks like other do_*_root() functions
Reported-by: David Shwatrz <dshwatrz@gmail.com>
Cc: David Shwatrz <dshwatrz@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The current code thinks that /vz/lxc/centos-6-x86_64-root is
in /vz/lxc/centos-6-x86_64.
If the path is not equal to the mountpoint, we need to check that
the path contains a slash right after the mountpoint.
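
The check can be sketched as below. path_is_under() is a hypothetical
helper written for illustration; the special handling of "/" and of a
trailing slash is an assumption, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if path is the mountpoint itself or lies
 * below it. A plain prefix match is not enough. */
static bool path_is_under(const char *path, const char *mountpoint)
{
	size_t len = strlen(mountpoint);

	assert(len > 0);

	/* ignore a trailing slash on the mountpoint, but keep "/" itself */
	if (len > 1 && mountpoint[len - 1] == '/')
		len--;

	if (strncmp(path, mountpoint, len) != 0)
		return false;

	if (len == 1 && mountpoint[0] == '/')
		return true;	/* every absolute path is under "/" */

	/* equal to the mountpoint, or a slash right after it */
	return path[len] == '\0' || path[len] == '/';
}
```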
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso-pie.c into machine-dependent
and machine-independent parts by moving the routines vdso_fill_symtable(),
vdso_proxify(), and vdso_remap() to the file pie/vdso.c.
The ARM version of the routines is moved to the source pie/vdso-stub.c
to provide the vDSO proxy stub implementation for architectures
that don't provide the vDSO.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These variables are architecture-specific so they shouldn't appear
in the upcoming generic version of the routine vdso_fill_symtable().
At first glance it seems it's better to make these variables global
and reference them as external in the generic version of the routine.
However experiments with the vDSO restore routine for AArch64 showed
that the AArch64 compiler uses the GOT to access such variables
rendering our blobs unrelocatable.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch makes the name of the vDSO symbol name table
a bit more appropriate for the generic version of
the routine vdso_fill_symtable().
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the enum VDSO_SYMBOL_* and macros VDSO_SYMBOL_*_NAME
to the x86 specific header since different architectures export
different symbols from their vDSOs.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso.c into machine-independent
and machine-dependent parts by moving the routine vdso_init()
from the file vdso.c. The routine seems to be suitable for all
architectures supporting the vDSO.
The ARM version of the routine is moved to the source vdso-stub.c
that is supposed to be the vDSO proxy stub implementation for
architectures that don't provide the vDSO. The build scripts are
adjusted as well to enable selection between the full-fledged
and stub vDSO proxy implementations.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Supported machine architectures provide TLS storage of different sizes:
the size of the TLS storage on x86-64 is 24 bytes, on ARM 4 bytes,
and on the upcoming AArch64 8 bytes. This means every supported architecture
needs a specific type to store the value of the TLS register.
This patch reworks the interface of the routines arch_get_tls()
and restore_tls() passing them the TLS storage by pointer
rather than by value to simplify the TLS stub for x86.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For example restore_shmem_content allocates the page_read structure on
stack.
Cc: Pavel Emelyanov <xemul@parallels.com>
Reported-by: Jenkins Criuovich
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
bers stands for berserker, which should eat computer
resources, emulating "load" for CRIU performance testing.
It's still far from being complete but can do trivial things:
- generate mmaps with dirtied memory
- open files
Nothing serious.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the final size of all pages images is zero, then dedup works.
Usage:
bash test/zdtm.sh --auto-dedup
bash test/zdtm.sh --auto-dedup static/maps04
bash test/zdtm.sh -P -i 3 --auto-dedup -t transition/maps007
Changes: applicable to all tests, -ad changed to --auto-dedup,
simplified, check all private pages images, check shmem images
and go with the shmem patch set.
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In restore_shmem_content(), use open_page_read() to open images.
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Over time some files become obsolete and might be missing
from the checkpoint image set, but to keep backward compatibility we
still try to open them, which might print an error like
| Unable to open 'path-to-file'
and confuse a reader as to why criu prints an error but continues working.
To eliminate this problem the O_OPT flag was introduced in
commit 16b5692061; it suppresses printing of the error message
if the flag is set.
Now start using O_OPT in the following functions:
- open_irmap_cache: the irmap cache is a relatively new optional feature
- prepare_rlimits, open_signal_image, restore_file_locks,
prepare_fd_pid, prepare_mm_pid, collect_image: all these
helpers try to open image files which can be missing.
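
The behaviour of an "optional" open flag can be sketched like this.
IMG_O_OPT, its value, open_image_sketch() and the error counter are all
made up for the illustration; CRIU's real O_OPT definition and image
helpers may look different.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a private flag bit that is masked out before open(2);
 * the value is illustrative, not CRIU's actual O_OPT definition. */
#define IMG_O_OPT	0x40000000

static int img_errors_printed;	/* lets callers observe the logging */

/* Hypothetical wrapper: open an image file. A missing file is reported
 * only when the caller did not mark the image as optional. */
static int open_image_sketch(const char *path, int flags)
{
	int fd;

	assert(path != NULL);
	fd = open(path, flags & ~IMG_O_OPT);
	if (fd < 0) {
		int err = errno;

		if (!((flags & IMG_O_OPT) && err == ENOENT)) {
			fprintf(stderr, "Unable to open %s: %s\n",
				path, strerror(err));
			img_errors_printed++;
		}
		errno = err;	/* fprintf() may have clobbered it */
	}
	return fd;
}
```

The caller still gets a negative return value for a missing optional
image and decides itself how to continue; only the message is dropped.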
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On PI machine we've got
| CC protobuf.o
| pstree.c: In function ‘core_entry_alloc’:
| pstree.c:36:10: error: ‘RLIM_NLIMITS’ undeclared (first use in this function)
due to old kernel headers. Note I've dropped the
BUG_ON here to localize all things in the pstree code;
there is no need to sprinkle constants.
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Not all distros (e.g. RPi) provide the O_PATH definition,
so include fcntl.h here so that we don't hit a compilation
problem like
| CC image.o
| image.c: In function ‘open_image_at’:
| image.c:187:29: error: ‘O_PATH’ undeclared (first use in this function)
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The array element is a properly initialized RlimitEntry;
there is no need for additional memcpy-s and size-checks.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
From cb9c3953beac7d42de80635e7a6e537cc867c479 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Thu, 13 Mar 2014 14:24:50 +0400
Subject: [PATCH 7/7] rlimit: Move CR_FD_RLIMIT out of _CR_FD_TASK
This image is deprecated now so move it out of
_CR_FD_TASK thus we won't be even generating it
on the dump.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We're using the new image format, but the old image file
is still generated. This will be addressed in the
next patch.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To preserve backward compatibility, try to read
data from the old image if the Core entry doesn't
have rlimits bound.
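
The fallback order can be sketched as follows. The structures are
heavily reduced, hypothetical stand-ins for the real image entries, and
pick_rlimits() is not a CRIU function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the image entries. */
struct rlim_entry {
	unsigned long cur, max;
};

struct core_sketch {
	size_t n_rlimits;	/* 0 when the image predates the change */
	const struct rlim_entry *rlimits;
};

/* Prefer rlimits carried inside the core entry; otherwise fall back to
 * the ones read from the old standalone image. Returns the number of
 * entries and stores the chosen array in *out. */
static size_t pick_rlimits(const struct core_sketch *core,
			   const struct rlim_entry *old_img, size_t n_old,
			   const struct rlim_entry **out)
{
	assert(core != NULL && out != NULL);

	if (core->n_rlimits && core->rlimits) {
		*out = core->rlimits;
		return core->n_rlimits;
	}

	*out = old_img;
	return n_old;
}
```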
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Note the restore remains as is for a while, it'll
be addressed later.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On Thu, Mar 13, 2014 at 04:20:37PM +0400, Pavel Emelyanov wrote:
>
> Would you rework this patch on top of my recent
> "allocate Core in on xmalloc call" one?
Attached.
From c2233d4fafce30c4e7214a1a7ab3677824a30d75 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Thu, 13 Mar 2014 16:57:14 +0400
Subject: [PATCH] rlimit: Allocate and free appropriate Core entry
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will carry rlimits inside Core itself instead of separate image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows us to distinguish the situation where the image
to be opened is missing but optional, thus no error message
should be printed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
arch/x86/crtools.c: In function ‘arch_alloc_thread_info’:
arch/x86/crtools.c:271:6: error: ‘with_xsave’ may be used uninitialized in this function [-Werror=uninitialized]
Actually with_xsave depends on with_fpu, but some gccs
can't guess that fact :\
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For example, opening a FIFO blocks until the other end is opened as well.
We can use O_PATH to avoid sleeping in the open() call.
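
A small sketch of the idea on Linux: an O_PATH open only resolves the
path and does not open the file itself, so it does not wait for a FIFO
peer. path_is_fifo() is a hypothetical helper, and the fallback O_PATH
value is the asm-generic one, which is an assumption that does not hold
on every architecture.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes O_PATH only with this */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000	/* assumption: asm-generic Linux value */
#endif

/* Hypothetical helper: get a handle on path without triggering the
 * file's open semantics, then look at its type.
 * Returns 1 for a FIFO, 0 for anything else, -1 on error. */
static int path_is_fifo(const char *path)
{
	struct stat st;
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_PATH);	/* does not wait for a FIFO peer */
	if (fd < 0)
		return -1;

	ret = fstat(fd, &st);		/* needs Linux 3.6+ on O_PATH fds */
	close(fd);
	if (ret < 0)
		return -1;

	return S_ISFIFO(st.st_mode) ? 1 : 0;
}
```

A plain open(path, O_RDONLY) on the same FIFO would sleep until a writer
shows up, unless O_NONBLOCK is passed.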
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's going to be used for restoring namespaces. For example we need to
enumerate the ns_ids list for restoring mount namespaces.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The implementation of bit operations for ARM isn't actually
architecture-specific, so it should rather be shared with
the upcoming port for AArch64, which won't provide an optimized
implementation of bit operations.
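
For illustration, architecture-independent bit operations can be written
in plain C as below. The *_sketch names are hypothetical and the
operations are not atomic; this is not the code being moved by the patch.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr in the bitmap starting at addr (non-atomic). */
static inline void set_bit_sketch(unsigned int nr, unsigned long *addr)
{
	assert(addr);
	addr[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
}

/* Clear bit nr in the bitmap starting at addr (non-atomic). */
static inline void clear_bit_sketch(unsigned int nr, unsigned long *addr)
{
	assert(addr);
	addr[nr / BITS_PER_ULONG] &= ~(1UL << (nr % BITS_PER_ULONG));
}

/* Return 1 if bit nr is set, 0 otherwise. */
static inline int test_bit_sketch(unsigned int nr, const unsigned long *addr)
{
	assert(addr);
	return (int)((addr[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1UL);
}
```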
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>