A process can be restored in a new pidns. close_old_fds() opens
the /proc/PID directory. Without this patch we can see errors like this:
(00.333915) 1: Error (util.c:102): Unable to close fd 6: Bad file descriptor
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The root mount is an external mount and its source can be not '/'.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The source of the root mount may be not equal to "/" and we need to take
this fact into account, when we bind-mount it to somewhere.
For example:
11877 ? Ss 0:00 ./bind-mount --pidfile=bind-mount.pid --outfile=bind-mount.out --dirname=bind-mount.test
11880 ? Ss 0:00 \_ ./bind-mount --pidfile=bind-mount.pid --outfile=bind-mount.out --dirname=bind-mount.test
[root@avagin-fc19-cr crtools]# cat /proc/11880/mountinfo
68 42 8:3 /root/git/crtools/test / rw,relatime - ext4 /dev/sda3 rw,data=ordered
43 68 0:33 / /proc rw,relatime - proc proc rw
44 68 0:34 / /dev/pts rw,relatime - devpts pts rw,mode=666,ptmxmode=666
45 68 8:3 /root/git/crtools/test/zdtm/live/static/bind-mount.test/test /zdtm/live/static/bind-mount.test/bind rw,relatime - ext4 /dev/sda3 rw,data=ordered
The 45 mount is bind-mount of the 68 mount.
mi(45)->root = /root/git/crtools/test/zdtm/live/static/bind-mount.test/test
mi(68)->root = /root/git/crtools/test
so the comman part is "/root/git/crtools/test" and the command is
mount --bind /zdtm/live/static/bind-mount.test/test /zdtm/live/static/bind-mount.test/bind
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These messages are constructed in the same spirit as ARM and x86 ones
except for two major points:
* general-purpose registers are stored in a variable-length array
of uint64's: the architecture provides 32 general-purpose registers
that makes it unfeasible to create a separate protobuf field
for each of them since it requires a lot of "copy-paste" to convert
between the struct pt_regs and protobuf message; the length of
the array storing registers is to be checked by the architecture-
dependent CRIU code;
* AArch64 FP/SIMD registers are 128 bit long while protobuf lacks
the support for integers of this size; the FP/SIMD registers
are stored in an array of uint64, two consecutive elements
of the array represent a single FP/SIMD register.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
CRIU can handle stopped multithreaded processes when all threads
are stopped. Refine the check to allow this case.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be used for restoring files from proper mounts.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to parse fdinfo for getting mnt_id,
so we can take there pos and flags and don't call
fcntl and lseek for that.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For all other tasks only unsed service descriptors will be closed.
This change allows to have file descriptors, which may be used for
restoring namespaces. All non-server descriptors must be closed before
restoring files.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Opening /dev/null may fail, check for ret code.
CID 1168167
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a condition
BUG_ON(kid > UINT_MAX);
but kid is unsigned int so it's never bigger than UINT_MAX,
use unsigned long instead.
CID 1042296
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The type of the field ucontext::uc_sigmask isn't k_rtsigset_t
if the struct ucontext is imported from system headers
rather than provided by an architecture-specific header.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Cc: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the files arch/$ARCH/include/asm/int.h to
include/asm-generic/int.h and makes the types {u,s}{8,16,32}
be aliases of the fixed sized integer types [u]int{8,16,32}_t.
This makes it possible to use single set of integer typedefs
in all architectural ports.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The devpts instance was mounted w/o the newinstance option if,
the device number is equal to the root /dev/pts.
I think this condition is strong enough to not mount devpts in a
temporary place.
v2: move the host.bla-bla-bla in kerndat.c
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fixes two issues with efe594f8f4 "criu: fix filemap open permissions":
- Permissions on files with both open file descriptors and mappings.
- Restore compatibility with dumps created by previous versions of criu.
Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
maps03 should have caught the bug fixed by 288cf51741 "restore: mutate
tgt_addr in map_private_vma", but didn't because integer literals
(defaulting to 32-bit ints) were shifted out of range.
Signed-off-by: Jamie Liu <jamieliu@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
An mmaped file is opened O_RDONLY or O_RDWR depending on the permissions
on the first vma dump_task_mm() encounters mapping that file. This
causes two problems:
1. If a file has multiple MAP_SHARED mappings, some of which are
read-only and some of which are read-write, and the first encountered
mapping happens to be read-only, the file will be opened O_RDONLY
during restore, and mmap(PROT_WRITE) will fail with EACCES, causing
the restore to fail.
2. If a file is opened read-write and mapped read-only, it will be
opened O_RDONLY during restore, so restore will succeed, but
mprotect(PROT_WRITE) on the read-only mapping after restore will
fail.
To fix both of these, record open flags per-vma based on the presence of
VM_MAYWRITE in smaps.
Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
prepare_mappings() uses the return value of map_private_vma() for the
size of the mapped vma. Unfortunately the return value of
map_private_vma() is an int, resulting in breakage when the size exceeds
31 bits. Change map_private_vma() to return only an error code, and
mutate addr in-place.
Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Using O_PATH known to be buggy on 3.11 kernel so use
direct link reading procedure here.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
post-restore script can fail, so it can't be called after network-unlock.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is responsibility of this command.
All service FDs are closed before we call execvp(). Standad output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.
Two directions were researched in order to integrate CRIU and LXC:
1. We tell to CRIU, that after restoring container is should execve()
lxc properly explaining to it that there's a new container hanging
around.
2. We make LXC set himself as child subreaper, then fork() criu and ask
it to detach (-d) from restore container afterwards. Being a subreaper,
it should get the container's init into his child list after it.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Don't need it, also add DEFINES into try-cc,
after all we define a number of things in
this variable and it's better to pass it
to tests for conditional compilation.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Ruslan Kuprieiv <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It takes only two arguments. Note it's not error since
we don't even reference to a third argument here but
just to be consistent and clear.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Ruslan Kuprieiv <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we marks all mounts as private before restoring mntns. We do
these to avoid problem with pivot_root.
It's wrong, because the root mount can be slave for an external shared
group. The root mount is not mounted by CRIU, so here is nothing wrong.
Now look at the pivot_root code in kernel
if (IS_MNT_SHARED(old_mnt) ||
IS_MNT_SHARED(new_mnt->mnt_parent) ||
IS_MNT_SHARED(root_mnt->mnt_parent))
goto out4;
So we don't need to change options for all mounts. We need to remount
/ and the parent of the new root. It's safe, because we already in another
mntns.
v2: simplify code
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The root mount isn't always private. For example it is mounted
as a slave in LXC 1.0 containers. So we need to execute logic
about propogation for the root mount too.
v2: move all logic about the root mount in a separate function
v3: make code more readable
v4: do_mount_root() looks like other do_*_root() functions
Reported-by: David Shwatrz <dshwatrz@gmail.com>
Cc: David Shwatrz <dshwatrz@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The current code think that /vz/lxc/centos-6-x86_64-root is
in /vz/lxc/centos-6-x86_64.
If the path is not equal to mountpoint, we need to check, that
path contains a slash after mountpoint.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso-pie.c into machine-dependent
and machine-independent parts by moving the routines vdso_fill_symtable(),
vdso_proxify(), and vdso_remap() to the file pie/vdso.c.
The ARM version of the routines is moved to the source pie/vdso-stub.c
to provide the vDSO proxy stub implementation for architectures
that don't provide the vDSO.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These variables are architecture-specific so they shouldn't appear
in the upcoming generic version of the routine vdso_fill_symtable().
At first glance it seems it's better to make these variables global
and reference them as external in the generic version of the routine.
However experiments with the vDSO restore routine for AArch64 showed
that the AArch64 compiler uses the GOT to access such variables
rendering our blobs unrelocatable.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch makes the name of the vDSO symbol name table
a bit more appropriate for the generic version of
the routine vdso_fill_symtable().
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the enum VDSO_SYMBOL_* and macros VDSO_SYMBOL_*_NAME
to the x86 specific header since different architectures export
different symbols from their vDSOs.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso.c into machine-independent
and machine-dependent parts by moving the routine vdso_init()
from the file vdso.c. The routine seems to be suitable for all
architectures supporting the vDSO.
The ARM version of the routine is moved to the source vdso-stub.c
that is supposed to be the vDSO proxy stub implementation for
architectures that don't provide the vDSO. The build scripts are
adjusted as well to enable selection between the full-fledged
and stub vDSO proxy implementations.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Supported machine architectures provide TLS stogares of different sizes:
the size of the TLS storage in x86-64 is 24 bytes, ARM --- 4 bytes
and upcoming AArch64 --- 8 bytes. This means every supported architecture
needs a specific type to store the value of the TLS register.
This patch reworks the insterface of the routines arch_get_tls()
and restore_tls() passing them the TLS storage by pointer
rather than by value to simplify the TLS stub for x86.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For example restore_shmem_content allocates the page_read structure on
stack.
Cc: Pavel Emelyanov <xemul@parallels.com>
Reported-by: Jenkins Criuovich
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>