mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 05:48:05 +00:00

729 Commits

Author SHA1 Message Date
Cyrill Gorcunov
fa0587ed81 restore: Use min_t helper for type casting
On ARM the build fails with:

 | CC       crtools.o
 | In file included from arch/arm/include/asm/bitops.h:4:0,
 |                  from arch/arm/include/asm/types.h:9,
 |                  from include/proc_parse.h:5,
 |                  from include/ptrace.h:8,
 |                  from cr-restore.c:27:
 | cr-restore.c: In function 'restore_priv_vma_content':
 | include/compiler.h:60:17: error: comparison of distinct pointer types lacks a cast [-Werror]
 |   (void) (&_min1 == &_min2);  \
 |

Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 14:57:05 +03:00
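The compiler error in fa0587ed81 comes from the type check built into kernel-style min(). Below is a minimal sketch of that pattern and of min_t(), assuming GCC (it relies on __typeof__ and statement expressions). The macros are modelled on the idea in include/compiler.h, not copied from CRIU, and nr_to_copy() is a hypothetical caller.

```c
/* Kernel-style min(): the pointer comparison's result is thrown away, but
 * GCC warns ("comparison of distinct pointer types lacks a cast") whenever
 * x and y have different types, and -Werror makes that warning fatal. */
#define min(x, y) ({                    \
	__typeof__(x) _min1 = (x);      \
	__typeof__(y) _min2 = (y);      \
	(void) (&_min1 == &_min2);      \
	_min1 < _min2 ? _min1 : _min2; })

/* min_t(): convert both operands to an explicitly named type first, so the
 * comparison is always between values of the same type. */
#define min_t(type, x, y) ({            \
	type _t1 = (x);                 \
	type _t2 = (y);                 \
	_t1 < _t2 ? _t1 : _t2; })

/* Hypothetical caller: the operands have different types, so min() would
 * break a -Werror build, while min_t() compiles cleanly. */
static unsigned long nr_to_copy(unsigned long left, unsigned int batch)
{
	return min_t(unsigned long, left, batch);
}
```

The explicit type in min_t() documents the intended conversion at the call site instead of hiding it behind a cast.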
Cyrill Gorcunov
c63a42ac2f restore: Use bitmap_set helper
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 11:15:13 +03:00
Pavel Emelyanov
03b217c0a6 restore: Restore as many pages at once as possible
When the VMA being restored is not COW-ed, we read pages from the image
one by one, which results in suboptimal pages.img access. Fix this
by reading as many pages from the image at once as possible within the
active pagemap and VMA.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 11:14:44 +03:00
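The batching described in 03b217c0a6 boils down to finding how far one read may extend: up to whichever ends first, the current pagemap entry or the VMA. A minimal sketch, assuming both bounds are page-aligned addresses; pages_in_one_read() is a hypothetical helper, not CRIU's page-read API.

```c
#include <stddef.h>

/* Number of pages that can be read with a single request starting at
 * addr, bounded by both the active pagemap entry and the VMA. */
static size_t pages_in_one_read(unsigned long addr,
				unsigned long pagemap_end,
				unsigned long vma_end,
				unsigned long page_size)
{
	unsigned long end = pagemap_end < vma_end ? pagemap_end : vma_end;

	if (addr >= end)
		return 0;
	return (end - addr) / page_size;
}
```

The caller would then issue one read of `nr * page_size` bytes instead of `nr` single-page reads.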
Pavel Emelyanov
780d699401 page-read: Teach page-read to read multiple pages at once
This is a preparatory patch; the problem to solve is described in
the next one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 11:14:43 +03:00
Tycho Andersen
934c312554 rst: unmap restore memory after seccomp restore
In order to restore seccomp filters, we need access to dynamically
allocated memory from the restorer blob, and this memory must be unmapped
afterwards. To make that possible, we need to suspend seccomp earlier: right
after we attach to the tasks, instead of just before we unmap the
restorer blob itself.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-11 15:57:26 +03:00
Pavel Emelyanov
6f0681c1b1 Revert "rst: Re-use opened fd when restoring private mappings"
This reverts commit 73cb87f9182bf46fceacde1e9023d8d5cdf99de6.

Two reasons: individual VMA-s may require different open flags
and ghost and link-remap files should be properly unlinked at
the end of open_path().

A more intelligent solution is needed.
2015-11-10 17:20:55 +03:00
Pavel Emelyanov
73cb87f918 rst: Re-use opened fd when restoring private mappings
On restore we do a sequence of open+mmap+close steps. On real apps
there exist chains of private file mappings for the same file with
different pgoffs and/or flags/prots.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-10 15:59:28 +03:00
Pavel Emelyanov
68baf8e77d criu: Fault injection core
This patch(set) is inspired by a similar one from Andrey Vagin, sent
some time earlier.

The major idea is to artificially fail criu dump or restore at
specific places and let zdtm tests check whether failed dump
or restore resulted in anything bad.

This particular patch introduces the ability to tell criu "fail
at X point". Each point is specified with an integer constant,
and subsequent patches will add places in the code that check
whether a specific fail code is set and fail accordingly.

Two points are introduced -- early on dump, right after loading
the parasite and right after creation of the root task.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-19 12:42:29 +03:00
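A minimal sketch of the "fail at X point" idea from 68baf8e77d. The enum values, the env-variable lookup and the function names are all hypothetical illustrations; criu's actual names and the way the fail code is passed in may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fault points, each identified by an integer constant. */
enum fault_point {
	FI_NONE = 0,
	FI_DUMP_PARASITE = 1,     /* hypothetical: after loading the parasite */
	FI_RESTORE_ROOT_TASK = 2, /* hypothetical: after creating the root task */
};

static enum fault_point fi_strategy = FI_NONE;

/* Assumption: the point to fail at comes from an environment variable
 * whose name is supplied by the caller. */
static void fault_injection_init(const char *env_name)
{
	const char *val = getenv(env_name);

	fi_strategy = val ? (enum fault_point)atoi(val) : FI_NONE;
}

/* Code at each fault point asks whether it should fail artificially. */
static bool fault_injected(enum fault_point fp)
{
	return fi_strategy == fp;
}
```

A check site would then look like `if (fault_injected(FI_DUMP_PARASITE)) goto err;`, letting zdtm verify that the failure path leaves nothing broken behind.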
Kir Kolyshkin
4456273738 restore_root_task(): fix calling try_clean_remaps
1. As pointed out by Coverity (CID 114629), mnt_ns_fd is closed,
but then the function calls try_clean_remaps(mnt_ns_fd)
which tries to close the file descriptor which is already closed.

To address this, let's use safe_close() which sets closed fd to -1.
As it also checks its argument, there's no need for explicit check
so let's remove "if" check before close().

2. As Pavel pointed out, "calling the whole try_clean_remaps()
is not required once we've passed the cleanup_mnt_ns() point".
This could be addressed by introducing yet another label, but
it's cleaner to just use a flag variable.

Note that since the second issue is being addressed, the first one
goes away, but let's keep the fix for it anyway, it might help in
the future.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-19 12:33:58 +03:00
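The safe_close() approach from 4456273738 works because the helper both tolerates an invalid descriptor and invalidates the caller's copy. Below is a minimal sketch of such a helper; it illustrates the idea and is not necessarily criu's exact signature or implementation.

```c
#include <unistd.h>

/* Close *fd if it is valid and reset it to -1, so a second call is a
 * harmless no-op instead of closing an unrelated, reused descriptor. */
static int safe_close(int *fd)
{
	int ret = 0;

	if (*fd >= 0) {
		ret = close(*fd);
		*fd = -1;
	}
	return ret;
}
```

Because the check lives inside the helper, call sites no longer need their own `if (fd >= 0)` guard.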
Andrew Vagin
14da0f780e restore: wait while processes are dying
If criu restore failed, criu should wait for all processes because they
hold files, namespaces and other stuff that caller might want to
have released (in our case it was ploop device).

Here we do this only for cases when processes are restored in a pid
namespace. We'd like to do the same for non-ns case, but there's
no simple way to wait for a bunch of unconnected processes.

Another good side effect is that "Restoring FAILED." will be printed
at the end of the log (currently, after we kill them, init tasks still
have time to do something and write log messages).

Cc: Nikita Spiridonov <nspiridonov@odin.com>
Reported-by: Nikita Spiridonov <nspiridonov@odin.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-14 15:49:25 +03:00
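A minimal sketch of "kill, then wait until they are really gone" from 14da0f780e. For simplicity it kills each given pid directly; in criu's pid-namespace case killing the namespace init is enough to take the rest down. kill_and_reap() and reap_all_children() are hypothetical helpers.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap children until waitpid() reports ECHILD, so everything they held
 * (files, namespaces, devices) is released. Returns how many were reaped. */
static int reap_all_children(void)
{
	int nr = 0;

	for (;;) {
		pid_t pid = waitpid(-1, NULL, 0);

		if (pid > 0) {
			nr++;
			continue;
		}
		if (errno == EINTR)
			continue;
		return nr; /* ECHILD: no children left */
	}
}

/* Failure path: kill the tasks we created, then wait for them to die
 * instead of returning while they still hold resources. */
static int kill_and_reap(const pid_t *pids, int n)
{
	for (int i = 0; i < n; i++)
		kill(pids[i], SIGKILL);
	return reap_all_children();
}
```

Waiting on `-1` only covers our own children, which matches the commit's note that unconnected processes outside a pid namespace are hard to wait for.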
Matthew Krafczyk
29c08d8672 Add pre-dump and pre-restore action scripts
This allows the user to perform actions before dumping or restoration
occurs.

Signed-off-by: Matthew Krafczyk <krafczyk.matthew@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-09 18:23:41 +03:00
Andrew Vagin
d9b1b9ff37 restore: fix checking error code of sys_sigaction
sys_sigaction() returns a negative error code rather than setting errno,
so check its result accordingly.

Reported-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-08 13:19:33 +03:00
Cyrill Gorcunov
0516001f91 restore: Report error if write into last pid failed
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-30 12:32:03 +03:00
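A minimal sketch of reporting a failed write into the last-pid file (0516001f91). It follows the usual ns_last_pid trick of writing `pid - 1` so that the next fork gets `pid`; the path is a parameter so the sketch can be exercised on an ordinary file, and set_next_pid() is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write pid - 1 to the last-pid file (normally ns_last_pid under
 * /proc/sys/kernel) and report failure instead of silently ignoring it. */
static int set_next_pid(const char *path, pid_t pid)
{
	char buf[32];
	int fd, len, ret = 0;

	fd = open(path, O_RDWR);
	if (fd < 0) {
		fprintf(stderr, "Can't open %s: %s\n", path, strerror(errno));
		return -1;
	}
	len = snprintf(buf, sizeof(buf), "%d", (int)pid - 1);
	if (write(fd, buf, len) != len) {
		fprintf(stderr, "Can't write last pid %d to %s\n",
			(int)pid - 1, path);
		ret = -1;
	}
	close(fd);
	return ret;
}
```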
Pavel Emelyanov
efa7dcf7c2 ghost: Remove ghost files if restore fails
Issue #18. When restore fails, ghost files remain. To
remove them we have to know their list, the paths to the original
files (to construct the ghost names) and the namespace the ghosts
live in.

For the latter we keep the restore task namespace at hand
until the final stage and setns into it to kill the ghosts.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 22:00:37 +03:00
Pavel Emelyanov
b0e23c3d4f files: Collect ghosts and regfiles early
Information about the presence and paths of ghosts will be needed to
remove the ghosts themselves, and thus it is needed in criu.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 22:00:35 +03:00
Pavel Emelyanov
7ca6cc1eb2 mnt: Clean roots yard from criu process
So here it is. If the root task dies on restore, the roots yard
dir remains unrmdired :( Since we already know its name, we
can remove it from criu. By the time we get to this place
the sub mount namespace(s) are already dead and the yard dir
is empty. But umounting should be done by tasks after
successful restore, so keep depopulation there.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:57:35 +03:00
Pavel Emelyanov
3e7c92ed02 mnt: Renames around roots yard
Same thing as in previous patch -- we have too many generic
clean_ and fini_ prefixes over the code. And we need more (see
next patch), so let's specify what exactly we clean or fini.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:57:21 +03:00
Pavel Emelyanov
e3f5ba3c37 ns: Prepare namespaces before tasks
There are already two things we do in criu namespaces before
forking the init task (start unsd and keep netnsfd for back
reference). Next patches will introduce the 3rd action for
mount namespaces, so have a special pre-call for all this
stuff.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:56:26 +03:00
Christopher Covington
f9ae6d9dd4 Replace remaining hard-coded TASK_SIZE use
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded. While
trivial applications successfully checkpoint and restore on AArch64
kernels with CONFIG_ARM64_64K_PAGES=y without this patch, replacing
the remaining use of the hard-coded value seems like the best way to
guard against failures that more complex process trees and future uses
may expose.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-03 17:14:19 +03:00
Christopher Covington
1438f013a2 Pass task_size to vma_area_is_private()
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded. Since
vma_area_is_private() is used by both restorer blob code and non
restorer blob code, which must use different variables for recording
the task size, make task_size a function argument and modify the call
sites accordingly. This fixes the following error on AArch64 kernels
with CONFIG_ARM64_64K_PAGES=y.

  pie: Error (pie/restorer.c:929): Can't restore 0x3ffb7e70000 mapping with 0xfffffffffffffff7

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-03 17:14:18 +03:00
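A minimal sketch of the idea in 1438f013a2: the address-space limit becomes a parameter instead of a TASK_SIZE constant, so the restorer blob and the main binary can each pass their own run-time value. The struct and function are hypothetical simplifications; criu's vma_area and its privacy check carry more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified VMA descriptor. */
struct vma_sketch {
	uint64_t start;
	uint64_t end;
	bool private_mapping;
};

/* task_size is supplied by the caller at run time, so the same binary
 * works whichever address-space limit the running kernel uses. */
static bool vma_is_private_sketch(const struct vma_sketch *vma,
				  uint64_t task_size)
{
	return vma->private_mapping && vma->end <= task_size;
}
```

With a hard-coded limit that is smaller than the kernel's, a mapping such as the one at 0x3ffb7e70000 in the log above would be misclassified.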
Christopher Covington
7451fc7d23 restorer: Replace most hard-coded TASK_SIZE use
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded.
This fixes the following error on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y.

  pie: Error (pie/restorer.c:772): Unable to unmap (-): -1211695104

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-03 17:14:17 +03:00
Andrew Vagin
f13ec96e58 restore: fix race in calculation of a number of zombies
Currently each task subtracts the number of zombies from
task_entries->nr_threads without locks, so if two tasks do this
operation concurrently, the result may be unpredictable.

https://github.com/xemul/criu/issues/13

Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-03 17:12:10 +03:00
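The race in f13ec96e58 is a classic lost update: `nr -= k` is a load, a subtract and a store, and two tasks can interleave them. One way to make the update indivisible is an atomic read-modify-write, sketched below with C11 atomics; the commit does not say which mechanism criu used, and threads stand in here for criu's tasks sharing task_entries.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared counter standing in for task_entries->nr_threads. */
static atomic_int nr_threads;

/* atomic_fetch_sub() performs the read-modify-write as one step, so
 * concurrent callers can no longer lose each other's updates. */
static void account_zombies(int nr_zombies)
{
	atomic_fetch_sub(&nr_threads, nr_zombies);
}

/* Each worker subtracts 100000 zombies, one at a time. */
static void *worker(void *arg)
{
	for (int i = 0; i < 100000; i++)
		account_zombies(1);
	return arg;
}
```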
Christopher Covington
69d008d567 Use run-time page_size() for mremap
The old and new address parameters passed to the mremap system
call must be page size aligned. On AArch64, the page size can
only be correctly determined at run time. This fixes the following
errors for CRIU on AArch64 kernels with CONFIG_ARM64_64K_PAGES=y.

      call mremap(0x3ffb7d50000, 8192, 8192, MAYMOVE | FIXED, 0x2a000)
  Error (rst-malloc.c:201): Can't mremap rst mem: Invalid argument

      call mremap(0x3ffb7d90000, 8192, 8192, MAYMOVE | FIXED, 0x32000)
  Error (rst-malloc.c:201): Can't mremap rst mem: Invalid argument

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-28 13:38:30 +03:00
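A minimal sketch of the run-time check behind 69d008d567: the page size comes from sysconf() rather than a build-time constant, and alignment is tested against it. For example, 0x2a000 from the log is a multiple of 4K but not of 64K, which is why mremap() rejected it. page_size() here is modelled on criu's helper but is an illustration.

```c
#include <stdbool.h>
#include <unistd.h>

/* Query the page size once at run time; on AArch64 it depends on the
 * kernel configuration (e.g. 4K or 64K pages). */
static unsigned long page_size(void)
{
	static unsigned long ps;

	if (!ps)
		ps = (unsigned long)sysconf(_SC_PAGESIZE);
	return ps;
}

/* mremap() requires page-aligned old and new addresses. */
static bool page_aligned(unsigned long addr)
{
	return (addr & (page_size() - 1)) == 0;
}
```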
Christopher Covington
b61224bffe Use run-time page_size() in pie_size()
This fixes the following error for CRIU on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y.

  Error (cr-restore.c:2828): Can't mmap section for restore code

This occurred because the address being requested (0x16000 in
one case) was not page aligned.

Also change the capitalization of the pie_size() macro to make it
clear that the value is not necessarily a build-time constant.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-28 13:38:20 +03:00
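The PIE_SIZE() change in b61224bffe amounts to rounding a blob size up to the page size known at run time. A minimal sketch, assuming the page size is a power of two; round_up_to_page() is a hypothetical name.

```c
#include <unistd.h>

/* Round size up to a multiple of the run-time page size. A size rounded
 * to 4K at build time (e.g. 0x16000) is not a multiple of a 64K page. */
static unsigned long round_up_to_page(unsigned long size)
{
	unsigned long ps = (unsigned long)sysconf(_SC_PAGESIZE);

	return (size + ps - 1) & ~(ps - 1);
}
```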
Tycho Andersen
5f729636b4 rst: don't hang when SIGCHLD is coalesced
When a TASK_HELPER would exit just before a zombie, sometimes the signal
would get coalesced, and we would miss the zombie exit, causing us to block
forever waiting for the zombie to complete. Let's use an entirely different
strategy for waiting on zombies: explicitly wait on them with waitid, and
use WNOWAIT to prevent their data from actually being reaped.

v2: don't decrement nr_{tasks,threads} in the loop

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-23 15:17:55 +03:00
Tycho Andersen
7b20f42f78 rst: move lsm memory allocations before rst_mem_lock()
8ffbe754bd9 moved the rst_mem_lock() call, but didn't move the
corresponding LSM allocations, so we do that here.

One unfortunate thing is that we have to split this into two steps:
first we read the creds to figure out exactly how much memory to
allocate for the lsm string. The split is needed because prepare_creds()
wants to write directly to the task_restore_args struct, which can't be
allocated until after we lock the restore memory.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-16 14:26:44 +03:00
Andrey Vagin
445dbd9d09 log: don't forget LF for pr_err()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-16 14:24:13 +03:00
Pavel Emelyanov
f231a11908 rst: Remove actually unused pid arg from restorer_get_vma_hint
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 14:03:00 +03:00
Pavel Emelyanov
6fe296a26e rst: Remove actually unused pid arg from restore_one_zombie
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 14:02:53 +03:00
Pavel Emelyanov
dc149e884d rst: Remove actually unused pid arg from prepare_mappings
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 14:02:45 +03:00
Pavel Emelyanov
73e3925bcd pstree_item: Keep has_seccomp field on rst_info tail
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 14:01:48 +03:00
Pavel Emelyanov
8ffbe754bd rst: Lock rst memory allocations earlier
Once we have got the total remappable rst memory size, we can no
longer allocate from it; otherwise the bootstrap area will not
be large enough.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 14:00:27 +03:00
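A minimal sketch of why allocations must stop once the size has been taken (8ffbe754bd): a hypothetical bump allocator that refuses to grow after it is locked, so the area sized from the locked total is always big enough. This stands in for criu's rst_mem and is not its implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bump allocator standing in for criu's rst_mem. */
struct rst_pool {
	size_t used;
	bool locked;
};

/* Returns the offset of the new chunk, or -1 once the pool is locked. */
static long pool_alloc(struct rst_pool *p, size_t size)
{
	long off;

	if (p->locked)
		return -1;
	off = (long)p->used;
	p->used += size;
	return off;
}

/* Freeze the pool and return the total size to reserve for it. */
static size_t pool_lock(struct rst_pool *p)
{
	p->locked = true;
	return p->used;
}
```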
Pavel Emelyanov
d9a9d4c9b3 rst: Fix timerfd rst memory management
It's similar to previous patch with tcp mem -- no need to
realloc big arrays and then memcpy data between them. It's
enough just to walk timerfd objects at the very end.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 13:59:39 +03:00
Pavel Emelyanov
73e303c8e2 rst: Fix rst_tcp_sock memory management
In the current scheme we grow an array with realloc()-s, then
memcpy() the result into rst_mem. I propose to get rid
of the realloc()-s (we already have objects for the data we
need to keep) and the memcpy()-s (and put objects directly
into rst_mem at the end).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 13:59:21 +03:00
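The scheme proposed in 73e303c8e2 (and reused for timerfds in d9a9d4c9b3) replaces "grow with realloc(), then memcpy()" with a single pass over objects that already exist: count them, allocate once, fill in place. A minimal sketch; the list type is hypothetical, and malloc() stands in for an allocation from rst_mem.

```c
#include <stdlib.h>

/* Hypothetical object already kept in a list while images are read. */
struct tcp_sock_obj {
	int fd;
	struct tcp_sock_obj *next;
};

/* Walk the objects once at the end: count, allocate exactly once, then
 * fill the array directly, with no realloc() and no extra memcpy(). */
static int *collect_fds(const struct tcp_sock_obj *head, size_t *nr)
{
	size_t n = 0, i = 0;
	int *arr;

	for (const struct tcp_sock_obj *o = head; o; o = o->next)
		n++;
	arr = malloc(n ? n * sizeof(*arr) : 1);
	if (!arr)
		return NULL;
	for (const struct tcp_sock_obj *o = head; o; o = o->next)
		arr[i++] = o->fd;
	*nr = n;
	return arr;
}
```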
Pavel Emelyanov
7166e3c984 rst: Fix helpers memory allocation
Calling rst_mem_alloc() in a loop with increasing size causes
n^2 memory growth :) since _alloc is not _realloc.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-14 13:59:08 +03:00
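The "n^2" in 7166e3c984 can be made concrete. Assuming the k-th call requests room for k objects of size s, and each call allocates a fresh region rather than resizing the previous one, the total consumed is:

```latex
\text{total} = \sum_{k=1}^{n} k\,s = s\,\frac{n(n+1)}{2} = O(n^2),
\qquad \text{versus } n\,s \text{ if each call only adds room for one object.}
```

For example, with n = 1000 and s = 16 bytes this is 8,008,000 bytes instead of 16,000.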
Tycho Andersen
209693d49b don't assume the kernel has CONFIG_SECCOMP
linux/seccomp.h may not be available, and the seccomp mode might not be
listed in /proc/pid/status, so let's not assume those two things are
present.

v2: add a seccomp.h with all the constants we use from linux/seccomp.h
v3: don't do a compile time check for PTRACE_O_SUSPEND_SECCOMP, just let
    ptrace return EINVAL for it; also add a checkskip to skip the
    seccomp_strict test if PTRACE_O_SUSPEND_SECCOMP or linux/seccomp.h
    aren't present.
v4: use criu check --feature instead of checkskip to check whether the
    kernel supports seccomp_suspend

Reported-by: Mr. Jenkins
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-13 14:50:35 +03:00
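A minimal sketch of the v2 approach in 209693d49b: fallback constants for builds without linux/seccomp.h, plus treating a missing "Seccomp:" line in /proc/pid/status as "disabled". The values match the Linux UAPI headers; parse_seccomp_mode() is a hypothetical helper that reads an already opened status file.

```c
#include <stdio.h>

/* Fallbacks for builds where <linux/seccomp.h> is not available. */
#ifndef SECCOMP_MODE_DISABLED
#define SECCOMP_MODE_DISABLED 0
#endif
#ifndef SECCOMP_MODE_STRICT
#define SECCOMP_MODE_STRICT 1
#endif
#ifndef SECCOMP_MODE_FILTER
#define SECCOMP_MODE_FILTER 2
#endif
#ifndef PTRACE_O_SUSPEND_SECCOMP
#define PTRACE_O_SUSPEND_SECCOMP (1 << 21)
#endif

/* Read the mode from a /proc/<pid>/status stream. Kernels built without
 * CONFIG_SECCOMP omit the line, so its absence means "disabled". */
static int parse_seccomp_mode(FILE *status)
{
	char line[256];
	int mode;

	while (fgets(line, sizeof(line), status))
		if (sscanf(line, "Seccomp: %d", &mode) == 1)
			return mode;
	return SECCOMP_MODE_DISABLED;
}
```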
Tycho Andersen
e0b24e21d3 creds: fail to dump when creds in thread group don't match
Since we don't support dumping per-thread creds, let's at least fail to
dump if the creds don't match.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-24 17:38:35 +03:00
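A minimal sketch of the dump-time check in e0b24e21d3: since only one set of creds per thread group can be restored, refuse to dump if any thread differs from the leader. The struct is a hypothetical subset of what criu actually records per task.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-thread credential snapshot. */
struct thread_creds {
	uid_t uid, euid, suid, fsuid;
	gid_t gid, egid, sgid, fsgid;
};

/* Field-by-field comparison (memcmp() could trip over padding). */
static bool creds_equal(const struct thread_creds *a,
			const struct thread_creds *b)
{
	return a->uid == b->uid && a->euid == b->euid &&
	       a->suid == b->suid && a->fsuid == b->fsuid &&
	       a->gid == b->gid && a->egid == b->egid &&
	       a->sgid == b->sgid && a->fsgid == b->fsgid;
}

/* Returns -1 if any thread's creds differ from the leader's (index 0). */
static int check_thread_creds(const struct thread_creds *creds, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (!creds_equal(&creds[0], &creds[i]))
			return -1;
	return 0;
}
```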
Tycho Andersen
0d8aec0c3a seccomp: add initial support for SECCOMP_MODE_STRICT
Unfortunately, SECCOMP_MODE_FILTER is not currently exposed to userspace,
so we can't checkpoint that. In any case, this is what we need to do for
SECCOMP_MODE_STRICT, so let's do it.

This patch works by first disabling seccomp for any processes that are going
to have seccomp filters restored, then restoring the process (including the
seccomp filters), and finally resuming the seccomp filters before detaching
from the process.

v2 changes:

* update for kernel patch v2
* use protobuf enum for seccomp type
* don't parse /proc/pid/status twice

v3 changes:

* get rid of extra CR_STAGE_SECCOMP_SUSPEND stage
* only suspend seccomp in finalize_restore(), just before the unmap
* restore the (same) seccomp state in threads too; also add a note about
  how this is slightly wrong, and that we should at least check for a
  mismatch

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-24 17:38:32 +03:00
Andrey Vagin
2a0c8db72b proc: mount proc with minimal permissions
Eric wants to restrict permissions for proc mounts in a non-root userns
in accordance with proc mounts in the root userns.

Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Fri May 8 23:49:47 2015 -0500

    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime

    Ignore an existing mount if the locked readonly, nodev or atime
    attributes are less permissive than the desired attributes
    of the new mount.
...

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-19 12:20:15 +03:00
Tycho Andersen
081a5b9e77 pie: use the /proc fd for last pid
Instead of keeping around multiple fds that point to various places in
/proc, let's just use /proc and openat() things relative to it.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-16 12:17:37 +03:00
Tycho Andersen
7083fc370d lsm: restore lsm bits per tid instead of per pid
This is a little tricky, since the threads are forked in the restorer blob, we
can't open their attr/current files to pass into the restorer blob. So, we pass
in an fd for /proc that the restorer blob can use to access the attr/current
files once they exist.

N.B. this is still incorrect in that it restores the same credentials for all
threads in the group; however, it matches the behavior of the current creds
restore code, which also restores the same creds for all threads in the group.

v2: use simple_sprintf() instead of pie_strcat()

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-16 12:17:36 +03:00
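With only a /proc descriptor handed to the restorer (7083fc370d), each thread's LSM file is reached through a relative path built after the thread exists. A minimal sketch using snprintf(); the commit uses criu's own simple_sprintf() inside the restorer blob, and thread_attr_path() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "<pid>/task/<tid>/attr/current", relative to an open /proc fd.
 * Returns -1 if the buffer is too small. */
static int thread_attr_path(char *buf, size_t len, pid_t pid, pid_t tid)
{
	int n = snprintf(buf, len, "%d/task/%d/attr/current",
			 (int)pid, (int)tid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would then be passed to openat() together with the /proc descriptor.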
Cyrill Gorcunov
1998fbfa87 pie: relocs -- Fix compilation on ARM
Otherwise we get:

 | parasite-syscall.c: In function ‘parasite_infect_seized’:
 | parasite-syscall.c:1222:5: error: ‘elf_relocs’ undeclared (first use in this function)

Simply wrap the @elf_relocs_apply with macros.

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-16 11:40:20 +03:00
Cyrill Gorcunov
ea0fd2aa08 pie: piegen -- Make different names for parasite and restorer relocs
Otherwise it's confusing.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-15 21:15:57 +03:00
Cyrill Gorcunov
46270d11fe pie: Use PIE_SIZE helper
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-08 23:53:37 +03:00
Cyrill Gorcunov
f03a4672ce pie: piegen -- Slightly rework the building procedure
 - Move relocs application into a separate file which gets
   compiled as a regular C file in criu (pie/pie-relocs.[ch])
 - Move types used by piegen into pie/piegen/uapi/types.h

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-08 23:53:27 +03:00
Cyrill Gorcunov
ac187856c4 pie: x86 -- Do a real call for applying relocations
At the moment neither the parasite nor the restorer has
any relocs because we support x86-64 only, but
this will change soon, so do a real call and apply
relocations.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-08 23:53:26 +03:00
Cyrill Gorcunov
c84fa8c506 pie: x86 -- Adjust size of parasite and restorer code
In case of @gotpcrel relocations we need additional
space to carry pointers.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-08 23:53:24 +03:00
Pavel Emelyanov
7a9813346b rst: Sanitize standard arrays remapping
On restore we have several arrays of objects that get remapped
into pie area and their number is also passed. Clean and shorten
the remapping code a bit and bring their naming to a common format.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-08 23:39:27 +03:00
Tycho Andersen
6979838793 ensure SIGCHLD isn't inherited as blocked
Use SIG_SETMASK instead of SIG_BLOCK here in case the parent had SIGCHLD
blocked. Otherwise, if one of the criu threads has a problem, the restore
simply hangs because SIGCHLD stays blocked.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-08 23:35:48 +03:00
Pavel Emelyanov
b08f3fae5b vdso: Reduce the amount of in-code ifdef-s
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Reviewed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2015-06-08 23:34:33 +03:00