2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 06:15:24 +00:00
Commit Graph

717 Commits

Author SHA1 Message Date
Andrey Vagin
0b33bac3bc criu: allow the root task to handle SIGCHLD
If criu process attaches to the root task (it happens for opts.swrk_restore
and opts.restore_detach) with ptrace, then any signal delivered to the root
would be also delivered to criu. The latter woult treat the former to die
due to this delivery and would abort the restore.

Fix it by checking that criu (current == NULL) gets ptrace notification
(si_code == CLD_TRAPPED) about signal delivered (si_status = SIGCHLD,
no other signals are allowed by the restoring tasks).

This patch fixes the following error of static/zombie00:

Execute zdtm/live/static/zombie00
./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
Dump 2207
Restore
Test: zdtm/live/static/zombie00, Result: FAIL
==================================== ERROR ====================================
Restore log: /root/git/orig/criu/test/dump/static/zombie00/2207/1/restore.log
(00.026826) Error (cr-restore.c:1085): 2207 killed by signal 17
(00.026985) Error (cr-restore.c:1706): Restoring FAILED.
================================= ERROR OVER =================================

Reported-by: Mr Jenkins
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 17:09:53 +04:00
Tycho Andersen
e301b1d56c restore: --restore-detached implies CLONE_PARENT
We need to use CLONE_PARENT to prevent processes from immediately dying due to
pdeath_sig when they are restored in detached mode.

[ xemul: One more place which requires check for restore-detach
         is in sigactions preparation ]

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:25:07 +04:00
Pavel Emelyanov
15b39a1dd5 pstree: Use task_alive() instead of switch()-es
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 14:41:10 +04:00
Pavel Emelyanov
548625132d pstree: Introduce task_alive() helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 14:41:00 +04:00
Pavel Emelyanov
7960379f71 flock: Merge all file lock entries into single image file
They are now in per-pid images, but every entry contains a
pid to which it "belongs". This belonging is fake -- it's
just a pid of a task who placed the lock, while locks really
belong to files. We even have a bug when task that locked
a file exited and "delegated" the lock to its child.

This images merge reduces the amount of image files criu
generates and may simplify the fix of mentioned above issue.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 14:38:49 +04:00
Garrison Bellack
4c7bc7678e Cgroup property restoration infrastructure
Restores 2 cgroup properties after the criu restoration of tasks.
Currently the cgroup files to be restored are static but
are easily extendable. To change the properties to be restored,
edit this list at the top of cgroup.c. If a cgroup exists during
restoration, its properties will not be overwritten.
Work based off Tycho Anderson tycho.andersen@canonical.com

Change-Id: Ida32b9773eeac1d4d6e82ad644524ed099d5f9b1
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-08 17:06:08 +04:00
gbellack
9752c11d23 Quick bug fix for missing fd for move_in_cgroup
There is an issue where if the proccess to be killed spawns a child proccess and
moves it in a child cgroup of the one the parent process is in, the cgroup fd
was being closed in the parent process before it forked the child. Then when
move_in_cgroup() is called for the child process, the file descriptor has
already been closed causing a failure for the second call to move_in_cgroup().
Moved the fd close after the fork call.

Change-Id: I6ae88b95c5410a7f56108e28eb3133f113e868d0
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-08 17:04:39 +04:00
Andrey Vagin
7a203afe0a restore: fix index for accessing entries of the parent_act array
SIGMAX is a valid value, but the 0 signal doesn't exist.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 17:29:49 +04:00
Andrew Vagin
e44f4e7acd restore: restore sigaction for alive tasks
The helper task doesn't change sigaction and does nothing with
parent_sigacts. paren_sigacts will contain values for the previous alive
task, so the logic about inherence should work as expected.

Reported-by: Jenkins Criuovich
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 12:12:20 +04:00
Pavel Emelyanov
b674caf2ff sig: Add some logging to sigactions restore
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 11:05:54 +04:00
Pavel Emelyanov
50f712e9df sig: Optimize sigactions restore
Most of the sigactions are the same across the tasks in the image.
Nonetheless existing code always calls a syscall to restore them
and spends 64 calls per-task.

Let's restore signals before forking children and let them inherit
sigactions. Tune one only if it differs from the parent's.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-08-07 11:05:47 +04:00
Pavel Emelyanov
bf0d4c4b2c sig: Block signals once before forking children
We already have a signals setup helper for this.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-08-07 11:05:33 +04:00
Pavel Emelyanov
8c133309a3 sig: Setup CHLD handler in dedicated helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 11:05:19 +04:00
Pavel Emelyanov
e50d0e7c6f sig: Don't reset CHLD handler to old action, DFL it
The whole idea behind this code was to stop receiving CHLD from
restored tasks after resume. The comment about this is done for
scripts is wrong (we call more scripts before this) because
sigchld_handler() knows about scripts:

commit de71bc6917
	 exit = (siginfo->si_code == CLD_EXITED);
	 status = siginfo->si_status;
	+
	+       /* skip scripts */
	+       if (!current && root_item->pid.real != pid) {
	+               pid = waitpid(root_item->pid.real, &status, WNOHANG);
	+               if (pid <= 0)
	+                       return;
	+       }

And since CHLD handler makes little sence after exec, it's easier
just to reset one to default action at the end.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-08-07 11:05:11 +04:00
Pavel Emelyanov
adc63c73d5 sig: Instantly drop SA_NOCLDSTOP for swrk_restore
We tune the CHLD handler if we're restoring root task
as sibling. This tuning is better to be done with one
sigaction() call, rather than two. First, it's shorter
and the second -- it will allow us to move the whole
criu signalling setup into one helper.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-08-07 11:04:21 +04:00
Pavel Emelyanov
bc7d6e315d sig: Don't feed pid argument to prepare_sigactions
We don't need pid in any of these calls actually, they are
all legacy from the old days. I plan to move the call to
prepare_sigactions, so remove the pid argument in advance.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-08-07 11:04:08 +04:00
Pavel Emelyanov
d14abcf7c3 sig: Don't request for old act when restoring sigactions
This old info is simply not used at that place.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-08-07 11:03:58 +04:00
Tycho Andersen
2b1021a43b restore: actually fail if clone() fails
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 10:20:59 +04:00
Cyrill Gorcunov
ecd432fe27 timerfd: Implement c/r procedure
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 19:20:09 +04:00
Pavel Emelyanov
57965aabaa rst: Check for task->state to restore in one place
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 09:37:14 +04:00
Cyrill Gorcunov
6906e1a830 vdso: Drop unneeded @vdso_rt_vma_size variable
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-04 15:34:22 +04:00
Ruslan Kuprieiev
9f8a7ccaad restore: sigreturn_restore: free core _after_ using it
Currently we have this:
	.......
	/* No longer need it */
	core_entry__free_unpacked(core, NULL);

	ret = prepare_itimers(pid, core, task_args);
	if (ret < 0)
		goto err;
	.......

So we're using ptr right after free-ing it.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-04 13:09:02 +04:00
Pavel Emelyanov
9b91bf390d files: Split fs restore into prepare and restore
The prepare one will become more complicated soon.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-04 15:09:03 +04:00
Pavel Emelyanov
b8d01d1b7a files: Rename prepare_fs into restore_fs
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-04 15:09:02 +04:00
Pavel Emelyanov
b429492dbc rst: Include criu/include/ptrace.h instead of system one
On ARM some PTRACE_... constants are not declared in sys/ptrace.h file.
They are in linux/ptrace.h, but on x86 this file somewhat conflicts with
the sys/ one. For now fix ARM compilation by using criu/ one and think
of it later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-01 19:48:23 +04:00
Pavel Emelyanov
84eb0a1927 criu: Restore tasks as siblings in swrk
Andrey validly pointed out, that restoring pdeath_sig is not
compatible with criu_restore_child() call -- after criu restore
children, it will exit and fire the pdeath_sig into restored
tree root, potentially killing it.

The fix for that could be -- when started in swrk more, criu can
restore tree not as children tasks, but as siblings, using the
CLONE_PARENT flag when fork()-ing the root task.

With this we should also take care about errors handing -- right
now criu catches the SIGCHILD from dying children tasks, and
since we plan to create them be children of the criu parent (the
library caller) we will not be able to catch them. To do so we
SEIZE the root task in advance thus causing all SIGCHLD-s go to
criu, not to its parent.

Having this done we no longer need the SUBREAPER trick in the
library call -- tasks get restored right as callers kids :)

Some thoughts for future -- using this trick we can finally make
"natural" restoration of shell jobs. I.e. -- make criu restore
some subtree right under bash, w/o leaving itself as intermediate
task and w/o re-parenting the subtree to init after restore.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
2014-07-01 16:16:07 +04:00
Pavel Emelyanov
5e9c57a13d criu: Dump and restore pdeath_sig value
The implementation is pretty straightforward. When dumping per-thread
misc data with parasite, collect one, then write in thread_core_info.

On restore wait for creds restore and put the value back (some creds
changes drop it to zero).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2014-07-01 16:16:04 +04:00
Cyrill Gorcunov
fe7b8aeb8c vdso: x86 -- Add handling of vvar zones
New kernel 3.16 will have old vDSO zone splitted into the two vmas:
one for vdso code itself and second that named vvar for data been
referenced from vdso code.

Because I can't do 'dump' and 'restore' parts of the code separately
(otherwise test would fail) the commit is pretty big one and hard to
read so here is detailed explanation what's going on.

 1) When start dumping we detect vvar zone by reading /proc/pid/smap
    and looking up for "[vvar]" token. Note the vvar zone is mapped
    by a kernel with PF/IO flags so we should not fail here.

    Also it's assumed that at least for now kernel won't be changed
    much and [vvar] zone always follows the [vdso] zone, otherwise
    criu will print error.

 2) In previous commits we disabled dumping vvar area contents so
    the restorer code never try to read vvar data but still we need
    to map vvar zone thus vma entry remains in image.

 3) As with previous vdso format we might have 2 cases

    a) Dump and restore is happening on same kernel
    b) Dump and restore are done on different kernels

    To detect which case we have we parse vdso data from image
    and find symbols offsets then compare their values with runtime
    symbols provided us by a kernel. If they match and (!!!) the
    size of vvar zone is the same -- we simply remap both zones
    from runtime kernel into the positions dumpee had at checkpoint
    time. This is that named "inplace" remap (a).

    If this happens the vdso_proxify() routine drops VMA_AREA_REGULAR
    from vvar area provided by a caller code and restorer won't try
    to handle this vma. It looks somehow strange and probably should
    be reworked but for now I left it as is to minimize the patch.

    In case of (b) we need to generate a proxy. We do that in same
    way as we were before just include vvar zone into proxy and save
    vvar proxy address inside vdso mark injected into vdso area. Thus
    on subsequent checkpoint we can detect proxy vvar zone and rip
    it off the list of vmas to handle.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-24 22:48:43 +04:00
Pavel Emelyanov
3659d60ab7 restore: Open /proc/sys/kernel/ns_last_pid via helpers
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-06-09 15:29:49 +04:00
Pavel Emelyanov
203c291467 cg: Restore tasks into proper cgroups
On restore find out in which sets tasks live in and move
them there.

Optimization note -- move tasks into cgroups _before_ fork
kids to make them inherit cgroups if required. This saves
a lot of time.

Accessibility note -- when moving tasks into cgroups don't
search for existing host mounts (they may be not available)
and don't mount temporary ones (may be impossible due to
user namespaces). Instead introduce service fd with a yard
of mounts.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
8b8eb53a0a cg: Skeleton for cgroup code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Cyrill Gorcunov
676708e3b3 vdso: Put CONFIG_VDSO where needed
Guard vDSO code with CONFIG_VDSO, no need to even build it
on archs which do not support vDSO handling.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:40:07 +04:00
Andrey Vagin
1a1b50168d vma: don't skip vmas during searching a parent vma
We stop searching if vma->start is bigger than a required one.
The coursor is set on the last examined vma. When we are searching a
parent vma for the next vma, we start examine vma-s starting from
coursor->next, so we don't examine the vma, which is pointed by cursor.

This patch replaces list_for_each_entry_continue on list_for_each_entry_from.

Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:31 +04:00
Andrey Vagin
c838494034 mem: stop searching a parent vma if we found one
A parent vma can be only one.

Fixes: 57d25e7cea ("mm: fix expression to determine which vma-s can be shared")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:22 +04:00
Andrey Vagin
b57497ade5 mem: fix typo in determining an address of parent vma
Look at this hunk from 7659c995f5:
-    paddr = decode_pointer(vma_premmaped_start(&p->vma));
+    paddr = decode_pointer(vma->premmaped_addr);

Obviously we want to use p->premmaped_addr instead of
vma->premmaped_addr.

Fixes: 7659c995f5 ("vm: don't overwrite vma->shmid for private mappings")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:13 +04:00
Andrey Vagin
20003300d8 mount: remove extra calls of mntns_collect_root()
Now mntns_collect_root() should be called each time when we need to get
a root of a specified namespace and we don't need to call it for
initializing the global variable.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:49:43 +04:00
Pavel Emelyanov
79f3e90856 rst: Less arguments to restore_task_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:46 +04:00
Pavel Emelyanov
8550f52017 mnt: Move local mntns collecting on restore into prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:43 +04:00
Andrey Vagin
2f4be997b6 mount: use per-namespace mntinfo_tree (v2)
This patch removes the global mntinfo_tree and collect_mount_info where
it was constructed. The mntinfo list is filled from dump_mnt_ns,
rst_collect_local_mntns, collect_mnt_namespaces and read_mnt_ns_img.

A mountinfo entry contains a reference on a proper ns_id entry, so
we cau use mnt_id to look up a proper mount namespace.

v2: remove trash after rebasing.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:19 +04:00
Andrey Vagin
5418938ec3 resotre: collect mounts of current mntns
It's required for restoring in the current mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:46 +04:00
Andrey Vagin
de4326a382 mount: return descriptor from mntns_collect_root
We are going to support nested mount namespaces, so files can be opened
from more than one namespace and a root must be collect for each file.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:32 +04:00
Andrey Vagin
36a9f31665 restore: close PROC_FD_OFF before calling sigreturn
mntns_collect_root() uses PROC_FD_OFF, so we need to close it.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:30 +04:00
Andrey Vagin
d2012883ab criu: rename current_ns_mask to root_ns_mask (v2)
Now we supports sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.

v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:33 +04:00
Andrey Vagin
3a291e33ff crtools: restore nested mount namespaces (v2)
Known issue:
* currently only namespaces with the same root is supported
* nested namespaces can be dumped and restored only if the root task
  has own mount namespace.

All nested namespaces are restored in a root namespace in temporary
directories. All mount points restored in one tree and then they are
divided into namesaces.
The task with minimal pid for each namespaces unshared mntns and
then it makes pivot_root in a proper temporary directory. All other
tasks makes setns to enter into a mount namespace of the task with
minimal pid.

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:17 +04:00
Andrey Vagin
e7e9c2ee6e mounts: create a temporary directory for restoring non-root mntns (v2)
All non-root namespaces will be restored as sub-trees of the root tree.

This patch adds helpers to create a temporary directory and mount tmpfs
in it, then create directories for each non-root mount namespace.

tmpfs is quite useful here to simplify destroying this construction,
we don't need to unmount each namespace separately.

v2: add a comment why MNT_DETACH is not dangerous here
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:12 +04:00
Pavel Emelyanov
e8ac085af8 Revert "crtools: close all desriptors only for the root task"
We have a race. Consider we have 3 tasks, A, B and C. A and B
share fdtable, C -- does not. Then we might be in a situation
when A is restoring memory reading mem images, and B -- forking
the C child. In that case descriptors held by A (for mem restore)
will be inherited by C and will not get closed.

This reverts commit d36e07aabe.
2014-04-21 14:48:05 +04:00
Andrey Vagin
946eadd598 mount: open_mount uses __open_mountpoint instead of own logic
Now we have two funсtions which do mostly the same, so this patch merges
them.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:03:15 +04:00
Pavel Emelyanov
302591aa05 rlimits: Reshuffle new and legacy restoration code
Do the same as was done with timers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:10 +04:00
Pavel Emelyanov
1d438db66d rlimits: Move entries from top-core into task-core
This appeared after latest 1.2, so it's still possible
to do this move.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:08 +04:00
Pavel Emelyanov
35e560de00 timers: Reshuffle new and legacy restoration code
Make explicit checks and helpers for legacy images.
This should facilitate its removal some day in the
future.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:06 +04:00