mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 14:25:49 +00:00

Author	SHA1	Message	Date
Tycho Andersen	e301b1d56c	restore: --restore-detached implies CLONE_PARENT We need to use CLONE_PARENT to prevent processes from immediately dying due to pdeath_sig when they are restored in detached mode. [ xemul: One more place which requires check for restore-detach is in sigactions preparation ] Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-14 12:25:07 +04:00
Pavel Emelyanov	15b39a1dd5	pstree: Use task_alive() instead of switch()-es Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:41:10 +04:00
Pavel Emelyanov	548625132d	pstree: Introduce task_alive() helper Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:41:00 +04:00
Pavel Emelyanov	7960379f71	flock: Merge all file lock entries into single image file They are now in per-pid images, but every entry contains a pid to which it "belongs". This belonging is fake -- it's just the pid of the task that placed the lock, while locks really belong to files. We even have a bug when a task that locked a file exited and "delegated" the lock to its child. Merging these images reduces the number of image files criu generates and may simplify the fix for the issue mentioned above. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:38:49 +04:00
Garrison Bellack	4c7bc7678e	Cgroup property restoration infrastructure Restores 2 cgroup properties after the criu restoration of tasks. Currently the cgroup files to be restored are static but are easily extendable. To change the properties to be restored, edit the list at the top of cgroup.c. If a cgroup exists during restoration, its properties will not be overwritten. Work based off Tycho Andersen <tycho.andersen@canonical.com> Change-Id: Ida32b9773eeac1d4d6e82ad644524ed099d5f9b1 Signed-off-by: Garrison Bellack <gbellack@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-08 17:06:08 +04:00
gbellack	9752c11d23	Quick bug fix for missing fd for move_in_cgroup If the process to be killed spawns a child process and moves it into a child cgroup of the one the parent process is in, the cgroup fd was being closed in the parent process before it forked the child. Then, when move_in_cgroup() is called for the child process, the file descriptor has already been closed, causing the second call to move_in_cgroup() to fail. Moved the fd close after the fork call. Change-Id: I6ae88b95c5410a7f56108e28eb3133f113e868d0 Signed-off-by: Garrison Bellack <gbellack@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-08 17:04:39 +04:00
Andrey Vagin	7a203afe0a	restore: fix index for accessing entries of the parent_act array SIGMAX is a valid value, but the 0 signal doesn't exist. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-07 17:29:49 +04:00
Andrew Vagin	e44f4e7acd	restore: restore sigaction for alive tasks The helper task doesn't change sigaction and does nothing with parent_sigacts. parent_sigacts will contain values for the previous alive task, so the inheritance logic should work as expected. Reported-by: Jenkins Criuovich Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-07 12:12:20 +04:00
Pavel Emelyanov	b674caf2ff	sig: Add some logging to sigactions restore Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-07 11:05:54 +04:00
Pavel Emelyanov	50f712e9df	sig: Optimize sigactions restore Most of the sigactions are the same across the tasks in the image. Nonetheless existing code always calls a syscall to restore them and spends 64 calls per-task. Let's restore signals before forking children and let them inherit sigactions. Tune one only if it differs from the parent's. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-08-07 11:05:47 +04:00
Pavel Emelyanov	bf0d4c4b2c	sig: Block signals once before forking children We already have a signals setup helper for this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-08-07 11:05:33 +04:00
Pavel Emelyanov	8c133309a3	sig: Setup CHLD handler in dedicated helper Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-07 11:05:19 +04:00
Pavel Emelyanov	e50d0e7c6f	sig: Don't reset CHLD handler to old action, DFL it The whole idea behind this code was to stop receiving CHLD from restored tasks after resume. The comment saying that this is done for scripts is wrong (we call more scripts before this) because sigchld_handler() knows about scripts: commit `de71bc6917` exit = (siginfo->si_code == CLD_EXITED); status = siginfo->si_status; + + /* skip scripts */ + if (!current && root_item->pid.real != pid) { + pid = waitpid(root_item->pid.real, &status, WNOHANG); + if (pid <= 0) + return; + } And since the CHLD handler makes little sense after exec, it's easier just to reset it to the default action at the end. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-08-07 11:05:11 +04:00
Pavel Emelyanov	adc63c73d5	sig: Instantly drop SA_NOCLDSTOP for swrk_restore We tune the CHLD handler if we're restoring root task as sibling. This tuning is better to be done with one sigaction() call, rather than two. First, it's shorter and the second -- it will allow us to move the whole criu signalling setup into one helper. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-08-07 11:04:21 +04:00
Pavel Emelyanov	bc7d6e315d	sig: Don't feed pid argument to prepare_sigactions We don't need pid in any of these calls actually, they are all legacy from the old days. I plan to move the call to prepare_sigactions, so remove the pid argument in advance. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-08-07 11:04:08 +04:00
Pavel Emelyanov	d14abcf7c3	sig: Don't request for old act when restoring sigactions This old info is simply not used at that place. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-08-07 11:03:58 +04:00
Tycho Andersen	2b1021a43b	restore: actually fail if clone() fails Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-07 10:20:59 +04:00
Cyrill Gorcunov	ecd432fe27	timerfd: Implement c/r procedure Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 19:20:09 +04:00
Pavel Emelyanov	57965aabaa	rst: Check for task->state to restore in one place Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 09:37:14 +04:00
Cyrill Gorcunov	6906e1a830	vdso: Drop unneeded @vdso_rt_vma_size variable Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-04 15:34:22 +04:00
Ruslan Kuprieiev	9f8a7ccaad	restore: sigreturn_restore: free core _after_ using it Currently we have this: ....... /* No longer need it */ core_entry__free_unpacked(core, NULL); ret = prepare_itimers(pid, core, task_args); if (ret < 0) goto err; ....... So we're using ptr right after free-ing it. Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-04 13:09:02 +04:00
Pavel Emelyanov	9b91bf390d	files: Split fs restore into prepare and restore The prepare one will become more complicated soon. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:09:03 +04:00
Pavel Emelyanov	b8d01d1b7a	files: Rename prepare_fs into restore_fs Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:09:02 +04:00
Pavel Emelyanov	b429492dbc	rst: Include criu/include/ptrace.h instead of system one On ARM some PTRACE_... constants are not declared in sys/ptrace.h file. They are in linux/ptrace.h, but on x86 this file somewhat conflicts with the sys/ one. For now fix ARM compilation by using criu/ one and think of it later. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-01 19:48:23 +04:00
Pavel Emelyanov	84eb0a1927	criu: Restore tasks as siblings in swrk Andrey validly pointed out that restoring pdeath_sig is not compatible with the criu_restore_child() call -- after criu restores the children, it will exit and fire the pdeath_sig into the restored tree root, potentially killing it. The fix for that could be -- when started in swrk mode, criu can restore the tree not as children tasks, but as siblings, using the CLONE_PARENT flag when fork()-ing the root task. With this we should also take care of error handling -- right now criu catches the SIGCHLD from dying children tasks, and since we plan to create them as children of the criu parent (the library caller) we will not be able to catch them. To do so we SEIZE the root task in advance, thus causing all SIGCHLD-s to go to criu, not to its parent. Having this done we no longer need the SUBREAPER trick in the library call -- tasks get restored right as the caller's kids :) Some thoughts for the future -- using this trick we can finally make "natural" restoration of shell jobs. I.e. -- make criu restore some subtree right under bash, w/o leaving itself as an intermediate task and w/o re-parenting the subtree to init after restore. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrey Vagin <avagin@parallels.com>	2014-07-01 16:16:07 +04:00
Pavel Emelyanov	5e9c57a13d	criu: Dump and restore pdeath_sig value The implementation is pretty straightforward. When dumping per-thread misc data with parasite, collect one, then write in thread_core_info. On restore wait for creds restore and put the value back (some creds changes drop it to zero). Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>	2014-07-01 16:16:04 +04:00
Cyrill Gorcunov	fe7b8aeb8c	vdso: x86 -- Add handling of vvar zones The new 3.16 kernel will have the old vDSO zone split into two vmas: one for the vdso code itself and a second one, named vvar, for data referenced from the vdso code. Because I can't do the 'dump' and 'restore' parts of the code separately (otherwise tests would fail) the commit is a pretty big one and hard to read, so here is a detailed explanation of what's going on. 1) When we start dumping we detect the vvar zone by reading /proc/pid/smaps and looking for the "[vvar]" token. Note the vvar zone is mapped by the kernel with PF/IO flags so we should not fail here. Also it's assumed that at least for now the kernel won't change much and the [vvar] zone always follows the [vdso] zone, otherwise criu will print an error. 2) In previous commits we disabled dumping the vvar area contents so the restorer code never tries to read vvar data, but we still need to map the vvar zone, thus the vma entry remains in the image. 3) As with the previous vdso format we might have 2 cases a) Dump and restore happen on the same kernel b) Dump and restore are done on different kernels To detect which case we have we parse vdso data from the image and find symbol offsets, then compare their values with the runtime symbols provided to us by the kernel. If they match and (!!!) the size of the vvar zone is the same -- we simply remap both zones from the runtime kernel into the positions the dumpee had at checkpoint time. This is the so-called "inplace" remap (a). If this happens the vdso_proxify() routine drops VMA_AREA_REGULAR from the vvar area provided by the caller code and the restorer won't try to handle this vma. It looks somewhat strange and probably should be reworked, but for now I left it as is to minimize the patch. In case of (b) we need to generate a proxy. We do that in the same way as before, just include the vvar zone in the proxy and save the vvar proxy address inside the vdso mark injected into the vdso area. Thus on a subsequent checkpoint we can detect the proxy vvar zone and rip it off the list of vmas to handle. 
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-24 22:48:43 +04:00
Pavel Emelyanov	3659d60ab7	restore: Open /proc/sys/kernel/ns_last_pid via helpers Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-06-09 15:29:49 +04:00
Pavel Emelyanov	203c291467	cg: Restore tasks into proper cgroups On restore find out in which sets tasks live in and move them there. Optimization note -- move tasks into cgroups _before_ fork kids to make them inherit cgroups if required. This saves a lot of time. Accessibility note -- when moving tasks into cgroups don't search for existing host mounts (they may be not available) and don't mount temporary ones (may be impossible due to user namespaces). Instead introduce service fd with a yard of mounts. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-05-27 23:48:06 +04:00
Pavel Emelyanov	8b8eb53a0a	cg: Skeleton for cgroup code Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-05-27 23:48:06 +04:00
Cyrill Gorcunov	676708e3b3	vdso: Put CONFIG_VDSO where needed Guard vDSO code with CONFIG_VDSO, no need to even build it on archs which do not support vDSO handling. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Alexander Kartashov <alekskartashov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-05-27 23:40:07 +04:00
Andrey Vagin	1a1b50168d	vma: don't skip vmas during searching a parent vma We stop searching if vma->start is bigger than the required one. The cursor is set on the last examined vma. When we are searching for a parent vma for the next vma, we start examining vma-s from cursor->next, so we don't examine the vma which is pointed to by the cursor. This patch replaces list_for_each_entry_continue with list_for_each_entry_from. Reported-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-05-14 01:00:31 +04:00
Andrey Vagin	c838494034	mem: stop searching a parent vma if we found one A parent vma can be only one. Fixes: `57d25e7cea` ("mm: fix expression to determine which vma-s can be shared") Reported-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-05-14 01:00:22 +04:00
Andrey Vagin	b57497ade5	mem: fix typo in determining an address of parent vma Look at this hunk from `7659c995f5`: - paddr = decode_pointer(vma_premmaped_start(&p->vma)); + paddr = decode_pointer(vma->premmaped_addr); Obviously we want to use p->premmaped_addr instead of vma->premmaped_addr. Fixes: `7659c995f5` ("vm: don't overwrite vma->shmid for private mappings") Reported-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-05-14 01:00:13 +04:00
Andrey Vagin	20003300d8	mount: remove extra calls of mntns_collect_root() Now mntns_collect_root() should be called each time when we need to get a root of a specified namespace and we don't need to call it for initializing the global variable. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-22 23:49:43 +04:00
Pavel Emelyanov	79f3e90856	rst: Less arguments to restore_task_mnt_ns Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-22 23:48:46 +04:00
Pavel Emelyanov	8550f52017	mnt: Move local mntns collecting on restore into prepare_mnt_ns Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-22 23:48:43 +04:00
Andrey Vagin	2f4be997b6	mount: use per-namespace mntinfo_tree (v2) This patch removes the global mntinfo_tree and collect_mount_info where it was constructed. The mntinfo list is filled from dump_mnt_ns, rst_collect_local_mntns, collect_mnt_namespaces and read_mnt_ns_img. A mountinfo entry contains a reference to a proper ns_id entry, so we can use mnt_id to look up a proper mount namespace. v2: remove trash after rebasing. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:40:19 +04:00
Andrey Vagin	5418938ec3	restore: collect mounts of current mntns It's required for restoring in the current mntns. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:39:46 +04:00
Andrey Vagin	de4326a382	mount: return descriptor from mntns_collect_root We are going to support nested mount namespaces, so files can be opened from more than one namespace and a root must be collected for each file. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:39:32 +04:00
Andrey Vagin	36a9f31665	restore: close PROC_FD_OFF before calling sigreturn mntns_collect_root() uses PROC_FD_OFF, so we need to close it. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:39:30 +04:00
Andrey Vagin	d2012883ab	criu: rename current_ns_mask to root_ns_mask (v2) Now we support sub-mntns, so root_ns_mask sounds more correct than current_ns_mask. v2: typo fix Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:38:33 +04:00
Andrey Vagin	3a291e33ff	crtools: restore nested mount namespaces (v2) Known issues: * currently only namespaces with the same root are supported * nested namespaces can be dumped and restored only if the root task has its own mount namespace. All nested namespaces are restored in a root namespace in temporary directories. All mount points are restored in one tree and then they are divided into namespaces. The task with the minimal pid for each namespace unshares mntns and then it makes pivot_root into a proper temporary directory. All other tasks make setns to enter the mount namespace of the task with the minimal pid. v2: clean up Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:38:17 +04:00
Andrey Vagin	e7e9c2ee6e	mounts: create a temporary directory for restoring non-root mntns (v2) All non-root namespaces will be restored as sub-trees of the root tree. This patch adds helpers to create a temporary directory and mount tmpfs in it, then create directories for each non-root mount namespace. tmpfs is quite useful here to simplify destroying this construction, we don't need to unmount each namespace separately. v2: add a comment why MNT_DETACH is not dangerous here Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-21 22:38:12 +04:00
Pavel Emelyanov	e8ac085af8	Revert "crtools: close all desriptors only for the root task" We have a race. Consider we have 3 tasks, A, B and C. A and B share fdtable, C -- does not. Then we might be in a situation when A is restoring memory reading mem images, and B -- forking the C child. In that case descriptors held by A (for mem restore) will be inherited by C and will not get closed. This reverts commit `d36e07aabe`.	2014-04-21 14:48:05 +04:00
Andrey Vagin	946eadd598	mount: open_mount uses __open_mountpoint instead of own logic Now we have two functions which do mostly the same, so this patch merges them. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-17 12:03:15 +04:00
Pavel Emelyanov	302591aa05	rlimits: Reshuffle new and legacy restoration code Do the same as was done with timers. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-17 12:01:10 +04:00
Pavel Emelyanov	1d438db66d	rlimits: Move entries from top-core into task-core This appeared after latest 1.2, so it's still possible to do this move. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-17 12:01:08 +04:00
Pavel Emelyanov	35e560de00	timers: Reshuffle new and legacy restoration code Make explicit checks and helpers for legacy images. This should facilitate its removal some day in the future. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-17 12:01:06 +04:00
Pavel Emelyanov	87da9b83ce	posix-timers: Clean restore code flow Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-04-17 12:01:04 +04:00
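
The idea behind commits 50f712e9df and 7a203afe0a above (restore sigactions before forking so children inherit them, touch only the entries that differ from the parent's, and index the parent array by `sig - 1` because signal 0 does not exist) can be sketched roughly as follows. This is an illustration under stated assumptions, not CRIU's actual code: `SKETCH_SIGMAX`, `struct sketch_act`, `parent_act_sketch`, `restore_sigactions_sketch` and the `apply` callback are all hypothetical names, and the real restore path deals with details that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not CRIU's real types. */
#define SKETCH_SIGMAX 64

struct sketch_act {
	unsigned long handler;
	unsigned long flags;
	unsigned long mask;
};

/*
 * What the parent already has installed. Signals are numbered
 * 1..SKETCH_SIGMAX and there is no signal 0, so the slot for
 * signal `sig` is `sig - 1`.
 */
static struct sketch_act parent_act_sketch[SKETCH_SIGMAX];

static int act_equal(const struct sketch_act *a, const struct sketch_act *b)
{
	return a->handler == b->handler &&
	       a->flags == b->flags &&
	       a->mask == b->mask;
}

/*
 * Install only the actions that differ from what was inherited.
 * `apply` is a hypothetical hook standing in for the real syscall;
 * it may be NULL. Returns how many signals needed an explicit call.
 */
static int restore_sigactions_sketch(const struct sketch_act *wanted,
				     void (*apply)(int sig, const struct sketch_act *act))
{
	int sig, calls = 0;

	assert(wanted != NULL);

	for (sig = 1; sig <= SKETCH_SIGMAX; sig++) {
		const struct sketch_act *want = &wanted[sig - 1];

		if (act_equal(want, &parent_act_sketch[sig - 1]))
			continue; /* inherited across fork, nothing to do */

		if (apply)
			apply(sig, want);

		/* Remember it so tasks forked later inherit this value. */
		parent_act_sketch[sig - 1] = *want;
		calls++;
	}

	return calls;
}
```

With this shape, a task whose actions all match the parent's costs zero calls instead of one per signal, and only the entries that actually changed are touched.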

616 Commits