mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-27 12:28:14 +00:00

Author	SHA1	Message	Date
Ruslan Kuprieiev	5e58a5dc9f	crtools: check for setproctitle_init Check for setproctitle_init, as old versions of libbsd don't have one. Reported-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Acked-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 16:14:39 +04:00
Ruslan Kuprieiev	2144583732	include: add setproctitle.h Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Acked-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 16:14:37 +04:00
Andrey Vagin	5ed2004733	dump: clean up shared_fdtable It's cleaned up according to the following statements: * files_id can't be zero (look at dump_task_kobj_ids) * item->ids is allocated for all non-dead tasks * a parent can't be dead In addition here is a tiny coding style fix. Fixes: 475bb1e77522 ("rst: Evaluate per-task clone mask early") Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 16:10:14 +04:00
Andrey Vagin	33c75d0df9	eventpoll: parse_fdinfo_pid_s() returns allocated object for eventpoll tfd We are going to collect all objects in a list and write them into the eventpoll image. The eventpoll tfd image will be deprecated. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 16:08:17 +04:00
Andrey Vagin	78a54bd87c	fsnotify: parse_fdinfo_pid_s() returns allocated object for fanotify marks We are going to collect all objects in a list and write them into the fanotify image. The fanotify mark image will be deprecated. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 16:07:44 +04:00
Andrey Vagin	7079bb1086	fsnotify: parse_fdinfo_pid_s() returns allocated object for inotify wd (v2) We are going to collect all objects in a list and write them into the inotify image. The inotify wd image will be deprecated. v2: cb() must always free an entry Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 16:07:43 +04:00
Saied Kazemi	9eec8b03af	Use --root instead of --aufs-root When dumping Docker containers using the AUFS graph driver, we can use the --root option instead of --aufs-root for specifying the container's root. This patch obviates the need for --aufs-root and makes dump CLI more consistent with restore CLI. Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-27 14:31:40 +04:00
Saied Kazemi	d8b41b6525	Added AUFS support. The AUFS support code handles the "bad" information that we get from the kernel in /proc/<pid>/map_files and /proc/<pid>/mountinfo files. For details see comments in sysfs_parse.c. The main motivation for this work was dumping and restoring Docker containers which by default use the AUFS graph driver. For dump, --aufs-root <container_root> should be added to the command line options. For restore, there is no need for AUFS-specific command line options but the container's AUFS filesystem should already be set up before calling criu restore. [ xemul: With AUFS files sometimes, in particular -- in case of a mapping of an executable file (likely the one created at elf load), in the /proc/pid/map_files/xxx link target we see not the path by which the file is seen in AUFS, but the path by which AUFS accesses this file from one of its "branches". In order to fix the path we get the info about branches from sysfs and when we meet such a file, we cut the branch part of the path. ] Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-21 18:35:22 +04:00
Ruslan Kuprieiev	9b2d1774ba	image: mark CR_FD_SIGNAL and CR_FD_PSIGNAL as obsoleted and don't create signal-s.img, v2 After this patch, signal-s.img won't be created. v2: just move them to the end of array Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 13:09:49 +04:00
Pavel Emelyanov	f781ba0466	rst: Rework task_entries to use rst_mem engine The task_entries is a small structure used to coordinate the processes restore stages. Currently we allocate one page for it and handle it separately. There is no need for this complexity, actually. The rst_mem engine is already capable of controlling this small object. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 13:00:10 +04:00
Pavel Emelyanov	5f9acc8dc9	shmem: Explicitly initialize rst_shmems This is a position in the RM_SHREMAP memory. Since shmems are currently the only user of it, this validly equals zero, but it will change soon. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 13:00:07 +04:00
Tycho Andersen	94f6c87c9f	cg: add --cgroup-root option The motivation for this is to be able to restore containers into cgroups other than what they were dumped in (if, e.g. they might conflict with an existing container). Suppose you have a container in: memory:/mycontainer cpuacct,cpu:/mycontainer blkio:/mycontainer name=systemd:/mycontainer You could then restore them to /mycontainer2 via --cgroup-root /mycontainer2. If you want to restore different controllers to different paths, you can provide multiple arguments, for example, passing: --cgroup-root /mycontainer2 --cgroup-root cpuacct,cpu:/specialcpu \ --cgroup-root name=systemd:/specialsystemd Would result in things being restored to: memory:/mycontainer2 cpuacct,cpu:/specialcpu blkio:/mycontainer2 name=systemd:/specialsystemd i.e. a --cgroup-root without a controller prefix specifies the new default root for all cgroups. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 12:58:36 +04:00
Sophie Blee-Goldman	e606c2141e	Dump capabilities from the parasite Needed for future user namespace support. Capabilities will have to be dumped from the parasite, ie from inside the namespace since there is no obvious way to 'translate' capabilities from the global namespace (unlike with uids and gids, where the id mappings can be used for translation). [ additional explanation from Andrew Vagin: "capabilities" are not translated between namespaces. They can exist only in one userns, where a process lives. If a process is created in a new userns, it gets a full set of capabilities in this userns, and loses all caps in a parent userns. So if capabilities are not shown in /proc/pid/stat, we have no way to get it except of using parasite code. ] Signed-off-by: Sophie Blee-Goldman <ableegoldman@google.com> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-15 23:10:44 +04:00
Sophie Blee-Goldman	3faaed2f64	Bug-fix in size calculation Fixes a bug in how PARASITE_MAX_GROUPS was calculated, and adds a compiler check to assert that parasite_dump_creds doesn't exceed the page size. Signed-off-by: Sophie Blee-Goldman <ableegoldman@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-13 13:04:58 +04:00
Pavel Emelyanov	548625132d	pstree: Introduce task_alive() helper Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:41:00 +04:00
Pavel Emelyanov	7960379f71	flock: Merge all file lock entries into single image file They are now in per-pid images, but every entry contains a pid to which it "belongs". This belonging is fake -- it's just a pid of a task who placed the lock, while locks really belong to files. We even have a bug when task that locked a file exited and "delegated" the lock to its child. This images merge reduces the amount of image files criu generates and may simplify the fix of the issue mentioned above. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:38:49 +04:00
Pavel Emelyanov	4816882da9	img: Add ability to check whether optional image collection happened A bit later we'd need to check whether cinfo collector opened an image or not due to file absence. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:38:22 +04:00
Tycho Andersen	f95b05eb75	opts: add --manage-cgroups option criu managed cgroups is now an opt-in thing, so by default criu does not manage (i.e. dump or restore) cgroups. This allows users to use the previous behavior. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-12 14:32:50 +04:00
Garrison Bellack	4c7bc7678e	Cgroup property restoration infrastructure Restores 2 cgroup properties after the criu restoration of tasks. Currently the cgroup files to be restored are static but are easily extendable. To change the properties to be restored, edit this list at the top of cgroup.c. If a cgroup exists during restoration, its properties will not be overwritten. Work based off Tycho Andersen <tycho.andersen@canonical.com> Change-Id: Ida32b9773eeac1d4d6e82ad644524ed099d5f9b1 Signed-off-by: Garrison Bellack <gbellack@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-08 17:06:08 +04:00
Cyrill Gorcunov	7158448dd6	timerfd: Implement check routine Reported-by: Jenkins Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-07 10:18:09 +04:00
Cyrill Gorcunov	ecd432fe27	timerfd: Implement c/r procedure Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 19:20:09 +04:00
Cyrill Gorcunov	5c93ba3b7b	timerfd: Add protobuf entries into the image Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 19:18:34 +04:00
Andrey Vagin	339f456af3	link-remap: open link-remap files from correct mountpoints (v3) Here is a problem with ghost files. Links are created on restore, but they can't be created on any mount point, because a mount point can be non-root bind-mount of another one. So we need to find the root mount and create all links there. v2: clean up v3: add optimization for the case when both links on the same mount point. v4: don't look up mount points by mnt_id in a second time. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 19:14:16 +04:00
Andrey Vagin	ce5aa74d10	mount: save local mount point paths on restore On restore we add a temporary root to a mount point path. It's convenient for restoring mount namespaces, but real paths are used for restoring link-remap files. v2: replace the offset field on a char * field Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 19:14:15 +04:00
Pavel Emelyanov	5289ea973a	mnt: Extend comment about how mntinfo->mountpoint path looks like Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 12:04:22 +04:00
Pavel Emelyanov	9fd793e565	stat: Pass namespace into phys_stat_resolve_dev, not mnt tree This makes the API simpler. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 10:57:27 +04:00
Pavel Emelyanov	090587e1a1	stat: Pass namespace into phys_stat_dev_match, not mnt tree This makes the API simpler. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 10:57:25 +04:00
Ruslan Kuprieiev	2b268c6c21	security: check additional groups,v5 Currently, we only check if process gids match primary gid of user. But process and user have additional groups too. So let's: 1) check that process rgid, egid and sgid are in the user's grouplist. 2) on restore check that user has all groups from the images. Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-06 10:20:27 +04:00
Andrey Vagin	967dba606a	mount: add helper mntns_get_root_by_mnt_id Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-05 16:38:19 +04:00
Cyrill Gorcunov	18fe357563	vdso: Implement vDSO proxification of any vvar/vdso order In latest linux-next the vdso zone is placed _after_ vvar zone so eventually we need to handle any combination of the following cases - no vvar zone - vvar before vdso - vvar after vdso Here we address all of them. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-04 15:35:03 +04:00
Cyrill Gorcunov	6446fd2c1d	vdso: Move parking into a separate routine Since we might have several vDSO zones, let's hide handling in arch-specific routines. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-04 15:34:34 +04:00
Pavel Emelyanov	9b6c41f2a0	cg: Remove unused cgroup_dir field Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Tycho Andersen <tycho.andersen@canonical.com>	2014-07-15 17:29:23 +04:00
Tycho Andersen	0f178a1f99	cg: correctly detect co-mounted controller mount point Before we would not detect the mount point for co-mounted controllers. Things still worked because we'd just re-mount them ourselves and traverse our own mount point, but this saves an extra mount(). Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-14 15:14:37 +04:00
Tycho Andersen	51876eea5d	Attempt to restore cgroups During the dump phase, /proc/cgroups is parsed to find co-mounted cgroups. Then, for each task /proc/self/cgroup is parsed for the cgroups that it is a member of, and that cgroup is traversed to find any child cgroups which may also need restoring. Any cgroups not currently mounted will be temporarily mounted and traversed. All of this information is persisted along with the original cg_sets, which indicate which cgroups a task is a member of. On restore, an initial phase creates all the cgroups which were saved. Tasks are then restored into these cgroups via cg_sets as usual. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-10 17:00:28 +04:00
Pavel Emelyanov	a919dbc9c6	files: Fix restoration of ghost cwd (and root) When cwd is removed (it can be) we need to collect the respective file_desc before starting opening any files to properly handle ghost refcounts. Otherwise we will miss one refcount from the cwd's on ghost, which in turn will either BUG inside ghost removal, or will fail the cwd due to the respective dir being removed too early. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:09:06 +04:00
Pavel Emelyanov	ba8671b4c1	files: Split open_reg_by_id into two parts Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:09:04 +04:00
Pavel Emelyanov	9b91bf390d	files: Split fs restore into prepare and restore The prepare one will become more complicated soon. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:09:03 +04:00
Pavel Emelyanov	b8d01d1b7a	files: Rename prepare_fs into restore_fs Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:09:02 +04:00
Pavel Emelyanov	d0097b2db0	files: Support ghost directories restore If we have opened and rmdir-ed directory, the dump works OK creating the ghost file and remap, but restore creates _file_ instead of directory. Fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-07-04 15:08:59 +04:00
Pavel Emelyanov	84eb0a1927	criu: Restore tasks as siblings in swrk Andrey validly pointed out, that restoring pdeath_sig is not compatible with criu_restore_child() call -- after criu restore children, it will exit and fire the pdeath_sig into restored tree root, potentially killing it. The fix for that could be -- when started in swrk mode, criu can restore tree not as children tasks, but as siblings, using the CLONE_PARENT flag when fork()-ing the root task. With this we should also take care about error handling -- right now criu catches the SIGCHLD from dying children tasks, and since we plan to create them be children of the criu parent (the library caller) we will not be able to catch them. To do so we SEIZE the root task in advance thus causing all SIGCHLD-s go to criu, not to its parent. Having this done we no longer need the SUBREAPER trick in the library call -- tasks get restored right as callers kids :) Some thoughts for future -- using this trick we can finally make "natural" restoration of shell jobs. I.e. -- make criu restore some subtree right under bash, w/o leaving itself as intermediate task and w/o re-parenting the subtree to init after restore. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrey Vagin <avagin@parallels.com>	2014-07-01 16:16:07 +04:00
Pavel Emelyanov	5e9c57a13d	criu: Dump and restore pdeath_sig value The implementation is pretty straightforward. When dumping per-thread misc data with parasite, collect one, then write in thread_core_info. On restore wait for creds restore and put the value back (some creds changes drop it to zero). Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>	2014-07-01 16:16:04 +04:00
Pavel Emelyanov	d30521a3cf	crtools: Add internal "swrk" action To help restoring tasks from images as kids to the caller, we can do the trick. 1. Caller sets himself as child reaper with PR_SET_CHILD_SUBREAPER prctl 2. Caller makes sure criu binary is suid-ed and owned by root 3. Caller forks and calls execv() on criu asking it to restore 4. Criu finishes restore and exits. All its kids get reparented to the criu's parent, i.e. -- to the library caller. 5. Caller stops being subreaper In order to make the execv() and arguments passing simpler I propose to execv() the service worker function, that accepts options via socket. This is good for two reasons. 1. We don't have to construct CLI options in libcriu 2. We reuse other service's facilities, such as security checks, ability to dump, pre-dump and other stuff Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-27 14:24:33 +04:00
Pavel Emelyanov	fac7befa6b	files: Sanity check for reg file on restore is not corrupted When opening a reg file on restore -- check that the file size we opened matches the one we saw on dump. This is not bullet-proof protection, but is helpful to protect against FS updates between dump/restore. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-24 23:38:48 +04:00
Cyrill Gorcunov	fe7b8aeb8c	vdso: x86 -- Add handling of vvar zones New kernel 3.16 will have old vDSO zone split into two vmas: one for vdso code itself and a second one, named vvar, for data being referenced from vdso code. Because I can't do 'dump' and 'restore' parts of the code separately (otherwise test would fail) the commit is pretty big one and hard to read so here is detailed explanation what's going on. 1) When start dumping we detect vvar zone by reading /proc/pid/smaps and looking up for "[vvar]" token. Note the vvar zone is mapped by a kernel with PF/IO flags so we should not fail here. Also it's assumed that at least for now kernel won't be changed much and [vvar] zone always follows the [vdso] zone, otherwise criu will print error. 2) In previous commits we disabled dumping vvar area contents so the restorer code never try to read vvar data but still we need to map vvar zone thus vma entry remains in image. 3) As with previous vdso format we might have 2 cases a) Dump and restore is happening on same kernel b) Dump and restore are done on different kernels To detect which case we have we parse vdso data from image and find symbols offsets then compare their values with runtime symbols provided us by a kernel. If they match and (!!!) the size of vvar zone is the same -- we simply remap both zones from runtime kernel into the positions dumpee had at checkpoint time. This is the so-called "inplace" remap (a). If this happens the vdso_proxify() routine drops VMA_AREA_REGULAR from vvar area provided by a caller code and restorer won't try to handle this vma. It looks somehow strange and probably should be reworked but for now I left it as is to minimize the patch. In case of (b) we need to generate a proxy. We do that in same way as we were before just include vvar zone into proxy and save vvar proxy address inside vdso mark injected into vdso area. Thus on subsequent checkpoint we can detect proxy vvar zone and rip it off the list of vmas to handle. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-24 22:48:43 +04:00
Cyrill Gorcunov	154d1c6c2c	vdso: parasite -- Prepare new vdso mark structure. Because of new vvar area we need to carry the address of vvar proxy inside the mark. Thus add members needed and update routines. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-24 22:48:43 +04:00
Cyrill Gorcunov	72ead490e4	vdso: image -- Add VMA_AREA_VVAR flag Will need it to handle vvar zones in a special way. Because VMA_UNSUPP never goes into the image file, let's reuse bit 12 for VVAR. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-24 22:48:40 +04:00
Pavel Emelyanov	3b995f1aef	iov: Add iovec2pagemap() helper Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-20 16:35:52 +04:00
Andrey Vagin	494c044384	mount: dump one file system only once (v2) A file system can be bind-mounted a few times and some of these mounts can be non-root. We need to find one of root mounts and dump it. v2: don't forget to check pm->dumped and pm->parent don't dump a root file system, it's always external for now. Reported-by: Saied Kazemi <saied@google.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-17 10:40:00 +04:00
Andrey Vagin	697211908a	tmpfs: use device number instead of mnt_id in image names One file system can be mounted a few times, so mnt_id isn't unique for it. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-17 10:39:52 +04:00
Pavel Emelyanov	c7e0042946	crtools: Introduce the --ext-mount-map option (v3) On dump one uses one or more --ext-mount-map option with A:B arguments. A denotes a mountpoint (as seen from the target mount namespace) criu dumps and B is the string that will be written into the image file instead of the mountpoint's root. On restore one uses the same --ext-mount-map option(s) with similar A:B arguments, but this time criu treats A as string from the image's root field (foobar in the example above) and B as the path in criu's mount namespace that should be bind mounted into the mountpoint. v3: * Added documentation * Added RPC bits * Changed option name into --ext-mount-map * Use colon as key and value separator Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-06-17 10:36:30 +04:00
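
The A:B convention described in the --ext-mount-map commit (c7e0042946) can be illustrated with a small Python sketch. It is not CRIU's implementation (CRIU is written in C). The function names, the paths /mnt/data and /srv/data, and the choice to split on the first colon are assumptions made for illustration; only the string foobar and the colon separator come from the commit message.

```python
def parse_ext_mount_map(args):
    """Turn a list of "A:B" option values into a dict {A: B}."""
    mapping = {}
    for arg in args:
        # Splitting on the first colon only is an assumption of this sketch.
        key, sep, value = arg.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"expected A:B, got {arg!r}")
        mapping[key] = value
    return mapping


def restore_source(mountpoint, dump_map, restore_map):
    """Follow mountpoint -> string stored in the image -> path to bind mount."""
    image_root = dump_map[mountpoint]   # written to the image on dump
    return restore_map[image_root]      # looked up again on restore


# On dump, A is the mountpoint and B the string stored in the image.
dump_map = parse_ext_mount_map(["/mnt/data:foobar"])
# On restore, A is that stored string and B the path to bind mount.
restore_map = parse_ext_mount_map(["foobar:/srv/data"])
print(restore_source("/mnt/data", dump_map, restore_map))
```

The stored string acts as an opaque key, so the path on the restoring host does not have to match the one seen at dump time.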

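The --cgroup-root commit (94f6c87c9f) describes how a value without a controller prefix sets the default root, while prefixed values override it for specific controllers. The following Python sketch reproduces the example from that commit message. It is a simplified model, not CRIU's code: the function name is hypothetical, an unprefixed value is assumed to start with "/", and only the top-level paths shown in the message are handled.

```python
def resolve_cgroup_roots(dumped, cgroup_root_args):
    """Map each dumped controller set to the path it would be restored into."""
    default_root = None
    per_controller = {}
    for arg in cgroup_root_args:
        if arg.startswith("/"):
            # No controller prefix: new default root for all cgroups.
            default_root = arg
        else:
            # "controllers:/path" overrides the root for those controllers.
            controllers, _, path = arg.partition(":")
            per_controller[controllers] = path

    resolved = {}
    for controllers, old_path in dumped.items():
        if controllers in per_controller:
            resolved[controllers] = per_controller[controllers]
        elif default_root is not None:
            resolved[controllers] = default_root
        else:
            resolved[controllers] = old_path  # nothing requested, keep as dumped
    return resolved


dumped = {
    "memory": "/mycontainer",
    "cpuacct,cpu": "/mycontainer",
    "blkio": "/mycontainer",
    "name=systemd": "/mycontainer",
}
args = [
    "/mycontainer2",
    "cpuacct,cpu:/specialcpu",
    "name=systemd:/specialsystemd",
]
for controllers, path in resolve_cgroup_roots(dumped, args).items():
    print(f"{controllers}:{path}")
```

With these inputs the sketch yields the same four controller:path pairs listed in the commit message.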
1696 Commits