mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 05:48:05 +00:00

Author	SHA1	Message	Date
Tycho Andersen	5fe3a138df	lsm: add support for c/ring LSM profiles This patch adds support for checkpoint and restore of two linux security modules (apparmor and selinux). The actual checkpoint or restore code isn't that interesting, other than that we have to do the LSM restore in the restorer blob since it may block any number of things that we want to do as part of the restore process. I tried originally to get this to work using libraries in the restorer blob, but I could _not_ get things to work correctly (I assume I was doing something wrong with all the static linking, you can see my draft attempts here: https://github.com/tych0/criu/commits/apparmor-using-libraries ). I can try to resurrect this if it makes more sense, to do it that way, though. v2: lsm_profile lives in creds.proto instead of the task core, look in a more canonical place for selinuxfs and don't try to special case any selinux profile names. v3: only allow unconfined selinux profiles Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-05-08 15:31:05 +03:00
Pavel Emelyanov	b0115358d5	img: Always assume images to be v1.1 Only if inventory says it's v1 -- switch to old scheme. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-14 22:08:59 +03:00
Pavel Emelyanov	11c4094c15	img: Initialize "images are new" bool earlier Doing it at inventory write time is too late. Other than this, inventory isn't created for pre-dump, thus this one always generates v1 images. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-14 17:35:16 +03:00
Cyrill Gorcunov	9ce0254c04	vma: Unify private VMAs testing We have two helpers for VMA type testing: privately_dump_vma() and vma_priv(). They work with different types but basically do the same thing: check whether we should dump a VMA into the image and restore it later. Let's unify them both into a common vma_entry_is_private() helper, plus vma_area_is_private() for working with the vma_area type. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-01 12:36:46 +03:00
Ruslan Kuprieiev	df301b7eb7	security: create separate security.h header Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-10 16:53:54 +03:00
Cyrill Gorcunov	fd07bc7791	cpu: Add 'ins' mode to --cpu-cap option In this mode we test if target cpu has all features present in image file but do not require bit to bit match: target cpu may be a new one with more features present. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:15:46 +03:00
Pavel Emelyanov	08c204820f	aio: Dump AIO rings When AIO context is set up kernel does two things: 1. creates an in-kernel aioctx object 2. maps a ring into process memory The 2nd thing gives us all the needed information about how the AIO was set up. So, in order to dump one we need to pick the ring in memory and get all the information we need from it. One thing to note -- we cannot dump tasks if there are any AIO requests pending. So we also need to go to parasite and check the ring to be empty. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:13:36 +03:00
Pavel Emelyanov	6a6cdb8d4a	proc: Drop always true last argument of parse_smaps() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-12-22 13:52:03 +03:00
Pavel Emelyanov	a7bfa05a21	collect: Rename children/threads collecting routines Make their name look similar. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:16:43 +04:00
Pavel Emelyanov	ac1c74fc5b	collect: Don't check for zombie before collecting We have sanity check for zombie-with kids below, no need in additional. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:16:29 +04:00
Pavel Emelyanov	8078e38774	collect: Factor out recurring collection of threads and children We scan threads and children list several times while freezing the tree, this is done to avoid race with new threads/kids appearing. Factor out the iterations code. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:16:22 +04:00
Pavel Emelyanov	83df36e731	collect: Move parse_threads into collect_threads To make the threads collect code be structured similar to children collect. This will also help in further patching. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:16:13 +04:00
Pavel Emelyanov	13a628df95	collect: Clean children collect code variables usage Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:16:05 +04:00
Pavel Emelyanov	009c173be5	collect: Clean children and threads recurring collect checks Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:15:57 +04:00
Pavel Emelyanov	ee2e8e5bb9	parasite: Cleanup args size fetching Right now we push all the auxiliary arguments to parasite_infect_seized while 2 of them are only required to calculate the size of args area. Let's better keep track of required args size and get rid of excessive arguments to parasite_infect_seized(). Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:11:34 +04:00
Pavel Emelyanov	ca3b8ca051	dump: Comment how we dump zombies in pid namespaces Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-10 17:42:22 +04:00
Andrey Vagin	cb2f9223a0	dump: dump user namespaces (v2) For that we need to save per-namespace mappings of user and group IDs. And all id-s for tasks and files are saved from the target user namespace. v2: move code into collect_namespaces() Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:16:16 +04:00
Andrey Vagin	a4243f075b	dump: move the may_dump() check in seize_task() It's a bad idea to seize a group of processes and only then check rights for this operation. We need to check permissions as soon as possible to reduce the impact in case of wrong permissions. In addition, criu doesn't need to parse /proc/pid/state and gets all required information from /proc/pid/status. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:15:29 +04:00
Andrey Vagin	2cb6f2b68b	dump: remove useless arguments from seize_task() We get sig and pgid from a parasite, because we need to get them from a target pid namespace. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:14:54 +04:00
Andrey Vagin	77905aae19	dump: get tasks ids from parasite We have two reason for that: * parsing of /proc/pid/status is slow * parasite returns ids from a target userns Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:14:32 +04:00
Andrey Vagin	c80048d215	cr-dump: fix out-of-bounds write (OVERRUN) CID 73381 (#1 of 1): Out-of-bounds write (OVERRUN) 15. overrun-local: Overrunning array loc_buf of 4096 bytes at byte offset 4096 using index len (which evaluates to 4096). CID 73355 (#1 of 1): Out-of-bounds write (OVERRUN) 6. overrun-local: Overrunning array loc_buf of 4096 bytes at byte offset 4096 using index ret (which evaluates to 4096) Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-05 15:41:45 +04:00
Pavel Emelyanov	1ef5ca8235	bfd: Check images got flushed at the end Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-05 15:37:39 +04:00
Andrey Vagin	bc9b4bcc3f	parasite: stop a parasite daemon before dumping threads The parasite daemon set up SIGCHLD handler, but for dumping threads we use parasite-trap. While doing this the sigchild handler notices the CHLD arriving on the thread trap, emits an error (00.020292) Error (parasite-syscall.c:387): si_code=4 si_pid=3485 si_status=5 but wait() reports -1 (task is not dead, just trapped) and handler just exits. Let's stop a parasite daemon before dumping threads. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-27 21:32:45 +04:00
Andrey Vagin	9498609f89	dump: don't play with a function exit code We should not have a chance to exit with a wrong code on error paths. Now dump_one_task() returns zero if allocation of dfds fails: ret = collect_mappings(pid, &vmas); if (ret) { pr_err("Collect mappings (pid: %d) failed with %d\n", pid, ret); goto err; } if (!shared_fdtable(item)) { dfds = xmalloc(sizeof(*dfds)); if (!dfds) goto err; ... err: return -1; Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-17 17:22:49 +04:00
Cyrill Gorcunov	87273ccdb8	cpuinfo: x86 -- Add dump and validation of cpuinfo image, v2 On Wed, Oct 01, 2014 at 04:57:40PM +0400, Pavel Emelyanov wrote: > On 10/01/2014 01:07 AM, Cyrill Gorcunov wrote: > > On Tue, Sep 30, 2014 at 09:18:53PM +0400, Cyrill Gorcunov wrote: > >> If a user requested criu to dump cpuinfo image then we > >> write one on dump and verify on restore. At the moment > >> we require all cpu feature bits to match the destination > >> cpu for the sake of simplicity, but in future we need deps > >> engine which would filter out bits and test if cpu we're > >> restoring on is more capable than one we were dumping at > >> allowing to proceed restore procedure. > >> > >> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > > > > Updated to new img format Something like attached? From 59272a9514311e6736cddee08d5f88aa95d49189 Mon Sep 17 00:00:00 2001 From: Cyrill Gorcunov <gorcunov@openvz.org> Date: Thu, 25 Sep 2014 16:04:10 +0400 Subject: [PATCH] cpuinfo: x86 -- Add dump and validation of cpuinfo image If a user requested criu to dump cpuinfo image then we write one on dump and verify on restore. At the moment we require all cpu feature bits to match the destination cpu for the sake of simplicity, but in future we need deps engine which would filter out bits and test if cpu we're restoring on is more capable than one we were dumping at allowing to proceed restore procedure. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-03 13:26:57 +04:00
Pavel Emelyanov	c57c2cfa64	predump: Collect mnt and net namespaces properly On pre-dump we collect only two namespaces -- the mnt one for criu and mnt one again for root task. This is not correct. We need all mount namespaces to make the irmap generation work properly and we need all net namespaces to have parasite sockets created. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-02 14:30:31 +04:00
Pavel Emelyanov	7327ffe6a7	ns: Introduce collect_net_namespaces And move sockets collection there. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-01 13:33:56 +04:00
Pavel Emelyanov	01f6f890c2	ns: Introduce collect_namespaces routine Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-01 13:33:42 +04:00
Pavel Emelyanov	b476879239	irmap: Get root mntfd before releasing tasks on predump We have a use-after-free in predump code: 1st the free_pstree() is called in pre_dump_tasks(), then we go to irmap_predump_run() which may call the lookup_irmap() which, in turn, dereferences the root_item to get the root mount ns fd. But the problem is bigger than that. After we've released the tasks (done before freeing pstree on predump) we can no longer access them by PIDs, so keeping the root-item after irmap scan is not a fix. Fix is to get the root fd before releasing the tasks and using one in irmap scanner. Caught recently on iterative inotify_irmap test. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-10-01 09:37:04 +04:00
Pavel Emelyanov	295090c1ea	img: Introduce the struct cr_img We want to have buffered images to speed up dump and, slightly, restore. Right now we use plain file descriptors to write and read images to/from. Making them buffered cannot be gracefully done on plain fds, so introduce a new class. This will also help if (when?) we will want to do more complex changes with images, e.g. store them all in one file or send them directly to the network. For now the cr_img just contains one int _fd variable. This patch changes the prototype of open_image() to return struct cr_img, pb_(read|write) to accept one and fixes the compilation of the rest of the code :) Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>	2014-09-30 21:48:13 +04:00
Pavel Emelyanov	5f2a7ac27b	img: Rename fdset -> imgset Since we're going to switch from int-fd-s to class-image soon the fdset name will not fit into the new terminology. This patch is sed -e 's/fdset/imgset/g' -i * sed -e 's/imgset_fd/img_from_set/g' -i * git mv include/fdset.h include/imgset.h Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>	2014-09-30 21:48:10 +04:00
Pavel Emelyanov	1a2e6cbd3f	dump: Don't close pid-proc in vain The open_pid_proc engine knows itself how to cache per-pid descriptors. No need in closing it by hands. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-29 13:22:21 +04:00
Pavel Emelyanov	e651a6eba4	filemap: Get vma mnt_id early We have a, well, issue with how we calculate the vma's mnt_id. Right now we get one via the criu-side file descriptor that it got by opening the /proc/pid/map_files/ link. The problem is that these descriptors are 'merged' or 'borrowed' by adjacent vmas from previous ones. Thus, getting the mnt_id value for each of them makes no sense -- these files are the same. So move this mnt_id getting earlier into vma parsing code. This brings a potential problem -- if we have two adjacent vmas mapping the same inode (dev:ino pair) but living in different mount namespaces -- this check would produce wrong result. "Wrong" from the perspective that on restore correct file would be opened from wrong namespace. I propose to live with it, since this is not worse than the --evasive-devices option, it's _very_ unlikely, but saves a lot of openings. Note, that in case app switched mount namespace and then mapped some new library (with dlopen) things would work correctly -- new vmas will likely be not adjacent and for different dev:ino. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-29 13:20:55 +04:00
Pavel Emelyanov	cf8c9ae870	vma: Reshuffle the struct vma_area We have some fields, that are dump-only and some that are restore only (quite a lot of them actually). Reshuffle them on the vma_area to explicitly show which one is which. And rename some of them for easier grep. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-29 13:19:55 +04:00
Pavel Emelyanov	1ebd56b024	proc: Don't use FILE * to reach children The same reasoning as for personality file -- switch to plain open + read + close. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-19 17:39:56 +04:00
Pavel Emelyanov	f3bee6d584	proc: Don't use FILE* for reading personality It turned out, that fdopen (used in fopen_proc) always maps a 4k buffer for reads and this buffer gets unmap-ed later on fclose. Taking into account the amount of proc files we read (~20 per task plus one file per opened file descriptor) this mmap+munmap results in quite a lot of useless CPU time. E.g. for a container of 20 tasks we have 1000 calls taking ~8% of total dump time. So let's first stop doing this for simple cases -- one line proc files. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-19 17:39:49 +04:00
Pavel Emelyanov	17d44de9af	scripts: Use numeric script names Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-05 13:48:26 +04:00
Pavel Emelyanov	069bdd9674	scripts: Move scripts code into separate sources Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-05 13:48:21 +04:00
Cyrill Gorcunov	3146f58317	plugin: Rework plugins API, v2 Here we define new api to be used in plugins. - Plugin should provide a descriptor with help of CR_PLUGIN_REGISTER macro, or in case if plugin require no init/exit functions -- with CR_PLUGIN_REGISTER_DUMMY. - Plugin should define a plugin hook with help of CR_PLUGIN_REGISTER_HOOK macro. - Now init/exit functions of plugins take a @stage argument which tells the plugin which stage of criu it's been called on dump/restore. For exit it also takes @ret which allows plugin to know if something went wrong and it needs to cleanup own resources. The idea behind is to not limit plugins authors with names of functions they might need to use for particular hook. Such new API deprecates the old plugin structure but to keep backward compatibility we will provide a tiny layer of additional code to support old plugins for at least a couple of release cycles. For example a trivial plugin might look like | #include <sys/types.h> | #include <sys/stat.h> | #include <fcntl.h> | #include <libgen.h> | #include <errno.h> | | #include <sys/socket.h> | #include <linux/un.h> | | #include <stdio.h> | #include <stdlib.h> | #include <string.h> | #include <unistd.h> | | #include "criu-plugin.h" | #include "criu-log.h" | | static int dump_ext_file(int fd, int id) | { | pr_info("dump_ext_file: fd %d id %d\n", fd, id); | return 0; | } | | CR_PLUGIN_REGISTER_DUMMY("trivial") | CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__DUMP_EXT_FILE, dump_ext_file) Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-03 20:48:36 +04:00
Pavel Emelyanov	57c7826a8e	locks: Check for --file-locks option when real locks are found Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 20:20:47 +04:00
Pavel Emelyanov	d58aafc447	dump: Don't allocate dfds in case we dump shared fdtable After patches, that dump locks w/o dfds array, we can even not allocate one when we don't need it. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 17:45:29 +04:00
Pavel Emelyanov	53537f52c8	locks: Don't dump locks in per-task manner (v3) We have a problem with file locks (bug #2512) -- the /proc/locks file shows the ID of lock creator, not the owner. Thus, if the creator died, but holder is still alive, criu fails to dump the lock held by latter task. The proposal is to find who _might_ hold the lock by checking for dev:inode pairs on lock vs file descriptors being dumped. If the creator of the lock is still alive, then he will take the priority. One thing to note about flocks -- these belong to file entries, not to tasks. Thus, when we meet one, we should check whether the flock is really held by task's FD by trying to set yet another one. In case of success -- lock really belongs to fd we dump, in case it doesn't trylock should fail. At the very end -- walk the list of locks and dump them all at once, which is possible by merge of per-task file-locks images into one global one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-02 17:44:46 +04:00
Saied Kazemi	d8b41b6525	Added AUFS support. The AUFS support code handles the "bad" information that we get from the kernel in /proc/<pid>/map_files and /proc/<pid>/mountinfo files. For details see comments in sysfs_parse.c. The main motivation for this work was dumping and restoring Docker containers which by default use the AUFS graph driver. For dump, --aufs-root <container_root> should be added to the command line options. For restore, there is no need for AUFS-specific command line options but the container's AUFS filesystem should already be set up before calling criu restore. [ xemul: With AUFS files sometimes, in particular -- in case of a mapping of an executable file (likely the one created at elf load), in the /proc/pid/map_files/xxx link target we see not the path by which the file is seen in AUFS, but the path by which AUFS accesses this file from one of its "branches". In order to fix the path we get the info about branches from sysfs and when we meet such a file, we cut the branch part of the path. ] Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-21 18:35:22 +04:00
Pavel Emelyanov	546f2701f0	signals: Comments and while (1) loop Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 15:27:54 +04:00
Pavel Emelyanov	11fc475853	signals: Sanitize j loop control variable Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 15:27:40 +04:00
Pavel Emelyanov	f9ebd18354	signals: Don't collect siginfo_t-s on stack We've moved siginfos onto the core entry, thus the bits with siginfo-s themselves cannot sit on the stack any longer. Otherwise we would overwrite them with the next batch and feed a stack pointer to the caller, thus causing garbage from the stack to be written into the image instead of siginfo data. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 15:27:19 +04:00
Pavel Emelyanov	92664c5220	signals: Don't forget to allocate SiginfoEntry The se variable is just an array of pointers on these objects. Need to allocate the objects themselves. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 15:25:57 +04:00
Pavel Emelyanov	8197bae072	signals: Move nr variable into peeking loop And sanitize its usage a little bit. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 15:25:13 +04:00
Pavel Emelyanov	22082b0e55	signals: Calculate peek offset in-place No need in extra variable for that. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 15:24:36 +04:00
Ruslan Kuprieiev	68501cde88	dump: dump signals into signals_* Every thread has its own private signals stored at thread_core->signals_p, and the leader thread also has shared signals stored at tc->signals_s. Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-08-19 13:09:47 +04:00
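
Note on fd07bc7791 (cpu: Add 'ins' mode to --cpu-cap option): the message contrasts a bit-to-bit match with a looser test where the target CPU only needs every feature recorded in the image. A minimal sketch of that difference follows. The function names, the NR_FEATURE_WORDS constant and the flat uint32_t layout are assumptions made for illustration; the log does not show how criu actually stores or compares feature bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of feature words; not taken from criu. */
#define NR_FEATURE_WORDS 4

/* Strict comparison: every feature word must be identical. */
static bool cpu_features_match_exact(const uint32_t *image,
				     const uint32_t *target)
{
	for (size_t i = 0; i < NR_FEATURE_WORDS; i++)
		if (image[i] != target[i])
			return false;
	return true;
}

/*
 * Relaxed comparison: the target may have extra features, but any bit
 * set in the image and missing on the target is a mismatch.
 */
static bool cpu_features_match_superset(const uint32_t *image,
					const uint32_t *target)
{
	for (size_t i = 0; i < NR_FEATURE_WORDS; i++)
		if (image[i] & ~target[i])
			return false;
	return true;
}
```

With these helpers, a newer target CPU that adds bits passes the relaxed check and fails the strict one, while a target that lacks any image bit fails both.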
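
Note on f3bee6d584 and 1ebd56b024 (proc: Don't use FILE*): the messages describe replacing stdio with plain open + read + close for one-line proc files, to avoid the buffer that fdopen maps and unmaps on every call. The sketch below shows one way to do that. The helper name read_small_file is hypothetical and is not criu's function; it assumes the file content arrives in a single read(), which is reasonable for short one-line files but is not guaranteed in general. Reading at most size - 1 bytes also keeps the terminator write in bounds.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a short file into buf using raw syscalls only.
 * Returns the number of bytes read (buf is NUL-terminated),
 * or -1 on error. Hypothetical helper for illustration.
 */
static ssize_t read_small_file(const char *path, char *buf, size_t size)
{
	ssize_t ret;
	int fd;

	if (size == 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Leave room for the terminator so buf[ret] stays in bounds. */
	do {
		ret = read(fd, buf, size - 1);
	} while (ret < 0 && errno == EINTR);

	close(fd);
	if (ret < 0)
		return -1;

	buf[ret] = '\0';
	return ret;
}
```

A caller could pass a path such as /proc/self/personality and a small stack buffer; no heap or mmap-backed stdio buffer is involved.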
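
Note on 53537f52c8 (locks: Don't dump locks in per-task manner): the message says a flock belongs to a file entry rather than a task, so ownership can be probed by trying to set another lock through the descriptor being dumped. A minimal sketch of that probe is below. It is an illustration under narrow assumptions, not criu's code: the name fd_owns_exclusive_flock is hypothetical, the lock is already known to exist (for example from /proc/locks), and it is an exclusive flock. Shared locks need extra care, because converting a flock between types is not atomic and a failed upgrade can drop the lock that was held.

```c
#include <errno.h>
#include <sys/file.h>

/*
 * Probe whether the open file description behind fd holds the
 * exclusive flock that is known to exist on the file.
 * Returns 1 if it does, 0 if another file description holds it,
 * and -1 on an unexpected error.
 */
static int fd_owns_exclusive_flock(int fd)
{
	/* Re-requesting a lock the description already holds succeeds. */
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		return 1;

	/* A conflicting holder makes the non-blocking request fail. */
	return errno == EWOULDBLOCK ? 0 : -1;
}
```

If no lock exists at all, the call succeeds and takes the lock as a side effect, so the probe is only meaningful once a lock on that dev:inode pair has been found.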

545 Commits