mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 13:58:34 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	69b3ebd002	vma: Remove FDINFO_MAP fd type The regfile's ID of a VMA is stored in its shmid field. And the file itself is dumped into the regfiles.img image with a 'special'-ly generated ID (i.e. -- just allocate a new unique one). Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-04-09 12:57:38 +04:00
Pavel Emelyanov	28a1474779	usk: The INFLIGHT flag is no longer used It was required before we switched to socketpair restore scheme. Now it's not required, sockets just connect to the peer they want to. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-04-06 21:43:19 +04:00
Pavel Emelyanov	b386751697	sockets: Rework unix sockets onto fdinfo scheme This is a big change, yes. Dump unix sockets in the same manner as all the other files are done now. A few notes however. 1. We explicitly drop names for connected stream sockets. This is done to avoid conflicts with names -- accepted sockets share their names with the listening parent. This can be done later by binding a socket to a name, then renaming it to some temporary unique one and at the very very end renaming some back to original. 2. Interconnected sockets are restored via socketpair() call. This is correct, but names are dropped. Need to bind() sockets after this (yes, this can be done), but for this we need to implement the trick with renames described before. 3. FD for socket queues is constantly re-opened not to resolve fd conflicts. Need to use service fds engine for this later. 4. Some code cleanup is still required, yes (will follow shortly). Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-04-06 19:27:08 +04:00
Andrey Vagin	96be8be2d1	pipe: save all pipe data in a separate file A pipe buffer has 16 slots. A slot is page, offset and size. When we use splice and data is not aligned, splice connects a page from file cache and sets offset. For this reason we lose a part of the buffer. If a data size is more than 15 pages, data will be aligned in an image. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-04-05 21:23:57 +04:00
Andrey Vagin	bdb3932be5	pipe: all pipes are saved in one file (v2) Information about pipe's file structs saved in one global file and fdinfo_entry is saved for each descriptor Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-04-05 21:17:24 +04:00
Pavel Emelyanov	2a33c4d5dc	mem: Remove zero page from the end of mem image files This was required when pages were stored in elf files for exec. Now we can stop reading it on eof. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-04-05 14:07:31 +04:00
Pavel Emelyanov	9b2617353b	inet: Rework inet sk dumping on new fdinfo scheme Now every inetsk fd dump results in a new entry in the fdinfo.img file. Sockets itself are dumped into inetsk.img global image file. On restore the generic fdinfo redistribution algo is used and inet sockets are opened only when required. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-27 12:42:59 +04:00
Pavel Emelyanov	6b79601ccb	files: Split regfiles info into separate file From now on the fdinfo image only contains plain fdinfo_entry-es. The type == FDINFO_REG files are described by regfiles.img entries and are matched by the ID in both. At dump stage each new ID generated results in a new entry in the regfiles.img. At restore stage open_fe_fd should open a regfile by the fdinfo's ID. Now this is done in suboptimal way, need to improve. Show shows both images separately. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-25 21:15:16 +04:00
Pavel Emelyanov	95f957b837	image: New image file for regfiles Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-25 21:11:58 +04:00
Pavel Emelyanov	500468d4e7	files: Split fdinfo in two parts Make fdinfo_entry carry only the minimal info describing a file descriptor -- the fd value itself, the fd type (regular file, exe link, cwd, filemap and it will be pipes, sockets, inotifies, etc.) and the describing file ID. The mentioned ID will identify the type-d object, e.g. for regfiles this ID is already generated with file-ids.c code. The other part of this structure describes a regfile (i.e. a file opened with open syscall). I put this new entry at the end of the fdinfo_entry just to make the patching simpler. Soon this entry will be dumped into its own file. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-25 21:03:26 +04:00
Pavel Emelyanov	159d3bdfd5	fdinfo: Sanitize types in fdinfo_entry The namelen is u16, to cover the PATH_MAX u8 is not enough. The pos is u64, since file offset is that long indeed. The id is u32 as per previous patch. Fix printf-s respectively. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-25 21:00:35 +04:00
Stanislav Kinsbursky	b68b3d5dd5	dump: convert fd types into enum This is a precursor patch. Macro for max possible fd type will be required. And it's easier to use enum in this case. Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-24 00:31:58 +04:00
Pavel Emelyanov	97a1d8bb1c	mm: Dump vmas into separate image file The core image now contains only core per-task stuff. The new file resurrects Tula magic number removed earlier. Acked-by: Andrey Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-21 18:17:12 +04:00
Andrey Vagin	e869c16df5	mm: rework of dumping shared memory vma_entry contains shmid and all shared memory is dumped in its own files. The most interesting thing is restore. A mapping is restored by the process with the smallest pid. The mapping is created before executing restorer. We map a full mapping and restore its content, then we open a file from /proc/pid/map_files and store a descriptor in vma_info. The mapping is unmapped. Now we can map any region of this mapping in the restorer. We use this trick, because a target process may have this mapping in some places and the restorer has no function to open proc files. v2: fix error handling xemul: Fixed static-s and args for cr_dump_shmem Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-21 11:03:55 +04:00
Andrey Vagin	5dda50468b	mm: change offset of zero_page_entry to ~0LL Because 0 is actually a valid value. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-21 10:57:14 +04:00
Andrey Vagin	37a6c1fc88	mm: move shmid to vma_entry (v2) It will be used to restore shared mappings v2: clean up Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-21 10:56:31 +04:00
Kinsbursky Stanislav	c1999ec58e	dump: use fd_params->type for cwd and exe magic This is a cleanup patch. Use file entry type variable for special files instead of file entry addr variable. Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-03-06 16:59:28 +04:00
Pavel Emelyanov	bad126e7a5	sock: Add dst creds to socket structs These are required for inet sockets, but were not added since listen sockets do not have them. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-03-02 15:54:42 +04:00
Pavel Emelyanov	f8a18edd44	dump: Remove SHOULD_BE_DEAD task state Move proc checks for Z-state into seize_task(). Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-03-01 19:31:20 +04:00
Kinsbursky Stanislav	c19012326d	dump: socket queues support This patch was designed to be generic and thus usable for all kinds of sockets. Not sure that this goal has been reached, but at least I tried. Key ideas: 1) On-stack structure for collecting sockets queues and then passing them to parasite code. 2) Singly linked list is used for collecting structures, representing sockets of any kind (!) with queues. Based on xemul@ patches. Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-29 17:42:30 +04:00
Kinsbursky Stanislav	8ce9e94705	parasite: support sockets queues This patch adds sockets queue dump functionality. Key ideas 1) sockets info is passed as plain array in parasite args. 2) new socket option SO_PEEK_OFF with MSG_PEEK is used to read the queue's packets. 3) Buffer for packet will be allocated for each socket separately and with size of socket sending buffer. For stream sockets it means that its queue will be dumped in chunks of this size. Note: loop around sys_msgrcv() is required for DGRAM sockets - sys_msgrcv() with MSG_PEEK will return only one packet. Based on xemul@ patches. Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-29 17:42:30 +04:00
Cyrill Gorcunov	2acc741a3a	files: Use sys_kcmp to find file descriptor duplicates v4 We switch generic-object-id concept with sys_kcmp approach, which implies changes of image format a bit (and since it's early time for project overall, we're allowed to). In short -- previously every file descriptor had an ID generated by a kernel and exported via procfs. If the appropriate file descriptors were the same objects in kernel memory -- the IDs did match up to bit. It allows us to figure out which files were actually the identical ones and should be restored in a special way. Once sys_kcmp system call was merged into the kernel, we've got a new opportunity -- to use this syscall instead. The syscall basically compares kernel objects and returns ordered results suitable for objects sorting in a userspace. For us it means -- we treat every file descriptor as a combination of 'genid' and 'subid'. While 'genid' serves for fast comparison between fds, the 'subid' is kind of a second key, which guarantees uniqueness of genid+subid tuple over all file descriptors found in a process (or group of processes). To be able to find and dump file descriptors in a single pass we collect every fd into a global rbtree, where (!) each node might become a root for a subtree as well. The main tree carries only non-equal genid. If we find genid which is already in tree, we need to make sure that it's either indeed a duplicate or not. For this we use sys_kcmp syscall and if we find that file descriptors are different -- we simply put new fd into a subtree. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-02-28 19:13:47 +04:00
Kinsbursky Stanislav	4141296ed7	IPC: dump semaphores set Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-15 13:33:46 +04:00
Kinsbursky Stanislav	b3cfe73556	dump: support SYSV IPC vma This patch introduces the following changes: 1) introduces new flag VMA_AREA_SYSVIPC to mark corresponding vma entries. 2) enhances task /proc/<pid>/maps parsing to obtain first 5 letters of mapped file. If the major number of the device the file belongs to is equal to 0 (tmpfs) and its name starts with "/SYSV", then this mapping is considered as SYSV IPC and corresponding vma entry status is updated with VMA_AREA_SYSVIPC flag. 3) omits dumping of mapping pages for SYSV IPC vmas. Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-15 13:30:34 +04:00
Kinsbursky Stanislav	fa2ff60680	IPC: dump message queue v2: New "MSG_STEAL" functionality is used Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-14 20:21:30 +04:00
Kinsbursky Stanislav	f86d167bf1	ipc: rename struct ipc_seg This name for the structure is obfuscating, because the structure will be used also for queues and semaphores sets migration. This patch renames this structure into ipc_desc_entry. It also renames all related functions and prints to reflect structure name change. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-13 21:04:23 +04:00
Kinsbursky Stanislav	3d886be2c6	IPC: dump shared memory Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-09 13:21:46 +04:00
Kinsbursky Stanislav	530f9d9030	IPC: collect and dump tunables sequentially This patch removes collect stage and dumps tunables object right after collect. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-08 16:31:41 +04:00
Cyrill Gorcunov	76a249282e	restore: Add checkpoint/restore for /proc/pid/exe symlink This patch adds ability to checkpoint/restore /proc/pid/exe symlink, so if a process we've just checkpointed has been say /path/to/exe, then at restore time we bring this path back. There is some restriction from the kernel side: if existing /proc/pid/exe is already mapped more than once, the kernel will refuse to change the symlink, so we need to restore it late, when mmaps of crtools itself are already unmapped (ie via late call in restorer.c). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-02-07 20:08:01 +04:00
Andrey Vagin	4d962b27c0	crtools: dump and restore clear_tid_address pthread_join works with this patch Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-03 17:28:04 +04:00
Cyrill Gorcunov	405985e964	Add sysctl handling engine Since we need to operate with sysctls pretty heavy, better to add some common engine for all handlers. Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-02-02 21:22:20 +04:00
Cyrill Gorcunov	e61605169f	ctrools: Rewrite task/threads stopping engine is back This commit brings the former "Rewrite task/threads stopping engine" commit back. Handling it separately is too complex so better try to handle it in-place. Note some tests might fault, it's expected. --- Stopping tasks with STOP and proceeding with SEIZE is actually excessive -- the SEIZE is enough. Moreover, just killing a task with STOP is also racy, since task should be given some time to come to sleep before its proc can be parsed. Rewrite all this code to SEIZE task and all its threads from the very beginning. With this we can distinguish stopped task state and migrate it properly (not supported now, need to implement). This thing however has one BIG problem -- after we SEIZE-d a task we should seize its threads, but we should do it in a loop -- reading /proc/pid/task and seizing them again and again, until the contents of this dir stops changing (not done now). Besides, after we seized a task and all its threads we cannot scan its children list once -- task can get reparented to init and any task's child can call clone with CLONE_PARENT flag thus repopulating the children list of the already seized task (not done also) This patch is ugly, yes, but splitting it doesn't help to review it much, sorry :( Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-01 19:49:28 +04:00
Cyrill Gorcunov	ab82c2de98	Revert "ipc: Drop u32[2] from image, simply use u64 all the time" This reverts commit `4f83d028ff`. It breaks IPC test-case, need to investigate.	2012-02-01 19:27:39 +04:00
Cyrill Gorcunov	63b88720a3	Revert "ctrools: Rewrite task/threads stopping engine" This reverts commit `6da51eee3f`. It breaks transition/file_read test case	2012-02-01 19:27:28 +04:00
Pavel Emelyanov	6da51eee3f	ctrools: Rewrite task/threads stopping engine Stopping tasks with STOP and proceeding with SEIZE is actually excessive -- the SEIZE is enough. Moreover, just killing a task with STOP is also racy, since task should be given some time to come to sleep before its proc can be parsed. Rewrite all this code to SEIZE task and all its threads from the very beginning. With this we can distinguish stopped task state and migrate it properly (not supported now, need to implement). This thing however has one BIG problem -- after we SEIZE-d a task we should seize its threads, but we should do it in a loop -- reading /proc/pid/task and seizing them again and again, until the contents of this dir stops changing (not done now). Besides, after we seized a task and all its threads we cannot scan its children list once -- task can get reparented to init and any task's child can call clone with CLONE_PARENT flag thus repopulating the children list of the already seized task (not done also) This patch is ugly, yes, but splitting it doesn't help to review it much, sorry :( Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-02-01 17:29:13 +04:00
Cyrill Gorcunov	4f83d028ff	ipc: Drop u32[2] from image, simply use u64 all the time This eliminates | ipc_ns.c:287:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] and makes code simpler. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-02-01 17:23:44 +04:00
Stanislav Kinsbursky	c826057a9c	IPC: dump namespace itself Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-31 22:32:22 +04:00
Pavel Emelyanov	beb158a66e	cr: Task creds support Dumping is simple. All but secbits can be read from proc, secbits are got from parasite. Restoring is a bit tricky -- when you change anything on kernel cred's struct it performs sophisticated checks and can change some more stuff than requested, so the creds restoration procedure is carefully commented step-by-step. Another thing to mention is that creds are restored after everything else, i.e. right before performing final threads sync and sigreturns. This is done to avoid potential problems with insufficient caps for restoring other stuff (e.g. CAP_DAC_OVERRIDE or zero euid is most likely required for opening any image file and the notorious control /proc/sys/kernel/ns_last_pid, which in turn is performed till the very last moment). Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-30 13:00:50 +04:00
Cyrill Gorcunov	29bda9aae5	sockets: Restore in-flight unix stream sockets It's done in two steps - On checkpoint we find which icons are present over all sockets and setup peer number to appropriate listening socket - On restore we collect listening sockets and once we find in-flight connection we search for appropriate listening socket name and use it to call connect() then Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-01-27 23:21:06 +04:00
Pavel Emelyanov	16c58dbd11	magic: Fix PIPEFS_MAGIC constant This one is actually an internal kernel magic number for pipefs filesystem and shouldn't be changed. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2012-01-26 20:42:45 +04:00
Pavel Emelyanov	60dee71484	magic: Change magic numbers Existing ones are boring. Let's switch them into geographical coordinates of various Russian towns in NNNNEEEE form. 4 digits for a coordinate give us up to 2km of inaccuracy, which is more than enough to find a town. We cannot use longitude further than 99.99, i.e. we won't cover the Far East region, but that's OK -- there's more than enough good candidates even in the European part of the country only. Feel free to extend. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-26 19:49:27 +04:00
Pavel Emelyanov	98f4c2e4de	ns: Support UTS namespace Only two fields are modifiable -- hostname and domainname. So read them on dump and write on restore. File format is simple -- u32 magic u32 length of nodename u8[] nodename string u32 length of domainname u8[] domainname string For OpenVZ we can write the release at the end, but this is later. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-26 16:54:22 +04:00
Pavel Emelyanov	b7de83aaf3	crtools: Interval timers support Timers are dumped from inside parasite code, the format is plain -- just 3 pairs of interval/value one-by-one. The restoration occurs in two stages -- first prepare the timer values in restorer (and check for sanity), then setup the timers in the latest stage before actually calling the sigreturn. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-24 18:41:49 +04:00
Cyrill Gorcunov	415b789cbf	image: Add mm_saved_auxv entry It's needed for auxv dump and restore. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-01-24 18:01:07 +04:00
Cyrill Gorcunov	faf41eb5b2	dump: Dump cmdline and envirion parameters It implies update to kernel side as well. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com>	2012-01-24 18:01:07 +04:00
Pavel Emelyanov	18aaad6164	img: Extend task image with state and exit code Introduce 3 states we will have to work with: * alive for tasks sleeping or running * dead for zombies * stopped for stopped tasks. We cannot distinguish tasks in this state now, but with freezer cgroup this will become possible Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-23 01:43:36 +04:00
Pavel Emelyanov	dbf3c1a8cd	crtools: Reformat core_entry Keep task arch-independent fields in one struct (will be extended) in the beginning of the image and make pads be located separately. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-23 01:43:00 +04:00
Stanislav Kinsbursky	f3253a40d2	checkpoint: IPv4 listening sockets dumping support	2012-01-18 12:38:58 +04:00
Pavel Emelyanov	d1b3fd09b3	fdinfo: fd_is_special helper for maps and cwd Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-16 23:51:12 +04:00
Pavel Emelyanov	e2d8aec7f5	files: Named constant for cwd fdinfo Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2012-01-16 23:50:50 +04:00
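
The UTS namespace commit (98f4c2e4de) spells out its image layout field by field: u32 magic, u32 nodename length, nodename bytes, u32 domainname length, domainname bytes. A minimal Python sketch of that layout follows. It is not the crtools code: the magic value, the little-endian byte order, and the function names `pack_utsns` and `unpack_utsns` are all assumptions made for illustration.

```python
import struct

# Hypothetical placeholder: the real magic value is not given in the log.
UTSNS_MAGIC = 0x12345678


def pack_utsns(nodename: bytes, domainname: bytes,
               magic: int = UTSNS_MAGIC) -> bytes:
    """Serialize as: u32 magic, u32 len, nodename, u32 len, domainname."""
    # Little-endian byte order is an assumption, not stated in the commit.
    return (
        struct.pack("<II", magic, len(nodename)) + nodename
        + struct.pack("<I", len(domainname)) + domainname
    )


def unpack_utsns(blob: bytes, magic: int = UTSNS_MAGIC):
    """Parse the layout written by pack_utsns; return (nodename, domainname)."""
    got_magic, node_len = struct.unpack_from("<II", blob, 0)
    if got_magic != magic:
        raise ValueError("bad magic")
    off = 8
    nodename = blob[off:off + node_len]
    off += node_len
    (dom_len,) = struct.unpack_from("<I", blob, off)
    off += 4
    domainname = blob[off:off + dom_len]
    # Slicing never fails on short input, so check the lengths explicitly.
    if len(nodename) != node_len or len(domainname) != dom_len:
        raise ValueError("truncated image")
    return nodename, domainname
```

Length-prefixed strings let a reader skip or validate each field without scanning for a terminator, which suits hostnames and domain names of varying size.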

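The sys_kcmp commit (2acc741a3a) describes a two-level lookup: a cheap 'genid' key first, then an identity check that assigns a 'subid' only when the genid collides. The Python sketch below illustrates that idea only. It swaps the rbtree and its subtrees for a dict of lists, and the `same_object` callback is a hypothetical stand-in for the sys_kcmp comparison; the class and method names are invented for this example.

```python
from typing import Callable, Dict, List, Tuple


class FdIdTable:
    """Two-level lookup: genid first, then an expensive identity check."""

    def __init__(self, same_object: Callable[[object, object], bool]):
        # same_object is a stand-in for the sys_kcmp comparison (assumption).
        self._same = same_object
        # genid -> list of (subid, object); a plain list replaces the subtree.
        self._by_genid: Dict[int, List[Tuple[int, object]]] = {}

    def add(self, genid: int, obj: object) -> Tuple[Tuple[int, int], bool]:
        """Return ((genid, subid), is_new) for the given file object."""
        bucket = self._by_genid.setdefault(genid, [])
        for subid, seen in bucket:
            # Same genid: only now pay for the expensive comparison.
            if self._same(seen, obj):
                return (genid, subid), False  # true duplicate, reuse its ID
        subid = len(bucket) + 1
        bucket.append((subid, obj))
        return (genid, subid), True
```

For example, with `table = FdIdTable(lambda a, b: a is b)`, adding the same object twice under one genid returns the same (genid, subid) pair with `is_new` set to False the second time, while a different object under that genid gets the next subid.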
80 Commits