mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 05:48:05 +00:00

148 Commits

Author SHA1 Message Date
Cyrill Gorcunov
b38777dff4 eventpoll: Add checkpoint/restore v2
v2:
 - Move everything into eventpoll.[ch]
 - Use rst_file_params

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-05-04 14:00:05 +04:00
Cyrill Gorcunov
889795da5d eventfd: Add checkpoint/restore support v2
v2:
 - Pass initial counter value to eventfd call
   (can't pass flags here since they are obtained
    with fcntl and must be restored same way or
    restore will fail)
 - Use rst_file_params for flags and owner restore
 - Use eventfd.[ch] instead of eventfs.[ch]
 - Move show funcs to eventfd.c

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-05-04 13:59:51 +04:00
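
The restore order described in the eventfd message above (counter through the creating call, flags through fcntl) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the helper name `restore_eventfd_sketch` and its parameters are hypothetical.

```c
#include <fcntl.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper (not CRIU's actual code): recreate an eventfd
 * whose counter value and file status flags were saved at dump time.
 */
int restore_eventfd_sketch(unsigned int counter, int saved_flags)
{
	/* The saved counter goes straight into the creating call ... */
	int fd = eventfd(counter, 0);

	if (fd < 0)
		return -1;

	/* ... while the flags are put back the way they were read: via fcntl. */
	if (fcntl(fd, F_SETFL, saved_flags) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

A caller would pass the counter and the F_GETFL value recorded in the image; reading from the returned descriptor then yields the saved counter.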
Pavel Emelyanov
a53aa45be0 tcp: Initial image description
Introduce the image file for tcp info, its entry and the show method.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-28 17:59:21 +04:00
Cyrill Gorcunov
19c1de828b sockets: Restore unconnected dgram sockets v7
If a dgram socket's peer is not connected back,
we can try to resolve the peer by name.

For security reasons this happens only if '-x' option
is passed at checkpoint and restore time.

In particular this is needed for programs which do
use dgram socket to send messages to /dev/log.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-19 12:04:58 +04:00
Pavel Emelyanov
1708c747d1 magics: Move them one tab right
To line the constants up.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-13 18:02:50 +04:00
Pavel Emelyanov
a1ccfb9297 files: Support dumping/restoring of completely unlinked files
A completely unlinked file is one whose n_link count is zero.
All we can do with such a file is read its contents and carry them with us.

In order to dump this thing I introduce the "path remap" technology.
For reg file a remapping entry is dumped which describes, that at
restore stage before opening a regfile->path this path should be
linked to some other name and then (after open) unlinked.

For completely unlinked files the remap path would be a path to
a "ghost" file, i.e. a file which is created only at the time of
restore and which is removed completely at the end of it.

Partially unlinked files (i.e. those having n_link != 0, but a
path by which we see them in someone's fd is not accessible) should
be handled in another way.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-13 17:54:36 +04:00
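
The "path remap" step described above (link the path to another name before open, unlink it afterwards) might look like the sketch below. The function name and signature are hypothetical, and creating and later removing the ghost file is left out.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` although that name no longer exists,
 * by temporarily linking it to `remap_path` (for example a ghost file).
 */
int open_remapped_sketch(const char *path, const char *remap_path, int flags)
{
	int fd;

	/* Make the original name appear again, backed by the remap file. */
	if (link(remap_path, path) < 0)
		return -1;

	fd = open(path, flags);

	/* Drop the temporary name whether or not open() succeeded. */
	unlink(path);

	return fd;
}
```

After the call the descriptor stays usable while `path` is gone again, which matches the state the file was in at dump time.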
Cyrill Gorcunov
a8840ba721 fowners: Add checkpoint/restore for sockets
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-12 12:31:33 +04:00
Cyrill Gorcunov
ff3471f726 fowners: Prepare ground for dump and restore
Just show implemented and stubs added to image
(regular file and pipes).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-12 12:28:15 +04:00
Pavel Emelyanov
6f67bb8fc3 xids: Save pgid and sid on pstree_Item and pstree_entry
I store them on _entry since sids can only be inherited or
set to current's pid. Thus the best we can do is restore sids
at fork time, thus save them in the image we use to fork.

Maybe when we submit patches that will give us ability to set
arbitrary pgid and sid we'll change this, but this is in the
future.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-11 22:10:09 +04:00
Pavel Emelyanov
13ee53a098 sockets: Save and restore fd flags for sockets
For regfiles this is done at open() time, for pipes this is done with fcntl. Use
the same fcntl approach for sockets.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-11 13:20:03 +04:00
Pavel Emelyanov
05e3c4d2c9 fd: Handle close-on-exec bits
This bit is not per-file, but per-fd, thus put it on the fdinfo_entry.
Drain these bits from parasite together with the fds themselves, save
into image and restore with fcntl F_SETFD cmd where applicable.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-10 18:36:59 +04:00
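
Because the close-on-exec bit belongs to the descriptor and not to the open file, restoring it is a single F_SETFD call per fd. A minimal sketch, with a hypothetical helper name:

```c
#include <fcntl.h>

/*
 * Hypothetical helper: reapply the descriptor flags saved for one fd.
 * Only FD_CLOEXEC is passed on; other bits are masked off.
 */
int restore_fd_flags_sketch(int fd, int saved_fd_flags)
{
	return fcntl(fd, F_SETFD, saved_fd_flags & FD_CLOEXEC);
}
```

Setting the bit on one descriptor leaves a dup() of the same file untouched, which is why the flag has to be stored per fd.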
Pavel Emelyanov
1d6578bbd5 kcmp: Dump task's objects shared with CLONE_ flags
Just dump their IDs and check they are not shared. For future.
IO and SEMUNDO are not there since tasks may have NO such objects
and currently we cannot detect whether they have them equal or
both don't have.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 18:02:00 +04:00
Pavel Emelyanov
43367e2545 fdinfo: Rename fdinfo_entry addr to fd
Now we store only real fdtable entries in this file, so it's
time to name the field properly and change type to u32.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 16:18:33 +04:00
Pavel Emelyanov
447f369ba9 fd: Remove fs_is_special
It's no longer required. All the previously special fds are
now scattered over other images.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 15:54:28 +04:00
Pavel Emelyanov
b984eeff9c mm: Move exe file id on mm_entry
This is mm_struct entity, so save one there. Also gets rid
of special FDINFO-s.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 15:52:00 +04:00
Pavel Emelyanov
fe70efad29 mm: Split mm parts from task core image
The mm_xxx bits are per-mm_struct, not per-task_struct in kernel.
Thus, when we support CLONE_VM we'd better have these bits in a
separate image file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 14:51:37 +04:00
Pavel Emelyanov
de66a5d04b fs: Reserve place for task's root dumping
Do not restore it yet -- the logic we're about to apply to
resolve tasks' paths relative to dumper/restorer is not yet
clear to me and it should better be hidden into a couple of
calls (dump_one_reg_file/open_fe_fd). But since we can't
chroot to fd we're about to expose the logic outside of the
open_fe_fd, which is not desirable ATM.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 13:52:42 +04:00
Pavel Emelyanov
e5e57e832b fs: Move info about cwd into separate file
Why? Because one day we'll support various CLONE_ flags and
for fdtable and fs info we'd like to have separate images (since
these objects are separate in kernel).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 13:41:05 +04:00
Pavel Emelyanov
69b3ebd002 vma: Remove FDINFO_MAP fd type
The regfile's ID of a VMA is stored in its shmid field. And the
file itself is dumped into regfiles.img image with 'special'-ly
generated ID (i.e. -- just allocate a new unique one).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-09 12:57:38 +04:00
Pavel Emelyanov
28a1474779 usk: The INFLIGHT flag is no longer used
It was required before we switched to socketpair restore
scheme. Now it's not required, sockets just connect to
the peer they want to.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-06 21:43:19 +04:00
Pavel Emelyanov
b386751697 sockets: Rework unix sockets onto fdinfo scheme
This is a big change, yes. Dump unix sockets in the same manner
as all the other files are done now. A few notes however.

1. We explicitly drop names for connected stream sockets. This is
   done to avoid conflicts with names -- accepted sockets share their
   names with the listening parent. This can be done later by binding
   a socket to a name, then renaming it to some temporary unique one
   and at the very end renaming it back to the original.

2. Interconnected sockets are restored via socketpair() call. This is
   correct, but names are dropped. Need to bind() sockets after this
   (yes, this can be done), but for this we need to implement the trick
   with renames described before.

3. FD for socket queues is constantly re-opened not to resolve fd
   conflicts. Need to use service fds engine for this later.

4. Some code cleanup is still required, yes (will follow shortly).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-06 19:27:08 +04:00
Andrey Vagin
96be8be2d1 pipe: save all pipe data in a separate file
A pipe buffer has 16 slots. A slot is page, offset and size.
When we use splice and data is not aligned, splice connects
a page from file cache and sets the offset. For this reason we lose
a part of the buffer.

If the data size is more than 15 pages, data will be aligned in an image.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 21:23:57 +04:00
Andrey Vagin
bdb3932be5 pipe: all pipes are saved in one file (v2)
Information about pipe's file structs saved in one global file and
fdinfo_entry is saved for each descriptor

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 21:17:24 +04:00
Pavel Emelyanov
2a33c4d5dc mem: Remove zero page from the end of mem image files
This was required when pages were stored in elf files for
exec. Now we can stop reading it on eof.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 14:07:31 +04:00
Pavel Emelyanov
9b2617353b inet: Rework inet sk dumping on new fdinfo scheme
Now every inetsk fd dump results in a new entry in the fdinfo.img file. Sockets themselves are
dumped into the inetsk.img global image file. On restore the generic fdinfo redistribution algo
is used and inet sockets are opened only when required.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-27 12:42:59 +04:00
Pavel Emelyanov
6b79601ccb files: Split regfiles info into separate file
From now on the fdinfo image only contains plain fdinfo_entry-es.
The type == FDINFO_REG files are described by regfiles.img entries
and are matched by the ID in both.

At dump stage each new ID generated results in a new entry in the
regfiles.img. At restore stage open_fe_fd should open a regfile by
the fdinfo's ID. Now this is done in suboptimal way, need to improve.

Show shows both images separately.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-25 21:15:16 +04:00
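
The matching of fdinfo entries to regfiles.img entries by ID can be pictured with the sketch below. The struct layouts, field widths and names are assumptions for illustration and do not reproduce the real image format; the lookup is a plain linear scan, in the spirit of the "suboptimal way" the message mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only: a descriptor, its type tag and the file ID. */
struct fdinfo_entry_sketch {
	uint32_t fd;   /* descriptor number in the task */
	uint8_t  type; /* type tag, e.g. "regular file" */
	uint32_t id;   /* key into the per-type image */
};

/* Illustrative layout only: one regular file, keyed by the same ID. */
struct reg_file_entry_sketch {
	uint32_t    id;
	const char *path;
};

/* Linear scan for the regfile entry whose ID matches; NULL if absent. */
const struct reg_file_entry_sketch *
find_regfile_sketch(const struct reg_file_entry_sketch *files, size_t nr,
		    uint32_t id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (files[i].id == id)
			return &files[i];

	return NULL;
}
```

Two fdinfo entries carrying the same ID resolve to the same regfile entry, so a shared file is described once no matter how many descriptors point at it.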
Pavel Emelyanov
95f957b837 image: New image file for regfiles
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-25 21:11:58 +04:00
Pavel Emelyanov
500468d4e7 files: Split fdinfo in two parts
Make fdinfo_entry carry only the minimal info describing a file
descriptor -- the fd value itself, the fd type (regular file, exe
link, cwd, filemap and it will be pipes, sockets, inotifies, etc.)
and the describing file ID.

The mentioned ID will identify the type-d object, e.g. for regfiles
this ID is already generated with file-ids.c code.

The other part of this structure describes a regfile (i.e. a file
opened with open syscall). I put this new entry at the end of the
fdinfo_entry just to make the patching simpler. Soon this entry
will be dumped into its own file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-25 21:03:26 +04:00
Pavel Emelyanov
159d3bdfd5 fdinfo: Sanitize types in fdinfo_entry
The namelen is u16, to cover the PATH_MAX u8 is not enough.
The pos is u64, since file offset is that long indeed.
The id is u32 as per previous patch.

Fix printf-s respectively.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-25 21:00:35 +04:00
Stanislav Kinsbursky
b68b3d5dd5 dump: convert fd types into enum
This is a precursor patch. Macro for max possible fd type will be required.
And it's easier to use enum in this case.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-24 00:31:58 +04:00
Pavel Emelyanov
97a1d8bb1c mm: Dump vmas into separate image file
The core image now contains only core per-task stuff.
The new file resurrects Tula magic number removed earlier.

Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 18:17:12 +04:00
Andrey Vagin
e869c16df5 mm: rework of dumping shared memory
vma_entry contains shmid and all shared memory is dumped into its own files.
The most interesting thing is restore.
A mapping is restored by the process with the smallest pid. The mapping
is created before executing the restorer.
We map the full mapping and restore its content, then we open a file from
/proc/pid/map_files and store a descriptor in vma_info. The mapping is
then unmapped. Now we can map any region of this mapping in the restorer.

We use this trick because a target process may have this mapping in
several places and the restorer has no function to open proc files.

v2: fix error handling
xemul: Fixed static-s and args for cr_dump_shmem

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 11:03:55 +04:00
Andrey Vagin
5dda50468b mm: change offset of zero_page_entry to ~0LL
Because 0 is actually a valid value.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 10:57:14 +04:00
Andrey Vagin
37a6c1fc88 mm: move shmid to vma_entry (v2)
It will be used to restore shared mappings

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 10:56:31 +04:00
Kinsbursky Stanislav
c1999ec58e dump: use fd_params->type for cwd and exe magic
This is a cleanup patch. Use file entry type variable for special files
instead of file entry addr variable.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-06 16:59:28 +04:00
Pavel Emelyanov
bad126e7a5 sock: Add dst creds to socket structs
These are required for inet sockets, but were not added since listen
sockets do not have them.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-02 15:54:42 +04:00
Pavel Emelyanov
f8a18edd44 dump: Remove SHOULD_BE_DEAD task state
Move proc checks for Z-state into seize_task().

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-01 19:31:20 +04:00
Kinsbursky Stanislav
c19012326d dump: socket queues support
This patch was designed to be generic and thus usable for all kinds of
sockets. Not sure that this goal has been reached, but at least I tried.

Key ideas:
1) On-stack structure for collecting sockets queues and then passing them to
   parasite code.
2) Singly linked list is used for collecting structures, representing sockets
   of any kind (!) with queues.

Based on xemul@ patches.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-29 17:42:30 +04:00
Kinsbursky Stanislav
8ce9e94705 parasite: support sockets queues
This patch adds sockets queue dump functionality. Key ideas:
1) sockets info is passed as plain array in parasite args.
2) new socket option SO_PEEK_OFF with MSG_PEEK is used to read the
   queue's packets.
3) Buffer for packet will be allocated for each socket separately and with
   size of socket sending buffer. For stream sockets this means that its queue
   will be dumped in chunks of this size.

Note: loop around sys_msgrcv() is required for DGRAM sockets - sys_msgrcv()
with MSG_PEEK will return only one packet.

Based on xemul@ patches.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-29 17:42:30 +04:00
Cyrill Gorcunov
2acc741a3a files: Use sys_kcmp to find file descriptor duplicates v4
We switch generic-object-id concept with sys_kcmp approach,
which implies changes of image format a bit (and since it's
early time for project overall, we're allowed to).

In short -- previously every file descriptor had an ID
generated by a kernel and exported via procfs. If the
appropriate file descriptors were the same objects in
kernel memory -- the IDs matched bit for bit. That allowed
us to figure out which files were actually the identical
ones and should be restored in a special way.

Once sys_kcmp system call was merged into the kernel,
we've got a new opportunity -- to use this syscall instead.
The syscall basically compares kernel objects and returns
ordered results suitable for objects sorting in a userspace.

For us it means -- we treat every file descriptor as a combination
of 'genid' and 'subid'. While 'genid' serves for fast comparison
between fds, the 'subid' is kind of a second key, which guarantees
uniqueness of genid+subid tuple over all file descriptors found
in a process (or group of processes).

To be able to find and dump file descriptors in a single pass we
collect every fd into a global rbtree, where (!) each node might
become a root for a subtree as well.

The main tree carries only non-equal genid. If we find a genid which
is already in the tree, we need to check whether it is indeed
a duplicate. For this we use the sys_kcmp syscall and if we
find that file descriptors are different -- we simply put new
fd into a subtree.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-02-28 19:13:47 +04:00
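
The genid/subid scheme above can be illustrated with a much simplified sketch. It swaps the rbtree with subtrees for a flat array, and it takes the comparison as a callback standing in for sys_kcmp; all names here (`fd_id_collect_sketch`, `toy_fd_cmp`, the struct fields) are hypothetical.

```c
#include <stddef.h>

#define FD_ID_MAX 64

/* Returns 0 when both fds are the same open file; stands in for sys_kcmp. */
typedef int (*fd_cmp_fn)(int fd_a, int fd_b);

struct fd_id_sketch {
	unsigned int genid; /* cheap first-level key */
	unsigned int subid; /* tie-breaker among equal genids */
	int fd;             /* representative fd for later comparisons */
};

struct fd_id_table_sketch {
	struct fd_id_sketch ids[FD_ID_MAX];
	size_t nr;
};

/*
 * Find or create the ID for `fd`. The expensive comparison runs only
 * against entries whose genid already matches. Returns NULL when full.
 */
const struct fd_id_sketch *
fd_id_collect_sketch(struct fd_id_table_sketch *t, unsigned int genid,
		     int fd, fd_cmp_fn cmp, int *is_new)
{
	unsigned int next_subid = 0;
	struct fd_id_sketch *e;
	size_t i;

	*is_new = 0;
	for (i = 0; i < t->nr; i++) {
		e = &t->ids[i];
		if (e->genid != genid)
			continue; /* different genid: cannot be a duplicate */
		if (cmp(e->fd, fd) == 0)
			return e; /* same kernel object: reuse its ID */
		if (e->subid >= next_subid)
			next_subid = e->subid + 1;
	}

	if (t->nr == FD_ID_MAX)
		return NULL;

	/* Same genid but a different object (or a new genid): new subid. */
	e = &t->ids[t->nr++];
	e->genid = genid;
	e->subid = next_subid;
	e->fd = fd;
	*is_new = 1;
	return e;
}

/* Toy comparator for demonstration only: fds sharing a tens digit "match". */
int toy_fd_cmp(int fd_a, int fd_b)
{
	return fd_a / 10 == fd_b / 10 ? 0 : 1;
}
```

A tree keyed by genid, as the message describes, would replace the linear scan; the array is used here only to keep the sketch short.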
Kinsbursky Stanislav
4141296ed7 IPC: dump semaphores set
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-15 13:33:46 +04:00
Kinsbursky Stanislav
b3cfe73556 dump: support SYSV IPC vma
This patch introduces the following changes:

1) introduces new flag VMA_AREA_SYSVIPC to mark corresponding vma entries.
2) enhance task /proc/<pid>/maps parsing to obtain the first 5 letters of the
   mapped file name. If the major number of the device the file belongs to is
   0 (tmpfs) and its name starts with "/SYSV", then this mapping is considered
   SYSV IPC and the corresponding vma entry status is updated with the
   VMA_AREA_SYSVIPC flag.
3) omit dumping of mapping pages for SYSV IPC vmas.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-15 13:30:34 +04:00
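
The check in item 2 above (device major 0 and a name starting with "/SYSV") could be sketched as below. The function name is hypothetical and the parsing is a simplified stand-in for whatever the patch does; it only assumes the usual /proc/<pid>/maps line layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if a /proc/<pid>/maps line looks like a
 * SYSV IPC mapping, 0 if it does not, and -1 if the line cannot be parsed.
 */
int maps_line_is_sysvipc_sketch(const char *line)
{
	unsigned long start, end, pgoff, ino;
	unsigned int dev_maj, dev_min;
	char perms[5];
	int name_off = 0;

	/* start-end perms offset major:minor inode [name] */
	if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n", &start, &end, perms,
		   &pgoff, &dev_maj, &dev_min, &ino, &name_off) < 7)
		return -1;

	if (name_off == 0)
		return 0; /* no position recorded: treat as unnamed */

	/* Major 0 plus the "/SYSV" prefix, compared over 5 characters. */
	return dev_maj == 0 && strncmp(line + name_off, "/SYSV", 5) == 0;
}
```

A caller would set the VMA_AREA_SYSVIPC flag on the vma entry when this returns 1 and skip dumping its pages.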
Kinsbursky Stanislav
fa2ff60680 IPC: dump message queue
v2: New "MSG_STEAL" functionality is used

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-14 20:21:30 +04:00
Kinsbursky Stanislav
f86d167bf1 ipc: rename struct ipc_seg
This name for the structure is obfuscating, because the structure
will be used also for queues and semaphores sets migration.

This patch renames this structure into ipc_desc_entry. It also renames
all related functions and prints to reflect structure name change.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-13 21:04:23 +04:00
Kinsbursky Stanislav
3d886be2c6 IPC: dump shared memory
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-09 13:21:46 +04:00
Kinsbursky Stanislav
530f9d9030 IPC: collect and dump tunables sequentially
This patch removes collect stage and dumps tunables object right after
collect.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-08 16:31:41 +04:00
Cyrill Gorcunov
76a249282e restore: Add checkpoint/restore for /proc/pid/exe symlink
This patch adds ability to checkpoint/restore
/proc/pid/exe symlink, so if a process we've just
checkpointed has been say /path/to/exe, then at restore
time we bring this path back.

There is a restriction from the kernel side: if the
existing /proc/pid/exe is already mapped more than
once, the kernel will refuse to change the symlink,
so we need to restore it late, when the mmaps of crtools
itself are already unmapped (i.e. via a late call in
restorer.c).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-02-07 20:08:01 +04:00
Andrey Vagin
4d962b27c0 crtools: dump and restore clear_tid_address
pthread_join works with this patch

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-03 17:28:04 +04:00
Cyrill Gorcunov
405985e964 Add sysctl handling engine
Since we need to operate with sysctls pretty heavily,
better to add some common engine for all handlers.

Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-02-02 21:22:20 +04:00
Cyrill Gorcunov
e61605169f ctrools: Rewrite task/threads stopping engine is back
This commit brings the former "Rewrite task/threads stopping engine"
commit back. Handling it separately is too complex so better try
to handle it in-place.

Note some tests might fault, it's expected.
---

Stopping tasks with STOP and proceeding with SEIZE is actually excessive --
the SEIZE is enough. Moreover, just killing a task with STOP is also racy,
since task should be given some time to come to sleep before its proc
can be parsed.

Rewrite all this code to SEIZE task and all its threads from the very beginning.

With this we can distinguish stopped task state and migrate it properly (not
supported now, need to implement).

This thing however has one BIG problem -- after we SEIZE-d a task we should
seize its threads, but we should do it in a loop -- reading /proc/pid/task and
seizing them again and again, until the contents of this dir stops changing
(not done now).

Besides, after we seized a task and all its threads we cannot scan its children
list once -- task can get reparented to init and any task's child can call clone
with CLONE_PARENT flag thus repopulating the children list of the already seized
task (not done also)

This patch is ugly, yes, but splitting it doesn't help to review it much, sorry :(

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-01 19:49:28 +04:00
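
The loop the message above calls for (re-read /proc/pid/task and seize until the directory stops changing) is described as not done in that commit. One possible shape for it is sketched below; the function names are hypothetical and the actual seizing is left to a callback, so nothing here is taken from the patch itself.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback that would seize one thread; returns 0 on success. */
typedef int (*seize_fn)(int tid);

static int tid_seen(const int *tids, int nr, int tid)
{
	int i;

	for (i = 0; i < nr; i++)
		if (tids[i] == tid)
			return 1;
	return 0;
}

/*
 * Rescan /proc/<pid>/task until a full pass finds no thread that was not
 * seized already. Returns the number of seized threads, or -1 on error.
 */
int seize_all_threads_sketch(int pid, seize_fn seize, int *tids, int max)
{
	char path[64];
	int nr = 0, found_new;

	snprintf(path, sizeof(path), "/proc/%d/task", pid);

	do {
		DIR *d = opendir(path);
		struct dirent *de;

		if (!d)
			return -1;

		found_new = 0;
		while ((de = readdir(d)) != NULL) {
			int tid = atoi(de->d_name); /* "." and ".." give 0 */

			if (tid <= 0 || tid_seen(tids, nr, tid))
				continue;
			if (nr == max || seize(tid) != 0) {
				closedir(d);
				return -1;
			}
			tids[nr++] = tid;
			found_new = 1;
		}
		closedir(d);
	} while (found_new); /* a thread seized this pass may have spawned more */

	return nr;
}

/* Toy callback for demonstration only: pretends every seize succeeds. */
int toy_seize(int tid)
{
	(void)tid;
	return 0;
}
```

The children-list problem mentioned in the same message (reparenting, CLONE_PARENT) is a separate issue and is not covered by this sketch.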