2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 05:18:00 +00:00

83 Commits

Author SHA1 Message Date
Pavel Emelyanov
705bf5aba6 img: Hide image dir fd into util.c
No image copy occurs since 2b175282, so this
fd can be hidden. Thus fixind the FIXME near it :)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-06 21:39:10 +04:00
Pavel Emelyanov
b386751697 sockets: Rework unix sockets onto fdinfo scheme
This is a big change, yes. Dump unix sockets in the same manner
as all the other files are done now. A few notes however.

1. We explicitly drop names for connected stream sockets. This is
   done to avoid conflicts with names -- accepted sockets share their
   names with the listening parent. This can be done later by binding
   a socket to a name, them renaming it to some temporary uniq one
   and at the very very end renaming some back to original.

2. Interconnected sockets are restored via socketpair() call. This is
   correct, but names are dropped. Need to bind() sockets after this
   (yes, this can be done), but for this we need to implement the trick
   with renames described before.

3. FD for socket queues is constantly re-opened not to resolve fd
   conflicts. Need to use service fds engine for this later.

4. Some code cleanup is still required, yes (will follow shortly).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-06 19:27:08 +04:00
Andrey Vagin
96be8be2d1 pipe: save all pipe data in a separate file
A pipe buffer has 16 slots. A slot is page, offset and size.
When we use splice and data is not aligned, splice connects
a page from file cache and set offset. For this reason we loose
a part of buffer.

If a data size is more than 15 pages, data will be aligned in a image.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 21:23:57 +04:00
Andrey Vagin
bdb3932be5 pipe: all pipes are saved in one file (v2)
Information about pipe's file structs saved in one global file and
fdinfo_entry is saved for each descriptor

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 21:17:24 +04:00
Pavel Emelyanov
056f41fb46 img: Remove unused open_image_ro_nocheck
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 15:43:50 +04:00
Pavel Emelyanov
a541fa8df3 rst: Remove unused pstree_pid
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 14:23:21 +04:00
Pavel Emelyanov
2a33c4d5dc mem: Remove zero page from the end of mem image files
This was required when pages were stored in elf files for
exec. Now we can stop reading it on eof.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-04-05 14:07:31 +04:00
Pavel Emelyanov
9b2617353b inet: Rework inet sk dumping on new fdinfo scheme
Now every inetsk fd dump results in a new entry in the fdinfo.img file. Sockets itself are
dumped into inetsk.img global image file. On restore the generic fdinfo redistribution algo
is used and inet sockets are opened only when required.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-27 12:42:59 +04:00
Pavel Emelyanov
c58abfd03d show: Introduce ->show callback for fdset
Each fdset item now has the callback which will show a contents of a magic-described
image file. Per-task and global show code is reworked to walk the respective fdsets
and calling ->show on each file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-27 12:01:14 +04:00
Pavel Emelyanov
82b7c07ca9 show: Fix 'all' mode showing
After we removed the pid from pstree image file the -t or -p option for show
command no longer makes sense. Make 'show' mode rely on -D option to find out
where to find the root (i.e. pstree.img) file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-27 11:04:23 +04:00
Pavel Emelyanov
4a3861acb8 fdset: Introduce glbal fdset
This contains reg-files and sk-queues images, as they contain data
which is potentially generated by every task, so keep it open all
the time dump goes.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:57:07 +04:00
Pavel Emelyanov
1fb1d94186 fdset: Introduce new fdsets
Current fdsets are ugly, limited (bitmask will exhaust in several months) and
suffer from unknown problems with fdsets reuse :(

With new approach (this set) the images management is simple. The basic function
is open_image, which gives you an fd for an image. If you want to pre-open several
images at once instead of calling open_image every single time, you can use the
new fdsets.

Images CR_FD_ descriptors should be grouped like

_CR_FD_FOO_FROM,
CR_FD_FOO_ITEM1,
CR_FD_FOO_ITEM2,
..
CR_FD_FOO_ITEMN,
_CR_FD_FOO_TO,

After this you can call cr_fd_open() specifying ranges -- _FROM and _TO macros,
it will give you an cr_fdset object. Then the fdset_fd(set, type) will give you
the descriptor of the open "set" group corresponding to the "type" type.

3 groups are introduced in this set -- tasks, ns and global.

That's it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:57:04 +04:00
Pavel Emelyanov
44a6dd4ff5 fdset: Reorder CR_FD_ types
Move pstree and sk-queues below, they are not per-task/-ns and
will be global soon.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:57:02 +04:00
Pavel Emelyanov
3858ee4950 fdset: Introduce two fdsets -- task and ns
Write two helpers for opening an fdset for task and one for ns.

This probably can be done with some "generic" macro(s), but this
time it's simpler not to produce more code of that type.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:57:00 +04:00
Pavel Emelyanov
bcf9ee3d1c fdset: Helper for getting fd out of a set
This patch does

s/$fdset->fds[$nr]/fdset_fd($fdset, $nr)/

over the code.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:56:59 +04:00
Pavel Emelyanov
7241b9291b fdset: Kill ability to re-use fdset
It's not required any longer. Now fdsets are allocated one-by-one only
when required and there's no need in adding new fds to existing sets.

Thus just remove the last arg from cr_fdset_open.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:56:51 +04:00
Pavel Emelyanov
f1429be087 dump: Don't include sk-queues fd in task set
This fd is global, so make it such. It will stop being just a global
variable soon.

Plus, remove the pid arg from format.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 22:56:44 +04:00
Stanislav Kinsbursky
b82edc9fdf crtools: dump pstree to image file without pid number
Pid number is redundant - this file is one for the whole tree.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 15:46:07 +04:00
Stanislav Kinsbursky
f659f64247 crtools: make pid parameter optional for open_image_ro()
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-26 15:43:44 +04:00
Pavel Emelyanov
95f957b837 image: New image file for regfiles
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-25 21:11:58 +04:00
Pavel Emelyanov
0fb534a947 util: Fix open_image_ro definition
No colon at the end and handle empty ... set properly.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-25 21:05:32 +04:00
Pavel Emelyanov
7a451ff108 rst: Open exe link fd early
Open the exec link at fd restore stage as yet another service fd,
then pass it to restover via args and just call prctl on it.

This is good for several reasons -- the amount of code required for
this is less and opening files should better happen before we switch
to restorer (opening will be complex and it's MUCH easier to open all
we need in one place).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-24 15:30:20 +04:00
Pavel Emelyanov
97a1d8bb1c mm: Dump vmas into separate image file
The core image now contains only core per-task stuff.
The new file resurrects Tula magic number removed earlier.

Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 18:17:12 +04:00
Andrey Vagin
c3051512f3 crtools: delete CORE_OUT
It's a rudiment from old times, when restore worked via ececve.
Now we modify the core file in place to fixup vma-s.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 11:06:10 +04:00
Andrey Vagin
e869c16df5 mm: rework of dumping shared memory
vma_entry contains shmid and all shared memory are dumped in own files.
The most interesting thing is restore.
A maping is restored by process with the smallest pid. The mamping
is created before executing restorer.
We map a full mapping and restore it's conten, then we open a file from
/proc/pid/map_files and store a descriptor in vma_info. The mapping is
unmaped. Now we can map any region of this mapping in the restorer.

We use this trick, because a target process may have this mapping in
some places and the restorer has not function to open proc files.

v2: fix error hangling
xemul: Fixed static-s and args for cr_dump_shmem

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 11:03:55 +04:00
Andrey Vagin
31feef8ab4 mm: s/PAGES_SHMEM/SHMEM_PAGES
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 10:57:23 +04:00
Andrey Vagin
37a6c1fc88 mm: move shmid to vma_entry (v2)
It will be used to restore shared mappings

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-21 10:56:31 +04:00
Andrey Vagin
5ca347889e crtools: support any format of image path (v3)
Now a name of an image file is hard coded ("smth-%d.img", pid),
but the images of namespaces, shared memery, etc belong to
not one task, so they may have other formats of names, which
will describe objects.

For example a image of shared memory content may have name like
this ("pages-shmem-%ld.img", shmid)

v2: fix comment
v3: rebase

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-19 15:45:12 +04:00
Pavel Emelyanov
ffacd0f17c image: Open images via openat
Using absolute paths for this is dangerous - while doing c/r we should
be extremely carefully and not change tasks' roots and mount namespaces
too early. Sometimes it will not work -- when restoring containers we'll
be unable to switch to new CT and still have the ability to open images.

Rework the images opening via openat and keep the image dir fd open all
the time as the service fd (introduced earlier).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-16 20:45:50 +04:00
Pavel Emelyanov
3dd8f2e659 fds: Introduce service fd-s
These are the fds that help us to do c/r. We want them not to intersect
with any "real-life" ones and thus store them close the the file rlimit
boundary. For now only the logfd one is such.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-16 20:45:42 +04:00
Kinsbursky Stanislav
fe1cf26085 dump: add const qualifiers where possible
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-06 17:00:42 +04:00
Pavel Emelyanov
85f18ef02f restorer: Pass self-vmas via memory, not temp file
Reserve more mem for bootrstrap code and put all self vmas at its tail.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-02 21:31:35 +04:00
Pavel Emelyanov
c39e759048 check: Initial skeleton
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-02 14:15:09 +04:00
Cyrill Gorcunov
8a8ce9b342 show: Don't forget to open sockets queue descriptor before accessing it
Otherwise I'm getting errors like

CR_FD_SK_QUEUES
----------------
Error (./include/util.h:102): Can't read img file: Bad file descriptor
----------------

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-02 11:58:55 +04:00
Pavel Emelyanov
9e0b308af0 dump/restore: Rework final-state switch
Remove CR_TASK_XXX states, use the TASK_XXX ones (for image). This is
required to unseize tasks properly in the next patches.

Plus, make sure that pstree_list and the seized set coincide (i.e.
handle error in collect_task).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-01 19:31:20 +04:00
Pavel Emelyanov
0afad031d5 dump: Check for process/threads tree not to change after seizeing
When we've seized all the tasks and threads found in /proc check for
the /proc contents be the same. Do it one-by-one as we descend the tree.
This is OK, since tasks cannot create kids for anyone but themselves or
their parents (reparent will be handled later).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-01 19:31:20 +04:00
Pavel Emelyanov
199e8d8248 dump: Check for pids reuse at suspend
While we try to seize task it can die and give its pid to
somebody else. This can break pstree consistency. Check for
parent being valid after task is seized.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-03-01 19:31:20 +04:00
Kinsbursky Stanislav
46535130c6 restore: UNIX sockets queue support
Based on xemul@ patches.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-29 17:42:30 +04:00
Kinsbursky Stanislav
c19012326d dump: socket queues support
This patch was designed to be generic and thus usable for all kinds of
sockets. Not sure, thah this goal has been reached, but at least I tried.

Key ideas:
1) On-stack structure for collecting sockets queues and then passing them to
   parasite code.
2) Singly linked list is used for collecting structures, representing sockets
   of any kind (!) with queues.

Based on xemul@ patches.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-29 17:42:30 +04:00
Kinsbursky Stanislav
4141296ed7 IPC: dump semaphores set
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-15 13:33:46 +04:00
Kinsbursky Stanislav
fa2ff60680 IPC: dump message queue
v2: New "MSG_STEAL" functionality is used

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-14 20:21:30 +04:00
Kinsbursky Stanislav
3d886be2c6 IPC: dump shared memory
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-09 13:21:46 +04:00
Kinsbursky Stanislav
a0ec1002b2 crtools: cleanup fdset initalization
v2: wrappers names become less obfuscating

This patch:
1) Updates function cr_fdset_open() to be suitable for handling fdset creation
for dump and show stages.
2) Replaces cr_fdset_open() by new wrapper function cr_fdset_dump().
3) Replaces prep_cr_fdset_for_restore() by new wrapper function cr_fdset_show().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-08 21:43:12 +04:00
Kinsbursky Stanislav
530f9d9030 IPC: collect and dump tunables sequentially
This patch removes collect stage and dumps tunables object right after
collect.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-08 16:31:41 +04:00
Cyrill Gorcunov
e61605169f ctrools: Rewrite task/threads stopping engine is back
This commit brings the former "Rewrite task/threads stopping engine"
commit back. Handling it separately is too complex so better try
to handle it in-place.

Note some tests might fault, it's expected.
---

Stopping tasks with STOP and proceeding with SEIZE is actually excessive --
the SEIZE if enough. Moreover, just killing a task with STOP is also racy,
since task should be given some time to come to sleep before its proc
can be parsed.

Rewrite all this code to SEIZE task and all its threads from the very beginning.

With this we can distinguish stopped task state and migrate it properly (not
supported now, need to implement).

This thing however has one BIG problem -- after we SEIZE-d a task we should
seize
it's threads, but we should do it in a loop -- reading /proc/pid/task and
seizing
them again and again, until the contents of this dir stops changing (not done
now).

Besides, after we seized a task and all its threads we cannot scan it's children
list once -- task can get reparented to init and any task's child can call clone
with CLONE_PARENT flag thus repopulating the children list of the already seized
task (not done also)

This patch is ugly, yes, but splitting it doesn't help to review it much, sorry
:(

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-01 19:49:28 +04:00
Cyrill Gorcunov
63b88720a3 Revert "ctrools: Rewrite task/threads stopping engine"
This reverts commit 6da51eee3f6cd7aca9dd88275844e73fb78b767b.

It breaks transition/file_read test case
2012-02-01 19:27:28 +04:00
Pavel Emelyanov
6da51eee3f ctrools: Rewrite task/threads stopping engine
Stopping tasks with STOP and proceeding with SEIZE is actually excessive --
the SEIZE if enough. Moreover, just killing a task with STOP is also racy,
since task should be given some time to come to sleep before its proc
can be parsed.

Rewrite all this code to SEIZE task and all its threads from the very beginning.

With this we can distinguish stopped task state and migrate it properly (not
supported now, need to implement).

This thing however has one BIG problem -- after we SEIZE-d a task we should seize
it's threads, but we should do it in a loop -- reading /proc/pid/task and seizing
them again and again, until the contents of this dir stops changing (not done now).

Besides, after we seized a task and all its threads we cannot scan it's children
list once -- task can get reparented to init and any task's child can call clone
with CLONE_PARENT flag thus repopulating the children list of the already seized
task (not done also)

This patch is ugly, yes, but splitting it doesn't help to review it much, sorry :(

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-01 17:29:13 +04:00
Stanislav Kinsbursky
c826057a9c IPC: dump namespace itself
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-31 22:32:22 +04:00
Stanislav Kinsbursky
0213d3ec64 namespaces: parametrized namespace option introduced
v2: strlen() check removed from parse_ns_string()

Now '-n' option must be followed by namespaces tags, separated by commas.
Currently, only "uts" namespace is supported.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-31 22:32:22 +04:00
Pavel Emelyanov
beb158a66e cr: Task creds support
Dumping is simple. All but secbits can be read from proc, secbits
are got from parasite.

Restoring is a bit tricky -- when you change anything on kernel
cred's struct it performs sophisticated checks and can change
some more stuff than requested, so the creds restoration procedure
is carefully commented step-by-step.

Another thing to mention is that creds are restored after everything
else, i.e. right before performing final threads sync and sigreturns.
This is done to avoid potential problems with insufficient caps for
restoring other stuff (e.g. CAP_DAC_OVERRIDE or zero euid is most
likely required for opening any image file and the notorious control
/proc/sys/kernel/ns_last_pid, which in turn is performed till the
very last moment).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-30 13:00:50 +04:00