2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 05:48:05 +00:00

148 Commits

Author SHA1 Message Date
Cyrill Gorcunov
ab82c2de98 Revert "ipc: Drop u32[2] from image, simply use u64 all the time"
This reverts commit 4f83d028ff3062d23357f62583f22381805c6bda.

It breaks IPC test-case, need to investigate.
2012-02-01 19:27:39 +04:00
Cyrill Gorcunov
63b88720a3 Revert "ctrools: Rewrite task/threads stopping engine"
This reverts commit 6da51eee3f6cd7aca9dd88275844e73fb78b767b.

It breaks transition/file_read test case
2012-02-01 19:27:28 +04:00
Pavel Emelyanov
6da51eee3f ctrools: Rewrite task/threads stopping engine
Stopping tasks with STOP and proceeding with SEIZE is actually excessive --
the SEIZE if enough. Moreover, just killing a task with STOP is also racy,
since task should be given some time to come to sleep before its proc
can be parsed.

Rewrite all this code to SEIZE task and all its threads from the very beginning.

With this we can distinguish stopped task state and migrate it properly (not
supported now, need to implement).

This thing however has one BIG problem -- after we SEIZE-d a task we should seize
it's threads, but we should do it in a loop -- reading /proc/pid/task and seizing
them again and again, until the contents of this dir stops changing (not done now).

Besides, after we seized a task and all its threads we cannot scan it's children
list once -- task can get reparented to init and any task's child can call clone
with CLONE_PARENT flag thus repopulating the children list of the already seized
task (not done also)

This patch is ugly, yes, but splitting it doesn't help to review it much, sorry :(

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-02-01 17:29:13 +04:00
Cyrill Gorcunov
4f83d028ff ipc: Drop u32[2] from image, simply use u64 all the time
This eliminate

 | ipc_ns.c:287:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]

and makes code simplier.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-02-01 17:23:44 +04:00
Stanislav Kinsbursky
c826057a9c IPC: dump namespace itself
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-31 22:32:22 +04:00
Pavel Emelyanov
beb158a66e cr: Task creds support
Dumping is simple. All but secbits can be read from proc, secbits
are got from parasite.

Restoring is a bit tricky -- when you change anything on kernel
cred's struct it performs sophisticated checks and can change
some more stuff than requested, so the creds restoration procedure
is carefully commented step-by-step.

Another thing to mention is that creds are restored after everything
else, i.e. right before performing final threads sync and sigreturns.
This is done to avoid potential problems with insufficient caps for
restoring other stuff (e.g. CAP_DAC_OVERRIDE or zero euid is most
likely required for opening any image file and the notorious control
/proc/sys/kernel/ns_last_pid, which in turn is performed till the
very last moment).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-30 13:00:50 +04:00
Cyrill Gorcunov
29bda9aae5 sockets: Restore in-flight unix stream sockets
It's done in two steps

 - On checkpoint we find which icons are present
   over all sockets and setup peer number to
   appropriate listening socket

 - On restore we collect listening sockets and once
   we find in-flight connection we search for appropriate
   listening socket name and use it to call connect() then

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-01-27 23:21:06 +04:00
Pavel Emelyanov
16c58dbd11 magic: Fix PIPEFS_MAGIC constant
This one is actually an internal kernel magic number for pipefs filesystem
and shouldn't be changed.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-01-26 20:42:45 +04:00
Pavel Emelyanov
60dee71484 magic: Change magic numbers
Existing ones are boring. Let's switch them into geographical coordinates
of various Russian towns in NNNNEEEE form.

4 digits for a coordinate give us up to 2km of inaccuracy, which is more
than enough to find a town. We cannot use longitude further than 99.99,
i.e. we won't cover the Far East region, but that's OK -- there's more than
enough good candidates even in the European part of the country only.

Feel free to extend.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-26 19:49:27 +04:00
Pavel Emelyanov
98f4c2e4de ns: Support UTS namespace
Only two fields are modifiable -- hostname and domainname. So
read them on dump and write on restore.

File format is simple --

u32 magic
u32 length of nodename
u8[] nodename string
u32 length of domainname
u8[] domainname string

For OpenVZ we can write the release at the end, but this is later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-26 16:54:22 +04:00
Pavel Emelyanov
b7de83aaf3 crtools: Interval timers support
Timers are dumped from inside parasite code, the format is plain -- just
3 pairs of interval/value one-by-one.

The restoration occurs in two stages -- first prepare the timer values in
restorer (and check for sanity), then setup the timers in the latest stage
before actually calling the sigreturn.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-24 18:41:49 +04:00
Cyrill Gorcunov
415b789cbf image: Add mm_saved_auxv entry
It's needed for auxv dump and restore.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-01-24 18:01:07 +04:00
Cyrill Gorcunov
faf41eb5b2 dump: Dump cmdline and envirion parameters
It implies update to kernel side as well.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2012-01-24 18:01:07 +04:00
Pavel Emelyanov
18aaad6164 img: Extend task image with state and exit code
Introduce 3 states we will have to work with:

* alive for tasks sleeping or running
* dead for zombies
* stopped for stopped tasks. We cannot distinguish tasks in this state now,
  but with freezer cgroup this will become possible

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-23 01:43:36 +04:00
Pavel Emelyanov
dbf3c1a8cd crtools: Reformat core_entry
Keep task arch-independent fields in one struct (will be extended) in the
beginning of the image and make pads be located separately.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-23 01:43:00 +04:00
Stanislav Kinsbursky
f3253a40d2 checkpoint: IPv4 listening sockets dumping support 2012-01-18 12:38:58 +04:00
Pavel Emelyanov
d1b3fd09b3 fdinfo: fd_is_special helper for maps and cwd
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-16 23:51:12 +04:00
Pavel Emelyanov
e2d8aec7f5 files: Named constant for cwd fdinfo
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-16 23:50:50 +04:00
Pavel Emelyanov
0d34b2707c crtools: Remove unused and wrong arrays from pstree image
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-12 22:09:58 +04:00
Andrey Vagin
9129d4e2a1 restore: don't use char in image struct-s
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-11 18:24:35 +04:00
Andrey Vagin
d6a1cd0fbc restore: Learn to work with shared struct file-s
Some process can share one struct file-s, we may find them by "object IDs".
A file descriptor is opened in one process and send to other via unix socket.

The procedure of restoring files contains four stages.
* Collect data about all file's descriptors
  On this stage we find process which will restore a file descriptor and
  create a list of processes, who should get this descriptor.

* Create datagrams unix sockets
  If a file descriptor should be received, a unix socket is created
  instead of it.

* Open file descriptors
  A process with the least pid opens a file and sends this file
  descriptors to all one who wait it.

* Receive file descriptors.

When we were thinking up this algoritm, we wanted to minimize a number
of context switches. A number of context switches is proportional of a
number of processes.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-11 16:01:44 +04:00
Pavel Emelyanov
c5eb61e866 Unix sockets initial support
Currently it can only work with stream sockets, which have no skbs in queues
(listening or established -- both work OK).

The cpt part uses the sock_diag engine that was merged to Dave recently to
collect sockets. Then it dumps sockets by checking the filesystem ID of a
failed-to-open through /proc/pid/fd descriptors (sockets do not allow for
such tricks with opens through proc) against SOCKFS_TYPE.

The rst part is more tricky. Listen sockets are just restored, this is simple.
Connected sockets are restored like this:

1. One end establishes a listening anon socket at the desired descriptor;
2. The other end just creates a socket at the desired descriptor;
3. All sockets, that are to be connect()-ed call connect. Unix sockets
   do not block connect() till the accept() time and thus we continue with...
4. ... all listening sockets call accept() and ... dup2 the new fd into the
   accepting end.

There's a problem with this approach -- socket names are not preserved, but
looking into our OpenVZ implementation I think this is OK for existing apps.

What should be done next is:

1. Need to merge the file IDs patches in our tree and make Andrey to
   support files sharing. This will solve the

	sk = socket();
	fork();

   case. Currently it simply doesn't work :(

2. Need to add support for DGRAM sockets -- I wrote comment how to do it
   in the can_dump_unix_sk()

3. Need to add support for in-flight connections

4. Implement support for UDP sockets (quite simple)

5. Implement support for listening TCP sockets (also not very complex)

6. Implement support for connected TCP scokets (hard one, Tejun's patches are not
   very good for this from my POV)

Cyrill, plz, apply this patch and put the above descriptions onto wiki docs (do we
have the plans page yet?).

Andrey, plz, take care of unix sockets tests in zdtm. Most likely it won't work till
you do the shared files support for sockets.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2011-12-26 23:25:04 +04:00
Cyrill Gorcunov
870803fb5f image: Shrink signal entry structure
Since we use pure syscalls there is no
need to keep intermediate layer for signals.

Moreover mask entry moved at the end of the structure
so we will easily expand it if it'll be ever needed.

Note it breaks backward compatibility with older image
but since it's development stage it should be safe.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
2011-12-08 19:36:26 +04:00
Cyrill Gorcunov
53c611b630 dump,restore: Use rt_sigaction_t for sys_sigaction
Since we operate with syscalls directly we are
to convert signal's structures between image and
kernel formats, without intermediate glibc layer.

Note this involves chaging sa_entry::flags to u64
(since it's long int value in kernel).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2011-12-03 17:24:05 +04:00
Cyrill Gorcunov
9be4034048 image: Introduce struct sa_entry
It's needed to keep singnal handlers on
disk with predefined format.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
2011-12-02 23:06:51 +04:00
Andrey Vagin
25434884e1 Dump and restore sigactions (v2)
A parasite code dumps all sigactions in sigact.pid.

v2: remove hard code for sizeof(sigset_t)

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2011-11-30 22:04:09 +04:00
Cyrill Gorcunov
3d55b9d125 dump: Drop VMA_DUMP_ALL flag
It has been used at very early stage when
no mincore call was implemented. Not needed
anymore -- so drop it out.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-25 18:39:58 +04:00
Pavel Emelyanov
bb3d02c281 crtools: Take MINCORE_ANON pages into account
Reduce the pages-xxx.img file size significantly (from 2.1M to ~100K for simple counter test)
by not dumping private file pages, that have not yet changed from its file prototype.

If you'll have problems with it, just let me know and comment the definition of PAGE_ANON not
to block your work.

This uses the implemented earlier flag from mincore.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-25 18:31:18 +04:00
Pavel Emelyanov
47b7404d73 crtools: Don't save vma's inode info in image
This one isn't used on restore process, since the file mapped is
stored in the fdinfo part of the images.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-23 14:03:24 +04:00
Pavel Emelyanov
fb44c9d82b crtools: Don't hold pid on vma image
It's pointless. All vmas are stored in the per-pid image file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-23 14:03:14 +04:00
Cyrill Gorcunov
0fd17a08cb Bring some order in usage of VMA entries helpers
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-15 17:12:29 +04:00
Cyrill Gorcunov
bb15450d98 image: Drop tls_array from the image
We use registers set anyway

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-15 14:57:39 +04:00
Cyrill Gorcunov
35781a8c6d util: Drop redundant vma_area->vma.status assignment
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-15 13:37:17 +04:00
Cyrill Gorcunov
417fe5d8e1 image: Drop redundant VMA_FORCE_WRITE
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-15 11:57:24 +04:00
Cyrill Gorcunov
2c0e5db7eb image: Drop redundant offsetof
Already in compiler.h

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-15 11:54:01 +04:00
Cyrill Gorcunov
0a26593a3b dump, restore: Add blocked signals mask
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-09 00:29:41 +04:00
Cyrill Gorcunov
8a8850d146 dump: Dump TLS via sys_arch_prctl
As such -- no need for kernel patch.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-11-07 16:29:36 +04:00
Cyrill Gorcunov
c32845ef60 dump: Shrink struct core_entry twice
No need to keep it that big. Note from
this patch if we ever deside to use kernel
elf approach -- the image structures are
to be updated in kernel as well.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-25 14:59:35 +04:00
Cyrill Gorcunov
af647ce009 dump: Dump threads params as well
We only need registers at the moment

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-23 12:43:52 +04:00
Cyrill Gorcunov
4389c021fc dump, kernel: Add some mm structure members into the dump
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-12 18:05:07 +04:00
Cyrill Gorcunov
ce65f2f718 dump, kernel: Add start/end_code data
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-12 16:02:36 +04:00
Cyrill Gorcunov
f7e6e63b44 kernel, dump: Obtain brk value
Also re-make image to be 2 pages in size
which should be enough for basic params we
need to restore tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-12 09:40:02 +04:00
Cyrill Gorcunov
ec9496c147 image: Use CKPT_ARCH_SIZE
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-11 10:10:07 +04:00
Cyrill Gorcunov
4b7a318322 dump: Prepare for new core_entry layout
core_entry layout is arch dependant.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-11 01:32:39 +04:00
Cyrill Gorcunov
99466eb328 dump: Add dumping a tasks's flags
We need it to figure out if FPU was used
so that we need to restore context later.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-10 17:05:12 +04:00
Cyrill Gorcunov
8f0af4f8a6 Restore task's command line as well
Note binary format for core file is changed.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-10-01 13:24:34 +04:00
Cyrill Gorcunov
30f002d21c Add comments on VMA area status flags
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-09-26 01:26:11 +04:00
Cyrill Gorcunov
523de23624 Initial commit
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2011-09-23 12:00:45 +04:00