mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 13:58:34 +00:00

712 Commits

Author SHA1 Message Date
Jamie Liu
288cf51741 restore: mutate tgt_addr in map_private_vma
prepare_mappings() uses the return value of map_private_vma() for the
size of the mapped vma. Unfortunately the return value of
map_private_vma() is an int, resulting in breakage when the size exceeds
31 bits. Change map_private_vma() to return only an error code, and
mutate addr in-place.

Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-02 15:52:49 +04:00
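
To illustrate the problem fixed in 288cf51741, the sketch below shows why a mapping size cannot safely travel through an int return value, and what an interface that returns only an error code and updates the address in place might look like. The names size_fits_in_int and map_region_sketch are hypothetical and are not CRIU functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: can this size be carried in an int return value? */
static int size_fits_in_int(uint64_t size)
{
	return size <= (uint64_t)INT_MAX;
}

/*
 * Hypothetical stand-in for the reworked interface: the return value is
 * only 0 or -1, and the target address is advanced in place.
 */
static int map_region_sketch(uint64_t *tgt_addr, uint64_t size)
{
	assert(tgt_addr != NULL);

	if (size == 0 || size > UINT64_MAX - *tgt_addr)
		return -1;	/* nothing to map, or the address would wrap */

	*tgt_addr += size;	/* the caller sees the new position directly */
	return 0;
}
```

A 3 GiB mapping does not fit in an int, but it passes through map_region_sketch() unchanged because the size is never squeezed into the return value.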
Andrey Vagin
a7fb6a1f41 restore: call post-restore scripts before network-unlock
post-restore script can fail, so it can't be called after network-unlock.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-01 11:21:36 +04:00
Deyan Doychev
69a6bf4439 criu: Add exec-cmd option (v3)
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.

Waiting for the restored processes to finish is responsibility of this command.

All service FDs are closed before we call execvp(). Standard output and error of
the command are redirected to the log file when we are restoring through the RPC
service.

This option will be used when restoring Linux Containers (LXC) and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.

Two directions were researched in order to integrate CRIU and LXC:

1. We tell CRIU that after restoring the container it should execve()
   lxc, properly explaining to it that there's a new container hanging
   around.

2. We make LXC set itself as child subreaper, then fork() criu and ask
   it to detach (-d) from the restored container afterwards. Being a
   subreaper, LXC should get the container's init into its child list
   after that.

The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.

Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-25 01:20:02 +04:00
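
A minimal sketch of the final step described in 69a6bf4439, assuming the command arrives as a NULL-terminated argv array. The function name exec_cmd_sketch is hypothetical; the closing of service FDs and the log redirection mentioned in the message are not shown.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical sketch: replace the current process with the supervising
 * command. On success this never returns, so the command takes over the
 * caller's place as parent of whatever children the caller had.
 */
static int exec_cmd_sketch(char *const argv[])
{
	if (argv == NULL || argv[0] == NULL)
		return -1;

	execvp(argv[0], argv);
	return -1;	/* reached only if execvp() failed; errno is set */
}
```

Because execvp() keeps the PID, existing children remain children of the new program, which is why waiting for them becomes that program's responsibility.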
Tikhomirov Pavel
670d1ce856 v2 page-read: rework open_page_read to use in shmem restore
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-18 11:48:58 +04:00
Cyrill Gorcunov
1153f225ff image: Add O_OPT when trying to open optional image files
Over time some files have become obsolete and might be missing
from the checkpoint image set, but to keep backward compatibility we
still try to open them, which might print an error like

 | Unable to open 'path-to-file'

and leave the reader wondering why criu prints an error but continues working.

To eliminate this problem the O_OPT flag was introduced in
commit 16b5692061; it suppresses error message printing
if the flag is set.

Now start using O_OPT in the following functions

 - open_irmap_cache: irmap cache is relatively new optional feature

 - prepare_rlimits, open_signal_image, restore_file_locks,
   prepare_fd_pid, prepare_mm_pid, collect_image: all these
   helpers are trying to open image files which can be missing.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-17 14:21:21 +04:00
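
The idea behind 1153f225ff can be sketched as follows: a private flag bit marks an image as optional, is masked off before open(), and silences the error message only when the file is simply absent. SKETCH_O_OPT and open_image_sketch are hypothetical names, and the bit value is an assumption, not CRIU's actual O_OPT definition.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bit for this sketch; not CRIU's actual O_OPT value. */
#define SKETCH_O_OPT 0x40000000

static int open_image_sketch(const char *path, int flags)
{
	int optional = (flags & SKETCH_O_OPT) != 0;
	int fd = open(path, flags & ~SKETCH_O_OPT);

	/* Stay quiet only when an optional image is simply absent. */
	if (fd < 0 && !(optional && errno == ENOENT))
		fprintf(stderr, "Unable to open %s: %s\n", path, strerror(errno));

	return fd;
}
```

The caller still receives a negative descriptor for a missing optional image, so it can fall back to defaults without any misleading output in the log.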
Cyrill Gorcunov
b478b2fb2f rlimit: Restore rlimits from Core data
To preserve backward compatibility, try to read
data from the old image if the Core entry doesn't
have rlimits bound.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-14 15:44:42 +04:00
Pavel Emelyanov
391e4bd7b9 page-read: Sanitize opening routines
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-28 15:19:19 +04:00
Tikhomirov Pavel
f0a6d32cd4 v2 deduplication: add auto-dedup on restore
If the --auto-dedup option is set on restore, then as soon as a page is
restored it is punched out of the image.

For this the image is opened in O_RDWR mode.

Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-28 14:11:49 +04:00
Pavel Emelyanov
dc7abdfb92 vma: Don't lookup file_desc for vma twice
We do it first -- on collect, second -- on restore. The
2nd lookup is excessive, we can put fd pointer on vm_area
at lookup and reuse one later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-07 13:51:29 +04:00
Pavel Emelyanov
fd41201975 restore: Parse /proc/self/maps for self mappings
On restore we only need to know the current task mappings' start and end
to find where to put the restorer blob. And since the smaps file in
/proc/pid is up to 3 times slower than the maps one, it makes
perfect sense to parse just the latter.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-07 13:32:21 +04:00
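
A minimal sketch of what fd41201975 describes, assuming a Linux /proc: only the start and end of each mapping are read from /proc/self/maps, and the rest of each line is ignored. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct map_range_sketch {
	unsigned long start;
	unsigned long end;
};

/* Returns the number of ranges stored in out, or -1 if the file is unreadable. */
static int read_self_maps_sketch(struct map_range_sketch *out, int max)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	int n = 0, at_line_start = 1;

	if (f == NULL)
		return -1;

	while (n < max && fgets(line, sizeof(line), f)) {
		/* Parse only chunks that begin a line; skip the tail of long lines. */
		if (at_line_start &&
		    sscanf(line, "%lx-%lx", &out[n].start, &out[n].end) == 2)
			n++;
		at_line_start = strchr(line, '\n') != NULL;
	}

	fclose(f);
	return n;
}
```

Each maps line begins with a `start-end` pair in hexadecimal, so a single sscanf() per line is enough to find the gaps where a blob could be placed.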
Pavel Emelyanov
18a5c90c3b collect: Add comment describing collect order
With packed reg-files we have a complex fd - file - vma - remap interaction. I
think this should be reflected in the code comment.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-05 16:18:29 +04:00
Pavel Emelyanov
74c3cc1996 rst: Introduce post-restore action
Useful to test restore time -- just abort restore with this
action and that's it.

Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 20:54:51 +04:00
Pavel Emelyanov
d8071ffd1a stats: Fix restore pages stats
We erroneously report nr_compared as total number of restored pages.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 14:03:10 +04:00
Pavel Emelyanov
72e462ad67 mm: Read mmentry early
We'll merge mm and vma images, so mm should be read in the
same place where vmas are.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:04 +04:00
Pavel Emelyanov
eb1ae0a025 vma: Turn embedded VmaEntry on vma_area into pointer
On restore we will read all VmaEntries in one big MmEntry object,
so to avoid copying them all into vma_areas, reference them by pointer.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:01 +04:00
Pavel Emelyanov
446fdd7200 rst: Collect VmaEntries only once on restore
Right now we do it two times -- on shmem prepare and
on the restore itself. Make collection only once as
we do for fdinfo-s -- root task reads all stuff in and
populates tasks' rst_info with it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 23:35:03 +04:00
Pavel Emelyanov
0786f831d7 mem: Move shmem preparation routine and rename
We'll collect VmaEntries early before fork.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 23:34:12 +04:00
Pavel Emelyanov
c8d5f1a215 vma: Use vma_area_is helper where appropriate
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 17:22:03 +04:00
Pavel Emelyanov
068b4f3c9b rst: Error code got hidden by successful core read
The ret is overwritten by core read sub-routine. Need
to reset it to -1 to keep failing in case of e.g. last
pid sysctl write.

Reported-by: Neal Becker <ndbecker2@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-06 01:08:06 +04:00
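
The bug pattern fixed in 068b4f3c9b can be sketched like this: a shared `ret` variable is overwritten with 0 by a successful sub-step, so a later failure that jumps to the exit label would be reported as success unless `ret` is reset first. All function names here are hypothetical stand-ins.

```c
/* Hypothetical stand-ins for the two steps named in the commit message. */
static int read_core_sketch(void)
{
	return 0;	/* pretend the core image was read successfully */
}

static int write_last_pid_sketch(int should_fail)
{
	return should_fail ? -1 : 0;
}

static int restore_one_task_sketch(int sysctl_fails)
{
	int ret;

	ret = read_core_sketch();
	if (ret < 0)
		goto out;

	ret = -1;	/* reset: the successful read above left ret == 0 */
	if (write_last_pid_sketch(sysctl_fails) < 0)
		goto out;

	ret = 0;
out:
	return ret;
}
```

Without the `ret = -1` line, a failing write would leave `ret` at 0 and the caller would never learn about the error.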
Pavel Emelyanov
70bb57e20a restore: Run setup-ns scripts before restoring them
We should call external scripts when namespaces are created,
but before we try to fill them with data from images.

This is done so e.g. to make it possible to push external net
links to netns.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-26 22:39:12 +04:00
Andrey Vagin
6bbdec26f3 files: add ability to set callbacks for files (v7)
Nothing interesting here. If a file can't be dumped by criu,
plugins are called. If one of plugins knows how to dump the file,
the file entry is marked as need_callback. On restore if we see
this mark, we execute plugins for restoring the file.

v2: Callbacks are called for all files, which are not supported by CRIU.
v3: Call plugins for a file instead of file descriptor. A few file
descriptors can be associated with one file.
v4: A file descriptor is opened in a callback. It's required for
    restoring anon vmas.
v5: Add a separate type for unsupported files
v6: define FD_TYPES__UNSUPP
v7: s/unsupp/ext (external)

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-20 16:07:38 +04:00
Andrey Vagin
d7cf271ed4 crtools: preload libraries (v2)
Libraries (plugins) are going to be used for dumping and restoring
external dependencies (e.g. dbus, systemd journal sockets, character
devices, etc.)

A plugin can have the cr_plugin_init() and cr_plugin_fini() functions for
initialization and deinitialization.

criu-plugin.h contains all things, which can be used in plugins.

v2: rename lib to plugin
v3: add a default value for a plugin path.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-19 21:48:33 +04:00
Tikhomirov Pavel
5ec15bf25c page-read: add proper processing of return value
When an error occurs while reading an image file, the restore must stop.

Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-17 19:45:22 +04:00
Pavel Emelyanov
ae98ef6ae0 mount: Factor out mount tree build for NEWNS and non-NS cases
We anyway build the tree, in the NS case -- few calls later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-12 16:19:48 +04:00
Cyrill Gorcunov
cf1ce5f817 mount: Build mount tree on dump restore early, if needed
For paths resolution we will need mount tree to be parsed
and built, but it's not that simple -- the current code
implies that once parsed the tree must not be re-parsed
again, so we pass the @parse argument from the caller: if the task
we're restoring does not use a mount namespace, we should parse
the mount tree early, otherwise defer this action until the mount
tree is read from the image.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-11 16:05:19 +04:00
Andrey Vagin
57d25e7cea mm: fix expression to determine which vma-s can be shared
Currently only addresses are compared. It's obviously not enough.

* First of all the parent vma must be private.
* Both vma-s must have the identical set of MAP_GROWSDOWN and MAP_FILES
  flags.
* Both vma-s must be linked to the same file.

https://bugzilla.openvz.org/show_bug.cgi?id=2824
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-22 18:19:23 +04:00
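
The three conditions listed in 57d25e7cea, together with the original address comparison, can be sketched as a single predicate. The flag bits, the struct layout and the function name below are hypothetical and chosen only for this example; they do not mirror CRIU's VmaEntry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits and layout for this sketch only. */
#define SK_PRIVATE	0x1u
#define SK_GROWSDOWN	0x2u
#define SK_FILE		0x4u

struct vma_sketch {
	uint64_t start, end;
	uint32_t flags;
	uint64_t file_id;	/* meaningful only when SK_FILE is set */
};

static bool vma_can_inherit(const struct vma_sketch *parent,
			    const struct vma_sketch *child)
{
	const uint32_t must_match = SK_GROWSDOWN | SK_FILE;

	if (!(parent->flags & SK_PRIVATE))
		return false;	/* the parent vma must be private */
	if (parent->start != child->start || parent->end != child->end)
		return false;	/* the address check that existed before */
	if ((parent->flags ^ child->flags) & must_match)
		return false;	/* the relevant flags must be identical */
	if ((child->flags & SK_FILE) && parent->file_id != child->file_id)
		return false;	/* both must be linked to the same file */
	return true;
}
```

The XOR of the two flag words has a bit set wherever the vma-s differ, so masking it with the flags of interest tests both in one expression.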
Andrey Vagin
7659c995f5 vm: don't overwrite vma->shmid for private mappings
shmid contains a file id for file mappings. It's required to determine
which VMA-s are cowed. The parent maps a VMA and saves the premapped
address. Then the child tries to determine which VMA-s must be inherited
from the parent; for that it compares addresses, flags and file id.

We don't want to transfer vma_area-s into the restorer, so when a VMA entry is
copied into restorer memory, the premapped address is saved in shmid.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-22 18:19:08 +04:00
Pavel Emelyanov
c3b9448cf7 pidfile: Don't push opts.pidfile as write_pidfile arg
opts are criu-wide available.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-20 14:26:41 +04:00
Ruslan Kuprieiev
dc80d6f125 log: get rid of LOG_DIR_FD_OFF and opening cwd in log_init()
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-15 21:38:41 +04:00
Pavel Emelyanov
a2917ffc87 rst: Initialize task restore args after rst-mem remap
The plan is to move the args on rst-mem itself. Thus we need
to make sure it's initialized _after_ remap into restorer space.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-08 17:57:04 +04:00
Pavel Emelyanov
3e895cc2da rst: Rename task_restore_core_args
Remove the _core_ from it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-08 17:32:07 +04:00
Pavel Emelyanov
ec7e483e8b restorer: Make task- and thread- args go one-by-one
Currently we have task args page-aligned, then there go
thread args. This is waste of memory. Let's put them in
one row.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-08 17:29:55 +04:00
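
One way to read ec7e483e8b is as a change in how the argument area is sized. The sketch below compares the two layouts under that assumption; the function names and the example sizes are hypothetical and not taken from CRIU.

```c
#include <stddef.h>

static size_t round_up_sketch(size_t value, size_t align)
{
	return (value + align - 1) / align * align;
}

/* Old layout as described: task args padded to a page, then thread args. */
static size_t args_len_page_aligned(size_t task_sz, size_t thread_sz,
				    size_t nr_threads, size_t page_sz)
{
	return round_up_sketch(task_sz, page_sz) + thread_sz * nr_threads;
}

/* New layout: thread args follow the task args directly. */
static size_t args_len_packed(size_t task_sz, size_t thread_sz,
			      size_t nr_threads)
{
	return task_sz + thread_sz * nr_threads;
}
```

With a 200-byte task block, three 100-byte thread blocks and 4096-byte pages, the padded layout needs 4396 bytes while the packed one needs 500.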
Andrey Vagin
4850fd94a8 crtools: move cr_options in a separate header
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 18:17:52 +04:00
Andrey Vagin
0d1dfc2e08 crtools: move all stuff about vma together
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:43:49 +04:00
Andrey Vagin
824403a009 crtools: create new header for servicefd stuff (v2)
v2: generate patch relative to the official git.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:43:02 +04:00
Pavel Emelyanov
ba0527d42b restore: Remove actually unused variable from sigreturn_restore
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-04 00:16:34 +04:00
Pavel Emelyanov
32b4a26c6b restore: Comment why we need copy data on task restore args
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-04 00:16:14 +04:00
Pavel Emelyanov
ebd76f4bec restore: Move sigpending out of sigreturn_restore
The sigreturn_restore is the place where we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-04 00:15:39 +04:00
Pavel Emelyanov
91f797f66f restore: Move posix-timers out of sigreturn_restore
The sigreturn_restore is the place where we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-04 00:15:15 +04:00
Pavel Emelyanov
ed88f2df66 restore: Move rlimits out of sigreturn_restore
The sigreturn_restore is the place where we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-04 00:14:42 +04:00
Pavel Emelyanov
4f675313cc rst-malloc: Switch to private allocations once forked
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-03 17:40:15 +04:00
Pavel Emelyanov
ca0b51bc00 rst: Close logdir earlier
Just a code sanitation.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-03 17:37:10 +04:00
Pavel Emelyanov
3e235f715e restore: Get self maps after allocating necessary memory
We're filling some rst-mem data _after_ we get the self maps
list. This is a bug, since the restorer vma gets forcibly mapped
into a place we pick from the self-vmas-list.

Move the self-vmas-list getting after we allocate the memory
we need.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-03 17:23:31 +04:00
Ruslan Kuprieiev
95a961b739 log: don't kill task, if unable to write pidfile
write_pidfile() was taken out of cr-restore.c, where it was supposed to kill
the child if unable to create the pidfile. Now we're also using it in
service/page-server, where the kill is redundant. So let's move kill() out of
write_pidfile() and back into cr-restore.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-03 12:51:13 +04:00
Pavel Emelyanov
45f39e0415 rst: Make shmem restore to use rst-malloc
This actually fixes a bug -- memory for shmem info was
not allocated dynamically, thus we were limited in the
amount of shmems to be restored.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-02 01:06:31 +04:00
Pavel Emelyanov
7ed35d8a87 rst: Switch private rst-mem allocator on generic rst-malloc
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-02 01:05:13 +04:00
Cyrill Gorcunov
0707df7745 restore: Don't unmap vdso proxy on final cleanup
If we need to use the vdso proxy, the memory area
which holds restorer also has a place for vdso proxy
code itself, so on final pass we should not unmap it,
otherwise any call to vdso function will cause sigsegv.

IOW, the memory before final "cleanup" pass of restorer
might look as

    +-----------+---------+     +-------------+------+
    | bootstrap | rt-vdso | ... | application | vdso |
    +-----------+---------+     +-------------+------+
                       ^                         |
                       `-------------------------+

and we have redirected "vdso" code to jump to "rt-vdso".
After final pass the memory must look as

                +---------+     +-------------+------+
                | rt-vdso | ... | application | vdso |
                +---------+     +-------------+------+
                       ^                         |
                       `-------------------------+

I noticed this problem during container migration
testing, the container itself was suspended on 2.6.32
OpenVZ kernel with apache running inside, and any attempt
to connect to apache caused apache to crash.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-10-30 16:30:57 +04:00
Pavel Emelyanov
389b814632 restore: Make optional images check right after open
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-10-29 23:00:24 +04:00
Pavel Emelyanov
bb5476cb63 restore: Put tgt vmas in rst-mem, not special purpose memory
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-10-29 23:00:06 +04:00
Pavel Emelyanov
d7db85e9dc restore: Iterate tgt vmas by number, not by terminating point
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-10-29 22:59:55 +04:00