mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 13:58:34 +00:00

634 Commits

Author SHA1 Message Date
Andrey Vagin
c838494034 mem: stop searching a parent vma if we found one
There can be only one parent vma, so the search can stop at the first match.

Fixes: 57d25e7cea12 ("mm: fix expression to determine which vma-s can be shared")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:22 +04:00
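The early exit described in c838494034 can be sketched as below. This is a minimal illustration, not CRIU's code: `struct vma_sketch`, `find_parent_vma()` and the `visited` counter are hypothetical names, and the match condition is reduced to an address comparison.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA descriptor. */
struct vma_sketch {
	unsigned long start;
	unsigned long end;
	struct vma_sketch *next;
};

/*
 * Return the parent VMA covering the same range as `vma`, or NULL.
 * `visited` reports how many list nodes were inspected.
 */
struct vma_sketch *find_parent_vma(struct vma_sketch *parent_list,
				   const struct vma_sketch *vma,
				   int *visited)
{
	struct vma_sketch *p;

	*visited = 0;
	for (p = parent_list; p != NULL; p = p->next) {
		(*visited)++;
		if (p->start == vma->start && p->end == vma->end)
			return p; /* at most one parent can match: stop here */
	}
	return NULL;
}
```

Returning from inside the loop avoids walking the rest of the list after the only possible match has been found.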
Andrey Vagin
b57497ade5 mem: fix typo in determining an address of parent vma
Look at this hunk from 7659c995f58f:
-    paddr = decode_pointer(vma_premmaped_start(&p->vma));
+    paddr = decode_pointer(vma->premmaped_addr);

Obviously we want to use p->premmaped_addr instead of
vma->premmaped_addr.

Fixes: 7659c995f58f ("vm: don't overwrite vma->shmid for private mappings")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:13 +04:00
Andrey Vagin
20003300d8 mount: remove extra calls of mntns_collect_root()
Now mntns_collect_root() should be called each time when we need to get
a root of a specified namespace and we don't need to call it for
initializing the global variable.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:49:43 +04:00
Pavel Emelyanov
79f3e90856 rst: Less arguments to restore_task_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:46 +04:00
Pavel Emelyanov
8550f52017 mnt: Move local mntns collecting on restore into prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:43 +04:00
Andrey Vagin
2f4be997b6 mount: use per-namespace mntinfo_tree (v2)
This patch removes the global mntinfo_tree and collect_mount_info where
it was constructed. The mntinfo list is filled from dump_mnt_ns,
rst_collect_local_mntns, collect_mnt_namespaces and read_mnt_ns_img.

A mountinfo entry contains a reference to the proper ns_id entry, so
we can use mnt_id to look up the proper mount namespace.

v2: remove trash after rebasing.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:19 +04:00
Andrey Vagin
5418938ec3 restore: collect mounts of current mntns
It's required for restoring in the current mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:46 +04:00
Andrey Vagin
de4326a382 mount: return descriptor from mntns_collect_root
We are going to support nested mount namespaces, so files can be opened
from more than one namespace and a root must be collected for each file.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:32 +04:00
Andrey Vagin
36a9f31665 restore: close PROC_FD_OFF before calling sigreturn
mntns_collect_root() uses PROC_FD_OFF, so we need to close it.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:30 +04:00
Andrey Vagin
d2012883ab criu: rename current_ns_mask to root_ns_mask (v2)
Now we support sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.

v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:33 +04:00
Andrey Vagin
3a291e33ff crtools: restore nested mount namespaces (v2)
Known issues:
* currently only namespaces with the same root are supported
* nested namespaces can be dumped and restored only if the root task
  has its own mount namespace.

All nested namespaces are restored in a root namespace in temporary
directories. All mount points are restored in one tree and then they are
divided into namespaces.
The task with the minimal pid in each namespace unshares the mntns and
then does pivot_root into the proper temporary directory. All other
tasks call setns to enter the mount namespace of the task with the
minimal pid.

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:17 +04:00
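The role split described in 3a291e33ff (the task with the minimal pid creates the namespace, every other task joins it) can be sketched as a pure selection step. This is only an illustration under assumptions: `struct task_sketch`, `mntns_leader_pid()` and `task_creates_mntns()` are hypothetical names, and the actual unshare(), pivot_root() and setns() calls are left out because they need privileges.

```c
#include <stddef.h>

/* Hypothetical task record: a pid and the id of its mount namespace. */
struct task_sketch {
	int pid;
	unsigned int mnt_ns_id;
};

/* Smallest pid among the tasks living in `ns_id`, or -1 if there are none. */
int mntns_leader_pid(const struct task_sketch *tasks, size_t n,
		     unsigned int ns_id)
{
	int leader = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tasks[i].mnt_ns_id != ns_id)
			continue;
		if (leader < 0 || tasks[i].pid < leader)
			leader = tasks[i].pid;
	}
	return leader;
}

/*
 * 1 if `t` should create the namespace (unshare + pivot_root),
 * 0 if it should join the leader's namespace with setns().
 */
int task_creates_mntns(const struct task_sketch *tasks, size_t n,
		       const struct task_sketch *t)
{
	return mntns_leader_pid(tasks, n, t->mnt_ns_id) == t->pid;
}
```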
Andrey Vagin
e7e9c2ee6e mounts: create a temporary directory for restoring non-root mntns (v2)
All non-root namespaces will be restored as sub-trees of the root tree.

This patch adds helpers to create a temporary directory and mount tmpfs
in it, then create directories for each non-root mount namespace.

tmpfs is quite useful here to simplify destroying this construction,
we don't need to unmount each namespace separately.

v2: add a comment why MNT_DETACH is not dangerous here
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:12 +04:00
Pavel Emelyanov
e8ac085af8 Revert "crtools: close all desriptors only for the root task"
We have a race. Consider we have 3 tasks, A, B and C. A and B
share fdtable, C -- does not. Then we might be in a situation
when A is restoring memory reading mem images, and B -- forking
the C child. In that case descriptors held by A (for mem restore)
will be inherited by C and will not get closed.

This reverts commit d36e07aabe073993d8ae9695e33f6e45b2eb6a21.
2014-04-21 14:48:05 +04:00
Andrey Vagin
946eadd598 mount: open_mount uses __open_mountpoint instead of own logic
Now we have two functions which do mostly the same, so this patch merges
them.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:03:15 +04:00
Pavel Emelyanov
302591aa05 rlimits: Reshuffle new and legacy restoration code
Do the same as was done with timers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:10 +04:00
Pavel Emelyanov
1d438db66d rlimits: Move entries from top-core into task-core
This appeared after latest 1.2, so it's still possible
to do this move.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:08 +04:00
Pavel Emelyanov
35e560de00 timers: Reshuffle new and legacy restoration code
Make explicit checks and helpers for legacy images.
This should facilitate its removal some day in the
future.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:06 +04:00
Pavel Emelyanov
87da9b83ce posix-timers: Clean restore code flow
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:04 +04:00
Pavel Emelyanov
b54e340945 core: Move posix timers on core entry
This also gives us one image less per task and
allocates more space on the core task entry.

One thing to note -- the amount of posix timers is
not easily accessible at the core entry allocation
time, so the respective array is allocated on demand.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:00:54 +04:00
Pavel Emelyanov
dfd5a62f38 core: Move itimers on core
This lets us have one image less per task, which in turn
reduces live migration time a little bit.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:00:52 +04:00
Andrey Vagin
70d9780ddd restore: call close_old_fds() after mounting /proc
A process can be restored in a new pidns. close_old_fds() opens
the /proc/PID directory. Without this patch we can see errors like this:
(00.333915)      1: Error (util.c:102): Unable to close fd 6: Bad file descriptor

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-11 16:50:16 +04:00
Andrey Vagin
d36e07aabe crtools: close all desriptors only for the root task
For all other tasks only unused service descriptors will be closed.

This change makes it possible to keep file descriptors that may be used for
restoring namespaces. All non-server descriptors must be closed before
restoring files.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-09 15:50:40 +04:00
Jamie Liu
288cf51741 restore: mutate tgt_addr in map_private_vma
prepare_mappings() uses the return value of map_private_vma() for the
size of the mapped vma. Unfortunately the return value of
map_private_vma() is an int, resulting in breakage when the size exceeds
31 bits. Change map_private_vma() to return only an error code, and
mutate addr in-place.

Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-02 15:52:49 +04:00
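The interface change in 288cf51741 can be sketched as follows: the return value carries only an error code, while the 64-bit size travels through a pointer, so sizes above 31 bits survive. `map_region_sketch()` is a hypothetical stand-in, not the real map_private_vma(), and it does no actual mapping.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pretend to map `size` bytes at *tgt_addr and
 * advance *tgt_addr past the region. Returns 0 on success, -1 on error.
 */
int map_region_sketch(uint64_t *tgt_addr, uint64_t size)
{
	if (tgt_addr == NULL || size == 0)
		return -1;     /* the return value is an error code only */
	*tgt_addr += size;     /* the size is applied in place, untruncated */
	return 0;
}
```

With an `int` return value, a 4 GiB region could not be reported back to the caller; here the caller reads the advanced address instead.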
Andrey Vagin
a7fb6a1f41 restore: call post-restore scripts before network-unlock
post-restore script can fail, so it can't be called after network-unlock.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-01 11:21:36 +04:00
Deyan Doychev
69a6bf4439 criu: Add exec-cmd option (v3)
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.

Waiting for the restored processes to finish is the responsibility of this command.

All service FDs are closed before we call execvp(). Standard output and error of
the command are redirected to the log file when we are restoring through the RPC
service.

This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.

Two directions were researched in order to integrate CRIU and LXC:

1. We tell CRIU that after restoring the container it should execve()
   lxc, properly explaining to it that there's a new container hanging
   around.

2. We make LXC set itself as child subreaper, then fork() criu and ask
   it to detach (-d) from the restored container afterwards. Being a subreaper,
   it should get the container's init into its child list after that.

The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.

Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-25 01:20:02 +04:00
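The core of the --exec-cmd behaviour in 69a6bf4439 is a single execvp() call that replaces the restoring process, which is why the command ends up as the parent of the restored tree. A minimal sketch follows; `exec_cmd_sketch()` is a hypothetical name, and closing the service descriptors and redirecting output are only noted in a comment.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Replace the current process image with argv[0]. On success this
 * never returns; on failure it reports the error and returns -1.
 */
int exec_cmd_sketch(char *const argv[])
{
	if (argv == NULL || argv[0] == NULL)
		return -1;
	/* A real implementation would close its service descriptors first. */
	execvp(argv[0], argv);
	perror("execvp"); /* only reached if execvp() failed */
	return -1;
}
```

Because the process id does not change across execvp(), the children created during restore keep the same parent, now running the supervising command.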
Tikhomirov Pavel
670d1ce856 v2 page-read: rework open_page_read to use in shmem restore
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-18 11:48:58 +04:00
Cyrill Gorcunov
1153f225ff image: Add O_OPT when trying to open optional image files
Over time some files become obsolete and might be missing from
the checkpoint image set, but to keep backward compatibility we
still try to open them, which might print an error like

 | Unable to open 'path-to-file'

and leave a reader confused about why criu prints an error but continues working.

To eliminate this problem the O_OPT flag was introduced in
commit 16b5692061e2; it suppresses the error message
if the flag is set.

Now start using O_OPT in the following functions

 - open_irmap_cache: irmap cache is relatively new optional feature

 - prepare_rlimits, open_signal_image, restore_file_locks,
   prepare_fd_pid, prepare_mm_pid, collect_image: all these
   helpers are trying to open image files which can be missing.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-17 14:21:21 +04:00
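The idea behind the optional-open flag in 1153f225ff can be sketched like this. The flag value `SK_O_OPT`, the function `open_image_sketch()` and the counter `sk_errors_logged` are assumptions made for illustration; they are not CRIU's definitions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define SK_O_OPT 0x40000000 /* hypothetical flag bit, stripped before open() */

int sk_errors_logged; /* counts printed error messages, for illustration */

/* Open an image file; stay silent if an optional file is simply missing. */
int open_image_sketch(const char *path, int flags)
{
	int optional = flags & SK_O_OPT;
	int fd = open(path, flags & ~SK_O_OPT);

	if (fd >= 0)
		return fd;
	if (optional && errno == ENOENT)
		return -1; /* missing optional image: no error message */
	sk_errors_logged++;
	fprintf(stderr, "Unable to open %s: %s\n", path, strerror(errno));
	return -1;
}
```

The caller still sees a failure code for a missing optional image and can fall back to defaults, but the log no longer shows a misleading error.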
Cyrill Gorcunov
b478b2fb2f rlimit: Restore rlimits from Core data
To keep backward compatibility, try to read
data from the old image if the Core entry doesn't
have rlimits bound.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-14 15:44:42 +04:00
Pavel Emelyanov
391e4bd7b9 page-read: Sanitize opening routines
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-28 15:19:19 +04:00
Tikhomirov Pavel
f0a6d32cd4 v2 deduplication: add auto-dedup on restore
If the --auto-dedup option is set on restore, then as soon as a page is
restored it will be punched from the image.

The image is opened in O_RDWR mode.

Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-28 14:11:49 +04:00
Pavel Emelyanov
dc7abdfb92 vma: Don't lookup file_desc for vma twice
We do it first -- on collect, second -- on restore. The
2nd lookup is excessive, we can put fd pointer on vm_area
at lookup and reuse one later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-07 13:51:29 +04:00
Pavel Emelyanov
fd41201975 restore: Parse /proc/self/maps for self mappings
On restore we only need to know the current task mappings' start and end
to find where to put the restorer blob. And since the smaps file in
/proc/pid is up to 3 times slower than the maps one, it makes
perfect sense just to parse the latter one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-07 13:32:21 +04:00
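Extracting only the start and end of a mapping from one line of /proc/self/maps, as fd41201975 describes, needs very little parsing. The sketch below assumes the usual `start-end perms ...` line layout; `parse_maps_line()` is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Parse the leading "start-end" hex range of a maps line.
 * Returns 0 on success, -1 if the line does not start with a range.
 */
int parse_maps_line(const char *line, unsigned long *start,
		    unsigned long *end)
{
	if (sscanf(line, "%lx-%lx", start, end) != 2)
		return -1;
	return 0;
}
```

A caller would read /proc/self/maps line by line with fgets() and feed each line to this helper; the remaining fields (permissions, offset, device, inode, path) are ignored.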
Pavel Emelyanov
18a5c90c3b collect: Add comment describing collect order
With packed reg-files we have a complex fd - file - vma - remap interaction. I
think this should be reflected in the code comment.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-05 16:18:29 +04:00
Pavel Emelyanov
74c3cc1996 rst: Introduce post-restore action
Useful to test restore time -- just abort restore with this
action and that's it.

Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 20:54:51 +04:00
Pavel Emelyanov
d8071ffd1a stats: Fix restore pages stats
We erroneously report nr_compared as the total number of restored pages.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 14:03:10 +04:00
Pavel Emelyanov
72e462ad67 mm: Read mmentry early
We'll merge mm and vma images, so mm should be read in the
same place where vmas are.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:04 +04:00
Pavel Emelyanov
eb1ae0a025 vma: Turn embeded VmaEntry on vma_area into pointer
On restore we will read all VmaEntries in one big MmEntry object,
so to avoid copying them all into vma_areas, make them pointers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:01 +04:00
Pavel Emelyanov
446fdd7200 rst: Collect VmaEntries only once on restore
Right now we do it two times -- on shmem prepare and
on the restore itself. Make collection only once as
we do for fdinfo-s -- root task reads all stuff in and
populates tasks' rst_info with it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 23:35:03 +04:00
Pavel Emelyanov
0786f831d7 mem: Move shmem preparation routine and rename
We'll collect VmaEntries early before fork.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 23:34:12 +04:00
Pavel Emelyanov
c8d5f1a215 vma: Use vma_area_is helper where appropriate
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 17:22:03 +04:00
Pavel Emelyanov
068b4f3c9b rst: Error code got hidden by successful core read
The ret is overwritten by core read sub-routine. Need
to reset it to -1 to keep failing in case of e.g. last
pid sysctl write.

Reported-by: Neal Becker <ndbecker2@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-06 01:08:06 +04:00
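The bug class fixed in 068b4f3c9b (a successful call overwrites `ret`, so a later `goto` exits with 0) can be sketched as below. All names are hypothetical; the two step functions stand in for the core read and for a later step such as the last-pid sysctl write.

```c
/* Hypothetical steps: each returns 0 on success, -1 on failure. */
int step_ok(void)   { return 0; }
int step_fail(void) { return -1; }

int restore_one_sketch(int (*read_core)(void), int (*later_step)(void))
{
	int ret;

	ret = read_core();
	if (ret < 0)
		goto out;

	/* ret is 0 here; reset it so the error path below reports failure */
	ret = -1;
	if (later_step() < 0)
		goto out;

	ret = 0;
out:
	return ret;
}
```

Without the reset to -1, the failing later step would jump to `out` with `ret` still holding the 0 left by the successful read.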
Pavel Emelyanov
70bb57e20a restore: Run setup-ns scripts before restoring them
We should call external scripts when namespaces are created,
but before we try to fill them with data from images.

This is done so e.g. to make it possible to push external net
links to netns.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-26 22:39:12 +04:00
Andrey Vagin
6bbdec26f3 files: add ability to set callbacks for files (v7)
There is nothing interesting here. If a file can't be dumped by criu,
plugins are called. If one of plugins knows how to dump the file,
the file entry is marked as need_callback. On restore if we see
this mark, we execute plugins for restoring the file.

v2: Callbacks are called for all files, which are not supported by CRIU.
v3: Call plugins for a file instead of file descriptor. A few file
descriptors can be associated with one file.
v4: A file descriptor is opened in a callback. It's required for
    restoring anon vmas.
v5: Add a separate type for unsupported files
v6: define FD_TYPES__UNSUPP
v7: s/unsupp/ext (external)

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-20 16:07:38 +04:00
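The dispatch described in 6bbdec26f3 (ask each plugin in turn until one knows the file) can be sketched as a loop over callbacks. The callback type, the two sample plugins and the convention that -ENOTSUP means "not my file" are assumptions for this illustration, not the definitions from criu-plugin.h.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: 0 = handled, -ENOTSUP = not mine, other < 0 = error. */
typedef int (*dump_file_cb)(int fd);

int plugin_a_sketch(int fd) { (void)fd; return -ENOTSUP; }
int plugin_b_sketch(int fd) { return fd >= 0 ? 0 : -EBADF; }

/*
 * Returns 1 if some plugin handled the file (the entry would then be
 * marked so that plugins run again on restore), 0 if no plugin
 * claimed it, and a negative value on a plugin error.
 */
int dump_via_plugins(dump_file_cb *cbs, size_t n, int fd)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = cbs[i](fd);

		if (ret == 0)
			return 1;
		if (ret != -ENOTSUP)
			return ret;
	}
	return 0;
}
```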
Andrey Vagin
d7cf271ed4 crtools: preload libraries (v2)
Libraries (plugins) are going to be used for dumping and restoring
external dependencies (e.g. dbus, systemd journal sockets, character
devices, etc.)

A plugin can have the cr_plugin_init() and cr_plugin_fini() functions for
initialization and deinitialization.

criu-plugin.h contains all things, which can be used in plugins.

v2: rename lib to plugin
v3: add a default value for a plugin path.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-19 21:48:33 +04:00
Tikhomirov Pavel
5ec15bf25c page-read: add proper processing of return value
When an error occurs while reading an image file, the restore must stop.

Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-17 19:45:22 +04:00
Pavel Emelyanov
ae98ef6ae0 mount: Factor out mount tree build for NEWNS and non-NS cases
We anyway build the tree, in the NS case -- few calls later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-12 16:19:48 +04:00
Cyrill Gorcunov
cf1ce5f817 mount: Build mount tree on dump restore early, if needed
For paths resolution we will need mount tree to be parsed
and built, but it's not that simple -- the current code
implies that once parsed the tree must not be re-parsed
again, so we pass @parse argument from a caller: if a task
we're restoring does not use a mount namespace, we should parse
mount tree early, otherwise defer this action until mount
tree is read from the image.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-11 16:05:19 +04:00
Andrey Vagin
57d25e7cea mm: fix expression to determine which vma-s can be shared
Currently only addresses are compared. It's obviously not enough.

* First of all the parent vma must be private.
* Both vma-s must have the identical set of MAP_GROWSDOWN and MAP_FILES
  flags.
* Both vma-s must be linked to the same file.

https://bugzilla.openvz.org/show_bug.cgi?id=2824
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-22 18:19:23 +04:00
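The three conditions listed in 57d25e7cea, added on top of the address comparison, can be written as one predicate. This is a sketch under assumptions: the struct, the `SK_VMA_*` bits and `vma_can_inherit()` are hypothetical and do not reuse the kernel's MAP_* values.

```c
/* Hypothetical flag bits, defined locally for the illustration. */
#define SK_VMA_PRIVATE   0x1u
#define SK_VMA_GROWSDOWN 0x2u
#define SK_VMA_FILE      0x4u

struct vma_info_sketch {
	unsigned long start, end;
	unsigned int flags;
	unsigned long file_id;
};

/* 1 if the child's vma can be inherited from the parent's one, else 0. */
int vma_can_inherit(const struct vma_info_sketch *parent,
		    const struct vma_info_sketch *child)
{
	const unsigned int cmp = SK_VMA_GROWSDOWN | SK_VMA_FILE;

	if (parent->start != child->start || parent->end != child->end)
		return 0; /* addresses must match */
	if (!(parent->flags & SK_VMA_PRIVATE))
		return 0; /* the parent vma must be private */
	if ((parent->flags ^ child->flags) & cmp)
		return 0; /* the compared flags must be identical */
	return parent->file_id == child->file_id; /* same file */
}
```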
Andrey Vagin
7659c995f5 vm: don't overwrite vma->shmid for private mappings
shmid contains a file id for file mappings. It's required to determine
which VMA-s are cowed. The parent maps a VMA and saves the premapped
address. Then the child tries to determine which VMA-s must be inherited
from the parent; for that it compares addresses, flags and file id.

We don't want to transfer vma_area-s into the restorer, so when a VMA entry is
copied into restorer memory, the premapped address is saved in shmid.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-22 18:19:08 +04:00
Pavel Emelyanov
c3b9448cf7 pidfile: Don't push opts.pidfile as write_pidfile arg
opts are criu-wide available.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-20 14:26:41 +04:00