Here is nothing interecting. If a file can't be dumped by criu,
plugins are called. If one of plugins knows how to dump the file,
the file entry is marked as need_callback. On restore if we see
this mark, we execute plugins for restoring the file.
v2: Callbacks are called for all files, which are not supported by CRIU.
v3: Call plugins for a file instead of file descriptor. A few file
descriptors can be associated with one file.
v4: A file descriptor is opened in a callback. It's required for
restoring anon vmas.
v5: Add a separate type for unsupported files
v6: define FD_TYPES__UNSUPP
v7: s/unsupp/ext (external)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Libraries (plugins) is going to be used for dumping and restoring
external dependencies (e.g. dbus, systemd journal sockets, charecter
devices, etc)
A plugin can have the cr_plugin_init() and cr_plugin_fini functions for
initialization and deinialization.
criu-plugin.h contains all things, which can be used in plugins.
v2: rename lib to plugin
v3: add a default value for a plugin path.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
when get an error reading image file need stop restore
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For paths resolution we will need mount tree to be parsed
and built, but it's not that simple -- the current code
implies that once parsed the tree must not be re-parsed
again, so we pass @parse argument from a caller: if a task
we're restoring do not use mount namespace, we should parse
mount tree early, otherwise defer this action until mount
tree is read from the image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently only addresses are compared. It's obviously not enough.
* First of all the parent vma must be private.
* Both vma-s must have the identical set of MAP_GROWSDOWN and MAP_FILES
flags.
* Both vma-s must be linked to the same file.
https://bugzilla.openvz.org/show_bug.cgi?id=2824
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
shmid contains a file id for file mappings. It's required to determine,
which VMA-s are cowed. The parent maps a VMA and saves premmaped
address. Then child trys to determing, which VMA-s must be inhereted
from parent, for that it compares addresses, flags and file id.
We don't want to transfer vma_area-s in restorer, so when a VMA entry is
copied in restorer memory, the premmaped address is save in shmid.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The plan is to move the args on rst-mem itself. Thus we need
to make sure it's initialized _after_ remap into restorer space.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we have task args page-aligned, then there go
thread args. This is waste of memory. Let's put them in
one row.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The sigreturn_restore is the place when we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The sigreturn_restore is the place when we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The sigreturn_restore is the place when we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We're filling some rst-mem data _after_ we get the self maps
list. This is a bug, since the restorer vma get forcedly mapped
into a place we get out of self-vmas-list.
Move the self-vmas-list getting after we allocate the memory
we need.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
write_pidfile() was taken out from cr-restore.c, where it was supposed to kill
child if unable to create pidfile. Now we're also using it at service/page-server where kill is redundant. So lets take out kill() from write_pidfile() back to cr-restore.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This actually fixes a bug -- memory for shmem info was
not allocated dynamically, thus we were limited in the
amount of shmems to be restored.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if we need to use vdso proxy the memory area
which holds restorer also has a place for vdso proxy
code itself, so on final pass we should not unmap it,
otherwise any call to vdso function will cause sigsegv.
IOW, the memory before final "cleanup" pass of restorer
might look as
+-----------+---------+ +-------------+------+
| bootstrap | rt-vdso | ... | application | vdso |
+-----------+---------+ +-------------+------+
^ |
`-------------------------+
and we have redirected "vdso" code to jump to "rt-vdso".
After final pass the memory must look as
+---------+ +-------------+------+
| rt-vdso | ... | application | vdso |
+---------+ +-------------+------+
^ |
`-------------------------+
I noticed this problem during container migration
testing, the container itself was suspended on 2.6.32
OpenVZ kernel with apache running inside, and any attempt
to connect to apache caused apache to crash.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's not enough to check only uids on dump and restore -- we need to
check e-ids and s-ids now (and caps in the future).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: remove redundant functions and variables.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we catch processes on the exit point from sigreturn.
If a task must be restored in the stopped state, we can send SIGSTOP
before detaching from it.
v2: add more descriptive comment about skipping SIGSTOP in ptrace.c
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can restore task's pgid which is not equal to its pid,
only when the respective group leader is alive. To make
restore reliable we wait for all group leaders to restore
using separate restore stage.
It's better to optimize this -- each task has a pointer on
its group leader and waits for one to become such.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First of all, this should be done strictly after we've stopped accessing
files by their paths, even absolute. This place is right before going
into restorer.
And the second thing is that we want to re-use the open_fd_by_id engine,
since it handles various tricky cases of open-file-by-path. And since
there's no such thing as fchroot(int fd), we emulate it using the
/proc/self/fd/ links.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We restore chroot before doing this, so if we might need to
open one, we may have no access to the /proc/... paths.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All process VMA-s are in "premmaped area". All restorer stuff are in
bootstap "area", so we have two areas.
So we don't need to unmap extra VMA-s one by one. We can call munmap
three times for the region before the first area, for the hole between
areas and for the region after the second area.
The old scheme didn't work, because the list of VMA-s can be changed
after collecting. It can be due to memory allocations by libc or due to
increased stack.
v2: improve readability at the expense of beautiness
v3: print return code of munmap in error messages
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds a function for removing the restorer blob. This function
never returns and the process must be trapped on the exit from the
munmap syscall.
v2: * release parasite_ctl sturcture and use the new interface of
parasite_prep_ctl
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A task is stopped here for unmaping restorer blob and restoring a state.
The method is the same as for parasite. CRIU attaches to processes via
ptrace and start to trace all syscalls.
v2: don't use a software breakpoint
v3: stop all thread on the exit from sigreturn
v4: attach to each thread
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For the root task the clone syscall returns the pid in criu's pidns,
but for other processes the clone syscall returns PID in the restored
namespace.
The /proc/self link contains the PID value of the current process, so if
we want to determing the PID in a criu's pidns, we should use criu's
/proc.
v2: readlink() does not append a null byte to buf, so we must do that
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When service/page server becomes daemon, we may need to know it's pid.
Signed-off-by: Ruslan Kuprieiev<kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Use decode_pointer() to convert a virtual address into a native pointer.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In /proc/pid/maps grow-down VMA-s are shown without guard pages, but
sometime these "guard" pages can contain usefull data. For example if
a real guard page has been remmaped by another VMA. Let's call such
pages as fake guard pages.
So when a grow-down VMA is mmaped on restore, it should be mapped with
one more guard page to restore content of the fake guard page.
https://bugzilla.openvz.org/show_bug.cgi?id=2715
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>