The sigreturn_restore is the place when we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The sigreturn_restore is the place when we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The sigreturn_restore is the place when we prepare the restorer
layout and jump to it. Reading and decoding images should be done
earlier. The new rst-malloc engine allows for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We're filling some rst-mem data _after_ we get the self maps
list. This is a bug, since the restorer vma get forcedly mapped
into a place we get out of self-vmas-list.
Move the self-vmas-list getting after we allocate the memory
we need.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
write_pidfile() was taken out from cr-restore.c, where it was supposed to kill
child if unable to create pidfile. Now we're also using it at service/page-server where kill is redundant. So lets take out kill() from write_pidfile() back to cr-restore.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This actually fixes a bug -- memory for shmem info was
not allocated dynamically, thus we were limited in the
amount of shmems to be restored.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if we need to use vdso proxy the memory area
which holds restorer also has a place for vdso proxy
code itself, so on final pass we should not unmap it,
otherwise any call to vdso function will cause sigsegv.
IOW, the memory before final "cleanup" pass of restorer
might look as
+-----------+---------+ +-------------+------+
| bootstrap | rt-vdso | ... | application | vdso |
+-----------+---------+ +-------------+------+
^ |
`-------------------------+
and we have redirected "vdso" code to jump to "rt-vdso".
After final pass the memory must look as
+---------+ +-------------+------+
| rt-vdso | ... | application | vdso |
+---------+ +-------------+------+
^ |
`-------------------------+
I noticed this problem during container migration
testing, the container itself was suspended on 2.6.32
OpenVZ kernel with apache running inside, and any attempt
to connect to apache caused apache to crash.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's not enough to check only uids on dump and restore -- we need to
check e-ids and s-ids now (and caps in the future).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: remove redundant functions and variables.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we catch processes on the exit point from sigreturn.
If a task must be restored in the stopped state, we can send SIGSTOP
before detaching from it.
v2: add more descriptive comment about skipping SIGSTOP in ptrace.c
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can restore task's pgid which is not equal to its pid,
only when the respective group leader is alive. To make
restore reliable we wait for all group leaders to restore
using separate restore stage.
It's better to optimize this -- each task has a pointer on
its group leader and waits for one to become such.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First of all, this should be done strictly after we've stopped accessing
files by their paths, even absolute. This place is right before going
into restorer.
And the second thing is that we want to re-use the open_fd_by_id engine,
since it handles various tricky cases of open-file-by-path. And since
there's no such thing as fchroot(int fd), we emulate it using the
/proc/self/fd/ links.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We restore chroot before doing this, so if we might need to
open one, we may have no access to the /proc/... paths.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All process VMA-s are in "premmaped area". All restorer stuff are in
bootstap "area", so we have two areas.
So we don't need to unmap extra VMA-s one by one. We can call munmap
three times for the region before the first area, for the hole between
areas and for the region after the second area.
The old scheme didn't work, because the list of VMA-s can be changed
after collecting. It can be due to memory allocations by libc or due to
increased stack.
v2: improve readability at the expense of beautiness
v3: print return code of munmap in error messages
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds a function for removing the restorer blob. This function
never returns and the process must be trapped on the exit from the
munmap syscall.
v2: * release parasite_ctl sturcture and use the new interface of
parasite_prep_ctl
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A task is stopped here for unmaping restorer blob and restoring a state.
The method is the same as for parasite. CRIU attaches to processes via
ptrace and start to trace all syscalls.
v2: don't use a software breakpoint
v3: stop all thread on the exit from sigreturn
v4: attach to each thread
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For the root task the clone syscall returns the pid in criu's pidns,
but for other processes the clone syscall returns PID in the restored
namespace.
The /proc/self link contains the PID value of the current process, so if
we want to determing the PID in a criu's pidns, we should use criu's
/proc.
v2: readlink() does not append a null byte to buf, so we must do that
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When service/page server becomes daemon, we may need to know it's pid.
Signed-off-by: Ruslan Kuprieiev<kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Use decode_pointer() to convert a virtual address into a native pointer.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In /proc/pid/maps grow-down VMA-s are shown without guard pages, but
sometime these "guard" pages can contain usefull data. For example if
a real guard page has been remmaped by another VMA. Let's call such
pages as fake guard pages.
So when a grow-down VMA is mmaped on restore, it should be mapped with
one more guard page to restore content of the fake guard page.
https://bugzilla.openvz.org/show_bug.cgi?id=2715
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These contain linkage between number, data type and routines
for pb messages we write/read to/from image files. Most of them
have simple number-type-routines mapping, so introduce a generating
script for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing is pretty straightforward -- on netns creation
populate it with tun-s, after this collect tun files, open
and attach them with regular fd-s engine.
One tricky thing -- when populating namespace with tun links
make them all persistent and drop this flag (if required)
later, when the first alive opened appears.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If processes are restored without pidns, criu knows pidtheir -s from images,
but part of those task may have not yet forked, and thus the pids can not
exist or (!) be used by other processes.
To address that we abort stages RESTORE_NS and FORKING without killing tasks,
but with task_entries->start futex by writing STATE_FAIL into it and making
the tasks to check that. Since during RESTORE_NS and FORKING stages tasks can
only block on the mentioned futes, we can safely do it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing was introduced by 01f8f8f4 to help not mixing
per-thread error messages in log files. Now messages are
not mixed by other means, so this thing is useless.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currentl it waits for previous stage to complete and starts the
next one. Now it starts the next one and waits for it to complete.
The latter way fits better into both -- the code and the head.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>