It's not enough to check only uids on dump and restore -- we need to
check e-ids and s-ids now (and caps in the future).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: remove redundant functions and variables.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we catch processes on the exit point from sigreturn.
If a task must be restored in the stopped state, we can send SIGSTOP
before detaching from it.
v2: add more descriptive comment about skipping SIGSTOP in ptrace.c
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can restore task's pgid which is not equal to its pid,
only when the respective group leader is alive. To make
restore reliable we wait for all group leaders to restore
using separate restore stage.
It's better to optimize this -- each task has a pointer on
its group leader and waits for one to become such.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First of all, this should be done strictly after we've stopped accessing
files by their paths, even absolute. This place is right before going
into restorer.
And the second thing is that we want to re-use the open_fd_by_id engine,
since it handles various tricky cases of open-file-by-path. And since
there's no such thing as fchroot(int fd), we emulate it using the
/proc/self/fd/ links.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We restore chroot before doing this, so if we might need to
open one, we may have no access to the /proc/... paths.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All process VMA-s are in "premmaped area". All restorer stuff are in
bootstap "area", so we have two areas.
So we don't need to unmap extra VMA-s one by one. We can call munmap
three times for the region before the first area, for the hole between
areas and for the region after the second area.
The old scheme didn't work, because the list of VMA-s can be changed
after collecting. It can be due to memory allocations by libc or due to
increased stack.
v2: improve readability at the expense of beautiness
v3: print return code of munmap in error messages
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds a function for removing the restorer blob. This function
never returns and the process must be trapped on the exit from the
munmap syscall.
v2: * release parasite_ctl sturcture and use the new interface of
parasite_prep_ctl
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A task is stopped here for unmaping restorer blob and restoring a state.
The method is the same as for parasite. CRIU attaches to processes via
ptrace and start to trace all syscalls.
v2: don't use a software breakpoint
v3: stop all thread on the exit from sigreturn
v4: attach to each thread
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For the root task the clone syscall returns the pid in criu's pidns,
but for other processes the clone syscall returns PID in the restored
namespace.
The /proc/self link contains the PID value of the current process, so if
we want to determing the PID in a criu's pidns, we should use criu's
/proc.
v2: readlink() does not append a null byte to buf, so we must do that
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When service/page server becomes daemon, we may need to know it's pid.
Signed-off-by: Ruslan Kuprieiev<kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Use decode_pointer() to convert a virtual address into a native pointer.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In /proc/pid/maps grow-down VMA-s are shown without guard pages, but
sometime these "guard" pages can contain usefull data. For example if
a real guard page has been remmaped by another VMA. Let's call such
pages as fake guard pages.
So when a grow-down VMA is mmaped on restore, it should be mapped with
one more guard page to restore content of the fake guard page.
https://bugzilla.openvz.org/show_bug.cgi?id=2715
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These contain linkage between number, data type and routines
for pb messages we write/read to/from image files. Most of them
have simple number-type-routines mapping, so introduce a generating
script for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing is pretty straightforward -- on netns creation
populate it with tun-s, after this collect tun files, open
and attach them with regular fd-s engine.
One tricky thing -- when populating namespace with tun links
make them all persistent and drop this flag (if required)
later, when the first alive opened appears.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If processes are restored without pidns, criu knows pidtheir -s from images,
but part of those task may have not yet forked, and thus the pids can not
exist or (!) be used by other processes.
To address that we abort stages RESTORE_NS and FORKING without killing tasks,
but with task_entries->start futex by writing STATE_FAIL into it and making
the tasks to check that. Since during RESTORE_NS and FORKING stages tasks can
only block on the mentioned futes, we can safely do it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing was introduced by 01f8f8f4 to help not mixing
per-thread error messages in log files. Now messages are
not mixed by other means, so this thing is useless.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currentl it waits for previous stage to complete and starts the
next one. Now it starts the next one and waits for it to complete.
The latter way fits better into both -- the code and the head.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we compare pages' contents with memcmp to check which
of them can remain shared. Report this info in restore stats.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Restore stats are difficult -- we have to collect them from several
tasks and thus existing plain variables would not work. We'll need
shared memory with stats, so prepre for allocating one.
Other than this -- put call to write_stats() where appropriate for
restore.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have already tried to prevent generating core files for zombies:
commit 6da52216ce
Author: Andrey Vagin <avagin@openvz.org>
Date: Fri Jul 12 18:14:23 2013 +0400
restore: set the zero limit for RLIMIT_CORE
But it doesn't work if a core file is sent into a pipe.
This functionality is used by the abrt daemon for example.
This patch uses more direct way, it unsets the dumpable flag with help
of PR_SET_DUMPABLE.
v2: remove the previous hack
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise we lost 1:1 mapping between names being
dumped and what user sees after restore.
| 1455 pts/0 T 0:00 \_ ./crtools restore -t 1448
| 1448 ? Ss 0:00 | \_ ./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
| 1449 ? Z 0:00 | \_ [zombie00] <defunct>
| 1450 ? Z 0:00 | \_ [zombie00] <defunct>
| 1451 ? Z 0:00 | \_ [zombie00] <defunct>
| 1452 ? Z 0:00 | \_ [zombie00] <defunct>
https://bugzilla.openvz.org/show_bug.cgi?id=2635
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Number of rlimits may vary depending on system version
criu is compiled against. So we use rst-allocator to
carry all limits read from file.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>