- make control block to keep all information
needed to run injected syscall and parasite
blobs
- add ptrace_swap_area helper
- handle both parasite engine calls and injected
syscalls by single __parasite_execute function
- drop jerr() usage
- bring back handling of inflight signals from
original program inside parasite code
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
I don't like that we poke the parasite into remote space with 4k calls
to ptrace. Now we have the /proc/pid/map_files/ dir which helps us sharing
a mapping with some other process.
Use this -- map the remote area for parasite locally and put the parasite
blob into it with simple memcpy.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Right now we do syscall_seized for this, but we have the misc dumping command
and the core is (after patch #3) dump after parasite, so we can get brk from
the misc dump, thus avoiding one more switch to parasite.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This actually does two things:
1. The parasite code writes to pages _or_ to pages_shared file himself based
on a hint given from the main program. This avoids shared pages copying
in finalize_core.
2. The private pages are moved out of the core file into a separate one. This
avoids private pages copying in finalize_core.
The goal of this patch is a) to avoid pages copying at all (we still have
one on restore, but fixing this requires Andrey's work on shared memory
dumping) and b) make big blobs with pages be stored in separate files (I
have plans on its format rework and unification).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Only the low 32 bits of orig_ax are meaningful
for obtaining syscall number so we need to test
if sign extended bits are greater than 0.
Reported-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
v2: wrappers names become less obfuscating
This patch:
1) Updates function cr_fdset_open() to be suitable for handling fdset creation
for dump and show stages.
2) Replaces cr_fdset_open() by new wrapper function cr_fdset_dump().
3) Replaces prep_cr_fdset_for_restore() by new wrapper function cr_fdset_show().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
In case of critical error is happened during checkpoint
procedure, the program should exit immediately.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds ability to checkpoint/restore
/proc/pid/exe symlink, so if a process we've just
checkpointed has been say /path/to/exe, then at restore
time we bring this path back.
There some restiction from kernel side: if
existing /proc/pid/exe already mapped more than
once, the kernel will refuse to change the symlink,
so we need to restore it lately when mmaps of crtools
itself already unmapped (ie via late call in
restorer.c).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This commit brings the former "Rewrite task/threads stopping engine"
commit back. Handling it separately is too complex so better try
to handle it in-place.
Note some tests might fault, it's expected.
---
Stopping tasks with STOP and proceeding with SEIZE is actually excessive --
the SEIZE if enough. Moreover, just killing a task with STOP is also racy,
since task should be given some time to come to sleep before its proc
can be parsed.
Rewrite all this code to SEIZE task and all its threads from the very beginning.
With this we can distinguish stopped task state and migrate it properly (not
supported now, need to implement).
This thing however has one BIG problem -- after we SEIZE-d a task we should
seize
it's threads, but we should do it in a loop -- reading /proc/pid/task and
seizing
them again and again, until the contents of this dir stops changing (not done
now).
Besides, after we seized a task and all its threads we cannot scan it's children
list once -- task can get reparented to init and any task's child can call clone
with CLONE_PARENT flag thus repopulating the children list of the already seized
task (not done also)
This patch is ugly, yes, but splitting it doesn't help to review it much, sorry
:(
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Stopping tasks with STOP and proceeding with SEIZE is actually excessive --
the SEIZE if enough. Moreover, just killing a task with STOP is also racy,
since task should be given some time to come to sleep before its proc
can be parsed.
Rewrite all this code to SEIZE task and all its threads from the very beginning.
With this we can distinguish stopped task state and migrate it properly (not
supported now, need to implement).
This thing however has one BIG problem -- after we SEIZE-d a task we should seize
it's threads, but we should do it in a loop -- reading /proc/pid/task and seizing
them again and again, until the contents of this dir stops changing (not done now).
Besides, after we seized a task and all its threads we cannot scan it's children
list once -- task can get reparented to init and any task's child can call clone
with CLONE_PARENT flag thus repopulating the children list of the already seized
task (not done also)
This patch is ugly, yes, but splitting it doesn't help to review it much, sorry :(
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
* kid -> child
* First letter should be uppercase
* Misc typos in messages and comments
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
v2: strlen() check removed from parse_ns_string()
Now '-n' option must be followed by namespaces tags, separated by commas.
Currently, only "uts" namespace is supported.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
There are two cases for cr_fdset_open
- It might be called with already allocated
memory so we should reuse it.
- It might be called with NULL pointing out
that caller expects us to allocate memory.
If an open() error happens somewhere inside cr_fdset_open
it requires two error paths
- Just close all files opened but don't free memory
if it was not allocated by us
- Close all files opened *and* free memory allocated
by us.
In any case we should close all files opened so close_cr_fdset()
helper is splitted into two parts.
Also the caller should be ready for such semantics as well and
do not re-assign pointers obtained but simply test for NULL
on results.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
cr-dump.c: In function ‘dump_one_reg_file’:
cr-dump.c:128:2: error: format ‘%8x’ expects type ‘unsigned int’, but argument 5 has type ‘long unsigned int’
cr-dump.c: In function ‘dump_one_pipe’:
cr-dump.c:223:2: error: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’
cr-dump.c:237:3: error: format ‘%8lx’ expects type ‘long unsigned int’, but argument 2 has type ‘u32’
cr-dump.c:237:3: error: format ‘%8lx’ expects type ‘long unsigned int’, but argument 3 has type ‘u32’
cr-dump.c:237:3: error: format ‘%8lx’ expects type ‘long unsigned int’, but argument 4 has type ‘u32’
cr-dump.c:237:3: error: format ‘%8lx’ expects type ‘long unsigned int’, but argument 5 has type ‘u32’
cr-dump.c:240:3: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘long unsigned int’
cr-dump.c: In function ‘dump_one_fd’:
cr-dump.c:257:3: error: format ‘%d’ expects type ‘int’, but argument 5 has type ‘long unsigned int’
cr-dump.c:262:3: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘long unsigned int’
cr-dump.c:272:4: error: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
cr-dump.c:286:4: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘long unsigned int’
cr-dump.c:295:2: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘long unsigned int’
cr-dump.c: In function ‘dump_task_files’:
cr-dump.c:340:3: error: too few arguments for format
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This is a standard convention to print error message (i.e. strerror(errno))
at the end of line, like this:
Cannot remove file: Permission denied
So pr_perror is fixed to follow this convention (using GNU extension
%m helps a lot here). Unfortunately, due to this we have to make
pr_perror() print a new line character, too, so we had to strip it
from the all pr_perror() invocations.
That (appending a newline) also makes pr_perror() a black sheep
in the herd of pr_* helpers, but what can we do? Worst case scenario
is an extra newline after an error message, not too harmful.
An alternative approach (stripping the newline from the passed format
string and re-adding it) was discussed thoroughly, and it was decided
that such a hack looks a bit too dirty.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Dumping is simple. All but secbits can be read from proc, secbits
are got from parasite.
Restoring is a bit tricky -- when you change anything on kernel
cred's struct it performs sophisticated checks and can change
some more stuff than requested, so the creds restoration procedure
is carefully commented step-by-step.
Another thing to mention is that creds are restored after everything
else, i.e. right before performing final threads sync and sigreturns.
This is done to avoid potential problems with insufficient caps for
restoring other stuff (e.g. CAP_DAC_OVERRIDE or zero euid is most
likely required for opening any image file and the notorious control
/proc/sys/kernel/ns_last_pid, which in turn is performed till the
very last moment).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Add command and basis for dumping minor bits for task
from parasite code. It's supposed to retrieve minor bits
form tasks which cannot be read from /proc.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
New option -n to dump/restore namespaces.
Fork the namespaces dumping task and write a helper for switching a namespace.
Prepare the restorer code for restoring namespaces before root task.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Split the CR_FD_ bits into per-task and global ones and replace
of CR_FD_DESC_NOPSTREE with CR_FD_DESC_TASK, which is explicit
set of per-task bits.
The CR_FD_DESC_NS will appear soon.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Timers are dumped from inside parasite code, the format is plain -- just
3 pairs of interval/value one-by-one.
The restoration occurs in two stages -- first prepare the timer values in
restorer (and check for sanity), then setup the timers in the latest stage
before actually calling the sigreturn.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Dump the core-pid.img file only. On restore select the way of killing
task based on his exit_code -- exit or kill with a signal. Before dying
unblock all the handlers and set SIG_DFL to it (to make the dead signal
other than KILL be deliverable).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Don't dump tasks in any state but T (stopped). Will extend it with Z soon.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Introduce 3 states we will have to work with:
* alive for tasks sleeping or running
* dead for zombies
* stopped for stopped tasks. We cannot distinguish tasks in this state now,
but with freezer cgroup this will become possible
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Move common code for tasks and threads dumping into routine. It will be
used for zombies as well.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Kill all the macros for reading/writing image parts. New API looks like
* write_img_buf/write_img
Write an object into an image. Reports 0 for OK, -1 for error. The _buf
version accepts object size as an argument, the other one uses sizeof()
* read_img_buf/read_img
Reads an object from image. Reports 0 for OK, -1 for error or EOF.
* read_img_buf_eof/read_img
Reads an object from image. Reports 1 for OK, 0 for EOF and -1 for error.
This is not symmetrical with the previous one, but it was done deliberately
to make it possible to write code like
ret = read_img_bug_eof();
if (ret <= 0)
return ret; /* 0 means OK, all is done, -1 means error was met */.
... /* 1 means object was read, can proceed */
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Keep task arch-independent fields in one struct (will be extended) in the
beginning of the image and make pads be located separately.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Rename prep_cr_fdset_for_dump into cr_fdset_open and make it reentable, i.e.
every next enter will open more files in the same fdset. Required for zombies
and makes the code cleaner.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This is to make sure we dump zombies w/o threads.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Move the /proc/pid/stat parsing before anything in dump_one_task (prepare
for zombies dumping) and cleanup the rest of the code accordingly.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
These are checking in proc parse routine anyway.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Currently we read the children tree of running processes.
This is deadly wrong. Stop task once we open it's proc dir
before descending to its kids.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
We can do it since we may not want to dump the missed sockets.
But later, when we will start dumping containers, we'll better
fail here, rather than in the dump stage.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This makes crtools correctly abort when working on wrong kernel.
Otherwise all the open files will have the same (garbage) ID and
the subsequent restore will result in broken app.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
And do not use strcpy, better to stick with strncpy.
Moreover, to be on a safe side make proc internal
buffer big enough even for "(%16s)" format, it's
hardly possible that the kernel ever change stat
format but just to be on a safe side.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
We will later need other fields of this file, so let's parse it right
now in a way, that allows to easily get new fields.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
* @xemul:
crtools: Collect dumping fd parameters into one place
crtools: Toss dump_one_fd args around
crtools: Rename fd to lfd in dump_one_fd
crtools: Sanitize pstree construction
crtools: Remove unused printk_registers and co
crtools: Deduplicate file info showing code
crtools: Merge pstree collecting into showing
crtools: Remove unused and wrong arrays from pstree image
crtools: Remove lseeks after prep_cr_ calls
crtools: Cleanup collect_pstree in cr-show
Conflicts:
cr-dump.c
include/sockets.h
sockets.c
The conflicts are mostly because of commit
995ef5eca3b8d74cf192e41c307f5329d93f9795
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This allows us to get rid of open-coded "/proc/pid/X".
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>