This thing on pstre_item was created to carry task-specific
information across the "restore" code-flow.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Because we need to lookup for ghost files from
inotify system where we only have device/inode
as a key, we save dev/ino in ghost image entry.
Note we use in-kernel format for device to be
consistent with inotify and mount related
code base.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The remap_put will be needed to defer unlinking of ghost files
if they are referred from inotify system.
The lookup_remap is needed to figure out if the watch
target the inotify has present in ghost files list.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A value of signo is in [1, SIGMAX].
Currenly signals are enumirated from 1 to SIGMAX, but SIGMAX
is not included. This patch fixes this mestake.
v2: * save backward compatibility
* set a correct value of SIGMAX = 64. It can not be in a
separate patch, because a format is changed again.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When finishing a stage we have to report this (decrement the
number of tasks in stage) and wait while stage switch. Write
a helper that does both.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This function prints an error message only once.
[xemul: change ({}) to do {} while (0)]
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This reverts commit ef3771d566dacb8ee9fe71b744d56f08674fe3db.
With new SO_BINDTODEVICE getting API it's not required.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A parent process can change a few pages after forking a child and
all this pages should not be avaliable from the child.
Each vma has a bitmap of existent pages. Parent's and child's bitmaps
can be compared and all pages which are not present in a child bitmap
are dropped.
v2: don't check page_bitmap on NULL
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All memory content are restored before entering in restorer.c.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
With this patch vma->shmid contains file id before mapping a region,
then it contains of a temporary address.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All private vmas are placed in a premmapped region and
they are sorted by start addresses, so they should be shifted apart.
Here is one more problem with overlapped temporary and target regions,
mremap could not remap such cases directly, so for such cases a vma is
remapped away and then remapped on a target place.
v2: fix accoding with Pavel's comments
v3: add a huge comment with pictures
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Private vma-s are mapped before forking children, then they are
remapped to corrected places in restorer.c.
In restorer all unneeded vma-s are unmaped. VMA-s from premmapped
regions should not be unmaped.
v2: replace guard pages on arithmetic in restorer
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A private vma will be mapped in a temporary place and an address will
be saved in vma->shmid, because nobody was used this field for private
vma-s before. For easy reading vma_premmapped_addr will be used for
getting the temporary address.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now this macros returns true only for anon | priv vmas.
It will be expanded to file | priv in a follow patch.
v2: check flags together.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This code is borrowed from the Linux kernel sources.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This makes code more readable, saves one ptr on stack and
lets us jump into restorer code using tags.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To unify the code for both thread leader and regular threads
we move blocked signals for thread leader into threads argument
area and use restore_thread_common() helper.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I've moved dump_thread helper a bit lower in file
since I've to call for find_thread_state helper.
After all this groups all thread related functions
in one slab.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch starts using parasite_init_threads_seized and
parasite_fini_threads_seized helpers to save per-thread
data in parasite and remove it on cure procedure.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Just to be consistent with types we're using.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need to extend the structure to keep not only tid/tid_addr
but blocked signals as well, thus rename it to more generic
parasite_dump_thread.
The command PARASITE_CMD_DUMP_TID_ADDR renamed to
PARASITE_CMD_DUMP_THREAD for the same sake.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will use them from crtools code to save and restore blocked
signals mask of threads.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These helpers will be needed to save a blocked signals
mask for dumpee threads.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The per-thread information requires own space in parasite data.
In particular we will keep the blocked signals bound to thread
pids.
For this sake the caller need to provide the parasite how many
threads will be used to calculate space.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Three parts.
Proc: open of map_files' link doesn't work on sockets. We fstatat
it and check that it's a socket (it will be packet), then save
the socket inode on vma_area.
Dump: we resolve socket inode to socket id and save it on vma.
We use id, not inode, since on restore we'll have to mmap some
opened file, not just abstract socket with inode.
Restore: when reading vma-s we just need to find out on what fd
the respective packet socket is opened (i.e. -- no map-and-close
sockets supported by now) and dup() it to let restorer mmap it
back.
All this make it possible to c/r the tcpdump tool!
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Sizes of send and recv buffers are set to maximum to restore queues,
after that sizes of buffers are restored.
O_NONBLOCK is set to a socket to prevent blocking during restore.
#2411
v2: do the same for unix sockets.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Writing to last_pid sysctl is CAP_SYS_ADMIN potected. Thus restoring
creds before it won't work in all the cases.
Fix this by making all threads restore creds themselves, and the
thread group leader -- after all of them.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
The size of vector depends on the kernel config
so use the real size of a vector dumped. Otherwise
we might fail on restore.
Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The former is actually the parameters of thread group leader, so
it's natural to have them on-task.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The dumping of tty peers is somewhat tricky. And it became more
complex once we allowed to migrate/inherit sessions.
It's being found (in screen c/r) that we've a problem in looking
up of session leaders while dumping tty.
Let me explain with more details. Here is an example of screen
session
PID GID SID
20567 20567 20567 SCREEN
20568 20568 20568 pts/3 \_ /bin/bash
The screen opens master peer (ptmx) and then provides
bash the slave peer (pts/3) where bash sets up a session
leader on it.
Thus we get interesting scenario -- our pstree construction
is done in lazy fashion, we run parasite code to fetch sid/pgid
of a process tree item only when we're really dumping the task.
Thus when we start dumping ptmx peer (which belongs to SCREEN)
we've not yet constructed the process tree item for children
(ie /bin/bash) and the lookup function in tty code (which walks
over all process items in a tree) simply fails to find sid of
child, because we've not yet dumped it.
Thus, to resolve such situation we verify tty sids at late stage
of dumping.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On my testing machine these defs are not present
so define them where needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Get the info from kernel diag message (it should always be there)
and restore the shutdown at the very end.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be required to support socket bound to devices.
When restoring w/o net namespaces -- collect existing devices.
When restoring with them -- collect what is received from image.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This option will tells the tool to procceed dumping
even if a root task is not a session leader.
This implies that this option will allow to "migrate"
one external tty connection. Say a person may dump
"top" application in one bash shell and restore it
in another shell session.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need it for slave ttys migration. They serve for one purpose --
to clone self stdio descriptor and use it with tty layer, which will
be addressed in further patches.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It should fit the sa_sigaction declaration, otherwise compiler
complains about uncasted assignments.
| restorer.c: In function тАШ__export_restore_taskтАЩ:
| restorer.c:318:20: error: assignment from incompatible pointer type [-Werror]
So just use siginfo_t here from the system signal.h header.
Signed-off-by: Andrew Grigorev <andrew@ei-grad.ru>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No magic here, just fetch info using getpriority and sched_getxxx calls.
Good news is that the mentioned syscalls take pid as argument and do work
with it, i.e. -- no need in parasite help here.
Restore is splitted into prep -- copy sched bits from image on restorer
args -- and the restore itself. It's done to avoid restoring tasks info
with IDLE priority ;) To make restorer not-fail sched bits are validated
for sanity on prep stage.
Minimal sanity test is also there.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>