Those will be inherited from parent. Before this patch this list was
always empty, but it will change soon.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
read_vmas will be called bedore forking children to restore
copy-on-write memory.
v2: don't open an image one more time
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This makes code more readable, saves one ptr on stack and
lets us jump into restorer code using tags.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To unify the code for both thread leader and regular threads
we move blocked signals for thread leader into threads argument
area and use restore_thread_common() helper.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise we might get nil dereference in sigreturn restore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Three parts.
Proc: open of map_files' link doesn't work on sockets. We fstatat
it and check that it's a socket (it will be packet), then save
the socket inode on vma_area.
Dump: we resolve socket inode to socket id and save it on vma.
We use id, not inode, since on restore we'll have to mmap some
opened file, not just abstract socket with inode.
Restore: when reading vma-s we just need to find out on what fd
the respective packet socket is opened (i.e. -- no map-and-close
sockets supported by now) and dup() it to let restorer mmap it
back.
All this make it possible to c/r the tcpdump tool!
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The length of bootstrap in the print is old and wrong, we need to fix
it and unify the length variable.
Signed-off-by: Huang Qiang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Writing to last_pid sysctl is CAP_SYS_ADMIN potected. Thus restoring
creds before it won't work in all the cases.
Fix this by making all threads restore creds themselves, and the
thread group leader -- after all of them.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
The size of vector depends on the kernel config
so use the real size of a vector dumped. Otherwise
we might fail on restore.
Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The former is actually the parameters of thread group leader, so
it's natural to have them on-task.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
With some historical changes, the second page-aligned for
restore_thread_vma_len is reduplicate. So remove it.
Signed-off-by: Huang Qiang <h.huangqiang@huawei.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Many image files opened by open_image_ro weren't closed before return, fix
them all in this patch.
Signed-off-by: Huang Qiang <h.huangqiang@huawei.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be required to support socket bound to devices.
When restoring w/o net namespaces -- collect existing devices.
When restoring with them -- collect what is received from image.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need it for slave ttys migration. They serve for one purpose --
to clone self stdio descriptor and use it with tty layer, which will
be addressed in further patches.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No magic here, just fetch info using getpriority and sched_getxxx calls.
Good news is that the mentioned syscalls take pid as argument and do work
with it, i.e. -- no need in parasite help here.
Restore is splitted into prep -- copy sched bits from image on restorer
args -- and the restore itself. It's done to avoid restoring tasks info
with IDLE priority ;) To make restorer not-fail sched bits are validated
for sanity on prep stage.
Minimal sanity test is also there.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if here no task found which would restore
controlling terminal -- exit with error instead of
continue with just error message.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Dumping them is performed via parasite, since calling the getgroups
is the only way of getting the complete list. Currently the nr of
groups to dump is limited explicitly with the size of shared memory
between crtools and parasite. This is MUCH more that we have seen
on real apps so far.
Restoring is done early, before restorer blob not to carry the undefined
array of grpous in there. This is OK, since groups do not affect us at
that point and are not affected by subsequent creds restore.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise there is a race between files with same names:
link(name -> ghost) link(name->ghost)
open(name)
unlink(name)
open(name) -> ENOENT
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Restore must not fail after unlocking connections.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Early we moved prepare_shared() to a root task,
because several preparation actions should be executed
in a target namespace set (e.g.: ghost files).
TCP sockets are a subset of init sockets,
they should be unlocked before resume. It's convient to do
from crtools.
An image can't be read more than one time, because we want to
send it via network.
For this two reasons prepare_shared is spitted in two parts,
one for crtools and one for a root task.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
TCP_REPAIR should be droppet when a network is unlocked.
A network should be unlocked at the last moment, because
after this moment restore must not failed, otherwise a state of
a tcp connection can be changed and a state of one side in our image
will be invalid.
v2: use xremalloc instead of mmap and remmap
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We need to do two non-trivial things with ttys -- interconnect
slaves to masters (or to each other) and setup ctl-tty restoring
task.
Now this is done in subsequently depending on each other steps:
1. collect ttys
2. interconnect slaves and mark ctl-tty tasks
3. collect fake fds for tty-ctl tasks
4. setup orphaned slaves
We can relax this logic in two ways:
1. don't split marking ctl-tty tasks and then creating fds for them
do it in one step at the end
2. don't interconnect slaves with masters and orphaned slaves in
two steps -- do it in one place after fds are collected
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if there is no master peer associated
with a slave peer we have two cases
- the master peer was closed before slave
- we just have no master peer at all, but
only slave one
This patch addresses only first case -- we open
fake master and hook slaves on it, then close it
immediately.
The second case will be addressed later.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead -- mmap it once in root task and then mremap it later.
No mremap of original restorer can be done, since in that case
the restorer vma would be tied to crtools binary which in turn
will make set-exe-file prctl to fail with EBUSY.
Note -- after mremap the original vmas list becomes non relevant,
but it's OK. Only new holes appear inside which is OK for munmap.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This will avoid exec bit on restorer args and will make
it possible for shared restorer eventually.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There was a strange thing -- task args size is aligned, but when
threads args ptr is get this alignment was lost. Fix this and make
all the bufs page-aligned.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The idea behind is pretty simple -- once we find
that there is a controlling terminal present we
do call ioctl on appropriate /dev/pts/N.
This is done in a bit unusuall manner. When we
find that there is a controling terminal present
we do create an additional FdinfoEntry for it
with object id taken from existing master peer.
The file engine stack this new FdinfoEntry on
fd_info_head head list. Thus we will have at
least two entries on this list. One for real
Fdinfo associated with master peer and one for
our new generated Fdfinfo entry, it depends on
pid which one become a file master.
Finally we do use post_open_fd hook in our
tty code which allows us to open controlling
terminal and yield proper ioctl on it.
v2:
- restore control terminals via service fd,
still need to speedup service fd retrieval.
v3:
- use prepare_ctl_tty() helper to generate
control terminal fdinfo entry
v4:
- use post_open_fd
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Usually the PTYs represent a pair of links -- master peer and slave
peer. Master peer must be opened before slave. Internally, when kernel
creates master peer it also generates a slave interface in a form of
/dev/pts/N, where N is that named pty "index". Master/slave connection
unambiguously identified by this index.
Still, one master can carry multiple slaves -- for example a user opens
one master via /dev/ptmx and appropriate /dev/pts/N in sequence.
The result will be the following
master
`- slave 1
`- slave 2
both slave will have same master index but different file descriptors.
Still inside the kernel pty parameters are same for both slaves. Thus
only one slave parameters should be restored, there is no need to carry
all parameters for every slave peer we've found.
Not yet addressed problems:
- At moment of restore the master peer might be already closed for
any reason so to resolve such problem we need to open a fake master
peer with proper index and hook a slave on it, then we close
master peer.
- Need to figure out how to deal with ttys which have some
data in buffers not yet flushed, at moment this data will
be simply lost during c/r
- Need to restore control terminals
- Need to fetch tty flags such as exclusive/packet-mode,
this can't be done without kernel patching
[ avagin@:
- ideas on contol terminals restore
- overall code redesign and simplification
]
v4:
- drop redundant pid from dump_chrdev
- make sure optional fown is passed on regular ptys
- add a comments about zeroifying termios
- get rid of redundant empty line in files.c
v5 (by avagin@):
- complete rework of tty image format, now we have
two files -- tty.img and tty-info.img. The idea
behind to reduce data being stored.
v6 (by xemul@):
- packet mode should be set to true in image,
until properly fetched from the kernel
- verify image data on retrieval
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A ghost file is used for restoring descriptors of an unlinked file.
It is created, opened and deleted.
Currently ghost files are collected in root task and then removed
by crtools when everybody is restored. This scheme doesn't work,
ghost_file_list is not shared, plus tasks may live in different mount
namespace.
It was broken by the following commit:
bd4e5d2f restore: prepare shared objects after initializing namespaces
We can't just move clear_ghost_files(), because we need to wait, until
all processes have not opened a ghost file.
We can add one more global barrier or move clear_ghost_files() in
a restore code bellow an existent barrier.
Here is a better sollution, a gost file is deleted by the last user.
v2: Use the type atomic_t and fix a commit message.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
/proc/PID/maps can contains not up to date information about a stack vma.
A kernel marks a VMA as stack, if thread_struct->usersp is in it,
but usersp is updated, when a process calls a syscall.
This problem is occured, when we try to dump/restore a process in a loop.
When a restorer resumes a process, a restorer vma will be marked as stack.
A thread stack should not be marked as stack, because its vma is mapped
w/o MAP_GROWSDOWN.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
An analogue to current macro the kernel has.
The name 'me' is somehow confusing.
No func. changes.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Will need it to honor current log level in restorer.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can set a directory for log and image files.
crtools sets it as a current directory and then creates all files in it.
It works before we don't decide to change a mount name space.
I suggest to open a log dir and create files for help openat.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Why do we need a new stack? We already have one and it can be used.
We need to step a bit for executing a glibc clone()
v2: Don't lose a page from a child's stack
v3: Remove the defined constant STACK_SIZE
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we fork a pidns init there's no need in specifying its pid,
as it will be autogenerated to 1. Clean the code not to mess with
the last_pid sysctl at all in that case, rather than just omitting
the write into it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we restore a pid namespace the root task will get some unknown pid
in the original (i.e. -- the ns crtools a launched from) one. To find
this pid out one can use this option -- it will make the pid obtained by
the new init to be written into a pid file.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>