mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 13:28:27 +00:00

315 Commits

Author SHA1 Message Date
Andrey Vagin
772d6853d2 crtools: collect inet sockets to crtools
Earlier we moved prepare_shared() to the root task,
because several preparation actions must be executed
in the target namespace set (e.g. ghost files).

TCP sockets are a subset of inet sockets, and
they should be unlocked before resume. It's convenient to do
this from crtools.
An image can't be read more than once, because we want to
be able to send it over the network.

For these two reasons prepare_shared is split into two parts,
one for crtools and one for the root task.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-17 20:06:06 +04:00
Andrey Vagin
c27ff2baac tcp: unset TCP_REPAIR at the last moment after unlocking network (v2)
TCP_REPAIR should be dropped when the network is unlocked.
The network should be unlocked at the last moment, because
after this moment restore must not fail; otherwise the state of
a TCP connection can change and the state of one side in our image
will become invalid.

v2: use xremalloc instead of mmap and remmap

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-17 20:02:57 +04:00
Pavel Emelyanov
990f80dd0f tty: Sanitize slavery and ctl tty setups
We need to do two non-trivial things with ttys -- interconnect
slaves to masters (or to each other) and set up the task that
restores the ctl-tty.

Currently this is done in several steps, each depending on the
previous one:

1. collect ttys
2. interconnect slaves and mark ctl-tty tasks
3. collect fake fds for tty-ctl tasks
4. set up orphaned slaves

We can relax this logic in two ways:

1. don't split marking ctl-tty tasks and creating fds for them;
   do it in one step at the end
2. don't interconnect slaves with masters and orphaned slaves in
   two steps -- do it in one place after fds are collected

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-14 18:12:59 +04:00
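The single interconnect pass described in 990f80dd0f can be pictured with the C sketch below. It is not CRIU's tty code: struct tty_sketch and interconnect_ttys() are hypothetical, and the sketch only hooks slaves on a master with the same pty index (slave-to-slave links are left out). A slave with no matching master is flagged as an orphan, which is the case the fake-master approach of 4ae20428c0 deals with.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a collected tty (not CRIU's struct). */
struct tty_sketch {
    int index;                 /* pty index N from /dev/pts/N */
    bool is_master;
    struct tty_sketch *link;   /* master this slave is hooked on, or NULL */
    bool orphan;               /* slave without a master: needs a fake one */
};

/* One pass, run after all fds are collected: hook every slave on the
 * master with the same index, otherwise mark it as an orphan. */
static void interconnect_ttys(struct tty_sketch *ttys, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct tty_sketch *s = &ttys[i];

        if (s->is_master)
            continue;

        s->link = NULL;
        for (size_t j = 0; j < n; j++) {
            if (ttys[j].is_master && ttys[j].index == s->index) {
                s->link = &ttys[j];
                break;
            }
        }
        s->orphan = (s->link == NULL);
    }
}
```

Doing both the master lookup and the orphan marking in one loop is what removes the separate "set up orphaned slaves" step from the list above.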
Pavel Emelyanov
ff875dc494 tty: Cleanup tty mutex preparation
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-14 17:58:46 +04:00
Cyrill Gorcunov
4ae20428c0 tty: Restore orphan slavery peers
If there is no master peer associated
with a slave peer, there are two cases:

 - the master peer was closed before the slave
 - there is no master peer at all, only
   the slave one

This patch addresses only the first case -- we open
a fake master, hook the slaves on it, then close it
immediately.

The second case will be addressed later.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-14 17:50:46 +04:00
Pavel Emelyanov
667953c00f restorer: Don't memcpy restorer blob in each task on restore
Instead -- mmap it once in the root task and then mremap it later.
The original restorer can't be mremap-ed, since in that case
the restorer vma would be tied to the crtools binary, which in turn
would make the set-exe-file prctl fail with EBUSY.

Note -- after mremap the original vmas list becomes stale,
but that's OK: only new holes appear inside it, which is fine for munmap.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-14 14:51:40 +04:00
Pavel Emelyanov
80d5fb285f restorer: Mmap restorer blob separately from the rest
This avoids the exec bit on restorer args and will eventually
make a shared restorer possible.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-13 04:10:48 +04:00
Pavel Emelyanov
63ce82e7f6 restore: Sanitize restorer code + args layout
There was a strange thing -- the task args size is aligned, but when
the thread args pointer was computed this alignment was lost. Fix this
and make all the buffers page-aligned.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-13 03:01:48 +04:00
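The fix in 63ce82e7f6 comes down to rounding each size up to a page boundary before deriving the next pointer from it. A minimal sketch, assuming a power-of-two page size; page_align() and thread_args_offset() are hypothetical helpers, not the functions from the patch:

```c
#include <stddef.h>
#include <unistd.h>

/* Round size up to a multiple of page; page must be a power of two. */
static size_t page_align(size_t size, size_t page)
{
    return (size + page - 1) & ~(page - 1);
}

/* Hypothetical layout helper: thread args start right after the
 * page-aligned task args, so both buffers begin on a page boundary. */
static size_t thread_args_offset(size_t task_args_size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    return page_align(task_args_size, page);
}
```

Computing the thread args offset from the aligned size (rather than the raw one) is exactly the step whose alignment was being lost.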
Pavel Emelyanov
5a469e1894 restorer: Lost tgt vmas length in restorer memory blob hinting
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-13 02:56:14 +04:00
Pavel Emelyanov
b354a09cd7 rst: Brush up shared resources collection
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-12 20:11:33 +04:00
Pavel Emelyanov
ccce9fed2a tty: Brush up ctl tty preparation
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-12 20:09:05 +04:00
Cyrill Gorcunov
20d6762d93 tty: Add restoration of controlling terminal v4
The idea is pretty simple -- once we find
that a controlling terminal is present, we
call an ioctl on the appropriate /dev/pts/N.

This is done in a slightly unusual manner. When we
find that a controlling terminal is present,
we create an additional FdinfoEntry for it
with the object id taken from the existing master peer.

The file engine stacks this new FdinfoEntry on
the fd_info_head list. Thus we will have at
least two entries on this list: one for the real
Fdinfo associated with the master peer and one for
our newly generated Fdinfo entry. Which one becomes
the file master depends on the pid.

Finally, we use the post_open_fd hook in our
tty code, which allows us to open the controlling
terminal and issue the proper ioctl on it.

v2:
 - restore control terminals via a service fd;
   service fd retrieval still needs to be sped up.

v3:
 - use the prepare_ctl_tty() helper to generate
   the control terminal fdinfo entry

v4:
 - use post_open_fd

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-12 20:00:58 +04:00
Cyrill Gorcunov
89a7a45d37 tty: Add checkpoint/restore for unix terminals v6
Usually PTYs represent a pair of links -- a master peer and a slave
peer. The master peer must be opened before the slave. Internally, when the kernel
creates a master peer it also generates a slave interface in the form of
/dev/pts/N, where N is the so-called pty "index". The master/slave connection
is unambiguously identified by this index.

Still, one master can carry multiple slaves -- for example, a user opens
one master via /dev/ptmx and then the appropriate /dev/pts/N several times.
The result will be the following:

master
`- slave 1
`- slave 2

Both slaves will have the same master index but different file descriptors.
Still, inside the kernel the pty parameters are the same for both slaves. Thus
only one slave's parameters need to be restored; there is no need to carry
all parameters for every slave peer we've found.

Not yet addressed problems:

- At the moment of restore the master peer might already be closed for
  some reason, so to resolve this we need to open a fake master
  peer with the proper index, hook a slave on it, and then close the
  master peer.

- Need to figure out how to deal with ttys which have some
  data in buffers not yet flushed; at the moment this data will
  simply be lost during c/r

- Need to restore control terminals

- Need to fetch tty flags such as exclusive/packet-mode,
  this can't be done without kernel patching

[ avagin@:
   - ideas on control terminals restore
   - overall code redesign and simplification
]

v4:
 - drop redundant pid from dump_chrdev
 - make sure optional fown is passed on regular ptys
 - add a comment about zeroing termios
 - get rid of redundant empty line in files.c

v5 (by avagin@):
 - complete rework of tty image format, now we have
   two files -- tty.img and tty-info.img. The idea
   is to reduce the amount of data being stored.

v6 (by xemul@):
 - packet mode should be set to true in image,
   until properly fetched from the kernel
 - verify image data on retrieval

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-12 20:00:54 +04:00
Andrey Vagin
b11eeea381 restore: auto-unlink for ghost files (v2)
A ghost file is used for restoring descriptors of an unlinked file.
It is created, opened and deleted.

Currently ghost files are collected in the root task and then removed
by crtools when everybody is restored. This scheme doesn't work:
ghost_file_list is not shared, plus tasks may live in a different mount
namespace.

It was broken by the following commit:
bd4e5d2f restore: prepare shared objects after initializing namespaces

We can't just move clear_ghost_files(), because we need to wait until
all processes have opened a ghost file.
We could add one more global barrier, or move clear_ghost_files() into
the restore code below an existing barrier.

Here is a better solution: a ghost file is deleted by its last user.

v2: Use the type atomic_t and fix a commit message.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-11 17:59:59 +04:00
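A minimal sketch of the "last user deletes the ghost file" idea from b11eeea381, using a C11 atomic counter in place of CRIU's atomic_t. struct ghost_sketch, ghost_init() and ghost_put() are hypothetical. In a real restore the counter would have to live in memory shared by all restoring tasks (e.g. a MAP_SHARED mapping), which this single-process sketch does not set up.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical ghost-file record; in practice it would sit in shared memory. */
struct ghost_sketch {
    atomic_int users;      /* users that still have to open the file */
    char path[256];
};

static void ghost_init(struct ghost_sketch *g, const char *path, int users)
{
    atomic_init(&g->users, users);
    strncpy(g->path, path, sizeof(g->path) - 1);
    g->path[sizeof(g->path) - 1] = '\0';
}

/* Called by each user once it has opened the ghost file. The user that
 * drops the counter to zero unlinks the file; returns 1 if it did so. */
static int ghost_put(struct ghost_sketch *g)
{
    if (atomic_fetch_sub(&g->users, 1) == 1)
        return unlink(g->path) == 0;
    return 0;
}
```

Since every user calls ghost_put() only after its own open succeeded, the unlink cannot happen while someone still needs the path, and no global barrier is required.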
Andrey Vagin
f6d373cc8c restore: prevent killing of nonpositive PIDs
I don't like surprises.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-07 18:52:59 +04:00
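The guard in f6d373cc8c matters because kill(2) treats nonpositive pids specially: 0 means the caller's process group, -1 means every process the caller is allowed to signal, and other negative values name a process group. A hedged sketch of such a guard; kill_restored_task() is a hypothetical name:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Refuse nonpositive pids instead of letting kill() fan out to a
 * whole process group or to every process we may signal. */
static int kill_restored_task(pid_t pid, int sig)
{
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }
    return kill(pid, sig);
}
```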
Andrey Vagin
0ae2bad0c6 mm: mark a vma as stack, if a value of sp is in it
/proc/PID/maps can contain outdated information about a stack vma.
The kernel marks a VMA as stack if thread_struct->usersp is in it,
but usersp is only updated when a process calls a syscall.

This problem occurs when we try to dump/restore a process in a loop.
When the restorer resumes a process, the restorer vma will be marked as stack.

A thread stack should not be marked as stack, because its vma is mapped
w/o MAP_GROWSDOWN.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-07 18:21:04 +04:00
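A sketch of the rule from 0ae2bad0c6, assuming a simplified vma array instead of CRIU's real vma list. The vma holding the task's current stack pointer gets the stack mark, whatever /proc/PID/maps said. Only the main thread's sp would be passed in, since thread stacks are mapped without MAP_GROWSDOWN. struct vma_sketch and mark_stack_vma() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vma descriptor: [start, end) as in /proc/PID/maps. */
struct vma_sketch {
    uintptr_t start, end;
    bool is_stack;
};

/* Recompute the stack mark from sp, overriding any stale [stack] tag.
 * Returns the index of the marked vma, or -1 if sp is in no vma. */
static long mark_stack_vma(struct vma_sketch *vmas, size_t n, uintptr_t sp)
{
    long found = -1;

    for (size_t i = 0; i < n; i++) {
        vmas[i].is_stack = (sp >= vmas[i].start && sp < vmas[i].end);
        if (vmas[i].is_stack)
            found = (long)i;
    }
    return found;
}
```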
Cyrill Gorcunov
45375d5721 restore: Rename a task item being restored to `current'
An analogue of the `current' macro the kernel has.
The name 'me' is somewhat confusing.

No functional changes.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-05 19:52:55 +04:00
Cyrill Gorcunov
05466cc38a restorer: Pass current log level to the arguments
Will need it to honor current log level in restorer.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-03 14:44:09 +04:00
Andrey Vagin
9ec01ff307 log: don't create a log file in a current directory
We can set a directory for log and image files.
crtools sets it as the current directory and then creates all files in it.
This works only until we decide to change the mount namespace.

I suggest opening the log dir and creating files with the help of openat.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-02 01:02:30 +04:00
Andrey Vagin
d34b9004a7 restore: use a current stack for new processes (v3)
Why do we need a new stack? We already have one and it can be used.

We just need to reserve a bit of it for executing glibc clone().

v2: Don't lose a page from a child's stack
v3: Remove the defined constant STACK_SIZE

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-28 23:19:28 +04:00
Pavel Emelyanov
3ae36e700f restore: Don't mess with last_pid when restoring pidns init
When we fork a pidns init there's no need to specify its pid,
as it will automatically be 1. Clean up the code so that it doesn't
touch the last_pid sysctl at all in that case, rather than just omitting
the write to it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-14 14:09:20 +04:00
Andrey Vagin
aabb56bd66 crtools: write a pid of a root task in a specified file
When we restore a pid namespace, the root task gets some unknown pid
in the original namespace (i.e. -- the ns crtools is launched from). To find
this pid out, one can use this option -- it makes the pid obtained by
the new init be written into a pid file.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-14 12:54:00 +04:00
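A minimal sketch of what aabb56bd66 adds: once the parent knows the root task's pid in its own namespace (for instance from the return value of clone()), it writes that number to a file. write_pidfile() and the plain decimal format are assumptions, not taken from the patch.

```c
#include <stdio.h>
#include <sys/types.h>

/* Write pid as a decimal number into pidfile.
 * Returns 0 on success, -1 on any error. */
static int write_pidfile(const char *pidfile, pid_t pid)
{
    FILE *f = fopen(pidfile, "w");

    if (!f)
        return -1;

    int ok = fprintf(f, "%d", (int)pid) > 0;

    if (fclose(f) != 0)   /* fclose flushes, so report write errors here too */
        ok = 0;
    return ok ? 0 : -1;
}
```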
Pavel Emelyanov
5c9cc71fea log: Replace perror-s with pr_perror-s over code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-11 21:57:42 +04:00
Pavel Emelyanov
9efd12f2c7 code: Remove trailing whitespaces over .c and .h files
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-11 21:34:35 +04:00
Cyrill Gorcunov
57032aff5e restorer: Do restore futex robust lists
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-10 20:29:01 +04:00
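What 57032aff5e restores is the per-thread robust futex list head, which the kernel exposes through the get_robust_list(2) and set_robust_list(2) system calls. The sketch below assumes Linux and uses raw syscall(2), since glibc provides no wrappers for these calls. It reads the calling thread's head and installs it again; a checkpoint tool would presumably do the "get" at dump time and the "set" in the restorer. roundtrip_robust_list() is a hypothetical name.

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Fetch the calling thread's robust list head (pid 0 = self) and
 * register it again. Returns 0 on success, -1 on error. */
static int roundtrip_robust_list(struct robust_list_head **head, size_t *len)
{
    if (syscall(SYS_get_robust_list, 0, head, len) != 0)
        return -1;
    /* The kernel rejects any len other than sizeof(struct robust_list_head). */
    return (int)syscall(SYS_set_robust_list, *head, *len);
}
```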
Pavel Emelyanov
1a62282d48 net: Push the host end of a veth to original netns
The caller will then have to handle this end (e.g. put it into a bridge).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-10 19:14:36 +04:00
Pavel Emelyanov
7f1c9af0f8 vma: State that vma->fd is -1 constant in the image
This field was lost while switching to protobuf -- the vma images
were used by the parasite as a plain array and it was easier to reserve
this space in the image. Now it's too late to change this, so make
it always -1.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-10 10:17:50 +04:00
Pavel Emelyanov
fc7071d05e net: Packet sockets basic support
Support only basic packet socket functionality -- create and bind.
This should be enough to start testing dhclient inside a container.
Other stuff (filter, mmaps, fanouts, etc.) will come later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-09 16:17:41 +04:00
Pavel Emelyanov
b1b0a39a58 pb: Rewrite object reading to use pb-descs
pb_read is no longer a macro. This will allow us to
factor out object collecting on restore.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-07 19:22:00 +04:00
Andrey Vagin
8bff4c7fca restore: consolidate restoring of a root task in one blob
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-06 18:37:13 +04:00
Andrey Vagin
703a322cc0 restore: mount_proc return a result instead of exit
In addition it fixes error handling.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-06 18:36:59 +04:00
Andrey Vagin
4c88cafe43 restore: fix clean up in PIDNS
When processes are restored in a PIDNS, the control process (crtools)
doesn't know the real pids of the processes, but it knows the pid of init.

crtools can kill init, and all other processes will be killed too.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-06 18:31:39 +04:00
Andrey Vagin
bd4e5d2f9d restore: prepare shared objects after initializing namespaces
At this stage crtools unlinks old socket files, creates ghost files, etc.,
so we should be in the correct namespace.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-02 16:08:06 +04:00
Andrey Vagin
3cb5969b25 pstree: fix memory corruption
The pstree_item for helpers is allocated without rst_info.
Before this patch prepare_fd_pid was executed for such items and
touched rst_info.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-02 15:54:54 +04:00
Pavel Emelyanov
da409cc641 signalfd: Dumping and restoring
Only the fact of the fd's presence, its flags, fown and the sigmask
are handled. The sigpending state is tightly coupled with the task's
sigpending state, which is not yet supported.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-02 12:26:35 +04:00
Andrey Vagin
2e9ddccdb9 restore: rework logic about temporary proc
We need proc for restoring processes. The existing /proc may not be suitable,
e.g. if processes are in a pidns.

crtools mounts procfs in a temporary directory, but it should be
umounted at the end. Before this patch crtools did that, but
it doesn't work if processes are in a mount namespace.

Actually this logic can be simplified, and this patch does that:
* create a tmp dir
* mount procfs
* open this directory and save the file descriptor
* detach procfs
* remove the tmp dir
* access proc via openat, fstatat and so on.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-01 15:01:13 +04:00
Cyrill Gorcunov
58b0ef655f restore: Add test for optional PB fields in core_entry
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-07-20 14:30:20 +04:00
Cyrill Gorcunov
9d918c5964 protobuf: Convert core_entry to PB format v5
This requires some explanation.

 - Since we use protobuf data in restorer
   code, we need to carry a copy of the appropriate
   PB entities in resident memory. For this
   reason task_restore_core_args and thread_restore_args
   were significantly reworked. In short -- the caller
   code fills PB structures into the task arguments space.

v3:
 - Combine everything arch-related into the thread_info field,
   and make it optional
 - Drop the "version" field from the message; we check the version in
   another specific message
 - Don't forget to call core_entry__free_unpacked where needed
 - We continue dumping the FPU state, though it's not yet restored

v4:
 - Don't carry task_core_entry and task_kobs_ids_entry for
   threads, and report an error if they are present in the image.

v5:
 - Allocate core_entry depending on type of task being dumped

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-20 14:06:42 +04:00
Pavel Emelyanov
64967eef21 crtools: Kill the ability to work on individual process
We haven't tested it for several months and there's no evidence
it is required at all. For dumping a single task -t option works
just fine.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-19 17:55:34 +04:00
Pavel Emelyanov
9f2168a4f0 images: Introduce the top-level file -- inventory
Currently we store the images version in the core file. This is
bad, since core file describes a single process (or thread) and
says nothing about the images set as a whole (let alone the fact
that it's being parsed too late).

Thus introduce the inventory image file which describes the image
set the way we need (want). For now the only entry in it is the
images version. In the future it can be extended.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-19 17:37:25 +04:00
Cyrill Gorcunov
4806e1395f protobuf: Convert vma_entry to PB format v3
v2:
 - Use regular uint types in message proto
 - Use PB engine for "show"
v3:
 - drop usage of temp. variable in prepare_shmem_pid

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-19 12:43:36 +04:00
Pavel Emelyanov
ffd40996ea pb: Switch creds to protobuf format
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-19 12:35:25 +04:00
Cyrill Gorcunov
808b8f2f06 protobuf: Convert mm_entry to PB format
Because the MmEntry has a "repeated" field, we
copy the aux vector explicitly and reserve space for
it in the task args.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-19 07:25:05 +04:00
Cyrill Gorcunov
a7691bcbe2 protobuf: Convert itimer_entry to PB format
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-18 16:27:01 +04:00
Cyrill Gorcunov
6b9d3affc9 protobuf: Convert sa_entry to PB format
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-18 16:25:06 +04:00
Pavel Emelyanov
786012e891 mnt: Fix mountinfo collecting issues
1. Mountinfo should be collected after we have forked into the new namespace (strictly
   speaking this is so)
2. When restoring a mnt ns we can reuse the collected mntinfos rather than reading
   them again.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-15 08:43:37 +04:00
Andrey Vagin
6fb3759c5f restore: restore pgid in two phases
As described in the previous patch, process group leaders restore their pgid in
the first phase, then all other processes restore theirs.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-02 16:53:55 +04:00
Andrey Vagin
5c45786417 restore: wait while restoring pgid (v2)
A pgid leader should become one before any other task tries
to enter its group (with setpgid). Thus we introduce yet
another global sync point -- before it all pgid leaders call
setpgid, after it all the others do.

v2: wait while helpers restore pgid

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-02 16:53:54 +04:00
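The sync point from 5c45786417 (and the two phases from 6fb3759c5f) can be modelled with two forked children and a pipe acting as the barrier. This toy version is not CRIU's restore code, and pgid_two_phase_demo() is hypothetical. The leader calls setpgid(0, 0) first; only after that does the member call setpgid(0, leader), which would fail with EPERM if that group did not exist yet.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the member ended up in the leader's group, 0 if not,
 * -1 if the setup itself failed. */
static int pgid_two_phase_demo(void)
{
    int barrier[2], result[2], done[2];
    char c = 0;

    if (pipe(barrier) || pipe(result) || pipe(done))
        return -1;

    pid_t leader = fork();
    if (leader < 0)
        return -1;
    if (leader == 0) {
        close(done[1]);
        close(result[1]);
        /* Phase 1: become a group leader, then open the barrier. */
        if (setpgid(0, 0) != 0 || write(barrier[1], &c, 1) != 1)
            _exit(1);
        /* Stay alive (keeping the group populated) until released. */
        while (read(done[0], &c, 1) > 0)
            ;
        _exit(0);
    }
    close(barrier[1]);   /* so the member sees EOF if the leader fails */

    pid_t member = fork();
    if (member < 0) {
        close(done[1]);
        waitpid(leader, NULL, 0);
        return -1;
    }
    if (member == 0) {
        pid_t pgid = -1;

        close(done[1]);
        /* Phase 2: join the leader's group only after the barrier. */
        if (read(barrier[0], &c, 1) == 1 && setpgid(0, leader) == 0)
            pgid = getpgid(0);
        _exit(write(result[1], &pgid, sizeof(pgid)) == (ssize_t)sizeof(pgid) ? 0 : 1);
    }
    close(result[1]);    /* so we see EOF if the member dies early */

    pid_t got = -1;
    if (read(result[0], &got, sizeof(got)) != (ssize_t)sizeof(got))
        got = -1;
    close(done[1]);      /* EOF on done[0] releases the leader */
    waitpid(member, NULL, 0);
    waitpid(leader, NULL, 0);
    close(barrier[0]);
    close(result[0]);
    close(done[0]);
    return got == leader;
}
```

Removing the read on the barrier would reintroduce the race the commit fixes: the member could call setpgid() before the leader's group exists.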
Cyrill Gorcunov
ea1ce8e472 fifo: Add checkpoint restore for fifos v4
Checkpoint and restore of fifos is similar to
pipe c/r, except that the pipe end-points are named
files.

Because a fifo has a name, we use the regular files
facility for fifo path c/r.

Still, there is a trick used to "open" a fifo:
the opening procedure might sleep if the fifo's peer
is not yet opened, so before doing the real open
we do a fake open (with the O_RDWR flag),
which prevents us from sleeping even if the peer
is not yet ready. We also need a writable fifo
end to restore queued data.

v2:
 - add open/priv members to reg_file_info
 - make open_fifo_fd to use open_fe_fd
 - comment on pipe_id
 - make sure the fifo data is not restored twice

v3:
 - drop a useless fixme comment and add a sane one
v4:
 - Use the restore_data flag to avoid restoring data twice
 - Use S_ISREG for file contents copying

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-07-01 17:15:48 +04:00
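The fake-open trick from ea1ce8e472, sketched for the read end of a fifo. An O_RDWR open of a fifo never blocks on Linux (POSIX leaves it undefined), and once that fd exists the real O_RDONLY open finds a writer and returns immediately. The O_RDWR fd also serves as the writable end for putting queued data back. open_fifo_reader() is a hypothetical helper, not the patch's open_fifo_fd().

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the read end of the fifo at path without sleeping on a missing
 * writer. On success returns the read fd and stores the O_RDWR "fake"
 * fd in *fake_fd (the caller closes it after restoring queued data). */
static int open_fifo_reader(const char *path, int *fake_fd)
{
    int fake = open(path, O_RDWR);   /* Linux: never blocks on a fifo */
    if (fake < 0)
        return -1;

    int fd = open(path, O_RDONLY);   /* a writer exists, so no sleep */
    if (fd < 0) {
        close(fake);
        return -1;
    }
    *fake_fd = fake;
    return fd;
}
```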
Andrey Vagin
49c1d43645 pstree: move all code about pstree in a separate file
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Looks-cool-to: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-06-27 21:07:30 +04:00