because they describe a process TREE.
It's useful when we dump tasks from another pid namespace,
because the real pid is obtained from the parasite. In the previous
version we needed to update the pid in two places: one in a
pstree_item and one in a children array.
A process tree will be necessary to restore sid and pgid,
because we will have to add fake tasks to the tree, for example
if a session leader is absent.
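A minimal sketch of such a tree and of a for_each_pstree_item style walk.
The first-child/next-sibling layout, the field names and the helpers are
assumptions made for illustration; they are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical tree node: the pid lives in exactly one place. */
struct pstree_item {
	int pid;
	struct pstree_item *parent;
	struct pstree_item *child;	/* first child */
	struct pstree_item *sibling;	/* next sibling */
};

/* Pre-order successor; 'item' must belong to a tree whose top has no parent. */
static struct pstree_item *pstree_item_next(struct pstree_item *item)
{
	if (item->child)
		return item->child;

	while (item) {
		if (item->sibling)
			return item->sibling;
		item = item->parent;
	}

	return NULL;
}

#define for_each_pstree_item(pi, root) \
	for (pi = (root); pi != NULL; pi = pstree_item_next(pi))

/* Link 'c' as the new first child of 'parent'. */
static void pstree_add_child(struct pstree_item *parent, struct pstree_item *c)
{
	c->parent = parent;
	c->sibling = parent->child;
	parent->child = c;
}

/* Count all tasks in the tree, including the root. */
static int pstree_count(struct pstree_item *root)
{
	struct pstree_item *pi;
	int n = 0;

	for_each_pstree_item(pi, root)
		n++;

	return n;
}
```

With this shape a fake task (e.g. a missing session leader) is just one
more pstree_add_child() call, and a pid is fixed up in a single field.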
v2: fix rollback actions
v3: fix comments from Pavel Emelyanov
* add macros for_each_pstree_item
* and a few bugs
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since event polling depends on other files
being opened, we split the main files list into
two parts, event poll files and all other
files, and thus defer the creation of eventpoll
files in prepare_fds().
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the option --log-pid is set, each process will have its own log file.
Otherwise the PID is added to each log message.
A message can't be bigger than one page minus some bytes for the pid.
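A rough sketch of how a pid-prefixed message can be formatted into a
bounded buffer. The "(pid) " prefix format and the function name are
assumptions for illustration only; the patch may format the prefix
differently.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "(pid) message" into buf. The prefix eats part of the buffer,
 * so the message itself is limited to what is left and is truncated
 * beyond that. Returns the resulting string length or -1 on error.
 */
static int format_log_line(char *buf, size_t size, int pid, const char *fmt, ...)
{
	va_list ap;
	int off, len;

	off = snprintf(buf, size, "(%d) ", pid);
	if (off < 0 || (size_t)off >= size)
		return -1;

	va_start(ap, fmt);
	len = vsnprintf(buf + off, size - off, fmt, ap);
	va_end(ap);
	if (len < 0)
		return -1;

	if ((size_t)len >= size - off)
		len = (int)(size - off) - 1;	/* message was truncated */

	return off + len;
}
```

Calling it with a page-sized buffer gives the limit described above:
one page minus the bytes taken by the prefix.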
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- open_mount is cleaned up
- byte-stream hex conversion remains untouched since
strtol flips the numbers into little-endian order
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- Pass initial counter value to eventfd call
(can't pass flags here since they are obtained
with fcntl and must be restored the same way or
restore will fail)
- Use rst_file_params for flags and owner restore
- Use eventfd.[ch] instead of eventfs.[ch]
- Move show funcs to eventfd.c
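A small sketch of the first point: the counter goes into eventfd() as the
initial value, and the file status flags are put back with fcntl()
afterwards. The function names are made up for this example; only
eventfd(2), fcntl(2) and read(2) are real calls.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Recreate an eventfd with the dumped counter, then restore its flags. */
static int restore_eventfd_sketch(unsigned int counter, int flags)
{
	int fd;

	fd = eventfd(counter, 0);	/* no flags here, see fcntl below */
	if (fd < 0)
		return -1;

	if (fcntl(fd, F_SETFL, flags) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Read the current counter value (this resets it to zero). */
static int read_eventfd_sketch(int fd, uint64_t *val)
{
	ssize_t ret = read(fd, val, sizeof(*val));

	return ret == (ssize_t)sizeof(*val) ? 0 : -1;
}
```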
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First of all -- to make crtools dump/restore established tcp sockets
you have to specify the --tcp-established option. By doing so you
inform crtools that
a) you know that after dump there will be netfilter rules blocking
the dumped connections
b) you guarantee that before restore this netfilter block is still
there
What else this patch does is simple -- it collects established sockets and
calls the dump/restore tcp function (now empty) where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a dgram socket's peer is not connected back,
we can try to resolve the peer by name.
For security reasons this happens only if the '-x' option
is passed at checkpoint and restore time.
In particular this is needed for programs which
use a dgram socket to send messages to /dev/log.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A completely unlinked file is one whose n_link count is zero.
All we can do with such a file is read its contents and carry them with us.
In order to dump this thing I introduce the "path remap" technique.
For a reg file a remapping entry is dumped which says that at the
restore stage, before opening regfile->path, this path should be
linked to some other name and then (after open) unlinked.
For completely unlinked files the remap path would be a path to
a "ghost" file, i.e. a file which is created only at the time of
restore and which is removed completely at the end of it.
Partially unlinked files (i.e. those having n_link != 0, but whose
path, as we see it in someone's fd, is not accessible) should
be handled in another way.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I store them on _entry since sids can only be inherited or
set to current's pid. Thus the best we can do is restore sids
at fork time, and so save them in the image we use to fork.
Maybe when we submit patches that give us the ability to set
an arbitrary pgid and sid we'll change this, but that is in the
future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is a trivial change, but it is required to check that pgid/sid
are in a 'restorable' state; see the respective patch/code for
details.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The mm_xxx bits are per-mm_struct, not per-task_struct in the kernel.
Thus, when we support CLONE_VM we'd better have these bits in a
separate image file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Why? Because one day we'll support various CLONE_ flags and
for fdtable and fs info we'd like to have separate images (since
these objects are separate in kernel).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is a big change, yes. Dump unix sockets in the same manner
as all the other files are done now. A few notes however.
1. We explicitly drop names for connected stream sockets. This is
done to avoid conflicts with names -- accepted sockets share their
names with the listening parent. This can be done later by binding
a socket to a name, then renaming it to some temporary unique one
and at the very end renaming it back to the original.
2. Interconnected sockets are restored via socketpair() call. This is
correct, but names are dropped. Need to bind() sockets after this
(yes, this can be done), but for this we need to implement the trick
with renames described before.
3. FD for socket queues is constantly re-opened so that we do not have to
resolve fd conflicts. Need to use service fds engine for this later.
4. Some code cleanup is still required, yes (will follow shortly).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A pipe buffer has 16 slots. A slot is a page, an offset and a size.
When we use splice and the data is not aligned, splice attaches
a page from the file cache and sets the offset. For this reason we lose
a part of the buffer.
If the data size is more than 15 pages, the data will be aligned in the image.
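The arithmetic behind this can be sketched as below, assuming 4096-byte
pages and that every slot references at most one page cache page. The
constants and helper names are illustrative, not taken from the code.

```c
#define PIPE_SLOTS_SKETCH	16
#define PAGE_SIZE_SKETCH	4096UL

/* Number of pipe slots needed to splice 'size' bytes from file offset 'off'. */
static unsigned long pipe_slots_needed(unsigned long off, unsigned long size)
{
	unsigned long first = off % PAGE_SIZE_SKETCH;	/* offset inside the first page */

	if (size == 0)
		return 0;

	return (first + size + PAGE_SIZE_SKETCH - 1) / PAGE_SIZE_SKETCH;
}

/* Does the data fit into one pipe buffer without blocking? */
static int fits_in_pipe(unsigned long off, unsigned long size)
{
	return pipe_slots_needed(off, size) <= PIPE_SLOTS_SKETCH;
}
```

Sixteen pages of aligned data take exactly 16 slots, but the same amount
at an unaligned offset touches 17 pages and no longer fits, while up to
15 pages fit at any offset. That is why only bigger chunks need aligning.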
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Information about pipes' file structs is saved in one global file and
an fdinfo_entry is saved for each descriptor.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This was required when pages were stored in elf files for
exec. Now we can stop reading it on eof.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now every inetsk fd dump results in a new entry in the fdinfo.img file. The sockets themselves are
dumped into the inetsk.img global image file. On restore the generic fdinfo redistribution algo
is used and inet sockets are opened only when required.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Each fdset item now has a callback which shows the contents of a magic-described
image file. Per-task and global show code is reworked to walk the respective fdsets
and call ->show on each file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After we removed the pid from the pstree image file the -t or -p option for the show
command no longer makes sense. Make 'show' mode rely on the -D option to find out
where to find the root (i.e. pstree.img) file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This contains reg-files and sk-queues images, as they contain data
which is potentially generated by every task, so keep it open for
as long as the dump runs.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Current fdsets are ugly, limited (bitmask will exhaust in several months) and
suffer from unknown problems with fdsets reuse :(
With new approach (this set) the images management is simple. The basic function
is open_image, which gives you an fd for an image. If you want to pre-open several
images at once instead of calling open_image every single time, you can use the
new fdsets.
Images CR_FD_ descriptors should be grouped like
_CR_FD_FOO_FROM,
CR_FD_FOO_ITEM1,
CR_FD_FOO_ITEM2,
..
CR_FD_FOO_ITEMN,
_CR_FD_FOO_TO,
After this you can call cr_fd_open() specifying ranges -- _FROM and _TO macros,
it will give you a cr_fdset object. Then fdset_fd(set, type) will give you
the descriptor of the open "set" group corresponding to the "type" type.
3 groups are introduced in this set -- tasks, ns and global.
That's it.
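A simplified sketch of the range idea. All names ending in _sketch, the
FOO/BAR groups and the fake opener are hypothetical; the real
cr_fd_open() and fdset_fd() may look different.

```c
#include <stdlib.h>

enum {
	_CR_FD_FOO_FROM,
	CR_FD_FOO_ITEM1,
	CR_FD_FOO_ITEM2,
	_CR_FD_FOO_TO,

	_CR_FD_BAR_FROM,
	CR_FD_BAR_ITEM1,
	_CR_FD_BAR_TO,
};

struct fdset_sketch {
	int off;	/* first real type in the group */
	int nr;		/* number of descriptors in the group */
	int *fds;
};

typedef int (*open_image_fn)(int type);

/* Open every image type strictly between 'from' and 'to'. */
static struct fdset_sketch *fdset_open_sketch(int from, int to, open_image_fn open_image)
{
	struct fdset_sketch *set;
	int i;

	set = malloc(sizeof(*set));
	if (!set)
		return NULL;

	set->off = from + 1;
	set->nr = to - from - 1;
	set->fds = malloc(set->nr * sizeof(int));
	if (!set->fds) {
		free(set);
		return NULL;
	}

	/* Error handling for open_image() is omitted for brevity. */
	for (i = 0; i < set->nr; i++)
		set->fds[i] = open_image(set->off + i);

	return set;
}

/* Descriptor for 'type', or -1 if the type is not part of this set. */
static int fdset_fd_sketch(const struct fdset_sketch *set, int type)
{
	if (type < set->off || type >= set->off + set->nr)
		return -1;

	return set->fds[type - set->off];
}

static void fdset_free_sketch(struct fdset_sketch *set)
{
	free(set->fds);
	free(set);
}

/* Stand-in for open_image(): returns a fake descriptor number. */
static int fake_open_image(int type)
{
	return 100 + type;
}
```

Because the set only stores an offset and a count, adding a new image
type to a group is just one more enum entry between _FROM and _TO.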
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Write two helpers for opening an fdset for task and one for ns.
This probably can be done with some "generic" macro(s), but this
time it's simpler not to produce more code of that type.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's not required any longer. Now fdsets are allocated one-by-one only
when required and there's no need to add new fds to existing sets.
Thus just remove the last arg from cr_fdset_open.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This fd is global, so make it such. It will stop being just a global
variable soon.
Plus, remove the pid arg from format.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Pid number is redundant - this file is one for the whole tree.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Open the exec link at fd restore stage as yet another service fd,
then pass it to the restorer via args and just call prctl on it.
This is good for several reasons -- less code is required for
this, and files are better opened before we switch
to the restorer (opening will be complex and it's MUCH easier to open all
we need in one place).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The core image now contains only core per-task stuff.
The new file resurrects Tula magic number removed earlier.
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's a leftover from old times, when restore worked via execve.
Now we modify the core file in place to fixup vma-s.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
vma_entry contains shmid and all shared memory is dumped into its own files.
The most interesting thing is restore.
A mapping is restored by the process with the smallest pid. The mapping
is created before executing the restorer.
We map the full mapping and restore its contents, then we open a file from
/proc/pid/map_files and store the descriptor in vma_info. The mapping is
then unmapped. Now we can map any region of this mapping in the restorer.
We use this trick because a target process may have this mapping in
several places and the restorer has no function to open proc files.
v2: fix error handling
xemul: Fixed static-s and args for cr_dump_shmem
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be used to restore shared mappings
v2: clean up
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently the name of an image file is hard coded ("smth-%d.img", pid),
but the images of namespaces, shared memory, etc. belong to
more than one task, so they may have other name formats which
describe the objects.
For example, an image of shared memory contents may have a name like
this: ("pages-shmem-%ld.img", shmid)
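One way to picture this is a per-type format table with a variadic name
builder. The enum, the table and image_name_sketch() are assumptions
made for illustration; only the two format strings come from the text
above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

enum {
	IMG_TASK_SKETCH,	/* named after a pid */
	IMG_SHMEM_PAGES_SKETCH,	/* named after a shmid */
	IMG_NR_SKETCH,
};

/* Each image type carries its own name format. */
static const char *img_name_fmt[IMG_NR_SKETCH] = {
	[IMG_TASK_SKETCH]		= "smth-%d.img",
	[IMG_SHMEM_PAGES_SKETCH]	= "pages-shmem-%ld.img",
};

/*
 * Build an image name; the variadic arguments must match the format
 * of the given type. Returns the name length or -1 on error.
 */
static int image_name_sketch(char *buf, size_t size, int type, ...)
{
	va_list ap;
	int n;

	if (type < 0 || type >= IMG_NR_SKETCH)
		return -1;

	va_start(ap, type);
	n = vsnprintf(buf, size, img_name_fmt[type], ap);
	va_end(ap);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```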
v2: fix comment
v3: rebase
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>