When we restore a pid namespace the root task will get some unknown pid
in the original (i.e. -- the ns crtools a launched from) one. To find
this pid out one can use this option -- it will make the pid obtained by
the new init to be written into a pid file.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
"init" in LXC opens /dev/null and then mounts devtmpfs in /dev,
so crtools can not resolve the path to the origin /dev/null.
crtools with the option --evasive-devices will check the origin
device and a new device are the same and if it's true, crtools will
dump a new path.
v2: add a description for the option
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The protobuf-c generate plain routines for entries manipulations, but
we want to have some "generic" way of working with messages. Collect them
all in an array of descriptors (similar to image files descriptions) and
do full typechecking while this.
Such thing will allow to simplfy code later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The option is -r|--pivot-root and an argument is a path to new root.
A root task will make pivot_root. LXC CT does that, so we need that
for restoring.
v2: s/pivot-root/root/
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We haven't tested it for several monthes and there's no evidence
it is required at all. For dumping a single task -t option works
just fine.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Do not presume that the argv[1] is action. Use the optind index at the end
of parsing instead. This allows to specify --help option properly.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
to require dumping pid namespace. Dump and restore will be failed if
a tress doesn't contain a process init.
pid namespace will be created implicitly if a process init in the tree.
v2: fix comments from Pavel
v3: Restore of pidns should be approved by user
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the option --log-pid is set, each process will have an own log file.
Otherwise PID is added to each log message.
A message can't be bigger than one page minus some bytes for pid.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- open_mount is cleaned up
- byte-stream hex conversion remains untouched since
strtol is flipping numbers to LE manner
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- Pass initial counter value to eventfd call
(can't pass flags here since they are obtained
with fcntl and must be restored same way or
restore will fail)
- Use rst_file_params for flags and owner restore
- Use eventfd.[ch] instead of eventfs.[ch]
- Move show funcs to eventfd.c
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* Split dump/restore options from show ones
* Structure the former list
* Add the --tcp-established one
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This test is very basic :( Need more work on it.
One note about --close arg to crtools -- bash leaves all
fds open when laucnhing progams, and for restore this is
critical, as one of them can be busy. Iterating over all
the possibel fds and closing them is "do not want" thing.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First of all -- to make crtools dump/restore established tcp sockets
you have to specify the --tcp-established options. By doing so you
inform crtools that
a) you know, that after dump there will be a netfilter rules blocking
the dumped connections
b) you guarantee, that before restore this netfilter block is still
there
What else this patch does is simple -- collects establised sockets and
calls the dump/restore tcp function (now empty) where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if dgram socket peer is not connected back
we can try to resolve peer by name.
For security reason this happens only if '-x' option
is passed at checkpoint and restore time.
In particular this is needed for programs which do
use dgram socket to send messages to /dev/log.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Completely unlinked file is the one with n_link count being zero.
Such files only allow to read their contents and carry with us.
In order to dump this thing I introduce the "path remap" technology.
For reg file a remapping entry is dumped which describes, that at
restore stage before opening a regfile->path this path should be
linked to some other name and then (after open) unlinked.
For completely unlinked files the remap path would be a path to
a "ghost" file, i.e. a file which is created only at the time of
restore and which is removed completely at the end of it.
Partially unlinked files (i.e. those having n_link != 0, but a
path by which we see them in someone's fd is not accessible) should
be handled in another way.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The mm_xxx bits are per-mm_struct, not per-task_struct in kernel.
Thus, when we support CLONE_VM we'd better have these bits in a
separate image file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Why? Because one day we'll support various CLONE_ flags and
for fdtable and fs info we'd like to have separate images (since
these objects are separate in kernel).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A pipe buffer has 16 slots. A slot is page, offset and size.
When we use splice and data is not aligned, splice connects
a page from file cache and set offset. For this reason we loose
a part of buffer.
If a data size is more than 15 pages, data will be aligned in a image.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This was required when pages were stored in elf files for
exec. Now we can stop reading it on eof.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Each fdset item now has the callback which will show a contents of a magic-described
image file. Per-task and global show code is reworked to walk the respective fdsets
and calling ->show on each file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After we removed the pid from pstree image file the -t or -p option for show
command no longer makes sense. Make 'show' mode rely on -D option to find out
where to find the root (i.e. pstree.img) file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This contains reg-files and sk-queues images, as they contain data
which is potentially generated by every task, so keep it open all
the time dump goes.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Current fdsets are ugly, limited (bitmask will exhaust in several months) and
suffer from unknown problems with fdsets reuse :(
With new approach (this set) the images management is simple. The basic function
is open_image, which gives you an fd for an image. If you want to pre-open several
images at once instead of calling open_image every single time, you can use the
new fdsets.
Images CR_FD_ descriptors should be grouped like
_CR_FD_FOO_FROM,
CR_FD_FOO_ITEM1,
CR_FD_FOO_ITEM2,
..
CR_FD_FOO_ITEMN,
_CR_FD_FOO_TO,
After this you can call cr_fd_open() specifying ranges -- _FROM and _TO macros,
it will give you an cr_fdset object. Then the fdset_fd(set, type) will give you
the descriptor of the open "set" group corresponding to the "type" type.
3 groups are introduced in this set -- tasks, ns and global.
That's it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Write two helpers for opening an fdset for task and one for ns.
This probably can be done with some "generic" macro(s), but this
time it's simpler not to produce more code of that type.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's not required any longer. Now fdsets are allocated one-by-one only
when required and there's no need in adding new fds to existing sets.
Thus just remove the last arg from cr_fdset_open.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The core image now contains only core per-task stuff.
The new file resurrects Tula magic number removed earlier.
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
vma_entry contains shmid and all shared memory are dumped in own files.
The most interesting thing is restore.
A maping is restored by process with the smallest pid. The mamping
is created before executing restorer.
We map a full mapping and restore it's conten, then we open a file from
/proc/pid/map_files and store a descriptor in vma_info. The mapping is
unmaped. Now we can map any region of this mapping in the restorer.
We use this trick, because a target process may have this mapping in
some places and the restorer has not function to open proc files.
v2: fix error hangling
xemul: Fixed static-s and args for cr_dump_shmem
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now a name of an image file is hard coded ("smth-%d.img", pid),
but the images of namespaces, shared memery, etc belong to
not one task, so they may have other formats of names, which
will describe objects.
For example a image of shared memory content may have name like
this ("pages-shmem-%ld.img", shmid)
v2: fix comment
v3: rebase
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Using absolute paths for this is dangerous - while doing c/r we should
be extremely carefully and not change tasks' roots and mount namespaces
too early. Sometimes it will not work -- when restoring containers we'll
be unable to switch to new CT and still have the ability to open images.
Rework the images opening via openat and keep the image dir fd open all
the time as the service fd (introduced earlier).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>