On restore we will read all VmaEntries in one big MmEntry object,
so, to avoid copying them all into vma_areas, make vma_area hold a pointer to its VmaEntry instead.
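A minimal sketch of the "pointable" idea, assuming protobuf-c style generated types (a repeated field becomes a count plus an array of pointers). The trimmed-down VmaEntry/MmEntry definitions and the helper vmas_from_mm() are hypothetical stand-ins, not the project's real code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for the protobuf-generated types. */
typedef struct {
	uint64_t start;
	uint64_t end;
} VmaEntry;

typedef struct {
	size_t n_vmas;
	VmaEntry **vmas;	/* protobuf-c style: array of pointers */
} MmEntry;

struct vma_area {
	VmaEntry *e;		/* points into MmEntry instead of holding a copy */
};

/* Build the vma_area array by pointing at the entries already read in. */
static struct vma_area *vmas_from_mm(MmEntry *mm)
{
	struct vma_area *va = calloc(mm->n_vmas, sizeof(*va));

	if (!va)
		return NULL;
	for (size_t i = 0; i < mm->n_vmas; i++)
		va[i].e = mm->vmas[i];	/* no memcpy: just keep the pointer */
	return va;
}
```

The MmEntry must then outlive every vma_area that points into it.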
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we check for PTRACE_PEEKSIGINFO, and if it's defined in a system
header, we assume that ptrace_peeksiginfo_args is defined there too.
But due to a bug in glibc, this check doesn't work. Now we have F20,
where ptrace_peeksiginfo_args is defined in sys/ptrace.h, and F21, where
it isn't defined.
commit 9341dde4d56ca71b61b47c8b87a06e6d5813ed0e
Author: Mike Frysinger <vapier@gentoo.org>
Date: Sun Jan 5 16:07:13 2014 -0500
ptrace.h: add __ prefix to ptrace_peeksiginfo_args
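One way to stop depending on what sys/ptrace.h provides is to carry a private copy of the kernel ABI struct under a name no libc uses. This sketch mirrors the layout from the kernel's include/uapi/linux/ptrace.h; the name criu_peeksiginfo_args is hypothetical and not necessarily what this patch introduces.

```c
#include <stdint.h>

#ifndef PTRACE_PEEKSIGINFO
#define PTRACE_PEEKSIGINFO 0x4209
#endif

/*
 * Private copy of the kernel's PTRACE_PEEKSIGINFO argument layout.
 * The struct name is hypothetical; only the layout must match the kernel.
 */
struct criu_peeksiginfo_args {
	uint64_t off;	/* index of the first siginfo to read */
	uint32_t flags;	/* e.g. 1 to read the shared (process-wide) queue */
	int32_t nr;	/* how many siginfos to read */
};
```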
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now we do it twice -- on shmem prepare and
on the restore itself. Collect it only once, as
we do for fdinfo-s: the root task reads everything in and
populates the tasks' rst_info with it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When parsing mappings in /proc, we fstat the VMA's file; later,
when dumping it, we stat it again to fill fd_parms.
The second stat is not required: we can keep the stat result in
vma_area.
This removes 35% of all stat calls on a dump of a basic container.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Each is_foo_link() helper readlinks the lfd to check. This makes
anon-inode dumping call readlink several times to find the proper
dump ops. Optimize this.
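A sketch of the "readlink once, then dispatch" pattern. The enum, the table and both helper names are hypothetical; the link strings are the ones Linux reports for these anon inodes in /proc/self/fd.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical anon-inode kinds; the project's real dump-ops tables differ. */
enum anon_kind { ANON_UNKNOWN, ANON_EVENTFD, ANON_EVENTPOLL, ANON_INOTIFY,
		 ANON_FANOTIFY, ANON_SIGNALFD };

/* Classify from a link target that was read only once. */
static enum anon_kind classify_anon_link(const char *link)
{
	static const struct { const char *name; enum anon_kind kind; } tbl[] = {
		{ "anon_inode:[eventfd]",   ANON_EVENTFD },
		{ "anon_inode:[eventpoll]", ANON_EVENTPOLL },
		{ "anon_inode:inotify",     ANON_INOTIFY },
		{ "anon_inode:[fanotify]",  ANON_FANOTIFY },
		{ "anon_inode:[signalfd]",  ANON_SIGNALFD },
	};

	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		if (strcmp(link, tbl[i].name) == 0)
			return tbl[i].kind;
	return ANON_UNKNOWN;
}

/* One readlink() per fd, instead of one per is_foo_link() check. */
static enum anon_kind classify_fd(int fd)
{
	char path[64], link[256];
	ssize_t n;

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	n = readlink(path, link, sizeof(link) - 1);
	if (n < 0)
		return ANON_UNKNOWN;
	link[n] = '\0';		/* readlink() does not NUL-terminate */
	return classify_anon_link(link);
}
```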
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Quite a lot of VMAs in tasks map the same file with different
perms. In that case we may skip opening all these files and instead
"borrow" one from the previously parsed VMA.
There's little sense in looking further back than the previous VMA,
as the same file is rarely (though it can be) mapped at different
locations.
After this, on a basic CentOS 6 container the number of opens and
stats in this function drops from ~1500 to ~500.
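A simplified sketch of borrowing from the previous VMA only. For brevity it compares paths, whereas real code would more likely compare the device/inode pair listed in /proc/PID/maps; struct vma_file and nr_stats are hypothetical. The stat result is kept in the record, so the dump stage need not stat again.

```c
#include <string.h>
#include <sys/stat.h>

/* Hypothetical, simplified VMA record: the mapped file and its stat. */
struct vma_file {
	const char *path;
	struct stat st;		/* kept so the dump stage need not stat again */
};

static int nr_stats;		/* counts real stat() calls, for illustration */

/*
 * Fill st for each VMA. If a VMA maps the same file as the VMA parsed
 * right before it, copy ("borrow") that stat instead of hitting the FS.
 */
static int fill_vma_stats(struct vma_file *v, int n)
{
	for (int i = 0; i < n; i++) {
		if (i > 0 && strcmp(v[i].path, v[i - 1].path) == 0) {
			v[i].st = v[i - 1].st;
			continue;
		}
		nr_stats++;
		if (stat(v[i].path, &v[i].st) < 0)
			return -1;
	}
	return 0;
}
```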
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's useful to know this value.
Without the cache (first pre-dump), on a minimal container the irmap
resolve time is ~0.2 sec. With the cache (subsequent pre-dumps or the
final dump), on the same container the irmap resolve time
is about 10 times lower.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dumping fsnotifies we may go to irmap to get the inode->path
mapping. The irmap engine scans the FS (in hinted locations) to
get one, and it is slow even though we scan only part of the FS.
Since this scanning is done while tasks are frozen, the
freeze time goes up :(
Improve the situation by generating an irmap cache in the working dir
at pre-dump, when tasks get unfrozen.
The on-disk irmap cache is a PB file; it sits in the -W directory
and can be loaded into memory at dump/pre-dump start. When
resolving the inode->path mapping, irmap may meet these entries,
revalidate them and potentially save time.
After pre-dump, the (re-)collected irmap data is written back
to the irmap cache image. Typically, the entries written back are the
same ones read in on cache load.
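A sketch of the revalidation step, assuming a cache entry is just (dev, ino, path) once loaded from the image. The struct and function names are hypothetical. A cached path is trusted only if stat() on it still yields the same device and inode; otherwise the caller falls back to the slow scan.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical in-memory form of one cache entry loaded from the image. */
struct irmap_entry {
	dev_t dev;
	ino_t ino;
	const char *path;
};

/* A cached path is trusted only if it still names the same inode. */
static int irmap_revalidate(const struct irmap_entry *e)
{
	struct stat st;

	if (stat(e->path, &st) < 0)
		return 0;	/* path is gone: stale entry */
	return st.st_dev == e->dev && st.st_ino == e->ino;
}

/* NULL means "not cached or stale": fall back to scanning the FS. */
static const char *irmap_cache_lookup(const struct irmap_entry *cache, size_t n,
				      dev_t dev, ino_t ino)
{
	for (size_t i = 0; i < n; i++)
		if (cache[i].dev == dev && cache[i].ino == ino &&
		    irmap_revalidate(&cache[i]))
			return cache[i].path;
	return NULL;
}
```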
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will generate some info about file descriptors at that
stage. For now the only pre-dumped ones are fsnotifies,
so the pre-dump of a single fd is kept as simple as
possible, yet sufficient for pre-dumping that type of FD.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Well, we want to pre-dump files (fsnotifies); for that we
will need mountinfo-s and root, and for the latter -- the
current ns mask.
The problem with the current ns mask is that its generation is
built into ns ID generation and dumping. Since
ID dumping is not performed on pre-dump, let's just
provide a separate helper for ns-mask generation.
Strictly speaking, the whole ns-mask idea is not great, but
it's to be fixed later.
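A sketch of a standalone ns-mask helper that needs no ns IDs. It assumes "differs from ours" is judged by comparing the inodes behind /proc/PID/ns/NAME and /proc/self/ns/NAME; gen_ns_mask() is a hypothetical name and the real helper may compute the mask differently.

```c
#include <linux/sched.h>	/* CLONE_NEW* flags without _GNU_SOURCE */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Set a bit for every namespace in which pid differs from us. */
static unsigned long gen_ns_mask(pid_t pid)
{
	static const struct { const char *name; unsigned long flag; } ns[] = {
		{ "mnt", CLONE_NEWNS },  { "uts", CLONE_NEWUTS },
		{ "ipc", CLONE_NEWIPC }, { "net", CLONE_NEWNET },
		{ "pid", CLONE_NEWPID },
	};
	unsigned long mask = 0;

	for (size_t i = 0; i < sizeof(ns) / sizeof(ns[0]); i++) {
		char path[64];
		struct stat theirs, ours;

		snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns[i].name);
		if (stat(path, &theirs) < 0)
			continue;	/* unsupported ns type: leave bit clear */
		snprintf(path, sizeof(path), "/proc/self/ns/%s", ns[i].name);
		if (stat(path, &ours) < 0)
			continue;
		if (theirs.st_ino != ours.st_ino || theirs.st_dev != ours.st_dev)
			mask |= ns[i].flag;
	}
	return mask;
}
```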
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The existing code opens "self" and parses what's in there;
just tweak the code a little to accept an arbitrary pid.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The service shouldn't call client-provided scripts, as that
creates a security issue (the client may be unprivileged,
while the service is privileged).
In order to let the caller do what it would normally do with
criu-scripts, make criu notify it about scripts. The caller
then does whatever it needs and responds back.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
RPC will start the page-server daemon and needs to get
control back to report to the caller, but glibc's
daemon() calls exit() in the parent, preventing that.
Thus, introduce our own daemonizing routine.
Strictly speaking, this is not a pure daemon() clone, as the
parent process has to exit by itself. But this is OK for now.
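A minimal sketch of a daemonizing routine that, unlike daemon(3), returns in the parent. The name daemonize_keep_parent() and its return convention (like fork(): child pid in the parent, 0 in the detached child, -1 on error) are assumptions for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Like daemon(3), but the parent is NOT terminated: it gets the child's
 * pid back and stays responsible for exiting when it is done.
 */
static pid_t daemonize_keep_parent(int nochdir, int noclose)
{
	pid_t pid = fork();

	if (pid != 0)
		return pid;		/* parent (or fork error): keep control */

	if (setsid() < 0)		/* child: new session, no controlling tty */
		_exit(1);
	if (!nochdir && chdir("/") < 0)
		_exit(1);
	if (!noclose) {
		int fd = open("/dev/null", O_RDWR);

		if (fd < 0)
			_exit(1);
		dup2(fd, STDIN_FILENO);
		dup2(fd, STDOUT_FILENO);
		dup2(fd, STDERR_FILENO);
		if (fd > STDERR_FILENO)
			close(fd);
	}
	return 0;
}
```

The parent can then report the child's pid back over RPC before exiting on its own.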
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If someone reads an untouched page, the kernel maps the zero page
at this address. This page will not have the SOFT_DIRTY bit set, and it must
not be dumped.
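A sketch of a per-page decision based on /proc/PID/pagemap bits (present = bit 63, swapped = bit 62, soft-dirty = bit 55, PFN = bits 0-54). It swaps in a PFN comparison against the zero page, assuming zero_pfn is already known; reading real PFNs needs privileges on modern kernels. should_dump_page() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define PME_PRESENT    (1ULL << 63)
#define PME_SWAP       (1ULL << 62)
#define PME_SOFT_DIRTY (1ULL << 55)
#define PME_PFN_MASK   ((1ULL << 55) - 1)

static bool should_dump_page(uint64_t pme, uint64_t zero_pfn, bool track_dirty)
{
	if (!(pme & (PME_PRESENT | PME_SWAP)))
		return false;		/* never touched: nothing to save */
	if ((pme & PME_PRESENT) && (pme & PME_PFN_MASK) == zero_pfn)
		return false;		/* only read: backed by the zero page */
	if (track_dirty && !(pme & PME_SOFT_DIRTY))
		return false;		/* unchanged since the last (pre-)dump */
	return true;
}
```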
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For inotify/fanotify we cannot always open inodes by a
handle and have to scan directories searching for the inode
path :(
Fortunately, in most containers fsnotifies are
put in "typical" places. These are used as hints for the scanner.
The best way to go is to use OpenVZ's ploop over such filesystems.
The long-term solution is to fix NFS to support opening by handle.
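A sketch of scanning one hinted directory for an inode's path. scan_for_inode() is hypothetical and the hint list itself is left to the caller. The scan descends at most max_depth levels and only into directories on the same device, since the inode cannot live anywhere else.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 and fills out with the path on success, -1 if not found. */
static int scan_for_inode(const char *dir, dev_t dev, ino_t ino, int max_depth,
			  char *out, size_t outlen)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	int ret = -1;

	if (!d)
		return -1;
	while (ret < 0 && (de = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (snprintf(path, sizeof(path), "%s/%s", dir, de->d_name) >= (int)sizeof(path))
			continue;	/* path too long: skip */
		if (lstat(path, &st) < 0)
			continue;
		if (st.st_dev == dev && st.st_ino == ino) {
			snprintf(out, outlen, "%s", path);
			ret = 0;
		} else if (S_ISDIR(st.st_mode) && st.st_dev == dev && max_depth > 0) {
			ret = scan_for_inode(path, dev, ino, max_depth - 1, out, outlen);
		}
	}
	closedir(d);
	return ret;
}
```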
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need to special-case NFS silly-rename files, so
we need to know which FS a file belongs to.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The -F|--fields option specifies which fields (by name, comma
separated) should be printed.
For nested fields, all names in the path should be specified.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we meet a link we cannot dump, we call plugins to check
whether it's a link that should be treated as external.
Note that on restore we don't call any plugins, but
rely on the setup-namespace script to move the respective
link into the namespace. Links are not hierarchical and
can be moved between namespaces easily, so it's OK to
delegate the link creation to the script.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All the entries with with_plugin set will be mounted by a plugin.
The interesting case is when we do the pivot-root restore. In this
case we call the restore callback very early (before we unmount the old
tree) and ask it to create the mountpoint at a temporary location.
Later we move the mount to the proper place.
The old_root argument of the callback is where it can find files
in the original mount namespace.
The is_file argument is an output argument. Since files and directories cannot be
bind-mounted onto each other, the callback should create the mountpoint
itself and report whether it created a file or a directory.
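A sketch of the mountpoint-creation part of such a callback: create an empty file or directory matching the source type and report which one via is_file. make_mountpoint() and its signature are hypothetical; in the real case src would be reached through old_root.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

static int make_mountpoint(const char *src, const char *dst, bool *is_file)
{
	struct stat st;

	if (stat(src, &st) < 0)
		return -1;

	if (S_ISDIR(st.st_mode)) {
		/* a directory can only be bind-mounted onto a directory */
		if (mkdir(dst, 0700) < 0 && errno != EEXIST)
			return -1;
		*is_file = false;
	} else {
		/* anything else needs a (possibly empty) file as target */
		int fd = open(dst, O_WRONLY | O_CREAT, 0600);

		if (fd < 0)
			return -1;
		close(fd);
		*is_file = true;
	}
	return 0;
}
```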
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
External bind mounts are those with source sitting outside of the
current FS view. Such are detected in validate_mounts(), so we
just go ahead and call plugins.
The plugin is provided with the mountpoint, to decide whether it's
his or not (what else does the guy need?), and an ID with which it
can identify the mountpoint in /proc. The same ID will be used at
restore time to find the needed restore info.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In file included from arch/x86/crtools.c:11:0:
include/ptrace.h:16:0: error: "PTRACE_LISTEN" redefined [-Werror]
#define PTRACE_LISTEN 0x4208
^
In file included from include/ptrace.h:5:0,
from arch/x86/crtools.c:11:
/usr/include/sys/ptrace.h:150:0: note: this is the location of the previous definition
#define PTRACE_LISTEN PTRACE_LISTEN
^
cc1: all warnings being treated as errors
make[1]: *** [arch/x86/crtools.o] Error 1
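The usual cure for this kind of redefinition is to provide the fallback only when the system header has not. This works because glibc pairs its enum constant with a self-referencing #define; the SEIZE/INTERRUPT guards are added here for symmetry.

```c
#include <sys/ptrace.h>

/* Fall back to the kernel values only if sys/ptrace.h lacks them. */
#ifndef PTRACE_SEIZE
#define PTRACE_SEIZE 0x4206
#endif
#ifndef PTRACE_INTERRUPT
#define PTRACE_INTERRUPT 0x4207
#endif
#ifndef PTRACE_LISTEN
#define PTRACE_LISTEN 0x4208
#endif
```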
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's a feature of PTRACE_SEIZE. So we need to do something only
if we want to change the state.
[xemul: If a task _was_ in stopped state before dump and we want it
to stay alive after dump, the existing code queues one more STOP
to it. This affects the subsequent dump, as we seize a stopped task
with a STOP in queue.
One more item in TODO list -- support stopped tasks with STOP in
queue :)
]
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Before this patch, the backslash was at the 81st column, which makes the text
take twice as many lines on a standard 80-column terminal. This is quite annoying.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There is nothing interesting here. If a file can't be dumped by criu,
plugins are called. If one of plugins knows how to dump the file,
the file entry is marked as need_callback. On restore if we see
this mark, we execute plugins for restoring the file.
v2: Callbacks are called for all files, which are not supported by CRIU.
v3: Call plugins for a file instead of file descriptor. A few file
descriptors can be associated with one file.
v4: A file descriptor is opened in a callback. It's required for
restoring anon vmas.
v5: Add a separate type for unsupported files
v6: define FD_TYPES__UNSUPP
v7: s/unsupp/ext (external)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We don't know the state behind an external socket. It depends on the logic
of the program that handles this socket.
This patch adds the ability to load a library with callbacks for dumping
and restoring external sockets.
This patch introduces two callbacks cr_plugin_dump_unix_sk and
cr_plugin_restore_unix_sk. If a callback can not handle a socket, it
must return -ENOTSUP.
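A sketch of how the -ENOTSUP convention lets plugins be chained. The callback argument list (fd, id) is an assumption, since the text leaves open what should be passed; run_dump_unix_sk() and the two toy plugins are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type; the real arguments are still undecided. */
typedef int (*dump_unix_sk_t)(int fd, int id);

/*
 * Ask each plugin in turn. -ENOTSUP means "not mine, try the next one";
 * any other value (success or a real error) ends the search.
 */
static int run_dump_unix_sk(dump_unix_sk_t *cbs, size_t n, int fd, int id)
{
	for (size_t i = 0; i < n; i++) {
		int ret = cbs[i](fd, id);

		if (ret != -ENOTSUP)
			return ret;
	}
	return -ENOTSUP;	/* nobody knows this socket */
}

/* Toy plugins for illustration only. */
static int plugin_not_mine(int fd, int id) { (void)fd; (void)id; return -ENOTSUP; }
static int plugin_handles_7(int fd, int id) { (void)fd; return id == 7 ? 0 : -ENOTSUP; }
```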
The main question is what kind of information should be transferred in
these callbacks. Please think about that for a few minutes and send me
your opinion.
v2: Use uflags instead of adding a new field
v3: clean up
v4: Unsuitable callbacks return -ENOTSUP.
v5: set USK_CALLBACK, if a socket was dumped by callback.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Libraries (plugins) are going to be used for dumping and restoring
external dependencies (e.g. dbus, systemd journal sockets, character
devices, etc.)
A plugin can have the cr_plugin_init() and cr_plugin_fini() functions for
initialization and deinitialization.
criu-plugin.h contains everything that can be used in plugins.
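A sketch of loading one plugin with dlopen(3) and calling its optional hooks. The symbol names come from the text; their signatures here (int (void) and void (void)) and the helper names are assumptions.

```c
#include <dlfcn.h>
#include <stdio.h>

static void *load_plugin(const char *path)
{
	void *h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	int (*init)(void);

	if (!h) {
		fprintf(stderr, "Can't load %s: %s\n", path, dlerror());
		return NULL;
	}
	/* POSIX-recommended way to convert dlsym()'s result to a function pointer */
	*(void **)(&init) = dlsym(h, "cr_plugin_init");
	if (init && init() < 0) {	/* the init hook is optional */
		dlclose(h);
		return NULL;
	}
	return h;
}

static void unload_plugin(void *h)
{
	void (*fini)(void);

	*(void **)(&fini) = dlsym(h, "cr_plugin_fini");
	if (fini)			/* the fini hook is optional too */
		fini();
	dlclose(h);
}
```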
v2: rename lib to plugin
v3: add a default value for a plugin path.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
old snapshot from "parent" symlink, and pids from pagemap-PID.img files
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Constants such as CR_MAX_MSG_SIZE and CR_DEFAULT_SERVICE_ADDRESS need to be used in both the service and the lib.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When criu check is run as non-root, a lot of information is printed
to the user, with the only missing bit being that it should be run as root.
Fix it.
I still don't like the fact that some other stuff is printed here,
like the timestamp and the __FILE__:__LINE__, but this should be
fixed separately.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a mess of uintX_t and uX usage. Drop the uintX_t ones.
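For reference, the short names are plain aliases of the fixed-width types. This is a sketch; the project's own header may define them differently.

```c
#include <stdint.h>

/* Short fixed-width names, used instead of mixing in uintX_t. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;
typedef int8_t   s8;
typedef int16_t  s16;
typedef int32_t  s32;
typedef int64_t  s64;
```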
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Remove space before tab characters.
Found by git grep ' ' (Space, Ctrl-V, Tab in shell).
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>