When live migrating a container with large amount of processes
inside the time to do page-server-ed dump may be up to 10 times
slower than for the local dump.
The delay is always introduced in the open_page_server_xfer()
when criu negotiates the has_parent bit on the 2nd task. This
likely happens because of the Nagel algo taking place -- after
the write() of the OPEN2 command happened kernel delays this
command sending waiting for more data.
v2:
Fix this by turning on CORK option on memory transfer sockets
on send side, and NODELAY one once on urgent data. Receive
side is always NODELAY-ed. According to Alexey Kuznetsov this
is the best mode ever for such type of transfers.
v3:
Push packets in pre-dump's check_parent_server_xfer too.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Conflicts:
include/util.h
util.c
It's required for dumping tmpfs, where we use tar to save content.
If we need to execute tar from a proper userns to get right uid-s and
gid-s for files.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
ispathsub("/foo", "/") reports false. This is a corner case,
as 2nd argument is not expected to end with /. Fix this and
add comment about ispathsub() arguments assumptions.
Reported-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we validate the mount tree not to have overmounts we need to
check one path to be the sub-path of another. Here's a helper for
this.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
This commit is in preparation for the (hopefully last :) restore special cpuset
patch.
Previously, we installed the cgroup service fd after calling
prepare_cgroup_dirs, which meant that we had to carry around the temporary
directory name in order to put things in the right place. The
restore_cgroup_prop function uses the cg service fd instead of carrying around
the full path. This means that we can't sue restore_cgroup_prop, without first
sanitizing the path. Instead, we install the service fd before calling
prepare_cgroup_dirs, and all the code just references that instead of carrying
around the temporary path.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is to simplify the change from int fd to more
generic image class data-type.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
When service starts page server all the preparations (log, wdir, img dir, etc.)
happen in parent task, then we fork page server.
This is OK for now, but when we will serve several requests per connection, all
these resources would be leaked in parent.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can't dump netlink socket, inotify, fanotify, if they have queued
data, so lets add a function to chech this.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Before we would not detect the mount point for co-mounted controllers. Things
still worked because we'd just re-mount them ourselves and traverse our own
mount point, but this saves an extra mount().
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
During the dump phase, /proc/cgroups is parsed to find co-mounted cgroups.
Then, for each task /proc/self/cgroup is parsed for the cgroups that it is a
member of, and that cgroup is traversed to find any child cgroups which may
also need restoring. Any cgroups not currently mounted will be temporarily
mounted and traversed. All of this information is persisted along with the
original cg_sets, which indicate which cgroups a task is a member of.
On restore, an initial phase creates all the cgroups which were saved. Tasks
are then restored into these cgroups via cg_sets as usual.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a set of routines that open /proc/$pid files via proc service
descriptor. Teach them to accept non-pids as pids to open /proc/self/*
and /proc/* files via the same engine.
Signed-f-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's possible that a procfs mounted somewhere other than /proc
is in use.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The is_foo_link readlinks the lfd to check. This makes
anon-inodes dumping readlink several times to find proper
dump ops. Optimize this thing.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
RPC will start page-server daemon and needs to get the
controll back to report back to caller, but the glibc's
daemon() does exit() in parent context preventing it.
Thus -- introduce own daemonizing routine.
Strictly speaking, this is not pure daemon() clone, as the
parent process has to exit himself. But this is OK for now.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case criu check is run as non-root, a lot of information is printed
to a user, with the only missing bit is it should run it as root.
Fix it.
I still don't like the fact that some other stuff is printed here,
like the timestamp and the __FILE__:__LINE__, but this should be
fixed separately.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Both fopen_proc() and opendir_proc() are calling open_proc().
If open_proc() fails, it prints an error message. Before this
patch, a second error message was printed due to missing brackets.
This second message is a bit more specific (it tells the exact
file/dir we failed to open) but it is redundant, because more
generic error was already printed by open_proc(). It is also
can be misleading because for the second message we reuse
the same errno while we should not.
So, let's remove this second error message print by using brackets.
Alternatively, we could leave this as is (just fixing indentation)
and let two errors be printed -- there is nothing wrong with that,
but I think in this case less messages is better.
This is a fix to commit 5661d80.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We really have a mess of extern/non-extern declaration
of functions in our headers. Always use extern for
unification purpose.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Depending on BITS_PER_LONG userspace representation of dev_t
may vary, so we need to choose proper encoding.
Signed-off-by: Igor Sukhih <igor@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There's ... a number of places where we want to do something
with /proc/self/fd/%d path. Each time we guess buffer size
that is enough for this. Make standard constant for this and
save some space on stack and drop args for some functions.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When a kernel didn't show vma flags, we set MAP_GROWSDOWN for stack
vmas, but it's not reliable. E.g. thread stacks are mapped without
MAP_GROWSDOWN.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of bloating util.h lets move ERR_
helpers to own err.h header. This allow
to reuse it where needed without util.h
inclusion.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need futexes to use in PIE code but futex.h
uses BUG_ON helper, so to diet inclusions move BUG_ONs
code to include/bug.h.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No need to walk up the directories if we need
to include protobuf file. This was always a bad
use of ability to walk the filesystem from other
headers.
Same time we don't need -I$(SRC_DIR)/protobuf/
in general makefile anymore.
[xemul: Small fixlet in head Makefile, since patch
it out-of-order]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Oracle has such mappings.
v2: add check, that a file is a character device
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* The following files goes into the directory arch/x86/include/asm unmodified:
- include/atomic.h,
- include/linkage.h,
- include/memcpy_64.h,
- include/types.h,
- include/bitops.h,
- pie/parasite-head-x86-64.S,
- include/processor-flags.h,
- include/syscall-x86-64.def.
* Changed include directives in the source files that include the headers
listed above.
* Modified build scripts to reflect the source moves.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make them look like __CR_<smth>_H__ with
sed -e '1,2s/#\(ifndef\|define\) _\?_\?\(CR_\)\?/#\1 __CR_/' -e '1,2s/_H_\?_\?.*$/_H__/'
on every header file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Remove the restorer-log and link log-simple into restorer
blob. Now we can use the normal pr_foo API.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For executing an external tools we need to block a SIGCHLD
and to juggle file descriptors.
SIGCHLD is blocked for getting an exit code.
A problem with file descriptors can be if we want to set 2 to STDIN,
1 to STDERR, 0 to STDOUT for example.
v2: use helpers reopen_fd_as and move_img_fd
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When a pidns is dumped, crtools uses procfs from this pidns
for getting information about zombies.
v2: Use close_proc() instead of restore_proc_fd().
A current proc will be opened again, if it's required.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
/proc/PID/maps can contains not up to date information about a stack vma.
A kernel marks a VMA as stack, if thread_struct->usersp is in it,
but usersp is updated, when a process calls a syscall.
This problem is occured, when we try to dump/restore a process in a loop.
When a restorer resumes a process, a restorer vma will be marked as stack.
A thread stack should not be marked as stack, because its vma is mapped
w/o MAP_GROWSDOWN.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make restorer-log.c to be more similar to general log code.
- Use current_logfd/current_loglevel variable names
- Add ability to filter out messages which log level
is not requested
For example, if a program get restorer without any log
level specified -- we restore it silently not printing
even a single message (if no error happened of course).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This one allocates a memory, that will be shared (MAP_SHARED)
between all the subsequent children. This can now be just used
for fd resote, later it will be required at inet socket port
binding.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>