Actually rt_sigset_t and k_rtsigset_t are the same
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently shmem generates page images in parallel
with page server and IDs may intersect. Fix this by
making page server create larger IDs.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The page server is a process, that is about to get pages over
the network and put them into pagemap- + pages- images. Right
now what it does is simply get the data and puts it into the
image files. When we have dirty set tracking in the kernel the
page server will have to collect "page changes" and properly
integrate them into images.
Running crtools with page server is like this:
dst_node# crtools page-server --port <port> -D dump/ ...
src_node# crtools dump -t <pid> --page-server --address <dst_node> --port <port> -D dump/ ...
After this images from dst_node/dump/ and src_node/dump/ should
be put into one place and tasks can be restored out of it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll send them over network soon, so prepare abstraction layer for
this. Shmem is not on this scheme yet.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since now we drain pages out of parasite, we can invent any format for
page dumps. Let is be ... prorobuf one! :)
Another thing to keep in mind, is that we're about to use splices and
implement iterative migration, so it's better to have actual pages be
page-aligned in the image.
And -- backward compatibility. That said the new format is:
1. pagemap-... file which contains a header (currently with a ID of
the image with pages, see below) and an array of <nr_pages:vaddr>
pairs. The first value means "how many pages to take from the
file with pages (see below)" and the second -- where in the task
address space to put them. Simple.
2. pages-... file which containes only pages one by one (thus aligned
as we want).
This patch breaks backward compatibility (old images with pages wil
be restored and then crash). Need to do it before v0.5 release.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows to reuse magic numbers outside of crtools code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we dump pages directly from parasite into image files. This
is bad for several reasons:
1. We cannot use any more-or-less custom format for pages easily, since
parasite code cannot be linked with any libraries;
2. We will not be able to optimize migration with preliminary memory
migration (a.k.a. iterative migration) with it -- if we send pages
from parasite over network we are not able to let the task we dump
continue running.
That said, what is done is -- pages from target task are put into a
page-pipe in one go, then (not in this patch) parasite can be released
and we can do with pages whatever we want. For now pages are just
spliced from pipe into image file.
Some numbers:
In order to drain 1Gb of memory from task we need 1.5M of shared map
in args (for iovecs) and 4 pipes (8 descriptors) each referencing 128Mb
of pages, which int turn requires 4 x 640K chunks of sequential kernel
memory (for pipe_buffer). Not that big I guess.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The page-pipe is an object, that can accumulate pages inside it. It
consists of list of page-pipe-bufs, which in turn has a pipa, an
array of iovecs that describe the pages' locations and some stats.
Users of it are supposed to vmsplice pages into pipes to accumulate
then for later use, and vmsplice them from pipes when required.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I will have to push some sort of map of pages to dump into parasite.
For this, I need to have estimation of how much memory I'd need for
than in parasite args. These two values will help with it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now when we collect list of vmas we need to know the
number of elements in it. In the future I will need to know
more, so it makes sense to create a vmas-list object for it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Just make use of previous patch. The creds dumping args are tuned to
fit one page (minimal static args size).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Sometimes we don't know the exact amount of data we would want
to send to parasite via args area (e.g. -- while draining fds).
Fix this, by moving the args area behind the parasite blob and
mmap-ing it with the run-time calculated size.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need futexes to use in PIE code but futex.h
uses BUG_ON helper, so to diet inclusions move BUG_ONs
code to include/bug.h.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This facrots out common core members freeing into pstree.c
helper. Per-arch freeing helpers are now symmetrical to the
allocating ones.
This is a merge of two Cyrill's patches.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The FPU data is quite CPU-type oriented,
thus move it to asm/fpu.h.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
They are really depends on CPU we're running on.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No need to walk up the directories if we need
to include protobuf file. This was always a bad
use of ability to walk the filesystem from other
headers.
Same time we don't need -I$(SRC_DIR)/protobuf/
in general makefile anymore.
[xemul: Small fixlet in head Makefile, since patch
it out-of-order]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's been reported that we do not test if the tracee is 32bit
task running kn 64bit kernel. This patch adds such test.
https://bugzilla.openvz.org/show_bug.cgi?id=2505
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows us to have a unique place where version lives
and what is more important other subsystems (such as docs
generation) may reuse version definitions.
At moment EXTRA (which corresponds kernels -rc tag) and
NAME is not yet used, but I desided to put them in place
for future needs.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we've read all pstree-items and their ids we
can get the desired clone-flags early and avoid all
these dances with flag calculations in fork_with_pid
and company.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's no longer required to use this option -- two currently
supported cases (tasks on host and tasks in containers) can
be detected automatically. Keep this option for future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Introduce the current_ns_mask variable, that collects info about
which namespaces tasks being dumped and to be restored live in.
For simlicity all tasks are supposed to live in one set of spaces.
This should be fixed eventually.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The recent kernels allow to get namespaces IDs by reading proc-ns links.
Use this to generate IDs for tasks' namespaces (I do generate them, since
IDs provided by kernel look ugly :( ).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Oracle has such mappings.
v2: add check, that a file is a character device
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
According to the file lock information from the image, we recall
flock or fcntl with proper parameters, so we can rehold the file
locks as we were dumped.
We only support flock and posix file locks so far.
Changelog since the initial version:
a. Use prepare_file_locks instead of restore function directly.
b. Fix some bugs.
Originally-signed-off-by: Zheng Gu <cengku.gu@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Dump file locks' necessary entries to the image, we only support flock and
posix file lock right now.
Changelog since the initial version:
We got file lock info from global list, so the dump_task_file_locks
can be much simpler.
Originally-signed-off-by: Zheng Gu <cengku.gu@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We collect all file locks to a golbal list, so we can use them easily
in dump_one_task. For optimizaton, we only collect file locks hold by
tasks in the pstree.
Thanks to the ptrace-seize machanism, we can aviod the blocked file lock
issue, makes the work simpler.
Right now, the check handles only one situation:
-- Dumping tasks with file locks hold without the -l option.
This covers for the most part. But we still need some more work to make
it perfect robust in the future.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Sometimes, when we parse some global info files, we can only care about
tasks which are taken into dump(such as file locks), which means their
pids are in the pstree.
So a function like this would be help.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We need a new protobuf description for file-lock.
Originally-signed-off-by: Zheng Gu <cengku.gu@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add --file-locks/-l option to support handling file locks, for safety,
only used for container.
Originally-signed-off-by: Zheng Gu <cengku.gu@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The layout of the struct rlimit depends on the value
of the macro FILE_OFFSET_BITS. If FILE_OFFSET_BITS is 64
the userspace and kernel definitions becomes incoherent
on a 32-bit platform.
The struct krlimit representing the kernel version of
the struct rlimit is introduced to address the issue:
the function restore_rlims() is fixed to convert between
the userspace and kernel representations of the struct.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These are structs that (now) tie together ns string
and the CLONE_ flag. It's nice to have one (some code
becomes simpler) and will help us with auto-namespaces
detection.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>