The current code makes no distinction between OPT and no-OPT
except for whether a message is printed in open_image().
So this particular change changes nothing but the availability of
this message.
In the next patches I will introduce "empty images" to deal with
the ENOENT situation in a more graceful manner.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we are doing pre-dump, we splice pages into pipes and only then open
images and dump pages. But when we are splicing pages, we need to know
about the existence of parent images. This patch adds a new call to determine
the existence of parent images.
In addition this patch fixes the following issue:
CID 83244 (#1 of 1): Uninitialized pointer read (UNINIT)
14. uninit_use: Using uninitialized value xfer.parent.
v2: initialize the unused field of struct page_server_iov, because it is
sent over the network.
CID 83451 (#1 of 1): Uninitialized scalar variable (UNINIT)
2. uninit_use_in_call: Using uninitialized value pi. Field pi.nr_pages
is uninitialized when calling write.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plain file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.
This will also help if (when?) we want to do more
complex changes with images, e.g. store them all in one
file or send them directly to the network.
For now the cr_img just contains one int _fd variable.
This patch changes the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one,
and fixes the compilation of the rest of the code :)
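
A minimal sketch of the wrapper idea, assuming only what is stated above
(struct cr_img holding a single int _fd). The helper names ending in
_sketch are hypothetical and are not the functions this patch adds:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Per the commit text, the image object holds one int _fd for now. */
struct cr_img {
	int _fd;
};

/* Hypothetical helper: wrap open(2) and hand back an image object. */
static struct cr_img *img_open_sketch(const char *path, int flags)
{
	struct cr_img *img;

	assert(path != NULL);

	img = malloc(sizeof(*img));
	if (!img)
		return NULL;

	img->_fd = open(path, flags, 0600);
	if (img->_fd < 0) {
		free(img);
		return NULL;
	}
	return img;
}

/* Hypothetical accessor: callers stop touching the raw fd directly. */
static int img_raw_fd_sketch(const struct cr_img *img)
{
	return img->_fd;
}

/* Hypothetical helper: release both the descriptor and the object. */
static void img_close_sketch(struct cr_img *img)
{
	if (!img)
		return;
	close(img->_fd);
	free(img);
}
```

Once every caller goes through such an object, buffering can later be
added inside it without touching the callers again.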
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
There will be no int-fd soon, so this is one more preparation
for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
We have some fields that are dump-only and some that
are restore-only (quite a lot of them, actually).
Reshuffle them in the vma_area to explicitly show which
one is which. And rename some of them for easier grep.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The vvar zone is mapped by the kernel and must never
be dumped into the image; the data present there is
valid on the running kernel only.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the worst case we need one IOV for every page we're transferring
from the parasite, thus don't divide by two here.
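
To illustrate why the worst case is one IOV per page, here is a small
sketch that merges adjacent pages into a single iovec. It assumes a
4096-byte page and a sorted list of page addresses; the function name
and the SKETCH_PAGE_SIZE constant are hypothetical, not CRIU code:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */

/*
 * Build iovecs from a sorted list of page addresses, merging pages that
 * are adjacent in memory. Returns the number of iovecs used, which can
 * be as large as nr_pages when no two pages are neighbours.
 */
static size_t build_iovs_sketch(const unsigned long *pages, size_t nr_pages,
				struct iovec *iov)
{
	size_t n = 0;

	assert(pages != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		if (n > 0) {
			unsigned long end = (unsigned long)iov[n - 1].iov_base +
					    iov[n - 1].iov_len;

			/* The page continues the previous iovec: extend it. */
			if (end == pages[i]) {
				iov[n - 1].iov_len += SKETCH_PAGE_SIZE;
				continue;
			}
		}

		/* Otherwise a new iovec is needed for this page. */
		iov[n].iov_base = (void *)pages[i];
		iov[n].iov_len = SKETCH_PAGE_SIZE;
		n++;
	}
	return n;
}
```

Three adjacent pages collapse into one iovec, while three pages with
gaps between them need three iovecs, so the array must be sized for
nr_pages entries.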
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There was a bug: if, e.g., iterative snapshots are made and a new
process is created in the process tree between two of them, it can
have pages which are not dirty, and we won't save them into the
image. But there is no parent image for it.
Pages which are not soft-dirty appear if a process with some pages
in the non-dirty state forks: the child will inherit those PTEs,
and if the child doesn't write to those pages, they will still be in
the non-soft-dirty state when the next dump comes.
Also, this bug was not caught because of an error in zdtm, see 3/3.
v2: simplify, add more justification in commit message.
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Over time some files become obsolete and might be missing
from the checkpoint image set, but to keep backward compatibility we
still try to open them, which might print an error like
| Unable to open 'path-to-file'
and confuse a reader as to why criu prints an error but continues working.
To eliminate this problem the O_OPT flag was introduced in
commit 16b5692061e2; it suppresses error message printing
if the flag is set.
Now start using O_OPT in the following functions:
- open_irmap_cache: the irmap cache is a relatively new optional feature
- prepare_rlimits, open_signal_image, restore_file_locks,
prepare_fd_pid, prepare_mm_pid, collect_image: all these
helpers try to open image files which can be missing.
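
A minimal sketch of the behaviour described above, assuming the flag
only silences the message for a missing file. SKETCH_O_OPT, its value,
open_image_sketch() and the error counter are hypothetical and do not
reproduce the real open_image() signature:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_O_OPT 0x1u /* hypothetical stand-in for O_OPT */

/* Counts printed errors so the behaviour is easy to observe. */
static int sketch_errors_printed;

static int open_image_sketch(const char *path, unsigned int img_flags)
{
	int fd, err;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd >= 0)
		return fd;
	err = errno;

	/* A missing optional image is expected: stay quiet. */
	if (err == ENOENT && (img_flags & SKETCH_O_OPT))
		return -1;

	fprintf(stderr, "Unable to open %s: %s\n", path, strerror(err));
	sketch_errors_printed++;
	return -1;
}
```

The caller still gets a failure either way and has to handle the absent
image; only the log noise differs.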
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The offset should point to the next entry.
Cc: Pavel Tikhomirov <snorcht@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This function splices data from a process to criu,
so dump_pages describes the real meaning.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Before this patch, criu spliced all data into pipes and then saved this
data in an image file. The problem is that creating pipes with big
buffers fails too often, because the kernel tries to allocate big linear
chunks of memory. Now memory is dumped in several iterations, with the
size of the pipe buffers restricted.
TODO: need to rework pre-dump, because currently dumping data from
pipes is postponed. We are going to use sys_process_vm_readv for
this.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The problem is that vmsplice() to a big pipe fails very often.
The kernel allocates a linear chunk of memory for pipe buffer
descriptors, but a big allocation in the kernel can fail.
So we need to restrict the maximal capacity of pipes. But the number of
pipes is restricted too, so we need to split memory dumping into chunks.
In this patch we calculate the pipe size for which vmsplice() will not
fail.
v2: s/batch/chunk and a few other small fixes
v3: Remove callbacks from page_pipes and reuse pipes
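
The splitting itself can be sketched as plain arithmetic. This assumes
a per-pipe page limit is already known; how that limit is computed is
not shown here, and both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* How many pipe-sized chunks are needed to dump nr_pages pages? */
static size_t nr_chunks_sketch(size_t nr_pages, size_t max_pages_per_pipe)
{
	assert(max_pages_per_pipe > 0);
	return (nr_pages + max_pages_per_pipe - 1) / max_pages_per_pipe;
}

/* Pages covered by chunk number idx (the last chunk may be shorter). */
static size_t chunk_len_sketch(size_t nr_pages, size_t max_pages_per_pipe,
			       size_t idx)
{
	size_t start;

	assert(max_pages_per_pipe > 0);

	start = idx * max_pages_per_pipe;
	if (start >= nr_pages)
		return 0;
	if (nr_pages - start < max_pages_per_pipe)
		return nr_pages - start;
	return max_pages_per_pipe;
}
```

With 10 pages and a limit of 4 pages per pipe this gives three chunks
of 4, 4 and 2 pages, and the same pipes can be reused for each chunk.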
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We do it first -- on collect, second -- on restore. The
2nd lookup is excessive, we can put the fd pointer on vm_area
at lookup and reuse it later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a file that is mmapped or pointed to by the exe link is unlinked,
we will generate a ghost file for it. On restore the ghost file will
be created with the users counter set to 1 and the very first open
(e.g. for mmap) will unlink the file.
Handle this by bumping up the users counter for every mapping
pointing to the file.
This appeared after previous patches that packed the reg-files
image. Before it each vma and exe link created a separate entry
in the reg-files image.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When writing VMAs we perform too many small writes into vma-.img files.
This can be easily fixed by moving the vma-s into mm-s, all the more
so since they cannot be split from each other.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we will read all VmaEntries in one big MmEntry object,
so to avoid copying them all into vma_areas, make them pointable.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now we do it two times -- on shmem prepare and
on the restore itself. Make collection only once as
we do for fdinfo-s -- root task reads all stuff in and
populates tasks' rst_info with it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When reading pagemaps, we read them from a specific position. To
do that, we called lseek, then read. Fortunately, there's a
syscall that does both things in one call -- pread. Since
we don't need to keep the pagemap's position for further reads,
it perfectly suits our needs.
This removes 75% of lseek calls when dumping basic container.
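
A minimal sketch of the two access patterns, using only standard
lseek(2), read(2) and pread(2). The function names are hypothetical,
and the demo helper works on a temporary file rather than on a real
pagemap:

```c
#define _XOPEN_SOURCE 700 /* for pread() and mkstemp() */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Two syscalls, and the file offset is moved as a side effect. */
static ssize_t read_at_lseek_sketch(int fd, void *buf, size_t len, off_t off)
{
	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		return -1;
	return read(fd, buf, len);
}

/* One syscall, and the file offset is left untouched. */
static ssize_t read_at_pread_sketch(int fd, void *buf, size_t len, off_t off)
{
	return pread(fd, buf, len, off);
}

/* Returns 0 if both variants read the same bytes from a scratch file. */
static int pread_demo_sketch(void)
{
	char path[] = "/tmp/pread-sketch-XXXXXX";
	char a[3], b[3];
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);

	if (write(fd, "0123456789", 10) != 10)
		goto out;
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read_at_pread_sketch(fd, a, 3, 4) != 3)
		goto out;
	if (lseek(fd, 0, SEEK_CUR) != 0)	/* offset did not move */
		goto out;
	if (read_at_lseek_sketch(fd, b, 3, 4) != 3)
		goto out;
	if (lseek(fd, 0, SEEK_CUR) != 7)	/* offset moved to 4 + 3 */
		goto out;
	if (memcmp(a, "456", 3) == 0 && memcmp(b, "456", 3) == 0)
		ret = 0;
out:
	close(fd);
	return ret;
}
```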
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If someone reads an untouched page, the kernel maps the zero page
at this address. This page will not have the SOFT_DIRTY bit set and it
must not be dumped.
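
A rough sketch of such a check on a /proc/<pid>/pagemap entry. The bit
positions follow the kernel's pagemap documentation (bit 63 present,
bit 62 swapped, bits 0-54 PFN when present). The function name is
hypothetical, and how the zero page PFN is obtained is assumed to be
solved elsewhere:

```c
#include <stdbool.h>
#include <stdint.h>

#define PME_PRESENT_SKETCH	(1ULL << 63)
#define PME_SWAP_SKETCH		(1ULL << 62)
#define PME_PFRAME_MASK_SKETCH	((1ULL << 55) - 1)

/* Does this pagemap entry describe a page whose contents must be saved? */
static bool page_has_data_sketch(uint64_t pme, uint64_t zero_page_pfn)
{
	/* Swapped-out pages carry data even though they are not present. */
	if (pme & PME_SWAP_SKETCH)
		return true;

	/* Never touched: nothing to save. */
	if (!(pme & PME_PRESENT_SKETCH))
		return false;

	/* Present, but only because a read mapped the shared zero page. */
	return (pme & PME_PFRAME_MASK_SKETCH) != zero_page_pfn;
}
```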
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When writing pagemaps to the page-server we want both page-*.img and
pagemap-*.img to be on the remote host. But a subsequent pre-dump/dump with
parent images will try to access the pagemap-*.img files to check whether a
dumped hole is present in them.
To handle this, move the checks for a hole being backed by something in the
parent into page-local-xfer. In the case of a page-server dump it is the
page-server that will check the holes when writing them.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a big mistake in how we check whether ptes are SOFT_DIRTY -- there
is no need for these checks if --track-mem is not given.
While fixing this, remember proper checks for the kernel memory tracker.
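
The intended gating can be sketched as follows, assuming the soft-dirty
flag is bit 55 of a pagemap entry (as in the kernel's pagemap
documentation). The function name and the track_mem parameter are
hypothetical stand-ins for however the --track-mem option is stored:

```c
#include <stdbool.h>
#include <stdint.h>

#define PME_SOFT_DIRTY_SKETCH (1ULL << 55)

/* Must this page be written out, given the memory tracking mode? */
static bool page_needs_write_sketch(uint64_t pme, bool track_mem)
{
	/* Without memory tracking the soft-dirty bit must be ignored. */
	if (!track_mem)
		return true;

	/* With tracking on, only pages changed since the last dump count. */
	return (pme & PME_SOFT_DIRTY_SKETCH) != 0;
}
```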
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Tim Schürmann <info@tim-schuermann.de>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Remove whitespace at EOL (found by git grep ' $')
To people using vim, I'd suggest adding the following code to ~/.vimrc:
let c_space_errors = 1
highlight FormatError ctermbg=darkred guibg=darkred
match FormatError /\s\+$\|\ \+\t\|\%80v.\|\ \{8\}/
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
fcntl data is arch independent, so move it out of include/asm/type.h
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The version that might not wait for an ack is always called with
the "async" flag set. Clean things up accordingly.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This file was created for backward compatibility with
not-yet-patched kernel. Now we can remove it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These processes don't have image files in a parent snapshot and crtools
should not fail in this case.
https://bugzilla.openvz.org/show_bug.cgi?id=2636
v2: return NULL from mem_snap_init, if a parent image is absent.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch prevents a page dump failure from being masked
by the success of the PARASITE_CMD_MPROTECT_VMAS command.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Parasite daemon mode is quite tricky. One may consider
it as consisting of two parts
- daemon mode for thread leader
- daemon mode for regular threads
Thread leader daemon
--------------------
Once the thread leader parasite code is initialized,
it starts spinning on the socket, listening for commands
to handle.
If the command destination is the thread leader itself, it
handles it and sends the ack back to the caller (iow
the main crtools code).
If the recipient is not the thread leader but one of the threads,
then the thread leader wakes up the thread via a futex and makes
it handle the command, waiting on the futex for the result. Once
the result is obtained, the ack is sent back to the caller.
Thread daemon
-------------
On initialization the thread daemon starts waiting for a command on a futex.
The futex is triggered by the thread leader daemon when a command is received.
Once the command is received and handled, the result is reported back to
the thread leader daemon, which in turn sends the ack message.
Both the thread leader and regular threads require their own stack to operate
on since they all are present in memory simultaneously. Thus we use the
call_daemon_thread() helper which takes care of providing a stack
to the callee.
TODO:
- ARM requires its own wrappers for the daemonize/trap low-level code,
at the moment only x86-64 is covered
v2: remove PARASITE_CMD_DAEMONIZED and s->ack
parasite: use a proper command for getting ack
Fixed-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
They rely on a trap being issued at the end of execution,
so to distinguish them from the future daemon mode add a "trap" postfix
to them.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These constants are system-wide, so move them to the mem.h
header for the sake of reuse.
[ xemul: It was kerndat.h in the patch ]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>