mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 14:25:49 +00:00

72 Commits

Author SHA1 Message Date
Christopher Covington
1438f013a2 Pass task_size to vma_area_is_private()
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded. Since
vma_area_is_private() is used by both restorer blob code and non
restorer blob code, which must use different variables for recording
the task size, make task_size a function argument and modify the call
sites accordingly. This fixes the following error on AArch64 kernels
with CONFIG_ARM64_64K_PAGES=y.

  pie: Error (pie/restorer.c:929): Can't restore 0x3ffb7e70000 mapping with 0xfffffffffffffff7

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-03 17:14:18 +03:00
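The idea behind 1438f013a2 can be illustrated with a minimal sketch. This is not CRIU's actual code: the struct, the helper name `vma_sketch_is_private`, and the two task-size constants (39-bit and 42-bit address spaces) are assumptions chosen only to show why the limit has to be an argument rather than a compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a VMA record (not CRIU's struct). */
struct vma_sketch {
	uint64_t start;
	uint64_t end;
	bool private_mapping; /* stands in for the real flag tests */
};

/* Illustrative task sizes: 39-bit and 42-bit user address spaces. */
#define TASK_SIZE_39BIT (UINT64_C(1) << 39)
#define TASK_SIZE_42BIT (UINT64_C(1) << 42)

/*
 * The caller supplies task_size, so the same helper can be shared by code
 * that records the value in different variables.
 */
static bool vma_sketch_is_private(const struct vma_sketch *vma, uint64_t task_size)
{
	assert(vma->start < vma->end);
	return vma->private_mapping && vma->end <= task_size;
}
```

In this sketch, a mapping that starts at 0x3ffb7e70000 and ends one 64K page later is rejected against the 39-bit limit and accepted against the 42-bit one, which is the kind of mismatch a single hard-coded value produces.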
Cyrill Gorcunov
9ce0254c04 vma: Unify private VMAs testing
We have two helpers for VMA type testing: privately_dump_vma() and vma_priv(). They
work with different types but basically do the same thing: check whether we should dump
the VMA into the image and restore it later.

Let's unify them both into a common vma_entry_is_private() helper, plus
vma_area_is_private() for working with the vma_area type.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-01 12:36:46 +03:00
Pavel Emelyanov
f7f76d6ba6 img: Introduce empty images
When an image of a certain type is not found, CRIU sometimes
fails and sometimes ignores this fact. I propose to always ignore
it and treat absent images as those containing no
objects (i.e. empty). If the later code flow
_needs_ objects, then criu will fail later.

Why would an object be explicitly required? For example, because
the restoring code reads the image with pb_read_one, without the
_eof suffix, thus requiring the object to be in the image.

Another example is object dependencies. E.g. fdinfo objects
require various file objects. So missing image files will
result in unresolved searches later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-13 14:42:54 +03:00
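A rough sketch of the "empty image" idea from f7f76d6ba6 follows. It assumes a hypothetical handle type `img_sketch` and a hypothetical marker value `EMPTY_IMG_FD`; neither is taken from CRIU. A missing file (ENOENT) opens successfully as an empty image, and readers simply see end-of-file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical marker for "image file absent"; any negative value works. */
#define EMPTY_IMG_FD (-404)

/* Hypothetical image handle, not CRIU's struct cr_img. */
struct img_sketch {
	int fd;
};

/* Returns 0 on success (including "absent, treated as empty"), -1 on error. */
static int img_sketch_open(struct img_sketch *img, const char *path)
{
	assert(img && path);
	img->fd = open(path, O_RDONLY);
	if (img->fd >= 0)
		return 0;
	if (errno != ENOENT)
		return -1;              /* a real failure, e.g. EACCES */
	img->fd = EMPTY_IMG_FD;         /* missing file: behave as an empty image */
	return 0;
}

static bool img_sketch_is_empty(const struct img_sketch *img)
{
	return img->fd == EMPTY_IMG_FD;
}

/* Reads up to len bytes; an empty image reports end-of-file immediately. */
static ssize_t img_sketch_read(const struct img_sketch *img, void *buf, size_t len)
{
	if (img_sketch_is_empty(img))
		return 0;
	return read(img->fd, buf, len);
}
```

A caller that genuinely needs an object would treat the immediate end-of-file as an error at that point, which matches the "fail later" behaviour the message describes.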
Pavel Emelyanov
e29c9daec2 img: Remove O_OPT and COLLECT_OPTIONAL
The current code doesn't make any distinction between OPT and no-OPT
except for whether the message is printed in open_image().
So this particular change alters nothing but the availability of
this message.

In the next patches I will introduce "empty images" to deal with
the ENOENT situation in a more graceful manner.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-13 14:42:01 +03:00
Andrey Vagin
71a0b5dc31 mem: check existence of parent images before dumping pages (v2)
When we are doing pre-dump, we splice pages into pipes and only then open
images and dump pages. But when we are splicing pages, we need to know
about the existence of parent images. This patch adds a new call to
determine the existence of parent images.

In addition, this patch fixes the following issue:
CID 83244 (#1 of 1): Uninitialized pointer read (UNINIT)
14. uninit_use: Using uninitialized value xfer.parent.

v2: initialize the unused field of struct page_server_iov, because it is
sent over the network.

CID 83451 (#1 of 1): Uninitialized scalar variable (UNINIT)
2. uninit_use_in_call: Using uninitialized value pi. Field pi.nr_pages
is uninitialized when calling write.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-29 19:32:40 +03:00
Pavel Emelyanov
19a76494a9 kerndat: Collect all global variables on one struct
To avoid polluting the global namespace and to unify the kerndat
data names.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:14:53 +04:00
Pavel Emelyanov
c443b03e10 rst: Rework the rst_info referencing
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:34:38 +04:00
Pavel Emelyanov
295090c1ea img: Introduce the struct cr_img
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plain file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.

This will also help if (when?) we want to make more
complex changes to images, e.g. store them all in one
file or send them directly to the network.

For now the cr_img just contains one int _fd variable.

This patch changes the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:13 +04:00
Pavel Emelyanov
42821edccf img: Use errno when checking optional images open fail
There will be no int-fd soon, so this is one more preparation
for that.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:11 +04:00
Pavel Emelyanov
cf8c9ae870 vma: Reshuffle the struct vma_area
We have some fields, that are dump-only and some that
are restore only (quite a lot of them actually).

Reshuffle them on the vma_area to explicitly show which
one is which. And rename some of them for easier grep.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:19:55 +04:00
Cyrill Gorcunov
0bb002ce69 vdso: dump -- Don't dump contents of vvar zone
The vvar zone is mapped by the kernel and must never
be dumped into the image; the data present there is
valid on the running kernel only.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-24 22:48:41 +04:00
Cyrill Gorcunov
d8e5f617b5 mem: Don't shrink the number of IOVs needed for page transferring
In the worst case we need one IOV for every page we're transferring
from the parasite, thus don't divide by two here.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-12 14:46:41 +04:00
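The worst case mentioned in d8e5f617b5 can be seen with a small counting sketch. It assumes that pages which directly follow each other share one IOV entry; the helper name `iovs_needed` is hypothetical and the code is not taken from CRIU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Counts how many iovec entries are needed to describe the given pages when
 * adjacent pages are merged into one entry. page_addrs must be sorted in
 * ascending order and page-aligned.
 */
static size_t iovs_needed(const uintptr_t *page_addrs, size_t nr_pages, size_t page_size)
{
	size_t i, nr_iovs = 0;

	assert(page_size > 0);
	for (i = 0; i < nr_pages; i++) {
		/* A page extends the previous entry only if it directly follows it. */
		if (i == 0 || page_addrs[i] != page_addrs[i - 1] + page_size)
			nr_iovs++;
	}
	return nr_iovs;
}
```

Four contiguous pages collapse into a single entry, while four pages that never directly follow one another need four entries, so a buffer sized for half the page count could be too small.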
Tikhomirov Pavel
e01dc7faa6 v2 mem: if no parent image persists, can't rely on it
There was a bug here: if, e.g., iterative snapshots are made and
a new process is created in the process tree between two of them,
it can have pages which are not dirty, and we won't save them
into the image. But there is no parent image for it.

Pages which are not soft-dirty appear if a process with some pages
in the non-dirty state forks: the child inherits those PTEs,
and if the child doesn't write to those pages, they will still be in the
non-soft-dirty state when the next dump comes.

Also, this bug was not caught because of an error in zdtm; see 3/3.

v2: simplify, add more justification in commit message.

Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-18 13:50:06 +04:00
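The rule described in e01dc7faa6 can be sketched as a small predicate. It assumes the documented /proc/<pid>/pagemap layout, in which bit 55 marks a soft-dirty PTE; the macro and function names are hypothetical, and the `track_mem` parameter stands for the --track-mem option mentioned elsewhere in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 55 of a /proc/<pid>/pagemap entry: the PTE is soft-dirty. */
#define PME_SOFT_DIRTY_SKETCH (UINT64_C(1) << 55)

/*
 * Decides whether a page may be left out of the current image because an
 * earlier (parent) image is expected to hold it.
 */
static bool page_covered_by_parent(uint64_t pme, bool track_mem, bool has_parent)
{
	if (!track_mem)
		return false;   /* no tracking requested: dump everything */
	if (!has_parent)
		return false;   /* nothing to fall back on, e.g. a new process */
	return !(pme & PME_SOFT_DIRTY_SKETCH);
}
```

The `has_parent` test is the point of the fix: a clean page inherited across fork() is only safe to skip when a parent image for that process actually exists.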
Cyrill Gorcunov
1153f225ff image: Add O_OPT when trying to open optional image files
Over time some files become obsolete and might be missing
from the checkpoint image set, but to keep backward compatibility we
still try to open them, which might print an error like

 | Unable to open 'path-to-file'

and leave a reader confused about why criu prints an error but continues working.

To eliminate this problem the O_OPT flag was introduced in
commit 16b5692061; it suppresses the error message
if the flag is set.

Now start using O_OPT in the following functions

 - open_irmap_cache: irmap cache is relatively new optional feature

 - prepare_rlimits, open_signal_image, restore_file_locks,
   prepare_fd_pid, prepare_mm_pid, collect_image: all these
   helpers are trying to open image files which can be missing.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-17 14:21:21 +04:00
Cyrill Gorcunov
8bea49a74b pagemap-cache: Use page.h helpers
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-21 16:29:41 +04:00
Andrey Vagin
b742c125fb mem: fix calculation of a page offset in generate_iovs
The offset should point to the next entry.

Cc: Pavel Tikhomirov <snorcht@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-19 20:09:40 +04:00
Cyrill Gorcunov
bbdc511741 mem: Use pagemap cache
This improves speed if we're dumping a big set of small vmas.

CentOS-6 container
------------------

Without cache

dump: {
	freezing_time: 1705
	frozen_time: 44885
	memdump_time: 9064
	memwrite_time: 15846
	pages_scanned: 246979
	pages_skipped_parent: 0
	pages_written: 2831
	irmap_resolve: 0
}

With cache

dump: {
	freezing_time: 898
	frozen_time: 40859
	memdump_time: 7254
	memwrite_time: 16375
	pages_scanned: 246979
	pages_skipped_parent: 0
	pages_written: 2831
	irmap_resolve: 0
}

1024 VMA, 40K each
------------------
Without cache

dump: {
	freezing_time: 170
	frozen_time: 30372
	memdump_time: 3895
	memwrite_time: 691
	pages_scanned: 13487
	pages_skipped_parent: 0
	pages_written: 61
	irmap_resolve: 0
}

With cache

dump: {
	freezing_time: 231
	frozen_time: 27646
	memdump_time: 768
	memwrite_time: 798
	pages_scanned: 13487
	pages_skipped_parent: 0
	pages_written: 61
	irmap_resolve: 0
}

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-18 12:49:17 +04:00
Andrey Vagin
c6a3b1de27 mem: rename fill_pages into dump_pages
This function splices data from a process to criu,
so dump_pages describes the real meaning.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-14 16:44:58 +04:00
Andrey Vagin
b5dff62e3b mem: use chunk mode for dumping anonymous memory
Before this patch, criu spliced all data into pipes and then saved the
data in an image file. The problem is that creating pipes with big
buffers fails too often, because the kernel tries to allocate big linear
chunks of memory. Now memory is dumped in several iterations, where the
size of the pipe buffers is restricted.

TODO: need to rework pre-dump, because currently dumping data from
      pipes is postponed. We are going to use sys_process_vm_readv for
      this.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-10 15:06:42 +04:00
Andrey Vagin
4e395c5c8d mem: move code to splice memory into pipes in a separate function
It's preparation for dumping memory in chunks.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-10 15:06:40 +04:00
Andrey Vagin
bb98a82098 page-pipe: split dumping memory on chunks (v3)
The problem is that vmsplice() to a big pipe fails very often.

The kernel allocates a linear chunk of memory for pipe buffer
descriptors, but a big allocation in the kernel can fail.

So we need to restrict the maximum capacity of pipes. But the number of
pipes is restricted too, so we need to split memory dumping into chunks.

In this patch we calculate the pipe size for which vmsplice() will not
fail.

v2: s/batch/chunk and a few other small fixes
v3: Remove callbacks from page_pipes and reuse pipes
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-10 15:06:39 +04:00
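For readers unfamiliar with pipe capacities, here is a hedged sketch of bounding a pipe's size on Linux with fcntl(F_SETPIPE_SZ). It does not reproduce the calculation from bb98a82098: instead of computing a safe size up front, this hypothetical helper `pipe_fit_capacity` simply halves the request until the kernel accepts it.

```c
#define _GNU_SOURCE /* for F_SETPIPE_SZ / F_GETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Linux values, in case the libc headers do not expose them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#define F_GETPIPE_SZ 1032
#endif

/*
 * Tries to give the pipe a capacity of `wanted` bytes and halves the request
 * until the kernel accepts it. Returns the resulting capacity in bytes, or
 * -1 if it cannot even be queried.
 */
static int pipe_fit_capacity(int pipe_fd, int wanted, int minimum)
{
	assert(minimum > 0 && wanted >= minimum);
	while (wanted >= minimum) {
		int got = fcntl(pipe_fd, F_SETPIPE_SZ, wanted);

		if (got > 0)
			return got; /* the kernel may round the size up */
		wanted /= 2;
	}
	/* No resize succeeded; report whatever the pipe currently has. */
	return fcntl(pipe_fd, F_GETPIPE_SZ);
}
```

With the capacity capped this way, a large memory area has to be moved through the pipe in several passes, which is the chunking the commit introduces.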
Pavel Emelyanov
dc7abdfb92 vma: Don't lookup file_desc for vma twice
We do it twice: first on collect, then on restore. The
second lookup is redundant; we can store the fd pointer on the vm_area
at the first lookup and reuse it later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-07 13:51:29 +04:00
Pavel Emelyanov
490efb4695 files: Properly count number of users for mmaped/exe-d ghost files
If a file that is mmaped or pointed to by the exe link is unlinked, we
generate a ghost file for it. On restore the ghost file will
be created with a users counter of 1, and the very first open
(e.g. for mmap) will unlink the file.

Handle this by bumping up the user counter for every mapping
pointing to the file.

This appeared after the previous patches that packed the reg-files
image. Before that, each vma and exe link created a separate entry
in the reg-files image.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-05 16:18:21 +04:00
Pavel Emelyanov
b745ae83b3 vma: Backward compatible VMA restore
If we've found zero VMAs in MmEntry, try to look for
VMAs in the vma-.img image file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 12:04:33 +04:00
Pavel Emelyanov
54f4f889a5 mm: Move VmaEntries from separate image into Mm one
When writing VMAs we perform too many small writes into vma-.img files.
This can be easily fixed by moving the vma-s into mm-s, all the more
so since they cannot be split from each other.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:05 +04:00
Pavel Emelyanov
72e462ad67 mm: Read mmentry early
We'll merge mm and vma images, so mm should be read in the
same place where vmas are.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:04 +04:00
Pavel Emelyanov
ed836740ba vma: Don't copy VmaEntry on vma_area
After the previous patch it's now possible.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:02 +04:00
Pavel Emelyanov
eb1ae0a025 vma: Turn embedded VmaEntry on vma_area into pointer
On restore we will read all VmaEntries in one big MmEntry object,
so to avoid copying them all into vma_areas, make them pointers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:01 +04:00
Pavel Emelyanov
6a5188a2cd vma: Use vma_area_is helper where appropriate (p2)
Lost from c8d5f1a2

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:43:22 +04:00
Pavel Emelyanov
446fdd7200 rst: Collect VmaEntries only once on restore
Right now we do it two times -- on shmem prepare and
on the restore itself. Make collection only once as
we do for fdinfo-s -- root task reads all stuff in and
populates tasks' rst_info with it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 23:35:03 +04:00
Pavel Emelyanov
0786f831d7 mem: Move shmem preparation routine and rename
We'll collect VmaEntries early before fork.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 23:34:12 +04:00
Pavel Emelyanov
593cb59a63 pagemap: Use pread to read pagemap entries
When reading pagemaps, we read them from a specific position. To
do it, we called lseek, then read. Fortunately, there's a
syscall that does both things in one call -- pread. Since
we don't need to keep the pagemap's position for further reads,
it perfectly suits our needs.

This removes 75% of lseek calls when dumping a basic container.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 00:30:02 +04:00
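The lseek-then-read pattern replaced in 593cb59a63 can be contrasted with a minimal pread() sketch. The helper name `read_entry_at` is hypothetical; the only assumption about the data is that entries are 64 bits wide, as /proc/<pid>/pagemap entries are.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reads the 64-bit entry number `index` from fd into *entry using a single
 * pread() call. The file offset of fd is left untouched.
 * Returns 0 on success, -1 on error or short read.
 */
static int read_entry_at(int fd, uint64_t index, uint64_t *entry)
{
	off_t off = (off_t)(index * sizeof(*entry));
	ssize_t ret;

	assert(entry != NULL);
	ret = pread(fd, entry, sizeof(*entry), off);
	if (ret < 0) {
		perror("pread");
		return -1;
	}
	return ret == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

For a pagemap file, the index would be the virtual address divided by the page size, e.g. `read_entry_at(pagemap_fd, vaddr / page_size, &pme)`.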
Andrey Vagin
af510ae01a mm: don't dump the zero page
If someone reads an untouched page, the kernel maps the zero page
to this address. This page will not have the SOFT_DIRTY bit and it must
not be dumped.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-30 14:31:39 +04:00
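The zero-page check in af510ae01a can be sketched against the documented pagemap entry layout (bit 63 present, bit 62 swapped, bits 0-54 page frame number). The names below are hypothetical, and how the zero page's frame number is obtained is deliberately left out; this is an illustration rather than CRIU's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout of a /proc/<pid>/pagemap entry (Linux pagemap documentation). */
#define PME_PRESENT_SKETCH  (UINT64_C(1) << 63)
#define PME_SWAPPED_SKETCH  (UINT64_C(1) << 62)
#define PME_PFN_MASK_SKETCH ((UINT64_C(1) << 55) - 1) /* bits 0-54 */

/*
 * zero_pfn is the page frame number of the kernel's shared zero page; how
 * it is discovered is outside this sketch.
 */
static bool page_has_private_data(uint64_t pme, uint64_t zero_pfn)
{
	if (pme & PME_SWAPPED_SKETCH)
		return true;    /* contents live in swap */
	if (!(pme & PME_PRESENT_SKETCH))
		return false;   /* never touched */
	/* Present, but only worth dumping if it is not the zero page. */
	return (pme & PME_PFN_MASK_SKETCH) != zero_pfn;
}
```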
Pavel Emelyanov
bfc2895f2a dump: Move pagemap parent checks from mem-dump to page-xfer
When writing pagemaps to the page-server we want both page-*.img and pagemap-*.img
to be on the remote host. But a subsequent pre-dump/dump with parent images will
try to access the pagemap-*.img files to check whether a dumped hole is present in them.

To handle this, move the checks for a hole being backed by something in the parent into
page-local-xfer. In the case of a page-server dump, it is the page-server that will
check the holes when writing them.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-24 16:02:57 +04:00
Andrey Vagin
e36250ef7e mem: handle errors of page_xfer_dump_pages()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-23 13:27:14 +04:00
Pavel Emelyanov
04fde2e178 mem: Don't track memory changes if --track-mem is not specified
We have a big mistake in how we check whether ptes are SOFT_DIRTY: there is
no need for these checks if --track-mem is not given.

While fixing this, remember proper checks for the kernel memory tracker.

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Tim Schürmann <info@tim-schuermann.de>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-16 22:07:44 +04:00
Kir Kolyshkin
d64d68d66c whitespace-at-eol cleanup
Remove whitespace at EOL (found by git grep ' $')

To people using vim, I'd suggest adding the following code to ~/.vimrc:

let c_space_errors = 1
highlight FormatError ctermbg=darkred guibg=darkred
match FormatError /\s\+$\|\ \+\t\|\%80v.\|\ \{8\}/

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-12 10:00:45 +04:00
Cyrill Gorcunov
8f64a14a03 headers: Move fcntl related data to include/fcntl.h
fcntl data is arch independent, so move it out of include/asm/type.h

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-14 22:13:10 +04:00
Andrey Vagin
4850fd94a8 crtools: move cr_options in a separate header
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 18:17:52 +04:00
Andrey Vagin
0d1dfc2e08 crtools: move all stuff about vma together
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:43:49 +04:00
Andrey Vagin
824403a009 crtools: create new header for servicefd stuff (v2)
v2: generate patch relative to the official git.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:43:02 +04:00
Pavel Emelyanov
987de2de05 parasite: Rename ack-waiting function to look better
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-07-17 08:56:17 +04:00
Pavel Emelyanov
f2978d2ac7 parasite: Reshuffle sync and async daemon-node executing routines
The version that might not wait for an ack is always called with
the "async" flag set. Clean things up accordingly.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-07-17 08:54:24 +04:00
Cyrill Gorcunov
93e60c76ee mem: Add missing EOL into pr_warn
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-07-15 18:48:57 +04:00
Pavel Emelyanov
0b5170c0fc mem: Remove pagemap2 mentions
This file was created for backward compatibility with
not-yet-patched kernel. Now we can remove it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-07-02 09:51:33 +04:00
Pavel Emelyanov
0f96026192 mem: Lower messages severity for inability to reset dirty tracker
The _actual_ need for this is checked in another place.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-07-02 09:15:50 +04:00
Pavel Emelyanov
1edb5b01f3 mem: Don't ignore memory tracking reset errors
Otherwise, on a non-soft-dirty kernel the dump passes but
produces a broken image.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-07-01 21:26:35 +04:00
Andrey Vagin
dd38ae16d1 mm: handle new processes which created between snapshots (v2)
These processes don't have image files in a parent snapshot and crtools
should not fail in this case.

https://bugzilla.openvz.org/show_bug.cgi?id=2636

v2: return NULL from mem_snap_init, if a parent image is absent.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-06-27 23:47:15 +04:00
Alexander Kartashov
5bf83eaf69 mem: don't screen page dump failure
This patch prevents a page dump failure from being masked
by the success of the PARASITE_CMD_MPROTECT_VMAS command.

Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-06-22 13:17:28 +04:00
Andrey Vagin
11d3adbf56 parasite: remove code which used for daemonized threads
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-27 16:45:24 +04:00