mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 13:28:27 +00:00

9635 Commits

Author SHA1 Message Date
Pavel Emelyanov
5f75727830 epoll: Sanitize epoll tfds collecting
This case is legacy: tfds are now merged into the epoll entry, but
to keep it working we have a separate list of tfds and extra
code in the ->open callback.

Keep the legacy code in one place.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 11:17:01 +03:00
Pavel Emelyanov
c48099d83a image: Introduce collect-nofree flag
Current collect helper frees the pb entry if there's
zero priv_size on cinfo. For files we'll have zero
priv_size (as entries will be collected by sub-cinfos),
while the entry in question should NOT be freed.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 11:17:01 +03:00
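The rule described in c48099d83a above can be pictured with a small sketch. Everything here is a simplified assumption for illustration: `struct demo_cinfo`, `DEMO_COLLECT_NOFREE` and `demo_entry_freed()` are hypothetical names, not CRIU's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: the collect helper must leave the entry alone. */
#define DEMO_COLLECT_NOFREE 0x1

/* Hypothetical, cut-down collect descriptor. */
struct demo_cinfo {
	size_t priv_size;   /* size of the per-entry private object, may be 0 */
	unsigned int flags; /* DEMO_COLLECT_* bits */
};

/*
 * Nonzero if the generic helper would free the protobuf entry once the
 * collect callback has run: nobody keeps a private object pointing at it
 * (priv_size == 0) and the descriptor did not ask to keep it.
 */
static int demo_entry_freed(const struct demo_cinfo *ci)
{
	assert(ci != NULL);
	return ci->priv_size == 0 && !(ci->flags & DEMO_COLLECT_NOFREE);
}
```

With this shape, a file descriptor whose entries are picked up by sub-cinfos would set the no-free bit and keep `priv_size` at zero.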
Mike Rapoport
80e146da20 criu: pagemap: add reset method
Rather than do open/close to reset pagemap, just update its state.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 10:48:05 +03:00
Adrian Reber
9de01e1b89 Make skip_pages function available criu-wide
For the upcoming userfaultfd integration the skip_pages functionality is
required to find the userfaultfd requested pages.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 10:46:58 +03:00
Pavel Emelyanov
f0a87835e1 vma: Fix badly inherited FD in filemap_open
Previous patch (5a1e1aac) tried to minimize the amount of
open()s called when mmap()ing the files. Unfortunately, there
was a mistake: the wrong flags were compared, which resulted in
the whole optimization working randomly (typically not
working).

Fixing the flags comparison revealed another problem. The
patch in question correlated with the 03e8c417 one, which
caused some vmas to be opened and mmaped much later than the
premap. When vmas sharing their fds are partially premapped
and partially not, the whole vm_open sharing became broken in
multiple places -- either a needed fd was not opened, or an
unneeded one was left un-closed.

To fix this, the context that tracks whether the fd should
be shared or not should be moved from the collect stage to
the real opening loop. In this case we need to explicitly
know which vmas _may_ share fds (file private and shared)
with each other, so the sharing knowledge becomes spread
between open_filemap() and its callers. Oh, well...

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:44:33 +03:00
Pavel Emelyanov
a0738c75c7 vma: Do not open similar VMAs multiple times
On real apps it's typical to have sequences of VMAs with
absolutely the same file mapped. We've seen this at dump time
and fixed multiple openings of map_files links with the
file_borrowed flag.

The restore situation is the same -- the vm_open() call in many
cases re-opens the same path with the same flags. This slows
things down.

To fix this -- chain VMAs with mapped files to each other
and only the first one opens the file and only the last
one closes it.

✓ travis-ci: success for mem: Do not re-open files for mappings when not required
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 10:44:33 +03:00
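The chaining idea from a0738c75c7 above (only the first VMA of a run opens the file, only the last one closes it) can be sketched as follows. This is a minimal illustration under assumptions: `struct demo_vma`, `same_file()` and `plan_fd_sharing()` are hypothetical names, and the real code compares more than a path and open flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a file-backed VMA. */
struct demo_vma {
	const char *path; /* file mapped by this area */
	int fdflags;      /* flags the file would be opened with */
	int do_open;      /* 1 if this area must open the file itself */
	int do_close;     /* 1 if this area must close the fd after mmap() */
};

/* Two areas can share one descriptor when path and open flags match. */
static int same_file(const struct demo_vma *a, const struct demo_vma *b)
{
	return a->fdflags == b->fdflags && strcmp(a->path, b->path) == 0;
}

/*
 * Walk the areas in address order and mark the first area of every run
 * of identical files as the opener and the last one as the closer.
 * Returns the number of open() calls the plan needs.
 */
static size_t plan_fd_sharing(struct demo_vma *v, size_t n)
{
	size_t i, opens = 0;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++) {
		v[i].do_open = (i == 0) || !same_file(&v[i - 1], &v[i]);
		v[i].do_close = (i + 1 == n) || !same_file(&v[i], &v[i + 1]);
		opens += v[i].do_open;
	}
	return opens;
}
```

Note that the comparison includes the open flags: two areas mapping the same path with different flags start separate runs, which is the comparison the later fix (f0a87835e1) had to get right.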
Pavel Emelyanov
77e9c5d806 vma: Move fdflags evaluation into collect_filemap
In this routine we'll need to compare fdflags, so to
avoid double if-s, let's calculate and set fdflags early.

✓ travis-ci: success for mem: Do not re-open files for mappings when not required
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 10:44:33 +03:00
Pavel Emelyanov
c9194500bf mem: Don't do unneeded mprotects
When a vma we restore doesn't have any pages in pagemaps there's
no need to enforce PROT_WRITE bit on it.

This only applies to non-premapped vmas.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:44:33 +03:00
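One way to read c9194500bf above is as a small decision function. The sketch below is an assumption-laden illustration, not the actual restore code: `demo_restore_prot()` and `demo_needs_mprotect()` are hypothetical helpers, and premapped areas are assumed to keep the old always-writable behaviour.

```c
#include <assert.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: protection to map an area with while its
 * contents are being restored.
 *   final_prot - protection the task expects after restore
 *   has_pages  - nonzero if the pagemap holds pages for this area
 *   premapped  - nonzero if the area lives in the premap region
 */
static int demo_restore_prot(int final_prot, int has_pages, int premapped)
{
	assert((final_prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);
	/* Pages have to be written into the area, so it must be writable. */
	if (premapped || has_pages)
		return final_prot | PROT_WRITE;
	/* Nothing to fill in: map with the final protection right away. */
	return final_prot;
}

/* Nonzero if an mprotect() is needed afterwards to drop the extra bit. */
static int demo_needs_mprotect(int final_prot, int has_pages, int premapped)
{
	return demo_restore_prot(final_prot, has_pages, premapped) != final_prot;
}
```

A read-only area with no pages in the pagemap therefore needs no temporary PROT_WRITE and no later mprotect() call.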
Pavel Emelyanov
91388fce03 mem: Delayed vma/pr restore (v2)
Performance experiments show that we spend (relatively) a lot of time
mremap-ing areas from premap area into their proper places. This time
depends on the task being restored, but for those with many vmas this
can be up to 20%.

The thing is that premapping is only needed to restore cow pages since
we don't have any API in the kernel to share a page between two or more
anonymous vmas. For non-cowing areas we can mmap() them directly in
place. But for such cases we'll also need to restore the page's contents
from the pie code.

Doing the whole page-read code from PIE is way too complex (for now), so
the proposal is to optimize the case when we have a single local pagemap
layer. This is what pr.pieok boolean stands for.

v2:
* Fixed ARM compiling (vma addresses formatting)
* Unused tail of premapped area was left in task after restore
* Preadv-ing pages in restorer context worked on corrupted iovs
  due to mistakes in pointer arithmetic
* AIO mapping skipped at premap wasn't mapped in pie
* Growsdown VMAs should sometimes (when they are "guarded" by
  previous VMA and guard page's contents cannot be restored in
  place) be premapped
* Always premap for lazy-pages restore

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:44:32 +03:00
Pavel Emelyanov
074e7b8901 vma: Mark cow roots
The next patch will stop premapping some private vmas, in particular those
that are not COW-ed with anyone. To make this work we need to distinguish
vmas that are not cowed with anyone from those cowed with children only.
Currently both have the vma->parent pointer set to NULL, so for the former let's
introduce a special mark.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:40:57 +03:00
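The "special mark" from 074e7b8901 above can be illustrated with a sentinel pointer value. This is only a sketch: `struct demo_vma`, `DEMO_COW_ROOT` and the helpers are hypothetical names chosen for the example, and the premap decision ignores the other cases (growsdown areas, lazy pages) mentioned elsewhere in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down VMA with only the field the sketch needs. */
struct demo_vma {
	struct demo_vma *parent;
};

/* Sentinel that is neither NULL nor a valid object address. */
#define DEMO_COW_ROOT ((struct demo_vma *)1)

enum demo_cow_kind {
	DEMO_COW_NONE,  /* not COW-ed with anyone */
	DEMO_COW_TOP,   /* COW-ed with children only */
	DEMO_COW_CHILD, /* COW-ed with an area of the parent task */
};

/* Tell the three cases apart by looking at the parent pointer alone. */
static enum demo_cow_kind demo_classify(const struct demo_vma *v)
{
	assert(v != NULL);
	if (v->parent == NULL)
		return DEMO_COW_NONE;
	if (v->parent == DEMO_COW_ROOT)
		return DEMO_COW_TOP;
	return DEMO_COW_CHILD;
}

/* Only areas taking part in COW need to go through the premap region. */
static int demo_needs_premap(const struct demo_vma *v)
{
	return demo_classify(v) != DEMO_COW_NONE;
}
```

The sentinel must never be dereferenced, so every user of the parent pointer has to check for it first.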
Pavel Emelyanov
76c1ec4e27 vma: Do not open vmas when inheriting
Inherited VMAs don't need the descriptor to work with.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:40:56 +03:00
Pavel Emelyanov
f6bfdb8d1a vma: Move cow decision earlier (v2)
Collect VMAs into COW-groups. This is done by checking each pstree_item's
VMA list in parallel with the parent one and finding VMAs that have
chances to get COW pages. The vma->parent pointer is used to tie such
areas together.

v2:
* Reworded comment about pvmas
* Check for both vmas to be private, not only child
* Handle helper tasks

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:40:56 +03:00
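The parallel walk described in f6bfdb8d1a above resembles a two-pointer merge over two address-sorted lists. The sketch below is a simplification under stated assumptions: `struct demo_vma` and `demo_link_cow_groups()` are hypothetical, arrays stand in for the real lists, and "has chances to get COW pages" is reduced to identical boundaries with both areas private.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down VMA description. */
struct demo_vma {
	unsigned long start, end; /* [start, end) address range */
	int is_private;           /* nonzero for private mappings */
	struct demo_vma *parent;  /* matching area in the parent task */
};

/*
 * Walk the child's and the parent's areas (both sorted by start address)
 * in parallel and tie together areas that may share COW pages.
 * Returns the number of child areas that got a parent.
 */
static size_t demo_link_cow_groups(struct demo_vma *child, size_t nc,
				   struct demo_vma *par, size_t np)
{
	size_t c = 0, p = 0, linked = 0;

	assert((child != NULL || nc == 0) && (par != NULL || np == 0));
	while (c < nc && p < np) {
		if (par[p].start < child[c].start) {
			p++; /* parent area has no counterpart */
			continue;
		}
		if (par[p].start > child[c].start) {
			child[c++].parent = NULL; /* child-only area */
			continue;
		}
		/* Same start: both must be private and end at the same place. */
		if (par[p].end == child[c].end &&
		    par[p].is_private && child[c].is_private) {
			child[c].parent = &par[p];
			linked++;
		} else {
			child[c].parent = NULL;
		}
		c++;
		p++;
	}
	for (; c < nc; c++)
		child[c].parent = NULL;
	return linked;
}
```

Checking that both areas are private, not only the child's, mirrors the second item of the v2 notes.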
Pavel Emelyanov
216658cdf0 vma: Keep pointer on parent vma
We currently keep a pointer to the parent vma bitmap, but more info
about the parent will be needed soon.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:40:56 +03:00
Pavel Emelyanov
5ca537a211 vma: Introduce vma_premapped flag
Not all private VMA-s will be premapped, so a separate sign of
a VMA being on the premap area is needed.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:40:56 +03:00
Pavel Emelyanov
fd5ae6d9b5 mem: Shuffle page-read around
The page-read will be needed during the premap stage.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:40:54 +03:00
Dmitry Safonov
28c35b1815 vdso/compat: Don't unmap missing vdso/vvar vmas
I've hit a missing vvar on a Virtuozzo 7 kernel - just skip
unmapping it.

TODO: check ia32 C/R with kernel CONFIG_VDSO=n

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:31:34 +03:00
Dmitry Safonov
baed8b8c6d pie/vdso: return back ELF header mismatch error
I deleted it previously because I searched for the
vdso vma in the [vdso/vvar] vma pair by magic header,
so I needed to suppress this error.

Since then, I've reworked how the 32-bit vdso is parsed
and now we don't need to search for it; moreover, we parse it
only once in the criu helper.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:31:34 +03:00
Pavel Tikhomirov
47f3b88955 restore: fix sys_wait4 error handling in case no child
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:31:34 +03:00
Dmitry Safonov
dc8261ced5 net: suggest enabling NETFILTER_XT_MARK if iptables-restore failed
On x86_64 defconfig it's =m, so if you boot kernel without initramfs
in qemu, you will see this.

[xemul: split long line]

Fixes: #292
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 10:31:34 +03:00
Dmitry Safonov
871ce841a7 travis/ia32: Remove libc6.i386 dependency
Not needed anymore for CONFIG_COMPAT.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 09:49:30 +03:00
Dmitry Safonov
37b3e5953b ia32/feature-test: Don't check i686 libraries presence
I was adapting CRIU with ia32 support for building with Koji,
and found that Koji can't build x86_64 packages with
i686 libs installed.
While at it, I found that the i686 libraries requirement is
no longer valid since I've deleted the second parasite.

Drop the feature test for i686 libs and add a test for gcc.
That will effectively test whether gcc can compile 32-bit code
and catch the bug with Debian's gcc (#315).

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 09:48:43 +03:00
Dmitry Safonov
f32ffdef90 nmk: Provide try-asm build check function
I need to add a feature test written in assembly to check
if the feature can be compiled.

Add a make function for this purpose.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 09:48:43 +03:00
Kirill Tkhai
7d8adbee2b mount: Find NS_ROOT for cr-time mount on restore
After commit 2e8970beda5b "mount: create a mount point
for the root mount namespace in the roots yard", the top
of the tree of mount_infos points to the fake mount.
So, when we're looking for an appropriate place for
binfmt_misc, we can't find "xxx/proc/sys/fs/binfmt_misc".
Fix that by finding the real NS_ROOT manually.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 09:45:44 +03:00
Kang Yan
e03aa018c2 Update Makefile
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 09:45:44 +03:00
Pavel Emelyanov
7ce496eb10 tty: Kill prepare_shared_tty
The routine in question just sets up the mutex to access
/dev/ptmx. This initialization can be done when we collect
a single tty.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 02:01:44 +03:00
Pavel Emelyanov
a216fb1e5e tty: Merge tty post actions
No need to schedule both post-actions, we can merge them. This
also sanitizes the "void *unused" arguments for both.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 02:00:35 +03:00
Pavel Emelyanov
77443aa520 regfiles: Kill prepare_shared_reg_files()
This routine just initializes the remap open lock,
and there's already the code that initializes the
whole remap engine.

Re-arrange this part.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:47 +03:00
Pavel Emelyanov
237bd26982 remap: Rename global lock
Now this lock is only needed to serialize remap open
code, so name it accordingly.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:47 +03:00
Pavel Emelyanov
a534c76c42 remaps: Rename clean_linked_remap
This routine cleans any file remap.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:47 +03:00
Pavel Emelyanov
23e092f709 ghosts: Add comment about shared path allocation
Ghost remaps allocate the path with shmalloc. Add a comment
explaining why.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:47 +03:00
Pavel Emelyanov
2356e5ffc6 regfiles: Do not serialize remap lookup
We used to have a users counter on remap which was
incremented each time this routine was called. Nowadays
remaps are managed w/o the refcounting and we no
longer need global mutex protection for it.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:47 +03:00
Pavel Emelyanov
b01bd93757 rst: Collect procfs remaps at once
There's no need for a separate call to prepare_procfs_remaps().
All remaps are collected one step earlier and we can do
open_remap_dead_process() right away.

Also rename the latter routine.

✓ travis-ci: success for Sanitize initialization bits
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:47 +03:00
Pavel Emelyanov
30e2fd2175 fsnotify: Tossing legacy bits around
This just moves all the deprecated code into one place.

✓ travis-ci: success for Sanitize fsnotify legacy code
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:46 +03:00
Pavel Emelyanov
d965f5887d fsnotify: Kill fsnotify lists
The lists are only needed to collect marks (deprecated) into
notify objects. The latter ones are stored in fdsec hash, so
for this legacy case we can find them there.

✓ travis-ci: success for Sanitize fsnotify legacy code
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:46 +03:00
Pavel Emelyanov
80f84ab0a4 fsnotify: Deprecate separate images for marks
Marks images were merged into regular in 1.3.

✓ travis-ci: success for Sanitize fsnotify legacy code
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:46 +03:00
Pavel Emelyanov
4bd43e226f fsnotify: Fix legacy fanotify collect
Wrong helper is called.

✓ travis-ci: success for Sanitize fsnotify legacy code
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-06-14 01:59:46 +03:00
Kir Kolyshkin
88699b49f7 travis-tests: set CRIU_PMC_OFF conditionally
We only needed it for kernel 3.19. Apparently, Ubuntu 14.04.5 comes
with a kernel from 16.04 (i.e. 4.4), so we can disable this workaround!

Anyway, just in case, let's do it conditionally.

While at it, slightly improve the comment.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 01:59:46 +03:00
Kir Kolyshkin
2a6b8e1743 travis-tests: install less packages
asciidoc pulls in a lot of dependencies, most of which are not
needed as we just use it to convert txt to a man page. Adding
the --no-install-recommends option to apt-get makes it skip those
additional dependencies. The only needed package is xmlto, so
let's add it explicitly.

This results in some 50 packages being skipped (mostly TeX/LaTeX and
some extra SGML tools), wow!

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-06-14 01:59:44 +03:00
Pavel Emelyanov
12e3adc68c criu: Version 3.1
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
v3.1
2017-05-22 11:40:37 +03:00
Pavel Emelyanov
27be93b7db Revert "kdat: Relax uffd checks (v2)"
This reverts commit a840995689f7ab898ba34fd10e014704165e2f83, that
got into master by mistake.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-05-22 11:23:11 +03:00
Cyrill Gorcunov
f734928c8a tty: Make sure no /dev/tty inheritance exist
Currently we support restoring an opened /dev/tty reference
only if the controlling terminal belongs to the same process,
i.e. no inheritance is allowed.

Thus we should refuse to dump in such a scenario,
otherwise restore will fail.

Reported-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:33:19 +03:00
Andrei Vagin
0237eb27c8 restore: don't collect mounts if mntns isn't restored
Currently it is only used to get a file descriptor to the mount
namespace root, but if we have only one mntns, we can open "/".

Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:33:11 +03:00
Andrei Vagin
695295ae98 fsnotify: don't save mnt_id if a mount namespace isn't dumped
Processes can be restored in another mntns, so mnt_id will be
useless in this case. If mntns isn't dumped, we have to dump a path
to a mount point.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:33:02 +03:00
Dmitry Safonov
6f8f85ffd4 ia32/kdat: Check for 32-bit mmap() bug
There was a kernel bug with 32-bit mmap() returning a 64-bit pointer.
The fix is in Torvalds' master and will be released in the v4.12 kernel.
Checkpointing after the v4.9 kernel works fine, but restoring will
result in an application which will mmap() 64-bit addresses, resulting
in segfaults/memory corruption/etc.
As our policy is to fail on dump if we can't restore on the same target,
fail checkpointing for v4.9.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:31:40 +03:00
Andrei Vagin
6eee76d403 restore: simplify sigchld handler
The idea is simple. Everyone has to wait for its children;
a restore is interrupted if we find an abandoned zombie.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:31:29 +03:00
Pavel Emelyanov
9058baae7e pagemap: Enqueue iovec to arbitrary list
We currently can batch pagemap entries into page-read list, but will
need to queue them into per-pstree_entry one.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:25:41 +03:00
Pavel Emelyanov
b43f7aa615 page_read: Make it possible to get pages.img ID from page_read
The pages.img will need to get opened one more time w/o the pagemap.img.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:21:18 +03:00
Pavel Emelyanov
aebbdbf1ef compel: Add preadv syscall
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:21:16 +03:00
Pavel Emelyanov
0c4eaafd4a vma: Add vma_next() helper
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:20:13 +03:00
Kir Kolyshkin
ef6c3f6ce6 compel/test/fdspy: fix linking
1. Commit 8b99809 ("compel: make plugins .a archives") changed the
   suffix of compel plugins, so this test no longer compiles.

2. "compel plugins" can print auxiliary plugins now, let's use it.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-19 09:17:20 +03:00