mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 22:05:36 +00:00
4346 Commits

Author SHA1 Message Date
Pavel Emelyanov
441b9b9ee5 zdtm: Stupid test for task-in-cgroup
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-28 00:24:54 +04:00
Pavel Emelyanov
203c291467 cg: Restore tasks into proper cgroups
On restore, find out which cgroup sets tasks live in and move
them there.

Optimization note -- move tasks into cgroups _before_ forking
kids so that they inherit cgroups if required. This saves
a lot of time.

Accessibility note -- when moving tasks into cgroups don't
search for existing host mounts (they may not be available)
and don't mount temporary ones (may be impossible due to
user namespaces). Instead introduce a service fd with a yard
of mounts.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
1ba9d2cae9 cg: Dump cgroups tasks live in
Each task points to the single ID of the cgroup set it lives in. This
is done to save some space in the image, as tasks are likely to
live in the same set of cgroups.

Other than this, we keep track of which cgroup set we dump the
subtree from. If it happens that the root task lives in the same
cgroup set as criu does, we don't allow any other sub-cgroups,
which makes restore (next patch) much simpler and faster.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
8b8eb53a0a cg: Skeleton for cgroup code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
06f7243380 image: Add bits and pieces for cgroups image
The exact structure of the image will be revealed in the
next patch(es). What is important here is that the cgroup
image is somewhat new.

It will likely contain arrays of objects of different types,
so I introduce the "header" object that will link these
arrays using pb repeated fields. This will help us avoid
many image files for different cgroup objects and will bring
the number of write()-s required down to 1.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
b48e4cbfb8 proc: Introduce helper for parsing /proc/$pid/cgroup file
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
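For orientation, each line of /proc/$pid/cgroup has the form "hierarchy-ID:controller-list:cgroup-path". The following is a minimal sketch of splitting one such line, not the helper this commit introduces; the names `struct cg_line` and `parse_cgroup_line` and the buffer sizes are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed line; sizes are arbitrary. */
struct cg_line {
	int hierarchy;
	char controllers[64];
	char path[256];
};

/*
 * Split "hierarchy-ID:controller-list:cgroup-path" into its fields.
 * Returns 0 on success and -1 if the line is malformed or too long.
 */
static int parse_cgroup_line(const char *line, struct cg_line *out)
{
	const char *first = strchr(line, ':');
	const char *second;
	size_t len;

	if (!first)
		return -1;
	second = strchr(first + 1, ':');
	if (!second)
		return -1;

	if (sscanf(line, "%d", &out->hierarchy) != 1)
		return -1;

	/* Controller list sits between the two colons (may be empty). */
	len = (size_t)(second - first - 1);
	if (len >= sizeof(out->controllers))
		return -1;
	memcpy(out->controllers, first + 1, len);
	out->controllers[len] = '\0';

	/* The path runs up to the trailing newline, if any. */
	len = strcspn(second + 1, "\n");
	if (len >= sizeof(out->path))
		return -1;
	memcpy(out->path, second + 1, len);
	out->path[len] = '\0';

	return 0;
}
```

A caller would read the file line by line (for example with fgets()) and hand each line to the function.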
Pavel Emelyanov
e5eb73ea48 util: Introduce strstartswith helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
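A prefix-check helper of this kind is typically a thin wrapper around strncmp(). The sketch below shows one common way to write it; the name `starts_with` and its signature are assumptions and may differ from the strstartswith helper added by the commit.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: returns true if @str begins with @prefix.
 * An empty prefix matches any string.
 */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

For example, starts_with("cpuset,cpu", "cpuset") is true, while starts_with("cpu", "cpuset") is false because the string ends before the prefix does.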
Cyrill Gorcunov
c473461d24 vdso: Make it arch specific
Currently we build vDSO handling code for all archs provided
in the source code, with some "common" parts inside pie/vdso.c,
pie/vdso-stub.c, vdso-stub.c and vdso.c. This worked more or
less well, but in new linux kernels (starting from 3.16 presumably)
the vDSO has been significantly reworked, so every architecture
must have its own vDSO handling engine (just like the kernel does).

So in this patch we move the vDSO code to arch specific parts, and
because aarch64 actually doesn't implement proxification yet due to
kernel restrictions -- we drop it. Once there is kernel support,
we will bring it back in a proper arch/aarch64
implementation.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:41:31 +04:00
Cyrill Gorcunov
676708e3b3 vdso: Put CONFIG_VDSO where needed
Guard vDSO code with CONFIG_VDSO, no need to even build it
on archs which do not support vDSO handling.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:40:07 +04:00
Cyrill Gorcunov
89faae1e9b vdso: dump -- Drop duplicated include
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:40:00 +04:00
Cyrill Gorcunov
46661cf8b2 vdso: make -- Export VDSO and CONFIG_VDSO
We will need it to figure out if architecture
needs vDSO handling code to be built. Note
currently only x86 is exporting vDSO simply
because ARM support is not yet ready.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:39:51 +04:00
Filipe Brandenburger
ff1cf6b2ca zdtm: add missing entries to test/zdtm/.gitignore
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-26 17:38:56 +04:00
Filipe Brandenburger
5b14a32e35 zdtm: keep entries in test/zdtm/.gitignore sorted
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-26 17:38:54 +04:00
Filipe Brandenburger
733bbb337c make: use -pthread when linking with -lrt
This is required when linking statically.  It is also consistent with
the rules for other zdtm tests that link with -lrt.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-26 17:38:53 +04:00
Filipe Brandenburger
81fe07e54d make: respect USERCFLAGS in zdtm makefiles
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-26 17:38:52 +04:00
Filipe Brandenburger
2351d188c3 make: fix dependencies for "make criu" to work
Without the added dependencies, "make criu" will fail when trying to
build arch/x86/crtools.d because it can not find include/config.h. The
extra dependencies force the "config" rule to be processed before the
dependencies of "criu" are evaluated.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-26 17:38:51 +04:00
Filipe Brandenburger
c1c7d8984c make: clean up obj-ext-src-y objects on "make clean"
This is needed for lib/rpc.pb-c.{d,o} to be removed by "make clean".

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-26 17:38:49 +04:00
Andrey Vagin
f0cbc301fc mm: mark VM_IO and VM_PFNMAP VMA-s as unsupported
vmsplice doesn't work for such VMA-s.

These flags are set in the kernel function remap_pfn_range()
(remap kernel memory to userspace), which is widely used by device
drivers to provide direct access to device memory.

Reported-by: J F <jgmb45@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-23 13:34:16 +04:00
Cyrill Gorcunov
19906f9182 fsnotify: Use mnt_id when restoring mount mark
When there is a bind mount present on the same mountpoint
where the mark is laying, the device of both mountpoints
is the same, so we might end up getting the wrong mountpoint.
Instead we should use the mount id, which is unique among
all mountpoints.

Note it's a fast fix, I need to review the fsnotify code
more and make sure all corner cases are covered.

https://bugzilla.openvz.org/show_bug.cgi?id=2974

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-23 13:31:32 +04:00
Filipe Brandenburger
9f415abdb7 zdtm: fix cow01 test to fail on system errors
Add extra troubleshooting messages for errors and failures so that the
output can help troubleshoot the issue.

Make error handling more uniform by using -1 for errors (e.g. opening
files, reading from pagemap, etc.) and 1 for failures (data mismatch,
pages not COWed when expected to be, etc.)

Tested: Ran this test with criu versions known to have problems with COW
and with problems to read from pagemap due to the dumpable flag not
being preserved.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-21 00:51:44 +04:00
Andrey Vagin
8e1842aec1 zdtm/pty02: ignore SIGHUP
The test process opens master and slave points. The slave point becomes
the control terminal for this process. Then the test process forks a
second process and the first process closes the master point. So when the
master point is closed in the second process, the first one will be killed
by SIGHUP.

Reported-by: Victor Konyashkin <vkonyashkin@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 17:35:53 +04:00
Andrey Vagin
5a14695777 make: "make test" executes make in the test directory
test/Makefile knows better how to execute tests.
For example, it allows executing tests simultaneously.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 17:35:20 +04:00
Andrey Vagin
be10e0e157 git: update .gitignore
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 17:34:41 +04:00
Filipe Brandenburger
d5bb7e9748 dump: preserve the dumpable flag on criu dump/restore
Preserve the dumpable flag, which affects whether a core dump will be
generated, but also affects the ownership of the virtual files under
/proc/$pid after restoring a process.

Tested: Restored a process with a criu including this patch and looked
at /proc/$pid to confirm that the virtual files were no longer all owned
by root:root.

zdtm tests pass except for cow01 which seems to be broken.
(see https://bugzilla.openvz.org/show_bug.cgi?id=2967 for details.)

This patch fixes https://bugzilla.openvz.org/show_bug.cgi?id=2968

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Change-Id: I8c386508448a84368a86666f2d7500b252a78bbf
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:02:37 +04:00
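The dumpable flag can be read and written through prctl(2). The sketch below shows the two calls involved; the wrapper names `get_dumpable` and `set_dumpable` are hypothetical and this is not the code from the patch, which has to do the same thing across a dump and a later restore.

```c
#include <sys/prctl.h>

/* Returns the current dumpable flag of the calling process, or -1 on error. */
static int get_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/*
 * Sets the dumpable flag of the calling process.
 * Note: per prctl(2), only 0 and 1 can be set this way; an attempt
 * to set the value 2 is rejected with EINVAL.
 */
static int set_dumpable(int value)
{
	return prctl(PR_SET_DUMPABLE, value, 0, 0, 0);
}
```

A checkpoint tool would record the value returned by the getter at dump time and apply it with the setter in the restored task.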
Andrey Vagin
1a1b50168d vma: don't skip vmas during searching a parent vma
We stop searching if vma->start is bigger than the required one.
The cursor is set on the last examined vma. When we are searching for a
parent vma for the next vma, we start examining vma-s from
cursor->next, so we don't examine the vma which is pointed to by the cursor.

This patch replaces list_for_each_entry_continue with list_for_each_entry_from.

Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:31 +04:00
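The difference between the two iteration styles can be shown on a plain singly linked list: "continue" semantics start at the element after the cursor, while "from" semantics start at the cursor itself. This is an illustrative sketch only; `struct vma_node`, `find_after` and `find_from` are hypothetical names and do not use the kernel-style list macros mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vma list entry. */
struct vma_node {
	unsigned long start;
	struct vma_node *next;
};

/* "continue" semantics: the cursor itself is never examined. */
static struct vma_node *find_after(struct vma_node *cursor, unsigned long start)
{
	struct vma_node *n;

	for (n = cursor ? cursor->next : NULL; n; n = n->next)
		if (n->start == start)
			return n;
	return NULL;
}

/* "from" semantics: the search begins at the cursor. */
static struct vma_node *find_from(struct vma_node *cursor, unsigned long start)
{
	struct vma_node *n;

	for (n = cursor; n; n = n->next)
		if (n->start == start)
			return n;
	return NULL;
}
```

If the entry being looked for is the one the cursor points at, find_after() misses it and find_from() returns it, which mirrors the skipped-vma problem described in the commit message.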
Andrey Vagin
c838494034 mem: stop searching a parent vma if we found one
There can be only one parent vma.

Fixes: 57d25e7cea ("mm: fix expression to determine which vma-s can be shared")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:22 +04:00
Andrey Vagin
b57497ade5 mem: fix typo in determining an address of parent vma
Look at this hunk from 7659c995f5:
-    paddr = decode_pointer(vma_premmaped_start(&p->vma));
+    paddr = decode_pointer(vma->premmaped_addr);

Obviously we want to use p->premmaped_addr instead of
vma->premmaped_addr.

Fixes: 7659c995f5 ("vm: don't overwrite vma->shmid for private mappings")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:00:13 +04:00
Cyrill Gorcunov
ebd2e88c9a zdtm: maps05 -- Test a flood of small VMAs
Here we test that a big number of small VMAs survives c/r. The main
idea is to use a big number of IOVs needed for page transferring.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-12 14:46:48 +04:00
Cyrill Gorcunov
d8e5f617b5 mem: Don't shrink the number of IOVs needed for page transferring
In the worst case we need one IOV for every page we're transferring
from the parasite, thus don't divide by two here.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-12 14:46:41 +04:00
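Why one IOV per page is the upper bound can be seen from how adjacent pages coalesce: contiguous pages share one iovec, but pages separated by holes each need their own. The sketch below is an assumption-laden illustration, not criu's page-pipe code; `pages_to_iovs` and `SKETCH_PAGE_SIZE` are hypothetical names, and the input is assumed to be a sorted list of page-aligned addresses.

```c
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the example */

/*
 * Coalesce sorted page addresses into iovecs. The caller must provide
 * room for nr_pages entries in @iov, which is exactly the worst case.
 * Returns the number of iovecs used.
 */
static size_t pages_to_iovs(const unsigned long *pages, size_t nr_pages,
			    struct iovec *iov)
{
	size_t nr_iovs = 0, i;

	for (i = 0; i < nr_pages; i++) {
		if (nr_iovs) {
			struct iovec *last = &iov[nr_iovs - 1];
			unsigned long end = (unsigned long)last->iov_base + last->iov_len;

			/* Page directly follows the previous run: extend it. */
			if (end == pages[i]) {
				last->iov_len += SKETCH_PAGE_SIZE;
				continue;
			}
		}
		/* Otherwise start a new iovec for this page. */
		iov[nr_iovs].iov_base = (void *)pages[i];
		iov[nr_iovs].iov_len = SKETCH_PAGE_SIZE;
		nr_iovs++;
	}
	return nr_iovs;
}
```

Four contiguous pages collapse into a single iovec, whereas four pages with a hole after each one need four iovecs, so sizing the array at half the page count would overflow in the second case.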
Andrey Vagin
85d366e8e3 vdso: don't scan MAP_GROWSDOWN vmas
* VDSO is never mapped with MAP_GROWSDOWN
* The first page of a growsdown vma may be a guard page, so any attempt
  to read it fails with SIGBUS.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-12 14:46:33 +04:00
Cyrill Gorcunov
8a9e81b646 parasite-syscall: Print which @syscall_ip is selected
Useful for debugging.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-12 14:46:24 +04:00
Pavel Emelyanov
d513098f3c mount: Don't create kids with CLONE_NEWNS
We explicitly setns() every single task by hand when restoring
mount namespaces, so they can be created without the NEWNS flag.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
2014-05-12 14:20:17 +04:00
Andrey Vagin
63c9478a4a zdtm: add a new test case for growdown vma-s
This test case creates two consecutive grows down vmas with a hole
between them.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-07 21:03:50 +04:00
Andrey Vagin
20ec585916 mem: add a guard page only if here is enough space for it
Currently we don't add a guard page to a second consecutive growsdown vma,
even if there is enough space for it. That is wrong. Look at the following test
output:

Execute zdtm/live/static/grow_map03
./grow_map03 --pidfile=grow_map03.pid --outfile=grow_map03.out
Dump 3888
Restore
Test: zdtm/live/static/grow_map03, Result: FAIL
==================================== ERROR ====================================
Test: zdtm/live/static/grow_map03, Namespace:
Dump log   : /root/git/criu/test/dump/grow_map03/3888/1/dump.log
--------------------------------- grep Error ---------------------------------
------------------------------------- END -------------------------------------
Restore log: /root/git/criu/test/dump/grow_map03/3888/1/restore.log
--------------------------------- grep Error ---------------------------------
pie: Error (pie/restorer.c:465): Unable to remap 0x7f0da2c99000 -> 0x7f46425fc000
pie: Error (pie/restorer.c:969): Restorer fail 3888
(00.035621) Error (cr-restore.c:1590): Restoring FAILED.
------------------------------------- END -------------------------------------
================================= ERROR OVER =================================

strace:
mremap(0x7fc3de5b6000, 0, 0, MREMAP_MAYMOVE|MREMAP_FIXED, 0x7f38dd4e0000) = -1 EINVAL (Invalid argument)

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-07 21:03:34 +04:00
Andrey Vagin
3a9c6a3d37 util: use glibc macros to generate device numbers in the dev_t format
Our versions of the macros are wrong.

Our macros:
#define MINOR(dev)           ((dev) & 0xff)

Glibc function:
return (__dev & 0xff) | ((unsigned int) (__dev >> 12) & ~0xff);

Reported-by: Amey Deshpande <ameyd@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-07 21:02:35 +04:00
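The mismatch shows up as soon as a minor number does not fit in 8 bits. The sketch below contrasts an 8-bit mask like the one quoted above with glibc's minor() from <sys/sysmacros.h>; the names `NAIVE_MINOR`, `naive_minor` and `glibc_minor` are made up for the illustration.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Same shape as the old macro quoted in the commit message. */
#define NAIVE_MINOR(dev)	((dev) & 0xff)

/* Keeps only the low 8 bits, so larger minors are truncated. */
static unsigned int naive_minor(dev_t dev)
{
	return (unsigned int)NAIVE_MINOR(dev);
}

/* Uses the glibc macro, which also picks up the upper minor bits. */
static unsigned int glibc_minor(dev_t dev)
{
	return minor(dev);
}
```

For a device built with makedev(8, 300), the glibc variant reports minor 300, while the 8-bit mask reports 300 & 0xff, which is 44. For minors below 256 the two agree, which is presumably why the problem went unnoticed for a while.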
Cyrill Gorcunov
74a2cce189 page-pipe: Don't increase the page_pipe::nr_pipes if we reuse pipes
The page-pipe buffers may be reused once queued pages are
dumped, but we happen to increase page_pipe::nr_pipes
all the time, regardless of where the page buffer
came from.

In the worst case this may lead to an incorrect -EAGAIN returned
from page_pipe_grow, forcing the calling code to create new
pipes. This is not critical but should be fixed.

In other words, page_pipe::nr_pipes must track only the pipes
that were _really_ created.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-07 21:01:35 +04:00
Pavel Emelyanov
54a996b0f3 criu: Version 1.3-rc1
We have made a big step towards C/R of LXC containers -- made support
for nested mount namespaces and addressed the re-attach issue. Also
we have AArch64 support merged.

Some work is still to be done, but it's a good time to show what we
have so far, thus -- the 1.3-rc1.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v1.3-rc1
2014-04-25 15:40:17 +04:00
Christopher Covington
5d74f55d80 Don't say /proc in macro errors
It's possible that a procfs mounted somewhere other than /proc
is in use.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-25 13:25:13 +04:00
Cyrill Gorcunov
0c89d779f9 log: Include inttypes.h for PRI helpers
https://bugzilla.openvz.org/show_bug.cgi?id=2949

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-25 13:23:55 +04:00
Andrey Vagin
74c0e9e201 check: skip mnt_id support check for mainstream kernels (v2)
v2: s/tun/mnt_id
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-25 13:22:43 +04:00
Andrey Vagin
08960eb1fb zdtm: check mnt_id in /proc/PID/fdinfo/X
Nested mount namespaces should be checked only if fdinfo contains mnt_id

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-24 11:55:40 +04:00
Pavel Emelyanov
d4e4e57744 mnt: Get ns temp root path once when collecting data from image
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 17:40:39 +04:00
Pavel Emelyanov
5975c94e3c zdtm: Add testing of irmap cache generation
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 17:24:19 +04:00
Andrey Vagin
a20011aca6 ns: initialize nsid in rst_add_ns_id
Execute zdtm/live/static/pipe00
./pipe00 --pidfile=pipe00.pid --outfile=pipe00.out
Dump 3158
Restore
test/zdtm.sh: line 472:  3173 Segmentation fault      (core dumped) setsid  restore --file-locks --tcp-established -x -D  -o

Reported-by: Jenkins Criuovich
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 14:46:19 +04:00
Pavel Emelyanov
8d5822d9cb mnt: Factor out mntns nsid creation on restore
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 13:22:12 +04:00
Pavel Emelyanov
aca33ac402 mnt: Add comment why we need criu's mntns mounts for dump
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 13:10:21 +04:00
Pavel Emelyanov
f22591c99a files: Check for for mount to exist only once
The nsid lookup will search for mount in case mnt_id
is given. No need to do it twice (the 2nd time for
sanity check).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 03:01:21 +04:00
Pavel Emelyanov
74357818f8 unix: Get ns root fd only once.
This makes mntns_get_root_fd usage more natural.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 02:51:11 +04:00
Pavel Emelyanov
8ef1d378bf mnt: Add comments on tricky places
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 02:47:42 +04:00
Pavel Emelyanov
3d11c6c0ed mnt: Sanitize prepare_mnt_ns nsids loop
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 02:35:04 +04:00