mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 05:18:00 +00:00

4796 Commits

Author SHA1 Message Date
Tycho Andersen
8ff0b1ef06 cg: Use the right path offset to restore properties
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-22 19:17:28 +04:00
Saied Kazemi
d8b41b6525 Added AUFS support.
The AUFS support code handles the "bad" information that we get from
the kernel in /proc/<pid>/map_files and /proc/<pid>/mountinfo files.
For details see comments in sysfs_parse.c.

The main motivation for this work was dumping and restoring Docker
containers which by default use the AUFS graph driver.  For dump,
--aufs-root <container_root> should be added to the command line options.
For restore, there is no need for AUFS-specific command line options
but the container's AUFS filesystem should already be set up before
calling criu restore.

[ xemul: With AUFS, sometimes (in particular, for a mapping of an
  executable file, likely the one created at ELF load) the
  /proc/pid/map_files/xxx link target shows not the path by which
  the file is seen in AUFS, but the path by which AUFS accesses
  this file from one of its "branches". To fix the path, we get
  the info about branches from sysfs and, when we meet such a
  file, cut the branch part of the path. ]

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-21 18:35:22 +04:00
Pavel Emelyanov
1514284d84 locks: Fix restore from v1.2 images
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-20 17:38:36 +04:00
Andrey Vagin
c049d8452d files: don't check uninitialized memory in create_link_remap()
Look at this strace output:
107   linkat(45, "", 1017, "./root/git/orig/criu/test/zdtm/live/static/unlink_fstat03.test (deleted)/link_remap.4", AT_EMPTY_PATH) = -1 ENOENT (No such file or director

It's obvious that we didn't cut the file name.

The error is in calculating the offset of the last character.
The current code sets this offset to strlen(),
but it is actually strlen() - 1.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-20 14:02:43 +04:00
Pavel Emelyanov
546f2701f0 signals: Comments and while (1) loop
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 15:27:54 +04:00
Pavel Emelyanov
11fc475853 signals: Sanitize j loop control variable
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 15:27:40 +04:00
Pavel Emelyanov
f9ebd18354 signals: Don't collect siginfo_t-s on stack
We've moved siginfos into the core entry, so the siginfo-s
themselves can no longer sit on the stack. Otherwise we would
overwrite them with the next batch and hand a stack pointer
to the caller, causing stale data and garbage from the stack
to be written into the image instead of siginfo data.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 15:27:19 +04:00
Pavel Emelyanov
92664c5220 signals: Don't forget to allocate SiginfoEntry
The se variable is just an array of pointers to these
objects. We need to allocate the objects themselves.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 15:25:57 +04:00
Pavel Emelyanov
8197bae072 signals: Move nr variable into peeking loop
And sanitize its usage a little bit.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 15:25:13 +04:00
Pavel Emelyanov
22082b0e55 signals: Calculate peek offset in-place
No need in extra variable for that.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 15:24:36 +04:00
Pavel Emelyanov
4778cb30bb zdtm: Remove cgroup02 out of runlist
It fails on moving tasks into cpuset due to empty masks. Temporarily
disable the test.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:39:29 +04:00
Pavel Emelyanov
ddd837d9e9 rst: Fix core pointer passed into reading thread core image
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:19:06 +04:00
Ruslan Kuprieiev
9b2d1774ba image: mark CR_FD_SIGNAL and CR_FD_PSIGNAL as obsoleted and don't create signal-s*.img, v2
After this patch, signal-s*.img won't be created.

v2: just move them to the end of array

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:09:49 +04:00
Ruslan Kuprieiev
68501cde88 dump: dump signals into signals_*
Every thread has its own private signals stored at thread_core->signals_p,
and the leader thread also has shared signals stored at tc->signals_s.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:09:47 +04:00
Ruslan Kuprieiev
aac9fd5bad dump: allocate task cores in collect_task() instead of parasite_infect_seized()
We need it to be able to dump signals into cores
before calling parasite_infect_seized().

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:09:46 +04:00
Ruslan Kuprieiev
60ef59c7ff restore: use signals_s and signals_p to prepare signals
To preserve backward compatibility, criu will try to open signal*.img
if no signals_* are found.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:09:45 +04:00
Ruslan Kuprieiev
235a41fcf9 restore: open cores for each thread early and store them at current->core
We need to open cores for each thread early, because we'll need them to
prepare signals later.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:09:44 +04:00
Ruslan Kuprieiev
2950f5ea6e protobuf: add signal_queue_entry and use it in thread_core_entry and task_core_entry
Add signal_queue_entry signals_s for shared signals to task_core_entry
and signals_p for private signals to thread_core_entry.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:09:43 +04:00
Pavel Emelyanov
f781ba0466 rst: Rework task_entries to use rst_mem engine
The task_entries is a small structure used to coordinate the
process restore stages. Currently we allocate one page for
it and handle it separately. There is no need for this complexity:
the rst_mem engine is already capable of managing this small object.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:00:10 +04:00
Pavel Emelyanov
a9484a916a shmem: Fix format of printing shmem addresses
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:00:09 +04:00
Pavel Emelyanov
5f9acc8dc9 shmem: Explicitly initialize rst_shmems
This is a position in the RM_SHREMAP memory. Since shmems are currently
the only user of it, zero is a valid value, but this will change soon.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 13:00:07 +04:00
Tycho Andersen
aa72de67fd tests: add a test for --cgroup-root
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 12:59:10 +04:00
Tycho Andersen
94f6c87c9f cg: add --cgroup-root option
The motivation for this is to be able to restore containers into cgroups other
than what they were dumped in (if, e.g. they might conflict with an existing
container). Suppose you have a container in:

memory:/mycontainer
cpuacct,cpu:/mycontainer
blkio:/mycontainer
name=systemd:/mycontainer

You could then restore them to /mycontainer2 via --cgroup-root /mycontainer2.
If you want to restore different controllers to different paths, you can
provide multiple arguments, for example, passing:

--cgroup-root /mycontainer2 --cgroup-root cpuacct,cpu:/specialcpu \
--cgroup-root name=systemd:/specialsystemd

Would result in things being restored to:

memory:/mycontainer2
cpuacct,cpu:/specialcpu
blkio:/mycontainer2
name=systemd:/specialsystemd

i.e. a --cgroup-root without a controller prefix specifies the new default root
for all cgroups.
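
The resolution rule described above can be sketched as follows. This is a
minimal illustration, not criu's code: the function name
resolve_cgroup_roots is hypothetical, and it assumes that an argument
starting with "/" is the prefix-less default and that controller sets are
matched by exact string.

```python
def resolve_cgroup_roots(dumped, root_args):
    """Map each dumped controller set to the path it would be restored to.

    dumped:    list of (controllers, path) pairs, e.g. ("cpuacct,cpu", "/mycontainer")
    root_args: the values passed to --cgroup-root, in command-line order
    """
    default_root = None
    per_controller = {}
    for arg in root_args:
        if arg.startswith("/"):
            # Assumption: no controller prefix means "new default root".
            default_root = arg
        else:
            controllers, _, path = arg.partition(":")
            per_controller[controllers] = path

    resolved = []
    for controllers, path in dumped:
        # A controller-specific root wins, then the default, then the dumped path.
        fallback = default_root if default_root is not None else path
        resolved.append((controllers, per_controller.get(controllers, fallback)))
    return resolved
```

Called with the four cgroups and the three --cgroup-root values from the
example, this yields the restored layout listed above.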

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 12:58:36 +04:00
Andrey Vagin
513b0dc3e0 zdtm_ct: call setsid() to move in another autogroup
Transition and streaming tests can create many processes
which are using cpu. CPU should be divided between tests fairly.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 12:57:36 +04:00
Garrison Bellack
a152c843b8 Quick patch for error when writing mem.lim default
When writing the system default for memory.limit_in_bytes (which is a LLONG_MAX)
the write fails. The number is equivalent to -1 (unlimited). So during dump,
store the number -1 instead.
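
A minimal sketch of that substitution, assuming the limit is read as a
decimal string; normalize_limit is a hypothetical name, not a criu
function.

```python
LLONG_MAX = 2**63 - 1  # 9223372036854775807


def normalize_limit(raw):
    """Return the value to store in the image for memory.limit_in_bytes."""
    value = int(raw.strip())
    # LLONG_MAX means "unlimited"; store -1, which the kernel accepts on write.
    return "-1" if value == LLONG_MAX else str(value)
```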

Change-Id: Iafccc96bf5dbade763d7addaeda24194616e4d5f
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-18 14:50:00 +04:00
Sophie Blee-Goldman
e606c2141e Dump capabilities from the parasite
Needed for future user namespace support. Capabilities will have to be
dumped from the parasite, i.e. from inside the namespace, since there is no
obvious way to 'translate' capabilities from the global namespace (unlike
with uids and gids, where the id mappings can be used for translation).

[ additional explanation from Andrew Vagin:

"capabilities" are not translated between namespaces. They can exist
only in one userns, where a process lives. If a process is created in a
new userns, it gets a full set of capabilities in this userns, and
loses all caps in a parent userns.

So if capabilities are not shown in /proc/pid/stat, we have no way to
get them except by using parasite code. ]

Signed-off-by: Sophie Blee-Goldman <ableegoldman@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 23:10:44 +04:00
Sophie Blee-Goldman
4940776620 Move function definition
Moves the definition of kerndat_init() to below the definition
of get_last_cap(). Needed for reading capabilities in a future patch.

Signed-off-by: Sophie Blee-Goldman <ableegoldman@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 23:09:51 +04:00
Cyrill Gorcunov
e90d0f1214 zdtm: pty00 -- Count for SIGHUP
Just to make sure we're not losing signals
after restore.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:15:12 +04:00
Tycho Andersen
37cf27d33e cg: path buffer should be PATH_MAX long
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:14:14 +04:00
Garrison Bellack
95e689db42 cg: Make lacking properties during dump non-fatal
Because different kernel versions have different cgroup properties, criu
shouldn't crash just because the properties statically listed aren't exact.
Instead, during dump, ignore properties the kernel doesn't have and continue.
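
The idea can be sketched like this. It is an illustration only:
dump_properties is a hypothetical helper, and the property names in the
usage note are arbitrary examples.

```python
import os


def dump_properties(cgroup_dir, property_names):
    """Read the listed property files from cgroup_dir, skipping absent ones."""
    values = {}
    for name in property_names:
        try:
            with open(os.path.join(cgroup_dir, name)) as f:
                values[name] = f.read().strip()
        except FileNotFoundError:
            # This kernel does not expose the property: skip it and go on.
            continue
    return values
```

With a directory that contains only cpu.shares, asking for cpu.shares and
a property the kernel lacks returns just the cpu.shares entry instead of
failing the dump.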

Change-Id: I5a8b93d6a8a3a9664914f10cf8e2110340dd8b31
Signed-off-by: Garrison Bellack <gbellack@google.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:13:38 +04:00
Andrew Vagin
2d1f5a06c8 zdtm: don't use same cgroup names for a few tests (v2)
We run tests concurrently and they can race for the same resources

v2: fix hooks too
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:12:04 +04:00
Andrew Vagin
73fc3a775a zdtm/cgroup01: create more than one empty cgroups
We found a bug where a second cgroup is restored incorrectly,
so let's create one more empty cgroup.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:11:25 +04:00
Andrew Vagin
bbdff34803 cgroup: don't overwrite the offset value in a loop (v2)
prepare_cgroup_dirs() gets a path and an offset.
Then we add substrings to the source string and handle them.

v2: fix one more place in prepare_cgroup_dir_properties()

Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:11:10 +04:00
Cyrill Gorcunov
14c65e91fa cg: Drop redundant newline from pr_perror
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 22:10:25 +04:00
Andrey Vagin
bff466c291 tty: open tty-s with O_NOCTTY
When we open tty, we don't want to set it as controlling terminal.

[xemul: We do it in all the other places, this one is forgotten.
 The "controlling tty" feature is setup explicitly later with
 the ioctl (TIOCSCTTY) call. ]

This bug was caught by pty04, where we get an unexpected SIGCONT,
which is sent after a controlling terminal is closed.

./pty04 --pidfile=pty04.pid --outfile=pty04.out
Dump 9578
Restore
Test: zdtm/live/static/pty04, Result: FAIL
==================================== ERROR ====================================
Test: zdtm/live/static/pty04, Namespace:
Dump log   : /home/jenkins/workspace/Rpi-CRIU/test/dump/static/pty04/9578/1/dump.log
--------------------------------- grep Error ---------------------------------
------------------------------------- END -------------------------------------
Restore log: /home/jenkins/workspace/Rpi-CRIU/test/dump/static/pty04/9578/1/restore.log
--------------------------------- grep Error ---------------------------------
(00.083420) Error (cr-restore.c:1092): 9578 killed by signal 0
(00.083708) Error (cr-restore.c:1713): Restoring FAILED.
------------------------------------- END -------------------------------------
================================= ERROR OVER =================================

Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 13:28:16 +04:00
Cyrill Gorcunov
994ae676b4 restore: Set CLONE_PARENT iff pdeath_sig is present, v4
It's been discovered that on 3.11 the restore might fail
if we pass the @CLONE_PARENT flag to the clone() call, due to
kernel limitations.

Because we're treating 3.11 as the base working kernel, let's
do a trick instead:

 - set this flag iff pdeath_sig is present
 - if CLONE_NEWPID is passed warn a user about
   potential consequences.
 - because we need to carry the condition in attach_to_tasks
   call, introduce @root_as_sibling variable for this.
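
One reading of the three points above, as a sketch. The function name and
return shape are hypothetical, and it assumes the warning is only needed
when CLONE_PARENT ends up combined with CLONE_NEWPID; the flag values are
the Linux ones.

```python
CLONE_PARENT = 0x00008000
CLONE_NEWPID = 0x20000000


def root_clone_flags(base_flags, pdeath_sig):
    """Return (flags, root_as_sibling, warning) for cloning the root task."""
    # CLONE_PARENT is added only when the root task has a pdeath_sig set.
    root_as_sibling = pdeath_sig != 0
    flags = base_flags
    warning = None
    if root_as_sibling:
        flags |= CLONE_PARENT
        if flags & CLONE_NEWPID:
            warning = "CLONE_PARENT | CLONE_NEWPID may not work on all kernels"
    return flags, root_as_sibling, warning
```

The root_as_sibling value is returned so that a later step (the
attach_to_tasks call in the message) can use the same condition.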

CC: Tycho Andersen <tycho.andersen@canonical.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-15 13:26:36 +04:00
Andrey Vagin
47fae013b5 zdtm: add a small program to create a zdtm container (v2)
I didn't find a way to do that with the help of "unshare".
It's simpler to write this program, and it looks better than the
tricks in zdtm.sh.

v2: proxify exit status

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Ruslan Kuprieiev <kupruser@gmail.com>
Acked-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 18:28:38 +04:00
Andrey Vagin
0b33bac3bc criu: allow the root task to handle SIGCHLD
If the criu process attaches to the root task with ptrace (this happens
for opts.swrk_restore and opts.restore_detach), then any signal delivered
to the root would also be reported to criu. The latter would treat the
former as having died due to this delivery and would abort the restore.

Fix it by checking that criu (current == NULL) gets a ptrace notification
(si_code == CLD_TRAPPED) about a delivered signal (si_status == SIGCHLD;
no other signals are allowed by the restoring tasks).
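
The check reads roughly like the predicate below. It is a sketch in
Python rather than the C of the patch: is_benign_root_sigchld is a
hypothetical name, current stands for the restorer's "current task"
pointer (None for criu itself), and CLD_TRAPPED is hard-coded to its
Linux value.

```python
import signal

CLD_TRAPPED = 4  # Linux si_code for "traced child has trapped"


def is_benign_root_sigchld(current, si_code, si_status):
    """True if the SIGCHLD seen by criu is only a ptrace report about the root."""
    return (
        current is None                    # we are criu, not a restored task
        and si_code == CLD_TRAPPED         # ptrace notification, not a death
        and si_status == signal.SIGCHLD    # the only signal allowed at this stage
    )
```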

This patch fixes the following error of static/zombie00:

Execute zdtm/live/static/zombie00
./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
Dump 2207
Restore
Test: zdtm/live/static/zombie00, Result: FAIL
==================================== ERROR ====================================
Restore log: /root/git/orig/criu/test/dump/static/zombie00/2207/1/restore.log
(00.026826) Error (cr-restore.c:1085): 2207 killed by signal 17
(00.026985) Error (cr-restore.c:1706): Restoring FAILED.
================================= ERROR OVER =================================

Reported-by: Mr Jenkins
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 17:09:53 +04:00
Pavel Emelyanov
bcd1649699 cg: Use relative paths in cgroup dirs image
Before the patch, the cg tree section from the cgroup00 test looked like this:

{
	cnames: "name=zdtmtst"
	dirs: 	{
		path: "/subcg"
		children: 		{
			path: "/subcg/subsubcg"
			children: <empty>
			properties: <empty>
		}

		properties: <empty>
	}

}

The /subcg prefix in the children is redundant. Turn paths into directory
names. Now the section looks like:

{
	cnames: "name=zdtmtst"
	dirs: 	{
		dir_name: "subcg"
		children: 		{
			dir_name: "subsubcg"
			children: <empty>
			properties: <empty>
		}

		properties: <empty>
	}

}
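
The conversion between the two layouts can be sketched as below. The
dictionaries only mimic the decoded image shown above; to_dir_names is a
hypothetical helper, not part of criu.

```python
import posixpath


def to_dir_names(node, parent_path="/"):
    """Convert a {'path', 'children'} node into a {'dir_name', 'children'} node."""
    return {
        # Keep only the part of the path below the parent directory.
        "dir_name": posixpath.relpath(node["path"], parent_path),
        "children": [
            to_dir_names(child, node["path"]) for child in node.get("children", [])
        ],
    }
```

Applied to a node with path "/subcg" and a child "/subcg/subsubcg", it
produces the dir_name values "subcg" and "subsubcg".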

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
2014-08-14 12:27:19 +04:00
Pavel Emelyanov
bf91821f11 cg: Fix restoration of tasks into existing cgroups
When we omit --manage-cgroups on dump, the controllers section
in the cgroups image lacks the none-d entries (name=systemd is the
most typical).

If the init task happens to live in a non-criu cgset (this can
occur if we do a --shell-job dump from another terminal and criu
and the root task live in different user.slice systemd cgroups), then
on restore move_in_cgroup() would fail to look up the required
controller.

In order to fix this we should still call the collect_cgroups()
on dump, so that it adds the none-d controllers into the list,
but don't dump the dirs tree itself.

The patch looks ugly, but it just moves the current_controller
evaluation from the middle of the loop upwards (and renames the
char *opts variable not to conflict with global opts).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
2014-08-14 12:26:56 +04:00
Tycho Andersen
e301b1d56c restore: --restore-detached implies CLONE_PARENT
We need to use CLONE_PARENT to prevent processes from immediately dying due to
pdeath_sig when they are restored in detached mode.

[ xemul: One more place which requires check for restore-detach
         is in sigactions preparation ]

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:25:07 +04:00
Andrey Vagin
9d4e5370f1 zdtm/ipc_namespace: set the auto_msgmni sysctl to zero
We are going to execute tests concurrently, but if auto_msgmni is
enabled, msgmni is recalculated each time an ipcns is created
or removed.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:21:53 +04:00
Andrey Vagin
edca5ab0af sysctl: don't write '\0' at the end of buffer in a sysctl file
It isn't required. The kernel has a bug in handling auto_msgmni and
if we send extra symbols, a new value isn't applied.
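
A sketch of a write that sends only the value bytes, in Python rather
than the C of the patch; write_sysctl is a hypothetical helper and the
path in the usage note is just an example.

```python
import os


def write_sysctl(path, value):
    """Write value to a sysctl file without any trailing NUL byte."""
    data = str(value).encode("ascii")  # exactly the digits, no b"\0" appended
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```

For example, write_sysctl("/proc/sys/kernel/auto_msgmni", 0) would send
the single byte "0".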

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:21:48 +04:00
Andrey Vagin
64405c1d5b ipc: set the msgmni sysctl after auto_msgmni
Because setting auto_msgmni recalculates the value of msgmni.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:21:43 +04:00
Andrey Vagin
20578e63cf zdtm/ipc_namespaces: don't write extra symbols to a sysctl file
The kernel has a bug in handling auto_msgmni and if we send extra
symbols, a new value isn't applied.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:21:38 +04:00
Andrey Vagin
6705051282 syscall: don't use pr_info to print a part of string
Before:
(00.009468)     87: sysctl: <kernel/sem> = <(00.009475)     87: 2108913153 (00.009481)     87: 1252387386 (00.009486)     87: 835139248 (00.009491)     87: 320896030 (00.009496)     87: >
After:
(00.009468)     87: sysctl: <kernel/sem> = <2108913153 1252387386 835139248 320896030 >
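
The general fix is to build the whole line first and log it once, so the
logger's timestamp and pid prefix appears a single time. A sketch, with
the hypothetical helper format_sysctl_line standing in for the real
formatting code:

```python
def format_sysctl_line(name, values):
    """Build one complete log line for a multi-value sysctl."""
    # Join all values up front instead of logging each one separately.
    body = "".join(f"{v} " for v in values)
    return f"sysctl: <{name}> = <{body}>"
```

The result is then passed to the logging call in one piece.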

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-14 12:20:36 +04:00
Cyrill Gorcunov
d7ff4a1319 test: bers -- Add short help output
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-13 15:50:54 +04:00
Sophie Blee-Goldman
3faaed2f64 Bug-fix in size calculation
Fixes a bug in how PARASITE_MAX_GROUPS was calculated, and adds a
compiler check to assert that parasite_dump_creds doesn't exceed
the page size.

Signed-off-by: Sophie Blee-Goldman <ableegoldman@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-13 13:04:58 +04:00
Tycho Andersen
ded04267f8 scripts: set CRIU_IMAGE_DIR when running scripts
When doing a restore for LXC, we store some other metadata (which bridge a veth
was on) in the image directory so that the restore script can correctly unlock
a network device and attach it to the right interface. This patch is needed so
that the script can find this metadata.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 22:43:37 +04:00
Pavel Emelyanov
44926184a1 cg: Don't copy path when restoring properties
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 22:32:22 +04:00