2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 12:57:57 +00:00

181 Commits

Author SHA1 Message Date
Saied Kazemi
e3fec5f8eb Ignore mnt_id value for AUFS file descriptors.
Starting with version 3.15, the kernel provides a mnt_id field in
/proc/<pid>/fdinfo/<fd>.  However, the value provided by the kernel for
AUFS file descriptors obtained by opening a file in /proc/<pid>/map_files
is incorrect.

Below is an example for a Docker container running Nginx.  The mntid
program below mimics CRIU by opening a file in /proc/1/map_files and
using the descriptor to obtain its mnt_id.  As shown below, mnt_id is
set to 22 by the kernel but it does not exist in the mount namespace of
the container.  Therefore, CRIU fails with the error:

	"Unable to look up the 22 mount"

In the global namespace, 22 is the root of AUFS (/var/lib/docker/aufs).

This patch sets the mnt_id of these AUFS descriptors to -1, mimicing
pre-3.15 kernel behavior.

	$ docker ps
	CONTAINER ID        IMAGE                    ...
	3850a63ee857        nginx-streaming:latest   ...
	$ docker exec -it 38 bash -i
	root@3850a63ee857:/# ps -e
	  PID TTY          TIME CMD
	    1 ?        00:00:00 nginx
	    7 ?        00:00:00 nginx
	   31 ?        00:00:00 bash
	   46 ?        00:00:00 ps
	root@3850a63ee857:/# ./mntid 1
	open("/proc/1/map_files/400000-4b8000") = 3
	cat /proc/49/fdinfo/3
	pos:	0
	flags:	0100000
	mnt_id:	22
	root@3850a63ee857:/# awk '{print $1 " " $2}' /proc/1/mountinfo
	87 58
	103 87
	104 87
	105 104
	106 104
	107 104
	108 87
	109 87
	110 87
	111 87
	root@3850a63ee857:/# exit
	$ grep 22 /proc/self/mountinfo
	22 21 8:1 /var/lib/docker/aufs /var/lib/docker/aufs ...
	44 22 0:35 / /var/lib/docker/aufs/mnt/<ID> ...
	$

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-09 14:07:40 +03:00
Pavel Emelyanov
89d9578730 proc: Print parsed fstype for unsupported mounts
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-27 16:15:22 +03:00
Pavel Emelyanov
c9b6614eef proc: Remove now pointless debug
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-26 15:05:32 +03:00
Saied Kazemi
2749d9e6ea Rework fixup_aufs_vma_fd() for non-AUFS links
This patch reworks fixup_aufs_vma_fd() to let symbolic links in
/proc/<pid>/map_files that are not pointing to AUFS branch names follow
the non-AUFS applcation logic.

The use case that prompted this commit was an application mapping
/dev/zero as shared and writeable which shows up in map_files as:

lrw------- ... 7fc5c5a5f000-7fc5c5a60000 -> /dev/zero (deleted)

If the AUFS support code reads the link, it will have to strip off the
" (deleted)" string added by the kernel but core CRIU code already
does this.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-22 14:56:40 +03:00
Pavel Emelyanov
08c204820f aio: Dump AIO rings
When AIO context is set up kernel does two things:

1. creates an in-kernel aioctx object
2. maps a ring into process memory

The 2nd thing gives us all the needed information
about how the AIO was set up. So, in order to dump
one we need to pick the ring in memory and get all
the information we need from it.

One thing to note -- we cannot dump tasks if there
are any AIO requests pending. So we also need to
go to parasite and check the ring to be empty.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-26 18:13:36 +03:00
Pavel Emelyanov
86c0c5fb99 proc: Allocate and get vma fstat in vma_get_mapfile
We will need to detect aio mappings soon, so this is a preparation,
that makes future patching simpler.

Also move aufs stat-ing into aufs code to keep more aufs logic in
one place.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-25 21:10:15 +03:00
Pavel Emelyanov
6a6cdb8d4a proc: Drop always true last argument of parse_smaps()
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-12-22 13:52:03 +03:00
Pavel Emelyanov
ab2c1e426c proc_parse: Invert supported VMA check
It's for more natural adding of new else-if branch for aio.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-17 18:11:40 +03:00
Pavel Emelyanov
d4c62b6b5c proc_parse: Print vma start in hex
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-17 18:10:54 +03:00
Cyrill Gorcunov
88031bf89e proc_parse: Convert parse_pid_status to BFD engine
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:18:27 +04:00
Pavel Emelyanov
19a76494a9 kerndat: Collect all global variables on one struct
Not to spoil the global namespace and unify the kerndat
data names.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:14:53 +04:00
Andrey Vagin
71a9cd0634 proc: delete parse_pid_stat_small() (v2)
It's unused now.

v2: remove the proc_pid_stat_small struct too.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:15:37 +04:00
Andrey Vagin
05943959a5 proc: parse state and ppid from /proc/pid/status (v2)
v2: don't leak FILE

CID 73423 (#1 of 1): Resource leak (RESOURCE_LEAK)
15. leaked_storage: Variable f going out of scope leaks the storage it points to.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:15:03 +04:00
Andrey Vagin
fc84aa581a flock: blocked processes are not interesting for us (v2)
All out processes are stopped in a moment, when file locks are
collected, so they can't to wait any locks.

Here is a proof of this theory:
[root@avagin-fc19-cr ~]# flock xxx sleep 1000 &
[1] 23278
[root@avagin-fc19-cr ~]# flock xxx sleep 1000 &
[2] 23280
[root@avagin-fc19-cr ~]# cat /proc/locks
1: FLOCK  ADVISORY  WRITE 23278 08:03:280001 0 EOF
1: -> FLOCK  ADVISORY  WRITE 23280 08:03:280001 0 EOF
[root@avagin-fc19-cr ~]# gdb -p 23280
(gdb) ^Z
[3]+  Stopped                 gdb -p 23280
[root@avagin-fc19-cr ~]# cat /proc/locks
1: FLOCK  ADVISORY  WRITE 23278 08:03:280001 0 EOF

Currently criu can dump nothing, if we have one process which is
waiting a lock. I don't see any reason to do this.

v2: typo fix

Cc: Qiang Huang <h.huangqiang@huawei.com>
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-05 16:34:52 +04:00
Andrey Vagin
7cb829b78f proc: don't leak memory
CID 73370: Resource leak (RESOURCE_LEAK)
13. leaked_storage: Variable timer going out of scope leaks the storage it points to.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-05 15:46:59 +04:00
Pavel Emelyanov
2e91a9c814 bfd: Don't flush read-only images
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-05 15:38:17 +04:00
Andrey Vagin
2464ad08d6 locks: print a lock before reporting an error about it
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-30 15:16:17 +04:00
Cyrill Gorcunov
4135f6cd1c proc_parse: parse_smaps -- Use @file_path instead of strstr helper
strstr is a really heavy one, lets use already defined
and filled @file_path variable instead.

Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-27 21:28:18 +04:00
Cyrill Gorcunov
4ad462b459 mount: proc-parse -- Show @mnt_id on debug print as well
This is convenient when need to lookup into debug prints
and check which mount point were used somewhere else
(in particular I will need @mnt_id in tty code so
 on error I can easily figure out which mountpoint has
 been used).

No func changes.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:21:16 +04:00
Cyrill Gorcunov
ae96d21a07 bfd: Use ERR_PTR and such instead of BREADERR
No need to invent new error codes here, simply
use ERR_PTR/IS_ERR_OR_NULL and such.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-02 14:56:39 +04:00
Cyrill Gorcunov
c01efda8af bfd: timerfd -- Fix parsing typo
While been converting reading of data stream
to bfd the @buf member was left untouched leading
to incorrect data to be read, fix it setting up
proper one, ie @str itself, otherwise dumping
of timerfd files are failing.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-30 11:48:15 +04:00
Pavel Emelyanov
e651a6eba4 filemap: Get vma mnt_id early
We have a, well, issue with how we calculate the vma's mnt_id.

Right now get one via criu side file descriptor that it got by
opening the /proc/pid/map_files/ link. The problem is that these
descriptors are 'merged' or 'borrowed' by adjacent vmas from
previous ones. Thus, getting the mnt_id value for each of them
makes no sense -- these files are the same.

So move this mnt_id getting earlier into vma parsing code. This
brings a potential problem -- if we have two adjacent vmas
mapping the same inode (dev:ino pair) but living in different
mount namespaces -- this check would produce wrong result.
"Wrong" from the perspective that on restore correct file would
be opened from wrong namespace.

I propose to live with it, since this is not worse than the
--evasive-devices option, it's _very_ unlikely, but saves a lot
of openeings.

Note, that in case app switched mount namespace and then mapped
some new library (with dlopen) things would work correctly -- new
vmas will likely be not adjacent and for different dev:ino.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:20:55 +04:00
Pavel Emelyanov
cf8c9ae870 vma: Reshuffle the struct vma_area
We have some fields, that are dump-only and some that
are restore only (quite a lot of them actually).

Reshuffle them on the vma_area to explicitly show which
one is which. And rename some of them for easier grep.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:19:55 +04:00
Pavel Emelyanov
cfce460b48 proc_parse: Rework timers parser to use bfd
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-23 20:49:16 +04:00
Pavel Emelyanov
cc4a67b3ed proc_parse: Rework smaps parser to use bfd
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-23 20:49:07 +04:00
Pavel Emelyanov
2c8af6b8e6 proc_parse: Rework fdinfo parser to use bfd
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-23 20:48:58 +04:00
Pavel Emelyanov
d3b634283e proc: Use fopen_proc instead of fopen("/proc...")
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-18 20:26:20 +04:00
Pavel Emelyanov
6e960f1fc5 proc: Use fopen_proc in fdinfo parsing
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-18 20:26:12 +04:00
Cyrill Gorcunov
64a7aa55eb cg: Fix separator search in parse_task_cgroup
If there is no separator in first place we should
avoid implicit + 1 which make @name = 1 in worst case.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-18 20:19:00 +04:00
Cyrill Gorcunov
b99b76b045 cg: proc_parse -- Don't compare cgroup paths
When we compare sets in cg_set_compare() we presume that controller
names are properly sorted but because of use of strcmp(cc->path, path)
it's not true. In particular in case if there are two same sets which
differ in paths only

(00.126812) cg:  `- New css ID 2
(00.127051) cg:     `- [memory] -> [/vz-1]
(00.127079) cg:     `- [name=systemd] -> [/vz-1]
(00.127108) cg:     `- [net_cls] -> [/vz-1]

(00.239829) cg:  `- New css ID 3
(00.240067) cg:     `- [memory] -> [/vz-1]
(00.240096) cg:     `- [net_cls] -> [/vz-1]
(00.240154) cg:     `- [name=systemd] -> [/vz-1/system.slice/dbus.service]

we currently refuse to dump such configuretion. Thus remove
path comparision from the first place.

CC: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-16 23:14:47 +04:00
Andrey Vagin
f88d72d0bd mount: strip options for all mounts
Currently we stript options only one of brothers, but
mount_equal() thinks that two brothers should have the same options.

Execute zdtm/live/static/mountpoints
./mountpoints --pidfile=mountpoints.pid --outfile=mountpoints.out
Dump 2737
WARNING: mountpoints returned 1 and left running for debug needs
Test: zdtm/live/static/mountpoints, Result: FAIL
==================================== ERROR ====================================
Test: zdtm/live/static/mountpoints, Namespace:
Dump log   : /root/git/criu/test/dump/static/mountpoints/2737/1/dump.log
--------------------------------- grep Error ---------------------------------
(00.146444) Error (mount.c:399): Two shared mounts 50, 67 have different sets of children
(00.146460) Error (mount.c:402): 67:./zdtm_mpts/dev/share-1 doesn't have a proper point for 54:./zdtm_mpts/dev/share-3/test.mnt.share
(00.146820) Error (cr-dump.c:1921): Dumping FAILED.
------------------------------------- END -------------------------------------
================================= ERROR OVER =================================

Reported-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Tested-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-10 18:33:08 +04:00
Pavel Emelyanov
b6e3223a1e locks: Don't skip out-of-tree flocks
These guys may have pids that are not met in pstree.
This is not the reason for skipping those, try to
resolve flocks anyway.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 19:54:28 +04:00
Pavel Emelyanov
efac9ed8b3 locks: Parse lock type earlier
Same reason as for previous patch.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 17:44:39 +04:00
Pavel Emelyanov
0095b40a29 locks: Parse lock kind earlier
Currently we keep the lock type (posix/flock) till the
time we dump it, then "decode" it into binary value.
I will need the easy-to-check one early, so parse the
kind in proc_parse.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:39:09 +04:00
Andrey Vagin
33c75d0df9 eventpoll: parse_fdinfo_pid_s() returns allocated object for eventpol tfd
We are going to collect all objects in a list and write them into
the eventpoll image. The eventpoll tfd image will be depricated.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:08:17 +04:00
Andrey Vagin
78a54bd87c fsnotify: parse_fdinfo_pid_s() returns allocated object for fanotify marks
We are going to collect all objects in a list and write them into
the fanotify image. The fanotify mark image will be depricated.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:07:44 +04:00
Andrey Vagin
7079bb1086 fsnotify: parse_fdinfo_pid_s() returns allocated object for inotify wd (v2)
We are going to collect all objects in a list and write them into
the inotify image. The inotify wd image will be depricated.

v2: cb() must always free an entry
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:07:43 +04:00
Saied Kazemi
9eec8b03af Use --root instead of --aufs-root
When dumping Docker containers using the AUFS graph driver, we can
use the --root option instead of --aufs-root for specifying the
container's root.  This patch obviates the need for --aufs-root
and makes dump CLI more consistent with restore CLI.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-27 14:31:40 +04:00
Saied Kazemi
d8b41b6525 Added AUFS support.
The AUFS support code handles the "bad" information that we get from
the kernel in /proc/<pid>/map_files and /proc/<pid>/mountinfo files.
For details see comments in sysfs_parse.c.

The main motivation for this work was dumping and restoring Docker
containers which by default use the AUFS graph driver.  For dump,
--aufs-root <container_root> should be added to the command line options.
For restore, there is no need for AUFS-specific command line options
but the container's AUFS filesystem should already be set up before
calling criu restore.

[ xemul: With AUFS files sometimes, in particular -- in case of a
  mapping of an executable file (likekely the one created at elf load),
  in the /proc/pid/map_files/xxx link target we see not the path
  by which the file is seen in AUFS, but the path by which AUFS
  accesses this file from one of its "branches". In order to fix
  the path we get the info about branches from sysfs and when we
  meet such a file, we cut the branch part of the path. ]

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-21 18:35:22 +04:00
Andrey Vagin
c9228dd809 restore: use /proc/self/mountinfo for collecting mounts fo the root task (v3)
If the root task is forked in a new pidns, it can't use its pid for
accessing /proc, because this proc belongs to the source pidns.

v2: don't copy a static string.
v3: take a bright part of Tycho's patch

Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 14:35:25 +04:00
Cyrill Gorcunov
8311b43517 proc_parse.c: parse_task_cgroup -- Don't forget to init @path
proc_parse.c: In function ‘parse_task_cgroup’:
proc_parse.c:1603:16: error: ‘path’ may be used uninitialized in this function [-Werror=uninitialized]
cc1: all warnings being treated as errors

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 13:18:09 +04:00
Andrey Vagin
a48e52b58c proc_parse: check that scanf fill the offset var
CID 1168165 (#2 of 2): Untrusted array index read (TAINTED_SCALAR)
40. tainted_data: Using tainted variable "hoff" as an index into an
array "str"

$ man 3 scanf
n      Nothing  is expected; instead, the number of characters consumed
      thus far from the input is  stored  through  the  next  pointer,
      which  must  be  a  pointer  to  int.  This is not a conversion,
      although it can be suppressed with the *  assignment-suppression
      character.   The  C  standard says: "Execution of a %n directive
      does not increment the assignment count returned at the  comple‐
      tion of execution" but the Corrigendum seems to contradict this.
      Probably it is wise not to make any assumptions on the effect of
      %n conversions on the return value.

So it isn't not enough to check a return code from scanf().

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 10:26:14 +04:00
Andrey Vagin
1e0e83701f cgroup: fix dereference before null check
Coverity: 1230177 Dereference before null check

There may be a null pointer dereference, or else the comparison against
null is unnecessary.  In parse_task_cgroup: All paths that lead to this
null pointer comparison already dereference the pointer earlier
(CWE-476)

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-07 10:24:50 +04:00
Cyrill Gorcunov
ecd432fe27 timerfd: Implement c/r procedure
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 19:20:09 +04:00
Cyrill Gorcunov
cd704e80ee cgroups: Make sure the cgroup formatted correctly
In case if something is broken in the kernel and
we get a format corrupted -- simply exit out with
error instead of strlen'ing nil string.

Also while at it -- add a comment about format.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-15 17:04:11 +04:00
Tycho Andersen
51876eea5d Attempt to restore cgroups
During the dump phase, /proc/cgroups is parsed to find co-mounted cgroups.
Then, for each task /proc/self/cgroup is parsed for the cgroups that it is a
member of, and that cgroup is traversed to find any child cgroups which may
also need restoring. Any cgroups not currently mounted will be temporarily
mounted and traversed. All of this information is persisted along with the
original cg_sets, which indicate which cgroups a task is a member of.

On restore, an initial phase creates all the cgroups which were saved. Tasks
are then restored into these cgroups via cg_sets as usual.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-10 17:00:28 +04:00
Tycho Andersen
4a012f1478 Fix typo
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-25 14:13:27 +04:00
Cyrill Gorcunov
fe7b8aeb8c vdso: x86 -- Add handling of vvar zones
New kernel 3.16 will have old vDSO zone splitted into the two vmas:
one for vdso code itself and second that named vvar for data been
referenced from vdso code.

Because I can't do 'dump' and 'restore' parts of the code separately
(otherwise test would fail) the commit is pretty big one and hard to
read so here is detailed explanation what's going on.

 1) When start dumping we detect vvar zone by reading /proc/pid/smap
    and looking up for "[vvar]" token. Note the vvar zone is mapped
    by a kernel with PF/IO flags so we should not fail here.

    Also it's assumed that at least for now kernel won't be changed
    much and [vvar] zone always follows the [vdso] zone, otherwise
    criu will print error.

 2) In previous commits we disabled dumping vvar area contents so
    the restorer code never try to read vvar data but still we need
    to map vvar zone thus vma entry remains in image.

 3) As with previous vdso format we might have 2 cases

    a) Dump and restore is happening on same kernel
    b) Dump and restore are done on different kernels

    To detect which case we have we parse vdso data from image
    and find symbols offsets then compare their values with runtime
    symbols provided us by a kernel. If they match and (!!!) the
    size of vvar zone is the same -- we simply remap both zones
    from runtime kernel into the positions dumpee had at checkpoint
    time. This is that named "inplace" remap (a).

    If this happens the vdso_proxify() routine drops VMA_AREA_REGULAR
    from vvar area provided by a caller code and restorer won't try
    to handle this vma. It looks somehow strange and probably should
    be reworked but for now I left it as is to minimize the patch.

    In case of (b) we need to generate a proxy. We do that in same
    way as we were before just include vvar zone into proxy and save
    vvar proxy address inside vdso mark injected into vdso area. Thus
    on subsequent checkpoint we can detect proxy vvar zone and rip
    it off the list of vmas to handle.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-24 22:48:43 +04:00
Pavel Emelyanov
0066d5e813 restore: Open /proc/self/maps via helpers
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-06-09 15:29:47 +04:00
Pavel Emelyanov
b48e4cbfb8 proc: Introduce helper for parsing /proc/$pid/cgroup file
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00