mirror of https://github.com/checkpoint-restore/criu synced 2025-08-27 12:28:14 +00:00

1696 Commits

Author SHA1 Message Date
Pavel Emelyanov
3659d60ab7 restore: Open /proc/sys/kernel/ns_last_pid via helpers
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-06-09 15:29:49 +04:00
Pavel Emelyanov
8644ce9628 util: Prepare proc opening helpers to open any files
We have a set of routines that open /proc/$pid files via proc service
descriptor. Teach them to accept non-pids as pids to open /proc/self/*
and /proc/* files via the same engine.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-06-09 15:29:46 +04:00
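The idea in the commit above (one set of helpers that opens /proc/$pid/*, /proc/self/* and plain /proc/* files through a single /proc directory descriptor) can be sketched roughly as follows. This is a minimal illustration, not CRIU's code: the function name `open_proc_sketch` and the two special "pid" values are hypothetical, chosen only for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical "non-pid" values, assumed for this sketch only. */
#define PROC_SELF_SKETCH 0    /* open /proc/self/<name> */
#define PROC_GEN_SKETCH  (-1) /* open /proc/<name> */

/*
 * Open a file below an already opened /proc directory descriptor.
 * A positive pid selects /proc/<pid>/<name>; the special values above
 * select /proc/self/<name> or /proc/<name>.
 * Returns a new fd, or -1 on error.
 */
static int open_proc_sketch(int proc_dir_fd, int pid, const char *name, int flags)
{
	char path[128];
	int n;

	if (pid == PROC_SELF_SKETCH)
		n = snprintf(path, sizeof(path), "self/%s", name);
	else if (pid == PROC_GEN_SKETCH)
		n = snprintf(path, sizeof(path), "%s", name);
	else
		n = snprintf(path, sizeof(path), "%d/%s", pid, name);

	/* Reject formatting errors and truncated paths. */
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* The path is relative, so it resolves against proc_dir_fd. */
	return openat(proc_dir_fd, path, flags);
}
```

Because every lookup goes through `openat()` on the directory descriptor, the same code keeps working when procfs is mounted somewhere other than /proc.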
Pavel Emelyanov
8a07349388 files: Fix open_path() to provide mntns root fd to callbacks
This fixes the support for fifo-s in mount namespaces and
makes it easier to control the correct open_path() usage in
the future.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-06 12:20:02 +04:00
Pavel Emelyanov
203c291467 cg: Restore tasks into proper cgroups
On restore, find out which cgroup sets the tasks live in and move
them there.

Optimization note -- move tasks into cgroups _before_ forking
kids to make them inherit cgroups if required. This saves
a lot of time.

Accessibility note -- when moving tasks into cgroups don't
search for existing host mounts (they may not be available)
and don't mount temporary ones (may be impossible due to
user namespaces). Instead introduce a service fd with a yard
of mounts.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
1ba9d2cae9 cg: Dump cgroups tasks live in
Each task points to a single ID of cgroup-set it lives in. This
is done so to save some space in the image, as tasks likely
live in the same set of cgroups.

Other than this we keep track of what cgroup set we dump the
subtree from. If the root task happens to live in the same
cgroup set as criu does, we don't allow any other sub-cgroups,
which makes restore (next patch) much simpler and faster.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
8b8eb53a0a cg: Skeleton for cgroup code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
06f7243380 image: Add bits and pieces for cgroups image
The exact structure of the image will be revealed in the
next patch(es). What is important here, is that cgroup
image is somewhat new.

It will likely contain arrays of objects of different types,
so I introduce the "header" object, that will link these
arrays using pb repeated fields. This will help us to avoid
many image files for different cgroup objects and will make
the amount of write()-s required be 1.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
Pavel Emelyanov
b48e4cbfb8 proc: Introduce helper for parsing /proc/$pid/cgroup file
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
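For readers unfamiliar with the file this helper parses: each line of /proc/$pid/cgroup has the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below splits one such line; the struct and function names are hypothetical and the fixed buffer sizes are an assumption of this example, not something taken from CRIU.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed line. */
struct cg_line_sketch {
	int hierarchy;
	char controllers[128];
	char path[256];
};

/*
 * Parse one "ID:controllers:path" line.
 * Returns 0 on success, -1 if the line is malformed or too long.
 */
static int parse_cgroup_line(const char *line, struct cg_line_sketch *out)
{
	const char *c1 = strchr(line, ':');
	const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;
	size_t clen, plen;

	if (!c1 || !c2)
		return -1;

	out->hierarchy = atoi(line);

	/* Controller list sits between the two colons. */
	clen = (size_t)(c2 - c1 - 1);
	/* Path runs from the second colon to the end of the line. */
	plen = strcspn(c2 + 1, "\n");
	if (clen >= sizeof(out->controllers) || plen >= sizeof(out->path))
		return -1;

	memcpy(out->controllers, c1 + 1, clen);
	out->controllers[clen] = '\0';
	memcpy(out->path, c2 + 1, plen);
	out->path[plen] = '\0';
	return 0;
}
```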
Pavel Emelyanov
e5eb73ea48 util: Introduce strstartswith helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:48:06 +04:00
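The commit does not show the helper's body, so the following is only a guess at what a prefix check of this kind usually looks like. The name `starts_with` is hypothetical and deliberately differs from the real helper.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when str begins with prefix.
 * An empty prefix matches any string.
 */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

A call such as `starts_with(line, "name=")` would then pick out matching lines or tokens while parsing.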
Cyrill Gorcunov
c473461d24 vdso: Make it arch specific
Currently we build vDSO handling code for all archs provided
in the source code, with some "common" parts inside pie/vdso.c,
pie/vdso-stub.c, vdso-stub.c and vdso.c. This worked more or
less well, but in new linux kernels (starting from 3.16 presumably)
the vDSO has been significantly reworked, so every architecture
must have its own vDSO handling engine (just like the kernel does).

So in this patch we make the vDSO code arch specific, and because
aarch64 actually doesn't implement proxification yet due to
kernel restrictions -- we drop it. When kernel support
appears, we will bring it back in a proper arch/aarch64
implementation.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:41:31 +04:00
Cyrill Gorcunov
676708e3b3 vdso: Put CONFIG_VDSO where needed
Guard vDSO code with CONFIG_VDSO, no need to even build it
on archs which do not support vDSO handling.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-27 23:40:07 +04:00
Andrey Vagin
f0cbc301fc mm: mark VM_IO and VM_PFNMAP VMA-s as unsupported
vmsplice doesn't work for such VMA-s.

These flags are set in the kernel function remap_pfn_range()
(remap kernel memory to userspace), which is widely used by device
drivers to provide direct access to a device memory.

Reported-by: J F <jgmb45@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-23 13:34:16 +04:00
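One way userspace can notice such mappings is the `VmFlags:` line in /proc/$pid/smaps, where the kernel documents the mnemonic `io` for VM_IO and `pf` for VM_PFNMAP. The sketch below checks one such line; whether CRIU detects these VMAs exactly this way is an assumption of this example, and the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <string.h>

/*
 * Given one line from smaps, return true if it is a "VmFlags:" line
 * that carries the "io" (VM_IO) or "pf" (VM_PFNMAP) mnemonic.
 */
static bool vmflags_unsupported(const char *vmflags_line)
{
	char buf[256];
	char *tok, *save;

	if (strncmp(vmflags_line, "VmFlags:", 8) != 0)
		return false;

	/* Work on a copy, since strtok_r() modifies its input. */
	strncpy(buf, vmflags_line + 8, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok_r(buf, " \t\n", &save); tok;
	     tok = strtok_r(NULL, " \t\n", &save)) {
		if (!strcmp(tok, "io") || !strcmp(tok, "pf"))
			return true;
	}
	return false;
}
```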
Filipe Brandenburger
d5bb7e9748 dump: preserve the dumpable flag on criu dump/restore
Preserve the dumpable flag, which affects whether a core dump will be
generated, but also affects the ownership of the virtual files under
/proc/$pid after restoring a process.

Tested: Restored a process with a criu including this patch and looked
at /proc/$pid to confirm that the virtual files were no longer all owned
by root:root.

zdtm tests pass except for cow01 which seems to be broken.
(see https://bugzilla.openvz.org/show_bug.cgi?id=2967 for details.)

This patch fixes https://bugzilla.openvz.org/show_bug.cgi?id=2968

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Change-Id: I8c386508448a84368a86666f2d7500b252a78bbf
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-14 01:02:37 +04:00
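The dumpable flag is read and written with `prctl()`. The sketch below shows only those two calls; it is not the patch itself, and the wrapper names are hypothetical. A checkpoint tool would record the value at dump time and set it again in the restored process.

```c
#include <sys/prctl.h>

/* Read the current dumpable flag of the calling process (-1 on error). */
static int get_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Set the dumpable flag to a previously recorded value; 0 on success. */
static int set_dumpable(int value)
{
	return prctl(PR_SET_DUMPABLE, value, 0, 0, 0);
}
```

With the flag cleared, the files under /proc/$pid become owned by root:root, which is the symptom the commit message describes.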
Andrey Vagin
3a9c6a3d37 util: use glibc macros to generate device numbers in the dev_t format
Our versions of the macros are wrong.

Our macros:
#define MINOR(dev)           ((dev) & 0xff)

Glibc function:
return (__dev & 0xff) | ((unsigned int) (__dev >> 12) & ~0xff);

Reported-by: Amey Deshpande <ameyd@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-05-07 21:02:35 +04:00
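The difference matters as soon as a minor number needs more than 8 bits. The following sketch compares the 8-bit macro quoted above (renamed `NAIVE_MINOR` here to avoid clashing with anything real) against glibc's `minor()` from `<sys/sysmacros.h>`; the two wrapper functions are hypothetical helpers for this illustration.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Same shape as the old macro from the commit message: low 8 bits only. */
#define NAIVE_MINOR(dev) ((dev) & 0xff)

/* Encode (maj, min) with glibc's makedev(), decode with the naive macro. */
static unsigned int naive_minor_of(unsigned int maj, unsigned int min)
{
	dev_t dev = makedev(maj, min);

	return (unsigned int)NAIVE_MINOR(dev);
}

/* Encode (maj, min) with glibc's makedev(), decode with glibc's minor(). */
static unsigned int glibc_minor_of(unsigned int maj, unsigned int min)
{
	dev_t dev = makedev(maj, min);

	return minor(dev);
}
```

For a minor number of 300, `glibc_minor_of()` round-trips the value while `naive_minor_of()` returns only its low byte.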
Christopher Covington
5d74f55d80 Don't say /proc in macro errors
It's possible that a procfs mounted somewhere other than /proc
is in use.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-25 13:25:13 +04:00
Cyrill Gorcunov
0c89d779f9 log: Include inttypes.h for PRI helpers
https://bugzilla.openvz.org/show_bug.cgi?id=2949

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-25 13:23:55 +04:00
Pavel Emelyanov
8d5822d9cb mnt: Factor out mntns nsid creation on restore
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 13:22:12 +04:00
Pavel Emelyanov
68e2841a9b mnt: Turn mntns_get_root_fd into accepting mnt ns_id
The only exception (for now) is the irmap -- it should
operate on ns as well.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 02:31:16 +04:00
Pavel Emelyanov
1435617c40 mnt: Rename _collect_root into _get_root_fd
Nowadays this routine is mainly used for getting an
fd, rather than keeping one for future reference.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 01:38:58 +04:00
Pavel Emelyanov
79f3e90856 rst: Less arguments to restore_task_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:46 +04:00
Pavel Emelyanov
8550f52017 mnt: Move local mntns collecting on restore into prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:43 +04:00
Pavel Emelyanov
f4b7a6fedd mnt: Mark rst_collect_local_mntns as void
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:38 +04:00
Pavel Emelyanov
88eef43e41 mnt: Mark dump_mnt_ns as static
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:33 +04:00
Pavel Emelyanov
4ffa79695d mnt: Remove unneeded argument from prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:23 +04:00
Andrey Vagin
85569e8dd4 mount: prevent dumping nested mount namespace without mnt_id in fdinfo
When we don't know mnt_id, we don't know to which namespace a file
belongs.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:27 +04:00
Andrey Vagin
2f4be997b6 mount: use per-namespace mntinfo_tree (v2)
This patch removes the global mntinfo_tree and collect_mount_info where
it was constructed. The mntinfo list is filled from dump_mnt_ns,
rst_collect_local_mntns, collect_mnt_namespaces and read_mnt_ns_img.

A mountinfo entry contains a reference on a proper ns_id entry, so
we can use mnt_id to look up a proper mount namespace.

v2: remove trash after rebasing.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:19 +04:00
Andrey Vagin
fb3ce0fbeb mount: prepare to work without mnt_id
Kernels before 3.15 don't show mnt_id, and mnt_id isn't saved in
images if mntns isn't dumped.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:10 +04:00
Andrey Vagin
b6d3314c54 check: collect mounts of the current mntns
They are used for collecting unix sockets

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:04 +04:00
Andrey Vagin
26a0dc91dd mount: add a function to get a temporary root for mntns
On restore all mount namespaces are restored in the root mntns and
sub-namespaces are restored in temporary places.

This function returns the paths to these places.

It will be used in open_remap_ghost(), because it's called in the root
task, when other tasks are not forked yet.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:59 +04:00
Andrey Vagin
1b3fa9bc25 mount: set nsid for each mount point
We want to look up mntns by mnt_id.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:50 +04:00
Andrey Vagin
5418938ec3 restore: collect mounts of current mntns
It's required for restoring in the current mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:46 +04:00
Andrey Vagin
e827a695f3 mount: separate collect_mnt_ns from dump_mnt_ns
We are going to support nested mntns, so the global mntinfo_tree
variable is useless and information about the tree should be connected
to a proper namespace.

But when we don't dump mntns, we need to collect mounts for the current
mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:41 +04:00
Andrey Vagin
cc1fd5760a mount: save mount tree for each namespace
We are going to support nested mount namespaces and each NS has its own
tree. The mount tree is used for checking that a file is reachable.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:34 +04:00
Andrey Vagin
22d384536d files-ids: generate id-s according to mnt_id, st->st_dev and st->st_ino
One device can be mounted a few times, so files are identical only
if they have the same mnt_id.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:28 +04:00
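The rule from the commit above (two open files count as the same file only when mnt_id, device and inode all match) can be written as a small comparison. The struct and function names here are hypothetical and do not reflect CRIU's internal data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity key for an open file. */
struct file_key_sketch {
	int mnt_id;    /* mount the file was opened through */
	uint64_t dev;  /* st->st_dev */
	uint64_t ino;  /* st->st_ino */
};

/* Files are identical only if all three components match. */
static bool same_file(const struct file_key_sketch *a,
		      const struct file_key_sketch *b)
{
	return a->mnt_id == b->mnt_id && a->dev == b->dev && a->ino == b->ino;
}
```

Two descriptors with equal st_dev and st_ino but different mnt_id values therefore get different ids, which covers the case of one device mounted several times.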
Andrey Vagin
87b1f5408c files: save mnt_id on fd_param
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:18 +04:00
Andrey Vagin
0721626902 namespaces: dump mount namespaces before tasks (v2)
because we want to check, that all files are reachable.
For that we need to collect all mounts from all namespaces.

v2: dump mntns separately
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:47 +04:00
Andrey Vagin
d2012883ab criu: rename current_ns_mask to root_ns_mask (v2)
Now we support sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.

v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:33 +04:00
Andrey Vagin
4067f4bb7e mount: allow to dump and restore nested mount namespaces (v3)
v2: another attempt to write readable code:)
v3: clean up
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:23 +04:00
Andrey Vagin
3a291e33ff crtools: restore nested mount namespaces (v2)
Known issues:
* currently only namespaces with the same root are supported
* nested namespaces can be dumped and restored only if the root task
  has its own mount namespace.

All nested namespaces are restored in the root namespace in temporary
directories. All mount points are restored in one tree and then they are
divided into namespaces.
For each namespace, the task with the minimal pid unshares the mntns and
then does pivot_root into the proper temporary directory. All other
tasks call setns to enter the mount namespace of the task with the
minimal pid.

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:17 +04:00
Andrey Vagin
e7e9c2ee6e mounts: create a temporary directory for restoring non-root mntns (v2)
All non-root namespaces will be restored as sub-trees of the root tree.

This patch adds helpers to create a temporary directory and mount tmpfs
in it, then create directories for each non-root mount namespace.

tmpfs is quite useful here to simplify destroying this construction,
we don't need to unmount each namespace separately.

v2: add a comment why MNT_DETACH is not dangerous here
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:12 +04:00
Andrey Vagin
84c48e6244 mounts: Mark ns' roots in the list of mount points (v2)
When we restore nested mount namespaces, all but the root ones (sub-namespaces)
will be restored as sub-mounts in the root mount namespace. So mi->mountpoint
will not be '/' even if a mount is the root for its mntns.

v2: s/is_root/is_ns_root/
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:37:57 +04:00
Andrey Vagin
eac462922c restore: add mount id-s in the ns_ids list (v4)
Currently ns_ids list is filled only on dump. Soon we'll need this
list for mount namespaces on restore, e.g. to know which tasks share
the namespaces.

v2: merge the patch "namespace: add a function to search an ns_id
item by id" into this one.
v3: add prefix rst_ to add_ns_id
v4: look up namespace by two values -- type AND ID

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:37:52 +04:00
Pavel Emelyanov
e8ac085af8 Revert "crtools: close all desriptors only for the root task"
We have a race. Consider we have 3 tasks, A, B and C. A and B
share fdtable, C -- does not. Then we might be in a situation
when A is restoring memory reading mem images, and B -- forking
the C child. In that case descriptors held by A (for mem restore)
will be inherited by C and will not get closed.

This reverts commit d36e07aabe073993d8ae9695e33f6e45b2eb6a21.
2014-04-21 14:48:05 +04:00
Andrew Vagin
b664bb142b mount: fill fstypes for btrfs mounts on restore
BTRFS returns subvolume dev-id instead of superblock dev-id,
so we need to know which mounts are btrfs.

The mi->fstype->name is "unsupported" here, because it is the fstype->code
that is saved in an image:

{
.name = "unsupported",
.code = FSTYPE__UNSUPPORTED,
},
{
.name = "btrfs",
.code = FSTYPE__UNSUPPORTED,
}

A second reason is that processes can be migrated from some other filesystem to btrfs.
This all can happen _only_ for the root mount and for bind mounts of
the root mount...

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-18 15:01:57 +04:00
Andrey Vagin
8df879941d mount: save relative path in mi->mountpoint
A "relative path" is an absolute path with a dot at the beginning.

We already use relative paths on restore. In this patch we add "."
on dump too. It's convenient, because otherwise we need to add the dot each time
we want to access this mount point.
Before this patch we had to create a temporary copy.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:05:58 +04:00
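To illustrate the convention described above: prefixing an absolute path with "." yields a path that can be resolved against a root directory descriptor, for instance with `openat()`. The helper below is a hypothetical sketch of that one step, not the code from the patch.

```c
#include <stdio.h>

/*
 * Turn an absolute path ("/dev/pts") into the dotted form ("./dev/pts").
 * Returns 0 on success, -1 if the result does not fit into out.
 */
static int make_relative(const char *abs_path, char *out, size_t size)
{
	int n = snprintf(out, size, ".%s", abs_path);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Storing the dotted form once in mi->mountpoint avoids building such a temporary copy on every access.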
Andrey Vagin
87a49bdfaf servicefd: add a service fd for current root
It's already used for dumping files and it will be used for restoring,
so it should be service fd to avoid intersection with restored
descriptors.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:03:11 +04:00
Pavel Emelyanov
d48d6c7267 posix-timers: Helper for freeing proc parsed data
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:01:02 +04:00
Pavel Emelyanov
b54e340945 core: Move posix timers on core entry
This also gives us one image fewer per task and
allocates more space on the core task entry.

One thing to note -- the amount of posix timers is
not easily accessible at the core entry allocation
time, so the respective array is allocated on demand.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:00:54 +04:00
Pavel Emelyanov
dfd5a62f38 core: Move itimers on core
This lets us have one image fewer per task, which in turn
reduces live migration time a little bit.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:00:52 +04:00
Andrey Vagin
bed13a58ec proc_parse: parse mnt_id from /proc/PID/fdinfo/FD
It will be used for restoring files from proper mounts.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-09 16:43:50 +04:00
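The fdinfo file is a list of `key:<whitespace>value` lines, and on kernels that provide it one of them is `mnt_id`. The sketch below extracts that field from the text of such a file; the function name is hypothetical, and taking the content as a string rather than reading the file is a simplification made for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo text line by line for "mnt_id:".
 * Returns 0 and stores the value on success, -1 if the field is absent
 * (for example on a kernel that does not report it).
 */
static int parse_mnt_id(const char *fdinfo_text, int *mnt_id)
{
	const char *line = fdinfo_text;

	while (line && *line) {
		/* The space in the format matches any run of blanks or tabs. */
		if (sscanf(line, "mnt_id: %d", mnt_id) == 1)
			return 0;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```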