2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 12:57:57 +00:00

164 Commits

Author SHA1 Message Date
Kir Kolyshkin
0f6b5d921c open_proc_sfd(): use pr_perror() after failed open()
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-09 18:28:12 +03:00
Kir Kolyshkin
17b92fa542 Append newline when using pr_err()
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-09 18:28:00 +03:00
Tycho Andersen
3b02df57ef util: don't chop off last element in buffer
377763e5 is incorrect since we can't always chop off the last element in
the buffer:

Execute static/cgroup00
./cgroup00 --pidfile=cgroup00.pid --outfile=cgroup00.out --dirname=cgroup00.test
Dump 12819
(00.003514) Error (files-reg.c:624): Can't create link remap for /dev/nul. Use link-remap option.
(00.003523) Error (cr-dump.c:1257): Dump files (pid: 12819) failed with -1
(00.004042) Error (cr-dump.c:1619): Dumping FAILED.
WARNING: cgroup00 returned 1 and left running for debug needs
Test: zdtm/live/static/cgroup00, Result: FAIL
==================================== ERROR ====================================
Test: zdtm/live/static/cgroup00, Namespace:
================================= ERROR OVER =================================

Hopefully the >= will appease coverity (instead of just a ==).

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-07 15:53:42 +03:00
Kir Kolyshkin
377763e5b1 read_fd_link(): don't overrun buf
This is a classical off-by-one error. If sizeof(buf) is 512,
the last element is buf[511] but not buf[512].

Reported by Coverity, CID 114624, 114622 etc.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-07 14:56:33 +03:00
Tycho Andersen
f79f4546cf sysctl: move sysctl calls to usernsd
When in a userns, tasks can't write to certain sysctl files:

(00.009653)      1: Error (sysctl.c:142): Can't open sysctl kernel/hostname: Permission denied

See inline comments for details on affected namespaces.

Mostly for my own education in what is required to port something to be
userns restorable, I ported the sysctl stuff. A potential concern for this
patch is that copying structures with pointers around is kind of gory. I
did it ad-hoc here, but it may be worth inventing some mechanisms to make
it easier, although I'm not sure what exactly that would look like
(potentially re-using some of the protobuf bits; I'll investigate this more
if it looks helpful when doing the cgroup user namespaces port?).

Another issue is that there is not a great way to return non-fd stuff in
memory right now from userns_call; one of the little hacks in this code
would be "simplified" if we invented a way to do this.

v2: coalesce the individual struct sysctl_req requests into one big
    sysctl_userns_req that is in a contiguous region of memory so that we
    can pass it via userns_call. Hopefully nobody finds my little ascii
    diagram too offensive :)
v3: use the fork/setns trick to change the syctl values in the right ns for
    IPC/UTS nses; see inline comment for details
v4: only use sysctl_userns_req when actually doing a userns_call.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-05 13:16:14 +03:00
Pavel Emelyanov
9b3189fed1 util: Add make_yard helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 11:32:18 +03:00
Andrey Vagin
011231af3b util: add ability to execute programs in a specified userns
It's required for dumping tmpfs, where we use tar to save content.
If we need to execute tar from a proper userns to get right uid-s and
gid-s for files.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-07 14:42:01 +03:00
Andrey Vagin
de70936ec7 util: don't ignore standard descriptors
It's rudiment. close_old_fds() closes all extra descriptors.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-15 17:36:18 +03:00
Pavel Emelyanov
2e8ddb2db0 proc: Don't use parent proc_self_fd cached descriptor
When we call open_proc(PROC_SELF, ...) the /proc/self descriptor is
cached in criu. If the process fork()-s after than and child goes
open_proc(PROC_SELF, ...) then it will get the parent's proc descriptor.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
2015-05-30 00:32:08 +03:00
Christopher Covington
cefe22bdac Use run-time page size where it matters
In AArch64, pages may be 4K or 64K depending on kernel configuration.
The GNU C Library documentation suggests [1], "the correct interface
to query about the page size is sysconf". Introduce one new
architecture-specific function-like macro, page_size(), that on x86
and AArch32 remains a constant so as to minimally affect performance,
but on AArch64 is sysconf(_SC_PAGESIZE) for correctness.

1. https://www.gnu.org/software/libc/manual/html_node/Query-Memory-Parameters.html

To minimize churn, the PAGE_SIZE macro is left as a build-time
estimation of what the run-time page size might be.

This fixes the following errors for CRIU on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y, allowing dump of
`setsid sleep < /dev/null &> /dev/null` to succeed.

Error (kerndat.c:48): Can't stat self map_files: No such file or directory

Error (util.c:668): Can't read pme for pid 90: No such file or directory

Error (parasite-syscall.c:1135): Can't open 89/map_files/0x3ffb7da0000-0x3ffb7dac000 on procfs: No such file or directory

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-22 15:39:05 +03:00
Cyrill Gorcunov
3bd6d9d7b0 image: Add comments about VMA_AREA constants and drop FORCE_READ flag
Force-read came from very first dev version of CRIU (even before 1.0 release)
and never been used actually in image.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-04 17:48:47 +03:00
Ruslan Kuprieiev
b09a88b5f9 util: set cr_errno to ESRCH if no PID dir in proc
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-19 18:59:14 +03:00
Pavel Emelyanov
f408bc0331 log: Print vma status in dump logs
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-17 18:11:09 +03:00
Saied Kazemi
0412152fc5 Add inherit fd support
There are cases where a process's file descriptor cannot be restored
from the checkpoint images.  For example, a pipe file descriptor with
one end in the checkpointed process and the other end in a separate
process (that was not part of the checkpointed process tree) cannot be
restored because after checkpoint the pipe will be broken.

There are also cases where the user wants to use a new file during
restore instead of the original file at checkpoint time.  For example,
the user wants to change the log file of a process from /path/to/oldlog
to /path/to/newlog.

In these cases, criu's caller should set up a new file descriptor to be
inherited by the restored process and specify the file descriptor with the
--inherit-fd command line option.  The argument of --inherit-fd has the
format fd[%d]:%s, where %d tells criu which of its own file descriptors
to use for restoring the file identified by %s.

As a debugging aid, if the argument has the format debug[%d]:%s, it tells
criu to write out the string after colon to the file descriptor %d.  This
can be used, for example, as an easy way to leave a "restore marker"
in the output stream of the process.

It's important to note that inherit fd support breaks applications
that depend on the state of the file descriptor being inherited.  So,
consider inherit fd only for specific use cases that you know for sure
won't break the application.

For examples please visit http://criu.org/Category:HOWTO.

v2: Added a check in send_fd_to_self() to avoid closing an inherit fd.
    Also, as an extra measure of caution, added checks in the inherit fd
    look up functions to make sure that the inherit fd hasn't been reused.
    The patch also includes minor cosmetic changes.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-10 12:48:30 +03:00
Tycho Andersen
de055b7992 cg: use one path style throughout cg restore code
This commit is in preparation for the (hopefully last :) restore special cpuset
patch.

Previously, we installed the cgroup service fd after calling
prepare_cgroup_dirs, which meant that we had to carry around the temporary
directory name in order to put things in the right place. The
restore_cgroup_prop function uses the cg service fd instead of carrying around
the full path. This means that we can't sue restore_cgroup_prop, without first
sanitizing the path. Instead, we install the service fd before calling
prepare_cgroup_dirs, and all the code just references that instead of carrying
around the temporary path.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-07 12:56:52 +04:00
Pavel Emelyanov
abeae2671b proc: Keep /proc/self cached separately from /proc/pid
When dumping tasks we do a lot of open_proc()-s and to
speed this up the /proc/pid directory is opened first
and the fd is kept cached. So next open_proc()-s do just
openat(cached_fd, name).

The thing is that we sometimes call open_proc(PROC_SELF)
in between and proc helpers cache the /proc/self too. As
the result we have a bunch of

  open(/proc/pid)
  close()
  open(/proc/self)
  close()

see-saw-s in the middle of dumping tasks.

To fix this we may cache the /proc/self separately from
the /proc/pid descriptor. This eliminates quite a lot
of pointless open-s and close-s.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:21:43 +04:00
Pavel Emelyanov
1c8ab40e65 proc: Sanitate empty lines
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:21:23 +04:00
Cyrill Gorcunov
19018622cd util: mkdirp -- Print exactly what is failed
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-18 20:12:24 +04:00
Pavel Emelyanov
b47b0201f3 page-server: Don't setup options in parent task
When service starts page server all the preparations (log, wdir, img dir, etc.)
happen in parent task, then we fork page server.

This is OK for now, but when we will serve several requests per connection, all
these resources would be leaked in parent.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:49:54 +04:00
Pavel Emelyanov
069bdd9674 scripts: Move scripts code into separate sources
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:48:21 +04:00
Andrey Vagin
961655dc02 util: add a function to check output data in a file descriptor
We can't dump netlink socket, inotify, fanotify, if they have queued
data, so lets add a function to chech this.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:25:50 +04:00
Andrey Vagin
8f17b34abb criu: Drop redundant newline from pr_perror
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-22 19:22:39 +04:00
Tycho Andersen
ded04267f8 scripts: set CRIU_IMAGE_DIR when running scripts
When doing a restore for LXC, we store some other metadata (which bridge a veth
was on) in the image directory so that the restore script can correctly unlock
a network device and attach it to the right interface. This patch is needed so
that the script can find this metadata.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 22:43:37 +04:00
Pavel Emelyanov
914ab7f245 util: Don't xfree pointer on xmalloc-ed pointer
... free the pointer itself :)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 09:37:40 +04:00
Tycho Andersen
0f178a1f99 cg: correctly detect co-mounted controller mount point
Before we would not detect the mount point for co-mounted controllers. Things
still worked because we'd just re-mount them ourselves and traverse our own
mount point, but this saves an extra mount().

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-14 15:14:37 +04:00
Tycho Andersen
51876eea5d Attempt to restore cgroups
During the dump phase, /proc/cgroups is parsed to find co-mounted cgroups.
Then, for each task /proc/self/cgroup is parsed for the cgroups that it is a
member of, and that cgroup is traversed to find any child cgroups which may
also need restoring. Any cgroups not currently mounted will be temporarily
mounted and traversed. All of this information is persisted along with the
original cg_sets, which indicate which cgroups a task is a member of.

On restore, an initial phase creates all the cgroups which were saved. Tasks
are then restored into these cgroups via cg_sets as usual.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-07-10 17:00:28 +04:00
Cyrill Gorcunov
993205e3be vdso: util -- Show 'vvar' abreviature when meet VMA_AREA_VVAR
This is for debug purpose mostly.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-24 22:48:42 +04:00
Pavel Emelyanov
8644ce9628 util: Prepare proc opening helpers to open any files
We have a set of routines that open /proc/$pid files via proc service
descriptor. Teach them to accept non-pids as pids to open /proc/self/*
and /proc/* files via the same engine.

Signed-f-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-09 15:29:46 +04:00
Pavel Emelyanov
e8ac085af8 Revert "crtools: close all desriptors only for the root task"
We have a race. Consider we have 3 tasks, A, B and C. A and B
share fdtable, C -- does not. Then we might be in a situation
when A is restoring memory reading mem images, and B -- forking
the C child. In that case descriptors held by A (for mem restore)
will be inherited by C and will not get closed.

This reverts commit d36e07aabe073993d8ae9695e33f6e45b2eb6a21.
2014-04-21 14:48:05 +04:00
Andrey Vagin
d36e07aabe crtools: close all desriptors only for the root task
For all other tasks only unsed service descriptors will be closed.

This change allows to have file descriptors, which may be used for
restoring namespaces. All non-server descriptors must be closed before
restoring files.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-09 15:50:40 +04:00
Cyrill Gorcunov
6df067c50a util: Make sure open successed
Opening /dev/null may fail, check for ret code.

CID 1168167

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-09 15:26:25 +04:00
Deyan Doychev
69a6bf4439 criu: Add exec-cmd option (v3)
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.

Waiting for the restored processes to finish is responsibility of this command.

All service FDs are closed before we call execvp(). Standad output and error of
the command are redirected to the log file when we are restoring through the RPC
service.

This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.

Two directions were researched in order to integrate CRIU and LXC:

1. We tell to CRIU, that after restoring container is should execve()
   lxc properly explaining to it that there's a new container hanging
   around.

2. We make LXC set himself as child subreaper, then fork() criu and ask
   it to detach (-d) from restore container afterwards. Being a subreaper,
   it should get the container's init into his child list after it.

The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.

Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-25 01:20:02 +04:00
Pavel Emelyanov
eaee604238 Revert "util: Add "/bin" to PATH when spawning helpers"
This reverts commit 66ab5e1ad8ac682c9225446747885184b7bf41b5.

After Andrey's fixes that create mount points before dropping
old mounts and going to pivot_root, this patch is not needed.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-04 22:18:17 +04:00
Pavel Emelyanov
66ab5e1ad8 util: Add "/bin" to PATH when spawning helpers
We call tar, ip, iptables, etc. when restoring container.
The problem is that these stuff is called from inside new
mount namespace after pivot_root(). But the execvp uses
PATH variable inherited from the host system, which may
not reflect real binaries layout.

Add "/bin" to path as temporary workaround.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-10 15:45:43 +04:00
Tikhomirov Pavel
2c7fb879ae chdir: need to check return value
otherwise it won't compile:

util.c: In function ‘cr_daemon’:
util.c:594:8: error: ignoring return value of ‘chdir’, declared
with attribute warn_unused_result [-Werror=unused-result]
   chdir("/");
        ^

Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 14:05:35 +04:00
Pavel Emelyanov
eb1ae0a025 vma: Turn embeded VmaEntry on vma_area into pointer
On restore we will read all VmaEntries in one big MmEntry object,
so to avoif copying them all into vma_areas, make them be pointable.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-04 11:44:01 +04:00
Pavel Emelyanov
98fbeb8d0a vma: Vma allocation helper is now function
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-03 17:18:42 +04:00
Pavel Emelyanov
bd7bf7bd39 anon-inode: Don't readlink fd/fd multiple times
The is_foo_link readlinks the lfd to check. This makes
anon-inodes dumping readlink several times to find proper
dump ops. Optimize this thing.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-02-02 22:14:29 +04:00
Andrey Vagin
7051e2e92d util: apply PME_PFRAME_MASK to get pfn
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-31 15:11:28 +04:00
Pavel Emelyanov
f9c8e3a2cd pagemap: Factor out pfn retrieving for vdso and zero page
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-30 23:34:53 +04:00
Pavel Emelyanov
9753501297 rpc: Introduce CLI's --action-script analogue
Service shouldn't call client provided scripts, as it
creates a security issue (client may be unpriviledged,
while the service is).

In order to let caller do what it would normally do with
criu-scripts, make criu notify it about scripts. Caller
then do whatever it needs and responds back.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-30 15:58:45 +04:00
Pavel Emelyanov
29952618e3 daemon: Write own daemon routine
RPC will start page-server daemon and needs to get the
controll back to report back to caller, but the glibc's
daemon() does exit() in parent context preventing it.

Thus -- introduce own daemonizing routine.

Strictly speaking, this is not pure daemon() clone, as the
parent process has to exit himself. But this is OK for now.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-30 15:58:41 +04:00
Pavel Emelyanov
839a3c6122 files: Don't call fstatfs twice
When filling fd_parms we do call statfs, no need to call it
again later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-29 20:09:27 +04:00
Pavel Emelyanov
39834b1d7e util: Log scripts running
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-26 22:39:08 +04:00
Andrey Vagin
e027f116e4 plugin: add a function to get a descriptor to the image dir
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-19 21:49:39 +04:00
Kir Kolyshkin
b11f24fd5d criu check: don't run as non-root
In case criu check is run as non-root, a lot of information is printed
to a user, with the only missing bit is it should run it as root.

Fix it.

I still don't like the fact that some other stuff is printed here,
like the timestamp and the __FILE__:__LINE__, but this should be
fixed separately.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-13 13:58:45 +04:00
Andrey Vagin
4850fd94a8 crtools: move cr_options in a separate header
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 18:17:52 +04:00
Andrey Vagin
0d1dfc2e08 crtools: move all stuff about vma together
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:43:49 +04:00
Andrey Vagin
824403a009 crtools: create new header for servicefd stuff (v2)
v2: generate patch relative to the official git.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:43:02 +04:00
Pavel Emelyanov
297360ef7d rst: Switch shmalloc allocator to use rst-malloc
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-02 01:04:07 +04:00