2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 21:07:43 +00:00

272 Commits

Author SHA1 Message Date
Marcos Lilljedahl
e3f900f954 Remove multiple command validation as cpuinfo requires it
"criu cpuinfo [dump | check]" can't be used through the command line as this validation kicks in.
Other commands will fail due to required arguments so I believe it's not necessary anymore.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-21 11:50:05 +03:00
Tycho Andersen
4f2e4ab3be irmap: add --irmap-scan-path option
This option allows users to specify their own irmap paths to scan in the event
that they don't have a path in one of the hard coded hints.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-21 11:46:12 +03:00
Cyrill Gorcunov
6632b2a4fc criu: Require @fd argument for "swrk" mode
Otherwise we hit nil dereference.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-16 15:40:31 +03:00
Cyrill Gorcunov
fe946bf4eb crtools: Fix potential nil dereference
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-16 15:40:12 +03:00
Cyrill Gorcunov
ced8f88401 opts: Allo to specify the maximum size of ghost files
For example we hit a case where systemd carries journal
file with 4M in size.

https://jira.sw.ru/browse/PSBM-38571

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-10 16:51:11 +03:00
Andrey Vagin
8ea020388e dump: use freezer cgroup to seize processes (v4)
Without using a freezer cgroup, we need to do a few iterations to catch
all tasks, because a new tasks can be born. If new tasks appear faster
than criu collects them, criu fails. The freezer cgroup allows to
solve this problem.

We freeze the freezer group, then attaches to tasks with ptrace and thaw
the freezer cgroup. We suppose that all tasks which are going to be
dumped in a specified freezer group.

v2: fix comments from Christopher
Reviewed-by: Christopher Covington <cov@codeaurora.org>

v3: refactor task_seize

v4: fix comments from Pavel

Cc: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-10 16:47:51 +03:00
Artem Kuzmitskiy
79fd764ae6 Add dumping of unnamed unix sockets.
* Added functionality for dumping unnamed unix sockets.
  When we call CRIU with dump option, for unnamed socket we
  should pass it inode into --ext-unix-sk. Details about this problem
  described in http://criu.org/External_UNIX_socket#What_to_do_with_socketpair.28.29-s.3F.
  Usage example:
    criu dump -D images -o dump.log -v4 --ext-unix-sk=4529709 -t 13506

* fix typo error in log output

Signed-off-by: Artem Kuzmitskiy <artem.kuzmitskiy@lge.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-07-29 17:51:51 +03:00
Pavel Emelyanov
350a7a982a Revert "cgroups: Add ability to reuse existing cgroup yard directory"
Reasoning: some systems have /sys/fs/cgroup stuff mounted as read-only
and we have to either remount it rw or create our own set. The former
doesn't look sane as this rw remounting is also done by ststemd, so
let's return back to manual cgyard construction.

This reverts commit 860df95f859cf7ba23b57fc832793c623a5897e4.

Conflicts:
	cgroup.c
	include/cr_options.h

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-16 19:15:20 +03:00
Cyrill Gorcunov
c7d646afb3 cgroups: Introduce cgroup management modes
When been playing wich checkpoint/restore of container I found
that we can't reuse existing controller if they were pre-created.
For example currently in PCS7 we're bindmount cgroups which belong
to a container in a form of

 /sys/fs/cgroup/<controller>/<container> ==> /sys/fs/cgroup/<controller>

so that CRIU dumps such configuration fine but on restore
it recreates controllers from the scratch which we would
like to bindmount them and ask CRIU to restore subcgroups
and their parameters.

So I extended --manage-cgroups option to take <mode> arguments.
Detailed description in docs.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-15 21:21:56 +03:00
Cyrill Gorcunov
860df95f85 cgroups: Add ability to reuse existing cgroup yard directory
Currently we always create temporary directory where we restore
cgroups, but this won't work in case if mounting cgroups is forbidden
from inside of a container for some reason (as in OpenVZ kernel).

So one can pass --cgroup-yard option to specify an existing
directory where cgroups are living. By default we assume it
lays in /sys/fs/cgroup.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-15 21:21:54 +03:00
Salvatore Bonaccorso
0e021364db Fix spelling error in usage text
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-06-02 15:22:28 +03:00
Pavel Emelyanov
0dfdd052e7 show: Deprecate this action in criu tool
Now we have the crit utility to print images' contents in the human-readable
format, so show can be thrown out some time soon.

For now let's just deprecate it and leave functional only when the output
is asked into non-terminal. I.e. the plan shell "criu show -f" will not work.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2015-04-21 16:09:44 +03:00
Tycho Andersen
b508ebcb50 opts: fix formatting for options
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:55:44 +03:00
Tycho Andersen
fcae4f3954 mnt: add --enable-external-masters option
This option enables external (slave) bind mounts to be resolved.

v2: don't always assume that when the master id matches, the mounts match

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:54:51 +03:00
Tycho Andersen
0afffc9dc1 mnt: add --enable-external-sharing flag
With this flag, external shared bind mounts are attempted to be resolved
automatically.

v2: don't always assume when the sharing matches that the mount matches

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:54:12 +03:00
Tycho Andersen
aebfabb5ad mnt: add --ext-mount-map auto option
When this option is specified, if an external (private) bind mount is not
specified by --ext-mount-map KEY:VAL then it is attempted to be resolved
automatically.

v2: introduce find_best_external_match, which looks for the best match based on
    sharing/slave ids; don't try to resolve fsroot_mounted() mountpoints
v3: get rid of really_collect_self_mounts
v4: get rid of fsroot_mounted() check when autodetecting external mounts

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:52:14 +03:00
Oleg Nesterov
e2c38245c6 introduce --enable-fs cli option
Finally add --enable-fs option to specify the comma separated list of
filesystem names which should be treated as FSTYPE_AUTO.

Note: obviously this option is not safe, use at your own risk. "dump"
will always succeed if the mntpoint is auto, but "restore" can fail or
do something wrong if mount(src, mountpoint, flags, options) can not
actually "just work" as FSTYPE_AUTO logic expects.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:35:43 +03:00
Cyrill Gorcunov
391d589482 options: Use union for @daemon and @restore_detach
They both are using 'd' option in different context
though, lets give them two names.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-06 18:06:16 +03:00
Cyrill Gorcunov
314c5b0261 opts: Align long options description
Man, it was almost unreadable.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-06 18:05:49 +03:00
Oleg Nesterov
eb518936d8 introduce --skip-mnt cli option
Which obviously can be used to "ignore" the mounts we do not want or
need to dump. The user should know what he does.

Note: this patch changes parse_mountinfo() to check should_skip_mount().
This is because imo we want to filter out the unwanted mounts asap, af
if they do not exist. This increases the chances the dumping will fail
if something else depends on this mount. Say, another mountpoint or an
opened file.

Perhaps it makes sense to teach should_skip_mount() to use fnmatch()
and/or look at the optional "(fs|mnt)=" prefix to skip by fsname too.

To me it would be better to force the user of this option to understand
what it does. Say, if "dump" fails because the child mount can't find
the skipped parent, he should add another --skip-mnt option or do not
dump. Otherwise, if we do this automagically the user can probably be
surpised, he might even miss the fact that we skip more than he asked.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-03 17:56:05 +03:00
Andrey Vagin
b23268e492 service: add ability to set inherit file descriptors (v3)
This is required to use criu swrk in libcontainer.

v2: remove useless function declaration
    allow to set inherit_fd only for swrk
v3: check swrk out of loop

Cc: Saied Kazemi <saied@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-30 13:09:25 +03:00
Ruslan Kuprieiev
df301b7eb7 security: create separate security.h header
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-10 16:53:54 +03:00
Cyrill Gorcunov
58b3ef0e6f Fix --cpu-cap option parsing, v2
Manage to forgot to increase pointer by length
of the words parsed.

Reported-by: Mark O'Neill <mao@tumblingdice.co.uk>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-27 16:17:12 +03:00
Andrey Vagin
b025ef1dbc criu: return an error if arguments contains more than one command
For example:
$ criu show stats-dump
Error (crtools.c:468): Unable to handle more than one command

$ ./criu show -f stats-dump
dump: {
	freezing_time: 626
	frozen_time: 24762
	memdump_time: 2696
	memwrite_time: 1007
	pages_scanned: 1108
	pages_skipped_parent: 0
	pages_written: 28
	irmap_resolve: 0
}

Reported-by: Thouraya TH <thouraya87@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-22 18:57:39 +03:00
Pavel Emelyanov
0749ef23e9 check/zdtm: Introduce fine-grained feature testing
Right now we state that CRIU works on 3.11 and above kernels and, at the
same time, have support for a couple of new features like aio, tun, timerfd
etc. available in later kernels. Since these new features do not break
generic operations we do not require them in the kernel strictly.

However, in the zdtm tests it's very important to know exactly what can
and what cannot be tested. Right now this is done in a tough manner -- if
the kernel is not 3.11 or criu check fails for _any_ reason we treat the
kernel as being "bad" and throw out a set of tests.

I propose to test some individual features and form the list of tests
in a more fine-grained manner.

This patch only fixes the AIO, mnt_id, tun and posix-timers tests. Next
I will add checks and fixes for user-namespaces tests.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2015-01-22 18:55:34 +03:00
Saied Kazemi
296129295a Allow the veth-pair option to specify a bridge
When restoring a pair of veth devices that had one end inside a namespace
or container and the other end outside, CRIU creates a new veth pair,
puts one end in the namespace/container, and names the other end from
what's specified in the --veth-pair IN=OUT command line option.

This patch allows for appending a bridge name to the OUT string in the
form of OUT@<BRIDGE-NAME> in order for CRIU to move the outside veth to
the named bridge.  For example, --veth-pair eth0=veth1@br0 tells CRIU
to name the peer of eth0 veth1 and move it to bridge br0.

This is a simple and handy extension of the --veth-pair option that
obviates the need for an action script although one can still do the same
(and possibly more) if they prefer to use action scripts.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-12 14:54:18 +03:00
Cyrill Gorcunov
fd07bc7791 cpu: Add 'ins' mode to --cpu-cap option
In this mode we test if target cpu has all features present
in image file but do not require bit to bit match: target cpu
may be a new one with more features present.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-26 18:15:46 +03:00
Saied Kazemi
0412152fc5 Add inherit fd support
There are cases where a process's file descriptor cannot be restored
from the checkpoint images.  For example, a pipe file descriptor with
one end in the checkpointed process and the other end in a separate
process (that was not part of the checkpointed process tree) cannot be
restored because after checkpoint the pipe will be broken.

There are also cases where the user wants to use a new file during
restore instead of the original file at checkpoint time.  For example,
the user wants to change the log file of a process from /path/to/oldlog
to /path/to/newlog.

In these cases, criu's caller should set up a new file descriptor to be
inherited by the restored process and specify the file descriptor with the
--inherit-fd command line option.  The argument of --inherit-fd has the
format fd[%d]:%s, where %d tells criu which of its own file descriptors
to use for restoring the file identified by %s.

As a debugging aid, if the argument has the format debug[%d]:%s, it tells
criu to write out the string after colon to the file descriptor %d.  This
can be used, for example, as an easy way to leave a "restore marker"
in the output stream of the process.

It's important to note that inherit fd support breaks applications
that depend on the state of the file descriptor being inherited.  So,
consider inherit fd only for specific use cases that you know for sure
won't break the application.

For examples please visit http://criu.org/Category:HOWTO.

v2: Added a check in send_fd_to_self() to avoid closing an inherit fd.
    Also, as an extra measure of caution, added checks in the inherit fd
    look up functions to make sure that the inherit fd hasn't been reused.
    The patch also includes minor cosmetic changes.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-12-10 12:48:30 +03:00
Tycho Andersen
2b5d06817f dump: pre-load kernel modules
See the comment below for an explanation of what is going on. We will
ultimately need to handle dumping the netlink data, but I think it is good to
prevent injecting events into the stream during a dump. So we pre-load the
modules, even though it isn't very pretty.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-14 14:21:05 +04:00
Cyrill Gorcunov
73b9a2ebe3 cpuinfo: Add "cpuinfo [dump|check]" commands, v2
On Wed, Oct 01, 2014 at 05:51:09PM +0400, Pavel Emelyanov wrote:
> > Yes, what you've been expecting?
>
> if (!strcmp(argv[optind]))
> 	return cpu_cap_check()
>
> or smth like this.

updated. So if it become confusing -- feel free to merge [1;9] and
ping me to resend the rest, or pick up from attachements.

>From 6af96ff63ac82f9566c3cba9c116dc67698c9797 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Tue, 30 Sep 2014 18:33:40 +0400
Subject: [PATCH] cpuinfo: Add "cpuinfo [dump|check]" commands

They allow to validate cpuinfo information
without running complete dump/restore actions.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:26:58 +04:00
Cyrill Gorcunov
87273ccdb8 cpuinfo: x86 -- Add dump and validation of cpuinfo image, v2
On Wed, Oct 01, 2014 at 04:57:40PM +0400, Pavel Emelyanov wrote:
> On 10/01/2014 01:07 AM, Cyrill Gorcunov wrote:
> > On Tue, Sep 30, 2014 at 09:18:53PM +0400, Cyrill Gorcunov wrote:
> >> If a user requested criu to dump cpuinfo image then we
> >> write one on dump and verify on restore. At the moment
> >> we require all cpu feature bits to match the destination
> >> cpu in a sake of simplicity, but in future we need deps
> >> engine which would filer out bits and test if cpu we're
> >> restoring on is more capable than one we were dumping at
> >> allowing to proceed restore procedure.
> >>
> >> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> >
> > Updated to new img format

Something like attached?

>From 59272a9514311e6736cddee08d5f88aa95d49189 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Thu, 25 Sep 2014 16:04:10 +0400
Subject: [PATCH] cpuinfo: x86 -- Add dump and validation of cpuinfo image

If a user requested criu to dump cpuinfo image then we
write one on dump and verify on restore. At the moment
we require all cpu feature bits to match the destination
cpu in a sake of simplicity, but in future we need deps
engine which would filer out bits and test if cpu we're
restoring on is more capable than one we were dumping at
allowing to proceed restore procedure.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:26:57 +04:00
Cyrill Gorcunov
ff1a751a89 opt: cpu-cap -- Introduce "none" and "cpuinfo" arguments
They will serve to choose capability level when migrating
images between various hardware nodes.

Note it's bare functionality introduced in this commit,
the real implementation is in next patches.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:25:56 +04:00
Cyrill Gorcunov
88f488e31f opt: cpu-cap -- Make it as optional_argument
This slightly changes the visible API, but i think it's safe now.
The idea behind it to make single --cpu-cap to indicate "--cpu-cap all".

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:25:55 +04:00
Pavel Emelyanov
53957fadc3 restore: Introduce the --restore-sibling option
We have a slight mess with how criu restores root task.
Right now we have the following options.

1) CLI
	a) Usually
	task calling criu
	 `- criu
	     `- root restored task

	b) when --restore-detached AND root has pdeath_sig

	task calling criu
	 `- criu
	 `- root restored task

2) Library/SWRK
	task using lib/swrk
	 `- criu
	 `- root restored task

3) Standalone service
	a) Usually
	service
	 `- service sub task
	     `- root restored task

	b) when root has pdeath_sig
	criu service
	 `- criu sub task
	 `- root restored task

It would be better is CRIU always restored the root task as sibling,
but we have 3 constraints:

First, the case 1.a is kept for zdtm to run tests in pid namespaces
on 3.11, which in turn doesn't allow CLONE_PARENT | CLONE_NEWPID.

Second, CLI w/o --restore-detach waits for the restored task to die and
this behavior can be "expected" already.

Third, in case of standalone service tasks shouldn't become service's
children.

And I have one "plan". The p.haul project while live migrating tasks
on destination node starts a service, which uses library/swrk mode. In
this case the restored processes become p.haul service's kids which is
also not great.

That said, here's the option called --restore-child that pairs the
--restore-detach like this:

* detached AND child:

task
 `- criu restore (exits at the end)
 `- root task

The root task will become task's child.
This will be default to library/swrk.
This is what LXC needs.

* detach AND !child

task
 `- criu restore (exits at the end)
     `- root task

The root task will get re-parented to init.
This will be compatible with 1.3.
This will be default to standalone service and
to my wish with the p.haul case.

* !detach AND child

task
 `- criu restore (waits for root task to die)
 `- root task

This should be deprecated, so that criu restore doesn't mess
 with task <-> root task signalling.

* !detach AND !child

task
 `- criu restore (waits for root task to die)
     `- root task

This is how plain criu restore works now.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
2014-09-10 18:30:30 +04:00
Pavel Emelyanov
b47b0201f3 page-server: Don't setup options in parent task
When service starts page server all the preparations (log, wdir, img dir, etc.)
happen in parent task, then we fork page server.

This is OK for now, but when we will serve several requests per connection, all
these resources would be leaked in parent.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:49:54 +04:00
Pavel Emelyanov
069bdd9674 scripts: Move scripts code into separate sources
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:48:21 +04:00
Pavel Emelyanov
7058714fda service: Add ability to inherit page server socket
The swrk action is turning out to be a cool thing. We can
spawn criu with swrk action with some FD being open, then
ask for dump/pre-dump/page-server telling it that some
descriptor it needs is "out there".

This patch lets us specify that the page server communication
channel is already in criu's fdtable.

TODO: teach regular service to accept fd via service socket.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:50:12 +04:00
Ruslan Kuprieiev
9089ce89c4 service: use setproctitle
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Acked-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:15:20 +04:00
Saied Kazemi
9eec8b03af Use --root instead of --aufs-root
When dumping Docker containers using the AUFS graph driver, we can
use the --root option instead of --aufs-root for specifying the
container's root.  This patch obviates the need for --aufs-root
and makes dump CLI more consistent with restore CLI.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-27 14:31:40 +04:00
Filipe Brandenburger
601e5bb485 crtools: bump up the getopt return values to outside the ascii range
The return values were getting dangerously close to the range of meaningful
values, in particular the next candidate 63 is equal to '?' which is the
typical return value in case of error.

The return values for long options may be any integer, so bump them up to
outside the ascii range, start above 1000. For ease of review this patch, keep
the existing range (41-62) and increment each value by 1000.

Tested:
- Ran "criu --help", works fine.
- Manual dump and restore with some of the options, worked fine.
- Ran the zdtm test suite, tests passed.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-26 12:49:42 +04:00
Pavel Emelyanov
7947ea7111 crtools: Make new_cg_root_add setup global root too
This is to make it convenient for service to setup the same thing.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
2014-08-22 19:20:03 +04:00
Saied Kazemi
d8b41b6525 Added AUFS support.
The AUFS support code handles the "bad" information that we get from
the kernel in /proc/<pid>/map_files and /proc/<pid>/mountinfo files.
For details see comments in sysfs_parse.c.

The main motivation for this work was dumping and restoring Docker
containers which by default use the AUFS graph driver.  For dump,
--aufs-root <container_root> should be added to the command line options.
For restore, there is no need for AUFS-specific command line options
but the container's AUFS filesystem should already be set up before
calling criu restore.

[ xemul: With AUFS files sometimes, in particular -- in case of a
  mapping of an executable file (likekely the one created at elf load),
  in the /proc/pid/map_files/xxx link target we see not the path
  by which the file is seen in AUFS, but the path by which AUFS
  accesses this file from one of its "branches". In order to fix
  the path we get the info about branches from sysfs and when we
  meet such a file, we cut the branch part of the path. ]

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-21 18:35:22 +04:00
Tycho Andersen
94f6c87c9f cg: add --cgroup-root option
The motivation for this is to be able to restore containers into cgroups other
than what they were dumped in (if, e.g. they might conflict with an existing
container). Suppose you have a container in:

memory:/mycontainer
cpuacct,cpu:/mycontainer
blkio:/mycontainer
name=systemd:/mycontainer

You could then restore them to /mycontainer2 via --cgroup-root /mycontainer2.
If you want to restore different controllers to different paths, you can
provide multiple arguments, for example, passing:

--cgroup-root /mycontainer2 --cgroup-root cpuacct,cpu:/specialcpu \
--cgroup-root name=systemd:/specialsystemd

Would result in things being restored to:

memory:/mycontainer2
cpuacct,cpu:/specialcpu
blkio:/mycontainer2
name=systemd:/specialsystemd

i.e. a --cgroup-root without a controller prefix specifies the new default root
for all cgroups.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-19 12:58:36 +04:00
Tycho Andersen
f95b05eb75 opts: add --manage-cgroups option
criu managed cgroups is now an opt-in thing, so by default criu does not manage
(i.e. dump or restore) cgroups. This allows users to use the previous behavior.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-12 14:32:50 +04:00
Ruslan Kuprieiev
2b268c6c21 security: check additional groups,v5
Currently, we only check if process gids match primary gid of user.
But process and user have additional groups too. So lets:
     1) check that process rgid,egid and sgid are in the user's grouplist.
     2) on restore check that user has all groups from the images.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 10:20:27 +04:00
Pavel Emelyanov
84eb0a1927 criu: Restore tasks as siblings in swrk
Andrey validly pointed out, that restoring pdeath_sig is not
compatible with criu_restore_child() call -- after criu restore
children, it will exit and fire the pdeath_sig into restored
tree root, potentially killing it.

The fix for that could be -- when started in swrk more, criu can
restore tree not as children tasks, but as siblings, using the
CLONE_PARENT flag when fork()-ing the root task.

With this we should also take care about errors handing -- right
now criu catches the SIGCHILD from dying children tasks, and
since we plan to create them be children of the criu parent (the
library caller) we will not be able to catch them. To do so we
SEIZE the root task in advance thus causing all SIGCHLD-s go to
criu, not to its parent.

Having this done we no longer need the SUBREAPER trick in the
library call -- tasks get restored right as callers kids :)

Some thoughts for future -- using this trick we can finally make
"natural" restoration of shell jobs. I.e. -- make criu restore
some subtree right under bash, w/o leaving itself as intermediate
task and w/o re-parenting the subtree to init after restore.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
2014-07-01 16:16:07 +04:00
Pavel Emelyanov
d30521a3cf crtools: Add internal "swrk" action
To help restoring tasks from images as kids to the caller, we can
do the trick.

1. Caller sets himself as child reaper with PR_SET_CHILD_SUBREAPER prctl
2. Caller makes sure criu binary is suid-ed and owned by root
3. Caller forks and calls execv() on criu asking it to restore
4. Criu finishes restore and exits. All its kids get reparented to the
   criu's parent, i.e. -- to the library caller.
5. Caller stops being subreaper

In order to make the execv() and arguments passing simpler I propose
to execv() the service worker function, that accepts options via socket.

This is good for two reasons.

1. We don't have to construct CLI options in libcriu
2. We reuse other service's facilities, such as security checks,
   ability to dump, pre-dump and other stuff

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-27 14:24:33 +04:00
Pavel Emelyanov
c7e0042946 crtools: Introduce the --ext-mount-map option (v3)
On dump one uses one or more --ext-mount-map option with A:B arguments.
A denotes a mountpoint (as seen from the target mount namespace) criu
dumps and B is the string that will be written into the image file
instead of the mountpoint's root.

On restore one uses the same --ext-mount-map option(s) with similar
A:B arguments, but this time criu treats A as string from the image's
root field (foobar in the example above) and B as the path in criu's
mount namespace the should be bind mounted into the mountpoint.

v3:
* Added documentation
* Added RPC bits
* Changed option name into --ext-mount-map
* Use colon as key and value separator

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-17 10:36:30 +04:00
Deyan Doychev
69a6bf4439 criu: Add exec-cmd option (v3)
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.

Waiting for the restored processes to finish is responsibility of this command.

All service FDs are closed before we call execvp(). Standad output and error of
the command are redirected to the log file when we are restoring through the RPC
service.

This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.

Two directions were researched in order to integrate CRIU and LXC:

1. We tell to CRIU, that after restoring container is should execve()
   lxc properly explaining to it that there's a new container hanging
   around.

2. We make LXC set himself as child subreaper, then fork() criu and ask
   it to detach (-d) from restore container afterwards. Being a subreaper,
   it should get the container's init into his child list after it.

The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.

Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-25 01:20:02 +04:00
Pavel Emelyanov
edde5fb461 irmap: Add option that forces fsnotify watches paths resolve
When migrating container with copying its FS, the inode numbers
and thus their handles wil change. This will make the restore of
inotify/fanotify fail, since they do it via fhandles.

We've already faced the problems with fsnotifies on NFS -- they
don't work there. To address this an irmap cache is created on
pre-dump, so to resolve the issue with changed inodes during
migration, we can force the irmap cache build.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-06 15:12:05 +04:00