The swrk action is turning out to be a cool thing. We can
spawn criu with the swrk action while some FDs are open, then
ask for dump/pre-dump/page-server, telling it that some
descriptor it needs is "out there".
This patch lets us specify that the page server communication
channel is already in criu's fdtable.
TODO: teach regular service to accept fd via service socket.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
criu-managed cgroups are now opt-in, so by default criu does not manage
(i.e. dump or restore) cgroups. This allows users to keep the previous behavior.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we only check whether the process gids match the user's primary gid.
But both the process and the user can have additional groups too. So let's:
1) check that the process rgid, egid and sgid are in the user's group list;
2) on restore, check that the user has all groups from the images.
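A sketch of the two checks with hypothetical helper names. It assumes the user's group list (primary gid plus supplementary groups) has already been obtained; how the service gets it is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* True if gid appears in the user's group list. */
static bool gid_in_list(gid_t gid, const gid_t *groups, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (groups[i] == gid)
			return true;
	return false;
}

/* Dump side: the real, effective and saved gids must all be user groups. */
static bool proc_gids_allowed(gid_t rgid, gid_t egid, gid_t sgid,
			      const gid_t *groups, size_t n)
{
	return gid_in_list(rgid, groups, n) &&
	       gid_in_list(egid, groups, n) &&
	       gid_in_list(sgid, groups, n);
}

/* Restore side: every group recorded in the images must be a user group. */
static bool image_groups_allowed(const gid_t *img, size_t img_n,
				 const gid_t *groups, size_t n)
{
	for (size_t i = 0; i < img_n; i++)
		if (!gid_in_list(img[i], groups, n))
			return false;
	return true;
}
```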
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To help restore tasks from images as kids of the caller, we can
use the following trick:
1. Caller sets itself as child subreaper with the PR_SET_CHILD_SUBREAPER prctl
2. Caller makes sure the criu binary is setuid and owned by root
3. Caller forks and calls execv() on criu, asking it to restore
4. Criu finishes restore and exits. All its kids get reparented to
criu's parent, i.e. to the library caller.
5. Caller stops being subreaper
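A sketch of steps 1 and 3 to 5 from the caller's side; step 2 (the setuid root binary) is assumed. `restore_as_kids()` is a hypothetical name, and it execs whatever argv it is given rather than going through the swrk service worker.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Become a child subreaper, run criu_argv in a child and wait for it.
 * Anything the child leaves behind when it exits is reparented to us.
 * Returns the child's exit status, or -1 on error.
 */
static int restore_as_kids(char *const criu_argv[])
{
	int status;
	pid_t pid;

	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0))	/* step 1 */
		return -1;

	pid = fork();					/* step 3 */
	if (pid == 0) {
		execv(criu_argv[0], criu_argv);
		_exit(127);
	}

	if (pid < 0 || waitpid(pid, &status, 0) != pid) {	/* step 4 */
		prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
		return -1;
	}

	prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);	/* step 5 */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because reparenting happens when criu exits, the restored tasks remain children of the caller even after it clears the subreaper flag.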
To make the execv() and argument passing simpler, I propose
to execv() the service worker function, which accepts options via a socket.
This is good for two reasons:
1. We don't have to construct CLI options in libcriu
2. We reuse the other service facilities, such as security checks and
the ability to dump, pre-dump and so on
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On dump one uses one or more --ext-mount-map options with A:B arguments.
A denotes a mountpoint (as seen from the target mount namespace) that criu
dumps, and B is the string that will be written into the image file
instead of the mountpoint's root.
On restore one uses the same --ext-mount-map option(s) with similar
A:B arguments, but this time criu treats A as the string from the image's
root field (i.e. the B value given on dump) and B as the path in criu's
mount namespace that should be bind-mounted into the mountpoint.
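A sketch of splitting one --ext-mount-map argument into its key and value, using a hypothetical `struct ext_mount`. Splitting at the first colon is an assumption; a key that itself contains ':' would need different handling.

```c
#include <stdlib.h>
#include <string.h>

struct ext_mount {
	char *key;	/* dump: mountpoint; restore: string from the image */
	char *val;	/* dump: string for the image; restore: path to bind-mount */
};

/* Copy n bytes of s into a new NUL-terminated string. */
static char *dup_range(const char *s, size_t n)
{
	char *p = malloc(n + 1);

	if (p) {
		memcpy(p, s, n);
		p[n] = '\0';
	}
	return p;
}

/* Parse "A:B". Returns 0 on success, -1 on a missing colon, empty side or OOM. */
static int parse_ext_mount_map(const char *arg, struct ext_mount *em)
{
	const char *colon = strchr(arg, ':');

	if (!colon || colon == arg || colon[1] == '\0')
		return -1;

	em->key = dup_range(arg, colon - arg);
	em->val = dup_range(colon + 1, strlen(colon + 1));
	if (!em->key || !em->val) {
		free(em->key);
		free(em->val);
		return -1;
	}
	return 0;
}
```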
v3:
* Added documentation
* Added RPC bits
* Changed option name into --ext-mount-map
* Use colon as key and value separator
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is the responsibility of this command.
All service FDs are closed before we call execvp(). Standard output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring LinuX Containers, and it also seems
helpful for perf or other use cases where restored processes must be supervised
by a parent.
Two directions were researched in order to integrate CRIU and LXC:
1. We tell CRIU that, after restoring the container, it should execve()
lxc, properly explaining to it that there's a new container hanging
around.
2. We make LXC set itself as child subreaper, then fork() criu and ask
it to detach (-d) from the restored container afterwards. Being a subreaper,
LXC should then get the container's init into its child list.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service, the criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restored trees to any other task in the system. Calling execve() from the service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When migrating a container along with a copy of its FS, the inode numbers,
and thus their file handles, will change. This makes the restore of
inotify/fanotify fail, since they are restored via fhandles.
We've already faced problems with fsnotifies on NFS, where they
don't work. To address that, an irmap cache is created on pre-dump;
so to resolve the issue with changed inodes during migration, we can
force the irmap cache build.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
As we have a global variable named opts, it is bad to use a
local variable with the same name, since it shadows the global one.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we have a bug: the service sends a response of type PRE_DUMP
instead of DUMP. So let's introduce send_criu_pre_dump_resp() and
use it.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is the first time the restorer gets info back from the CRIU
service, so at that point it makes perfect sense to report
which PID we're working with.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
setup_opts_from_req() prints an error message, so there's no need
for its caller to print another one.
While at it, simplify/unify error checking, treating any
non-zero value as an error.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There are two reasons to ban subdirectories in the log file name.
First, the process might be in a different namespace, so it is right to give us
an fd for the work dir, just like we did with the images dir. Second, as the
service might be run as root, it is unsafe to give clients an opportunity to
fill any directory with trash.
If you want to put logs/stats somewhere other than images_dir, you can
set work_dir_fd.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This works as a multi-request session: the caller issues the
pre-dump request, criu serves it and sends the result back.
Then the service waits for the next request on the same
session, so the client doesn't have to reconnect.
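A sketch of the session loop over a SOCK_SEQPACKET socket. Real CRIU messages are protobuf-encoded; here each request is a hypothetical one-byte type code, each response echoes the type plus a success byte, and ending the session after the first non-pre-dump request is an assumption.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical one-byte request codes. */
enum { REQ_PRE_DUMP = 1, REQ_DUMP = 2 };

/*
 * Serve one session: answer every PRE_DUMP and keep waiting on the same
 * socket; stop after answering the first request of any other type.
 * Returns the number of requests served, or -1 on a socket error.
 */
static int serve_session(int sk)
{
	int served = 0;

	for (;;) {
		unsigned char req, resp[2];
		ssize_t n = recv(sk, &req, 1, 0);

		if (n == 0)
			return served;	/* client hung up */
		if (n != 1)
			return -1;

		resp[0] = req;	/* echo the request type */
		resp[1] = 1;	/* report success */
		if (send(sk, resp, sizeof(resp), 0) != (ssize_t)sizeof(resp))
			return -1;
		served++;

		if (req != REQ_PRE_DUMP)
			return served;	/* last request of the session */
	}
}
```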
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The service shouldn't call client-provided scripts, as that
creates a security issue (the client may be unprivileged,
while the service is privileged).
To let the caller do what it would normally do with
criu scripts, make criu notify it about them instead. The caller
then does whatever it needs and responds back.
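A sketch of the notification round trip from the service side. The wire format (script name out, one acknowledgement byte back) and `notify_client()` are hypothetical; CRIU's actual RPC messages are protobuf-based.

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Instead of running the action script itself, tell the client which
 * script stage was reached and block until it acknowledges.
 * Returns 0 if the client reports success, -1 otherwise.
 */
static int notify_client(int sk, const char *script)
{
	size_t len = strlen(script);
	unsigned char ack;

	if (send(sk, script, len, 0) != (ssize_t)len)
		return -1;
	if (recv(sk, &ack, 1, 0) != 1)
		return -1;
	return ack ? 0 : -1;	/* non-zero ack: client handled it */
}
```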
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When RPC is asked to check the kernel, it's
enough to check the minimal set of kernel features.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since sd_listen_fds() doesn't set errno when returning a value > 1,
it doesn't make sense to use pr_perror(). Use pr_err() instead.
While at it, remove the period from the log message.
[v2: fix function names]
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If restore fails at an early stage (e.g. there are no images in the directory),
root_item might be uninitialized, so criu crashes when trying to send a response
with root_item->pid.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
cr_dump_tasks() may return values other than -1 on failure.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Constants such as CR_MAX_MSG_SIZE and CR_DEFAULT_SERVICE_ADDRESS need to be used in both the service and the lib.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When pr_perror is used, the error message is appended with a colon
and strerror(errno), so we should not put a period at the end,
otherwise we'll end up with something like this:
Error: Can't bind.: Permission denied
Found by git grep -w pr_perror | grep '\."'
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
... and don't return -1.
This is a missing part from commit 3477223.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make the criu RPC socket socket-activated with
systemd [1], meaning that systemd will create and listen on
the UNIX socket /var/run/criu-service.socket
on behalf of criu until a connection comes in, and will
then pass control of the socket, along with the first connection,
over to a newly spawned criu daemon.
This is similar to inetd, but criu stays around after getting
started, listening on the socket itself.
[1] http://0pointer.de/blog/projects/socket-activation.html
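A minimal sketch of the check that sd_listen_fds() performs under the documented systemd protocol: the LISTEN_PID and LISTEN_FDS environment variables, with passed descriptors starting at fd 3. `listen_fds_count()` is a hypothetical stand-in; unlike the real sd_listen_fds(), it neither unsets the variables nor sets FD_CLOEXEC on the fds, and it returns -1 rather than a negative errno.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_FDS_START 3	/* first passed fd in the systemd protocol */

/*
 * Returns how many sockets systemd passed us, 0 if we were not
 * socket-activated, or -1 if the variables are malformed.
 */
static int listen_fds_count(void)
{
	const char *pid_s = getenv("LISTEN_PID");
	const char *n_s = getenv("LISTEN_FDS");
	char *end;
	long pid, n;

	if (!pid_s || !n_s)
		return 0;

	pid = strtol(pid_s, &end, 10);
	if (*end != '\0' || pid <= 0)
		return -1;
	if ((pid_t)pid != getpid())
		return 0;	/* variables were meant for another process */

	n = strtol(n_s, &end, 10);
	if (*end != '\0' || n < 0)
		return -1;
	return (int)n;
}
```

A return value of 1 means fd LISTEN_FDS_START (3) is the listening socket to accept connections on.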
v2: stripped down sd-daemon.[ch]
moved units to scripts/sd
v3: stripped down further by removing unneeded includes
Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we have a bug: if leave_running is not set in the request, the service won't send a dump response. The response must be skipped only for a self-dump request without the leave_running option.
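The corrected condition, written as a hypothetical predicate. The likely reason for the single exception is that a self-dumped task without leave_running is killed after the dump, so nobody is left to read the reply.

```c
#include <stdbool.h>

/* Send the dump response unless this was a self-dump without leave_running. */
static bool should_send_dump_resp(bool self_dump, bool leave_running)
{
	return !(self_dump && !leave_running);
}
```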
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>