To show which events are coming and flush events before dump as required by new fsnotify mode.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a controller is mounted during dumping processes, criu may fail with error:
Error (cgroup.c:768): cg: Set 3 is not subset of 2
so lets create all test controllers before executing tests.
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Events are not dumped/restored. An idea of ignoring them isn't good.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There is a potential attack here where if someone is restoring something and
criu write the pid to a file the attacker controls, the attacker can then
re-write that to whatever pid they want. ciru should instead open the file with
O_EXCL so that the restore fails if the file exists.
We don't need O_TRUNC here since we're O_EXCL-ing the file.
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we need to check if current loglevel will suppress
our messagess (say you need to run pr_debug in a cycle)
we can use this helper to eliminate unneded calls.
Like
if (!pr_quelled(LOG_DEBUG)) {
... do something specific to LOG_DEBUG ...
}
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All marks are collected in a list and then they are written in
the eventpoll image as a repeated field.
This images merge reduces the amount of image files criu
generates and may simplify the fix of mentioned above issue
v2: save the original order of entries
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All marks are collected in a list and then they are written in
the fanotify image as a repeated field.
This images merge reduces the amount of image files criu
generates and may simplify the fix of mentioned above issue
v2: don't leak fe.mark_entry
v3: save the original order of marks
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All watch descriptors are collected in a list and then
they are written in inotify image as a repeated field.
This images merge reduces the amount of image files criu
generates and may simplify the fix of mentioned above issue.
v2: use free_inotify_wd_entry() instead of xfree in dump_one_inotify()
v3: don't leak ie.wd_entry
v4: save the original order of watchers
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The swrk action is turning out to be a cool thing. We can
spawn criu with swrk action with some FD being open, then
ask for dump/pre-dump/page-server telling it that some
descriptor it needs is "out there".
This patch lets us specify that the page server communication
channel is already in criu's fdtable.
TODO: teach regular service to accept fd via service socket.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Here we define new api to be used in plugins.
- Plugin should provide a descriptor with help of
CR_PLUGIN_REGISTER macro, or in case if plugin require
no init/exit functions -- with CR_PLUGIN_REGISTER_DUMMY.
- Plugin should define a plugin hook with help of
CR_PLUGIN_REGISTER_HOOK macro.
- Now init/exit functions of plugins takes @stage
argument which tells plugin which stage of criu
it's been called on dump/restore. For exit it
also takes @ret which allows plugin to know if
something went wrong and it needs to cleanup
own resources.
The idea behind is to not limit plugins authors with names
of functions they might need to use for particular hook.
Such new API deprecates olds plugins structure but to keep
backward compatibility we will provide a tiny layer of
additional code to support old plugins for at least a couple
of release cycles.
For example a trivial plugin might look like
| #include <sys/types.h>
| #include <sys/stat.h>
| #include <fcntl.h>
| #include <libgen.h>
| #include <errno.h>
|
| #include <sys/socket.h>
| #include <linux/un.h>
|
| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <unistd.h>
|
| #include "criu-plugin.h"
| #include "criu-log.h"
|
| static int dump_ext_file(int fd, int id)
| {
| pr_info("dump_ext_file: fd %d id %d\n", fd, id);
| return 0;
| }
|
| CR_PLUGIN_REGISTER_DUMMY("trivial")
| CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__DUMP_EXT_FILE, dump_ext_file)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These guys may have pids that are not met in pstree.
This is not the reason for skipping those, try to
resolve flocks anyway.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After patches, that dump locks w/o dfds array, we can even
not allocate one when we don't need it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we open a file, lock one, fork, then close and
open the file in parent again, lock should 'slide'
to the child process anyway.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Inherited flock is the one that a task got from its parent.
In case parent closes the corresponding fd, the /proc/locks
still shows the parent pid, while the lock is owned by child.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a problem with file locks (bug #2512) -- the /proc/locks
file shows the ID of lock creator, not the owner. Thus, if the
creator died, but holder is still alive, criu fails to dump the
lock held by latter task.
The proposal is to find who _might_ hold the lock by checking
for dev:inode pairs on lock vs file descriptors being dumped.
If the creator of the lock is still alive, then he will take
the priority.
One thing to note about flocks -- these belong to file entries,
not to tasks. Thus, when we meet one, we should check whether
the flock is really held by task's FD by trying to set yet
another one. In case of success -- lock really belongs to fd
we dump, in case it doesn't trylock should fail.
At the very end -- walk the list of locks and dump them all at
once, which is possible by merge of per-task file-locks images
into one global one.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we keep the lock type (posix/flock) till the
time we dump it, then "decode" it into binary value.
I will need the easy-to-check one early, so parse the
kind in proc_parse.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can't dump netlink socket, inotify, fanotify, if they have queued
data, so lets add a function to chech this.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Check for setproctitle_init, as old versions of libbsd don't have one.
Reported-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Acked-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Usually /tmp is a mount point.
Recently we found a bug in criu, when it restore mount fanotify on "./"
instead of "/". The test didn't find it, because they are pointed on the
same mount point.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We try to remove mark on the correct mount point and
if the mark is restored on a wrong mount point, we will get ENOENT.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Each mountpoint belongs to a mount namespace, so we need to
find a root of the mount namespace and open mountpoint
ralative to this root.
The same logic is used in get_mark_path().
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I'm scared, when I see smth like that:
rm: cannot remove ‘/var/lib/jenkins/jobs/CRIU/workspace/test/dump/static/cgroup00/31195/1/.criu.cgyard.6qctPl/systemd/tasks’: Operation not permitted
v2: do that only in the "test" directory
Reported-by: Mr Jenkins
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's cleaned up accoding with following statements:
* files_id can't be zero (look at dump_task_kobj_ids)
* item->ids is allocated for all non-dead tasks
* a parent can't be dead
In addition here is a tiny coding stype fix.
Fixes: 475bb1e77522 ("rst: Evaluate per-task clone mask early")
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to collect all objects in a list and write them into
the eventpoll image. The eventpoll tfd image will be depricated.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to collect all objects in a list and write them into
the fanotify image. The fanotify mark image will be depricated.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to collect all objects in a list and write them into
the inotify image. The inotify wd image will be depricated.
v2: cb() must always free an entry
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's been a long delay since 1.2, but we did it :)
The greatest new acheivement is finally support for Docker
and LXC on CRIU side. Some work is still to be don on the
other, but here in CRIU everything is ready.
Another notable things are AArch64 support and, of course,
a lot of bugfixes.
Further plan is to make releases be not so rare :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is really just the last bit of c32046c9; if restore_one_task() fails, we
need to do the same futex wakeup we do everywhere else in this function.
v2: use err instead of err_fini_mnt after mount has been finalized normally
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Once the task restore has failed, we can just abort, no need to restore the cg
props.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When in --restore-detached (i.e. root_as_sibling) mode, we ptrace(PTRACE_SEIZE)
the root task to receive its SIGCHLD in case one of its child tasks dies.
However, we don't receive a SIGCHLD if the root task itself dies, so we must
explicitly abort.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A kernel without that option configured does not have /dev/pts/ptmx, so
fallback to the previous way of creating it using mknod instead.
The previous code was trying to bind mount ptmx on top of a symlink, which does
not actually work... Keep only the symlink call and use a relative symlink
instead. Adjust the error message of the symlink case to mention symlink()
instead of mknod() and also /dev/ptmx instead of /dev/pts.
Tested:
- zdtm test suite runs on ^ns/static/.* before and after the change.
- Same on a kernel with CONFIG_DEVPTS_MULTIPLE_INSTANCES unset.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dumping Docker containers using the AUFS graph driver, we can
use the --root option instead of --aufs-root for specifying the
container's root. This patch obviates the need for --aufs-root
and makes dump CLI more consistent with restore CLI.
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
rmdir is executed for non-existent directories, so we don't check
an exit code of this operation.
This patch executs rmdir only for existent directories and check
an exit code of rmdir.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In this job tests are dumped and resumed. The cgroup02 test checks,
that it is moved in another set of cgroups, but this is done on restore.
Output file: test/zdtm/live/static/cgroup02.out>
------------------------------------------------------------------------------
14:35:55.127: 85: found cgroup at cgroup02.test/zdtmtst>
14:35:55.127: 85: found cgroup at cgroup02.test/defaultroot>
14:35:55.127: 85: FAIL: cgroup02.c:132: oldroot not rewritten to zdtmtstroot!
v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>