This commit is in preparation for the (hopefully last :) restore special cpuset
patch.
Previously, we installed the cgroup service fd after calling
prepare_cgroup_dirs, which meant that we had to carry around the temporary
directory name in order to put things in the right place. The
restore_cgroup_prop function uses the cg service fd instead of carrying around
the full path. This means that we can't use restore_cgroup_prop without first
sanitizing the path. Instead, we install the service fd before calling
prepare_cgroup_dirs, and all the code just references that instead of carrying
around the temporary path.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
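The point of installing the service fd first is that property writes can be expressed relative to a directory fd rather than an absolute path under a temporary mount directory. A minimal sketch, assuming `cg_dirfd` stands in for criu's cgroup service fd; `write_prop_at()` is a hypothetical helper, not criu's `restore_cgroup_prop()`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write @value to the property file @rel_path,
 * resolved relative to @cg_dirfd (a stand-in for the cgroup service fd).
 * No temporary mount-point name has to be carried around.
 * Returns 0 on success, -1 on error.
 */
static int write_prop_at(int cg_dirfd, const char *rel_path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd;

	assert(rel_path[0] != '/'); /* must be relative to cg_dirfd */

	/* cgroupfs property files already exist, so no O_CREAT here */
	fd = openat(cg_dirfd, rel_path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, value, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```

On a real cgroup hierarchy `rel_path` would be something like `lxc/u1/cpuset.cpus`; the sketch only relies on `openat(2)`.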
The symptom of this bug was that users restoring tasks to a nested cgroup where
the top-level group was created by criu (and not previously configured), e.g.
cpuset:/lxc/u1, would get ENOSPC. criu would try to copy the special
properties into /lxc/u1 directly and (silently) fail, and then try to copy
the task into the cg and fail with ENOSPC:
ENOSPC Attempted to write(2) an empty cpuset.cpus or cpuset.mems setting to
a cpuset that has tasks attached.
After turning the silent failure into a loud one, it gave EACCES:
EACCES Attempted to add, using write(2), a CPU or memory node to a cpuset, when
that CPU or memory node was not already in its parent.
So, we need to copy the special props down the entire tree. Additionally,
we shouldn't copy props directly from the top, since some intermediate point in
the tree could add restrictions. We first walk back up the tree until we find
the first ancestor whose props are not empty, and then copy that ancestor's
props all the way down.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
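The walk-up-then-copy-down idea can be modelled in memory. A minimal sketch, assuming a hypothetical `struct cg_node` with only a parent pointer and a `cpus` string (empty meaning "not configured"); the copy goes top-down because the kernel rejects CPUs that are not already in the parent:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one cpuset directory. */
struct cg_node {
	struct cg_node *parent;
	char cpus[64]; /* "" means "not configured yet" */
};

/* Fill every node strictly below @src down to @n, parents first. */
static void fill_from(struct cg_node *n, const struct cg_node *src)
{
	if (n == src)
		return;
	fill_from(n->parent, src); /* the parent must be populated before the child */
	memcpy(n->cpus, src->cpus, sizeof(n->cpus));
}

/*
 * Walk up from @leaf to the nearest ancestor whose value is not empty,
 * then copy that value down into each empty node on the path.
 * Returns 0 on success, -1 if no ancestor is configured.
 */
static int copy_special_props_down(struct cg_node *leaf)
{
	const struct cg_node *src = leaf;

	while (src && src->cpus[0] == '\0')
		src = src->parent;
	if (!src)
		return -1;
	fill_from(leaf, src);
	return 0;
}
```

Copying from the nearest configured ancestor, not the root, is what keeps an intermediate restriction (such as `lxc` limiting CPUs to `0-1`) in effect.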
We're using NULL as a sentinel here to indicate that we shouldn't restore any
cgroup properties. We should make sure that we don't leak this information and
instead check the n_properties field, which we should also set correctly.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want buffered images to speed up dump and,
slightly, restore. Right now we use plain file descriptors
to write and read images. Making them buffered
cannot be done gracefully on plain fds, so introduce
a new class.
This will also help if (when?) we want to make more
complex changes to images, e.g. store them all in one
file or send them directly over the network.
For now, cr_img just contains one int _fd variable.
This patch changes the prototype of open_image() to
return struct cr_img *, makes pb_(read|write)* accept one,
and fixes the compilation of the rest of the code :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
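A minimal sketch of the "class with one int _fd" step, assuming hypothetical names `open_image_sketch()`, `img_fd()` and `close_image_sketch()` (criu's real `open_image()` has a different signature). Callers hold a `struct cr_img *`, so buffering can later be added inside the struct without touching them:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* As described above: for now the image object wraps just one fd. */
struct cr_img {
	int _fd;
};

/* Hypothetical constructor: returns NULL on failure instead of an int fd. */
static struct cr_img *open_image_sketch(const char *path, int flags)
{
	struct cr_img *img = malloc(sizeof(*img));

	if (!img)
		return NULL;
	img->_fd = open(path, flags, 0600);
	if (img->_fd < 0) {
		free(img);
		return NULL;
	}
	return img;
}

/* Accessor for code that still needs the raw descriptor. */
static int img_fd(const struct cr_img *img)
{
	assert(img != NULL);
	return img->_fd;
}

static void close_image_sketch(struct cr_img *img)
{
	if (!img)
		return;
	close(img->_fd);
	free(img);
}
```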
The same -- int-fd will soon go away, so return the
explicit int -1 instead of it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Since we're going to switch from int fds to class images
soon, the fdset name will not fit into the new terminology.
This patch is:
sed -e 's/fdset/imgset/g' -i *
sed -e 's/imgset_fd/img_from_set/g' -i *
git mv include/fdset.h include/imgset.h
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
We prefer x* helpers because they print an error
in case of allocation failures.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
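To illustrate why x* helpers are preferred, here is a minimal sketch of the pattern: the wrapper reports the failure itself, so callers only check for NULL. `xmalloc_sketch()` and `xstrdup_sketch()` are illustrative and are not criu's actual `xmalloc`/`xstrdup` definitions:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocation wrapper that prints an error on failure. */
static void *xmalloc_sketch(size_t size)
{
	void *p = malloc(size);

	if (!p)
		fprintf(stderr, "Error: can't allocate %zu bytes\n", size);
	return p;
}

/* strdup() built on the wrapper, so it reports failures the same way. */
static char *xstrdup_sketch(const char *s)
{
	size_t len;
	char *p;

	assert(s != NULL);
	len = strlen(s) + 1;
	p = xmalloc_sketch(len);
	if (p)
		memcpy(p, s, len);
	return p;
}
```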
It is called from prepare_cgroup_sfd() and cr_restore_tasks().
+ criu restore --file-locks --tcp-established --evasive-devices --link-remap --root /var/lib/vz/root/101 --restore-detached --action-script /usr/local/libexec/vzctl/scripts/vps-rst-env -D /vz/dump/Dump.101 -o restore.log -vvvv --pidfile /var/lib/vzctl/vepid/101
*** Error in `criu': double free or corruption (fasttop): 0x00000000006bcd40 ***
Program terminated with signal 6, Aborted.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-20.fc19.x86_64 libgcc-4.8.3-1.fc19.x86_64 protobuf-c-0.15-7.fc19.x86_64
(gdb) bt
#0 0x00007ffff72179e9 in raise () from /lib64/libc.so.6
#1 0x00007ffff72190f8 in abort () from /lib64/libc.so.6
#2 0x00007ffff7257d17 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff725f0b8 in _int_free () from /lib64/libc.so.6
#4 0x0000000000426971 in cr_restore_tasks () at cr-restore.c:1833
#5 0x0000000000418426 in main (argc=<optimized out>, argv=0x7fffffffeb38, envp=<optimized out>) at crtools.c:479
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
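A cleanup routine with two callers, as described above, is safe only if calling it twice is harmless. One common way to do that (not necessarily what the patch does) is to reset the pointer after freeing it. `cg_state_buf` and `fini_cgroup_sketch()` are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global state released by a cleanup routine with two callers. */
static char *cg_state_buf;

static void fini_cgroup_sketch(void)
{
	free(cg_state_buf);  /* free(NULL) is a no-op */
	cg_state_buf = NULL; /* so a second call cannot free the same block again */
}
```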
Without this patch, we dump something like this:
{
cnames: "hugetlb"
dirs: {
dir_name: ""
children: {
dir_name: "ewroot"
children: <empty>
properties: <empty>
}
properties: <empty>
}
}
It's obvious that dir_name should be "newroot".
The problem is reproduced if a task lives in "/" and has a subgroup.
This issue was caught by chance. The cgroup02 test doesn't clean up
controllers and leaves the "newroot" there. So when we executed a cgroup
test after cgroup02, we could find many directories like "ewroot",
"wroot", etc. This patch fixes this issue.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
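One plausible reconstruction of how "newroot" turns into "ewroot" (not the actual criu code): the child name is taken by skipping the parent path plus one separator, which over-skips by one byte when the parent is "/". Both helpers are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Buggy: skip strlen(parent) + 1 bytes. For parent "/" this skips two
 * bytes of "/newroot" and yields "ewroot". */
static const char *child_name_buggy(const char *parent, const char *path)
{
	return path + strlen(parent) + 1;
}

/* Safer: take whatever follows the last '/'. */
static const char *child_name(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

The buggy form still works for any non-root parent such as "/a", which is why the problem only showed up for tasks living in "/".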
This is to make it convenient for the service to set up the same thing.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
cpuset.cpus and cpuset.mems can't be written to for the first time after they
have tasks, so the traditional mechanism of restoring properties after
restoring the tasks won't work here. Instead, we copy the parent values of the
properties into them, restore the tasks, and then restore via the traditional
mechanism the actual values of these properties.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
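The first step of that order (seed the child with the parent's value before any task is attached) can be sketched with two directory fds. `copy_prop_from_parent()` is a hypothetical helper; attaching tasks and then writing the real dumped values would follow it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Copy property @name (e.g. "cpuset.cpus") from the parent cgroup dir to
 * the child. Intended order on restore:
 *   1. copy cpuset.cpus / cpuset.mems from the parent (this function),
 *   2. move the tasks in,
 *   3. write the dumped values through the normal property path.
 * Returns 0 on success, -1 on error.
 */
static int copy_prop_from_parent(int parent_dfd, int child_dfd, const char *name)
{
	char buf[256];
	ssize_t n;
	int fd;

	fd = openat(parent_dfd, name, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf));
	close(fd);
	if (n <= 0)
		return -1;

	fd = openat(child_dfd, name, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -1;
	if (write(fd, buf, (size_t)n) != n) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```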
In particular, cpuset.cpus and cpuset.mems can both be "lists" (strings), as
well as hex integers. We don't use the result of this parse, so it is fine to delete it.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The motivation for this is to be able to restore containers into cgroups other
than what they were dumped in (if, e.g. they might conflict with an existing
container). Suppose you have a container in:
memory:/mycontainer
cpuacct,cpu:/mycontainer
blkio:/mycontainer
name=systemd:/mycontainer
You could then restore them to /mycontainer2 via --cgroup-root /mycontainer2.
If you want to restore different controllers to different paths, you can
provide multiple arguments, for example, passing:
--cgroup-root /mycontainer2 --cgroup-root cpuacct,cpu:/specialcpu \
--cgroup-root name=systemd:/specialsystemd
Would result in things being restored to:
memory:/mycontainer2
cpuacct,cpu:/specialcpu
blkio:/mycontainer2
name=systemd:/specialsystemd
i.e. a --cgroup-root without a controller prefix specifies the new default root
for all cgroups.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
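The two argument shapes above ("/path" for the default root, "controllers:/path" for a specific set) can be split as follows. This is a minimal sketch; `parse_cgroup_root()` is a hypothetical name and it assumes the controller part never contains ':':

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one --cgroup-root argument.
 * "/path"       -> ctrl = "" (default root for all controllers), path = "/path"
 * "ctrl:/path"  -> ctrl = "ctrl", path = "/path"
 * Returns 0 on success, -1 on malformed input or truncation.
 */
static int parse_cgroup_root(const char *arg, char *ctrl, size_t ctrl_sz,
			     char *path, size_t path_sz)
{
	const char *colon;
	size_t clen;
	int n;

	assert(ctrl_sz > 0 && path_sz > 0);
	if (arg[0] == '/') {
		ctrl[0] = '\0';
		n = snprintf(path, path_sz, "%s", arg);
		return (n < 0 || (size_t)n >= path_sz) ? -1 : 0;
	}

	colon = strchr(arg, ':');
	if (!colon || colon[1] != '/')
		return -1;
	clen = (size_t)(colon - arg);
	if (clen + 1 > ctrl_sz)
		return -1;
	memcpy(ctrl, arg, clen);
	ctrl[clen] = '\0';

	n = snprintf(path, path_sz, "%s", colon + 1);
	return (n < 0 || (size_t)n >= path_sz) ? -1 : 0;
}
```

An empty controller string then acts as the fallback for every controller that has no explicit entry, matching the memory/blkio lines in the example.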
When writing the system default for memory.limit_in_bytes (which is LLONG_MAX),
the write fails. The number is equivalent to -1 (unlimited), so during dump
we store -1 instead.
Change-Id: Iafccc96bf5dbade763d7addaeda24194616e4d5f
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
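The dump-side substitution can be sketched as a small normalization step. `normalize_limit()` is a hypothetical helper; following the description above, it treats exactly LLONG_MAX as "unlimited":

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If @val (as read from memory.limit_in_bytes) is LLONG_MAX, store "-1"
 * (unlimited) in @out; otherwise copy the value unchanged. */
static void normalize_limit(const char *val, char *out, size_t out_sz)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(val, &end, 10);
	if (errno == 0 && end != val && (*end == '\0' || *end == '\n') &&
	    v == LLONG_MAX)
		snprintf(out, out_sz, "-1");
	else
		snprintf(out, out_sz, "%s", val);
}
```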
Because different kernel versions have different cgroup properties, criu
shouldn't crash just because the properties statically listed aren't exact.
Instead, during dump, ignore properties the kernel doesn't have and continue.
Change-Id: I5a8b93d6a8a3a9664914f10cf8e2110340dd8b31
Signed-off-by: Garrison Bellack <gbellack@google.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
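"Ignore properties the kernel doesn't have" boils down to treating ENOENT on open as "skip" rather than as an error. A minimal sketch with a hypothetical `read_prop_if_present()`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read property @name from the cgroup dir @cg_dfd.
 * Returns 1 if read, 0 if this kernel lacks the file (ENOENT, skip it),
 * -1 on any other error.
 */
static int read_prop_if_present(int cg_dfd, const char *name, char *buf, size_t sz)
{
	ssize_t n;
	int fd;

	assert(sz > 0);
	fd = openat(cg_dfd, name, O_RDONLY);
	if (fd < 0)
		return errno == ENOENT ? 0 : -1;
	n = read(fd, buf, sz - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 1;
}
```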
prepare_cgroup_dirs() gets a path and an offset.
Then we add substrings to the source string and handle them.
v2: fix one more place in prepare_cgroup_dir_properties()
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
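One way to read "a path and an offset": a single buffer holds the current directory path, each child name is appended at the offset, and truncating at the old offset restores the parent path, so no per-level strings are allocated. A minimal sketch with a hypothetical `path_push()`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "/name" to the path in @buf at offset @off.
 * Returns the new offset, or -1 if the result would not fit.
 * The caller restores the parent path with buf[off] = '\0'.
 */
static int path_push(char *buf, size_t sz, size_t off, const char *name)
{
	int n;

	assert(off < sz);
	n = snprintf(buf + off, sz - off, "/%s", name);
	if (n < 0 || (size_t)n >= sz - off)
		return -1;
	return (int)(off + (size_t)n);
}
```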
Before the patch, the cg tree section from the cgroup00 test looked like this:
{
cnames: "name=zdtmtst"
dirs: {
path: "/subcg"
children: {
path: "/subcg/subsubcg"
children: <empty>
properties: <empty>
}
properties: <empty>
}
}
The "/subcg" prefix in the child's path is redundant. Turn paths into directory names.
Now the section looks like this:
{
cnames: "name=zdtmtst"
dirs: {
dir_name: "subcg"
children: {
dir_name: "subsubcg"
children: <empty>
properties: <empty>
}
properties: <empty>
}
}
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
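With only per-level dir_name values stored, the full path is rebuilt by walking up the parents. A minimal sketch, assuming a hypothetical in-memory `struct cg_dir` that mirrors the dumped tree:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory node mirroring the dir_name field in the image. */
struct cg_dir {
	const struct cg_dir *parent;
	const char *dir_name;
};

/* Rebuild "/subcg/subsubcg" from per-level names.
 * Returns the path length, or -1 if @buf is too small. */
static int cg_dir_path(const struct cg_dir *d, char *buf, size_t sz)
{
	int off = 0, n;

	if (d->parent) {
		off = cg_dir_path(d->parent, buf, sz);
		if (off < 0)
			return -1;
	}
	n = snprintf(buf + off, sz - (size_t)off, "/%s", d->dir_name);
	if (n < 0 || (size_t)n >= sz - (size_t)off)
		return -1;
	return off + n;
}
```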
When we omit --manage-cgroups on dump, the controllers section
in the cgroups image lacks the none-d entries (name=systemd is the
most typical one).
If the init task happens to live in a non-criu cgset (this can
happen if we do a --shell-job dump from another terminal, so that criu
and the root task live in different user.slice systemd cgroups), then
on restore move_in_cgroup() fails to look up the required
controller.
In order to fix this, we should still call collect_cgroups()
on dump, so that it adds the none-d controllers to the list,
but not dump the dirs tree itself.
The patch looks ugly, but it just moves the current_controller
evaluation from the middle of the loop upwards (and renames the
char *opts variable not to conflict with global opts).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
This should be symmetrical with cg dirs creation.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
criu managed cgroups is now an opt-in thing, so by default criu does not manage
(i.e. dump or restore) cgroups. This allows users to use the previous behavior.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Building on top of the cgroup properties infrastructure patch, this patch
adds all the cgroup properties to the static list of properties we want to restore.
Change-Id: I992c260089dcc2ba169a8ac5b19d73f29c678e7d
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Restores 2 cgroup properties after the criu restoration of tasks.
Currently the cgroup files to be restored are static but
are easily extendable. To change the properties to be restored,
edit this list at the top of cgroup.c. If a cgroup exists during
restoration, its properties will not be overwritten.
Work based on that of Tycho Andersen <tycho.andersen@canonical.com>
Change-Id: Ida32b9773eeac1d4d6e82ad644524ed099d5f9b1
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
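A static, easily extendable list of properties can be a per-controller table. The entries below are only properties named elsewhere in these notes and are illustrative, not criu's actual list; `props_for()` is a hypothetical lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative NULL-terminated property lists per controller. */
static const char *cpuset_props[] = { "cpuset.cpus", "cpuset.mems", NULL };
static const char *memory_props[] = { "memory.limit_in_bytes", NULL };

struct ctrl_props {
	const char *controller;
	const char **props;
};

/* Extending what gets restored means editing this table. */
static const struct ctrl_props prop_table[] = {
	{ "cpuset", cpuset_props },
	{ "memory", memory_props },
};

/* Return the property list for @controller, or NULL if none is listed. */
static const char **props_for(const char *controller)
{
	size_t i;

	for (i = 0; i < sizeof(prop_table) / sizeof(prop_table[0]); i++)
		if (strcmp(prop_table[i].controller, controller) == 0)
			return prop_table[i].props;
	return NULL;
}
```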
There is an issue where, if the process to be killed spawns a child process and
moves it into a child cgroup of the one the parent process is in, the cgroup fd
was being closed in the parent process before it forked the child. Then, when
move_in_cgroup() is called for the child process, the file descriptor has
already been closed, causing the second call to move_in_cgroup() to fail.
Move the fd close to after the fork call.
Change-Id: I6ae88b95c5410a7f56108e28eb3133f113e868d0
Signed-off-by: Garrison Bellack <gbellack@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
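The ordering fix can be demonstrated with plain fork(): as long as the parent closes the cgroup fd only after forking, the child inherits a valid copy. `fork_then_close()` is hypothetical; the child's fd check stands in for the child's own move_in_cgroup() call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork first, close the cgroup fd in the parent afterwards.
 * Returns the child's exit status (0 = the fd was still valid in the
 * child), or -1 on error.
 */
static int fork_then_close(int cg_fd)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(fcntl(cg_fd, F_GETFD) == -1 ? 1 : 0);

	close(cg_fd); /* safe now: the child already has its own copy */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```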
CID 1230179 (#1 of 1): Resource leak (RESOURCE_LEAK)
15. leaked_storage: Variable "ncd" going out of scope leaks the storage
it points to.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- xfree works fine with a NULL argument, so there is no need for additional tests.
- no need for the @ret variable: we either succeed, returning 0 explicitly,
or fail with an explicit -1
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After the commit that walks the /proc/self/fd/N path instead of the temporary
one, add_cgroup() started trimming the first several bytes from the cgroup
path.
The test still passed, since all cgroups were left as is after dump, so criu restore
didn't recreate them and just got EEXIST on all mkdir-s.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
A mount point which was mounted by someone else may be umounted at
any moment.
For example the test system executes tests concurrently and sometimes
one test looks up a mount point, which has been mounted by another test.
==================================== ERROR ====================================
Test: zdtm/live/static/inotify00, Namespace: 1
Dump log : /var/lib/jenkins/jobs/CRIU-dump/workspace/test/dump/inotify00/15535/1/dump.log
--------------------------------- grep Error ---------------------------------
(00.021951) Error (cgroup.c:409): cg: failed walking /var/lib/jenkins/jobs/CRIU-dump/workspace/test/dump/signalfd00/15538/1/.criu.cgmounts.UGj28v/ for empty cgroups
(00.021967) Error (cr-dump.c:1601): Dump core (pid: 15535) failed with -1
(00.025509) Error (cr-dump.c:1914): Dumping FAILED.
------------------------------------- END -------------------------------------
================================= ERROR OVER =================================
In the previous patch I suggested opening a mount point, but it brought
other problems. We may open a directory where a cgroup mount has been
umounted, and its owner will then get EBUSY when trying to remove this
directory.
Reported-by: Jenkins Criuovich
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have two bugs actually.
First, the check for 'item == root_item' in dump_task_cgroup fires
twice: first when we write the inventory (item == NULL as the argument and
root_item == NULL because we haven't yet collected tasks) and the
2nd time when we dump the root task itself.
The 2nd issue sits in dump_cgroups() -- if root_cgset == criu_cgset
we don't write cgroups information at all (checking that we don't
have them with list_is_singular() inside that if). That said, we
don't need to read the cgroups tree if we're not going to dump it.
This patch fixes both.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Previously, we would not detect the mount point for co-mounted controllers. Things
still worked because we'd just re-mount them ourselves and traverse our own
mount point, but this saves an extra mount().
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
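Detecting an existing mount for co-mounted controllers means checking that every controller in the set (e.g. "cpu,cpuacct") appears among a cgroup mount's options, in any order. A minimal sketch, assuming the comma-separated option string (such as the super options of a cgroup entry in /proc/self/mountinfo) has already been extracted; both functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Does the comma-separated list @opts contain the exact token tok[0..len)? */
static bool has_token(const char *opts, const char *tok, size_t len)
{
	const char *p = opts;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n == len && strncmp(p, tok, len) == 0)
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}

/* True if every controller in @cnames appears in the mount options @opts. */
static bool mount_has_controllers(const char *opts, const char *cnames)
{
	const char *p = cnames;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (!has_token(opts, p, n))
			return false;
		if (!end)
			break;
		p = end + 1;
	}
	return true;
}
```

Matching whole tokens matters: a plain substring search would wrongly find "cpu" inside "cpuset".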
The path in cc->path here always has a "/" prefix since it comes from
/proc/$pid/cgroup. The additional / confuses the string slinging because ftw()
normalizes paths to have one "/" when we start traversing a subdirectory.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
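Since cc->path always starts with "/", joining it naively to a mount point produces "//", which no longer matches the single-slash paths ftw() hands back. A minimal sketch of a join that avoids this; `cg_join()` is hypothetical and is not criu's code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join mount point @mnt and a /proc/$pid/cgroup path without producing "//".
 * Returns 0 on success, -1 on truncation. */
static int cg_join(char *out, size_t sz, const char *mnt, const char *cg_path)
{
	size_t mlen = strlen(mnt);
	int n;

	while (mlen > 0 && mnt[mlen - 1] == '/')
		mlen--;    /* drop trailing slashes from the mount point */
	while (*cg_path == '/')
		cg_path++; /* cgroup paths from /proc always start with '/' */

	if (*cg_path == '\0')
		n = mlen ? snprintf(out, sz, "%.*s", (int)mlen, mnt)
			 : snprintf(out, sz, "/");
	else
		n = snprintf(out, sz, "%.*s/%s", (int)mlen, mnt, cg_path);
	return (n < 0 || (size_t)n >= sz) ? -1 : 0;
}
```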