On ARM some PTRACE_... constants are not declared in the sys/ptrace.h file.
They are in linux/ptrace.h, but on x86 that file somewhat conflicts with
the sys/ one. For now fix ARM compilation by using the criu/ one and
revisit this later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Andrey rightly pointed out that restoring pdeath_sig is not
compatible with the criu_restore_child() call -- after criu restores
the children, it exits and fires the pdeath_sig into the restored
tree root, potentially killing it.
The fix for that could be this: when started in swrk mode, criu can
restore the tree not as child tasks, but as siblings, using the
CLONE_PARENT flag when fork()-ing the root task.
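
A minimal sketch of the CLONE_PARENT idea, not the actual criu code: a
helper process (playing the role of criu in swrk mode) clones a "root
task" as its sibling, and the task reports which parent it sees. All
names here (clone_as_sibling, root_task, parent_seen_by_sibling) are
hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Write end of a pipe the demo "root task" uses to report its parent. */
static int report_fd = -1;

/* Hypothetical stand-in for the restored root task. */
static int root_task(void *arg)
{
	pid_t parent = getppid();

	(void)arg;
	return write(report_fd, &parent, sizeof(parent)) ==
	       (ssize_t)sizeof(parent) ? 0 : 1;
}

/* Start fn() as a sibling of the caller rather than as its child. */
static pid_t clone_as_sibling(int (*fn)(void *), void *arg)
{
	char *stack = malloc(STACK_SIZE);

	if (!stack)
		return -1;
	/* Pass the top of the buffer: the stack grows down here. */
	return clone(fn, stack + STACK_SIZE, CLONE_PARENT | SIGCHLD, arg);
}

/* Returns the parent pid observed by the sibling, or -1 on error. */
static pid_t parent_seen_by_sibling(void)
{
	int p[2];
	pid_t helper, seen = -1;

	if (pipe(p))
		return -1;
	helper = fork();
	if (helper < 0)
		return -1;
	if (helper == 0) {
		/* Plays the role of criu started in swrk mode. */
		close(p[0]);
		report_fd = p[1];
		_exit(clone_as_sibling(root_task, NULL) < 0);
	}
	close(p[1]);
	if (read(p[0], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
		seen = -1;
	close(p[0]);
	while (wait(NULL) > 0)
		; /* both the helper and its sibling are our children */
	return seen;
}
```

Calling parent_seen_by_sibling() from the "library caller" should give
back the caller's own pid, since the task was never a child of the helper.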
With this we should also take care of error handling -- right
now criu catches the SIGCHLD from dying child tasks, and
since we plan to create them as children of the criu parent (the
library caller) we will not be able to catch them. To handle that we
SEIZE the root task in advance, thus causing all SIGCHLD-s to go to
criu, not to its parent.
Having this done we no longer need the SUBREAPER trick in the
library call -- tasks get restored right as callers kids :)
Some thoughts for future -- using this trick we can finally make
"natural" restoration of shell jobs. I.e. -- make criu restore
some subtree right under bash, w/o leaving itself as intermediate
task and w/o re-parenting the subtree to init after restore.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
The implementation is pretty straightforward. When dumping per-thread
misc data with parasite, collect the value, then write it into thread_core_info.
On restore wait for creds restore and put the value back (some creds
changes drop it to zero).
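
For illustration only, and assuming the value in question is the
parent-death signal (pdeath_sig) discussed elsewhere in this log, reading
and re-applying it for the calling thread can be sketched with prctl().
The helper names are hypothetical; criu itself collects the value from
parasite code.

```c
#include <signal.h>
#include <sys/prctl.h>

/* Read the calling thread's parent-death signal (0 means "none"). */
static int get_pdeath_sig(void)
{
	int sig = 0;

	if (prctl(PR_GET_PDEATHSIG, &sig) < 0)
		return -1;
	return sig;
}

/*
 * Put the value back. On restore this has to happen after the creds are
 * set up, because some credential changes reset it to zero.
 */
static int set_pdeath_sig(int sig)
{
	return prctl(PR_SET_PDEATHSIG, (unsigned long)sig);
}
```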
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Perform dumping but with preliminary iterations. Each
time an iteration ends the ->more callback is called.
The callback's return value is
- positive -- one more iteration starts
- zero -- final dump is performed and call exits
- negative -- dump is aborted, the value is returned
back from criu_dump_iters
Inside the callback one may (well, should) call the criu_set_*
functions to alter the details of the next iteration. In
particular, the prev and next images directories should
be changed.
The @pi argument is an opaque value that caller may
use to request pre-dump statistics (not yet implemented).
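
The return-value contract above can be modelled with a small loop. This
is a sketch of the contract only, not the libcriu implementation: the
names dump_iters, more_cb_t and iter_stats are made up, and the actual
pre-dump and dump work is replaced by counters.

```c
#include <stddef.h>

/* Hypothetical callback type; @pi is the opaque per-iteration value. */
typedef int (*more_cb_t)(void *pi);

struct iter_stats {
	int pre_dumps;   /* preliminary iterations performed */
	int final_dumps; /* 1 once the final dump was done */
};

static int dump_iters(more_cb_t more, struct iter_stats *st)
{
	for (;;) {
		int ret;

		st->pre_dumps++;  /* a preliminary iteration would run here */
		ret = more(NULL); /* @pi is unused in this model */
		if (ret < 0)
			return ret; /* abort, hand the value back */
		if (ret == 0)
			break;      /* go on to the final dump */
		/* positive: one more iteration */
	}
	st->final_dumps++;
	return 0;
}

/* Sample callbacks: ask for two more iterations, or abort at once. */
static int remaining = 2;

static int two_more(void *pi)
{
	(void)pi;
	return remaining-- > 0 ? 1 : 0;
}

static int abort_now(void *pi)
{
	(void)pi;
	return -5;
}
```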
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After a bit more thinking I found a way to fetch arguments
from notifications -- pass opaque value into callback and
provide a set of calls for exploring one.
With this we can
a) provide more data if service supplies additional fields
in the future
b) not check the action name to decide whether or not the
requested argument is available
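
A rough sketch of the opaque-value pattern, with hypothetical names
(notify_arg, notify_pid, on_notify) and an assumed "pid" field: the
callback never looks inside the value, it only uses accessors, and an
accessor reports 0 when the field was not supplied.

```c
#include <stddef.h>

/* In a real library this layout would be private to the library. */
struct notify_arg {
	const char *script; /* action name */
	int has_pid;        /* set only when the service sent a pid */
	int pid;
};

typedef const struct notify_arg *notify_arg_t;

/* Accessor: returns 0 when the field is absent, whatever the action. */
static int notify_pid(notify_arg_t na)
{
	return na->has_pid ? na->pid : 0;
}

static int last_pid = -1;

/* Sample callback: no need to check the action name first. */
static int on_notify(const char *action, notify_arg_t na)
{
	(void)action;
	last_pid = notify_pid(na);
	return 0;
}
```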
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is achieved by supplying the callback. Every time a notification
arrives the callback is called. Return value of 0 means continue,
any other value aborts the request and the value is reported back
to the caller (from criu_dump/criu_restore calls).
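
As an illustration of that rule (not the library code), a dispatch loop
that stops on the first non-zero return could look like this. The names
deliver and notify_cb_t are hypothetical, and the action strings are just
sample input.

```c
#include <string.h>

typedef int (*notify_cb_t)(const char *action);

/* Call cb for each notification; stop on the first non-zero result. */
static int deliver(const char *const *actions, int n, notify_cb_t cb)
{
	int i;

	for (i = 0; i < n; i++) {
		int ret = cb(actions[i]);

		if (ret)
			return ret; /* reported back to the caller */
	}
	return 0;
}

static int calls;

/* Sample callback: abort the request when "network-lock" arrives. */
static int stop_on_network_lock(const char *action)
{
	calls++;
	return strcmp(action, "network-lock") == 0 ? -7 : 0;
}
```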
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's derived from test.c, but is more self-contained
and explicitly checks for both C and R results.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Which is at the same time the demonstration of how to do the trick.
v2:
* remove stupid sleep 1 synchronization
* run internal version of child, not the external script
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
It fully uses the swrk action of criu. The problem the caller may
have is that the restored tasks die _before_ libcriu's call returns.
v2:
* rename _sub to _child
* unblock SIGCHLD before execl-ing criu
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To help restore tasks from images as kids of the caller, we can
use the following trick.
1. Caller sets himself as child reaper with PR_SET_CHILD_SUBREAPER prctl
2. Caller makes sure criu binary is suid-ed and owned by root
3. Caller forks and calls execv() on criu asking it to restore
4. Criu finishes restore and exits. All its kids get reparented to the
criu's parent, i.e. -- to the library caller.
5. Caller stops being subreaper
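
Steps 1, 3, 4 and 5 can be sketched as below. This is only a model of the
reparenting: the intermediate process stands in for criu (no execv(), no
suid binary), and the function name adopt_grandchild is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent pid the "restored" task ends up with, or -1. */
static pid_t adopt_grandchild(void)
{
	int p[2];
	pid_t mid, seen = -1;

	if (pipe(p))
		return -1;
	/* 1. become a child subreaper */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1UL) < 0)
		return -1;

	/* 3. the intermediate process stands in for criu */
	mid = fork();
	if (mid == 0) {
		pid_t self = getpid();

		close(p[0]);
		if (fork() == 0) {
			pid_t pp;

			/* "restored" task: wait until it gets reparented */
			while (getppid() == self)
				usleep(1000);
			pp = getppid();
			_exit(write(p[1], &pp, sizeof(pp)) !=
			      (ssize_t)sizeof(pp));
		}
		_exit(0); /* 4. "criu" exits, its kid gets reparented */
	}
	close(p[1]);
	if (mid > 0) {
		waitpid(mid, NULL, 0);
		if (read(p[0], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
			seen = -1;
		while (wait(NULL) > 0)
			; /* reap the adopted task */
	}
	close(p[0]);

	/* 5. stop being a subreaper */
	prctl(PR_SET_CHILD_SUBREAPER, 0UL);
	return seen;
}
```

When it works, the value returned is the caller's own pid.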
In order to make the execv() and arguments passing simpler I propose
to execv() the service worker function, that accepts options via socket.
This is good for two reasons.
1. We don't have to construct CLI options in libcriu
2. We reuse other service's facilities, such as security checks,
ability to dump, pre-dump and other stuff
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Building criu with "make criu" on a clean tree was not working, failing on:
make[1]: *** No rule to make target `arch/x86/vdso-pie.o'. Stop.
make: *** [arch/x86/vdso-pie.o] Error 2
git bisect traced the regression to commit c473461d24 (vdso: Make it arch
specific) which apparently dropped the rule to build $(ARCH_DIR)/vdso-pie.o
using the pie rule. Restore the dependency for "make criu" to work again from
a clean tree.
Tested:
$ git clean -fdx
$ make criu
Fixes: c473461d24
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we build with something like:
make LDFLAGS="-Wl,-Bsymbolic-functions"
We'll get an error because the LDFLAGS are being passed to LD when they
should be passed to CC.
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Robust lists may be disabled, for example if the "futex_cmpxchg_enabled"
variable in the kernel is unset.
Detect that case by checking that both "get_robust_list" and "set_robust_list"
syscalls return ENOSYS and do not make criu dump fail in that case, but simply
assume an empty list, which is consistent with the syscalls not being
available.
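
A minimal sketch of such a check for the calling task, assuming raw
syscall() access is acceptable. The function name robust_lists_disabled
is hypothetical. The (NULL, 0) arguments to set_robust_list are chosen so
that an enabled kernel rejects them with EINVAL and nothing gets
registered.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if both robust list syscalls report ENOSYS, 0 otherwise. */
static int robust_lists_disabled(void)
{
	void *head = NULL;
	size_t len = 0;

	/* pid 0 means "the calling thread" */
	if (syscall(SYS_get_robust_list, 0, &head, &len) == 0 ||
	    errno != ENOSYS)
		return 0;

	/* get_robust_list is missing, cross-check set_robust_list */
	return syscall(SYS_set_robust_list, NULL, 0) < 0 && errno == ENOSYS;
}
```

A dumper could treat a result of 1 as "empty robust list" instead of
failing.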
Tested: Successfully ran the zdtm test suite on a kernel where the
"get_robust_list" and "set_robust_list" syscalls are disabled.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Skip the string "name=" when recreating cgroups directories in cgyard.
For example, systemd's entries in cgroup.img are:
name: "name=systemd"
path: "/user/1000.user/4.session"
When creating the systemd subdir, name= should not be part of the name.
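
The prefix handling can be sketched like this (the helper name
cg_dir_name is hypothetical, it is not the function used in criu):

```c
#include <string.h>

/* Return the directory name to use for a controller entry. */
static const char *cg_dir_name(const char *controller)
{
	static const char prefix[] = "name=";
	size_t n = sizeof(prefix) - 1;

	/* named hierarchies are listed as "name=<dir>", strip the prefix */
	if (strncmp(controller, prefix, n) == 0)
		return controller + n;
	return controller;
}
```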
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The newinstance option isn't shown in mountinfo. Currently it is
detected in devpts_dump. It is added only for root mounts and it
isn't added for bind-mounts. So mounts_equal(a, b, true) returns false
for such mounts and criu doesn't understand that they should be
bind-mounted.
Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On PI we've noticed that CLOCK_BOOTTIME might not be defined
in system headers, so ship our own definition.
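
A sketch of what such a fallback can look like. The value 7 is the one
the Linux UAPI headers use for CLOCK_BOOTTIME; the helper name
boottime_seconds is hypothetical.

```c
#define _GNU_SOURCE
#include <time.h>

/* Fallback for headers that do not declare the clock id. */
#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7
#endif

/* Store the seconds since boot in *sec; returns 0 on success. */
static int boottime_seconds(long *sec)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_BOOTTIME, &ts) < 0)
		return -1;
	*sec = (long)ts.tv_sec;
	return 0;
}
```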
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To be able to run specific tests depending on
architecture we're executing on.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It parses the vDSO in memory (just like CRIU does) and
then uses direct calls to vDSO entries instead of the
.plt/.got bundle. The reason for that -- I must
be sure we're able to make the calls without relying
on libc in any way.
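
Only as a sketch of the first step of such parsing: locate the vDSO image
in memory and find its PT_DYNAMIC program header. It assumes getauxval()
with AT_SYSINFO_EHDR is available, which is a choice made for this
example and may differ from how the test finds the vDSO. Resolving the
symbols and calling them is left out. The function name is hypothetical.

```c
#include <elf.h>
#include <link.h>     /* ElfW() */
#include <string.h>
#include <sys/auxv.h>

/* Returns the vDSO's PT_DYNAMIC program header, or NULL if not found. */
static const ElfW(Phdr) *vdso_dynamic_phdr(void)
{
	unsigned long base = getauxval(AT_SYSINFO_EHDR);
	const ElfW(Ehdr) *eh = (const void *)base;
	const ElfW(Phdr) *ph;
	int i;

	/* no vDSO mapped, or not an ELF image */
	if (!base || memcmp(eh->e_ident, ELFMAG, SELFMAG))
		return NULL;

	ph = (const void *)(base + eh->e_phoff);
	for (i = 0; i < eh->e_phnum; i++)
		if (ph[i].p_type == PT_DYNAMIC)
			return &ph[i];
	return NULL;
}
```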
Note the test is x86-64 specific so I don't turn it
on in the test suite by default.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When opening a reg file on restore -- check that the size of the file we
opened matches the one we saw on dump. This is not bullet-proof protection,
but is helpful to protect against FS updates between dump/restore.
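
A minimal sketch of such a check, with a hypothetical helper name and the
dumped size passed in by the caller:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 if the opened file still has the size seen on dump. */
static int check_reg_file_size(int fd, off_t dumped_size)
{
	struct stat st;

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return -1;
	}
	if (st.st_size != dumped_size) {
		fprintf(stderr, "File size mismatch: %lld != %lld\n",
			(long long)st.st_size, (long long)dumped_size);
		return -1;
	}
	return 0;
}
```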
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To test the CLOCK_BOOTTIME feature recently implemented in the OpenVZ kernel.
The vanilla kernel and CRIU pass it.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise cgroups sub-mounts may propagate to other namespaces
and the directory would become unremovable.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The new 3.16 kernel will have the old vDSO zone split into two vmas:
one for the vdso code itself and a second one, named vvar, for the data
referenced from the vdso code.
Because I can't do the 'dump' and 'restore' parts of the code separately
(otherwise tests would fail) the commit is a pretty big one and hard to
read, so here is a detailed explanation of what's going on.
1) When we start dumping we detect the vvar zone by reading /proc/pid/smaps
and looking for the "[vvar]" token. Note the vvar zone is mapped
by the kernel with PF/IO flags so we should not fail here.
Also it's assumed that, at least for now, the kernel won't change
much and the [vvar] zone always follows the [vdso] zone, otherwise
criu will print an error.
2) In previous commits we disabled dumping the vvar area contents, so
the restorer code never tries to read vvar data, but we still need
to map the vvar zone, thus the vma entry remains in the image.
3) As with the previous vdso format we might have 2 cases
a) Dump and restore happen on the same kernel
b) Dump and restore are done on different kernels
To detect which case we have, we parse the vdso data from the image
and find the symbol offsets, then compare their values with the runtime
symbols provided to us by the kernel. If they match and (!!!) the
size of the vvar zone is the same -- we simply remap both zones
from the runtime kernel into the positions the dumpee had at checkpoint
time. This is the so-called "inplace" remap (a).
If this happens the vdso_proxify() routine drops VMA_AREA_REGULAR
from the vvar area provided by the caller code and the restorer won't try
to handle this vma. It looks somewhat strange and probably should
be reworked, but for now I left it as is to minimize the patch.
In case of (b) we need to generate a proxy. We do that in the same
way as before, but also include the vvar zone in the proxy and save
the vvar proxy address inside the vdso mark injected into the vdso area. Thus
on a subsequent checkpoint we can detect the proxy vvar zone and rip
it off the list of vmas to handle.
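
To illustrate step 1 only: a small sketch that looks up the "[vdso]" and
"[vvar]" tokens in maps-style lines (smaps uses the same header line for
each vma) and checks that vvar starts right where vdso ends. The names
vma_range, find_named_vma and vvar_follows_vdso are hypothetical. This is
not the criu parser.

```c
#include <stdio.h>
#include <string.h>

struct vma_range {
	unsigned long start, end;
};

/* Find the first "start-end ... name" line in a maps-style stream. */
static int find_named_vma(FILE *maps, const char *name,
			  struct vma_range *out)
{
	char line[512];

	rewind(maps);
	while (fgets(line, sizeof(line), maps)) {
		unsigned long s, e;

		if (!strstr(line, name))
			continue;
		if (sscanf(line, "%lx-%lx", &s, &e) != 2)
			continue;
		out->start = s;
		out->end = e;
		return 0;
	}
	return -1;
}

/* Returns 1 if both zones exist and [vvar] directly follows [vdso]. */
static int vvar_follows_vdso(FILE *maps)
{
	struct vma_range vdso, vvar;

	if (find_named_vma(maps, "[vdso]", &vdso) ||
	    find_named_vma(maps, "[vvar]", &vvar))
		return 0;
	return vvar.start == vdso.end;
}
```

For a live task the stream would come from fopen() on /proc/<pid>/smaps.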
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Because of the new vvar area we need to carry the
address of the vvar proxy inside the mark. Thus
add the members needed and update the routines.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>