Two modes of pre-dump algorithm:
1) splicing memory by parasite
--pre-dump-mode=splice (default)
2) using process_vm_readv syscall
--pre-dump-mode=read
Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
1) Instead of tampering with the nr argument, do_full_int80() returns
the value of the system call. It also avoids copying all registers back
into the syscall_args32 argument after the syscall.
2) Additionally, the registers r12-r15 were added in the list of
clobbers as kernels older than v4.4 do not preserve these.
3) Further, GCC uses a 128-byte red-zone as defined in the x86_64 ABI
optimizing away the correct position of the %rsp register in
leaf-functions. We now avoid tampering with the red-zone, fixing a
SIGSEGV when running mmap_bug_test() in debug mode (DEBUG=1).
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Right now, it is created from the pre-dump hook, but
if the --snap option is set, the test fails:
$ python test/zdtm.py run -t zdtm/static/cgroup_yard -f h --snap --iter 3
...
Running zdtm/static/cgroup_yard.hook(--pre-dump)
Traceback (most recent call last):
File zdtm/static/cgroup_yard.hook, line 14, in <module>
os.mkdir(yard)
OSError: [Errno 17] File exists: 'external_yard'
Cc: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Right now it is cleaned up from a post-restore hook,
but zdtm.py can be executed with the norst option:
$ zdtm.py run -t zdtm/static/cgroup_yard --norst
...
OSError: [Errno 17] File exists: 'external_yard'
Cc: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Prior log initialisation CRIU preserves all (early) log messages in a
buffer. In case of error the content of the content of this buffer
needs to be printed out (flushed).
Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Before the 5.2 kernel, only fpu_state->fpu_state_64.xsave has to be
64-byte aligned. But staring with the 5.2 kernel, the same is required
for pu_state->fpu_state_ia32.xsave.
The behavior was changed in:
c2ff9e9a3d9d ("x86/fpu: Merge the two code paths in __fpu__restore_sig()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Instead of creating cgroup yard in CRIU, now we can create it externally
and pass it to CRIU. Useful if somebody doesn't want to grant
CAP_SYS_ADMIN to CRIU.
Signed-off-by: Michał Cłapiński <mclapinski@google.com>
`pushq` sign-extends the value. Which is a bummer as the label's address
may be higher that 2Gb, which means that the sign-bit will be set.
As it long-jumps with ia32 selector, %r11 can be scratched.
Use %r11 register as a temporary to push the 32-bit address.
Complements: a9a760278c1a ("arch/x86: push correct eip on the stack
before lretq")
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Right now we use pushq, but it pushes sign-extended value, so if the
parasite code is placed higher that 2Gb, we will see something like
this:
0xf7efd5b0: pushq $0x23
0xf7efd5b2: pushq $0xfffffffff7efd5b9
=> 0xf7efd5b7: lretq
Actually we want to push 0xf7efd5b9 instead of 0xfffffffff7efd5b9.
Fixes: #398
Cc: Dmitry Safonov <dima@arista.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The function clear_ghost_files() has been removed in commit
b11eeea "restore: auto-unlink for ghost files (v2)".
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Here we have some bugfixes, huuuge *.py patch for coding style
and nice set of new features like 32bit for ARM, TLS for page
server and new mode for CGroups.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reduce code duplication by taking setup_swrk() function into a separate
module that can be reused in multiple places.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
When Python 2 is not installed we assume that /usr/bin/python refers to
version 3 of Python and the executable /usr/bin/python2 does not exist.
This commit also resolves a compatibility issue with Popen where in
Py2 file descriptors will be inherited by the child process and in
Py3 they will be closed by default.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
As discussed on the mailing list, current .py files formatting does not
conform to the world standard, so we should better reformat it. For this
the yapf tool is used. The command I used was
yapf -i $(find -name *.py)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
After adding the test for fake inotify events cleanup on restore, we've
detected that we also have the same problem on dump/predump, criu
touches files that are watched and generates fake events:
[root@snorch criu]# test/zdtm.py run -t zdtm/static/inotify04 --norst -k always
=== Run 1/1 ================ zdtm/static/inotify04
======================== Run zdtm/static/inotify04 in h ========================
Start test
./inotify04 --pidfile=inotify04.pid --outfile=inotify04.out --dirname=inotify04.test
Run criu dump
=[log]=> dump/zdtm/static/inotify04/36/1/dump.log
------------------------ grep Error ------------------------
(00.004050) fsnotify: openable (inode match) as home/snorch/devel/criu/test/zdtm/static/inotify04.test/inotify-testfile
(00.004052) fsnotify: Dumping /home/snorch/devel/criu/test/zdtm/static/inotify04.test/inotify-testfile as path for handle
(00.004055) fsnotify: id 0x000007 flags 0x000800
(00.004071) 36 fdinfo 5: pos: 0 flags: 4000/0
(00.004080) Warn (criu/fsnotify.c:336): fsnotify: The 0x000008 inotify events will be dropped
------------------------ ERROR OVER ------------------------
Send the 15 signal to 36
Wait for zdtm/static/inotify04(36) to die for 0.100000
############### Test zdtm/static/inotify04 FAIL at result check ################
Test output: ================================
18:20:10.558: 36: Event 0x20
18:20:10.558: 36: Event 0x10
18:20:10.558: 36: Event 0x20
18:20:10.558: 36: Event 0x10
18:20:10.558: 36: Event 0x20
18:20:10.558: 36: Event 0x10
18:20:10.558: 36: Event 0x20
18:20:10.558: 36: Event 0x10
18:20:10.558: 36: Read 8 events
18:20:10.558: 36: FAIL: inotify04.c:105: Found 8 unexpected inotify events (errno = 11 (Resource temporarily unavailable))
<<< ================================
##################################### FAIL #####################################
To suppress fails in jenkins make the inotify04 test 'reqrst'. Still
need to cleanup (or do not create) these events on dump/predump.
This adds the same tests currently running for docker also for podman.
In addition this also tests podman --export/--import (migration)
support.
Signed-off-by: Adrian Reber <areber@redhat.com>
If vvar is absent vdso_before_vvar is initialized by "false".
Which means that the check that supposed to track vdso/vvar pair went
into wrong brackets. In result it broke CRIU on kernels that don't have
vvar mapping.
Simpilfy the code by moving the check for VVAR_BAD_SIZE outside of
conditional for vdso_before_vvar.
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Fixes: 0918c7667647 ("vdso/restorer: Always track vdso/vvar positions in
vdso_maps_rt")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This is needed to workaround the problem with "ip route save":
(00.113153) Running ip route save
Error: ipv4: FIB table does not exist.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Just create two inotify watches on a testfile, and do nothing except
c/r, it is expected that there is no events in queue after these.
before "inotify: cleanup auxiliary events from queue":
[root@snorch criu]# ./test/zdtm.py run -t zdtm/static/inotify04
=== Run 1/1 ================ zdtm/static/inotify04
======================== Run zdtm/static/inotify04 in h ========================
DEP inotify04.d
CC inotify04.o
LINK inotify04
Start test
./inotify04 --pidfile=inotify04.pid --outfile=inotify04.out --dirname=inotify04.test
Run criu dump
Run criu restore
Send the 15 signal to 60
Wait for zdtm/static/inotify04(60) to die for 0.100000
=============== Test zdtm/static/inotify04 FAIL at result check ================
Test output: ================================
18:37:14.279: 60: Event 0x10
18:37:14.280: 60: Event 0x20
18:37:14.280: 60: Event 0x10
18:37:14.280: 60: Read 3 events
18:37:14.280: 60: FAIL: inotify04.c:105: Found 3 unexpected inotify events (errno = 11 (Resource temporarily unavailable))
<<< ================================
v2: make two inotifies on the same file
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
zdtm: inotify04 add another inotify on the same file
I've mentioned the problem that after c/r each inotify receives one or
more unexpected events.
This happens because our algorithm mixes setting up an inotify watch on
the file with opening and closing it.
We mix inotify creation and watched file open/close because we need to
create the inotify watch on the file from another mntns (generally). And
we do a trick opening the file so that it can be referenced in current
mntns by /proc/<pid>/fd/<id> path.
Moreover if we have several inotifies on the same file, than queue gets
even more events than just one which happens in a simple case.
note: For now we don't have a way to c/r events in queue but we need to
at least leave the queue clean from events generated by our own.
These, still, looks harder to rewrite wd creation without this proc-fd
trick than to remove unexpected events from queues.
So just cleanup these events for each fdt-restorer process, for each of
its inotify fds _after_ restore stage (at CR_STATE_RESTORE_SIGCHLD).
These is a closest place where for an _alive_ process we know that all
prepare_fds() are done by all processes. These means we need to do the
cleanup in PIE code, so need to add sys_ppoll definitions for PIE and
divide process in two phases: first collect and transfer fds, second do
real cleanup.
note: We still do prepare_fds() for zombies. But zombies have no fds in
/proc/pid/fd so we will collect no in collect_fds() and therefore we
have no in prepare_fds(), thus there is no need to cleanup inotifies for
zombies.
v2: adopt to multiple unexpected events
v3: do not cleanup from fdt-receivers, done from fdt-restorer
v4: do without additional fds restore stage
v5: replace sys_poll with sys_ppoll and fix minor nits
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
use ppoll always and remove poll
Omit calling raw syscalls and use vdso for the purpose of logging.
That will eliminate as much as one-syscall-per-PIE-message.
Getting time without switching to kernel will speed up C/R,
keeping logs as informative as they were.
Fixes: #346
I haven't enabled vdso timings for ia32 applications as it needs more
changes and complexity.. Maybe later.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
We need to differ compatible (ia32) vdso maps from x86_64.
That dictates ABI on vdso code.
According to that, the decision to (not) use gettimeofday() from vdso in
64-bit restorer.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
For simplicity, make them always valid in restorer.
rt->vdso_start will be used to calculate gettimeofday() address.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
vdso will be used in restorer for timings in logs - try to keep it
during restore process.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Provide a way to set gettimeofday() function for an infected task.
CRIU's parasite & restorer are very voluble as more logs are better
than lesser in terms of bug investigations.
In all modern kernels there is a way to get time without entering
kernel: vdso. So, add a way to reduce the cost of logging without making
it less valuable.
[I'm not particularly fond of std_log_set_gettimeofday() name, so
if someone can come with a better naming - I'm up for a change]
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Doesn't change uapi, but makes it a bit more friendly and documented
which loglevel means what for foreign user.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The following error is falsely reported by flake8:
lib/py/images/pb2dict.py:266:24: F821 undefined name 'basestring'
This error occurs because `basestring` is not available in Python 3,
however the if condition on the line above ensures that this error
will not occur at run time.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
In the Fedora tests we install python3-pip only to install flake8.
This is not necessary as there is a Fedora package for flake8.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
More and more python2 packages are being removed from future Fedora
releases. This removes python2 packages explicitly listed in CRIU's
Dockerfiles, which all are not required for the current level of
testing.
Signed-off-by: Adrian Reber <areber@redhat.com>