mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 06:15:24 +00:00
9962 Commits

Author SHA1 Message Date
Dmitry Safonov
c8f16bfacb compel/infect: Warn if close() failed on memfd
As a preparation for __must_check on compel_syscall(), check it on
close() too - maybe not as useful as with other syscalls, but why not.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
a93117ede1 lib/ptrace: Be more elaborate about failures
Also, don't use the magic -2 => return errno on failure.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
ef277068de lib/ptrace: Allow PTRACE_PEEKDATA with errno != 0
From man ptrace:
> On error, all requests return -1, and errno is set appropriately.
> Since the value returned by a successful PTRACE_PEEK* request may be
> -1, the caller must clear errno before the call, and then check
> it afterward to determine whether or not an error occurred.

FWIW: if ptrace_peek_area() is called with (errno != 0) it may
false-fail if the data is (-1).
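
The rule quoted from the man page can be sketched as follows. This is a minimal illustration, not CRIU's ptrace_peek_area(): the helper name peek_word() and its return convention are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not CRIU's ptrace_peek_area()).
 * Reads one word at addr from the traced process pid.
 * Returns 0 and stores the word in *out on success,
 * -1 on failure (errno then holds the ptrace error).
 */
static int peek_word(pid_t pid, void *addr, long *out)
{
	long word;

	errno = 0;	/* drop any stale error left by earlier calls */
	word = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
	if (word == -1 && errno != 0)
		return -1;	/* a real failure, not data that equals -1 */

	*out = word;
	return 0;
}
```

Without the reset, a word whose value is -1 would be reported as an error whenever errno happened to be non-zero on entry.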

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
ea018e9a9c travis: remove group from .travis.yml
Tests are successful even after removing 'group:' from .travis.yml.
Apparently it is not necessary.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
fe668075ad travis: switch ppc64le and s390x to real hardware
Now that Travis also supports ppc64le and s390x we can remove all qemu
based docker emulation from our test setup. This now runs ppc64le and
s390x tests on real hardware (LXD containers).

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
eab8cf0775 travis: switch all arm related tests to real hardware
This switches all arm related tests (32bit and 64bit) to the aarch64
systems Travis provides. For arm32 we are running in a armv7hf container
on aarch64 with 'setarch linux32'.

The main changes are that docker on Travis aarch64 cannot use
'--privileged' as Travis is using unprivileged LXD containers to setup
the testing environment.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
075f1beaf7 Makefile hack for travis aarch64/armv8l
For CRIU's compile only tests for armv7hf on Travis we are using
'setarch linux32' which returns armv8l on Travis aarch64.

This adds a path in the Makefile to treat armv8l just as armv7hf during
compile. This enables us to run armv7hf compile tests on Travis aarch64
hardware. Much faster. Maybe not entirely correct, but probably good
enough for compile testing in an armv7hf container.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
6be414bb2b travis: Do not run privileged containers in LXD
Travis uses unprivileged containers for aarch64 in LXD. Docker with
'--privileged' fails in such situation. This changes the travis setup
to only start docker with '--privileged' if running on x86_64.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
62953d4334 travis: fix copy paste error from previous commit
In my previous commit I copied a line with a return into the main script
body. bash can only return from functions. This changes return to exit.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Nidhi Gupta
2b4e653361 Run java functional tests on travis
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
f3cca97d80 mount: make mnt_resort_siblings nonrecursive and reuse friendly
Add mnt_subtree_next DFS-next search to remove recursion.

v5: add this patch, remove recursion from sorting helpers
v6: rip out beautiful yet unused step-part of dfs-next algorithm
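
A DFS-next helper of this kind can be sketched as below. The struct node layout and the names subtree_next() and add_child() are hypothetical and only illustrate the idea; CRIU's mnt_subtree_next works on its own mount structures.

```c
#include <stddef.h>

/* Hypothetical minimal tree node; CRIU's mount structures are richer. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	int id;
};

/*
 * Return the node that follows cur in a pre-order walk of the
 * subtree rooted at root, or NULL once the walk is complete.
 */
static struct node *subtree_next(struct node *cur, struct node *root)
{
	if (cur->first_child)
		return cur->first_child;	/* descend first */

	while (cur != root) {
		if (cur->next_sibling)
			return cur->next_sibling;	/* then move sideways */
		cur = cur->parent;	/* otherwise climb and retry */
	}
	return NULL;
}

/* Helper for building small trees: append child as the last child. */
static void add_child(struct node *parent, struct node *child)
{
	struct node **link = &parent->first_child;

	while (*link)
		link = &(*link)->next_sibling;
	*link = child;
	child->parent = parent;
}
```

A caller walks a subtree with a plain loop, `for (n = root; n; n = subtree_next(n, root))`, so the depth of the tree no longer affects stack usage.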

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
35adc08598 mount: rework mount tree build step on restore
Build each mntns mount tree alone just after reading mounts for it from
image. This additional step before merging everything into a single mount
tree allows us to have pointers to each mntns root mount at hand, also
it allows us to remove extra complication from mnt_build_tree.

Teach collect_mnt_from_image to return a tail pointer, so we can merge
lists together later after building each tree.

Add separate merge_mount_trees helper to create joint mount tree for all
mntns'es and simplify mnt_build_ids_tree.

I don't see any place where we use mntinfo_tree on restore, so save the
real root of mntns mounts tree in it, instead of root_yard_mp, will need
it in next patches for checking restore of these trees.

v2: prepend children to the root_yard in merge_mount_trees so that the
order in merged tree persists
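
Why a tail pointer helps can be seen in a small sketch. The struct item type and the functions collect() and merge() are hypothetical stand-ins, not the signatures used in CRIU: with the tail at hand, joining two lists takes constant time and needs no walk.

```c
#include <stddef.h>

/* Hypothetical list element; stands in for a per-mount record. */
struct item {
	struct item *next;
	int id;
};

/*
 * Build a singly linked list from ids[0..n-1], preserving order.
 * Returns the head and stores the last element in *tail
 * (NULL if n == 0).
 */
static struct item *collect(struct item *storage, const int *ids,
			    size_t n, struct item **tail)
{
	struct item *head = NULL;
	size_t i;

	*tail = NULL;
	for (i = 0; i < n; i++) {
		storage[i].id = ids[i];
		storage[i].next = NULL;
		if (*tail)
			(*tail)->next = &storage[i];
		else
			head = &storage[i];
		*tail = &storage[i];
	}
	return head;
}

/* Prepend the list [head, tail] to *all without walking either list. */
static void merge(struct item **all, struct item *head, struct item *tail)
{
	if (!head)
		return;
	tail->next = *all;
	*all = head;
}
```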

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
7be7260261 ns/restore/image: do not read namespace images for non-namespaced case
Images for mount and net namespaces are empty if ns does not belong to
us, thus we don't need to collect on restore.

By adding these checks we will eliminate suspicious messages in logs
about lack of images:

./test/zdtm.py run -k always -f h -t zdtm/static/env00

env00/54/2/restore.log:(00.000332) No mountpoints-5.img image
env00/54/2/restore.log:(00.000342) No netns-2.img image

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
71dff54aa4 ns: make rst_new_ns_id static
It's never used outside of namespaces.c

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
d804f70a68 mount: remove useless check in populate_mnt_ns
The path:

restore_root_task
 prepare_namespace_before_tasks
  mntns_maybe_create_roots

is always called before the path below:

restore_root_task
 fork_with_pid
  restore_task_with_children
   prepare_namespace
    prepare_mnt_ns
     populate_mnt_ns

So (!!mnt_roots) == (root_ns_mask & CLONE_NEWNS) in populate_mnt_ns, but
in prepare_mnt_ns we've already checked that it is true, so there is no
need for this check - remove it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
9325339e64 travis: Disallow failures on ia32
It seems pretty stable and hasn't added many false positives during the
last months, while it can reveal some issues in compatible C/R code.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:04 -08:00
Nidhi Gupta
389bcfef3e test/java: Add FileRead Tests
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:04 -08:00
Vitaly Ostrosablin
c4006c0034 test/static:conntracks: Support nftables
Update test to support both iptables and nft to create conntrack rules.

Signed-off-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
a7c625938e travis: start to use aarch64 hardware
With the newly introduced aarch64 at Travis it is possible for the CRIU
test-cases to switch to aarch64.

Travis uses unprivileged LXD containers on aarch64 which blocks many of
the kernel interfaces CRIU needs. So for now this only tests building
CRIU natively on aarch64 instead of using the Docker+QEMU combination.

Docker-based tests do not work on aarch64, as there currently
seems to be a problem with Docker on aarch64. Maybe because of the
nesting of Docker in LXD.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Sergey Bronnikov
3861b334b2 Fix broken web-links
2020-02-04 12:39:04 -08:00
Nicolas Viennot
1a28dee52b Action scripts should be invoked with normal signal behavior
Signal masks propagate through execve, so we need to clear them before
invoking the action scripts, as they may want to handle SIGCHLD or SIGSEGV.

Signed-off-by: Nicolas Viennot <nicolas.viennot@twosigma.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
19a24df53c early-log: Print warnings only if the buffer is full
I don't see many issues with early-log, so we probably don't
need the warning when it was used. Note that after
commit 74731d9 ("zdtm: make grep_errors also grep warnings")
also warnings are grepped by zdtm.py (and I believe that was
an improvement) which prints some bothering lines:

> =[log]=> dump/zdtm/static/inotify00/38/1/dump.log
> ------------------------ grep Error ------------------------
> (00.000000) Will allow link remaps on FS
> (00.000034) Warn  (criu/log.c:203): The early log isn't empty
> ------------------------ ERROR OVER ------------------------

Instead of decreasing loglevel of the message, improve it by
reporting a real issue.

Cc: Adrian Reber <adrian@lisas.de>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:04 -08:00
Ashutosh Mehra
00ce121fd5 Add criu to PATH env variable in libcriu tests
PATH is pointing to incorrect location for `criu` executable
causing libcriu tests to fail when running in travis.
Also added statements to display log file contents on failure
to help in debugging.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
2020-02-04 12:39:04 -08:00
Ashutosh Mehra
321f826621 Enable libcriu testing in travis jobs
Updated scripts/travis/travis-tests to run libcriu test.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
2020-02-04 12:39:04 -08:00
Ashutosh Mehra
f8125b8bef Couple of fixes to build and run libcriu tests
libcriu tests are currently broken. This patch fixes a couple of
issues to allow building and running the libcriu tests.

1. lib/c/criu.h got updated to include version.h which is present
at "criu/include", but the command to compile libcriu tests is not
specifying "criu/include" in the path to be searched for header
files. This resulted in compilation error.
This can be fixed by adding "-I ../../../../../criu/criu/include"
however it causes more problems as "criu/include/fcntl.h" would now
hide system defined fcntl.h
Solution is to use "-iquote ../../../../../criu/criu/include"
which applies only to the quote form of include directive.

2. Secondly, libcriu.so major version got updated to 2 but
libcriu/run.sh still assumes version 1. Instead of just updating the
version in libcriu/run.sh to 2, this patch updates the libcriu/Makefile
to use "CRIU_SO_VERSION_MAJOR" so that future changes to major version
of libcriu won't cause same problem again.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
477c3a4b0b service: Use space on stack for msg buffer
RPC messages are fairly small, and using space on the stack
might be a better option. This change follows the pattern used with
do_pb_read_one() and pb_write_one().

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
e56401ed3c image-desc: Remove CR_FD_FILE_LOCKS_PID
The support for per-pid images with locks has been dropped with
commit d040219 ("locks: Drop support for per-pid images with locks")
and CR_FD_FILE_LOCKS_PID is not used.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
f65b17e976 cgroup: fix cg_yard leak on error path in prepare_cgroup_sfd
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
5a92f100b8 page-pipe: Resize up to PIPE_MAX_SIZE
When performing pre-dump we continuously increase the page-pipe size to
fit the maximum number of memory pages in the pipe's buffer. However, we never
actually set the pipe's buffer size to max. By doing so, we can reduce
the number of pipe-s necessary for pre-dump and improve the performance
as shown in the example below.

For example, let's consider the following process:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

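	int main(void)
	{
		int i = 0;
		/* Hold a 1 GiB allocation for the lifetime of the process. */
		void *cache = calloc(1, 1024 * 1024 * 1024);

		(void)cache;	/* intentionally never used or freed */
		while (1) {
			printf("%d\n", i++);
			sleep(1);
		}
	}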

stats-dump before this change:

	frozen_time: 123538
	memdump_time: 95344
	memwrite_time: 11980078
	pages_scanned: 262721
	pages_written: 262169
	page_pipes: 513
	page_pipe_bufs: 519

stats-dump after this change:

	frozen_time: 83287
	memdump_time: 54587
	memwrite_time: 12547466
	pages_scanned: 262721
	pages_written: 262169
	page_pipes: 257
	page_pipe_bufs: 263
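
The resize step can be sketched with the Linux fcntl() pipe-size commands. The helper name grow_pipe() is hypothetical and the sketch does not reproduce CRIU's page-pipe code; it only shows how a pipe's capacity is raised and read back.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Linux-specific commands; defined here only if the headers hide them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Try to grow the pipe behind fd to at least want bytes.
 * Returns the resulting capacity in bytes, or -1 on failure.
 * The kernel may round the size up, and it refuses requests above
 * /proc/sys/fs/pipe-max-size from unprivileged callers.
 */
static int grow_pipe(int fd, int want)
{
	if (fcntl(fd, F_SETPIPE_SZ, want) < 0)
		return -1;
	return fcntl(fd, F_GETPIPE_SZ);
}
```

A larger pipe holds more page references per pipe, which is consistent with the drop in page_pipes shown in the stats above.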

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Nicolas Viennot
71c2a9dc73 Guard against empty file lock status
The lock status string may be empty. This can happen when the owner of
the lock is invisible from our PID namespace. This unfortunate behavior
is fixed in kernels v4.19 and up (see commit 1cf8e5de40)

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
3efe44382f image: avoid name conflicts in image files
Conflict register for file "sk-opts.proto": READ is already defined in
file "rpc.proto". Please fix the conflict by adding package name on the
proto file, or use different name for the duplication.  Note: enum
values appear as siblings of the enum type instead of children of it.

https://github.com/checkpoint-restore/criu/issues/815
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
6b264f591f criu: use atomic_add instead of atomic_sub
atomic_sub isn't defined for all platforms.

Reported-by: Mr Jenkins
Cc: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
7c97cc7eb2 lib/c: fix a compile time error
lib/c/criu.c:343:30: error: implicit conversion from enumeration type
'enum criu_pre_dump_mode' to different enumeration type 'CriuPreDumpMode'
(aka 'enum _CriuPreDumpMode') [-Werror,-Wenum-conversion]

                opts->rpc->pre_dump_mode = mode;
                                         ~ ^~~~
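
One way to avoid this class of error is to convert between the two enum types explicitly. The sketch below uses hypothetical stand-in enums and a hypothetical to_rpc_mode() helper; it is not necessarily how the commit resolves the warning.

```c
/* Hypothetical stand-ins for the two enum types named in the error. */
enum api_pre_dump_mode {
	API_PRE_DUMP_SPLICE = 1,
	API_PRE_DUMP_READ = 2,
};

typedef enum {
	RPC_PRE_DUMP_SPLICE = 1,
	RPC_PRE_DUMP_READ = 2,
} rpc_pre_dump_mode;

/*
 * Convert the public enum to the RPC enum explicitly.
 * Returns 0 on success, -1 if mode has no RPC counterpart.
 */
static int to_rpc_mode(enum api_pre_dump_mode mode, rpc_pre_dump_mode *out)
{
	switch (mode) {
	case API_PRE_DUMP_SPLICE:
		*out = RPC_PRE_DUMP_SPLICE;
		return 0;
	case API_PRE_DUMP_READ:
		*out = RPC_PRE_DUMP_READ;
		return 0;
	}
	return -1;	/* value outside the known set */
}
```

An explicit cast would also silence the compiler, but the switch additionally rejects values that only one side knows about.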

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
d305576996 zdtm: handle --pre-dump-mode in the rpc mode
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
befbbd9bba Refactor time accounting macros
refactoring time macros as per read mode
pre-dump design.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
98608b90de read mode pre-dump implementation
Pre-dump using the process_vm_readv syscall.
During frozen state, only iovecs will be
generated and draining of memory happens
after the task is unfrozen. Pre-dumping of
shared memory remains unmodified.
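
The syscall at the heart of this mode can be sketched as below. The helper read_remote() is hypothetical and copies a single range; the real pre-dump code batches many iovecs per call.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Copy len bytes from address remote_addr in process pid into buf.
 * Returns the number of bytes copied, or -1 with errno set.
 * The call goes through syscall() so that the sketch builds even
 * where the libc wrapper process_vm_readv() is not declared.
 */
static ssize_t read_remote(pid_t pid, void *buf, const void *remote_addr,
			   size_t len)
{
	struct iovec local = { .iov_base = buf, .iov_len = len };
	struct iovec remote = { .iov_base = (void *)remote_addr,
				.iov_len = len };

	return syscall(SYS_process_vm_readv, pid, &local, 1UL,
		       &remote, 1UL, 0UL);
}
```

Because the copy needs no cooperation from the target, it can run after the task has been unfrozen, which is what shortens the frozen window.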

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
4c774afc18 Adding cnt_sub for stats manipulation
adding cnt_sub function (complement of cnt_add).
cnt_sub is utilized to decrement stats counter
according to skipped page count during "read"
mode pre-dump.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
29b63e9a72 Skip adding PROT_READ to non-PROT_READ mappings
"read" mode pre-dump may fail even after
adding PROT_READ flag. Adding PROT_READ
works when dumping statically. See added
comment for details.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
e0ea21ad5e Handling iov generation for non-PROT_READ regions
Skip iov-generation for regions not having
PROT_READ, since process_vm_readv syscall
can't process them during "read" pre-dump.
Handle random order of "read" & "splice"
pre-dumps.
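
Skipping unreadable regions while generating iovecs can be sketched as follows. The struct region type and build_iovs() are hypothetical; CRIU derives the same information from its VMA list.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/uio.h>

/* Hypothetical region descriptor with the mapping's PROT_* bits. */
struct region {
	void *start;
	size_t len;
	int prot;
};

/*
 * Fill iov[] with one entry per readable region and skip regions
 * that lack PROT_READ, since process_vm_readv cannot read them.
 * Returns the number of entries written (at most max).
 */
static size_t build_iovs(const struct region *r, size_t n,
			 struct iovec *iov, size_t max)
{
	size_t i, used = 0;

	for (i = 0; i < n && used < max; i++) {
		if (!(r[i].prot & PROT_READ))
			continue;	/* left for the final dump */
		iov[used].iov_base = r[i].start;
		iov[used].iov_len = r[i].len;
		used++;
	}
	return used;
}
```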

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
20d4920a8b Adding --pre-dump-mode option
Two modes of pre-dump algorithm:
    1) splicing memory by parasite
        --pre-dump-mode=splice (default)
    2) using process_vm_readv syscall
        --pre-dump-mode=read
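
Mapping the option argument to a mode can be sketched as below. The enum and the function parse_pre_dump_mode() are hypothetical names for this example; only the two mode strings and the splice default come from the description above.

```c
#include <string.h>

/* Hypothetical names for the two modes. */
enum pre_dump_mode {
	PRE_DUMP_SPLICE,
	PRE_DUMP_READ,
};

/*
 * Map the argument of --pre-dump-mode to an enum value.
 * A NULL argument (option not given) selects the default, splice.
 * Returns 0 on success, -1 for an unknown mode.
 */
static int parse_pre_dump_mode(const char *arg, enum pre_dump_mode *mode)
{
	if (!arg || !strcmp(arg, "splice")) {
		*mode = PRE_DUMP_SPLICE;
		return 0;
	}
	if (!strcmp(arg, "read")) {
		*mode = PRE_DUMP_READ;
		return 0;
	}
	return -1;
}
```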

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:02 -08:00
Dmitry Safonov
576a99f492 restorer/inotify: Don't overflow PIE stack
PATH_MAX == 4096; PATH_MAX*8 == 32k; RESTORE_STACK_SIZE == 32k.

Fixes: a3cdf94869 ("inotify: cleanup auxiliary events from queue")
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Andrei Vagin <avagin@gmail.com>
Co-debugged-with: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Nicolas Viennot
578597299a Cleanup do_full_int80()
1) Instead of tampering with the nr argument, do_full_int80() returns
the value of the system call. It also avoids copying all registers back
into the syscall_args32 argument after the syscall.

2) Additionally, the registers r12-r15 were added in the list of
clobbers as kernels older than v4.4 do not preserve these.

3) Further, GCC uses a 128-byte red-zone as defined in the x86_64 ABI
optimizing away the correct position of the %rsp register in
leaf-functions. We now avoid tampering with the red-zone, fixing a
SIGSEGV when running mmap_bug_test() in debug mode (DEBUG=1).

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
b84f481b55 unix: print inode numbers as unsigned int
Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
3f1c4a17ad pipe: print pipe_id as unsigned to generate an external pipe name
Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Pavel Tikhomirov
b47ef26eac cgroup: fixup nits
1) s/\s*$//
2) fix snprintf out of bound access
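
The commit does not spell out the bug, but a common cause of such out-of-bounds accesses is using the return value of snprintf as an offset without checking for truncation. A minimal sketch of the check, with a hypothetical helper fmt_checked():

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into buf and report truncation instead of letting the
 * caller index past the end. Returns the number of characters
 * written (excluding the NUL), or -1 if the output did not fit
 * or an encoding error occurred.
 */
static int fmt_checked(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	/* vsnprintf returns the length it wanted, which may exceed size */
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```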

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
f44939317f zdtm/cgroup_yard: create a test cgroup yard from the post-start hook
Right now, it is created from the pre-dump hook, but
if the --snap option is set, the test fails:
$ python test/zdtm.py run -t zdtm/static/cgroup_yard -f h --snap --iter 3
...
Running zdtm/static/cgroup_yard.hook(--pre-dump)
Traceback (most recent call last):
  File zdtm/static/cgroup_yard.hook, line 14, in <module>
    os.mkdir(yard)
OSError: [Errno 17] File exists: 'external_yard'

Cc: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
db40ef5be6 test/cgroup_yard: always clean up a test cgroup yard
Right now it is cleaned up from a post-restore hook,
but zdtm.py can be executed with the norst option:
$ zdtm.py run -t zdtm/static/cgroup_yard --norst
...
OSError: [Errno 17] File exists: 'external_yard'

Cc: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Radostin Stoyanov
813bfbeb4f Convert pr_msg() error messages to pr_err()
Print error messages to stderr (instead of stdout).

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:38:23 -08:00
Radostin Stoyanov
a9f974b495 Introduce flush_early_log_to_stderr destructor
Prior to log initialisation CRIU preserves all (early) log messages in a
buffer. In case of error the content of this buffer
needs to be printed out (flushed).
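
The idea can be sketched with a destructor function, a GCC/Clang extension that runs at process exit. The buffer size and all names below are assumptions for illustration, not CRIU's early-log implementation.

```c
#include <stdio.h>
#include <string.h>

#define EARLY_LOG_SIZE 1024	/* hypothetical capacity */

static char early_buf[EARLY_LOG_SIZE];
static size_t early_len;
static int early_flushed;

/* Store a message until the real log is ready; -1 if it does not fit. */
static int early_log_add(const char *msg)
{
	size_t n = strlen(msg);

	if (n > sizeof(early_buf) - early_len)
		return -1;
	memcpy(early_buf + early_len, msg, n);
	early_len += n;
	return 0;
}

/* Write the buffered messages to out once; returns bytes written. */
static size_t early_log_flush(FILE *out)
{
	size_t n = 0;

	if (!early_flushed && early_len)
		n = fwrite(early_buf, 1, early_len, out);
	early_flushed = 1;
	return n;
}

/* Runs automatically at exit; prints whatever was never flushed. */
__attribute__((destructor)) static void flush_early_log_at_exit(void)
{
	early_log_flush(stderr);
}
```

If log initialisation succeeds and flushes the buffer into the real log, the destructor finds nothing left to print.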

Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:37:37 -08:00
Andrei Vagin
8bdc60d50e arch/x86: fpu_state->fpu_state_ia32.xsave has to be 64-byte aligned
Before the 5.2 kernel, only fpu_state->fpu_state_64.xsave had to be
64-byte aligned. But starting with the 5.2 kernel, the same is required
for fpu_state->fpu_state_ia32.xsave.

The behavior was changed in:
c2ff9e9a3d9d ("x86/fpu: Merge the two code paths in __fpu__restore_sig()")
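
Enforcing such an alignment in C can be sketched with an aligned type. The field names follow the commit message, but the layout and all sizes below are made up for the example and do not match CRIU's real structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical xsave area; the attribute aligns every instance to 64. */
struct xsave_area {
	uint8_t data[832];	/* made-up size */
} __attribute__((aligned(64)));

struct fpu_state_64 {
	struct xsave_area xsave;
	uint8_t extra[8];
};

struct fpu_state_ia32 {
	uint8_t fregs_state[112];	/* made-up size, forces padding */
	struct xsave_area xsave;
};

struct fpu_state {
	union {
		struct fpu_state_64 fpu_state_64;
		struct fpu_state_ia32 fpu_state_ia32;
	};
};

/* Fail the build if the ia32 xsave member drifts off a 64-byte offset. */
_Static_assert(offsetof(struct fpu_state_ia32, xsave) % 64 == 0,
	       "ia32 xsave must be 64-byte aligned");

/* Return nonzero if p sits on a 64-byte boundary. */
static int is_aligned64(const void *p)
{
	return ((uintptr_t)p & 63) == 0;
}
```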

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:37:37 -08:00