mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 06:15:24 +00:00
10134 Commits

Author SHA1 Message Date
Nicolas Viennot
7d79a58f4d img-streamer: introduction of criu-image-streamer
This adds the ability to stream images with criu-image-streamer

The workflow is the following:
1) criu-image-streamer is started, and starts listening on a UNIX
   socket.
2) CRIU is started. img_streamer_init() is invoked, which connects to the
   socket. During dump/restore operations, instead of using local disk to
   open an image file, img_streamer_open() is called to provide a UNIX pipe
   that is sent over the UNIX socket.
3) Once the operation is done, img_streamer_finish() is called, and the
   UNIX socket is disconnected.

criu-image-streamer can be found at:
https://github.com/checkpoint-restore/criu-image-streamer

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
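
The fd-passing step in the img-streamer workflow above can be sketched as follows. This is only an illustration of handing one end of a pipe to a peer over a UNIX socket (SCM_RIGHTS); the helper names and the one-byte message are hypothetical and do not reflect the real criu-image-streamer protocol. It assumes Linux and Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`).

```python
import os
import socket


def send_pipe_end(sock, fd):
    """Send one file descriptor over a connected UNIX socket (hypothetical helper)."""
    # At least one byte of ordinary data must accompany the ancillary fd.
    socket.send_fds(sock, [b"p"], [fd])


def recv_pipe_end(sock):
    """Receive one file descriptor from a connected UNIX socket (hypothetical helper)."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def demo():
    # One socket end stands in for the streamer, the other for CRIU.
    streamer, criu = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()

    # The "streamer" keeps the read end and hands the write end to "CRIU".
    send_pipe_end(streamer, write_end)
    os.close(write_end)

    received = recv_pipe_end(criu)
    os.write(received, b"image bytes")
    os.close(received)

    data = os.read(read_end, 100)
    os.close(read_end)
    streamer.close()
    criu.close()
    return data
```

Calling `demo()` returns the bytes written through the received descriptor, which shows that the image data never touches the local disk.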
Nicolas Viennot
51c3f8a908 pipes: loop over splice() when dumping a pipe's data
Instead of erroring, we should loop until we get the desired number of
bytes written, like regular I/O loops.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
0708cbd883 remote: Use tmp file buffer when restore ip dump
When CRIU calls the ip tool on restore, it passes the fd of the remote
socket by replacing STDIN before execvp. The ip tool reads its input
from stdin. However, the ip tool calls ftell(stdin), which fails with
"Illegal seek" since UNIX sockets do not support file positioning
operations. To resolve this issue, read the received content from the
UNIX socket and store it in a temporary file, then replace STDIN with
the fd of this tmp file.

 # python test/zdtm.py run -t zdtm/static/env00 --remote -f ns
 === Run 1/1 ================ zdtm/static/env00

 ========================= Run zdtm/static/env00 in ns ==========================
 Start test
 ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
 Adding image cache
 Adding image proxy
 Run criu dump
 Run criu restore
 =[log]=> dump/zdtm/static/env00/31/1/restore.log
 ------------------------ grep Error ------------------------
 RTNETLINK answers: File exists
 (00.229895)      1: do_open_remote_image RDONLY path=route-9.img snapshot_id=dump/zdtm/static/env00/31/1
 (00.230316)      1: 	Running ip route restore
 Failed to restore: ftell: Illegal seek
 (00.232757)      1: Error (criu/util.c:712): exited, status=255
 (00.232777)      1: Error (criu/net.c:1479): IP tool failed on route restore
 (00.232803)      1: Error (criu/net.c:2153): Can't create net_ns
 (00.255091) Error (criu/cr-restore.c:1177): 105 killed by signal 9: Killed
 (00.255307) Error (criu/mount.c:2960): mnt: Can't remove the directory /tmp/.criu.mntns.dTd7ak: No such file or directory
 (00.255339) Error (criu/cr-restore.c:2119): Restoring FAILED.
 ------------------------ ERROR OVER ------------------------
 ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
 ##################################### FAIL #####################################

Fixes #311

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2020-10-20 00:18:24 -07:00
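
The failure mode and the workaround from the commit above can be reproduced in a few lines. This is a sketch under assumptions, not the CRIU implementation: `is_seekable` and `spool_to_tmpfile` are hypothetical helper names, and a socketpair stands in for the remote image socket.

```python
import errno
import os
import socket
import tempfile


def is_seekable(fd):
    """Return False if the fd rejects positioning with ESPIPE ("Illegal seek")."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise


def spool_to_tmpfile(fd):
    """Copy everything readable from fd into an anonymous temporary file."""
    tmp = tempfile.TemporaryFile()
    while True:
        chunk = os.read(fd, 65536)
        if not chunk:
            break
        tmp.write(chunk)
    tmp.flush()
    tmp.seek(0)
    return tmp


def demo():
    sender, receiver = socket.socketpair()
    sender.sendall(b"route dump")
    sender.shutdown(socket.SHUT_WR)  # signal end of data to the reader

    socket_seekable = is_seekable(receiver.fileno())
    tmp = spool_to_tmpfile(receiver.fileno())
    result = (socket_seekable, is_seekable(tmp.fileno()), tmp.read())
    tmp.close()
    sender.close()
    receiver.close()
    return result
```

The temporary file can then be installed as the child's stdin, so a tool that calls ftell() on its input no longer fails.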
Radostin Stoyanov
01cab14dfa util: Fix addr casting for IPv4/IPv6 in autobind
When saddr.ss_family is AF_INET6 we should cast &saddr to
(struct sockaddr_in6 *).

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
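
The point of the autobind fix above is that the address family decides which structure layout applies. A rough Python analogue, assuming the Linux layouts of `struct sockaddr_in` and `struct sockaddr_in6` and a hypothetical `parse_sockaddr` helper, is:

```python
import socket
import struct


def parse_sockaddr(raw):
    """Decode raw sockaddr bytes by dispatching on the family field.

    Assumes the Linux layouts:
      sockaddr_in:  family(2) port(2) addr(4) padding(8)
      sockaddr_in6: family(2) port(2) flowinfo(4) addr(16) scope_id(4)
    """
    (family,) = struct.unpack_from("=H", raw, 0)   # native byte order
    (port,) = struct.unpack_from("!H", raw, 2)     # network byte order
    if family == socket.AF_INET:
        addr = socket.inet_ntop(socket.AF_INET, raw[4:8])
    elif family == socket.AF_INET6:
        # Reading this with the IPv4 layout would return flowinfo bytes
        # instead of the address, which is the kind of bug being fixed.
        addr = socket.inet_ntop(socket.AF_INET6, raw[8:24])
    else:
        raise ValueError(f"unsupported address family {family}")
    return family, addr, port
```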
Adrian Reber
be2ded15ee test: fix flake8 errors
The newest version of flake8 reports errors that variable names like 'l'
should not be used, because they are hard to read.

This changes 'l' to 'line' to make flake8 happy.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-06-06 11:46:14 -07:00
Adrian Reber
d23d1fc0f9 travis: fix alpine builds
With the latest version of the alpine container image it seems that
alpine changed a few package names. This adapts the alpine container
to solve the travis failures.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-06-06 11:45:21 -07:00
Adrian Reber
f2edc1e199 Update certificates for failing tls based tests
When using zdtm.py with --tls, tests started to fail because the
certificates seem to have expired. The following commands were used to
regenerate the certificates:

            # Generate CA key and certificate
            echo -ne "ca\ncert_signing_key" > temp
            certtool --generate-privkey > cakey.pem
            certtool --generate-self-signed \
                --template temp \
                --load-privkey cakey.pem \
                --outfile cacert.pem

            # Generate server key and certificate
            echo -ne "cn=$HOSTNAME\nencryption_key\nsigning_key" > temp
            certtool --generate-privkey > key.pem
            certtool --generate-certificate \
                --template temp \
                --load-privkey key.pem \
                --load-ca-certificate cacert.pem \
                --load-ca-privkey cakey.pem \
                --outfile cert.pem
            rm temp cakey.pem

Without this, tests will fail in Travis.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-06-05 11:37:50 -07:00
Pavel Emelyanov
95ead14874 criu: Version π
The long-tempting release with lots of new features on board.
We finally have time namespace support, a great improvement in
pre-dump memory consumption, new clone3 support and many
more.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
v3.14
2020-04-29 16:31:49 +03:00
Kir Kolyshkin
5c5e7695a5 get_clean_mount: demote an error to a warning
When testing runc checkpointing, I frequently see the following error:

> Error (criu/mount.c:1107): mnt: Can't create a temporary directory: Read-only file system

This happens because the container root is a read-only mount.

The error here is not actually fatal since it is handled later
in ns_open_mountpoint() (at least since [1] is fixed), but it is shown
as error in runc integration tests.

Since it is not fatal, let's demote it to a warning to avoid confusion.

[1] https://github.com/checkpoint-restore/criu/issues/520

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
c83a0aae2c proc: parse clock symbolic names in /proc/pid/timens_offsets
Clock IDs in this file have been replaced by symbolic clock names.

Now it looks like this:
    $ cat /proc/774/timens_offsets
    monotonic      864000         0
    boottime      1728000         0

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
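
A parser for the new file format shown in the commit above might look like the following. This is an illustrative sketch, not CRIU's C parser; `parse_timens_offsets` is a hypothetical name, and the numeric IDs are the Linux values of CLOCK_MONOTONIC (1) and CLOCK_BOOTTIME (7). It also accepts the older numeric form so both formats can be read.

```python
# Linux clock IDs for the two clocks a time namespace can offset.
CLOCK_IDS = {"monotonic": 1, "boottime": 7}


def parse_timens_offsets(text):
    """Parse the contents of /proc/<pid>/timens_offsets.

    Returns a dict mapping clock ID to an (offset_sec, offset_nsec) tuple.
    """
    offsets = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"unexpected line: {line!r}")
        clock, sec, nsec = fields
        # Symbolic name on newer kernels, numeric ID on older ones.
        clock_id = CLOCK_IDS[clock] if clock in CLOCK_IDS else int(clock)
        offsets[clock_id] = (int(sec), int(nsec))
    return offsets
```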
Pavel Tikhomirov
7dc89376b8 pstree: improve error handling in read_pstree_image
First, don't free pstree_item as they are allocated with shmalloc on
restore. Second, always pstree_entry__free_unpacked PstreeEntry. Third,
remove all breaks, replacing them with an implicit goto err, so that it
is easier to understand that we are on the error path. Fourth, split out
the code for reading one pstree item into a separate function.

Sadly, there is not much use in xfree-ing pi->threads because in case of
an error we still have ->threads unfreed from previous entries anyway.

But at least some cleanup can be done here.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
42b5700b72 kerndat remove duplicate call to kerndat_nsid()
Func kerndat_nsid() is called twice.

v2: leave kerndat_nsid call near kerndat_link_nsid

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
2c2fdd3334 parasite-msg: %u is not implemented for parasite code
Changed all the %u into %d.

Ideally, we should implement the %u format for parasite code.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
ef7ef9cfa0 kerndat: remove duplicate call to kerndat_socket_netns()
kerndat_socket_netns() is called twice. We keep the latter to avoid
changing the behavior.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
62088c721f criu: put statement continuation on the same line as the closing bracket
We should follow the Linux Kernel Coding Style:

... the closing brace is empty on a line of its own, except in the cases
where it is followed by a continuation of the same statement, ie ... an
else in an if-statement ...

https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

Automatically fixing with:

:!git grep --files-with-matches "^\s*else[^{]*{" | xargs
:argadd <files>
:argdo :%s/}\s*\n\s*\(else[^{]*{\)/} \1/g | update

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-25 00:43:23 -07:00
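
For readers who do not use vim, the substitution in the commit above can be approximated with a regular expression in Python. This is a sketch of the same pattern, not the command that was actually run; `join_else` is a hypothetical name.

```python
import re

# Same idea as the vim pattern: a closing brace, a line break, then an
# "else ... {" continuation that should sit on the brace's line.
ELSE_ON_NEXT_LINE = re.compile(r"\}\s*\n\s*(else[^{]*\{)")


def join_else(source):
    """Put `else` (and `else if (...)`) on the same line as the closing brace."""
    return ELSE_ON_NEXT_LINE.sub(r"} \1", source)
```

Like the original command, this only handles `else` branches that open a brace.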
Alexander Mikhalitsyn
d1fa1734ee autofs: fix integer overflow in mount options parsing
In real-life cases the pipe_ino param can be larger than INT_MAX,
but the autofs_parse() function uses atoi(), which returns a
4-byte signed integer. It's a bug.

Example of mount info from real case:
(00.508286) 	type autofs source /etc/auto.misc mnt_id 2824 s_dev 0x4b9 / @
./misc flags 0x300000 options fd=5,pipe_ino=3480845226,pgrp=95929,timeout=300,
minproto=5,maxproto=5,indirect

3480845226 > 2147483647 (32-bit wide signed int max value) => we have a problem

It causes an error:
(03.195915) Error (criu/pipes.c:529): The packetized mode for pipes is not supported yet

Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-04-25 00:43:23 -07:00
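
The arithmetic behind the autofs overflow above can be checked directly. The sketch below emulates storing the value in a 32-bit signed int with `ctypes`; strictly speaking, atoi() on an out-of-range value is undefined behavior in C, so the wrapped result is what typically happens rather than a guarantee. `parse_pipe_ino` and `as_c_int32` are hypothetical helpers.

```python
import ctypes

INT_MAX = 2**31 - 1


def as_c_int32(value):
    """Emulate storing `value` in a 32-bit signed int (wraps, no range check)."""
    return ctypes.c_int32(value).value


def parse_pipe_ino(options):
    """Extract pipe_ino from an autofs option string without truncating it."""
    for option in options.split(","):
        key, _, val = option.partition("=")
        if key == "pipe_ino":
            return int(val)  # Python ints are unbounded, so nothing is lost
    raise KeyError("pipe_ino")
```

With the option string from the commit message, `parse_pipe_ino` yields 3480845226, while `as_c_int32(3480845226)` yields a negative number, which no longer identifies the pipe.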
Nicolas Viennot
6b9faabf39 mem: avoid re-opening CR_FD_PAGES when not needed
This commit introduces an optimization when rsti(t)->vma_io is empty.
This optimization allows streaming a non-seekable image as CR_FD_PAGES
is not reopened.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
4d34f84bb6 img: relocate a PATH_MAX buffer from the bss section to the stack
Reducing our memory footprint by 4K.

Improved-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
bb0b4219ef img: fix image_name() when image is empty
When an image is opened but errored with an ENOENT error, the image is
still valid. Later on, do_pb_read_one() can fail and will invoke
image_name(). The image fd is EMPTY_IMG_FD (-404). read_fd_link fails.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
067a20c815 zdtm: fail if test with the crfail tag passes
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
698f3a4dbd zdtm: limit the line length for ps by 160 symbols
By default, this limit is 80 symbols and this isn't enough:
 4730 pts/0    S+     0:00          \_ ./zdtm_ct zdtm.py
 4731 pts/0    S+     0:00          |   \_ python zdtm.py
 4839 pts/0    S+     0:00          |       \_ python zdtm.p
 4861 pts/0    S+     0:00          |           \_ make --no
 4882 pts/0    S+     0:00          |               \_ ./mnt
 4883 ?        Ss     0:00          |                   \_ .

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
eab1a30748 timens: restore processes in a new timens to restore clocks
After restoring processes, we have to be sure that monotonic and
boottime clocks will not go backward. For this, we can restore processes
in a new time namespace and set proper offsets for the clocks.

In this patch, criu dumps clock values even when processes are running
in the host time namespace, and on restore, criu creates a new time
namespace, sets the dumped clock values and restores processes.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
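
One way to picture "set proper offsets for the clocks" from the commit above: the offset for each clock is the dumped value minus the current host value, split into seconds and non-negative nanoseconds. The sketch below only does that arithmetic and formats lines in the `<clock> <sec> <nsec>` shape used by /proc/<pid>/timens_offsets; the function names are hypothetical, and it does not create a namespace or reflect how CRIU's C code is structured.

```python
NSEC_PER_SEC = 1_000_000_000


def timens_offset(dumped_ns, current_ns):
    """Return (sec, nsec) so that current + offset == dumped.

    divmod() keeps nsec in [0, NSEC_PER_SEC) even when the offset is
    negative, e.g. -1.5 s becomes (-2, 500000000).
    """
    return divmod(dumped_ns - current_ns, NSEC_PER_SEC)


def format_offsets(dumped, current):
    """Build timens_offsets-style lines for the monotonic and boottime clocks."""
    lines = []
    for clock in ("monotonic", "boottime"):
        sec, nsec = timens_offset(dumped[clock], current[clock])
        lines.append(f"{clock} {sec} {nsec}")
    return "\n".join(lines) + "\n"
```

With such offsets applied, a restored process sees its clocks continue from the dumped values instead of jumping backward.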
Andrei Vagin
73438d34bb test: check that C/R of nested time namespaces fails
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
0d8c0562f9 zdtm_ct: run each test in a new time namespace
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
f1655fd540 zdtm: add a new test to check c/r of time namespaces
This test checks that monotonic and boottime don't jump after C/R.

In ns and uns flavors, the test is started in a separate time namespace
with big offsets, so if criu will restore a time namespace incorrectly
the test will detect the big delta of clocks values before and after C/R.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
3fd0fa4bdc zdtm: add support for time namespaces
For ns and uns flavors, tests run in separate time namespaces.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
ddba4af608 namespace: fail if ns/time_for_children isn't equal to ns/time
This case isn't supported right now.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
4127ef4ab7 criu: Add support for time namespaces
The time namespace allows for per-namespace offsets to the system
monotonic and boot-time clocks.

C/R of time namespaces is very straightforward. On dump, criu enters a
target time namespace and dumps the current clock values, then on restore,
criu creates a new namespace and restores the clock values.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
0e9b42acf9 MAINTAINERS: Add Pavel (myself) to maintainers
Hope I have enough experience in the project to be nominated. I want to
help with review and will try to do my best in it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-06 23:19:57 -07:00
Pavel Tikhomirov
e3fb52e375 remove header include statements duplicates
Revert "util: introduce the mount_detached_fs helper"

This reverts commit 5dbc24b206.

Revert "criu: Make use strlcpy() to copy into allocated strings"

This reverts commit bc49927bbc.

Fixes for https://github.com/checkpoint-restore/criu/pull/1003

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-03-30 19:43:32 -07:00
Pavel Emelyanov
fcb23dbfcf Merge pull request #1003 from avagin/v3.14-part2
Prepare v3.14 (part 2)
2020-03-30 13:50:56 +03:00
Andrei Vagin
8c36865c84 memfd: split the struct memfd_inode
The struct memfd_inode has a union for dump and restore parts.
The only common parts are the list_head node, and the inode id.

Suggested-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
e3a5d09752 memfd: save all memfd inodes in one image
Per-object image is acceptable if we expect to have 1-3 objects
per-container. If we expect to have more objects, it is better to save
them all into one image. There are a number of reasons for this:
* We need fewer system calls to read all objects from one image.
* It is faster to save or move one image.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Byeonggon Lee
967797a867 Add build directory to gitignore
After running make install, a build directory is generated but not ignored
in gitignore. So this commit adds the build directory to gitignore.

Signed-off-by: Byeonggon Lee <gonny952@gmail.com>
2020-03-27 19:36:20 +03:00
Pavel Tikhomirov
cc362b432e namespaces: fix error handling in dump_user_ns
Fix n_xid_map leaks on error path and remove useless exit_code.

Fixes: 6e1726f8 ("userns: set uid and gid before entering into userns")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
1ad8657ddb config/nftables: include string.h for strlen
Fixes: 9433b7b9db ("make: use cflags/ldflags for config.h detection mechanism")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
5f28b692a0 test/fifo_loop: change sizes of all fifo-s to fit a test buffer
This test doesn't expect that the write operation will block.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
1ad209b9c2 test/pipe03: check that pipe size is restored
Create two pipes with and without queued data.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
2b376168ef pipe: restore pipe size even if a pipe is empty
Without this patch, pipe size is restored only if a pipe has
queued data.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
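
The behavior described in the pipe commit above (carrying over the pipe size regardless of queued data) can be illustrated with the Linux `F_GETPIPE_SZ` and `F_SETPIPE_SZ` fcntl commands. This is a sketch, not CRIU's code; the helper names are hypothetical, and the numeric fallbacks are the Linux constant values for Python versions whose `fcntl` module lacks them.

```python
import fcntl
import os

# Linux-specific fcntl commands; fall back to their numeric values
# when the fcntl module does not export the names.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def get_pipe_size(fd):
    """Return the capacity of the pipe behind fd, in bytes."""
    return fcntl.fcntl(fd, F_GETPIPE_SZ)


def set_pipe_size(fd, size):
    """Request a pipe capacity; the kernel may round it up."""
    return fcntl.fcntl(fd, F_SETPIPE_SZ, size)


def clone_empty_pipe(src_fd):
    """Create a new, empty pipe with the same capacity as src_fd's pipe."""
    read_end, write_end = os.pipe()
    # Applied unconditionally: no queued data is needed to carry the size over.
    set_pipe_size(write_end, get_pipe_size(src_fd))
    return read_end, write_end
```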
Valeriy Vdovin
fa705e418b zdtm: Use safe helper function to initialize unix socket sockaddr structure
The helper function removes code duplication from tests that want to
initialize unix socket address to an absolute file path, derived from
current working directory of the test + relative filename of a resulting
socket. Because the former code used cwd = get_current_dir_name() as
part of absolute filename generation, the resulting filepath could later
cause failure of the bind syscall due to unchecked permissions and
introduce confusing permission errors.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Valeriy Vdovin
691b4a4e7e zdtm: Implemented get_current_dir_name wrapper that checks for 'x' permissions
Any filesystem syscall that needs to navigate to an inode by its
absolute path performs successive lookup operations for each part of the
path. Each lookup operation includes an access rights check.
Usually, but not always, zdtm test processes fall under the 'other'
access category. Also, directories usually don't have the 'x' bit set
for other. When the 'x' bit is not set and the user ID and group ID of a
process place it in 'other', tests will not succeed in performing these
syscalls, which make up most of the filesystem API that takes a
const char *path as one of its arguments (open, openat, mkdir, bind, etc).
The observable behavior is that zdtm tests fail at file creation ops on
one system and pass on another. This is not immediately clear to the
developer from just looking at the failed test's logs.
Investigating it is also not quick for a developer due to the
complex structure of the zdtm runtime, where nested clones with
NAMESPACE flags take place alongside bind-mounts.

As an additional note: 'get_current_dir_name' is documented as returning
EACCES when some part of the path lacks read/list permissions.
But in fact this is not always so. Practice shows that test processes can
get false success on this operation only to fail on a later call to
something like mkdir/mknod/bind with the given path in arguments.

'get_cwd_check_perm' is a wrapper around 'get_current_dir_name'. It also
checks for permissions on the given filepath and logs the error. This
directs the developer towards the right investigation path or even
eliminates the need for investigation completely.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
2020-03-27 19:36:20 +03:00
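
The per-component check described in the commit above can be sketched as follows. This is not the zdtm C helper `get_cwd_check_perm`; `path_prefixes` and `first_unsearchable_dir` are hypothetical names. Note that `os.access` checks against the real (not effective) UID and GID, and that root typically passes the check on any directory.

```python
import os
import tempfile


def path_prefixes(path):
    """Return every ancestor of an absolute path, from "/" down to the path itself."""
    path = os.path.abspath(path)
    prefixes = []
    while True:
        prefixes.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return list(reversed(prefixes))


def first_unsearchable_dir(path):
    """Return the first directory on the path lacking 'x' permission, or None."""
    for prefix in path_prefixes(path):
        if os.path.isdir(prefix) and not os.access(prefix, os.X_OK):
            return prefix
    return None
```

Logging the returned directory up front points the developer at the permission problem before a later mkdir or bind fails with a less obvious error.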
Andrei Vagin
c40c09cbbf test/zdtmp: add a test to C/R shared memory file descriptors
Any shared memory region can be opened via /proc/self/map_files.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
10b1d46f67 mem/vma: set VMA_FILE_{PRIVATE,SHARED} if a vma file is borrowed
There is a fast path for when two consecutive vma-s share the same file.

But one of these vma-s can map a file with MAP_SHARED, but another one
can map it with MAP_PRIVATE and we need to take this into account.
2020-03-27 19:36:20 +03:00
Andrei Vagin
fb65ab2b1a mem: dump shared memory file descriptors
Any shared memory mapping can be opened via /proc/self/map_files/.
Such file descriptors look like memfd file descriptors, so
they can be dumped in the same way.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
f42ae70c75 make: use cflags/ldflags for config.h detection mechanism
The config.h detection scripts should use the provided CFLAGS/LDFLAGS
as it tries to link libnl, libnet, and others.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
d0d6f1ad10 mailmap: update my email
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Mike Rapoport
c3ad4942d4 travis: add ppc64-cross test on amd64
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-03-27 19:36:20 +03:00
Alexander Mikhalitsyn
b9c8e957d8 crit-recode: skip (not try to parse) nftables raw image
We should ignore (not parse) images that are not in crtool format;
such images have no magic number (RAW_IMAGE_MAGIC equals 0).

nftables images have a format compatible with the `nft -f /proc/self/fd/0`
input format.

Reported-by: Mr Jenkins
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
1f74f8d770 travis: Use debian/buster as base for cross build tests
Jessie is called 'oldoldstable', migrate to Buster.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
18ac1540c4 travis: Add aarch64-cross test on amd64
Fixes: #924
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00