mirror of https://github.com/checkpoint-restore/criu synced 2025-08-26 11:57:52 +00:00

9635 Commits

Author SHA1 Message Date
Kirill Tkhai
d3d17b0cbf files: Move prepare_ctl_tty() to criu/tty.c
Move the function and reduce the number of its arguments.
This cleanup is needed to keep all tty code together.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-12-28 20:02:50 +03:00
Kirill Tkhai
56cd4b53c2 files: Close ctl tty via generic engine
Just mark the fle as "fake" and the engine will do all the work.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-12-28 20:02:50 +03:00
Pavel Tikhomirov
b1a67f8572 zdtm: improve tempfs_overmounted test
Unchanged test provided by Andrew.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
b286426642 mount: do remaps for child-overmount of another overmount
In case we have mounts:

1 /mnt/
2 /mnt/a with parent 1
3 /mnt/a/b with parent 1
4 /mnt/a with parent 2

We determine 2 as needing remap with does_mnt_overmount() and remap it.
Next we mount 4 on top of 2. Next in fixup_remap_mounts() we want to
move 2 back to its parent 1, but instead move 4 there. So in this case
children-overmounts need to be remapped too.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
b82c935c27 mount: fix try_remap_mount
Remaps in mnt_remap_list should follow the same descending order that was
set up in mnt_resort_siblings(), so don't reorder them.

For instance if we have sibling mounts with mountpoints:
1) /dir1/dir2/dir3
2) /dir1/dir2
3) /dir1
Here (2) is a sibling-overmount for (1). Mount (3) is a sibling-overmount
for both (1) and (2). So when we move overmounts back in
fixup_remap_mounts() we should first move (2) and only then (3).

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
f083987f27 mount: fix mnt_resort_siblings to work as described
We should add the new entry _before_ the first entry with less depth to sort in
descending order.

E.g., entries in the list have depths [7,5,3]. Adding a new entry m with depth
4, we break the list_for_each_entry loop on p with depth 3. Before the
patch we would get [7,5,3,4] after list_add, which is wrong.

Also we can relax the "<=" check to "<" to avoid unnecessary reordering.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
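
The insertion rule described in the mnt_resort_siblings fix above can be sketched as follows. This is a minimal illustration on a plain Python list, not CRIU's code: the function name `insert_by_depth` is hypothetical, and the real implementation works on a kernel-style linked list of mount entries.

```python
def insert_by_depth(depths, new_depth):
    """Insert new_depth into a list that is kept in descending order.

    The new entry goes *before* the first entry with a smaller depth.
    Using '<' rather than '<=' leaves entries of equal depth in their
    existing relative order (hypothetical helper, for illustration only).
    """
    for i, d in enumerate(depths):
        if d < new_depth:
            depths.insert(i, new_depth)
            return depths
    depths.append(new_depth)  # no shallower entry found: the new one goes last
    return depths


print(insert_by_depth([7, 5, 3], 4))  # [7, 5, 4, 3]
```

Inserting after the entry with depth 3 instead would give [7, 5, 3, 4], which is the broken ordering the commit message mentions.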
Pavel Tikhomirov
7ca51df5a8 zdtm: now tempfs_overmounted will pass so remove crfail
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
b6cfb1ce29 mount: make open_mountpoint handle overmounts properly
Dump of a VZ7 CT fails if it has an overmounted tmpfs inside:

[root@silo ~]# prlctl enter su-test-2
entered into CT
CT-829e7b28 /# mkdir /mnt/overmntedtmp
CT-829e7b28 /# mount -t tmpfs tmpfs /mnt/overmntedtmp/
CT-829e7b28 /# mount -t tmpfs tmpfs /mnt
CT-829e7b28 /# logout

[root@silo ~]# prlctl suspend su-test-2
Suspending the CT...
Failed to suspend the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: Will skip in-flight TCP connections
(01.657913) Error (criu/mount.c:1202): mnt: Can't open ./mnt/overmntedtmp: No such file or directory
(01.662528) Error (criu/util.c:709): exited, status=1
(01.664329) Error (criu/util.c:709): exited, status=1
(01.664694) Error (criu/cr-dump.c:2005): Dumping FAILED.
Failed to checkpoint the Container
All dump files and logs were saved to /vz/private/829e7b28-f204-4bce-b09f-d203b99befd4/dump/Dump.fail
Checkpointing failed
)

CRIU wants to dump the contents of the /mnt/overmntedtmp/ mount but it is
unavailable. So we copy the mount namespace in such a case and unmount
overmounts to access what we want to dump.

The actual use case here is dumping a CT with an active mariadb and an ssh
connection. Together they happen to create such an overmount. By default
systemd creates a separate mount namespace for mysql and also mounts
tmpfs to /run/user in it, and when ssh(root) is connected, systemd also
mounts tmpfs in the container root mount namespace to /run/user/0 for user
files. As /run is a slave mount, /run/user/0 also propagates to mysql's
mount namespace and initially becomes overmounted by /run/user.

https://jira.sw.ru/browse/PSBM-57362

remove __maybe_unused for mnt_is_overmounted and umount_overmounts

changes in v2:
1) Use clone not fork, share resources with parent same as in
call_in_child_process.
2) Do not enter userns (create helper) for non-overmounted mounts. Thus
return back setns/restore_ns logic.
3) Helper opens fd for parent directly due to CLONE_FILES, remove futex.
4) Check helper exit status properly.
5) Add get_clean_fd helper.
6) Add better comments.

changes in v3:
1) Pass fd from helper through args instead of ret code, fix ret code
checking.
2) Add \n to pr_err in open_mountpoint

changes in v5:
Make comments even better.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
24298e00b3 mount: add umount_overmounts helper to make mount visible
also remove __maybe_unused for __umount_children_overmounts

note: leave it __maybe_unused yet
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
9e71c1f284 mount: add __umount_children_overmounts helper to make mount visible
note: leave it __maybe_unused yet
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Pavel Tikhomirov
d4fa52e6b9 mount: add mnt_is_overmounted helper to check mount visibility
note: leave it __maybe_unused yet
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-26 21:11:45 +03:00
Andrei Vagin
c1b0a849e4 syscall: fix arguments for preadv()
It has two arguments, "pos_l" and "pos_h", instead of one "off". They are used
to handle 64-bit offsets on 32-bit kernels.

SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
                unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)

https://github.com/checkpoint-restore/criu/issues/424
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-15 21:54:58 +03:00
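
To make the pos_l/pos_h split from the preadv() fix above concrete, here is a small arithmetic sketch. The helper names `split_offset` and `join_offset` are hypothetical, and `long_bits` stands for the width of `unsigned long` on the target (assumed to be 32 or 64); it only shows how a 64-bit offset maps onto two register-sized arguments.

```python
def split_offset(off, long_bits=32):
    """Split a 64-bit file offset into (pos_l, pos_h) register-sized halves."""
    mask = (1 << long_bits) - 1
    pos_l = off & mask                 # low part, fits in one unsigned long
    pos_h = (off >> long_bits) & mask  # high part, zero when long is 64 bits
    return pos_l, pos_h


def join_offset(pos_l, pos_h, long_bits=32):
    """Recombine the two halves into the original offset."""
    return (pos_h << long_bits) | pos_l


off = 0x100000010  # an offset just above 4 GiB
print(split_offset(off))                 # (16, 1)
print(split_offset(off, long_bits=64))   # (4294967312, 0)
```

With a 64-bit `unsigned long` the whole offset fits in pos_l and pos_h is simply zero, which is why passing a single "off" argument happens to work there but not on 32-bit.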
Vitaly Ostrosablin
6ccd871ba6 criu: Don't fail if ghost file has no parent dirs.
Due to the way CRIU handles paths (as relative to workdir), there's a case
where migration would fail. A simple example is a ghost file in the filesystem
root (with root being cwd). For example, "/unlinked" becomes "unlinked".
The original code scans the path for other slashes, which would be
missing in this case. But it's still a perfectly valid case, and there's
no need to fail. So if there's no parent dir, we don't need to
create one and can just return 0 here instead of failing.

Signed-off-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-15 21:54:58 +03:00
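
The ghost-file case above boils down to treating "no parent directory" as success rather than as an error. A minimal sketch, assuming a hypothetical helper `make_ghost_parent_dirs` that receives a path relative to some root directory (the names and the return convention are illustrative, not CRIU's):

```python
import os


def make_ghost_parent_dirs(root, rel_path):
    """Create the parent directories of rel_path under root.

    Returns 0 both when directories were created and when rel_path has
    no parent component at all (e.g. "unlinked"), which is a valid case.
    """
    parent = os.path.dirname(rel_path)
    if not parent:
        return 0  # file lives directly in the working directory
    os.makedirs(os.path.join(root, parent), exist_ok=True)
    return 0
```

For "unlinked" the function returns without touching the filesystem, while "a/b/ghost" causes "a/b" to be created under the root.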
Andrei Vagin
c767b11c54 test: check that corked udp sockets are not dumped
The kernel doesn't have an interface to get a sent queue for udp
sockets, so currently we can't dump them and criu dump has to fail in
such cases.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-15 09:14:03 +03:00
Andrei Vagin
776782fe20 sk-inet: detect corked sockets by getting a proper sock opt
Now we block all sockets with non-zero idiag_wqueue, but it doesn't mean
that a CORK option is enabled for a socket. A packet can be in a network
stack and it is accounted into idiag_wqueue.

https://github.com/checkpoint-restore/criu/issues/409

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-15 09:14:03 +03:00
Andrei Vagin
9ca8953baa test/docker: check a container with a read-only file system
Now it's probably one valid use case, because there is no way to commit
a container when a container is being checkpointed.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-07 19:56:24 +03:00
Andrei Vagin
514510ba87 zdtm/tempfs_subns: sync children with the main process
All static tests have to stop any activity before C/R.

./tempfs_subns --pidfile=tempfs_subns.pid --outfile=tempfs_subns.out --dirname=tempfs_subns.test
Run criu dump
Unable to kill 128: [Errno 3] No such process
Run criu restore
7: Old mounts lost: []
7: New mounts appeared: [('/rootfs/criu/test', '/'), ('/', '/proc'), ('/', '/dev/pts')]
:

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-07 19:55:23 +03:00
Pavel Tikhomirov
35f7f5e360 pr_err: add \n where we miss them
Except for several false positives done by:
find -type f -name "*.c" -not -path "./test/*" -exec sed -i
's/\(\<pr_err.*[^\][^n]\)\("[,)]\)/\1\\n\2/g' {} \;

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-07 19:52:13 +03:00
Pierre-Olivier Mercier
acde064232 compel: add missing header required by musl
This fixes a compilation issue regarding undeclared NULL and memcpy when using musl
on ARM.

Signed-off-by: Pierre-Olivier Mercier <nemunaire@nemunai.re>
2017-12-06 21:44:24 -08:00
Andrei Vagin
0c1b1d0fe7 pipe: dump all data from a pipe
Currently we use an additional pipe to steal data from a pipe, but we
don't check that we steal all data. And the additional pipe can have a
smaller size.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-05 04:49:50 +03:00
Andrei Vagin
b921ad2908 test: check a pipe with a custom size
CRIU doesn't correctly handle pipes whose size is bigger than the
default one.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-04 23:59:06 +03:00
Pavel Tikhomirov
4d3ae51725 remap: don't free rpath and don't shfree_last gf
a) As we shmalloced rpath, it cannot be xfreed. b) As we shmalloc
variables in collect_remap_ghost() far away from open_remap_ghost()
where we want to free them, there is no guarantee that our shmalloc was
last and we can't use shfree_last().

fixes commit 0c675a5e9d40 ("files: remove link_remaps when everything
has been restored")

When create_ghost() fails for some reason that produces a segfault for me.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-12-02 04:08:47 +03:00
Mike Rapoport
8c545089ed zdtm: use {read,write}_data in fifo tests
Reading and writing large buffers may result in short read/write. In cases
where we expect the entire buffer to be transferred, use {read,write}_data
rather than plain read/write syscalls.

Reported-by: Mr Jenkins
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 21:59:03 +03:00
Mike Rapoport
a76ff847d6 zdtm: lib: add {read,write}_data helpers
Some tests expect that all the data will be handled in a single invocation
of the read/write system call. But it is possible to have a short read/write
on a loaded system, and this is not an error.
Add helper functions that will reliably read/write the entire buffer.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 21:58:59 +03:00
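
The idea behind the {read,write}_data helpers above is a retry loop around the plain syscalls. The sketch below shows that pattern with Python's `os.read` and `os.write`; the function names mirror the commit, but the bodies and the EOF handling are assumptions made for illustration, not the zdtm library code.

```python
import os


def write_data(fd, data):
    """Write all of data to fd, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]


def read_data(fd, size):
    """Read exactly size bytes from fd; raise EOFError on early end of file."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)  # may return fewer bytes than requested
        if not chunk:
            raise EOFError("expected %d more bytes" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A test that previously called read() once and compared the byte count would instead call `read_data(fd, len(expected))` and compare the contents.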
Andrei Vagin
39d65fba82 compel: use a correct name format for vma files in /proc/pid/map_files/
Currently we use the "map_files/%p-%p" format, but actually it should
be "map_files/%lx-%lx".

The kernel could handle both formats, but recently Alexey Dobriyan fixed
the kernel and it now accepts only the second format.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
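
The two name formats from the map_files fix above differ in how the addresses are printed. The sketch assumes the relevant difference is the "0x" prefix that glibc's "%p" conversion emits; `map_files_name` is a hypothetical helper, and Python's "%x" plays the role of C's "%lx".

```python
def map_files_name(start, end):
    """Build a /proc/<pid>/map_files entry name: bare lowercase hex, no 0x."""
    return "map_files/%x-%x" % (start, end)


def map_files_name_with_prefix(start, end):
    """What a pointer-style format would yield (assumed to be rejected now)."""
    return "map_files/0x%x-0x%x" % (start, end)


print(map_files_name(0x400000, 0x401000))              # map_files/400000-401000
print(map_files_name_with_prefix(0x400000, 0x401000))  # map_files/0x400000-0x401000
```

The first form matches the names that actually appear when listing the map_files directory.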
Mike Rapoport
7491d9191c lazy-pages: do not allow background fetch before restore is finished
If we start the background memory fetch before restore is completely finished,
we may try to write to memory areas which were not yet remapped to their
proper place and are not registered with userfaultfd.
Add synchronization between restore and lazy-pages so that lazy-pages
will only handle #PFs before all the tasks are restored.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Mike Rapoport
b3a754d706 page-server: implement epoll->hangup_event
The remote page read has nothing to do if the page-server on the source has
closed the connection. Just report an error and abort.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Mike Rapoport
c6cb9d882a util: epoll: add processing of EPOLL{RD}HUP
Currently when we poll a file descriptor, we only process EPOLLIN events
and if a connection is closed the receiving side has no means to deal with
it.
Add a callback for EPOLL{RD}HUP events and the default implementation for
these events that removes the file descriptor from epoll and closes it.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Mike Rapoport
584123e99b util: epoll: rename revent to read event
A bit more readable and will be easy to distinguish from upcoming
hevent^Whangup_event.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Mike Rapoport
307c55640c util: epoll: move comment about timeout decrease to uffd.c
The generic epoll_wait wrapper should not make any assumptions about timeout.
It's up to the lazy-pages daemon to make (future) policy decisions.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
de467742c6 netfilter: use ipv4 iptables rules to block IPv4-mapped IPv6 addresses
If ipv6 socket has an IPv4-mapped address, it is used to handle ipv4
connection, so we have to use ipv4 iptables rules to block this
connection.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
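
The decision described in the netfilter commit above (an IPv6 socket with an IPv4-mapped address needs IPv4 rules) can be sketched with Python's standard `ipaddress` module. The function name `iptables_family` and its return shape are hypothetical; the sketch only classifies the address and does not build any iptables command.

```python
import ipaddress


def iptables_family(addr):
    """Return (family, address) telling which rule set should block addr."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        # ::ffff:a.b.c.d carries an IPv4 connection, so use IPv4 rules
        return "ipv4", str(ip.ipv4_mapped)
    return ("ipv4" if ip.version == 4 else "ipv6"), str(ip)


print(iptables_family("::ffff:192.0.2.1"))  # ('ipv4', '192.0.2.1')
print(iptables_family("2001:db8::1"))       # ('ipv6', '2001:db8::1')
```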
Adrian Reber
89cdc6bcb4 crtools: also print the current kernel version
In addition to writing the CRIU version to the log file this adds the
current kernel version to the log file:

(00.000008) Version: 3.5 (gitid v3.5-511-ga8cc6cf)
(00.000303) Running on node01 Linux 3.10.0-513.el7.x86_64 #1 SMP Tue Feb 29 06:78:90 EST 2017 x86_64

v2:
 - small changes as suggested by Dmitry (thanks)

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Pavel Begunkov
6b7c744f7b locks: skip 'lease correction' for non-regular files
Leases can be set only on regular files. Thus, as an optimization, we can
skip attempts to find associated leases in 'correct_file_leases_type'
for other fd types.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Jacob Wen
5efcb6028d phaul: use relative path for parent link
Absolute paths for parent links may not work on restore,
e.g. when restoring on a different server (during migration).

See https://github.com/checkpoint-restore/criu/blob/criu-2.x-stable/criu/image.c#L432

Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
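
Why a relative parent link survives migration can be seen by computing the link target relative to the directory that holds it. The sketch uses `os.path.relpath`; the function name `parent_link_target` and the example directory layout are assumptions made for illustration, and phaul itself is written in Go.

```python
import os


def parent_link_target(parent_dir, images_dir):
    """Return the target for a parent link stored inside images_dir."""
    return os.path.relpath(parent_dir, start=images_dir)


# The same relative target stays valid if both directories are moved
# together to another prefix on the destination host.
print(parent_link_target("/migr/img/1", "/migr/img/2"))  # ../1
```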
Pavel Begunkov
369b56068b zdtm: Test inherited file leases
-- check children's errors in file_leases03
-- test c/r of lease transferred to child process

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Pavel Begunkov
394875cdbc locks: Remove duplicated locks
CRIU creates a distinct lock record for each file descriptor on the same
OFD. The patch removes these duplicates. To do so, it adds a new field to
struct file_lock, which stores the pid of the fd on which the lock was found.
'owner pid' is not actually helpful, because the original fd on which the
lock was set can already be closed.

Also it purges crutches doing the same stuff but only for file leases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Mike Rapoport
6db46e554f lazy-pages: drop_iovs: mark iov as not queued
If we receive only part of the IOV from the page-server we recalculate the
IOV so it will point to the area we still have to fetch. During the split,
the IOV covering the remaining area may remain marked as 'queued' and we'll
never retry fetching it.
Marking the IOV as not queued will ensure its pages will be requested
again.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
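
The drop_iovs fix above can be pictured as recomputing the remaining range after a partial transfer and clearing its 'queued' mark. The sketch models an IOV as a plain dict with hypothetical keys `start`, `len` and `queued`; the real structure and function in the lazy-pages daemon differ.

```python
def remaining_iov(iov, received):
    """Return the part of iov still to be fetched, or None if all arrived.

    The returned IOV is explicitly marked as not queued so that its pages
    will be requested again (illustrative model, not CRIU's data layout).
    """
    if received >= iov["len"]:
        return None
    return {
        "start": iov["start"] + received,
        "len": iov["len"] - received,
        "queued": False,
    }


iov = {"start": 0x1000, "len": 0x4000, "queued": True}
print(remaining_iov(iov, 0x1000))
```

If the 'queued' flag were copied from the original IOV, the remaining range would look as if a request for it were already in flight and would never be retried.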
Mike Rapoport
5f87346f27 page-xfer: remote-pages: allow receiving partial data
Since commit e609267f681062b4370e528a50f635222e0c2330 ("page-pipe: allow to
share pipes between page pipe buffers") the assumption that we will receive
the exact amount of pages we've requested with PS_IOV_GET does not always
hold.
In the case we serve pages data from the images using 'page-server
--lazy-page' the IOVs seen by the pagemap may cross page-pipe buffer
boundaries and read_page_pipe will clamp the pages in the response to those
boundaries.
Adjust page_server_read so it will not try to receive more pages than
page-server is going to send.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Mike Rapoport
1c94e98bf1 debug_show_page_pipe: add PPB's pipe offset
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
59770d4f8b phaul: run the phaul test in a docker container
The golang from Ubuntu Trusty is too old.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
ff599cd966 travis: run phaul tests
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
eb2736dd56 phaul/Makefile: add a target to run tests
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
ddb09e6e18 phaul: check an exit code of a page-server
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
7f482d20e8 phaul/test: exit with a non-zero code in error cases
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
08a0555ea5 phaul: print a message from error objects
It can help to understand an error.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
98888ec773 phaul: add a script to run tests
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:15 +03:00
Andrei Vagin
cc1c41a03c phaul/test: add github.com/golang/protobuf in vendor/
In this case, we can compile tests without cloning third party libraries.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:14 +03:00
Andrei Vagin
3637c414ca phaul: add phaul/src/stats/stats.pb.go
This is required for "go get", which can't execute any commands.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:14 +03:00
Andrei Vagin
767d6e1ace lib: add lib/go/src/rpc/rpc.pb.go
It is required for "go get", which can't execute any commands.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:14 +03:00
Andrei Vagin
e5f1a37925 phaul: use full paths for modules
It is a general practice in golang and "go get" works in this case.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:36:14 +03:00