mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 05:48:05 +00:00

9167 Commits

Author SHA1 Message Date
Dmitry Safonov
f54942c62a make: Don't set $(MAKEFLAGS)
We shouldn't set MAKEFLAGS for the following reasons:
1. User may want to specify some make parameter (e.g., `-d` for debug)
2. We lose parallel builds. No `-j` is passed to submake, and it looks
   like GNU make will not handle parallel recursive make if
   $(MAKEFLAGS) is unset again.
   Easy to verify: Add `sleep 3` to build rule in Makefile.inc and
   you'll find only one sleep process at a time. After the patch
   if you specify say `-j5` to make - you'll have 5 sleep processes.

Reverts: commit e9beed7bb3f3 ("build: zdtm -- Add implicit rules into
zdtm building").

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:33 +03:00
Dmitry Safonov
919f548813 test/make: Drop implicit make variables
Let's drop the usage of COMPILE.c and OUTPUT_OPTION.
This allows running submake with -R.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:33 +03:00
Dmitry Safonov
c2876c2f25 nmk: Don't redefine MAKEFLAGS
$(MAKEFLAGS) already contains -r -R and --no-print-directory: those
flags are added in include.mk, which is included two lines above.
There is no comment, and I see little sense in erasing $(MAKEFLAGS)
rather than appending those flags, so I treated this as a typo.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:33 +03:00
Andrei Vagin
3d5ee99974 zdtm: always run criu dump with --track-mem if --snaps is set
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:33 +03:00
Andrei Vagin
8f7a52946f service: don't cache a service descriptor
Service descriptors can be moved in a child process.

v2: handle errors of install_service_fd() properly

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:33 +03:00
Dmitry Safonov
e52add949a test/make: Include .d files
Include deps files to recompile tests when dependency has changed.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Andrey Vagin
c12409fc5f zdtm: calling futex via syscall saves error codes in errno
man 2 futex:
  In  the  event  of  an error (and assuming that futex() was invoked via
  syscall(2)), all operations return -1 and set  errno  to  indicate  the
  cause of the error.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Radostin Stoyanov
348b169518 Remove redundant semicolons
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Radostin Stoyanov
4a6cf33be3 net: Remove trailing whitespace
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Pavel Tikhomirov
ef78b890d2 files: fix clone_service_fd overlap handling
Though LOG_FD_OFF < IMG_FD_OFF, get_service_fd(LOG_FD_OFF) is greater than
get_service_fd(IMG_FD_OFF) (see __get_service_fd), so the check here
should be reversed. Also add a BUG_ON to catch a possible __get_service_fd
change that could break this check again.
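
The inverted ordering described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a service descriptor is obtained by subtracting the offset from a base descriptor number; the offset values and `SERVICE_FD_BASE` are hypothetical and are not CRIU's actual values.

```python
# Hypothetical offsets: LOG_FD_OFF is smaller than IMG_FD_OFF, as in the message.
LOG_FD_OFF = 1
IMG_FD_OFF = 2

# Hypothetical base; assumed model: service fds are counted downward from it.
SERVICE_FD_BASE = 1024


def get_service_fd(off, base=SERVICE_FD_BASE):
    """Map a service fd offset to a descriptor number (assumed: base - offset)."""
    return base - off


log_fd = get_service_fd(LOG_FD_OFF)
img_fd = get_service_fd(IMG_FD_OFF)

# A smaller offset yields a larger descriptor, so a range check written in
# terms of offsets has to be reversed when applied to descriptor numbers.
print(LOG_FD_OFF < IMG_FD_OFF, log_fd > img_fd)
```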

We have a problem when USERNSD_SK replaces LOG_FD_OFF: later, when
writing to the log, we actually send garbage commands to usernsd,
which fails to handle them and BUGs or crashes.

https://jira.sw.ru/browse/PSBM-83472

We also had a similar problem when __userns_call received a bad response;
it likely has the same cause:

https://api.travis-ci.org/v3/job/352164661/log.txt

fixes commit 129bb14611c3 ("files: Prepare clone_service_fd() for
overlaping ranges.")

v2: move BUG_ON to main() to check it only once, use min+1 and max-1

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Pavel Tikhomirov
dc62a93010 files: define O_TMPFILE
This fixes compilation on VZ7:
https://ci.openvz.org/job/CRIU/job/CRIU-virtuozzo/job/criu-dev/3605/console

https://jira.sw.ru/browse/PSBM-83713
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Andrei Vagin
93b34b6cb8 files: drop O_TMPFILE from file descriptor flags
Unnamed temporary files are restored as ghost files.

If O_TMPFILE is set for the open() syscall, the pathname argument
specifies a directory, but criu gives a path to a ghost file.

(00.107450)     36: Error (criu/files-reg.c:1757): Can't open file tmp/#42274874 on restore: Not a directory
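
The failure mode can be reproduced outside CRIU with a short Python sketch. It assumes a Linux system where `os.O_TMPFILE` is available; the `ghost` path and the `open_restored` helper are hypothetical stand-ins, not CRIU code.

```python
import errno
import os
import tempfile


def open_restored(path, flags):
    """Open a ghost file, dropping O_TMPFILE because `path` names a file, not a directory."""
    return os.open(path, flags & ~os.O_TMPFILE)


with tempfile.TemporaryDirectory() as workdir:
    ghost = os.path.join(workdir, "ghost")  # hypothetical stand-in for a ghost file
    with open(ghost, "w"):
        pass

    dumped_flags = os.O_RDWR | os.O_TMPFILE  # flags as recorded for the original fd

    try:
        os.close(os.open(ghost, dumped_flags))
        tmpfile_errno = 0
    except OSError as exc:
        # On Linux this is expected to be ENOTDIR, matching the log line above.
        tmpfile_errno = exc.errno

    # With the flag stripped, the same path opens as a regular file.
    fd = open_restored(ghost, dumped_flags)
    restored_ok = fd >= 0
    os.close(fd)

print(errno.errorcode.get(tmpfile_errno), restored_ok)
```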

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Andrei Vagin
47513625b2 zdtm: add a test to check O_TMPFILE
man 2 open:
...
O_TMPFILE (since Linux 3.11)

Create an unnamed temporary file. The pathname argument specifies a
directory; an unnamed inode will be created in that directory's
filesystem. Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.
...

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Andrei Vagin
f992f469e8 jenkins: add a pipeline file for criu-lazy-migration
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Radoslaw Burny
72d5e41abd sfds: Fix UB in choose_service_fd_base due to calling __builtin_clz(0)
__builtin_clz(0) leads to undefined behaviour:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

Set nr = 1 directly to avoid this.
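
A rough Python model of the guard is sketched below. Python has no undefined behaviour here, so the emulated `clz32` raises for 0 to stand in for the undefined case; the rounding formula in `round_up_pow2` is an assumption made for illustration and may differ from what choose_service_fd_base actually computes.

```python
def clz32(x):
    """Emulate __builtin_clz for a 32-bit unsigned value.

    The GCC builtin is undefined for 0; this emulation raises instead.
    """
    if not 0 < x < (1 << 32):
        raise ValueError("clz32 is only defined for 1 <= x < 2**32")
    return 32 - x.bit_length()


def round_up_pow2(nr):
    """Hypothetical helper: smallest power of two strictly greater than nr."""
    if nr == 0:
        return 1  # set the result directly instead of calling clz on 0
    return 1 << (32 - clz32(nr))


print([round_up_pow2(n) for n in (0, 1, 3, 4)])
```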

Link: https://github.com/checkpoint-restore/criu/issues/470
Signed-off-by: Radoslaw Burny <rburny@google.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Andrei Vagin
f872b43152 travis: don't fail a build due to the GCOV job
It fails too often due to installing gcc-7.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Andrei Vagin
3e66125e9b [v2] criu: add -fprofile-update=atomic for builds with gcov
Sometimes we see errors like this:
criu/cr-restore.gcda:Merge mismatch for function 106

It probably means that this gcda file was corrupted. According to the
gcc man page, the -fprofile-update=atomic option should fix this problem.

v2: this option appeared in gcc 7, so we need to install it.

Reported-by: Mr Travis CI
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
8d91b2e3fc lazy-pages: make uffd_io_complete more robust
Make sure we handle various corner cases:
* we received fewer pages than requested
* the request was capped because of unmap/remap etc
* the process has exited underneath us

Currently we are freeing the request once we've found the address to use
with uffd_copy(). Instead, let's keep the request object around, use it to
properly calculate the number of pages we pass to uffd_copy(), and then
re-add the trailing range (if any) to the IOVs list.
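
The bookkeeping for the first two corner cases can be sketched in Python as follows. This is a simplified model, not the uffd_io_complete code: `complete_request`, its parameters, and the fixed `PAGE_SIZE` are assumptions made for illustration, and the "process has exited" case is left out.

```python
PAGE_SIZE = 4096  # assumed page size


def complete_request(req_start, req_end, received_bytes, mapped_end):
    """Return (bytes_to_copy, tail_range) for a request [req_start, req_end).

    received_bytes: how much data actually arrived (may be less than requested).
    mapped_end: end of the part of the range that is still mapped (may be
                below req_end after an unmap or remap).
    tail_range is the leftover (start, end) to put back on the IOV list, or
    None when nothing is left to fetch.
    """
    usable_end = min(req_end, mapped_end)
    if usable_end <= req_start:
        return 0, None  # the whole range is gone; nothing to copy or re-queue

    copy_len = min(received_bytes, usable_end - req_start)
    copy_len -= copy_len % PAGE_SIZE  # copy whole pages only

    tail_start = req_start + copy_len
    tail = (tail_start, usable_end) if tail_start < usable_end else None
    return copy_len, tail


# Two of four requested pages arrived: copy two, re-queue the remaining two.
print(complete_request(0x1000, 0x5000, 0x2000, 0x5000))
```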

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
7e10b43b4f lazy-pages: factor out insertion to sorted IOV list
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
95ea4e92a8 lazy-pages: fork: fix duplication of IOV lists
Instead of merging unfinished requests with child's IOVs we queued them
into parent's IOV list. Fix it.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
7ffaf883bd lazy-pages: actually return to epoll_wait after completing forks
Commit 9cb20327aa4 ("return to epoll_wait after completing forks") was only
half way there. Adding the other half.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
fb7873c97c lazy-pages: don't try to uffd_copy to removed memory regions
It is possible that when page requests from the remote source arrive, part
of the memory range covered by the request would be already gone because of
madvise(MADV_DONTNEED), mremap() etc.
Ensure we are not trying to uffd_copy more than we are allowed.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
312e97f1c2 lazy-pages: return to epoll_wait after completing forks
If we get a fork() event just before transferring the last IOV of the parent
process, continuing to the background fetch after completing fork event
handling will cause the lazy-pages daemon to exit, and nothing will monitor
the child process memory.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
b6917ef4ca lazy-pages: update events handling to take requests into account
Since the memory mapping is now split between ->iovs and ->reqs lists, any
update to memory layout should take into account both lists.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:32 +03:00
Mike Rapoport
93f3fa3484 lazy-pages: cache buffer size in the lazy_pages_info
Instead of recalculating the size required for lazy_pages_info->buf when copying
IOVs at fork() time, keep the size of the buffer in the lazy_pages_info
struct.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
2ad16d4c50 lazy-pages: handle_requests: fix return value propagation
When we return from epoll_run_rfds with a positive return value, it means that
the event handling loop was interrupted because the event should be handled
outside of that loop. This is always the case with UFFD_EVENT_FORK.

It may happen that the event occurred after we've completed the memory
transfer and we are on the way to successful return from the
handle_requests() function, but instead of returning 0 we will return the
positive value we've got from epoll_run_rfds.

Explicitly assigning return value of complete_forks() fixes this issue.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
315c4418fe lazy-pages: merge_iov_lists: fix corner case of empty destination
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
82b7e843e1 lazy-pages: introduce merge_iov_lists helper
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
bb2af52190 test: lazy-pages: exclude maps007
With userfaultfd we cannot reliably service process_vm_readv calls. The
maps007 test that uses these calls passed previously by sheer luck.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
13f955cdf9 lazy-pages: kill POLL_TIMEOUT
In the current model we do not start the background page transfer until
POLL_TIMEOUT has elapsed since the last uffd or socket event. If the
restored process accesses memory once every (POLL_TIMEOUT - epsilon),
filling its memory can take ages.

This patch changes the model in the following way:
* poll for the events indefinitely until the restore is complete
* the restore completion event resets the poll timeout to zero and
  starts the background transfers
* after each transfer we return to check if there are any uffd events to
  handle

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
333005426f lazy-pages: add ability to limit background transfer size
Currently, once we get to transfer pages in the "background", we try to
fetch the entire IOV at once. For large IOVs this may impact #PF latency
for #PF events that occur during the transfer.

Let's add a simple heuristic for controlling size of the background
transfers. Initially, the transfer will be limited to some default value.
Every time we transfer a chunk we increase the transfer size until it
reaches a pre-defined maximal size. A page fault event resets the
background transfer size to its initial value.
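
The heuristic can be sketched in Python as shown below. The class name, the default sizes, and the doubling growth policy are all assumptions made for illustration; the message only states that the size grows after each chunk, is capped at a maximum, and is reset by a page fault.

```python
class BackgroundXfer:
    """Tracks the size limit for background page transfers."""

    def __init__(self, initial=64 * 1024, maximum=4 * 1024 * 1024):
        # Both defaults are illustrative assumptions, not values from CRIU.
        self.initial = initial
        self.maximum = maximum
        self.limit = initial

    def next_chunk(self, remaining):
        """Return the size of the next chunk to fetch, then grow the limit."""
        chunk = min(remaining, self.limit)
        # Assumed growth policy: double after every transferred chunk.
        self.limit = min(self.limit * 2, self.maximum)
        return chunk

    def on_page_fault(self):
        """A page fault resets the limit so later faults are not stuck behind a large transfer."""
        self.limit = self.initial


xfer = BackgroundXfer(initial=4, maximum=16)
sizes = [xfer.next_chunk(100) for _ in range(4)]  # grows, then stays at the cap
xfer.on_page_fault()
after_fault = xfer.next_chunk(100)  # back to the initial size
```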

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
f68dcdde83 lazy-pages: make complete_forks more robust
The complete_forks function presumes that it always has work to do
because we assume that a fork event is the only case when we drop out of
epoll_run_rfds with a positive return value.

Teach complete_forks to bail out when there are no pending forks to process,
to allow exiting epoll_run_rfds for different reasons.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
b11f579626 lazy-pages: simplify background transfer logic
First check if there are pages we need to transfer and only afterwards
check if there are outstanding requests. Also, instead of checking 'bool
remaining' to see if there is more work to do, we can simply check if all
the lpi's have already been serviced.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
71a3f9aaee lazy-pages: rename handle_remaining_pages to xfer_pages
The intention is to use this function for transferring all the pages that
didn't cause a #PF.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
cc5aa76438 lazy-pages: rename first_pending_iov to pick_next_range
The function picks the next page range to transfer anyway; it just does so
in a very simple FIFO manner.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
cbee7c04e7 lazy-pages: rework requests queueing
We already have a queue for the requested memory ranges which contains
'lp_req' objects. These objects hold the same information as the lazy_iov:
start address of the range, end address and the address that the range had
at the dump time.

Rather than keep this information twice and use double bookkeeping, we can
extract the requested range from lpi->iovs and move it to lpi->reqs.
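
Moving a requested range from one list to the other can be sketched in Python as follows. The tuple layout and the `extract_range` helper are hypothetical; the sketch only models the idea of splitting the covering range and handing the requested part to the request list.

```python
def extract_range(iovs, reqs, start, end):
    """Move [start, end) out of the sorted, non-overlapping `iovs` list into `reqs`.

    Ranges are (start, end, img_start) tuples, where img_start is the address
    the range had at dump time. Returns True if one IOV covered the whole range.
    """
    for i, (iov_start, iov_end, img_start) in enumerate(iovs):
        if iov_start <= start and end <= iov_end:
            pieces = []
            if iov_start < start:  # part below the request stays pending
                pieces.append((iov_start, start, img_start))
            if end < iov_end:  # part above the request stays pending
                pieces.append((end, iov_end, img_start + (end - iov_start)))
            iovs[i:i + 1] = pieces
            reqs.append((start, end, img_start + (start - iov_start)))
            return True
    return False


iovs = [(0x1000, 0x5000, 0x9000)]
reqs = []
found = extract_range(iovs, reqs, 0x2000, 0x3000)
```

Because each range lives in exactly one list at a time, there is no second copy to keep in sync.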

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
971e395e67 lazy-pages: rename iov->*base to iov->*start
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
8e2f957468 lazy-pages: lazy_iov: use end instead of len
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
eff067a33b lazy-pages: split_iov: always create the new iov above the one being split
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Mike Rapoport
6115934bce lazy-pages: explicitly set process exited condition
Instead of relying on the length of various lists, add a boolean variable to
lazy_pages_info to make it clear when the process has exited.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Andrey Vagin
6a66b87e12 zdtm: check an exit code of a straced restore
Currently zdtm doesn't detect when restore failed, if it is executed
with strace. With this patch, fake-restore.sh creates a test file, and
zdtm is able to distinguish when restore failed.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Andrei Vagin
4960d44129 zdtm.py: fix a logic about determing a test flavor in a error case
The get() method requires a key and now we are using an index. That
will never work correctly as it is now.
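
The difference between a key and an index can be shown with a small Python example. The `flavors` table below is hypothetical and is not the table used in zdtm.py.

```python
# Hypothetical mapping from flavor name to a description; not zdtm.py's real table.
flavors = {"h": "host", "ns": "namespace", "uns": "user namespace"}

names = list(flavors)  # keys in insertion order: ["h", "ns", "uns"]
index = 1

# get() looks up a key; an integer index is just a missing key, so this is None.
by_index = flavors.get(index)

# Translating the index to its key first gives the intended entry.
by_key = flavors.get(names[index])
```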

Acked-by: Adrian Reber <adrian@lisas.de>
Reported-by: Adrian Reber <adrian@lisas.de>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Andrey Vagin
faf4f72a1f unix: split dump_external_sockets() for readability
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:31 +03:00
Andrey Vagin
1b52bb436e unix: fix an error code in bind_unix_sk()
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:30 +03:00
Andrey Vagin
c1ad0f8f6d unit: don't check ui->ue->name.len twice in bind_unix_sk()
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:30 +03:00
Andrey Vagin
3347c6efec unix: split bind_unix_sk() for readability
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:30 +03:00
Andrey Vagin
019ebec03e unix: restore sockets on correct mount points
Currently we restore all sockets in the root mount namespace, because we
were not able to get any information about a mount point where a socket
is bound. It is obviously incorrect in some cases.

In 4.10 kernel, we added the SIOCUNIXFILE ioctl for unix sockets.  This
ioctl opens a file to which a socket is bound and returns a file
descriptor.

This new ioctl allows us to get mnt_id by reading fdinfo, and mnt_id
is enough to find a proper mount point and a mount namespace.

The logic of this patch is straightforward. On dump, we save mnt_id for
sockets, on restore we find a mount namespace by mnt_id and restore this
socket in its mount namespace.
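
The dump and restore steps can be sketched in Python as shown below. The sketch parses mnt_id from fdinfo-style text and looks it up in a table; `find_mntns` and the `mounts` table are hypothetical, and the SIOCUNIXFILE call itself is not shown.

```python
def parse_mnt_id(fdinfo_text):
    """Extract mnt_id from the contents of /proc/<pid>/fdinfo/<fd>."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "mnt_id":
            return int(value)
    return None


def find_mntns(mnt_id, mounts_by_ns):
    """Return the id of the mount namespace whose mount list contains mnt_id.

    mounts_by_ns is a hypothetical {namespace id: set of mnt_ids} table built
    from the dumped mount information.
    """
    for ns_id, mnt_ids in mounts_by_ns.items():
        if mnt_id in mnt_ids:
            return ns_id
    return None


# Sample fdinfo text in the usual "key:<tab>value" layout.
sample = "pos:\t0\nflags:\t010000002\nmnt_id:\t42\n"
mounts = {1: {20, 21}, 2: {42, 43}}
ns_for_socket = find_mntns(parse_mnt_id(sample), mounts)
```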

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:30 +03:00
Andrey Vagin
6d785e6cdd unix: resolve a socket file when a socket descriptor is available
unix_process_name() is called when sockets are being collected,
but at this moment we don't have socket descriptors.

A socket descriptor is required to get mnt_id, which allows resolving
a socket path in its mount namespace.

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:30 +03:00
Andrey Vagin
0286752b45 kerndat: check the SIOCUNIXFILE ioctl for unix sockets
This ioctl opens a file to which a socket is bound and
returns a file descriptor. This file descriptor can be used to get
mnt_id and a file path.

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:45:30 +03:00
Andrey Vagin
8ebf1c48f8 unix: handle sockets with USK_CALLBACK as external sockets
The USK_CALLBACK flag means that a socket is external and will be
restored by a plugin. open_unixsk_standalone should not be called for
these sockets.

$ make -C test/others/unix-callback/ run
...
(00.109338)   7471: sk unix: Opening standalone socket (id 0xd ino 0 peer 0x63b)
(00.109376)   7471: Error (criu/sk-unix.c:1128): sk unix: BUG at criu/sk-unix.c:1128

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 11:44:33 +03:00