We shouldn't set MAKEFLAGS, for the following reasons:
1. The user may want to pass extra make options (e.g., `-d` for debug).
2. We lose parallel builds. No `-j` is passed to the submake, and it
looks like GNU make will not handle parallel recursive make once
$(MAKEFLAGS) has been unset.
Easy to verify: add `sleep 3` to the build rule in Makefile.inc and
you'll find only one sleep process at a time. After the patch, if you
pass, say, `-j5` to make, you'll have 5 sleep processes.
Reverts: commit e9beed7bb3f3 ("build: zdtm -- Add implicit rules into
zdtm building").
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Let's drop the usage of COMPILE.c and OUTPUT_OPTION.
This allows running the submake with -R.
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
$(MAKEFLAGS) already contains -r, -R and --no-print-directory: those
flags are added in include.mk, which is included two lines above.
There is no comment, and I see little sense in erasing $(MAKEFLAGS)
rather than adding those flags, so I treated this as a typo.
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Service descriptors can be moved in a child process.
v2: handle errors of install_service_fd() properly
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Include deps files to recompile tests when a dependency has changed.
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
man 2 futex:
In the event of an error (and assuming that futex() was invoked via
syscall(2)), all operations return -1 and set errno to indicate the
cause of the error.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Though LOG_FD_OFF < IMG_FD_OFF, get_service_fd(LOG_FD_OFF) is greater
than get_service_fd(IMG_FD_OFF) (see __get_service_fd), so the check
here should be inverted. Also add a BUG_ON to catch a possible change in
__get_service_fd that would break this check again.
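The inverted ordering can be illustrated with a toy model. This is only
a sketch: it assumes a service fd is computed as a base minus the
offset, and the base value, the enum values and the helper name are
hypothetical. The real __get_service_fd may compute it differently.

```python
# Hypothetical model: service fds are allocated downwards from a base,
# so a smaller offset maps to a larger fd number.
SERVICE_FD_BASE = 1024      # assumed value, for illustration only
LOG_FD_OFF = 1              # assumed enum values with LOG_FD_OFF < IMG_FD_OFF
IMG_FD_OFF = 2


def get_service_fd_model(offset, base=SERVICE_FD_BASE):
    """Return the fd number for a service fd offset in this toy model."""
    return base - offset


log_fd = get_service_fd_model(LOG_FD_OFF)
img_fd = get_service_fd_model(IMG_FD_OFF)

# The offsets and the resulting fds are ordered in opposite directions,
# so a range check written in terms of offsets must be flipped for fds.
offsets_ascending = LOG_FD_OFF < IMG_FD_OFF
fds_descending = log_fd > img_fd
```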
We have a problem when USERNSD_SK replaces LOG_FD_OFF: later, when
writing to the log, we actually send garbage commands to usernsd,
which fails to handle them and BUGs or crashes.
https://jira.sw.ru/browse/PSBM-83472
We also had a similar problem when __userns_call received a bad
response; it likely has the same cause:
https://api.travis-ci.org/v3/job/352164661/log.txt
fixes commit 129bb14611c3 ("files: Prepare clone_service_fd() for
overlaping ranges.")
v2: move BUG_ON to main() to check it only once, use min+1 and max-1
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Unnamed temporary files are restored as ghost files.
If O_TMPFILE is set for the open() syscall, the pathname argument
specifies a directory, but criu gives a path to a ghost file.
(00.107450) 36: Error (criu/files-reg.c:1757): Can't open file tmp/#42274874 on restore: Not a directory
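The failure mode can be reproduced outside of criu with a short
sketch. It assumes Linux and a Python build that exposes os.O_TMPFILE;
the helper name and the "ghost" file name are hypothetical. Passing a
regular file is expected to fail (ENOTDIR on Linux, matching the log
above), while passing the directory is the intended usage.

```python
import os
import tempfile


def try_open_tmpfile(path):
    """Try open(path, O_TMPFILE | O_RDWR); return 0 or the errno on failure."""
    try:
        fd = os.open(path, os.O_TMPFILE | os.O_RDWR, 0o600)
    except OSError as e:
        return e.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as d:
    regular = os.path.join(d, "ghost")   # stands in for a ghost file path
    with open(regular, "w"):
        pass
    # A regular file is not a valid O_TMPFILE target, so this must fail.
    file_result = try_open_tmpfile(regular)
    # The containing directory is the expected argument. This can still
    # fail on filesystems without O_TMPFILE support.
    dir_result = try_open_tmpfile(d)
```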
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
man 2 open:
...
O_TMPFILE (since Linux 3.11)
Create an unnamed temporary file. The pathname argument specifies a
directory; an unnamed inode will be created in that directory's
filesystem. Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.
...
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Sometimes we see errors like this:
criu/cr-restore.gcda:Merge mismatch for function 106
It probably means that this gcda file was corrupted. According to the
gcc man page, -fprofile-update=atomic should fix this problem.
v2: this option appeared in gcc 7, so we need to install it.
Reported-by: Mr Travis CI
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make sure we handle various corner cases:
* we received less pages than requested
* the request was capped because of unmap/remap etc
* the process has exited underneath us
Currently we free the request once we've found the address to use with
uffd_copy(). Instead, let's keep the request object around, use it to
properly calculate the number of pages we pass to uffd_copy(), and then
re-add the trailing range (if any) to the IOVs list.
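The bookkeeping described above can be sketched as follows. This is an
illustration only: the function name, the tuple representation of
ranges and the 4096-byte page size are assumptions, not the actual
lazy-pages code.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def split_request(req_start, req_end, received_pages):
    """Split a request [req_start, req_end) after fewer pages arrived.

    Returns (copied, remainder): the range actually covered by the
    received pages and the trailing range to put back on the IOV list
    (None if the whole request was satisfied).
    """
    requested_pages = (req_end - req_start) // PAGE_SIZE
    nr_pages = min(received_pages, requested_pages)
    copied_end = req_start + nr_pages * PAGE_SIZE
    copied = (req_start, copied_end)
    remainder = (copied_end, req_end) if copied_end < req_end else None
    return copied, remainder


# Four pages were requested but only three arrived.
copied, remainder = split_request(0x10000, 0x14000, received_pages=3)
```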
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Instead of merging unfinished requests with the child's IOVs we queued
them into the parent's IOV list. Fix it.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Commit 9cb20327aa4 ("return to epoll_wait after completing forks") was only
half way there. Adding the other half.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It is possible that when a page request from the remote source arrives,
part of the memory range covered by the request is already gone because
of madvise(MADV_DONTNEED), mremap() etc.
Ensure we are not trying to uffd_copy more than we are allowed.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If we get a fork() event just before transferring the last IOV of the
parent process, continuing the background fetch after completing fork
event handling will cause the lazy-pages daemon to exit, and nothing
will monitor the child process memory.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since the memory mapping is now split between ->iovs and ->reqs lists, any
update to memory layout should take into account both lists.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Instead of recalculating the size required for lazy_pages_info->buf when
copying IOVs at fork() time, keep the size of the buffer in the
lazy_pages_info struct.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When we return from epoll_run_rfds with a positive return value, it
means that the event handling loop was interrupted because the event
should be handled outside of that loop. This is always the case with
UFFD_EVENT_FORK.
It may happen that the event occurred after we've completed the memory
transfer and we are on the way to successful return from the
handle_requests() function, but instead of returning 0 we will return the
positive value we've got from epoll_run_rfds.
Explicitly assigning return value of complete_forks() fixes this issue.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
With userfaultfd we cannot reliably service process_vm_readv calls. The
maps007 test that uses these calls passed previously by sheer luck.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the current model we don't start the background page transfer until
POLL_TIMEOUT has elapsed since the last uffd or socket event. If the
restored process accesses memory once every (POLL_TIMEOUT - epsilon),
filling its memory can take ages.
This patch changes the model in the following way:
* poll for events indefinitely until the restore is complete
* the restore completion event resets the poll timeout to zero and
  starts the background transfers
* after each transfer we return to check if there are any uffd events to
  handle
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, once we get to transfer pages in the "background", we try to
fetch the entire IOV at once. For large IOVs this may impact #PF latency
for #PF events that occur during the transfer.
Let's add a simple heuristic for controlling size of the background
transfers. Initially, the transfer will be limited to some default value.
Every time we transfer a chunk we increase the transfer size until it
reaches a pre-defined maximal size. A page fault event resets the
background transfer size to its initial value.
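A toy model of this heuristic is sketched below. The class name, the
default and maximal sizes, and the doubling step are assumptions made
for illustration; the text above only says that the size increases
until it reaches a maximum and is reset by a page fault.

```python
class BackgroundXfer:
    """Toy model of the background transfer size heuristic."""

    def __init__(self, default_len, max_len):
        self.default_len = default_len
        self.max_len = max_len
        self.xfer_len = default_len

    def next_chunk(self, remaining):
        """Return the size of the next chunk, then grow the limit."""
        chunk = min(self.xfer_len, remaining)
        # Doubling is an assumption; any monotonic growth would do.
        self.xfer_len = min(self.xfer_len * 2, self.max_len)
        return chunk

    def on_page_fault(self):
        """A page fault resets the transfer size to its initial value."""
        self.xfer_len = self.default_len


xfer = BackgroundXfer(default_len=64, max_len=256)
sizes = [xfer.next_chunk(10_000) for _ in range(4)]
xfer.on_page_fault()
after_fault = xfer.next_chunk(10_000)
```

Starting small keeps the first background chunks short, so a #PF that
arrives shortly after the restore does not wait behind a long transfer.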
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The complete_forks function presumes that it always has work to do
because we assume that a fork event is the only case when we drop out of
epoll_run_rfds with a positive return value.
Teach complete_forks to bail out when there are no pending forks to
process, to allow exiting epoll_run_rfds for different reasons.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
First check if there are pages we need to transfer and only afterwards
check if there are outstanding requests. Also, instead of checking 'bool
remaining' to see if there is more work to do, we can simply check
whether all the lpi's have already been serviced.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The intention is to use this function for transferring all the pages that
didn't cause a #PF.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The function picks the next page range to transfer anyway; it just does
so in a very simple FIFO manner.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We already have a queue for the requested memory ranges which contains
'lp_req' objects. These objects hold the same information as the lazy_iov:
start address of the range, end address and the address that the range had
at the dump time.
Rather than keep this information twice and use double bookkeeping, we can
extract the requested range from lpi->iovs and move it to lpi->reqs.
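Moving a requested range from one list to the other might look like the
sketch below. It is not the actual implementation: the function name and
the (start, end, img_start) tuple layout are hypothetical, and the
requested range is assumed to lie within a single IOV.

```python
def extract_range(iovs, reqs, start, end):
    """Move [start, end) from the iovs list to the reqs list.

    Each entry is a (start, end, img_start) tuple, where img_start is the
    address the range had at dump time. Returns True if the range was
    found inside one entry of iovs and moved, False otherwise.
    """
    for i, (s, e, img) in enumerate(iovs):
        if s <= start and end <= e:
            pieces = []
            if s < start:                      # part before the request
                pieces.append((s, start, img))
            if end < e:                        # part after the request
                pieces.append((end, e, img + (end - s)))
            iovs[i:i + 1] = pieces             # replace entry with leftovers
            reqs.append((start, end, img + (start - s)))
            return True
    return False


iovs = [(0x1000, 0x5000, 0x7000)]
reqs = []
found = extract_range(iovs, reqs, 0x2000, 0x3000)
```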
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Instead of relying on the length of various lists, add a boolean
variable to lazy_pages_info to make it clear when the process has exited.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently zdtm doesn't detect a failed restore if it is executed with
strace. With this patch, fake-restore.sh creates a test file, and zdtm
is able to tell when the restore failed.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The get() method requires a key and now we are using an index. That
will never work correctly as it is now.
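The difference is easy to see on a plain Python dict. That the object
in question behaves like a dict is an assumption here, and the keys
below are made up for illustration.

```python
# Hypothetical mapping, for illustration only.
flags = {"strace": True, "fault": None}

by_key = flags.get("strace")   # looks up the key "strace"
by_index = flags.get(0)        # 0 is not a key, so the default (None) is returned
```

Note that get() does not raise on a missing key, so a lookup by index
fails silently rather than with an exception.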
Acked-by: Adrian Reber <adrian@lisas.de>
Reported-by: Adrian Reber <adrian@lisas.de>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we restore all sockets in the root mount namespace, because we
were not able to get any information about a mount point where a socket
is bound. It is obviously incorrect in some cases.
In 4.10 kernel, we added the SIOCUNIXFILE ioctl for unix sockets. This
ioctl opens a file to which a socket is bound and returns a file
descriptor.
This new ioctl allows us to get mnt_id by reading fdinfo, and mnt_id
is enough to find a proper mount point and a mount namespace.
The logic of this patch is straightforward. On dump, we save mnt_id for
sockets, on restore we find a mount namespace by mnt_id and restore this
socket in its mount namespace.
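Reading mnt_id from fdinfo amounts to parsing a "mnt_id:" line. The
sketch below shows only the parsing step on sample text; the helper name
and the sample values are assumptions, and the SIOCUNIXFILE call itself
is not shown.

```python
def parse_mnt_id(fdinfo_text):
    """Return the mnt_id value from fdinfo contents, or None if absent."""
    for line in fdinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "mnt_id":
            return int(value.strip())
    return None


# Sample contents in the style of /proc/<pid>/fdinfo/<fd>; values are made up.
sample = "pos:\t0\nflags:\t02000002\nmnt_id:\t29\n"
mnt_id = parse_mnt_id(sample)
```

With a real descriptor, the text would come from reading
/proc/self/fdinfo/<fd> for the fd returned by the ioctl.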
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
unix_process_name() is called when sockets are being collected,
but at that moment we don't have socket descriptors.
A socket descriptor is required to get mnt_id, which allows resolving
a socket path in its mount namespace.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This ioctl opens a file to which a socket is bound and
returns a file descriptor. This file descriptor can be used to get
mnt_id and a file path.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The USK_CALLBACK flag means that a socket is external and will be
restored by a plugin. open_unixsk_standalone should not be called for
these sockets.
$ make -C test/others/unix-callback/ run
...
(00.109338) 7471: sk unix: Opening standalone socket (id 0xd ino 0 peer 0x63b)
(00.109376) 7471: Error (criu/sk-unix.c:1128): sk unix: BUG at criu/sk-unix.c:1128
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>