This is the version from v4.11-rc5. Apparently, that would be the userfault
ABI for the next few month.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The UFFDIO_EVENT_EXIT didn't make it upstream because of possible races in
exit() syscall [1].
The only way to detect that the monitored process is exited is checking for
ENOSPC errno value set by uffdio_copy.
[1] http://www.spinics.net/lists/linux-mm/msg122467.html
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We only use one epoll instance to manage lazy-pages related I/O. Making
epollfd file-visible will allow cleaner implementation of the restored
process exit() calls tracking.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Both lazy_iov and lp_req have two fields for address/start: the run-time
address that tracks remaps, and the "dump time" address, which is required
for pagemap accesses.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The lazy-pages daemon have to properly track changes to virtual memory
layout of the restored process. The test verifies that lazy-pages daemon
properly reacts to fork(), exit(), madvise(MADV_DONTNEED) and mremap()
events.
Currently, no zdtm tests would generate UFFD_EVENT_{REMAP,REMOVE}.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The dup_page_read performs a shallow copy of a page_read object. It is
required for implementation of fork event in lazy-pages daemon.
When a restored process fork()'s a child, the lazy-pages daemon will handle
page faults of the child process, and it will use the parent process memory
dump for that.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Replace "pr<id>" with "pr<pid>-<id>" when printing information about a
particular page-read.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When the restored process calls mremap(), we will see #PF's on the new
addresses and we have to create a correspondence between the addresses
found in the dump and the actual addresses the process uses. For this
purpose we distinguish "live" address and "image" address in the lazy IOVs
and outstanding requests. The "live" address is used to find the
appropriate IOV and in uffd_copy and the "image" address is used to request
pages from the page-read.
If the mremap() call causes the mapping to grow, the additional part will
receive zero pages, as expected.
For the shrinking remaps, we will get UFFD_EVENT_UNMAP for the dropped part
of the mapping.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The UNMAP event is generated by userfaultfd when a VMA (or a part of it) is
unmapped from the process address space. We won't receive #PF's at the
unmapped range, but we need to make sure we are not trying to fill that
range at handle_remaining_pages.
Note, that the VMA is gone, so there is no sense to unregister uffd from
it.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When the restored process calls madvise(MADV_DONTNEED) or
madvise(MADV_REMOVE) the memory range specified by the madvise() call
should be remapped to zero pfn and we should stop monitoring this range in
order to avoid its pollution with data the process does not expect.
All we need to do here, is to unregister the memory range from userfaultfd
and the kernel will take care of the rest.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is the version from linux-next at the moment.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
With address space manipulations, amount of pages that the lazy-pages
daemon will copy might differ from amount of pages we had in the dumps.
Disable the warning and error retval for now; we can restore the accounting
once uffd event handling stabilizes a bit.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The copied_pages and total_pages may be different because the process may
drop parts of its address space. And the IOVs list will be empty iff we are
done with the process.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently drop_lazy_iovs presumes that the range that should be dropped
starts inside an IOV. This works fine with page faults and background pages
but will fail for mapping changes.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The function essentially drops a memory range from lazy IOVecs
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When printing a message about particular process events, always prefix it
with "<pid>-<uffd>" for better log readability
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently we use pagemap to check if we should copy a page into process
address space or zero it. The lazy_iov'ecs can be used instead. If a
lazy_iov covers the faulting address, we should go ahead and read the page
and copy it. If there is not lazy_iov for the faulting address, just zero
it immediately.
Searching lazy_iov's rather than pagemap will also simplify upcoming
handling of UFFD_EVENT_REMAP.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Same meaning, less to type.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Multithreaded applications may concurrently generate page faults at the
same address, which will cause creation of multiple requests for remote
page, and, consequently, confuse the page server on the dump side.
We can keep track on page fault requests in flight and ensure this way that
we request a page from the remote side only once.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The kernel anyways does not really supports any flags for page fault
message and we've used '#if 0' to skip the flags processing. However, we
can just drop this chunk as we anyway will have do some more work than
simply removing '#if 0' to handle UFFD_WP.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It works faster and allows to check exit codes.
travis-ci: success for series starting with [1/2] page-server: don't return a server pid from page-server
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for crtools: close a signal descriptor after passing a preparation stage (rev6)
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The introduction of page-server send mode have broken the lazy dump because
instead of using existing pstree, the page server now tries to recreate the
pstree from the images.
Adding lazy_dump parameter to cr_page_server resolves this issue.
travis-ci: success for lazy-pages: fix lazy dump
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To use lazy-pages from runc '--lazy-pages' functionality needs to be
accessible via RPC. This enables lazy-pages via RPC.
The information on which port to listen is taken from the
criu_page_server_info protobuf structure. If the user has enabled
lazy-pages via RPC only criu_page_server_info.port is evaluated
to get the listen port.
With additional patches in runc is it possible to use lazy-restore
with 'runc checkpoint' and 'runc restore'.
travis-ci: success for lazy-pages: enable lazy-pages via RPC
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Rename handle_user_fault to handle_uffd_event and split actual page fault
handling to a helper function
travis-ci: success for lazy-pages: add non-#PF events handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If an event handler returns a positive value, the event polling and
handling loop is interrupted after all the pending events indicated by
epoll_wait are processed.
travis-ci: success for lazy-pages: add non-#PF events handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The number of pending events returned by epoll_wait is overridden by the
first call to an event handler. Using an additional local variable resolves
this issue.
travis-ci: success for lazy-pages: add non-#PF events handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently we are waiting for lazy-pages daemon to finish as a part of
.restore method, which may cause filling test process memory before the
test process resumed it's execution after call to test_waitsig(). In such
case, no page faults occur, but rather all the memory is copied in
handle_remaining_pages method in uffd.c.
Let's move wait(<lazy-pages-pid>,..) after call to test.stop().
travis-ci: success for lazy-pages: add non-#PF events handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for revert zero pagemaps
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for revert zero pagemaps
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Note, that since zero pages stats never been into master we can make
incompatible changes to stats image.
travis-ci: success for revert zero pagemaps
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The pagemap entries for pages mapped to zero pfn proved to be not useful...
travis-ci: success for revert zero pagemaps
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
A page that explicitly mapped to zero pfn or a page that is not present
should be treated in the same way, therefore the zero pagemaps are not
required and will be removed by the following commits.
travis-ci: success for revert zero pagemaps
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If a page was not marked "present" at the dump time it will not be covered
by the pagemap and it will remain unmapped in the restored process. We
should uffdio_zero such pages and let kernel mm to take over.
travis-ci: success for revert zero pagemaps
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
CID 173076, issues/259
travis-ci: success for pagemap: verify the number of pages returned by receive_remote_pages_info
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This function does weird things, so better have it at least somehow
documented.
travis-ci: success for lazy-pages: add comments to update_lazy_iovecs
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Instead of checking for availability of userfaultfd late during the restore
process, make the detection of supported userfaultfd functionality part of
kerndat. As a bonus, I've extended criu check with ability to verify
presence of userfaultfd.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Use latest version from usefaultfd tree [1]. Judging by comments about the
last re-spin of userfaultfd updates, the API will go in "as is", so we can
pretty much rely on the current API definitions for proper detection of
supported userfaultfd features.
[1] https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Add pre-dump and remote-lazy-pages passes to criu-lazy-pages.sh
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently we poll userfaultfd for page faults and if there were no page
faults during 5 seconds we stop monitoring the userfaultfd and start
copying remaining pages chunk by chunk.
If a page fault occurs during the copy, the faulting process will be stuck
until the page it accessed would be copied to its address space.
This patch limits the initial "page fault only" stage to 1 second instead
of 5, and interleaves non-blocking poll of userfaultfd with copying of the
remaining memory afterwards.
travis-ci: success for lazy-pages: interleave #PF handling with transfers of remaining pages
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for lazy-pages: spelling: s/pagefalt/#PF
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
After commit a97d6d3f1609 (pagemap: replace seek_page with seek_pagemap
method), uffd only searches the pagemap containing the faulting page, but
it not for the page itself. For local restore it causes wrong data to be
read from pages*img and subsequent crash of the restored process.
Adding a call to pr->skip_pages fixes the problem.
travis-ci: success for lazy-pages: fix searching for the page at #PF time
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>