Now these two look exactly the same and we can have
only one call with additional sync/async (flags) arg.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The newly introduced sync-read call may look exactly
the same as its async pair by using the respective
complete callback.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There's no need for two API calls to read the xfer header
and the pages themselves, so merge them into a single
call.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
* drop --keep-going etc from --lazy-pages pass
* add --remote-lazy-pages pass
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
* select excluded tests based on the kernel version
* test local and remote lazy-pages with and without pre-dump
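The kernel-based selection in the first item could look roughly like the
sketch below. It is an illustration only: the helper names, the placeholder
test names and the (4, 11) threshold are assumptions, not the contents of the
actual test script.

```python
import re

def kernel_version(release):
    """Parse (major, minor) out of a release string such as '4.11.0-rc5'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))

# Hypothetical exclusion lists; the names are placeholders, not real tests.
EXCLUDE_ALWAYS = ["placeholder_test_a"]
EXCLUDE_OLD_KERNEL = ["placeholder_test_b", "placeholder_test_c"]

def excluded_tests(release, threshold=(4, 11)):
    """Return the tests to skip for a lazy-pages pass on the given kernel."""
    excluded = list(EXCLUDE_ALWAYS)
    if kernel_version(release) < threshold:
        # Older kernels get the longer exclusion list.
        excluded += EXCLUDE_OLD_KERNEL
    return excluded
```

A caller would typically pass platform.release() as the release string.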
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The page-read for child process is a shallow copy of the parent process
page-read. They share the open file descriptors and the pagemap.
The lpi_fini of a child process should not release any resources; they
will all be released during lpi_fini of the parent process.
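The ownership rule can be modelled with a toy class. This is a sketch of the
idea only: PageRead, dup() and fini() here are hypothetical stand-ins and do
not mirror the real structures or lpi_fini.

```python
import copy

class PageRead:
    """Toy stand-in for a page-read object (not the real structure)."""

    def __init__(self, fds, pagemap):
        self.fds = fds          # models the shared open file descriptors
        self.pagemap = pagemap  # models the shared pagemap
        self.parent = None      # set only on shallow copies

    def dup(self):
        child = copy.copy(self)  # shallow copy: fds and pagemap are shared
        child.parent = self
        return child

    def fini(self):
        # Only the original owner releases the shared resources.
        if self.parent is None:
            self.fds.clear()
```

Finalizing a copy leaves the parent's descriptors untouched; finalizing the
parent releases them for everyone.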
Fixes: #325
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
For the remote lazy pages case, to access pages in the middle of a pipe we
split the page_pipe_buffers and iovecs and use splice() to move the data
between the underlying pipes. After the splits we get a page_pipe_buffer
with a single iovec that can be used to splice() the data further into the
socket.
This patch replaces the splitting and splicing with use of a helper pipe
and tee(). We tee() the pages from beginning of the pipe up to the last
requested page into a helper pipe, sink the unneeded head part into
/dev/null and we get the requested pages ready for splice() into the
socket.
This allows the lazy-pages daemon to request the same page several times,
which is required to properly support fork() after the restore.
As an added bonus, we simplify the code and reduce the number of pipes that
live in the system.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Until now, once we started to fetch an iovec we waited until it was
completely copied before returning to the event processing loop. Now we
can have several requests for remote pages in flight.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There could be several outstanding requests for the same page, either from
the page fault handler or from handle_remaining_pages. Verifying that the
faulting address is already requested is not enough. We need to check if
there is any request in flight that covers the faulting address.
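The difference between an exact-address match and a coverage check can be
shown with a small sketch. The (start, npages) tuple layout, the helper name
and the 4096-byte page size are assumptions made for the illustration.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration

def find_covering_request(in_flight, addr):
    """Return the in-flight request whose range contains addr, or None.

    in_flight is a list of (start, npages) tuples.
    """
    for start, npages in in_flight:
        if start <= addr < start + npages * PAGE_SIZE:
            return (start, npages)
    return None
```

A request for four pages at 0x10000 covers a fault at 0x12000 even though no
request starts at that address, which an equality check would miss.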
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
v2: When uffd is present, the reported features may still be 0,
so we need one more bool for uffd syscall itself.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
They still will fail with --remote-lazy-pages, so mark them as
'noremotelazy'
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This allows skipping tests that are not yet run with --remote-lazy-pages,
but can be run with --lazy-pages
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When running with --lazy-pages or --remote-lazy-pages, the daemons should
run in the background, rather than complete before t.stop() is called.
Many tests try to verify things are ok after test_waitsig() and that's
exactly the place where they access memory and cause page faults.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Most zdtm tests should pass with --lazy-pages on kernels newer than
4.11.
Some tests excluded for older kernels surprisingly pass even now, mainly
because they do not actually stress userfaultfd, which will be fixed in the
upcoming commits :)
The cmdlinenv00 test fails even with kernel 4.11 because of a race between
uffd and gup in case an external process reads /proc/<pid>/cmdline before
the memory containing the command line is populated.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is the version from v4.11-rc5. Apparently, that will be the userfault
ABI for the next few months.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The UFFDIO_EVENT_EXIT didn't make it upstream because of possible races in
exit() syscall [1].
The only way to detect that the monitored process has exited is to check
for the ENOSPC errno value set by uffdio_copy.
[1] http://www.spinics.net/lists/linux-mm/msg122467.html
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We only use one epoll instance to manage lazy-pages related I/O. Making
epollfd file-visible will allow cleaner implementation of the restored
process exit() calls tracking.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Both lazy_iov and lp_req have two fields for address/start: the run-time
address that tracks remaps, and the "dump time" address, which is required
for pagemap accesses.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The lazy-pages daemon has to properly track changes to the virtual memory
layout of the restored process. The test verifies that the lazy-pages daemon
properly reacts to fork(), exit(), madvise(MADV_DONTNEED) and mremap()
events.
Currently, no zdtm tests would generate UFFD_EVENT_{REMAP,REMOVE}.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The dup_page_read performs a shallow copy of a page_read object. It is
required for implementation of fork event in lazy-pages daemon.
When a restored process fork()'s a child, the lazy-pages daemon will handle
page faults of the child process, and it will use the parent process memory
dump for that.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Replace "pr<id>" with "pr<pid>-<id>" when printing information about a
particular page-read.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When the restored process calls mremap(), we will see #PF's on the new
addresses and we have to create a correspondence between the addresses
found in the dump and the actual addresses the process uses. For this
purpose we distinguish "live" address and "image" address in the lazy IOVs
and outstanding requests. The "live" address is used to find the
appropriate IOV and in uffd_copy and the "image" address is used to request
pages from the page-read.
If the mremap() call causes the mapping to grow, the additional part will
receive zero pages, as expected.
For the shrinking remaps, we will get UFFD_EVENT_UNMAP for the dropped part
of the mapping.
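The live/image split can be sketched as below. The LazyIov fields and both
helpers are hypothetical names, and the remap helper assumes for simplicity
that the moved range consists of whole IOVs.

```python
from dataclasses import dataclass

@dataclass
class LazyIov:
    start: int      # "live" address, follows mremap()
    img_start: int  # "image" address, fixed at dump time
    length: int

def remap(iovs, old, new, length):
    """Shift the live address of every IOV fully inside [old, old + length)."""
    for iov in iovs:
        if old <= iov.start and iov.start + iov.length <= old + length:
            iov.start += new - old

def image_address(iovs, live_addr):
    """Translate a faulting live address to the address used in the dump."""
    for iov in iovs:
        if iov.start <= live_addr < iov.start + iov.length:
            return iov.img_start + (live_addr - iov.start)
    return None  # no IOV covers it, e.g. the grown part of a mapping
```

After remap(), faults are looked up by the new live address while the page
is still requested by its unchanged image address.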
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The UNMAP event is generated by userfaultfd when a VMA (or a part of it) is
unmapped from the process address space. We won't receive #PF's at the
unmapped range, but we need to make sure we are not trying to fill that
range at handle_remaining_pages.
Note that the VMA is gone, so there is no point in unregistering uffd from
it.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When the restored process calls madvise(MADV_DONTNEED) or
madvise(MADV_REMOVE) the memory range specified by the madvise() call
should be remapped to zero pfn and we should stop monitoring this range in
order to avoid its pollution with data the process does not expect.
All we need to do here, is to unregister the memory range from userfaultfd
and the kernel will take care of the rest.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is the version from linux-next at the moment.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
With address space manipulations, the number of pages that the lazy-pages
daemon will copy might differ from the number of pages we had in the dumps.
Disable the warning and error retval for now; we can restore the accounting
once uffd event handling stabilizes a bit.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The copied_pages and total_pages may be different because the process may
drop parts of its address space. And the IOVs list will be empty iff we are
done with the process.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently drop_lazy_iovs presumes that the range that should be dropped
starts inside an IOV. This works fine with page faults and background pages
but will fail for mapping changes.
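A range drop that does not rely on the range starting inside an IOV could be
sketched as follows. The tuple representation and the function name are
assumptions for the illustration, and image-address bookkeeping is left out.

```python
def drop_range(iovs, start, length):
    """Remove [start, start + length) from a list of (start, length) IOVs.

    The range may begin before, inside or between IOVs and may span several.
    """
    end = start + length
    kept = []
    for iov_start, iov_len in iovs:
        iov_end = iov_start + iov_len
        if iov_end <= start or end <= iov_start:
            kept.append((iov_start, iov_len))  # no overlap, keep as is
            continue
        if iov_start < start:
            kept.append((iov_start, start - iov_start))  # surviving head
        if end < iov_end:
            kept.append((end, iov_end - end))  # surviving tail
    return kept
```

An IOV that straddles the dropped range is split into a head and a tail;
one that lies entirely inside it disappears.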
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The function essentially drops a memory range from lazy IOVecs
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When printing a message about particular process events, always prefix it
with "<pid>-<uffd>" for better log readability
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently we use pagemap to check if we should copy a page into process
address space or zero it. The lazy_iov'ecs can be used instead. If a
lazy_iov covers the faulting address, we should go ahead and read the page
and copy it. If there is no lazy_iov for the faulting address, just zero
it immediately.
Searching lazy_iov's rather than pagemap will also simplify upcoming
handling of UFFD_EVENT_REMAP.
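The copy-or-zero decision can be sketched as a lookup in a sorted IOV list.
The function name, the returned strings and the (start, length) tuples are
assumptions for the illustration only.

```python
import bisect

def fault_action(iovs, addr):
    """Decide what to do for a fault at addr.

    iovs is a list of non-overlapping (start, length) tuples sorted by start.
    """
    # Index of the last IOV whose start is <= addr, or -1 if there is none.
    i = bisect.bisect_right(iovs, (addr, float("inf"))) - 1
    if i >= 0:
        start, length = iovs[i]
        if addr < start + length:
            return "copy"  # covered: read the page and copy it
    return "zero"          # not covered: zero the page immediately
```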
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Same meaning, less to type.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Multithreaded applications may concurrently generate page faults at the
same address, which will cause creation of multiple requests for remote
page, and, consequently, confuse the page server on the dump side.
We can keep track of page fault requests in flight and thereby ensure that
we request a page from the remote side only once.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The kernel does not really support any flags for the page fault message
anyway, and we've used '#if 0' to skip the flags processing. However, we
can just drop this chunk, as handling UFFD_WP will take more work than
simply removing '#if 0'.
travis-ci: success for lazy-pages: add non-#PF events handling (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It works faster and allows checking exit codes.
travis-ci: success for series starting with [1/2] page-server: don't return a server pid from page-server
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for crtools: close a signal descriptor after passing a preparation stage (rev6)
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The introduction of page-server send mode has broken the lazy dump because
instead of using the existing pstree, the page server now tries to recreate
the pstree from the images.
Adding lazy_dump parameter to cr_page_server resolves this issue.
travis-ci: success for lazy-pages: fix lazy dump
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To use lazy-pages from runc, the '--lazy-pages' functionality needs to be
accessible via RPC. This enables lazy-pages via RPC.
The information on which port to listen is taken from the
criu_page_server_info protobuf structure. If the user has enabled
lazy-pages via RPC only criu_page_server_info.port is evaluated
to get the listen port.
With additional patches in runc it is possible to use lazy-restore
with 'runc checkpoint' and 'runc restore'.
travis-ci: success for lazy-pages: enable lazy-pages via RPC
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Rename handle_user_fault to handle_uffd_event and split actual page fault
handling to a helper function
travis-ci: success for lazy-pages: add non-#PF events handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If an event handler returns a positive value, the event polling and
handling loop is interrupted after all the pending events indicated by
epoll_wait are processed.
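The loop behaviour can be sketched as below. The wait callable stands in for
epoll_wait, and treating a negative return as an immediate error is an
assumption of this sketch rather than something stated above.

```python
def run_event_loop(wait, handlers):
    """Poll and dispatch events until a handler asks to stop.

    wait() returns a list of ready event keys (a stand-in for epoll_wait);
    handlers maps each key to a callable returning <0, 0 or >0.
    """
    while True:
        stop = False
        for key in wait():
            ret = handlers[key]()
            if ret < 0:
                return ret   # assumed: errors abort right away
            if ret > 0:
                stop = True  # remember it, but finish the current batch
        if stop:
            return 0
```

A positive return does not cut the current batch short; the loop only exits
once every event reported by that wait() call has been handled.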
travis-ci: success for lazy-pages: add non-#PF events handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>