Lazy migration requires both dumped and restored processes to coexist at
the same time. This breaks some basic assumptions in the zdtm design.
Simulation of lazy migration with the page server allows testing most of
the involved code paths without major intervention into zdtm
infrastructure.
travis-ci: success for lazy-pages: improve testability (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The kernel support for lazy pages (userfaultfd) lacks many important
features which effectively prevents success in certain tests.
Allow skipping such test with somewhat informative message
travis-ci: success for lazy-pages: improve testability (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently, standalone page-server can only receive pages from the remote
dump. Extend it with the ability to serve local memory dump to a remote
lazy-pages daemon.
travis-ci: success for lazy-pages: improve testability (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Instead of crashing dump/page-server when a problem detected after the
page-pipe was split, print nice error messages and return error.
travis-ci: success for page-xfer: page_server_get_pages: replace BUG_ONs with 'return -1' (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Splitting of the trailing part of page-pipe buffer worked by coincidence
for single page requests. Request longer than a single page were not
handled correctly.
The proper point for splitting the trailing part of the page-pipe buffer is
the IOV following the IOV containing the desired page(s).
travis-ci: success for page-pipe: (yet another) fix for split page-pipe buffers (rev2)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for Some more cleanups over uffd.c (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
With epoll helpers in util we can stop exposing the
page-server socket to the oter world.
travis-ci: success for Some more cleanups over uffd.c (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
v2: Move epoll_prepare() too
travis-ci: success for Some more cleanups over uffd.c (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
The uffd code only needs the pstree items themselves, not
any IDs and relations they might have.
travis-ci: success for Some more cleanups over uffd.c (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This run away from previous set :) Two routines are now
identical, only page-read flags differ.
v2: Keep the uffd_hanle_pages() name
travis-ci: success for Some more cleanups over uffd.c (rev3)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Extend the RPC feature check functionality to also test for lazy-pages
support. This does not check for certain UFFD features (yet). Right now
it only checks if kerndat_uffd() returns non-zero.
The RPC response is now transmitted from the forked process instead of
encoding all the results into the return code. The parent RPC process
now only sends an RPC message in the case of a failure.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, lazy dump starts page server regardless of errors that might
have been encountered at earlier stages. Fix it.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Le sigh.
travis-ci: success for more pr_perror() usage fixes
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Add a queue of async-read jobs into page-xfer. When the
page_server_sk gets a read event from epoll it reads as
many bytes into page_server_iov + page buffer as recv
allows and returns.
Once the full iov+data is ready the requestor is notified
and the next async read is started.
This patch removes calls to recv(...MSG_WAITALL) from all
remote async paths.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Finally, page_fault_local and page_fault_remote are
absolutely identical, so we can just merge them.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
This one is called by PR once IO is complete (right now
for sync cases only, more work is required here) and
lets us unify local and remote PF code in uffd.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
The _copy and _update_lazy_iovecs are both called by hands
once the data is ready.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
This flag means, that the PR_ASYNC is valid, but the IO
should be started ASAP. This is how remote reader works,
so this flag is mostly for the local reader. It will let
us unify page-fault handlers for local and remote cases.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
We already have routines that do send-req, recv-info
and recv-page, so no need in yet another one.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
All the "lower" page-read-s should have already arrived with
pre-dump. This fixes the combined scheme.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
The page transfer protocol is completely synchronous on the dump side,
therefore we can presume that when we get POLLIN event on the page server
socket it is either page info response for the last sent page request or
the page data following the last page info.
In the first case we set ev_data associated with page server socket events
to values received in receive_remote_page_info and in the second case we
reset ev_data to zero. This allows us to distinguish what was the reason
page_server_event have been called.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The synchronous remote page transfer prevents reception of uffd events
during the communications with the page server on the dump side. Adding
socket file descriptor to epoll_wait allows processing of incoming uffd
events after non-blocking request for remote page is issued and before the
dump side page server replies.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The asynchronous version of remote page_read will send the request to the
dump side and return happily.
The response will be handled by the uffd.c because it's epoll loop is the
only place where we can handle events.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This part of code is responsible for reseting pagemap to proper locatation,
and mapping requested address to zero pfn if needed. The upcoming addtions
to uffd.c will reuse this code.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
For asynchrounous page transfers in post-copy migration we need to be able
to request a remote pages, receive back information about the data is going
to arrive and receive the page data itself.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It will used by lazy-pages daemon to enable polling for reception of page
data from remote dump
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
In early days of uffd.c return value from uffd_copy was used to count
transferred pages. Since this is not the case anymore we can use 0 as
success.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently lazy-pages daemon uses either pr->read_pages or get_remote_pages
to get actual page data from local images or remote server. From now on,
page_read will be completely responsible for getting the page data.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently we allocate a single page to use as intermediate buffer for
holding data that will be used in UFFDIO_COPY. Let's allocate a buffer per
process and make that buffer large enough to hold the largest continuos
chunk.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
page_read->seek_page was restored to skip zero pagemaps, therefore we
should check its return value rather than underlying PME.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Inline relevant parts of get_page inside uffd_handle_page and call
uffd_{copy,zero}_page after we've got the data.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We will want to poll not only a bunch of uffd-s, but also the lazy
socket, so here's "an fd and a callback" object to be pushed into
epoll.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Instead of tracking memory handled by userfaultfd on the page basis we can
use IOVs for continious chunks.
travis-ci: success for uffd: A new set of improvements
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Right now the zdtm.py hacks around core code and waits for
a second for the socket to appear. Let's better make proper
--daemon mode for lazy-pages daemon and pidfile generation.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Instead of creating mm-related parts of restore info in process tree we
can directly use MmEntry for VMA traversals.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Moving the find_vmas and collect_uffd_pages functions before they are
actually used. This allows to drop forward declaration of find_vmas and
will make subsequent refactoring cleaner.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The event received should be checked to be #PF before
accessing its other arguments.
[ Mike:
Well, looking forward to see non-cooperative userfaultfd patches in kernel
we should have something like
static int handle_uffd_enent(struct lazy_pages_info *lpi)
{
read(&msg...);
switch (msg.event) {
case UFFD_EVENT_PAGEFAULT:
handle_pagefault(lpi, msg);
break;
default:
return -1;
}
}
But since this patch is anyway is a bugfix: <ack>
]
travis-ci: success for uffd: A set of improvements over criu/uffd.c
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
After previous patch we no longer need this hash since
we don't need fd -> lpi conversion.
travis-ci: success for uffd: A set of improvements over criu/uffd.c
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
This helps us get lpi MUCH faster on #PF.
travis-ci: success for uffd: A set of improvements over criu/uffd.c
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
This avoids excessive memcpy() one instruction below.
travis-ci: success for uffd: A set of improvements over criu/uffd.c
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
In cases errno is being set, we need to use pr_perror() to print it.
In cases errno is not set, we should use pr_err().
pr_perror() doesn't need a colon or a newline. pr_err() needs a newline.
Cc: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
travis-ci: success for Assorted nitpicks
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Use relative path for UNIX socket instead of absolute one.
This ensures we won't run into problems with invalid socket names.
travis-ci: success for lazy-pages: use relative path for UNIX socket
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Restore of a zombie process does not call setup_uffd which causes
lazy-pages daemon to stuck forever waiting for (pid, uffd) pair to arrive.
Let's extend the protocol between restore and lazy-pages so that for zombie
process a (0, -1) pair will be sent instead of actual (uffd, pid).
travis-ci: success for lazy-pages: misc fixes (rev4)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>