travis-ci: success for lazy-pages: misc fixes (rev4)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To properly handle zombie processes we will need to distinguish failures
coming from socket communications from absent userfault file descriptor
travis-ci: success for lazy-pages: misc fixes (rev4)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When a VMA is mapped with MAP_LOCKED it is address space is populated with
pages which causes UFFDIO_COPY to return -EXISTS. Until we can find some
better solution let's avoid marking VMAs with MAP_LOCKED as lazy.
Fixes: #238
travis-ci: success for lazy-pages: misc fixes (rev3)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Only the UFFD daemon is aware if pages are in the parent or not. The
restore will continue to work as any lazy-restore except that pages from
parent checkpoints will be pre-populated by the restorer.
The restorer will still register the whole memory region as being
handled by userfaultfd even if it contains pages from parent
checkpoints. Userfaultfd page faults will only happen on pages which
contain no data. This means from the parent pre-populated pages will not
trigger a userfaultfd message even if marked as being handled by
userfaultfd.
The UFFD daemon knows about pages which are available in the parent
checkpoints and will not push those pages unnecessarily to userfaultfd.
Following steps to migrate a process are now possible:
Source system:
* criu pre-dump -D /tmp/cp/1 -t <PID>
* rsync -a /tmp/cp <destination>:/tmp
* criu dump -D /tmp/cp/2 -t <PID> --port 27 --lazy-pages \
--prev-images-dir ../1/ --track-mem
Destination system:
* rsync -a <source>:/tmp/cp /tmp/
* criu lazy-pages --page-server --address <source> --port 27 \
-D /tmp/cp/2 &
* criu restore --lazy-pages -D /tmp/cp/2
This will now restore all pages from the parent checkpoint if they
are not marked as lazy in the second checkpoint.
v2:
- changed parent detection to use pagemap_in_parent()
v3:
- unfortunately this reverts
c11cf95afbe023a2816a3afaecb65cc4fee670d7
"criu: mem: skip lazy pages during restore based on pagemap info"
To be able to split the VMA-s in the right chunks for the restorer
it is necessary to make the decision lazy or not on the VmaEntry
level.
v4:
- everything has changed thanks to Mike Rapoport's suggestion
- the VMA-s are no longer touched or split
- instead of over 100 lines of changes this is now two line patch
Signed-off-by: Adrian Reber <areber@redhat.com>
Acked-by: Mike Rapoprot <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Combining pre-copy (pre-dump) and post-copy (lazy-pages) mode showed a
problem in the function page_pipe_split_ppb(). The function is used to
split the page-pipe-buffer so that it only contains the IOVs request
from the restore side during lazy restore.
Unfortunately it only splits the leading IOVs out of the
page-pipe-buffer and not the trailing:
Before split for requested address 0x7f27284d1000:
page-pipe: ppb->iov 0x7f0f74d93040
page-pipe: 0x7f27282bb000 1
page-pipe: 0x7f27284d1000 1
page-pipe: 0x7f27284dd000 2
After split:
page-pipe: ppb->iov 0x7f0f74d93050
page-pipe: 0x7f27284d1000 1
page-pipe: 0x7f27284dd000 2
and:
page-pipe: ppb->iov 0x7f0f74d93040
page-pipe: 0x7f27282bb000 1
This patch keeps on splitting the page-pipe-buffer until it contains
only the requested address with the requested length.
After split (still trying to load 0x7f27284d1000):
page-pipe: ppb->iov 0x7f0f74d93050
page-pipe: 0x7f27284d1000 1
and:
page-pipe: ppb->iov 0x7f0f74d93040
page-pipe: 0x7f27282bb000 1
and:
page-pipe: ppb->iov 0x7f0f74d93060
page-pipe: 0x7f27284dd000 2
v2:
- moved declarations to the declaration block
Signed-off-by: Adrian Reber <areber@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently potentially lazy pages are not counted as written even if they
are dump into pages*img. Count these pages as "pages_written" when dump is
not going to skip writing lazy pages to disk.
travis-ci: success for criu: mem: count all pages actually written to image as "pages_written"
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This translates pagemap flags into strings for easier readability.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Do the same here, the flags is now enough to tell hole
from pagemap.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
They are now the same and PE_PRESENT bit helps us distinguish
holes from pagemaps having pages inside.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
The ->write_hole and ->write_pagemap now look very much
alike, so let's merge them. This is preparatory patch
that makes holy type decision based on PE_FOO flags.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Instead of checking whether the VMA containing a page can be lazy for each
page, skip the entire parts of pagemap that have PE_LAZY flag set.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The PE_PRESENT flags is always set for pagemap entries that have
corresponding pages in the pages*img. Pagemap entries describing a hole
either with zero page or with pages in the parent snapshot will no have
PE_PRESENT flag set.
Pagemap entry that may be lazily restored is a special case. For the lazy
restore from disk case, both PE_LAZY and PE_PRESENT will be set in the
pagemap, but for the remote lazy pages case only PE_LAZY will be set.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
With 'zero' and 'lazy' booleans replaced by the flags field in
PagemapEntry, it is required that page-xfer will be aware of the change.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Having three booleans in pagemap entry clues for usage of good old flags.
Replace 'zero' and 'lazy' booleans with flags and use flags for internal
tracking of in_parent value. Eventually, in_parent may be deprecated.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Create the socket early so that it will be available after restoring the
namespaces
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Very minimalistic at the moment, no remote pages and namesapces.
Still better than nothing :)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We'll use it in anon shmem dedup so we need to have access
to it in shmem.c
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The UNIX sockets do not like relative paths. Assuming both lazy-pages
daemon and restore use the same opts.work_dir, their working directory full
path will be the same.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The remote lazy pages variant can be run as follows:
src# criu dump -t <pid> --lazy-pages --port 9876 -D /tmp/1 &
src# while ! sudo fuser 9876/tcp ; do sleep 1; done
src# scp -r /tmp/1/ dst:/tmp/
dst# criu lazy-pages --page-server --address dst --port 9876 -D /tmp/1 &
dst# criu restore --lazy-pages -D /tmp/1
In a nutshell, this implementation of remote lazy pages does the following:
- dump collects the process memory into the pipes, transfers non-lazy pages
to the images or to the page-server on the restore side. The lazy pages
are kept in pipes for later transfer
- when the dump creates the page_pipe_bufs, it marks the buffers containing
potentially lazy pages with PPB_LAZY
- at the dump_finish stage, the dump side starts TCP server that will
handle page requests from the restore side
- the checkpoint directory is transferred to the restore side
- on the restore side lazy-pages daemon is started, it creates UNIX socket
to receive uffd's from the restore and a TCP socket to forward page
requests to the dump side
- restore creates memory mappings and fills the VMAs that cannot be handled
by uffd with the contents of the pages*img.
- restore registers lazy VMAs with uffd and sends the userfault file
descriptors to the lazy-pages daemon
- when a #PF occurs, the lazy-pages daemon sends PS_IOV_GET command to the dump
side; the command contains PID, the faulting address and amount of pages
(always 1 at the moment)
- the dump side extracts the requested pages from the pipe and splices them
into the TCP socket.
- the lazy-pages daemon copies the received pages into the restored process
address space
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
When appropriate, the lazy pages will no be written to the destination.
Instead, a pagemap entry for range of such pages will be marked with 'lazy'
flag.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The pages that are mapped to zero_page_pfn are not dumped but information
where are they located is required for lazy restore.
Note that get_pagemap users presumed that zero pages are not a part of the
pagemap and these pages were just silently skipped during memory restore.
At the moment I preserve this semantics and force get_pagemap to skip zero
pages.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Pagemap now is more friendly to random accesses, enable use of new APIs.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Fix CID 163485 (#2 of 2): Dereference null return value (NULL_RETURNS)
7. dereference: Dereferencing a pointer that might be null dest when
calling handle_user_fault.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This will allow to split a ppb so that data residing at specified address
will be immediately available
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
for buffers that contain potentially lazy pages
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
>>> >>> CID 161312: Error handling issues (NEGATIVE_RETURNS)
>>> >>> "task_args->uffd" is passed to a parameter that cannot be negative.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
>>> >>> CID 161322: API usage errors (USE_AFTER_FREE)
>>> >>> Calling "close(int)" closes handle "client" which has already been closed.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Now that userfaultfd/lazy-pages support is enable all the time, this
adds runtime detection of userfaultfd. On a system without the
userfaultfd syscall following is printed:
uffd daemon:
(00.000004) Error (uffd.c:176): lazy-pages: Runtime detection of userfaultfd failed on this system.
(00.000024) Error (uffd.c:177): lazy-pages: Processes cannot be lazy-restored on this system.
or criu restore
(00.457047) 6858: Error (uffd.c:176): lazy-pages: Runtime detection of userfaultfd failed on this system.
(00.457049) 6858: Error (uffd.c:177): lazy-pages: Processes cannot be lazy-restored on this system.
Signed-off-by: Adrian Reber <areber@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Add linux/userfaultfd.h to criu sources. This header is a part
of the kernel API and I see nothing wrong to have in the repo.
Why we want to do this:
* to check that criu works correctly if a kernel doesn't
support userfaultfd.
* to check compilation of the userfaultfd part in travis-ci.
v2: remove UFFD from FEATURES_LIST
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
for holding state related to userfaultfd handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It verifies that amount of collected and transferred pages is consitent
and prints a summary
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It will handle page fault notifications from userfaultfd
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
so that listenning file descriptor might be used in select/poll
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If CONFIG_HAS_UFFD is not defined an attempt to run the lazy pages daemon
will result in error message
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>