After the commit
02c763939c10 ("test/zdtm: unify common code")
CFLAGS with -D_GNU_SOURCE defined in the top Makefile
are being passed to tests Makefiles.
As _GNU_SOURCE is also defined by tests, that resulted in
zdtm tests build failures:
make[2]: Entering directory `/home/criu/test/zdtm/lib'
CC test.o
test.c:1:0: error: "_GNU_SOURCE" redefined [-Werror]
#define _GNU_SOURCE
^
<command-line>:0:0: note: this is the location of the previous definition
cc1: all warnings being treated as errors
make[2]: *** [test.o] Error 1
However, we didn't catch this in time by Travis-CI, as zdtm.py doesn't
do `make zdtm`, rather it does `make -C test/zdtm/{lib,static,transition}`.
By calling middle makefile this way, it doesn't have _GNU_SOURCE in
CFLAGS from top-Makefile.
I think the right thing to do here - is following CRIU's way:
rely on definition of _GNU_SOURCE by Makefiles.
This patch is almost fully generated with
find test/zdtm/ -name '*.c' -type f \
-exec sed -i '/define _GNU_SOURCE/{n;/^$/d;}' '{}' \; \
-exec sed -i '/define _GNU_SOURCE/d' '{}' \;
With an exception for adding -D_GNU_SOURCE in tests Makefile.inc for
keeping the same behaviour for zdtm.py.
Also changed utsname.c to use utsname::domainname, rather private
utsname::__domainname, as now it's uncovered (from sys/utsname.h):
> struct utsname
> {
...
> # ifdef __USE_GNU
> char domainname[_UTSNAME_DOMAIN_LENGTH];
> # else
> char __domainname[_UTSNAME_DOMAIN_LENGTH];
> # endif
Reported-by: Adrian Reber <areber@redhat.com>
Cc: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
With kdat cache and unified kerndat_init() we can call it very early in
crtools and then kdat information will be available for all cr-* actions.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
kdat and lazy-pages use nearly the same sequence to open userfault. This
code can definitely live in a dedicated function.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Raise an exception for kernels that do not have userfaultfd. For the
kernels that have userfaultfd but do not provide non-cooperative events
(4.3 - 4.11) just print a warning.
Fixes: #363
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
For the older kernels the implementation of userfaultfd would not include
non-cooperative mode. In such case it is still possible to use uffd and
enable lazy-pages, but if the restored process will change its virtual
memory layout during restore, we'll get memory corruption.
After this change 'criu check --feature uffd' will report success if the kernel
supports userfaultfd at all and 'criu check --feature uffd-noncoop' will
report success if the kernel supports non-cooperative userfaultfd.
Suggested-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
All the iovecs in uffd.c are lazy, there is no point in having _lazy_ in
functions that operate on these iovecs.
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There's no real point to have two-liner wrapper for compete_page_fault and
uffd_io_complete is better semantically.
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Both test seem to reproduce issue #357 [1] too frequently which make it
really annoying. Temporarily remove them from lazy-pages passes until the
issue is fixed.
[1] https://github.com/xemul/criu/issues/357
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The page_pipe_read obsoleted page_pipe_split and related functions and
there is no point in keeping them.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When page-server gets a request for an absent pagemap from lazy-pages
daemon, it should not reply with "zero pages"'.
The pagemap should be completely in sync between src and dst and dst
should never request pages that are not present. Maybe we should return
-1 here? At least we'll have a chance that dump will unroll everything...
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make it call .write_pagemap once and decide whether or not to
call .write_pages based on the flags caluculated in one place
as well.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Same thing for the boolean value saying whether or not to send
lazy pagemaps alone or follow them with the respective pages.
This value is non-true in the single place, so let's simplify
the API and keep this bool on xfer object.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The offset in question is used by shmem dumping code to dump
memory segments relative to shmem segment start, no to task
mapping start. The offset value is now the part of the xfer
callback and is typically 0 :) Let's keep this on xfer object
to simplify the xfer API.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There are two places left that send ps_iov by hands into
socket. Switch it to use common helper.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There are cases when we need to specify flags with which
to send the ps_iov, so tune-up the send_psi for that and
use where needed.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The only thing it does is puts 4 values on the on-stack ps_iov,
let's avoid double stack copying and put the values on ps_iov
in callers.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Introduce the PS_IOV_ADD_F command that is to add pages with
flags. We already use the similar notation on page-xfer -- the
single write callback with pagemap and flags. For page-server
let's use the same. Legacy _HOLE and _PAGE handling is kept.
Changed commands numbers are OK, as the commands in question
are still in criu-dev branch.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The page_read.seek_pagemap already tunes the pages offset,
so the separate call for skip_pagemap_pages in the routine
in question is always no-op.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When the remap 'from' parameter matches an IOV end we try to split that IOV
exactly at its end and effectively create an IOV with zero length.
With the off-by-one fix we will skip the IOV in such case as expected.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When we combine pre-dump with lazy pages, we populate a part of a memory
region with data that was saved during the pre-dump. Afterwards, the
region is registered with userfaultfd and we expect to get page faults for
the parts of the region that were not yet populated. However, khugepaged
collapses the pages and the page faults we would expect do not occur.
To mitigate this problem we temporarily disable THP for the restored
process, up to the point when we register all the memory regions with
userfaultfd.
https://lists.openvz.org/pipermail/criu/2017-May/037728.html
Reported-by: Adrian Reber <areber@redhat.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The PR_SET_THP_DISABLE prctl allows control of transparent huge pages on
per-process basis. It is available since Linux 3.15, but until recently it
set VM_NOHUGEPAGE for all VMAs created after prctl() call, which prevents
proper restore for combination of pre- and post-copy. A recent change to
prctl(PR_SET_THP_DISABLE) behavior eliminates the use of per-VMA flags and
we can use the new version of the prctl() to disable THP.
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The is_vma_range_fmt and parse_vmflags will be required for detection of
availability of PR_SET_THP_DISABLE prctl
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Now we have two separate recv-calling routines, that receive
header and pages from page-server. These two can finally be
unified.
After this the sync-read code looks like -- start async one
and wait for it to finish right at once.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is prerequisite for the next patch.
v2: spellchecks, code reshuffle
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Now these two look exactly the same and we can have
only one call with additional sync/async (flags) arg.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The newly introduced sync-read call may look exactly
the same as its async pair by using the respective
complete callback.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There's no need in two API calls to read xfer header
and pages themselves, so merge them into one single
call.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
* drop --keep-going etc from --lazy-pages pass
* add --remote-lazy-pages pass
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
* select excluded tests based on the kernel version
* test local and remote lazy-pages with and withour pre-dump
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The page-read for child process is a shallow copy of the parent process
page-read. They share the open file descriptors and the pagemap.
The lpi_fini of the child processes should not release any resources, they
all will be released during lpi_fini of the parent process.
Fixes: #325
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
For the remote lazy pages case, to access pages in the middle of a pipe we
are splitting the page_pipe_buffers and iovecs and use splice() to move the
data between the underlying pipes. After the splits we get page_pipe_buffer
with single iovec that can be used to splice() the data further into the
socket.
This patch replaces the splitting and splicing with use of a helper pipe
and tee(). We tee() the pages from beginning of the pipe up to the last
requested page into a helper pipe, sink the unneeded head part into
/dev/null and we get the requested pages ready for splice() into the
socket.
This allows lazy-pages daemon to request the same page several time, which
is required to properly support fork() after the restore.
As added bonus we simplify the code and reduce amount of pipes that live in
the system.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Until now once we've started to fetch an iovec we've been waiting until
it's completely copied before returning to event processing loop. Now we
can have several request for the remote pages in flight.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There could be several outstaning requests for the same page, either from
page fault handler or from handle_remaining_pages. Verifying that the
faulting address is already requested is not enough. We need to check if
there any request in flight that covers the faulting address.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
v2: When uffd is present, the reported features may still be 0,
so we need one more bool for uffd syscall itself.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
They still will fail with --remote-lazy-pages, so mark them as
'noremotelazy'
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This allows skipping tests that are not yet run with --remote-lazy-pages,
but can be run with --lazy-pages
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When running with --lazy-pages or --remote-lazy-pages, the daemons should
run in the background, rather than complete before t.stop() is called.
Many tests try to verify things are ok after test_waitsig() and that's
exactly the place where they access memory and cause page faults.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Most of zdtm test should pass with --lazy-pages with kernels newer than
4.11.
Some test excluded for older kernels surprisingly pass even now, mainly
becuase they do not actually stress userfaultfd, which will be fixed in the
upcoming commits :)
The cmdlinenv00 fails even with kernel 4.11 because of a race between uffd
and gup in the case external process reads /proc/<pid>/cmdline before
memory containing the command line is populated.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>