mirror of https://github.com/checkpoint-restore/criu synced 2025-08-26 03:47:35 +00:00

9635 Commits

Author SHA1 Message Date
Mike Rapoport
850095dad4 lazy-pages: make uffd_io_complete more robust
Make sure we handle various corner cases:
* we received fewer pages than requested
* the request was capped because of unmap/remap etc.
* the process has exited underneath us

Currently we are freeing the request once we've found the address to use
with uffd_copy(). Instead, let's keep the request object around, use it to
properly calculate the number of pages we pass to uffd_copy() and then re-add
the trailing range (if any) to the IOVs list.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:37:01 +03:00
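The accounting described in the uffd_io_complete commit above can be illustrated with a minimal sketch. It is not the code from CRIU: the `sketch_range` type, the `split_completed_request()` helper and the fixed 4096-byte page size are assumptions made for illustration. Given the original request and the number of bytes that actually arrived, it computes how many whole pages may be handed to the copy step and which trailing range (if any) should go back to the pending list.

```c
#include <stdint.h>

/* Assumption: fixed page size; real code would query it at run time. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for a lazy-pages request or IOV. */
struct sketch_range {
	uint64_t start; /* first address covered */
	uint64_t len;   /* length in bytes */
};

/*
 * Returns the number of bytes that may be copied from the start of the
 * request and stores the unserved remainder in *tail (tail->len == 0 if
 * the whole request was served).
 */
static uint64_t split_completed_request(const struct sketch_range *req,
					uint64_t received,
					struct sketch_range *tail)
{
	/* Never copy more than was requested. */
	uint64_t usable = received < req->len ? received : req->len;

	/* Only whole pages can be installed. */
	usable -= usable % SKETCH_PAGE_SIZE;

	tail->start = req->start + usable;
	tail->len = req->len - usable;
	return usable;
}
```

A caller would copy the returned number of bytes and, if `tail.len` is non-zero, put `tail` back on the list of ranges that still have to be fetched.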
Mike Rapoport
00ce9b52aa lazy-pages: factor out insertion to sorted IOV list
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:37:01 +03:00
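As a rough illustration of what "insertion to a sorted IOV list" means, here is a minimal sketch. CRIU keeps its IOVs on its own list type; the singly linked `sketch_iov` node and the `sketch_iov_insert_sorted()` name below are assumptions used only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IOV node, kept sorted by start address. */
struct sketch_iov {
	uint64_t start;
	uint64_t len;
	struct sketch_iov *next;
};

/* Insert n so that the list stays ordered by ascending start address. */
static void sketch_iov_insert_sorted(struct sketch_iov **head,
				     struct sketch_iov *n)
{
	struct sketch_iov **pp = head;

	/* Walk to the first node that does not start before n. */
	while (*pp != NULL && (*pp)->start < n->start)
		pp = &(*pp)->next;

	n->next = *pp;
	*pp = n;
}
```

Factoring this into one helper means every place that returns a range to the list preserves the ordering in the same way.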
Mike Rapoport
9124dbd7f9 criu/config: reduce the number of argv traversals
Instead of pre-parsing command line twice, one time to detect -h/--help and
another time to find config file parameter, check for both in one pass.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:20:55 +03:00
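The single-pass idea from the commit above might look roughly like the sketch below. The `sketch_pre_parse` struct and function name are hypothetical, and the `--config` spelling of the config file parameter is an assumption; the commit message only says that both checks happen in one traversal of argv.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of the pre-parse step. */
struct sketch_pre_parse {
	bool want_help;       /* -h or --help was seen */
	const char *cfg_path; /* argument of the config option, or NULL */
};

/* One traversal of argv that looks for both things at once. */
static struct sketch_pre_parse sketch_pre_parse_argv(int argc, char **argv)
{
	struct sketch_pre_parse r = { false, NULL };
	int i;

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "-h") || !strcmp(argv[i], "--help"))
			r.want_help = true;
		else if (!strcmp(argv[i], "--config") && i + 1 < argc)
			r.cfg_path = argv[++i]; /* assumed option name */
	}
	return r;
}
```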
Mike Rapoport
81621ce14c criu/config: allow init_config properly handle -h/--help
When config parsing was split into a separate part the handling of
-h/--help option during init_config was broken. Fix it.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:20:55 +03:00
Mike Rapoport
cc101ae63b criu/config: rename variables counting options in config files
s/first_count/global_cfg_argc
s/second_count/user_cfg_argc

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:20:55 +03:00
Adrian Reber
4386b324b3 Fix building unlink_fstat00 unlink test case
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 23:59:58 +03:00
Mike Rapoport
253061634e criu: fix 'criu --version'
Currently kerndat_init() runs before command line parsing and running
simple 'criu --version' command may produce something like:

Warn  (criu/kerndat.c:847): Can't load /run/criu.kdat
Error (criu/util.c:842): exited, status=3
Error (criu/util.c:842): exited, status=3
Write 4294967295 to /proc/self/loginuid failed: Operation not permitted
Warn  (criu/net.c:2732): Unable to get socket network namespace
Warn  (criu/net.c:2732): Unable to get tun network namespace
Warn  (criu/sk-unix.c:213): sk unix: Unable to open a socket file:
Operation not permitted
Error (criu/net.c:3023): Unable create a network namespace: Operation not
permitted
Warn  (criu/net.c:3069): NSID isn't reported for network links
Version: 3.6
GitID: v3.6-611-g0b27d0a

Group early calls to kerndat_* and init_service_fd calls into a function
and call this function after the command line parsing is finished.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:47 +03:00
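The reordering described in the 'criu --version' fix can be sketched as follows. This is only an illustration: `sketch_early_init()` stands in for the grouped kerndat_* and init_service_fd calls, and `sketch_main()` is a hypothetical entry point, not CRIU's.

```c
#include <stdio.h>
#include <string.h>

static int early_init_calls; /* counts how often the expensive setup ran */

/* Hypothetical stand-in for the grouped early initialization calls. */
static int sketch_early_init(void)
{
	early_init_calls++;
	return 0;
}

static int sketch_main(int argc, char **argv)
{
	int i;

	/* Command-line parsing comes first and has no side effects. */
	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "--version")) {
			puts("Version: (sketch)");
			return 0; /* returns before any early init */
		}
	}

	/* Only now run the setup that may print warnings or fail. */
	if (sketch_early_init())
		return 1;

	/* ... the requested action would run here ... */
	return 0;
}
```

With this ordering, a plain version query never touches the kernel feature detection, so none of the warnings quoted in the commit message can appear.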
Mike Rapoport
5f18a6e804 criu: split configuration parsing into separate file
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:47 +03:00
Joel Nider
0c41a4208b compel: std_vprint_num returns a null-terminated string
This function is an analogue to vsprintf(), and is used in very much the
same way. The caller expects the modified string pointer to be pointing to
a null-terminated string.

Signed-off-by: Joel Nider <joeln@il.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:47 +03:00
Andrei Vagin
3bc9e95e49 travis: rollback to fedora 27
We have a few issues with fc28. For example:
https://github.com/checkpoint-restore/criu/issues/469

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:38 +03:00
Joel Nider
21b754921e criu/restorer: print valid string when set last_pid fails
The string returned by std_vprint_num() is right-aligned in the buffer.
Therefore, we must print the string starting from the pointer returned in
the 'ps' argument, and not from the start of the original buffer.

Signed-off-by: Joel Nider <joeln@il.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-08 00:32:30 +03:00
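The two std_vprint_num commits in this log rely on the formatted number being right-aligned and null-terminated. A minimal sketch of that behaviour, with a hypothetical `sketch_format_num()` that is not the compel function and has a different signature:

```c
#include <stddef.h>

/*
 * Writes num in decimal at the END of buf (size must be at least 2) and
 * returns a pointer to the first digit. The result is null-terminated.
 * If the buffer is too small, the most significant digits are dropped.
 */
static char *sketch_format_num(char *buf, size_t size, unsigned long num)
{
	char *p = buf + size - 1;

	*p = '\0';
	do {
		*--p = (char)('0' + num % 10);
		num /= 10;
	} while (num != 0 && p > buf);

	return p;
}
```

Because the digits are produced from the least significant end, the string generally does not start at `buf`. Printing `buf` instead of the returned pointer would show whatever happened to be in the unused front part of the buffer, which is the bug the restorer commit describes.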
Mike Rapoport
8aba610bd3 lazy-pages: fork: fix duplication of IOV lists
Instead of merging unfinished requests with child's IOVs we queued them
into parent's IOV list. Fix it.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-08 00:27:20 +03:00
Mike Rapoport
e77df5abd8 lazy-pages: actually return to epoll_wait after completing forks
Commit 9cb20327aa4 ("return to epoll_wait after completing forks") was only
half way there. Adding the other half.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-02 03:26:32 +03:00
Andrei Vagin
43c3772334 zdtm: include lock.h after zdtmtst.h
In file included from s390x_gs_threads.c:10:0:
../lib/lock.h: In function 'mutex_lock':
../lib/lock.h:148:4: error: implicit declaration of function 'pr_perror' [-Werror=implicit-function-declaration]
    pr_perror("futex");

Reported-by: Mr Jenkins

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 02:10:59 +03:00
Dmitry Safonov
4439454088 make: Don't set $(MAKEFLAGS)
We shouldn't set MAKEFLAGS for the following reasons:
1. User may want to specify some make parameter (e.g., `-d` for debug)
2. We lose parallel build. No `-j` is passed to submake and it looks
   like GNU make will not deal with parallel recursive make if
   $(MAKEFLAGS) is unset back.
   Easy to verify: Add `sleep 3` to build rule in Makefile.inc and
   you'll find only one sleep process at a time. After the patch
   if you specify say `-j5` to make - you'll have 5 sleep processes.

Reverts: commit e9beed7bb3f3 ("build: zdtm -- Add implicit rules into
zdtm building").

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:35:49 +03:00
Dmitry Safonov
c574c2883f test/make: Drop implicit make variables
Let's drop the usage of COMPILE.c and OUTPUT_OPTION.
This allows running submake with -R.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:35:49 +03:00
Dmitry Safonov
4cbd3ac447 nmk: Don't redefine MAKEFLAGS
$(MAKEFLAGS) already contains -r -R and --no-print-directory: those
flags are being added in include.mk, which is included two lines above.
There is no comment and I see little sense in erasing $(MAKEFLAGS)
rather than adding those flags, so I considered this a typo.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:35:49 +03:00
Andrei Vagin
0771942c73 zdtm: always run criu dump with --track-mem if --snaps is set
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:30:21 +03:00
Andrei Vagin
45620418e4 inventory: save dump_uptime for criu dump if track_mem is set
A set of images from criu dump can be used as a previous point, when we
are doing snapshots. In this case, each point contains a full set of
images.

https://github.com/checkpoint-restore/criu/issues/479

v2: return -1 if inventory_save_uptime failed

Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:30:21 +03:00
Andrei Vagin
297aa5b428 mem: return -1 from __parasite_dump_pages_seized in error cases
Here is a common mistake:

int funcX()
{
	int ret;

	ret = funcA();
	if (ret < 0)
		goto err;

	if (smth)
		goto err; // return 0 !!!!

err:
	return ret;
}

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:30:21 +03:00
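One way to avoid the pitfall shown in the commit above is to set the return value to an error before every later `goto err` and to set it to 0 only on the success path. The sketch below uses hypothetical parameters in place of funcA() and smth so that it stays self-contained; it illustrates the pattern and is not the actual change to __parasite_dump_pages_seized.

```c
/*
 * a_result plays the role of funcA()'s return value and smth the role of
 * the second failure condition from the example above.
 */
static int sketch_func_x(int a_result, int smth)
{
	int ret;

	ret = a_result;
	if (ret < 0)
		goto err;

	/* From here on, any jump to err must report a failure. */
	ret = -1;
	if (smth)
		goto err;

	ret = 0; /* success is set explicitly, in one place */
err:
	return ret;
}
```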
Kirill Tkhai
9ddf9a224a helper: Move service fds closing code to restore_one_helper()
There is no reason we need this cleanup code in generic
restore_one_task(), so let's move it for better readability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 18:09:04 +03:00
Pavel Tikhomirov
c32c7371b9 zdtm: check that pid-reuse does not break iterative memory dump
The idea of the test is:

1) mmap a separate page and put a variable there, so that other usage does
not dirty this region. Initialize the variable with VALUE_A.

2) fork a child with special pid == CHILD_NS_PID. Only if it is the first
child, overwrite the variable with VALUE_B.

3) wait for the end of the next predump or end of restore with
test_wait_pre_dump_ack/test_wait_pre_dump pair and kill our child.

Note: The memory region is "clean" in parent.

4) goto (2) unless end of c/r is reported by test_waitpre

So on the first iteration the child with pid CHILD_NS_PID was dumped with
VALUE_B; on all other iterations and on the final dump another child with the
same pid exists but with VALUE_A. But on all iterations after the first
one this memory region is "clean". So criu before the fix would
have restored VALUE_B, taking it from the first child's image, but should
restore VALUE_A.

Note: The child in its turn waits for termination and checks that the variable
value doesn't change after c/r.

We should run the test with at least one predump to trigger the problem:

[root@snorch criu]# ./test/zdtm.py run --pre 1 -k always -t zdtm/transition/pid_reuse
Checking feature ns_pid
Checking feature ns_get_userns
Checking feature ns_get_parent

=== Run 1/1 ================ zdtm/transition/pid_reuse

===================== Run zdtm/transition/pid_reuse in ns ======================
DEP       pid_reuse.d
CC        pid_reuse.o
LINK      pid_reuse
Start test
Test is SUID
./pid_reuse --pidfile=pid_reuse.pid --outfile=pid_reuse.out
Run criu pre-dump
Send the 10 signal to  52
Run criu dump
Run criu restore
Send the 15 signal to  73
Wait for zdtm/transition/pid_reuse(73) to die for 0.100000
Test output: ================================
14:47:57.717: 11235: ERR: pid_reuse.c:76: Wrong value in a variable after restore
14:47:57.717:     4: FAIL: pid_reuse.c:110: Task 11235 exited with wrong code 1 (errno = 11 (Resource temporarily unavailable))

<<< ================================

https://jira.sw.ru/browse/PSBM-67502

v3: simplify waitpid's status check
v9: switch to test_wait_pre_dump(_ack)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
348c39bf14 zdtm/lib: add pre-dump-notify test flag
If pre-dump-notify flag is set, zdtm sends a notify to the test after
pre-dump was finished and waits for the test to send back a reply that
test did all its work and is now ready for the next pre-dump/dump.

How it can be used:

while (!test_wait_pre_dump()) {
	/* Do something after predump */
	test_wait_pre_dump_ack();
}
/* Do something after restore */

Internally we open two pipes for the test one for receiving notify (with
two open ends) and one for replying to it (only write end open). Fds of
pipes are dupped to predefined numbers and zdtm opens these fds through
/proc/<test-pid>/fd/{100,101} and communicates with the test.

v9: switch to a two-way interface to remove a race where the operation we try to
run after predump may still be unfinished at the time of the next dump.

Suggested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
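The two-pipe handshake described in the pre-dump-notify commit can be approximated with the sketch below. The one-byte message values and the `sketch_` function names are assumptions; the log does not show the real wire format, and the real helpers in the zdtm library take no arguments. The return convention (0 after a pre-dump, non-zero after restore) follows the usage loop quoted in the commit message.

```c
#include <unistd.h>

/* Hypothetical one-byte messages; the real format is not shown in the log. */
#define SKETCH_MSG_PREDUMP 'p'
#define SKETCH_MSG_RESTORE 'r'
#define SKETCH_MSG_ACK     'a'

/*
 * Blocks until the harness reports an event on the notify pipe.
 * Returns 0 if a pre-dump finished, 1 after restore, -1 on error.
 */
static int sketch_wait_pre_dump(int notify_rd)
{
	char msg;

	if (read(notify_rd, &msg, 1) != 1)
		return -1;
	return msg == SKETCH_MSG_PREDUMP ? 0 : 1;
}

/* Tells the harness that the post-pre-dump work is done. */
static int sketch_wait_pre_dump_ack(int ack_wr)
{
	char msg = SKETCH_MSG_ACK;

	return write(ack_wr, &msg, 1) == 1 ? 0 : -1;
}
```

The acknowledgement is what makes the interface two-way: the harness can wait for it before starting the next pre-dump or dump, so work triggered by one notification cannot overlap with the next iteration.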
Pavel Tikhomirov
6d281417a5 memory: don't use parent memdump if detected possible pid reuse
We have a problem: when a pid is reused between consecutive dumps, we can't
tell whether the pagemap and pages from the images of the parent dump are
still valid for restoring this pid. That can even lead to wrong memory being
restored for this pid, see the test in the last patch.

So this is an attempt to separate processes with a (likely) invalid previous
memory dump from processes with a 100% valid previous dump.

For that we use the value of /proc/<pid>/stat's start_time and also the
timestamp of each (pre)dump. If the start time is strictly less than the
timestamp, that means that the pagemap for this pid from the previous dump
is valid: it was done for exactly the same process.

Creation time is in centiseconds by default, so if a predump is really fast
(<1csec) we can have false negative decisions for some processes, but in
the case of long-running processes we are fine.

https://jira.sw.ru/browse/PSBM-67502

v2: remove __maybe_unused for get_parent_stats; fix get_parent_stats to
have static typing; print warning only if unsure; check has_dump_uptime
v3: read parent stats from image only once; reuse stat from previous
parse_pid_stat call on dump
v4: move code to function; use unsigned long long for ticks; put
proc_pid_stat on mem_dump_ctl; print warning on all pid-reuse cases
v5: free parent's stats entry properly, pass it in arguments to
(pre_)dump_one_task
v6: free parent's stats in error path too
v7: zero init parent_se
v8: improve error message
v9: switch to inventory image from stats, if pid-reuse fails - fail
current dump

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
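The comparison at the heart of the pid-reuse check can be written down as a short sketch. The function names and the choice of microseconds as the common unit are assumptions for illustration; the rule itself (start time strictly less than the dump timestamp means the parent dump belongs to the same process) is the one stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert clock ticks to microseconds without overflowing early. */
static uint64_t sketch_ticks_to_usec(uint64_t ticks, uint64_t ticks_per_sec)
{
	return ticks / ticks_per_sec * 1000000ULL +
	       ticks % ticks_per_sec * 1000000ULL / ticks_per_sec;
}

/*
 * start_ticks:      process start time since boot, in clock ticks
 * ticks_per_sec:    100 when the start time is in centiseconds
 * dump_uptime_usec: uptime recorded when the parent (pre)dump was taken
 *
 * Returns true only if the process certainly existed at that dump.
 */
static bool sketch_parent_dump_usable(uint64_t start_ticks,
				      uint64_t ticks_per_sec,
				      uint64_t dump_uptime_usec)
{
	return sketch_ticks_to_usec(start_ticks, ticks_per_sec) <
	       dump_uptime_usec;
}
```

The strict comparison is deliberately conservative: a process that started in the same centisecond as the dump is treated as a possible pid reuse, which is the false negative case the message mentions.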
Pavel Tikhomirov
f6d454b4ce inventory: add a helper to get entry of parent pre-dump
will be used in the next patch

https://jira.sw.ru/browse/PSBM-67502

note: actually we need only one value from inventory entry but I still
prefer general helper as we still need to read and allocate memory
for the whole structure

v2: fix get_parent_stats to have static typing
v3: simplify get_parent_stats to return a StatsEntry pointer instead of
doing it through arguments
v8: replace errors with warnings, we should watch for them only if we
have a corresponding error in detect_pid_reuse, else they are fine
v9: change stats to inventory image

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
688ffa50f1 inventory: save uptime to know when dump had happened
We want to use a simple fact: if we have a live process in a pstree we
want to dump, and the start time of that process is less than the pre-dump's
timestamp (taken while all processes were frozen), then this exact
process existed (100% sure) at the time of this pre-dump and the
process' memory was dumped in images.

So save inventory image on pre-dump and put there an uptime.

https://jira.sw.ru/browse/PSBM-67502

v9: improve comment, put uptime to the inventory image as 1) there are no
stats in the parent images directory if the --work-dir option is set to
something different than the images directory, 2) stats-dump is not an image
and it is bad practice to put data required for restoring there.
v10: s/u_int64_t/uint64_t/

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
da21d2d17a parse: add a helper to obtain an uptime
will be used in the next patch

https://jira.sw.ru/browse/PSBM-67502

note: man for /proc/uptime says that uptime is in seconds and for now
the format is "seconds.centiseconds", where centiseconds is 2 digits

note: now uptime is in csec but I prefer saving it in usec, which allows
us to reuse this image field when/if we have a more accurate value.

v8: add length specifier to parse only centiseconds
v9: put uptime to u_int64_t directly, define CSEC_PER_SEC
v10: switch to uint64_t from u_int64_t, comment about usec in image

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
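A minimal sketch of parsing the "seconds.centiseconds" format into microseconds is shown below. It works on a string that has already been read from /proc/uptime; the `sketch_parse_uptime_usec()` name and signature are assumptions and this is not the helper added by the commit. The `%2lu` length specifier mirrors the "parse only centiseconds" note in the changelog.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parses the first field of an uptime line such as "12345.67 54321.00".
 * Assumes the fractional part has exactly two digits (centiseconds).
 * Returns 0 and stores the uptime in microseconds, or -1 on bad input.
 */
static int sketch_parse_uptime_usec(const char *line, uint64_t *usec)
{
	unsigned long sec, csec;

	if (sscanf(line, "%lu.%2lu", &sec, &csec) != 2)
		return -1;

	*usec = (uint64_t)sec * 1000000ULL + (uint64_t)csec * 10000ULL;
	return 0;
}
```

Storing the value in microseconds even though the source only has centisecond resolution leaves room for a more precise source later without changing the stored field.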
Pavel Tikhomirov
9b0003676a Revert "parse: add a helper to obtain an uptime"
This reverts commit cf2f035d9f5cdca96c814bd26e24c556ad736171.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
ae6de318e5 Revert "stats: save uptime to know when dump had happened"
Leave dump_uptime in stats file for backward and forward compatibility
though it is unused now.

This reverts commit fbba4d249a49e34e41c7c63ed77fab1bee3a13de.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
7ca133f5b3 Revert "stats: add a helper to get stats of parent pre-dump"
This reverts commit 4a43486e24cf543ed2c0320552087f506c51635f.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
c8346275a8 Revert "memory: don't use parent memdump if detected possible pid reuse"
This reverts commit ffd415a5b5b8a9ef8fb99904a3c9b04ecdb3052b.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
08e5fe2560 files: define O_TMPFILE
This fixes compilation on VZ7:
https://ci.openvz.org/job/CRIU/job/CRIU-virtuozzo/job/criu-dev/3605/console

https://jira.sw.ru/browse/PSBM-83713
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 00:53:46 +03:00
Andrei Vagin
7fd3936350 service: don't cache a service descriptor
Service descriptors can be moved in a child process.

v2: handle errors of install_service_fd() properly

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-26 22:30:09 +03:00
Dmitry Safonov
dcafa78b96 test/make: Include .d files
Include deps files to recompile tests when dependency has changed.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-26 22:03:35 +03:00
Andrey Vagin
7a0b698b59 zdtm: calling futex via syscall saves error codes in errno
man 2 futex:
  In the event of an error (and assuming that futex() was invoked via
  syscall(2)), all operations return -1 and set errno to indicate the
  cause of the error.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-26 22:03:35 +03:00
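The man page behaviour quoted in the futex commit above can be observed with a small Linux-only sketch. The `sketch_futex_wait()` wrapper is hypothetical and is not the zdtm lock code. It relies on the documented fact that FUTEX_WAIT fails with EAGAIN when the futex word does not hold the expected value.

```c
#define _GNU_SOURCE /* for the syscall() declaration */
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Waits on *uaddr as long as it still contains expected.
 * Returns 0 on a normal wake-up, otherwise the errno value set by the
 * failed call (the raw return value is only ever -1).
 */
static int sketch_futex_wait(uint32_t *uaddr, uint32_t expected)
{
	if (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0) == -1)
		return errno;
	return 0;
}
```

Calling it with a word that holds 1 and an expected value of 0 returns immediately with EAGAIN, which shows that the error code arrives through errno and not through the return value.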
Radostin Stoyanov
1c22d7ba86 Remove redundant semicolons
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-25 21:09:41 +03:00
Radostin Stoyanov
ffd25aa856 net: Remove trailing whitespace
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-25 21:09:41 +03:00
Pavel Tikhomirov
87b84e59c5 files: fix clone_service_fd overlap handling
Though LOG_FD_OFF < IMG_FD_OFF, get_service_fd(LOG_FD_OFF) is greater than
get_service_fd(IMG_FD_OFF), see __get_service_fd, so the check here
should be reversed. Also add BUG_ON to track a possible __get_service_fd
change which can break this check again.

We have a problem when USERNSD_SK replaces LOG_FD_OFF: later, when
writing to the log, we actually send crazy commands to usernsd instead,
which fails to handle them and BUGs or crashes.

https://jira.sw.ru/browse/PSBM-83472

Also we had a similar problem when __userns_call received a bad response,
likely it has the same background:

https://api.travis-ci.org/v3/job/352164661/log.txt

fixes commit 129bb14611c3 ("files: Prepare clone_service_fd() for
overlaping ranges.")

v2: move BUG_ON to main() to check it only once, use min+1 and max-1

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-21 02:43:23 +03:00
Andrei Vagin
2600d6e76d files: drop O_TMPFILE from file descriptor flags
Unnamed temporary files are restored as ghost files.

If O_TMPFILE is set for the open() syscall, the pathname argument
specifies a directory, but criu gives a path to a ghost file.

(00.107450)     36: Error (criu/files-reg.c:1757): Can't open file tmp/#42274874 on restore: Not a directory

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-21 02:40:44 +03:00
Andrei Vagin
9b2ecd4948 zdtm: add a test to check O_TMPFILE
man 2 open:
...
O_TMPFILE (since Linux 3.11)

Create an unnamed temporary file. The pathname argument specifies a
directory; an unnamed inode will be created in that directory's
filesystem. Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.
...

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-21 02:40:30 +03:00
Andrei Vagin
c9c4baa3eb jenkins: add a pipeline file for criu-lazy-migration
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-21 02:25:20 +03:00
Radoslaw Burny
230005ca79 sfds: Fix UB in choose_service_fd_base due to calling __builtin_clz(0)
__builtin_clz(0) leads to undefined behaviour:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

Set nr = 1 directly to avoid this.

Link: https://github.com/checkpoint-restore/criu/issues/470
Signed-off-by: Radoslaw Burny <rburny@google.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:36:47 +03:00
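The undefined behaviour fixed in the commit above comes from passing 0 to `__builtin_clz()`. The sketch below shows the usual guard on a hypothetical power-of-two rounding helper; it does not reproduce what choose_service_fd_base actually computes, and it depends on the GCC/Clang builtin.

```c
#include <limits.h>

/*
 * Rounds n up to the next power of two. Inputs 0 and 1 are answered
 * directly so that __builtin_clz() is never called with 0. Returns 0 if
 * the result would not fit in an unsigned int.
 */
static unsigned int sketch_round_up_pow2(unsigned int n)
{
	const unsigned int bits = sizeof(unsigned int) * CHAR_BIT;

	if (n <= 1)
		return 1; /* never reaches __builtin_clz(0) */
	if (n > 1U << (bits - 1))
		return 0; /* result would overflow */

	return 1U << (bits - (unsigned int)__builtin_clz(n - 1));
}
```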
Mike Rapoport
2bedf8d995 lazy-pages: don't try to uffd_copy to removed memory regions
It is possible that when pages requested from the remote source arrive, part
of the memory range covered by the request is already gone because of
madvise(MADV_DONTNEED), mremap() etc.
Ensure we are not trying to uffd_copy more than we are allowed.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
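Limiting a copy to the part of a request that is still mapped comes down to a range clamp. The sketch below is an assumption about how such a check could look; the helper name and its parameters are hypothetical and it is not the code added by the commit above.

```c
#include <stdint.h>

/*
 * Returns how many bytes, starting at req_start, are still covered by the
 * mapping [map_start, map_start + map_len). Returns 0 if the start of the
 * request is no longer mapped at all.
 */
static uint64_t sketch_clamp_copy_len(uint64_t req_start, uint64_t req_len,
				      uint64_t map_start, uint64_t map_len)
{
	uint64_t map_end = map_start + map_len;

	if (req_start < map_start || req_start >= map_end)
		return 0;
	if (req_len > map_end - req_start)
		return map_end - req_start; /* tail was unmapped meanwhile */
	return req_len;
}
```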
Mike Rapoport
9cb20327aa lazy-pages: return to epoll_wait after completing forks
If we get a fork() event just before transferring the last IOV of the parent
process, continuing to background fetch after completing fork event
handling will cause the lazy-pages daemon to exit and nothing will monitor the
child process memory.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
Mike Rapoport
dd3704a644 lazy-pages: update events handling to take requests into account
Since the memory mapping is now split between ->iovs and ->reqs lists, any
update to memory layout should take into account both lists.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
Mike Rapoport
d4d09942de lazy-pages: cache buffer size in the lazy_pages_info
Instead of recalculating the size required for lazy_pages_info->buf when copying
IOVs at fork() time, keep the size of the buffer in the lazy_pages_info
struct.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
Mike Rapoport
9fe912f974 lazy-pages: handle_requests: fix return value propagation
When we return from epoll_run_rfds with a positive return value it means that
the event handling loop was interrupted because the event should be handled
outside of that loop. This is always the case with UFFD_EVENT_FORK.

It may happen that the event occurred after we've completed the memory
transfer and we are on the way to successful return from the
handle_requests() function, but instead of returning 0 we will return the
positive value we've got from epoll_run_rfds.

Explicitly assigning return value of complete_forks() fixes this issue.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
Mike Rapoport
12cc41f671 lazy-pages: merge_iov_lists: fix corner case of empty destination
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
Mike Rapoport
48d9930055 lazy-pages: introduce merge_iov_lists helper
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:30:57 +03:00
Mike Rapoport
85fd6abea4 test: lazy-pages: exclude maps007
With userfaultfd we cannot reliably service process_vm_readv calls. The
maps007 test that uses these calls passed previously by sheer luck.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-19 23:26:34 +03:00