2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 22:05:36 +00:00
Commit Graph

9606 Commits

Author SHA1 Message Date
Cyrill Gorcunov
b5d9bf130b seccomp: Define log prefix
For grepability sake in logs.

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:51:16 +03:00
Cyrill Gorcunov
9b1238384e seccomp: compel -- Add PTRACE_SECCOMP_GET_METADATA definition
We will use it to figure out if filter log target is used.
Metadata associated with seccomp filter is relatively new
feature which allows userspace to get and set it back.

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:51:16 +03:00
Andrei Vagin
35e8a3446f travis: build criu and run tests on centos
It is one of our target platforms.

Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:05:15 +03:00
Dmitry Safonov
c37583942c ppc64/aarch64: Dynamically define PAGE_SIZE
On ppc64/aarch64 Linux can be set to use Large pages, so the PAGE_SIZE
isn't build-time constant anymore. Define it through _SC_PAGESIZE.

There are different sizes for a page on ppc64:
: #if defined(CONFIG_PPC_256K_PAGES)
: #define PAGE_SHIFT              18
: #elif defined(CONFIG_PPC_64K_PAGES)
: #define PAGE_SHIFT              16
: #elif defined(CONFIG_PPC_16K_PAGES)
: #define PAGE_SHIFT              14
: #else
: #define PAGE_SHIFT              12
: #endif

And on aarch64 there are default sizes and possibly someone can set his
own PAGE_SHIFT:
: config ARM64_PAGE_SHIFT
:         int
:         default 16 if ARM64_64K_PAGES
:         default 14 if ARM64_16K_PAGES
:         default 12

On the downside - each time we need PAGE_SIZE, we're doing libc
function call on aarch64/ppc64.

Fixes: #415

Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
9457b3169d criu/proc: Define BUF_SIZE without PAGE_SIZE dependency
PAGE_SIZE will be a variable value on platforms where it can be
different due to large pages.
And looks like (c) there is no reason for BUF_SIZE == PAGE_SIZE,
so let's keep it as it was, rather than complicating it with dynamic
allocation for the buffer.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
f8064a7413 criu/log: Define log buffer length without PAGE_SIZE
The same value, but as PAGE_SIZE can be different for the same
platform - it's no more static value.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
14c0abb1da criu/dump: Fix size of personality buffer
Personality value is printed in kernel like this:
static int proc_pid_personality(/* .. */)
{
        int err = lock_trace(task);
        if (!err) {
                seq_printf(m, "%08x\n", task->personality);
                unlock_trace(task);
        }
        return err;
}

So, we don't need a whole page to read the value.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
0f98ee4641 compel/criu: Add ARCH_HAS_LONG_PAGES to PIE binaries
For architectures like aarch64/ppc64 it's needed to propagate the size
of page inside PIEs. For the parasite page size will be defined during
seizing, and for restorer during early initialization.
Afterward we can use PAGE_SIZE in PIEs like we did before.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
37c5b64e0e aio: Allow expressions in NR_IOEVENTS_IN_PAGES macro
The macro is used only in aio_estimate_nr_reqs():
unsigned int k_max_reqs = NR_IOEVENTS_IN_NPAGES(size/PAGE_SIZE);

Which compiler may evaluate as (((PAGE_SIZE*size)/PAGE_SIZE) - ...)
It works as long as PAGE_SIZE is long.
The patches set converts PAGE_SIZE to use sysconf() returning
(unsigned), non-long type and making the aio macro overflowing.

I do not see any value making PAGE_SIZE (unsigned long) typed.

Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
454fe3e7c0 parasite: Rename misnamed nr_pages
It's actually number of bytes spliced, not pages.
And I bet (unsigned long) suits the purpose more than (int).

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Dmitry Safonov
2ea16be202 criu: Remove PAGE_IMAGE_SIZE
It's unused since commit fd3f33f5d2 ("headers: image.h -- Drop unused
entries"), so let's remove it completely.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 03:03:01 +03:00
Andrei Vagin
cc88159f8d zdtm: mount tmpfs into /run in a test root
iptables creates /run/xtables.lock file and
we want to have it per-test.

(00.332159)      1: 	Running iptables-restore -w for iptables-restore -w
Fatal: can't open lock file /run/xtables.lock: Permission denied

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 02:20:48 +03:00
Andrei Vagin
839875a442 zdtm/socket-ext: clean up test files
Reported-by: Dmitry Safonov <dima@arista.com>
Cc: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 02:16:27 +03:00
Andrei Vagin
1bbbd745ce fs: take into account that cr_system overrides standard descriptors
(00.027596) mnt: Dumping mountpoints
(00.027600) mnt: 	90: 2f:/ @ ./dev/pts
(00.027614) mnt: 	130: 2e:/ @ ./run
tar: /proc/self/fd/1: Cannot open: Not a directory
tar: Error is not recoverable: exiting now

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-15 02:16:00 +03:00
Andrei Vagin
b0153d7ed8 config: check that there are not unhandled parameters in a config
Return an error if we meet unexpected parameters in a config file

Cc: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 06:35:46 +03:00
Andrei Vagin
849183591b config: initialize the last element of a config argv as NULL
Now we rely on scanf, that it will initializes a pointer to NULL, when
it fails to parse a string, but I can't find in a man page, that it has
to do this.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 06:35:46 +03:00
Andrei Vagin
133ff575db config: skip spaces at a beginning of lines
Otherwise lines started with spaces are ignored.

Cc: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 06:35:46 +03:00
Andrei Vagin
47288993ea net: workaround a problem when iptables can't open /run/xtables.lock
Starting with iptables 1.6.2, we have to use the --wait option,
but it doesn't work properly with userns, because in this case,
we don't have enough rights to open /run/xtables.lock.

(00.174703)      1: 	Running iptables-restore -w for iptables-restore -w Fatal: can't open lock file /run/xtables.lock: Permission denied
(00.192058)      1: Error (criu/util.c:842): exited, status=4
(00.192080)      1: Error (criu/net.c:1738): iptables-restore -w failed
(00.192088)      1: Error (criu/net.c:2389): Can't create net_ns
(00.192131)      1: Error (criu/util.c:1567): Can't wait or bad status: errno=0, status=65280

This patch workarounds this problem by mounting tmpfs into /run.
Net namespaces are restored in a separate process, so we can create a
new mount namespace and create new mounts.

https://github.com/checkpoint-restore/criu/issues/469

Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 06:35:19 +03:00
Andrei Vagin
938c966ad3 zdtm: create /run in a test root
iptables 1.6.2 fails without /run, because it tries to create
the /run/xtables.lock file

Test output: ================================
Fatal: can't open lock file /run/xtables.lock: No such file or directory
23:29:06.098:     4: ERR: netns-nf.c:21: Can't set input rule (errno = 2 (No such file or directory))
23:29:06.098:     3: ERR: test.c:315: Test exited unexpectedly with code 255

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-12 06:35:19 +03:00
Cyrill Gorcunov
10a9d9dd2d log: pr_warn_once -- Fix formatting
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 03:31:32 +03:00
Mike Rapoport
a50ae653b2 Revert "travis: temporarily disable lazy-pages testing"
This reverts commit 4459590d5f.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:37:01 +03:00
Mike Rapoport
850095dad4 lazy-pages: make uffd_io_complete more robust
Make sure we handle various corner cases:
* we received less pages than requested
* the request was capped because of unmap/remap etc
* the process has exited underneath us

Currently we are freeing the request once we've found the address to use
with uffd_copy(). Instead, let's keep the request object around, use it to
properly calculate number of pages we pass to uffd_copy() and then re-add
tailing range (if any) to the IOVs list.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:37:01 +03:00
Mike Rapoport
00ce9b52aa lazy-pages: factor out insertion to sorted IOV list
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:37:01 +03:00
Mike Rapoport
9124dbd7f9 criu/config: reduce the number of argv traversals
Instead of pre-parsing command line twice, one time to detect -h/--help and
another time to find config file parameter, check for both in one pass.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:20:55 +03:00
Mike Rapoport
81621ce14c criu/config: allow init_config properly handle -h/--help
When config parsing was split into a separate part the handling of
-h/--help option during init_config was broken. Fix it.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:20:55 +03:00
Mike Rapoport
cc101ae63b criu/config: rename variables counting options in config files
s/first_count/global_cfg_argc
s/second_count/user_cfg_argc

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-10 00:20:55 +03:00
Adrian Reber
4386b324b3 Fix building unlink_fstat00 unlink test case
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 23:59:58 +03:00
Mike Rapoport
253061634e criu: fix 'criu --version'
Currently kerndat_init() runs before command line parsing and running
simple 'criu --version' command may produce something like:

Warn  (criu/kerndat.c:847): Can't load /run/criu.kdat
Error (criu/util.c:842): exited, status=3
Error (criu/util.c:842): exited, status=3
Write 4294967295 to /proc/self/loginuid failed: Operation not permittedWarn
(criu/net.c:2732): Unable to get socket network namespace
Warn  (criu/net.c:2732): Unable to get tun network namespace
Warn  (criu/sk-unix.c:213): sk unix: Unable to open a socket file:
Operation not permitted
Error (criu/net.c:3023): Unable create a network namespace: Operation not
permitted
Warn  (criu/net.c:3069): NSID isn't reported for network links
Version: 3.6
GitID: v3.6-611-g0b27d0a

Group early calls to kerndat_* and init_service_fd calls into a function
and call this function after the command line parsing is finished.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:47 +03:00
Mike Rapoport
5f18a6e804 criu: split configuration parsing into separate file
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:47 +03:00
Joel Nider
0c41a4208b compel: std_vprint_num returns a null-terminated string
This function is an analogue to vsprintf(), and is used in very much the
same way. The caller expects the modified string pointer to be pointing to
a null-terminated string.

Signed-off-by: Joel Nider <joeln@il.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:47 +03:00
Andrei Vagin
3bc9e95e49 travis: rollback to fedora 27
We have a few issues with fc28. For example:
https://github.com/checkpoint-restore/criu/issues/469

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-09 10:17:38 +03:00
Joel Nider
21b754921e criu/restorer: print valid string when set last_pid fails
The string returned by std_vprint_num() is right-aligned in the buffer.
Therefore, we must print the string starting from the pointer returned in
the 'ps' argument, and not from the start of the original buffer.

Signed-off-by: Joel Nider <joeln@il.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-08 00:32:30 +03:00
Mike Rapoport
8aba610bd3 lazy-pages: fork: fix duplication of IOV lists
Instead of merging unfinished requests with child's IOVs we queued them
into parent's IOV list. Fix it.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-08 00:27:20 +03:00
Mike Rapoport
e77df5abd8 lazy-pages: actually return to epoll_wait after completing forks
Commit 9cb20327aa ("return to epoll_wait after completing forks") was only
half way there. Adding the other half.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-05-02 03:26:32 +03:00
Andrei Vagin
43c3772334 zdtm: include lock.h after zdtmtst.h
In file included from s390x_gs_threads.c:10:0:
../lib/lock.h: In function 'mutex_lock':
../lib/lock.h:148:4: error: implicit declaration of function 'pr_perror' [-Werror=implicit-function-declaration]
    pr_perror("futex");

Reported-by: Mr Jenkins

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 02:10:59 +03:00
Dmitry Safonov
4439454088 make: Don't set $(MAKEFLAGS)
We shouldn't set MAKEFLAGS by the following reasons:
1. User may want to specify some make parameter (e.g., `-d` for debug)
2. We lose parallel build. No `-j` is passed to submake and it looks
   like, gnu/make will not deal with parallel recursive make if
   $(MAKEFLAGS) is unset back.
   Easy to verify: Add `sleep 3` to build rule in Makefile.inc and
   you'll find only one sleep process at a time. After the patch
   if you specify say `-j5` to make - you'll have 5 sleep processes.

Reverts: commit e9beed7bb3 ("build: zdtm -- Add implicit rules into
zdtm building").

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:35:49 +03:00
Dmitry Safonov
c574c2883f test/make: Drop implicit make variables
Let's drop usage of COMPILE.c, OUTPUT_OPTION.
It will allow run submake with -R.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:35:49 +03:00
Dmitry Safonov
4cbd3ac447 nmk: Don't redefine MAKEFLAGS
$(MAKEFLAGS) already contains -r -R and --no-print-directory: those
flags are being added in include.mk.. which is included two lines above.
There is no comment and I see no big sense in erasing $(MAKEFLAGS),
rather than adding those flags. So I considered this as a typo.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:35:49 +03:00
Andrei Vagin
0771942c73 zdtm: always run criu dump with --track-mem if --snaps is set
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:30:21 +03:00
Andrei Vagin
45620418e4 inventory: save dump_uptime for criu dump if track_mem is set
A set of images from criu dump can be used as a previous point, when we
are doing snapshots. In this case, each point contains a full set of
images.

https://github.com/checkpoint-restore/criu/issues/479

v2: return -1 if invertory_save_uptime failed

Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:30:21 +03:00
Andrei Vagin
297aa5b428 mem: return -1 from __parasite_dump_pages_seized in error cases
Here is one of often mistakes:

int funcX()
{
	int ret;

	ret = funcA()
	if (ret < 0)
		goto err;

	if (smth)
		goto err; // return 0 !!!!

err:
	return ret;
}

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-30 00:30:21 +03:00
Kirill Tkhai
9ddf9a224a helper: Move service fds closing code to restore_one_helper()
There is no reasons we need this cleanup code in generic
restore_one_task(), so let's move it for better readability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 18:09:04 +03:00
Pavel Tikhomirov
c32c7371b9 zdtm: check that pid-reuse does not break iterative memory dump
The idea of the test is:

1) mmap separate page and put variable there, so that other usage does
not dirty these region. Initialize the variable with VALUE_A.

2) fork a child with special pid == CHILD_NS_PID. Only if it is a first
child overwrite the variable with VALUE_B.

3) wait for the end of the next predump or end of restore with
test_wait_pre_dump_ack/test_wait_pre_dump pair and kill our child.

Note: The memory region is "clean" in parent.

4) goto (2) unles end of cr is reported by test_waitpre

So on first iteration child with pid CHILD_NS_PID was dumped with
VALUE_B, on all other iterations and on final dump other child with the
same pid exists but with VALUE_A. But on all iterations after the first
one we have these memory region "clean". So criu before the fix would
have restored the VALUE_B taking it from first child's image, but should
restore VALUE_A.

Note: Child in its turn waits termination and performs a check that variable
value doesn't change after c/r.

We should run the test with at least one predump to trigger the problem:

[root@snorch criu]# ./test/zdtm.py run --pre 1 -k always -t zdtm/transition/pid_reuse
Checking feature ns_pid
Checking feature ns_get_userns
Checking feature ns_get_parent

=== Run 1/1 ================ zdtm/transition/pid_reuse

===================== Run zdtm/transition/pid_reuse in ns ======================
DEP       pid_reuse.d
CC        pid_reuse.o
LINK      pid_reuse
Start test
Test is SUID
./pid_reuse --pidfile=pid_reuse.pid --outfile=pid_reuse.out
Run criu pre-dump
Send the 10 signal to  52
Run criu dump
Run criu restore
Send the 15 signal to  73
Wait for zdtm/transition/pid_reuse(73) to die for 0.100000
Test output: ================================
14:47:57.717: 11235: ERR: pid_reuse.c:76: Wrong value in a variable after restore
14:47:57.717:     4: FAIL: pid_reuse.c:110: Task 11235 exited with wrong code 1 (errno = 11 (Resource temporarily unavailable))

<<< ================================

https://jira.sw.ru/browse/PSBM-67502

v3: simplify waitpid's status check
v9: switch to test_wait_pre_dump(_ack)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
348c39bf14 zdtm/lib: add pre-dump-notify test flag
If pre-dump-notify flag is set, zdtm sends a notify to the test after
pre-dump was finished and waits for the test to send back a reply that
test did all it's work and now is ready for a next pre-dump/dump.

How it can be used:

while (!test_wait_pre_dump()) {
	/* Do something after predump */
	test_wait_pre_dump_ack();
}
/* Do something after restore */

Internally we open two pipes for the test one for receiving notify (with
two open ends) and one for replying to it (only write end open). Fds of
pipes are dupped to predefined numbers and zdtm opens these fds through
/proc/<test-pid>/fd/{100,101} and communicates with the test.

v9: switch to two way interface to remove race then operation we try to
run after predump may be yet unfinished at the time of next dump.

Suggested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
6d281417a5 memory: don't use parent memdump if detected possible pid reuse
We have a problem when a pid is reused between consequent dumps we can't
understand if pagemap and pages from images of parent dump are invalid
to restore these pid already. That can lead even to wrong memory
restored for these pid, see the test in last patch.

So these is a try do separate processes with (likely) invalid previous
memory dump from processes with 100% valid previous dump.

For that we use the value of /proc/<pid>/stat's start_time and also the
timestamp of each (pre)dump. If the start time is strictly less than the
timestamp, that means that the pagemap for these pid from previous dump
is valid - was done for exactly the same process.

Creation time is in centiseconds by default so if predump is really fast
(<1csec) we can have false negative decisions for some processes, but in
case of long running processes we are fine.

https://jira.sw.ru/browse/PSBM-67502

v2: remove __maybe_unused for get_parent_stats; fix get_parent_stats to
have static typing; print warning only if unsure; check has_dump_uptime
v3: read parent stats from image only once; reuse stat from previous
parse_pid_stat call on dump
v4: move code to function; use unsigned long long for ticks; put
proc_pid_stat on mem_dump_ctl; print warning on all pid-reuse cases
v5: free parent's stats entry properly, pass it in arguments to
(pre_)dump_one_task
v6: free parent's stats in error path too
v7: zero init parent_se
v8: improve error message
v9: switch to inventory image from stats, if pid-reuse fails - fail
current dump

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
f6d454b4ce inventory: add a helper to get entry of parent pre-dump
will be used in the next patch

https://jira.sw.ru/browse/PSBM-67502

note: actually we need only one value from inventory entry but I still
prefer general helper as we still need to read and allocate memory
for the whole structure

v2: fix get_parent_stats to have static typing
v3: simplify get_parent_stats to return a StatsEntry pointer instead of
doing it through arguments
v8: replace errors with warnings, we should whatch on them only if we
have corresponding error in detect_pid_reuse else they are fine
v9: change stats to inventory image

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
688ffa50f1 inventory: save uptime to know when dump had happened
We want to use a simple fact: If we have an alive process in a pstree we
want to dump, and a starttime of that process is less than pre-dump's
timestamp (taken while all processes were freezed), then these exact
process existed (100% sure) at the time of these pre-dump and the
process' memory was dumped in images.

So save inventory image on pre-dump and put there an uptime.

https://jira.sw.ru/browse/PSBM-67502

v9: improve comment, put uptime to ivnentory image as 1) where is no
stats in parent images directory if --work-dir option is set to
something different then images directory, 2) stats-dump is not an image
and it is a bad practice to put there data required for restoring.
v10:s/u_int64_t/uint64_t/

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
da21d2d17a parse: add a helper to obtain an uptime
will be used in the next patch

https://jira.sw.ru/browse/PSBM-67502

note: man for /proc/uptime says that uptime is in seconds and for now
the format is "seconds.centiseconds", where ecentiseconds is 2 digits

note: now uptime is in csec but I prefer saving it in usec, that allows
us to be reuse these image field when/if we have more accurate value.

v8: add length specifier to parse only centiseconds
v9: put uptime to u_int64_t directly, define CSEC_PER_SEC
v10: switch to uint64_t from u_int64_t, comment about usec in image

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
9b0003676a Revert "parse: add a helper to obtain an uptime"
This reverts commit cf2f035d9f.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00
Pavel Tikhomirov
ae6de318e5 Revert "stats: save uptime to know when dump had happened"
Leave dump_uptime in stats file for backward and forward compatibility
though it is unused now.

This reverts commit fbba4d249a.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-04-27 01:04:15 +03:00