We have a problem when a pid is reused between consequent dumps we can't
understand if pagemap and pages from images of parent dump are invalid
to restore these pid already. That can lead even to wrong memory
restored for these pid, see the test in last patch.
So these is a try do separate processes with (likely) invalid previous
memory dump from processes with 100% valid previous dump.
For that we use the value of /proc/<pid>/stat's start_time and also the
timestamp of each (pre)dump. If the start time is strictly less than the
timestamp, that means that the pagemap for these pid from previous dump
is valid - was done for exactly the same process.
Creation time is in centiseconds by default so if predump is really fast
(<1csec) we can have false negative decisions for some processes, but in
case of long running processes we are fine.
https://jira.sw.ru/browse/PSBM-67502
v2: remove __maybe_unused for get_parent_stats; fix get_parent_stats to
have static typing; print warning only if unsure; check has_dump_uptime
v3: read parent stats from image only once; reuse stat from previous
parse_pid_stat call on dump
v4: move code to function; use unsigned long long for ticks; put
proc_pid_stat on mem_dump_ctl; print warning on all pid-reuse cases
v5: free parent's stats entry properly, pass it in arguments to
(pre_)dump_one_task
v6: free parent's stats in error path too
v7: zero init parent_se
v8: improve error message
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
will be used in the next patch
https://jira.sw.ru/browse/PSBM-67502
note: actually we need only one value from stats entry but I still
prefer general helper as we still need to read and allocate memory
for the whole structure
v2: fix get_parent_stats to have static typing
v3: simplify get_parent_stats to return a StatsEntry pointer instead of
doing it through arguments
v8: replace errors with warnings, we should whatch on them only if we
have corresponding error in detect_pid_reuse else they are fine
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We want to use a simple fact: If we have an alive process in a pstree we
want to dump, and a starttime of that process is less than pre-dump's
timestamp, then these exact process existed (100% sure) at the time of
these pre-dump and the process' memory was dumped in images. Thus we
save uptime while in freezed state else these won't work.
https://jira.sw.ru/browse/PSBM-67502
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
will be used in the next patch
https://jira.sw.ru/browse/PSBM-67502
note: man for /proc/uptime says that uptime is in seconds and for now
the format is "seconds.centiseconds", where ecentiseconds is 2 digits
v8: add length specifier to parse only centiseconds
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Child may see close() result before it receives signal,
while it shouldn't see it. Instead of games with later
close(), just stop do it. sys_exit() after program finish
will close them all.
Reported-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Currently we save only attributes with non-zero values. For example,
a default value for IFLA_IPTUN_PROTO is IPPROTO_IPV6 (41), so we have to
save even attributes with zero values.
https://github.com/checkpoint-restore/criu/issues/445
Fixes: 4a044e6af93f ("net: Dump regular sit device")
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
x86's kerndat section in crtools.c has grown too much.
Let's make it more readable and *looking at cleared include-list*,
it'll better parallelize build.
Maybe we should turn __weak function into 0-defines.
Or clean 0-defines with ifdefs in generic file.
I have no strong opinion on that.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Show error message when image-{cache,proxy} is called without --port
and image-proxy without --address argument.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
The --lazy-migrate option allows testing of lazy migration when running ns
or uns flavor.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
CID 85039 (#1 of 1): Unchecked return value (CHECKED_RETURN)
6. check_return: Calling ptrace without checking return value (as is done elsewhere 44 out of 49 times).
CID 155804 (#1 of 1): Unchecked return value (CHECKED_RETURN)
2. check_return: Calling umount2 without checking return value (as is done elsewhere 8 out of 9 times).
CID 73358 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
8. negative_returns: sk is passed to a parameter that cannot be negative. [hide details]
CID 154076 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
1. check_return: Calling setsockopt(sk, 6, 1, &val, 4U) without checking return value. This library function may fail and return an error code.
CID 161693 (#1 of 1): Resource leak (RESOURCE_LEAK)
5. leaked_storage: Variable new going out of scope leaks the storage it points to.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
*** CID 179043: (USE_AFTER_FREE)
close bfd fd safe so that we won't have double close
*** CID 179041: Resource leaks (RESOURCE_LEAK)
don't forget to close fd on error
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
*** CID 185302: Null pointer dereferences (NULL_RETURNS)
/test/zdtm/static/cgroup_ifpriomap.c: 107 in read_one_priomap()
>>> Dereferencing a pointer that might be null "out->ifname" when calling "strncpy".
There is also a warning about using rand(), but..
Not sure that we need to entangle everything just for pleasing Coverity:
>>> CID 185301: Security best practices violations (DC.WEAK_CRYPTO)
>>> "rand" should not be used for security related applications, as linear congruential algorithms are too easy to break.
Leaving that as-is and marking in Coverity as WONTFIX.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
I've also dropped `noauto' in this patch, reverting the
commit be98273cf137 ("zdtm: mark static/cgroup_ifpriomap as noauto")
Don't see any sense to separate it as another patch.
Fixes: #383
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Currently, if pipe is shared between lazy and non-lazy PPBs lazy migration
fails because data that should be transfered on demand is spliced into the
images. Preventing pipe sharing between PPBs of different type resolves
this issue.
In order to still minimize pipe fragmentation, we track the last pipe that
was used for certain PPB type and re-use it for the PPB of the same type.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
As it's aligned to 16, all structures that contain it should be
also aligned to 16. In the kernel there is no such align as
there two separate definitions of i387_fxsave_struct:
one for ia32 and another for x86_64.
Fixes newly introduced align warning in gcc-8.1:
In file included from compel/include/uapi/compel/asm/sigframe.h:7,
from compel/plugins/std/infect.c:13:
compel/include/uapi/compel/asm/fpu.h:89:1: error: alignment 1 of 'struct xsave_struct_ia32' is less than 16 [-Werror=packed-not-aligned]
} __packed;
^
It doesn't change the current align of the struct, as containing
structures are __packed and it aligned already *by fact*.
It only affects the function users of the struct's local variables:
now they lay aligned on a stack.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Ugh, I've spent 25 mins at 4 A.M. to figure out why the tests are failing.
And the reason is stupied me, who defined a new flag after 0x8
as 0x16, not as 0x10. Simplify those definitions for such simple-minded
living creatures like Dima.
Signed-off-by: Dmitry Safonov <dima@arista.com>
On Skylake processors and kernel older than v4.14
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
may return not full xstate, ommiting FP part (that is XFEATURE_MASK_FP).
There is a patch which describes this bug:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1318800.html
Anyway, it's fixed in v4.14 kernel by (what we believe with Andrey) this:
https://patchwork.kernel.org/patch/9567939/
As we still support kernels from v3.10 and newer, we need to have a
workaround for this kernel bug on Skylake CPUs.
Big thanks to Shlomi for the reports, the effort and for providing an
Amazon VM to test this. I wish more bug reporters were like you.
Reported-by: Shlomi Matichin <shlomi@binaris.com>
Provided-test-env: Shlomi Matichin <shlomi@binaris.com>
Investigated-with: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Mere cleanup. For Skylake workaround I'll call one after another,
so it's better separate it in a small helpers.
Signed-off-by: Dmitry Safonov <dima@arista.com>
get_task_regs() needs to know if it needs to use workaround
for a Skylake ptrace() bug. The next patch will introduce a
new flag for that.
I also thought about making 3 versions of get_task_regs() and
adding them to ictx->get_task_regs() depending on the flags..
But get_task_regs() is a private function and infect_ctx is
a uapi.. So, let's just pass context flags to get_task_regs().
Signed-off-by: Dmitry Safonov <dima@arista.com>
As we anyway define save_regs_t for other purposes,
use it in the function declaration.
To unify infect_ctx style, add make_sigframe_t.
Mere cleanup.
Signed-off-by: Dmitry Safonov <dima@arista.com>
We need to know if ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
returns xsave without FP state part.
Signed-off-by: Dmitry Safonov <dima@arista.com>
We ignore restore_one_*notify() error code, while we mustn't.
Make open function fail when we can't restore them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Imagine, we have to restore inotify with watch descriptor 0x34d71d6.
Then we have:
1.235021 5578: fsnotify: Watch got 0x1 but 0x34d71d6 expected
...
...
527.378042 5578: fsnotify: Watch got 0x34d71d3 but 0x34d71d6 expected
527.378042 5578: fsnotify: Watch got 0x34d71d4 but 0x34d71d6 expected
527.378042 5578: fsnotify: Watch got 0x34d71d5 but 0x34d71d6 expected
Stop doing this and stop generating GBs of debug messages.
We already have print message before restore_one_inotify().
Let's add just one more after it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
zdtm: Add scm06 test
From: Kirill Tkhai <ktkhai@virtuozzo.com>
This test makes looped unix sockets queues and tries
to iterate over them after the restore.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Everything is ready. Message queue restores are in
the second stage of open for all types of unix sockets.
We just need to make scm wait before restore_unix_queue()
and allow to dump such scm context.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
When we allow unix sockets sent over unix sockets,
dump_sk_queue() may dump and resolve some peers.
So, we need run it firstly and avoid linking our
peer_node to peer's peer_list.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Similar to previous patch, this makes the second end
of dgram socketpair to be open till post open. This
allows to delay restore of message queue.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
This makes the second end of socketpair to live till post_open.
We need it alive if we want to restore message queue later.
Otherwise, we do not have a queuer, which fd is used to actually
write messages.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
This function will be used to allocate id for fake files
(don't confuse with fake fds, e.g. fles).
Suggested-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
restore_sk_common() may shutdown a socket, and queuer
won't be able to connect to it. So, this action must
be postponed.
We have this problem since long ago, but we are lucky
we haven't bumped in it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Actually, there is no functional changes. We just postpone
restore of the queues. This will be used in the further
patches to restore unix sockets sent over unix sockets.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
After previous patch, master and slave ends of socketpair
are owned by the only task. So, we may avoid using
of send_desc_to_peer() of the second end, and just
reopen it with right pid.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>