mirror of https://github.com/checkpoint-restore/criu synced 2025-08-26 03:47:35 +00:00

9635 Commits

Author SHA1 Message Date
Pavel Tikhomirov
ffd415a5b5 memory: don't use parent memdump if detected possible pid reuse
When a pid is reused between consecutive dumps, we cannot tell whether
the pagemap and pages in the parent dump's images are still valid for
restoring that pid. This can even lead to wrong memory being restored
for that pid; see the test in the last patch.

So this is an attempt to separate processes with a (likely) invalid
previous memory dump from processes with a 100% valid previous dump.

For that we use the start_time value from /proc/<pid>/stat and the
timestamp of each (pre)dump. If the start time is strictly less than the
timestamp, the pagemap for this pid from the previous dump is valid: it
was made for exactly the same process.

Creation time is in centiseconds by default, so if a predump is really
fast (<1csec) we can get false negative decisions for some processes,
but for long-running processes we are fine.
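
The comparison described above can be sketched roughly as follows. This is
an illustration only, not the code from this commit: the helper names
csec_to_ticks() and parent_dump_is_trusted() are hypothetical, and the
sketch assumes the (pre)dump timestamp is an uptime in centiseconds while
start_time is counted in clock ticks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert an uptime given in centiseconds into
 * clock ticks, so that it can be compared with a start_time value that
 * is expressed in ticks.
 */
static uint64_t csec_to_ticks(uint64_t uptime_csec, uint64_t ticks_per_sec)
{
	return uptime_csec * ticks_per_sec / 100;
}

/*
 * Hypothetical helper: the parent dump is trusted only if the process
 * started strictly before the parent (pre)dump was taken. An equal
 * value is treated as possible pid reuse, because the two events cannot
 * be ordered at this resolution.
 */
static bool parent_dump_is_trusted(uint64_t start_time_ticks,
				   uint64_t parent_dump_ticks)
{
	return start_time_ticks < parent_dump_ticks;
}
```

Treating the equal case as untrusted is the conservative choice: it may
discard a valid parent dump, but it never reuses memory that belonged to
another process.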

https://jira.sw.ru/browse/PSBM-67502

v2: remove __maybe_unused for get_parent_stats; fix get_parent_stats to
have static typing; print warning only if unsure; check has_dump_uptime
v3: read parent stats from image only once; reuse stat from previous
parse_pid_stat call on dump
v4: move code to function; use unsigned long long for ticks; put
proc_pid_stat on mem_dump_ctl; print warning on all pid-reuse cases
v5: free parent's stats entry properly, pass it in arguments to
(pre_)dump_one_task
v6: free parent's stats in error path too
v7: zero init parent_se
v8: improve error message

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2018-02-28 22:57:30 +03:00
Pavel Tikhomirov
4a43486e24 stats: add a helper to get stats of parent pre-dump
will be used in the next patch

https://jira.sw.ru/browse/PSBM-67502

note: actually we need only one value from stats entry but I still
prefer general helper as we still need to read and allocate memory
for the whole structure

v2: fix get_parent_stats to have static typing
v3: simplify get_parent_stats to return a StatsEntry pointer instead of
doing it through arguments
v8: replace errors with warnings; we should watch for them only if we
have a corresponding error in detect_pid_reuse, otherwise they are fine

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2018-02-28 22:57:30 +03:00
Pavel Tikhomirov
fbba4d249a stats: save uptime to know when dump had happened
We want to use a simple fact: if we have a live process in the pstree we
want to dump, and the start time of that process is less than the
pre-dump's timestamp, then this exact process existed (100% sure) at the
time of that pre-dump and the process's memory was dumped in the images.
Thus we save the uptime while in the frozen state, otherwise this won't
work.

https://jira.sw.ru/browse/PSBM-67502

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2018-02-28 22:57:30 +03:00
Pavel Tikhomirov
cf2f035d9f parse: add a helper to obtain an uptime
will be used in the next patch

https://jira.sw.ru/browse/PSBM-67502

note: the man page for /proc/uptime says that uptime is in seconds, and
for now the format is "seconds.centiseconds", where centiseconds is 2 digits

v8: add length specifier to parse only centiseconds
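
A minimal sketch of parsing the "seconds.centiseconds" format with a
length specifier is shown below. It is not the helper added by this
commit: the function name parse_uptime_csec() and the choice to return
the value in centiseconds are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse the first field of a /proc/uptime line,
 * e.g. "12345.67 54321.00", into centiseconds.
 *
 * "%2llu" limits the fractional part to two digits, so any extra digits
 * would not be mistaken for a larger centisecond count. The sketch
 * relies on the fraction always having exactly two digits.
 */
static int parse_uptime_csec(const char *buf, uint64_t *uptime_csec)
{
	unsigned long long sec, csec;

	if (sscanf(buf, "%llu.%2llu", &sec, &csec) != 2)
		return -1;

	*uptime_csec = sec * 100 + csec;
	return 0;
}
```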

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2018-02-28 22:57:30 +03:00
Dmitry Safonov
3930040274 x86/kdat: Check PTRACE_TRACEME return value
Coverity has informed:

*** CID 188251:  Error handling issues  (CHECKED_RETURN)
/criu/arch/x86/crtools.c: 196 in kdat_x86_has_ptrace_fpu_xsave_bug_child()
190             return 0;
191     }
192     #endif
193
194     static int kdat_x86_has_ptrace_fpu_xsave_bug_child(void *arg)
195     {
>>>     CID 188251:  Error handling issues  (CHECKED_RETURN)
>>>     Calling "ptrace" without checking return value (as is done elsewhere 46 out of 51 times).
196             ptrace(PTRACE_TRACEME, 0, 0, 0);
197             kill(getpid(), SIGSTOP);
198             pr_err("Continue after SIGSTOP.. Urr what?\n");
199             _exit(1);
200     }
201

Also added checks for kill() and waitpid().

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-28 22:57:11 +03:00
Kirill Tkhai
d38e8fe965 zdtm: Fix race in zdtm/transition/epoll.c test
The child may see the result of close() before it receives the signal,
although it shouldn't. Instead of playing games with a later close(),
just stop doing it; sys_exit() after the program finishes will close
them all.

Reported-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-28 22:57:11 +03:00
Andrei Vagin
e2a892b18c net: save all attributes of sit devices
Currently we save only attributes with non-zero values. For example,
a default value for IFLA_IPTUN_PROTO is IPPROTO_IPV6 (41), so we have to
save even attributes with zero values.

https://github.com/checkpoint-restore/criu/issues/445

Fixes: 4a044e6af93f ("net: Dump regular sit device")
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-02-27 00:27:19 +03:00
Kirill Tkhai
0892cec451 net: Fix namespace fd leak in get_socket_ns()
We open ns_fd via ioctl(SIOCGSKNS), but never close. Fix that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-19 20:52:06 +03:00
Dmitry Safonov
c9e4a908a5 kerndat: Separate per-arch kerndat
x86's kerndat section in crtools.c has grown too much.
Let's make it more readable and *looking at cleared include-list*,
it'll better parallelize build.

Maybe we should turn __weak function into 0-defines.
Or clean 0-defines with ifdefs in generic file.
I have no strong opinion on that.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-19 20:49:37 +03:00
Radostin Stoyanov
ef65b164d6 crtools: image-{cache, proxy} requires address/port
Show error message when image-{cache,proxy} is called without --port
and image-proxy without --address argument.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2018-02-19 20:48:45 +03:00
Mike Rapoport
0c6dadf2f4 test/jenkins: add script for lazy migration testing
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
2018-02-19 20:37:49 +03:00
Andrei Vagin
4ef926192c zdtm: enable lazy migration testing
The --lazy-migrate option allows testing of lazy migration when running ns
or uns flavor.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
2018-02-19 20:37:49 +03:00
Andrei Vagin
802caf0252 restore: print a error if ptrace() failed
CID 85039 (#1 of 1): Unchecked return value (CHECKED_RETURN)
6. check_return: Calling ptrace without checking return value (as is done elsewhere 44 out of 49 times).
2018-02-16 22:56:36 +03:00
Andrei Vagin
04c1634dfc cgroup: print errors for umount and rmdir
CID 155804 (#1 of 1): Unchecked return value (CHECKED_RETURN)
2. check_return: Calling umount2 without checking return value (as is done elsewhere 8 out of 9 times).
2018-02-16 22:56:36 +03:00
Andrei Vagin
5ab6836b0c ns: don't dereference a null pointer
CID 181219 (#1 of 1): Dereference null return value (NULL_RETURNS)
3. dereference: Dereferencing a null pointer ns.
2018-02-16 22:56:36 +03:00
Andrei Vagin
edb5da271d soccr: don't leak memory on error paths
CID 172198 (#1 of 1): Resource leak (RESOURCE_LEAK)
9. leaked_storage: Variable sk going out of scope leaks the storage it points to.
2018-02-16 22:56:36 +03:00
Andrei Vagin
f82c078682 criu: don't call close() for a negative value
CID 73358 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
8. negative_returns: sk is passed to a parameter that cannot be negative.
2018-02-16 22:56:36 +03:00
Andrei Vagin
8feb217220 util: print all errors in a log
CID 154076 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
1. check_return: Calling setsockopt(sk, 6, 1, &val, 4U) without checking return value. This library function may fail and return an error code.
2018-02-16 22:56:36 +03:00
Andrei Vagin
34f16a4607 cgroups: don't leak memory on a error path
CID 161693 (#1 of 1): Resource leak (RESOURCE_LEAK)
5. leaked_storage: Variable new going out of scope leaks the storage it points to.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-02-16 22:56:36 +03:00
Pavel Tikhomirov
e767c69b30 criu: fix leaks detected by coverity scan part 2
*** CID 179043:    (USE_AFTER_FREE)
close the bfd fd safely so that we won't have a double close

*** CID 179041:  Resource leaks  (RESOURCE_LEAK)
don't forget to close fd on error

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2018-02-16 10:19:57 +03:00
Dmitry Safonov
3a35c7fd86 zdtm/cgroup_ifpriomap: Fix Coverity warning
*** CID 185302:  Null pointer dereferences  (NULL_RETURNS)
/test/zdtm/static/cgroup_ifpriomap.c: 107 in read_one_priomap()
>>>     Dereferencing a pointer that might be null "out->ifname" when calling "strncpy".

There is also a warning about using rand(), but..
Not sure that we need to entangle everything just for pleasing Coverity:
>>>     CID 185301:  Security best practices violations  (DC.WEAK_CRYPTO)
>>>     "rand" should not be used for security related applications, as linear congruential algorithms are too easy to break.
Leaving that as-is and marking in Coverity as WONTFIX.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
2018-02-16 04:41:59 +03:00
Dmitry Safonov
1406eb92a1 zdtm/cgroup_ifpriomap: Find cgroup's controller's name to mount
I've also dropped `noauto' in this patch, reverting the
commit be98273cf137 ("zdtm: mark static/cgroup_ifpriomap as noauto")
Don't see any sense to separate it as another patch.

Fixes: #383

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
2018-02-16 04:41:59 +03:00
Radostin Stoyanov
63534ad5a3 Remove a redundant "return" statement
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2018-02-16 04:40:01 +03:00
Mike Rapoport
e78f0a5f39 page-pipe: do not allow pipe sharing between different PPB types
Currently, if a pipe is shared between lazy and non-lazy PPBs, lazy
migration fails because data that should be transferred on demand is
spliced into the images. Preventing pipe sharing between PPBs of
different types resolves this issue.
In order to still minimize pipe fragmentation, we track the last pipe that
was used for a certain PPB type and re-use it for the next PPB of the
same type.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
2018-02-16 04:27:11 +03:00
Dmitry Safonov
1566bec9a6 zdtm/fpu02: Don't run the test on !x86 platforms
Fixes: commit 925451c12b2e ("zdtm/x86: Add a mxcsr preserving fpu test")

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
2018-02-16 01:53:22 +03:00
Andrei Vagin
0d786de5f3 criu: fix gcc-8 warnings
criu/sk-packet.c:443:3: error: 'strncpy' output may be truncated
copying 14 bytes from a string of length 15
   strncpy(addr_spkt.sa_data, req.ifr_name, sa_data_size);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/img-remote.c:383:3: error: 'strncpy' specified bound 4096
equals destination size
   strncpy(snapshot_id, li->snapshot_id, PATHLEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/img-remote.c:384:3: error: 'strncpy' specified bound 4096
equals destination size
   strncpy(path, li->name, PATHLEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/files.c:288:3: error: 'strncpy' output may be truncated copying
4095 bytes from a string of length 4096
   strncpy(buf, link->name, PATH_MAX - 1);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/sk-unix.c:239:36: error: '/' directive output may be truncated
writing 1 byte into a region of size between 0 and 4095
   snprintf(path, sizeof(path), ".%s/%s", dir, sk->name);
                                    ^
criu/sk-unix.c:239:3: note: 'snprintf' output 3 or more bytes
(assuming 4098) into a destination of size 4096
   snprintf(path, sizeof(path), ".%s/%s", dir, sk->name);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/mount.c:2563:3: error: 'strncpy' specified bound 4096 equals
destination size
   strncpy(path, m->mountpoint, PATH_MAX);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/cr-restore.c:3647:2: error: 'strncpy' specified bound 16 equals
destination size
  strncpy(task_args->comm, core->tc->comm, sizeof(task_args->comm));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
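
The warnings above fire when the bound passed to strncpy() can leave the
destination without a terminating NUL. One common way to avoid that is
sketched below; it is not necessarily how this commit fixed each call
site, and copy_truncating() is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, truncating if needed, and
 * always NUL-terminate. Returns the number of bytes copied, not
 * counting the terminator.
 */
static size_t copy_truncating(char *dst, size_t dst_size, const char *src)
{
	size_t n;

	if (dst_size == 0)
		return 0;

	n = strlen(src);
	if (n >= dst_size)
		n = dst_size - 1;	/* leave room for the terminator */

	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```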

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-02-13 10:16:36 +03:00
Andrei Vagin
634703bb7d zdtm: fix gcc-8 warnings
fs.c:78:5: error: 'strncpy' specified bound 64 equals destination size [-Werror=stringop-truncation]
     strncpy(m->fsname, fsname, sizeof(m->fsname));
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2018-02-13 10:16:36 +03:00
Dmitry Safonov
4af0bb721d compel: Explicitely align all containers of i387_fxsave_struct
As it's aligned to 16, all structures that contain it should also be
aligned to 16. In the kernel there is no such alignment, as there are
two separate definitions of i387_fxsave_struct:
one for ia32 and another for x86_64.
Fixes a newly introduced alignment warning in gcc-8.1:
In file included from compel/include/uapi/compel/asm/sigframe.h:7,
                 from compel/plugins/std/infect.c:13:
compel/include/uapi/compel/asm/fpu.h:89:1: error: alignment 1 of 'struct xsave_struct_ia32' is less than 16 [-Werror=packed-not-aligned]
 } __packed;
 ^

It doesn't change the current alignment of the struct inside its
containers, as the containing structures are __packed and it is already
aligned *in fact*. It only affects functions that use local variables of
these struct types: now they lie aligned on the stack.
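
The effect can be illustrated with a small sketch using GCC attributes.
The struct names ex_fxsave and ex_container and the helper
is_aligned_16() are hypothetical stand-ins, not the definitions from
compel's fpu.h.

```c
#include <stdint.h>

/* Stand-in for a 16-byte aligned FPU save area (hypothetical layout). */
struct ex_fxsave {
	uint8_t bytes[512];
} __attribute__((aligned(16)));

/*
 * A packed container of the struct above. Without the explicit
 * aligned(16), "packed" alone would give the container an alignment of
 * 1, which is what -Wpacked-not-aligned complains about.
 */
struct ex_container {
	struct ex_fxsave fx;
	uint8_t extra[64];
} __attribute__((packed, aligned(16)));

/* Returns non-zero if the pointer is 16-byte aligned. */
static int is_aligned_16(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}
```

The member offsets inside the container stay the same; only the
alignment of the container type, and so of its local variables, changes.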

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
925451c12b zdtm/x86: Add a mxcsr preserving fpu test
It helped a bit to debug Skylake ptrace() bug, let's put it in.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
80d2adcfea compel: Cleanup INFECT_* definitions
Ugh, I've spent 25 mins at 4 A.M. to figure out why the tests are failing.
And the reason is stupid me, who defined a new flag after 0x8
as 0x16, not as 0x10. Simplify those definitions for such simple-minded
living creatures like Dima.
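
A short sketch of why shift-based definitions are harder to get wrong.
The EX_FLAG_* names are hypothetical and stand in for the real INFECT_*
flags.

```c
/* Shift-based definitions: the bit index is explicit in each line. */
#define EX_FLAG_A	(1UL << 0)	/* 0x01 */
#define EX_FLAG_B	(1UL << 1)	/* 0x02 */
#define EX_FLAG_C	(1UL << 2)	/* 0x04 */
#define EX_FLAG_D	(1UL << 3)	/* 0x08 */
#define EX_FLAG_E	(1UL << 4)	/* 0x10 */

/* The mistaken value: 0x16 is binary 10110, i.e. bits 4, 2 and 1. */
#define EX_FLAG_E_TYPO	0x16UL

/* Returns non-zero if exactly one bit is set in v. */
static int is_single_bit(unsigned long v)
{
	return v != 0 && (v & (v - 1)) == 0;
}
```

With the mistaken value, setting the new flag also sets EX_FLAG_B and
EX_FLAG_C, so unrelated code paths are switched on at the same time.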

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
3b9091ae66 compel/x86: Add workaround on ptrace() bug on Skylake
On Skylake processors and kernels older than v4.14
    ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
may return an incomplete xstate, omitting the FP part (that is XFEATURE_MASK_FP).
There is a patch which describes this bug:
  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1318800.html
Anyway, it's fixed in the v4.14 kernel by (as Andrey and I believe) this:
  https://patchwork.kernel.org/patch/9567939/

As we still support kernels from v3.10 and newer, we need to have a
workaround for this kernel bug on Skylake CPUs.

Big thanks to Shlomi for the reports, the effort and for providing an
Amazon VM to test this. I wish more bug reporters were like you.

Reported-by: Shlomi Matichin <shlomi@binaris.com>
Provided-test-env: Shlomi Matichin <shlomi@binaris.com>
Investigated-with: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
df18ed2f89 compel/x86: Separate functions used to get fpu state
Mere cleanup. For the Skylake workaround I'll call one after another,
so it's better to separate them into small helpers.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
70b67e3383 compel: Add ctx flags to get_task_regs()
get_task_regs() needs to know if it needs to use workaround
for a Skylake ptrace() bug. The next patch will introduce a
new flag for that.
I also thought about making 3 versions of get_task_regs() and
adding them to ictx->get_task_regs() depending on the flags..
But get_task_regs() is a private function and infect_ctx is
a uapi.. So, let's just pass context flags to get_task_regs().

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
a80817a328 compel/infect: Unite save_regs_t with save_regs() declaration
As we anyway define save_regs_t for other purposes,
use it in the function declaration.
To unify infect_ctx style, add make_sigframe_t.
Mere cleanup.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
583eff8ab5 x86/kerndat: Add a check for ptrace() bug on Skylake
We need to know if ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
returns xsave without FP state part.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Dmitry Safonov
8dcc764b86 x86/crtools: Add fork() err-path handle
Error-path for failed fork().
Looks originally forgotten, oops!
Also print a message on failed fork().

Signed-off-by: Dmitry Safonov <dima@arista.com>
2018-02-13 10:14:42 +03:00
Kirill Tkhai
5a7248bda2 inotify: Fix open_*notify_fd() never fails
We ignore restore_one_*notify() error code, while we mustn't.
Make open function fail when we can't restore them.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-13 10:09:01 +03:00
Kirill Tkhai
7f750e20c3 inotify: Do not DDOS by debug message on restore watch descriptor
Imagine, we have to restore inotify with watch descriptor 0x34d71d6.
Then we have:

1.235021     5578: fsnotify:           Watch got       0x1 but 0x34d71d6 expected
...
...
527.378042   5578: fsnotify:           Watch got 0x34d71d3 but 0x34d71d6 expected
527.378042   5578: fsnotify:           Watch got 0x34d71d4 but 0x34d71d6 expected
527.378042   5578: fsnotify:           Watch got 0x34d71d5 but 0x34d71d6 expected

Stop doing this and stop generating GBs of debug messages.
We already print a message before restore_one_inotify().
Let's add just one more after it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-13 10:09:01 +03:00
Radostin Stoyanov
a33d3739e4 Fix typos
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2018-02-07 21:14:26 +03:00
Kirill Tkhai
e7449021bd zdtm: Add scm06 test
This test makes looped unix sockets queues and tries
to iterate over them after the restore.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
37385e36b8 files: Allow to send unix sockets over unix sockets
Everything is ready. Message queue restores are in
the second stage of open for all types of unix sockets.
We just need to make scm wait before restore_unix_queue()
and allow to dump such scm context.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
75a52d666b unix: Move dump_sk_queue() before peer resolution
When we allow unix sockets to be sent over unix sockets,
dump_sk_queue() may dump and resolve some peers.
So we need to run it first and avoid linking our
peer_node to the peer's peer_list.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
5e56a8c959 unix: Add fake queuer for standalone dgram sockets
Similar to the previous patch, this makes the second end
of a dgram socketpair stay open till post open. This
allows delaying the restore of the message queue.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
4e627abb57 unix: Add fake queuer for standalone stream sockets in established state
This makes the second end of the socketpair live till post_open.
We need it alive if we want to restore the message queue later.
Otherwise, we do not have a queuer, whose fd is used to actually
write messages.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
66d437f8aa unix: Split collect_one_unixsk()
Extract the functionality that initializes socket memory.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
3164411e9e files: Implement find_unused_file_desc_id()
This function will be used to allocate id for fake files
(don't confuse with fake fds, e.g. fles).

Suggested-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
b49c0506b8 unix: Postpone restore_sk_common() of standalone sockets
restore_sk_common() may shutdown a socket, and queuer
won't be able to connect to it. So, this action must
be postponed.

We have had this problem for a long time, but we have been lucky
not to bump into it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
c39504fd26 unix: Make unix_sk_info::queuer pointer
Use pointer to the queuer instead of its id.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
4bc52ac069 unix: Move queue restore of interconnected pair to post open
Actually, there are no functional changes. We just postpone
the restore of the queues. This will be used in further
patches to restore unix sockets sent over unix sockets.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00
Kirill Tkhai
d65d32acc1 unix: Rework peer transfer in open_unixsk_pair_master()
After the previous patch, the master and slave ends of a socketpair
are owned by a single task. So we may avoid using
send_desc_to_peer() for the second end, and just
reopen it with the right pid.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2018-02-07 01:11:44 +03:00