log_init() opens a log file. We want to have criu and kernel versions in
each log file, so it looks reasonable to print them from log_init().
Without this patch, versions are not printed, if criu is called in the
swrk mode.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have three code paths, where we call log_set_loglevel() and in all
these places, we need to initialize libraries logging engines, so it is
better to do this from one function. For example, currently we forgot to
initialize soccr logging for criu swrk.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This test creates a pty pair, creates a test process and sets a slave
pty as control terminal for it.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is test that triggers a bug with ghost files, that was resolved in
patch "Don't fail if ghost file has no parent dirs".
Signed-off-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the current model we haven't started the background page transfer until
POLL_TIMEOUT time has elapsed since the last uffd or socket event. If the
restored process will do memory access one in (POLL_TIMEOUT - eplsilon) the
filling of its memory can take ages.
This patch changes them model in the following way:
* poll for the events indefinitely until the restore is complete
* the restore completion event causes reset of the poll timeout to zero and
* starts the background transfers
* after each transfer we return to check if there are any uffd events to
handle
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, once we get to transfer pages in the "background", we try to
fetch the entire IOV at once. For large IOVs this may impact #PF latency
for the #PF events occurred during the transfer.
Let's add a simple heuristic for controlling size of the background
transfers. Initially, the transfer will be limited to some default value.
Every time we transfer a chunk we increase the transfer size until it
reaches a pre-defined maximal size. A page fault event resets the
background transfer size to its initial value.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The complete_forks function presumes that it always has a work to do
because we assume that fork event is the only case when we drop out of
epoll_run_rfds with positive return value.
Teach complete_forks to bail out when there is no pending forks to process
to allow exiting epoll_run_rfds for different reasons.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
First check if there are pages we need to transfer and only afterwards
check if there are outstanding requests. Also, instead checking 'bool
remaining' to see if there is more work to do we can simply check if all
the lpi's have been already serviced.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The intention is to use this function for transferring all the pages that
didn't cause a #PF.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The function anyway pick the next page range to transfer it's just doing it
in very simple FIFO manner.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We already have a queue for the requested memory ranges which contains
'lp_req' objects. These objects hold the same information as the lazy_iov:
start address of the range, end address and the address that the range had
at the dump time.
Rather than keep this information twice and use double bookkeeping, we can
extract the requested range from lpi->iovs and move it to lpi->reqs.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Instead of relying on length of various lists add a boolean variable to
lazy_pages_info to make it clean when the process has exited
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently zdtm doesn't detect when restore failed, if it is executed
with strace. With this patch, fake-restore.sh creates a test file, and
zdtm is able to distinguish when restore failed.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The get() method requires a key and now we are using an index. That
will never work correctly as it is now.
Acked-by: Adrian Reber <adrian@lisas.de>
Reported-by: Adrian Reber <adrian@lisas.de>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we restore all sockets in the root mount namespace, because we
were not able to get any information about a mount point where a socket
is bound. It is obviously incorrect in some cases.
In 4.10 kernel, we added the SIOCUNIXFILE ioctl for unix sockets. This
ioctl opens a file to which a socket is bound and returns a file
descriptor.
This new ioctl allows us to get mnt_id by reading fdinfo, and mnt_id
is enough to find a proper mount point and a mount namespace.
The logic of this patch is straight forward. On dump, we save mnt_id for
sockets, on restore we find a mount namespace by mnt_id and restore this
socket in its mount namespace.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
unix_process_name() are called when sockets are being collected,
but at this moment we don't have socket descriptors.
A socket descriptor is reuired to get mnt_id, what will allow to resolve
a socket path in its mount namespace.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This ioctl opens a file to which a socket is bound and
returns a file descriptor. This file descriptor can be used to get
mnt_id and a file path.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The USK_CALLBACK flag means that a socket is externel and will be
restored by a plugin. open_unixsk_standalone should not be called to
these sockets.
$ make -C test/others/unix-callback/ run
...
(00.109338) 7471: sk unix: Opening standalone socket (id 0xd ino 0 peer 0x63b)
(00.109376) 7471: Error (criu/sk-unix.c:1128): sk unix: BUG at criu/sk-unix.c:1128
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Unix file sockets have to be restored in proper mount namespaces.
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It is not used, probably was committed by mistake.
Fixes: 2d093a1702 ("travis: add a job to test on the fedora rawhide")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In Ubuntu Bionic for armhf, clang is compiled for armv8l rather than
armv7l (as it was and still is for gcc) and so it uses armv8 by default.
This breaks compilation of tests using smp_mb():
> error: instruction requires: data-barriers
The fix is to add "-march=armv7-a" to CFLAGS which we already do,
except not for the tests.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The newly introduced output of the CRIU and kernel version does not
happen when running CRIU under RPC. This moves the print_versions()
function util.c and calls it from cr-service.c
Signed-off-by: Adrian Reber <areber@redhat.com>
This patch makes restore_one_inotify() to request specific
watch descriptor number instead of iterating in (possible)
long-duration loop if system supports it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
This adds new tunfile_entry::ns_id field and populates
it in standard socket way. Restore uses this ns_id
to choose correct namespace. Note, we could completelly
skip set_netns() on restore in case of !has_ns_id, but
using top_net_ns invents some definite behaviour.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
ktkhai: comment written/code movings
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Sometimes we see errors like this:
criu/cr-restore.gcda:Merge mismatch for function 106
It proabably means that this gcda file was corrupted. According to the
gcc man page, the -fprofile-update=atomic should fix this problem.
v2: this options appered in gcc7, so we need to install it.
Reported-by: Mr Travis CI
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have a problem when a pid is reused between consequent dumps we can't
understand if pagemap and pages from images of parent dump are invalid
to restore these pid already. That can lead even to wrong memory
restored for these pid, see the test in last patch.
So these is a try do separate processes with (likely) invalid previous
memory dump from processes with 100% valid previous dump.
For that we use the value of /proc/<pid>/stat's start_time and also the
timestamp of each (pre)dump. If the start time is strictly less than the
timestamp, that means that the pagemap for these pid from previous dump
is valid - was done for exactly the same process.
Creation time is in centiseconds by default so if predump is really fast
(<1csec) we can have false negative decisions for some processes, but in
case of long running processes we are fine.
https://jira.sw.ru/browse/PSBM-67502
v2: remove __maybe_unused for get_parent_stats; fix get_parent_stats to
have static typing; print warning only if unsure; check has_dump_uptime
v3: read parent stats from image only once; reuse stat from previous
parse_pid_stat call on dump
v4: move code to function; use unsigned long long for ticks; put
proc_pid_stat on mem_dump_ctl; print warning on all pid-reuse cases
v5: free parent's stats entry properly, pass it in arguments to
(pre_)dump_one_task
v6: free parent's stats in error path too
v7: zero init parent_se
v8: improve error message
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
will be used in the next patch
https://jira.sw.ru/browse/PSBM-67502
note: actually we need only one value from stats entry but I still
prefer general helper as we still need to read and allocate memory
for the whole structure
v2: fix get_parent_stats to have static typing
v3: simplify get_parent_stats to return a StatsEntry pointer instead of
doing it through arguments
v8: replace errors with warnings, we should whatch on them only if we
have corresponding error in detect_pid_reuse else they are fine
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We want to use a simple fact: If we have an alive process in a pstree we
want to dump, and a starttime of that process is less than pre-dump's
timestamp, then these exact process existed (100% sure) at the time of
these pre-dump and the process' memory was dumped in images. Thus we
save uptime while in freezed state else these won't work.
https://jira.sw.ru/browse/PSBM-67502
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>