2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 21:07:43 +00:00

9635 Commits

Author SHA1 Message Date
Mike Rapoport
8337e99857 ppc64le: fix build with UFFD
The __u64 is 'unsigned long' on Power and 'unsigned long long' on x86_64.
Using PRI?64 does not help because, for instance, PRIu64 is 'lu'.

According to [1] the solution is to define __SANE_USERSPACE_TYPES__ for
Power builds

[1] http://thread.gmane.org/gmane.linux.kernel/1425475/focus=1427433

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Mike Rapoport
6273fc971a uffd: add handling of zero pages
When get_page returns 0, it means that a page is mapped by a vma but it is
not found in the pagemap. This happens when a page is a zero page and
threofre skipped by dump.
Use UFFDIO_ZEROMAP to create a zero page in the restored process address
space.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Mike Rapoport
58edba63f9 uffd: introduce uffd_handle_page
so that it'll be able to handle both UFFDIO_COPY and UFFDIO_ZEROPAGE

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Mike Rapoport
e3f05ea0ca uffd: increment uffd_copied_pages only in one place
The uffd_copied_pages can be incremented in uffd_copy_page function rather
than in its callers

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Adrian Reber
a7004002b5 uffd.c: move the code out of the 'main' function
Most of the UFFD logic was in the function uffd_listen() which was
directly called from crtools.c. In preparation for the remote lazy
restore most of the code has been moved to separate function for better
integration of the network functionality.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Adrian Reber
b3ae1cc20b uffd.c: make some variable static global
To better track how many pages have been handled by UFFD a few variables
have been made static global to easier access them and to reduce the
number of parameters passed around.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Adrian Reber
048b31b24c uffd.c: move code into subfunctions
uffd_listen() is a rather large function and this starts to move code
into subfunctions.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Adrian Reber
c3abfff0ad uffd.c: remove unused variable vma_size
The variable vma_size was used for early debugging of lazy restore and
has no significance now.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Mike Rapoport
fcaf36f52c uffd: remove handling of VDSO pages
Since VDSO pages cannot be lazy, no need to take care of them in lazy-pages
daemon.

Signed-off-by: Mike Rapoport <rapoport@il.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Mike Rapoport
9c7970c2ae uffd: do not treat VDSO pages as lazy
VDSO is just a few pages and they can be loaded directly rather than go
through userfaultfd to save some complexity on the lazy-pages daemon side.

Signed-off-by: Mike Rapoport <rapoport@il.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Adrian Reber
33696ceba9 uffd.c: do not call unneeded functions
For a lazy restore via userfaultfd the lazy-pages daemon
needs to know which pages exist, so that it knows when all
pages have finally been migrated so that the restored process
has all of its memory. Therefore it needs to know which pages
exist and it needs to parse the files in the dump result directory.

The existing criu functions are designed to be used by a 'normal'
restore and thus a lot of assumptions are made what has to be set up.

For the lazy-pages restore the complete 'restore' initialization is
not necessary and therefore the criu common code dependencies are
minimized with this commit.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:03 +03:00
Adrian Reber
b8f46c368b Fix userfaultfd code with newer compilers
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:02 +03:00
Adrian Reber
57891afcb2 Try to include userfaultfd with criu (part 2)
This is a first try to include userfaultfd with criu. Right now it
still requires a "normal" checkpoint. After checkpointing the
application it can be restored with the help of userfaultfd.

All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
being handled by userfaultfd.

As soon as the process is restored it blocks on the first memory access
and waits for pages being transferred by userfaultfd.

To handle the required pages a new criu command has been added. For a
userfaultfd supported restore the first step is to start the
'lazy-pages' server:

  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket

This waits on a unix domain socket (defined using the --address option)
to receive a userfaultfd file descriptor from a '--lazy-pages' enabled
'criu restore':

  criu restore -D /tmp/3 -j -v4 --lazy-pages \
  --address /tmp/userfault.socket

In the first step the VDSO pages are pushed from the lazy-pages server
into the restored process. After that the lazy-pages server waits on the
UFFD FD for a UFFD requested page. If there are no requests received
during a period of 5 seconds the lazy-pages server switches into a mode
where the remaining, non-transferred pages are copied into the
destination process. After all remaining pages have been copied the
lazy-pages server exits.

The first page that usually is requested is a VDSO page. The process
currently used for restoring has two VDSO pages, but only one is
requested
via userfaultfd. In the second part where the remaining pages are copied
into the process, the second VDSO page is also copied into the process
as it has not been requested previously. Unfortunately, even as this
page has not been requested before, it is not accepted by userfaultfd.
EINVAL is returned. The reason for EINVAL is not understood and
therefore
the VDSO pages are copied first into the process, then switching to
request
mode and copying the pages which are requested via userfaultfd. To
decide at which point the VDSO pages can be copied into the process, the
lazy-pages server is currently waiting for the first page requested via
userfaultfd. This is one of the VDSO pages. To not copy a page a second
time, which is unnecessary and not possible, there is now a check to see
if the page has been transferred previously.

The use case to use usefaultfd with a checkpointed process on a remote
machine will probably benefit from the current work related to
image-cache and image-proxy.

For the final implementation it would be nice to have a restore running
in uffd mode on one system which requests the memory pages over the
network from another system which is running 'criu checkpoint' also in
uffd mode. This way the pages need to be copied only 'once' from the
checkpoint process to the uffd restore process.

TODO:
    * Contains still many debug outputs which need to be cleaned up.
    * Maybe transfer the dump directory FD also via unix domain sockets
      so that the 'uffd'/'lazy-pages' server can keep running without
      the need to specify the dump directory with '-D'
    * Keep the lazy-pages server running after all pages have been
      transferred and start waiting for new connections to serve.
    * Resurrect the non-cooperative patch set, as once the restored task
      fork()'s or calls mremap() the whole thing becomes broken.
    * Figure out if current VDSO handling is correct.
    * Figure out when and how zero pages need to be inserted via uffd.

v2:
    * provide option '--lazy-pages' to enable uffd style restore
    * use send_fd()/recv_fd() provided by criu (instead of own
      implementation)
    * do not install the uffd as service_fd
    * use named constants for MAP_ANONYMOUS
    * do not restore memory pages and then later mark them as uffd
      handled
    * remove function find_pages() to search in pages-<id>.img;
      now using criu functions to find the necessary pages;
      for each new page search the pages-<id>.img file is opened
    * only check the UFFDIO_API once
    * trying to protect uffd code by CONFIG_UFFD;
      use make UFFD=1 to compile criu with this patch

v3:
   * renamed the server mode from 'uffd' -> 'lazy-pages'
   * switched client and server roles transferring the UFFD FD
     * the criu part running in lazy-pages server mode is now
       waiting for connections
     * the criu restore process connects to the lazy-pages server
       to pass the UFFD FD
   * before UFFD copying anything else the VDSO pages are copied
     as it fails to copy unused VDSO pages once the process is running.
     this was necessary to be able to copy all pages.
   * if there are no more UFFD messages for 5 seconds the lazy-pages
     server switches in copy mode to copy all remaining pages, which
     have not been requested yet, into the restored process
   * check the UFFDIO_API at the correct place
   * close UFFD FD in the restorer to remove open UFFD FD in the
     restored process

v4:
    * removed unnecessary madvise() calls ; it seemed necessary when
      first running tests with uffd; it actually is not necessary
    * auto-detect if build-system provides linux/userfaultfd.h
      header.
    * simplify unix domain socket setup and communication.
    * use --address to specify the location of the used
      unix domain socket.

v5:
    * split the userfaultfd patch in multiple smaller patches
    * introduced vma_can_be_lazy() function to check if a page
      can be handled by uffd
    * moved uffd related code from cr-restore.c to uffd.c
    * handle failure to register a memory page of the restored process
      with userfaultfd

v6:
    * get PID of to be restored process from the 'criu restore' process;
      first the PID is transferred and then the UFFD

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:02 +03:00
Adrian Reber
e2268aa39c Try to include userfaultfd with criu (part 1)
This is a first try to include userfaultfd with criu. Right now it
still requires a "normal" checkpoint. After checkpointing the
application it can be restored with the help of userfaultfd.

All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
being handled by userfaultfd.

As soon as the process is restored it blocks on the first memory access
and waits for pages being transferred by userfaultfd.

To handle the required pages a new criu command has been added. For a
userfaultfd supported restore the first step is to start the
'lazy-pages' server:

  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket

This is part 1 of the userfaultfd integration which provides the
'lazy-pages' server implementation.

v2:
    * provide option '--lazy-pages' to enable uffd style restore
    * use send_fd()/recv_fd() provided by criu (instead of own
      implementation)
    * do not install the uffd as service_fd
    * use named constants for MAP_ANONYMOUS
    * do not restore memory pages and then later mark them as uffd
      handled
    * remove function find_pages() to search in pages-<id>.img;
      now using criu functions to find the necessary pages;
      for each new page search the pages-<id>.img file is opened
    * only check the UFFDIO_API once
    * trying to protect uffd code by CONFIG_UFFD;
      use make UFFD=1 to compile criu with this patch

v3:
   * renamed the server mode from 'uffd' -> 'lazy-pages'
   * switched client and server roles transferring the UFFD FD
     * the criu part running in lazy-pages server mode is now
       waiting for connections
     * the criu restore process connects to the lazy-pages server
       to pass the UFFD FD
   * before UFFD copying anything else the VDSO pages are copied
     as it fails to copy unused VDSO pages once the process is running.
     this was necessary to be able to copy all pages.
   * if there are no more UFFD messages for 5 seconds the lazy-pages
     server switches in copy mode to copy all remaining pages, which
     have not been requested yet, into the restored process
   * check the UFFDIO_API at the correct place
   * close UFFD FD in the restorer to remove open UFFD FD in the
     restored process

v4:
    * removed unnecessary madvise() calls ; it seemed necessary when
      first running tests with uffd; it actually is not necessary
    * auto-detect if build-system provides linux/userfaultfd.h
      header
    * simplify unix domain socket setup and communication.
    * use --address to specify the location of the used
      unix domain socket

v5:
    * split the userfaultfd patch in multiple smaller patches
    * introduced vma_can_be_lazy() function to check if a page
      can be handled by uffd
    * moved uffd related code from cr-restore.c to uffd.c
    * handle failure to register a memory page of the restored process
      with userfaultfd

v6:
    * get PID of to be restored process from the 'criu restore' process;
      first the PID is transferred and then the UFFD
    * code has been re-ordered to be better prepared for lazy-restore
      from remote host
    * compile test for UFFD availability only once

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:02 +03:00
Adrian Reber
27e601790e Remove static from prepare_task_entries function
For the upcoming userfaultfd integration the lazy-pages mode needs to
setup the criu infrastructure to read the pages files.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-09-16 09:10:02 +03:00
Mike Rapoport
98ac646f86 ppc64le: travis: fixup Ubuntu repositories
The ppc64le docker image has broken /etc/apt/sources.list. A small fixup to
it allows running ppc64le tests.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-31 00:35:34 +03:00
Andrei Vagin
835e252b09 test/ptrace_sig: wait children before calling test_daemon
A static test has to do nothing after test_daemon().

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-31 00:35:00 +03:00
Pavel Emelyanov
a31c1854e1 criu: Version 3.4
The biggest new thing this time is s390x arch support!
Also we have several improvements and a set of bugfixes
as usual.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
v3.4
2017-08-21 16:27:57 +03:00
Pavel Emelyanov
84492ae27c Added badges to the title page
Travis statues for master and criu-dev and codacy grade.
2017-08-17 17:13:17 +03:00
Dmitry Safonov
204c1ef9e0 restore: Fix deadlock when helper's child dies
Since commit ced9c529f687 ("restore: fix race with helpers' kids dying
too early"), we block SIGCHLD in helper tasks before CR_STATE_RESTORE.
This way we avoided default criu sighandler as it doesn't expect that
childs may die.

This is very racy as we wait on futex for another stage to be started,
but the next stage may start only when all the tasks complete previous
stage. If some children of helper dies, the helper may already have
blocked SIGCHLD and have started sleeping on the futex. Then the next
stage never comes and no one reads a pending SIGCHLD for helper.

A customer met this situation on the node, where the following
(non-related) problem has occured:
Unable to send a fin packet: libnet_write_raw_ipv6(): -1 bytes written (Network is unreachable)
Then child criu of the helper has exited with error-code and the
lockup has happened.

While we could fix it by aborting futex in the end of
restore_task_with_children() for each (non-root also) tasks,
that would be not completely correct:
1. All futex-waiting tasks will wake up after that and they
   may not expect that some tasks are on the previous stage,
   so they will spam into logs with unrelated errors and may
   also die painfully.
2. Child may die and miss aborting of the futex due to:
   o segfault
   o OOM killer
   o User-sended SIGKILL
   o Other error-path we forgot to cover with abort futex

To fix this deadlock in TASK_HELPER, as suggested-by Kirill,
let's check if there are children deaths expected - if there
isn't any, don't block SIGCHLD, otherwise wait() and check if
death was on expected stage of restore (not CR_STATE_RESTORE).

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

Conflicts:
	criu/cr-restore.c
2017-08-16 13:20:25 +03:00
Pavel Emelyanov
8a270e5807 util: Add block_sigmask/unblock_sigmask helpers
This is an extract from Kirill Tkhai's patch
87464739 (restore: Block SIGCHLD during root_item initialization)

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-08-16 13:18:39 +03:00
Dmitry Safonov
96602ba854 restorer: Report child's death reason correctly
E.g, if child was killed by SIGSEGV, this message
previously was "exited, status=11", as si_code == CLD_DUMPED == 3
in this case will result in (si_code & CLD_KILLED) == (si_code & 1).
Which is misleading as you may try to look for exit() calls
with 11 arg.
Correct if to compare si_code with CLD_*.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-16 13:07:35 +03:00
Cyrill Gorcunov
59c9583992 images: netdev -- Adjust venet comment
venet is not virtuozzo specific but rather
came from openvz, make it so.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-16 13:06:50 +03:00
Andrei Vagin
625552f0e9 page-xfer: handle a case when splice returns zero
A return value of 0 means end of input, so we need to
stop reading from this descriptor.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-16 13:06:10 +03:00
Andrei Vagin
89d10280e5 restore: handle errors of restore_wait_other_tasks
In a error case, task_entries->nr_in_progress is set to -1
and we have to handle this case.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-16 13:04:12 +03:00
Pavel Emelyanov
181c44aa3f compel,s390: Fix setting regs for tasks
Item's thread struct pid is not a pointer in master.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-08-15 16:21:29 +03:00
Andrei Vagin
e2e1dbf490 util: report a command name in error messages for cr_system()
It is good to know what command were executed.

https://github.com/xemul/criu/issues/371
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Pavel Emelyanov
943df7d612 page-server: Tune up the encode/decode helpers
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Pavel Emelyanov
ac05c73a7b page-server: Fix incompatibility of page server protocol
It turned out (surprise surprise) that page server exchanges CR_FD_PAGEMAP
and CR_FD_SHMEM_PAGEMAP values in ps_iov_msg. The problem is that these
constants sit in enum and change their values from version to version %)

So here's the fixed version of the protocol including the backward compat
checks on all the values that could be met from older CRIUs (we're lucky
and they didn't intersect).

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Mike Rapoport
ae7a9a0758 test/zdtm: set maps04 timeout to 60 seconds
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Mike Rapoport
b34efd5591 test/zdtm.py: allow setting test timeout in the test description
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Vitaly Ostrosablin
9152dd2198 test: remap_dead_pid.c: Fix child PID not being printed
There's two issues with this code:

1. Child task is killed by parent faster, than it could print the line.
2. Even if it had time to print it - there would always be 0, because
it's called from child process.

Obviously, this print was meant to be in parent process. So, let's move
it there.

Signed-off-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Francesco Giancane
833fa4d7db Make the Makefile variables externally configurable.
As of manual page INSTALL.md, it is stated that those variables can be
overridden by means of environmental variables.

export BINDIR="somedir"
export SBINDIR="somedir"
export LIBDIR="somedir"
export MANDIR="somedir"
export INCLUDEDIR="somedir"
export LIBEXECDIR="somedir"

make install

But those settings will not be honored, sticking to default Makefile values.
This patch fixes the issue.

Signed-off-by: Francesco Giancane <francescogiancane8@gmail.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Francesco Giancane
5f09c258d1 make: Makefile variable 'PREFIX' should be configurable.
I tried to build and install the criu package from source. All went
well, unless for the install part.

In particular,
PREFIX=/
DESTDIR=~/build_criu

PREFIX=${PREFIX} DESTDIR=${DESTDIR} make install

resulted in installing criu packages in ~/build_criu/usr/local/,
ignoring PREFIX, even if this variable should be configurable as of
manual and GCS.

Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Francesco Giancane <francescogiancane8@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:11 +03:00
Mike Rapoport
5d6392358d test/zdtm.py: ignore UNIX sockets during report creation
When files are added to the report shutil.copytree is unhappy with
lazy-pages.socket. Tell shutil.copytree that it should ignore *.socket.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:10 +03:00
Kirill Tkhai
1db93b6ea9 files: Add comments about FLE_* stages
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:10 +03:00
Mike Rapoport
a6543ce2bb mem: kill trailing whitespace
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:10 +03:00
Andrei Vagin
d1bc0b85f0 kerndat: include config.h before using CONFIG_* contants
Otherwise someone can include kerndat.h before config.h
and get another kerndat structure. For example, proc_parse
doesn't include config.h and Cyrill met this problem.

Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:10 +03:00
Pavel Emelyanov
a7b381d1fc net: Relax xmalloc-ing (and fix NULL deref)
There's potential NULL-derefernece in dump_netns_con() -- two xmalloc
results are not checked. However, since there's a huge set of these
xmallocs, I propose to relax the whole thing with one big xmalloc and
xptr_pull() helper.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-15 15:24:10 +03:00
Kirill Tkhai
3842c6341d shmem: Remove pid argument of shmem_wait_and_open()
It's unused and it's rewritten in shmem_wait_and_open(),
and it just confuses a reader. So, kill it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-08-15 15:24:10 +03:00
Alice Frosi
1ea1fdde45 compel: Save thread registers before executing parasite code
For dumping threads we execute parasite code before we collect the
floating point registers with get_task_regs().

The following describes the code path were this is done:

 dump_task_threads()
     for (i = 0; i < item->nr_threads; i++)
         dump_task_thread()
           parasite_dump_thread_seized()
             compel_prepare_thread()
                prepare_thread()

                   ### Get general purpose registers ###
                   ptrace_get_regs(ctx->regs)

             ### parasite code is executed in thread ###
             compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)

             compel_get_thread_regs()

                 ### Get FP and VX registers ###
                 get_task_regs()

Since on s390 gcc uses floating point registers to cache general
purpose without the -msoft-float option the floating point
registers would have been clobbered after compel_run_in_thread().

With this patch we first save all thread registers with task_get_regs() and
then call the parasite code.

We introduce a new function arch_set_thread_regs() that restores the saved
registers for all threads. This is necessary for criu dump --leave-running,
--check-only, and error handling.

Pre-dump does not require to save the registers for the threads because
because only the main thread (task) is used for parsite code execution.

The above changes allow us to use all register sets in the parasite
code - therefore we can remove -msoft-float on s390.

Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-08-15 14:52:15 +03:00
Michael Holzheu
43ad175780 s390/zdtm: Add s390x_reg_check test case
This test can be used to verify FP and VX registers on s390:

- Verify that "criu restore" sets the correct register sets
  from "criu dump":
  $ zdtmp.py run -t zdtm/static/s390x_regs_check

- Verify that dumpee continues running with correct registers after
  parasite injection:
  $ zdtm.py run --norst -t zdtm/static/s390x_regs_check
  $ zdtm.py run --norst --pre 2 -t zdtm/static/s390x_regs_check
  $ zdtm.py run --check-only -t zdtm/static/s390x_regs_check

Reviewed-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-08-15 14:51:13 +03:00
Michael Holzheu
7083f15351 zdtm: Move include of Makefile.inc to right place
We have to include Makefile.inc earlier in order to use SRCARCH.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-08-15 14:51:13 +03:00
Adrian Reber
f07adae690 compel/s390: glibc renamed ucontext to ucontext_t
The upcoming glibc release renamed 'struct ucontext' to
'struct ucontext_t':

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=251287734e89a52da3db682a8241eb6bccc050c9;hp=c86ed71d633c22d6f638576f7660c52a5f783d66

Instead of using 'struct ucontext' this patch changes it
to the typedef ucontext_t which already exists in older and
new versions of glibc.

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:42 +03:00
Adrian Reber
f2899a728c compel/aarch64: glibc renamed ucontext to ucontext_t
The upcoming glibc release renamed 'struct ucontext' to
'struct ucontext_t':

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=251287734e89a52da3db682a8241eb6bccc050c9;hp=c86ed71d633c22d6f638576f7660c52a5f783d66

Instead of using 'struct ucontext' this patch changes it
to the typedef ucontext_t which already exists in older and
new versions of glibc.

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:42 +03:00
Adrian Reber
36576e2096 compel/ppc64: glibc renamed ucontext to ucontext_t
The upcoming glibc release renamed 'struct ucontext' to
'struct ucontext_t':

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=251287734e89a52da3db682a8241eb6bccc050c9;hp=c86ed71d633c22d6f638576f7660c52a5f783d66

Instead of using 'struct ucontext' this patch changes it
to the typedef ucontext_t which already exists in older and
new versions of glibc.

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:42 +03:00
Adrian Reber
3ef506befc compel: adapt .gitgnore for aarch64 and ppc64le
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:42 +03:00
Gianfranco Costamagna
7e5b5f3128 Fix ppc64el build failure, by not redefining AT_VECTOR_SIZE_ARCH
This fixes the ppc64el build failure
  CC       criu/arch/ppc64/sigframe.o
In file included from criu/arch/ppc64/sigframe.c:5:0:
criu/arch/ppc64/include/asm/types.h:32:0: error: "AT_VECTOR_SIZE_ARCH" redefined [-Werror]
 #define AT_VECTOR_SIZE_ARCH 6

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:42 +03:00
Michael Holzheu
a5e2605e7a s390: Fix off-by-one error for task size detection
The entries in /proc/<pid>/maps look like the following:

  3fffffdf000-40000000000 rw-p 00000000 00:00 0

The upper address is the first address that is *not* included in the
address range.

Our function max_mapped_addr() should return the last valid address
for a process, but currently returns the first invalid address.

This can lead to the following error message on kernel that have
kernel commit ee71d16d22bb:

 Error (criu/proc_parse.c:694): Can't dump high memory region
 1ffffffffff000-20000000000000 of task 24 because kernel commit ee71d16d22bb
 is missing

Fix this and make max_mapped_addr() the last valid address (first invalid
address - 1).

Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:42 +03:00
Michael Holzheu
61e6c01d09 s390: Move -msoft-float/-fno-optimize-sibling-calls into compel Makefiles
We currently define "CFLAGS += -msoft-float -fno-optimize-sibling-calls"
in Makefile.compel.

Makefile.compel is included into the toplevel Makefile and so changes
CFLAGS which are exported to all criu and zdtm Makefiles.

We must not use -msoft-float outside the compel files. E.g. otherwise for
zdtm we get the following build error:

uptime_grow.o: In function `main':
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__floatdidf'
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__muldf3'
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__floatdidf'
/tmp/2/criu/test/zdtm/static/uptime_grow.c:36: undefined reference to `__adddf3

Fix this and move the CFLAGS definition to the compel Makefiles.
We do this by defining a new flag CFLAGS_PIE that is only used for pie code.

Reported-by: Adrian Reber <areber@redhat.com>
Suggested-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-08-09 18:51:41 +03:00