mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 06:15:24 +00:00
Commit Graph

8948 Commits

Author SHA1 Message Date
Kirill Tkhai
f9582b0bd3 utils: Introduce open_fd_of_real_pid()
As access to /proc/[pid]/fd/[i] of a task from the parent's
user_ns is prohibited, introduce a helper that does this
via usernsd.

Also, remove BUG_ON() in usernsd, as now it may be used
without input fd parameter.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
13cba0ca69 user_ns: Set user_ns before net_ns creation
Since a net ns may refer not only to root_user_ns,
set the appropriate user_ns before its creation.

v3: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
a69a4eddec ns: Generate user_ns tree
Create the user namespaces hierarchy from the criu main task.
Open the namespaces' fds, so that they are visible to everybody
via fdstore.

Why we do it this way:
1) User namespaces are not correlated with the task
hierarchy. A parent task may have a user namespace
at a deeper level than a child task. So, we
can't restore the user namespaces just by
passing CLONE_NEWUSER in fork_with_pid().

2) CLONE_FS tasks require user_ns to be set at the
moment of clone(), so in this case we have to restore the
target user_ns near create_children_and_session().
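
The first point can be illustrated with a small sketch. It is written in Python for brevity and is not CRIU's C code; the `parent_of` dictionary and the numeric ids are hypothetical. Because the user_ns tree is independent of the task tree, one way to get a valid creation order is to walk the namespace tree itself from its root, so that every parent namespace precedes its children:

```python
from collections import defaultdict, deque


def creation_order(parent_of, root):
    """Return namespace ids so that each parent precedes its children.

    parent_of maps a (hypothetical) user_ns id to the id of its parent;
    the root namespace maps to None.
    """
    children = defaultdict(list)
    for ns, parent in parent_of.items():
        if parent is not None:
            children[parent].append(ns)

    order, queue = [], deque([root])
    while queue:
        ns = queue.popleft()
        order.append(ns)
        # Sorting only makes the result deterministic for the example.
        queue.extend(sorted(children[ns]))
    return order


# The order is derived from the namespace tree, not from the task tree.
parent_of = {1: None, 2: 1, 3: 2, 4: 1}
print(creation_order(parent_of, 1))
```

A breadth-first walk is just one possible choice here; any order in which parents come first would do.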

v3: Check for WIFEXITED(). Aligned stack.
    Use fdstore to keep ns fd.
    Create tree from root_item.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
3aebb0eac6 utils: Move getting real pid functionality to separate function
This is refactoring

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
5938cc0d50 proc: Close CR_PROC_FD_OFF and TRANSPORT_FD_OFF later
CR_PROC_FD_OFF is needed for accessing foreign tasks'
fds, and will be used in the future.

TRANSPORT_FD_OFF is for uniformity.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
0966b0d05d ns: Make write_id_map() use CR_PROC_FD_OFF
Currently, it's used by criu from CRIU_NS only.
So, in fact open_proc_rw() leads to opening of
a fd in CRIU_NS /proc (open_pid_proc() just
opens "/proc" dir, when PROC_FD_OFF is not set).

Make write_id_map() use CR_PROC_FD_OFF, which
exists, and does not confuse a user.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
75f71e73a1 ns: Make prepare_userns() have ns map parameter
This is refactoring

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
0742794aa6 ns: Write/read ns entries in new way
The patch introduces a generic way of dumping all the namespaces
(currently, only user ns entries are dumped).

The handler for old user ns images remains in place.

v4: Rebase on generic parent_id and userns_id.
v3: On restore, keep in mind that the parent ns may not have been
    read yet at the moment it is searched for.
    Set correct user ns id to d_ns.
    Reflect the fact, that parent_id is moved to pid and user ext.
    Read ns ids before tasks.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
c2a773cfeb proto: Add ns_entry description
New image format, generic for all namespaces.
Currently, it's for pid, net and user ns.

v4: Rename ns-hookup to ns.
    Make user_ns and parent generic.
v3: Move parent_id to pid and user ext

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
a1d4cef08a images: Move uid_gid_extent and userns_entry descriptions
Move them into ns.proto file

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
388d853fa3 ns: Implement dup_userns_entry()
Function for cloning UsernsEntry entries.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
b158b7bfdb ns: Set pointer to root_user_ns in ns_ids
Old type images do not have pointer to user_ns.
Set them manually.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
45588e9d6c ns: Provide the case when root_item has !NS_ROOT user_ns in rst_add_ns_id()
root_item may have NS_OTHER user_ns, so do not set it directly.
This will be used in next patches.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
a64b4cc08c user_ns: Name loading UsernsEntry mappings on restore "old format"
Split prepare_userns() in two functions.

Also, this commit fixes a problem which existed before my patchset:
we do not populate userns_entry on restore, though it is needed and used
at least by the chain prepare_mnt_ns()->sb_opt_cb()->userns_uid().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
e17627f450 ns: Add user and pid ns_id on restore
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
5e87976151 ns: Implement target_userns_{u, g}id() and root_userns_{u, g}id()
Add primitives for converting xids from NS_ROOT to custom
NS_OTHER, and vice versa.

v4: Fixed erratum in root_userns_gid()

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
f10b0d2e2c ns: Rename and export userns_id() and INVALID_ID
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
8cdac55719 user_ns: Make host_id() working with any mapping and rename it
Make it possible to convert a uid and gid from a user_ns to their
representation in its (grand) parent user_ns.
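
A minimal sketch of such a conversion follows. It is Python rather than the C used in CRIU, and it assumes each mapping is a list of `(first, lower_first, count)` extents in the style of /proc/[pid]/uid_map. The function names are hypothetical. An id is translated one level at a time until the requested ancestor is reached:

```python
def map_to_parent(extents, inner_id):
    """Translate an id inside a user_ns to the id seen by its parent.

    extents is a list of (first, lower_first, count) tuples.
    Returns None when the id is not covered by any extent.
    """
    for first, lower_first, count in extents:
        if first <= inner_id < first + count:
            return lower_first + (inner_id - first)
    return None


def map_to_ancestor(chain, inner_id):
    """Apply the mappings of a ns and of each intermediate parent in turn.

    chain lists the extents of the ns itself first, then of its parent,
    and so on, up to (not including) the target ancestor.
    """
    current = inner_id
    for extents in chain:
        current = map_to_parent(extents, current)
        if current is None:
            return None
    return current


child_map = [(0, 1000, 10)]            # ids 0..9 -> 1000..1009 in the parent
parent_map = [(1000, 100000, 65536)]   # ids 1000.. -> 100000.. in the grandparent
print(map_to_ancestor([child_map, parent_map], 5))
```

Returning `None` for an unmapped id is a choice made for this example only.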

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
0efe5cca86 user_ns: Make collect_user_ns() allocate child UsernsEntry mappings
Allocate mapping for NS_OTHER too.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
16e4530b57 ns: Change arguments of dump_user_ns()
Make ns the only argument of dump_user_ns(). As the only ns
it may be called for is root_item's ns, the logic
after this patch remains the same as it was before.
Also make dump_user_ns() static.

In addition, pass ns to check_user_ns().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
3400312812 ns: Set hookups for all namespaces
Discover relationships between namespaces
and populate appropriate fields in ns_id

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
1008c64072 ns: Set nested namespaces hookups
Introduce ns_id::parent and assign a pointer to parent
for every ns except NS_CRIU and NS_ROOT.
Also populate user_ns for pid_ns.

v5: Remove excess check on on->parent.
v4: Set "ret = -1" on one of the error paths.
    Add comment about user_ns finding.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
8ba659fec1 zdtm: Add userns01 test
Check that the UID and GID in an unshared userns remain the same

v5: Use custom UID and GID.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
Kirill Tkhai
b0af94ddf4 zdtm: Add userns00 test
Create two children, and unshare() user_ns in one of them (C1).
The second child creates one more process, which switches to C1's
namespace and unshares.

v4: Keep in mind the case, when readlink returns PATH_MAX-length string.
    Print full wait status instead of WEXITSTATUS().
v3: Unshare net ns in grand child

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00
rbruno@gsd.inesc-id.pt
15ee55f404 zdtm: Add support for image-proxy/image-cache
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:22:11 +03:00
rbruno@gsd.inesc-id.pt
0848223500 Process Migration using Sockets (p2)
The current patch brings the implementation of the image proxy and image cache.
These components are necessary to perform in-memory live migration of processes
using CRIU. The image proxy receives images from CRIU Dump/Pre-Dump (through
UNIX sockets) and forwards them to the image cache (through a TCP socket). The
image cache caches images in memory and sends them to CRIU Restore (through
UNIX sockets) when requested.

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:21:41 +03:00
rbruno@gsd.inesc-id.pt
2fb8492646 Process Migration using Sockets (p1)
This patch introduces the --remote option and the necessary code changes to
support it. This lets the user decide whether the checkpoint data is to
be stored on disk or sent through the network (through the image-proxy).
The latter forwards the data to the destination node where image-cache
receives it.

The overall communication is performed as follows:
src_node: CRIU dump    -> (sends images through UNIX sockets)    -> image-proxy
                                                                        |
                                                                        V
dst_node: CRIU restore <- (receives images through UNIX sockets) <- image-cache

Communication between image-proxy and image-cache is done through a single
TCP connection.

Running criu with --remote option is like this:

dst_node# criu image-cache -d --port <port> -o /tmp/image-cache.log
dst_node# criu restore --remote -o /tmp/image-cache.log
src_node# criu image-proxy -d --port <port> --address <dst_node> -o /tmp/image-proxy.log
src_node# criu dump -t <pid> --remote -o /tmp/dump.log

    [ xemul:
here's the list of what should be done with the cache/proxy
in order to have them merged into master.

0. Document the whole thing :)
   Please, add articles for newly introduced actions and options to
   https://criu.org/CLI page.
   Also, it would be good to have an article describing the protocols
   involved.

1. Make the unix sockets reside in work-dir.
   The good thing is that we've got rid of the socket name option :)
   But looking at do_open_remote_image() I see that it fchdir-s to
   image dir before connecting to proxy/cache. Better solution is to
   put the socket into workdir.

   1a. After this the option -D|--images-dir should become optional.
       Provided the --remote is given CRIU should work purely on the
       work-dir and not generate anything in the images-dir.

2. Tune up the image_cache and image_proxy commands to accept the
   --status-fd and --pidfile options.
   Presumably the very cr_daemon() call should be equipped with
   everything that should be done for daemonizing and proxy/cache
   tasks should just call it :)

3. Fix local connections not to generate per-image threads. There
   can be many images and it's not nice to stress the system with
   such amount of threads. Please, look at how criu/uffd.c manages
   multiple descriptors with page-faults using the epoll stuff.

   3a. The accept_remote_image_connections() seems not to work well
       with opts.ps_socket scenario as the former just calls accept()
       on whatever socket is passed there, while the opts.ps_socket
       is already an established socket for data transfer.

4. No strings in protocol. Now the hard-coded "RESTORE_FINISH" string
   (and DUMP_FINISHED one) is used to terminate the communication.
   Need to tune up the protobuf objects to send boolean (or integer)
   EOF sign rather than the string.

5. Check how proxy/cache works with incremental dumps. Looking at the
   skip_remote_bytes() I think that image-cache and -proxy still do not
   work well with stacked pages images. Probably for those we'll need
   the page-server or lazy-pages -like protocol that would request the
   needed regions and receive it back rather than read bytes from
   sockets simply to skip those.

6. Add support for cache/proxy into go-phaul code. I haven't yet finished
   with the prototype, but plan to do it soon, so once the above steps
   are done we'll be able to proceed with this one.

]

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:21:40 +03:00
rbruno@gsd.inesc-id.pt
81ae3efbf2 util: Copy file w/o sendfile
This is the case when the in/out files are image cache/proxy sockets.
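
For context, sendfile(2) cannot take a socket as its input descriptor, so a plain read/write loop is the usual fallback. The sketch below is in Python rather than CRIU's C, and `copy_fd` is a hypothetical name. It only shows the shape of such a loop, including the handling of short writes:

```python
import os


def copy_fd(fd_in, fd_out, nbytes, bufsize=65536):
    """Copy up to nbytes from fd_in to fd_out with plain read()/write().

    Returns the number of bytes actually copied, which is smaller than
    nbytes if fd_in reaches end-of-file early.
    """
    copied = 0
    while copied < nbytes:
        chunk = os.read(fd_in, min(bufsize, nbytes - copied))
        if not chunk:
            break  # EOF before nbytes were read
        view = memoryview(chunk)
        while len(view) > 0:
            # write() may accept fewer bytes than offered (short write).
            written = os.write(fd_out, view)
            view = view[written:]
        copied += len(chunk)
    return copied
```

The same loop works for regular files, pipes and sockets, at the cost of an extra copy through user space.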

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:19:12 +03:00
Andrew Vagin
d3ac1f40b8 zdtm: add a test for nested network namespaces
This test creates a few processes which live in three network namespaces
and have a few sockets which are created in different network namespaces.

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:19:12 +03:00
Andrei Vagin
0f5eb79a3f net: add a way to get a network namespace for a socket
Each socket belongs to one network namespace and operates
in this network namespace.

socket_diag reports information about sockets from
one network namespace, but it doesn't report sockets which
are not bound or connected anywhere. So we need to have
a way to get network namespaces for such sockets.

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:19:12 +03:00
Andrei Vagin
a123cbab65 kerndat: check the SIOCGSKNS ioctl
This ioctl is called for a socket and returns a file descriptor
for network namespace where a socket has been created.

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:19:10 +03:00
Andrei Vagin
6636f4a7af net: set a proper network namespace to create a socket
Each socket has to be restored in the network namespace
where it was created.

We set a specified network namespace before restoring a socket.
A task network namespace is set after restoring all files.

v2: don't set the root netns for transport sockets

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
8ea3d00296 util: move open_proc_fd to service_fd
We need this to avoid conflicts with file descriptors
which have to be restored.

Currently open_proc_pid() isn't used during restoring
file descriptors, but we are going to use it to restore
sockets in proper network namespaces.

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
e5be666ad6 net: allow to dump and restore more than one network namespace
Restore all network namespaces from the root task and then set
a proper namespace for each task after restoring sockets, because
we need to switch network namespaces to restore sockets.

Each socket has to be created in a proper network namespace.

v2: fix a typo bug

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
286f0f3607 net: save network namespaces for sockets
Each socket has to be restored in the namespace where
it was created.

There is an issue with unconnected and unbound sockets:
they are not reported via socket-diag, so we can't
get their network namespaces.

v2: add a comment before get_socket_ns()
    remove nsid from sk_packet_entry

Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Pavel Emelyanov
7e81cd9298 usernsd: Add debugging to catch BUG in unsd fd/flags mismatch
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
4a48a5d68a net: rename pid into nsid for prepare_net_ns()
PID usually means process ID, but prepare_net_ns works with namespaces.

travis-ci: success for Dump and restore nested network namespaces (rev4)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
0725a3e4c2 netlink: add ns_id as a generic argument to receive_callback
ns_id will be used to collect sockets and other per-netns
resources

travis-ci: success for Dump and restore nested network namespaces (rev4)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
4550e79c86 restore: call close_pid_proc() if a child is shared a parent fd table
There are a number of global variables around this descriptor
(e.g. open_proc_fd) and their values are saved in memory which
is not shared between processes.

travis-ci: success for Dump and restore nested network namespaces (rev4)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Adrian Reber
3257cd2911 Introducing the --check-only option
Talking about criu a common question is, if it is possible to know if a
checkpoint and restore will actually work. Running 'criu dump' with
--leave-running to see if the checkpointing will work and then running
'criu restore' is always an option. If one of those operations (either
'dump' or 'restore') will fail the chances are high that there are
problems with checkpointing or restoring. But a lot of memory might have
already been dumped to disk and transferred to the destination system
which is not necessary to test for a restore failure. If the restore,
however, works the problem exists that the source process has been told
to keep on running (--leave-running) which might be an undesired
situation to have the process now running on the source and destination
system. To avoid a situation like this and to give an easier option to
test if 'criu dump' and 'criu restore' will work, this patch introduces
the '--check-only' option:

 source system:
  # criu dump --check-only -D /tmp/cp -t <PID>
  Only checking if requested operation will succeed
  # rsync -a /tmp/cp dest-system:/tmp

 destination system:
  # criu restore -D /tmp/cp
  Checking mode enabled

criu will detect if a checkpoint is a 'check-only' checkpoint and the
restore will automatically run in '--check-only' mode.

It is also possible to use the '--check-only' switch on a full
checkpoint to see if the restore will succeed and making sure at the
same time that the process will not start running:

 destination system:
  # criu restore --check-only -D /tmp/cp
  Only checking if requested operation will succeed
  Checking mode enabled

Right now only the existing checks (e.g., check binary size) are run in
'check-only' mode but additional checks could be added like:

 * checksums of binaries
 * checksums of used libraries
 * available memory

v2:
 - changes based on Pavel's review

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Pavel Emelyanov
4c50326ed6 crns.py: New attempt to have --unshare option
So, here's the enhanced version of the first try.

Changes are:

1. The wrapper name is criu-ns instead of crns.py
2. The CLI is absolutely the same as for criu, since the script
   re-execl-s criu binary. E.g.
	   scripts/criu-ns dump -t 1234 ...
   just works
3. Caller doesn't need to care about substituting CLI options,
   instead, the script analyzes the command line and
   a) replaces -t|--tree argument with virtual pid __if__ the
      target task lives in another pidns
   b) keeps the current cwd (and root) __if__ switches to another
      mntns. A limitation applies here -- cwd path should be the
      same in target ns, no "smart path mapping" is performed. So
      this script is for now only useful for mntns clones (which
      is our main goal at the moment).

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Looks-good-to: Andrey Vagin <avagin@openvz.org>
2017-11-30 01:18:53 +03:00
Rodrigo Bruno
c45eb121e5 img: Introduce O_FORCE_LOCAL flag for images
criu/image-desc.c    | 4 ++--
 criu/image.c         | 4 ++--
 criu/include/image.h | 1 +
 3 files changed, 5 insertions(+), 4 deletions(-)

In order to prepare for remote snapshots (possible with Image Proxy and Image
Cache) the O_FORCE_LOCAL flag is added to force some images not to be remote
and stay as local files in the file system.

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Pavel Emelyanov
d9220e958b lib: Add simple Go wrappers for swrk mode
We'll need some docs :) but the API is

criu := MakeCriu()

criu.Dump(opts, notify)
criu.Restore(opts, notify)
criu.PreDump(opts, notify)
criu.StartPageServer(opts)

where opts is the object from rpc.proto, Go has almost native support
for those, so caller should

- compile .proto file
- export it and golang/protobuf/proto
- create and initialize the CriuOpts struct

and notify is an interface with callbacks that correspond to criu
notification messages.

A stupid dump/restore tool in src/test/main.go demonstrates the above.

Changes since v1:

* Added keep_open mode for pre-dumps. To use it, one needs
  to call criu.Prepare() right after creation and criu.Cleanup()
  right after .Dump()

* Report resp.cr_errmsg string on request error.

Further TODO:

- docs
- code comments

travis-ci: success for libphaul (rev2)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Pavel Emelyanov
090ce500d3 test, pipes: Exhaustive test of shared pipes
So, here's the next test that just enumerates all possible states and checks
that CRIU C/R-s it well. This time -- pipes. The goal of the test is to load
the fd-sharing engine, so pipes are chosen, as they not only generate shared
struct files, but also produce 2 descriptors in CRIU's fdesc->open callback
which is handled separately.

It's implemented slightly differently from the unix test, since we don't want
to check sequences of syscalls on objects, we need to check the task to pipe
relations in all possible ways.

The 'state' is several tasks, several pipes and each generated test includes
pipe ends sitting in all possible combinations in the tasks' FDTs.

Also note that states that seem to be equal to each other, e.g. a pipe between
tasks A->B and a pipe B->A, are really different, as CRIU picks the pipe-restorer
based on task PIDs. So whether the picked task has the read end or the write end in
its FDT makes a difference on restore.
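
To make the size of this state space concrete, here is a rough Python sketch of a naive enumeration. It is not the code of the test itself. It assumes that, for every (task, pipe) pair, the task holds no end, the read end, the write end, or both, and it does not prune any states; `pipe_states` and `END_SETS` are hypothetical names:

```python
from itertools import product

# Ends of one pipe that a single task may hold in its FDT.
END_SETS = ("", "r", "w", "rw")


def pipe_states(n_tasks, n_pipes):
    """Yield every assignment of pipe ends to tasks.

    Each state is a list with one tuple per task; the tuple holds,
    for each pipe, which ends of that pipe the task has open.
    """
    slots = n_tasks * n_pipes
    for combo in product(END_SETS, repeat=slots):
        yield [combo[t * n_pipes:(t + 1) * n_pipes] for t in range(n_tasks)]


states = list(pipe_states(2, 1))
# A->B and B->A appear as two distinct states.
print([("w",), ("r",)] in states, [("r",), ("w",)] in states)
print(len(list(pipe_states(2, 2))))
```

Without pruning the count is 4 to the power of tasks times pipes, which is one reason to cap both numbers.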

Number of tasks is limited with --tasks option, number of pipes with the
--pipes one. Test just runs all -- generates states, makes them and C/R-s
them. To check the restored result, /proc/pid/fd/ and /proc/pid/fdinfo/
for all restored tasks are analyzed.

Right now CRIU works OK for --tasks 2 --pipes 2 (for more -- didn't check).
Kirill, please, check that your patches pass this test.

TODO:

 - Randomize FDs under which tasks see the pipes. Now all tasks that have
   some pipe see it under the same set of FDs.

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Pavel Emelyanov
fbc26a980a test, unix: Exhaustive testing of states (v2)
By exhaustive testing I understand a test suite that generates as many
states to try to C/R as possible by trying all the possible sequences
of system calls. Since such a generation, if done on all the Linux API
we support in CRIU, would produce bazillions of processes, I propose to
start with something simple.

As a starting point -- unix stream sockets with abstract names that
can be created and used by a single process :)

The script generates situations which unix sockets can get into by
using a pre-defined set of system calls. In this patch the syscalls
are socket, listen, bind, accept, connect and send. Also the number
of system calls to use (i.e. -- the depth of the tree) is limited by
the --depth option.

There are three things that can be done with a generated 'state':

I) Generate :) and show

Generation is done by recursively doing everything that is possible
(and makes sense) in a given state. To reduce the size of the tree
some meaningless branches are cut, e.g. creating a socket and closing
it right after that, creating two similar sockets one-by-one and some
more.

Shown on the screen is a cryptic string, e.g. the 'SA-CX-MX_SBL' one,
describing the sockets in the state. This is how it can be decoded:

 - sockets are delimited with _
 - first goes type (S -- stream, D -- datagram)
 - next goes name state (A -- no name, B with name, X socket is not in
   FD table, i.e. closed or not yet accepted)
 - next may go letter L meaning that the socket is listening
 - -Cx -- socket is connected and x is the peer's name state
 - -Ixyz -- socket has incoming connections queue and xyz are the
   connect()-ors name states
 - -Mxyz -- socket has messages and xyz is senders' name states

The example above means, that we have two sockets:

 - SA-CX-MX: stream, with no name, connected to a dead one and with a
   message from a dead one
 - SBL: stream, with name, listening
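
The decoding rules above can be sketched as a small Python parser. This is an illustration written from the description only, not the code of the test script; `decode_state` and `decode_socket` are hypothetical names:

```python
TYPES = {"S": "stream", "D": "datagram"}
NAMES = {"A": "no name", "B": "with name", "X": "not in FD table"}


def decode_socket(desc):
    """Decode one socket description such as 'SA-CX-MX' or 'SBL'."""
    head, *extras = desc.split("-")
    info = {
        "type": TYPES[head[0]],
        "name": NAMES[head[1]],
        "listening": head[2:] == "L",
        "peer": None,      # name state of the connected peer, if any
        "incoming": [],    # name states of pending connect()-ors
        "messages": [],    # name states of message senders
    }
    for extra in extras:
        kind, states = extra[0], [NAMES[c] for c in extra[1:]]
        if kind == "C":
            info["peer"] = states[0]
        elif kind == "I":
            info["incoming"] = states
        elif kind == "M":
            info["messages"] = states
        else:
            raise ValueError(f"unknown field {extra!r}")
    return info


def decode_state(state):
    """Decode a whole state string; sockets are delimited with '_'."""
    return [decode_socket(s) for s in state.split("_")]


for sock in decode_state("SA-CX-MX_SBL"):
    print(sock)
```

Unknown letters raise an exception instead of being skipped, so a malformed string is noticed early.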

Next printed is the sequence of system calls to get into it, e.g. this
is how to get into the state above:

	socket(S) = 1
	bind(1, $name-1)
	listen(1)
	socket(S) = 2
	connect(2, $name-1)
	accept(1) = 3
	send(2, $message-0)
	send(3, $message-0)
	close(3)

The program has created a stream socket, bound it, listened on it, then
created another stream socket, connected to the 1st one, then accepted
the connection, sent two messages vice versa and closed the accepted
end, so the 1st socket is left connected to the dead socket with a
message from it.

II) Run the state

This is when test actually creates a process that does the syscalls
required to get into the generated state (and hopefully gets into it).

III) Check C/R of the state

This is the trickiest part when it comes to the R step -- it's not
clear how to validate that the state restored is correct. But if only
trying to dump the state -- it's just calling criu dump. As images dir
the state string description is used.

One may choose only to generate the states with --gen option. One may
choose only to run the states with --run option. The latter is useful
to verify that the states generator is actually producing valid
states. If no options given, the state is also dump-ed (restore is to
come later).

For now the usage experience is like this:

- Going --depth 10 --gen (i.e. just generating all possible states
  that are achievable with 10 syscalls) produces 44 unique states in
  0.01 seconds. The generated result covers some static tests we have
  in zdtm :)  More generation stats:
   --depth 15 : 1.1 sec   / 72 states
   --depth 18 : 13.2 sec  / 89 states
   --depth 20 : 1 m 8 sec / 101 states

- Running and trying with criu is checked with --depth 9. Criu fails
  to dump the state SA-CX-MX_SBL (shown above) with the error

  Error (criu/sk-queue.c:151): recvmsg fail: error: Connection reset by peer

Nearest plans:

1. Add generators for on-disk sockets names (now only abstract).
   Here an interesting case is when names overlap and one socket gets
   a name of another, but isn't accessible by it

2. Add datagram sockets.
   Here it'd be fun to look at how many-to-one connections are
   generated and checked.

3. Add socketpair()-s.

Farther plans:

1. Cut the tree better to allow for deeper tree scan.

2. Add restore.

3. Add SCM-s

4. Have the exhaustive testing for other resources.

Changes since v1:

* Added DGRAM sockets :)

  Dgram sockets are trickier than STREAM, as they can reconnect from
  one peer to another. Thus just limiting the tree depth results in
  weird states when socket just changes peer. In the v1 of this patch
  new sockets were added to the state only when old ones reported that
  there's nothing that can be done with them. This limited the amount
  of stupid branches, but this strategy doesn't work with dgram due to
  reconnect. Due to this, change #2:

* Added the --sockets NR option to limit the amount of sockets.

  This allowed to throw new sockets into the state on each step, which
  made a lot of interesting states for DGRAM ones.

* Added the 'restore' stage and checks after it.

  After the process is restored the script performs as many checks as
  possible having the expected state description in memory. The checks
  verify that the values below, obtained from real sockets, match the
  expectations in the generated state:

   - socket itself
   - name
   - listen state
   - pending connections
   - messages in queue (sender is not checked)
   - connectivity

  The latter is checked last, after all queues should be empty, by
  sending control messages with socket.recv() method.

* Added --keep option to run all tests even if one of them fails.

  And print nice summary at the end.

So far the test found several issues:

- Dump doesn't work for half-closed connection with unread messages
- Pending half-closed connection is not restored
- Socket name is not restored
- Message is not restored

New TODO:

- Check listen state is still possible to accept connections (?)
- Add socketpair()s
- Add on-disk names
- Add SCM-s
- Exhaustive script for other resources

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:18:53 +03:00
Andrei Vagin
3121d90de0 criu: print a criu version with the info level
We always ask users what version of criu they use to investigate a problem,
so it is better to have it in a log.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-23 20:26:02 +03:00
Andrei Vagin
ffee07723e criu: remap soccr log levels to criu levels
criu and soccr have different values for log levels, so
someone has to remap them.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-23 20:25:49 +03:00
root
638c14f2ed zdtm: grep errors from page-server.log and lazy-pages.log
This can help to investigate logs from Mr Jenkins.

Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-23 20:23:23 +03:00
Cyrill Gorcunov
e6537f3d8d fsnotify: open_handle -- Handle multiple mounts with same s_dev
When inotify is lying on an overmounted fs we should walk over
all mountpoints with the same s_dev to find an openable path.

Note on restore the path is usually already allocated during
dump stage so get_mark_path won't call for open_handle(), in
turn on dump stage the positive return from open_handle()
will cause fsnotify engine to find openable path, thus there
is kind of double work to be optimized in future.

For example, we got a container where systemd-udevd inside
opens inotify for a /dev/X entry, then overmounts the ./dev path
with the slave option, and as a result the irmap engine on predump
can't figure out where the inotify is sitting, causing
migration to abort.

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2017-11-23 20:23:23 +03:00
Dmitry Safonov
d541bc797c build: Move generated config.h into include/common/
config.h is a generated file with "build-features" defines.
We use it for several purposes:
o to check that the compiler can do its job
o to complement user-visible API between distributions
o to add compile-time options from .config global file

It's used in criu and soccr, but compel also needs such a thing.

Previously, soccr had a link to config.h in criu includes,
but it would be much cleaner to move it into include/common,
next to the other headers that are shared between sub-projects.

Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Tested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-23 20:23:23 +03:00