mirror of https://github.com/checkpoint-restore/criu synced 2025-08-27 12:28:14 +00:00

9635 Commits

Author SHA1 Message Date
Cyrill Gorcunov
0788b01772 images: remap-file-path -- Reserve entries for spfs manager
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
a591636f9f ns: Do not reuse PROC_SELF after CLONE_VM child
Child opens PROC_SELF, populates open_proc_self_pid and exits. If parent creates
one more child with the same pid later, the new child will try to reuse PROC_SELF,
set by exited child. So, we need to close PROC_SELF after the first child has finished.

We have this issue in two places, which have the same code. Let's move the code
into new function call_in_child_process() and fix the issue there using close_pid_proc().

https://travis-ci.org/tkhai/criu/builds/214182862

v2: Introduce the helper call_in_child_process() and fix issue there.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kir Kolyshkin
6a3280c747 No \n in pr_perror
It adds one for you already.
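
A minimal sketch of why the trailing newline is redundant. The macros below are simplified stand-ins for CRIU's logging macros, not the real definitions: the `last_msg` buffer, the "Error: " prefix and the `report_open_failure()` caller are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Illustration only: capture the message in a buffer instead of a log file. */
static char last_msg[256];

/* Stand-in for pr_err(): emits the message exactly as given. */
#define pr_err(fmt, ...) \
	snprintf(last_msg, sizeof(last_msg), "Error: " fmt, ##__VA_ARGS__)

/* Stand-in for pr_perror(): appends strerror(errno) and the newline itself. */
#define pr_perror(fmt, ...) \
	pr_err(fmt ": %s\n", ##__VA_ARGS__, strerror(errno))

/* Hypothetical caller: note that the format string carries no "\n". */
static void report_open_failure(const char *path)
{
	assert(path != NULL);
	pr_perror("Can't open %s", path);
}
```

With a macro shaped like this, a caller-supplied "\n" would end up in front of the errno text and split the message across two lines.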

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kir Kolyshkin
41bcdd8f0c copy_file: rm extra error message
No need to print an error after xmalloc(), it already printed one
for you.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kir Kolyshkin
2af2159176 criu/img-remote.c: use pr_err not pr_perror
In those error paths where we don't have errno set,
don't use pr_perror(), use pr_err() instead.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kir Kolyshkin
ae845f0079 criu/img-remote.c: use xmalloc
1. Use xmalloc() where possible.

2. There is no need to print an error message, as xmalloc()
   has already printed it for you.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kir Kolyshkin
47b5489548 criu/img-remote-proto.c: error printing fixes
OK, so we have pr_perror() for cases where errno is set (and it makes
sense to show it), and pr_err() for other errors. A correct function
is to be used, depending on the context.

1. pthread_mutex_*() functions don't set errno, therefore pr_perror()
   should not be used.

2. accept() sets errno => makes sense to use pr_perror().

3. read_header() arguably sets errno => use pr_err().

4. open_proc_rw() already prints an error message, there is no need
   for yet another one.
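
Point 1 can be illustrated with a small sketch. It is not code from img-remote-proto.c: `demo_lock` and `try_lock_and_report()` are hypothetical names, and plain `fprintf()` stands in for pr_err(). The mutex is initialized statically.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mutex, initialized statically. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: pthread_mutex_*() report failure through the
 * return value, not through errno, so the message is built from the
 * returned error number.
 */
static int try_lock_and_report(pthread_mutex_t *m)
{
	int ret;

	assert(m != NULL);
	ret = pthread_mutex_trylock(m);
	if (ret)
		fprintf(stderr, "Error: can't lock mutex: %s\n", strerror(ret));
	return ret;
}
```

Calling the helper twice from the same thread makes the second call fail with EBUSY, which is taken from the return value rather than from errno.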

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kir Kolyshkin
941929c61f criu/img-remote-proto.c: use static mutex init
I see no need to do dynamic init here.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Andrei Vagin
e84540e1f1 zdtm: show a process tree if a test doesn't show signs of life
Call "ps axf" if waitpid() is running more than 10 seconds

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Andrei Vagin
136ae76997 mount: use switch_ns_by_fd() instead of open_proc() + setns()
This function was added recently.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
6a430da7ca net: Create child with CLONE_VM in prepare_net_namespaces()
Some functions in prepare_net_ns() use vmalloc(), and this
memory should be visible to our children.

v4: munmap() stack on err path.
v3: Call prepare_userns_creds() before restore net.
v2: No functional changes, just killed continue.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
96cbd9f5bd zdtm: Add userns02 test
Differs from the userns01 test by unsharing the net ns in the child.
This should test nested user/net ns interaction.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
6105dcd590 ns: export prepare_userns_creds()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Rodrigo Bruno
610c6d4ddd Fixed NULL_RETURNS issues introduced by remote images code.
Signed-off-by: rodrigo-bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Rodrigo Bruno
baa1dfec4d Fixed RESOURCE_LEAK issues introduced by remote images code.
Signed-off-by: rodrigo-bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
70ca366189 files: Do not close transport socket twice
We close it in sigreturn_restore() for unification with other
service fds, so kill the second close() from here.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Rodrigo Bruno
89e1937a3f Fixed BUFFER_SIZE_WARNING issues introduced by remote images code.
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Pavel Emelyanov
7b0fa86d66 phaul: Go library for live migration
The API is as simple as

	srv := MakePhaulServer(config)
	cln := MakePhaulClient(local, remote, config)
	cln.Migrate()

* config is the PhaulConfig struct that contains pid to migrate,
  memory transfer channel (file descriptor) that phaul can use
  to send/receive memory and path to existing directory where
  phaul can put intermediate files and images.

* local is PhaulLocal interface with (for now) the single method
  - DumpCopyRestore(): method that phaul calls when it's time
    to do engine-specific dump, images copy and restore on
    the destination side.

  A few words about the latter -- we've learned that different
  engines have their own ways of calling CRIU to dump a container,
  so phaul, instead of dumping one on its own, lets the caller
  do it. To keep up with pre-dumps, the caller should
  not forget to do three things:

  - set the TrackMem option to true
  - set the ParentImg to the passed value
  - set the Ps (page server) channel with 'config.Memfd'

  The criu object is passed here as well, so that caller can
  call Dump() on it (once we have keep_open support in libcriu
  this will help to avoid additional criu execve).

  The method also should handle the PostDump notification and
  do images-copy and restore in it. Not sure how to wrap this
  into phaul better.

* remote is PhaulRemote interface whose method should be called
  on the dst side on the PhaulServer object using whatever RPC
  the caller finds acceptable.

As a demonstration the src/test/main.go example is attached. To
see how it goes 'make' it, then start the 'piggie $outfile'
proggie and run 'test $pid' command. The piggie will be, well,
live migrated locally :) i.e. will appear as a process with
different pid (it lives in a pid namespace).

Changes since v2:

* Reworked the API onto local/remote/config scheme
* Added ability to configure directory for images
* Re-used server side Criu object for final restore

Changes since v1:

* Supported keep_open-s for pre-dumps
* Added code comments about interface
* Simplified the example code

Further plans for this are

- move py p.haul to use this compiled library
- add post-copy (lazy pages) support (with Mike's help)
- add image-cache and image-proxy (with Rodrigo's help)
- add API/framework for FS migration

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Dmitry Safonov
1fe3523aca images: add proto2 syntax specification to remote-image.proto
To suppress protobuf's warning:
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546]
> No syntax specified for the proto file: remote-image.proto.
> Please use 'syntax = "proto2";' or 'syntax = "proto3";'
> to specify a syntax version. (Defaulted to proto2 syntax.)

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
1a5f103584 ns: Use rst_new_ns_id() in read_ns_with_hookups()
v2: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
6b0b8d6483 ns: Replace task argument rst_add_ns_id() with pid
We don't need the task in that function; this also allows
deleting the fake task in read_ns_with_hookups().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:15 +03:00
Kirill Tkhai
be38302ca2 ns: Set NS_ROOT namespaces after tasks are read
Namespaces are read in read_ns_with_hookups(),
when tasks are not read. So, root_item is NULL,
and NS_ROOT is not set for appropriate namespaces.

This patch fixes NS_ROOT after tasks are read. Also
it adds uts, ipc and cgroup namespaces for uniformity.

v2: Use macro MARK_ROOT_NS()

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
9bf4346c3c ns: Override fake pid in rst_add_ns_id()
While reading the ns file, we add namespaces with fake pid -1.
Allow overriding it later with the real pid of a process.
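
The idea can be sketched roughly as follows. This is not the actual rst_add_ns_id(): `struct ns_rec`, `ns_list`, `add_ns_rec()` and `FAKE_PID` are hypothetical, simplified names, and CRIU's real namespace records carry more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

#define FAKE_PID (-1)

/* Hypothetical, simplified namespace record. */
struct ns_rec {
	unsigned int id;
	pid_t pid;
	struct ns_rec *next;
};

static struct ns_rec *ns_list;

/*
 * Add a record for namespace @id, or reuse the existing one. A record
 * created earlier with FAKE_PID gets the real pid once it is known;
 * a record that already has a real pid is left untouched.
 */
static struct ns_rec *add_ns_rec(unsigned int id, pid_t pid)
{
	struct ns_rec *rec;

	assert(pid > 0 || pid == FAKE_PID);

	for (rec = ns_list; rec; rec = rec->next) {
		if (rec->id != id)
			continue;
		if (rec->pid == FAKE_PID)
			rec->pid = pid;
		return rec;
	}

	rec = malloc(sizeof(*rec));
	if (!rec)
		return NULL;
	rec->id = id;
	rec->pid = pid;
	rec->next = ns_list;
	ns_list = rec;
	return rec;
}
```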

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
c5435791b9 zdtm: Check for fsuid and fsgid in userns01 test
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
67f9aef330 zdtm: Check for euid, suid, egid and sgid in userns01 test
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
da9367188b zdtm: Check for groups list in userns01 test
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
dff4949591 ns: Keep all clone flags fixups together
It improves readability when they are all in one place
and can all be seen at once.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
bd7bc00e47 ns: Simplify create_net_ns()
Merge code with the same functionality into one place

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
04cd52c64a net: Kill unused argument in open_net_ns()
Nobody uses it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
9e9357db48 ns: Remove excess unshare CLONE_NEWNET
Child process is created to set NS_OTHER user_ns,
before creation of a net_ns.

So, this CLONE_NEWNET is useless, and the created
net_ns is lost right after we do unshare() in create_net_ns().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
140111c386 ns: Allow nested user namespaces
Everything is prepared for nested user namespaces support.
The only thing left to do is to enter the dumped
user namespace's parent before the dump.
We use CLONE_VM for child tasks, so they may populate
user_ns maps in parent memory without any tricks.
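
A minimal sketch of that memory sharing, assuming Linux with glibc on an architecture whose stack grows down. It is not CRIU's helper: `run_in_vm_child()`, `fill_map()` and `CHILD_STACK_SIZE` are hypothetical names, and the real code uses its own flags and stack handling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/*
 * Hypothetical helper: run @fn in a child that shares our address space
 * (CLONE_VM), wait for it, and return its exit code or -1 on failure.
 */
static int run_in_vm_child(int (*fn)(void *), void *arg)
{
	int status, ret = -1;
	char *stack;
	pid_t pid;

	assert(fn != NULL);

	stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* The stack grows down, so pass the top of the mapping. */
	pid = clone(fn, stack + CHILD_STACK_SIZE, CLONE_VM | SIGCHLD, arg);
	if (pid < 0)
		goto out;
	if (waitpid(pid, &status, 0) != pid)
		goto out;
	if (WIFEXITED(status))
		ret = WEXITSTATUS(status);
out:
	munmap(stack, CHILD_STACK_SIZE);
	return ret;
}

/* Hypothetical child body: the store lands directly in the parent's memory. */
static int fill_map(void *arg)
{
	int *slot = arg;

	*slot = 42;
	return 0;
}
```

After `run_in_vm_child(fill_map, &slot)` returns, the parent sees the value written by the child with no pipe or shared mapping involved.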

v3: Check for WIFEXITED(). Fixed stack size.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
1e6f4047b9 ns: Convert task cred's xids to target user ns
xids are saved according to NS_ROOT, while in pie
we may set them in their target user_ns. So, let's
convert them. Look at the commentary in the code,
while we save them in NS_ROOT.
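
A rough sketch of such a conversion for a single nesting level, assuming extents shaped like the lines of /proc/[pid]/uid_map (ID inside the namespace, ID in the parent namespace, length). `struct id_extent`, `id_from_parent_ns()` and `ID_UNMAPPED` are hypothetical names, not CRIU's helpers; for deeper nesting the lookup would have to be repeated per level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent, mirroring one line of /proc/[pid]/uid_map. */
struct id_extent {
	uint32_t first;       /* first ID inside the namespace */
	uint32_t lower_first; /* first ID in the parent namespace */
	uint32_t count;       /* length of the range */
};

#define ID_UNMAPPED (-1LL)

/*
 * Translate an ID valid in the parent namespace into the ID seen
 * inside the namespace described by @map, or ID_UNMAPPED.
 */
static long long id_from_parent_ns(const struct id_extent *map, size_t n,
				   uint32_t parent_id)
{
	size_t i;

	assert(map != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (parent_id >= map[i].lower_first &&
		    parent_id - map[i].lower_first < map[i].count)
			return (long long)map[i].first +
			       (parent_id - map[i].lower_first);
	}
	return ID_UNMAPPED;
}
```

For example, with a single extent `{ 0, 100000, 65536 }`, parent ID 101000 becomes 1000 inside the namespace, while parent ID 1000 has no mapping.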

Also, small cleanup: use creds instead of args->creds
for caps.

v4: Use target_userns_gid() to convert gids.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
3481f2d926 ns: Dump creds xids in root_user_ns
They may not be mapped in the target user_ns, so dump their
values in NS_ROOT. For backward compatibility
we can't collect their values from "/proc/[pid]/status",
because that is supported on the most recent kernels only.
So, choose this dump file format (dumping values in NS_ROOT),
and we will be ready for the future.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
342be8bc59 rst: Pass pstree_item argument to alloc_groups_copy_creds()
Pass the arg and add const modifiers where they are needed.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
5d3e26d221 shmem: Fixup shmem_wait_and_open() opens foreign /proc/[pid]/fd/[i]
When the target process is in a user_ns where we do not have permissions,
we need to use the usernsd helper to get its fds.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
69b412ec82 ns: Set target user_ns after net_ns is set
Restore the task's user_ns, keeping in mind that we are born in the parent's user_ns

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
87ad5ec22c ns: Implement set_user_ns()
Add a field pstree_item::user_ns, which shows the task's current
user_ns, and introduce helpers to set it.

v3: Rebase on fdstore

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
f9582b0bd3 utils: Introduce open_fd_of_real_pid()
As access to /proc/[pid]/fd/[i] of a task from parent's
user_ns is prohibited, introduce a helper, doing that
via usernsd.

Also, remove BUG_ON() in usernsd, as now it may be used
without input fd parameter.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
13cba0ca69 user_ns: Set user_ns before net_ns creation
Since a net ns may refer not only to root_user_ns,
set the appropriate user_ns before its creation.

v3: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
a69a4eddec ns: Generate user_ns tree
Create the user namespace hierarchy from the criu main task.
Open the namespaces' fds and keep them in fdstore, so they
are visible to everybody.

Why we do it this way:
1) User namespaces are not correlated with the task
hierarchy. A parent task may have a user namespace
nested deeper than that of a child task. So, we
can't restore the user namespaces just by
passing CLONE_NEWUSER in fork_with_pid().

2) CLONE_FS tasks require that user_ns is set at the
moment of clone(), so we have to restore the target user_ns
near create_children_and_session() in this case.

v3: Check for WIFEXITED(). Aligned stack.
    Use fdstore to keep ns fd.
    Create tree from root_item.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
3aebb0eac6 utils: Move getting real pid functionality to separate function
This is refactoring

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
5938cc0d50 proc: Close CR_PROC_FD_OFF and TRANSPORT_FD_OFF later
CR_PROC_FD_OFF is needed for accessing foreign tasks'
fds, and will be used in the future.

TRANSPORT_FD_OFF is for uniformity.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
0966b0d05d ns: Make write_id_map() use CR_PROC_FD_OFF
Currently, it's used by criu from CRIU_NS only.
So, in fact open_proc_rw() leads to opening of
a fd in CRIU_NS /proc (open_pid_proc() just
opens "/proc" dir, when PROC_FD_OFF is not set).

Make write_id_map() use CR_PROC_FD_OFF, which
exists, and does not confuse a user.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
75f71e73a1 ns: Make prepare_userns() have ns map parameter
This is refactoring

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
0742794aa6 ns: Write/read ns entries in new way
The patch introduces a generic way of dumping all the
namespaces (currently, only user ns entries are dumped).

The handler for old user ns images remains in place.

v4: Rebase on generic parent_id and userns_id.
v3: On restore, keep in mind that the parent ns may not have been
    read yet at the moment we search for it.
    Set correct user ns id to d_ns.
    Reflect the fact, that parent_id is moved to pid and user ext.
    Read ns ids before tasks.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
c2a773cfeb proto: Add ns_entry description
New image format, generic for all namespaces.
Currently, it's for pid, net and user ns.

v4: Rename ns-hookup to ns.
    Make user_ns and parent generic.
v3: Move parent_id to pid and user ext

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
a1d4cef08a images: Move uid_gid_extent and userns_entry descriptions
Move them into ns.proto file

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
388d853fa3 ns: Implement dup_userns_entry()
Function for cloning UsernsEntry entries.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
b158b7bfdb ns: Set pointer to root_user_ns in ns_ids
Old type images do not have pointer to user_ns.
Set them manually.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:14 +03:00
Kirill Tkhai
45588e9d6c ns: Provide the case when root_item has !NS_ROOT user_ns in rst_add_ns_id()
root_item may have NS_OTHER user_ns, so do not set it directly.
This will be used in next patches.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:13 +03:00