mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 13:28:27 +00:00

113 Commits

Author SHA1 Message Date
Kir Kolyshkin
b17962ad8d Fix pr_perror() usage
When using pr_perror(), format string should not end with \n,
as it is added by the macro itself.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-05-05 13:36:29 +03:00
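The rule in the commit above can be illustrated with a small stand-in macro. This is only a sketch: `sketch_perror_to` is a hypothetical name, and the exact expansion of CRIU's real `pr_perror()` is an assumption here. It shows why a caller's trailing `\n` would be wrong: the macro itself appends the errno description and the newline.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a pr_perror()-style macro (not CRIU's actual
 * definition). It writes into a buffer so the result can be inspected.
 * The macro appends ": <strerror(errno)>\n" itself, so the caller's
 * format string must not end with a newline.
 * Note: ##__VA_ARGS__ is a GNU extension supported by gcc and clang.
 */
#define sketch_perror_to(buf, size, fmt, ...) \
	snprintf(buf, size, "Error: " fmt ": %s\n", ##__VA_ARGS__, strerror(errno))
```

A call such as `sketch_perror_to(buf, sizeof(buf), "Can't open %s", name)` yields one line ending in a single newline; writing `"Can't open %s\n"` instead would split the errno text onto a second line.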
Oleg Nesterov
745f845fa8 revert 246367e4e483 "add walk_all flag to walk_namespaces"
We no longer need to populate ext_ns->mnt.mntinfo_list until
resolve_external_mounts(). We can rely on find_ext_ns_id() which
does collect_mntinfo() on demand.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-14 22:34:40 +03:00
Tycho Andersen
246367e4e4 add walk_all flag to walk_namespaces
In the rest of this series we need to walk all the namespaces to autodetect
which mounts are master/shared/private bind mounts, so we need the information
from criu's namespace in the case when the namespaces are not the same.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-09 12:53:19 +03:00
Pavel Emelyanov
e29c9daec2 img: Remove O_OPT and COLLECT_OPTIONAL
The current code makes no distinction between OPT and non-OPT
except for whether a message is printed in open_image().
So this particular change changes nothing but the availability of
this message.

In the next patches I will introduce "empty images" to deal with
the ENOENT situation in a more graceful manner.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-13 14:42:01 +03:00
Pavel Emelyanov
b8556e8084 usernsd: The way to restore priviledged stuff in userns
We have collected a good set of calls that cannot be made inside
user namespaces, but which we need to make [1]. Some of them have already
been addressed, like prctl mm bits restore, but some have not.

I'm pretty sceptical about the ability to relax the security
checks on quite a lot of them (e.g. open-by-handle is indeed a
very dangerous operation if allowed to an unprivileged user), so
we need some way to call those things even in user namespaces.

The good news is that all the calls I've found operate
on file descriptors one way or another. So if we had a process
that lived outside of the user namespace, we could ask it to do the
privileged operation we need and exchange the affected file
descriptor via a unix socket.

So the usernsd is the one doing exactly this. It starts before we
create the user namespace and accepts requests via unix socket.
Clients (the processes we restore) send it the functions they
want to call, the descriptor they want to operate on and the
arguments blob. Optionally, they can request some file descriptor
back after the call.

In the non-user-namespace case the daemon is not started and the calls
are done right in the requestor's process environment.

In the next patch there's an example of how to use this daemon
to do the privileged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
a socket.

[1] http://criu.org/UserNamespace

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
2015-02-13 16:11:38 +04:00
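The descriptor exchange described in the commit above relies on SCM_RIGHTS ancillary messages over a unix socket. The following is a minimal sketch of just that exchange, not of the usernsd protocol itself: the helper names `send_fd` and `recv_fd` are hypothetical, and the function identifiers and argument blob mentioned in the commit are left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a unix socket; returns 0 on success. */
int send_fd(int sock, int fd)
{
	char data = 'F';	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on error. */
int recv_fd(int sock)
{
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets a process outside the user namespace act on a descriptor on behalf of one inside it.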
Pavel Emelyanov
f639680bf1 userns: Don't fork task not to dump userns
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-11-11 20:13:18 +04:00
Pavel Emelyanov
1283921d53 ns: Don't do manual lookup_ns_by_id
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:12:49 +04:00
Pavel Emelyanov
ba983f9819 ns: Factor out nsid listing code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:12:34 +04:00
Pavel Emelyanov
f33908a897 ns: Rename "created" futex and comment what it is
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:11:58 +04:00
Pavel Emelyanov
b3f644572e ns: Fix compilation on Fedora-19
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:23:17 +04:00
Andrey Vagin
b414318c4b userns: check that all namespaces were created from a target userns
We enter the target userns and try to enter the other namespaces.
The "enter" operation requires CAP_SYS_ADMIN in the user namespace
where the target namespace was created.

Now, if one or more namespaces were created in another userns,
criu stops dumping and returns an error. I want to find someone who uses
this configuration. In this case restore will be more complicated.
The current version covers the needs of containers.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:22:20 +04:00
Andrey Vagin
c004ff7745 restore: set PR_SET_DUMPABLE to have access to proc files
It is cleared when a process is forked in a new userns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:19:02 +04:00
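Re-setting the flag mentioned in the commit above comes down to a single prctl() call. This is a minimal sketch, assuming a Linux system; `make_dumpable` is a hypothetical helper name, not a function from the CRIU tree.

```c
#include <sys/prctl.h>

/*
 * Mark the calling process as dumpable again (value 1).
 * Returns 0 on success, -1 on error with errno set.
 */
int make_dumpable(void)
{
	return prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);
}
```

The current value can be read back with `prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)`.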
Andrey Vagin
fa266bb0cb userns: restore per-namespace mappings of user and group IDs (v4)
In this patch we fill /proc/PID/uid_map and /proc/PID/gid_map for the
root task.

v2: initialize groups in a new namespace.
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

v3: add a helper to initialize creds in a new userns

v4: initialize userns creds in prepare_namespaces()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:16:40 +04:00
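Each line of /proc/PID/uid_map and /proc/PID/gid_map has the form "ID-inside-ns ID-outside-ns length". The sketch below shows one way to build such a map and write it; `struct id_extent`, `format_id_map` and `write_id_map` are hypothetical names chosen for illustration and are not taken from the CRIU sources.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* One mapping extent; field names are illustrative only. */
struct id_extent {
	unsigned int first;		/* first ID inside the namespace */
	unsigned int lower_first;	/* first ID outside the namespace */
	unsigned int count;		/* length of the range */
};

/* Format extents as map lines; returns bytes written or -1 if buf is too small. */
int format_id_map(char *buf, size_t size, const struct id_extent *ext, size_t n)
{
	size_t off = 0, i;

	for (i = 0; i < n; i++) {
		int len = snprintf(buf + off, size - off, "%u %u %u\n",
				   ext[i].first, ext[i].lower_first, ext[i].count);
		if (len < 0 || (size_t)len >= size - off)
			return -1;
		off += (size_t)len;
	}
	return (int)off;
}

/* Write a prepared map to /proc/<pid>/<name>, e.g. name = "uid_map". */
int write_id_map(pid_t pid, const char *name, const char *buf, size_t len)
{
	char path[64];
	int fd, ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/%s", (int)pid, name);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	/* The kernel expects the whole map in a single write(). */
	if (write(fd, buf, len) == (ssize_t)len)
		ret = 0;
	close(fd);
	return ret;
}
```

The map is written in one call because the kernel accepts only a single successful write to each of these files per namespace.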
Andrey Vagin
cb2f9223a0 dump: dump user namespaces (v2)
For that we need to save per-namespace mappings of user and group IDs.

And all id-s for tasks and files are saved from the target user
namespace.

v2: move code into collect_namespaces()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:16:16 +04:00
Andrey Vagin
30711b109d userns: save uid-s from a target userns (v2)
We are going to support user namespaces and uid-s will be converted
according to userns mappings.

v2: convert id-s for sockets too
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:15:45 +04:00
Andrey Vagin
102cbe8a09 namespaces: take into account USERNS id
and return an error if a process lives in another userns,
because criu doesn't support it.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-30 16:00:33 +04:00
Andrey Vagin
5ed535f17a namespace: append a null byte after readlink
readlink() does not append a null byte to buf.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-30 15:31:15 +04:00
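The pitfall fixed by the commit above can be avoided with a small wrapper. This is a sketch only; `readlink_str` is a hypothetical helper name, not the function used in CRIU.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a symlink target into buf and NUL-terminate it.
 * Returns the target length, or -1 on error.
 * readlink() truncates silently, so a return value of size - 1
 * may mean the target did not fit.
 */
ssize_t readlink_str(const char *path, char *buf, size_t size)
{
	ssize_t len;

	if (size == 0)
		return -1;

	/* Leave room for the terminator that readlink() does not write. */
	len = readlink(path, buf, size - 1);
	if (len < 0)
		return -1;

	buf[len] = '\0';
	return len;
}
```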
Andrey Vagin
db8ff58f52 namespace: don't fail if a namespace isn't supported by kernel
CRIU reads /proc/pid/ns/[NS] and fails if a link does not exist.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-30 15:30:42 +04:00
Pavel Emelyanov
b1a8e41dd0 mnt: Don't validate mounts on pre-dump
This is for two reasons. First, validation can encounter an external mount
and will then call plugins, which is not correct on pre-dump and actually
crashes on uninitialized plugin lists. Second, even if the mount tree is
not "supported" on pre-dump, this can be a temporary situation (yes,
yes, unlikely, but still).

On the other hand, it's better to fail earlier, but that's another
story.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-10-30 15:15:30 +04:00
Andrey Vagin
025b4e86b5 namespaces: user open_proc() in switch_ns()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-30 15:12:05 +04:00
Pavel Emelyanov
16971e47cd ns: Introduce ns walking helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-14 18:01:27 +04:00
Pavel Emelyanov
c57c2cfa64 predump: Collect mnt and net namespaces properly
On pre-dump we collect only two namespaces -- the mnt one
for criu and mnt one again for root task.

This is not correct. We need all mount namespaces to make
the irmap generation work properly and we need all net
namespaces to have parasite sockets created.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-02 14:30:31 +04:00
Pavel Emelyanov
8ad653c732 pstree: Store task's netns on pstree-item
Will be needed for parasite sockets.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:35:11 +04:00
Pavel Emelyanov
678d19be26 ns: Reshuffle nsid generation code
To make it possible to get ns_id object together
with its ID.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:35:01 +04:00
Pavel Emelyanov
7327ffe6a7 ns: Introduce collect_net_namespaces
And move sockets collection there.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:33:56 +04:00
Pavel Emelyanov
01f6f890c2 ns: Introduce collect_namespaces routine
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:33:42 +04:00
Pavel
8ac80915e0 ns: Factor out namespace switching call
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-30 21:54:11 +04:00
Pavel Emelyanov
295090c1ea img: Introduce the struct cr_img
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plain file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.

This will also help if (when?) we want to make more
complex changes to images, e.g. store them all in one
file or send them directly to the network.

For now the cr_img just contains one int _fd variable.

This patch changes the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:13 +04:00
Pavel Emelyanov
5f2a7ac27b img: Rename fdset -> imgset
Since we're going to switch from int-fd-s to class-image
soon the fdset name will not fit into the new terminology.

This patch is

 sed -e 's/fdset/imgset/g' -i *
 sed -e 's/imgset_fd/img_from_set/g' -i *
 git mv include/fdset.h include/imgset.h

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:10 +04:00
Pavel
b30f0f0104 ns: Dump namespaces in parallel
The main reason for this is that dumping a namespace has a lot of
points where the process just waits for something. At the same
time the criu process waits for the ns dumper and doesn't dump
others.

A great example of waiting for something is the setns syscall.
Very often it calls synchronize_rcu(), which can take quite long.
Let other processes do something useful in the meantime.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-09-23 20:43:33 +04:00
Andrey Vagin
a20011aca6 ns: initialize nsid in rst_add_ns_id
Execute zdtm/live/static/pipe00
./pipe00 --pidfile=pipe00.pid --outfile=pipe00.out
Dump 3158
Restore
test/zdtm.sh: line 472:  3173 Segmentation fault      (core dumped) setsid  restore --file-locks --tcp-established -x -D  -o

Reported-by: Jenkins Criuovich
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 14:46:19 +04:00
Pavel Emelyanov
8d5822d9cb mnt: Factor out mntns nsid creation on restore
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 13:22:12 +04:00
Pavel Emelyanov
1435617c40 mnt: Rename _collect_root into _get_root_fd
Nowadays this routine is mainly used for getting an
fd, rather than keeping one for future reference.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 01:38:58 +04:00
Pavel Emelyanov
05c02ddcf9 mnt: Move nsmask checking into prepare_mnt_ns
Helper for simpler next patch.

Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:42 +04:00
Pavel Emelyanov
4ffa79695d mnt: Remove unneeded argument from prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:23 +04:00
Andrey Vagin
9625ebe596 mount: move dump_mnt_namespaces in mount.c
It will fill mntinfo list and this is internal logic of mount.c

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:39 +04:00
Andrey Vagin
cc1fd5760a mount: save mount tree for each namespace
We are going to support nested mount namespaces and each NS has its own
tree. The mount tree is used for checking that a file is reachable.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:34 +04:00
Andrey Vagin
0721626902 namespaces: dump mount namespaces before tasks (v2)
because we want to check, that all files are reachable.
For that we need to collect all mounts from all namespaces.

v2: dump mntns separately
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:47 +04:00
Andrey Vagin
d2012883ab criu: rename current_ns_mask to root_ns_mask (v2)
Now we support sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.

v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:33 +04:00
Andrey Vagin
4067f4bb7e mount: allow to dump and restore nested mount namespaces (v3)
v2: another attempt to write readable code:)
v3: clean up
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:23 +04:00
Andrey Vagin
3a291e33ff crtools: restore nested mount namespaces (v2)
Known issues:
* currently only namespaces with the same root are supported
* nested namespaces can be dumped and restored only if the root task
  has its own mount namespace.

All nested namespaces are restored in a root namespace in temporary
directories. All mount points are restored in one tree and then they are
divided into namespaces.
The task with the minimal pid in each namespace unshares its mntns and
then does pivot_root into the proper temporary directory. All other
tasks call setns to enter the mount namespace of the task with the
minimal pid.

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:17 +04:00
Andrey Vagin
eac462922c restore: add mount id-s in the ns_ids list (v4)
Currently ns_ids list is filled only on dump. Soon we'll need this
list for mount namespaces on restore, e.g. to know which tasks share
the namespaces.

v2: merge the patch "namespace: add a function to search an ns_id
item by id" into this one.
v3: add prefix rst_ to add_ns_id
v4: look up namespace by two values -- type AND ID

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:37:52 +04:00
Andrey Vagin
4bd119ddf6 ns: clean up dump_namespaces
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-18 19:17:10 +04:00
Cyrill Gorcunov
708abf40b8 namespaces: Use long value to check for UINT_MAX
We have a condition

	BUG_ON(kid > UINT_MAX);

but kid is unsigned int so it's never bigger than UINT_MAX,
use unsigned long instead.

CID 1042296

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-09 15:26:19 +04:00
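The point of the fix above is that a range check against UINT_MAX only means something if the value is first held in a wider type. A minimal sketch of the same idea follows; `parse_kid` is a hypothetical helper, and the assumption that the ID arrives as a decimal string is made for the example, not taken from the commit.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal ID into an unsigned int, rejecting values that do
 * not fit. The intermediate is unsigned long, so on platforms where
 * long is wider than int the comparison with UINT_MAX can actually
 * be true. Where both types have the same width, strtoul() reports
 * overflow through ERANGE instead.
 */
int parse_kid(const char *s, unsigned int *out)
{
	char *end;
	unsigned long kid;

	errno = 0;
	kid = strtoul(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -1;

	if (kid > UINT_MAX)
		return -1;

	*out = (unsigned int)kid;
	return 0;
}
```

Had `kid` been declared as `unsigned int`, the `kid > UINT_MAX` test would be dead code, which is the kind of condition static analysis typically flags.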
Andrey Vagin
5b564db91e namespace: move struct ns_id into namespace.h
It's going to be used for restoring namespaces. For example we need to
enumerate the ns_ids list for restoring mount namespaces.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-12 00:23:47 +04:00
Pavel Emelyanov
84ebc64b1f pre-dump: Collect mount info, root and nsmask
Well, we want to pre-dump files (fsnotifies), for that we
will need mountinfo-s and root, and for the latter -- the
current ns mask.

The problem with current ns mask is that its generation is
incorporated into ns IDs generation and dumping. And since
the ids dumping is not performed on pre-dump, let's just
provide a helper for ns-mask generation.

Strictly speaking, the whole ns-mask idea is not great, but
it's to be fixed later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-30 16:20:15 +04:00
Kir Kolyshkin
d64d68d66c whitespace-at-eol cleanup
Remove whitespace at EOL (found by git grep ' $')

To people using vim, I'd suggest adding the following code to ~/.vimrc:

let c_space_errors = 1
highlight FormatError ctermbg=darkred guibg=darkred
match FormatError /\s\+$\|\ \+\t\|\%80v.\|\ \{8\}/

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-12 10:00:45 +04:00
Andrey Vagin
0ef2128309 crtools: don't include cr-show.h in crtools.h
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 15:42:14 +04:00
Andrey Vagin
1300cf4915 crtools: move all stuff about fdset in a separate header
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 15:24:48 +04:00
Andrey Vagin
9826d2dd04 crtools: don't include pstree.h in namespaces.h
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:39:50 +04:00