mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 21:07:43 +00:00

42 Commits

Author SHA1 Message Date
Pavel Emelyanov
b8556e8084 usernsd: The way to restore privileged stuff in userns
We have collected a good set of calls that cannot be done inside
user namespaces, but we need to [1]. Some of them have already
been addressed, like the prctl mm bits restore, but some have not.

I'm pretty sceptical about the ability to relax the security
checks on quite a lot of them (e.g. open-by-handle is indeed a
very dangerous operation if allowed to an unprivileged user), so
we need some way to call those things even in user namespaces.

The good news is that all the calls I've found operate
on file descriptors one way or another. So if we had a process
that lived outside of the user namespace, we could ask it to do the
privileged operation we need and exchange the affected file
descriptor via a unix socket.

So usernsd is the one doing exactly this. It starts before we
create the user namespace and accepts requests via a unix socket.
Clients (the processes we restore) send it the function they
want to call, the descriptor they want to operate on and the
arguments blob. Optionally, they can request some file descriptor
back after the call.

In the non-user-namespace case the daemon is not started and the calls
are done right in the requester's process environment.
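
The request flow described in this message can be sketched roughly as follows. This is only an illustration and not CRIU's code: the header layout, the call table, and the names `serve_one`, `call_via_daemon`, `call_write` and `demo` are all hypothetical, and the "privileged" call is replaced by a harmless write so the sketch runs without special rights. The descriptor travels over the unix socket as SCM_RIGHTS ancillary data (via Python's `socket.send_fds` / `socket.recv_fds`, available since Python 3.9).

```python
import os
import socket
import struct
import threading

# Hypothetical request header: call number, length of the arguments blob.
HDR = struct.Struct("=II")


def call_write(fd, arg):
    """Stand-in for a privileged call: simply writes the blob to the descriptor."""
    return os.write(fd, arg)


# Hypothetical table of calls the helper process is willing to perform.
CALLS = {1: call_write}


def serve_one(sk):
    """Daemon side: receive one request with its descriptor, run the call, reply."""
    data, fds, _flags, _addr = socket.recv_fds(sk, 4096, 1)
    call_nr, arg_len = HDR.unpack_from(data)
    arg = data[HDR.size:HDR.size + arg_len]
    try:
        res = CALLS[call_nr](fds[0], arg)
    except (KeyError, IndexError, OSError):
        res = -1
    finally:
        for fd in fds:
            os.close(fd)  # the daemon only holds its copy for the call
    sk.send(struct.pack("=i", res))


def call_via_daemon(sk, call_nr, fd, arg):
    """Client side: send call number, blob and descriptor, then wait for the result."""
    socket.send_fds(sk, [HDR.pack(call_nr, len(arg)) + arg], [fd])
    (res,) = struct.unpack("=i", sk.recv(4))
    return res


def demo():
    """Run one request through a socketpair, with the daemon in a thread."""
    daemon_sk, client_sk = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    client_sk.settimeout(5)
    r, w = os.pipe()
    t = threading.Thread(target=serve_one, args=(daemon_sk,))
    t.start()
    res = call_via_daemon(client_sk, 1, w, b"hello")
    t.join()
    out = os.read(r, 16)
    for fd in (r, w):
        os.close(fd)
    daemon_sk.close()
    client_sk.close()
    return res, out
```

In the real setup the helper would be a separate process started before the user namespace is created; a thread is used here only to keep the sketch self-contained.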

In the next patch there's an example of how to use this daemon
to do the privileged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
a socket.

[1] http://criu.org/UserNamespace

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
2015-02-13 16:11:38 +04:00
Pavel Emelyanov
f33908a897 ns: Rename "created" futex and comment what it is
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-11 20:11:58 +04:00
Andrey Vagin
cb2f9223a0 dump: dump user namespaces (v2)
For that we need to save per-namespace mappings of user and group IDs.

And all id-s for tasks and files are saved from the target user
namespace.

v2: move code into collect_namespaces()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:16:16 +04:00
Andrey Vagin
30711b109d userns: save uid-s from a target userns (v2)
We are going to support user namespaces and uid-s will be converted
according to the userns mappings.

v2: convert id-s for sockets too
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-11-07 17:15:45 +04:00
Pavel Emelyanov
b1a8e41dd0 mnt: Don't validate mounts on pre-dump
This is for two reasons. First, validation can meet an external mount
and will call plugins, which is not correct on pre-dump and actually
crashes on uninitialized plugin lists. Second, even if on pre-dump the
mount tree is not "supported", this can be a temporary situation (yes,
yes, unlikely, but still).

On the other hand, it's better to fail earlier, but that's another
story.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-10-30 15:15:30 +04:00
Pavel Emelyanov
16971e47cd ns: Introduce ns walking helper
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-14 18:01:27 +04:00
Pavel Emelyanov
c57c2cfa64 predump: Collect mnt and net namespaces properly
On pre-dump we collect only two namespaces -- the mnt one
for criu and mnt one again for root task.

This is not correct. We need all mount namespaces to make
the irmap generation work properly and we need all net
namespaces to have parasite sockets created.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-02 14:30:31 +04:00
Pavel Emelyanov
3c7d01f6a7 net: Pre-create nl diag sk
The setns() syscall (called by switch_ns()) can be extremely
slow. If we call it two or more times from the same task, the
kernel will synchronously go through a very slow routine called
synchronize_rcu() trying to put a reference on the old namespaces.

To avoid doing this more than once, I propose to create all
per-ns sockets in one place with one setns call. In this
patch the nl diag socket used to collect other sockets
is created this way.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:34:29 +04:00
Pavel Emelyanov
01f6f890c2 ns: Introduce collect_namespaces routine
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:33:42 +04:00
Pavel
867bcd2196 mnt: Shorten the mntns dumping loop
We currently have all mountinfo-s from all mnt namespaces collected
in one big list. On dump we scan through it to find the namespaces
we need to dump.

This can be optimized by walking the list of namespaces instead.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-09-23 20:37:32 +04:00
Pavel Emelyanov
8d5822d9cb mnt: Factor out mntns nsid creation on restore
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 13:22:12 +04:00
Andrey Vagin
e827a695f3 mount: separate collect_mnt_ns from dump_mnt_ns
We are going to support nested mntns, so the global mntinfo_tree
variable is useless and information about the tree should be connected
to the proper namespace.

But when we don't dump mntns, we need to collect mounts for the current
mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:41 +04:00
Andrey Vagin
cc1fd5760a mount: save mount tree for each namespace
We are going to support nested mount namespaces and each NS has its own
tree. The mount tree is used for checking that a file is reachable.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:34 +04:00
Andrey Vagin
0721626902 namespaces: dump mount namespaces before tasks (v2)
because we want to check that all files are reachable.
For that we need to collect all mounts from all namespaces.

v2: dump mntns separately
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:47 +04:00
Andrey Vagin
d2012883ab criu: rename current_ns_mask to root_ns_mask (v2)
Now we support sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.

v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:33 +04:00
Andrey Vagin
3a291e33ff crtools: restore nested mount namespaces (v2)
Known issues:
* currently only namespaces with the same root are supported
* nested namespaces can be dumped and restored only if the root task
  has its own mount namespace.

All nested namespaces are restored in the root namespace in temporary
directories. All mount points are restored in one tree and then they are
divided into namespaces.
The task with the minimal pid in each namespace unshares the mntns and
then does pivot_root into the proper temporary directory. All other
tasks call setns to enter the mount namespace of the task with the
minimal pid.
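
The choice of which task creates each namespace can be illustrated with a short sketch. It is not taken from the patch: `mntns_leaders` and `restore_action` are hypothetical names, and the input is assumed to be a plain mapping from pid to mount namespace ID. It only shows the "minimal pid unshares, everybody else does setns" rule stated in this message.

```python
def mntns_leaders(task_ns):
    """Map each mount namespace ID to the smallest pid living in it.

    task_ns is assumed to be a dict of pid -> mount namespace ID.
    """
    leaders = {}
    for pid, ns_id in task_ns.items():
        if ns_id not in leaders or pid < leaders[ns_id]:
            leaders[ns_id] = pid
    return leaders


def restore_action(pid, task_ns, leaders):
    """Tell what a task does for its mount namespace on restore."""
    if leaders[task_ns[pid]] == pid:
        # The task with the minimal pid creates the namespace.
        return "unshare+pivot_root"
    # Everybody else joins the namespace of the leader.
    return "setns"
```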

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:17 +04:00
Andrey Vagin
eac462922c restore: add mount id-s in the ns_ids list (v4)
Currently ns_ids list is filled only on dump. Soon we'll need this
list for mount namespaces on restore, e.g. to know which tasks share
the namespaces.

v2: merge the patch "namespace: add a function to search an ns_id
item by id" into this one.
v3: add prefix rst_ to add_ns_id
v4: look up namespace by two values -- type AND ID

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:37:52 +04:00
Andrey Vagin
5b564db91e namespace: move struct ns_id into namespace.h
It's going to be used for restoring namespaces. For example we need to
enumerate the ns_ids list for restoring mount namespaces.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-03-12 00:23:47 +04:00
Pavel Emelyanov
84ebc64b1f pre-dump: Collect mount info, root and nsmask
Well, we want to pre-dump files (fsnotifies), for that we
will need mountinfo-s and root, and for the latter -- the
current ns mask.

The problem with current ns mask is that its generation is
incorporated into ns IDs generation and dumping. And since
the ids dumping is not performed on pre-dump, let's just
provide a helper for ns-mask generation.

Strictly speaking, the whole ns-mask idea is not great, but
it's to be fixed later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-01-30 16:20:15 +04:00
Cyrill Gorcunov
291aa3f6d6 headers: Add extern specifier to functions
We really have a mess of extern/non-extern declarations
of functions in our headers. Always use extern for
the sake of uniformity.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-15 17:00:58 +04:00
Andrey Vagin
2082e8faf0 crtools: don't include crtools.h in other headers
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 18:17:38 +04:00
Andrey Vagin
9826d2dd04 crtools: don't include pstree.h in namespaces.h
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-06 12:39:50 +04:00
Andrey Vagin
07930a8df4 ns: replace pid on id in per-namespace files
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-10-01 12:17:04 +04:00
Pavel Emelyanov
60e6d38868 collect: Shorten common images collecting code
Now that we have a set of cinfo-s, it's possible to collect all
this stuff in a plain for-loop.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-08-21 03:52:18 +04:00
Pavel Emelyanov
d020ebb36d files: Compact the code by removing per-file dump helpers
Since *all* of them just call do_dump_gen_file with proper ops,
just call it directly. Compacts the code.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-06-14 00:11:08 +04:00
Pavel Emelyanov
6bf22f8c75 crtools: Get rid of on-stack cr_options
We have global instance of them, that's enough.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-28 21:11:13 +04:00
Pavel Emelyanov
ec50a07727 ns: Add c/r for /proc/$pid/ns/$ids references
Based on work done by Cyrill Gorcunov (many thanks for that).

In this commit we implement c/r for files which have opened
/proc/$pid/ns/$ids entries.

The idea is rather simple:

Checkpoint
==========

- Check whether the file name is one of those known to be an ns ref
- If it matches, write a protobuf entry

Restore
=======

- Read all ns entries from the image
- When criu tries to open one, we look through the process
  tree to figure out which PID should be used in the path
  and then just open it

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-18 04:00:05 +04:00
Cyrill Gorcunov
30936058a0 ns: Extend ns_desc to carry the length of the ns name
This will be needed for fast parsing of procfs ns references.

[ xemul: Add user_ns_desc here ]

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-18 03:36:56 +04:00
Cyrill Gorcunov
c459e417f3 ns: Beautify namespaces.h
I'll need to modify this header, so before anything
else let's beautify it

 - drop struct pstree_item declaration, it's already in pstree.h
 - move struct cr_options to top
 - align members of struct ns_desc
 - move externs to top
 - add argument name to try_show_namespaces

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-15 00:37:44 +04:00
Pavel Emelyanov
475bb1e775 rst: Evaluate per-task clone mask early
When we've read all pstree-items and their ids we
can get the desired clone-flags early and avoid all
these dances with flag calculations in fork_with_pid
and company.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-01-19 01:16:19 +04:00
Pavel Emelyanov
a46831aee9 cr: Detect namespaces presence automatically
Introduce the current_ns_mask variable, which records which
namespaces the tasks being dumped or restored live in.

For simplicity all tasks are supposed to live in one set of namespaces.
This should be fixed eventually.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-01-18 13:25:10 +04:00
Pavel Emelyanov
2105e18eee core: Save tasks' namespaces IDs in the ids image
Recent kernels allow getting namespace IDs by reading proc-ns links.
Use this to generate IDs for tasks' namespaces (I do generate them, since
IDs provided by the kernel look ugly :( ).
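
As a rough illustration of the idea (not the code from this commit), the sketch below reads a /proc/<pid>/ns/<name> link, whose target on such kernels looks like `uts:[4026531838]`, and maps the kernel's number to a small generated ID. The names `parse_ns_link`, `NsIds` and `task_ns_id` are hypothetical, and keying the table on both the namespace type and the kernel ID is an assumption made for the sketch.

```python
import os
import re

# Link targets are expected to look like "uts:[4026531838]".
_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a proc-ns link target into (namespace type, kernel ID)."""
    m = _LINK_RE.match(target)
    if m is None:
        raise ValueError(f"unexpected ns link target: {target!r}")
    return m.group(1), int(m.group(2))


class NsIds:
    """Hands out small sequential IDs for (type, kernel ID) pairs."""

    def __init__(self):
        self._ids = {}

    def get(self, ns_type, kernel_id):
        key = (ns_type, kernel_id)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def task_ns_id(ids, pid, ns_name):
    """Generated ID of the namespace a task lives in (needs /proc on Linux)."""
    target = os.readlink(f"/proc/{pid}/ns/{ns_name}")
    return ids.get(*parse_ns_link(target))
```

Two tasks whose links point at the same kernel object get the same generated ID, which is all that is needed to tell whether they share a namespace.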

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-01-18 13:18:33 +04:00
Pavel Emelyanov
3a1c7d1d76 ns: Introduce ns descriptors
These are structs that (now) tie together the ns string
and the CLONE_ flag. It's nice to have one (some code
becomes simpler) and will help us with auto-namespaces
detection.
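
A descriptor of this kind might look like the sketch below. The field names and the `NS_DESCS` table are hypothetical and are not copied from namespaces.h; only the CLONE_NEW* values are the ones defined by the Linux headers. The helper shows why such a table is handy: a set of namespace names turns into a clone mask with a simple lookup.

```python
from dataclasses import dataclass

# Flag values as defined in the Linux <sched.h> headers.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


@dataclass(frozen=True)
class NsDesc:
    name: str   # name of the entry under /proc/<pid>/ns/
    cflag: int  # matching CLONE_NEW* flag


# Hypothetical descriptor table, one entry per namespace type.
NS_DESCS = (
    NsDesc("mnt", CLONE_NEWNS),
    NsDesc("uts", CLONE_NEWUTS),
    NsDesc("ipc", CLONE_NEWIPC),
    NsDesc("pid", CLONE_NEWPID),
    NsDesc("net", CLONE_NEWNET),
)


def ns_mask(names):
    """OR together the clone flags of the named namespaces."""
    by_name = {d.name: d for d in NS_DESCS}
    mask = 0
    for name in names:
        mask |= by_name[name].cflag
    return mask
```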

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-01-15 23:24:01 +04:00
Cyrill Gorcunov
830d92b0f0 headers: Unify include guards (in comments) and a few fixes
 - fix names in comments
 - add empty lines where needed
 - fix rbtree.h
 - fix syscall-types.h

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-12-25 22:40:24 +04:00
Cyrill Gorcunov
e0be540401 pstree: Move struct pid to pstree.h
I believe it makes sense to keep this structure
in pstree.h where pstree-related data lives.

Also I've added some comments on struct pid members.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-10-08 18:59:36 +04:00
Pavel Emelyanov
2d56d1b056 ns: Add ability to save original ns and restoring it back while switching
This will be required for parasite transport socket creation -- it will
have to be created in a net ns we're putting parasite in and then we'll
have to restore it back to original to go on dumping.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-02 07:55:05 +04:00
Andrey Vagin
634dd10b32 namespaces: dump_namespaces get a struct pid as argument
It uses pid to create the image file and real_pid for dumping ns-s.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-06-19 22:16:54 +04:00
Pavel Emelyanov
c58abfd03d show: Introduce ->show callback for fdset
Each fdset item now has a callback which will show the contents of a magic-described
image file. Per-task and global show code is reworked to walk the respective fdsets
and call ->show on each file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-03-27 12:01:14 +04:00
Stanislav Kinsbursky
0213d3ec64 namespaces: parametrized namespace option introduced
v2: strlen() check removed from parse_ns_string()

Now the '-n' option must be followed by namespace tags, separated by commas.
Currently, only "uts" namespace is supported.
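
The option format can be sketched as below. This is an illustration only: it borrows the name parse_ns_string() from the changelog line above, but the body, the `SUPPORTED_NS` table and the error handling are assumptions and not the code from this commit.

```python
CLONE_NEWUTS = 0x04000000  # value from the Linux headers

# Only the "uts" tag is accepted, as stated in the commit message.
SUPPORTED_NS = {"uts": CLONE_NEWUTS}


def parse_ns_string(arg):
    """Turn a comma-separated list of namespace tags into a clone mask."""
    mask = 0
    for tag in arg.split(","):
        if tag not in SUPPORTED_NS:
            raise ValueError(f"unsupported namespace tag: {tag!r}")
        mask |= SUPPORTED_NS[tag]
    return mask
```

With this sketch, `-n uts` yields the CLONE_NEWUTS bit, while an unknown or empty tag is rejected.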

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-31 22:32:22 +04:00
Stanislav Kinsbursky
225d119e5d namespaces: split UTS and generic code
Generic code will be used for other namespaces.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-31 13:43:28 +04:00
Pavel Emelyanov
98f4c2e4de ns: Support UTS namespace
Only two fields are modifiable -- hostname and domainname. So
read them on dump and write on restore.

File format is simple --

u32 magic
u32 length of nodename
u8[] nodename string
u32 length of domainname
u8[] domainname string

For OpenVZ we can write the release at the end, but this is later.
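
The layout listed above can be written and read back as in the following sketch. Several details are assumptions, since the message does not state them: the magic value is a placeholder (not the one used by crtools), integers are taken to be little-endian, and lengths are taken to count the string bytes without a trailing NUL. `pack_uts` and `unpack_uts` are hypothetical names.

```python
import struct

UTS_MAGIC = 0x12345678  # placeholder value, NOT the real image magic


def pack_uts(nodename, domainname, magic=UTS_MAGIC):
    """Serialize: u32 magic, then (u32 length, bytes) for each of the two names."""
    out = struct.pack("<I", magic)
    for name in (nodename, domainname):
        raw = name.encode()
        out += struct.pack("<I", len(raw)) + raw
    return out


def unpack_uts(blob, magic=UTS_MAGIC):
    """Parse the layout back into (nodename, domainname)."""
    (got,) = struct.unpack_from("<I", blob, 0)
    if got != magic:
        raise ValueError("bad magic")
    off = 4
    fields = []
    for _ in range(2):
        (length,) = struct.unpack_from("<I", blob, off)
        off += 4
        if off + length > len(blob):
            raise ValueError("truncated image")
        fields.append(blob[off:off + length].decode())
        off += length
    return tuple(fields)
```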

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-26 16:54:22 +04:00
Pavel Emelyanov
3391416a1b crtools: Namespaces support skeleton
New option -n to dump/restore namespaces.

Fork the namespaces dumping task and write a helper for switching a namespace.

Prepare the restorer code for restoring namespaces before root task.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2012-01-26 16:54:22 +04:00