It's required to dump uid-s and gid-s from this userns.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When using pr_perror(), format string should not end with \n,
as it is added by the macro itself.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We no longer need to populate ext_ns->mnt.mntinfo_list until
resolve_external_mounts(). We can rely on find_ext_ns_id() which
does collect_mntinfo() on demand.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the rest of this series we need to walk all the namespaces to autodetect
which mounts are master/shared/private bind mounts, so we need the information
from criu's namespace in the case when the namespaces are not the same.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Current code doesn't make any difference between OPT and no-OPT
except for the message is printed or not in the open_image().
So this particular change changes nothing but the availability of
this message.
In the next patches I wil introduce "empty images" to deal with
the ENOENT situation in a more graceful manner.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have collected a good set of calls that cannot be done inside
user namespaces, but we need to [1]. Some of them has already
being addressed, like prctl mm bits restore, but some are not.
I'm pretty sceptical about the ability to relax the security
checks on quite a lot of them (e.g. open-by-handle is indeed a
very dangerous operation if allowed to unpriviledged user), so
we need some way to call those things even in user namespaces.
The good news about it its that all the calls I've found operate
on file descriptors this way or another. So if we had a process,
that lived outside of user namespace, we could ask one to do the
high priority operation we need and exchange the affected file
descriptor via unix socket.
So the usernsd is the one doing exactly this. It starts before we
create the user namespace and accepts requests via unix socket.
Clients (the processes we restore) send him the functions they
want to call, the descriptor they want to operate on and the
arguments blob. Optionally, they can request some file descriptor
back after the call.
In non usernamespace case the daemon is not started and the calls
are done right in the requestor's process environment.
In the next patch there's an example of how to use this daemon
to do the priviledged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
a socket.
[1] http://criu.org/UserNamespace
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
We enter into the target userns and try to enter in other namespaces.
The "enter" operation requires CAP_SYS_ADMIN in a user namespace,
where a taget namespace was created.
Now if one or more namespaces were created in another userns,
criu stops dumping and return an error. I want to find someone, who uses
this configuration. In this case restore will be more complicated.
Current version covers containers needs.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It is cleared when a process is forked in a new userns.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In this patch we fill /proc/PID/uid_map and /proc/PID/gid_map for the
root task.
v2: initialize groups in a new namespace.
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
v3: add a helper to initialize creds in a new userns
v4: initialize userns creds in prepare_namespaces()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For that we need to save per-namespace mappings of user and group IDs.
And all id-s for tasks and files are saved from the target user
namespace.
v2: move code into collect_namespaces()
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to support user namespaces and uid-s will be converted
accoding with userns mappings.
v2: conver id-s for sockets too
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
and return an error, if a proccess live in another userns,
because criu doesn't support it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
CRIU reads /proc/pid/ns/[NS] and fails of a link is not exist.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is for two reasons. First, validation can meet external mount
and will call plugins, which is not correct on pre-dump and actually
crashes on uninitilized plugins lists. Second, even if on pre-dump
mount tree is not "supported" this can be a temporary situation (yes,
yes, unlikely, but still).
On the other hand, it's better to fail earlier, but that's another
story.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
On pre-dump we collect only two namespaces -- the mnt one
for criu and mnt one again for root task.
This is not correct. We need all mount namespaces to make
the irmap generation work properly and we need all net
namespaces to have parasite sockets created.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plan file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.
This will also help if (when?) we will want to do more
complex changes with images, e.g. store them all in one
file or send them directly to the network.
For now the cr_img just contains one int _fd variable.
This patch chages the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Since we're going to switch from int-fd-s to class-image
soon the fdset name will not fit into the new terminology.
This patch is
sed -e 's/fdset/imgset/g' -i *
sed -e 's/imgset_fd/img_from_set/g' -i *
git mv include/fdset.h include/imgset.h
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
The main reason for this is -- dumping namespace has a lot of
points when the process just waits for something. At the same
time criu process wait for the ns dumper and doesn't dump
others.
The great example of waiting for something is setns syscall.
Very often it calls synchronize_rcu() which can be quite long.
Let other processes do smth useful while this.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Nowadays this routine is mainly used for getting an
fd, rather than keeping one for future reference.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will fill mntinfo list and this is internal logic of mount.c
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to support nested mount namespaces and each NS has own
tree. The mount tree is used for checking that a file is reachable.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
because we want to check, that all files are reachable.
For that we need to collect all mounts from all namespaces.
v2: dump mntns separately
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now we supports sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.
v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: another attempt to write readable code:)
v3: clean up
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Known issue:
* currently only namespaces with the same root is supported
* nested namespaces can be dumped and restored only if the root task
has own mount namespace.
All nested namespaces are restored in a root namespace in temporary
directories. All mount points restored in one tree and then they are
divided into namesaces.
The task with minimal pid for each namespaces unshared mntns and
then it makes pivot_root in a proper temporary directory. All other
tasks makes setns to enter into a mount namespace of the task with
minimal pid.
v2: clean up
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently ns_ids list is filled only on dump. Soon we'll need this
list for mount namespaces on restore, e.g. to know which tasks share
the namespaces.
v2: merge the patch "namespace: add a function to search an ns_id
item by id" into this one.
v3: add prefix rst_ to add_ns_id
v4: look up namespace by two values -- type AND ID
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a condition
BUG_ON(kid > UINT_MAX);
but kid is unsigned int so it's never bigger than UINT_MAX,
use unsigned long instead.
CID 1042296
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's going to be used for restoring namespaces. For example we need to
enumirate the ns_ids list for restoring mount namespaces.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Well, we want to pre-dump files (fsnotifies), for that we
will need mountinfo-s and root, and for the latter -- the
current ns mask.
The problem with current ns mask is that its generation is
incorporated into ns IDs generation and dumping. And since
the ids dumping is not performed on pre-dump, let's just
provide a helper for ns-mask generation.
Strictly speaking, the whole ns-mask idea is not great, but
it's to be fixed later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Remove whitespace at EOL (found by git grep ' $')
To people using vim, I'd suggest adding the following code to ~/.vimrc:
let c_space_errors = 1
highlight FormatError ctermbg=darkred guibg=darkred
match FormatError /\s\+$\|\ \+\t\|\%80v.\|\ \{8\}/
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>