We link files to each other at restore time to restore
unlinked paths. The kernel has strange security restrictions
on the linkat() we use: if the caller's fsuid doesn't
equal the uid of the file and the file is not a "safe"
one, then only a holder of global CAP_CHOWN is allowed to link() it.
This is a problem in user namespaces: userns root is
not allowed to linkat any such file, unlike global root.
Fortunately, we can temporarily change the fsuid and
still linkat the file we want. Hopefully this hack will
go away some day soon, when the kernel has saner
checks for linkat capabilities.
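A minimal sketch of the fsuid trick in C. The patch itself isn't shown here, so the helper name link_as_owner() is hypothetical, and linking through the /proc/self/fd/N magic symlink with AT_SYMLINK_FOLLOW is an assumption about how the source path is obtained. The point is only the ordering: save the old fsuid, switch to the file owner's uid, call linkat(), then switch back.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/fsuid.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not CRIU's actual code): hard-link the file open
 * as `fd` to `path`. The fsuid is switched to the file owner's uid for
 * the duration of linkat(), so the kernel's "fsuid == file owner" check
 * passes without any global capability.
 */
int link_as_owner(int fd, const char *path)
{
    struct stat st;
    char src[64];
    int ret, saved_errno;
    int old_fsuid;

    if (fstat(fd, &st) < 0)
        return -1;

    /* Assumption: link through the /proc/self/fd magic symlink. */
    snprintf(src, sizeof(src), "/proc/self/fd/%d", fd);

    /* setfsuid() returns the previous fsuid on both success and failure. */
    old_fsuid = setfsuid(st.st_uid);
    ret = linkat(AT_FDCWD, src, AT_FDCWD, path, AT_SYMLINK_FOLLOW);
    saved_errno = errno;

    /* Restore the original fsuid before reporting the result. */
    setfsuid(old_fsuid);
    errno = saved_errno;
    return ret;
}
```

Saving errno around the second setfsuid() keeps the linkat() error visible to the caller.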
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
The test uses the map_files dir to check that a mapping has been restored,
but this proc directory is only accessible with CAP_SYS_ADMIN.
Fix this by checking the less strict /proc/pid/maps.
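A sketch of what such a check against /proc/pid/maps might look like; mapping_covers() is a hypothetical helper, not the test's actual code. It checks containment rather than exact bounds because adjacent mappings with identical flags can be merged by the kernel into a single maps line.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical test helper: return true if some line of /proc/<pid>/maps
 * describes a mapping that fully covers [start, end). Unlike map_files,
 * this file is readable without CAP_SYS_ADMIN.
 */
bool mapping_covers(const char *pid, unsigned long start, unsigned long end)
{
    char path[64], line[4096];
    bool found = false;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%s/maps", pid);
    f = fopen(path, "r");
    if (!f)
        return false;

    while (fgets(line, sizeof(line), f)) {
        unsigned long s, e;

        /* Every line starts with "<start>-<end>" in hex. */
        if (sscanf(line, "%lx-%lx", &s, &e) == 2 && s <= start && end <= e) {
            found = true;
            break;
        }
    }

    fclose(f);
    return found;
}
```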
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The rest partially need more userns_call()s, but mostly they just don't
work in userns by themselves. This needs further investigation.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Locked termios require global CAP_SYS_ADMIN. But let's
restore everything for the tty in one call, since regular
termios depend on the locked ones and it's not nice to do a
sync usernsd call for the locked ones only.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
The syscall in question requires global CAP_DAC_READ_SEARCH.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
We have collected a good set of calls that cannot be done inside
user namespaces, but that we need to make [1]. Some of them have already
been addressed, like the prctl mm bits restore, but some have not.
I'm pretty sceptical about the ability to relax the security
checks on quite a lot of them (e.g. open-by-handle is indeed a
very dangerous operation if allowed to an unprivileged user), so
we need some way to call those things even in user namespaces.
The good news is that all the calls I've found operate
on file descriptors one way or another. So if we had a process
that lived outside of the user namespace, we could ask it to do the
privileged operation we need and exchange the affected file
descriptor via a unix socket.
So usernsd is the one doing exactly this. It starts before we
create the user namespace and accepts requests via a unix socket.
Clients (the processes we restore) send it the function they
want to call, the descriptor they want to operate on and the
arguments blob. Optionally, they can request some file descriptor
back after the call.
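The descriptor exchange itself relies on SCM_RIGHTS ancillary data over a unix socket. The sketch below shows only that mechanism; send_fd() and recv_fd() are hypothetical names, and the real usernsd request format (how the function to call is identified, how a descriptor is requested back) is not described here and is not modeled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send `fd` plus an argument blob over the unix socket `sk`. */
int send_fd(int sk, int fd, const void *args, size_t len)
{
    union fd_cmsg cbuf;
    struct iovec iov = { .iov_base = (void *)args, .iov_len = len };
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&cbuf, 0, sizeof(cbuf));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf.buf;
    msg.msg_controllen = sizeof(cbuf.buf);

    /* A single SCM_RIGHTS message carrying one int. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sk, &msg, 0) < 0 ? -1 : 0;
}

/* Receive up to `len` bytes of arguments; return the passed fd or -1. */
int recv_fd(int sk, void *args, size_t len)
{
    union fd_cmsg cbuf;
    struct iovec iov = { .iov_base = args, .iov_len = len };
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf.buf;
    msg.msg_controllen = sizeof(cbuf.buf);

    if (recvmsg(sk, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    /* The kernel installed a fresh descriptor number in this process. */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A SOCK_SEQPACKET or SOCK_DGRAM pair keeps each request as one message; on a stream socket at least one byte of regular data must accompany the descriptor.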
In the non-userns case the daemon is not started and the calls
are done right in the requestor's process environment.
The next patch shows an example of how to use this daemon
to do the privileged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
a socket.
[1] http://criu.org/UserNamespace
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
The user is already able to see it in stdout, so there is no
reason to protect it.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If criu runs with the suid bit set, the user should be able
to read pidfiles (i.e. the service pidfile).
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This will allow us to easily extend commands that crit
supports, avoiding "--help" confusion.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Starting with version 3.15, the kernel provides a mnt_id field in
/proc/<pid>/fdinfo/<fd>. However, the value provided by the kernel for
AUFS file descriptors obtained by opening a file in /proc/<pid>/map_files
is incorrect.
Below is an example for a Docker container running Nginx. The mntid
program below mimics CRIU by opening a file in /proc/1/map_files and
using the descriptor to obtain its mnt_id. As shown below, the kernel
reports mnt_id 22, but no mount with that ID exists in the mount
namespace of the container. Therefore, CRIU fails with the error:
"Unable to look up the 22 mount"
In the global namespace, 22 is the root of AUFS (/var/lib/docker/aufs).
This patch sets the mnt_id of these AUFS descriptors to -1, mimicking
pre-3.15 kernel behavior.
$ docker ps
CONTAINER ID IMAGE ...
3850a63ee857 nginx-streaming:latest ...
$ docker exec -it 38 bash -i
root@3850a63ee857:/# ps -e
PID TTY TIME CMD
1 ? 00:00:00 nginx
7 ? 00:00:00 nginx
31 ? 00:00:00 bash
46 ? 00:00:00 ps
root@3850a63ee857:/# ./mntid 1
open("/proc/1/map_files/400000-4b8000") = 3
cat /proc/49/fdinfo/3
pos: 0
flags: 0100000
mnt_id: 22
root@3850a63ee857:/# awk '{print $1 " " $2}' /proc/1/mountinfo
87 58
103 87
104 87
105 104
106 104
107 104
108 87
109 87
110 87
111 87
root@3850a63ee857:/# exit
$ grep 22 /proc/self/mountinfo
22 21 8:1 /var/lib/docker/aufs /var/lib/docker/aufs ...
44 22 0:35 / /var/lib/docker/aufs/mnt/<ID> ...
$
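The source of the mntid program isn't included, so here is a hypothetical equivalent of its core step: reading the mnt_id field from /proc/self/fdinfo/<fd>. It returns -1 when the field is absent, which is also what the patch substitutes for AUFS map_files descriptors; the AUFS detection itself is not sketched.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the mnt_id reported in
 * /proc/self/fdinfo/<fd>, or -1 if the file can't be read or has no
 * mnt_id line (as on pre-3.15 kernels).
 */
int fd_mnt_id(int fd)
{
    char path[64], line[256];
    int mnt_id = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
    f = fopen(path, "r");
    if (!f)
        return -1;

    /* fdinfo is "key:\tvalue" lines; scanf whitespace matches the tab. */
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "mnt_id: %d", &mnt_id) == 1)
            break;

    fclose(f);
    return mnt_id;
}
```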
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a process is executed in another pidns, /proc/PID doesn't refer to
the proper process.
This patch fixes a problem like this:
1: Error (util.c:106): Unable to close fd 33: Bad file descriptor
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There are two places where we store IP addresses (both IPv4 and IPv6).
Mark them with a custom option and print them in compressed form for
--pretty output.
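For illustration, the "compressed form" is what inet_ntop() produces: dotted quad for IPv4 and zero-run-compressed text for IPv6. The sketch assumes an address is stored as one (IPv4) or four (IPv6) 32-bit words already in network byte order; that layout and the addr_to_str() helper are assumptions, not crit's code (crit itself is Python).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format an address kept as 1 (IPv4) or 4 (IPv6)
 * 32-bit words in network byte order. Returns buf, or NULL on error.
 */
const char *addr_to_str(const uint32_t *words, int nwords,
                        char *buf, socklen_t len)
{
    if (nwords == 1)
        return inet_ntop(AF_INET, words, buf, len);
    if (nwords == 4)
        return inet_ntop(AF_INET6, words, buf, len); /* e.g. "::1" */
    return NULL;
}
```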
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <kupruser@gmail.com>
I plan to mark some fields as IP addresses and print them accordingly.
The --format hex switch is not a nice fit for this, and introducing one
more (--format hex ipadd) is too ugly.
So let's fix the crit API to be simple and stupid. By default crit
generates canonical one-line JSON. With the --pretty option it splits the
output into lines, adds indentation and prints hex as hex and IPs as
IPs.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <kupruser@gmail.com>
Can be useful to re-run some tests in case something failed in the middle.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
We assume that the test cannot exit before this moment.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
sockets.c: In function ‘preload_socket_modules’:
sockets.c:153:36: error: ‘NETLINK_SOCK_DIAG’ undeclared (first use in this function)
sockets.c:153:36: note: each undeclared identifier is reported only once for each function it appears in
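The patch body isn't shown here. One common way to handle old kernel headers that lack NETLINK_SOCK_DIAG is a fallback define, since the newer name is an alias for the NETLINK_INET_DIAG protocol number (4). This is a sketch of that approach, not necessarily what the actual fix does; open_sock_diag() is a hypothetical helper.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Fallback for kernel headers that predate NETLINK_SOCK_DIAG: it uses
 * the same protocol number as NETLINK_INET_DIAG.
 */
#ifndef NETLINK_SOCK_DIAG
#define NETLINK_SOCK_DIAG NETLINK_INET_DIAG
#endif

/* Hypothetical helper: open a sock_diag netlink socket, or return -1. */
int open_sock_diag(void)
{
    return socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
}
```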
Reported-by: Mr Travis
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>