Here is an example from a Fedora container:
65 64 252:0 /vz/private/1 / rw,relatime shared:29 - ext4 /dev/mapper/centos_pcs-root rw,data=ordered
77 65 252:0 /vz/private/1/var/tmp/systemd-httpd.service-XLnJPNc/tmp /var/tmp rw,relatime shared:41 - ext4 /dev/mapper/centos_pcs-root rw,data=ordered
We can see a non-root shared mount which is mounted on the root
mount from the same shared group. The test emulates this situation.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fedora bind-mounts part of the root mount onto itself. Currently we
don't allow mounting children of a shared mount if other mounts from
the same shared group are not mounted yet.
This patch adds an exception for the case when a child belongs to the
same group: we allow mounting such a child if the wider mounts are
already mounted.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we attach the roots of sub-namespaces to the root of the root
mount namespace. This causes problems if the root of the root mntns is
shared, because every child of a shared mount must be propagated to the
other mounts in its group.
Actually, we mount a tmpfs at mnt_roots, and there is nothing wrong with
adding it to the tree.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
And use the expression directly instead, it's quite short. This keeps
the number of variables in the code small enough to fit in one's head.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Do path conversions and checks step by step, and add comments
explaining what we do in each step and why.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
When we check whether a submount of a mount is visible in another
mount (a shared peer of the former), we can, and should, use the new
issubpath helper.
We should, because the strncmp() used there may scan beyond
ct_mpnt_rpath if its length is smaller (the code has no check for this).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
These are constant for a given m, so calculate them outside
of the loop. Also rename them to reflect what they are.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
The path length is zero for "/" and strlen(path)
for all the others. This makes it possible
to use the length to get tail paths: if path_1 starts
with path_2 and both are absolute, then
path_1 + path_length(path_2)
gives the tail of path_1 relative to
path_2, even if path_2 is just "/".
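A minimal C sketch of this convention, assuming absolute paths with no trailing slash (other than "/" itself). The names path_length() and path_tail() are used here for illustration only; this is not the code from the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Length used for tail computations: 0 for "/", strlen(path) otherwise.
 * Assumes an absolute path without a trailing slash (except "/").
 */
static size_t path_length(const char *path)
{
	assert(path[0] == '/');		/* only absolute paths are expected */

	if (path[1] == '\0')		/* the path is exactly "/" */
		return 0;
	return strlen(path);
}

/*
 * Given that path_1 starts with path_2 (both absolute), return the tail
 * of path_1 relative to path_2. The tail starts with '/', or is empty
 * when path_1 equals path_2.
 */
static const char *path_tail(const char *path_1, const char *path_2)
{
	return path_1 + path_length(path_2);
}
```

With this convention path_tail("/a/b", "/a") is "/b" and path_tail("/a/b", "/") is "/a/b". Had "/" been given its strlen() of 1, the second tail would be "a/b" and would lose its leading slash, unlike every other case.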
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
The problem solved by this patch is that some children can be
inaccessible (invisible) in non-root bind mounts:
root  mount point
----  ------------------
/     /a    (shared:1)
/     /a/x
/     /a/x/y
/     /a/z
/x    /b    (shared:1)
/     /b/y
/b is a non-root bind mount of /a
/y is visible in both mounts
/z is visible only in /a
Before this patch we checked that the set of children is the same for
all mounts in a shared group. Now we check that the set of visible
mounts is the same for all mounts in a shared group.
To do that, we take the next mount in the shared group which is wider
than or equal to the current one and compare the children of the two.
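The visibility rule can be sketched in C as follows. This is a hypothetical illustration, not the code from the patch: a child is identified by its mountpoint expressed as a path inside the shared filesystem, and path_length(), issubpath() and child_visible_in_peer() are simplified stand-ins written for this example. Taking the values from the example above and assuming y lives at /x/y inside the filesystem, y shows up in /b (root /x) as /b/y, while a child at /z is not visible in /b at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* 0 for "/", strlen() otherwise, so that path + path_length(root) is a tail. */
static size_t path_length(const char *path)
{
	return (path[0] == '/' && path[1] == '\0') ? 0 : strlen(path);
}

/* True if @path equals @sub_path or lies below it, component-wise. */
static bool issubpath(const char *path, const char *sub_path)
{
	size_t len = path_length(sub_path);

	return strncmp(path, sub_path, len) == 0 &&
	       (path[len] == '\0' || path[len] == '/');
}

/*
 * A child mount sits at @fs_path inside the shared filesystem. Check
 * whether it is visible through a peer whose root is @peer_root and which
 * is mounted at @peer_mnt (assumed not to be "/"); if so, store the
 * child's mountpoint in the peer into @out. Returns false if the child is
 * not visible or @out is too small.
 */
static bool child_visible_in_peer(const char *fs_path, const char *peer_root,
				  const char *peer_mnt, char *out, size_t out_len)
{
	int n;

	assert(fs_path[0] == '/' && peer_root[0] == '/' && out_len > 0);

	if (!issubpath(fs_path, peer_root))
		return false;	/* e.g. /z lies outside /b's root /x */

	n = snprintf(out, out_len, "%s%s", peer_mnt,
		     fs_path + path_length(peer_root));
	return n >= 0 && (size_t)n < out_len;
}
```

Comparing children "through" the wider peer this way means a peer is only required to have the children that actually fall under its root.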
Before this patch, validate_shared(m) validated the m->parent mount.
Now it validates the "m" mount itself. That is why the patch contains
the following change:
- if (m->parent->shared_id && validate_shared(m))
+ if (m->shared_id && validate_shared(m))
We don't support shared mounts with different sets of children.
Here is an example of how such a case can be created:
mkdir -p /a /c
mount -t tmpfs a /a
mount --make-shared /a
mkdir /a/b
mount -t tmpfs b /a/b
mount --bind /a /c
In this case /c has no child mounted at /c/b (it is just an empty
directory). To support such cases, we need to sort all shared mounts
according to their sets of children.
v2: If root is equal to "/", its length should be zero. We expect that
the last character of a path is not "/".
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We scan the thread and children lists several times while freezing
the tree; this is done to avoid races with new threads/kids
appearing.
Factor out the iteration code.
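A minimal sketch of what such a factored-out iteration could look like, assuming each scanning pass reports how many new tasks it found. The callback type collect_pass_fn, the max_passes limit and demo_pass() are all hypothetical and exist only for this illustration; they are not CRIU's functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback: scan the thread or children list once, seize
 * whatever is new, and return the number of new tasks found (< 0 on error).
 */
typedef int (*collect_pass_fn)(void *arg);

/*
 * Repeat passes until one of them finds nothing new, so that threads or
 * kids spawned while we were freezing are not missed. Gives up after
 * @max_passes to avoid spinning on a task that keeps spawning.
 * Returns 0 once a pass finds nothing new, -1 on error or give-up.
 */
static int collect_until_stable(collect_pass_fn pass, void *arg, int max_passes)
{
	int i;

	assert(pass != NULL && max_passes > 0);

	for (i = 0; i < max_passes; i++) {
		int found = pass(arg);

		if (found < 0)
			return -1;	/* propagate the scan error */
		if (found == 0)
			return 0;	/* nothing new appeared: the set is stable */
	}
	return -1;	/* still racing with new tasks */
}

/* Example pass for illustration: each rescan finds half as many new tasks. */
static int demo_pass(void *arg)
{
	int *left = arg;
	int found = *left;

	*left /= 2;
	return found;
}
```

Keeping the loop in one place lets the thread and children collectors share the same retry policy instead of each open-coding it.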
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This makes the thread collection code structured similarly to the
children collection code. It will also help with further patches.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently this test creates one process and waits for it, so most of
the time the test has only one process without children.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For other tests, the set of file descriptors can change.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now we pass all the auxiliary arguments to parasite_infect_seized(),
while two of them are only needed to calculate the size of the args area.
Let's instead keep track of the required args size and get rid of the
excessive arguments to parasite_infect_seized().
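A minimal sketch of the "keep track of the required args size" idea: every dump step that will later pass data to the parasite records how much space it needs, and infection only reads the accumulated maximum. The names PARASITE_ARGS_SIZE_MIN, parasite_ensure_args_size() and parasite_args_area() are assumptions made for this illustration, as is the minimum value.

```c
#include <assert.h>

/* Hypothetical minimum size of the args area, in bytes. */
#define PARASITE_ARGS_SIZE_MIN 4096UL

/* Largest args size requested so far. */
static unsigned long parasite_args_size = PARASITE_ARGS_SIZE_MIN;

/* Called by whoever needs to pass data to the parasite: the largest request wins. */
static void parasite_ensure_args_size(unsigned long size)
{
	if (parasite_args_size < size)
		parasite_args_size = size;
}

/* Size to reserve at infection time, rounded up to a whole number of pages. */
static unsigned long parasite_args_area(unsigned long page_size)
{
	assert(page_size > 0);
	return (parasite_args_size + page_size - 1) / page_size * page_size;
}
```

With this shape the infection code needs no extra parameters: it asks parasite_args_area() for the size, and callers that previously passed their sizes as arguments call parasite_ensure_args_size() beforehand.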
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We need to know this to insert holes. Currently xfer->parent isn't
initialized for remote sessions. In most cases it has a non-zero value,
so generate_iovs() is called with has_parent = true.
bash test/zdtm.sh -p -P -i 3 ns/transition/fork
(00.000106) Error (sysctl.c:194): Can't open sysctl net/ipv4/tcp_wmem: No such file or directory
(00.017048) 420: Error (image.c:231): Unable to open pagemap-420.img: No such file or directory
(00.017065) 420: Error (image.c:231): Unable to open pages-420.img: No such file or directory
(00.017090) 420: Error (page-read.c:73): No parent for snapshot pagemap
(00.017290) 86: Error (cr-restore.c:1185): 420 exited, status=1
(00.017317) Error (cr-restore.c:1831): Restoring FAILED.
v2: add a new command to open a page server. This is required to
preserve backward compatibility: anyone who tries to use an old version
of the page server will get an error.
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We do the same for other features.
The --ms option is an exception here.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
issubpath("/foo", "/") reports false. This is a corner case,
as the 2nd argument is not expected to end with /. Fix this and
add a comment about the assumptions on issubpath() arguments.
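A minimal C sketch of a sub-path check that handles this corner case, named issubpath() after the helper mentioned earlier in this log. It assumes both arguments are absolute and free of double slashes, and that the second one has no trailing slash unless it is "/" itself; it was written for this note and is not the actual CRIU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Does @path lie at or below @sub_path? Both paths must be absolute and
 * contain no double slashes; @sub_path must not end with '/' unless it
 * is the root "/" itself.
 */
static bool issubpath(const char *path, const char *sub_path)
{
	size_t len = strlen(sub_path);

	assert(path[0] == '/' && sub_path[0] == '/');

	/* The corner case: every absolute path is a sub-path of "/". */
	if (len == 1)
		return true;

	/* strncmp() stops at the first NUL, so a shorter @path is safe here. */
	if (strncmp(path, sub_path, len) != 0)
		return false;

	/* The match must end on a component boundary: "/foobar" is not under "/foo". */
	return path[len] == '\0' || path[len] == '/';
}
```

Without the special case, the boundary check would look at path[1], which is 'f' for "/foo", and wrongly report false.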
Reported-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Restoring mount namespaces requires creating temporary directories
in the test root.
When tests run in a new userns, they have non-zero uid and gid,
so we need to grant them permissions.
v2: add +rx as well
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we validate that the mount tree has no overmounts, we need to
check whether one path is a sub-path of another. Here's a helper for
this.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
We enter the target userns and then try to enter the other namespaces.
The "enter" operation requires CAP_SYS_ADMIN in the user namespace
in which the target namespace was created.
For now, if one or more namespaces were created in another userns,
criu stops dumping and returns an error. I want to find someone who uses
this configuration; in that case restore will be more complicated.
The current version covers the needs of containers.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Socket tests are excluded because SO_RCVBUFFORCE and SO_SNDBUFFORCE
are protected by CAP_NET_ADMIN.
tty*, pty* are excluded because TIOCSLCKTRMIOS is protected by
CAP_SYS_ADMIN.
*ghost, *notify, *unlink* are excluded because linkat(AT_EMPTY_PATH)
is protected by CAP_DAC_READ_SEARCH.
v2: use a blacklist
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There are two issues here:
1. All mounts in a new user namespace are locked, so
we need to create a new root mount by bind-mounting the root onto
itself.
2. /proc and /sys must be mounted before unmounting the inherited
/proc and /sys. This is a security policy:
"""
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Sun Mar 24 14:28:27 2013 -0700
userns: Restrict when proc and sysfs can be mounted
Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.
"""
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Devices cannot be created in a new user namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>