It may differ from the group leader's pid_for_children_ns,
so dump it separately.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Handle it like the other namespaces, except for two cases:
a just unshared pid ns without a child reaper (fail if so),
and images that do not have it on restore (old images; then
take the pid_for_children ns from the pid ns).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Account for the fact that a ns may be found by its alternative
name, and use that name if so.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
pid_for_children_ns of root_item may differ from its pid_ns.
In this case we don't want to mark such a pid_for_children_ns
as NS_ROOT in nsid_add().
Also, we don't want to create a fake pid ns if pid_for_children_ns
is a just created pid_ns without a child reaper (the kernel returns
ENOENT in this case). It is better to add kernel support for picking
such namespaces later. So, we return UINT_MAX and fail.
We encode both possibilities using the single "alternative"
parameter, as it's enough for us. If anybody needs to separate
them in the future and introduce separate parameters, it will be
rather simple.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It may be read from "/proc/[pid]/ns/pid_for_children_ns",
so add this alias to pid_ns_desc.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is needed for "/proc/[pid]/ns/pid_for_children_ns",
which is a pid namespace with a non-standard file name.
Also pass the "alternative" parameter to generate_ns_id().
v2: Enlarge buffer size
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently it's set in generate_ns_id() and it's equal to:
1)NS_CRIU, if root_ns_mask does not contain CLONE_NEWPID,
2)NS_ROOT, if it does.
The assignment is not obvious, as it is first set to NS_CRIU
and then overwritten with NS_ROOT, if that exists.
Mark top_pid_ns after ns_ids population in a clearer way.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Don't call destroy_pid_ns_helpers() twice; add a new case
for that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The parent waits for them by pid in its active pid namespace. So,
the pids need to be converted to that namespace instead of using vpid().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The child reaper of a ns has an initial user_ns equal to its pid_ns->user_ns.
Keep all ns assignments together.
v2: Delete the assignment from call_clone_fn() and rename patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1)Create a pid namespace and a child reaper in it;
2)Set a specific next pid for the next created process;
3)Create one more process in the namespace and kill it;
4)Wait for the signal;
5)Check that the NSpids of the dead task remain the same.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Place child reapers of pid namespaces at the beginning
of pstree_item::children list and sort them by nesting
level.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, only one feature is supported. Add the possibility
for a test to depend on several features.
v2: Delete excess "if" as suggested by Andrey Vagin.
Rename variables to decrease patch size.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Glibc has a bug in process creation:
https://sourceware.org/bugzilla/show_bug.cgi?id=21386
It doesn't behave well when parent and child are from
different pid namespaces and have the same pid.
Use the raw syscall without glibc's asserts as a workaround.
Also, use the raw syscall for getpid() in tests,
as these two functions go together (glibc's getpid()
relies on glibc's fork()).
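
The workaround can be illustrated roughly as follows. This is a minimal
sketch, not the code from the patch: raw_getpid(), raw_fork() and
raw_fork_demo() are hypothetical names, and the argument order of the raw
clone syscall is assumed to be the x86-64/aarch64 one (it differs on some
other architectures).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our pid, bypassing libc logic. */
static pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}

/*
 * Hypothetical helper: fork-like process creation through the raw clone
 * syscall, so that no libc fork() wrapper code (and none of its asserts)
 * is executed. Assumption: flags are the first argument, as on x86-64
 * and aarch64; all remaining arguments are zero.
 */
static pid_t raw_fork(void)
{
	return (pid_t)syscall(SYS_clone, (unsigned long)SIGCHLD,
			      0UL, 0UL, 0UL, 0UL);
}

/* Create a child with raw_fork() and check that it sees its own pid. */
static int raw_fork_demo(void)
{
	pid_t parent = raw_getpid();
	pid_t child = raw_fork();
	int status;

	if (child < 0)
		return -1;
	if (child == 0)
		/* Same pid ns here, so the child's pid must differ. */
		_exit(raw_getpid() != parent ? 0 : 1);

	if (waitpid(child, &status, 0) != child)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

raw_fork_demo() returns 0 when the child was created and reaped normally.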
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When a new rb_root is created for a pidns it is initialized with
RB_ROOT, so ns->pid.rb_root.rb_node is NULL at first. Later,
when lookup_create_pid() inserts the first node into this rb-tree,
the node will have (NULL & color) in node->rb_parent_color.
So the check "!rb_parent(&found->ns[i].node)" will be true for
the rb-tree's root node, and criu will fail to look up this node.
We haven't hit that yet because, to get to this check, we need a task
in at least two levels of pidns which at the same time is the root
of the rb-tree on e.g. level 0.
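
The ambiguity can be reproduced with a simplified model of a kernel-style
rb-tree node. This is only a sketch: the structures below are not CRIU's
headers, no rebalancing is done, and is_linked() is one hypothetical way
to disambiguate, not necessarily what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a kernel-style rb-tree node (assumption). */
struct rb_node {
	unsigned long rb_parent_color;	/* parent pointer | color bits */
	struct rb_node *rb_left, *rb_right;
};

struct rb_root {
	struct rb_node *rb_node;
};

#define RB_RED 0UL
#define rb_parent(n) ((struct rb_node *)((n)->rb_parent_color & ~3UL))

/* Link a node under @parent (NULL for the first node of an empty tree). */
static void rb_link_node(struct rb_node *node, struct rb_node *parent,
			 struct rb_node **link)
{
	node->rb_parent_color = (unsigned long)parent | RB_RED;
	node->rb_left = node->rb_right = NULL;
	*link = node;
}

/* Unreliable check: true for an unlinked node AND for the tree's root. */
static int looks_unlinked(const struct rb_node *n)
{
	return !rb_parent(n);
}

/* Hypothetical fix: the root is linked although it has no parent. */
static int is_linked(const struct rb_root *root, const struct rb_node *n)
{
	return rb_parent(n) != NULL || root->rb_node == n;
}

/* Returns 1 if the naive check wrongly reports the root as unlinked. */
static int root_is_misdetected(void)
{
	struct rb_root root = { NULL };
	struct rb_node a, b;

	rb_link_node(&a, NULL, &root.rb_node);	/* first node: no parent */
	rb_link_node(&b, &a, &a.rb_right);	/* second node: parent is a */

	assert(!looks_unlinked(&b));		/* fine for non-root nodes */
	assert(is_linked(&root, &a) && is_linked(&root, &b));

	return looks_unlinked(&a);
}
```

Here the first inserted node becomes the root with a NULL parent, so the
naive check cannot tell it apart from a node that was never inserted.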
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This minimizes the chance of hitting the problem where files
used for page transfer try to use the same number that is
reserved for a service fd.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
We will need it to lift the limit on file allocation
for service fd reservation and later for running parasite code
(which is implemented in the vz7 instance and will soon be
ported to vanilla).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Add a fake fd type for autofs. This allows functions
like find_file_desc() to work as expected, without
having two different file_desc with the same type
and same id.
Also, later, it will allow us to delete autofs_create_fle()
and to use the generic helper.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1)Use CLONE_VFORK to create the subprocess, as it's safe after patch
"clone_noasan: Allow to create CLONE_VM|CLONE_VFORK processe".
2)Add more CLONE_XXX to flags to speed up the syscall.
3)Do not send SIGCHLD, as the parent sees the child's exit() synchronously anyway.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Picked from patch "[PATCH RFC] namespaces: use CLONE_VFORK
with CLONE_VM when it is possible" by Andrew Vagin.
Currently the parent touches the child's stack, as at the moment of the
clone() call its stack pointer is above the child's (we allocate
char stack[128] on the parent's stack). This prevents creating
CLONE_VM|CLONE_VFORK processes, because the child uses stack addresses
occupied by the parent.
The patch changes clone_noasan() behaviour and allows doing that
with the same memory consumption. We give the child memory which
is not used by the parent's clone(), so the parent's and child's stacks
do not intersect.
This allows creating CLONE_VM|CLONE_VFORK processes.
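
The requirement that the stacks do not intersect can be sketched as
below. Note that this sketch takes the child's stack from the heap, which
is a simpler technique than the one in the patch (the patch keeps the
memory on the parent's stack); run_vfork_child() and CHILD_STACK_SIZE are
hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)	/* arbitrary size for the sketch */

static int child_fn(void *arg)
{
	/* CLONE_VM: this store is visible in the parent's memory. */
	*(int *)arg = 42;
	return 0;
}

/* Returns the value written by the child, or -1 on error. */
static int run_vfork_child(void)
{
	int value = 0, status;
	pid_t pid;
	/* Memory the parent never uses as stack: no intersection. */
	char *stack = malloc(CHILD_STACK_SIZE);

	if (!stack)
		return -1;

	/* Assumption: the stack grows down, so pass the top of the area. */
	pid = clone(child_fn, stack + CHILD_STACK_SIZE,
		    CLONE_VM | CLONE_VFORK | SIGCHLD, &value);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	/* CLONE_VFORK: we get here only after the child has exited. */
	if (waitpid(pid, &status, 0) != pid)
		value = -1;

	free(stack);
	return value;
}
```

Because of CLONE_VFORK the parent is suspended until the child exits, so
it never runs while the child is still using the shared address space.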
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Move switch_ns() down because __pstree_pid_by_virt()
does not need cleanup.
Add more goto labels and restore the ns in case of failure.
Also delete pr_err(), because the error is already printed
by request_set_next_pid().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Before this patch we used flock to order task creation,
but this approach is inefficient. It took 5 syscalls to synchronize
the creation of a single child:
1)open()
2)flock(LOCK_EX)
3)flock(LOCK_UN)
4)close() in parent
5)close() in child
The patch introduces a more efficient way of synchronization,
which executes only 2 syscalls. We use last_pid_mutex instead,
which clearly reduces the number of syscalls.
v2: Don't use flock() at all
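
For illustration, a mutex shared between processes can be built on a
futex word in shared memory. The following is a minimal sketch under
that assumption and is not the implementation of last_pid_mutex;
shm_lock(), shm_unlock() and run_locked_children() are hypothetical
names.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_state {
	atomic_uint lock;	/* 0 = free, 1 = taken */
	int counter;		/* protected by lock */
};

static void shm_lock(atomic_uint *l)
{
	unsigned int expected = 0;

	while (!atomic_compare_exchange_strong(l, &expected, 1)) {
		/* Sleep only while the word still reads 1. */
		syscall(SYS_futex, l, FUTEX_WAIT, 1, NULL, NULL, 0);
		expected = 0;
	}
}

static void shm_unlock(atomic_uint *l)
{
	atomic_store(l, 0);
	/* Wake one waiter, if there is any. */
	syscall(SYS_futex, l, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* Each child increments the shared counter under the lock. */
static int run_locked_children(int nr_children, int nr_iters)
{
	struct shared_state *s;
	int i, j, total;

	s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (s == MAP_FAILED)
		return -1;
	atomic_init(&s->lock, 0);
	s->counter = 0;

	for (i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			for (j = 0; j < nr_iters; j++) {
				shm_lock(&s->lock);
				s->counter++;
				shm_unlock(&s->lock);
			}
			_exit(0);
		}
	}

	while (wait(NULL) > 0)
		;	/* reap all children */

	total = s->counter;
	munmap(s, sizeof(*s));
	return total;
}
```

In this sketch an uncontended lock takes no syscall at all and unlock
takes one, so the exact syscall count differs from the one quoted above.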
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Group them for 1)error and 2)parent cases. This reduces the code
and will be used in the next patches.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's impossible to create a task from a pid_ns if its helper
is not created, because we wait in wait_pid_ns_helper_prepared()
for that. So, such a situation here is a bug.
Move the wait and convert it to BUG().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Get pid_ns fd from INIT_PID task of this namespace and
use switch_ns() and restore_ns().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The name should be do_destroy_pid_ns_helpers(), with *s*.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the next patches, root_item will need to have its real pid set
to be sure that usernsd already sees it.
Also add a comment explaining why the real pid is set in two places.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We never call this function for root_item.
It's for dropping user ns, which may happen
with the rest of tasks only.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
memcpy() is not needed here, as we rewrite all the fields later.
Also, use PID_SIZE() helper.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When INIT_PID of a pid_ns exits abnormally, the kernel
kills all processes belonging to the namespace.
So, it's hopeless to wait for the helper's answer to a destroy
request. Use kill() to destroy it instead.
It will be a noop in case the helper is already
killed, and we won't get stuck.
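
That kill() does not block on an already dead task can be checked with a
small sketch. The function name kill_exited_helper() is hypothetical, and
a plain forked child stands in for the pid_ns helper here.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Send SIGKILL to a child which has already exited but is not reaped
 * yet. Returns what kill() returned, or -1 if the setup failed.
 */
static int kill_exited_helper(void)
{
	siginfo_t info;
	pid_t pid = fork();
	int ret;

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* the "helper" dies at once */

	/* Wait until the child is a zombie, but leave it unreaped. */
	if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;

	/* No answer is awaited: the call returns immediately. */
	ret = kill(pid, SIGKILL);

	/* Now reap the child for real. */
	if (waitpid(pid, NULL, 0) != pid)
		return -1;

	return ret;
}
```

The signal has no effect on the dead task, and unlike a request/answer
exchange with the helper there is nothing to wait for.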
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
After the last patches for net ns the test works again (as the environment
changed), so bring it back.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
pid, net, ipc, uts, mnt ids always exist, and we check
for them when we are reading the ids img (see previous
patch "pstree: Check for always existing task ids").
Also, pstree_item::ids always exist too (we populate
them even for dead tasks, see read_pstree_image()).
So, delete the excess checks and simplify the code.
Also, in restore_one_alive_task() check for has_user_ns_id
instead of ids, as ids always exist.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Limit the scope of this macro and make its boundaries visible.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
has_pid_ns_id is checked above. It could go together with the
previous patch, but I separated them for easier review.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
All alive tasks must have ids and the fields
implemented before the img format became stable
(see commit 2105e18eee70).
Check for them in a single place (in addition to the
check for has_pid_ns_id, which we already have).
This will allow us to remove checks for item->ids and
for item->ids->has_xxx_ns_id from the rest of the code
and make it simpler (see the later patch "pstree: Delete checks
of always existing pstree_item::ids on restore").
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is already checked, when we check for parent->ids.
So, delete the excess check.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>