It's only used in the same file where it's declared.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add the missing return on the memory allocation failure branch.
Found by Coverity.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Do not use the pid namespace helper when the pid has only one level.
If it's one-level, the created task is in the root pid ns.
Also, as a parent's level is less than or equal to its child's,
the parent is in the root pid ns too. So, write the next pid directly.
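A minimal sketch of this decision, assuming a hypothetical struct pid_levels that keeps the NSpid values outermost first (the order used in /proc/[pid]/status). next_pid_action() and its enum are illustrative names, not CRIU code.

```c
#include <assert.h>
#include <sys/types.h>

#define MAX_PID_LEVELS 32 /* hypothetical cap for this sketch */

/* Hypothetical: NSpid values of the task to create, outermost first. */
struct pid_levels {
	int nr;                   /* number of pid levels */
	pid_t ns[MAX_PID_LEVELS]; /* ns[0] is the pid in the root pid ns */
};

enum next_pid_action {
	WRITE_NS_LAST_PID_DIRECTLY, /* one level: parent is in root pid ns */
	ASK_PID_NS_HELPERS,         /* several levels: helpers set the rest */
};

static enum next_pid_action next_pid_action(const struct pid_levels *p)
{
	assert(p->nr >= 1 && p->nr <= MAX_PID_LEVELS);
	/*
	 * A one-level pid means the new task lives in the root pid ns.
	 * The parent's level is <= the child's, so the parent is in the
	 * root pid ns too and can write ns_last_pid itself.
	 */
	return p->nr == 1 ? WRITE_NS_LAST_PID_DIRECTLY : ASK_PID_NS_HELPERS;
}
```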
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is for the if-unset check in dump_task_thread(), which compares virt
with -1. It is OK not to initialize virt if the kernel has NSpid in
/proc/pid/status, as parse_pid_status() will overwrite the zeroes, but on
the VZ7 kernel it fails:
https://ci.openvz.org/job/CRIU/job/CRIU-virtuozzo/job/criu-dev/2021/
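A sketch of why the explicit initialization matters, using a hypothetical, simplified struct pid_ns_entry: presetting virt to -1 lets an if-unset check that compares with -1 tell "never filled in" apart from a real pid, even when there is no NSpid line to overwrite the defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified per-level pid entry. */
struct pid_ns_entry {
	pid_t virt; /* pid value in this level's pid ns; -1 means "unset" */
};

static void init_pid_entries(struct pid_ns_entry *e, int nr)
{
	assert(nr > 0);
	for (int i = 0; i < nr; i++)
		e[i].virt = -1; /* -1, not 0: the if-unset check compares with -1 */
}

static bool pid_entry_unset(const struct pid_ns_entry *e)
{
	return e->virt == -1;
}
```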
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
======================== Run zdtm/static/clone_fs in h =========================
Start test
./clone_fs --pidfile=clone_fs.pid --outfile=clone_fs.out
Run criu dump
=[log]=> dump/zdtm/static/clone_fs/24/1/dump.log
------------------------ grep Error ------------------------
(00.007511) Dumping general registers for 25 in native mode
(00.007525) Dumping GP/FPU registers for 25
(00.007535) 25 has 0 sched policy
(00.007542) dumping 0 nice for 25
(00.007549) Error (criu/cr-dump.c:863): Parasite and /proc/[pid]/status gave different tids
------------------------ ERROR OVER ------------------------
Run criu restore
=[log]=> dump/zdtm/static/clone_fs/24/1/restore.log
------------------------ grep Error ------------------------
(00.000497) Add user ns 2 pid 24
(00.000500) Add pid ns 1 pid 24
(00.000503) Add ipc ns 4 pid 24
(00.000506) Add uts ns 5 pid 24
(00.000514) Error (criu/pstree.c:501): Can't skip zero pids levels (0) or find {parent,} ns (1)
(00.000520) Error (criu/pstree.c:813): BUG at criu/pstree.c:813
------------------------ ERROR OVER ------------------------
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The root task can live in another netns and it has to be restored
before executing setup-namespaces scripts.
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Looks-good-to: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
runc restore executes criu with --emptyns network and sets
a setup-namespaces script to restore a network namespace.
https://github.com/xemul/criu/issues/314
Looks-good-to: Pavel Emelyanov <xemul@virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Fixes: 2189b9c71d3d ("net: allow to dump and restore more than one network namespace")
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Allow nested pid_ns, but turn off restoring of pgid and sid in the cases
when there are child pid namespaces. This functionality will be implemented
by Pavel Tikhomirov, who is working on it.
v4: Also make restore_before_setsid() always return false if there are
child pid namespaces
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If the task is not INIT_PID, clear this clone flag.
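Assuming the flag in question is CLONE_NEWPID (only the INIT_PID task, pid 1, starts a new pid namespace), the rule can be sketched as follows; fixup_clone_flags() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

#define INIT_PID 1

/* Hypothetical helper: drop CLONE_NEWPID unless the task is pid 1 of its ns. */
static unsigned long fixup_clone_flags(unsigned long flags, pid_t pid_in_own_ns)
{
	assert(pid_in_own_ns > 0);
	if (pid_in_own_ns != INIT_PID)
		flags &= ~(unsigned long)CLONE_NEWPID;
	return flags;
}
```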
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make the sanity check work in case of multi-level pids.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If a pid is multi-level, ask the helpers to populate the /proc last pids
in their active pid namespaces, so that threads are created with the right
NStids.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need a socket to request NStids for a task's threads.
The transport socket will be used for that in the next patches.
So, close it later, after the threads are created.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Ask the helpers to set ns_last_pid in their active pid_ns.
Of course, optimizations are possible here, but not for now.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since the child's pid_ns may have a user_ns not equal
to the parent's, and we do not want to lose the parent's
user_ns (as it's impossible to restore it back),
create the child from a sub-process.
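A sketch of the sub-process pattern, assuming a hypothetical run_in_subprocess() built on plain fork()/waitpid() (CRIU may use clone() with other flags). Any state the callback changes, such as a setns() into the child's user_ns, stays in the forked process; here a global counter stands in for that state.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run fn(arg) in a forked process and return its
 * exit code, or -1 on error. Whatever fn changes (namespaces, globals)
 * is lost with the child, so the caller keeps its own user_ns.
 */
static int run_in_subprocess(int (*fn)(void *), void *arg)
{
	pid_t pid;
	int status;

	assert(fn != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(fn(arg) & 0xff); /* child: e.g. setns(), then clone the task */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Demo state: changed only inside the sub-process. */
static int child_side_counter;

static int bump_counter(void *arg)
{
	(void)arg;
	return ++child_side_counter;
}
```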
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Get the child's pid_ns and setns() into it.
Of course, many optimizations are possible
here, but not for now.
v3: Save the current pid_for_children ns to avoid an excess setns()
when it's already set.
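A sketch of the v3 caching idea, assuming pid namespaces are identified by a hypothetical numeric id and that 0 stands for the ns we started in; switch_pid_for_children() is illustrative. setns(fd, CLONE_NEWPID) changes only the pid ns used for future children, so skipping it when the cached id matches is safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical cache: id of the ns pid_for_children points to (0 = initial). */
static unsigned int cur_pid_ns_id;

/* Switch pid_for_children to the ns identified by id, skipping no-op setns(). */
static int switch_pid_for_children(unsigned int id, int ns_fd)
{
	if (id == cur_pid_ns_id)
		return 0;               /* already there: no syscall */
	if (setns(ns_fd, CLONE_NEWPID) < 0)
		return -1;              /* keep the old cached id on failure */
	cur_pid_ns_id = id;
	return 0;
}
```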
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If we are not creating a pid_ns, we need to wait until it has been
created by the parent of the ns's INIT_PID.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A task may set last_pid only for its active pid namespace,
so if the NSpid of a child contains more than one level, we
need external help to populate the whole pid hierarchy
(pid in the parent pid_ns, pid in the grandparent pid_ns, etc.). Pid ns
helpers are used for that.
These are children of usernsd, which listen on a
socket and set the requested last pid in their active
pid_ns.
v4: Move destroy_pid_ns_helpers() before the CR_STATE_RESTORE_SIGCHLD
change, as they must die before zombies.
v3: Block SIGCHLD while stopping the pid_ns helpers.
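To make the helper protocol concrete, here is a sketch of building the "set last pid" requests for one child, assuming a hypothetical struct last_pid_req and that the creating task writes its own active level itself. Each request carries NSpid[level] - 1, because the kernel allocates the pid following ns_last_pid; levels where the child is pid 1 belong to a new pid ns and need no preset.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical request sent to the pid_ns helper of one level. */
struct last_pid_req {
	int level;      /* pid ns level, 0 = top */
	pid_t last_pid; /* value to write to ns_last_pid at that level */
};

/*
 * nspid[0] is the pid in the top pid ns, nspid[nr - 1] in the child's own.
 * out must have room for nr entries. Returns the number of requests.
 */
static size_t build_last_pid_reqs(const pid_t *nspid, int nr, int creator_level,
				  struct last_pid_req *out)
{
	size_t n = 0;

	assert(nr > 0 && creator_level >= 0 && creator_level < nr);
	for (int level = 0; level < nr; level++) {
		if (level == creator_level)
			continue;       /* written directly by the creator */
		if (nspid[level] == 1)
			continue;       /* INIT_PID of a new pid ns: nothing to preset */
		out[n].level = level;
		out[n].last_pid = nspid[level] - 1; /* next pid will be nspid[level] */
		n++;
	}
	return n;
}
```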
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add a handler to catch exited children of usernsd.
This will be used in the next patches to watch
for pid_ns helpers exiting.
v4: Do not watch for CLD_TRAPPED, as it's needed only for root_item
v3: pr_err() -> pr_info()
v2: New
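A sketch of such a handler, assuming usernsd's only children are the pid_ns helpers; usernsd_sigchld() and helpers_exited are hypothetical names. It ignores stop, continue and trap notifications (including CLD_TRAPPED, as in v4) and reaps every exited child, since several SIGCHLDs may be merged into one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t helpers_exited; /* hypothetical counter */

/* Hypothetical SA_SIGINFO handler for SIGCHLD in usernsd. */
static void usernsd_sigchld(int sig, siginfo_t *info, void *uc)
{
	int saved_errno = errno, status;

	(void)sig;
	(void)uc;
	if (info->si_code != CLD_EXITED && info->si_code != CLD_KILLED &&
	    info->si_code != CLD_DUMPED)
		return; /* CLD_TRAPPED, CLD_STOPPED, CLD_CONTINUED: not ours */

	/* waitpid() is async-signal-safe; reap everything that is ready. */
	while (waitpid(-1, &status, WNOHANG) > 0)
		helpers_exited++;
	errno = saved_errno;
}
```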
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This will be used in the next patches to set a specific
handler, not only the currently hardcoded sigchld_handler().
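A sketch of the parameterized installation, with a hypothetical install_sigchld_handler() that takes any SA_SIGINFO handler instead of a hardcoded one; count_sigchld() is only a demo handler. The signal being handled is blocked while its handler runs by default, so no extra sa_mask entry is needed for SIGCHLD itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: install an arbitrary SA_SIGINFO handler for SIGCHLD. */
static int install_sigchld_handler(void (*handler)(int, siginfo_t *, void *))
{
	struct sigaction sa;

	assert(handler != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO | SA_RESTART;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGCHLD, &sa, NULL);
}

/* Demo handler: just count deliveries. */
static volatile sig_atomic_t sigchld_seen;

static void count_sigchld(int sig, siginfo_t *info, void *uc)
{
	(void)sig;
	(void)info;
	(void)uc;
	sigchld_seen++;
}
```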
v2: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the next patches we will need a named socket in usernsd
to send "set last pid" requests. Create a transport socket
for that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This futex is needed to notify a waiter that the ns has been created.
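A minimal sketch of such a "ns created" notification built directly on the futex(2) syscall, assuming the flag lives in memory shared between waiter and creator (hence the non-private FUTEX_WAIT/FUTEX_WAKE operations). ns_futex_t and the helper names are hypothetical, not CRIU's futex API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical flag: 0 = ns not created yet, 1 = created. */
typedef struct {
	atomic_int raw;
} ns_futex_t;

static void ns_mark_created(ns_futex_t *f)
{
	atomic_store(&f->raw, 1);
	/* Wake every task sleeping on the flag. */
	syscall(SYS_futex, &f->raw, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

static void ns_wait_created(ns_futex_t *f)
{
	/* FUTEX_WAIT sleeps only while the value is still 0; recheck on wakeup. */
	while (atomic_load(&f->raw) == 0)
		syscall(SYS_futex, &f->raw, FUTEX_WAIT, 0, NULL, NULL, 0);
}
```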
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need a parent for the pid_ns helpers. This can't be the criu task,
as that introduces circular dependencies. So choose usernsd
for that, as we create it almost always.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Save the just-created pid_ns to fdstore. This will
allow other tasks to get it to create children.
v5: Use the store_self_ns() helper. Move the code to a separate function.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Do that before the creation of usernsd. This allows fdstore
files to be obtained from usernsd.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add a field to keep the fdstore id of the open pid_ns descriptor.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We will need this when a pid has more than one level.
Even if we create a task with CLONE_NEWPID, we will
need to populate the parent pid levels, i.e., to have
this file, which we use for synchronization, locked.
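Assuming the synchronization file is locked with flock(2) (for ns_last_pid this would be /proc/sys/kernel/ns_last_pid), holding the lock can be sketched as below. lock_sync_file() and unlock_sync_file() are hypothetical names, and the check uses a temporary file instead of the real sysctl.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical: open path and take an exclusive flock(); returns fd or -1. */
static int lock_sync_file(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

static void unlock_sync_file(int fd)
{
	assert(fd >= 0);
	flock(fd, LOCK_UN);
	close(fd);
}
```

flock() locks belong to the open file description, so even another descriptor opened by the same process is refused while the lock is held.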
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Move the next pid writing functionality to a separate function.
Also make set_next_pid() skip INIT_PID, as I'm going to
call this function unconditionally in the next patches.
v4: Call perror() before close()
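A sketch of a set_next_pid()-style helper under these rules, writing pid - 1 to a caller-supplied path (normally /proc/sys/kernel/ns_last_pid) and skipping INIT_PID; set_next_pid_at() is hypothetical. As in v4, perror() runs before close(), because close() may overwrite errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define INIT_PID 1

/* Hypothetical: make the next pid allocated in this pid ns equal to pid. */
static int set_next_pid_at(const char *path, pid_t pid)
{
	char buf[32];
	int fd, len;

	assert(pid > 0);
	if (pid == INIT_PID)
		return 0; /* pid 1 of a new pid ns needs no preset */

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	len = snprintf(buf, sizeof(buf), "%d", (int)(pid - 1));
	if (write(fd, buf, len) != len) {
		perror("write"); /* report errno before close() can clobber it */
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```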
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A task is able to set "/proc/sys/kernel/ns_last_pid" only for
its active pid_ns. So, if we create a task with a multi-level pid,
we need helpers which allow setting pids in the whole pid hierarchy.
This patch reserves a pid for a helper in every pid_ns, taken from
the free pids of that ns.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Extract a helper to find a task in any pid_ns of the hierarchy
by its pid value in that pid_ns.
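A sketch of such a lookup over a hypothetical flat array of items, where every level records which pid ns it belongs to. Matching on the ns identity as well as the depth matters: two tasks in sibling namespaces at the same depth can both have pid 1.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_LEVELS 8 /* hypothetical */

struct pid_ns {
	unsigned int id;
	int depth; /* 0 = top pid ns */
};

struct pid_level {
	unsigned int ns_id; /* which pid ns this level belongs to */
	pid_t virt;         /* pid value inside that ns */
};

struct item {
	int nr_levels;
	struct pid_level ns[MAX_LEVELS]; /* ns[0] = top pid ns */
};

/* Find the task whose pid in pid ns ns equals pid, or NULL. */
static struct item *find_item_by_pid(struct item *items, size_t nr,
				     const struct pid_ns *ns, pid_t pid)
{
	assert(ns->depth >= 0 && ns->depth < MAX_LEVELS);
	for (size_t i = 0; i < nr; i++) {
		const struct pid_level *l;

		if (items[i].nr_levels <= ns->depth)
			continue; /* task lives above ns: no pid there */
		l = &items[i].ns[ns->depth];
		if (l->ns_id == ns->id && l->virt == pid)
			return &items[i];
	}
	return NULL;
}
```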
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add an ns argument to this function to be able to find a free pid
in a specific pid_ns, not only in top_pid_ns.
v4: Use ns[level] instead of ns[0] in the next dereference.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Use the newly added fields to store pids in the whole ns hierarchy.
v4: Fix an erratum with the wrong index of ns_pgid[i] (was 0)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add new fields to store NSpid, NSsid and NSpgid for tasks and NStid for threads.
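A sketch of filling such fields from /proc/[pid]/status, where the NSpid, NStgid, NSpgid and NSsid lines list one value per pid namespace, outermost first (so the last value is the pid in the task's own namespace). parse_ns_ids() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Parse a line such as "NSpid:\t25\t1" for the given key into out[].
 * Returns the number of levels, or -1 on a key mismatch, an empty list,
 * or more levels than max.
 */
static int parse_ns_ids(const char *line, const char *key, pid_t *out, int max)
{
	size_t klen = strlen(key);
	const char *p;
	int n = 0;

	assert(max > 0);
	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return -1;
	p = line + klen + 1;
	for (;;) {
		char *end;
		long v = strtol(p, &end, 10); /* skips the tab separators */

		if (end == p)
			break;  /* no more numbers */
		if (n == max)
			return -1; /* more levels than the caller can hold */
		out[n++] = (pid_t)v;
		p = end;
	}
	return n > 0 ? n : -1;
}
```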
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A zombie kills itself, and it must use the pid number
from its own pid namespace.
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In case of kdat.has_nspid == true, zombie pids are already
dumped.
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need to know root_item's pid_ns nesting level
(whether its pid_ns is the same as criu's or not)
before collect_tasks(). This way we populate
pstree_item::pid::level correctly.
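One common way to check whether a task is in the same pid_ns as criu is to compare the device and inode numbers of /proc/[pid]/ns/pid. A sketch with a hypothetical same_pid_ns() helper (not necessarily what CRIU does here):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if a and b share a pid namespace, 0 if not, -1 on error. */
static int same_pid_ns(pid_t a, pid_t b)
{
	char path[64];
	struct stat sa, sb;

	assert(a > 0 && b > 0);
	snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)a);
	if (stat(path, &sa) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)b);
	if (stat(path, &sb) < 0)
		return -1;
	/* A namespace is identified by the (st_dev, st_ino) pair of its ns file. */
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```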
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since parse_thread_status() still does not use the passed thread,
moving the allocation up is just a refactoring.
Also make the nightmare with indexes readable and simplify BUG_ON()
using ">=".
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently this is just a refactoring, as only level == 1 is allowed.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Implement a helper that gets the number of levels between
pid namespace levels NS_CRIU and NS_ROOT.
v2: Pre-dump the tasks' pid_ns in predump_task_ns_ids()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that a pid_ns created with a custom user_ns is restored correctly:
parent (pid_ns1, user_ns1)
|
v
child (pid_ns2, user_ns2)
pid_ns1 (of user_ns1)
|
v
pid_ns2 (of user_ns2)
user_ns1
|
v
user_ns2
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create parent (P) and its three children (C1, C2 and C3)
with different pid namespaces:
                P (pid_ns1)
               /|\
              / | \
             /  |  \
            /   |   \
           /    |    \
(pid_ns1) C1    C2   C3 (pid_ns1)
            (pid_ns2)
where pid_ns1 is a parent of pid_ns2:
pid_ns1
|
pid_ns2
Children C1, C2 and C3 are created in the written order,
i.e. C1 has the smallest pid and C3 has the biggest.
After receiving a signal, check that the pid namespaces
are restored correctly.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I need a named socket to communicate with the pid_ns helpers
(see next patches) and receive answers from them
(it's impossible to send an answer to an unnamed socket).
As we already have the transport socket, we'll reuse it
for this goal too.
This patch makes transport sockets be created before
the creation of child tasks. Also, they are now created
not only for alive tasks (so we need additional
manipulations for TASK_HELPERs, e.g., calling prepare_fdt()).
v5: Bring back the CLONE_FILES clone() argument during task helper
creation. Also get rid of fdt_mutex, as CLONE_FILES processes
do not close old files after clone, and there are no
intersections between them. Also, the socket() system call
can't return an fd in the service fds range, which was the main
reason to have this mutex.
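Why the socket must be named can be shown with two AF_UNIX datagram sockets bound in the Linux abstract namespace: the server replies to whatever address recvfrom() reports, and an unbound sender reports none. named_dgram_socket() and serve_one() are hypothetical, and the integer request is a stand-in for a real "set last pid" message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a datagram socket to an abstract-namespace name; returns fd or -1. */
static int named_dgram_socket(const char *name)
{
	struct sockaddr_un addr;
	socklen_t len;
	int fd;

	assert(name != NULL);
	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* sun_path[0] == '\0' selects the abstract namespace. */
	strncpy(addr.sun_path + 1, name, sizeof(addr.sun_path) - 2);
	len = offsetof(struct sockaddr_un, sun_path) + 1 + strlen(addr.sun_path + 1);
	if (bind(fd, (struct sockaddr *)&addr, len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Helper side: receive one request and answer to the sender's address. */
static int serve_one(int fd)
{
	struct sockaddr_un from;
	socklen_t flen = sizeof(from);
	int req, ans;

	if (recvfrom(fd, &req, sizeof(req), 0, (struct sockaddr *)&from, &flen) !=
	    (ssize_t)sizeof(req))
		return -1;
	if (flen <= offsetof(struct sockaddr_un, sun_path))
		return -1; /* unnamed sender: nowhere to send the answer */
	ans = req + 1; /* stand-in for "set last pid" and its result */
	if (sendto(fd, &ans, sizeof(ans), 0, (struct sockaddr *)&from, flen) !=
	    (ssize_t)sizeof(ans))
		return -1;
	return 0;
}
```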
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The next patches will create transport sockets in task helpers.
As helpers are forked using CLONE_FILES, they must resolve
shared fds to create their own service fds. This patch allows
that.
I've dug into the code, and there is no reason why we need
pid_rst_prio() when choosing the fdt restorer. So, this
case may be safely deleted, which guarantees that in the case
of a TASK_HELPER, the fdt restorer will be its parent, i.e.,
no TASK_HELPER will be the fdt restorer.
v5: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is a refactoring which will be used in the next patches.
The BUG_ON() is just to state that the parent must be set before
this function is called.
v5: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
mntns_get_root_fd() may be called by a task from
!root_user_ns, and it fails in that case.
Put the root fd into fdstore to allow every task to use it.
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the next patches usernsd will need to create a transport
socket in the same net_ns as the one where other tasks create
their TRANSPORT_FD_OFF sockets.
Choose criu's net_ns for that: this allows usernsd
not to wait for the creation of other net_ns, i.e.
not to introduce new dependencies between tasks.
In case of (root_ns_mask & CLONE_NEWUSER) != 0,
root_item's user_ns does not allow restoring criu's net_ns,
so do prepare_net_namespaces() in a sub-process so as not to
lose criu's net_ns.
v3: Introduce __prepare_net_namespaces and execute it in a cloned task.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add a helper to free a pstree_item and its components.
Also, use it in collect_children() to free
dynamically allocated components.
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We already have a vpid() function. Make it a #define instead,
which allows using it in assignments:
vpid(item) = pid;
Also, introduce vsid(), vpgid() and vtid() as shown below:
#define vpid(item) (item->pid->ns[0].virt)
#define vsid(item) (item->sid->ns[0].virt)
#define vpgid(item) (item->pgid->ns[0].virt)
#define vtid(item, i) (item->threads[i]->ns[0].virt)
https://travis-ci.org/tkhai/criu/builds/225938195
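A sketch showing that such macros are assignable lvalues, with hypothetical minimal struct definitions. Unlike the lines above, it parenthesizes the macro argument so that expressions such as vpid(&items[i]) expand correctly.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical minimal layout matching the macros: ns[0] is the top level. */
struct pid_ns_val {
	pid_t virt;
};

struct pid {
	struct pid_ns_val ns[4];
};

struct pstree_item {
	struct pid *pid, *sid, *pgid;
	struct pid **threads;
};

/* Each macro expands to an lvalue, so it can appear on the left of '='. */
#define vpid(item)	((item)->pid->ns[0].virt)
#define vsid(item)	((item)->sid->ns[0].virt)
#define vpgid(item)	((item)->pgid->ns[0].virt)
#define vtid(item, i)	((item)->threads[i]->ns[0].virt)
```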
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When the --remote option is specified, read_local_page tries to pread from a
socket, and fails with an "Illegal seek" error.
Restore a single pread call for the regular image files case and introduce
a maybe_read_page_img_cache version of the maybe_read_page method.
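A sketch of the underlying problem and a fallback, assuming a hypothetical read_page_data(): pread() fails with ESPIPE ("Illegal seek") on pipes and sockets, so streams have to be read sequentially. This sketch dispatches on fstat(); the commit instead keeps a separate maybe_read_page_img_cache method.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes; returns 0 on success, -1 on error or short read. */
static int read_page_data(int fd, void *buf, size_t len, off_t off)
{
	struct stat st;
	size_t done = 0;

	assert(fd >= 0);
	if (fstat(fd, &st) < 0)
		return -1;
	if (S_ISREG(st.st_mode))
		return pread(fd, buf, len, off) == (ssize_t)len ? 0 : -1;

	/* Pipe or socket: off is meaningless, data arrives in order. */
	while (done < len) {
		ssize_t r = read(fd, (char *)buf + done, len - done);

		if (r < 0 && errno == EINTR)
			continue;
		if (r <= 0)
			return -1;
		done += (size_t)r;
	}
	return 0;
}
```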
Generally-approved-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>