mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 13:58:34 +00:00

9272 Commits

Author SHA1 Message Date
Kirill Tkhai
2f1b6d40c5 pid: Set pid_ns before we create a child
Get pid_ns of the child and setns() it.
Of course, many optimizations are possible
here, but not for now.

v3: Save the current pid_for_children ns to avoid an excess setns()
    when it's already set.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
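The v3 note above (remember the current pid_for_children namespace and skip setns() when nothing changes) can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `struct pid_for_children_cache`, `set_pid_for_children()` and the `switch_ns_fn` callback are hypothetical names. The real switch is injected as a callback so the sketch runs without privileges; real code would presumably call `setns(fd, CLONE_NEWPID)` there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: performs the real switch, e.g. by calling
 * setns(fd, CLONE_NEWPID) on an fd of the target namespace.
 * Returns 0 on success, -1 on failure. */
typedef int (*switch_ns_fn)(unsigned int ns_id);

/* Remembers which pid_ns new children are currently created in. */
struct pid_for_children_cache {
	int valid;          /* 0 until the first successful switch */
	unsigned int ns_id; /* id of the namespace set last time */
};

/* Switch pid_for_children to ns_id unless it is already set. */
static int set_pid_for_children(struct pid_for_children_cache *cache,
				unsigned int ns_id, switch_ns_fn do_switch)
{
	assert(cache != NULL && do_switch != NULL);

	if (cache->valid && cache->ns_id == ns_id)
		return 0; /* already set: skip the excess setns() */

	if (do_switch(ns_id) < 0)
		return -1;

	cache->valid = 1;
	cache->ns_id = ns_id;
	return 0;
}

/* Demo stub standing in for the privileged setns() call. */
static int switch_calls;

static int counting_switch(unsigned int ns_id)
{
	(void)ns_id;
	switch_calls++;
	return 0;
}
```

Calling `set_pid_for_children()` twice in a row with the same id invokes the callback only once; a different id triggers a second switch.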
Kirill Tkhai
d36e9f9f03 pid: Wait till pid_ns created before we create a child of this ns
If we are not creating a pid_ns ourselves, we need to wait until it is
created by the parent of the ns's INIT_PID.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
290b9f2a62 pid: Create pid_ns helpers
Task may set last_pid only for its active pid namespace,
so if NSpid of a child contains more than one level, we
need external help to populate the whole pid hierarchy
(pid in parent pid_ns, pid in grandparent etc). Pid ns
helpers are used for that.

These are children of usernsd, which listen on a
socket and set the requested last pid in their active
pid_ns.

v4: Move destroy_pid_ns_helpers() before CR_STATE_RESTORE_SIGCHLD
change, as they must die before zombies.

v3: Block SIGCHLD during stopping of pid_ns helpers.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
108a469469 ns: Add usernsd signal handler
Handler to catch exited children of usernsd.
This will be used in next patches to watch
for pid_ns helpers exit.

v4: Do not watch for CLD_TRAPPED, as it's needed only for root_item
v3: pr_err() -> pr_info()
v2: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
8f8f1e3a00 cr-restore: Add argument to criu_signals_setup()
This will be used in next patches to set specific
handler, not only currently hardcoded sigchld_handler().

v2: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
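Passing the handler as an argument, rather than hardcoding sigchld_handler(), might look like the sketch below. It is an illustration under assumptions, not the actual criu_signals_setup(): the names `signals_setup()`, `sigchld_action_t` and `note_sigchld()` are hypothetical, and the flags chosen here are only one plausible configuration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical handler type: an SA_SIGINFO-style SIGCHLD handler. */
typedef void (*sigchld_action_t)(int sig, siginfo_t *info, void *ctx);

/* Install the given SIGCHLD handler instead of a hardcoded one. */
static int signals_setup(sigchld_action_t handler)
{
	struct sigaction act;

	assert(handler != NULL);

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = handler;
	act.sa_flags = SA_SIGINFO | SA_RESTART;
	sigemptyset(&act.sa_mask);
	sigaddset(&act.sa_mask, SIGCHLD); /* no nested SIGCHLD while handling */

	return sigaction(SIGCHLD, &act, NULL);
}

/* Demo handler: only records that a SIGCHLD arrived. */
static volatile sig_atomic_t sigchld_seen;

static void note_sigchld(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)info;
	(void)ctx;
	sigchld_seen = 1;
}
```

With this shape, a caller such as usernsd could pass its own handler while other tasks keep passing the default one.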
Kirill Tkhai
da0fc6cca0 ns: Install transport fd socket in usernsd
In next patches we will need a named socket in usernsd,
to send "set last pid" requests. Create transport socket
for that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
905b0c7667 pid: Add pid ns futex helper_created
This futex is needed to notify a waiter that the ns has been created.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
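A "created" notification of this kind can be built on the Linux futex system call. The sketch below is only an approximation of the idea: it uses a plain `uint32_t` and the raw `SYS_futex` syscall, whereas CRIU has its own futex wrappers, and the function names here are hypothetical. For use between processes the word would have to live in shared memory (for example a MAP_SHARED mapping).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Publish a new value and wake every task sleeping on the futex word. */
static void futex_set_and_wake(uint32_t *f, uint32_t val)
{
	assert(f != NULL);
	__atomic_store_n(f, val, __ATOMIC_SEQ_CST);
	syscall(SYS_futex, f, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

/* Sleep until the futex word becomes equal to val. */
static int futex_wait_until(uint32_t *f, uint32_t val)
{
	assert(f != NULL);
	for (;;) {
		uint32_t cur = __atomic_load_n(f, __ATOMIC_SEQ_CST);

		if (cur == val)
			return 0;
		/* Sleep only if the word still holds the value we saw;
		 * EAGAIN means it changed meanwhile, EINTR means a signal. */
		if (syscall(SYS_futex, f, FUTEX_WAIT, cur, NULL, NULL, 0) < 0 &&
		    errno != EAGAIN && errno != EINTR)
			return -1;
	}
}
```

The creator of the namespace would call `futex_set_and_wake(&helper_created, 1)`, and a task that depends on it would call `futex_wait_until(&helper_created, 1)` before creating its child.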
Kirill Tkhai
7ef7bf8f75 ns: Always start usernsd
We need a parent for pid_ns helpers. This can't be criu task,
as this introduces circular dependencies. So choose usernsd
for that, as we create it almost always.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
589c378e12 pid: Save created pid_ns fd to fdstore
Save the fd of the just created pid_ns to fdstore. This will
allow other tasks to get it to create children.

v5: Use store_self_ns() helper. Move code to separate function.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
4f4101eaca fdstore: Init fdstore earlier
Do that before creation of usernsd. This allows getting fdstore
files from usernsd.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
c9a0e1c4b9 pid: Add fdstore id for pid_ns descriptor
Field to keep fdstore id of open pid_ns descriptor.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
d3948be062 pid: Always lock last pid file on clone()
We will need this, when pid has more than one level.
Even if we create task with CLONE_NEWPID, we will
need to populate parent pid levels, i.e., to have
this file, which we use for synchronization, locked.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
92d27ac2b4 restore: Implement set_next_pid() helper
Move next pid writing functionality to separate function.
Also make set_next_pid() skip INIT_PID, as I'm going to
call this function unconditionally in next patches.

v4: Call perror() before close()

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
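The kernel hands out `ns_last_pid + 1` as the next pid when it is free, so requesting a given pid means writing `pid - 1` to "/proc/sys/kernel/ns_last_pid". The sketch below shows only the value computation and the INIT_PID skip described above; `format_ns_last_pid()` is a hypothetical helper, and the privileged write to the sysctl file is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define INIT_PID 1

/*
 * Format the value to be written to ns_last_pid so that the next
 * created task gets `pid`.
 * Returns the string length, 0 if nothing has to be written
 * (INIT_PID is assigned automatically in a new pid_ns), -1 on error.
 */
static int format_ns_last_pid(pid_t pid, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);

	if (pid == INIT_PID)
		return 0; /* skip: init of a pid_ns always gets pid 1 */
	if (pid < INIT_PID)
		return -1;

	n = snprintf(buf, len, "%d", (int)(pid - 1));
	if (n < 0 || (size_t)n >= len)
		return -1; /* output error or truncated */

	return n;
}
```

Because the INIT_PID case returns early, a caller can invoke such a helper unconditionally before every clone().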
Kirill Tkhai
aa07409e0f ns: Reserve pid_ns helpers
Task is able to set "/proc/sys/kernel/ns_last_pid" only for
its active pid_ns. So, if we create a multi-level pid task,
we need helpers, which allow setting pids in the whole pid hierarchy.

This patch reserves a helper for every pid_ns from free pids
of this ns.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
a88c55f9ca pstree: Extract __pstree_item_by_virt() to act on any pid_ns
Extract a helper to find a task in any pid_ns of the hierarchy
by its pid value in that pid_ns.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
9c0feb829a pstree: Make get_free_pid() work for different pid_ns and export it
Add ns argument to this function to be able to find a free pid
in a specific pid_ns, not only in top_pid_ns.

v4: Use ns[level] instead of ns[0] during next dereference.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
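Searching for a free pid per namespace, rather than only in the top one, can be pictured with the sketch below. It assumes the pids used in one pid_ns are available as a sorted array and picks the smallest unused value. This is an illustrative strategy, not necessarily the one get_free_pid() uses, and `get_free_pid_in_ns()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * used[] holds the pids taken in ONE pid_ns, sorted ascending and
 * without duplicates. Returns the smallest free pid above INIT_PID,
 * or -1 if none exists up to max_pid.
 */
static pid_t get_free_pid_in_ns(const pid_t *used, size_t n, pid_t max_pid)
{
	pid_t candidate = 2; /* pid 1 is reserved for the init of the ns */
	size_t i;

	assert(used != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (used[i] < candidate)
			continue; /* below the candidate: irrelevant */
		if (used[i] > candidate)
			break; /* gap found: candidate is free */
		candidate++; /* candidate is taken, try the next value */
	}

	return candidate <= max_pid ? candidate : -1;
}
```

A helper reserved for a nested pid_ns would call this with the pid set of that namespace, not the set of the top one.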
Kirill Tkhai
e83585b93d pstree: Dump and restore NSpid, NSsid etc
Use the newly added fields to store pids in the whole ns hierarchy.

v4: Fix erratum with wrong index of ns_pgid[i] (was 0)

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
a04d413000 images: Add NSpids pstree descriptions
Add new fields to store NSpid, NSsid, NSpgid for task and NStid for threads.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
9018ff911d zombie: Kill by last_level_pid, not by vpid
Zombie kills itself and it must use the pid number
from its own pid namespace.

v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
fec8dc4a1e pstree: Skip zombie dumping tricks if there is kdat.has_nspid
In case of kdat.has_nspid == true, zombie pids are already
dumped.

v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
3e6fdadebc pstree: Collect NSpid, NSsid and NStgid when possible
v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
f334e43bcf pstree: Pre-dump ns ids before tasks
We need to know root_item's pid_ns nesting level
(if its pid_ns is similar to criu's, or not)
before collect_tasks(). Thus we populate
pstree_item::pid::level correctly.

v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
0892ab3f56 pstree: Move thread allocation up and do cleanup
Since parse_thread_status() still does not use a passed thread,
moving the allocation up is just a refactoring.

Also make the index handling readable and simplify BUG_ON()
using ">=".

v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
8a9b9fee41 pstree: Use thread group leader level of pid to allocate threads
Currently this is refactoring, as only level == 1 is allowed.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
9d0ff98930 pstree: Introduce PID_SIZE() helper
We will use this expression in more places.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
096bb4071e pstree: Change arguments in parse_pid_status()
Pass task or thread there.

v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
d308a86f2a pid_ns: Implement pid_ns_root_off()
Implement a helper receiving number of levels between
pid namespace level NS_CRIU and NS_ROOT.

v2: Pre-dump tasks pid_ns in predump_task_ns_ids()

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
27b6096cd8 zdtm: Add pidns01 test
Check that a pid_ns created with a custom user_ns is restored correctly:

parent (pid_ns1, user_ns1)
  |
  v
child  (pid_ns2, user_ns2)

pid_ns1 (of user_ns1)
  |
  v
pid_ns2 (of user_ns2)

user_ns1
  |
  v
user_ns2

v3: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
80f23b436e zdtm: Add pidns00 test
Create parent (P) and its three children (C1, C2 and C3)
with different pid namespaces:

                    P (pid_ns1)
                   /|\
                  / | \
                 /  |  \
                /   |   \
               /    |    \
   (pid_ns1) C1    C2     C3 (pid_ns1)
               (pid_ns2)

where pid_ns1 is a parent of pid_ns2:

	   pid_ns1
	      |
	   pid_ns2

Children C1, C2 and C3 are created in the written order,
i.e. C1 has the smallest pid and C3 has the biggest.

After receiving a signal, check that the pid namespaces
were restored correctly.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:30 +03:00
Kirill Tkhai
b2702d1ae2 cr-restore: Open transport socket earlier
I need named socket to communicate with pid_ns helpers
(see next patches) and receive answer from them
(it's impossible to send answer to unnamed socket).
As we already have transport socket, we'll reuse it
for the above goal too.

This patch makes transport sockets be created before
creation of children tasks. Also, now they are created
not only for alive tasks (so we need additional
manipulations for TASK_HELPERS, e.g., to call prepare_fdt()).

v5: Return CLONE_FILES clone() argument during task helpers
creation. Also get rid of fdt_mutex, as CLONE_FILES processes
do not close old files after clone, and there are no
intersections between them. Also, socket() system call
can't return a fd in service fds range, which was the main
reason to have this mutex.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
Kirill Tkhai
5fdfd7626b files: Make possible task helpers to use shared_fdt_prepare()
Next patches will create transport sockets in task helpers.
As helpers are forked using CLONE_FILES, they must resolve
shared fds to create their own service fds. This patch allows
that.

I've dug into the code, and there is no reason we need
pid_rst_prio() when choosing the fdt restorer. So, this
case may be safely deleted, which guarantees that in case
of TASK_HELPER, the restorer of fdt will be the parent, i.e.,
no TASK_HELPER will be the restorer of fdt.

v5: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
Kirill Tkhai
61a38fba15 pstree: Change type of init_pstree_helper() and check for parent
This is refactoring, which will be used in next patches.
BUG_ON() just to mention that parent must be set before
call of this function.

v5: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
Kirill Tkhai
1dce09bb14 mnt: Put root fd to fdstore
mntns_get_root_fd() may be called by a task from
!root_user_ns, and it fails if so.

Put root fd to fdstore to allow every task to use it.

v3: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
Kirill Tkhai
9c7195ff39 ns: Do not change net_ns in prepare_net_namespaces()
In next patches usernsd will need to create transport
socket in the same net_ns as other tasks do their
TRANSPORT_FD_OFF sockets.

Choose criu net_ns for that: this allows usernsd
not to wait for creation of other net_ns, i.e.
not to introduce new dependencies between tasks.

In case of (root_ns_mask & CLONE_NEWUSER) != 0
root_item's user_ns does not allow restoring criu net_ns,
so do prepare_net_namespaces() in a sub-process so as not to
lose criu net.

v3: Introduce __prepare_net_namespaces and execute it in cloned task.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
Kirill Tkhai
0904d50e67 pstree: Implement free_pstree_item() helper
Helper to free pstree_item and its components.
Also, use it in collect_children() to free
dynamically allocated components.

v4: New

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
Kirill Tkhai
79c3e2618e pstree: Implement vpgid(), vsid() and vtid()
We already have the vpid() function. Make it a #define instead,
which allows using it in assignments:

	vpid(item) = pid;

Also, introduce vsid(), vpgid() and vtid() like it's shown below:

#define vpid(item)     (item->pid->ns[0].virt)
#define vsid(item)     (item->sid->ns[0].virt)
#define vpgid(item)    (item->pgid->ns[0].virt)
#define vtid(item, i)  (item->threads[i]->ns[0].virt)

https://travis-ci.org/tkhai/criu/builds/225938195

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:29 +03:00
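Why a macro can appear on the left side of an assignment while a function call cannot is shown by the sketch below. The structures are cut-down, hypothetical stand-ins for the real pstree types, and the macro argument is additionally parenthesized here; only the `ns[0].virt` access path is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Cut-down, hypothetical stand-ins for the real pstree structures. */
struct ns_pid {
	pid_t virt;
};

struct pid {
	struct ns_pid ns[1];
};

struct pstree_item {
	struct pid *pid;
};

/* Expands to an lvalue, so it works for both reads and writes. */
#define vpid(item) ((item)->pid->ns[0].virt)

/* A function returning pid_t could only be read, never assigned to. */
static void set_item_vpid(struct pstree_item *item, pid_t pid)
{
	assert(item != NULL && item->pid != NULL);
	vpid(item) = pid; /* same form as "vpid(item) = pid;" above */
}
```

After `set_item_vpid(&item, 42)`, reading `vpid(&item)` yields 42, and the underlying `ns[0].virt` field holds the same value.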
Mike Rapoport
b7d1ea24cd pagemap: fix reading pages from socket for --remote case
When --remote option is specified, read_local_page tries to pread from a
socket, and fails with "Illegal seek" error.
Restore single pread call for regular image files case and introduce
maybe_read_page_img_cache version of maybe_read_page method.

Generally-approved-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:28 +03:00
Kirill Tkhai
0e245fca75 kerndat: Check that "/proc/[pid]/status" file has NS{pid, ..} lines
If there is a nested pid_ns, we need to be able to get the pid in
the whole pid hierarchy. This may be taken from the "/proc/[pid]/status"
file only. Check that the kernel has support for it.

v3: Add criu feature check

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:22:17 +03:00
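On kernels that provide it, the NSpid line of "/proc/[pid]/status" lists one pid per nesting level, starting from the pid namespace of the process that reads the file. Parsing such a line might look like the sketch below; `parse_nspid_line()` is a hypothetical helper, and 32 is an assumed value for MAX_NS_NESTING.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define MAX_NS_NESTING 32 /* assumed value, for illustration only */

/*
 * Parse a line such as "NSpid:\t1234\t1\n" into pids[].
 * Returns the number of levels found, 0 if the line carries no NSpid
 * values (e.g. it is a different status field), -1 if it is malformed
 * or has more than max levels.
 */
static int parse_nspid_line(const char *line, pid_t *pids, int max)
{
	static const char prefix[] = "NSpid:";
	const char *p;
	char *end;
	int n = 0;

	assert(line != NULL && pids != NULL);

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;

	p = line + sizeof(prefix) - 1;
	for (;;) {
		long v = strtol(p, &end, 10); /* skips tabs and spaces */

		if (end == p)
			break; /* no more numbers on the line */
		if (v <= 0 || n >= max)
			return -1;
		pids[n++] = (pid_t)v;
		p = end;
	}

	return n;
}
```

A feature check in the spirit of this commit could read the status file of the current process and test whether any line yields a positive result.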
Kirill Tkhai
9e5945ce87 pstree: Make lookup_create_pid() able to create tasks with pid->level > 1
A pid may contain more than one level, so this patch teaches the function
to work with such pids. The significant difference after this patch is that
we link a new item into several rb_roots, one per ns.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
749da7a973 ns: Add MAX_NS_NESTING
It's the maximum number of namespace nesting levels found in the Linux kernel.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
0ee26a863d pstree: Add pid_ns id argument to lookup_create_pid()
Pass a namespace of item to the function.
This will allow linking the pid in the correct ns::pid::root_rb
in next patches.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
e051f842a9 pstree: Split lookup_create_pid()
Extract the function which searches for an existing pid.
In next patches we will use it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
c9aa6f3548 pstree: Add pid_ns check in read_pstree_image
Sanity check that we have pid_ns_id. As we have dumped pid_ns_id
since ids were implemented, it must always be present.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
729bda402b pstree: Dump pid and user ns ids for dead tasks
Dead task has them set, so we must dump and restore them.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
7c87c66290 pstree: Assign ids for dead tasks in read_pstree_image()
Alive tasks must have ids populated, while dead tasks have
pid and user namespaces set (in Linux). But we never
dumped ids for dead tasks.

Since we do not support nested pid ns yet, only one pid_ns
is possible in existing dumps, so it must be equal to
root_item's. User ns is not so, but currently it's impossible
to know a dead task's user ns from the dump, so set it
to root_item's too.

In the future, we're going to dump ids for all tasks: see next
patches for that. This patch is only to handle old images
with missing dead tasks' ids.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
5ccb178a0f pstree: Move parent assignment in read_pstree_image() up
Move block with finding of parent item up in the function.
No functional changes, only changing the order.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
58abb5fb53 ids: Copy unexisted ids from root_item
Ids were introduced one after another, so an old image
may lack some of them. Copy the missing ones from root_item.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
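Filling in ids that an old image lacks can be sketched as below. The structure imitates the optional-field convention of generated protobuf-c code (a `has_*` flag next to each value), but it is a hypothetical two-field stand-in, and `fill_missing_ids()` is not the function from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of per-task ids with presence flags. */
struct task_ids {
	bool has_pid_ns_id;
	uint32_t pid_ns_id;
	bool has_user_ns_id;
	uint32_t user_ns_id;
};

/* Copy every id that is absent in ids but present in the root item. */
static void fill_missing_ids(struct task_ids *ids, const struct task_ids *root)
{
	assert(ids != NULL && root != NULL);

	if (!ids->has_pid_ns_id && root->has_pid_ns_id) {
		ids->pid_ns_id = root->pid_ns_id;
		ids->has_pid_ns_id = true;
	}
	if (!ids->has_user_ns_id && root->has_user_ns_id) {
		ids->user_ns_id = root->user_ns_id;
		ids->has_user_ns_id = true;
	}
}
```

Ids that the image does contain are left untouched; only the gaps are filled from the root item.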
Kirill Tkhai
65fe92b94c pid: Add ns::pid::rb_root
Add a per-ns rb tree to link pids. Should replace global pid_root_rb.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
74cc3a4f5d pid: Add top_pid_ns
It's the topmost pid namespace that is seen by dumpees.
It's NS_ROOT if root_ns_mask has CLONE_NEWPID, and NS_CRIU
otherwise.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00
Kirill Tkhai
c59c372414 pstree: Read ids earlier in read_pstree_image()
Read ids before creation of item, then we'll know
pid_ns of the item, so later we will be able to
allocate item with right levels of pid (in next patches).

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
2017-11-30 01:22:17 +03:00