mirror of https://github.com/checkpoint-restore/criu synced 2025-08-27 04:18:27 +00:00

9635 Commits

Author SHA1 Message Date
Kirill Tkhai
3dbe8541b1 pstree: Dump threads pid_for_children_ns
It may differ from the group leader's pid_for_children_ns,
so dump it separately.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:28:55 +03:00
Kirill Tkhai
e3e1d8afd7 ns: Collect/read pid_for_children ns
Handle it like the other namespaces, except in two cases:
a just-unshared pid ns without a child reaper (fail in this case),
and a missing pid_for_children ns on restore (old images;
then take the pid_for_children ns from the pid ns).

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:28:55 +03:00
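A minimal sketch of the restore-side fallback for old images described above. The struct and field names are hypothetical; they only mimic the protobuf-c "has_" convention for optional fields and are not CRIU's generated image headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a task's ids image entry (not CRIU's real type). */
struct task_ids_img {
	uint32_t pid_ns_id;
	bool has_pid_for_children_ns_id;  /* false in images from older criu */
	uint32_t pid_for_children_ns_id;
};

/* Return the pid_for_children ns id to restore. Old images lack the
 * field, so fall back to the task's own pid ns id. */
static uint32_t pid_for_children_id(const struct task_ids_img *ids)
{
	if (ids->has_pid_for_children_ns_id)
		return ids->pid_for_children_ns_id;
	return ids->pid_ns_id;
}
```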
Kirill Tkhai
517b399745 ns: Use alternative name in set_ns_hookups() if need
Account for the fact that a ns may be found by its alternative
name, and use that name if so.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:28:55 +03:00
Kirill Tkhai
1323ecf207 ns: Make possible to avoid NS_ROOT assignment
pid_for_children_ns of root_item may differ from its pid_ns.
In this case we don't want to mark such a pid_for_children_ns
as NS_ROOT in nsid_add().

Also, we don't want to create a fake pid ns if pid_for_children_ns
is a just-created pid_ns without a child reaper (the kernel returns
ENOENT in this case). It is better to add kernel support for picking
up such namespaces later. So, we return UINT_MAX and fail.

We encode both possibilities using the single "alternative"
parameter, as that is enough for us. If anybody needs to separate
them in the future and introduce separate parameters, it would be
rather simple.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:28:55 +03:00
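A sketch of how a ns id lookup might turn the kernel's ENOENT into the UINT_MAX marker described above. Using stat() on the nsfs link and the helper name ns_link_id() are assumptions for illustration; this is not CRIU's __get_ns_id().

```c
#include <errno.h>
#include <limits.h>
#include <sys/stat.h>

/* Return the namespace inode number behind an nsfs link such as
 * "/proc/self/ns/pid_for_children", or UINT_MAX when the kernel reports
 * ENOENT (e.g. a just-unshared pid ns with no child reaper yet), which
 * the caller should treat as fatal. Other errors map to 0 here purely
 * for illustration. */
static unsigned int ns_link_id(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return errno == ENOENT ? UINT_MAX : 0;
	return (unsigned int)st.st_ino;
}
```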
Kirill Tkhai
e1c9383787 ns: Add alternative name for pid namespace
It may be read from "/proc/[pid]/ns/pid_for_children",
so add this alias to pid_ns_desc.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:28:55 +03:00
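A sketch of the alias idea, assuming a simplified, hypothetical descriptor struct (not CRIU's struct ns_desc). The kernel reports the link target of pid_for_children with the "pid" type prefix, e.g. "pid:[4026531836]", so one descriptor can accept both file names and parse both link targets.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical namespace descriptor with an optional alternative name. */
struct ns_desc_sketch {
	const char *str;     /* canonical name, also the link target prefix */
	const char *alt_str; /* alternative file name in /proc/[pid]/ns/, or NULL */
};

static const struct ns_desc_sketch pid_ns_desc_sketch = { "pid", "pid_for_children" };

/* True if "name" is either the canonical or the alternative file name. */
static bool ns_desc_matches(const struct ns_desc_sketch *nd, const char *name)
{
	return strcmp(name, nd->str) == 0 ||
	       (nd->alt_str && strcmp(name, nd->alt_str) == 0);
}

/* Parse a link target like "pid:[4026531836]" into *ino. Both
 * /proc/[pid]/ns/pid and /proc/[pid]/ns/pid_for_children resolve to
 * targets with the "pid" prefix. */
static bool parse_ns_link(const char *target, const struct ns_desc_sketch *nd,
			  unsigned long *ino)
{
	size_t len = strlen(nd->str);
	int n = -1;

	if (strncmp(target, nd->str, len) != 0)
		return false;
	/* %n is only stored if the closing ']' matched. */
	sscanf(target + len, ":[%lu]%n", ino, &n);
	return n >= 0 && target[len + n] == '\0';
}
```

For example, ns_desc_matches(&pid_ns_desc_sketch, "pid_for_children") is true, and parse_ns_link("pid:[4026531836]", &pid_ns_desc_sketch, &ino) stores 4026531836 in ino.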
Kirill Tkhai
d197ed96ed ns: Add possibility to read a ns by alternative name in __get_ns_id()
This is needed for "/proc/[pid]/ns/pid_for_children",
which is a pid namespace file with a non-standard name.
Also pass the "alternative" parameter to generate_ns_id().

v2: Enlarge buffer size

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:28:53 +03:00
Kirill Tkhai
4c340548c9 img: Add pid_for_children_ns_id description
Store task's /proc/[pid]/ns/pid_for_children

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:45 +03:00
Kirill Tkhai
0dc735940f kerndat: Check for /proc/[pid]/ns/pid_for_children_ns
Introduce a new extra feature, "pid_for_children_ns", which shows
whether the kernel exposes setns'ed pid namespaces.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eaa0d190bfe1ed891b814a52712dcd852554cb08

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:45 +03:00
Kirill Tkhai
6f4a4e1bbd dump: More obvious assignment of top_pid_ns
Currently it's set in generate_ns_id() and equals:
1) NS_CRIU, if root_ns_mask does not contain CLONE_NEWPID;
2) NS_ROOT, if it does.

The assignment is not obvious, as it's set to NS_CRIU first
and then overwritten with NS_ROOT, if that exists.

Mark top_pid_ns after ns_ids population in a clearer way.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:45 +03:00
Kirill Tkhai
c204fb101e restore: Make error path in restore_root_task() accurate
Don't call destroy_pid_ns_helpers() twice; add a separate case
for that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:45 +03:00
Kirill Tkhai
879f3a0b22 restore: Convert waited helpers and zombies pids in parent's pid_ns
The parent waits for them by pid in its active pid namespace, so
they need to be converted there instead of using vpid().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:45 +03:00
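To illustrate the conversion, here is a hypothetical model of a task's per-level pid numbers, outermost level first (similar in spirit to the NSpid line of /proc/[pid]/status). A helper created in a child pid ns might be pid 5 inside it but pid 1234 in the parent's active ns; waitpid() in the parent needs 1234, while an innermost, vpid()-style lookup would give 5. The struct and function names are assumptions.

```c
/* Hypothetical model: nr[i] is the task's pid as seen from pid ns level i,
 * level 0 being the outermost; "level" is the innermost level (< 8). */
struct pid_levels {
	int level;
	int nr[8];
};

/* The innermost number, i.e. what a vpid()-like helper would return. */
static int innermost_pid(const struct pid_levels *p)
{
	return p->nr[p->level];
}

/* The number a parent whose active pid ns sits at parent_level must pass
 * to waitpid(); -1 if the task is not visible from that level. */
static int pid_in_level(const struct pid_levels *p, int parent_level)
{
	if (parent_level < 0 || parent_level > p->level)
		return -1;
	return p->nr[parent_level];
}
```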
Kirill Tkhai
6566cddc10 ns: Move forked task user_ns assignment
The child reaper of a ns has an initial user_ns equal to its pid_ns->user_ns.
Keep all ns assignments together.

v2: Delete the assignment from call_clone_fn() and rename patch.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:44 +03:00
Kirill Tkhai
17b6de792d zdtm: Add pidns02 test (test on zombies)
1) Create a pid namespace and a child reaper in it;
2) Set a specific next pid for a future process;
3) Create one more process in the namespace and kill it;
4) Wait for a signal;
5) Check that the NSpids of the dead task remain the same.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:44 +03:00
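Step 5 can be sketched as parsing the NSpid line of /proc/[pid]/status (one pid per namespace level, outermost first) before and after C/R and comparing the values. The parser below is a hypothetical helper, not the zdtm test's code.

```c
#include <stdlib.h>
#include <string.h>

/* Parse the "NSpid:" line of /proc/[pid]/status text. Stores up to max
 * pids in out and returns how many were found, or -1 if there is no
 * NSpid line. Parsing stops at the end of that line. */
static int parse_nspid(const char *status, int *out, int max)
{
	const char *p = strstr(status, "NSpid:");
	int n = 0;

	if (!p)
		return -1;
	p += strlen("NSpid:");
	while (n < max) {
		char *end;

		while (*p == ' ' || *p == '\t')
			p++;
		if (*p < '0' || *p > '9')  /* newline or end of string */
			break;
		out[n++] = (int)strtol(p, &end, 10);
		p = end;
	}
	return n;
}
```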
Kirill Tkhai
6c143e5eba pstree: Add helpers for ordered linking child task to parent
Place child reapers of pid namespaces at the beginning
of the pstree_item::children list and sort them by nesting
level.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:44 +03:00
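A sketch of the ordering rule on a plain singly linked list (not CRIU's list_head). The ascending sort direction and the reaper_level field are assumptions; the commit only says that reapers go first, sorted by nesting level.

```c
#include <stddef.h>

/* Hypothetical child-list node: reaper_level is the nesting level of the
 * pid ns this child is INIT_PID of, or -1 if it is not a child reaper. */
struct child {
	int reaper_level;
	struct child *next;
};

/* Insert c so that child reapers stay at the head, ordered by ascending
 * nesting level, and ordinary children follow in insertion order. */
static void link_child_ordered(struct child **head, struct child *c)
{
	struct child **pos = head;

	if (c->reaper_level < 0) {
		while (*pos)                    /* ordinary child: append */
			pos = &(*pos)->next;
	} else {
		while (*pos && (*pos)->reaper_level >= 0 &&
		       (*pos)->reaper_level <= c->reaper_level)
			pos = &(*pos)->next;
	}
	c->next = *pos;
	*pos = c;
}
```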
Kirill Tkhai
d82cd43b78 zdtm: Make possible to claim for features list
Currently, only one feature is supported. Add the possibility
for a test to depend on several features.

v2: Delete excess "if" as suggested by Andrey Vagin.
    Rename variables to reduce patch size.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:26:43 +03:00
Pavel Emelyanov
839837b99a pstree: Add extern to "current" declaration
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:59 +03:00
Kirill Tkhai
e997d34b0c criu: Add raw fork() implementation
Glibc has a bug in process creation:
https://sourceware.org/bugzilla/show_bug.cgi?id=21386

It doesn't behave well when the parent and child are in
different pid namespaces and have the same pid.

Use a raw syscall without glibc's asserts as a workaround.

Also use a raw syscall for getpid() in tests,
as these two functions go together (glibc's getpid()
relies on glibc's fork()).

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:59 +03:00
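A minimal sketch of the workaround: call the fork and getpid system calls directly through syscall(2), bypassing the glibc wrappers. The clone(SIGCHLD, 0, ...) fallback for architectures without SYS_fork (e.g. aarch64) and the helper names are assumptions, not CRIU's code.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() without the glibc wrapper and its pid-related assertions. */
static pid_t raw_fork(void)
{
#ifdef SYS_fork
	return syscall(SYS_fork);
#else
	/* No fork syscall here: clone with only SIGCHLD behaves like fork(). */
	return syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0);
#endif
}

/* getpid() without the glibc wrapper, paired with raw_fork(). */
static pid_t raw_getpid(void)
{
	return syscall(SYS_getpid);
}

/* Fork a child that exits with "code" and return its exit status. */
static int raw_fork_exit_code(int code)
{
	int status;
	pid_t pid = raw_fork();

	if (pid == 0)
		_exit(code);  /* child: leave without touching glibc state */
	if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```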
Pavel Tikhomirov
ad741662b8 pstree: use RB_EMPTY_NODE to check that node is not linked
When a new rb_root is created for a pidns, it is initialized with
RB_ROOT, so ns->pid.rb_root.rb_node is NULL at first. Later, when
the first node is inserted into this rb-tree in lookup_create_pid(),
its node->rb_parent_color holds a NULL parent combined with the color bits.

So the check "!rb_parent(&found->ns[i].node)" is true for
the rb-tree's root node, and criu fails to look up this node.

We haven't hit this yet because reaching this check requires a task in
at least two levels of pidns that is, at the same time, the root
of the rb-tree on, e.g., level 0.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:59 +03:00
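A reduced model of the problem, assuming kernel-style rbtree macros in which the parent pointer and the color share one word and an unlinked node is marked by pointing its parent at itself. A freshly inserted root has a NULL parent, so a "!rb_parent()" test wrongly reports it as unlinked, while RB_EMPTY_NODE() does not. The make_tree_root() helper is hypothetical.

```c
#include <stddef.h>

/* Minimal kernel-style rb node: parent pointer and color in one word. */
struct rb_node {
	unsigned long rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

#define RB_RED   0UL
#define RB_BLACK 1UL

#define rb_parent(r)     ((struct rb_node *)((r)->rb_parent_color & ~3UL))
#define RB_CLEAR_NODE(n) ((n)->rb_parent_color = (unsigned long)(n))
#define RB_EMPTY_NODE(n) (rb_parent(n) == (n))

/* State left by inserting the first node into an empty tree:
 * NULL parent, colored black. */
static void make_tree_root(struct rb_node *n)
{
	n->rb_parent_color = 0UL | RB_BLACK;
	n->rb_left = n->rb_right = NULL;
}
```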
Cyrill Gorcunov
d0ffc478ea sfd: Lift up own fd limit on bootup
This minimizes the chance of hitting a problem where files
used for page transfer try to use the same number as one
reserved for a service fd.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2017-11-30 01:24:58 +03:00
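A sketch of lifting the limit with getrlimit(2) and setrlimit(2). The helper name is hypothetical; it assumes, as the commit implies, that service fds are reserved relative to the fd limit, so a higher soft limit keeps them further away from ordinary fds.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to the hard limit. Returns 0 on success
 * or if nothing needed changing, -1 on error. */
static int lift_nofile_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return -1;
	if (rl.rlim_cur == rl.rlim_max)
		return 0;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}
```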
Cyrill Gorcunov
b55eb53c1e kdat: Add fetching files stat
We will need it to unlimit file allocation for
service fd reservation, and later for running parasite code
(which is implemented in the vz7 instance and will soon be
ported to vanilla).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
2017-11-30 01:24:32 +03:00
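A sketch of reading system-wide file statistics. Which procfs files CRIU's kdat actually reads is not stated here; /proc/sys/fs/file-nr is an assumption, and the struct and helpers are hypothetical.

```c
#include <stdio.h>

/* /proc/sys/fs/file-nr holds three numbers: allocated file handles,
 * allocated-but-unused handles (reported as 0 since Linux 2.6) and the
 * system-wide maximum, as in /proc/sys/fs/file-max. */
struct files_stat_sketch {
	unsigned long long nr_files;
	unsigned long long nr_free_files;
	unsigned long long max_files;
};

static int parse_file_nr(const char *text, struct files_stat_sketch *fs)
{
	return sscanf(text, "%llu %llu %llu", &fs->nr_files,
		      &fs->nr_free_files, &fs->max_files) == 3 ? 0 : -1;
}

static int read_file_nr(struct files_stat_sketch *fs)
{
	char buf[128];
	FILE *f = fopen("/proc/sys/fs/file-nr", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_file_nr(buf, fs);
	fclose(f);
	return ret;
}
```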
Kirill Tkhai
8cd8f9736e files: Unexport collect_task_fd()
It has only one user, so unexport it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
2c2a253354 autofs: Add FD_TYPES__AUTOFS_PIPE type
Add a fake fd type for autofs. This allows functions
like find_file_desc() to work as expected, without
having two different file_desc with the same type
and id.

Later, it will also allow deleting autofs_create_fle()
and using a generic helper.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
522d9c7180 utils: Make call_in_child_process() use parent's stack
1) Use CLONE_VFORK to create the subprocess, as it's safe after patch
"clone_noasan: Allow to create CLONE_VM|CLONE_VFORK processes".

2) Add more CLONE_XXX flags to speed up the syscall.

3) Do not send SIGCHLD, as the parent sees the child's exit() synchronously anyway.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
0315082e06 clone_noasan: Allow to create CLONE_VM|CLONE_VFORK processes
Picked from patch "[PATCH RFC] namespaces: use CLONE_VFORK
with CLONE_VM when it is possible" by Andrew Vagin.

Currently the parent touches the child's stack, as at the moment of
the clone() call its stack pointer is above the child's (we allocate
char stack[128] on the parent's stack). This prevents creating
CLONE_VM|CLONE_VFORK processes, because the child uses stack
addresses occupied by the parent.

The patch changes clone_noasan() behaviour and allows this
with the same memory consumption. We give the child memory that
is not used by the parent's clone(), so the parent's and child's
stacks do not intersect.

This allows creating CLONE_VM|CLONE_VFORK processes.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
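A minimal sketch of a CLONE_VM|CLONE_VFORK child using glibc's clone(). Unlike the commit, which gives the child part of the parent's stack area that clone() does not use, this version swaps in a heap-allocated stack; the property that matters, no overlap between the parent's and the child's stacks, is the same. Names and the stack size are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

struct vfork_args {
	int in;
	int out;
};

/* Runs on its own stack but in the parent's address space (CLONE_VM),
 * so the write to a->out is visible to the parent. */
static int child_fn(void *arg)
{
	struct vfork_args *a = arg;

	a->out = a->in * 2;
	return 0;
}

/* Run child_fn in a CLONE_VM|CLONE_VFORK child whose stack cannot
 * overlap the frames the parent is still using inside clone().
 * Returns the child's result, or -1 on error. */
static int run_in_vfork_child(int value)
{
	struct vfork_args args = { .in = value, .out = -1 };
	char *stack = malloc(CHILD_STACK_SIZE);
	int status;
	pid_t pid;

	if (!stack)
		return -1;
	/* No exit signal: CLONE_VFORK already suspends the parent until the
	 * child exits, and __WALL lets waitpid() reap such a child. */
	pid = clone(child_fn, stack + CHILD_STACK_SIZE,
		    CLONE_VM | CLONE_VFORK, &args);
	if (pid < 0) {
		free(stack);
		return -1;
	}
	waitpid(pid, &status, __WALL);
	free(stack);
	return args.out;
}
```

Here run_in_vfork_child(21) returns 42: the child writes through the shared address space, and the parent reads the result only after CLONE_VFORK has let it resume.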
Kirill Tkhai
81097c837b pid_ns: Close sk in case of pid_ns_helper_sock() fails
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
943424a281 pid_ns: Do cleanups in do_create_pid_ns_helper()
Move switch_ns() down because __pstree_pid_by_virt()
does not need cleanup.
Add more goto labels and restore the ns back on failure.

Also delete pr_err(), because the error is already printed
by request_set_next_pid().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
d4e1c5fb44 forking: Use last_pid_mutex for synchronization during clone()
Before this patch we used flock to order task creation,
but this approach is not good: it took 5 syscalls to synchronize
the creation of a single child:

1)open()
2)flock(LOCK_EX)
3)flock(LOCK_UN)
4)close() in parent
5)close() in child

The patch introduces a more efficient way of synchronization,
which executes only 2 syscalls. We use last_pid_mutex,
and the syscall count is definitely better.

v2: Don't use flock() at all

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
e032b85c51 forking: Introduce last_pid_mutex and helpers
Introduce a mutex for synchronizing access to the ns_last_pid
file on restore.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
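A sketch of a futex-based mutex kept in MAP_SHARED memory, so that a parent and its forked children can serialize a critical section such as "write ns_last_pid, then clone()". It follows the three-state mutex from Ulrich Drepper's "Futexes Are Tricky" and is not CRIU's own mutex implementation; the ns_last_pid write is left out because it requires privileges, and a shared counter stands in for it. Helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* States: 0 unlocked, 1 locked, 2 locked with possible waiters. */
static void mutex_lock_sketch(atomic_int *m)
{
	int c = 0;

	if (atomic_compare_exchange_strong(m, &c, 1))
		return;                          /* uncontended: no syscall */
	if (c != 2)
		c = atomic_exchange(m, 2);
	while (c != 0) {
		/* Non-private futex: the word lives in MAP_SHARED memory. */
		syscall(SYS_futex, (int *)m, FUTEX_WAIT, 2, NULL, NULL, 0);
		c = atomic_exchange(m, 2);
	}
}

static void mutex_unlock_sketch(atomic_int *m)
{
	if (atomic_fetch_sub(m, 1) != 1) {   /* there may be waiters */
		atomic_store(m, 0);
		syscall(SYS_futex, (int *)m, FUTEX_WAKE, 1, NULL, NULL, 0);
	}
}

struct shared_area {
	atomic_int lock;
	int counter;
};

/* Parent and child each run "iterations" locked increments; returns the
 * final counter, or -1 on setup failure. */
static int shared_counter_demo(int iterations)
{
	struct shared_area *sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
				      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	pid_t pid;
	int i, result;

	if (sh == MAP_FAILED)
		return -1;
	atomic_init(&sh->lock, 0);
	sh->counter = 0;
	pid = fork();
	if (pid < 0) {
		munmap(sh, sizeof(*sh));
		return -1;
	}
	for (i = 0; i < iterations; i++) {
		mutex_lock_sketch(&sh->lock);
		sh->counter++;                   /* the critical section */
		mutex_unlock_sketch(&sh->lock);
	}
	if (pid == 0)
		_exit(0);
	waitpid(pid, NULL, 0);
	result = sh->counter;
	munmap(sh, sizeof(*sh));
	return result;
}
```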
Kirill Tkhai
a90aad23a4 namespace: Group unlocking/closing operations in do_create_pid_ns_helper()
Group them for the 1) error and 2) parent cases. This minimizes the code
and will be used in the next patches.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
e31ad5f195 pid_ns: Move parent pid_ns's helper check to create_pid_ns_helper()
It's impossible to create a task from a pid_ns if its helper
has not been created, because we wait for that in
wait_pid_ns_helper_prepared(). So such a situation here is a bug.
Move the wait and convert it to BUG().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
56719a3e52 pid_ns: Simplify do_create_pid_ns_helper() using ns helpers
Get the pid_ns fd from the INIT_PID task of this namespace and
use switch_ns() and restore_ns().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
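A sketch of the switch/restore pattern with setns(2). The signatures are assumptions and differ from CRIU's switch_ns() and restore_ns(). For CLONE_NEWPID, setns() changes only the namespace of the caller's future children (its pid_for_children), not the caller's own pid ns.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Enter the namespace referred to by target_fd, saving an fd for the
 * current one in *saved_fd. "name" is the entry under /proc/self/ns
 * (e.g. "pid"); nstype is the matching CLONE_NEW* flag. */
static int switch_ns_sketch(int target_fd, const char *name, int nstype,
			    int *saved_fd)
{
	char path[64];
	int fd;

	snprintf(path, sizeof(path), "/proc/self/ns/%s", name);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	if (setns(target_fd, nstype) < 0) {
		close(fd);          /* nothing changed, drop the saved fd */
		return -1;
	}
	*saved_fd = fd;
	return 0;
}

/* Go back to the namespace saved by switch_ns_sketch(). */
static int restore_ns_sketch(int saved_fd, int nstype)
{
	int ret = setns(saved_fd, nstype);

	close(saved_fd);
	return ret;
}
```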
Kirill Tkhai
3ba6ed7fcd pid_ns: Pass namespace init task to do_create_pid_ns_helper()
This will be used in the next patch.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
f88ed85618 pid_ns: Rename do_destroy_pid_ns_helper()
It should be named do_destroy_pid_ns_helpers(), with an *s*.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
a6505c5f56 restore: Always set real pid in restore_task_with_children()
In the next patches, root_item will need to have its real pid,
to be sure usernsd already sees it.

Also add a comment explaining why the real pid is set in two places.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
6191677b75 restore: Delete excess code in call_clone_fn()
We never call this function for root_item.
The code is for dropping the user ns, which may happen
only for the rest of the tasks.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:32 +03:00
Kirill Tkhai
f7b1e3fc77 restore: Simplify do_fork_with_pid() #2
Move xfree() up

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
1314e0d2e8 restore: Simplify do_fork_with_pid()
memcpy() is not needed here, as we rewrite all the fields later.
Also use the PID_SIZE() helper.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
fce6be3c9b zdtm: Mark ns tests as auto
Check the features and delete "noauto".

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
20d1d08f30 ns: Add ns_get_parent() feature
Check for NS_GET_PARENT nsfs ioctl().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
10ed43f2d5 ns: Add ns_get_userns() feature
Check for NS_GET_USERNS nsfs ioctl().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
c739646659 zdtm: Fixup netns_sub_veth test hang
This patch fixes a test hang that happens in my environment.

==================== Run zdtm/static/netns_sub_veth in uns =====================
Start test
Test is SUID
./netns_sub_veth --pidfile=netns_sub_veth.pid --outfile=netns_sub_veth.out

==== ALARM ====
  PID TTY      STAT   TIME COMMAND
 1991 ?        Ssl    0:40  \_ /usr/lib/gnome-terminal/gnome-terminal-server
 2124 pts/1    Ss+    0:00  |   \_ bash
 2416 pts/2    Ss+    0:00  |   \_ bash
 4064 pts/4    Ss     0:00  |   \_ bash
 4075 pts/4    S      0:00  |   |   \_ su
 4085 pts/4    S      0:00  |   |       \_ bash
 1556 pts/4    S+     0:00  |   |           \_ python2 ./test/zdtm.py run -t zdtm/static/netns_sub_veth
 1590 pts/4    S+     0:00  |   |               \_ ./zdtm_ct zdtm.py
 1605 pts/4    S+     0:00  |   |               |   \_ python2 zdtm.py
 1616 pts/4    S+     0:00  |   |               |       \_ python2 zdtm.py
 1960 pts/4    S+     0:00  |   |               |           \_ make --no-print-directory -C zdtm/static netns_sub_veth.pid
 1969 pts/4    S+     0:00  |   |               |               \_ ./netns_sub_veth --pidfile=netns_sub_veth.pid --outfile=netns_sub_veth.out
 1970 ?        Ss     0:00  |   |               |                   \_ ./netns_sub_veth --pidfile=netns_sub_veth.pid --outfile=netns_sub_veth.
 1973 ?        S      0:00  |   |               |                       \_ ./netns_sub_veth --pidfile=netns_sub_veth.pid --outfile=netns_sub_v
 1974 ?        Ss     0:00  |   |               |                           \_ ./netns_sub_veth --pidfile=netns_sub_veth.pid --outfile=netns_s
 1975 ?        Z      0:00  |   |               |                               \_ [netns_sub_veth] <defunct>
 1979 pts/4    R+     0:00  |   |               \_ ps axf

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
de463c61b3 pid_ns: Destroy helpers via kill()
When the INIT_PID of a pid_ns exits unexpectedly, the kernel
kills all processes belonging to the namespace.
So it's hopeless to wait for a helper's answer to the destroy
request. Use kill() to destroy it instead.
It will be a no-op if the helper is already
killed, and we won't get stuck.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
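A sketch of destroying a helper by signal rather than by request and reply. SIGKILL cannot be caught or blocked, so nothing depends on the helper being alive to answer, and ESRCH is treated as success. The helpers are hypothetical and use a plain forked child as a stand-in for a pid ns helper.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a pid ns helper: a child that just waits. */
static pid_t spawn_helper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/* Kill the helper without waiting for any answer. kill() on a not yet
 * reaped zombie still succeeds; ESRCH (already gone) counts as success. */
static int destroy_helper(pid_t pid)
{
	if (kill(pid, SIGKILL) < 0 && errno != ESRCH)
		return -1;
	return 0;
}

/* Reap the helper; returns 1 if SIGKILL terminated it. */
static int reap_helper(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```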
Kirill Tkhai
e299352705 zdtm: Return tun test back as "auto"
After the last patches for net ns the test works again (as the environment changed),
so return it to "auto".

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
4677d05e1a pstree: Delete checks of always existing pstree_item::ids on restore
pid, net, ipc, uts and mnt ids always exist, and we check
for them when reading the ids image (see the previous
patch "pstree: Check for always existing task ids").
pstree_item::ids always exist too (we populate
them even for dead tasks; see read_pstree_image()).
So delete the excess checks and simplify the code.

Also, in restore_one_alive_task(), check for has_user_ns_id
instead of ids, as ids always exist.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
3396508044 pstree: undef ADD_OR_COPY_ID()
Limit the scope of this macro and make its boundaries visible.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
6748de2720 pstree: Fix alignment in read_pstree_ids()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
594c69206c pstree: Delete excess check in read_pstree_ids()
has_pid_ns_id is checked above. This could go together with
the previous patch, but I separated them for easier review.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
80e9ca5b01 pstree: Check for always existing task ids on restore
All alive tasks must have ids and the fields
implemented before the image format became stable
(see commit 2105e18eee70).

Check for them in a single place (in addition to the
check for has_pid_ns_id, which we already have).
This will allow removing the checks for item->ids and
item->ids->has_xxx_ns_id from the rest of the code
and making it simpler (see the patch "pstree: Delete checks
of always existing pstree_item::ids on restore" later in the series).

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
b3bbece27f pstree: Delete excess check it read_pstree_image()
This is already checked when we check parent->ids,
so delete the excess check.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00
Kirill Tkhai
a58717c908 namespaces: Silence coverity on get_service_fd()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-11-30 01:24:31 +03:00