We need to know root_item's pid_ns nesting level
(i.e. whether its pid_ns is the same as criu's or not)
before collect_tasks(), so that pstree_item::pid::level
is populated correctly.
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since parse_thread_status() still does not use the thread passed to it,
moving the allocation up is just a refactoring.
Also make the index handling readable and simplify BUG_ON()
by using ">=".
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently this is a refactoring, as only level == 1 is allowed.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Implement a helper that returns the number of levels between
the NS_CRIU and NS_ROOT pid namespaces.
v2: Pre-dump tasks pid_ns in predump_task_ns_ids()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that a pid_ns created with a custom user_ns is restored correctly:
parent (pid_ns1, user_ns1)
  |
  v
child (pid_ns2, user_ns2)

pid_ns1 (of user_ns1)
  |
  v
pid_ns2 (of user_ns2)

user_ns1
  |
  v
user_ns2
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create parent (P) and its three children (C1, C2 and C3)
with different pid namespaces:
                P (pid_ns1)
               /|\
              / | \
             /  |  \
            /   |   \
           /    |    \
(pid_ns1) C1    C2    C3 (pid_ns1)
            (pid_ns2)
where pid_ns1 is a parent of pid_ns2:
pid_ns1
|
pid_ns2
Children C1, C2 and C3 are created in the written order,
i.e. C1 has the smallest pid and C3 has the biggest.
After receiving a signal, check that the pid namespaces
were restored correctly.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I need a named socket to communicate with the pid_ns helpers
(see the next patches) and to receive answers from them
(it's impossible to send an answer to an unnamed socket).
As we already have the transport socket, we'll reuse it
for this purpose too.
This patch makes transport sockets be created before
the children tasks are created. Also, they are now created
not only for alive tasks (so we need additional
manipulations for TASK_HELPERS, e.g., calling prepare_fdt()).
v5: Bring back the CLONE_FILES clone() argument for task helpers
creation. Also get rid of fdt_mutex, as CLONE_FILES processes
do not close old files after clone, and there are no
intersections between them. Also, the socket() system call
can't return an fd in the service fds range, which was the main
reason to have this mutex.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The next patches will create transport sockets in task helpers.
As helpers are forked using CLONE_FILES, they must resolve
shared fds to create their own service fds. This patch allows
that.
I've dug through the code, and there is no reason why we need
pid_rst_prio() when choosing the fdt restorer. So this
case may be safely deleted, which guarantees that, in the case
of a TASK_HELPER, the restorer of the fdt will be the parent, i.e.,
no TASK_HELPER will ever be the restorer of an fdt.
v5: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is a refactoring, which will be used in the next patches.
The BUG_ON() is there just to point out that the parent must be
set before this function is called.
v5: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
mntns_get_root_fd() may be called by a task from
!root_user_ns, and it fails in that case.
Put the root fd into fdstore so that every task can use it.
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the next patches usernsd will need to create a transport
socket in the same net_ns where the other tasks create their
TRANSPORT_FD_OFF sockets.
Choose criu's net_ns for that: this way usernsd does not
have to wait for the creation of other net_ns, i.e.
no new dependencies between tasks are introduced.
In case of (root_ns_mask & CLONE_NEWUSER) != 0,
root_item's user_ns does not allow restoring criu's net_ns,
so do prepare_net_namespaces() in a sub-process in order not to
lose criu's net_ns.
v3: Introduce __prepare_net_namespaces and execute it in cloned task.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Helper to free pstree_item and its components.
Also, use it in collect_children() to free
dynamically allocated components.
v4: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We already have a vpid() function. Turn it into a #define instead,
which allows using it in assignments:
vpid(item) = pid;
Also, introduce vsid(), vpgid() and vtid() as shown below:
#define vpid(item) (item->pid->ns[0].virt)
#define vsid(item) (item->sid->ns[0].virt)
#define vpgid(item) (item->pgid->ns[0].virt)
#define vtid(item, i) (item->threads[i]->ns[0].virt)
https://travis-ci.org/tkhai/criu/builds/225938195
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When --remote option is specified, read_local_page tries to pread from a
socket, and fails with "Illegal seek" error.
Restore single pread call for regular image files case and introduce
maybe_read_page_img_cache version of maybe_read_page method.
Generally-approved-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If there is a nested pid_ns, we need to be able to get the pid in
the whole pid hierarchy. This can only be obtained from the
"/proc/[pid]/status" file. Check that the kernel supports it.
v3: Add criu feature check
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A pid may contain more than one level, so this patch teaches the function
to work with such pids. The significant difference after this patch is that
we link a new item into several rb_roots, one in every ns.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Pass the namespace of the item to the function.
This will allow linking the pid into the correct ns::pid::root_rb
in the next patches.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Sanity check that we have pid_ns_id. As pid_ns_id has been dumped
ever since ids were implemented, it must always be present.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Alive tasks must have ids populated, while dead tasks do have
pid and user namespaces set (in Linux), but we have never
dumped ids for dead tasks.
Since nested pid ns is not supported yet, only one pid_ns
is possible in existing dumps, so it must be equal to
root_item's. This does not hold for user ns, but currently it's impossible
to know a dead task's user ns from the dump, so set it
to root_item's too.
Later we're going to dump ids for all tasks: see the next
patches for that. This patch only handles old images
that lack ids for dead tasks.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Move the block that finds the parent item up in the function.
No functional changes, only the order changes.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
It's the topmost pid namespace that is visible to the dumpees.
It's NS_ROOT if root_ns_mask has CLONE_NEWPID, and NS_CRIU
otherwise.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Read ids before creating the item; then we'll know
the pid_ns of the item, so later we will be able to
allocate the item with the right number of pid levels (in the next patches).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Pass vpid instead of pstree_item as input argument,
and return ids to caller. No functional changes here.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
This patch is a cleanup, which just makes the comparison
operate on values from a single pid level. It has no functional
effect, because the next patches turn off pgid setting
for multi-level pids until it is implemented.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
This allows comparing pid values across the whole hierarchy.
v3: Do not use break as some travis builds don't like it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
(Was "user_ns: Block SIGCHLD during namespaces generation")
We don't want an asynchronous signal handler to run during the creation
of namespaces (for example, in create_user_ns_hierarhy()),
as we call wait() synchronously there. So we need to block the signal.
Do this once, globally.
v2: Set initial ret = 0
v3: Block signal globally in root_item before its children
are created.
v4: Move block to prepare_namespace()
Suggested-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We're interested only in the just-created child. Other possible children
will be handled in the appropriate places later (the criu task may have
helper children).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
After the commit
02c763939c10 ("test/zdtm: unify common code")
CFLAGS with -D_GNU_SOURCE defined in the top Makefile
are being passed to tests Makefiles.
As _GNU_SOURCE is also defined by tests, that resulted in
zdtm tests build failures:
make[2]: Entering directory `/home/criu/test/zdtm/lib'
CC test.o
test.c:1:0: error: "_GNU_SOURCE" redefined [-Werror]
#define _GNU_SOURCE
^
<command-line>:0:0: note: this is the location of the previous definition
cc1: all warnings being treated as errors
make[2]: *** [test.o] Error 1
However, Travis-CI didn't catch this in time, as zdtm.py doesn't
do `make zdtm`; rather, it does `make -C test/zdtm/{lib,static,transition}`.
When the middle makefile is called this way, it doesn't get _GNU_SOURCE in
CFLAGS from the top Makefile.
I think the right thing to do here is to follow CRIU's way:
rely on the definition of _GNU_SOURCE by the Makefiles.
This patch is almost fully generated with
find test/zdtm/ -name '*.c' -type f \
-exec sed -i '/define _GNU_SOURCE/{n;/^$/d;}' '{}' \; \
-exec sed -i '/define _GNU_SOURCE/d' '{}' \;
With the exception of adding -D_GNU_SOURCE to the tests' Makefile.inc to
keep the same behaviour for zdtm.py.
Also changed utsname.c to use utsname::domainname rather than the private
utsname::__domainname, as it is now exposed (from sys/utsname.h):
> struct utsname
> {
...
> # ifdef __USE_GNU
> char domainname[_UTSNAME_DOMAIN_LENGTH];
> # else
> char __domainname[_UTSNAME_DOMAIN_LENGTH];
> # endif
Reported-by: Adrian Reber <areber@redhat.com>
Cc: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that user ns files kept in fdstore are opened
correctly after restore.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since the net ns is assigned after prepare_fds() and,
in the common case, at the moment of the open_ns_fd() call
the task points to a net ns which differs from its target
net ns, we can't get the ns from the task. So, get it
from fdstore. Also, support user ns fds.
v2: Add comment
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This improves uniformity. Also, this will be used in next patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Move the code to simplify it and to allow others to use this function.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This function may call functions like open_proc(),
so use CLONE_VM to make files opened by the child visible in
the parent's memory.
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This will be used in next patch.
Also, check for MAP_FAILED instead of NULL before munmap().
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This reverts a hunk from commit 4ad343c ("Use *open_proc* where
possible"), and adds a comment explaining why.
The bug was caught by ci [1] and wasn't caught by Travis because
the latter runs on an older kernel.
(00.271276) 1: Error (criu/util.c:204): fd 0 already in use
(called at criu/files.c:1008)
(00.292162) Error (criu/cr-restore.c:1127): 425 exited, status=1
(00.295802) Error (criu/cr-restore.c:2059): Restoring FAILED.
[1] https://ci.openvz.org/view/CRIU/job/CRIU/job/CRIU-snap/job/criu-dev/2079/consoleFull
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>