In preparation for the zdtm option '--check-only' a new helper function
reset_pid() is added which writes to ns_last_pid to avoid PID collisions
during check-only restore and the real restore.
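
A minimal sketch of what such a helper might look like, assuming it only
writes the desired value to ns_last_pid so that the next created task gets
the following PID. The `path` parameter and the default value are assumptions
added for illustration; they are not taken from zdtm.py.

```python
def reset_pid(pid=1, path="/proc/sys/kernel/ns_last_pid"):
    """Set the last allocated PID so the next new task gets pid + 1."""
    # Writing to the real file requires sufficient privileges in the
    # pid namespace; the path is a parameter only to keep the sketch testable.
    with open(path, "w") as f:
        f.write(str(pid))
```

Calling it once before the check-only restore and once before the real
restore would make both runs start from the same PID.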
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1)Create a socket, bind it, then create a child in lower user, pid and net ns.
2)Close socket in parent
3)After signal, check that child can create the socket with the same name.
(It must, as it's in another net namespace).
v2: Add uid/gid mapping.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Plain wait() waits only for children created with the SIGCHLD flag.
Add the flag.
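
For illustration, the exit signal lives in the low byte of the clone() flags,
so adding SIGCHLD is a plain bitwise OR. This is a sketch only: the helper
name `clone_flags` is hypothetical, and the constants are copied from
<linux/sched.h>.

```python
import signal

CLONE_NEWPID = 0x20000000  # from <linux/sched.h>
CSIGNAL = 0x000000FF       # mask of the exit-signal bits in clone() flags

def clone_flags(ns_flags):
    """Combine namespace flags with SIGCHLD so plain wait() sees the child."""
    return ns_flags | int(signal.SIGCHLD)

flags = clone_flags(CLONE_NEWPID)
# The exit signal can be read back from the low byte of the flags.
print(hex(flags), (flags & CSIGNAL) == signal.SIGCHLD)
```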
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Wait for the child before daemonizing so that
zdtm.py cannot see the child's fds and maps before it
becomes a zombie.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Wait for the child before daemonizing so that
zdtm.py cannot see the child's fds and maps before it
becomes a zombie.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The leader of session 15 (20) is in the first pidns, one of its processes
is in the second pidns and one is
in the third. So we create two helpers here, one for each additional
pidns.
(It is critical that
The full test now looks like this (note that the pids here are the real ones
and differ from their ids in the source code, e.g. 15 is 20 here):
(pid,ppid,sid)
session04(1, 0, 1)───session04(4, 1, 4)───session04(5, 4, 4)───session04(6, 5, 6,pid1)─┬─session04(8, 6, 8)───session04(9, 8, 7)
                                                                                       ├─session04(10, 6, 6)───session04(11, 10, 11)
                                                                                       ├─session04(13, 6, 13)───session04(14, 13, 11)
                                                                                       ├─session04(15, 6, 15)
                                                                                       ├─session04(17, 6, 17)─┬─session04(18, 17, 15)
                                                                                       │                      └─session04(19, 17, 17,pid2)───session04(22, 19, 20)
                                                                                       ├─session04(20, 6, 20)
                                                                                       └─session04(23, 6, 6,pid3)───session04(25, 23, 20)
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Require the ns_pid, ns_get_userns and ns_get_parent features; otherwise the
test fails with a "Can't do ns ioctl" error in criu:set_ns_opt().
v2: Remove unused variable i in cleanup
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Before "pstree: rework init reparent handling for pid namespaces" patch
we would get:
$ ./test/zdtm.py run -t zdtm/static/session01
=== Run 1/1 ================ zdtm/static/session01
======================= Run zdtm/static/session01 in ns ========================
Start test
./session01 --pidfile=session01.pid --outfile=session01.out
Run criu dump
Run criu restore
=[log]=> dump/zdtm/static/session01/31/1/restore.log
------------------------ grep Error ------------------------
(00.001103) 8 was born with sid 4
(00.001105) 7 was born with sid 4
(00.001106) 21 was born with sid 17
(00.001108) 1 was born with sid 17
(00.001109) Error (criu/pstree.c:1005): Can't find a session leader for 17
------------------------ ERROR OVER ------------------------
Corresponding tree before dump:
(combined 'pstree -pS 1' and 'ps axf -o pid,ppid,sid')
session01(1, 0, 1)─┬─session01(3, 1, 1)───session01(4, 3, 4)─┬─session01(5, 4, 5)─┬─session01(23, 5, 5)
                   │                                         │                    ├─session01(24, 5, 5)
                   │                                         │                    └─session01(26, 5, 5)
                   │                                         ├─session01(6, 4, 4)
                   │                                         ├─session01(7, 4, 7)───session01(16, 7, 4)
                   │                                         └─session01(8, 4, 8)───session01(15, 8, 15)───session01(20, 15, 4)
                   ├─session01(12, 1, 12)───session01(17, 12, 17)───session01(18, 17, 18)───session01(27, 18, 4)
                   ├─session01(13, 1, 10)
                   ├─session01(14, 1, 4)
                   └─session01(21, 1, 21)───session01(22, 21, 17)
22 cannot be restored as it needs session 17, but the leader of session 17 is
not among its ancestors (21 had been reparented from 17; 12, 13 and 14 from 4).
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create a child in a new pid_ns; the child then creates a thread and a zombie.
The zombie is in a second newly created pid_ns. Then the great parent
does setns() to its active pid_ns. So, let's draw the table:
                      pid_ns vs pid_for_children_ns
great parent:         equal
child:                not equal
child thread:         equal
grand child zombie:   zombies don't have pid_for_children_ns
After the signal, check that everything remains the same.
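
The "equal / not equal" column above can be observed from outside a task by
comparing the two namespace files under /proc. A sketch, assuming a kernel
that exposes /proc/<pid>/ns/pid_for_children; the helper names are
hypothetical and this is not the check used by the test itself.

```python
import os

def ns_inode(pid, name):
    """Return the inode of /proc/<pid>/ns/<name>, or None if unreadable."""
    try:
        return os.stat("/proc/%s/ns/%s" % (pid, name)).st_ino
    except OSError:
        return None

def pid_ns_matches_children_ns(pid="self"):
    """True/False when both files are readable, None otherwise."""
    own = ns_inode(pid, "pid")
    children = ns_inode(pid, "pid_for_children")
    if own is None or children is None:
        return None  # e.g. no such task, or the file is not available
    return own == children
```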
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1)Create a pid namespace and child reaper in it;
2)Set a specific next pid for future created process;
3)Create one more process in the namespace and kill it;
4)Wait for signal
5)Check that the NSpids of the dead task remain the same.
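
One way to compare NSpids before and after C/R is to parse the NSpid line of
/proc/<pid>/status. The parser below is an illustrative sketch; the function
name and the sample text are assumptions, not code from the test.

```python
def parse_nspid(status_text):
    """Return the list of PIDs from the 'NSpid:' line, outermost ns first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return None  # the line is absent on kernels without NSpid support

sample = "Name:\ttest\nState:\tZ (zombie)\nNSpid:\t1234\t5\n"
print(parse_nspid(sample))  # [1234, 5]
```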
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, only one feature per test is supported. Add the possibility
for a test to depend on several features.
v2: Delete excess "if" as suggested by Andrey Vagin.
Rename variables to decrease patch size.
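
A sketch of how a multi-feature dependency could be evaluated, assuming the
test description carries the features as one space-separated string. The
helper `missing_features` and the exact description format are assumptions
made for this example.

```python
def missing_features(desc, available):
    """Return the features named in desc['feature'] that are not available."""
    wanted = desc.get("feature", "").split()
    return [feat for feat in wanted if feat not in available]

desc = {"feature": "ns_pid ns_get_userns ns_get_parent"}
print(missing_features(desc, {"ns_pid", "ns_get_parent"}))  # ['ns_get_userns']
```

A test would then be skipped whenever the returned list is not empty.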
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Glibc has a bug in process creation:
https://sourceware.org/bugzilla/show_bug.cgi?id=21386
It doesn't behave well when parent and child are from
different pid namespaces and have the same pid.
Use the raw syscall without glibc's asserts as a workaround.
Also, use the raw syscall for getpid() in tests too,
as these two functions go as a pair (glibc's getpid()
relies on glibc's fork()).
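
The tests are written in C, but the idea of bypassing the libc wrapper can be
shown in Python through ctypes. This is a sketch under stated assumptions:
the syscall numbers are listed only for 64-bit x86_64 and aarch64 Linux, and
the function returns None anywhere else instead of guessing.

```python
import ctypes
import os
import platform
import sys

# getpid syscall numbers for 64-bit Linux ABIs (from the kernel syscall tables).
SYS_GETPID = {"x86_64": 39, "aarch64": 172}

def raw_getpid():
    """Call getpid through syscall(2) instead of the libc getpid() wrapper."""
    nr = SYS_GETPID.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if nr is None or not is_64bit or not sys.platform.startswith("linux"):
        return None  # unsupported platform for this sketch
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.syscall(nr)

print(raw_getpid(), os.getpid())
```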
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
After the latest net ns patches the test works again (as the environment changed),
so bring it back.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that a pid_ns created with a custom user_ns is restored correctly:
parent (pid_ns1, user_ns1)
|
v
child (pid_ns2, user_ns2)
pid_ns1 (of user_ns1)
|
v
pid_ns2 (of user_ns2)
user_ns1
|
v
user_ns2
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create parent (P) and its three children (C1, C2 and C3)
with different pid namespaces:
                P (pid_ns1)
               /|\
              / | \
             /  |  \
            /   |   \
           /    |    \
(pid_ns1) C1    C2    C3 (pid_ns1)
            (pid_ns2)
where pid_ns1 is a parent of pid_ns2:
pid_ns1
|
pid_ns2
Children C1, C2 and C3 are created in the written order,
i.e. C1 has the smallest pid and C3 has the biggest.
After receiving the signal, check that the pid namespaces
are restored correctly.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In the next patches usernsd will need to create a transport
socket in the same net_ns where other tasks create their
TRANSPORT_FD_OFF sockets.
Choose criu's net_ns for that: this allows usernsd
not to wait for the creation of other net_ns, i.e.
not to introduce new dependencies between tasks.
In case of (root_ns_mask & CLONE_NEWUSER) != 0
root_item's user_ns does not allow restoring criu's net_ns,
so do prepare_net_namespaces() in a sub-process in order not to
lose criu's net.
v3: Introduce __prepare_net_namespaces and execute it in cloned task.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
After the commit
02c763939c10 ("test/zdtm: unify common code")
CFLAGS with -D_GNU_SOURCE defined in the top Makefile
are being passed to tests Makefiles.
As _GNU_SOURCE is also defined by tests, that resulted in
zdtm tests build failures:
make[2]: Entering directory `/home/criu/test/zdtm/lib'
CC test.o
test.c:1:0: error: "_GNU_SOURCE" redefined [-Werror]
#define _GNU_SOURCE
^
<command-line>:0:0: note: this is the location of the previous definition
cc1: all warnings being treated as errors
make[2]: *** [test.o] Error 1
However, we didn't catch this in time with Travis-CI, as zdtm.py doesn't
do `make zdtm`, rather it does `make -C test/zdtm/{lib,static,transition}`.
When the middle makefile is called this way, it doesn't get _GNU_SOURCE in
CFLAGS from the top Makefile.
I think the right thing to do here is to follow CRIU's way:
rely on the definition of _GNU_SOURCE by Makefiles.
This patch is almost fully generated with
find test/zdtm/ -name '*.c' -type f \
-exec sed -i '/define _GNU_SOURCE/{n;/^$/d;}' '{}' \; \
-exec sed -i '/define _GNU_SOURCE/d' '{}' \;
The exception is the addition of -D_GNU_SOURCE to the tests' Makefile.inc,
which keeps the same behaviour for zdtm.py.
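
For readers less used to sed, the two passes of the command above can be
mirrored in Python as follows. This is only an illustrative equivalent, not
part of the patch; the function name is hypothetical.

```python
MARK = "define _GNU_SOURCE"

def strip_gnu_source(text):
    """Mirror the two sed passes on the contents of one source file."""
    lines = text.splitlines(keepends=True)
    kept = []
    i = 0
    # Pass 1, like '/define _GNU_SOURCE/{n;/^$/d;}': keep the define for now,
    # look at the next line and drop it only if it is empty.
    while i < len(lines):
        kept.append(lines[i])
        if MARK in lines[i] and i + 1 < len(lines):
            i += 1
            if lines[i].rstrip("\n") != "":
                kept.append(lines[i])
        i += 1
    # Pass 2, like '/define _GNU_SOURCE/d': drop the define lines themselves.
    return "".join(line for line in kept if MARK not in line)

print(strip_gnu_source("#define _GNU_SOURCE\n\n#include <stdio.h>\n"), end="")
```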
Also changed utsname.c to use utsname::domainname rather than the private
utsname::__domainname, as it is now exposed (from sys/utsname.h):
> struct utsname
> {
...
> # ifdef __USE_GNU
> char domainname[_UTSNAME_DOMAIN_LENGTH];
> # else
> char __domainname[_UTSNAME_DOMAIN_LENGTH];
> # endif
Reported-by: Adrian Reber <areber@redhat.com>
Cc: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that user ns files kept in the fdstore are opened
correctly after restore.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have a test case for external veth devices. This test case
checks veth devices which live in two dumped network
namespaces.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Differs from the userns01 test by unsharing the net ns in the child.
This should test nested user/net ns interaction.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that UID and GID in an unshared userns remain the same.
v5: Use custom UID and GID.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create two children, and unshare() user_ns in one of them (C1).
The second child creates one more process, which switches to C1's
namespace and unshares.
v4: Keep in mind the case, when readlink returns PATH_MAX-length string.
Print full wait status instead of WEXITSTATUS().
v3: Unshare net ns in grand child
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This test creates a few processes which live in three network namespaces
and have a few sockets which are created in different network namespaces.
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
So, here's the next test that just enumerates all possible states and checks
that CRIU C/R-s it well. This time -- pipes. The goal of the test is to load
the fd-sharing engine, so pipes are chosen, as they not only generate shared
struct files, but also produce 2 descriptors in CRIU's fdesc->open callback
which is handled separately.
It's implemented slightly differently from the unix test: we don't want
to check sequences of syscalls on objects here, we need to check the task to pipe
relations in all possible ways.
The 'state' is several tasks, several pipes and each generated test includes
pipe ends sitting in all possible combinations in the tasks' FDTs.
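
To make the idea of "all possible combinations" concrete, here is a naive
enumeration written for this note. It is not the generator from the patch:
it does no pruning, and the names `END_CHOICES` and `pipe_states` are
hypothetical.

```python
import itertools

# What a task may hold of one pipe: nothing, read end, write end, or both.
END_CHOICES = ("", "r", "w", "rw")

def pipe_states(tasks, pipes):
    """Yield every assignment of pipe ends to tasks as a list of tuples."""
    for flat in itertools.product(END_CHOICES, repeat=tasks * pipes):
        # Slice the flat assignment into one tuple of pipe ends per task.
        yield [flat[t * pipes:(t + 1) * pipes] for t in range(tasks)]

states = list(pipe_states(tasks=2, pipes=1))
print(len(states))
```

Even this naive version keeps the mirrored states (read end in the first
task and write end in the second, and the reverse) as two distinct entries.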
Also note that states that seem to be equal to each other, e.g. pipe between
tasks A->B and pipe B->A, are really different as CRIU picks the pipe-restorer
based on task PIDs. So whether the picked task has the read end or the write end in
its FDT makes a difference on restore.
The number of tasks is limited with the --tasks option, the number of pipes with the
--pipes one. The test just runs everything: generates states, makes them and C/R-s
them. To check the restored result, /proc/pid/fd/ and /proc/pid/fdinfo/
of all restored tasks are analyzed.
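
A sketch of the kind of parsing such an analysis needs, assuming the usual
procfs formats: a pipe fd link reads `pipe:[<inode>]` and fdinfo carries an
octal `flags:` line. The helper names are hypothetical and do not come from
the test script.

```python
import re

PIPE_LINK = re.compile(r"^pipe:\[(\d+)\]$")
ACCMODE = 0o3  # access-mode bits: 0 is read-only, 1 is write-only

def pipe_inode(link_target):
    """Return the pipe inode from a /proc/<pid>/fd/<n> link, else None."""
    match = PIPE_LINK.match(link_target)
    return int(match.group(1)) if match else None

def pipe_end(fdinfo_text):
    """Return 'r' or 'w' according to the octal flags line of fdinfo."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            mode = int(line.split()[1], 8) & ACCMODE
            return "w" if mode == 1 else "r"
    return None
```

Two descriptors in different tasks refer to the same pipe when their inodes
match, and the flags tell which end each of them is.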
Right now CRIU works OK for --tasks 2 --pipes 2 (for more -- didn't check).
Kirill, please, check that your patches pass this test.
TODO:
- Randomize FDs under which tasks see the pipes. Now all tasks that have
some pipe see it under the same set of FDs.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
By exhaustive testing I understand a test suite that generates as many
states to try to C/R as possible by trying all the possible sequences
of system calls. Since such a generation, if done on all the Linux API
we support in CRIU, would produce bazillions of processes, I propose to
start with something simple.
As a starting point -- unix stream sockets with abstract names that
can be created and used by a single process :)
The script generates situations that unix sockets can get into by
using a pre-defined set of system calls. In this patch the syscalls
are socket, listen, bind, accept, connect and send. Also the number
of system calls to use (i.e. the depth of the tree) is limited by
the --depth option.
There are three things that can be done with a generated 'state':
I) Generate :) and show
Generation is done by recursively doing everything that is possible
(and makes sense) in a given state. To reduce the size of the tree
some meaningless branches are cut, e.g. creating a socket and closing
it right after that, creating two similar sockets one-by-one and some
more.
Shown on the screen is a cryptic string, e.g. the 'SA-CX-MX_SBL' one,
describing the sockets in the state. This is how it can be decoded:
- sockets are delimited with _
- first goes type (S -- stream, D --datagram)
- next goes name state (A -- no name, B with name, X socket is not in
FD table, i.e. closed or not yet accepted)
- next may go letter L meaning that the socket is listening
- -Cx -- socket is connected and x is the peer's name state
- -Ixyz -- socket has incoming connections queue and xyz are the
connect()-ors name states
- -Mxyz -- socket has messages and xyz is senders' name states
The example above means, that we have two sockets:
- SA-CX-MX: stream, with no name, connected to a dead one and with a
message from a dead one
- SBL: stream, with name, listening
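
The decoding rules above can be turned into a few lines of Python. This
decoder is written for this description only and is not part of the script;
the function name and the dictionary keys are hypothetical.

```python
def decode_state(desc):
    """Decode a state string such as 'SA-CX-MX_SBL' into a list of dicts."""
    sockets = []
    for part in desc.split("_"):             # sockets are delimited with _
        head, *extras = part.split("-")
        sock = {
            "type": {"S": "stream", "D": "datagram"}[head[0]],
            "name": head[1],                 # A, B or X
            "listening": head[2:] == "L",
            "peer": None,                    # -Cx
            "incoming": "",                  # -Ixyz
            "messages": "",                  # -Mxyz
        }
        for extra in extras:
            kind, states = extra[0], extra[1:]
            if kind == "C":
                sock["peer"] = states
            elif kind == "I":
                sock["incoming"] = states
            elif kind == "M":
                sock["messages"] = states
        sockets.append(sock)
    return sockets

for sock in decode_state("SA-CX-MX_SBL"):
    print(sock)
```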
Next printed is the sequence of system calls to get into it, e.g. this
is how to get into the state above:
socket(S) = 1
bind(1, $name-1)
listen(1)
socket(S) = 2
connect(2, $name-1)
accept(1) = 3
send(2, $message-0)
send(3, $message-0)
close(3)
The program has created a stream socket, bound it, put it into the listening
state, then created another stream socket, connected it to the 1st one, then
accepted the connection, sent one message in each direction and closed the
accepted end, so the 1st socket is left connected to the dead socket with a
message from it.
II) Run the state
This is when test actually creates a process that does the syscalls
required to get into the generated state (and hopefully gets into it).
III) Check C/R of the state
This is the trickiest part when it comes to the R step -- it's not
clear how to validate that the state restored is correct. But if only
trying to dump the state -- it's just calling criu dump. As images dir
the state string description is used.
One may choose only to generate the states with --gen option. One may
choose only to run the states with --run option. The latter is useful
to verify that the states generator is actually producing valid
states. If no options given, the state is also dump-ed (restore is to
come later).
For now the usage experience is like this:
- Going --depth 10 --gen (i.e. just generating all possible states
that are achievable with 10 syscalls) produces 44 unique states in
0.01 seconds. The generated result covers some static tests we have
in zdtm :) More generation stats:
--depth 15 : 1.1 sec / 72 states
--depth 18 : 13.2 sec / 89 states
--depth 20 : 1 m 8 sec / 101 states
- Running and trying with criu is checked with --depth 9. Criu fails
to dump the state SA-CX-MX_SBL (shown above) with the error
Error (criu/sk-queue.c:151): recvmsg fail: error: Connection reset by peer
Nearest plans:
1. Add generators for on-disk socket names (now only abstract).
Here an interesting case is when names overlap and one socket gets
the name of another, but isn't accessible by it
2. Add datagram sockets.
Here it'd be fun to look at how many-to-one connections are
generated and checked.
3. Add socketpair()-s.
Farther plans:
1. Cut the tree better to allow for deeper tree scan.
2. Add restore.
3. Add SCM-s
4. Have the exhaustive testing for other resources.
Changes since v1:
* Added DGRAM sockets :)
Dgram sockets are trickier than STREAM ones, as they can reconnect from
one peer to another. Thus just limiting the tree depth results in
weird states where a socket just changes its peer. In the v1 of this patch
new sockets were added to the state only when old ones reported that
there's nothing that can be done with them. This limited the amount
of stupid branches, but this strategy doesn't work with dgram due to
reconnect. Due to this, change #2:
* Added the --sockets NR option to limit the amount of sockets.
This allowed to throw new sockets into the state on each step, which
made a lot of interesting states for DGRAM ones.
* Added the 'restore' stage and checks after it.
After the process is restored the script performs as many checks as
possible, having the expected state description in memory. The checks
verify that the values below, read from the real sockets, match the
expectations in the generated state:
- socket itself
- name
- listen state
- pending connections
- messages in queue (sender is not checked)
- connectivity
The latter is checked last, after all queues should be empty, by
sending control messages with socket.recv() method.
* Added --keep option to run all tests even if one of them fails.
And print nice summary at the end.
So far the test found several issues:
- Dump doesn't work for half-closed connection with unread messages
- Pending half-closed connection is not restored
- Socket name is not restored
- Message is not restored
New TODO:
- Check that a socket in the listen state can still accept connections (?)
- Add socketpair()s
- Add on-disk names
- Add SCM-s
- Exhaustive script for other resources
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This can help to investigate logs from Mr Jenkins.
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
warning: In the GNU C Library, "major" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "major", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"major", you should undefine it after including <sys/types.h>.
if (major(st.st_rdev) != major(st_rtc.st_rdev) ||
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
aio_context_t is 8 bytes long, so in 32-bit mode it might be
truncated when unsigned long is used instead. Fix this typo.
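
The effect can be shown with ctypes, assuming, as the message states, an
8-byte context value stored into a 4-byte unsigned long. The value of `ctx`
is made up for the example.

```python
import ctypes

ctx = 0x7F1234567000                  # hypothetical 64-bit context value
as_u64 = ctypes.c_uint64(ctx).value   # an 8-byte type keeps the full value
as_u32 = ctypes.c_uint32(ctx).value   # a 4-byte type keeps only the low 32 bits
print(hex(as_u64), hex(as_u32))
```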
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
v2: defining crit_bin and using it for Popen() // Mike
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
flake8 was updated recently and now it shows a few new warnings:
[root@fc24 criu]# make lint
flake8 --config=scripts/flake8.cfg test/zdtm.py
test/zdtm.py:181:4: E722 do not use bare except'
test/zdtm.py:304:2: E722 do not use bare except'
test/zdtm.py:325:3: E722 do not use bare except'
test/zdtm.py:445:3: E722 do not use bare except'
test/zdtm.py:573:4: E722 do not use bare except'
test/zdtm.py:1369:2: E722 do not use bare except'
test/zdtm.py:1385:3: E722 do not use bare except'
test/zdtm.py:1396:2: E722 do not use bare except'
test/zdtm.py:1420:3: E722 do not use bare except'
test/zdtm.py:1820:2: E741 ambiguous variable name 'l'
make: *** [Makefile:369: lint] Error 1
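
For reference, the two warning classes are typically silenced as shown
below. The functions are made-up examples and are not the code from
test/zdtm.py.

```python
def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:  # a bare "except:" here would trigger E722
        return None

def count_nonempty(lines):
    """Count lines that contain something other than whitespace."""
    # "line" instead of the ambiguous single-letter name "l" (E741)
    return sum(1 for line in lines if line.strip())
```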
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Also use task_waiter_t syncpoint to make sure fd won't escape
while we're reading output.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
All Jenkins jobs fail with this error:
22:25:13.186: 37: ERR: cgroup_ifpriomap.c:50: Can't mount cgroups (errno = 16 (Device or resource busy))
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A test to check C/R of the multiline cgroup property net_prio.ifpriomap.
Before this patch set, restoring this file failed as
it's a multiline cgroup property and the kernel can read it
only line-by-line.
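
A sketch of the line-by-line approach, assuming each line has to reach the
kernel in its own write() call. CRIU itself is written in C; the function
name here is hypothetical.

```python
import os

def write_property_by_lines(path, value):
    """Write a multiline property using one write() call per line."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for line in value.splitlines():
            os.write(fd, (line + "\n").encode())
    finally:
        os.close(fd)
```

For net_prio.ifpriomap the value would be a set of `<ifname> <priority>`
lines, one per interface.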
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Those are the devices which are written to the devices.allow cgroup file.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>