Make the efd:tfd mapping easier to figure out by reading the logs.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When the target file is obtained from epoll fdinfo (internally the
kernel keeps only the file _number_) we have to check its
identity to make sure it is exactly the one which has been added
into the epoll engine. The only proper way is to use the kcmp syscall.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When we are checkpointing epoll targets we assume that the target
file belongs to the process we are dumping. This is of course not
always true. Without kernel support the only thing we can do is compare
fd numbers with the ones present in epoll fdinfo. When an fd number
matches we assume that it is indeed the file which has been added into
epoll. This won't cover the case when the file has been moved to some
other number and a new one has been opened in its place. Such a scenario
will trigger a false positive and we can't do anything about it.
In the next patches, with kernel help, we will make a precise check of
file identity.
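The fd-number comparison described above can be sketched roughly as
follows. This is an illustration in Python rather than the C code of
the patch; the helper names are hypothetical, and the only fdinfo
detail assumed is that each epoll target is reported on a line
starting with "tfd:" followed by the fd number.

```python
import re

# Matches the "tfd:" field of an epoll fdinfo line, e.g.
# "tfd:        5 events:       19 data:                5"
TFD_RE = re.compile(r"^tfd:\s*(\d+)")


def epoll_targets(fdinfo_text):
    """Return the target fd numbers listed in an epoll fdinfo dump."""
    targets = []
    for line in fdinfo_text.splitlines():
        m = TFD_RE.match(line.strip())
        if m:
            targets.append(int(m.group(1)))
    return targets


def split_targets(fdinfo_text, task_fds):
    """Split targets into (assumed present, missing) by fd number only."""
    fds = set(task_fds)
    targets = epoll_targets(fdinfo_text)
    found = [t for t in targets if t in fds]
    missing = [t for t in targets if t not in fds]
    return found, missing


# Hypothetical fdinfo content for an epoll fd with two targets.
sample = (
    "pos:\t0\n"
    "flags:\t02\n"
    "tfd:        5 events:       19 data:                5\n"
    "tfd:        9 events:       19 data:                9\n"
)
print(split_targets(sample, [0, 1, 2, 5]))
```

A number match says nothing about file identity, which is exactly the
false-positive case mentioned above.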
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In epoll dumping we will need the whole set of fds to investigate
the targets, so pass this parameter down to epoll code.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We will need it to make sure the target files in epolls are present
in current process.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
- switch to the uintX types (to finally drop uX,
it isn't worth carrying this type)
- instead of including the huge util.h, include only the
files which are really needed: log, xmalloc, compiler
and bug
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Before this patch we used flock to order task creation,
but this approach is inefficient. It took 5 syscalls to synchronize
the creation of a single child:
1)open()
2)flock(LOCK_EX)
3)flock(LOCK_UN)
4)close() in parent
5)close() in child
The patch introduces a more efficient way of synchronization,
which executes only 2 syscalls. We use last_pid_mutex,
and the syscall count is definitely better.
v2: Don't use flock() at all
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I think we should warn the user when we can't C/R compatible
applications. This holds for architectures other than x86 as well.
Let's correct the message so that it suits non-x86 too.
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When a non-root user runs "criu restore" and criu has the suid bit,
a process will run with non-zero uid and gid.
Before the 4.13 kernel (4d28df6152aa "prctl: Allow local CAP_SYS_ADMIN
changing exe_file"), PR_SET_MM_EXE_FILE fails if uid or gid isn't zero.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Install sudo, create test user with ID 1000, install bash,
fix pidfile creation and pidfile chmod.
v2:
* use sleep to give the criu daemon some time to start up
v3:
* Andrei is of course right and sleep is not a good solution.
After adding --status-fd support to criu service, this
is how we now detect that criu is ready.
v4:
* This was much more complicated than expected which is related
to the different versions of the tools on the different travis
test targets. There seems to be a bug in bash on Ubuntu
https://lists.gnu.org/archive/html/bug-bash/2017-07/msg00039.html
which prevents using 'read -n1' on Ubuntu. As a workaround
the result from CRIU's status FD is now read via python.
Another problem was discovered on alpine with the loop restore test.
CRIU says to use setsid even if the process is already using setsid.
As a workaround, still with setsid, this process is now using
shell-job true for checkpoint and restore.
Parts of v2 have been committed before. So the changes from this commit
are partially already in another commit.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make the --status-fd option also work in 'criu service' mode to avoid
race conditions during testing.
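A test can wait for readiness on such a descriptor instead of
sleeping. The sketch below is only an illustration: wait_ready() is a
hypothetical helper, it assumes the service writes a single byte to
the status fd once it is ready, and the daemon is replaced by a plain
write into a pipe.

```python
import os


def wait_ready(read_fd):
    """Block until one byte arrives on read_fd; return False on EOF."""
    data = os.read(read_fd, 1)
    os.close(read_fd)
    return len(data) == 1


# Stand-in for the daemon: in a real run the write end of the pipe
# would be handed to the service as its status fd. Here we write the
# byte ourselves to keep the example self-contained.
r, w = os.pipe()
os.write(w, b"\0")
os.close(w)
ready = wait_ready(r)
print(ready)
```

If the writer exits without reporting, the read returns EOF and the
caller sees False instead of hanging.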
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Install sudo, create test user with ID 1000, install bash,
fix pidfile creation and pidfile chmod.
v2:
* use sleep to give the criu daemon some time to start up
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This extends the test.py to also run the RPC command VERSION via 'criu
service'. It was already running using 'criu swrk'.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In this directory there are various test cases using CRIU in RPC mode
(or SWRK mode).
This fixes the broken tests by moving the start of 'criu service' from
run.sh to the Makefile, as the test cases are run using "sudo -g
'#1000' -u '#1000'" and the PID file created by CRIU can only be read by
the root user. If 'criu service' is started before run.sh, the mode of
the PID file can still be changed to 0666, which fixes the test script.
This also adds version.py to the test cases that are executed.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Return an error if we meet unexpected parameters in a config file
Cc: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we rely on scanf initializing a pointer to NULL when
it fails to parse a string, but I can't find anything in the man page
saying that it has to do this.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Instead of pre-parsing the command line twice, once to detect -h/--help and
once to find the config file parameter, check for both in one pass.
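The idea of a single pre-parse pass can be sketched as below. This is
not the CRIU code (which is C); preparse() is a hypothetical helper,
and accepting the "--config=PATH" spelling next to "--config PATH" is
an assumption made for the example.

```python
def preparse(argv):
    """Scan argv once; return (help_requested, config_path)."""
    want_help = False
    config = None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-h", "--help"):
            want_help = True
        elif arg == "--config" and i + 1 < len(argv):
            # The next token is the path, so skip over it.
            config = argv[i + 1]
            i += 1
        elif arg.startswith("--config="):
            config = arg.split("=", 1)[1]
        i += 1
    return want_help, config


print(preparse(["dump", "--config", "/tmp/c.conf", "-h"]))
```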
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When config parsing was split into a separate part the handling of
-h/--help option during init_config was broken. Fix it.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently kerndat_init() runs before command line parsing, and running
a simple 'criu --version' command may produce something like:
Warn (criu/kerndat.c:847): Can't load /run/criu.kdat
Error (criu/util.c:842): exited, status=3
Error (criu/util.c:842): exited, status=3
Write 4294967295 to /proc/self/loginuid failed: Operation not permittedWarn
(criu/net.c:2732): Unable to get socket network namespace
Warn (criu/net.c:2732): Unable to get tun network namespace
Warn (criu/sk-unix.c:213): sk unix: Unable to open a socket file:
Operation not permitted
Error (criu/net.c:3023): Unable create a network namespace: Operation not
permitted
Warn (criu/net.c:3069): NSID isn't reported for network links
Version: 3.6
GitID: v3.6-611-g0b27d0a
Group early calls to kerndat_* and init_service_fd calls into a function
and call this function after the command line parsing is finished.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create a test to verify the configuration parsing feature. The
test reuses the already present inotify_irmap test.
Because default configuration files have been added, the
--no-default-config option is added to zdtm.py so that the test suite
does not break on systems where these files are present.
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Implementation changes for usage of simple configuration files. Before
parsing the command line options, either default configuration files
(/etc/criu/default.conf, $HOME/.criu/default.conf; in this order) are
parsed, or a specific config file passed by the user. Two new options are
introduced: "--config FILEPATH" option allows users to specify a single
configuration file they want to use; and "--no-default-config" option to
forbid the parsing of default configuration files. Both options are to be
passed only via the command line.
Usage of configuration files is optional, to keep backward
compatibility. The implementation of this feature tries to be compatible
with command line usage -- the user should get the same results whether
they pass the options (in the right order of parsing) on the command
line or write them in config files. This allows the user to:
1) Override boolean options if needed
2) Specify partial configuration for options that are possible to pass
several times (e.g. "--external"), and pass the rest of the options
based on process runtime by command line
Configuration file syntax allows comments marked with the '#' sign; the
rest of the line after '#' is ignored. The user can put one option per
line (with its argument on the same line if needed, separated by
whitespace). The options are the same as the long options, without the
"--" prefix used on the command line.
Configuration file example (syntax purposes only, doesn't make sense):
$ cat ~/.criu/default.conf
tcp-established
work-dir /home/<USERNAME>/criu/work_directory
extra # inline comment
no-restore-sibling
tree 111111
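To make the syntax concrete, here is a rough sketch of how such a file
maps onto command-line style options. It is written in Python for
brevity and is not the actual parser; config_to_argv() is a
hypothetical helper, and treating everything after the option name as
a single argument is an assumption.

```python
def config_to_argv(text):
    """Turn config file text into a list of command-line style tokens."""
    argv = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # drop the comment, if any
        parts = line.split(None, 1)       # option name, optional argument
        if not parts:
            continue                      # blank or comment-only line
        argv.append("--" + parts[0])
        if len(parts) == 2:
            argv.append(parts[1].strip())
    return argv


example = """\
tcp-established
work-dir /home/<USERNAME>/criu/work_directory
extra # inline comment
no-restore-sibling
tree 111111
"""
print(config_to_argv(example))
```

Feeding the resulting list to the same option parser that handles the
command line is what keeps the two ways of passing options equivalent.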
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
criu/image-desc.c | 4 ++--
criu/image.c | 4 ++--
criu/include/image.h | 1 +
3 files changed, 5 insertions(+), 4 deletions(-)
In order to prepare for remote snapshots (possible with Image Proxy and Image
Cache) the O_FORCE_LOCAL flag is added to force some images not to be remote
and stay as local files in the file system.
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We'll need some docs :) but the API is
criu := MakeCriu()
criu.Dump(opts, notify)
criu.Restore(opts, notify)
criu.PreDump(opts, notify)
criu.StartPageServer(opts)
where opts is the object from rpc.proto, Go has almost native support
for those, so caller should
- compile .proto file
- export it and golang/protobuf/proto
- create and initialize the CriuOpts struct
and notify is an interface with callbacks that correspond to criu
notification messages.
A stupid dump/restore tool in src/test/main.go demonstrates the above.
Changes since v1:
* Added keep_open mode for pre-dumps. To use it one needs
to call criu.Prepare() right after creation and criu.Cleanup()
right after .Dump()
* Report resp.cr_errmsg string on request error.
Further TODO:
- docs
- code comments
travis-ci: success for libphaul (rev2)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
So, here's the next test that just enumerates all possible states and checks
that CRIU C/R-s it well. This time -- pipes. The goal of the test is to load
the fd-sharing engine, so pipes are chosen, as they not only generate shared
struct files, but also produce 2 descriptors in CRIU's fdesc->open callback
which is handled separately.
It's implemented slightly differently from the unix test, since we don't want
to check sequences of syscalls on objects, we need to check the task to pipe
relations in all possible ways.
The 'state' is several tasks, several pipes and each generated test includes
pipe ends sitting in all possible combinations in the tasks' FDTs.
Also note that states that seem to be equal to each other, e.g. pipe between
tasks A->B and pipe B->A, are really different, as CRIU picks the pipe-restorer
based on task PIDs. So whether the picked task has the read end or the write
end in its FDT makes a difference on restore.
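The enumeration of such states can be sketched as below. This only
illustrates the combinatorics and is not the test's code:
pipe_states() is a hypothetical helper, and no pruning is done, so
states where nobody holds some pipe end are included as well.

```python
from itertools import product


def pipe_states(tasks, pipes):
    """Yield every way to place pipe ends into task FD tables.

    Each state maps (pipe index, end) to a tuple of booleans, one per
    task, saying whether that task holds that end.
    """
    ends = [(p, e) for p in range(pipes) for e in ("r", "w")]
    holders = list(product((False, True), repeat=tasks))
    for combo in product(holders, repeat=len(ends)):
        yield dict(zip(ends, combo))


states = list(pipe_states(2, 1))
print(len(states))

# A->B and B->A show up as two distinct states:
a_to_b = {(0, "r"): (False, True), (0, "w"): (True, False)}
b_to_a = {(0, "r"): (True, False), (0, "w"): (False, True)}
print(a_to_b in states, b_to_a in states)
```

The number of states grows as 2^(2 * tasks * pipes), which is why the
--tasks and --pipes limits matter.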
The number of tasks is limited with the --tasks option, the number of pipes
with the --pipes one. The test just runs everything -- generates states, makes
them and C/R-s them. To check the restored result, /proc/pid/fd/ and
/proc/pid/fdinfo/ of all restored tasks are analyzed.
Right now CRIU works OK for --tasks 2 --pipes 2 (for more -- didn't check).
Kirill, please, check that your patches pass this test.
TODO:
- Randomize FDs under which tasks see the pipes. Currently all tasks that
have some pipe see it under the same set of FDs.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
By exhaustive testing I understand a test suite that generates as many
states to try to C/R as possible by trying all the possible sequences
of system calls. Since such a generation, if done on all the Linux API
we support in CRIU, would produce bazillions of processes, I propose to
start with something simple.
As a starting point -- unix stream sockets with abstract names that
can be created and used by a single process :)
The script generates situations that unix sockets can get into by
using a pre-defined set of system calls. In this patch the syscalls
are socket, listen, bind, accept, connect and send. Also the number
of system calls to use (i.e. the depth of the tree) is limited by
the --depth option.
There are three things that can be done with a generated 'state':
I) Generate :) and show
Generation is done by recursively doing everything that is possible
(and makes sense) in a given state. To reduce the size of the tree
some meaningless branches are cut, e.g. creating a socket and closing
it right after that, creating two similar sockets one-by-one and some
more.
Shown on the screen is a cryptic string, e.g. the 'SA-CX-MX_SBL' one,
describing the sockets in the state. This is how it can be decoded:
- sockets are delimited with _
- first goes type (S -- stream, D -- datagram)
- next goes name state (A -- no name, B with name, X socket is not in
FD table, i.e. closed or not yet accepted)
- next may go letter L meaning that the socket is listening
- -Cx -- socket is connected and x is the peer's name state
- -Ixyz -- socket has incoming connections queue and xyz are the
connect()-ors name states
- -Mxyz -- socket has messages and xyz is senders' name states
The example above means, that we have two sockets:
- SA-CX-MX: stream, with no name, connected to a dead one and with a
message from a dead one
- SBL: stream, with name, listening
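The decoding rules above can be written down as a small decoder. This
is a sketch based only on the description in this message, not code
from the script; the function names and the dictionary layout are
made up for the illustration.

```python
TYPES = {"S": "stream", "D": "datagram"}
NAMES = {"A": "no name", "B": "named", "X": "not in FD table"}


def decode_socket(desc):
    """Decode one socket description such as 'SA-CX-MX' or 'SBL'."""
    head, *parts = desc.split("-")
    info = {
        "type": TYPES[head[0]],
        "name": NAMES[head[1]],
        "listening": head[2:] == "L",
        "peer": None,        # name state of the connected peer
        "incoming": [],      # name states of pending connect()-ors
        "messages": [],      # name states of message senders
    }
    for part in parts:
        kind, states = part[0], part[1:]
        if kind == "C":
            info["peer"] = states
        elif kind == "I":
            info["incoming"] = list(states)
        elif kind == "M":
            info["messages"] = list(states)
    return info


def decode_state(state):
    """Decode a whole state string; sockets are delimited with '_'."""
    return [decode_socket(s) for s in state.split("_")]


for sock in decode_state("SA-CX-MX_SBL"):
    print(sock)
```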
Next printed is the sequence of system calls to get into it, e.g. this
is how to get into the state above:
socket(S) = 1
bind(1, $name-1)
listen(1)
socket(S) = 2
connect(2, $name-1)
accept(1) = 3
send(2, $message-0)
send(3, $message-0)
close(3)
The program has created a stream socket, bound it, listened on it, then
created another stream socket, connected to the 1st one, then accepted
the connection, sent two messages in both directions and closed the
accepted end, so the 2nd socket is left connected to the dead socket
with a message from it.
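For reference, the same sequence can be replayed by hand with Python's
socket module. This is a sketch, not the script's code: build_state()
is a hypothetical helper, and it assumes Linux, since abstract socket
names (a leading NUL byte) are Linux-specific.

```python
import os
import socket


def build_state(name):
    """Replay the syscall sequence for the SA-CX-MX_SBL state."""
    s1 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # socket(S) = 1
    s1.bind(name)                                           # bind(1, $name-1)
    s1.listen(1)                                            # listen(1)
    s2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # socket(S) = 2
    s2.connect(name)                                        # connect(2, $name-1)
    s3, _ = s1.accept()                                     # accept(1) = 3
    s2.send(b"message-0")                                   # send(2, $message-0)
    s3.send(b"message-0")                                   # send(3, $message-0)
    s3.close()                                              # close(3)
    return s1, s2


# Abstract name; the pid suffix only keeps parallel runs from clashing.
name = b"\0exhaustive-demo-%d" % os.getpid()
listener, client = build_state(name)

# The message sent by the now closed accepted end is still queued
# on the connecting socket.
received = client.recv(64)
print(received)
```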
II) Run the state
This is when test actually creates a process that does the syscalls
required to get into the generated state (and hopefully gets into it).
III) Check C/R of the state
This is the trickiest part when it comes to the R step -- it's not
clear how to validate that the state restored is correct. But if only
trying to dump the state -- it's just calling criu dump. As images dir
the state string description is used.
One may choose only to generate the states with --gen option. One may
choose only to run the states with --run option. The latter is useful
to verify that the states generator is actually producing valid
states. If no options are given, the state is also dumped (restore is
to come later).
For now the usage experience is like this:
- Going --depth 10 --gen (i.e. just generating all possible states
that are achievable with 10 syscalls) produces 44 unique states in
0.01 seconds. The generated result covers some static tests we have
in zdtm :) More generation stats:
--depth 15 : 1.1 sec / 72 states
--depth 18 : 13.2 sec / 89 states
--depth 20 : 1 m 8 sec / 101 states
- Running and trying with criu is checked with --depth 9. Criu fails
to dump the state SA-CX-MX_SBL (shown above) with the error
Error (criu/sk-queue.c:151): recvmsg fail: error: Connection reset by peer
Nearest plans:
1. Add generators for on-disk socket names (now only abstract).
Here an interesting case is when names overlap and one socket gets
a name of another, but isn't accessible by it
2. Add datagram sockets.
Here it'd be fun to look at how many-to-one connections are
generated and checked.
3. Add socketpair()-s.
Farther plans:
1. Cut the tree better to allow for deeper tree scan.
2. Add restore.
3. Add SCM-s
4. Have the exhaustive testing for other resources.
Changes since v1:
* Added DGRAM sockets :)
Dgram sockets are trickier than STREAM, as they can reconnect from
one peer to another. Thus just limiting the tree depth results in
weird states when a socket just changes its peer. In the v1 of this patch
new sockets were added to the state only when old ones reported that
there's nothing that can be done with them. This limited the amount
of stupid branches, but this strategy doesn't work with dgram due to
reconnect. Due to this, change #2:
* Added the --sockets NR option to limit the amount of sockets.
This makes it possible to throw new sockets into the state on each step,
which produces a lot of interesting states for DGRAM ones.
* Added the 'restore' stage and checks after it.
After the process is restored the script performs as many checks as
possible having the expected state description in memory. The checks
verify that the values below, obtained from the real sockets, match the
expectations in the generated state:
- socket itself
- name
- listen state
- pending connections
- messages in queue (sender is not checked)
- connectivity
The latter is checked last, after all queues should be empty, by
sending control messages with socket.recv() method.
* Added --keep option to run all tests even if one of them fails.
And print nice summary at the end.
So far the test found several issues:
- Dump doesn't work for half-closed connection with unread messages
- Pending half-closed connection is not restored
- Socket name is not restored
- Message is not restored
New TODO:
- Check that a socket in listen state can still accept connections (?)
- Add socketpair()s
- Add on-disk names
- Add SCM-s
- Exhaustive script for other resources
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Andrew reported that previously he had been able to c/r even
on a machine with xsavec enabled, so allow processing
for now.
P.S. I'm investigating the problem; to not block the testing
process, let's permit passing with the xsaves bit present.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
With the new cpu-cap='op=noxsaves' mode on x86 we should use
compel's instance of rt info, since only it carries
the masked features.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, even if the kernel supports the compact xsave frame, a user
can disable it by passing the noxsaves argument as a boot option.
Thus the cpuid instruction will report its presence, but in reality
it will be masked from the kernel's point of view. Let's do the same
and allow a user to mask it via the --cpu-cap=noxsaves option
(valid for x86 only).
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Will need them to mask some of the features from
command line options.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We don't yet support compacted xsave frames, so report an
error on the cpu-check, checkpoint and restore actions. Basically
this is done in the cpu_init routine, which is called at the sites
we're interested in.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>