The original idea was to set --empty net for criu dump and criu restore,
but before cde33dcb0639 ("empty-ns: Don't C/R iptables too (v2)"),
criu restore worked without --empty net and we didn't notice that
docker doesn't set this option on restore.
After a short brainstorm, we decided that it is better to remove
this requirement. Docker has to set this option, but with these changes
the docker issue will be less urgent.
https://github.com/checkpoint-restore/criu/issues/393
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Install the latest version of Docker, start a container and C/R it a few times.
v2: call make to install criu
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Test cases:
0. Basic non-breaking read/write leases.
1. Multiple read leases and OFDs with no lease for the same file.
2. Breaking leases.
3. Multiple fds (dup + inherited) for single lease (mutual OFD).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Information about locks in /proc/<pid>/fdinfo is available only since
kernel v4.1. This patch adds logic to *note_file_lock* to match leases
and OFDs.
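
As a rough illustration of the data involved, here is a minimal Python
sketch that parses one "lock:" line of /proc/<pid>/fdinfo/<fd>. It is not
the code from this patch: the names LockInfo and parse_fdinfo_lock are
hypothetical, and the sample line is made up to follow the kernel's
documented layout.

```python
from typing import NamedTuple, Optional


class LockInfo(NamedTuple):
    kind: str       # e.g. "POSIX", "FLOCK", "OFDLCK", "LEASE"
    flavor: str     # e.g. "ADVISORY", "ACTIVE", "BREAKING"
    access: str     # e.g. "READ", "WRITE"
    pid: int        # owner pid as reported by the kernel
    dev_inode: str  # "major:minor:inode"


def parse_fdinfo_lock(line: str) -> Optional[LockInfo]:
    """Parse one 'lock:' line of fdinfo; return None for any other line."""
    if not line.startswith("lock:"):
        return None
    fields = line[len("lock:"):].split()
    # Expected fields: ["1:", kind, flavor, access, pid, dev:inode, start, end]
    if len(fields) < 6:
        return None
    return LockInfo(fields[1], fields[2], fields[3], int(fields[4]), fields[5])


# Illustrative sample only (values are invented, layout assumed).
sample = "lock:\t1: LEASE  ACTIVE  READ  1234 08:01:5678 0 EOF"
info = parse_fdinfo_lock(sample)
print(info)
```

On kernels older than v4.1 such lines are simply absent, so a caller of
this sketch would see None for every fdinfo line.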
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Restore of breaking leases is executed in 2 steps:
1. restore the lease in the state it was in before the break
2. break it by opening the associated file.
The patch fixes type of broken leases to 'target lease type',
because procfs always returns 'READ' in this case.
Also, it adds an 'updated' field to the lock structure. It's used to remove
from the image all duplicated records for a single lease which weren't
corrected by 'correct_lease_type'.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Leases in breaking state are not supported. In that case criu will
report an error during dumping. Also, lock info must be present in
/proc/<pid>/fdinfo (available since kernel 4.1).
Before taking out a new lease, criu changes the process fsuid to match
the file uid (see fcntl F_SETLEASE).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Otherwise we get errors like this:
/usr/include/sys/socket.h:315:5: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_un *'
int bind (int, const struct sockaddr *, socklen_t);
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The idea of the check-only option is that criu dump and criu
restore are executed with this option to check whether c/r is
possible for a set of processes. This has to work faster than
without the check-only option.
Now we run criu restore --check-only for images which have
been generated by criu dump without --check-only, which is obviously wrong.
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If the check-only option is set, dump and restore are executed twice,
and we need to set separate logs for both cases.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If the restore was executed with the check-only option,
after restoring all resources tasks wait for their children and
exit with code 0.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The kerndat_init() is now called before the jump to action handler. This
allows us to use kdat directly without calling the corresponding
kerndat_*() methods.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
In commit 8ce156970cb1 ("pstree: rework init reparent handling for pid
namespaces") I changed the session leader lookup in the sid inheritance
check to walk up until the session leader, instead of stopping at any
process of the same session as before. So we need to allow external sid
to be inherited explicitly for shell jobs.
With this fix I managed to c/r your example fine.
Note: for shell jobs with nested pidnses we also need "[PATCH 04/10]
pstree: add prepare_pstree_leaders to create sid/pgid helpers in
advance", which is in crml now, to prevent creating session helpers
for processes which want to inherit sid from the criu process. For the
non-nested ns case we are fine.
Reported-by: Connor Zanin <zanin.c@husky.neu.edu>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If a task holding userns_sync_lock exits unexpectedly,
criu will hang on the error path in restore_root_task(),
because it can't use usernsd to destroy the pid_ns helpers.
Let's remove the intermediary: we'll create pid_ns helpers
as children of the criu main task, and the criu main task will
be able to use a simple kill() to stop them.
v2: Make code more compact, add a comment.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Clean up the fork() definition and make a generic function
for all archs. It may be useful when you want to add
more clone flags to fork(), or if you want to pass more
than one argument to the child function (glibc's clone
allows only one).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This functionality will be moved into the criu task in the next patches;
this patch is a preparation.
v2: Rename the function and move pr_err() to it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I think we should warn the user when we can't C/R compatible
applications. This also applies to archs other than x86.
Let's correct the message so that it suits non-x86 as well.
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently lookup_create_item() calls BUG_ON() if it meets a thread.
We don't expect to meet a thread there, but if the images contain incorrect
data, we can end up in this situation in open_remap_dead_process().
(gdb) bt
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It is checked in dead_pid_conflict, otherwise criu may segfault:
Program terminated with signal 11, Segmentation fault.
1073 if (item->pid->real == item->threads[i].real ||
(gdb) p item
$1 = (struct pstree_item *) 0x0
(gdb) bt
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Introduce a helper and use it instead of repeating code.
Use the file and line of the caller when printing the error message,
so that the caller doesn't need an additional print.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The restorer blob may die silently for a number of reasons:
- Segmentation fault
- OOM killer
- User-sent SIGKILL
- Child CRIU restorer didn't abort the futex on the error path (and exited)
We should terminate the restoring process and avoid locking
ourselves up waiting for a dead restoree.
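
The general idea of noticing such a silent death can be sketched in
Python. This is only an illustration of checking how a child ended, not
the mechanism used by this patch; describe_exit is a hypothetical helper
and the child here deliberately kills itself with SIGKILL.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable reason."""
    if returncode < 0:
        # A negative return code means the child was terminated by a signal.
        return "killed by " + signal.Signals(-returncode).name
    if returncode != 0:
        return f"exited with code {returncode}"
    return "exited normally"


# The child sends SIGKILL to itself, so it cannot report anything on its own.
child = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)
status = describe_exit(child.returncode)
print(status)
```

A waiter that only blocks on a shared futex would never see this event,
which is why the exit status of the child has to be examined as well.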
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
To get pid in ns of current.
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have two places to check for parent via page server -- as
a part of _OPEN req and explicit req. Make the latter code
be in-sync with the opening one.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The opts.remote is always false in this code.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's simply impossible (yet), so emit a warning.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There is no real need to have both.
Signed-off-by: Omri Kramer <omri.kramer@gmail.com>
Signed-off-by: Lior Fisch <fischlior@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
service_fd_id is the id of a specific task, while other tasks
in the shared fd table group may have bigger id numbers.
In this case the given unused fd intersects with the service fds
of such tasks. This leads to undefined behaviour. Fix that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The pid must be taken relative to the parent pid namespace.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This patch speeds up creation of child processes by disabling
iteration over open files in most cases. We don't need it
anymore, as previous patches make sure parent files do not leak:
mnt namespace fds are stored in fdstore, pid proc files
are closed directly.
So, now we can skip closing old files in most cases,
except some CLONE_FILES cases: we need that only if the parent
has CLONE_FILES in its flags (and for root_item).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When a task is in a pid namespace, getpid() can't be used
to identify it. So, use vpid instead.
Also, move log_init_by_pid() above the pid check.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The child does not know about the parent's pid proc fd,
and it can't close it by fd. The next patch will make
close_old_files() optional, relying on the fact that
there are no leftover fds. So, close pid proc directly.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This allows us to decrease the number of file descriptors
that are passed to children and that need to be
closed in close_old_files().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create a test to verify the configuration parsing feature. The
test reuses the already present inotify_irmap test.
Because default configuration files were added, a --no-default-config
option is added to zdtm.py so that the test suite doesn't break on
systems where these files are present.
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Implementation changes for usage of simple configuration files. Before
parsing the command line options, either default configuration files
(/etc/criu/default.conf, $HOME/.criu/default.conf; in this order) are
parsed, or a specific config file passed by the user. Two new options are
introduced: "--config FILEPATH" option allows users to specify a single
configuration file they want to use; and "--no-default-config" option to
forbid the parsing of default configuration files. Both options are to be
passed only via the command line.
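
The selection of files described above can be sketched in Python as
follows. This is a simplified illustration, not the actual implementation:
config_files_to_parse is a hypothetical name, only the space-separated
"--config FILEPATH" form is handled, and giving "--config" priority over
"--no-default-config" is an assumption.

```python
import os
from typing import List, Optional


def config_files_to_parse(argv: List[str], home: Optional[str]) -> List[str]:
    """Return the config files to read, in order, before parsing argv."""
    if "--config" in argv:
        idx = argv.index("--config")
        if idx + 1 >= len(argv):
            raise ValueError("--config requires a FILEPATH argument")
        # A user-specified file replaces the default files.
        return [argv[idx + 1]]
    if "--no-default-config" in argv:
        return []
    # Default files in parsing order; skipping missing files is left
    # to the caller in this sketch.
    files = ["/etc/criu/default.conf"]
    if home:
        files.append(os.path.join(home, ".criu", "default.conf"))
    return files


print(config_files_to_parse(["dump", "--tree", "111111"], "/home/user"))
```

Because the global file comes first and the per-user file second, options
from the per-user file are parsed later, and command line options are
parsed last of all.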
To keep backwards compatibility, using configuration files is not
mandatory. The implementation of this feature tries to be compatible
with command line usage -- the user should get the same results whether
the options are passed (in the right order of parsing) on the command
line or written in config files. This allows the user to:
1) Override boolean options if needed
2) Specify partial configuration for options that are possible to pass
several times (e.g. "--external"), and pass the rest of the options
based on process runtime by command line
Configuration file syntax allows comments marked with the '#' sign; the rest
of the line after '#' is ignored. The user can use one option per line
(with its argument supplied on the same line if needed, separated by
whitespace characters). The options are the same as the long options
(without the "--" prefix used on the command line).
Configuration file example (syntax purposes only, doesn't make sense):
$ cat ~/.criu/default.conf
tcp-established
work-dir /home/<USERNAME>/criu/work_directory
extra # inline comment
no-restore-sibling
tree 111111
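
The syntax rules above can be illustrated with a small Python sketch that
turns config file text into the equivalent command line arguments. It is
an illustration only, not the parser from this patch: config_to_argv is a
hypothetical name, and quoting of arguments is not handled.

```python
from typing import List


def config_to_argv(text: str) -> List[str]:
    """Convert config file text into equivalent long command line options."""
    argv: List[str] = []
    for raw in text.splitlines():
        # Everything after '#' is a comment and is ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # One option per line, optional arguments separated by whitespace.
        option, *args = line.split()
        argv.append("--" + option)
        argv.extend(args)
    return argv


example = """\
tcp-established
work-dir /home/<USERNAME>/criu/work_directory
extra # inline comment
no-restore-sibling
tree 111111
"""
parsed = config_to_argv(example)
print(parsed)
```

Prepending the resulting list to the real command line arguments gives
the "config first, command line last" order described above.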
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On s390 the first two parameters are swapped because we use
the CONFIG_CLONE_BACKWARDS2 kernel config option.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If a task's pid was hashed before the task itself
(this may happen when another task has a sid or pgid
equal to this pid), the pid mustn't contain zero
levels. So, if a pgid or sid has zero levels, we should
not add them.
Otherwise, session04 --iter 3 fails with:
=[log]=> dump/zdtm/static/session04/30/2/restore.log
------------------------ grep Error ------------------------
(01.858187) 6: Restoring children in our session:
(01.858206) 6: Forking task with 303 pid (flags 0x600)
(01.869893) 1: PID: real 145 virt 15
(01.870247) 1: Forking task with 20 pid (flags 0x0)
(01.872948) Error (criu/cr-restore.c:381): 0: Write -1 to sys/kernel/ns_last_pid: Invalid argument
(01.873030) Error (criu/namespaces.c:2664): Can't set next pid
(01.873103) 1: Error (criu/ns-common.c:46): Error answer
(01.873123) 1: Error (criu/cr-restore.c:404): Can't request next pid
(01.873135) 1: Error (criu/cr-restore.c:1321): Can't set next pid
(01.873310) 1: Error (criu/cr-restore.c:1434): Can't fork for 20: No such file or directory
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since commit 84eedc49a (pstree: Make lookup_create_pid() able to create
tasks with pid->level > 1) the read_pstree_image function presumes that
namespaces image is already parsed.
This patch ensures that this is the case for prepare_dummy_pstree users.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>