Currently criu creates a new pipe buffer if the previous one has a
different set of flags. In this case the pipe is not full, and we can
reuse it for the next page buffer.
We need 88 pipes to pre-dump the zdtm/static/fork test without this
patch, and we need only 17 pipes with this patch.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
v2: and move it up, because it is going to be used in ppb_alloc()
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
vmsplice can't splice more than UIO_MAXIOV, but we can
call it a few times from a parasite.
v2: s/nr/nr_segs/
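The chunking described above can be sketched roughly as follows. This is
a minimal illustration, not the parasite code from the patch: the helper
name vmsplice_chunked() and its max_segs parameter are assumptions made
for the example, and a short splice is simply treated as an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not CRIU's code: feed nr_segs iovecs into the
 * write end of a pipe, passing at most max_segs of them per vmsplice()
 * call. A real caller would pass UIO_MAXIOV as max_segs.
 * Returns the total number of bytes spliced, or -1 on error.
 */
static ssize_t vmsplice_chunked(int pipe_fd, const struct iovec *iov,
				size_t nr_segs, size_t max_segs)
{
	ssize_t total = 0;
	size_t off = 0;

	if (max_segs == 0)
		return -1;

	while (off < nr_segs) {
		size_t nr = nr_segs - off;
		size_t want = 0, i;
		ssize_t ret;

		if (nr > max_segs)
			nr = max_segs;
		for (i = 0; i < nr; i++)
			want += iov[off + i].iov_len;

		ret = vmsplice(pipe_fd, &iov[off], nr, 0);
		if (ret < 0 || (size_t)ret != want)
			return -1; /* treat a short splice as an error */

		total += ret;
		off += nr;
	}
	return total;
}
```

The per-call limit is a parameter only so that the loop can be exercised
with a small pipe; the pipe still has to be large enough (or drained by a
reader) for a blocking vmsplice() to make progress.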
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The original idea was to set --empty net for both criu dump and criu
restore, but before cde33dcb0639 ("empty-ns: Don't C/R iptables too (v2)"),
criu restore worked without --empty net and we didn't notice that
docker doesn't set this option on restore.
After a small brainstorm, we decided that it is better to remove
this requirement. Docker still has to set this option, but with this
change the docker issue becomes less urgent.
https://github.com/checkpoint-restore/criu/issues/393
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Install the latest version of Docker, start a container and C/R it a few times.
v2: call make to install criu
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Test cases:
0. Basic non-breaking read/write leases.
1. Multiple read leases and OFDs with no lease for the same file.
2. Breaking leases.
3. Multiple fds (dup + inherited) for single lease (mutual OFD).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Information about locks in /proc/<pid>/fdinfo is available only since
kernel v4.1. This patch adds logic to *note_file_lock* to match leases
and OFDs.
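As an illustration of the data this relies on, here is a minimal sketch
that parses one "lock:" line of /proc/<pid>/fdinfo/<fd>. It assumes the
line follows the /proc/locks field layout (id, kind, state, access, pid,
maj:min:inode, ...); struct lock_rec, parse_fdinfo_lock() and is_lease()
are hypothetical names and not part of note_file_lock.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "lock:" line of an fdinfo file. */
struct lock_rec {
	char kind[16];   /* FLOCK, POSIX, OFDLCK or LEASE */
	char state[16];  /* e.g. ADVISORY, or ACTIVE/BREAKING for leases */
	char access[16]; /* READ, WRITE or UNLCK */
	int pid;
	unsigned int maj, min;
	unsigned long ino;
};

/* Returns 0 if the line is a well-formed "lock:" line, -1 otherwise. */
static int parse_fdinfo_lock(const char *line, struct lock_rec *r)
{
	long long id;
	int n;

	n = sscanf(line, "lock:\t%lld: %15s %15s %15s %d %x:%x:%lu",
		   &id, r->kind, r->state, r->access,
		   &r->pid, &r->maj, &r->min, &r->ino);
	return n == 8 ? 0 : -1;
}

/* A lease can then be matched to a file by its device and inode. */
static int is_lease(const struct lock_rec *r)
{
	return strcmp(r->kind, "LEASE") == 0;
}
```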
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Restore of breaking leases is executed in 2 steps:
1. restore the lease in the state it was in before the break
2. break it by opening the associated file.
The patch fixes the type of broken leases to the 'target lease type',
because procfs always returns 'READ' in this case.
It also adds an 'updated' field to the lock structure. It is used to
remove from the image all duplicated records for a single lease which
were not corrected by 'correct_lease_type'.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Leases in the breaking state are not supported; in that case criu
reports an error during dump. Also, lock info must be present in
/proc/<pid>/fdinfo (since kernel 4.1).
Before taking out a new lease, criu changes the process fsuid to match
the file uid (see fcntl F_SETLEASE).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Otherwise we get errors like this:
/usr/include/sys/socket.h:315:5: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_un *'
int bind (int, const struct sockaddr *, socklen_t);
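A minimal sketch of the kind of cast that silences this diagnostic. The
helper bind_abstract_unix() is hypothetical, and the abstract socket
namespace is used only to keep the example off the filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a datagram socket to a name in the Linux
 * abstract namespace. Returns the socket fd, or -1 on error.
 */
static int bind_abstract_unix(const char *name)
{
	struct sockaddr_un addr;
	size_t nlen = strlen(name);
	socklen_t len;
	int sk;

	if (nlen == 0 || nlen > sizeof(addr.sun_path) - 1)
		return -1;

	sk = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (sk < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* A leading NUL byte in sun_path selects the abstract namespace. */
	memcpy(addr.sun_path + 1, name, nlen);
	len = offsetof(struct sockaddr_un, sun_path) + 1 + nlen;

	/* The explicit cast matches the prototype's 'const struct sockaddr *'. */
	if (bind(sk, (const struct sockaddr *)&addr, len) < 0) {
		close(sk);
		return -1;
	}
	return sk;
}
```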
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The idea of the check-only option is that criu dump and criu
restore are executed with this option to check whether c/r is
possible for a set of processes. This has to work faster than
without the check-only option.
Currently we run criu restore --check-only on images which have
been generated by criu dump without --check-only, which is obviously
wrong.
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If the check-only option is set, dump and restore are executed twice,
and we need to set separate logs for both cases.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If the restore was executed with the check-only option,
then after restoring all resources each task waits for its children
and exits with code 0.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
kerndat_init() is now called before the jump to the action handler. This
allows us to use kdat directly without calling the corresponding
kerndat_*() methods.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
In commit 8ce156970cb1 ("pstree: rework init reparent handling for pid
namespaces") I changed the session leader lookup in the sid inheritance
check to walk up until the session leader, instead of accepting just any
process of the same session as before. So we need to explicitly allow
an external sid to be inherited for shell jobs.
With this fix I managed to c/r your example fine.
Note: for shell jobs with nested pidnses we also need "[PATCH 04/10]
pstree: add prepare_pstree_leaders to create sid/pgid helpers in
advance", which is on crml now, to prevent creating session helpers
for processes which want to inherit the sid from the criu process. For
the non-nested pidns case we are fine.
Reported-by: Connor Zanin <zanin.c@husky.neu.edu>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
If a task holding userns_sync_lock exits unexpectedly, criu will hang
on the error path in restore_root_task(), because it can't use usernsd
to destroy the pid_ns helpers.
Let's remove the intermediary: we'll create pid_ns helpers
as children of the criu main task, and the criu main task will
be able to use a simple kill() to stop them.
v2: Make code more compact, add a comment.
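The parent/child arrangement can be sketched as below. This is only an
illustration of stopping a direct child with kill(): spawn_helper() and
stop_helper() are hypothetical names, and the real helpers do namespace
work instead of sleeping.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a pid_ns helper: a child that just sleeps. */
static pid_t spawn_helper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid; /* child's pid in the parent, -1 on fork failure */
}

/*
 * Because the helper is our own child, no intermediary is needed:
 * kill it and reap it. Returns the terminating signal, or -1 on error.
 */
static int stop_helper(pid_t pid)
{
	int status;

	if (kill(pid, SIGKILL) < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```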
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Clean up the fork() definition and make a generic function
for all archs. It may be useful when you want to add
more clone flags to fork(), or if you want to pass more
than one argument to the child function (glibc's clone
allows only one).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This functionality will be moved to the criu task in the next patches;
this patch is a preparation.
v2: Rename the function and move pr_err() to it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I think we should warn the user when we can't C/R compatible
applications. That also holds for archs other than x86.
Let's reword the message so that it suits non-x86 as well.
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently lookup_create_item() calls BUG_ON() if it meets a thread.
We don't expect to meet a thread there, but if images contain incorrect
data, we can end up in this situation in open_remap_dead_process().
(gdb) bt
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It is checked in dead_pid_conflict; otherwise criu may segfault:
Program terminated with signal 11, Segmentation fault.
1073 if (item->pid->real == item->threads[i].real ||
(gdb) p item
$1 = (struct pstree_item *) 0x0
(gdb) bt
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Introduce a helper and use it instead of repeating code.
Print the file and line of the caller in the error message,
so that the caller does not need an additional print.
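One common way to get the caller's location into a helper's error message
is a macro wrapper around the real function. The sketch below shows the
pattern only: close_checked() and do_close_checked() are hypothetical and
are not the helper introduced by this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: close *fd and report failures with the location
 * of the caller, so call sites need no extra error print of their own.
 */
static int do_close_checked(int *fd, const char *file, int line)
{
	if (*fd < 0)
		return 0; /* nothing to close */

	if (close(*fd) < 0) {
		fprintf(stderr, "%s:%d: can't close fd %d\n", file, line, *fd);
		return -1;
	}
	*fd = -1;
	return 0;
}

/* The macro expands at the call site, so __FILE__/__LINE__ are the caller's. */
#define close_checked(fd) do_close_checked((fd), __FILE__, __LINE__)
```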
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The restorer blob may die silently for a number of reasons:
- Segmentation fault
- OOM killer
- User-sent SIGKILL
- Child CRIU restorer didn't abort the futex on the error path (and exited)
We should terminate the restore process and avoid locking
ourselves up waiting for a dead restoree.
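The failure mode can be illustrated with a simplified waiter that gives up
as soon as the child is gone. Note that this sketch polls with
waitpid(WNOHANG) on a flag in shared memory; it is not the futex-based
synchronisation criu uses, and wait_done_or_death() is a hypothetical
name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical waiter: 'done' points into memory shared with the child.
 * Returns 0 once the child has set *done, or -1 if the child died
 * (for whatever reason) without ever setting it.
 */
static int wait_done_or_death(volatile int *done, pid_t child)
{
	struct timespec ts = { 0, 1000000 }; /* poll every 1 ms */

	for (;;) {
		int status;
		pid_t ret;

		if (*done)
			return 0;

		ret = waitpid(child, &status, WNOHANG);
		if (ret == child)
			return *done ? 0 : -1; /* re-check: it may have finished first */
		if (ret < 0)
			return -1; /* not our child, or already reaped */

		nanosleep(&ts, NULL);
	}
}
```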
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
To get the pid in the pid namespace of the current task.
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have two places that check for a parent via the page server: as
part of the _OPEN req and as an explicit req. Bring the latter code
in sync with the opening one.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The opts.remote is always false in this code.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's simply impossible (yet), so emit a warning.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There is no real need to have both.
Signed-off-by: Omri Kramer <omri.kramer@gmail.com>
Signed-off-by: Lior Fisch <fischlior@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
service_fd_id is the id of a specific task, while other tasks
in the same shared fd table group may have bigger ids.
In that case the fd returned as unused may intersect with the service
fds of such tasks, which leads to undefined behaviour. Fix that.
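A small model of the problem, under loudly stated assumptions: each task
is assumed to own a block of service fds placed below the rlimit at
rlim_cur - type - SERVICE_FD_MAX * id, and SERVICE_FD_MAX is given a
made-up value. All names below are hypothetical; the point is only that
the bound must use the largest id in the group.

```c
#include <assert.h>
#include <stddef.h>

#define SERVICE_FD_MAX 16 /* assumed number of service fd types */

/* Assumed layout: task 'id' owns the id-th block below rlim_cur. */
static int service_fd(int rlim_cur, int type, int id)
{
	return rlim_cur - type - SERVICE_FD_MAX * id;
}

/* Lowest fd used as a service fd by any task sharing the fd table. */
static int lowest_service_fd(int rlim_cur, const int *ids, size_t nr)
{
	int max_id = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (ids[i] > max_id)
			max_id = ids[i];
	return service_fd(rlim_cur, SERVICE_FD_MAX - 1, max_id);
}

/* An fd is safe to hand out only if it lies below every task's block. */
static int fd_is_free_for_group(int fd, int rlim_cur, const int *ids, size_t nr)
{
	return fd < lowest_service_fd(rlim_cur, ids, nr);
}
```

With ids {0, 2} and a limit of 1024, fd 1000 looks free when only task 0
is considered, but it falls inside the block of task 2.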
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The pid must be taken relative to the parent pid namespace.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This patch speeds up the creation of child processes by disabling
iteration over open files in most cases. We don't need it anymore,
as previous patches ensure that parent files do not leak:
mnt namespace fds are stored in fdstore, and pid proc files
are closed directly.
So now we can skip closing old files in most cases,
except for some CLONE_FILES cases: we need it only if the parent
has CLONE_FILES in its flags (and for root_item).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When a task is in a pid namespace, getpid() can't be used
to identify it. So use the vpid instead.
Also, move log_init_by_pid() above the pid check.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A child does not know the parent's pid proc fd,
so it can't close it by fd. The next patch will make
close_old_files() optional, relying on the fact that
there are no leftover fds. So, close the pid proc fd
directly.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This decreases the number of file descriptors
which are passed to children and which need to be
closed in close_old_files().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create a test that verifies the configuration parsing feature. The
test reuses the already present inotify_irmap test.
Because default configuration files were added, a --no-default-config
option is added to zdtm.py so that the test suite does not break on
systems where these files are present.
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>