Guard vDSO code with CONFIG_VDSO, no need to even build it
on archs which do not support vDSO handling.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We stop searching if vma->start is bigger than a required one.
The coursor is set on the last examined vma. When we are searching a
parent vma for the next vma, we start examine vma-s starting from
coursor->next, so we don't examine the vma, which is pointed by cursor.
This patch replaces list_for_each_entry_continue on list_for_each_entry_from.
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A parent vma can be only one.
Fixes: 57d25e7cea12 ("mm: fix expression to determine which vma-s can be shared")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Look at this hunk from 7659c995f58f:
- paddr = decode_pointer(vma_premmaped_start(&p->vma));
+ paddr = decode_pointer(vma->premmaped_addr);
Obviously we want to use p->premmaped_addr instead of
vma->premmaped_addr.
Fixes: 7659c995f58f ("vm: don't overwrite vma->shmid for private mappings")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now mntns_collect_root() should be called each time when we need to get
a root of a specified namespace and we don't need to call it for
initializing the global variable.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch removes the global mntinfo_tree and collect_mount_info where
it was constructed. The mntinfo list is filled from dump_mnt_ns,
rst_collect_local_mntns, collect_mnt_namespaces and read_mnt_ns_img.
A mountinfo entry contains a reference on a proper ns_id entry, so
we cau use mnt_id to look up a proper mount namespace.
v2: remove trash after rebasing.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to support nested mount namespaces, so files can be opened
from more than one namespace and a root must be collect for each file.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
mntns_collect_root() uses PROC_FD_OFF, so we need to close it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now we supports sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.
v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Known issue:
* currently only namespaces with the same root is supported
* nested namespaces can be dumped and restored only if the root task
has own mount namespace.
All nested namespaces are restored in a root namespace in temporary
directories. All mount points restored in one tree and then they are
divided into namesaces.
The task with minimal pid for each namespaces unshared mntns and
then it makes pivot_root in a proper temporary directory. All other
tasks makes setns to enter into a mount namespace of the task with
minimal pid.
v2: clean up
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All non-root namespaces will be restored as sub-trees of the root tree.
This patch adds helpers to create a temporary directory and mount tmpfs
in it, then create directories for each non-root mount namespace.
tmpfs is quite useful here to simplify destroying this construction,
we don't need to unmount each namespace separately.
v2: add a comment why MNT_DETACH is not dangerous here
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a race. Consider we have 3 tasks, A, B and C. A and B
share fdtable, C -- does not. Then we might be in a situation
when A is restoring memory reading mem images, and B -- forking
the C child. In that case descriptors held by A (for mem restore)
will be inherited by C and will not get closed.
This reverts commit d36e07aabe073993d8ae9695e33f6e45b2eb6a21.
Now we have two funсtions which do mostly the same, so this patch merges
them.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make explicit checks and helpers for legacy images.
This should facilitate its removal some day in the
future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This as well gives us minus one image per-task and
allocates more space on core task entry.
One thing to note -- the amount of posix timers is
not easily accessible at the core entry allocation
time, so the respective array is allocated on demand.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows to have one image less per-task, which in turn
reduces live migration time a little bit.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A process can be restored in a new pidns. close_old_fds() opens
the /proc/PID directory. Without this patch we can see errors like this:
(00.333915) 1: Error (util.c:102): Unable to close fd 6: Bad file descriptor
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For all other tasks only unsed service descriptors will be closed.
This change allows to have file descriptors, which may be used for
restoring namespaces. All non-server descriptors must be closed before
restoring files.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
prepare_mappings() uses the return value of map_private_vma() for the
size of the mapped vma. Unfortunately the return value of
map_private_vma() is an int, resulting in breakage when the size exceeds
31 bits. Change map_private_vma() to return only an error code, and
mutate addr in-place.
Signed-off-by: Jamie Liu <jamieliu@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
post-restore script can fail, so it can't be called after network-unlock.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is responsibility of this command.
All service FDs are closed before we call execvp(). Standad output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.
Two directions were researched in order to integrate CRIU and LXC:
1. We tell to CRIU, that after restoring container is should execve()
lxc properly explaining to it that there's a new container hanging
around.
2. We make LXC set himself as child subreaper, then fork() criu and ask
it to detach (-d) from restore container afterwards. Being a subreaper,
it should get the container's init into his child list after it.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
During the time some files become obsolete and might be missing
in checkpoint image set, but to keep backward compatibility we
still trying to open them, which might print out error like
| Unable to open 'path-to-file'
and confuse a reader why criu prints error but continue working.
To eliminate this problem O_OPT flag has been introduced in
commit 16b5692061e2, which suppress error message priting
if the flag is set.
Now start using O_OPT in the following functions
- open_irmap_cache: irmap cache is relatively new optional feature
- prepare_rlimits, open_signal_image, restore_file_locks,
prepare_fd_pid, prepare_mm_pid, collect_image: all these
helpers are trying to open image files which can be missing.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To save backward compatibility try to read
data from old image if Core entry doesn't
has rlimits bound.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
if option --auto-dedup is set on restore, then as soon as page is
restored it will be punched from the image.
open image in O_RDWR mode
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We do it first -- on collect, second -- on restore. The
2nd lookup is excessive, we can put fd pointer on vm_area
at lookup and reuse one later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we only need to know currnet task mappings' start and end
to find where to put the restorer blob. And since the smaps file in
/proc/pid is up to 3 times slower, than the maps one, it makes
perfect sense just to parse the latter one.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
With packed reg-files we have a complex fd - file - vma - remap interaction. I
think this should be reflected in the code comment.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Useful to test restore time -- just abort restore with this
action and that's it.
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we will read all VmaEntries in one big MmEntry object,
so to avoif copying them all into vma_areas, make them be pointable.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now we do it two times -- on shmem prepare and
on the restore itself. Make collection only once as
we do for fdinfo-s -- root task reads all stuff in and
populates tasks' rst_info with it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The ret is overwritten by core read sub-routine. Need
to reset it to -1 to keep failing in case of e.g. last
pid sysctl write.
Reported-by: Neal Becker <ndbecker2@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We should call external scripts when namespaces are created,
but before we try to fill them with data from images.
This is done so e.g. to make it possible to push external net
links to netns.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Here is nothing interecting. If a file can't be dumped by criu,
plugins are called. If one of plugins knows how to dump the file,
the file entry is marked as need_callback. On restore if we see
this mark, we execute plugins for restoring the file.
v2: Callbacks are called for all files, which are not supported by CRIU.
v3: Call plugins for a file instead of file descriptor. A few file
descriptors can be associated with one file.
v4: A file descriptor is opened in a callback. It's required for
restoring anon vmas.
v5: Add a separate type for unsupported files
v6: define FD_TYPES__UNSUPP
v7: s/unsupp/ext (external)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Libraries (plugins) is going to be used for dumping and restoring
external dependencies (e.g. dbus, systemd journal sockets, charecter
devices, etc)
A plugin can have the cr_plugin_init() and cr_plugin_fini functions for
initialization and deinialization.
criu-plugin.h contains all things, which can be used in plugins.
v2: rename lib to plugin
v3: add a default value for a plugin path.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
when get an error reading image file need stop restore
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For paths resolution we will need mount tree to be parsed
and built, but it's not that simple -- the current code
implies that once parsed the tree must not be re-parsed
again, so we pass @parse argument from a caller: if a task
we're restoring do not use mount namespace, we should parse
mount tree early, otherwise defer this action until mount
tree is read from the image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>