We fully support xsaves, so there is no need for the temporary noxsaves
option anymore.
This reverts commit ba93feb5f0f017f2ee498f6ee2db58bcaf817501.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We fully support xsaves now, so there is no need for
a warning or any other message.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The arm cpu code is the same as on aarch64, so use
a symlink.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CPU extensions (such as avx-512) require a bigger xsave
area to hold the fpu register set, so we allocate a page
per process to keep them all. On checkpoint we parse the
runtime fpu features and dump them into an image, and
do the reverse on restore.
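A minimal sketch of the per-process page described above, assuming an anonymous mmap() is used to back it. The helper names alloc_fpu_area() and free_fpu_area() are hypothetical, not CRIU's API. A page-aligned buffer also satisfies the 64-byte alignment that XSAVE/XRSTOR require.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve one anonymous page for a task's xsave area.
 * mmap() returns page-aligned memory, which also meets XSAVE's 64-byte
 * alignment requirement. Returns NULL if the area would not fit in a page.
 */
static void *alloc_fpu_area(size_t xsave_size)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0 || xsave_size > (size_t)page)
		return NULL;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Release an area obtained from alloc_fpu_area(). */
static void free_fpu_area(void *area)
{
	if (area)
		munmap(area, (size_t)sysconf(_SC_PAGESIZE));
}
```

The xsave size passed in would come from CPUID at runtime; refusing sizes above one page makes an oversized area an explicit error instead of a silent overflow.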
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This feature allows us to catch an incompatibility early
if one is present in the image.
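One way to catch such an incompatibility early is a subset test on feature masks. This is a sketch only: it assumes the image records an XCR0-style bitmask (bit 0 x87, bit 1 SSE, bit 2 AVX, bits 5-7 AVX-512 state), and fpu_features_compatible() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch: restore can proceed only if every fpu feature recorded in the
 * image is also available on the restoring host.
 */
static bool fpu_features_compatible(uint64_t image_mask, uint64_t host_mask)
{
	uint64_t missing = image_mask & ~host_mask;

	if (missing) {
		fprintf(stderr, "Missing fpu features: 0x%llx\n",
			(unsigned long long)missing);
		return false;
	}
	return true;
}
```

For example, an image taken with x87|SSE|AVX (0x7) restores on an AVX-512 host (0xe7), but not the other way around.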
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
- extend compel_cpuinfo_t to keep all the fpu information
needed for xsaves mode
- fetch xsaves data in compel_cpuid
All this will allow us to extend criu to support
avx-512 instructions.
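The xsaves data mentioned above is exposed by CPUID leaf 0xD, sub-leaf 0. The sketch below reads it with the GCC/clang <cpuid.h> helpers. struct xsave_info and fetch_xsave_info() are hypothetical stand-ins, not the real compel_cpuinfo_t layout or compel_cpuid() code.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical subset of xsave-related cpu info (illustrative field names). */
struct xsave_info {
	uint64_t xfeatures_mask; /* EDX:EAX, XCR0 bits the CPU supports */
	uint32_t xsave_size;     /* EBX, size for features enabled in XCR0 */
	uint32_t xsave_size_max; /* ECX, size if all supported features are on */
};

/* Fill @xi from CPUID leaf 0xD; returns 0 on success, -1 if unavailable. */
static int fetch_xsave_info(struct xsave_info *xi)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.1:ECX bit 26 reports XSAVE support */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 26)))
		return -1;
	if (!__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
		return -1;

	xi->xfeatures_mask = ((uint64_t)edx << 32) | eax;
	xi->xsave_size = ebx;
	xi->xsave_size_max = ecx;
	return 0;
#else
	(void)xi;
	return -1;
#endif
}
```

On non-x86 builds the function simply reports that the information is unavailable.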
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We print messages before the log file
is set up, so instead of staying silent, let's
print them to the default outputs.
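A minimal sketch of that fallback, assuming stderr is the default output and a NULL log_file means "not set up yet". Both early_pr() and log_file are hypothetical names, not CRIU's logging API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log target: NULL until the log file has been opened. */
static FILE *log_file;

/*
 * Write to the log file once it exists; before that, fall back to stderr
 * instead of dropping the message. Returns vfprintf()'s result.
 */
static int early_pr(const char *fmt, ...)
{
	FILE *out = log_file ? log_file : stderr;
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = vfprintf(out, fmt, args);
	va_end(args);
	return ret;
}
```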
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This reduces memory usage if image files are stored on tmpfs.
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In resolve_shared_mounts there are cases where m->master_id > 0
but m->mnt_master is not set. This happens when we have no
access to the master mount, for instance for the CT root (m->parent == NULL)
or when the mount is external. In can_mount_now we don't need to check
the mounted state of such master mounts either, so just use the
"if(mi->mnt_master)" condition instead of "if(mi->master_id > 0)" to fix
the segfault.
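The condition change can be modeled in isolation. This is a simplified sketch: struct mount_info here is a hypothetical three-field model, and master_ready() collapses the real walk over mi->mnt_master->children into a single "mounted" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal model of the fields involved (not CRIU's struct). */
struct mount_info {
	int master_id;                 /* "master:N" from mountinfo, 0 if none */
	struct mount_info *mnt_master; /* resolved master, NULL if inaccessible */
	bool mounted;
};

/*
 * Only wait for a master that was actually resolved. master_id > 0 alone
 * is not enough: for the CT root or external mounts mnt_master stays NULL.
 */
static bool master_ready(const struct mount_info *mi)
{
	if (mi->mnt_master)            /* was: if (mi->master_id > 0) */
		return mi->mnt_master->mounted;
	return true;                   /* nothing we can (or need to) wait for */
}
```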
https://jira.sw.ru/browse/PSBM-86978
Program terminated with signal 11, Segmentation fault.
0x000000000046328b in can_mount_now (mi=0x2155970) at criu/mount.c:2699
2699 list_for_each_entry(c, &mi->mnt_master->children, siblings)
(gdb) p mi->mnt_master
$2 = (struct mount_info *) 0x0
Fixes commit 3a02362c5be1 ("mount: fix can_mount_now to wait children of
master's share properly")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Running 'criu dump -t <PID>' with a configuration file under valgrind,
where <PID> does not exist, gives:
==14336== 600 bytes in 5 blocks are definitely lost in loss record 5 of 5
==14336== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==14336== by 0x5D387A4: getdelim (in /usr/lib64/libc-2.17.so)
==14336== by 0x439829: getline (stdio.h:117)
==14336== by 0x439829: parse_config (config.c:69)
==14336== by 0x439CB2: init_configuration.isra.1 (config.c:159)
==14336== by 0x439F75: init_config (config.c:212)
==14336== by 0x439F75: parse_options (config.c:487)
==14336== by 0x42499F: main (crtools.c:140)
==14336== LEAK SUMMARY:
==14336== definitely lost: 600 bytes in 5 blocks
With this patch:
==17892== LEAK SUMMARY:
==17892== definitely lost: 0 bytes in 0 blocks
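The leak comes from the getline() buffer. Below is a generic sketch of the leak-free pattern. count_config_lines() is a hypothetical helper, not CRIU's parse_config(). getline() allocates and grows *line itself, so the caller must free() it once after the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Count non-empty lines; returns the count, or -1 on a read error. */
static int count_config_lines(FILE *f)
{
	char *line = NULL;
	size_t len = 0;
	ssize_t n;
	int count = 0;

	while ((n = getline(&line, &len, f)) != -1) {
		if (n > 0 && line[n - 1] == '\n')
			line[--n] = '\0';
		if (n > 0)
			count++;
	}
	/* Without this free() valgrind reports the buffer as definitely lost */
	free(line);
	return ferror(f) ? -1 : count;
}
```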
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It works like other external resources.
The user specifies which namespaces are external and must not be dumped.
On restore, the user passes file descriptors of preconfigured namespaces.
How to use:
dump:
--external net[INO]:KEY
restore:
--inherit-fd fd[NSFD]:KEY
The test script contains more details on how to use this:
test/others/netns_ext/run.sh
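The INO in the dump option is the inode of the network namespace file. A sketch of building that argument is shown below, assuming the namespace is taken from /proc/<pid>/ns/net. format_external_netns() and the "extnet" key are hypothetical. On restore, NSFD would be a descriptor obtained by open()ing the same kind of ns file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build "net[INO]:KEY" for the netns of @pid. Returns snprintf()'s result,
 * or -1 if the namespace file cannot be stat()ed.
 */
static int format_external_netns(pid_t pid, const char *key,
				 char *buf, size_t size)
{
	char path[64];
	struct stat st;

	snprintf(path, sizeof(path), "/proc/%d/ns/net", (int)pid);
	if (stat(path, &st) < 0) /* follows the ns symlink to the nsfs inode */
		return -1;
	return snprintf(buf, size, "net[%lu]:%s", (unsigned long)st.st_ino, key);
}
```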
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need to know which namespaces are external to restore them properly.
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We already check that (root, mountpoint) pairs are preserved; do the same for
(root, mountpoint, shared, slave) quadruples.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When CRIU is called for the first time and the /run/criu.kdat file does
not exist, the following warning is shown:
Warn (criu/kerndat.c:847): Can't load /run/criu.kdat
This patch replaces this warning with a more appropriate debug message:
File /run/criu.kdat does not exist
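The distinction boils down to checking errno for ENOENT. This is a sketch of that decision only, not CRIU's kerndat code. open_kdat_cache() is a hypothetical name, and the "Debug:"/"Warn:" prefixes stand in for real log levels.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open the cache; a missing file is the normal first-run case. */
static int open_kdat_cache(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		int err = errno; /* fprintf() below may clobber errno */

		if (err == ENOENT)
			fprintf(stderr, "Debug: File %s does not exist\n", path);
		else
			fprintf(stderr, "Warn: Can't load %s: %s\n", path, strerror(err));
		errno = err;
	}
	return fd;
}
```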
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If we fail to create a temporary directory for doing a clean mount, we can
make the mount clean by reusing the code which enters a new mntns to umount
overmounts. This works because when the last process exits a mntns, all its
mounts are implicitly cleaned of children; see in the kernel source: sys_exit->do_exit
->exit_task_namespaces->switch_task_namespaces->free_nsproxy
->put_mnt_ns->umount_tree->drop_collected_mounts->umount_tree:
/* Hide the mounts from mnt_mounts */
list_for_each_entry(p, &tmp_list, mnt_list) {
list_del_init(&p->mnt_child);
}
Fixes commit b6cfb1ce2948 ("mount: make open_mountpoint handle overmouts
properly")
https://github.com/checkpoint-restore/criu/issues/520
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create a tree of shared mounts where shared mounts have different sets
of children (while having the same root):
First, share1 is mounted as a shared tmpfs; second, share1/child1 is mounted
inside it; third, share1 is bind-mounted to share2 (now share1 and share2
have the same shared id, but share2 has no child); fourth, share1/child2
is bind-mounted from share1 and also propagates to share2/child2 (now
all mounts except share1/child1 have the same shared id); fifth, share1/child3
is mounted and propagates inside the share.
Finally we have four mounts shared with each other with different
sets of children mounts, and moreover two of them are children of
the other two:
495 494 0:62 / /zdtm/static/non_uniform_share_propagation.test/share1 rw,relatime shared:235 - tmpfs share rw
496 495 0:63 / /zdtm/static/non_uniform_share_propagation.test/share1/child1 rw,relatime shared:236 - tmpfs child1 rw
497 494 0:62 / /zdtm/static/non_uniform_share_propagation.test/share2 rw,relatime shared:235 - tmpfs share rw
498 495 0:62 / /zdtm/static/non_uniform_share_propagation.test/share1/child2 rw,relatime shared:235 - tmpfs share rw
499 497 0:62 / /zdtm/static/non_uniform_share_propagation.test/share2/child2 rw,relatime shared:235 - tmpfs share rw
500 495 0:64 / /zdtm/static/non_uniform_share_propagation.test/share1/child3 rw,relatime shared:237 - tmpfs child3 rw
503 497 0:64 / /zdtm/static/non_uniform_share_propagation.test/share2/child3 rw,relatime shared:237 - tmpfs child3 rw
502 499 0:64 / /zdtm/static/non_uniform_share_propagation.test/share2/child2/child3 rw,relatime shared:237 - tmpfs child3 rw
501 498 0:64 / /zdtm/static/non_uniform_share_propagation.test/share1/child2/child3 rw,relatime shared:237 - tmpfs child3 rw
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This also fixes the false-propagation problem of a mount to itself if it
is in its parent's share.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
1) redo waiting for parents of a propagation group to be mounted, using
the pre-found propagation groups
2) for a shared mount, wait for children of that shared group which have no
propagation in our shared mount
(2) is effectively support for non-uniform shares: two
mounts of a shared group can now have different sets of children, and we will
mount them in the right order. But propagate_mount and validate_shared
still prevent c/r-ing such shares; we will fix the former and remove
the latter in separate (next) patches.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This information will help to improve the restore of tricky mount
configurations.
The function same_propagation_group checks whether two mounts were created
simultaneously through shared mount propagation; the main requirement is
that they are in exactly the same place inside the share of
their parents.
The function root_path_from_parent prints the mountpoint path
relative to the root of the parent's share, by first subtracting the
parent's mountpoint from our mountpoint and then prepending the parent's
root path (relative to the root of its file system), e.g.:
id parent_id root mountpoint
1 0 / /
2 1 / /parent_a
3 1 /dir /parent_b
4 2 / /parent_a/dir/a
5 3 / /parent_b/a
(Let 2 and 3 be a shared group)
For mount 4 root_path_from_parent gives:
"/parent_a/dir/a" - "/parent_a" == "/dir/a"
"/" + "/dir/a" == "/dir/a"
For mount 5:
"/parent_b/a" - "/parent_b" == "/a"
"/dir" + "/a" == "/dir/a"
So mounts 4 and 5 are a propagation group.
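The two string steps of root_path_from_parent can be sketched as follows. This is a hypothetical standalone signature that works on plain strings instead of CRIU's mount structures. It requires the parent's mountpoint to match on a path-component boundary and avoids a doubled slash when the parent's root is "/".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Strip @parent_mp from @mountpoint, then prepend @parent_root.
 * Returns 0 on success, -1 if @mountpoint is not below @parent_mp or
 * the result does not fit into @buf.
 */
static int root_path_from_parent(const char *mountpoint, const char *parent_mp,
				 const char *parent_root, char *buf, size_t size)
{
	size_t plen = strlen(parent_mp);
	const char *rel;
	int n;

	if (strcmp(parent_mp, "/") == 0)
		rel = mountpoint;
	else if (strncmp(mountpoint, parent_mp, plen) == 0 &&
		 (mountpoint[plen] == '/' || mountpoint[plen] == '\0'))
		rel = mountpoint + plen;        /* e.g. "/dir/a", or "" */
	else
		return -1;

	if (strcmp(parent_root, "/") == 0)      /* "/" + "/dir/a" == "/dir/a" */
		n = snprintf(buf, size, "%s", *rel ? rel : "/");
	else
		n = snprintf(buf, size, "%s%s", parent_root, rel);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Both example mounts yield "/dir/a". Since their parents (2 and 3) are in one shared group, that equality is what puts 4 and 5 in one propagation group.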
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We should not use the ->bind link for checking the master's children: if we
have two slaves shared with each other, the one mounted first will
replace the ->bind link for the other, which breaks restore.
Also, while at it: if we do not want doubled mounts and want to
prohibit propagation to slaves on restore, we likely want all children of
the whole master's share mounted before the slave.
JFYI: Actually this restriction is very strict and some cases will fail
to restore, for instance (hopefully nobody does this):
mkdir /test
mount -t tmpfs test /test
mount --make-private /test
mkdir /test/{share,slave}
mount -t tmpfs share /test/share --make-shared
mount --bind /test/share/ /test/slave/
mount --make-slave /test/slave
mount --make-shared /test/slave
mkdir /test/share/slave
mount --bind /test/slave/ /test/share/slave/
cat /proc/self/mountinfo | grep test
524 612 0:69 / /test rw,relatime - tmpfs test rw
570 524 0:73 / /test/share rw,relatime shared:879 - tmpfs share rw
571 524 0:73 / /test/slave rw,relatime shared:942 master:879 - tmpfs share rw
602 570 0:73 / /test/share/slave rw,relatime shared:942 master:879 - tmpfs share rw
603 571 0:73 / /test/slave/slave rw,relatime shared:943 master:942 - tmpfs share rw
Here 603 is a propagation of 602 from master 570 to slave 571, and this is
the only way to get such a mount, as 571 and 602 are in one shared group
now and all later mounts to them will propagate between them and create
duplicated mounts. So to create the real 603 without dups we need to have
/test/slave mounted before /test/share/slave, which contradicts the
current assumption.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
See the more detailed explanation in the in-code comment.
note: Actually, before we remove validate_mounts (later in this
patchset) we likely won't get to this check and will fail earlier, as having
children collisions implies shared mounts with different sets of
children.
note: since v4.11 and ms kernel commit 1064f874abc0 ("mnt: Tuck mounts
under others instead of creating shadow/side mounts.") there are no
more mount collisions.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
We use fdstore intensively, for example when handling
bind-mounted sockets and ghost dgram sockets. The system
limit for the per-socket queue may not be enough if someone
generates lots of ghost sockets (150 or more have been
detected on a default Fedora 27).
To make it work, let's unlimit the fdstore queue size
on startup.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When we are dumping an epoll and one of its target fds has been
duped, we can reuse the already collected fds rbtree to find
the proper target. We handle it lazily:
- first try a plain regular bsearch; if none of the
targets are duped, we checkpoint the epoll immediately
- if bsearch fails, we put this epoll entry into a queue
and dump it later, when all other files in the
process are already dumped. At this point the fds rbtree
should already contain all target files, so
we can simply look them up
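The fast path of this lazy scheme can be sketched with the standard bsearch(). epoll_targets_found() is a hypothetical name, and a sorted array of fd numbers stands in for the task's collected fds. A false result means the entry would go into the deferred queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Compare two ints for qsort()/bsearch(). */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Returns true if every epoll target fd is present in @sorted_fds, so the
 * epoll can be dumped right away; false means "queue it for later".
 */
static bool epoll_targets_found(const int *sorted_fds, size_t nr_fds,
				const int *targets, size_t nr_targets)
{
	for (size_t i = 0; i < nr_targets; i++)
		if (!bsearch(&targets[i], sorted_fds, nr_fds, sizeof(int), cmp_int))
			return false; /* e.g. registered fd closed after a dup() */
	return true;
}
```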
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It is used in files tree generation, so we will need to
reuse it for epoll's sake.
Also use the whole 64-bit offset to shuffle bits more.
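To illustrate why the whole 64-bit offset matters, here is a hypothetical key-mixing function, not CRIU's actual hash. Its layout and folding are assumptions. If the offset were truncated to 32 bits, two files that differ only in the upper half of their position would collide. Folding the high half into the low half keeps them apart.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: fold a file's identity into a 32-bit tree key. */
static uint32_t file_key(uint32_t dev, uint64_t ino, uint64_t pos)
{
	uint64_t h = ino ^ ((uint64_t)dev << 32) ^ pos;

	h ^= h >> 32; /* fold the upper half (incl. high offset bits) down */
	return (uint32_t)h;
}
```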
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is needed to find target files with the help of our collected
rbtree.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If we can't find the target file descriptor, we should
exit the dump with an error instead of skipping it.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We will use them for fast lookup of target files.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>