The original commit added saving of the THP_DISABLED flag value, but
missed restoring it. The restoring code exists, but it is used only when
--lazy_pages mode is enabled. Restore the prctl flag always. While at
it, rename `has_thp_enabled` -> `!thp_disabled` for consistency.
Fixes: bbbd597b4124 (2017-06-28 "mem: add dump state of THP_DISABLED prctl")
Signed-off-by: Michał Mirosław <emmir@google.com>
Linux 4.15 rejects an empty string as cgroup2 mount options.
Pass NULL instead to satisfy the kernel check. Log the options for
easier debugging.
Signed-off-by: Michał Mirosław <emmir@google.com>
4.15-based kernels don't allow F_*SEAL for memfds created with MFD_HUGETLB.
Since seals are not possible in this case, fake F_GETSEALS result as if it
was queried for a non-sealing-enabled memfd.
Signed-off-by: Michał Mirosław <emmir@google.com>
This does cgroup namespace creation separately from joining task
cgroups. This makes the code more logical, because creating cgroup
namespace also involves joining cgroups but these cgroups can be
different to task's cgroups as they are cgroup namespace roots
(cgns_prefix), and mixing all of them together may lead to
misunderstanding.
Another positive thing is that we consolidate !item->parent checks in
one place in restore_task_with_children.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This is a patch proposed by Thomas here:
https://lore.kernel.org/all/87ilczc7d9.ffs@tglx/
It removes (created id > desired id) "sanity" check and adds proper
checking that ids start at zero and increment by one each time when we
create/delete a posix timer.
The first purpose is to fix infinite looping in create_posix_timers on
old pre-3.11 kernels.
The second purpose is to let the kernel interface for creating posix
timers with a desired id change from iterating with a predictable next
id to setting the next id directly. If the predictable next id is
removed at the same time, criu with this patch will not get into an
infinite loop in create_posix_timers.
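The id check described above can be sketched as a small simulation. This
is only an illustration, not the criu code: `SequentialIds` and
`create_timer_with_id` are hypothetical names, and the "kernel" here is
a plain Python object that hands out ids 0, 1, 2, ...

```python
class SequentialIds:
    """Hypothetical stand-in for a kernel that hands out ids 0, 1, 2, ..."""

    def __init__(self):
        self.next_id = 0

    def create(self):
        timer_id = self.next_id
        self.next_id += 1
        return timer_id

    def delete(self, timer_id):
        # Deleting does not make the id reusable in this model.
        pass


def create_timer_with_id(kernel, desired_id, next_expected):
    """Create and delete timers until desired_id is reached.

    Every created id must equal next_expected; anything else means the
    ids are not sequential, so fail instead of looping forever.
    Returns the id expected from the next creation.
    """
    if desired_id < next_expected:
        raise RuntimeError("desired id %d was already passed" % desired_id)
    while True:
        timer_id = kernel.create()
        if timer_id != next_expected:
            raise RuntimeError(
                "unexpected timer id %d, expected %d" % (timer_id, next_expected)
            )
        next_expected += 1
        if timer_id == desired_id:
            return next_expected
        kernel.delete(timer_id)
```

With an allocator that does not follow the sequential pattern, the first
mismatch raises an error rather than spinning.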
Thanks a lot to Thomas!
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The CentOS 7 CI environment uses Python 2. To execute the criu-ns
script on CentOS 7, the current shebang line has to be changed to
python.
This reverts the changes made in a15a63fce0ad4d1a9119771577fa7ef562bbfd6b
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes fix the `ImportError: No module named pathlib`
error when executing criu-ns tests located at criu/test/others/criu-ns
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes remove and update the changes introduced in
7177938e60b81752a44a8116b3e7e399c24c4fcb in favor of the
Python version in CI.
os.waitstatus_to_exitcode() function appeared in Python 3.9
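For interpreters older than 3.9, an equivalent can be built from the
long-standing os.WIF* helpers. The following is a minimal sketch and not
necessarily what the patch does; the function name
`waitstatus_to_exitcode_compat` is made up for illustration.

```python
import os


def waitstatus_to_exitcode_compat(status):
    """Convert a wait status to an exit code without Python 3.9 helpers.

    Mirrors the documented behaviour of os.waitstatus_to_exitcode():
    the exit code for a normal exit, the negated signal number when the
    process was killed by a signal.
    """
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    raise ValueError("invalid wait status: %r" % (status,))
```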
Related to: #1909
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
--criu-binary argument provides a way to supply the CRIU binary
location to run_criu().
Related to: #1909
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
By default, the file name 'amdgpu_plugin.txt' is used also as the name
for the corresponding man page (`man amdgpu_plugin`). However, when
this man page is installed system-wide it would be more appropriate
to have a prefix 'criu-' (e.g., `man criu-amdgpu-plugin`).
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Since we know criu_pid and criu is the parent of the restored process,
we can create a pidfile with the pid at the caller's pidns level.
We need to move mount namespace creation to the child so that criu-ns
can see the caller's pidns proc.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than
before. Looking at older CPUs I see 2440 bytes.
# cpuid -1 -l 0xd -s 0
...
bytes required by XSAVE/XRSTOR area = 0x00000988 (2440)
On newer CPUs (Sapphire Rapids) it grows to 11008 bytes.
# cpuid -1 -l 0xd -s 0
...
bytes required by XSAVE/XRSTOR area = 0x00002b00 (11008)
This increases the xsave area from one page to four pages.
Without this patch the fpu03 test fails, with this patch it works again.
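As a quick check of the arithmetic, the two sizes reported by cpuid can
be converted to page counts. This sketch assumes 4096-byte pages, which
is not stated above; `pages_needed` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Round a byte count up to whole pages."""
    return -(-nbytes // page_size)


older = pages_needed(0x988)    # 2440 bytes
newer = pages_needed(0x2B00)   # 11008 bytes
print(older, newer)
```

Under that assumption the older size fits in one page, while the
Sapphire Rapids size does not, so a one-page area is too small and a
four-page area leaves some headroom.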
Signed-off-by: Adrian Reber <areber@redhat.com>
The pipe_size type is unsigned int, so when the fcntl call fails and
returns -1, the value wraps around to a large positive number.
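The wraparound can be illustrated with ctypes, which models C integer
types. This is only a sketch of the problem and of checking the signed
return value first; `as_unsigned_int` and `checked_pipe_size` are
hypothetical names, not functions from criu.

```python
import ctypes


def as_unsigned_int(value):
    """Store a Python int in a C unsigned int and read it back."""
    return ctypes.c_uint(value).value


def checked_pipe_size(ret):
    """Check the signed fcntl-style return before treating it as a size."""
    if ret < 0:
        raise OSError("fcntl failed")
    return as_unsigned_int(ret)


# A failed call returning -1 becomes the largest unsigned int value.
print(as_unsigned_int(-1))
```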
Signed-off-by: zhoujie <zhoujie133@huawei.com>
The TOS (type of service) field in the IP header allows you to specify
the priority of the socket data.
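For reference, this is how an application sets and reads the field on an
IPv4 socket through the standard IP_TOS socket option. It is a minimal
sketch of the option itself, not of the dump/restore code; `set_tos` is
a hypothetical helper and 0x10 (low delay) is just an example value.

```python
import socket


def set_tos(sock, tos):
    """Set IP_TOS on an IPv4 socket and return the value read back."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print(hex(set_tos(s, 0x10)))
```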
Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
The highlight feature of this release is the ability to use CRIU for
non-root users. Adrian Reber implemented the kernel part and created the
initial version of CRIU changes. Then Younes Manton joined the effort
and pushed it to the finish line.
The full change log is here: https://criu.org/Download/criu/3.18
Signed-off-by: Andrei Vagin <avagin@gmail.com>
We already call kerndat_has_nspid in kerndat_init and save the result to
the kerndat cache, so we don't need to recheck it each time.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previously when tv_sec>=100, the line would look like this:
(269.189615 Error [...]
Now the last char is overwritten with ')'.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
In parse_pid_status there are 13 places where we do done++, so when
"done" is 13 it means that we have matched each of those 13 places and
we are ready to stop. In next lines we are not going to find anything.
So the right condition for the while loop is (done < 13).
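The same idea in miniature: count matched fields and stop reading once
the count equals the number of fields of interest. This is a sketch with
a hypothetical `parse_status` helper and a caller-supplied field list,
not the 13 fields handled by parse_pid_status.

```python
def parse_status(lines, wanted):
    """Collect the wanted "Key: value" fields, stopping once all are found.

    Returns the parsed fields and how many lines were actually read.
    """
    found = {}
    done = 0
    lines_read = 0
    it = iter(lines)
    # Loop while something is still missing: done < number of fields.
    while done < len(wanted):
        line = next(it, None)
        if line is None:
            break
        lines_read += 1
        key, _, value = line.partition(":")
        if key in wanted and key not in found:
            found[key] = value.strip()
            done += 1
    return found, lines_read
```

With a condition that allows one more iteration than there are fields,
the loop would keep reading lines that cannot match anything.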
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
During the restore process, the netlink fd uses the flags in the
NetlinkSkEntry structure to restore the file state, but during
the dump process, the flags value is not saved to the structure.
Signed-off-by: zhoujie <zhoujie133@huawei.com>
Signed-off-by: hejingxian <hejingxian@huawei.com>
Previously fixup was done before threads' registers were dumped so it
didn't actually work. This commit splits rseq fixup into thread leader
fixup and other threads fixup and applies them after the entities are
seized.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Kernel shouldn't clean up rseq_cs inside a critical section.
If rseq_cs has been cleaned up, it means there is a bug in migration.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
This patch adds concurrency groups to the CI workflows to automatically
cancel any in-progress workflows when a pull request has been updated.
A `concurrency` group ensures that only a single job or workflow
runs at a time. For example, when a pull request is updated with
a force-push, the GitHub CI workflows currently in progress will be
automatically cancelled, and the CI will run only with the updated
commits.
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
- use exit_code instead of returning ret
- replace -errno return with -1
- move fallback to if (!kdat.sk_unix_file)
- fix readlinkat error checking (ret < 0 && ret >= PATH_MAX) by using
read_fd_link helper
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
As we now don't have any calls to free in this function, we can replace
all labels with explicit returns.
While at it: replace useless -errno and 1 returns with -1, as from the
very first implementation of unix_resolve_name (it was renamed to _old
later) in [1] any non-zero return was treated as an error.
6d785e6cd ("unix: resolve a socket file when a socket descriptor is
available") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
It is strange to free a pointer which is already in unix_sk_desc, either
on the error path or on skip, as we leave the freed pointer in desc and
it can probably be used after free later and lead to some corruption. So
I would prefer not to free it, as we don't have full control over it here.
Fixes: 6d785e6cd ("unix: resolve a socket file when a socket descriptor is available")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Fix cwd freeing on error path in get_cwd_check_perm and
on non-error-path in unix_fill_sock_name.
v2: use cleanup_free attribute in unix_fill_sock_name
Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
First, let's move lookup_create_item-s to the end so that on pgid
replacement we don't have a false positive pstree_pid_by_virt check
finding an item created by sid replacement. (note: we need those
lookup_create_item-s for the sake of the free pid selection mechanism)
Second, let's add checks for sid/pgid in images intersecting with
current_sid/pgid, as this would also bring problems on restore.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
In Virtuozzo tests we have seen uninformative errors:
(26.575039) 151187 fdinfo 6: pos: 0 flags: 2/0
(26.575076) sockets: Searching for socket 0x346d1 family 1
(666.230281 ----------------------------------------
(666.230586 Error (criu/cr-dump.c:1850): Dump files (pid: 151187) failed
with -1
So let's add some error messages to this stack.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
With this macro we can easily declare struct mntns_zdtm variables with
all lists properly initialized. Let's use it in mount_complex_sharing,
as without it we can have a segfault on the error path when accessing
uninitialized list pointers.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Currently we only allow external fuse mount itself, let's allow
bindmount for it too. Other mount code is ready for this change and will
be able to bindmount it from corresponding external mount.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
When installing packages within Archlinux container, pacman fails with
the following errors:
(3/7) Creating temporary files...
/usr/lib/tmpfiles.d/journal-nocow.conf:26: Failed to replace specifiers in '/var/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:23: Failed to replace specifiers in '/run/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:25: Failed to replace specifiers in '/run/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:26: Failed to replace specifiers in '/run/log/journal/%m/*.journal*': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:29: Failed to replace specifiers in '/var/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:30: Failed to replace specifiers in '/var/log/journal/%m/system.journal': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:32: Failed to replace specifiers in '/var/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:33: Failed to replace specifiers in '/var/log/journal/%m/system.journal': No such file or directory
To solve this problem we need to initialize the machine ID.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch optimizes shell code that reads a single file with 'cat' and pipes it to a program.
This is considered a Useless Use of Cat (UUOC).
It's more efficient to simply use redirection.
However, in some cases, even the redirection operator '<' is unnecessary.
Signed-off-by: KKrypt <sankalpacharya1211@gmail.com>
When we collect external mount namespace we don't want to dump mounts in
it, so let's remove this flag. This way we can e.g. use for_dump in
->parse() callbacks to separate in-container mounts from others.
This only affects rare case of `--ext-mount-map auto` but to be
absolutely correct let's fix it anyway.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The new field cg_set is currently marked as required, which causes a
backward compatibility problem when using a newer CRIU version to restore
an image dumped by an older version. This commit makes the field optional
and reworks the logic to fall back to using cg_set from task_core when it
is not in thread_core.
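The fallback can be pictured as follows. This is a rough sketch that
models the two image entries as plain dicts; `resolve_cg_set` is a
hypothetical helper and does not reflect the actual protobuf accessors
used in criu.

```python
def resolve_cg_set(thread_core, task_core):
    """Prefer the per-thread cg_set, fall back to the task-level one.

    Images written by older versions have no cg_set in thread_core,
    so the value stored in task_core is used instead.
    """
    cg_set = thread_core.get("cg_set")
    if cg_set is None:
        cg_set = task_core["cg_set"]
    return cg_set
```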
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
The new field is_threaded is currently marked as required, which causes
a backward compatibility problem when using a newer CRIU version to
restore an image dumped by an older version. This commit makes the field
optional and reworks the logic to skip fixing up threaded cgroup
controllers if there is no information in the dumped image.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
The patch is similar to what has been done in linux kernel, as this
warning effectively prevents us from adding list elements to local list
head. See https://github.com/torvalds/linux/commit/49beadbd47c2
Else we have:
CC criu/mount.o
In file included from criu/include/cr_options.h:7,
from criu/mount.c:13:
In function '__list_add',
inlined from 'list_add' at include/common/list.h:41:2,
inlined from 'mnt_tree_for_each' at criu/mount.c:1977:2:
include/common/list.h:35:19: error: storing the address of local variable 'postpone' in
'((struct list_head *)((char *)start + 8))[24].prev' [-Werror=dangling-pointer=]
35 | new->prev = prev;
| ~~~~~~~~~~^~~~~~
criu/mount.c: In function 'mnt_tree_for_each':
criu/mount.c:1972:19: note: 'postpone' declared here
1972 | LIST_HEAD(postpone);
| ^~~~~~~~
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Setting all features supported by the CPU in xstate_bv may bring it into
a dirty-upper-state as documented in the specs, resulting in lower
performance. Let's not do this and set only those that have been used by
the dumpee.
P.S.
Of course it has to be a one-liner!
Fixes: #1171
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
This patch documents how we use `make lint` and `make indent` and
adds a note about their integration with CI.
Co-authored-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>