Apparently Skylake uses init-optimization when saving FPU state, and ptrace()
returns XSTATE_BV[0] = 0 meaning FPU was not used by a task (in init state).
Since CRIU restore uses sigreturn to restore registers, FPU state is always
restored. Fill the state with default values on dump to make restore happy.
Signed-off-by: Michał Mirosław <emmir@google.com>
The original commit added saving of the THP_DISABLED flag value, but missed
restoring it. Restoring code exists, but it is used only when --lazy_pages
mode is enabled. Restore the prctl flag always. While at it, rename
`has_thp_enabled` -> `!thp_disabled` for consistency.
Fixes: bbbd597b41 (2017-06-28 "mem: add dump state of THP_DISABLED prctl")
Signed-off-by: Michał Mirosław <emmir@google.com>
Linux 4.15 rejects an empty string for cgroup2 mount options.
Pass NULL instead to satisfy the kernel check. Log the options for
easier debugging.
Signed-off-by: Michał Mirosław <emmir@google.com>
4.15-based kernels don't allow F_*SEAL for memfds created with MFD_HUGETLB.
Since seals are not possible in this case, fake F_GETSEALS result as if it
was queried for a non-sealing-enabled memfd.
Signed-off-by: Michał Mirosław <emmir@google.com>
This does cgroup namespace creation separately from joining task
cgroups. This makes the code more logical, because creating cgroup
namespace also involves joining cgroups but these cgroups can be
different to task's cgroups as they are cgroup namespace roots
(cgns_prefix), and mixing all of them together may lead to
misunderstanding.
Another positive thing is that we consolidate !item->parent checks in
one place in restore_task_with_children.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This is a patch proposed by Thomas here:
https://lore.kernel.org/all/87ilczc7d9.ffs@tglx/
It removes the (created id > desired id) "sanity" check and adds proper
checking that ids start at zero and increment by one each time we
create/delete a posix timer.
The first purpose is to fix infinite looping in create_posix_timers on
old pre-3.11 kernels.
The second purpose is to let the kernel interface for creating posix timers
with a desired id change from iterating over predictable next ids to
setting the next id directly. If the kernel removes the predictable next
id at the same time, criu with this patch would not get into an infinite
loop in create_posix_timers.
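The check can be modelled roughly as follows. This is a Python simulation, not the C code from the patch: `FakeTimerKernel` is a made-up stand-in for the kernel that is assumed to hand out ids 0, 1, 2, ..., and `create_timer_with_id` is a hypothetical name for the create/delete loop.

```python
class FakeTimerKernel:
    """Stand-in for the kernel: hands out ids 0, 1, 2, ... (assumption)."""

    def __init__(self):
        self.next_id = 0

    def timer_create(self):
        timer_id = self.next_id
        self.next_id += 1
        return timer_id

    def timer_delete(self, timer_id):
        pass  # Deleting does not rewind the id counter in this model.


def create_timer_with_id(kernel, desired_id, last_id):
    """Create and delete timers until `desired_id` is handed out.

    `last_id` is the id returned by the previous create call (-1 if none).
    Raises RuntimeError as soon as an id is not the previous one plus one,
    so an allocator that keeps returning the same id cannot loop forever.
    """
    if desired_id <= last_id:
        raise ValueError("timers must be created in increasing id order")
    while True:
        timer_id = kernel.timer_create()
        if timer_id != last_id + 1:
            raise RuntimeError("unexpected timer id %d" % timer_id)
        last_id = timer_id
        if timer_id == desired_id:
            return timer_id
        kernel.timer_delete(timer_id)
```

With the sequential model, asking for id 2 creates and deletes timers 0 and 1 first. An allocator that returns 0 on every call fails on the second create instead of spinning.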
Thanks a lot to Thomas!
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The CentOS 7 CI environment uses Python 2. To execute the criu-ns
script on CentOS 7, the shebang line must be changed to
python.
This reverts the changes made in a15a63fce0.
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes fix the `ImportError: No module named pathlib`
error when executing criu-ns tests located at criu/test/others/criu-ns
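One way to avoid the pathlib dependency is to build paths with os.path, which is available on both Python 2 and Python 3. This is a sketch only; the actual change may differ, and `criu_binary_path` and the directory layout it assumes are made up for illustration:

```python
import os


def criu_binary_path(script_path):
    """Build `<root>/criu/criu`, where <root> is two levels above the
    directory containing `script_path` (hypothetical layout)."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.normpath(os.path.join(script_dir, "..", "..", "criu", "criu"))
```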
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes remove and update the changes introduced in
7177938e60 in favor of the
Python version in CI.
os.waitstatus_to_exitcode() function appeared in Python 3.9
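On interpreters older than 3.9 the same conversion can be written with the long-standing os.WIF* helpers. A minimal sketch, with `wait_status_to_exit_code` as a hypothetical name:

```python
import os


def wait_status_to_exit_code(status):
    """Convert a wait status to an exit code without Python 3.9's helper."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Mirror os.waitstatus_to_exitcode(): negative signal number.
        return -os.WTERMSIG(status)
    raise ValueError("unexpected wait status: %r" % (status,))
```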
Related to: #1909
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
--criu-binary argument provides a way to supply the CRIU binary
location to run_criu().
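A sketch of how such an option could be separated from the arguments meant for CRIU itself. It assumes argparse is used and that the default is plain `criu`. Neither is confirmed by this message, and `parse_args` is a hypothetical helper:

```python
import argparse


def parse_args(argv):
    """Split off --criu-binary and keep the remaining arguments for CRIU."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--criu-binary", default="criu")  # assumed default
    known, criu_args = parser.parse_known_args(argv)
    return known.criu_binary, criu_args
```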
Related to: #1909
Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
By default, the file name 'amdgpu_plugin.txt' is used also as the name
for the corresponding man page (`man amdgpu_plugin`). However, when
this man page is installed system-wide it would be more appropriate
to have a prefix 'criu-' (e.g., `man criu-amdgpu-plugin`).
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Using the fact that we know criu_pid and criu is a parent of restored
process we can create pidfile with pid on caller pidns level.
We need to move mount namespace creation to child so that criu-ns can
see caller pidns proc.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than
before. Looking at older CPUs I see 2440 bytes.
# cpuid -1 -l 0xd -s 0
...
bytes required by XSAVE/XRSTOR area = 0x00000988 (2440)
On newer CPUs (Sapphire Rapids) it grows to 11008 bytes.
# cpuid -1 -l 0xd -s 0
...
bytes required by XSAVE/XRSTOR area = 0x00002b00 (11008)
This increases the xsave area from one page to four pages.
Without this patch the fpu03 test fails, with this patch it works again.
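The page arithmetic behind this, assuming 4096-byte pages: 2440 bytes fit in one page, while 11008 bytes need at least three, which the four-page area covers.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Round a byte count up to whole pages."""
    return (nbytes + page_size - 1) // page_size


for size in (0x988, 0x2B00):
    print(size, pages_needed(size))
```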
Signed-off-by: Adrian Reber <areber@redhat.com>
The pipe_size type is unsigned int; when the fcntl call fails and
returns -1, the value wraps around to a large positive number.
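The wraparound can be reproduced from Python, where ctypes.c_uint stands in for the C `unsigned int` variable. This is an illustration of the arithmetic only, not the CRIU code, and `store_as_unsigned_int` is a hypothetical helper:

```python
import ctypes


def store_as_unsigned_int(value):
    """Mimic assigning `value` to a C `unsigned int` variable."""
    return ctypes.c_uint(value).value


failed = -1  # what fcntl() returns on error
pipe_size = store_as_unsigned_int(failed)
print(pipe_size)      # a large positive number, not -1
print(pipe_size < 0)  # False: a `< 0` error check can never fire
```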
Signed-off-by: zhoujie <zhoujie133@huawei.com>
The TOS (type of service) field in the IP header allows you to specify the
priority of the socket data.
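For reference, this is the socket option involved, shown from Python. The sketch assumes Linux and only demonstrates reading and writing IP_TOS on a socket; `set_and_get_tos` is a hypothetical helper and not part of CRIU:

```python
import socket

IPTOS_LOWDELAY = 0x10  # "minimize delay" TOS value from <netinet/ip.h>


def set_and_get_tos(tos):
    """Set IP_TOS on a fresh UDP socket and read it back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    finally:
        sock.close()
```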
Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
It is mapped, not maped. Same applies for mmap I guess.
Found by codespell, except it wants to change it to mapped,
which will make it less specific.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Changes made through this commit:
- Include a copy of flog as a separate tree.
- Modify the makefile to add and compile flog code.
Signed-off-by: prakritigoyal19 <prakritigoyal19@gmail.com>
The highlight feature of this release is the ability to use CRIU for
non-root users. Adrian Reber implemented the kernel part and created the
initial version of CRIU changes. Then Younes Manton joined the effort
and pushed it to the finish line.
The full change log is here: https://criu.org/Download/criu/3.18
Signed-off-by: Andrei Vagin <avagin@gmail.com>
We already call kerndat_has_nspid in kerndat_init and save the result to
the kerndat cache, so we don't need to recheck it each time.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previously when tv_sec>=100, the line would look like this:
(269.189615 Error [...]
Now the last char is overwritten with ')'.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
In parse_pid_status there are 13 places where we do done++, so when
"done" is 13 it means that we have matched each of those 13 places and
we are ready to stop. In next lines we are not going to find anything.
So the right condition for the while loop is (done < 13).
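The same stopping rule in a small Python model. The field names in `WANTED` are a hypothetical subset, not the 13 fields parse_pid_status matches, and `parse_status` is a made-up helper:

```python
WANTED = ("State", "PPid", "Uid")  # hypothetical subset of status fields


def parse_status(lines):
    """Collect the wanted fields, stopping once all of them are matched."""
    found = {}
    done = 0
    it = iter(lines)
    while done < len(WANTED):
        line = next(it, None)
        if line is None:
            break  # ran out of input before matching everything
        key, _, value = line.partition(":")
        if key in WANTED and key not in found:
            found[key] = value.strip()
            done += 1
    return found, done
```

Once `done` reaches the number of wanted fields, the remaining lines are never read.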
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
During the restore process, netlink fd uses the flags in the
NetlinkSkEntry structure to restore the file state, but during
the dump process, the flags value is not saved to the structure.
Signed-off-by: zhoujie <zhoujie133@huawei.com>
Signed-off-by: hejingxian <hejingxian@huawei.com>
Previously fixup was done before threads' registers were dumped so it
didn't actually work. This commit splits rseq fixup into thread leader
fixup and other threads fixup and applies them after the entities are
seized.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Kernel shouldn't clean up rseq_cs inside a critical section.
If rseq_cs has been cleaned up, it means there is a bug in migration.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
This patch adds concurrency groups to the CI workflows to automatically
cancel any in-progress workflows when a pull request has been updated.
A `concurrency` group ensures that only a single job or workflow
runs at a time. For example, when a pull request is updated with
a force-push, the GitHub CI workflows currently in progress are
automatically cancelled, and the CI runs only with the updated
commits.
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
- use exit_code instead of returning ret
- replace -errno return with -1
- move fallback to if (!kdat.sk_unix_file)
- fix readlinkat error checking (ret < 0 && ret >= PATH_MAX) by using
read_fd_link helper
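Why the old readlinkat check could never fire, modelled in Python. `buggy_check` and `intended_check` are hypothetical names, and the `or` variant is only a guess at the original intent; the actual fix uses the read_fd_link helper instead:

```python
PATH_MAX = 4096  # value of PATH_MAX on Linux


def buggy_check(ret):
    """The original condition: no integer is both negative and >= PATH_MAX."""
    return ret < 0 and ret >= PATH_MAX


def intended_check(ret):
    """What the check presumably meant: failure or a truncated link."""
    return ret < 0 or ret >= PATH_MAX


for ret in (-1, 10, PATH_MAX):
    print(ret, buggy_check(ret), intended_check(ret))
```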
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
As we no longer have any calls to free in this function, we can replace
all labels with explicit returns.
While on it: Replace useless -errno and 1 returns with -1 as from the
very first implementation of unix_resolve_name (it changed name to _old
later) in [1] any non-zero return was treated as error.
6d785e6cd ("unix: resolve a socket file when a socket descriptor is
available") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
It is strange to free a pointer which is already in unix_sk_desc, either
on the error path or on skip, as we leave the freed pointer in desc and it can
probably be used after free later and lead to some corruption. So I
would prefer not to free it as we don't have full control over it here.
Fixes: 6d785e6cd ("unix: resolve a socket file when a socket descriptor is available")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Fix cwd freeing on error path in get_cwd_check_perm and
on non-error-path in unix_fill_sock_name.
v2: use cleanup_free attribute in unix_fill_sock_name
Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
First, let's move lookup_create_item-s to the end so that on pgid
replacement we don't have a false positive pstree_pid_by_virt check
finding an item created by sid replacement. (Note: we need those
lookup_create_item-s for the sake of the free pid selection mechanism.)
Second, let's add checks for sid/pgid in images intersecting with
current_sid/pgid, as this would also bring problems on restore.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
In Virtuozzo tests we have seen uninformative errors:
(26.575039) 151187 fdinfo 6: pos: 0 flags: 2/0
(26.575076) sockets: Searching for socket 0x346d1 family 1
(666.230281 ----------------------------------------
(666.230586 Error (criu/cr-dump.c:1850): Dump files (pid: 151187) failed
with -1
So let's add some error messages to this stack.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>