Here we just want to check that if rseq was registered before C/R
it remains registered after it.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
A lot of kernel versions lack support for ptrace(PTRACE_GET_RSEQ_CONFIGURATION).
But the userspace may be fresh (for instance, containers with a fresh Fedora
running on a CentOS 7 host). Consider two scenarios:
- kernel has no ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support
1. there is a process which uses rseq => fail dump
2. there is no process which uses rseq => we can dump without any problems
But how do we determine whether a process uses rseq without the get_rseq_conf feature?
Let's just try to do rseq registration from the parasite. If rseq is already
registered then we'll get an EBUSY error. If not, the registration will succeed.
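The probe described above can be sketched roughly as follows. This is only an illustration, not the parasite code: the names classify_rseq_probe, probe_rseq and the enum are hypothetical, and the sketch assumes that any failure other than ENOSYS means "already registered" (the message above names EBUSY; to my understanding the kernel may also answer EINVAL or EPERM when the probe's address or signature differs from the existing registration).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical outcome of the probe; these names are not CRIU's. */
enum rseq_state { RSEQ_NOT_REGISTERED, RSEQ_REGISTERED, RSEQ_UNSUPPORTED };

/* Value of RSEQ_FLAG_UNREGISTER in the kernel UAPI headers. */
#define SKETCH_RSEQ_FLAG_UNREGISTER 1L

/* Map the result of a probe registration to a state. */
enum rseq_state classify_rseq_probe(long ret, int err)
{
	if (ret == 0)
		return RSEQ_NOT_REGISTERED; /* our own registration went through */
	if (err == ENOSYS)
		return RSEQ_UNSUPPORTED; /* kernel has no rseq syscall at all */
	/* EBUSY, or (assumption) EINVAL/EPERM for a mismatching registration */
	return RSEQ_REGISTERED;
}

/* Try to register a scratch rseq area and undo the registration on success. */
enum rseq_state probe_rseq(void)
{
#ifdef SYS_rseq
	/* 32 zeroed bytes, 32-byte aligned, standing in for struct rseq */
	static __attribute__((aligned(32))) unsigned char area[32];
	long ret = syscall(SYS_rseq, area, (long)sizeof(area), 0L, 0L);
	int err = errno;
	enum rseq_state st = classify_rseq_probe(ret, err);

	if (st == RSEQ_NOT_REGISTERED)
		syscall(SYS_rseq, area, (long)sizeof(area), SKETCH_RSEQ_FLAG_UNREGISTER, 0L);
	return st;
#else
	return RSEQ_UNSUPPORTED; /* headers too old to know the syscall number */
#endif
}
```

A dump would then fail only when the probe reports RSEQ_REGISTERED for some task.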
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Support basic rseq C/R scenario. Assume that:
- there are no processes with IP inside the rseq critical section (CS)
- kernel has ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support
On dump:
1. use ptrace(PTRACE_GET_RSEQ_CONFIGURATION) to get
struct rseq pointer, rseq size and signature from the kernel.
2. save to the image
On restore:
1. get rseq ptr, size, signature from the image
2. register it back using rseq() from the restorer parasite
Fixes: #1696
Reported-by: Radostin Stoyanov <radostin@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Add "get_rseq_conf" feature corresponding to the
ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support.
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
The code expected that the cgroup directory ends with a ',' and
unconditionally removes the last character. For the "unified" case this
resulted in the last 'd' being removed instead of the non-existent comma.
This just adds a comma after "unified" so that the last removed
character is not the 'd'.
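A small sketch of the pattern, with a hypothetical join_controllers helper that is not the CRIU code: every name is appended with a trailing ',' and the last character is then dropped unconditionally, which is only correct if "unified" gets its comma too.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append each name followed by ',' and then drop the final character.
 * Because the removal is unconditional, a name appended without its
 * comma (the old "unified" case) would lose its last letter.
 */
int join_controllers(char *buf, size_t len, const char **names, size_t n)
{
	size_t off = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int ret = snprintf(buf + off, len - off, "%s,", names[i]);

		if (ret < 0 || (size_t)ret >= len - off)
			return -1; /* does not fit */
		off += (size_t)ret;
	}
	if (off > 0)
		buf[off - 1] = '\0'; /* strips the trailing comma */
	return 0;
}
```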
Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
These are the typos for which codespell suggests a few variants:
./soccr/soccr.c:219: thise ==> these, this
./soccr/soccr.c:444: sence ==> sense, since
./criu/net.c:665: ot ==> to, of, or
./criu/net.c:775: ot ==> to, of, or
./criu/files.c:1244: wan't ==> want, wasn't
./criu/kerndat.c:1141: happend ==> happened, happens, happen
./criu/mount-v2.c:781: carefull ==> careful, carefully
./test/zdtm/static/socket_aio.c:54: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen4v6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/sk-unix-dgram-ghost.c:201: childs ==> children, child's
./test/zdtm/static/sk-unix-dgram-ghost.c:205: childs ==> children, child's
./compel/arch/x86/src/lib/infect.c:297: automatical ==> automatically, automatic, automated
While at it, do some other minor fixes in the same lines.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I am not sure if this is going to bring any compatibility issues.
If yes, we need to remove this patch and add "useable" to the list of
ignored words instead.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Codespell thinks that tThe is a typo. Fix it by separating "\t",
which also improves readability (a bit).
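The fix relies on the compiler concatenating adjacent string literals, so splitting off "\t" does not change the resulting string. A tiny sketch with made-up message text:

```c
#include <assert.h>
#include <string.h>

/* Made-up strings: the second form keeps "\t" apart from the word "The". */
static const char joined[] = "\tThe value is set";
static const char split[] = "\t" "The value is set";

/* Returns 1 when both spellings yield the same string. */
int literals_match(void)
{
	return strcmp(joined, split) == 0;
}
```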
[v2: run via make indent]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It is mapped, not maped. Same applies for mmap I guess.
Found by codespell, except it wants to change it to mapped,
which will make it less specific.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Codespell thinks that NODEL is a misspelled MODEL. Indeed it looks that
way. Add an underscore.
Do the same for the file names.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Codespell thinks that "inot" is a misspelled "into".
Rename to infd ("inotify fd") to make it happy.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
CRIU has a few places where it creates unix sockets and their names have to be
unique for each criu run.
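One way to get such names, shown only as a sketch: mix a per-run identifier into every socket name. The helper make_sock_name, the prefix and the way run_id would be generated are all assumptions here, not the scheme this patch implements.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Build "<prefix>-<pid>-<run_id in hex>". Two runs that reuse the same pid
 * still get different names as long as their run ids differ.
 */
int make_sock_name(char *buf, size_t len, const char *prefix, pid_t pid, uint64_t run_id)
{
	int ret;

	assert(buf != NULL && prefix != NULL);
	ret = snprintf(buf, len, "%s-%d-%" PRIx64, prefix, (int)pid, run_id);
	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```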
Fixes: #1798
Signed-off-by: Andrei Vagin <avagin@google.com>
```
criu/apparmor.c:679:26: error: 'fscanf' may overflow; destination buffer in argument 3 has size 48, but the corresponding specifier may require size 49 [-Werror,-Wfortify-source]
ret = fscanf(f, "%48s", contents);
```
The buffer size should be at least one larger than the fscanf maximum
field width.
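A minimal sketch of the rule, using sscanf on a string instead of fscanf on a file and a hypothetical read_token helper: a "%48s" conversion stores up to 48 characters plus the terminating NUL, so the destination needs 49 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD_WIDTH 48 /* must match the width in the format string below */

/* Reads one whitespace-delimited token of at most FIELD_WIDTH characters. */
int read_token(const char *input, char *out, size_t out_len)
{
	/* One extra byte is needed for the terminating NUL. */
	assert(out_len >= FIELD_WIDTH + 1);
	return sscanf(input, "%48s", out) == 1 ? 0 : -1;
}
```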
Fixes: 8d992a680e ("lsm: support checkpoint/restore of stacked apparmor profiles")
Signed-off-by: Fangrui Song <maskray@google.com>
The init process can exit if it doesn't have any child processes, and its
pidns is destroyed in this case. CRIU dump is running in the target pid
namespace and it kills dumped processes at the end. We need to create a
holder process to be sure that the pid namespace will not be destroyed
before criu exits.
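The life cycle of such a holder can be sketched as below. The helper names are hypothetical, and the sketch leaves out the important part of creating the holder inside the target pid namespace; it only shows a child that does nothing until it is killed and reaped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just sleeps; returns its pid, or -1 if fork() failed. */
pid_t spawn_holder(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause(); /* stay alive until killed */
	}
	return pid;
}

/* Kill the holder and wait for it so that no zombie is left behind. */
int reap_holder(pid_t pid)
{
	int status;

	assert(pid > 0);
	if (kill(pid, SIGKILL))
		return -1;
	return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```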
Fixes: #1775
Signed-off-by: Andrei Vagin <avagin@gmail.com>
zdtm.py mounts two named controllers for tests. In CI, we run zdtm.py a few
times, so we can mount (create) these controllers once to avoid any unwanted
effects.
Signed-off-by: Andrei Vagin <avagin@google.com>
The idea is that each zdtm.py should have its own holder, so that two zdtm.py
instances running on the same host don't affect each other.
Fixes: #1774
Signed-off-by: Andrei Vagin <avagin@google.com>
We have three of "Can't mount at %s", let's distinguish simple mount
from bind-mount and re-mount to make log reading easier.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
On pre-v5.15 kernels we don't have MOVE_MOUNT_SET_GROUP support and thus
all our CI logs are filled with "fallback" messages. Let's decrease the log
level to debug, so that we don't see them in CI logs.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
[root@fedora criu]# ./test/zdtm.py run -t zdtm/static/pty-console --iters 2 --keep-going --ignore-taint
[WARNING] Option --keep-going is more useful when running multiple tests
userns is supported
=== Run 1/1 ================ zdtm/static/pty-console
====================== Run zdtm/static/pty-console in uns ======================
Start test
Test is SUID
./pty-console --pidfile=pty-console.pid --outfile=pty-console.out
Run criu dump
Run criu restore
Run criu dump
=[log]=> dump/zdtm/static/pty-console/62/2/dump.log
------------------------ grep Error ------------------------
b'(00.009325) 101 fdinfo 3: pos: 0 flags: 100000/0'
b'(00.009332) Dumping path for 3 fd via self 19 [/zdtm/static]'
b'(00.009345) 101 fdinfo 4: pos: 0 flags: 100002/0'
b'(00.009352) tty: Dumping tty 20 with id 0xc'
b"(00.009358) Error (criu/files-reg.c:1710): Can't lookup mount=1647 for fd=4 path=/ptmx"
b'(00.009361) ----------------------------------------'
b'(00.009369) Error (criu/cr-dump.c:1368): Dump files (pid: 101) failed with -1'
b'(00.009696) Running network-unlock scripts'
b'(00.012401) Unfreezing tasks into 1'
b'(00.012410) \tUnseizing 86 into 1'
b'(00.012415) \tUnseizing 101 into 1'
b'(00.012428) Error (criu/cr-dump.c:1788): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
################ Test zdtm/static/pty-console FAIL at CRIU dump ################
Test output: ================================
<<< ================================
Send the 9 signal to 86
Wait for zdtm/static/pty-console(86) to die for 0.100000
##################################### FAIL #####################################
The second iteration with mount-v2 fails because devpts_restore, which is
called from do_new_mount_v2 via fstype->restore, opens the ptmx file in the
service mntns and saves it to fdstore for later use. So after the first C/R
the mnt_id in fdinfo of the open ptmx fd refers to a detached mount. Let's
just disable mount-v2 for this test for now.
FIXME: We should create a separate fstype hook for do_mount_in_right_mntns,
so that we can open files from this hook in the actual restored mntns.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Let's run zdtm in Jenkins with the --mntns-compat-mode option, and do the
same for the device-external mount test from others.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Now that we have switched to mount-v2 by default, we need to explicitly
run with the --mntns-compat-mode option to check the old mount engine.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
tracefs can be a mount separate from debugfs, which is why the
/sys/kernel/debug external mount now has children. Binding an external
mount with children into a container is not supported: we don't want
external mounts to introduce unexpected extra external mounts, so in
mount-v2, unlike in the old mount engine, we bind them without MS_REC.
We can either bind without MS_REC when constructing the test or provide all
children mounts as separate external mounts to criu. Let's just disable it
for now.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/87875c023
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Before mounts-v2 we have seen mounts losing their readonly mount flags
when they were in a propagation group, because CRIU "forgot" to set
them. With the new mount engine it should work, as all propagations are
now created on the same path where all other normal mounts are created,
and all mount flags are restored.
This test actually mounts only one mount, the other three are propagations;
let's set the ro mount flag for half of them.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/22584993d
FIXME: need to check that options are restored correctly, as we don't have
--check-mounts to do this job for us.
Reviewed-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The mounts-v2 engine should fix multiple problems of the old engine related
to sharing options, let's add a test for such problems.
Add all four types of shared groups: 1) private, 2) shared, 3) slave
and 4) slave+shared for mounts. Propagate them into sharing and after
propagation change sharing in four ways: 1) don't change, 2) make
private, 3) make slave and 4) make private + make shared.
This gives 16 cases of different sharing options for mount propagation,
let's check that they are all restored fine.
Let's create mounts from a description to make it easier to improve this
test in the future.
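The 4 x 4 matrix of cases can be written down as data, which is roughly what "create mounts from description" suggests. The types and the enumerate_cases helper below are hypothetical and only illustrate how the 16 combinations arise; the test's real description format may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Initial sharing type of the mount. */
enum sharing { SH_PRIVATE, SH_SHARED, SH_SLAVE, SH_SLAVE_SHARED, SH_NR };
/* What is done to the sharing after propagation. */
enum change { CH_NONE, CH_MAKE_PRIVATE, CH_MAKE_SLAVE, CH_MAKE_PRIVATE_SHARED, CH_NR };

struct sharing_case {
	enum sharing initial;
	enum change change;
};

/* Fill out[] with every (initial, change) pair; returns the number written. */
size_t enumerate_cases(struct sharing_case *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	for (int s = 0; s < SH_NR; s++) {
		for (int c = 0; c < CH_NR; c++) {
			if (n == max)
				return n;
			out[n].initial = (enum sharing)s;
			out[n].change = (enum change)c;
			n++;
		}
	}
	return n;
}
```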
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8bcd0034d
FIXME: need to check that options are restored correctly, as we don't have
--check-mounts to do this job for us.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This test simply checks that sharing between two mounts in a container,
1) an external mount and 2) its bind, persists (the case when the bind has
the same mountpoint).
Note: with the old mount engine, mounts inside the container also become
shared with the mount in the criu mount namespace (outside the container)
after C/R, which is not right.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/76a09e850
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Now that we have switched to mount-v2 by default, we need to explicitly
run with the --mntns-compat-mode option to check the old mount engine.
Note that if the move_mount_set_group feature is not supported, then a
regular run will just fall back to the old mount engine and we don't
need a separate run with --mntns-compat-mode.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Design of mounts-v2:
As a preparation step we classify mounts in groups by (shared_id,
master_id) in new resolve_shared_mounts_v2 (just after reading images).
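The classification step can be pictured with the sketch below. It is not resolve_shared_mounts_v2: struct mnt and classify_mounts are hypothetical, the quadratic search is only for brevity, and putting all private mounts (0, 0) into one group is a simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of a mount read from the images. */
struct mnt {
	int shared_id;
	int master_id;
	int group; /* filled in by classify_mounts() */
};

/*
 * Give every mount a group index so that two mounts share an index exactly
 * when their (shared_id, master_id) pairs are equal. Returns the group count.
 */
size_t classify_mounts(struct mnt *m, size_t n)
{
	size_t groups = 0;

	assert(m != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++) {
			if (m[j].shared_id == m[i].shared_id && m[j].master_id == m[i].master_id)
				break;
		}
		m[i].group = (j < i) ? m[j].group : (int)groups++;
	}
	return groups;
}
```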
The new function prepare_mnt_ns_v2 is the main entry point where the switch
from the old mount engine to the new one actually happens.
First we pre-create each mount namespace nearly empty, with only the root
yard in place (pre_create_mount_namespaces).
We walk the mount tree and mount each mount similarly to the old mount
engine, though not in the mount tree but as a sub-directory of the root yard
(plain mountpoint) in the service (criu) mount namespace. We also
bind this mount from the service mntns to the real mntns just after creation
(do_mount_in_right_mntns).
Note: this way we initially have the final mount, which will be
visible to the restored container user, with the right mnt_id for the sake of
e.g. creating unix sockets on it (for unix socket bindmounts), and
we also have a copy of the mount in the service mntns so that old code which
accesses files on mounts through the service mntns can still access them.
The new can_mount_now_v2 is free from the heuristics we had for restoring
shared groups; we will restore them later via MOVE_MOUNT_SET_GROUP,
for now everything is private.
Now that all plain mounts are created in the real mount namespaces, we can
move them to the tree for each namespace. We also open fds on the
mountpoint: one mp_fd_id before moving and another mnt_fd_id after,
so that we can access each file later from the final mntns via those fds
(assemble_mount_namespaces).
The new restore_mount_sharing_options walks each root sharing group and
its descendants with a DFS tree walk. It creates sharing for the first
mount in the sharing group and then sets the same sharing on all other
mounts in this group.
Sharing creation for the first mount has two steps:
a) If the mount has a master_id, we copy the shared_id either from the parent
sharing group or from an external source and then make the mount a slave,
thus converting it to the right master_id.
b) Next, if the mount has a shared_id, we just make it shared, creating the
right shared_id.
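The two-step rule for the first mount of a group can be sketched as a small planner. The names are hypothetical and nothing is mounted here. In terms of real calls, my understanding is that copying the group corresponds to MOVE_MOUNT_SET_GROUP, and the other two steps to mount(2) with MS_SLAVE and MS_SHARED.

```c
#include <assert.h>
#include <stddef.h>

enum step { STEP_COPY_GROUP, STEP_MAKE_SLAVE, STEP_MAKE_SHARED };

/*
 * Write the steps needed for the first mount of a sharing group into
 * steps[] (room for 3 entries) and return how many were written.
 * An id of 0 means "not set".
 */
size_t plan_first_mount_sharing(int shared_id, int master_id, enum step *steps)
{
	size_t n = 0;

	assert(steps != NULL);
	if (master_id) {
		/* a) take the group from the parent group or external source ... */
		steps[n++] = STEP_COPY_GROUP;
		/* ... and turn it into our master by becoming a slave of it */
		steps[n++] = STEP_MAKE_SLAVE;
	}
	if (shared_id)
		steps[n++] = STEP_MAKE_SHARED; /* b) creates a new shared group */
	return n;
}
```

All other mounts of the same group would then just copy the sharing from this first mount.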
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/596651d02
Changes:
- Split all "exporting" to separate preparational patches
- Rework cr_time
- Switch to MOVE_MOUNT_SET_GROUP
- Use resolve_mountpoint for external mounts (for MOVE_MOUNT_SET_GROUP)
- Mounting plain mounts both in service and in restored-final mntns
- Call MOVE_MOUNT_SET_GROUP from usernsd
- Rework can_mount_now_v2 to handle bind of both root and external.
- Use sys_move_mount for mount assembling.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>