We should follow the Linux Kernel Coding Style:
... the closing brace is empty on a line of its own, except in the cases
where it is followed by a continuation of the same statement, ie ... an
else in an if-statement ...
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces
Fixed automatically with:
:!git grep --files-with-matches "^\s*else[^{]*{" | xargs
:argadd <files>
:argdo :%s/}\s*\n\s*\(else[^{]*{\)/} \1/g | update
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
In real-life cases the pipe_ino parameter can be larger than INT_MAX,
but autofs_parse() parses it with atoi(), which returns a 4-byte signed
integer. This is a bug.
Example of mount info from real case:
(00.508286) type autofs source /etc/auto.misc mnt_id 2824 s_dev 0x4b9 / @
./misc flags 0x300000 options fd=5,pipe_ino=3480845226,pgrp=95929,timeout=300,
minproto=5,maxproto=5,indirect
3480845226 > 2147483647 (32-bit wide signed int max value) => we have a problem
It causes an error:
(03.195915) Error (criu/pipes.c:529): The packetized mode for pipes is not supported yet
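A minimal sketch of parsing such a value without atoi(), whose behavior is undefined when the result does not fit in an int. The helper name parse_u64_opt is hypothetical and this is not necessarily how the fix is written in autofs_parse(); it only shows a range-checked conversion with strtoull():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the actual CRIU code): parse a decimal mount
 * option value such as "3480845226" into a 64-bit unsigned integer.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_u64_opt(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(s != NULL && out != NULL);

	/* strtoull() accepts leading spaces and a sign, so require a digit. */
	if (*s < '0' || *s > '9')
		return -1;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno)
		return -1;
	/* An option value ends at ',' or at the end of the string. */
	if (*end != '\0' && *end != ',')
		return -1;

	*out = (uint64_t)v;
	return 0;
}
```

With the value from the log above, parse_u64_opt("3480845226,pgrp=95929", &v) stores 3480845226 in v instead of overflowing.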
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
This commit introduces an optimization when rsti(t)->vma_io is empty.
This optimization allows streaming a non-seekable image as CR_FD_PAGES
is not reopened.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
When an image is opened but the open fails with an ENOENT error, the image is
still valid. Later on, do_pb_read_one() can fail and will invoke
image_name(). The image fd is EMPTY_IMG_FD (-404). read_fd_link fails.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
After restoring processes, we have to be sure that monotonic and
boottime clocks will not go backward. For this, we can restore processes
in a new time namespace and set proper offsets for the clocks.
In this patch, criu dumps clock values even when processes are running
in the host time namespace and, on restore, criu creates a new time
namespace, sets the dumped clock values and restores processes.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This test checks that monotonic and boottime don't jump after C/R.
In ns and uns flavors, the test is started in a separate time namespace
with big offsets, so if criu restores a time namespace incorrectly,
the test will detect a big delta between clock values before and after C/R.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The time namespace allows for per-namespace offsets to the system
monotonic and boot-time clocks.
C/R of time namespaces is very straightforward. On dump, criu enters a
target time namespace and dumps the current clock values, then on restore,
criu creates a new namespace and restores the clock values.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Hope I have enough experience in the project to be nominated. I want to
help with review and will try to do my best in it.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The struct memfd_inode has a union for dump and restore parts.
The only common parts are the list_head node, and the inode id.
Suggested-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Per-object image is acceptable if we expect to have 1-3 objects
per-container. If we expect to have more objects, it is better to save
them all into one image. There are a number of reasons for this:
* We need fewer system calls to read all objects from one image.
* It is faster to save or move one image.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
After running make install, a build directory is generated but not
ignored by .gitignore. This commit adds the build directory to .gitignore.
Signed-off-by: Byeonggon Lee <gonny952@gmail.com>
Fix n_xid_map leaks on error path and remove useless exit_code.
Fixes: 6e1726f8 ("userns: set uid and gid before entering into userns")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The helper function removes code duplication from tests that want to
initialize unix socket address to an absolute file path, derived from
current working directory of the test + relative filename of a resulting
socket. Because the former code used cwd = get_current_dir_name() as
part of absolute filename generation, the resulting filepath could later
cause failure of the bind syscall due to unchecked permissions and
introduce confusing permission errors.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
Any filesystem syscall that needs to navigate to an inode by its
absolute path performs successive lookup operations for each part of the
path. Each lookup operation includes an access rights check.
Usually, but not always, zdtm test processes fall under the 'other' access
category. Also, directories usually don't have the 'x' bit set for other.
When the 'x' bit is not set and the user ID and group ID of a process
relate it to 'other', tests will not succeed in performing these
syscalls, which make up most of the filesystem API that takes a
const char *path argument (open, openat, mkdir, bind, etc.).
The observable behavior is that zdtm tests fail at file
creation ops on one system and pass on another. The cause is not
immediately clear to the developer from the failed test's logs.
Investigating it is also slow for a developer because of the
complex structure of the zdtm runtime, where nested clones with
NAMESPACE flags take place alongside bind-mounts.
As an additional note: 'get_current_dir_name' is documented as returning
EACCES when some part of the path lacks read/list permissions.
But in fact it's not always so. Practice shows that test processes can
get a false success on this operation, only to fail on a later call to
something like mkdir/mknod/bind with the given path in its arguments.
'get_cwd_check_perm' is a wrapper around 'get_current_dir_name'. It also
checks for permissions on the given filepath and logs the error. This
directs the developer towards the right investigation path or even
eliminates the need for investigation completely.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
Here is a fast path for when two consecutive vma-s share the same file.
But one of these vma-s can map the file with MAP_SHARED while the other
maps it with MAP_PRIVATE, and we need to take this into account.
Any shared memory mapping can be opened via /proc/self/map_files/.
Such file descriptors look like memfd file descriptors, so
they can be dumped in the same way.
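As an illustration, entries under /proc/<pid>/map_files/ are named after the hexadecimal start and end addresses of the mapping. A minimal sketch of building such a path follows; the helper name map_files_path is hypothetical and not taken from CRIU:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "/proc/self/map_files/<start>-<end>" for a
 * vma. Returns the snprintf() result, so a value >= len means truncation.
 */
int map_files_path(char *buf, size_t len, unsigned long start,
		   unsigned long end)
{
	assert(buf != NULL);
	/* Addresses are printed in hex without the "0x" prefix. */
	return snprintf(buf, len, "/proc/self/map_files/%lx-%lx", start, end);
}
```

The resulting path can then be passed to open(); opening it may require additional privileges depending on the kernel.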
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The config.h detection scripts should use the provided CFLAGS/LDFLAGS
as they try to link libnl, libnet, and others.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
We should ignore (not parse) images that have a non-crtool format;
such images have no magic number (RAW_IMAGE_MAGIC equals 0).
nftables images have a format compatible with the `nft -f /proc/self/fd/0`
input format.
Reported-by: Mr Jenkins
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
The file only includes other headers (which may not be needed).
If we aim for one-include-for-compel, we could instead paste all
subheaders into "compel.h".
Rather, I think it's worth migrating to more fine-grained compel
headers than following the strategy 'one header to rule them all'.
Further, the header creates problems for cross-compilation: it's
included in files that are used by host-compel, which rightfully
confuses the compiler/linker as host's definitions for fpu regs/other
platform details get drained into host's compel.
Signed-off-by: Dmitry Safonov <dima@arista.com>
The plan is to remove "compel.h". That file only includes other headers
(which may not be needed). If we aim for one-include-for-compel, we
could instead paste all subheaders into "compel.h".
Rather, I think it's worth migrating to more fine-grained compel
headers than following the strategy 'one header to rule them all'.
Further, the header creates problems for cross-compilation: it's
included in files that are used by host-compel, which rightfully
confuses the compiler/linker as host's definitions for fpu regs/other
platform details get drained into host's compel.
As a first step - stop including "compel.h" in criu.
Signed-off-by: Dmitry Safonov <dima@arista.com>
To really open a symlink file and not the regular file it points to, one
needs to call open with the O_PATH|O_NOFOLLOW flags. It looks like systemd
sometimes opens the /etc/localtime symlink this way; before that nobody
actually used this, so we never supported it in CRIU.
Error (criu/files-ext.c:96): Can't dump file 11 of that type [120777]
(unknown /etc/localtime)
It looks quite easy to support, as c/r of a symlink file is almost
the same as c/r of a regular one. We only need to make fstatat not
follow links in check_path_remap.
Also we need to take into account support of ghost symlinks.
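A minimal sketch of the two calls involved, assuming Linux with O_PATH support. The helper names are hypothetical and this is not the code of check_path_remap; it only shows how to get a descriptor for the link itself and how to stat the link rather than its target:

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the symlink itself rather than its target.
 * Returns an O_PATH descriptor, or -1 with errno set.
 */
int open_symlink_itself(const char *path)
{
	assert(path != NULL);
	return open(path, O_PATH | O_NOFOLLOW);
}

/* Returns 1 if path is a symlink, 0 if it is not, -1 on error. */
int path_is_symlink(const char *path)
{
	struct stat st;

	assert(path != NULL);
	/* AT_SYMLINK_NOFOLLOW makes fstatat() describe the link itself. */
	if (fstatat(AT_FDCWD, path, &st, AT_SYMLINK_NOFOLLOW))
		return -1;
	return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

An fstat() on the descriptor returned by open_symlink_itself() reports a symlink mode, which matches the [120777] type in the error message above.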
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
Co-developed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
In these tests, without the patch ("fown: Don't fail on dumping files opened
wit O_PATH") we trigger these errors:
Error (criu/pie/parasite.c:340): fcntl(4, F_GETOWN_EX) -> -9
Error (criu/files.c:403): Can't get owner signum on 18: Bad file descriptor
Error (criu/files-reg.c:1887): Can't restore file pos: Bad file descriptor
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
O_PATH opened files are special: they have empty
file operations in kernel space, so there is not
much we can do with them; even setting the position is
not allowed. The same applies to the signal number in the
owner settings.
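A minimal sketch of how such descriptors can be detected and skipped, assuming Linux. The helper names are hypothetical, and reporting position 0 for an O_PATH descriptor is an assumption made for this example, not necessarily what CRIU stores:

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd was opened with O_PATH, 0 if not, -1 on error. */
int fd_is_opath(int fd)
{
	/* F_GETFL is one of the few fcntl commands allowed on O_PATH fds. */
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_PATH) ? 1 : 0;
}

/* Query the file position only where the kernel allows it. */
int get_file_pos(int fd, off_t *pos)
{
	int ret = fd_is_opath(fd);

	assert(pos != NULL);
	if (ret < 0)
		return -1;
	if (ret) {
		/* lseek() on an O_PATH fd would fail, so skip it. */
		*pos = 0;
		return 0;
	}
	*pos = lseek(fd, 0, SEEK_CUR);
	return *pos == (off_t)-1 ? -1 : 0;
}
```

The same fd_is_opath() test could guard the F_GETOWN_EX and owner signal queries.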
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Co-developed-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
python 2.7 doesn't call the read system call again once it has read a file to
the end. The following seek works around this problem.
inhfd/memfd.py hangs due to this issue.
Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Right now, criu uses a dumped fd to dump content of a memfd "file".
Here are two reasons why we should not do this:
* the state of a dumped fd must not be changed, but now criu calls
lseek on it. This can be worked around by using pread.
* a dumped descriptor can be write-only.
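For the first point, a minimal sketch of reading content with pread(), which leaves the descriptor's file position untouched. The helper name read_at is hypothetical; note that this does not help with the second point, since pread() still fails on a write-only descriptor:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes starting at offset off
 * without changing the file position of fd.
 * Returns the number of bytes read (short only at EOF), or -1 on error.
 */
ssize_t read_at(int fd, void *buf, size_t len, off_t off)
{
	size_t done = 0;

	assert(buf != NULL);
	while (done < len) {
		ssize_t n = pread(fd, (char *)buf + done, len - done,
				  off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) /* end of file */
			break;
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

After read_at() returns, lseek(fd, 0, SEEK_CUR) reports the same position as before the call.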
Reported-by: Mr Jenkins
Cc: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
The runc test cases are (sometimes) mounting a cgroup inside of the
container. For these tests to succeed, let CRIU know that cgroup2 exists
and how to restore such a mount.
This does not fix any specific cgroup2 settings, it just enables CRIU to
mount cgroup2 in the restored container.
Signed-off-by: Adrian Reber <areber@redhat.com>
More preparations for cgroupv2 freezer. Factor out the freezer state
opening and writing to have one location where to handle v1 and v2
differences.
Signed-off-by: Adrian Reber <areber@redhat.com>
The cgroupv2 freezer does not return the same strings as v1. Instead of
THAWED and FROZEN, v2 returns 0 and 1 (strings). This prepares the seize
code to use 0 and 1 everywhere and THAWED and FROZEN only for v1
specific code paths.
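A minimal sketch of the normalization this implies. The helper name is hypothetical and not the function used in the seize code; the caller is assumed to have stripped the trailing newline from what it read from the v1 freezer.state or v2 cgroup.freeze file:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: map a freezer state string to the v2 form.
 * v1 reports "THAWED"/"FROZEN", v2 reports "0"/"1".
 * Returns "0", "1", or NULL for anything else (for example the
 * transient v1 state "FREEZING").
 */
const char *freezer_state_to_v2(const char *state)
{
	assert(state != NULL);
	if (!strcmp(state, "THAWED") || !strcmp(state, "0"))
		return "0";
	if (!strcmp(state, "FROZEN") || !strcmp(state, "1"))
		return "1";
	return NULL;
}
```

Generic code can then compare against "0" and "1" only, leaving the THAWED and FROZEN strings to the v1-specific read and write paths.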
Signed-off-by: Adrian Reber <areber@redhat.com>