This way users would be able to create more meaningful pull-requests
and issues. And we would not need to ask them to provide basic
information each time.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previously kerndat_uffd could not differentiate -EPERM and -1 returned
from uffd_open(). That way "Failed to get uffd API" and "Incompatible
uffd API ..." errors were just ignored, which is probably not what we
want.
v2: rework with extra argument of uffd_open for errno, rename err
label in uffd_open for readability
Fixes: cfdeac4a4 ("kerndat: Handle non-root mode when checking uffd")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Now that the Ubuntu kernel is no longer broken with regard to
overlayfs, let's switch back to overlayfs instead of devicemapper and
vfs graphdrivers.
Signed-off-by: Adrian Reber <areber@redhat.com>
There are several problems with the loop.sh script. First, the code is
duplicated across tests in the so-called 'others' category. Second, we
need to run it with the 'setsid' utility to make sure that it runs in
a new session. Third, we have to redirect the standard file descriptors
and use the '&' operator to make it run in the background. Finally,
obtaining the PID of the 'loop.sh' process resulted in a race condition.
In this patch we replace the loop.sh script with a program that would
address all problems mentioned above. The requirements for this program
are as follows.
- It must be reusable across tests
- It must start a process that is detached from the current shell
- It must wait for the process to start and output its PID
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The function name '_exit' is misleading as this function doesn't
actually exit when the status of the previous command is zero.
In addition, the behaviour of this function is not really needed.
This patch removes the '_exit' function and applies the correct
behaviour to stop the test on failure.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
One should never rely on errno if a libc syscall is successful. We can
either see an errno set by some previous failed syscall or even an errno
set by this successful libc syscall. So let's check ret first.
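A small Python sketch of the same rule, using ctypes. call_ok() is a
hypothetical helper and the stale ENOENT value is planted on purpose;
this is not the CRIU code, which is C.

```python
import ctypes
import errno
import os

# use_errno=True makes ctypes keep a per-thread copy of the C errno.
libc = ctypes.CDLL(None, use_errno=True)


def call_ok(ret):
    """Decide success from the return value only, never from errno."""
    return ret >= 0


ctypes.set_errno(errno.ENOENT)  # pretend an earlier call has failed
ret = libc.getpid()             # this call succeeds
stale = ctypes.get_errno()      # nothing guarantees that this was reset
succeeded = call_ok(ret)        # so only ret is consulted
```

Only when ret reports a failure is it meaningful to look at errno.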
Fixes: 1ccdaf47 ("criu: add pidfd based pid reuse detection for RPC
clients")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This is just a symlink to the original transition/pid_reuse test with
the right options passed to trigger the pidfd store based pid reuse
detection code path.
Pidfd store based detection is supported only in RPC mode, which
requires passing a unix socket fd to be used as the pidfd store. The
kernel must also support the pidfd_open and pidfd_getfd syscalls
({'feature': 'pidfd_store'}) for this test to work.
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
When testing pid reuse using the pidfd_store feature in RPC mode, we
need to pass a unix socket fd to CRIU in the RPC option pidfd_store_sk;
CRIU uses it to store the pidfds between predump/dump iterations.
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
Closes: #717
This increases the reliability of pid reuse detection using pidfds,
currently through RPC migration tools like P.Haul.
A connectionless unix socket is passed to criu in RPC mode through
the RPC option pidfd_store_sk.
If this option is set, the socket is initialized in
init_pidfd_store_sk() to be used as a queue for task pidfds.
criu then sends task pidfds to this socket in send_pidfd_entry()
and receives them in the next pre-dump/dump iteration to build
the pidfds hashtable in init_pidfd_store_hash().
These pidfds will be used later in detect_pid_reuse().
How it should be used in migration tools like P.Haul:
- Open a connectionless unix socket
- Pass the socket fd in the RPC option pidfd_store_sk when
doing a pre-dump or dump
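The client side of these two steps could look roughly like the Python
sketch below. open_pidfd_store_socket() is a hypothetical helper, and
how the fd is attached to the request depends on the RPC bindings in
use, so that part is only described in a comment.

```python
import socket


def open_pidfd_store_socket():
    """Create the connectionless (datagram) unix socket handed to criu."""
    return socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)


sk = open_pidfd_store_socket()
store_fd = sk.fileno()
# Assumption: store_fd is then placed in the pidfd_store_sk option of
# every pre-dump and dump request. The socket has to stay open across
# iterations, since it carries the pidfds from one iteration to the next.
```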
This will fail if the kernel does not support pidfd_open or
pidfd_getfd syscalls, so pidfd_store_sk should not be set if the
kernel does not support pidfd_open.
This could be checked with:
CLI: criu check --feature pidfd_store
RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to
true in the "features" field of the request
v2:
- add reasonable polling restart limit in check_pidfd_entry_state
to avoid getting stuck
- avoid leaking pidfd in send_pidfd_entry when entry is NULL,
otherwise pidfds are freed in free_pidfd_store
v3:
- check that the passed pidfd store is not empty after
the first iteration (i.e. --prev-images-dir option set).
v4:
- clear pidfd_hash heads
- check entry allocation error in init_pidfd_store_hash()
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_store, which will be used for reliable pidfd based pid reuse
detection for RPC clients, requires two recent syscalls (pidfd_open
and pidfd_getfd).
We allow checking if pidfd_store is supported using:
1. CLI: criu check --feature pidfd_store
2. RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to
true in the "features" field of the request
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
The pidfd_store_sk option will be used later to store task pidfds
between predumps to detect pid reuse reliably.
pidfd_store_sk should be a fd of a connectionless unix socket.
init_pidfd_store_sk() steals the socket from the RPC client using
pidfd_getfd, checks that it is a connectionless unix socket and
checks whether it has been initialized before (an unnamed socket has
not). If it has not been initialized, the socket is first bound to an
abstract name (a combination of the real pid/fd to avoid overlap) and
then connected to itself, which allows us to store the pidfds in the
receive queue of the socket (this is similar to how fdstore_init()
works).
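The self-connected socket trick can be tried out with the Python sketch
below (Linux only). make_fd_queue(), push_fd() and pop_fd() and the
abstract name format are made up for the example; a pipe fd stands in
for a pidfd.

```python
import array
import os
import socket


def make_fd_queue():
    """Bind a datagram unix socket to an abstract name, connect it to itself."""
    sk = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # A leading NUL byte selects the abstract namespace; pid and fd keep
    # the name unique, as the commit message describes.
    addr = b"\0example-fd-queue-%d-%d" % (os.getpid(), sk.fileno())
    sk.bind(addr)
    sk.connect(addr)  # datagrams we send now land in our own receive queue
    return sk


def push_fd(sk, fd):
    """Queue a file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sk.sendmsg([b"\0"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def pop_fd(sk):
    """Take the oldest queued file descriptor back out of the socket."""
    fds = array.array("i")
    _, ancdata, _, _ = sk.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise OSError("no file descriptor received")
    return fds[0]


queue = make_fd_queue()
```

A descriptor pushed with push_fd(queue, fd) stays in the kernel queue
until pop_fd(queue) returns a new descriptor for the same open file.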
v2:
- avoid close(pidfd) overriding errno of SYS_pidfd_open in
init_pidfd_store_sk()
- close pidfd_store_sk because we might have leftover from
previous iterations
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_getfd syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- check size written/read of val_a/val_b is correct
- return with error when val_a != val_b
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_open syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- make kerndat_has_pidfd_open void since 0 is always returned
- fix missing tabs in syscall tables
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
When criu is run as a non-root user, it fails and exits because kerndat_uffd() returns -1.
This, in turn, happens after uffd = syscall(SYS_userfaultfd, flags); which only works
for root.
With this change, criu ignores the permission error and proceeds, just as it already does
for e.g. pagemap checking.
Signed-off-by: Nithin Jaikar J <jaikar006@gmail.com>
This commit extends the CRIT tests to cover the 'x' command, which is
used to explore an image directory.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This is the workaround for #1429.
The parasite code contains instructions that trigger SIGTRAP to stop at
certain points. In such cases, the kernel sends a forced SIGTRAP that
can't be ignored, and if it is blocked, the kernel resets its signal
handler to the default one and unblocks it. It means that if we want to
save the origin signal handle
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This testcase reproduces a deadlock on the "wait_fds_event" futex in the
open_fdinfos() function (files subsystem).
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
This patch fixes a deadlock that appears on ghost DGRAM unix sockets.
The problem is that the wake_connected_sockets() function *should* be called
strictly after fle->stage >= FLE_OPEN.
Explanation:
Consider the situation where we have a ghost unix DGRAM socket (the peer socket)
and several sockets that are connected to this peer socket.
How does restore of that picture work?
In the files subsystem we have the open_fdinfos(pstree_item*) function that calls the open_fd()
function for *every* fd of a task. The open_fd() function calls the appropriate
file descriptor "open" handler, which may return "1", meaning "try again later".
This retcode means that some additional resources are needed to fully restore the file
descriptor. For *ghost* UNIX sockets, for instance, we need to have the peer socket
file descriptor *before* we can open and restore client sockets. Here is the main problem.
open_fdinfos() is called from separate tasks simultaneously, so when we get the "1" retcode
we sleep on a futex (the wait_fds_event() function), waiting for another task
to restore some resource and notify us that we can retry opening the file descriptor.
With a *ghost* UNIX socket I've managed to catch the following behaviour.
1. In one task (which holds the client socket) open_fdinfos() called open_fd(), which called
open_unixsk_standalone(). In open_unixsk_standalone() we have a check that means
"if the socket has a peer and that peer is GHOST and that peer's fle->stage < FLE_OPEN",
return "try again". OK. So, this task will sleep in wait_fds_event().
2. The second task, which holds the *peer*, tries to open the peer socket fd. So,
it also calls open_fd() -> open_unixsk_standalone() -> opening the socket
-> bind_unix_sk() -> in bind_unix_sk() we have a call to wake_connected_sockets().
So, after that call we "wake up" the task from the first point and it may proceed
with fd restoring. Yes? No. In the first point we need "peer_fle->stage >= FLE_OPEN",
but fle->stage of our peer socket becomes FLE_OPEN only in open_fd(). After we
return from open_unixsk_standalone() we proceed to setup_and_serve_out(), where the
appropriate stage change happens.
Only a very small amount of time should pass between the call of wake_connected_sockets()
and the moment when we set the stage to FLE_OPEN. But it may happen that we "wake up"
tasks that hold client sockets before we have had enough time to change fle->stage
to FLE_OPEN. That is exactly the case I've managed to reproduce.
(Really, the ossec-hids application managed to reproduce this problem first %) )
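The ordering requirement can be modelled with a condition variable, as
in the Python sketch below. This is only an analogy: the Peer class and
its methods are hypothetical, the numeric stage values are assumptions,
and CRIU itself uses futexes in C.

```python
import threading

# Stage names follow the commit message; the values are assumptions.
FLE_INITIALIZED, FLE_OPEN = 0, 1


class Peer:
    def __init__(self):
        self.stage = FLE_INITIALIZED
        self.cond = threading.Condition()

    def open_and_wake(self):
        """Publish the new stage first, and only then wake the waiters."""
        with self.cond:
            self.stage = FLE_OPEN
            self.cond.notify_all()

    def wait_open(self, timeout):
        """Sleep until the peer has reached FLE_OPEN or the timeout expires."""
        with self.cond:
            return self.cond.wait_for(lambda: self.stage >= FLE_OPEN, timeout)
```

If the wake-up were sent before the stage change, a waiter could
re-check the stage, still see it below FLE_OPEN, and go back to sleep
with nobody left to wake it again.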
v1: file_desc_ops->on_stage_change callback was introduced,
sk-unix ghost code reworked so as to call wake_connected_sockets() strictly
after changing fle->stage to FLE_OPEN.
v2: implementation replaced with a shorter and more practical patch by Andrei
Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
On modern Linux distributions, iptables binaries use the new nftables
API. We dump iptables rules using "iptables-save", and nftables
rules using the libnftables API. This breaks network unlock on modern systems
because, technically, we dump the rules (including network lock rules) twice.
There is another problem: the host can run a modern distribution, while
a Docker container uses iptables with the netfilter (legacy) API.
In this case the legacy rules are skipped.
This patch handles all of these cases. It tries to find the legacy iptables
binaries and dumps legacy rules with them, and dumps nftables
rules using libnftables.
Fixes #1435
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
CRIU used to check for the existence of /sys/fs/selinux to see if
SELinux is enabled on a system. We have seen systems with SELinux kind
of enabled, but where reading out the labels does not return real labels.
To work around this, this commit adds a check during LSM detection
that SELinux labels are in the right format. For CRIU this check means
verifying that there are at least 3 ':' in a label. If not, CRIU
switches to no-LSM mode.
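The format check itself is simple enough to sketch in Python.
looks_like_selinux_label() is a hypothetical name and the sample labels
are made up; the real check lives in CRIU's C code.

```python
def looks_like_selinux_label(label):
    """A usable SELinux label has at least three ':' separators."""
    return label.count(":") >= 3


samples = {
    "system_u:system_r:container_t:s0:c1,c2": True,
    "unconfined_u:unconfined_r:unconfined_t:s0": True,
    "kernel": False,
}
```

A label that fails this test would make the caller fall back to no-LSM
mode instead of trying to dump or restore bogus labels.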
Signed-off-by: Adrian Reber <areber@redhat.com>
As of January 1st, 2020 Python 2 is no longer supported and
many distributions no longer provide packages for Python 2
dependencies.
This patch allows CRIU to use Python 3 by default when both
major versions are available on the system.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
CRIU follows the Linux kernel coding style. This patch updates the
architecture-specific code for MIPS to use tab indentation,
adds whitespace between the closing parenthesis and the opening bracket,
and changes the mode of source files from 755 to 644.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
When invoking an action-script, all file descriptors >= 3 are closed.
If execvp() fails, we can only log the error on stderr. pr_msg() outputs
on stderr, so we use this as opposed to pr_perror().
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Unlike pr_perror, pr_err and other macros do not append \n
to the message being printed, so the caller needs to take care of it.
Sometimes it was not done, so let's add this manually.
To make sure it won't happen again, add a line to Makefile under the
linter target to check for such missing \n. NOTE this check is only
done for part of such cases (where the pr_* statement fits in one line
and there's no comment after), but it's better than nothing.
Add comments after pr_msg and pr_info statements where we deliberately
don't add \n, so that the above check ignores them.
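To show what such a partial check amounts to, here is a hypothetical
Python equivalent. It is not the rule that was added to the Makefile:
the macro list, the regular expression and missing_newline() are all
assumptions made for this sketch.

```python
import re

# Matches a one-line pr_* statement whose first argument is a string
# literal and which has nothing (so no comment) after the final ");".
STMT = re.compile(
    r'^\s*pr_(?:err|warn|info|msg)\(\s*"((?:[^"\\]|\\.)*)"[^"]*\);\s*$'
)


def missing_newline(line):
    """True if the statement fits the pattern but its message lacks \\n."""
    m = STMT.match(line)
    return bool(m) and not m.group(1).endswith("\\n")
```

Statements spanning several lines, or taking another string literal as
an argument, are not matched at all, which mirrors the "only part of
such cases" caveat above. A trailing comment makes the line fall outside
the pattern, so deliberately unterminated messages are left alone.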
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
My editor (vim) auto-removes whitespace at EOL for *.c and *.h files,
and I think it makes sense to have a separate commit for this, rather
than littering other commits with such changes.
To make sure this won't pile up again, add a line to Makefile under
the linter target to check for such things (so CI will fail).
This is all whitespace except an addition to Makefile.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use pr_perror where errno needs to be shown.
2. Use pr_err in cases where errno is not set
by the previous failed call.
3. Make sure pr_err's first argument does not have \n.
4. While at it, fix some error messages.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
On my system (shellcheck v0.7.1) make lint shows a few warnings about
needing to quote variables.
Fix those.
PS I am not sure why those are not shown by GHA CI; I assume a
different shellcheck version is used there. Add shellcheck --version to
the appropriate Makefile target to avoid confusion.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In many cases developers forget that pr_perror and fail macros
are a bit special, in particular:
1. they already show errno;
2. they already append \n to the message.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Macro fail() already prints the value of errno, so there's no need to
explicitly add it.
Found by git grep '^\s*\<fail\>.*errno'
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Macro fail already appends errno and strerror(errno) to the error
message, so there's no need to do it explicitly.
Brought to you by
	for f in $(git grep -l fail test/zdtm); do
		test -f $f || continue
		echo $f
		sed -i '\|^[[:space:]]*fail(.*[ (]%m)*"|s/:*[ (]*%m)*//' $f
	done
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Macro fail already appends \n to the message, so there's no need to do
it explicitly.
Brought to you by
	for f in $(git grep -l fail test/zdtm); do
		test -f $f || continue
		echo $f
		sed -i '\%^[[:space:]]*fail(.*\\n"%s/\\n"/"/' $f
	done
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Macro pr_perror already adds errno and its string representation to the
error message, so there's no need to explicitly do it.
While at it, fix some error messages.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
pr_perror should be used for cases where the failed operation sets
errno. For cases where errno is not set, pr_err is preferable.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>