The --post-start hook creates a netns which the test should enter
at the beginning of the test.
The test failed randomly in CI, most likely because of a race
condition.
I suspect the following flow is the root cause:
1. --post-start hook starts just after the test (in parallel)
2. --post-start hook calls ip netns add to create the test netns
3. ip creates the netns file
4. netns_lock test opens that file and uses it in setns
5. ip mounts the netns to the file
The test fails at step 4 because the netns is not yet mounted
to the file.
I made the test wait for SYNCFILE to be created by the --post-start
hook before it tries to open the netns file and call setns.
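The ordering fix can be illustrated with a small polling helper. This is only a sketch in Python (the real test is written in C); the function name `wait_for_file` and its timeout parameters are hypothetical and not taken from the test.

```python
import os
import time


def wait_for_file(path, timeout=10.0, interval=0.05):
    """Poll until `path` exists; return True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Only after such a wait succeeds would the test open the netns file and call setns, which closes the window between steps 3 and 5.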
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
The mips64el-cross test target started to show the following error:
error: listing the stack pointer register '$29' in a clobber list is deprecated [-Werror=deprecated]
This fixes it in three different places by removing '$29' from the
clobber list. This is only compile tested as we have no mips hardware
for testing.
Signed-off-by: Adrian Reber <areber@redhat.com>
If inherit-fd is read from a config file its buffer will be freed
after the config file is parsed but before task restore, which is
when we need to use the mapping. Therefore, when adding an
inherit-fd mapping to the opts list, copy the key string to a new
buffer.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
crit breaks when `crit show ipcns-shm-9.img` is executed. Output:
{
"magic": "IPCNS_SHM",
"entries": [
{
"desc": {
"key": 0,
"uid": 0,
"gid": 0,
"cuid": 0,
"cgid": 0,
"mode": 438,
"id": 0
},
"size": 1048576,
"in_pagemaps": true,
"extra": Traceback (most recent call last):
File "/usr/bin/crit", line 6, in <module>
cli.main()
File "/usr/lib/python3/dist-packages/pycriu/cli.py", line 412, in main
opts["func"](opts)
File "/usr/lib/python3/dist-packages/pycriu/cli.py", line 45, in decode
json.dump(img, f, indent=indent)
File "/usr/lib/python3.9/json/__init__.py", line 179, in dump
for chunk in iterable:
File "/usr/lib/python3.9/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/usr/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.9/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/usr/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
This is caused by `img['entries'][0]['extra']`, which is bytes. Other
load conditions have the same problem and are fixed at the same time.
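A minimal sketch of the general technique: convert bytes values to text before handing the structure to the json module. The helper name `encode_bytes` and the choice of base64 are assumptions made for illustration, not necessarily what the patch does.

```python
import base64
import json


def encode_bytes(obj):
    """Recursively replace bytes values with base64 text so json accepts them."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(bytes(obj)).decode("ascii")
    if isinstance(obj, dict):
        return {k: encode_bytes(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode_bytes(v) for v in obj]
    return obj


# Made-up image contents shaped like the output above.
img = {"magic": "IPCNS_SHM", "entries": [{"size": 4, "extra": b"\x00\x01\x02\x03"}]}
text = json.dumps(encode_bytes(img))
```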
Signed-off-by: fu.lin <fulin10@huawei.com>
This change is motivated by checkpointing and restoring containers in
Pods.
When restoring a container into a new Pod the SELinux label of the
existing Pod needs to be used and not the SELinux label saved during
checkpointing.
The option --lsm-profile already enables changing of process SELinux
labels on restore. However, if tmpfs mounts were checkpointed, they
will be mounted during restore with the same context as during
checkpointing. This can look like the following example:
context="system_u:object_r:container_file_t:s0:c82,c137"
On restore we want to change this context to match the mount label of
the Pod this container is restored into. Changing of the mount label
is now possible with the new option --mount-context:
criu restore --mount-context "system_u:object_r:container_file_t:s0:c204,c495"
This will lead to mount options being changed to
context="system_u:object_r:container_file_t:s0:c204,c495"
Now the restored container can access all the files in the container
again.
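The effect on the mount options can be sketched as a string rewrite. This is an illustration only, not CRIU's implementation (which is in C); `replace_mount_context` is a hypothetical helper, and the surrounding options in the usage example are made up.

```python
import re

# Match context="..." but not fscontext=, defcontext= or rootcontext=.
_CONTEXT_RE = re.compile(r'(?<![a-z])context="[^"]*"')


def replace_mount_context(options, new_context):
    """Return mount options with the context="..." entry replaced."""
    return _CONTEXT_RE.sub(lambda _m: 'context="%s"' % new_context, options)
```

For example, `replace_mount_context('rw,context="system_u:object_r:container_file_t:s0:c82,c137"', "system_u:object_r:container_file_t:s0:c204,c495")` yields the options with the new label. The quoted match matters because the label itself contains a comma.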
This has been tested in combination with runc and CRI-O.
Signed-off-by: Adrian Reber <areber@redhat.com>
When criu dumps a process in a network namespace it locks
the network so that no packets from the peer enter the stack;
otherwise the kernel would send an RST, causing the connection
to fail.
In netns_lock.c we try to enter the netns created by post-start
hook so that criu locks the network namespace between dump and
restore.
A TCP server is started in post-start hook inside the test netns
and runs in the background detached from its parent so that
it stays alive for the duration of the test.
Other hooks (pre-dump, pre-restore, post-restore) try to
connect to the server.
Pre-dump and post-restore hooks should be able to connect
successfully.
Pre-restore hook client with SOCCR_MARK should also connect
successfully.
Pre-restore hook client without SOCCR_MARK should not be able
to connect but also should not get connection refused as all
packets are dropped in the namespace so the kernel shouldn't
send an RST packet as a result. Instead we check that the
connect operation causes a timeout.
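The difference between "refused" and "timed out" that the hooks rely on can be sketched as follows. This is an illustrative Python helper, not the hook code itself; `probe_connect` and its return values are hypothetical names.

```python
import socket


def probe_connect(host, port, timeout=1.0):
    """Return 'connected', 'refused' (RST received) or 'timeout' (no reply)."""
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sk.settimeout(timeout)
    try:
        sk.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        sk.close()
```

With the namespace locked, a client without SOCCR_MARK would be expected to end in the 'timeout' branch, because dropped packets never produce an RST.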
This test would be useful when testing that the network is
locked using different ways (using iptables currently and
other methods later).
v2:
- check that packets with SOCCR_MARK are allowed to
pass when the netns is locked.
v3:
- fix pre-restore hook skipping non SOCCR_MARK
connection test due to early exit in SOCCR_MARK
variant.
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
criu exec has been deprecated for some time now and criu just exits with an
error if running 'criu exec'. This removes the test for that non-working
subcommand.
Signed-off-by: Adrian Reber <areber@redhat.com>
This adds a test run to ensure known (but fixed) configuration file
parser errors are not crashing CRIU anymore.
Based on missing test code coverage this script also tests code paths of
the option handling which have not been tested until now.
Signed-off-by: Adrian Reber <areber@redhat.com>
While testing how robust the configuration parser is, I was able to crash
CRIU pretty quickly. This fixes a few crashes in the existing
configuration file parser.
Signed-off-by: Adrian Reber <areber@redhat.com>
The callers of bread() and bwrite() assume the operation reads/writes
the complete length of the passed buffer.
We must loop when invoking the read()/write() APIs.
Fixes #1504
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
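The looping pattern can be sketched in Python with os.read()/os.write(), which may also transfer fewer bytes than requested. The helper names `read_full` and `write_full` are hypothetical; the actual fix is in CRIU's C code.

```python
import os


def read_full(fd, size):
    """Read exactly `size` bytes unless EOF comes first; return what was read."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF before the buffer was filled
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def write_full(fd, data):
    """Write all of `data`, retrying after short writes; return bytes written."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        written += os.write(fd, view[written:])
    return written
```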
Now that we are running CI on an actual CentOS 7 kernel different
tests are no longer working as they require newer kernels.
This commit disables a few tests only on CentOS 7.
Signed-off-by: Adrian Reber <areber@redhat.com>
This commit removes a couple of workarounds for old kernels and
distributions which we no longer use in CI.
Signed-off-by: Adrian Reber <areber@redhat.com>
On Cirrus CI we can run tests on the original CentOS 7 kernel.
The kernel is rather old, but on GitHub Actions we have a 5.8 kernel,
and with a containerized CentOS 7 user space not much is working
correctly anymore. With this commit CentOS 7 based tests are
no longer running on GitHub Actions but on Cirrus CI.
Signed-off-by: Adrian Reber <areber@redhat.com>
With this change tainted kernels can be ignored by setting
ZDTM_IGNORE_TAINT=1. This simplifies the CI script, which no longer
needs to change every call of zdtm. Setting the variable once should
be enough.
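A rough sketch of how such an environment override could be evaluated. The helper `taint_is_fatal` is hypothetical and zdtm's real logic may differ; only the variable name ZDTM_IGNORE_TAINT comes from this change.

```python
import os


def taint_is_fatal(tainted, environ=None):
    """Return True if a non-zero kernel taint value should fail the run."""
    if environ is None:
        environ = os.environ
    if environ.get("ZDTM_IGNORE_TAINT") == "1":
        return False
    return tainted != 0
```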
Signed-off-by: Adrian Reber <areber@redhat.com>
These files use $PKG_CONFIG before they include the common files that
set up a default, so set early defaults in them too.
Signed-off-by: Mike Frysinger <vapier@chromium.org>
The build needs to respect $PKG_CONFIG env var like other standard
build systems and the upstream pkg-config project itself. This
allows the package builder to point it to the right tool when doing
a cross-compile build. Otherwise the host pkg-config tool is used
which won't have access to the packages in the cross sysroot.
Signed-off-by: Mike Frysinger <vapier@chromium.org>
This patch improves the changes from 19be9ced9.
To use the newer version of containerd, we need to make sure that the
containerd service has been restarted after install. Instead of
hard-coding a version number, we can use the GitHub API to get the latest
release. In addition, the tar file contains all binary files in a
'./bin' sub-folder. Thus, it should be extracted in '/usr'.
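As an illustration of the lookup, the sketch below reads the `tag_name` field from GitHub's "latest release" REST endpoint. It is written in Python for clarity and is not the CI script itself; the function names are hypothetical.

```python
import json
import urllib.request

# Public GitHub REST endpoint for the latest release of a repository.
LATEST_URL = "https://api.github.com/repos/containerd/containerd/releases/latest"


def tag_from_release(payload):
    """Extract the release tag from the API's JSON response."""
    return json.loads(payload)["tag_name"]


def fetch_latest_tag(url=LATEST_URL):
    """Query GitHub; needs network access and is subject to API rate limits."""
    with urllib.request.urlopen(url) as resp:
        return tag_from_release(resp.read())
```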
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Although CentOS 8 comes with a 4.18 kernel, it has time namespace patches
backported, but not all the required ones. This disables time namespace
tests on everything older than 5.11.
Signed-off-by: Adrian Reber <areber@redhat.com>
The SET_CHAR_OPTS(__dest, __src) macro is essentially:
	free(opts.__dest);
	opts.__dest = xstrdup(__src);
So if __src is (or points into) opts.__dest, the string that gets copied
has already been freed. This is the case e.g. in criu/lsm.c:
	int lsm_check_opts(void)
	{
		char *aux;
		if (!opts.lsm_supplied)
			return 0;
		aux = strchr(opts.lsm_profile, ':');
		if (aux == NULL) {
			pr_err("invalid argument %s for --lsm-profile\n", opts.lsm_profile);
			return -1;
		}
		*aux = '\0';
		aux++;
		if (strcmp(opts.lsm_profile, "apparmor") == 0) {
			if (kdat.lsm != LSMTYPE__APPARMOR) {
				pr_err("apparmor LSM specified but apparmor not supported by kernel\n");
				return -1;
			}
			SET_CHAR_OPTS(lsm_profile, aux);
			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		} else if (strcmp(opts.lsm_profile, "selinux") == 0) {
			if (kdat.lsm != LSMTYPE__SELINUX) {
				pr_err("selinux LSM specified but selinux not supported by kernel\n");
				return -1;
			}
			SET_CHAR_OPTS(lsm_profile, aux);
			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		} else if (strcmp(opts.lsm_profile, "none") == 0) {
			xfree(opts.lsm_profile);
			opts.lsm_profile = NULL;
		} else {
			pr_err("unknown lsm %s\n", opts.lsm_profile);
			return -1;
		}
		return 0;
	}
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This way users would be able to create more meaningful pull-requests
and issues. And we would not need to ask them to provide basic
information each time.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previously kerndat_uffd could not differentiate -EPERM and -1 returned
from uffd_open(). That way "Failed to get uffd API" and "Incompatible
uffd API ..." errors were just ignored, which is probably not what we
want.
v2: rework with extra argument of uffd_open for errno, rename err
label in uffd_open for readability
Fixes: cfdeac4a4 ("kerndat: Handle non-root mode when checking uffd")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Now that the Ubuntu kernel is no longer broken with regards to
overlayfs, let's switch back to overlayfs instead of devicemapper and
vfs graphdrivers.
Signed-off-by: Adrian Reber <areber@redhat.com>
There are several problems with the loop.sh script. First, the code is
duplicated across tests in the so-called 'others' category. Second, we
need to run it with the 'setsid' utility to make sure that it runs in
a new session. Third, we have to redirect the standard file descriptors
and use the '&' operator to make it run in the background. Finally,
obtaining the PID of the 'loop.sh' process resulted in a race condition.
In this patch we replace the loop.sh script with a program that would
address all problems mentioned above. The requirements for this program
are as follows.
- It must be reusable across tests
- It must start a process that is detached from the current shell
- It must wait for the process to start and output its PID
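The three requirements above can be sketched with Python's subprocess module. This is only an illustration of the idea, not the program added by the patch; `spawn_detached` is a hypothetical name.

```python
import subprocess
import sys


def spawn_detached(argv):
    """Start argv in a new session with stdio detached; return its PID."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec
        close_fds=True,
    )
    # Popen returns only after the child has been started, so the PID
    # is valid here and there is no window in which it must be guessed.
    return proc.pid
```

A caller could print the result of `spawn_detached([sys.executable, "-c", "import time; time.sleep(3600)"])` and hand that PID to the test, which replaces the setsid, redirection and '&' handling previously repeated in each script.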
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The function name '_exit' is misleading as this function doesn't
actually exit when the status of the previous command is zero.
In addition, the behaviour of this function is not really needed.
This patch removes the '_exit' function and applies the correct
behaviour to stop the test on failure.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
One should never rely on errno if a libc syscall is successful. We can
either see an errno set by some previous failed syscall or even an errno
set by this successful libc syscall. So let's check ret first.
Fixes: 1ccdaf47 ("criu: add pidfd based pid reuse detection for RPC
clients")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This is just a symlink to the original transition/pid_reuse test with
the right options passed to trigger the pidfd store based pid reuse
detection code path.
Pidfd store based detection is supported only in RPC mode which
requires passing a unix socket fd to be used as pidfd store and
the kernel should support pidfd_open and pidfd_getfd syscalls
{'feature': 'pidfd_store'} for this test to work.
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
When testing pid reuse using the pidfd_store feature in RPC mode we need
to pass a unix socket fd to CRIU in the RPC option pidfd_store_sk,
which CRIU uses to store the pidfds between predump/dump iterations.
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
Closes: #717
This increases the reliability of pid reuse detection using pidfds,
currently through RPC migration tools like P.Haul.
A connectionless unix socket is passed to criu in RPC mode through
the RPC option pidfd_store_sk.
If this option is set, the socket is initialized in
init_pidfd_store_sk() to be used as a queue for task pidfds.
criu then sends task pidfds to this socket in send_pidfd_entry()
and receives them in the next pre-dump/dump iteration to build
the pidfds hashtable in init_pidfd_store_hash().
These pidfds will be used later in detect_pid_reuse().
How it should be used in migration tools like P.Haul:
- Open a connectionless unix socket
- Pass the socket fd in the RPC option pidfd_store_sk when
doing a pre-dump or dump
This will fail if the kernel does not support pidfd_open or
pidfd_getfd syscalls, so pidfd_store_sk should not be set if the
kernel does not support pidfd_open.
This could be checked with:
CLI: criu check --feature pidfd_store
RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to
true in the "features" field of the request
v2:
- add reasonable polling restart limit in check_pidfd_entry_state
to avoid getting stuck
- avoid leaking pidfd in send_pidfd_entry when entry is NULL,
otherwise pidfds are freed in free_pidfd_store
v3:
- check that the passed pidfd store is not empty after
the first iteration (i.e. --prev-images-dir option set).
v4:
- clear pidfd_hash heads
- check entry allocation error in init_pidfd_store_hash()
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_store which will be used for reliable pidfd based pid reuse
detection for RPC clients requires two recent syscalls (pidfd_open
and pidfd_getfd).
We allow checking if pidfd_store is supported using:
1. CLI: criu check --feature pidfd_store
2. RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to
true in the "features" field of the request
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
The pidfd_store_sk option will be used later to store task pidfds
between predumps to detect pid reuse reliably.
pidfd_store_sk should be a fd of a connectionless unix socket.
init_pidfd_store_sk() steals the socket from the RPC client using
pidfd_getfd, checks that it is a connectionless unix socket and
checks whether it has been initialized before (an uninitialized socket
is unnamed).
If not initialized the socket is first bound to an abstract name
(combination of the real pid/fd to avoid overlap), then it is
connected to itself hence allowing us to store the pidfds in the
receive queue of the socket (this is similar to how fdstore_init()
works).
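The "socket connected to itself" trick can be sketched in Python (Linux only, Python 3.9+ for socket.send_fds/recv_fds). This is an illustration, not CRIU's C implementation: the helper names are hypothetical, and it queues an arbitrary file descriptor where CRIU would queue pidfds.

```python
import os
import socket


def make_fd_queue(name):
    """Create a datagram unix socket bound to an abstract name and
    connected to itself, so its receive queue can hold descriptors."""
    sk = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sk.bind(b"\0" + name)         # leading NUL selects the abstract namespace
    sk.connect(sk.getsockname())  # loop back: sends land in our own queue
    return sk


def push_fd(sk, fd):
    """Queue a duplicate of fd in the socket's receive queue."""
    socket.send_fds(sk, [b"\0"], [fd])


def pop_fd(sk):
    """Take one queued descriptor back out."""
    _data, fds, _flags, _addr = socket.recv_fds(sk, 1, 1)
    return fds[0]
```

Because the descriptors live in the socket's receive queue, they survive as long as any process keeps the socket open, which is what lets them be carried from one pre-dump/dump iteration to the next.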
v2:
- avoid close(pidfd) overriding errno of SYS_pidfd_open in
init_pidfd_store_sk()
- close pidfd_store_sk because we might have leftover from
previous iterations
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_getfd syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- check size written/read of val_a/val_b is correct
- return with error when val_a != val_b
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
pidfd_open syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- make kerndat_has_pidfd_open void since 0 is always returned
- fix missing tabs in syscall tables
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
When criu is run as a non-root user it fails and exits because kerndat_uffd() returns -1.
This, in turn, happens after uffd = syscall(SYS_userfaultfd, flags); which only works
for root.
This change ignores the permission error and proceeds, just as is done
for e.g. pagemap checking.
Signed-off-by: Nithin Jaikar J <jaikar006@gmail.com>
This commit extends the CRIT tests to cover the 'x' command, which is
used to explore an image directory.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>