mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 22:05:36 +00:00

Author	SHA1	Message	Date
Pavel Tikhomirov	d0511319e5	github: Add templates for new issues and pull requests This way users would be able to create more meaningful pull-requests and issues. And we would not need to ask them to provide basic information each time. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	3c10d3335b	criu(8): document --join-ns option The --join-ns option was introduced with commits: https://github.com/checkpoint-restore/criu/commit/2cf17cd https://github.com/checkpoint-restore/criu/commit/790ec46 Closes: #1054 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Pavel Tikhomirov	80ee4f8aec	kdat: make uffd_open return errno from syscall separately Previously kerndat_uffd could not differentiate -EPERM and -1 returned from uffd_open(). That way "Failed to get uffd API" and "Incompatible uffd API ..." errors were just ignored, which is probably not what we want. v2: rework with extra argument of uffd_open for errno, rename err label in uffd_open for readability Fixes: `cfdeac4a4` ("kerndat: Handle non-root mode when checking uffd") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Adrian Reber	a8525c07d4	ci: no longer avoid overlayfs Now that the Ubuntu kernel is no longer broken with regards to overlayfs, let's switch back to overlayfs instead of devicemapper and vfs graphdrivers. Signed-off-by: Adrian Reber <areber@redhat.com>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	2aa4185a6c	test/others: refactor loop process There are several problems with the loop.sh script. First, the code is duplicated across tests in the so-called 'others' category. Second, we need to run it with the 'setsid' utility to make sure that it runs in a new session. Third, we have to redirect the standard file descriptors and use the '&' operator to make it run in the background. Finally, obtaining the PID of the 'loop.sh' process resulted in a race condition. In this patch we replace the loop.sh script with a program that would address all problems mentioned above. The requirements for this program are as follows. - It must be reusable across tests - It must start a process that is detached from the current shell - It must wait for the process to start and output its PID Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	2b78d95e6b	test/others: drop '_exit' function The function name '_exit' is misleading as this function doesn't actually exit when the status of the previous command is zero. In addition, the behaviour of this function is not really needed. This patch removes the '_exit' function and applies the correct behaviour to stop the test on failure. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Andrei Vagin	34410b9e75	test: add a test to check that sigtrap handlers are restored Signed-off-by: Andrei Vagin <avagin@gmail.com>	2021-09-03 10:31:00 -07:00
Andrei Vagin	b310fbd31f	ksigset: fix a typo in ksigdelset Fixes: `8063eb8fe6` ("parasite: don't block SIGTRAP") Reported-by: zl-wang <zlwang@ca.ibm.com> Signed-off-by: Andrei Vagin <avagin@gmail.com>	2021-09-03 10:31:00 -07:00
Pavel Tikhomirov	c1b2d194e9	mem/pidfd: fix poll retry error checking One should never rely on errno if libc syscall is successful. We can either see an errno set from some previous failed syscall or even errno set by this successful libc syscall. So let's check ret first. Fixes: 1ccdaf47 ("criu: add pidfd based pid reuse detection for RPC clients") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	1c08709cdb	zdtm: add pidfd store based pid reuse test This is just a symlink to the original transition/pid_reuse test with the right options passed to trigger the pidfd store based pid reuse detection code path. Pidfd store based detection is supported only in RPC mode which requires passing a unix socket fd to be used as pidfd store and the kernel should support pidfd_open and pidfd_getfd syscalls {'feature': 'pidfd_store'} for this test to work. Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	ea0dc7807a	zdtm: add --pidfd-store option in RPC mode When testing pid reuse using pidfd_store feature in RPC mode we need to pass a unix socket fd used to CRIU in the RPC option pidfd_store_sk to store the pidfds between predump/dump iterations. Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	e79131e8c3	criu: add pidfd based pid reuse detection for RPC clients Closes: #717 This increases the reliability of pid reuse detection using pidfds, currently through RPC migration tools like P.Haul. A connectionless unix socket is passed to criu in RPC mode through the RPC option pidfd_store_sk. If this option is set, the socket is initialized in init_pidfd_store_sk() to be used as a queue for task pidfds. criu then sends tasks pidfds to this socket in send_pidfd_entry() and receives them in the next pre-dump/dump iteration to build the pidfds hashtable in init_pidfd_store_hash(). These pidfds will be used later in detect_pid_reuse(). How it should be used in migration tools like P.Haul: - Open a connectionless unix socket - Pass the socket fd in the RPC option pidfd_store_sk when doing a pre-dump or dump This will fail if the kernel does not support pidfd_open or pidfd_getfd syscalls, so pidfd_store_sk should not be set if the kernel does not support pidfd_open. This could be checked with: CLI: criu check --feature pidfd_store RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to true in the "features" field of the request v2: - add reasonable polling restart limit in check_pidfd_entry_state to avoid getting stuck - avoid leaking pidfd in send_pidfd_entry when entry is NULL, otherwise pidfds are freed in free_pidfd_store v3: - check that the passed pidfd store is not empty after the first iteration (i.e. --prev-images-dir option set). v4: - clear pidfd_hash heads - check entry allocation error in init_pidfd_store_hash() Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	ba882893c3	cr-check: add ability to check if pidfd_store feature is supported pidfd_store which will be used for reliable pidfd based pid reuse detection for RPC clients requires two recent syscalls (pidfd_open and pidfd_getfd). We allow checking if pidfd_store is supported using: 1. CLI: criu check --feature pidfd_store 2. RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to true in the "features" field of the request Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	e3c9c3429a	cr-service: add pidfd_store_sk option to rpc.proto pidfd_store_sk option will be used later to store tasks pidfds between predumps to detect pid reuse reliably. pidfd_store_sk should be a fd of a connectionless unix socket. init_pidfd_store_sk() steals the socket from the RPC client using pidfd_getfd, checks that it is a connectionless unix socket and checks if it is not initialized before (i.e. unnamed socket). If not initialized the socket is first bound to an abstract name (combination of the real pid/fd to avoid overlap), then it is connected to itself hence allowing us to store the pidfds in the receive queue of the socket (this is similar to how fdstore_init() works). v2: - avoid close(pidfd) overriding errno of SYS_pidfd_open in init_pidfd_store_sk() - close pidfd_store_sk because we might have leftover from previous iterations Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	a9508c9864	criu: check if pidfd_getfd syscall is supported pidfd_getfd syscall will be needed later to send pidfds between pre-dump/dump iterations for pid reuse detection. v2: - check size written/read of val_a/val_b is correct - return with error when val_a != val_b Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
Zeyad Yasser	30e8d8cadf	criu: check if pidfd_open syscall is supported pidfd_open syscall will be needed later to send pidfds between pre-dump/dump iterations for pid reuse detection. v2: - make kerndat_has_pidfd_open void since 0 is always returned - fix missing tabs in syscall tables Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>	2021-09-03 10:31:00 -07:00
nithin-jaikar	5d08f975a1	kerndat: Handle non-root mode when checking uffd When criu is run as user it fails and exits because of kerndat_uffd() returning -1. This, in turn, happens after uffd = syscall(SYS_userfaultfd, flags); which only works for root. In the change it ignores the permission error and proceeds further just like it's done for e.g. pagemap checking. Signed-off-by: Nithin Jaikar J <jaikar006@gmail.com>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	8c303d1a64	test/others/crit: add test for 'x' This commit extends the CRIT tests to cover the 'x' command, which is used to explore an image directory. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	e393001095	lib/cli.py: Open explore file as a binary Fixes #1484 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Andrei Vagin	c8973d426b	test/zdtm: check that a pending SIGTRAP is handled properly Signed-off-by: Andrei Vagin <avagin@gmail.com>	2021-09-03 10:31:00 -07:00
Andrei Vagin	61c7cc5a92	parasite: don't block SIGTRAP This is the workaround for #1429. The parasite code contains instructions that trigger SIGTRAP to stop at certain points. In such cases, the kernel sends a forced SIGTRAP that can't be ignored, and if it is blocked, the kernel resets its signal handler to the default one and unblocks it. It means that if we want to save the origin signal handle Signed-off-by: Andrei Vagin <avagin@gmail.com>	2021-09-03 10:31:00 -07:00
Adrian Reber	ed58fb2214	test: create new tls certificates The certificates expired again. This replaces the expired certificates with test certificates which are valid for ever: echo -ne "ca\ncert_signing_key\nexpiration_days = -1\n" > temp certtool --generate-privkey > cakey.pem certtool --generate-self-signed \ --template temp \ --load-privkey cakey.pem \ --outfile cacert.pem echo -ne "cn=$HOSTNAME\nencryption_key\nsigning_key\nexpiration_days = -1\n" > temp certtool --generate-privkey > key.pem certtool --generate-certificate \ --template temp \ --load-privkey key.pem \ --load-ca-certificate cacert.pem \ --load-ca-privkey cakey.pem \ --outfile cert.pem rm cakey.pem temp Signed-off-by: Adrian Reber <areber@redhat.com>	2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn	6beeabcd42	zdtm: add sk-unix-dgram-ghost test case This testcase reproduces deadlock in "wait_fds_event" futex in open_fdinfos() function (files subsystem). Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn	2609e98ee7	sk-unix: ghost: fix deadlock between peer_fle->stage and fds wake up This patch fixes deadlock that appears on ghost DGRAM unix sockets. Problem is that wake_connected_sockets() function should be called strictly after fle->stage >= FLE_OPEN. Explanation: Consider situation, we have ghost unix DGRAM socket (peer socket), and also have several sockets that connected to this peer socket. How restore of that picture works? In files subsystem we have open_fdinfos(pstree_item) function that calls open_fd() function for every* fd of task. open_fd() function calls appropriate file descriptor "open" handler that may return "1" which means "try again later". This retcode means, that some additional resources is needed to fully restore file descriptor. For ghost UNIX sockets, for instance, we need to have peer socket file descriptor before we can open and restore client sockets. Here is the main problem. open_fdinfos() called from separate tasks simultaneously, so, when we get "1" retcode we stay on futex (wait_fds_event() function) and waiting for someone another task restore some resource and notify us that we can retry opening of file descriptor. With ghost UNIX socket I've managed to caught next behaviour. 1. From one task (that holds client socket) open_fdinfos() called open_fd() that called open_unixsk_standalone(). In open_unixsk_standalone we have check that means "if socket have peer and that peer is GHOST and that peer fle->stage < FLE_OPEN" return "try again". Ok. So, this task will stay on wait_fds_event(). 2. Second task, that holds peer tried to open peer socket fd. So, it also calls open_fd() -> open_unixsk_standalone() -> opening socket -> bind_unix_sk() -> in bind_unix_sk we have call to wake_connected_sockets(). So, after that call we will "wake up" task from first point and it may proceed fd restoring. Yes? No. In first point we need to "peer_fle->stage >= FLE_OPEN" but fle->stage of our peer socket will become FLE_OPEN in open_fd(). After we return from open_unixsk_standalone we proceed to setup_and_serve_out() where we have appropriate stage change. Between call of wake_connected_sockets and moment when we set stage to FLE_OPEN should pass very small amount of time. But it may happen, so we "wake up" tasks that holds client sockets but did not have enough time to change fle->stage to FLE_OPEN. Exactly that case I've managed to reproduce. (Really, ossec-hids application managed to reproduce this problem at first %) ) v1: file_desc_ops->on_stage_change callback was introduced, sk-unix ghost code reworked so that to call wake_connected_sockets() strictly after changing fle->stage to FLE_OPEN. v2: implementation replaced with short and more practical patch by Andrei Suggested-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn	655610e09a	ci: remove hack for netns-nft zdtm test Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn	ddefbbff16	zdtm: add combined nftables/iptables netns-nft-ipt test Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn	4696e61edb	zdtm: skip static/netns-nft test if nftables feature isn't supported Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn	d8821d9a8f	net: skip iptables dump if it has nft backend and nft dump is supported On modern Linux distributions, iptables binaries use the new nftables API. We dump iptables rules using "iptables-save", and nftables rules using the libnftables API. This breaks network unlock on modern systems because technically, we dump rules (including network lock rules) two times. There is another problem: on the host we can have a modern distribution, but in a Docker container we can use iptables with the netfilter (legacy) API. So, in this case these legacy rules will be skipped. This patch handles all of these cases. It tries to find iptables legacy and dump legacy rules by using appropriate iptables binaries, dump nftables rules by using libnftables. Fixes #1435 Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>	2021-09-03 10:31:00 -07:00
Adrian Reber	e26949cfed	lsm: handle half initialized SELinux setups CRIU used to check for the existence of /sys/fs/selinux to see if SELinux is enabled on a system. We have seen systems with SELinux kind of enabled but reading out the labels does not return real labels. To work around this, this commit adds a check during LSM detection if SELinux labels are in the right format. For CRIU this check means to see if there are at least 3 ':' in a label. If not, CRIU switches to no LSM mode. Signed-off-by: Adrian Reber <areber@redhat.com>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	e2c352e4f8	tools.mk: Use Python 3 by default As of January 1st, 2020 Python 2 is no longer supported and many distributions no longer provide packages for Python 2 dependencies. This patch allows CRIU to use Python 3 by default when both major versions are available on the system. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	177e4b4bad	mips: remove empty gitignore Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
Radostin Stoyanov	22142eedf0	mips: coding style fixes CRIU follows Linux kernel coding style. This patch updates the architecture-specific code for MIPS to use tab indentation, add whitespace between closing parenthesis and open bracket, and changes the mode of source files from 755 to 644. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>	2021-09-03 10:31:00 -07:00
zl-wang	99a6a17c2f	Allow systemcfg proc file to be dumped Currently, it cannot be check-pointed, because that type of file is on UNSUPP list. Signed-off-by: zl-wang <zlwang@ca.ibm.com>	2021-09-03 10:31:00 -07:00
Nicolas Viennot	731cafa85c	logging: pr_perror() -> pr_msg() when execvp fails in action scripts and others When invoking an action-script, all file descriptors >= 3 are closed. If execvp() fails, we can only log the error on stderr. pr_msg() outputs on stderr, so we use this as opposed to pr_perror(). Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>	2021-09-03 10:31:00 -07:00
Nicolas Viennot	24bdfa72df	net: add a #define for increased compatibility with old distributions Debian 9 doesn't have IFLA_NEW_IFINDEX defined Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>	2021-09-03 10:31:00 -07:00
Nicolas Viennot	29c34386b0	restore: fix error message when fork fails `last_pid_buf` is not where the last_pid string is, but it is in `s`. Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	f10425e053	criu: end pr_(err\|warn\|msg\|info\|debug) with \n Unlike pr_perror, pr_err and other macros do not append \n to the message being printed, so the caller needs to take care of it. Sometimes it was not done, so let's add this manually. To make sure it won't happen again, add a line to Makefile under the linter target to check for such missing \n. NOTE this check is only done for part of such cases (where the pr_* statement fits in one line and there's no comment after), but it's better than nothing. Add comments after pr_msg and pr_info statements where we deliberately don't add \n, so that the above check ignores them. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	96b7178bab	Whitespace at EOL cleanup and check My editor (vim) auto-removes whitespace at EOL for .c and .h files, and I think it makes sense to have a separate commit for this, rather than littering other commits with such changes. To make sure this won't pile up again, add a line to Makefile under the linter target to check for such things (so CI will fail). This is all whitespace except an addition to Makefile. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	7ea20e8f5a	criu: make sure to use pr_perror to show errno In cases where we need to print errno, it is better to use pr_perror. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	10c619adb9	test/zdtm: pr_err / pr_perror fixes 1. Use pr_perror where errno needs to be shown. 2. Use pr_err in cases where errno is not set by the previous failed call. 3. Make sure pr_err's first argument do not have \n. 4. While at it, fix some error messages. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	dca0eb5b4a	test/others/bers: use pr_perror When errno is set, it makes sense to use pr_perror. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	e326889c06	criu/mount.c: fix \n in pr_debug Funny but it used incorrect slash. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	2166d47482	scripts: fix shellcheck warnings On my system (shellcheck v0.7.1) make lint shows a few warnings about needing to quote variables. Fix those. PS I am not sure why those are not shown by GHA CI, I assume there is different shellcheck version used. Add shellcheck -- version to the appropriate Makefile target to avoid confusion. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	5f3631916a	Makefile: amend lint with pr_perror/fail checks In many cases developers forget that pr_perror and fail macros are a bit special, in particular: 1. they already show errno; 2. they already append \n to the message. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	4cd23083be	test/zdtm: don't pass errno to fail() Macro fail() already prints the value of errno, so there's no need to explicitly add it. Found by git grep '^\s\<fail\>.errno' Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	12a2bd0edd	test/zdtm: don't use %m with fail Macro fail already appends errno and strerror(errno) to the error message, so there's no need to do it explicitly. Brought to you by for f in $(git grep -l fail test/zdtm); do test -f $f \|\| continue echo $f sed -i '\\|^[[:space:]]fail(.[ (]%m)"\|s/:[ (]%m)//' $f done Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	b20694835d	test/zdtm: don't use \n with fail() Macro fail already appends \n to the message, so there's no need to do it explicitly. Brought to you by for f in $(git grep -l fail test/zdtm); do test -f $f \|\| continue echo $f sed -i '\%^[[:space:]]fail(.\\n"%s/\\n"/"/' $f done Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	9cbcaaed39	test/zdtm: don't use errno for pr_perror Macro pr_perror already adds errno and its string representation to the error message, so there's no need to explicitly do it. While at it, fix some error messages. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	865a5e9511	test/zdtm: don't use pr_perror where errno is unset pr_perror should be used for cases where the failed operation sets errno. For cases where errno is not set, pr_err is preferable. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
Kir Kolyshkin	d55a65e939	criu: don't use errno for pr_error There is no need to, as pr_error already adds strerror(errno) to the error message. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-09-03 10:31:00 -07:00
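
The label-format check described in commit e26949cfed ("lsm: handle half initialized SELinux setups") can be illustrated with a minimal sketch. This is not CRIU's C implementation; it only mirrors the rule stated in the commit message (a usable SELinux label has at least 3 ':' characters, otherwise fall back to no LSM mode). The function names and the mode strings "selinux" and "none" are hypothetical and chosen for illustration.

```python
def looks_like_selinux_label(label: str) -> bool:
    """Return True if the label contains at least three ':' separators.

    This follows the rule quoted in the commit message; it is an
    illustrative check, not the code CRIU actually uses.
    """
    return label.count(":") >= 3


def pick_lsm_mode(label: str) -> str:
    # "selinux" and "none" are hypothetical mode names for this sketch.
    return "selinux" if looks_like_selinux_label(label.strip()) else "none"


if __name__ == "__main__":
    for sample in ("system_u:system_r:container_t:s0", "kernel"):
        print(sample, "->", pick_lsm_mode(sample))
```

A label such as "kernel", which a half-initialized setup might report, has no ':' at all and would therefore select the no-LSM path in this sketch.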

10636 Commits