mirror of https://github.com/checkpoint-restore/criu (synced 2025-08-22 09:58:09 +00:00)
freeze_processes: implement kludges for cgroup v1

Cgroup v1 freezer has always been problematic, failing to freeze a
cgroup. In runc, we have implemented a few kludges to increase the
chance of succeeding, but those are used when runc freezes a cgroup for
its own purposes (for "runc pause" and to modify device properties for
cgroup v1).

When criu is used, it fails to freeze a cgroup from time to time (see
[1], [2]). Let's try adding kludges similar to ones in runc.

Alas, I have absolutely no way to test this, so please review carefully.

[1]: https://github.com/opencontainers/runc/issues/4273
[2]: https://github.com/opencontainers/runc/issues/4457

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

parent 9c3c095cfe
commit 7c66617d0e

criu/seize.c (31 lines changed)

--- a/criu/seize.c
+++ b/criu/seize.c
@@ -539,6 +539,34 @@ err:
 	return exit_code;
 }
 
+static void cgroupv1_freezer_kludges(int fd, int iter, const struct timespec *req) {
+	/* As per older kernel docs (freezer-subsystem.txt before
+	 * the kernel commit ef9fe980c6fcc1821), if FREEZING is seen,
+	 * userspace should either retry or thaw. While current
+	 * kernel cgroup v1 docs no longer mention a need to retry,
+	 * even recent kernels can't reliably freeze a cgroup v1.
+	 *
+	 * Let's keep asking the kernel to freeze from time to time.
+	 * In addition, do occasional thaw/sleep/freeze.
+	 *
+	 * This is still a game of chances (the real fix belongs to the kernel)
+	 * but these kludges might improve the probability of success.
+	 *
+	 * Cgroup v2 does not have this problem.
+	 */
+	switch (iter % 32) {
+	case 9:
+	case 20:
+		freezer_write_state(fd, FROZEN);
+		break;
+	case 31:
+		freezer_write_state(fd, THAWED);
+		nanosleep(req, NULL);
+		freezer_write_state(fd, FROZEN);
+		break;
+	}
+}
+
 static int freeze_processes(void)
 {
 	int fd, exit_code = -1;
@@ -597,6 +625,9 @@ static int freeze_processes(void)
 		if (state == FROZEN || i++ == nr_attempts || alarm_timeouted())
 			break;
 
+		if (!cgroup_v2)
+			cgroupv1_freezer_kludges(fd, i, &req);
+
 		nanosleep(&req, NULL);
 	}
 
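
To make the retry schedule in the patch easier to reason about, here is a minimal standalone sketch of the `iter % 32` logic with the cgroup writes factored out. The names `kludge_action`, `kludge_action_for` and `count_actions` are hypothetical and are not part of criu. The sketch also assumes that the loop counter in `freeze_processes()` starts at 0, so that the values passed as `iter` run from 1 up to `nr_attempts`.

```c
#include <assert.h>

/* Hypothetical names for illustration only; none of this is criu code. */
enum kludge_action {
	KLUDGE_NONE,        /* do nothing extra, just keep polling */
	KLUDGE_REFREEZE,    /* write FROZEN to the freezer again */
	KLUDGE_THAW_FREEZE, /* write THAWED, sleep, then write FROZEN */
};

/* Mirrors the switch (iter % 32) from the patch, minus the side effects. */
static enum kludge_action kludge_action_for(int iter)
{
	assert(iter >= 0);

	switch (iter % 32) {
	case 9:
	case 20:
		return KLUDGE_REFREEZE;
	case 31:
		return KLUDGE_THAW_FREEZE;
	default:
		return KLUDGE_NONE;
	}
}

/*
 * Count how many polling iterations in 1..nr_attempts would trigger
 * the given action (assumes the caller's counter starts at 0 and is
 * post-incremented before the kludge is invoked, as in the patch).
 */
static int count_actions(int nr_attempts, enum kludge_action want)
{
	int i, n = 0;

	for (i = 1; i <= nr_attempts; i++)
		if (kludge_action_for(i) == want)
			n++;
	return n;
}
```

Under these assumptions, every window of 32 polls contains two plain re-freeze requests and one thaw/sleep/freeze cycle. Calling `count_actions(nr_attempts, KLUDGE_THAW_FREEZE)` gives a quick estimate of how many extra sleeps a given attempt budget adds on cgroup v1.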