Ghost files can live on read-only mounts. On restore we have to recreate such a file, and we can't do that while the mount is read-only. So the idea is to remount the mount we want to operate on as writable and, after all ghost files are restored, return the mounts to their proper state if needed.

There are three exceptions where we don't remount:
a) Overmounted mounts can't be easily remounted writable, as their mountpoints are invisible to us.
b) If the mount has a read-only superblock, there can be no ghost files on such a mount.
c) When we are in the host mntns, we should not remount mounts in it; otherwise, if we hit an error in between, we would forget to remount them back.

There are three places where we need to add these remounts:
1) create_ghost()
2) clean_one_remap()
3) rfi_remap()

For (1) and (2) we can just remount the mount writable without remounting it back, as they are called in the service mntns (the one we save in mnt_ns_fd), which will be destroyed with all its mounts at the end. We mark such mounts as remounted in the service mntns: REMOUNTED_RW_SERVICE.

For (3) we need to remount these mounts back to read-only, so we mark them with REMOUNTED_RW, and later in remount_readonly_mounts all such mounts are remounted back. For (3) we also need to enter the proper mntns of tmi before remounting.

This solution (v3) is better than v2: in v2 we added an additional remount for all bind-readonly mounts, while now we do remounts only for mounts that have ghost-file restore operations on them. These should be quite a rare thing, so ~3 remounts added for each suitable mount is a relatively small price.

Note: I also tried to implement complete removal of the step of remounting back to read-only, but it requires quite tricky playing with usernsd and only removes one remount (of ~3) for an already rare case, so I don't think it is worth the effort.
v2: minor commit message cleanup and remove warn
v4: don't delay, only remount the mounts we explicitly want to write to just before operating, rename patch accordingly, reuse do_restore_task_mnt_ns, optimize inefficient ns_remount_readonly_mounts, and also add another exception.
v5: simplify child status check, fix log messages and brackets, do not drop all flags but only the readonly flag

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
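
The remount decision described in the commit message above can be sketched roughly as follows. This is an illustration only, not CRIU's actual code: `struct mnt_sketch`, `enum remount_state` and all function names are hypothetical, and only the marker names REMOUNTED_RW and REMOUNTED_RW_SERVICE come from the commit message. It also assumes the mounts in question are read-only on a per-mount basis, so a `MS_REMOUNT | MS_BIND` call to mount(2) can flip them. Following the v5 note, only the read-only flag is dropped and the other per-mount flags are passed through unchanged.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/mount.h>

/* Marker names follow the commit message; the enum itself is hypothetical. */
enum remount_state {
	NOT_REMOUNTED = 0,
	REMOUNTED_RW,         /* has to be remounted back to read-only later */
	REMOUNTED_RW_SERVICE, /* service mntns: destroyed at the end, never remounted back */
};

/* Hypothetical, simplified view of a mount (not CRIU's real mount structure). */
struct mnt_sketch {
	const char *mountpoint;
	unsigned long flags;  /* per-mount flags, e.g. MS_RDONLY | MS_NOSUID */
	bool sb_readonly;     /* exception (b): superblock itself is read-only */
	bool overmounted;     /* exception (a): mountpoint is not visible */
	bool in_host_mntns;   /* exception (c): never touch host mounts */
	enum remount_state state;
};

/* Decide whether a mount should be remounted writable at all. */
bool needs_rw_remount(const struct mnt_sketch *m)
{
	if (!(m->flags & MS_RDONLY))
		return false; /* already writable, nothing to do */
	if (m->overmounted || m->sb_readonly || m->in_host_mntns)
		return false;
	return true;
}

/* Drop only the read-only flag and keep every other per-mount flag. */
unsigned long rw_remount_flags(unsigned long flags)
{
	return MS_REMOUNT | MS_BIND | (flags & ~(unsigned long)MS_RDONLY);
}

/* Flags for restoring the original read-only state. */
unsigned long ro_remount_flags(unsigned long flags)
{
	return MS_REMOUNT | MS_BIND | flags | MS_RDONLY;
}

/* Remount writable and record which kind of remount was done. */
int remount_rw(struct mnt_sketch *m, bool service_mntns)
{
	if (m->state != NOT_REMOUNTED || !needs_rw_remount(m))
		return 0;
	if (mount(NULL, m->mountpoint, NULL, rw_remount_flags(m->flags), NULL) < 0) {
		perror("remount rw");
		return -1;
	}
	m->state = service_mntns ? REMOUNTED_RW_SERVICE : REMOUNTED_RW;
	return 0;
}

/* Only mounts marked REMOUNTED_RW are returned to read-only. */
int remount_back_ro(struct mnt_sketch *m)
{
	if (m->state != REMOUNTED_RW)
		return 0;
	if (mount(NULL, m->mountpoint, NULL, ro_remount_flags(m->flags), NULL) < 0) {
		perror("remount ro");
		return -1;
	}
	m->state = NOT_REMOUNTED;
	return 0;
}
```

The actual mount(2) calls need sufficient privileges and have to be issued from inside the mount namespace that owns the mount, which is why the commit enters the proper mntns before remounting in case (3). The flag helpers and the exception check are pure functions and can be exercised without any privileges.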
CRIU -- A project to implement checkpoint/restore functionality for Linux
CRIU (stands for Checkpoint and Restore in Userspace) is a utility to checkpoint/restore Linux tasks.
Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point it was frozen at. The distinctive feature of the CRIU project is that it is mainly implemented in user space. There are some more projects doing C/R for Linux, and so far CRIU appears to be the most feature-rich and up-to-date with the kernel.
The project started as a way to do live migration for OpenVZ Linux containers, but later grew into a more sophisticated and flexible tool. It is currently used by (integrated into) OpenVZ, LXC/LXD, Docker, and other software. The project gets tremendous help from the community, and its packages are included in many Linux distributions.
The project home is at http://criu.org. The wiki there contains the whole knowledge base we have for CRIU. Pages worth starting with are:
- Installation instructions
- A simple example of usage
- Examples of more advanced usage
- Troubleshooting can be hard, some help can be found here, here and here
A video tour on basic CRIU features
Advanced features
As the main use of CRIU is live migration, there is a library for it called P.Haul. The project also exposes two cool core features as standalone libraries: libcompel for parasite code injection and libsoccr for TCP connection checkpoint-restore.
Live migration
True live migration using CRIU is possible, but doing all the steps by hand might be complicated. The phaul sub-project provides a Go library that encapsulates most of the complexity. This library and the Go bindings for CRIU are stored in the go-criu repository.
Parasite code injection
To get the state of a running process, CRIU needs to make this process execute some code that fetches the required information. To make this happen without killing the application itself, CRIU uses the parasite code injection technique, which is also available as a standalone library called libcompel.
TCP sockets checkpoint-restore
One of the CRIU features is the ability to save and restore the state of a TCP socket without breaking the connection. This functionality is considered useful by itself, and we have it available as the libsoccr library.
How to contribute
The CRIU project is (almost) a never-ending story, because we always have to keep up with the Linux kernel, supporting checkpoint and restore for all the features it provides. Thus we're looking for contributors of all kinds -- feedback, bug reports, testing, coding, writing, etc. Here are some useful hints to get involved.
- We have both -- very simple and more sophisticated coding tasks;
- CRIU does need extensive testing;
- Documentation is always hard, we have some information that is to be extracted from people's heads into wiki pages as well as some texts that all need to be converted into useful articles;
- Feedback is expected on the github issues page and on the mailing list;
- For historical reasons we do not accept PRs, instead patches are welcome;
- Spread the word about CRIU in social networks;
- If you're giving a talk about CRIU -- let us know, we'll mention it on the wiki main page;
Licence
The project is licensed under GPLv2 (though files sitting in the lib/ directory are LGPLv2.1).