CRIU -- A project to implement checkpoint/restore functionality for Linux
CRIU (stands for Checkpoint and Restore in Userspace) is a utility to checkpoint/restore Linux tasks.
Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point at which it was frozen. The distinctive feature of the CRIU project is that it is mainly implemented in user space rather than in the kernel. Other projects also do C/R for Linux, but so far CRIU appears to be the most feature-rich and the most up to date with the kernel.
The CRIU project is (almost) a never-ending story, because we always have to keep up with the Linux kernel, supporting checkpoint and restore for all the features it provides. Thus we're looking for contributors of all kinds: feedback, bug reports, testing, coding, writing, etc. Please refer to CONTRIBUTING.md if you would like to get involved.
The project started as a way to do live migration for OpenVZ Linux containers, but later grew into a more sophisticated and flexible tool. It is currently used by (integrated into) OpenVZ, LXC/LXD, Docker, and other software; the project gets tremendous help from the community, and its packages are included in many Linux distributions.
The project home is at http://criu.org. This wiki contains all the CRIU knowledge we have. Good pages to start with are:
- Installation instructions
- A simple example of usage
- Examples of more advanced usage
- Help with troubleshooting, which can be hard
Figure: checkpoint and restore of a simple loop process
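Checkpointing and restoring a simple loop process boils down to two criu invocations: `criu dump -t PID -D DIR --shell-job` freezes the process tree rooted at PID and writes its images into DIR, and `criu restore -D DIR --shell-job` brings it back. The C sketch below only builds those argument vectors, as a program that drives the criu binary might. The `criu_cmd` struct and the `build_*` helpers are hypothetical names; the subcommands and the `-t`, `-D` and `--shell-job` options are real criu options (`--shell-job` is needed because a process started from an interactive shell shares that shell's session and terminal).

```c
#include <stdio.h>
#include <string.h>

/* Room for the longest command built below, including the terminating NULL. */
#define CRIU_ARGV_MAX 8

/* Hypothetical holder for an argv array; pid_buf owns the PID string. */
struct criu_cmd {
    char pid_buf[16];
    const char *argv[CRIU_ARGV_MAX];
};

/* criu dump -t <pid> -D <dir> --shell-job
 * Dumps the process tree rooted at <pid> into image files under <dir>.
 * By default criu kills the dumped tasks once the dump succeeds. */
void build_dump_cmd(struct criu_cmd *cmd, int pid, const char *dir)
{
    const char *args[CRIU_ARGV_MAX] = {
        "criu", "dump", "-t", cmd->pid_buf, "-D", dir, "--shell-job", NULL
    };

    snprintf(cmd->pid_buf, sizeof(cmd->pid_buf), "%d", pid);
    memcpy(cmd->argv, args, sizeof(args));
}

/* criu restore -D <dir> --shell-job
 * Recreates the process tree from the images in <dir>. Unused trailing
 * slots of args are zero-initialized, so argv stays NULL-terminated. */
void build_restore_cmd(struct criu_cmd *cmd, const char *dir)
{
    const char *args[CRIU_ARGV_MAX] = {
        "criu", "restore", "-D", dir, "--shell-job", NULL
    };

    cmd->pid_buf[0] = '\0';
    memcpy(cmd->argv, args, sizeof(args));
}
```

To actually run a command, fork and call `execvp(c.argv[0], (char *const *)c.argv)` in the child; criu usually has to run as root. Add `--leave-running` to the dump command if the original process should keep running after it has been checkpointed.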
Advanced features
As the main use case for CRIU is live migration, there is a library for it called P.Haul. The project also exposes two cool core features as standalone libraries: libcompel for parasite code injection and libsoccr for checkpoint/restore of TCP connections.
Live migration
True live migration using CRIU is possible, but doing all the steps by hand can be complicated. The phaul sub-project provides a Go library that encapsulates most of this complexity. This library and the Go bindings for CRIU are stored in the go-criu repository.
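One of the steps such tooling has to orchestrate is memory pre-copy: CRIU's `pre-dump` action copies memory while the task keeps running, so the final dump, taken with the task frozen, only needs the pages changed since. The toy C model below illustrates why this shortens downtime and why the number of rounds must be capped. It is not phaul's code; the names and parameters are hypothetical, and it assumes that a fixed percentage of the pages copied in a round gets dirtied again during that round.

```c
/* Toy model of iterative pre-copy (all names and parameters hypothetical). */
struct precopy_plan {
    unsigned rounds;           /* pre-dump rounds done while the task runs */
    unsigned long copied;      /* pages copied by those pre-dump rounds */
    unsigned long final_pages; /* pages left to copy while the task is frozen */
};

/* Keep pre-dumping while the dirty set is above `threshold` pages, but never
 * more than `max_rounds` times: if memory is dirtied as fast as it is copied
 * (dirty_pct == 100), the dirty set never shrinks and the loop must stop. */
struct precopy_plan plan_precopy(unsigned long total_pages, unsigned dirty_pct,
                                 unsigned long threshold, unsigned max_rounds)
{
    struct precopy_plan p = { 0, 0, total_pages };

    while (p.final_pages > threshold && p.rounds < max_rounds) {
        p.copied += p.final_pages;                      /* copy the dirty set */
        p.final_pages = p.final_pages * dirty_pct / 100; /* re-dirtied meanwhile */
        p.rounds++;
    }
    return p;
}
```

For example, `plan_precopy(1000, 10, 50, 8)` does two rounds (1000 pages, then 100), leaving only 10 pages to copy while the task is frozen instead of all 1000.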
Parasite code injection
To get the state of a running process, CRIU needs to make this process execute some code that fetches the required information. To do this without killing the application itself, CRIU uses the parasite code injection technique, which is also available as a standalone library called libcompel.
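Before any code can be injected, the target has to be stopped without being killed, and on Linux this is done with ptrace. The C sketch below shows only that first step: attach with `PTRACE_SEIZE`, stop the task with `PTRACE_INTERRUPT`, wait for the stop, then detach so the task continues. It illustrates the kernel mechanism, not libcompel's API, and `seize_stop_release` is a hypothetical name. Mapping the parasite code into the task and making it run, which is the part libcompel handles, is not shown.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: stop a running task without killing it, then let it
 * go. Returns 0 on success, -1 with errno set on failure. */
int seize_stop_release(pid_t pid)
{
    int status;

    /* Attach; unlike PTRACE_ATTACH, this does not send SIGSTOP. */
    if (ptrace(PTRACE_SEIZE, pid, NULL, NULL) == -1)
        return -1;

    /* Ask the kernel to stop the tracee. */
    if (ptrace(PTRACE_INTERRUPT, pid, NULL, NULL) == -1)
        return -1;

    /* Wait for the stop; __WALL is needed if the tracee is a thread. */
    while (waitpid(pid, &status, __WALL) == -1)
        if (errno != EINTR)
            return -1;
    if (!WIFSTOPPED(status)) {
        errno = ESRCH; /* the task exited instead of stopping */
        return -1;
    }

    /* The task is now in a ptrace-stop. This is where a tool like CRIU would
     * read registers and inject parasite code; here we simply detach, which
     * lets the task continue. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    return 0;
}
```

For example, fork a child that loops in `pause()` and call `seize_stop_release()` on its PID: the child stops briefly and then carries on. Depending on the system's ptrace restrictions (such as Yama's `ptrace_scope`), attaching to an unrelated process may require extra privileges.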
TCP sockets checkpoint-restore
One of CRIU's features is the ability to save and restore the state of a TCP socket without breaking the connection. This functionality is useful on its own, so we have made it available as the libsoccr library.
Licence
The project is licensed under GPLv2 (though files in the lib/ directory are licensed under LGPLv2.1).
All files in the images/ directory are licensed under the Expat license (also known as MIT). See the images/LICENSE file.