mir/criu - criu - Mike's Git repositories


mirror of https://github.com/checkpoint-restore/criu synced 2025-08-30 13:58:34 +00:00

Author	SHA1	Message	Date
Kirill Tkhai	da9367188b	zdtm: Check for groups list userns01 test Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	dff4949591	ns: Keep all clone flags fixups together It improves readability when they are all in one place and all visible at once. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	bd7bc00e47	ns: Simplify create_net_ns() Merge code with the same functionality into one place. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	04cd52c64a	net: Kill unused argument in open_net_ns() Nobody uses it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	9e9357db48	ns: Remove excess unshare CLONE_NEWNET Child process is created to set NS_OTHER user_ns, before creation of a net_ns. So, this CLONE_NEWNET is useless, and the created net_ns is lost right after we do unshare() in create_net_ns(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	140111c386	ns: Allow nested user namespaces Everything is prepared for nested user namespaces support. The only thing we still need to do is to enter the dumped user namespace's parent before the dump. We use CLONE_VM for child tasks, so they may populate user_ns maps in parent memory without any tricks. v3: Check for WIFEXITED(). Fixed stack size. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	1e6f4047b9	ns: Convert task cred's xids to target user ns xids are saved according to NS_ROOT, while in pie we may set them in their target user_ns. So, let's convert them. See the comment in the code on why we save them in NS_ROOT. Also, a small cleanup: use creds instead of args->creds for caps. v4: Use target_userns_gid() to convert gids. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	3481f2d926	ns: Dump creds xids in root_user_ns They may not be mapped in the target user_ns, so dump their values in NS_ROOT. But because of backward compatibility we can't collect their values from "/proc/[pid]/status", because that is supported only by the most recent kernels. So, choose this dump file format (dumping values in NS_ROOT), and we will be ready for the future. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	342be8bc59	rst: Pass pstree_item argument to alloc_groups_copy_creds() Pass the arg and add const modifiers where they are needed. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	5d3e26d221	shmem: Fixup shmem_wait_and_open() opens foreign /proc/[pid]/fd/[i] When the target process is in a user_ns where we do not have permissions, we need to use the usernsd helper to get its fds. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	69b412ec82	ns: Set target user_ns after net_ns is set Restore the task's user_ns, keeping in mind that we are born in the parent's user_ns. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	87ad5ec22c	ns: Implement set_user_ns() Add a field pstree_item::user_ns, which shows the task's current user_ns, and introduce helpers to set it. v3: Rebase on fdstore Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	f9582b0bd3	utils: Introduce open_fd_of_real_pid() As access to /proc/[pid]/fd/[i] of a task from the parent's user_ns is prohibited, introduce a helper doing that via usernsd. Also, remove BUG_ON() in usernsd, as now it may be used without an input fd parameter. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	13cba0ca69	user_ns: Set user_ns before net_ns creation Since a net ns may refer not only to root_user_ns, set the appropriate user_ns before its creation. v3: New Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	a69a4eddec	ns: Generate user_ns tree Create the user namespace hierarchy from the criu main task. Open the namespaces' fds, so they are visible to everybody in fdstore. Why we do it this way: 1) User namespaces are not correlated with the task hierarchy. A parent task may have a user namespace with a bigger nesting level than a child task. So, we can't restore the user namespaces just by passing CLONE_NEWUSER in fork_with_pid(). 2) CLONE_FS tasks will require that user_ns is set at the moment of clone(), so in this case we have to restore the target user_ns close to create_children_and_session(). v3: Check for WIFEXITED(). Aligned stack. Use fdstore to keep ns fd. Create tree from root_item. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	3aebb0eac6	utils: Move getting real pid functionality to separate function This is refactoring Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	5938cc0d50	proc: Close CR_PROC_FD_OFF and TRANSPORT_FD_OFF later CR_PROC_FD_OFF is needed for accessing foreign tasks' fds, and will be used in the future. TRANSPORT_FD_OFF is for uniformity. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	0966b0d05d	ns: Make write_id_map() use CR_PROC_FD_OFF Currently, it's used by criu from CRIU_NS only. So, in fact, open_proc_rw() leads to opening a fd in CRIU_NS /proc (open_pid_proc() just opens the "/proc" dir when PROC_FD_OFF is not set). Make write_id_map() use CR_PROC_FD_OFF, which exists and does not confuse a user. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	75f71e73a1	ns: Make prepare_userns() have ns map parameter This is refactoring Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	0742794aa6	ns: Write/read ns entries in new way The patch introduces a generic way of dumping all the namespaces (currently, only user ns entries are dumped). The handler for old user ns images remains in place. v4: Rebase on generic parent_id and userns_id. v3: On restore, keep in mind that the parent ns may not have been read yet at the moment it is searched for. Set correct user ns id to d_ns. Reflect the fact that parent_id is moved to pid and user ext. Read ns ids before tasks. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	c2a773cfeb	proto: Add ns_entry description New image format, generic for all namespaces. Currently, it's for pid, net and user ns. v4: Rename ns-hookup to ns. Make user_ns and parent generic. v3: Move parent_id to pid and user ext Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	a1d4cef08a	images: Move uid_gid_extent and userns_entry descriptions Move them into ns.proto file Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	388d853fa3	ns: Implement dup_userns_entry() Function for cloning UsernsEntry entries. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	b158b7bfdb	ns: Set pointer to root_user_ns in ns_ids Old-format images do not have a pointer to user_ns. Set them manually. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:14 +03:00
Kirill Tkhai	45588e9d6c	ns: Provide the case when root_item has !NS_ROOT user_ns in rst_add_ns_id() root_item may have NS_OTHER user_ns, so do not set it directly. This will be used in next patches. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	a64b4cc08c	user_ns: Name loading UsernsEntry mappings on restore "old format" Split prepare_userns() in two functions. Also, this commit fixes a problem which existed before my patchset: we do not populate userns_entry on restore, though it is needed and used at least by the chain prepare_mnt_ns()->sb_opt_cb()->userns_uid(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	e17627f450	ns: Add user and pid ns_id on restore Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	5e87976151	ns: Implement target_userns_{u, g}id() and root_userns_{u, g}id() Add primitives for converting xids from NS_ROOT to custom NS_OTHER, and vice versa. v4: Fixed erratum in root_userns_gid() Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	f10b0d2e2c	ns: Rename and export userns_id() and INVALID_ID Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	8cdac55719	user_ns: Make host_id() working with any mapping and rename it Make it possible to convert uid and gid from a user_ns to its representation in its (grand) parent user_ns. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	0efe5cca86	user_ns: Make collect_user_ns() allocate child UsernsEntry mappings Allocate mapping for NS_OTHER too. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	16e4530b57	ns: Change arguments of dump_user_ns() Make ns the only argument of dump_user_ns(). As the only ns it may be called for is root_item's ns, the logic after this patch remains the same as before. Also make dump_user_ns() static. In addition, pass ns to check_user_ns(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	3400312812	ns: Set hookups for all namespaces Discover relationships between namespaces and populate appropriate fields in ns_id Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	1008c64072	ns: Set nested namespaces hookups Introduce ns_id::parent and assign a pointer to the parent for every ns except NS_CRIU and NS_ROOT. Also populate user_ns for pid_ns. v5: Remove excess check on on->parent. v4: Set "ret = -1" on one of the error paths. Add comment about user_ns finding. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	8ba659fec1	zdtm: Add userns01 test Check that UID and GID in an unshared userns remain the same. v5: Use custom UID and GID. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
Kirill Tkhai	b0af94ddf4	zdtm: Add userns00 test Create two children, and unshare() user_ns in one of them (C1). The second child creates one more process, which switches to C1's namespace and unshares. v4: Keep in mind the case when readlink returns a PATH_MAX-length string. Print full wait status instead of WEXITSTATUS(). v3: Unshare net ns in grandchild Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:22:13 +03:00
rbruno@gsd.inesc-id.pt	15ee55f404	zdtm: Add support for image-proxy/image-cache Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:22:11 +03:00
rbruno@gsd.inesc-id.pt	0848223500	Process Migration using Sockets (p2) The current patch brings the implementation of the image proxy and image cache. These components are necessary to perform in-memory live migration of processes using CRIU. The image proxy receives images from CRIU Dump/Pre-Dump (through UNIX sockets) and forwards them to the image cache (through a TCP socket). The image cache caches images in memory and sends them to CRIU Restore (through UNIX sockets) when requested. Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:21:41 +03:00
rbruno@gsd.inesc-id.pt	2fb8492646	Process Migration using Sockets (p1) This patch introduces the --remote option and the necessary code changes to support it. This gives the user the option to decide whether the checkpoint data is stored on disk or sent through the network (through the image-proxy). The latter forwards the data to the destination node, where the image-cache receives it. The overall communication is performed as follows: src_node: CRIU dump -> (sends images through UNIX sockets) -> image-proxy -> (TCP) -> dst_node: image-cache -> (sends images through UNIX sockets) -> CRIU restore. Communication between image-proxy and image-cache is done through a single TCP connection. Running criu with the --remote option looks like this: dst_node# criu image-cache -d --port <port> -o /tmp/image-cache.log dst_node# criu restore --remote -o /tmp/image-cache.log src_node# criu image-proxy -d --port <port> --address <dst_node> -o /tmp/image-proxy.log src_node# criu dump -t <pid> --remote -o /tmp/dump.log [ xemul: here's the list of what should be done with the cache/proxy in order to have them merged into master. 0. Document the whole thing :) Please, add articles for newly introduced actions and options to the https://criu.org/CLI page. Also, it would be good to have an article describing the protocols involved. 1. Make the unix sockets reside in work-dir. The good thing is that we've got rid of the socket name option :) But looking at do_open_remote_image() I see that it fchdir-s to image dir before connecting to proxy/cache. A better solution is to put the socket into workdir. 1a. After this the option -D|--images-dir should become optional. Provided --remote is given, CRIU should work purely on the work-dir and not generate anything in the images-dir. 2. Tune up the image_cache and image_proxy commands to accept the --status-fd and --pidfile options. 
Presumably the very cr_daemon() call should be equipped with everything that should be done for daemonizing, and proxy/cache tasks should just call it :) 3. Fix local connections not to generate per-image threads. There can be many images and it's not nice to stress the system with such a number of threads. Please, look at how criu/uffd.c manages multiple descriptors with page-faults using the epoll stuff. 3a. The accept_remote_image_connections() seems not to work well with the opts.ps_socket scenario, as the former just calls accept() on whatever socket is passed there, while the opts.ps_socket is already an established socket for data transfer. 4. No strings in protocol. Now the hard-coded "RESTORE_FINISH" string (and DUMP_FINISHED one) is used to terminate the communication. Need to tune up the protobuf objects to send a boolean (or integer) EOF sign rather than the string. 5. Check how proxy/cache works with incremental dumps. Looking at the skip_remote_bytes() I think that image-cache and -proxy still do not work well with stacked pages images. Probably for those we'll need the page-server or lazy-pages -like protocol that would request the needed regions and receive them back rather than read bytes from sockets simply to skip those. 6. Add support for cache/proxy into go-phaul code. I haven't yet finished with the prototype, but plan to do it soon, so once the above steps are done we'll be able to proceed with this one. ] Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:21:40 +03:00
rbruno@gsd.inesc-id.pt	81ae3efbf2	util: Copy file w/o sendfile This is the case when the in/out files are image cache/proxy sockets. Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt> Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:19:12 +03:00
Andrew Vagin	d3ac1f40b8	zdtm: add a test for nested network namespaces This test creates a few processes which live in three network namespaces and have a few sockets which are created in different network namespaces. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:19:12 +03:00
Andrei Vagin	0f5eb79a3f	net: add a way to get a network namespace for a socket Each socket belongs to one network namespace and operates in this network namespace. socket_diag reports information about sockets from one network namespace, but it doesn't report sockets which are not bound or connected anywhere. So we need a way to get the network namespaces of such sockets. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:19:12 +03:00
Andrei Vagin	a123cbab65	kerndat: check the SIOCGSKNS ioctl This ioctl is called for a socket and returns a file descriptor for the network namespace in which the socket was created. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:19:10 +03:00
Andrei Vagin	6636f4a7af	net: set a proper network namespace to create a socket Each socket has to be restored in the proper network namespace where it was created. We set the specified network namespace before restoring a socket. A task's network namespace is set after restoring all files. v2: don't set the root netns for transport sockets Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:18:53 +03:00
Andrei Vagin	8ea3d00296	util: move open_proc_fd to service_fd We need this to avoid conflicts with file descriptors which have to be restored. Currently open_proc_pid() isn't used while restoring file descriptors, but we are going to use it to restore sockets in proper network namespaces. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:18:53 +03:00
Andrei Vagin	e5be666ad6	net: allow to dump and restore more than one network namespace Restore all network namespaces from the root task and then set a proper namespace for each task after restoring sockets, because we need to switch network namespaces to restore sockets. Each socket has to be created in a proper network namespace. v2: fix a typo bug Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:18:53 +03:00
Andrei Vagin	286f0f3607	net: save network namespaces for sockets Each socket has to be restored in the proper namespace where it was created. There is an issue with unconnected and unbound sockets: they are not reported via socket-diag, so we can't get their network namespaces. v2: add a comment before get_socket_ns() remove nsid from sk_packet_entry Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-11-30 01:18:53 +03:00
Pavel Emelyanov	7e81cd9298	usernsd: Add debugging to catch BUG in unsd fd/flags mismatch Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:18:53 +03:00
Andrei Vagin	4a48a5d68a	net: rename pid into nsid for prepare_net_ns() PID usually means process ID, but prepare_net_ns() works with namespaces. travis-ci: success for Dump and restore nested network namespaces (rev4) Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:18:53 +03:00
Andrei Vagin	0725a3e4c2	netlink: add ns_id as a generic argument to receive_callback ns_id will be used to collect sockets and other per-netns resources travis-ci: success for Dump and restore nested network namespaces (rev4) Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2017-11-30 01:18:53 +03:00


8960 Commits