mir/criu

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-26 11:57:52 +00:00

Author	SHA1	Message	Date
Tycho Andersen	e6a3aef43e	remap: don't allocate dead pids in wrong context Closes #87 Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> CC: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-23 11:47:29 +03:00
Tycho Andersen	cc9587ffc5	seccomp: is optional when parsing /proc/pid/status Also define some constants for people who don't have them in their headers. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-23 11:44:50 +03:00
Andrew Vagin	028998c588	proc_parse: parse pending signals It's required to check the SIGSTOP signal, which can't be blocked. Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-20 21:13:31 +03:00
Cyrill Gorcunov	7de345d6b7	net: Move node's net fd reference into service fd So we keep it and don't close it inside the close_old_fds() helper but pass it into veth creation so the kernel can fetch the net namespace of the veth peer. v2 (by avagin@): - don't forget to close opened descriptor Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 16:46:36 +03:00
Andrew Vagin	1e8a0594db	net: dump iptables for ipv6 (v2) v2: don't dump iptables if ipv6 isn't supported Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 15:19:01 +03:00
Andrew Vagin	1648db970c	kerndat: check whether ipv6 is supported or not (v2) v2: use a cached value to dump ipv6 interface addresses; call get_ipv6() from kerndat_init_rst too Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 15:18:08 +03:00
Andrew Vagin	a2780c6131	lock: futex() with timeout isn't restarted after signals (v2) It returns EINTR, so we need to handle it. $ bash test/zdtm.sh --restore-sibling ns/static/env00 ... futex(0x7fc20ec92010, FUTEX_WAIT, 1, {120, 0}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 15:15:39 +03:00
Andrew Vagin	4c00ac2908	lock: print a message if a futex is locked for more than 120 seconds Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:52:22 +03:00
Tycho Andersen	221af18ea0	seccomp: add support for SECCOMP_MODE_FILTER This commit adds basic support for dumping and restoring seccomp filters via the new ptrace interface. There are two current known limitations with this approach: 1. This approach doesn't support restoring tasks who first do a seccomp() and then a setuid(); the test elaborates on this and I don't think it is tough to do, but it is not done yet. 2. Filters are compared via memcmp(), so two tasks which have the same parent task and install identical (via memory) filters will have those filters considered to be the "same". Since we force all tasks to have the same creds (including seccomp filters) right now, this isn't a problem. The approach used here is very similar to the cgroup approach: the actual filters are stored in a seccomp.img, and each task has an id that points to the part of the filter tree it needs to restore. This keeps us from dumping the same filter multiple times, since filters are inherited on fork. v2: * remove unused seccomp_filters field from struct rst_info * rework memory layout for passing filters to restorer blob * add a sanity check when finding inherited filters Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:51:20 +03:00
Andrew Vagin	b78af1923b	mount: wait when mntns will be created to get its root (v2) v2: add comments and rename ns_created to ns_populated. Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:46:00 +03:00
Andrew Vagin	7017181849	mount: don't inherit mount namespace descriptors to each process close_old_fds() knows nothing about more than one set of service file descriptors, so it's better to call it before forking children as it was before 9d60724eca71 ("restore: restore mntns before creating private vma-s") The root task restores all processes and pins them with file descriptors, then a task restores a mount namespace by opening the file descriptor of the root task via /proc/pid/fd/X. Reported-by: Mr Jenkins Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:45:09 +03:00
Andrew Vagin	9d60724eca	restore: restore mntns before creating private vma-s (v3) We need to open a file to restore a file mapping and this file can be from a current mntns. v2: All namespaces are restored from the root task and then other tasks call setns() to set a proper mntns. v3: fix comments from Pavel Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-14 09:53:47 +03:00
Pavel Emelyanov	dc00fea333	net: Don't print error in rule save This thing is new and can be absent in ip tool, which is OK and is handled by net.c code itself. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 16:31:21 +03:00
Pavel Emelyanov	18d9170858	util: Add flags to cr_system Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 16:31:19 +03:00
Cyrill Gorcunov	ee2409ec37	compiler: Grab min_t, max_t from the kernel Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 14:57:00 +03:00
Cyrill Gorcunov	ba475b8dcf	bitmap -- Add few helpers for bits manipulations Grabbed from kernel. Probably worth to gather all bits manipulators here in future. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 11:15:02 +03:00
Pavel Emelyanov	780d699401	page-read: Teach page-read to read multiple pages at once This is preparatory patch, the problem to solve is described in the next one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 11:14:43 +03:00
Tycho Andersen	8a95be0679	net: allow c/r of empty bridges in the container Implementing c/r of bridges with slaves shouldn't be too hard (viz. the comment), but this is all I need for right now. v2: remove extra debug statement v3: * remember to close fd in dump_bridge * use "known" buffer length and snprintf for spath in dump_bridge * change brace style Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 10:31:58 +03:00
Pavel Emelyanov	a20ed3c6f0	page-server: Fine grained corking control (v3) When live migrating a container with a large number of processes inside, a page-server-ed dump may be up to 10 times slower than a local dump. The delay is always introduced in the open_page_server_xfer() when criu negotiates the has_parent bit on the 2nd task. This likely happens because of the Nagle algorithm taking place: after the write() of the OPEN2 command, the kernel delays sending this command while waiting for more data. v2: Fix this by turning on CORK option on memory transfer sockets on send side, and NODELAY one once on urgent data. Receive side is always NODELAY-ed. According to Alexey Kuznetsov this is the best mode ever for such type of transfers. v3: Push packets in pre-dump's check_parent_server_xfer too. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@odin.com>	2015-11-10 16:00:25 +03:00
Pavel Emelyanov	d6d06c9dfc	Open proc links with O_PATH These three are like map_files one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>	2015-11-10 15:58:36 +03:00
Cyrill Gorcunov	049a7c828a	userns: Wrap call with a macro for readability Pass function name into a helper instead of pointer which doesn't provide much useful info. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-05 15:29:04 +03:00
Kirill Tkhai	c9afd17ad6	net: Add ip rule save/restore Add support for save and restore of ip rules. It uses new functionality of iproute which is already in iproute git: http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=2f4e171f7df22107b38fddcffa56c1ecb5e73359 v2: Use xstrdup() instead of strdup(). v3: Use open/close instead of helper. v4: Return -1 on empty dump. Signed-off-by: Kirill Tkhai <ktkhai@odin.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-27 22:56:33 +03:00
Andrew Vagin	1d8fcb6b94	bfd: add breadchr Reading stops after an EOF or a specified character. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-27 22:51:09 +03:00
Cyrill Gorcunov	7a99e699ce	mnt: Export __open_mountpoint We gonna need it for inotify handle testing. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-21 15:08:03 +03:00
Kir Kolyshkin	5940e3d14c	xfree(): simplify Contrary to a popular opinion, there is no need to check an argument for being non-NULL before calling free(). From the free(3) man page: "If ptr is NULL, no operation is performed." Let's change xfree macro to be a synonym for free(). Signed-off-by: Kir Kolyshkin <kir@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-21 14:58:39 +03:00
Pavel Emelyanov	68baf8e77d	criu: Fault injection core This patch(set) is inspired by a similar one from Andrey Vagin sent some time earlier. The major idea is to artificially fail criu dump or restore at specific places and let zdtm tests check whether failed dump or restore resulted in anything bad. This particular patch introduces the ability to tell criu "fail at X point". Each point is specified with an integer constant and with the next patches there will appear places over the code checking for specific fail code being set and failing. Two points are introduced -- early on dump, right after loading the parasite and right after creation of the root task. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-19 12:42:29 +03:00
Cyrill Gorcunov	61859d1176	fsnotify: Filter out internal inotify bits when restoring marks Kernels prior to 4.3 export the FS_EVENT_ON_CHILD bit via the procfs fdinfo interface. This bit is kernel-internal and should not be passed in the inotify_add_watch call. Thus simply filter it out when it is obtained from old images, for backward compatibility. More details here https://lkml.org/lkml/2015/9/21/680 Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-14 15:51:55 +03:00
Matthew Krafczyk	29c08d8672	Add pre-dump and pre-restore action scripts This allows the user to perform actions before dumping or restoration occurs. Signed-off-by: Matthew Krafczyk <krafczyk.matthew@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-09 18:23:41 +03:00
Christopher Covington	871da9a111	pie: Give VDSO symbol table local scope In commit c2271198, Laurent Dufour kindly reunified the VDSO code that had become duplicated between architectures. Unfortunately this introduced a regression in AArch64 where apparently due to the scope of vdso_symbols array of pointers to characters changing from local to global, load-time relocations became necessary. The following thread on the GCC mailing list discusses why load-time relocations can be necessary when pointers are used, although it doesn't mention the potential for locally scoped arrays to be handled differently: https://gcc.gnu.org/ml/gcc/2004-05/msg01016.html Because the alternatives, such as porting piegen to AArch64, are far more involved, simply revert the change in scope. Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:21:16 +03:00
Christopher Covington	627f9a9e5f	aarch64: Fix write_intraprocedure_branch types In the recent VDSO code reunification, some types were changed but a pair of necessary corresponding changes was omitted. Fix that so the AArch64 build succeeds without type-related warnings-turned-errors. Also move the definition to the AArch64-specific header since it's not currently being used by any other architectures. Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:20:01 +03:00
Tycho Andersen	f79f4546cf	sysctl: move sysctl calls to usernsd When in a userns, tasks can't write to certain sysctl files: (00.009653) 1: Error (sysctl.c:142): Can't open sysctl kernel/hostname: Permission denied See inline comments for details on affected namespaces. Mostly for my own education in what is required to port something to be userns restorable, I ported the sysctl stuff. A potential concern for this patch is that copying structures with pointers around is kind of gory. I did it ad-hoc here, but it may be worth inventing some mechanisms to make it easier, although I'm not sure what exactly that would look like (potentially re-using some of the protobuf bits; I'll investigate this more if it looks helpful when doing the cgroup user namespaces port?). Another issue is that there is not a great way to return non-fd stuff in memory right now from userns_call; one of the little hacks in this code would be "simplified" if we invented a way to do this. v2: coalesce the individual struct sysctl_req requests into one big sysctl_userns_req that is in a contiguous region of memory so that we can pass it via userns_call. Hopefully nobody finds my little ascii diagram too offensive :) v3: use the fork/setns trick to change the sysctl values in the right ns for IPC/UTS nses; see inline comment for details v4: only use sysctl_userns_req when actually doing a userns_call. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:16:14 +03:00
Andrew Vagin	a973e6fcb3	net: dump ipv6 routes "ip route dump" dumps only ipv4 routes. Reported-by: Ross Boucher <boucher@gmail.com> Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:11:31 +03:00
Tycho Andersen	97cb181cbc	irmap: don't leak irmap objects in --irmap-scan-path v2: use struct irmap directly in irmap_path_opt Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:02:51 +03:00
Pavel Emelyanov	efa7dcf7c2	ghost: Remove ghost files if restore fails Issue #18. When restore fails ghost files remain there. And to remove them we have to know their list, paths to original files (to construct the ghost name) and the namespace ghost lives in. For the latter we keep the restore task namespace at hands till the final stage and setns into it to kill ghosts. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:00:37 +03:00
Pavel Emelyanov	a7c9f3011d	mnt: Read mount images early Mappings from mount id to namespace will be required to remove ghosts on restore failure. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:00:36 +03:00
Pavel Emelyanov	b0e23c3d4f	files: Collect ghosts and regfiles early Info about ghosts presence and paths will be needed to remove the ghosts themselves and thus is needed in criu. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:00:35 +03:00
Pavel Emelyanov	152222a6b7	remap: Sanitize ghost file path printing First -- avoid two memory copies by printing ns root directly, and second -- remove extra argument from create_ghost, the mnt_id value we need there can be found on the ghost_file object. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:59:45 +03:00
Pavel Emelyanov	6cf77f6726	remap: Rename fields for easier grep Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:58:28 +03:00
Pavel Emelyanov	7ca6cc1eb2	mnt: Clean roots yard from criu process So here it is. If root task dies on restore the roots yard dir remains unrmdired :( Since we already know its name, we can remove one from criu. By the time we get to this place the sub mount namespace(s) are already dead and yard dir is empty. But umounting should be done by tasks after successful restore, so keep depopulation there. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:57:35 +03:00
Pavel Emelyanov	3e7c92ed02	mnt: Renames around roots yard Same thing as in previous patch -- we have too many generic clean_ and fini_ prefixes over the code. And we need more (see next patch), so let's specify what exactly we clean or fini. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:57:21 +03:00
Pavel Emelyanov	c5c65fe17a	mnt: Create roots in criu context In case of root task restore failure we'll have to remove the roots yard dir from criu, so we have to create one by criu to at least have the dir name. It's OK to do it in criu, since the yard is created in the opts.root which is the same for any mnt ns we deal with on restore. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:56:51 +03:00
Pavel Emelyanov	e3f5ba3c37	ns: Prepare namespaces before tasks There's already two things we do in criu namespaces before forking the init task (start unsd and keep netnsfd for back reference). Next patches will introduce the 3rd action for mount namespaces, so have a special pre-call for all this stuff. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:56:26 +03:00
Pavel Emelyanov	9b3189fed1	util: Add make_yard helper Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 11:32:18 +03:00
Pavel Emelyanov	9353051ba7	ns: Check ns type with type field Actually make use of the ns->type field and remove all getpid()'s and other strange/inconsistent checks. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-21 12:15:28 +03:00
Pavel Emelyanov	22b7256612	ns: Introduce ns type We (may) have 3 types of namespace objects in criu -- criu's one, root task's one and others. All of them sometimes make sense and we differentiate them in a weird way -- by checking the ns->pid field against getpid() or by comparing with root_item's. The proposal is to mark ns_id objects explicitly with type field. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-21 12:14:07 +03:00
Tycho Andersen	85ebf0a83b	usernsd: also pass pid of process that made the req We'll use this in the next patch to correctly write sysctls. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-21 12:01:01 +03:00
Tycho Andersen	72ff44d0dc	usernsd: move MAX_MSG_SIZE to namespaces.h We'll use this size in the next patch to avoid having to do some dynamic allocation. v2: call it MAX_UNSFD_MSG_SIZE instead v3: fix all uses of MAX_MSG_SIZE :) Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-21 11:57:40 +03:00
Andrey Vagin	1174a2ad0f	mount: handle mnt_flags and sb_flags separately (v4) They both can contain the MS_READONLY flag. And in one case it will be read-only bind-mount and in another case it will be read-only super-block. v2: set mnt and sb for one call of mount() when it's possible v3: return a comment which was deleted by mistake v4: Fix the sentence about restoring mnt flags Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-21 11:55:17 +03:00
Tycho Andersen	4f2e4ab3be	irmap: add --irmap-scan-path option This option allows users to specify their own irmap paths to scan in the event that they don't have a path in one of the hard coded hints. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-21 11:46:12 +03:00
Andrey Vagin	d3be641acd	cgroups: get controllers from /proc/self/cgroups (v2) Some controllers can be disabled in kernel options. In this case they are shown in /proc/cgroups, but they could not be mounted. All enabled controllers can be collected from /proc/self/cgroup. https://github.com/xemul/criu/issues/28 v2: ',' is used to separate controllers Cc: Tycho Andersen <tycho.andersen@canonical.com> Reported-by: Ross Boucher <boucher@gmail.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-16 15:46:10 +03:00
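
The last commit listed above (d3be641acd) collects enabled controllers from /proc/self/cgroup, where ',' separates controllers that share a hierarchy. A minimal sketch of that parsing step follows. It is not CRIU's actual code: the helper name parse_cgroup_line and the MAX_CTL_NAME limit are hypothetical, and the only thing assumed about the input is the "hierarchy-ID:controller-list:cgroup-path" line format described in cgroups(7).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on a controller name, chosen for this sketch only. */
#define MAX_CTL_NAME 64

/*
 * Hypothetical helper (not taken from CRIU): extract the comma-separated
 * controller names from one /proc/self/cgroup line of the form
 *     hierarchy-ID:controller-list:cgroup-path
 * Returns the number of names stored in out (at most max),
 * or -1 if the line is malformed or a name is too long.
 */
static int parse_cgroup_line(const char *line, char out[][MAX_CTL_NAME], int max)
{
	const char *start, *end, *p;
	int n = 0;

	/* The controller list sits between the first and second ':' */
	start = strchr(line, ':');
	if (!start)
		return -1;
	start++;
	end = strchr(start, ':');
	if (!end)
		return -1;

	p = start;
	while (p < end && n < max) {
		const char *comma = memchr(p, ',', (size_t)(end - p));
		size_t len = (size_t)((comma ? comma : end) - p);

		if (len >= MAX_CTL_NAME)
			return -1;
		if (len > 0) {
			/* Copy one controller name and terminate it */
			memcpy(out[n], p, len);
			out[n][len] = '\0';
			n++;
		}
		p += len + 1; /* step over the name and its separator */
	}
	return n;
}
```

Feeding the helper each line read from /proc/self/cgroup would yield only controllers that are actually mounted for the task. A line with an empty controller list, such as the cgroup v2 entry "0::/", produces zero names.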

1696 Commits