mir/criu - criu - Mike's Git repositories

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 12:57:57 +00:00

Author	SHA1	Message	Date
Cyrill Gorcunov	80ef8fd2fb	mount: Handle deleted bindmounts To handle deleted bindmounts we simply create the former directory bindmount lived at, mount the target and remove the directory back. For this sake we add @deleted entry into the image. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-08-21 21:26:17 +03:00
Cyrill Gorcunov	5c18267ddf	proc-parse: Fix line merging Seems was a typo. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-08-13 17:18:19 +03:00
Andrey Vagin	9a8ca1cfff	mount: convert uid and gid properties according with userns uid and gid are shown in the init userns. We are going to restore mounts in a target userns, so we need to set these options in the target userns. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-08-07 14:42:00 +03:00
Gabriel Guimaraes	dbaab31f31	Workaround for the OverlayFS bug present before Kernel 4.2 This is here only to support the Linux Kernel between versions 3.18 and 4.2. After that, this workaround is not needed anymore, but it will work properly on both a kernel with and without the bug. The bug is that when a process has a file open in an OverlayFS directory, the information in /proc/<pid>/fd/<fd> and /proc/<pid>/fdinfo/<fd> is wrong, so we grab that information from the mountinfo table instead. This is done every time fill_fdlink is called. We first check to see if the mnt_id and st_dev numbers currently match some entry in the mountinfo table. If so, we already have the correct mnt_id and no fixup is needed. Then we proceed to see if there are any overlayFS mounted directories in the mountinfo table. If so, we concatenate the mountpoint with the name of the file, and stat the resulting path to check if we found the correct device id and node number. If that is the case, we update the mount id and link variables with the correct values. Signed-off-by: Gabriel Guimaraes <gabriellimaguimaraes@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-08-07 14:30:41 +03:00
Andrey Vagin	2f172c8b24	crtools: split cr-dump.c in two files Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-08-06 14:31:06 +03:00
Christopher Covington	1438f013a2	Pass task_size to vma_area_is_private() If we want one CRIU binary to work across all AArch64 kernel configurations, a single task size value cannot be hard coded. Since vma_area_is_private() is used by both restorer blob code and non restorer blob code, which must use different variables for recording the task size, make task_size a function argument and modify the call sites accordingly. This fixes the following error on AArch64 kernels with CONFIG_ARM64_64K_PAGES=y. pie: Error (pie/restorer.c:929): Can't restore 0x3ffb7e70000 mapping with 0xfffffffffffffff7 Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-08-03 17:14:18 +03:00
Cyrill Gorcunov	f377dade94	proc: Align data in parse_mnt_flags and parse_sb_opt Make it more readable. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-07-29 17:57:53 +03:00
Cyrill Gorcunov	4ee50534a8	proc: Drop parse_cpuinfo_features helper We use native cpuid, so this one is no longer used. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-07-28 13:37:03 +03:00
Andrey Vagin	cb3c1bb7fe	proc: show a string in a error message if we can't parse it Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-07-23 15:13:17 +03:00
Andrey Vagin	5b767ffd49	mount: decode paths from mountinfo (v2) mountinfo contains mangled paths. Space, tab and backslash are replaced with the usual octal escapes, so we need to convert these characters back. v2: declare cure_path as static Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-07-21 16:44:55 +03:00
Andrey Vagin	34cb65ce5d	mount: handle a case when a source argument is empty (v2) For example: mount -t tmpfs "" test v2: don't leak memory Reported-by: Ross Boucher <boucher@gmail.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-07-21 16:44:23 +03:00
Tycho Andersen	209693d49b	don't assume the kernel has CONFIG_SECCOMP linux/seccomp.h may not be available, and the seccomp mode might not be listed in /proc/pid/status, so let's not assume those two things are present. v2: add a seccomp.h with all the constants we use from linux/seccomp.h v3: don't do a compile time check for PTRACE_O_SUSPEND_SECCOMP, just let ptrace return EINVAL for it; also add a checkskip to skip the seccomp_strict test if PTRACE_O_SUSPEND_SECCOMP or linux/seccomp.h aren't present. v4: use criu check --feature instead of checkskip to check whether the kernel supports seccomp_suspend Reported-by: Mr. Jenkins Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-07-13 14:50:35 +03:00
Tycho Andersen	e0b24e21d3	creds: fail to dump when creds in thread group don't match Since we don't support dumping per-thread creds, let's at least fail to dump if the creds don't match. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-06-24 17:38:35 +03:00
Tycho Andersen	0d8aec0c3a	seccomp: add initial support for SECCOMP_MODE_STRICT Unfortunately, SECCOMP_MODE_FILTER is not currently exposed to userspace, so we can't checkpoint that. In any case, this is what we need to do for SECCOMP_MODE_STRICT, so let's do it. This patch works by first disabling seccomp for any processes who are going to have seccomp filters restored, then restoring the process (including the seccomp filters), and finally resuming the seccomp filters before detaching from the process. v2 changes: * update for kernel patch v2 * use protobuf enum for seccomp type * don't parse /proc/pid/status twice v3 changes: * get rid of extra CR_STAGE_SECCOMP_SUSPEND stage * only suspend seccomp in finalize_restore(), just before the unmap * restore the (same) seccomp state in threads too; also add a note about how this is slightly wrong, and that we should at least check for a mismatch Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-06-24 17:38:32 +03:00
Pavel Emelyanov	b08f3fae5b	vdso: Reduce the amount of in-code ifdef-s Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Reviewed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2015-06-08 23:34:33 +03:00
Andrey Vagin	5a9fe81b75	locks: print unknown file locks Now it isn't clear which lock is not supported. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-05-30 00:32:16 +03:00
Andrew Vagin	641693f8f0	proc_parse: remove a debug message Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-29 17:24:01 +03:00
Andrew Vagin	84c65f00f9	proc_parse: handle errors for breadline() 00:03:27.746 (00.008815) Error (bfd.c:149): bfd: Error reading file: No such process Reported-by: Mr Jenkins Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-29 17:23:55 +03:00
Oleg Nesterov	be4acd9d6e	fix parse_mnt_flags() to dump/restore STRICTATIME correctly CRIU always restores the mounts as MNT_RELATIME. This is because the kernel uses this mode by default, so we need to pass MS_STRICTATIME explicitly if we didn't see "noatime" or "MS_RELATIME". While at it, make mnt_opt2flag[] and sb_opt2flag "static", otherwise gcc actually creates these arrays on stack even if they are "const". Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-27 14:54:44 +03:00
Andrey Vagin	25267e5b30	lock: parse the lock field in fdinfo if it's available (v2) /proc/locks can contain a wrong pid for a lock and we always need to check this fact. Starting with the 4.1 kernel, locks are reported in fdinfo. v2: rebase to the current master skip note_file_lock() Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-27 14:53:24 +03:00
Andrey Vagin	e59a81bab6	proc_parse: handle a return code of fopen_proc Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-22 15:36:20 +03:00
Andrey Vagin	4a36feacc7	parse_proc: take into account that breadline can return an error code Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-22 15:36:12 +03:00
Andrey Vagin	6b4ecb90db	proc_parse: don't play with a function exit code We should not have a chance to exit with a wrong code on error paths. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-22 15:34:37 +03:00
Oleg Nesterov	6400fc9516	simplify the "ignore filesystem-subtype" logic We can simply overwrite the dot symbol right after the kernel reports it to us. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-14 15:15:19 +03:00
Oleg Nesterov	eb518936d8	introduce --skip-mnt cli option Which obviously can be used to "ignore" the mounts we do not want or need to dump. The user should know what he does. Note: this patch changes parse_mountinfo() to check should_skip_mount(). This is because imo we want to filter out the unwanted mounts asap, as if they do not exist. This increases the chances the dumping will fail if something else depends on this mount. Say, another mountpoint or an opened file. Perhaps it makes sense to teach should_skip_mount() to use fnmatch() and/or look at the optional "(fs|mnt)=" prefix to skip by fsname too. To me it would be better to force the user of this option to understand what it does. Say, if "dump" fails because the child mount can't find the skipped parent, he should add another --skip-mnt option or do not dump. Otherwise, if we do this automagically the user can probably be surprised, he might even miss the fact that we skip more than he asked. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-03 17:56:05 +03:00
Oleg Nesterov	9fee3dc817	pass "bool for_dump" argument down to collect_mntinfo() and parse_mountinfo() Preparation. 1. Add the new "bool for_dump" arg to collect/parse_mntinfo(). 2. Introduce "struct collect_mntns_arg" to pass the additional "bool for_dump" field to collect_mntinfo() and change it to pass this boolean to collect_mntinfo()->parse_mountinfo() path. 3. Change other callers of collect_mntinfo() to pass "false". Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-03 17:55:18 +03:00
Cyrill Gorcunov	9ce0254c04	vma: Unify private VMAs testing We have two helpers for VMA type testing: privately_dump_vma() and vma_priv(). They work with different types but basically do the same: check if we should dump VMA into the image and restore it back then. Let's unify them both into a common vma_entry_is_private() helper and vma_area_is_private() for working with vma_area type. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-01 12:36:46 +03:00
Oleg Nesterov	79e0b37c4e	parse_mountinfo_ent: xrealloc(new->mountpoint) can fail This is pure theoretical, especially in this particular case when we actually want to (likely) free the unused memory. Still the code which ignores potential error doesn't look good. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:39 +03:00
Oleg Nesterov	69600335a9	parse_mountinfo_ent: fix the leakage of "opt" 1. parse_mountinfo_ent() mixes "return -1" and "goto err" on failure, this looks confusing and inconsistent. 2. And buggy. It forgets to free(opt) if parse_mnt_flags() fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:38 +03:00
Oleg Nesterov	8b5faee7c6	parse_mountinfo_ent: kill the wrong xfree(new->mountpoint) The caller will do this on failure too. So this is unnecessary and wrong because we do not nullify ->mountpoint. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:37 +03:00
Oleg Nesterov	b66728ef14	parse_mountinfo: fix and simplify the usage of r_fstype 1. parse_mountinfo() forgets to free(fst) if parse_mountinfo_ent() succeeds. 2. The usage of fst/r_fstype is overcomplicated for no reason. Just change the parse_mountinfo() paths to populate/use/free this fsname unconditionally, and move the ownership to the caller. There is no reason to check FSTYPE__UNSUPPORTED and/or fallback to ->name. Better yet, we could even turn fsname into the local "char []" and avoid %ms and free(), but then we would need to pass the length of this buffer to parse_mountinfo_ent(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:37 +03:00
Oleg Nesterov	2cfeeac465	parse_mountinfo: add the "end" block into the main loop Preparation to simplify the review. parse_mountinfo() assumes that: 1. The "err:" block does all the necessary cleanups on failure. This is wrong, see the next patch. 2. We can never skip the mountpoint. This is true, but we are going to change this. s/goto err/goto end/ in the main loop, add the "end:" label which inserts the new mount_info into the list and then checks ret != 0 to figure out whether we need to abort. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:36 +03:00
Andrey Vagin	54d0f24107	proc_parse: fixed format strings Use the format specifier PRIx64 instead of %lx to print a uint64 integer. Reported-by: Mr Travis CI Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 15:27:59 +03:00
Oleg Nesterov	57db932a0a	mount: always report ->mnt_id as decimal validate_mounts() prints ->mnt_id in hex when it reports the failure. This complicates the understanding because this ->mnt_id is printed as decimal elsewhere, including /proc/$pid/mountinfo. parse_mountinfo() adds "0x" at least and this is just pr_info(), but lets change it too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrew Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 14:03:04 +03:00
Andrey Vagin	36f0c16db0	proc: move logic about adding vma into a list in a separate function parse_smaps() is too big for easy reading. In addition, we are creating a new interface to get information about processes, which is called taskdiag, so parse_smaps() will do only what it should do according to its name. Everything else should be moved into separate functions which will be reused to work with task_diag. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 14:01:19 +03:00
Andrey Vagin	046245dcb0	proc: move logic about filling vma structures into a separate function parse_smaps() is too big for easy reading. In addition, we are creating a new interface to get information about processes, which is called taskdiag, so parse_smaps() will do only what it should do according to its name. Everything else should be moved into separate functions which will be reused to work with task_diag. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 14:00:50 +03:00
Pavel Emelyanov	7ede4697cf	bfd: Don't leak image-open flags into bfdopen Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-16 15:58:14 +03:00
Saied Kazemi	e3fec5f8eb	Ignore mnt_id value for AUFS file descriptors. Starting with version 3.15, the kernel provides a mnt_id field in /proc/<pid>/fdinfo/<fd>. However, the value provided by the kernel for AUFS file descriptors obtained by opening a file in /proc/<pid>/map_files is incorrect. Below is an example for a Docker container running Nginx. The mntid program below mimics CRIU by opening a file in /proc/1/map_files and using the descriptor to obtain its mnt_id. As shown below, mnt_id is set to 22 by the kernel but it does not exist in the mount namespace of the container. Therefore, CRIU fails with the error: "Unable to look up the 22 mount" In the global namespace, 22 is the root of AUFS (/var/lib/docker/aufs). This patch sets the mnt_id of these AUFS descriptors to -1, mimicking pre-3.15 kernel behavior. $ docker ps CONTAINER ID IMAGE ... 3850a63ee857 nginx-streaming:latest ... $ docker exec -it 38 bash -i root@3850a63ee857:/# ps -e PID TTY TIME CMD 1 ? 00:00:00 nginx 7 ? 00:00:00 nginx 31 ? 00:00:00 bash 46 ? 00:00:00 ps root@3850a63ee857:/# ./mntid 1 open("/proc/1/map_files/400000-4b8000") = 3 cat /proc/49/fdinfo/3 pos: 0 flags: 0100000 mnt_id: 22 root@3850a63ee857:/# awk '{print $1 " " $2}' /proc/1/mountinfo 87 58 103 87 104 87 105 104 106 104 107 104 108 87 109 87 110 87 111 87 root@3850a63ee857:/# exit $ grep 22 /proc/self/mountinfo 22 21 8:1 /var/lib/docker/aufs /var/lib/docker/aufs ... 44 22 0:35 / /var/lib/docker/aufs/mnt/<ID> ... $ Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-09 14:07:40 +03:00
Pavel Emelyanov	89d9578730	proc: Print parsed fstype for unsupported mounts Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-27 16:15:22 +03:00
Pavel Emelyanov	c9b6614eef	proc: Remove now pointless debug Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-26 15:05:32 +03:00
Saied Kazemi	2749d9e6ea	Rework fixup_aufs_vma_fd() for non-AUFS links This patch reworks fixup_aufs_vma_fd() to let symbolic links in /proc/<pid>/map_files that are not pointing to AUFS branch names follow the non-AUFS application logic. The use case that prompted this commit was an application mapping /dev/zero as shared and writeable which shows up in map_files as: lrw------- ... 7fc5c5a5f000-7fc5c5a60000 -> /dev/zero (deleted) If the AUFS support code reads the link, it will have to strip off the " (deleted)" string added by the kernel but core CRIU code already does this. Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-22 14:56:40 +03:00
Pavel Emelyanov	08c204820f	aio: Dump AIO rings When AIO context is set up kernel does two things: 1. creates an in-kernel aioctx object 2. maps a ring into process memory The 2nd thing gives us all the needed information about how the AIO was set up. So, in order to dump one we need to pick the ring in memory and get all the information we need from it. One thing to note -- we cannot dump tasks if there are any AIO requests pending. So we also need to go to parasite and check the ring to be empty. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:13:36 +03:00
Pavel Emelyanov	86c0c5fb99	proc: Allocate and get vma fstat in vma_get_mapfile We will need to detect aio mappings soon, so this is a preparation, that makes future patching simpler. Also move aufs stat-ing into aufs code to keep more aufs logic in one place. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-25 21:10:15 +03:00
Pavel Emelyanov	6a6cdb8d4a	proc: Drop always true last argument of parse_smaps() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-12-22 13:52:03 +03:00
Pavel Emelyanov	ab2c1e426c	proc_parse: Invert supported VMA check It's for more natural adding of new else-if branch for aio. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-17 18:11:40 +03:00
Pavel Emelyanov	d4c62b6b5c	proc_parse: Print vma start in hex Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-17 18:10:54 +03:00
Cyrill Gorcunov	88031bf89e	proc_parse: Convert parse_pid_status to BFD engine Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:18:27 +04:00
Pavel Emelyanov	19a76494a9	kerndat: Collect all global variables on one struct Not to spoil the global namespace and unify the kerndat data names. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:14:53 +04:00
Andrey Vagin	71a9cd0634	proc: delete parse_pid_stat_small() (v2) It's unused now. v2: remove the proc_pid_stat_small struct too. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:15:37 +04:00
Andrey Vagin	05943959a5	proc: parse state and ppid from /proc/pid/status (v2) v2: don't leak FILE CID 73423 (#1 of 1): Resource leak (RESOURCE_LEAK) 15. leaked_storage: Variable f going out of scope leaks the storage it points to. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:15:03 +04:00
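
Commit 5b767ffd49 in the log above notes that /proc/<pid>/mountinfo reports paths with space, tab and backslash replaced by three-digit octal escapes, which have to be converted back before the path is used. The following is a minimal Python sketch of that decoding step, not CRIU's C code; the name `decode_mountinfo_path` is hypothetical and chosen only for illustration.

```python
import re

# A backslash followed by exactly three octal digits, e.g. \040 for a space.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def decode_mountinfo_path(path: str) -> str:
    """Replace each \\NNN octal escape with the character it encodes.

    Hypothetical helper: a single left-to-right pass, so the output of one
    substitution is never re-scanned as the start of another escape.
    """
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), path)


if __name__ == "__main__":
    # "\040" is a space, "\011" a tab, "\134" a backslash.
    print(decode_mountinfo_path(r"/mnt/my\040dir"))
```

Doing the replacement in one pass matters for a path such as `\134040`: it decodes to a literal backslash followed by `040`, not to a space.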

218 Commits
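
Commit be4acd9d6e in the log above explains that the kernel mounts with relatime by default, so a restore has to pass MS_STRICTATIME explicitly whenever the dumped mount options contained neither noatime nor relatime. The rule can be sketched in Python as follows. This is an illustration under stated assumptions, not CRIU's parse_mnt_flags(): the function name `mnt_opts_to_flags` and the small option table are hypothetical, the flag values are those of the Linux mount(2) MS_* constants, and unknown options are silently ignored here for brevity.

```python
# Linux mount(2) flag values (MS_* constants).
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8
MS_NOATIME = 1024
MS_NODIRATIME = 2048
MS_RELATIME = 1 << 21
MS_STRICTATIME = 1 << 24

# Hypothetical option table for the sketch; "rw" carries no flag bit.
_OPT_TO_FLAG = {
    "rw": 0,
    "ro": MS_RDONLY,
    "nosuid": MS_NOSUID,
    "nodev": MS_NODEV,
    "noexec": MS_NOEXEC,
    "noatime": MS_NOATIME,
    "nodiratime": MS_NODIRATIME,
    "relatime": MS_RELATIME,
}


def mnt_opts_to_flags(opts: str) -> int:
    """Turn a comma-separated mountinfo option string into MS_* flags."""
    flags = 0
    for opt in opts.split(","):
        flags |= _OPT_TO_FLAG.get(opt, 0)  # unknown options ignored in this sketch
    # Neither noatime nor relatime seen: the mount used strict atime, which
    # must be requested explicitly because the kernel defaults to relatime.
    if not flags & (MS_NOATIME | MS_RELATIME):
        flags |= MS_STRICTATIME
    return flags
```

With this rule, an option string such as `rw,nosuid` yields `MS_NOSUID | MS_STRICTATIME`, while `rw,relatime` yields only `MS_RELATIME`.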