mir/criu

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-28 21:07:43 +00:00

Author	SHA1	Message	Date
Tycho Andersen	0d8aec0c3a	seccomp: add initial support for SECCOMP_MODE_STRICT Unfortunately, SECCOMP_MODE_FILTER is not currently exposed to userspace, so we can't checkpoint that. In any case, this is what we need to do for SECCOMP_MODE_STRICT, so let's do it. This patch works by first disabling seccomp for any processes who are going to have seccomp filters restored, then restoring the process (including the seccomp filters), and finally resuming the seccomp filters before detaching from the process. v2 changes: * update for kernel patch v2 * use protobuf enum for seccomp type * don't parse /proc/pid/status twice v3 changes: * get rid of extra CR_STAGE_SECCOMP_SUSPEND stage * only suspend seccomp in finalize_restore(), just before the unmap * restore the (same) seccomp state in threads too; also add a note about how this is slightly wrong, and that we should at least check for a mismatch Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-06-24 17:38:32 +03:00
Pavel Emelyanov	b08f3fae5b	vdso: Reduce the amount of in-code ifdef-s Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Reviewed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>	2015-06-08 23:34:33 +03:00
Andrey Vagin	5a9fe81b75	locks: print unknown file locks Now it isn't clear which lock is not supported. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-05-30 00:32:16 +03:00
Andrew Vagin	641693f8f0	proc_parse: remove a debug message Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-29 17:24:01 +03:00
Andrew Vagin	84c65f00f9	proc_parse: handle errors for breadline() 00:03:27.746 (00.008815) Error (bfd.c:149): bfd: Error reading file: No such process Reported-by: Mr Jenkins Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-29 17:23:55 +03:00
Oleg Nesterov	be4acd9d6e	fix parse_mnt_flags() to dump/restore STRICTATIME correctly CRIU always restores the mounts as MNT_RELATIME. This is because the kernel uses this mode by default, so we need to pass MS_STRICTATIME explicitly if we didn't see "noatime" or "MS_RELATIME". While at it, make mnt_opt2flag[] and sb_opt2flag "static", otherwise gcc actually creates these arrays on stack even if they are "const". Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-27 14:54:44 +03:00
Andrey Vagin	25267e5b30	lock: parse the lock field in fdinfo if it's available (v2) /proc/locks can contain a wrong pid for a lock and we always need to check this fact. Starting with the 4.1 kernel, locks are reported in fdinfo. v2: rebase to the current master skip note_file_lock() Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-27 14:53:24 +03:00
Andrey Vagin	e59a81bab6	proc_parse: handle a return code of fopen_proc Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-22 15:36:20 +03:00
Andrey Vagin	4a36feacc7	parse_proc: take into account that breadline can return an error code Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-22 15:36:12 +03:00
Andrey Vagin	6b4ecb90db	proc_parse: don't play with a function exit code We should not have a chance to exit with a wrong code on error paths. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-22 15:34:37 +03:00
Oleg Nesterov	6400fc9516	simplify the "ignore filesystem-subtype" logic We can simply overwrite the dot symbol right after the kernel reports it to us. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-14 15:15:19 +03:00
Oleg Nesterov	eb518936d8	introduce --skip-mnt cli option Which obviously can be used to "ignore" the mounts we do not want or need to dump. The user should know what he does. Note: this patch changes parse_mountinfo() to check should_skip_mount(). This is because imo we want to filter out the unwanted mounts asap, as if they do not exist. This increases the chances the dumping will fail if something else depends on this mount. Say, another mountpoint or an opened file. Perhaps it makes sense to teach should_skip_mount() to use fnmatch() and/or look at the optional "(fs|mnt)=" prefix to skip by fsname too. To me it would be better to force the user of this option to understand what it does. Say, if "dump" fails because the child mount can't find the skipped parent, he should add another --skip-mnt option or not dump. Otherwise, if we do this automagically the user can probably be surprised, he might even miss the fact that we skip more than he asked. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-03 17:56:05 +03:00
Oleg Nesterov	9fee3dc817	pass "bool for_dump" argument down to collect_mntinfo() and parse_mountinfo() Preparation. 1. Add the new "bool for_dump" arg to collect/parse_mntinfo(). 2. Introduce "struct collect_mntns_arg" to pass the additional "bool for_dump" field to collect_mntinfo() and change it to pass this boolean to collect_mntinfo()->parse_mountinfo() path. 3. Change other callers of collect_mntinfo() to pass "false". Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-03 17:55:18 +03:00
Cyrill Gorcunov	9ce0254c04	vma: Unify private VMAs testing We have two helpers for VMA type testing: privately_dump_vma() and vma_priv(). They work with different types but basically do the same: check if we should dump VMA into the image and restore it back then. Let's unify them both into a common vma_entry_is_private() helper and vma_area_is_private() for working with vma_area type. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-04-01 12:36:46 +03:00
Oleg Nesterov	79e0b37c4e	parse_mountinfo_ent: xrealloc(new->mountpoint) can fail This is purely theoretical, especially in this particular case when we actually want to (likely) free the unused memory. Still the code which ignores potential error doesn't look good. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:39 +03:00
Oleg Nesterov	69600335a9	parse_mountinfo_ent: fix the leakage of "opt" 1. parse_mountinfo_ent() mixes "return -1" and "goto err" on failure, this looks confusing and inconsistent. 2. And buggy. It forgets to free(opt) if parse_mnt_flags() fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:38 +03:00
Oleg Nesterov	8b5faee7c6	parse_mountinfo_ent: kill the wrong xfree(new->mountpoint) The caller will do this on failure too. So this is unnecessary and wrong because we do not nullify ->mountpoint. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:37 +03:00
Oleg Nesterov	b66728ef14	parse_mountinfo: fix and simplify the usage of r_fstype 1. parse_mountinfo() forgets to free(fst) if parse_mountinfo_ent() succeeds. 2. The usage of fst/r_fstype is overcomplicated for no reason. Just change the parse_mountinfo() paths to populate/use/free this fsname unconditionally, and move the ownership to the caller. There is no reason to check FSTYPE__UNSUPPORTED and/or fallback to ->name. Better yet, we could even turn fsname into the local "char []" and avoid %ms and free(), but then we would need to pass the length of this buffer to parse_mountinfo_ent(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:37 +03:00
Oleg Nesterov	2cfeeac465	parse_mountinfo: add the "end" block into the main loop Preparation to simplify the review. parse_mountinfo() assumes that: 1. The "err:" block does all the necessary cleanups on failure. This is wrong, see the next patch. 2. We can never skip the mountpoint. This is true, but we are going to change this. s/goto err/goto end/ in the main loop, add the "end:" label which inserts the new mount_info into the list and then checks ret != 0 to figure out whether we need to abort. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-30 13:20:36 +03:00
Andrey Vagin	54d0f24107	proc_parse: fixed format strings Use the format specifier PRIx64 instead of %lx to print a uint64 integer. Reported-by: Mr Travis CI Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 15:27:59 +03:00
Oleg Nesterov	57db932a0a	mount: always report ->mnt_id as decimal validate_mounts() prints ->mnt_id in hex when it reports the failure. This complicates the understanding because this ->mnt_id is printed as decimal elsewhere, including /proc/$pid/mountinfo. parse_mountinfo() adds "0x" at least and this is just pr_info(), but lets change it too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrew Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 14:03:04 +03:00
Andrey Vagin	36f0c16db0	proc: move logic about adding vma into a list in a separate function parse_smaps() is too big for easy reading. In addition, we are creating a new interface to get information about processes, which is called taskdiag, so parse_smaps() will do only what it should do according to its name. Everything else should be moved into separate functions which will be reused to work with task_diag. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 14:01:19 +03:00
Andrey Vagin	046245dcb0	proc: move logic about filling vma structures into a separate function parse_smaps() is too big for easy reading. In addition, we are creating a new interface to get information about processes, which is called taskdiag, so parse_smaps() will do only what it should do according to its name. Everything else should be moved into separate functions which will be reused to work with task_diag. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-27 14:00:50 +03:00
Pavel Emelyanov	7ede4697cf	bfd: Don't leak image-open flags into bfdopen Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-16 15:58:14 +03:00
Saied Kazemi	e3fec5f8eb	Ignore mnt_id value for AUFS file descriptors. Starting with version 3.15, the kernel provides a mnt_id field in /proc/<pid>/fdinfo/<fd>. However, the value provided by the kernel for AUFS file descriptors obtained by opening a file in /proc/<pid>/map_files is incorrect. Below is an example for a Docker container running Nginx. The mntid program below mimics CRIU by opening a file in /proc/1/map_files and using the descriptor to obtain its mnt_id. As shown below, mnt_id is set to 22 by the kernel but it does not exist in the mount namespace of the container. Therefore, CRIU fails with the error: "Unable to look up the 22 mount" In the global namespace, 22 is the root of AUFS (/var/lib/docker/aufs). This patch sets the mnt_id of these AUFS descriptors to -1, mimicking pre-3.15 kernel behavior. $ docker ps CONTAINER ID IMAGE ... 3850a63ee857 nginx-streaming:latest ... $ docker exec -it 38 bash -i root@3850a63ee857:/# ps -e PID TTY TIME CMD 1 ? 00:00:00 nginx 7 ? 00:00:00 nginx 31 ? 00:00:00 bash 46 ? 00:00:00 ps root@3850a63ee857:/# ./mntid 1 open("/proc/1/map_files/400000-4b8000") = 3 cat /proc/49/fdinfo/3 pos: 0 flags: 0100000 mnt_id: 22 root@3850a63ee857:/# awk '{print $1 " " $2}' /proc/1/mountinfo 87 58 103 87 104 87 105 104 106 104 107 104 108 87 109 87 110 87 111 87 root@3850a63ee857:/# exit $ grep 22 /proc/self/mountinfo 22 21 8:1 /var/lib/docker/aufs /var/lib/docker/aufs ... 44 22 0:35 / /var/lib/docker/aufs/mnt/<ID> ... $ Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-09 14:07:40 +03:00
Pavel Emelyanov	89d9578730	proc: Print parsed fstype for unsupported mounts Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-27 16:15:22 +03:00
Pavel Emelyanov	c9b6614eef	proc: Remove now pointless debug Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-26 15:05:32 +03:00
Saied Kazemi	2749d9e6ea	Rework fixup_aufs_vma_fd() for non-AUFS links This patch reworks fixup_aufs_vma_fd() to let symbolic links in /proc/<pid>/map_files that are not pointing to AUFS branch names follow the non-AUFS application logic. The use case that prompted this commit was an application mapping /dev/zero as shared and writeable which shows up in map_files as: lrw------- ... 7fc5c5a5f000-7fc5c5a60000 -> /dev/zero (deleted) If the AUFS support code reads the link, it will have to strip off the " (deleted)" string added by the kernel but core CRIU code already does this. Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-22 14:56:40 +03:00
Pavel Emelyanov	08c204820f	aio: Dump AIO rings When AIO context is set up kernel does two things: 1. creates an in-kernel aioctx object 2. maps a ring into process memory The 2nd thing gives us all the needed information about how the AIO was set up. So, in order to dump one we need to pick the ring in memory and get all the information we need from it. One thing to note -- we cannot dump tasks if there are any AIO requests pending. So we also need to go to parasite and check the ring to be empty. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:13:36 +03:00
Pavel Emelyanov	86c0c5fb99	proc: Allocate and get vma fstat in vma_get_mapfile We will need to detect aio mappings soon, so this is a preparation, that makes future patching simpler. Also move aufs stat-ing into aufs code to keep more aufs logic in one place. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-25 21:10:15 +03:00
Pavel Emelyanov	6a6cdb8d4a	proc: Drop always true last argument of parse_smaps() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-12-22 13:52:03 +03:00
Pavel Emelyanov	ab2c1e426c	proc_parse: Invert supported VMA check It's for more natural adding of new else-if branch for aio. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-17 18:11:40 +03:00
Pavel Emelyanov	d4c62b6b5c	proc_parse: Print vma start in hex Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-17 18:10:54 +03:00
Cyrill Gorcunov	88031bf89e	proc_parse: Convert parse_pid_status to BFD engine Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:18:27 +04:00
Pavel Emelyanov	19a76494a9	kerndat: Collect all global variables on one struct Not to spoil the global namespace and unify the kerndat data names. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:14:53 +04:00
Andrey Vagin	71a9cd0634	proc: delete parse_pid_stat_small() (v2) It's unused now. v2: remove the proc_pid_stat_small struct too. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:15:37 +04:00
Andrey Vagin	05943959a5	proc: parse state and ppid from /proc/pid/status (v2) v2: don't leak FILE CID 73423 (#1 of 1): Resource leak (RESOURCE_LEAK) 15. leaked_storage: Variable f going out of scope leaks the storage it points to. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:15:03 +04:00
Andrey Vagin	fc84aa581a	flock: blocked processes are not interesting for us (v2) All our processes are stopped at the moment when file locks are collected, so they can't be waiting for any locks. Here is a proof of this theory: [root@avagin-fc19-cr ~]# flock xxx sleep 1000 & [1] 23278 [root@avagin-fc19-cr ~]# flock xxx sleep 1000 & [2] 23280 [root@avagin-fc19-cr ~]# cat /proc/locks 1: FLOCK ADVISORY WRITE 23278 08:03:280001 0 EOF 1: -> FLOCK ADVISORY WRITE 23280 08:03:280001 0 EOF [root@avagin-fc19-cr ~]# gdb -p 23280 (gdb) ^Z [3]+ Stopped gdb -p 23280 [root@avagin-fc19-cr ~]# cat /proc/locks 1: FLOCK ADVISORY WRITE 23278 08:03:280001 0 EOF Currently criu can dump nothing if we have one process which is waiting for a lock. I don't see any reason to do this. v2: typo fix Cc: Qiang Huang <h.huangqiang@huawei.com> Reported-by: Mr Jenkins Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-05 16:34:52 +04:00
Andrey Vagin	7cb829b78f	proc: don't leak memory CID 73370: Resource leak (RESOURCE_LEAK) 13. leaked_storage: Variable timer going out of scope leaks the storage it points to. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-05 15:46:59 +04:00
Pavel Emelyanov	2e91a9c814	bfd: Don't flush read-only images Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-05 15:38:17 +04:00
Andrey Vagin	2464ad08d6	locks: print a lock before reporting an error about it Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-30 15:16:17 +04:00
Cyrill Gorcunov	4135f6cd1c	proc_parse: parse_smaps -- Use @file_path instead of strstr helper strstr is a really heavy one, lets use already defined and filled @file_path variable instead. Reported-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-27 21:28:18 +04:00
Cyrill Gorcunov	4ad462b459	mount: proc-parse -- Show @mnt_id on debug print as well This is convenient when need to lookup into debug prints and check which mount point were used somewhere else (in particular I will need @mnt_id in tty code so on error I can easily figure out which mountpoint has been used). No func changes. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-03 13:21:16 +04:00
Cyrill Gorcunov	ae96d21a07	bfd: Use ERR_PTR and such instead of BREADERR No need to invent new error codes here, simply use ERR_PTR/IS_ERR_OR_NULL and such. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-10-02 14:56:39 +04:00
Cyrill Gorcunov	c01efda8af	bfd: timerfd -- Fix parsing typo While converting the reading of the data stream to bfd, the @buf member was left untouched, leading to incorrect data being read. Fix it by setting up the proper one, i.e. @str itself; otherwise dumping of timerfd files fails. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-30 11:48:15 +04:00
Pavel Emelyanov	e651a6eba4	filemap: Get vma mnt_id early We have a, well, issue with how we calculate the vma's mnt_id. Right now we get one via the criu side file descriptor that it got by opening the /proc/pid/map_files/ link. The problem is that these descriptors are 'merged' or 'borrowed' by adjacent vmas from previous ones. Thus, getting the mnt_id value for each of them makes no sense -- these files are the same. So move this mnt_id getting earlier into vma parsing code. This brings a potential problem -- if we have two adjacent vmas mapping the same inode (dev:ino pair) but living in different mount namespaces -- this check would produce wrong result. "Wrong" from the perspective that on restore correct file would be opened from wrong namespace. I propose to live with it, since this is not worse than the --evasive-devices option, it's _very_ unlikely, but saves a lot of openings. Note, that in case app switched mount namespace and then mapped some new library (with dlopen) things would work correctly -- new vmas will likely be not adjacent and for different dev:ino. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-29 13:20:55 +04:00
Pavel Emelyanov	cf8c9ae870	vma: Reshuffle the struct vma_area We have some fields, that are dump-only and some that are restore only (quite a lot of them actually). Reshuffle them on the vma_area to explicitly show which one is which. And rename some of them for easier grep. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-29 13:19:55 +04:00
Pavel Emelyanov	cfce460b48	proc_parse: Rework timers parser to use bfd Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-23 20:49:16 +04:00
Pavel Emelyanov	cc4a67b3ed	proc_parse: Rework smaps parser to use bfd Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-23 20:49:07 +04:00
Pavel Emelyanov	2c8af6b8e6	proc_parse: Rework fdinfo parser to use bfd Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-09-23 20:48:58 +04:00
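
The atime handling described in commit be4acd9d6e above (relatime is the kernel default, so strict atime has to be requested explicitly when neither "noatime" nor "relatime" appears in the mount options) can be illustrated with a minimal sketch. This is not CRIU's parse_mnt_flags(): the helper name sk_atime_flags() and the SK_MS_* constants are hypothetical, and the constants are assumed to mirror the values of the Linux MS_NOATIME, MS_RELATIME and MS_STRICTATIME flags.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: these mirror the Linux MS_* mount flag values. They are
 * redefined under hypothetical SK_ names so the sketch compiles without
 * <sys/mount.h>. */
#define SK_MS_NOATIME     (1UL << 10)
#define SK_MS_RELATIME    (1UL << 21)
#define SK_MS_STRICTATIME (1UL << 24)

/* Hypothetical helper, not part of CRIU: scan a comma-separated mount
 * option string (as found in /proc/<pid>/mountinfo) and return the
 * atime-related flags to pass when restoring the mount. */
static unsigned long sk_atime_flags(const char *opts)
{
	unsigned long flags = 0;
	const char *p = opts;

	while (*p) {
		/* Length of the current option, up to the next comma or end. */
		size_t len = strcspn(p, ",");

		if (len == 7 && !strncmp(p, "noatime", 7))
			flags |= SK_MS_NOATIME;
		else if (len == 8 && !strncmp(p, "relatime", 8))
			flags |= SK_MS_RELATIME;

		p += len;
		if (*p == ',')
			p++;
	}

	/* Neither option seen: the mount used strict atime, which must be
	 * requested explicitly because the kernel defaults to relatime. */
	if (!(flags & (SK_MS_NOATIME | SK_MS_RELATIME)))
		flags |= SK_MS_STRICTATIME;

	return flags;
}
```

For example, sk_atime_flags("rw,nosuid") yields only the strict-atime flag, while sk_atime_flags("rw,relatime") yields only the relatime flag.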

205 Commits