mir/criu - criu - Mike's Git repositories

mir/criu

mirror of https://github.com/checkpoint-restore/criu synced 2025-08-31 14:25:49 +00:00

Author	SHA1	Message	Date
Kirill Tkhai	0fcaeea912	files: Do not close transport socket twice We close it in sigreturn_restore() for unification with other service fds, so kill the second close() from here. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 10:10:17 +03:00
Cyrill Gorcunov	cc5dbf51d8	sfd: Lift up own fd limit on bootup This minimizes the chance of hitting a problem where files used for page transfer try to use the same number reserved for a service fd. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 10:10:17 +03:00
Cyrill Gorcunov	28af7aa037	kdat: Add fetching files stat Will need it to unlimit the files allocation for service fd reserving and later for parasite code run (which is implemented in vz7 instance and soon will be ported into vanilla). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 10:10:08 +03:00
Kirill Tkhai	8b51779520	files: Unexport collect_task_fd() It has only one user, so unexport it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Kirill Tkhai	6d40803eea	autofs: Add FD_TYPES__AUTOFS_PIPE type Add a fake fd type for autofs. This allows functions like find_file_desc() to work as expected, without having two different file_desc with the same type and same id. Also, later, it will allow deleting autofs_create_fle() and using the generic helper. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	8fdacca527	zdtm: improve tempfs_overmounted test Unchanged test provided by Andrew. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	0709e3ce76	mount: do remaps for child-overmount of another overmount In case we have mounts: 1 /mnt/ 2 /mnt/a with parent 1 3 /mnt/a/b with parent 1 4 /mnt/a with parent 2 We determine 2 as needing remap with does_mnt_overmount() and remap it. Next we mount 4 on top of 2. Next in fixup_remap_mounts() we want to move 2 back to its parent 1, but instead move 4 there. So in this case children-overmounts need to be remapped too. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	a9ec5829bc	mount: fix try_remap_mount Remaps in mnt_remap_list should follow same descending order which was setup in mnt_resort_siblings(), so don't reorder them. For instance if we have sibling mounts with mountpoints: 1) /dir1/dir2/dir3 2) /dir1/dir2 3) /dir1 Here (2) is sibling-overmount for (1). Mount (3) is sibling-overmount for both (1) and (2). So when we move overmounts back in fixup_remap_mounts() we should first move (2) and only then (3). Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	84d6c73042	mount: fix mnt_resort_siblings to work as described We should add new entry _before_ first entry with less depth to sort in descending order. e.g: entries in list have depths [7,5,3], adding new entry m with depth 4 we would break list_for_each_entry loop on p with depth 3, before patch we would get [7,5,3,4] after list_add, which is wrong. Also we can relax "<=" check to "<" to avoid unnecessary reordering. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	dd104ddbe1	zdtm: now tempfs_overmounted will pass so remove crfail Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	b364f4fd52	mount: make open_mountpoint handle overmounts properly dump of VZ7 ct fails, if we have overmounted tmpfs inside: [root@silo ~]# prlctl enter su-test-2 entered into CT CT-829e7b28 /# mkdir /mnt/overmntedtmp CT-829e7b28 /# mount -t tmpfs tmpfs /mnt/overmntedtmp/ CT-829e7b28 /# mount -t tmpfs tmpfs /mnt CT-829e7b28 /# logout [root@silo ~]# prlctl suspend su-test-2 Suspending the CT... Failed to suspend the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: Will skip in-flight TCP connections (01.657913) Error (criu/mount.c:1202): mnt: Can't open ./mnt/overmntedtmp: No such file or directory (01.662528) Error (criu/util.c:709): exited, status=1 (01.664329) Error (criu/util.c:709): exited, status=1 (01.664694) Error (criu/cr-dump.c:2005): Dumping FAILED. Failed to checkpoint the Container All dump files and logs were saved to /vz/private/829e7b28-f204-4bce-b09f-d203b99befd4/dump/Dump.fail Checkpointing failed ) Criu wants to dump the contents of /mnt/overmntedtmp/ mount but it is unavailable. So we copy the mount namespace in such a case and unmount overmounts to access what we want to dump. Actual usecase here is dumping CT with active mariadb and ssh connection. Together they happen to create such overmount. As by default systemd creates a separate mount namespace for mysql and also mounts tmpfs to /run/user in it, and when ssh(root) is connected - systemd also mounts tmpfs in container root mount namespace to /run/user/0 for user files. As /run is slave mount /run/user/0 also propagates to mysql's mount namespace and initially becomes overmounted by /run/user. https://jira.sw.ru/browse/PSBM-57362 remove __maybe_unused for mnt_is_overmounted and umount_overmounts changes in v2: 1) Use clone not fork, share resources with parent same as in call_in_child_process. 2) Do not enter userns (create helper) for non-overmounted mounts. Thus return back setns/resorens logic. 3) Helper opens fd for parent directly due to CLONE_FILES, remove futex. 4) Check helper exit status properly. 5) Add get_clean_fd helper. 6) Add better comments. changes in v3: 1) Pass fd from helper through args instead of ret code, fix ret code checking. 2) Add \n to pr_err in open_mountpoint changes in v5: Make comments even better. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 09:15:28 +03:00
Pavel Tikhomirov	83df86494b	mount: add umount_overmounts helper to make mount visible also remove __maybe_unused for __umount_children_overmounts note: leave it __maybe_unused yet Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 03:03:20 +03:00
Pavel Tikhomirov	d17bad63cc	mount: add __umount_children_overmounts helper to make mount visible note: leave it __maybe_unused yet Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 03:03:20 +03:00
Pavel Tikhomirov	2bed6e9f3b	mount: add mnt_is_overmounted helper to check mount visibility note: leave it __maybe_unused yet Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-03-02 03:03:20 +03:00
Andrei Vagin	0d9bed0ec6	kerndat: call kerndat_link_nsid() It was dropped during one of the rebases. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:11:17 +03:00
Dmitry Safonov	677b6cb0f2	kdat/net: Init kerndat even if nsid aren't supported We should continue even if kdat feature isn't supported: [criu]# ./criu/criu dump -t `pidof pypy` --shell-job Warn (criu/kerndat.c:804): Can't load /run/criu.kdat Warn (criu/libnetlink.c:55): ERROR -95 reported by netlink Error (criu/net.c:3042): Unable to create a veth pair: -95 Warn (criu/net.c:3064): NSID isn't reported for network links Cc: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>	2018-02-16 02:11:17 +03:00
Andrei Vagin	fc3ffd8282	net: handle a case when --empty net is set only for criu dump The original idea was to set --empty net for criu dump and criu restore, but before `cde33dcb06` ("empty-ns: Don't C/R iptables too (v2)"), criu restore worked without --empty net and we didn't notice that docker doesn't set this option on restore. After a small brainstorm, we decided that it is better to remove this requirement. Docker has to set this option, but with these changes, the docker issue will be less urgent. https://github.com/checkpoint-restore/criu/issues/393 Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:11:17 +03:00
Pavel Emelyanov	71e2bdc968	net: Fix links collection retcode There's an if (bad_thing) { ret = -1; break; } code above this hunk, whose intention is to propagate -1 back to caller. This propagation is obviously broken. Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:11:17 +03:00
Andrei Vagin	95f2d40769	restore: create the root netns before running setup-namespaces scripts runc restore executes criu with --emptyns network and sets a setup-namespaces script to restore a network namespace. https://github.com/xemul/criu/issues/314 Looks-good-to: Pavel Emelyanov <xemul@virtuozzo.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: 2189b9c71d3d ("net: allow to dump and restore more than one network namespace") Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:11:17 +03:00
Kirill Tkhai	ef65d98a78	clone_noasan: Allow to create CLONE_VM|CLONE_VFORK processes Picked from patch "[PATCH RFC] namespaces: use CLONE_VFORK with CLONE_VM when it is possible" by Andrew Vagin. Currently the parent touches the child's stack, as at the moment of the clone() call its stack pointer is above the child's (we allocate char stack[128] on the parent's stack). This prevents creating CLONE_VM|CLONE_VFORK processes, because the child uses stack addresses occupied by the parent. The patch changes clone_noasan() behaviour and allows doing that with the same memory consumption. We give the child memory which is not used by the parent's clone(), so the parent's and child's stacks have no intersection. This allows creating CLONE_VM|CLONE_VFORK processes. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:11:17 +03:00
Kirill Tkhai	badf42caa4	restore: Block SIGCHLD during root_item initialization (Was "user_ns: Block SIGCHLD during namespaces generation") We don't want asynchronous signal handler during creation of namespaces (for example, in create_user_ns_hierarhy()) as we do wait() synchronous. So we need to block the signal. Do this once globally. v2: Set initial ret = 0 v3: Block signal globally in root_item before its children are created. v4: Move block to prepare_namespace() Suggested-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:11:17 +03:00
Kirill Tkhai	a877a89484	util: add a function to run an action in a child process The action is run in a very lightweight process. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-16 02:03:57 +03:00
Kirill Tkhai	b5225915df	ns: Do not change net_ns in prepare_net_namespaces() In next patches usernsd will need to create transport socket in the same net_ns as other tasks do their TRANSPORT_FD_OFF sockets. Choose criu net_ns for that: this allows usernsd not to wait for creation of other net_ns, i.e. not to introduce new dependencies between tasks. In case of (root_ns_mask & CLONE_NEWUSER) != 0 root_item's user_ns does not allow to restore criu net_ns, so do prepare_net_namespaces() in a sub-process so as not to lose criu net. v3: Introduce __prepare_net_namespaces and execute it in cloned task. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 22:13:26 +03:00
Kirill Tkhai	453a90e580	ns: Fix wrong opened net ns file Since net ns is assigned after prepare_fds() and, in the common case, at the moment of the open_ns_fd() call the task points to a net ns which differs from its target net ns, we can't get the ns from a task. So, get it from fdstore. Also, support userns ns fds. v2: Add comment Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 22:11:43 +03:00
Andrei Vagin	35ad233fb9	test/zdtm/static/netns_sub_veth.c	2018-02-15 21:45:08 +03:00
Andrei Vagin	98b5542c77	test: check veth devices from two network namespaces We have a test case for external veth devices. This test case checks veth devices which are living in two dumped network namespaces. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	f06369b9e2	net: dump and restore connected to a bridge links A network device, which is connected to a bridge, is restored after the bridge. In this case we can set the master attribute and the device will be connected to the bridge automatically. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	5b617ecde5	net: create a list of all links We will need to enumerate links a few times Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	8cb26b02e6	net: split restore_links on read and restore parts It's a preparation for enumerating links a few times. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	6aa4a9c2f1	netns: restore internal veth devices When we dump a veth device, the kernel reports where a peer device lives and we use this information to restore this veth pair. On restore we set a net ns id for a peer and it is created in the required netns. v2: add more comments Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	7dc4b34e1c	net: give ns_id to link_info functions It will be used to restore links in different net namespaces. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	9a2c99c343	netns: dump and restore network namespace ID-s In each network namespace we can set an id for another network namespace to be able to address it in netlink messages. For example, we can say that a peer of a veth device has to be created in a network namespace with a specified id. If we request information about a veth device, the kernel will report where a peer device lives. A user is able to set these ID-s, so we have to dump and restore them. v2: add more comments v3: make a union of nsfd_id and ns_fd, they are not used together Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	1db8e1680f	netns: create a netlink route socket out of dump_links() It will be used to dump netns id-s too. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	2e8069f2a5	net: transfer ns_id structures to functions about c/r-ing netns It will be used to get or set netns id-s. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	805dddad0b	netlink: add nla_get_s32() This function was added into libnl3 recently, but we have to support old versions of this library. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	b79267b584	kerndat: check whether a kernel supports netns id-s or not Each network namespace has a list of ID-s for other namespaces, so if we request information about a veth device, we get an id for a namespace of a peer device. These ID-s can be set by users or by the kernel when they are required. CRIU has to restore these ID-s for network namespaces. We have to remember that one netns can have different id-s in different network namespaces. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Andrei Vagin	dd9e114276	images: add a network namespace id into images It is possible to assign id for network namespaces and this id will be used by the kernel in some netlink messages. If no id is assigned when the kernel needs it, it will be automatically assigned by the kernel. For example, this id is reported for peer veth devices. v2: add a comment Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:45:08 +03:00
Kirill Tkhai	a75f0fa8bc	ns: Simplify create_net_ns() Merge code with the same functionality in one Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:44:53 +03:00
Kirill Tkhai	b54c7d3d88	net: Kill unused argument in open_net_ns() Nobody uses it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 21:42:44 +03:00
Andrei Vagin	ae291308ee	test/zdtm/static/netns_sub.c	2018-02-15 19:51:55 +03:00
Andrew Vagin	d7d11e0e00	zdtm: add a test for nested network namespaces This test creates a few processes which live in three network namespaces and have a few sockets which are created in different network namespaces. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:46:13 +03:00
Andrei Vagin	14731c5210	net: add a way to get a network namespace for a socket Each socket belongs to one network namespace and operates in this network namespace. socket_diag reports information about sockets from one network namespace, but it doesn't report sockets which are not bound or connected anywhere. So we need to have a way to get network namespaces for such sockets. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	37ea6ed0cd	kerndat: check the SIOCGSKNS ioctl This ioctl is called for a socket and returns a file descriptor for network namespace where a socket has been created. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	7a6a42d05b	net: set a proper network namespace to create a socket Each socket has to be restored in the proper network namespace where it was created. We set a specified network namespace before restoring a socket. A task network namespace is set after restoring all files. v2: don't set the root netns for transport sockets Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	2b6ed5bffc	util: move open_proc_fd to service_fd We need this to avoid conflicts with file descriptors which have to be restored. Currently open_proc_pid() isn't used during restoring file descriptors, but we are going to use it to restore sockets in proper network namespaces. Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	6b7393c44e	net: allow to dump and restore more than one network namespace Restore all network namespaces from the root task and then set a proper namespace for each task after restoring sockets, because we need to switch network namespaces to restore sockets. Each socket has to be created in a proper network namespace. v2: fix a typo bug Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	a54d3cf110	net: save network namespaces for sockets Each socket has to be restored in the proper namespace where it has been created. There is an issue with unconnected and unbound sockets: they are not reported via socket-diag and we can't get their network namespaces. v2: add a comment before get_socket_ns() remove nsid from sk_packet_entry Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Pavel Emelyanov	17f163a6f2	usernsd: Add debugging to catch BUG in unsd fd/flags mismatch Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	5902eb2155	net: rename pid into nsid for prepare_net_ns() PID usually means process ID, but prepare_net_ns works with namespaces. travis-ci: success for Dump and restore nested network namespaces (rev4) Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2018-02-15 19:45:49 +03:00
Andrei Vagin	4ea9cf336a	netlink: add ns_id as a generic argument to receive_callback ns_id will be used to collect sockets and other per-netns resources travis-ci: success for Dump and restore nested network namespaces (rev4) Signed-off-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>	2018-02-15 19:45:49 +03:00
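
The ordering rule described in commit 84d6c73042 (mnt_resort_siblings) can be illustrated with a small sketch. This is not CRIU code: `insert_descending` is a hypothetical helper, and a plain Python list of depths stands in for the kernel-style linked list of mounts. It only shows the rule from the commit message, namely that a new entry goes before the first entry with a strictly smaller depth, so the list stays in descending order.

```python
def insert_descending(depths, new_depth):
    """Insert new_depth into a list that is kept in descending order.

    Hypothetical illustration, not taken from CRIU: the new entry is
    placed *before* the first entry whose depth is strictly smaller.
    """
    for i, depth in enumerate(depths):
        # Strict "<": entries with an equal depth keep their position,
        # so nothing is reordered unnecessarily.
        if depth < new_depth:
            depths.insert(i, new_depth)
            return depths
    # No smaller entry found: the new one is the shallowest, append it.
    depths.append(new_depth)
    return depths


if __name__ == "__main__":
    # The example from the commit message: depths [7, 5, 3], adding 4.
    print(insert_descending([7, 5, 3], 4))
```

Inserting after the entry with depth 3 instead would give [7, 5, 3, 4], which is the broken ordering the commit message describes.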

9040 Commits