mirror of https://github.com/checkpoint-restore/criu synced 2025-08-26 11:57:52 +00:00

1696 Commits

Author SHA1 Message Date
Dmitry Safonov
3815004841 kerndat: do not report error on loginuid feature test
Fix for commit 0ce8e4299506 ("kerndat: do not report errors on feature
test").
That commit hid error messages for feature testing when you cannot
write to /proc/*/loginuid files because of a missing kernel patch that
allows unsetting the loginuid value on older kernels, but it didn't hide
error messages when CONFIG_AUDITSYSCALL is disabled - in that case you
don't have loginuid files.
Also fixed the comment for the kerndat feature test: the procfs file might
fail to open if it's missing, and that's fine (the !CONFIG_AUDITSYSCALL case),
but it can't fail due to a permission fault on _read_ (then something is
wrong, let's report a problem).

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-08 20:25:45 +03:00
Andrew Vagin
0dcfcc0ea5 net: dump netfilter conntracks and expectations
We request all conntracks via netlink, save the netlink messages which
describe them in an image file, and then send these netlink messages back on restore.

https://github.com/xemul/criu/issues/54
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-08 11:41:59 +03:00
Andrew Vagin
7458b054f9 rst-malloc: return aligned pointers to sizeof(void *) (v4)
Stas found that if we don't align a pointer,
futex and atomic operations can fail.

v2: don't hard-code the size of void *
v3: add a function to allocate memory without gaps with
    a previous slice. It's used to allocate arrays.
v4: don't change rst_mem_cpos

Cc: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-07 11:09:44 +03:00
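The alignment fix described in this commit can be illustrated with a minimal sketch. It is not CRIU's rst-malloc code: round_up_sketch() and align_ptr_sketch() are hypothetical names, and the sketch only assumes that the alignment is a power of two, which holds for sizeof(void *).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from CRIU): round value up to the next
 * multiple of align. align must be a power of two.
 */
static inline uintptr_t round_up_sketch(uintptr_t value, uintptr_t align)
{
	return (value + align - 1) & ~(align - 1);
}

/*
 * Return the first address at or after p that is aligned to
 * sizeof(void *), without hard-coding the pointer size.
 */
static inline void *align_ptr_sketch(void *p)
{
	return (void *)round_up_sketch((uintptr_t)p, sizeof(void *));
}
```

An allocator that hands out slices from one large buffer would apply align_ptr_sketch() to its current position before returning it, so that futex words and atomics placed in a slice start on a naturally aligned address.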
Laurent Dufour
8ceab588a5 crtools: no more linked with builtin syscall
crtools binary is linked with the C library and could rely on all the
services this library is providing, including system calls.

Thus it doesn't need to be linked with the builtin system calls code
made for the parasite/restorer binaries.

This patch:
 - removes the inclusion of syscall.h
 - replaces all calls to sys_<syscall>() with the C library <syscall>()
 - replaces unwrapped system calls with syscall(SYS_<syscall>,...)
 - fixes the resulting compiler issues.

There should not be any functional changes. The only 'code' change
appears in locks.h: when futex is called through the C library, the
error value is fetched from the errno variable instead of the return
value.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-06 20:42:03 +03:00
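The errno difference mentioned for locks.h can be shown with a small sketch. It assumes Linux with glibc; futex_wait_sketch() is a hypothetical helper and not the code from locks.h. A raw system call returns the negative error code directly, while the C library's syscall() returns -1 and stores the code in errno, so the wrapper below converts back to the negative-errno convention.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on a futex word through the C library.
 * Returns 0 on success or a negative errno value, which is what a
 * raw (builtin) system call would have returned directly.
 */
static int futex_wait_sketch(uint32_t *addr, uint32_t expected)
{
	long ret = syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);

	/* syscall() reports failure as -1 and puts the reason in errno */
	return ret < 0 ? -errno : 0;
}
```

Calling futex_wait_sketch() on a word whose value differs from expected does not block; the kernel rejects the wait with EAGAIN, which the helper reports as -EAGAIN.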
Laurent Dufour
34faa89bcf namespace: move definition of CLONE_SUBNS
The CRIU internal define of CLONE_SUBNS should not be put in
syscall-types.h since this define is not part of a system call.

This move is required to prepare the removal of syscall.h from the
component of crtools binary.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-06 20:42:01 +03:00
Laurent Dufour
62194b1548 build: conditional define of struct prctl_mm_map
The file include/prctl.h should define the struct prctl_mm_map only if
it is not already defined in the system include file linux/prctl.h.

The definition should be part of the '#ifndef PR_SET_MM_MAP' block
since this structure is not defined in that case.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-06 20:41:59 +03:00
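The guard described in this commit might look like the following sketch. It is an illustration rather than the contents of CRIU's include/prctl.h; the constants and the field list mirror the kernel's linux/prctl.h, with uint64_t and uint32_t standing in for the kernel's __u64 and __u32 types.

```c
#include <stdint.h>
#include <linux/prctl.h>

/*
 * Fallback for old system headers: when PR_SET_MM_MAP is missing,
 * struct prctl_mm_map is missing as well, so both are defined
 * inside the same #ifndef block.
 */
#ifndef PR_SET_MM_MAP
#define PR_SET_MM_MAP		14
#define PR_SET_MM_MAP_SIZE	15

struct prctl_mm_map {
	uint64_t start_code;
	uint64_t end_code;
	uint64_t start_data;
	uint64_t end_data;
	uint64_t start_brk;
	uint64_t brk;
	uint64_t start_stack;
	uint64_t arg_start;
	uint64_t arg_end;
	uint64_t env_start;
	uint64_t env_end;
	uint64_t *auxv;
	uint32_t auxv_size;
	uint32_t exe_fd;
};
#endif
```

Keeping the structure inside the guard avoids a redefinition error on systems whose linux/prctl.h already provides it.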
Cyrill Gorcunov
e9aed2ed38 plugin: Add PRE_DUMP stage into plugins
It was missed in the first place and may cause
problems on exiting via alarm handling.

Reported-by: Igor Sukhih <igor@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-02-05 13:24:11 +03:00
Stanislav Kinsburskiy
2f70a79eac pipes: move struct pipe_info declaration to pipes.h
AutoFS will need to create the write end file descriptor of a pipe if it was closed.
Thus, the pipe_info structure has to be exported.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-27 17:32:19 +03:00
Tycho Andersen
fa433b7552 cgroup: restore perms on cgroup dirs as well
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-27 16:45:24 +03:00
Tycho Andersen
cbe8ef4fe7 cgroup: restore cgroup property perms as well
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-27 16:45:20 +03:00
Pavel Emelyanov
a85aeb7fd0 ns: Remove __rst_new_ns_id
There's only one user of it, so better to reshuffle the arg set.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-20 11:11:06 +03:00
Pavel Emelyanov
b8a9122d89 fds: Remove unused arg from close_old_fds()
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-20 11:10:29 +03:00
Igor Sukhih
eec66f3d30 criu [PATCH] post-setup-namespaces
Introduce the post-setup-namespaces action script.

It is needed to make it possible to run a custom script after the mount
namespace is configured.

Signed-off-by: Igor Sukhih <igor@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-20 11:08:22 +03:00
Andrew Vagin
ef8d4cf285 service: add support for the --external option
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-20 11:07:31 +03:00
Pavel Emelyanov
d7684252c8 kdat: Handle pagemaps with zeroed pfns
Recent kernels allow a regular user to read the proc pagemap file, but zero
the pfns in it. Support this mode for user dumps.

https://github.com/xemul/criu/issues/101

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
2016-01-18 21:07:06 +03:00
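A minimal sketch of how a zeroed PFN can be told apart from an absent page. It assumes the pagemap entry layout documented by the kernel (bit 63 means the page is present, bits 0-54 hold the PFN of a present page). All names are hypothetical and this is not CRIU's kerndat code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one 64-bit /proc/<pid>/pagemap entry */
#define PME_PRESENT_SKETCH	(1ULL << 63)
#define PME_PFN_MASK_SKETCH	((1ULL << 55) - 1)

/* True when the page is present in RAM */
static inline bool pme_present_sketch(uint64_t entry)
{
	return (entry & PME_PRESENT_SKETCH) != 0;
}

/* Page frame number stored in the entry (only meaningful if present) */
static inline uint64_t pme_pfn_sketch(uint64_t entry)
{
	return entry & PME_PFN_MASK_SKETCH;
}

/*
 * True when the page is present but the kernel has hidden its PFN,
 * which is what an unprivileged reader sees on recent kernels.
 */
static inline bool pme_pfn_hidden_sketch(uint64_t entry)
{
	return pme_present_sketch(entry) && pme_pfn_sketch(entry) == 0;
}
```

A feature test could read the entry for a page it has just touched: if that entry is present with a zero PFN, PFN-based checks have to be skipped for the rest of the run.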
Vijaya Kumar K
8a7c006bdd define macro for stack alignment
Replace stack alignment magic constant with
__stack_aligned__ macro.
Also align stack for sigaltstack test case.

Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2016-01-13 15:50:42 +03:00
Vijaya Kumar K
cda3a0799b restorer: Update RESTORE_ALIGN_STACK for arm64
arm64 requires the stack to be aligned to 16 bytes.
Update the RESTORE_ALIGN_STACK macro to always align
to 16 bytes.

Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:53:22 +03:00
Dmitry Safonov
9405d4a4cf criu-log: introduce print_once
Impact: small cleanup

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:51:28 +03:00
Dmitry Safonov
0ce8e42995 kerndat: do not report errors on feature test
prepare_loginuid() is called from kerndat_loginuid, where it tests for the
loginuid restore feature. Let's omit error printing for the feature test.

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:51:23 +03:00
Andrey Vagin
4864996ed8 dump: write an inventory image after dumping all processes
Currently if criu segfaulted, the inventory image isn't removed and
we can't detect that images are incomplete.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:50:38 +03:00
Andrew Vagin
4bab48fb39 tty: allow to dump and restore external terminals (v2)
Now we can use the --inherit-fd option to mark external terminals on dump
and to tell which file descriptors should be used to restore these terminals.

Here is an example of how it works:
$ setsid sleep 1000

$ ipython
In [1]: import os
In [2]: st = os.stat("/proc/self/fd/0")
In [3]: print "tty[%x:%x]" % (st.st_rdev, st.st_dev)
tty[8800:d]

$ ps -C sleep
  PID TTY          TIME CMD
 4109 ?        00:00:00 sleep

$ ./criu dump --external 'tty[8800:d]' -D imgs -v4 -t 4109
$ ./criu restore --inherit-fd 'fd[1]:tty[8800:d]' -D imgs -v4

v2: add missed break
    remove @non_file from tty_driver

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:48:10 +03:00
Andrew Vagin
6280625aac crtools: add ability to set list of external resources
This option is used to mark external resources on dump.

Currently it's going to be used to handle external tty-s,
but in the future it can be used for any type of resource.

We can have a few ways to restore external resources, and
we will have separate options to say how to restore each type.

For example, we can use --inherit-fd to restore external
file descriptors.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:47:56 +03:00
Andrew Vagin
2245a43393 tty: use a pair of dev and rdev to identify a terminal
We can't use only a terminal device, because we can not distinguish
two pty-s from different mounts in this case.

$ mount -t devpts -o newinstance xxx pts1
$ mount -t devpts -o newinstance xxx pts2
$ stat pts1/0
Device: 27h/39d	Inode: 3           Links: 1     Device type: 88,0
$ stat pts2/0
Device: 28h/40d	Inode: 3           Links: 1     Device type: 88,0

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:47:46 +03:00
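The identifier built from the device pair can be sketched in C as follows. external_tty_id_sketch() is a hypothetical helper, not CRIU code; it only assumes the tty[rdev:dev] format (both numbers in hex) that the --external and --inherit-fd examples in this log use.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format "tty[<rdev>:<dev>]" for the file at path.
 * st_rdev identifies the terminal device itself, st_dev identifies the
 * filesystem (for example one devpts instance) that the node lives on.
 * Returns 0 on success, -1 if stat() fails or the buffer is too small.
 */
static int external_tty_id_sketch(const char *path, char *buf, size_t len)
{
	struct stat st;
	int n;

	if (stat(path, &st) < 0)
		return -1;

	n = snprintf(buf, len, "tty[%lx:%lx]",
		     (unsigned long)st.st_rdev, (unsigned long)st.st_dev);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the two devpts mounts shown above, pts1/0 and pts2/0 share the same st_rdev (88,0) but differ in st_dev, so the resulting identifiers differ as well.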
Andrew Vagin
20592acef7 syscall: use a correct type for timer_t
timer_t is (void *) in glibc, but timer_t is (int) in the kernel.
When we call system calls, we need to use timer_t from the kernel.

https://github.com/xemul/criu/issues/98
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-28 13:11:04 +03:00
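The type mismatch can be illustrated with a sketch that talks to the kernel directly. It assumes Linux with glibc; kernel_timer_sketch_t and raw_timer_roundtrip_sketch() are hypothetical names chosen for this example and are not taken from the CRIU sources.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* The kernel identifies a POSIX timer by an int, not by glibc's timer_t */
typedef int kernel_timer_sketch_t;

/*
 * Create and delete a timer through raw system calls, using the
 * kernel's view of the timer id. Returns 0 on success, -1 on failure.
 */
static int raw_timer_roundtrip_sketch(void)
{
	kernel_timer_sketch_t id = -1;

	/* A NULL sigevent asks for the default SIGALRM notification */
	if (syscall(SYS_timer_create, CLOCK_MONOTONIC, NULL, &id) < 0)
		return -1;

	return syscall(SYS_timer_delete, id) < 0 ? -1 : 0;
}
```

Passing the address of a glibc timer_t here would give the kernel a pointer-sized slot while it writes only an int, and passing a glibc timer_t value to SYS_timer_delete would hand the kernel a pointer instead of its id.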
Dmitry Safonov
1137c3f80a kerndat: add has_loginuid to kerndat_s
This value will differ on C/R:
  - on checkpoint it means that it's possible to dump loginuid values;
  - on restore it means that it's possible to unset loginuid and write
    the saved value to the unset loginuid.

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-28 13:09:01 +03:00
Cyrill Gorcunov
d1db4faf9b vdso: Don't fail if pagemap is not accessible
We use page frame number to detect vDSO which has been remapped
in-place from runtime vDSO during restore. In such case if the
kernel is younger than 3.16 the "[vdso]" mark won't be reported
in procfs output.

Still to address recently reported CVEs and be able to run CRIU
in unprivileged mode we need to handle vDSO without pagemap access
and here is the deal -- when we find VMA which "looks like" vDSO
we try to scan it for vDSO symbols and if it matches we restore
its status without PFN access.

Here are some details on the in-kernel history of @pagemap access:

 - @pagemap was introduced in commit 85863e475e59, where anyone
   who can attach to a task via ptrace is allowed to read
   data from @pagemap (Feb 4 2008, v2.6.25-rc1)

 - in commit 006ebb40d3d65 the ptrace attach rule was changed
   into ptrace read permission (May 19 2008, v2.6.27-rc1)

 - in commit ab676b7d6fbf4 opening of @pagemap became guarded
   with CAP_SYS_ADMIN because of a leak of physical addresses
   into userspace (Mar 9 2015, v4.0-rc5)

 - in commit 1c90308e7a77a opening of @pagemap became available
   for regular users again (with ptrace read permission) but
   physical addresses of pages are hidden from non-privileged
   users (Sep 8 2015, v4.3-rc1)

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Looks-good-to-me: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 14:40:05 +03:00
Pavel Emelyanov
df1729e3ec util: Ability to ignore errno when opening proc
When run as a regular user, criu will get EACCES/EPERM when
opening proc, but in some situations criu will know how to
deal with it. So this patch makes it possible not to print
an error message in the logs for such cases.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Looks-good-to-me: Andrew Vagin <avagin@virtuozzo.com>
2015-12-24 14:40:02 +03:00
Pavel Emelyanov
6c22bfe216 criu: Remove security
We no longer support root-mode service and suid binaries, so
any artificial restrictions no longer make sense.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Looks-good-to-me: Andrew Vagin <avagin@virtuozzo.com>
2015-12-24 14:39:58 +03:00
Cyrill Gorcunov
e9fc593cde creds: dump -- Implement per-thread dump of credentials
This as well as restore requires several steps to reach per-thread
support during dump stage

 - @creds area to be fetched from the parasite is embedded into
   parasite_dump_structure

 - when test for task to be dumpable we no longer compare caps
   because we now allow them to be different (and I renamed
   proc_status_creds_eq to proc_status_creds_dumpable for this
   sake)

 - have to extend dump_thread_common to support dumping of
   creds (we call for dump_thread_common in several places,
   in particular when we need to fetch misc params we don't
   need creds, here @creds option comes into the play)

 - after this patch no creds-X.img file will be generated anymore,
   I guess we might drop it from the descriptors with time

https://jira.sw.ru/browse/PSBM-41416

v2:
 - In dump_task_creds() don't mangle the call for parasite_dump_creds
   and collect_lsm_profile
 - PARASITE_MAX_GROUPS takes parasite_dump_thread into account because
   dump_thread_common now serves two cases: for plain misc parameters
   fetching and for creds as well (depending on the context)
 - when test for dumpable we still require the seccomp filters
   to match, they can be different and we need to support such
   configuration too but not in this series

v3:
 - Rip off dump_task_creds completely, together with PARASITE_CMD_DUMP_CREDS,
   we dump creds unconditionally in dump_thread_common
 - the group leader thread data is fetched via new
   parasite_dump_thread_leader_seized helper

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:21:50 +03:00
Cyrill Gorcunov
eb5d84428e creds: restore -- Implement per-thread restore of credentials
Because the creds parameters are to be passed inside pie/restorer
code but are read before the thread_restore_args and task_restore_args
structures are allocated, we need a small trick and prepare
creds in several stages

 - collect all creds data into separate private memory blobs
 - once all memory needed for the restorer is allocated, we relocate
   pointers in these blocks and set
   thread_restore_args::thread_creds_args to the appropriate
   address
 - the restorer works as usual and sets up creds parameters as before

v2:
 - fix addressing in positioning of rst_ memory (I accidentally
   zapped pointers and, when sending the patches, forgot to merge the
   changes back; so while I had the series successfully restoring
   containers with different creds, the series as sent would not have
   worked once merged. All changes are now merged as appropriate)

 - drop module's global @cap_last_cap from pie/restorer.c

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:20:58 +03:00
Cyrill Gorcunov
212e210552 creds: Move proc_status_creds::cap_X at the end of structure
For easier comparison, which is going to be addressed in the next patch.

https://jira.sw.ru/PSBM-41416

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:18:39 +03:00
Cyrill Gorcunov
767e3e994e xmalloc: Add xmemdup helper
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:18:05 +03:00
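The commit gives no body, so the following is only a guess at what such a helper does, based on its name: allocate a buffer and copy a block of memory into it. memdup_sketch() is a hypothetical stand-in built on plain malloc(); the real xmemdup may have a different signature and error reporting.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical memdup-style helper: return a freshly allocated copy
 * of the size bytes at src, or NULL if the allocation fails.
 * The caller owns the result and releases it with free().
 */
static void *memdup_sketch(const void *src, size_t size)
{
	void *copy = malloc(size);

	if (copy)
		memcpy(copy, src, size);
	return copy;
}
```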
Kirill Tkhai
36c4cba986 binfmt_misc: Skip dumping if it's not virtual
Similar to devtmpfs and devpts, skip binfmt_misc
mount if it's not virtual.

Signed-off-by: Kirill Tkhai <ktkhai@odin.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-23 15:44:45 +03:00
Andrey Ryabinin
d0ff73077d dump: add timeout for collecting processes
Currently criu dump may hang indefinitely, e.g. while waiting for a task
that is blocked in vfork(), or for a task that is in D state for some other
reason. This patch adds a time limit on collecting tasks during the
dump operation. If collecting processes takes too long, the dump
process will be terminated. The timeout is 5 seconds by default, but
it can be changed via a parameter.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-21 12:00:49 +03:00
Stanislav Kinsburskiy
16fd19895c util: new string helpers introduced
This patch brings the add_to_string() and construct_string() helpers.
They allow creating a string from a variable number of parameters in sprintf()
manner, with string allocation (and reallocation if necessary) done by the helper.

v2:
1) Helpers were renamed to xstrcat() and xsprintf() respectively.
2) Added printf attributes to force compiler check

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-21 11:57:00 +03:00
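Allocating string helpers of this kind can be sketched as below. The functions are hypothetical stand-ins written for this example (the names, signatures and failure behaviour of the real xstrcat() and xsprintf() are not shown in the commit message); the sketch uses vsnprintf() twice, first to measure and then to format, and declares the printf attribute so the compiler checks format arguments.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *strcat_sketch(char *str, const char *fmt, ...)
	__attribute__((format(printf, 2, 3)));
static char *sprintf_sketch(const char *fmt, ...)
	__attribute__((format(printf, 1, 2)));

/* Append formatted text to str (may be NULL), growing it as needed */
static char *vstrcat_sketch(char *str, const char *fmt, va_list args)
{
	size_t old_len = str ? strlen(str) : 0;
	va_list probe;
	char *grown;
	int extra;

	/* First pass: measure how many bytes the formatted text needs */
	va_copy(probe, args);
	extra = vsnprintf(NULL, 0, fmt, probe);
	va_end(probe);
	if (extra < 0)
		return NULL;

	/* On failure the original string is left untouched */
	grown = realloc(str, old_len + extra + 1);
	if (!grown)
		return NULL;

	/* Second pass: format right after the existing contents */
	vsnprintf(grown + old_len, extra + 1, fmt, args);
	return grown;
}

static char *strcat_sketch(char *str, const char *fmt, ...)
{
	va_list args;
	char *ret;

	va_start(args, fmt);
	ret = vstrcat_sketch(str, fmt, args);
	va_end(args);
	return ret;
}

static char *sprintf_sketch(const char *fmt, ...)
{
	va_list args;
	char *ret;

	va_start(args, fmt);
	ret = vstrcat_sketch(NULL, fmt, args);
	va_end(args);
	return ret;
}
```

A caller would build a string with sprintf_sketch("%s-%d", name, id), extend it with strcat_sketch(s, "/%x", flags), and release it with free().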
Evgeniy Akimov
8b04551c48 restore: restore freezer cgroup state
The patch restores the freezer cgroup state between the finalize_restore stages.
It should be done after the first stage because we cannot unmap the restorer blob
from a frozen process, and before the second stage because we must freeze processes
before they continue to run.
We also need to move fini_cgroup between these stages to give the freezer
cgroup state restorer access to the cgroup mount directories.
The error handlers contain fini_cgroup, so we are sure that the fini_cgroup call
won't be missed.

Patch restores state only for one freezer cgroup from --freeze-cgroup option,
not all states from whole hierarchy, because CRIU supports checkpoint from
freezer cgroup hierarchy only with THAWED state, except root cgroup from
--freeze-cgroup option.

Signed-off-by: Evgeniy Akimov <geka666@gmail.com>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-16 18:17:04 +03:00
Evgeniy Akimov
34662a68c8 cgroups: save freezer state during dump
CRIU sets freezer.state to "THAWED" during process tree dumping. That's why
we can't simply save the freezer.state file contents to the cgroups image. The new
special function get_real_freezer_state() returns the freezer cgroup state
observed before CRIU dumping started. The patch puts its return value into the dump file.

Signed-off-by: Evgeniy Akimov <geka666@gmail.com>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-16 18:17:01 +03:00
Dmitry Safonov
e5c99983a4 criu: dump loginuid & oom_score_adj values
https://jira.sw.ru/browse/PSBM-41993

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-16 18:08:58 +03:00
Rodrigo Bruno
f993926f5b Rename cr_opts.ps_port into port
Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-15 14:00:09 +03:00
Rodrigo Bruno
91b689a3a4 Introduce the read_into_buffer helper
This will be required for page-cache and page-proxy set.

Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-15 13:58:03 +03:00
Andrew Vagin
391efc9d7e seize: don't worry if a cgroup contains some extra tasks (v3)
A freezer cgroup can contain tasks which will not be dumped.
criu unfreezes the group, so we need to freeze all extra
tasks with ptrace like we do for target tasks.

Currently we attach and send interrupt signals to these tasks,
but we don't call waitpid() for them, so then waitpid(-1, ...)
returns these tasks where we don't expect to see them.

v2: execute freezer_detach() only if opts.freeze_cgroup is set
    calculate extra tasks in a freezer cgroup correctly
v3: s/frozen_processes/processes_to_wait/

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-14 15:55:30 +03:00
Stanislav Kinsburskiy
8e863a94c7 fstype: "mount" callback introduced
It will be used to mount AutoFS, because context creation is required in
addition to actual mount operation.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-14 14:04:32 +03:00
Stanislav Kinsburskiy
1617579a27 pstree: more pstree-related helpers
This patch introduces three helpers:
1) pstree_item_by_real() - search for pstree item by real pid.
2) pstree_item_by_virt() - search for pstree item by virtual pid.
3) pid_to_virt() - return virtual pid by real one.

Note: pstree_item_by_virt() and pid_to_virt() will be used to migrate AutoFS.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-14 14:04:21 +03:00
Andrew Vagin
af55c059fb mount: fix a race between restoring namespaces and file mappings (v2)
Currently we wait until a namespace is restored to get its root.
We need to open a namespace root to open a file to restore a memory mapping.

A process restores mappings and only then forks children. So we can have
a situation, when we need to open a file from a namespace, which will be
"restored" by one of our children.

The root task restores all mount namespaces and opens a file descriptor
for each of them. In this patch we open root for each mntns in the root
task.

If we need to get the root of a namespace which isn't populated, we can get
it from the root task. After the CR_STATE_FORKING stage, the root task
closes all namespace descriptors and we know that all namespaces are
populated at this moment.

v2: don't close root_fd for root ns, because it was not opened
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-10 14:58:59 +03:00
Fyodor
78a521163f pagemap-cache: add const-qualifier to pmc's vma
We need to perform dirty page tracking when dumping shmem, but there
we have only const vmas, so we need pmc to work with them. Also, the pmc concept
implies that it won't change its vmas, so it would be natural to declare
them as const.

Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:45:44 +03:00
Tycho Andersen
6af96c8404 lsm: add a --lsm-profile flag
In LXD, we use the container name in the LSM profile. If the container name
is changed on migrate (on the host side), we want to use a different LSM
profile name (a. la. --cgroup-root). This flag adds that support.

v2: remove unused field, add comment about double detection in
    kerndat_lsm()

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:07:26 +03:00
Cyrill Gorcunov
6217a84ae3 mnt: Carry run-time device ID in mount_info
When we're restoring fsnotify watchees we need to resolve
a path to a handle at some mountpoint referred to by the @s_dev
member (device ID) which is saved inside the image. This
ID may actually change at every mount (say
one restores a container after a machine reboot) or in
case of the container's migration.

Thus the test for overmounting in __open_mountpoint
will fail and we get an error.

Let's do a trick: introduce an @s_dev_rt member which
is supposed to carry the run-time device ID. When dumping,
this member is simply equal to the traditional @s_dev fetched
from procfs, but when restoring we fetch it from a
stat call once the mountpoint becomes alive.

https://jira.sw.ru/browse/PSBM-41610

v2:
 - predefine MOUNT_INVALID_DEV
 - use fetch_rt_stat instead of assigning device in restore_shared_options
 - copy @s_dev_rt in propagate_siblings and propagate_mount

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:58:32 +03:00
Kirill Tkhai
de8fd000d0 fs: Add binfmt_misc support
This patch implements checkpoint/restore functionality
for binfmt_misc mounts. Both magic and extension types
and "disabled" state are supported.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:52:26 +03:00
Pavel Emelyanov
a90d01a078 service: Remove systemd startup mode
For security reasons the systemd-spawn mode is no longer
supported in service.

Also fix the default binding address to be in the local cwd, so as not
to start a global service by chance.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-07 11:28:49 +03:00
Andrew Vagin
fde1116fee proc: parse sigpnd and shdpnd separately
We found that we want to know whether SIGSTOP is queued
in both or in only one of these queues.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-26 09:05:12 +03:00
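Keeping the two masks apart can be sketched as follows. The sketch assumes the SigPnd: and ShdPnd: lines of /proc/<pid>/status, which hold the per-thread and the process-wide (shared) pending signals as hex masks; the structure and function names are hypothetical and are not taken from CRIU's proc parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two pending-signal masks */
struct pending_sketch {
	unsigned long long sigpnd;	/* per-thread queue (SigPnd) */
	unsigned long long shdpnd;	/* shared queue (ShdPnd) */
};

/*
 * Scan a /proc/<pid>/status style text line by line and store both
 * masks separately. Returns 0 if both lines were found, -1 otherwise.
 */
static int parse_pending_sketch(const char *status, struct pending_sketch *out)
{
	const char *p = status;
	int found = 0;

	while (p && *p) {
		if (sscanf(p, "SigPnd: %llx", &out->sigpnd) == 1)
			found |= 1;
		else if (sscanf(p, "ShdPnd: %llx", &out->shdpnd) == 1)
			found |= 2;

		/* Advance to the start of the next line, if any */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return found == 3 ? 0 : -1;
}

/* Signal number sig occupies bit (sig - 1) of a mask */
static int sig_in_mask_sketch(unsigned long long mask, int sig)
{
	return (int)((mask >> (sig - 1)) & 1);
}
```

With both masks available, a caller can check sig_in_mask_sketch(p.sigpnd, SIGSTOP) and sig_in_mask_sketch(p.shdpnd, SIGSTOP) independently instead of testing a single merged mask.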