mirror of https://github.com/checkpoint-restore/criu synced 2025-08-29 05:18:00 +00:00

5990 Commits

Author SHA1 Message Date
Andrew Vagin
af55c059fb mount: fix a race between restoring namespaces and file mappings (v2)
Currently we wait until a namespace is restored to get its root.
We need to open a namespace root to open a file to restore a memory mapping.

A process restores mappings and only then forks children. So we can end
up in a situation where we need to open a file from a namespace that
will be "restored" by one of our children.

The root task restores all mount namespaces and opens a file descriptor
for each of them. In this patch we open root for each mntns in the root
task.

If we need to get the root of a namespace which isn't populated, we can
get it from the root task. After the CR_STATE_FORKING stage, the root task
closes all namespace descriptors and we know that all namespaces are
populated at this moment.

v2: don't close root_fd for root ns, because it was not opened
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-10 14:58:59 +03:00
Andrew Vagin
a9be7621b7 mount: pick out a function to set ROOT_FD_OFF
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-10 14:58:18 +03:00
Pavel Emelyanov
decf4f525a crit: Fix casts for fixed and sfixed types
The native pb engine doesn't accept types other than int or long:

...
  File "/root/src/criu/pycriu/images/pb2dict.py", line 264, in dict2pb
    pb_val.append(_dict2pb_cast(field, v))
  File "/usr/lib/python2.7/site-packages/google/protobuf/internal/containers.py", line 111, in append
    self._type_checker.CheckValue(value)
  File "/usr/lib/python2.7/site-packages/google/protobuf/internal/type_checkers.py", line 104, in CheckValue
    raise TypeError(message)
TypeError: 1.1258999068426252e+16 has type <type 'float'>, but expected one of: (<type 'int'>, <type 'long'>)

In particular, this is seen when encoding back so_filter field from
inetsk image.
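
A minimal sketch of the kind of cast this implies, assuming the value
reaches the encoder as a float or a string. The helper name cast_fixed
is hypothetical; it is not the actual _dict2pb_cast code.

```python
def cast_fixed(value):
    """Coerce a decoded value to int so the protobuf type checker accepts it.

    Hypothetical helper for illustration; the real logic lives in
    pycriu/images/pb2dict.py.
    """
    if isinstance(value, str):
        # Base 0 accepts both decimal ("42") and hex ("0x2a") strings.
        return int(value, 0)
    # Floats (and ints) are converted to a plain integer.
    return int(value)


print(cast_fixed(1.1258999068426252e+16))
print(cast_fixed("0x2a"))
```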

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <kupruser@gmail.com>
2015-12-10 14:57:27 +03:00
Cyrill Gorcunov
d1e9b11d02 seize: get_freezer_state -- Relax stack
For historical reasons we allocate the complete PATH_MAX
here just to fetch a word of freezer state. Let's relax
the stack pressure and rename @path to @state. At the same
time make the states @frozen, @freezing, @thawed static,
since we don't export them.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-10 14:56:51 +03:00
Andrey Vagin
b6d9bcd10d zdtm.sh: set a type argument for mknod
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-10 14:55:51 +03:00
Andrey Vagin
94d91458c9 zdtm.py: don't worry if uns isn't in run_flavs
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-09 17:03:07 +03:00
Andrew Vagin
7094e110a3 mount: stop doing anything if populate_mnt_ns() failed
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-09 14:28:46 +03:00
Andrew Vagin
96a12d4755 mount: don't worry if a binfmt_misc image is empty
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-09 14:27:58 +03:00
Pavel Emelyanov
16dccd5ca2 jenkins: Fix CRIT test to skip non-criu images and provide cumulative output
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 18:21:47 +03:00
Andrew Vagin
2c747325f4 mount: don't add dot to a path
It isn't required and it doesn't work in the case where
we want to bind-mount a file.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 17:19:51 +03:00
Adrian Reber
ff3fb16f14 crit: Pretty print vma flags and status
To better understand the content of mm-<ID>.img and pagemap-<ID>.img,
additional constant names have been added to resolve the hex
values to symbolic names.
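
As a rough illustration of resolving a hex value to symbolic names, the
sketch below renders a bitmask as a list of flag names. The table holds
the usual x86 Linux mmap() flag bits and is an assumption; it is not the
set of constants that crit actually knows about, and flags_to_names is a
hypothetical helper.

```python
# Illustrative table only (x86 Linux mmap() flag bits).
MMAP_FLAGS = {
    0x01: "MAP_SHARED",
    0x02: "MAP_PRIVATE",
    0x10: "MAP_FIXED",
    0x20: "MAP_ANON",
}


def flags_to_names(value, table):
    """Render a bitmask as 'NAME | NAME', keeping unknown bits as hex."""
    names = []
    for bit, name in sorted(table.items()):
        if value & bit:
            names.append(name)
            value &= ~bit  # clear the bit we have just named
    if value:
        # Whatever is left has no symbolic name in the table.
        names.append(hex(value))
    return " | ".join(names) if names else "0"


print(flags_to_names(0x22, MMAP_FLAGS))
```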

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 16:34:48 +03:00
Andrew Vagin
93e996d8ed mount: umount a temporary mount with MNT_DETACH
If a temporary mount is a shared one, a new mount can be
propagated into it.

Fixes: 0e9736ab68e0 ("mount: fix restoring a bind-mount when its root is overmounted")

Reported-by: Mr Jenkins
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 16:33:58 +03:00
Fyodor
78a521163f pagemap-cache: add const-qualifier to pmc's vma
We need to perform dirty page tracking when dumping shmem, but there
we have only const vmas, so we need pmc to work with them. Also, the pmc
concept implies that it won't change its vmas, so it would be natural to
declare them as const.

Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:45:44 +03:00
Fyodor
38daf50f22 page-xfer: fix wrong hole address offset
CRIU doesn't save vaddr of each anon shmem page in anon shared mem pagemap img.
It saves page offset from the beginning of anon shared memory area.
CRIU calls page_xfer_dump_pages() with non zero @off argument
to convert dumper virtual addresses to such offsets.

The problem is in the page_xfer_dump_pages() code. It subtracts @off
only for pages in the pagemap but not for holes in the pagemap.
This patch fixes the bug by copying the valid code path for pages
to the code path for holes.
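
The intended conversion can be sketched as follows. This is a simplified
model and not the page_xfer_dump_pages() code: the region tuples and the
to_image_offsets name are assumptions made for the illustration. The
point is that @off is subtracted for both kinds of region.

```python
def to_image_offsets(regions, off):
    """Convert (kind, vaddr, nr_pages) regions into image-relative offsets.

    kind is "pages" or "hole"; both kinds get `off` subtracted, which is
    what the fix makes the hole path do as well.
    """
    return [(kind, vaddr - off, nr_pages) for kind, vaddr, nr_pages in regions]


base = 0x7F0000000000  # hypothetical start of an anon shmem area
regions = [("pages", base, 2), ("hole", base + 0x2000, 1)]
print(to_image_offsets(regions, base))
```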

The bug does not currently reproduce in CRIU because:
1. Only anon shmem provides non-zero @off value to page_xfer_dump_pages()
2. Anon shared memory doesn't create holes in its pagemap (for now)

This bugfix is a preparation for anon shared memory deduplication patchset.

Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:44:11 +03:00
Cyrill Gorcunov
2a25092152 test: Add inotify02
Its only purpose is to verify that we can show up
a huge number of inotify in fdoutput (before
the kernel v3.18-rc1-7-ga3816ab we can show
only handles which fit page size in summary).

In particular we found that the hald daemon makes
up to 35 notification marks, which the kernel can't
show in one pass, so the dump fails.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:37:07 +03:00
Pavel Emelyanov
7ba43a5521 zdtm.py: Fix zdtm_test._env data type
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:19:25 +03:00
Andrey Vagin
da0b8770f2 sysctl: don't skip errors
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:12:19 +03:00
Andrey Ryabinin
76e78a38e2 sysctl: really skip missing entries in __nonuserns_sysctl_op()
When __nonuserns_sysctl_op() hits a non-existing entry it goes to the next
iteration without updating the 'req' pointer. Thus it continuously tries
to open the non-existing entry until breaking out of the loop.
We should go to the next sysctl instead.

Fixes: f79f4546cfc0 ("sysctl: move sysctl calls to usernsd")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:11:19 +03:00
Dmitry Safonov
60595eb432 criu: x86_32: change stack align to 16 bytes on parasite head
GCC now assumes by default that the stack is aligned to a 16-byte boundary.
It's very unlikely that the parasite head's first call will contain
an SSE instruction which will segfault, but to be pedantically correct
we will lose an additional 8 bytes.

See also:
http://sourceforge.net/p/fbc/bugs/659/

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:10:22 +03:00
Andrew Vagin
b2b86052bc criu: add the mnt_id feature if a test uses more than one mntns
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:09:26 +03:00
Andrew Vagin
0c9db23f50 zdtm.py: skip the uns flavor if userns isn't supported
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:09:05 +03:00
Tycho Andersen
6af96c8404 lsm: add a --lsm-profile flag
In LXD, we use the container name in the LSM profile. If the container name
is changed on migrate (on the host side), we want to use a different LSM
profile name (à la --cgroup-root). This flag adds that support.

v2: remove unused field, add comment about double detection in
    kerndat_lsm()

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:07:26 +03:00
Pavel Emelyanov
c5e002d55a crit: Encode back pretty IP addresses
Currently an image decoded with --pretty cannot be encoded back if there's
an IP address inside. A "just decoded" one can.
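
A minimal sketch of the round trip, assuming an IPv4 address is held as
a single 32-bit integer. How the images actually lay out addresses
(word count, byte order) is not stated here, so treat the representation
and both helper names as assumptions.

```python
import ipaddress


def ip_to_pretty(value):
    """Turn a 32-bit integer into dotted-quad text."""
    return str(ipaddress.IPv4Address(value))


def pretty_to_ip(text):
    """Turn dotted-quad text back into the 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


raw = 167772161
pretty = ip_to_pretty(raw)
print(pretty, pretty_to_ip(pretty) == raw)
```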

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:07:01 +03:00
Pavel Emelyanov
809bd09ba7 crit: Show devices nicely
Currently device numbers are shown as plain integers, but in
pretty output it's nice to see the major:minor pairs.
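
For illustration, a plain integer can be split into a major:minor pair
like this. The sketch assumes the kernel-internal layout with 20 minor
bits; whether a given image field uses this layout or the older 8-bit
one is not covered, and the helper names are hypothetical.

```python
MINORBITS = 20  # assumption: kernel-internal device number layout
MINORMASK = (1 << MINORBITS) - 1


def dev_to_pretty(dev):
    """Show a device number as 'major:minor'."""
    return "%d:%d" % (dev >> MINORBITS, dev & MINORMASK)


def pretty_to_dev(text):
    """Parse 'major:minor' back into the integer form."""
    major, minor = (int(part) for part in text.split(":"))
    return (major << MINORBITS) | minor


print(dev_to_pretty((8 << MINORBITS) | 1))
```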

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
2015-12-08 15:06:39 +03:00
Pavel Emelyanov
77f9b7bfbb jenkins: Add test for crit de/encode correctness
The crit tool should decode and encode all images, and after a
decode-then-encode sequence the result should be the same as before.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
2015-12-08 15:06:08 +03:00
Pavel Emelyanov
7a5cef1ea2 zdtm.py: Run tests in best flavor
If someone wants to run all tests in just one flavor, the one that is
the most difficult for criu, the 'best' flavor is introduced.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
2015-12-08 15:05:57 +03:00
Pavel Emelyanov
0ba3b88b4a zdtm.py: Count skipped tests
Currently the launcher doesn't know that some tests are skipped
and draws an incorrect progress bar :)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
2015-12-08 15:05:50 +03:00
Andrew Vagin
e7e63a0de1 mount: don't rewrite root for external mounts
It's used to restore bind-mounts. For example, we cut the common
part of bind-mounts:

Core was generated by `criu restore -vvvv --file-locks --tcp-established --evasive-devices --manage-cg'.
Program terminated with signal 11, Segmentation fault.
741                     BUG_ON(target_root[tok] == '\0');
(gdb) bt

https://jira.sw.ru/browse/PSBM-41932

Reported-by: Virtuozzo QA Team
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:05:12 +03:00
Kirill Tkhai
ea747b0755 unix: Add support for restoring receive queue for unix DGRAM sockets
Restore a receive queue in the following cases:

1) a socketpair with a closed second end;
2) a peer-less socket which is a peer for others.

We use a hack here: connect() with the AF_UNSPEC family,
which clears the peer of the socket being restored. See
unix_dgram_connect() for the details.

This also makes the socket_close_data test work.

SOCK_STREAM is supported in the TCP_ESTABLISHED case in the same
function.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

v2: 1) Add a comment near connect()
    2) Delete test/zdtm/live/static/socket_close_data.desc
v3: delete ui->ue->peer check
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:02:26 +03:00
Andrew Vagin
515e18e422 zdtm: add mntns_rw_ro_rw to the test list
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:01:22 +03:00
Andrew Vagin
0e9736ab68 mount: fix restoring a bind-mount when its root is overmounted
In this case we mount source mount in a temporary place and use it to
create the bind-mount.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:00:03 +03:00
Andrew Vagin
9102b08ceb mount: refactor do_bind_mount()
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:59:41 +03:00
Andrew Vagin
d6d6af9a5a mount: pick out a function to bind mount a point in a tmp place
This is used to get a mount without over-mounted parts.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:59:25 +03:00
Cyrill Gorcunov
6217a84ae3 mnt: Carry run-time device ID in mount_info
When we're restoring fsnotify watchees we need to resolve
a path to a handle at some mountpoint referred to by the @s_dev
member (device ID) which is saved inside the image. This
ID may actually change on every mount (say
one restores a container after a machine reboot) or in
case of the container's migration.

Thus the test for overmounting in __open_mountpoint
will fail and we get an error.

Let's do a trick: introduce an @s_dev_rt member which
is supposed to carry the run-time device ID. When dumping,
this member is simply equal to the traditional @s_dev fetched
from procfs, but when restoring we fetch it from
a stat call once the mountpoint becomes alive.
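
The idea can be modelled roughly as below. This is a Python stand-in
for the C structures, not the CRIU code: the MountInfo class, the value
used for MOUNT_INVALID_DEV and the body of fetch_rt_stat are all
assumptions made for the illustration.

```python
import os
from dataclasses import dataclass

MOUNT_INVALID_DEV = 0  # placeholder; the real constant may differ


@dataclass
class MountInfo:
    mountpoint: str
    s_dev: int                         # device ID saved in the image
    s_dev_rt: int = MOUNT_INVALID_DEV  # device ID observed at run time


def fetch_rt_stat(mi):
    """On restore, read the live device ID once the mountpoint exists."""
    mi.s_dev_rt = os.stat(mi.mountpoint).st_dev
    return mi


mi = fetch_rt_stat(MountInfo("/", s_dev=12345))
print(mi.s_dev, mi.s_dev_rt)
```

Checks that compare against the live system would then use s_dev_rt,
while s_dev stays what was recorded at dump time.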

https://jira.sw.ru/browse/PSBM-41610

v2:
 - predefine MOUNT_INVALID_DEV
 - use fetch_rt_stat instead of assigning device in restore_shared_options
 - copy @s_dev_rt in propagate_siblings and propagate_mount

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:58:32 +03:00
Pavel Emelyanov
0b0f40ecf8 zdtm.py: The groups_test class for running groups
So here's the new test class that handles the tests from the
groups set. The class is inherited from the zdtm_test one, as
what it does is start the pseudo-init in ns/uns flavors
and ask it to spawn() the sub-tests from the list.

All groups tests can only be run inside ns flavor, so if
the host flavor is asked, just the pseudo-init is spawned.
This is because using ns flavor is the easiest way to spawn
all the sub tests under this init (however, h flavor can be
supported by marking the pseudo-init as sub-reaper).

On stop this pseudo-init is signalled to stop, it in turn
stops all the sub-tests and then exits. When the pid
namespace destruction is complete, the sub-tests .out-s are
checked.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:54:52 +03:00
Pavel Emelyanov
41296aaf88 zdtm.py: Generator of groups of tests
Introduce yet another tests set called 'groups'. Each test
in this set is a list of existing zdtm tests that can be
started side-by-side in an ns flavor.

To 'create' such a test the zdtm.py group action is used,
which lists tests and semi-randomly groups them together.
The grouping possibility is checked by comparing the .desc
files of those -- desc-s should coincide. One exception is
test dependencies, these are just merged together.

After running the group action, a groups/ dir appears
with tests, each containing just the list of zdtm tests
that are in a group. The respective .desc file is also
generated, and it matches the .desc of the tests inside.
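
The grouping rule described above can be sketched as follows. This is
not the zdtm.py implementation: the function name, the dict-based .desc
representation and the "deps" key are assumptions, and the sketch groups
deterministically rather than semi-randomly.

```python
def group_tests(descs, size):
    """Group tests whose .desc coincide (ignoring deps), `size` per group.

    `descs` maps test name -> desc dict.
    Returns a list of (test_names, merged_desc) pairs.
    """
    buckets = {}
    for name, desc in sorted(descs.items()):
        # Tests are compatible when everything except deps is identical.
        key = tuple(sorted((k, repr(v)) for k, v in desc.items() if k != "deps"))
        buckets.setdefault(key, []).append(name)

    groups = []
    for names in buckets.values():
        for i in range(0, len(names), size):
            chunk = names[i:i + size]
            merged = {k: v for k, v in descs[chunk[0]].items() if k != "deps"}
            # Dependencies are the one exception: they are merged together.
            deps = sorted({d for n in chunk for d in descs[n].get("deps", [])})
            if deps:
                merged["deps"] = deps
            groups.append((chunk, merged))
    return groups


descs = {
    "a": {"flavor": "ns"},
    "b": {"flavor": "ns", "deps": ["/bin/sh"]},
    "c": {"flags": "suid"},
}
for names, desc in group_tests(descs, 2):
    print(names, desc)
```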

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:54:41 +03:00
Pavel Emelyanov
23898bf166 zdtm.py: Prepare zdtm_test and flavors for mass test start
That is, add the ability to pull more than one binary into the
mntns root and the ability to start a zdtm test with more stuff
in the environment than is generated in the start method itself.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:53:47 +03:00
Pavel Emelyanov
b96974825d zdtm: Remove unneeded re-exec
When starting inside ns flavor the test_init() routine prepares
the binary to be run inside namespaces. In particular this routine
fork()-s an init, execve()-s one to pick up mappings and exe from
the new mntns and then fork()-s the test itself. In order to go
back to test_init() for test initialization the execve() is done
again, but it's actually not required and confuses the reader.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:53:19 +03:00
Pavel Emelyanov
b3c8ee1b4f zdtm: Factor out ps showing code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:53:11 +03:00
Pavel Emelyanov
0ecd6c336a zdtm: Introduce explicit prepare_namespaces() routine
This one is to set up uids for userns, do ip l s lo up for netns
and do the prepare_mntns(). BTW, the latter's code is shifted one
tab left as this is where it should be.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:53:05 +03:00
Kirill Tkhai
de8fd000d0 fs: Add binfmt_misc support
This patch implements checkpoint/restore functionality
for binfmt_misc mounts. Both magic and extension types
and "disabled" state are supported.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:52:26 +03:00
Kirill Tkhai
70b0e161c9 zdtm: Add socket_close_data01 test
This test is for unix sockets open in DGRAM mode.
Server opens a socket, binds it and waits for a signal.
Client connects to the socket and sends a message.

After the signal server checks that data is readable,
and that it's still possible to connect to the bound
socket.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:51:07 +03:00
Pavel Emelyanov
4f18e1e52c criu: Version 1.8
So, first of all we've fixed (I hope) the security issues
spotted by RedHat people. Another big thing of this release
is the huge amount of bug fixes found while testing live
migration. And enhancements for live migration itself is

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v1.8
2015-12-07 11:29:53 +03:00
Pavel Emelyanov
a90d01a078 service: Remove systemd startup mode
For security reasons the systemd-spawn mode is no longer
supported in service.

Also fix the default binding address to be in the local cwd so as not
to start a global service by accident.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-07 11:28:49 +03:00
Pavel Emelyanov
1c75d23cef zdtm.py: Remove no longer needed yaml module
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-04 19:02:25 +03:00
Andrew Vagin
c1404f6671 mount: restore cwd after creating a roots yard (v2)
Currently we see that a cgroup yard is not unmounted,
failing with the ENOENT error, because cwd was changed.
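
The general save-and-restore pattern can be sketched as below. This is
a Python illustration of the technique, not the CRIU code, and the
preserve_cwd name is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def preserve_cwd():
    """Remember the current directory and return to it on exit."""
    fd = os.open(".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        yield
    finally:
        # Go back through the saved descriptor, so this works even if
        # the original path can no longer be resolved by name.
        os.fchdir(fd)
        os.close(fd)


before = os.getcwd()
with preserve_cwd():
    os.chdir(tempfile.gettempdir())  # stands in for work that changes cwd
print(os.getcwd() == before)
```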

v2: construct a path to remove a roots yard
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-04 15:22:58 +03:00
Cyrill Gorcunov
fa17566283 arch: arm -- Wire in ptrace
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-04 11:41:48 +03:00
Cyrill Gorcunov
a38bf75d1e arch: ppc64 -- Wire in ptrace syscall
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-04 11:41:43 +03:00
Cyrill Gorcunov
cd48bddb52 cr-check: Don't include sys/syscalls.h
This conflicts with predefined constants in our own syscalls lib.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-04 11:41:26 +03:00
Cyrill Gorcunov
fca60130f6 arch: x86 -- Add sys_ptrace declaration
We will need it for cr-check.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-04 11:41:20 +03:00