For the root task the clone syscall returns the pid in criu's pidns,
but for other processes the clone syscall returns PID in the restored
namespace.
The /proc/self link contains the PID value of the current process, so if
we want to determine the PID in criu's pidns, we should use criu's
/proc.
v2: readlink() does not append a null byte to buf, so we must do that
ourselves.
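A minimal sketch of the readlink() handling described above. The helper
name read_self_pid() and its buffer size are assumptions made for
illustration, not criu's actual code; the caller passes the path of the
"self" link inside whichever /proc mount it wants to query.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve a "<proc>/self" symlink and return the
 * PID it names, or -1 on error.
 */
static pid_t read_self_pid(const char *self_link)
{
	char buf[32];
	char *end;
	long pid;
	ssize_t len;

	/* Leave room for the terminator: readlink() does not add one. */
	len = readlink(self_link, buf, sizeof(buf) - 1);
	if (len <= 0)
		return -1;
	buf[len] = '\0';

	/* The link target is expected to be a plain decimal PID. */
	pid = strtol(buf, &end, 10);
	if (*end != '\0' || pid <= 0)
		return -1;

	return (pid_t)pid;
}
```

Called with "/proc/self" this returns the PID as seen in the pidns of
that /proc mount.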
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When the service or page server becomes a daemon, we may need to know its pid.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now we don't have the generic criu_msg thing -- instead, we have an
explicit request (with per-type args) and an explicit response
(yet again -- with per-type args).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The service connection is actually an 'external' one from the unix sockets
engine's POV, but we don't want to dump it as such. Thus, we explicitly find
it and dump it as a half-closed connection. On restore we push an artificial
message into it to tell the program that the dump request was served, but
that the program has been restored.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The need for the service is described at http://criu.org/Self_dump
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The criu service is a daemon that opens a unix socket and listens for
incoming requests. The requests will be declared in protobuf/rpc.proto
and for now will only contain the 'dump' request.
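As an illustration of the listening side, here is a minimal sketch of
opening such a unix socket. The function name open_service_socket(), the
SOCK_SEQPACKET type and the backlog of 16 are assumptions made for this
example, not a description of the actual service code.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a unix socket bound to @path and start
 * listening on it. Returns the socket fd, or -1 on error.
 */
static int open_service_socket(const char *path)
{
	struct sockaddr_un addr;
	int sk;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	sk = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (sk < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	/* Drop a stale socket file left over from a previous run. */
	unlink(path);

	if (bind(sk, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(sk, 16) < 0) {
		close(sk);
		return -1;
	}

	return sk;
}
```

A daemon would then accept() connections on the returned fd and decode
one request per connection.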
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A few sentences that are required for understanding this patch:
2a) A shared mount can be replicated to as many mountpoints as needed, and
all the replicas continue to be exactly the same.
2b) A slave mount is like a shared mount except that mount and umount
events only propagate towards it.
2c) A private mount does not forward or receive propagation.
All the rules are in Documentation/filesystems/sharedsubtree.txt
If it's the first mount in a group, all other group members should be
bind-mounted from it.
Each mount propagates to all members of its parent's group. The group can
contain a few slaves.
Mounts that have propagated to slaves are unmounted, because we can't
be sure that they were propagated in real life. For example:
mount --bind --make-slave /share /slave1
mount --bind --make-slave /share /slave2
mount /share/test
umount /slave2/test
mount --make-share /slave1/test
mount --bind --make-share /slave1/test /slave2/test
41 40 0:33 / /share rw,relatime shared:28 - tmpfs xxx rw
42 40 0:33 / /slave1 rw,relatime master:28 - tmpfs xxx rw
43 40 0:33 / /slave2 rw,relatime master:28 - tmpfs xxx rw
44 41 0:34 / /share/test rw,relatime shared:29 - tmpfs xxx rw
46 42 0:34 / /slave1/test rw,relatime shared:30 master:29 - tmpfs xxx rw
45 43 0:34 / /slave2/test rw,relatime shared:30 master:29 - tmpfs xxx rw
/slave1/test and /slave2/test depend on each other, and at least one of
them was not propagated from /share/test
v2: use false and true for bool
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Try to restore mounts while the postpone list isn't empty and check
that each iteration makes some progress; otherwise fail, to prevent
an infinite loop.
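The retry logic can be sketched roughly as follows. The struct
mnt_sketch type, the single "parent" dependency and the function names
are hypothetical simplifications for illustration; the real code tracks
more conditions than one parent index.

```c
#include <stdbool.h>
#include <stddef.h>

struct mnt_sketch {
	int parent;	/* index of the mount this one depends on, -1 for none */
	bool mounted;
};

/* A mount can be restored once the mount it depends on is in place. */
static bool can_mount(const struct mnt_sketch *m, size_t i)
{
	return m[i].parent < 0 || m[m[i].parent].mounted;
}

/* Returns 0 when everything got mounted, -1 when no progress is possible. */
static int restore_all(struct mnt_sketch *m, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++)
		if (!m[i].mounted)
			pending++;

	while (pending) {
		size_t done = 0;

		for (size_t i = 0; i < n; i++) {
			if (m[i].mounted || !can_mount(m, i))
				continue;	/* stays postponed */
			m[i].mounted = true;
			done++;
		}

		/* A full pass without progress would loop forever. */
		if (!done)
			return -1;
		pending -= done;
	}

	return 0;
}
```

Two mounts that depend on each other never become mountable, so the
pass with no progress turns that case into an error instead of a hang.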
v2: rework logic about postpone list
add more comments
v3: one more attempt to make it more readable
v4: Here is a master class from Pavel how to write self-documented code.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All shared mounts from one group are connected into a circular list.
All slaves are added to the proper master's list.
v2: change variable name and fix a bug about adding shared mounts in a
circular list.
v3: handle errors of collect_shared
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently pos has type unsigned long, so its size depends on the
architecture. pos is saved as a 64-bit value in the image file and it
isn't restored if it is equal to -1. Due to the conversion on 32-bit
platforms -1 is converted into UINT_MAX and we get an error on restore.
$ zdtm.sh ns/static/tun
...
(00.398513) 5: Error (files-reg.c:534): Can't restore file pos: Illegal seek
(00.398888) 5: Error (files-reg.c:489): Can't open file /dev/net/tun: Illegal seek
...
id: 0x15 flags: 0x2 pos: 0x000000ffffffff fown: { uid: 0 euid: 0 signum: 0 pid_type: 0 pid: 0 } name: "/dev/net/tun"
crtools is compiled with _FILE_OFFSET_BITS=64, so off_t is always 64-bit.
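The widening problem can be reproduced in isolation. In this sketch
uint32_t stands in for a 32-bit unsigned long and int64_t for a 64-bit
off_t; the function names are made up for the example.

```c
#include <stdint.h>

/* Old behaviour on 32-bit: pos passes through a 32-bit unsigned type. */
static uint64_t pos_via_u32(long pos)
{
	return (uint64_t)(uint32_t)pos;
}

/* With a 64-bit signed offset type, -1 is sign-extended on widening. */
static uint64_t pos_via_off64(int64_t pos)
{
	return (uint64_t)pos;
}

/* The restore side skips the seek when the saved value is -1. */
static int pos_is_unset(uint64_t saved)
{
	return (int64_t)saved == -1;
}
```

pos_via_u32(-1) yields 0x00000000ffffffff, which no longer compares
equal to -1, so restore attempts a seek on a file that does not support
one.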
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Same as the kernel provides, adopted from Linux sources.
strlcpy is similar to strncpy but _always_ adds \0
at the end of the string, even if the destination is shorter.
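For reference, a sketch of the semantics described above. It is written
from the well-known behaviour of strlcpy and is named sketch_strlcpy()
here to avoid clashing with a libc that already provides strlcpy; it is
not a copy of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy @src into @dest, which has room for @size bytes. The result is
 * always NUL-terminated when @size is non-zero. Returns strlen(src),
 * so a return value >= @size tells the caller truncation happened.
 */
static size_t sketch_strlcpy(char *dest, const char *src, size_t size)
{
	size_t ret = strlen(src);

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);
		dest[len] = '\0';
	}

	return ret;
}
```

Unlike strncpy, a too-small destination still ends up as a valid C
string, just a truncated one.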
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a generic do_pb_show() call and tons of show_foo
routines that just call it with the proper args. Compact
the code by putting the args into an array and calling
do_pb_show() in one place.
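The table-driven shape of this change might look like the sketch below.
All names (the IMG_* constants, struct show_args, do_show(),
show_image()) are hypothetical stand-ins; the real do_pb_show() takes
different arguments.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical image types, for illustration only. */
enum { IMG_INVENTORY, IMG_FDINFO, IMG_MM };

struct show_args {
	int type;
	const char *name;
};

/* One row per image type replaces one show_foo() wrapper each. */
static const struct show_args show_table[] = {
	{ IMG_INVENTORY, "inventory" },
	{ IMG_FDINFO, "fdinfo" },
	{ IMG_MM, "mm" },
};

/* Stand-in for the generic routine that does the real work. */
static int do_show(const struct show_args *a, char *out, size_t len)
{
	return snprintf(out, len, "show %s", a->name);
}

/* The single call site: look the args up, then call the generic code. */
static int show_image(int type, char *out, size_t len)
{
	size_t n = sizeof(show_table) / sizeof(show_table[0]);

	for (size_t i = 0; i < n; i++)
		if (show_table[i].type == type)
			return do_show(&show_table[i], out, len);

	return -1;	/* unknown type */
}
```

Adding a new image type then means adding one table row rather than one
more wrapper function.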
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These contain linkage between number, data type and routines
for pb messages we write/read to/from image files. Most of them
have simple number-type-routines mapping, so introduce a generating
script for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing is pretty straightforward -- on netns creation
populate it with tun-s, after this collect tun files, open
and attach them with the regular fd-s engine.
One tricky thing -- when populating the namespace with tun links
make them all persistent and drop this flag (if required)
later, when the first alive open file appears.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The major issue with dump is -- some info is obtained via netlink,
some via sysfs and some (!) via an opened and attached tun file.
But the latter cannot be created if there's another one attached
(or the mq device is full of threads).
Thus we have to dump this info via the existing tun file and keep it
in memory till the link dump code takes place.
The opposite situation is also possible -- we can have a persistent
unattached device. In this case we have to attach to it, dump
things and detach back.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There will be two entities handled:
1. tun file -- an opened char device with misc major and tun minor
that can be attached to item #2
2. tun netdevice -- another type of links
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
TUN devices are created with ioctl, but their parameters (e.g.
flags with state, mtu, etc.) are to be restored with generic
RTM_SETLINK message.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some (most) network devices would like to have a NetDeviceEntry with
more fields than are currently present (and enough for lo and veth).
Prepare for that by allowing them to define their own callback that
would fill the rest of the pb entry and call write_netdev_img().
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If processes are restored without pidns, criu knows their pid-s from images,
but some of those tasks may not have forked yet, and thus the pids may not
exist or (!) may be used by other processes.
To address that we abort stages RESTORE_NS and FORKING without killing tasks,
but via the task_entries->start futex: we write STATE_FAIL into it and make
the tasks check that. Since during the RESTORE_NS and FORKING stages tasks can
only block on the mentioned futex, we can safely do it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Use the same code as provided in the kernel. In the first place
we used our own prototypes for the sake of simplicity (they
were all based on the "lock xadd" instruction). There is
no more need for that and we can switch to the well-known
kernel api.
Because the kernel uses a plain int type to carry atomic
counters I had to add an explicit u32 type for futexes,
as well as a couple of fixes for the new api usage.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing was introduced by 01f8f8f4 to help not mixing
per-thread error messages in log files. Now messages are
not mixed by other means, so this thing is useless.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This will only work if timings are reported by a single
task. Collecting them from several tasks is yet to be done.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we compare pages' contents with memcmp to check which
of them can remain shared. Report this info in restore stats.
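A minimal sketch of the per-page comparison, assuming a fixed 4096-byte
page and two equally sized buffers. The constant SKETCH_PAGE_SIZE and
the function count_same_pages() are illustrative names, not the restore
code itself.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the example */

/*
 * Compare two regions page by page and count the pages whose contents
 * are identical, i.e. the ones that could remain shared.
 */
static size_t count_same_pages(const void *a, const void *b, size_t nr_pages)
{
	const unsigned char *pa = a, *pb = b;
	size_t same = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		size_t off = i * SKETCH_PAGE_SIZE;

		if (!memcmp(pa + off, pb + off, SKETCH_PAGE_SIZE))
			same++;
	}

	return same;
}
```

The returned count is the kind of number that would go into the
restore stats.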
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Restore stats are difficult -- we have to collect them from several
tasks and thus the existing plain variables would not work. We'll need
shared memory with stats, so prepare for allocating one.
Other than this -- put the call to write_stats() where appropriate for
restore.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently criu sends data to the page server, but it doesn't get any
feedback, so it can't be sure that all data have been accepted.
This patch adds a flush command, which requires an answer from the page
server. This command is sent before disconnecting.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The version that might not wait for an ack is always called with
the "async" flag set. Clean things up accordingly.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some parts of dumping are common to threads and tasks,
and this stuff is represented by the parasite_dump_thread structure.
Merge this into parasite_dump_misc and factor out the dumping code.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the kernel supports PTRACE_SETSIGMASK, criu doesn't need to execute
PARASITE_CMD_INIT and PARASITE_CMD_DAEMONIZE, because the first command
is used only for blocking signals. If criu crashes between these
commands, the process state will be corrupted, because all signals remain
blocked.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>