Why? Because one day we'll support various CLONE_ flags and
for fdtable and fs info we'd like to have separate images (since
these objects are separate in kernel).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No need in per-type hash tables and search routines. We can
handle it via generic file_desc structure. Some more thoughts
on unixsk and pipe lists are still required :(
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
[xemul: This check in open_transport_fd should go away once we
implement opening files with peers. ]
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
pipe_entry is encapsulated in pipe_info.
All pipe_info-s connects in the list pipes.
All pipe_info-s with the same piep_id connects to pipe_list,
it a circular list without a defined head.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now fdinfos are collected independently from reg files and
sockets. During tihs collect we effectively create the mirroring
list of both by checking which type-IDs are added first.
Fix this by removing the fdinfo_desc and attaching fds directly
to collected reg files and sockets. Pipes and unix sockets will
be reworked in the same manner soon.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instread of re-reading this image again and again on every fd restore, pull the
reg-files.img in early and store the entries in a hash. This will simplify the
further fd restoring fixes and will allow for dump/restore via a stream (socket).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's used only to check whether we should do file re-send and verify we've served
all the cliens. This can be replaced with proper list manipulations.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The statfs is not required, we now check for fd being a socket with S_IFSOCK.
The 2nd stat is just not needed, the caller provides stat info.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of open-coded u32 variables poking lets use
futex_t type and appropriate helpers where needed.
This should increase readability.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Open the exec link at fd restore stage as yet another service fd,
then pass it to restover via args and just call prctl on it.
This is good for several reasons -- the amount of code required for
this is less and opening files should better happen before we switch
to restorer (opening will be complex and it's MUCH easier to open all
we need in one place).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Just make the fixup_vma_fds read and write vma images and
those called by it provide and fd for this.
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We switch generic-object-id concept with sys_kcmp approach,
which implies changes of image format a bit (and since it's
early time for project overall, we're allowed to).
In short -- previously every file descriptor had an ID
generated by a kernel and exported via procfs. If the
appropriate file descriptors were the same objects in
kernel memory -- the IDs did match up to bit. It allows
us to figure out which files were actually the identical
ones and should be restored in a special way.
Once sys_kcmp system call was merged into the kernel,
we've got a new opprotunity -- to use this syscall instead.
The syscall basically compares kernel objects and returns
ordered results suitable for objects sorting in a userspace.
For us it means -- we treat every file descriptor as a combination
of 'genid' and 'subid'. While 'genid' serves for fast comparison
between fds, the 'subid' is kind of a second key, which guarantees
uniqueness of genid+subid tuple over all file descritors found
in a process (or group of processes).
To be able to find and dump file descriptors in a single pass we
collect every fd into a global rbtree, where (!) each node might
become a root for a subtree as well.
The main tree carries only non-equal genid. If we find genid which
is already in tree, we need to make sure that it's either indeed
a duplicate or not. For this we use sys_kcmp syscall and if we
find that file descriptors are different -- we simply put new
fd into a subtree.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This function simply allocates shared memory. Name it so
and move it closer to the variables it referes on.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Having them in the header file will allow to share these
structures with other callers. Moreover, this is a good
practice to have definition(s) in header file until otherwise
really needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Some process can share one struct file-s, we may find them by "object IDs".
A file descriptor is opened in one process and send to other via unix socket.
The procedure of restoring files contains four stages.
* Collect data about all file's descriptors
On this stage we find process which will restore a file descriptor and
create a list of processes, who should get this descriptor.
* Create datagrams unix sockets
If a file descriptor should be received, a unix socket is created
instead of it.
* Open file descriptors
A process with the least pid opens a file and sends this file
descriptors to all one who wait it.
* Receive file descriptors.
When we were thinking up this algoritm, we wanted to minimize a number
of context switches. A number of context switches is proportional of a
number of processes.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>