The namelen is u16, since u8 is not enough to cover PATH_MAX.
The pos is u64, since a file offset is indeed that wide.
The id is u32 as per the previous patch.
Fix the printf-s accordingly.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Open the exec link at the fd restore stage as yet another service fd,
then pass it to the restorer via args and just call prctl on it.
This is good for several reasons -- less code is required for
this, and opening files should better happen before we switch
to the restorer (opening will be complex and it's MUCH easier to open all
we need in one place).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Just make fixup_vma_fds read and write vma images and
have those called by it provide an fd for this.
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is a cleanup patch. Use file entry type variable for special files
instead of file entry addr variable.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The messages are filtered by their type
 LOG_MSG   - plain messages, they escape any (!) log level
             filtration and go to stdout
 LOG_ERROR - error messages
 LOG_WARN  - warning messages
 LOG_INFO  - informative messages
 LOG_DEBUG - debug messages
By default the LOG_WARN log level is used, thus LOG_INFO
and LOG_DEBUG messages will not appear in output stream.
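The filtering described above can be sketched as follows. This is only an illustration: the numeric values of the levels, the names log_set_loglevel, log_enabled and print_on_level, and the choice of stderr for everything except LOG_MSG are assumptions, not the actual implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical numeric values; only the ordering matters for this sketch. */
enum { LOG_MSG, LOG_ERROR, LOG_WARN, LOG_INFO, LOG_DEBUG };

/* LOG_WARN is the default level, as stated in the text above. */
static int current_loglevel = LOG_WARN;

void log_set_loglevel(int level)
{
	assert(level >= LOG_ERROR && level <= LOG_DEBUG);
	current_loglevel = level;
}

/* Returns 1 when a message of this type passes the filter. */
int log_enabled(int type)
{
	/* LOG_MSG escapes any filtration */
	return type == LOG_MSG || type <= current_loglevel;
}

void print_on_level(int type, const char *fmt, ...)
{
	va_list ap;

	if (!log_enabled(type))
		return;

	va_start(ap, fmt);
	/* LOG_MSG goes to stdout; using stderr for the rest is an assumption */
	vfprintf(type == LOG_MSG ? stdout : stderr, fmt, ap);
	va_end(ap);
}
```

With the default level, print_on_level(LOG_INFO, ...) prints nothing until log_set_loglevel(LOG_INFO) is called, which is what the "show" action needs.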
The pr_panic helper was replaced with pr_err, pr_warning
was shortened to pr_warn, and the old printk is now
pr_msg.
Because we share messages between the "show" and "dump" actions,
before the "show" action proceeds we need to tune up the
log level and set it to LOG_INFO.
Also note that the printing of VMA and siginfo has now
become LOG_INFO messages; it was not quite correct
to print them regardless of the log level.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
We replace the generic-object-id concept with the sys_kcmp approach,
which implies changing the image format a bit (and since it's
early days for the project overall, we're allowed to).
In short -- previously every file descriptor had an ID
generated by the kernel and exported via procfs. If the
corresponding file descriptors were the same object in
kernel memory -- the IDs matched bit for bit. This allowed
us to figure out which files were actually identical
and should be restored in a special way.
Once the sys_kcmp system call was merged into the kernel,
we've got a new opportunity -- to use this syscall instead.
The syscall basically compares kernel objects and returns
ordered results suitable for sorting objects in userspace.
For us it means -- we treat every file descriptor as a combination
of 'genid' and 'subid'. While 'genid' serves for fast comparison
between fds, the 'subid' is kind of a second key, which guarantees
uniqueness of the genid+subid tuple over all file descriptors found
in a process (or group of processes).
To be able to find and dump file descriptors in a single pass we
collect every fd into a global rbtree, where (!) each node might
become a root for a subtree as well.
The main tree carries only non-equal genids. If we find a genid which
is already in the tree, we need to check whether it is indeed
a duplicate. For this we use the sys_kcmp syscall, and if we
find that the file descriptors are different -- we simply put the new
fd into a subtree.
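A minimal sketch of the tree-with-subtrees lookup described above. To stay short it uses a plain unbalanced binary tree rather than an rbtree, and obj_cmp() is a stand-in for the sys_kcmp call. The names fd_node, fd_collect, obj_cmp and the counter used to generate subid are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct fd_node {
	uint32_t genid;		/* fast, possibly colliding key */
	uint32_t subid;		/* disambiguates nodes sharing a genid */
	int obj;		/* stand-in for what identifies the kernel object */
	struct fd_node *left, *right;		/* main tree, ordered by genid */
	struct fd_node *sub_left, *sub_right;	/* subtree, ordered by obj_cmp() */
};

static struct fd_node *fd_root;
static uint32_t next_subid = 1;	/* assumption: a counter is unique enough */

/* Stand-in for sys_kcmp: returns <0, 0 or >0 for two objects. */
static int obj_cmp(int a, int b)
{
	return (a > b) - (a < b);
}

/*
 * Look the object up and insert it if absent. *is_new is set to 1 when the
 * object was not seen before. Returns NULL only on allocation failure.
 */
struct fd_node *fd_collect(uint32_t genid, int obj, int *is_new)
{
	struct fd_node **link = &fd_root, *n;
	int in_sub = 0;

	assert(is_new);
	*is_new = 0;

	while ((n = *link)) {
		int c;

		if (!in_sub && genid != n->genid) {
			/* cheap path: walk the main tree by genid only */
			link = genid < n->genid ? &n->left : &n->right;
			continue;
		}

		/* same genid: only the expensive comparison can tell */
		c = obj_cmp(obj, n->obj);
		if (c == 0)
			return n;	/* a real duplicate */

		/* different object: descend into the node's subtree */
		in_sub = 1;
		link = c < 0 ? &n->sub_left : &n->sub_right;
	}

	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->genid = genid;
	n->obj = obj;
	n->subid = next_subid++;
	*link = n;
	*is_new = 1;
	return n;
}
```

The expensive comparison is only reached when two genids collide, so most lookups are settled by integer comparison alone.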
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This function simply allocates shared memory. Name it so
and move it closer to the variables it refers to.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
The function has nothing to do with "pop" operation,
it rather "pull"s descriptor out of list. Name it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Having them in the header file allows sharing these
structures with other callers. Moreover, it is good
practice to keep definition(s) in a header file unless otherwise
really needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds the ability to checkpoint/restore the
/proc/pid/exe symlink, so if the process we've just
checkpointed was, say, /path/to/exe, then at restore
time we bring this path back.
There is a restriction on the kernel side: if the
existing /proc/pid/exe is already mapped more than
once, the kernel will refuse to change the symlink,
so we need to restore it late, when the mappings of crtools
itself are already unmapped (i.e. via a late call in
restorer.c).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise we miss a shared segment
overflow at large scales and tests fail sporadically.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
It will be used in the parasite
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
files.c: In function ‘collect_fd’:
files.c:111:2: error: format ‘%d’ expects type ‘int’, but argument 3 has type ‘u64’
files.c: In function ‘open_fd’:
files.c:348:3: error: format ‘%d’ expects type ‘int’, but argument 2 has type ‘u64’
files.c: In function ‘receive_fd’:
files.c:425:5: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘u64’
files.c:425:5: error: format ‘%d’ expects type ‘int’, but argument 5 has type ‘u64’
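One portable way to silence errors of this kind is the PRIu64 macro from <inttypes.h>. This is only a sketch: the u64 typedef and the format_pos helper are assumptions made for illustration, and it is not necessarily how the fix was done.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

typedef uint64_t u64;	/* assumption: u64 is a 64-bit unsigned typedef */

/* Formats a u64 into buf; returns what snprintf() returns. */
int format_pos(char *buf, size_t size, u64 pos)
{
	assert(buf && size > 0);
	/* PRIu64 expands to the conversion that matches uint64_t on this ABI */
	return snprintf(buf, size, "pos: %" PRIu64, pos);
}
```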
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
It is a standard convention to print the error message (i.e. strerror(errno))
at the end of the line, like this:
	Cannot remove file: Permission denied
So pr_perror is fixed to follow this convention (using the GNU extension
%m helps a lot here). Unfortunately, due to this we have to make
pr_perror() print a newline character, too, so we had to strip it
from all the pr_perror() invocations.
That (appending a newline) also makes pr_perror() a black sheep
in the herd of pr_* helpers, but what can we do? Worst case scenario
is an extra newline after an error message, not too harmful.
An alternative approach (stripping the newline from the passed format
string and re-adding it) was discussed thoroughly, and it was decided
that such a hack looks a bit too dirty.
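A sketch of the %m approach, for illustration. The format_perror macro and its buffer-based form are hypothetical (the real helper prints to the log), and both %m and ##__VA_ARGS__ are GNU extensions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical buffer-based variant so that the result can be inspected.
 * "%m" expands to strerror(errno) and consumes no argument. The macro
 * appends ": %m\n" itself, so fmt must be a string literal without a
 * trailing newline.
 */
#define format_perror(buf, size, fmt, ...) \
	snprintf(buf, size, fmt ": %m\n", ##__VA_ARGS__)

void format_perror_demo(void)
{
	char buf[128];

	errno = EACCES;
	format_perror(buf, sizeof(buf), "Cannot remove %s", "file");
	/* buf is now "Cannot remove file: " + strerror(EACCES) + "\n" */
	assert(strstr(buf, strerror(EACCES)) != NULL);
}
```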
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
An absent image file at shared resources preparation now means there are no resources
for this pid (zombies will not have these files).
This is not the most elegant solution, but I don't have anything better in mind.
Need to think it over, all the more so as we're most likely about to reimplement the
way the image is stored some day in the future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Kill all the macros for reading/writing image parts. The new API looks like
 * write_img_buf/write_img
   Writes an object into an image. Reports 0 for OK, -1 for error. The _buf
   version accepts the object size as an argument, the other one uses sizeof().
 * read_img_buf/read_img
   Reads an object from an image. Reports 0 for OK, -1 for error or EOF.
 * read_img_buf_eof/read_img_eof
   Reads an object from an image. Reports 1 for OK, 0 for EOF and -1 for error.
   This is not symmetrical with the previous one, but it was done deliberately
   to make it possible to write code like
	ret = read_img_buf_eof();
	if (ret <= 0)
		return ret; /* 0 means OK, all is done, -1 means error was met */
	... /* 1 means object was read, can proceed */
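The return conventions above can be sketched like this. It is an illustration under assumptions, not the actual code: the signatures, the error messages and the count_records helper are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 when the object was read, 0 on EOF, -1 on error. */
int read_img_buf_eof(int fd, void *ptr, size_t size)
{
	ssize_t ret;

	assert(ptr && size > 0);

	ret = read(fd, ptr, size);
	if (ret == (ssize_t)size)
		return 1;
	if (ret == 0)
		return 0;

	if (ret < 0)
		perror("Can't read img file");
	else
		fprintf(stderr, "Img trimmed %zd/%zu\n", ret, size);
	return -1;
}

/* Returns 0 when the object was read, -1 on error or EOF. */
int read_img_buf(int fd, void *ptr, size_t size)
{
	int ret = read_img_buf_eof(fd, ptr, size);

	if (ret == 0)
		fprintf(stderr, "Unexpected EOF\n");
	return ret == 1 ? 0 : -1;
}

/* The non-_buf versions take the size from the object itself. */
#define read_img(fd, ptr)	read_img_buf(fd, ptr, sizeof(*(ptr)))
#define read_img_eof(fd, ptr)	read_img_buf_eof(fd, ptr, sizeof(*(ptr)))

/* Hypothetical caller: counts fixed-size records until EOF. */
int count_records(int fd)
{
	unsigned int rec;
	int n = 0, ret;

	while ((ret = read_img_eof(fd, &rec)) > 0)
		n++;	/* 1 means an object was read, can proceed */

	return ret < 0 ? -1 : n;	/* 0 means EOF, all is done */
}
```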
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
We allocate only one page for fdinfo_list_entries.
In the future we will be able to resize this memory.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Yet again -- this makes code easier to understand from my POV.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
open_fdinfo calls move_img_fd, so other functions should not care about it
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Some processes can share one struct file; we may find them by "object IDs".
A file descriptor is opened in one process and sent to the others via a unix socket.
The procedure of restoring files consists of four stages.
 * Collect data about all file descriptors
   At this stage we find the process which will restore a file descriptor and
   create a list of processes which should get this descriptor.
 * Create datagram unix sockets
   If a file descriptor should be received, a unix socket is created
   in its place.
 * Open file descriptors
   The process with the lowest pid opens a file and sends this file
   descriptor to all those who wait for it.
 * Receive file descriptors
When we were thinking up this algorithm, we wanted to minimize the number
of context switches. The number of context switches is proportional to the
number of processes.
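Passing a descriptor over a unix socket is done with an SCM_RIGHTS control message. The following is a minimal sketch of the send and receive halves. The names send_fd and recv_fd and the one-byte dummy payload are assumptions for illustration, not the code of this patch.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one descriptor, properly aligned. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Sends one file descriptor over a unix socket; returns 0 or -1. */
int send_fd(int sk, int fd)
{
	char dummy = '*';	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	assert(fd >= 0);

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sk, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one file descriptor; returns the new fd or -1. */
int recv_fd(int sk)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	if (recvmsg(sk, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In terms of the stages above, the process that opens the file would call send_fd() once per waiting process, and each waiter would call recv_fd() on the socket it created earlier and then move the result to the wanted descriptor number.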
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>