This patch was designed to be generic and thus usable for all kinds of
sockets. Not sure, thah this goal has been reached, but at least I tried.
Key ideas:
1) sockets queue dump file have to be readed first and then packets entries
with offset for it's data in image will be collected in doubly linked list by
read_sockets_queue() function.
Note: list will contain sockets queues for all (!) the sockets of the task.
2) socket queue can be restored by restore_socket_queue(), which selects
packets from the list by passed id and use sendfile() top send them to the
passed socket. It also removes packet from the list and frees it.
Based on xemul@ patches.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This patch was designed to be generic and thus usable for all kinds of
sockets. Not sure, thah this goal has been reached, but at least I tried.
Key ideas:
1) On-stack structure for collecting sockets queues and then passing them to
parasite code.
2) Singly linked list is used for collecting structures, representing sockets
of any kind (!) with queues.
Based on xemul@ patches.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This patch adds sockets queue dump functionality. Key ideas
1) sockets info is passed as plain array in parasite args.
2) new socket option SO_PEEK_OFF with MSG_PEEK is used to read the get the
queue's packets.
3) Buffer for packet will be allocated for each socket separately and with
size of socket sending buffer. For stream sockets is means, that it's queue
will be dumped in chunks of this size.
Note: loop around sys_msgrcv() is required for DGRAM sockets - sys_msgrcv()
with MSG_PEEK will return only one packet.
Based on xemul@ patches.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This test is based on original version by xemul@ - second message to queue
added.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
We switch generic-object-id concept with sys_kcmp approach,
which implies changes of image format a bit (and since it's
early time for project overall, we're allowed to).
In short -- previously every file descriptor had an ID
generated by a kernel and exported via procfs. If the
appropriate file descriptors were the same objects in
kernel memory -- the IDs did match up to bit. It allows
us to figure out which files were actually the identical
ones and should be restored in a special way.
Once sys_kcmp system call was merged into the kernel,
we've got a new opprotunity -- to use this syscall instead.
The syscall basically compares kernel objects and returns
ordered results suitable for objects sorting in a userspace.
For us it means -- we treat every file descriptor as a combination
of 'genid' and 'subid'. While 'genid' serves for fast comparison
between fds, the 'subid' is kind of a second key, which guarantees
uniqueness of genid+subid tuple over all file descritors found
in a process (or group of processes).
To be able to find and dump file descriptors in a single pass we
collect every fd into a global rbtree, where (!) each node might
become a root for a subtree as well.
The main tree carries only non-equal genid. If we find genid which
is already in tree, we need to make sure that it's either indeed
a duplicate or not. For this we use sys_kcmp syscall and if we
find that file descriptors are different -- we simply put new
fd into a subtree.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This function simply allocates shared memory. Name it so
and move it closer to the variables it referes on.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
The function has nothing to do with "pop" operation,
it rather "pull"s descriptor out of list. Name it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Having them in the header file will allow to share these
structures with other callers. Moreover, this is a good
practice to have definition(s) in header file until otherwise
really needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Though we can use libc's syscall() wrapper
I would prefer to have own implementation
in case if we will need it in non-libc bindable
code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
1) Use macro for defining size mapped size.
2) Allow to allocate all space (including the last byte).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
[gorcunov@: MAX_BUF_SIZE tuneup, don't use ops without braces)]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Current heap has 10 MB free space. It should be enough. Otherwise something
broken by design..
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Stack heap size is probably small for dumping unix sockest skb's (it's hard to
discover, how big it could be; thus let's assume that 10MB is enough,
otherwise give up and throw error).
So let' replace this stack heap wil 10 MB anonimous private mapping.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Socket address is being generated with this function,
name it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
1) Added few missed successfull status setup - this looks
redundant though, but unified with other functions.
2) remove redundant argument in dump_pages_fini().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
v2: it's toom risky to jump to address equal to line numbet (there could be
valid executable code). So now jump is done to 0 address and %sp encodes line
number (32 most significant bits) and error code (32 least significant bits).
There is a race between log close by process being restoring and opened file
desctriptors check in zdtm test suite - crtools can exit and compare file
descriptors before detached restored process will perform all the rest tasks
(including close of the log) and execute final system call:
|--- dump/sleeping00/8578/dump.fd 2012-02-20 14:31:31.246096000 +0300
|+++ dump/sleeping00/8578/restore.fd 2012-02-20 14:31:31.418095999 +0300
|@@ -1,4 +1,5 @@
|
| 0 -> /dev/null
| 1 -> /dev/null
|+1023 -> /root/crtools/test/dump/sleeping00/8578/restore.log
| 2 -> /dev/null
The solution is to close log in restorer before final command received. But
this leads to another problem: we have to inform somehow about possible errors
afterwards This is done by forced segmentation fault and looks like this
(dmesg):
pipe00[4678]: segfault at 0 ip 00007f4c8ab77d02 sp 000002ed00000001 error 4
Where %sp encodes line number (32 most significant bits) and error code (32
least significant bits).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
In case if time is modified in ls -l output we should
not treat it as error, interrupting zdtm work
-lrwx------ 1 root root 64 Feb 17 14:52 0 -> /dev/null
-lrwx------ 1 root root 64 Feb 17 14:52 1 -> /dev/null
-lrwx------ 1 root root 64 Feb 17 14:52 2 -> /dev/null
+lrwx------ 1 root root 64 Feb 17 14:53 0 -> /dev/null
+lrwx------ 1 root root 64 Feb 17 14:53 1 -> /dev/null
+lrwx------ 1 root root 64 Feb 17 14:53 2 -> /dev/null
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This is a place where they should belong to.
util.c is too big already.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Don't forget to close opened file in case of error.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
map_files format defined as %lx-%lx in
kernel and while there should not be a
problem if it's written in %p-%p, still
better to be on a safe side and follow
kernel's notation.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
The ptrace seize doesn't prevent signals from delivery. That said,
we should block the signals in the target task before dumping
anything which is signals-related, i.e. memory and registers.
But once we've blocked signals, we should dump registers before
unblocking them, since any postponed signal will screw things up.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This patch tries to introduce lazy and hidden pid_dir support,
meaning one don't have to worry about pid_dir but the optimization
is still there.
The patch relies on the fact that we work with many /proc/pid files for
one pid, then for another pid and so on, i.e. not in a random manner.
The idea is when we call open_proc() with a new pid for the first time,
the appropriate /proc/PID directory is opened and its fd is stored.
Next call to open_proc() with the same PID only need to check that
the PID is not changed. In case PID is changed, we close the old one
and open/store a new one.
Now the code using open_proc() and friends:
- does not need to carry proc_pid around, pid is enough
- does not need to call open_pid_proc()
The only thing that can't be done in that "lazy" mode is closing the last
PID fd, thus close_pid_proc().
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
...and make it correctly print the file name we were unable to open.
Also, error from fdopen[dir]() is now reported with file name as well.
Note that open_proc() and friends need to be macros in order for
pr_perror() to show actual file name and line number where error occured.
Historical note: the original version of this patch was way more radical,
changing openat() to open() and thus removing pid_dir (replacing with pid
when needed) and open_proc_dir(), changing openat() to open(). The word
from Pavel is he wants to keep the openat/pid_dir optimization because
it saves two dentry lookups in kernel code for each open(). Because of
this optimization (and desire to print correct file name in case
of error) we have to carry both pid and pid_dir everywhere.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
It's not a fd to open and map, but SYSV IPC id instead.
So don't close it - this may lead to unpredictable results
(in case of SYSV IPC id will match fd, opened by processes).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This looks clearer, because this check has nothing with SYSV IPC
mappings. Also we don't modify the vma_entry itself anymore but
operate with local 'flags' copy.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
[gorcunov@: A few tune ups]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
* Remove redundant messages
* Show which test will be executed
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Otherwise 'make test' fails on a clean tree because crtools is not built.
Side effect: if you have code modifications that are not yet compiled,
they will be compiled.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>