At the moment only open_mnt_root is there; it is
needed for inotify restore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To restore inotify we need to know the
mount point device numbers, so this helper
parses the /proc/pid/mountinfo file to obtain them.
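A minimal sketch of such a parser, assuming the per-line layout documented in proc(5) (mount ID, parent ID, major:minor, root, mount point, ...). The struct, parse_mountinfo_line() and lookup_mount() are hypothetical names used only for illustration, not the helper from this patch:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the fields we care about. */
struct mountinfo_entry {
	int		mnt_id;
	int		parent_mnt_id;
	unsigned int	s_dev_major;
	unsigned int	s_dev_minor;
	char		root[256];
	char		mountpoint[256];
};

/*
 * Parse the leading fields of one mountinfo line. Spaces inside paths
 * are escaped by the kernel (as \040), so %s is enough here.
 */
static int parse_mountinfo_line(const char *line, struct mountinfo_entry *e)
{
	int n = sscanf(line, "%d %d %u:%u %255s %255s",
		       &e->mnt_id, &e->parent_mnt_id,
		       &e->s_dev_major, &e->s_dev_minor,
		       e->root, e->mountpoint);

	return n == 6 ? 0 : -1;
}

/* Scan /proc/<pid>/mountinfo for the entry with the given mount ID. */
static int lookup_mount(pid_t pid, int mnt_id, struct mountinfo_entry *out)
{
	char path[64], line[1024];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/mountinfo", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (parse_mountinfo_line(line, out) == 0 && out->mnt_id == mnt_id) {
			ret = 0;
			break;
		}
	}

	fclose(f);
	return ret;
}
```

With the sample line from proc(5), "36 35 98:0 /mnt1 /mnt2 ...", the parser yields mount ID 36 on device 98:0 mounted at /mnt2.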
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- Pass the initial counter value to the eventfd call
(flags can't be passed here: they are obtained
with fcntl and must be restored the same way,
otherwise restore fails)
- Use rst_file_params to restore flags and owner
- Use eventfd.[ch] instead of eventfs.[ch]
- Move show funcs to eventfd.c
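A rough sketch of the restore order the first item describes: create the eventfd with its saved counter and no flags, then apply the dumped file status flags with fcntl(F_SETFL). restore_eventfd() is a hypothetical name, and fl_flags is assumed to be the value that was dumped via fcntl(F_GETFL):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Recreate an eventfd with its saved counter value. Flags are not passed
 * to eventfd() itself; they are applied the same way they were read.
 */
static int restore_eventfd(unsigned int counter, int fl_flags)
{
	int fd = eventfd(counter, 0);

	if (fd < 0)
		return -1;

	/* F_SETFL ignores the access mode bits (O_RDWR etc.) in fl_flags. */
	if (fcntl(fd, F_SETFL, fl_flags) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```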
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is also a code move plus two changes:
* shmems renamed to dump_shmems
* shmem area size is shared with the restorer (one page for both, for now)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Dumped pages are those written to the image.
Skipped pages are those sitting in RAM but left there (file-backed shared pages).
Total is the total number of pages covered by the vma areas.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It uses the TCP repair engine from the kernel.
On dump the connection is locked with netfilter, the socket is put
into the repair state and then its internals are dumped.
On restore we create a socket, put it into repair mode, dress it up
and then unlock all the connections at the very end.
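A minimal sketch of the one kernel knob involved, the TCP_REPAIR socket option (value 19 in <linux/tcp.h>; the fallback define below is only for libcs that lack it). tcp_repair_set() is a hypothetical helper; setting the option requires CAP_NET_ADMIN, and the dump and restore steps in between (reading or setting sequence numbers, queues and options) are omitted:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_REPAIR
#define TCP_REPAIR 19	/* value from <linux/tcp.h> */
#endif

/*
 * Switch a TCP socket into (on = 1) or out of (on = 0) repair mode.
 * Rough flow, assumed for illustration:
 *   dump:    netfilter lock -> tcp_repair_set(sk, 1) -> read internals
 *   restore: socket() -> tcp_repair_set(sk, 1) -> restore internals
 *            -> tcp_repair_set(sk, 0) -> netfilter unlock at the very end
 */
static int tcp_repair_set(int sk, int on)
{
	return setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on));
}
```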
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dump finishes with an error we should unlock all previously
locked connections.
When restoring we should collect the connections and unlock them
all at the end.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Between dump and restore of a TCP connection we have to keep the connection
blocked, since the socket doesn't exist in the kernel at this time
and any packet from the peer will result in an RST. Thus, add a -j DROP rule
for every connection we're about to repair.
Later, when we support containers, this will be extended to stop the
whole networking of a CT instead of cherry-picking connections.
It does system("iptables ...") for this, but I'd prefer to switch to
libnetfilter-devel some time in the future.
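A sketch of how such a per-connection rule could be built and run, under stated assumptions: the exact rule shape (filter table, INPUT chain, matching on the peer as source) is illustrative and not necessarily what crtools issues, and nf_conn_cmd()/nf_conn_run() are hypothetical helpers. Passing 'I' inserts the DROP rule (lock); 'D' deletes the same rule (unlock):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build an iptables command that drops packets coming from the peer
 * of one TCP connection. op is 'I' (insert = lock) or 'D' (delete = unlock).
 * Returns -1 if the command does not fit into buf.
 */
static int nf_conn_cmd(char *buf, size_t len, char op,
		       const char *peer_ip, int peer_port,
		       const char *local_ip, int local_port)
{
	int n = snprintf(buf, len,
			 "iptables -t filter -%c INPUT -p tcp "
			 "-s %s --sport %d -d %s --dport %d -j DROP",
			 op, peer_ip, peer_port, local_ip, local_port);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Run the command via system(), as the text above describes. */
static int nf_conn_run(char op, const char *peer_ip, int peer_port,
		       const char *local_ip, int local_port)
{
	char cmd[256];

	if (nf_conn_cmd(cmd, sizeof(cmd), op, peer_ip, peer_port,
			local_ip, local_port) < 0)
		return -1;

	return system(cmd) == 0 ? 0 : -1;
}
```

Using the same builder for 'D' keeps the lock and unlock rules textually identical, which is what iptables -D needs to find the rule again.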
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First of all -- to make crtools dump/restore established TCP sockets
you have to specify the --tcp-established option. By doing so you
inform crtools that
a) you know that after dump there will be netfilter rules blocking
the dumped connections
b) you guarantee that this netfilter block is still in place before
restore
Apart from that the patch is simple -- it collects established sockets
and calls the dump/restore TCP function (empty for now) where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Funny, but after this git thinks that I've renamed sockets.c
into sk-unix.c and fixed it a little bit %)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's common that an opened fd needs its fowner and flags fixed up. Make
a function for this. Those that obtain the fd with open() don't need
to restore flags, though.
A thought -- do we need yet another abstraction between fdinfo and
typed files?
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To reflect the change in kernel v3.4-rc4,
update PR_GET_TID_ADDR and rename it to
PR_GET_TID_ADDRESS, as it is named in the kernel.
[xemul: this fixes the futex test broken by the 3.4 rebase]
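For reference, a small sketch of the call under its kernel name. The value 40 matches <linux/prctl.h>, and the fallback define is only for older headers; per prctl(2) the option is available only when the kernel is built with CONFIG_CHECKPOINT_RESTORE, otherwise the call fails with EINVAL. get_clear_tid_address() is a hypothetical wrapper:

```c
#include <sys/prctl.h>

#ifndef PR_GET_TID_ADDRESS
#define PR_GET_TID_ADDRESS 40	/* value from <linux/prctl.h> */
#endif

/*
 * Fetch the calling thread's clear_child_tid pointer (the address
 * registered with set_tid_address()). Returns 0 or -1 with errno set.
 */
static int get_clear_tid_address(int **addr)
{
	return prctl(PR_GET_TID_ADDRESS, (unsigned long)addr, 0, 0, 0);
}
```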
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some distros (FC12, which I debug on) don't have the fown_ex stuff in their
libc. Add the missing declarations.
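A sketch of what such fallback declarations can look like. The constants match the kernel's asm-generic/fcntl.h, and glibc exposes the same names (with F_SETOWN_EX) under _GNU_SOURCE, so everything is guarded on F_SETOWN_EX. The get_fown_ex()/set_fown_ex() wrappers are hypothetical:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef F_SETOWN_EX
#define F_SETOWN_EX	15
#define F_GETOWN_EX	16

#define F_OWNER_TID	0
#define F_OWNER_PID	1
#define F_OWNER_PGRP	2

/* Same layout as the kernel's struct f_owner_ex. */
struct f_owner_ex {
	int	type;
	pid_t	pid;
};
#endif

static int get_fown_ex(int fd, struct f_owner_ex *o)
{
	return fcntl(fd, F_GETOWN_EX, o);
}

static int set_fown_ex(int fd, const struct f_owner_ex *o)
{
	return fcntl(fd, F_SETOWN_EX, o);
}
```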
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since 3.4-rc the seize-devel flag has been removed, the stop event
has been renamed (great) and the way si_code should be parsed has
been fixed.
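As an illustration of the renamed stop event, here is a sketch that recognizes a PTRACE_SEIZE stop from the wait status, following the layout described in ptrace(2), where status>>16 equals PTRACE_EVENT_STOP (128 in <linux/ptrace.h>). It uses its own SEIZE_EVENT_STOP constant to avoid mixing <sys/ptrace.h> and <linux/ptrace.h>. This is a status-based check, not the si_code parsing the commit refers to:

```c
#include <signal.h>
#include <sys/wait.h>

#define SEIZE_EVENT_STOP 128	/* PTRACE_EVENT_STOP in <linux/ptrace.h> */

/*
 * True if waitpid() reported a stop of a seized tracee that carries the
 * PTRACE_EVENT_STOP event (group-stop or PTRACE_INTERRUPT stop).
 * WSTOPSIG(status) then tells which signal caused it.
 */
static int is_seize_stop(int status)
{
	return WIFSTOPPED(status) && (status >> 16) == SEIZE_EVENT_STOP;
}
```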
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Easier to read.
[ xemul: There's a silent change in how the sk buffer is read -- before
the patch there was a static buffer for the data, now it is
xrealloc-ed ]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a dgram socket's peer is not connected back,
we can try to resolve the peer by name.
For security reasons this happens only if the '-x' option
is passed at both checkpoint and restore time.
In particular, this is needed for programs that
use a dgram socket to send messages to /dev/log.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the early days we used only a few syscalls,
which together with debug compiler options always
produced relative addresses for memory variables used
in the parasite and restorer blobs. Thus it went unnoticed
that something was wrong with the syscall declarations
we use.
Basically all our syscalls are just wrappers over inline
assembly code of the form
	static long syscall2(int nr, long arg0, long arg1)
	{
		long ret;
		asm volatile(
			"movl %1, %%eax \t\n"
			"movq %2, %%rdi \t\n"
			"movq %3, %%rsi \t\n"
			"syscall \t\n"
			"movq %%rax, %0 \t\n"
			: "=r"(ret)
			: "g" ((int)nr), "g" (arg0), "g" (arg1)
			: "rax", "rdi", "rsi", "memory");
		return ret;
	}
so every argument is treated as a plain long (even if the call
semantics imply that a memory address is passed rather than an
integer value) and is transferred via a general-purpose
register.
As mentioned, this caused no problems when debug options were
specified at compile time: the compiler does not try to optimize
addressing but generates code that always computes the addresses.
The situation changes once crtools is built with
optimization enabled -- the compiler sees the arguments
as plain long numbers and may pass direct addresses
of variables instead of generating relative addresses
(because the function declarations have no pointers and the 'g'
constraint is used in conjunction with 'mov', which is of course wrong).
To fix all this, syscall declarations are now generated from the
syscall.def file and function arguments are passed in conformance
with the x86-64 ABI.
This shrinks the amount of source code needed to declare syscalls
and opens the way to building with optimization.
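For contrast with the wrapper shown earlier, a sketch of an ABI-conforming three-argument wrapper. It is written by hand here, not generated from syscall.def, and sys_call3() is a hypothetical name. Register constraints pin nr to rax and the arguments to rdi, rsi, rdx, so the compiler never has to guess how to materialize an address. The syscall instruction clobbers rcx and r11. On non-x86-64 builds the sketch falls back to libc's syscall(), which returns -1 and sets errno instead of returning -errno:

```c
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
static long sys_call3(long nr, long a0, long a1, long a2)
{
	long ret;

	/* x86-64 ABI: nr in rax, args in rdi, rsi, rdx; result in rax. */
	__asm__ __volatile__("syscall"
			     : "=a"(ret)
			     : "a"(nr), "D"(a0), "S"(a1), "d"(a2)
			     : "rcx", "r11", "memory");
	return ret;
}
#else
static long sys_call3(long nr, long a0, long a1, long a2)
{
	return syscall(nr, a0, a1, a2);
}
#endif
```

Passing a pointer, as in sys_call3(SYS_write, fd, (long)buf, len), now works at any optimization level, because the address is simply loaded into rsi.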
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Inline assembly is very convenient when only a couple of
instructions are used, but once it grows it is better to
move it out of the wrapper C code and write it in plain
assembly; after all, we need very precise control
over the bootstrap code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of generating offsets from early compiled
object files (the offsets obtained from there might
change during the link stage), it is better
to get them from the final stage, where all the object
files involved are linked into the complete binary blob.
Early on we indeed used only a single file each
for the parasite and the restorer, but at present
a couple of files are involved (and there will be more in
the future), so we need a safe approach.
Also note that the exported symbols are prefixed with
"__export_". This is the easier approach for now; putting
such symbols into a separate section would take a lot
more effort to handle.
The main reason for having two files (the ELF object
and the binary blob) is to get a 1:1 mapping between
symbol definitions and their positions in the binary
target.
The addresses of the exported symbols are obtained
from the object file and used as offsets into the binary
target.
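A sketch of the offset extraction step, assuming the final linked object is listed with nm in its default "address type name" format and that the blob is laid out so that object address 0 is its first byte (the 1:1 mapping described above). parse_export_sym() is a hypothetical helper; any line whose symbol lacks the "__export_" prefix, including undefined symbols without an address, is rejected:

```c
#include <stdio.h>
#include <string.h>

#define EXPORT_PREFIX "__export_"

/*
 * Parse one line of nm output, e.g. "0000000000000040 T __export_foo".
 * On success stores the symbol address (= offset into the blob) and name.
 */
static int parse_export_sym(const char *line, unsigned long *offset,
			    char *name, size_t name_len)
{
	unsigned long addr;
	char type;
	char sym[128];

	if (sscanf(line, "%lx %c %127s", &addr, &type, sym) != 3)
		return -1;
	if (strncmp(sym, EXPORT_PREFIX, sizeof(EXPORT_PREFIX) - 1) != 0)
		return -1;

	snprintf(name, name_len, "%s", sym);
	*offset = addr;
	return 0;
}
```

A generator could emit each pair as a constant, and the loader would then compute the entry point as blob_base + offset.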
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Don't re-read the fdinfo image 4 times on restore; just use the entries
collected on the "me" pstree_entry instance.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This makes the syscall transition to asm code harder,
so pass these constants in the callers instead.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Our logging engine prints file:line only at the invocation point,
so it's impossible to see where exactly reopen_fd_as_safe
failed.
With this patch the output is more human-readable:
| Error (util.c:96): fd 7 already in use (called at files.c:359)
Ideally we'd need a full backtrace here, but that's a different task.
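A minimal sketch of the technique: the public name becomes a macro that captures __FILE__ and __LINE__ at the call site and forwards them to the real function. reopen_fd_as_safe_at() and last_error are hypothetical; the sketch formats the message into a buffer (and prints it) instead of using crtools' logging engine:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char last_error[256];

/* Move old_fd to new_fd, refusing to clobber a descriptor already in use. */
static int reopen_fd_as_safe_at(const char *file, int line, int new_fd, int old_fd)
{
	if (fcntl(new_fd, F_GETFD) != -1 || errno != EBADF) {
		snprintf(last_error, sizeof(last_error),
			 "fd %d already in use (called at %s:%d)", new_fd, file, line);
		fprintf(stderr, "Error: %s\n", last_error);
		return -1;
	}

	if (dup2(old_fd, new_fd) < 0)
		return -1;

	close(old_fd);
	return 0;
}

/* Callers keep the old name; the caller's location is added automatically. */
#define reopen_fd_as_safe(new_fd, old_fd) \
	reopen_fd_as_safe_at(__FILE__, __LINE__, (new_fd), (old_fd))
```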
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A completely unlinked file is one whose n_link count is zero.
For such files all we can do is read their contents and carry them with us.
To dump such a file I introduce the "path remap" technique.
For a reg file a remapping entry is dumped, which says that at the
restore stage, before regfile->path is opened, this path should be
linked to some other name and then (after open) unlinked.
For completely unlinked files the remap path is a path to
a "ghost" file, i.e. a file which is created only at restore
time and removed completely at the end of it.
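A minimal sketch of the restore side for the ghost case, assuming the ghost file's contents were dumped as a byte buffer. link() gives the ghost the name regfile->path is expected to have, the file is opened by that name, and the temporary name is removed again right away. create_ghost() and open_remapped() are hypothetical helpers; the ghost itself would be unlinked once restore completes:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create the ghost file from dumped contents; tmpl must end in "XXXXXX". */
static int create_ghost(char *tmpl, const void *data, size_t len)
{
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		unlink(tmpl);
		return -1;
	}
	close(fd);
	return 0;
}

/* Temporarily give the ghost the original name, open it, drop the name. */
static int open_remapped(const char *ghost, const char *path, int flags)
{
	int fd;

	if (link(ghost, path) < 0) {
		perror("link");
		return -1;
	}
	fd = open(path, flags);
	unlink(path);	/* the name goes away whether or not open() worked */
	return fd;
}
```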
Partially unlinked files (i.e. those having n_link != 0, but whose
path, as seen through someone's fd, is not accessible) have to
be handled in another way.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The basic idea is trivial: once a file descriptor is
created, its owner is read and set up.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Only the show part is implemented, and stubs are added to the image
(regular files and pipes).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I store them in _entry since sids can only be inherited or
set to current's pid. Thus the best we can do is restore sids
at fork time, which is why they are saved in the image we use to fork.
Maybe once we submit patches that give us the ability to set
arbitrary pgid and sid we'll change this, but that is in the
future.
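A sketch of what "restore at fork time" can mean under the constraint above, run in the child right after fork(). Since a sid or pgid can only be inherited or set to the caller's own pid, the child either becomes a session leader, becomes a group leader, or relies on inheriting both values from the parent. restore_sid_pgid() is a hypothetical helper, and it reports failure when the inherited values don't match what was dumped:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call in the child right after fork(). Returns 0 if ids match the dump. */
static int restore_sid_pgid(pid_t saved_sid, pid_t saved_pgid)
{
	pid_t me = getpid();

	if (saved_sid == me) {
		/* Was a session leader: new session, pgid becomes our pid too. */
		if (setsid() < 0)
			return -1;
	} else if (saved_pgid == me) {
		/* Was a process group leader inside the inherited session. */
		if (setpgid(0, 0) < 0)
			return -1;
	}

	/* Anything else can only have been inherited from the parent. */
	return (getsid(0) == saved_sid && getpgid(0) == saved_pgid) ? 0 : -1;
}
```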
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>