The global variable me isn't initialized, when we tried to use it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the option --log-pid is set, each process will have an own log file.
Otherwise PID is added to each log message.
A message can't be bigger than one page minus some bytes for pid.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- open_mount is cleaned up
- byte-stream hex conversion remains untouched since
strtol is flipping numbers to LE manner
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- Pass initial counter value to eventfd call
(can't pass flags here since they are obtained
with fcntl and must be restored same way or
restore will fail)
- Use rst_file_params for flags and owner restore
- Use eventfd.[ch] instead of eventfs.[ch]
- Move show funcs to eventfd.c
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dump finished with error we should unlock all locked
previously connections.
When restoring we should collect connctions and unlock them
all at the end.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of generating offsets from early compiled
object files (one day the offsets obtained from
there might be changed during linkage stage) better
to get them from a final stage where all object
files involved are linked into complete binary blob.
That happened that at early stage we indeed were using
only single file per parasite and restorer but at present
there a couple of file involved (and will be more in
future) so we need a safe approach.
Also note the symbols being exported are prefixed as
"__export_". This is easier approach for now. Putting
such symbols into separate section requires a way
more efforts to handle.
The main reason of having two files (Elf object
and binary blob) is to get 1:1 mapping between
symbols definition and their position in binary
target.
The exported symbols name addresses are obtained
from object file and used as offsets in binary
target.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Don't re-read fdinfo image 4 times on restore, just use those collected
on me pstree_entry instance.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This brings hardness into syscall trasition to asm code,
pass this constants in callers.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On some systems PAGE_SIZE is declared as sysconf(_SC_PAGESIZE) in <sys/user.h>
this is non-constant expression, so it cannot be used in type declarations.
This breaks compilation with a very non-obvious error message:
CC parasite-syscall.o
In file included from parasite-syscall.c:30:0:
./include/parasite.h:90:8: error: variably modified ‘fds’ at file scope
crtools doesn't uses anything from <sys/user.h>, so we can drop its usage.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Command below was executed several times:
sed 's/\(pr_.*[^%,x,X]\)\(\%[0-9,l,L]*x\)/\10x\2/g' -i *.c
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Util we have kernel support.
[ xemul: MySQL uses runaway pgid and sid and we cannot restore it
gracefully with exiting API :( Byt MySQL seem not to care about
pgid and sid change after restore, so ignore this for a while ]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Completely unlinked file is the one with n_link count being zero.
Such files only allow to read their contents and carry with us.
In order to dump this thing I introduce the "path remap" technology.
For reg file a remapping entry is dumped which describes, that at
restore stage before opening a regfile->path this path should be
linked to some other name and then (after open) unlinked.
For completely unlinked files the remap path would be a path to
a "ghost" file, i.e. a file which is created only at the time of
restore and which is removed completely at the end of it.
Partially unlinked files (i.e. those having n_link != 0, but a
path by which we see them in someone's fd is not accessible) should
be handled in another way.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is preriquisity for terminals handling and just a good
practice to save and restore everything we can :)
Not all combinations are supported. All the problems we still
have come from the inability to attach to group/session with
ID no tasks own as its PID.
This can be workarounded by fork()-ing this pid temporarily,
but we'd rather think in the direction of modifying the kernel
to give us direct syscall for this (oh my...)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I store them on _entry since sids can only be inherited or
set to current's pid. Thus the best we can do it restore sids
at fork time, thus save them in the image we use to fork.
Maybe when we submit patches that will give us ability to set
arbitrary pgid and sid we'll change this, but this is in the
future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
New stage CR_STATE_FORKING. This is required to restore pgids
properly -- we need to make sure a task with pid whose pgid we
are about to enter is alive. And this task is not necesserily
our parent, thus wait for everyone to appear.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The mm_xxx bits are per-mm_struct, not per-task_struct in kernel.
Thus, when we support CLONE_VM we'd better have these bits in a
separate image file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Why? Because one day we'll support various CLONE_ flags and
for fdtable and fs info we'd like to have separate images (since
these objects are separate in kernel).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's an O(n) algorithm.
Now we iterate both lists simultaneously to find a hole.
[xemul: Discussion making the patch more understandable:
Cyrill:
If s_vma is the last one on self_vma_list you could break immediately, no?
And the snippet I somehow miss is -- how the situation handled when
hole
a b
source |----| |-----|
target |----| |-----|
c d
the hole fits the requested size but the hole is shifted
in target, so that you've
prev_vma_end = a
and then you find that a - d > vma_len and return a
as start address for new mapping while finally it
might intersect with address c.
Or I miss something obvious?
Andrey:
Look at "continue" one more time.
prev_vma_end is returned only if both condition are true
if (prev_vma_end + vma_len > s_vma->vma.start) {
....
if (prev_vma_end + vma_len > t_vma->vma.start) {
...
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
[ xemul: The fix effectively is -- stop scanning the 2nd vma list
once we see, that the hint's end hits the next vma ]
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is a big change, yes. Dump unix sockets in the same manner
as all the other files are done now. A few notes however.
1. We explicitly drop names for connected stream sockets. This is
done to avoid conflicts with names -- accepted sockets share their
names with the listening parent. This can be done later by binding
a socket to a name, them renaming it to some temporary uniq one
and at the very very end renaming some back to original.
2. Interconnected sockets are restored via socketpair() call. This is
correct, but names are dropped. Need to bind() sockets after this
(yes, this can be done), but for this we need to implement the trick
with renames described before.
3. FD for socket queues is constantly re-opened not to resolve fd
conflicts. Need to use service fds engine for this later.
4. Some code cleanup is still required, yes (will follow shortly).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
pipe_entry is encapsulated in pipe_info.
All pipe_info-s connects in the list pipes.
All pipe_info-s with the same piep_id connects to pipe_list,
it a circular list without a defined head.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
[ xemul: I don't know how to make this with incremental changes either,
and just go with it :( ]
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Collect pstree_item-s on restore in big list. This lets
us not lseek this file on restore and simplifies the code
a little.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In commit 71cc2733a79efba65d3466f784b19d17805cf50d
I occasionally dropped the ability to abort on waiting
(because we used signed -1 value to inform waiters that
something is wrong and waiting should be aborted, but
the type was changed to unsigned one and as result
this condition never triggers).
So to resolve it futex_abort_and_wake() is added and
should be used explicitly where appropriate instead
if signess hack.
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This was required when pages were stored in elf files for
exec. Now we can stop reading it on eof.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Same as prev 2 patches now for the unix sockets. They are still in per-pid image files, but
this is going to change soon (I hope).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instread of re-reading this image again and again on every fd restore, pull the
reg-files.img in early and store the entries in a hash. This will simplify the
further fd restoring fixes and will allow for dump/restore via a stream (socket).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's wrong to treat restore_thread_exec_start as arguments
area (I managed to overlook this problem in commit
014841825acb14a1d695569b9fe3575f5de6442b) it's rather
a function start address.
The thread arguments area allocated dynamically after the
restorer blob itself.
We didn't hit any problems earlier simply because there
were a few bytes owerwritten in function prologue.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is not good to update images while restoring.
Thus, read vma_entry-es once into a list, put opened (when required) fds
in there and make restorer walk the entries in mem, not those read from
the image file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These chunks of memry, which transit into restorer code gets unmapped one-by-one and thus each of them
should be page-aligned.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To be consistent. Mutexes are futex based but have
own semantics so better to be able to distinguish
the types.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of open-coded u32 variables poking lets use
futex_t type and appropriate helpers where needed.
This should increase readability.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Pid number is redundant - this file is one for the whole tree.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Open the exec link at fd restore stage as yet another service fd,
then pass it to restover via args and just call prctl on it.
This is good for several reasons -- the amount of code required for
this is less and opening files should better happen before we switch
to restorer (opening will be complex and it's MUCH easier to open all
we need in one place).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>