[ xemul: I don't know how to make this with incremental changes either,
and just go with it :( ]
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Collect pstree_item-s on restore in big list. This lets
us not lseek this file on restore and simplifies the code
a little.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In commit 71cc2733a79efba65d3466f784b19d17805cf50d
I occasionally dropped the ability to abort on waiting
(because we used signed -1 value to inform waiters that
something is wrong and waiting should be aborted, but
the type was changed to unsigned one and as result
this condition never triggers).
So to resolve it futex_abort_and_wake() is added and
should be used explicitly where appropriate instead
if signess hack.
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This was required when pages were stored in elf files for
exec. Now we can stop reading it on eof.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Same as prev 2 patches now for the unix sockets. They are still in per-pid image files, but
this is going to change soon (I hope).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instread of re-reading this image again and again on every fd restore, pull the
reg-files.img in early and store the entries in a hash. This will simplify the
further fd restoring fixes and will allow for dump/restore via a stream (socket).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's wrong to treat restore_thread_exec_start as arguments
area (I managed to overlook this problem in commit
014841825acb14a1d695569b9fe3575f5de6442b) it's rather
a function start address.
The thread arguments area allocated dynamically after the
restorer blob itself.
We didn't hit any problems earlier simply because there
were a few bytes owerwritten in function prologue.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is not good to update images while restoring.
Thus, read vma_entry-es once into a list, put opened (when required) fds
in there and make restorer walk the entries in mem, not those read from
the image file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These chunks of memry, which transit into restorer code gets unmapped one-by-one and thus each of them
should be page-aligned.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To be consistent. Mutexes are futex based but have
own semantics so better to be able to distinguish
the types.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of open-coded u32 variables poking lets use
futex_t type and appropriate helpers where needed.
This should increase readability.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Pid number is redundant - this file is one for the whole tree.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Open the exec link at fd restore stage as yet another service fd,
then pass it to restover via args and just call prctl on it.
This is good for several reasons -- the amount of code required for
this is less and opening files should better happen before we switch
to restorer (opening will be complex and it's MUCH easier to open all
we need in one place).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since the collect_shmems updates start address for vmas, for
two shared mappings in one task we'll try to dup() the 1st
restoration attempt, since the si's start will be set to the
2nd one, which is not yet restored.
Thus we should map-and-open the first one being restored, not
the one with matched address and dup() all the rest.
[avagin@: There's no such thing, since the collect stage checks
for pid being less _or_ _equal_ and this only the first vma's start
will be saved. But anyway, this makes it more obvious.]
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Just make the fixup_vma_fds read and write vma images and
those called by it provide and fd for this.
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The core image now contains only core per-task stuff.
The new file resurrects Tula magic number removed earlier.
Acked-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's a rudiment from old times, when restore worked via ececve.
Now we modify the core file in place to fixup vma-s.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
vma_entry contains shmid and all shared memory are dumped in own files.
The most interesting thing is restore.
A maping is restored by process with the smallest pid. The mamping
is created before executing restorer.
We map a full mapping and restore it's conten, then we open a file from
/proc/pid/map_files and store a descriptor in vma_info. The mapping is
unmaped. Now we can map any region of this mapping in the restorer.
We use this trick, because a target process may have this mapping in
some places and the restorer has not function to open proc files.
v2: fix error hangling
xemul: Fixed static-s and args for cr_dump_shmem
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Using absolute paths for this is dangerous - while doing c/r we should
be extremely carefully and not change tasks' roots and mount namespaces
too early. Sometimes it will not work -- when restoring containers we'll
be unable to switch to new CT and still have the ability to open images.
Rework the images opening via openat and keep the image dir fd open all
the time as the service fd (introduced earlier).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There is a scenario when pstree_fd may be tried
to close several times -- if we start crtools
with "detach" option.
So simply make restore_root_task to close opened
file descriptor, this also simplifies the code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Reserve more mem for bootrstrap code and put all self vmas at its tail.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Just prepare the code for smoother further bootstrap areas allocation.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
There are three bugs in this code.
1. self vmas list is released before get-hint is called
2. get-hint code wrongly detects the hole (just bugs, no details)
3. exec hint is mapped without MAP_FIXED, but should
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
The messages are filtered by their type
LOG_MSG - plain messages, they escape any (!) log level
filtration and go to stdout
LOG_ERROR - error messages
LOG_WARN - warning messages
LOG_INFO - informative messages
LOG_DEBUG - debug messages
By default the LOG_WARN log level is used, thus LOG_INFO
and LOG_DEBUG messages will not appear in output stream.
pr_panic helper was replaced with pr_err, pr_warning
shorthanded to pr_warn and old printk if rather pr_msg
now.
Because we share messages between "show" and "dump" actions,
before the "show" action proceed we need to tune up
log level and set it to LOG_INFO.
Also note that printing of VMA and siginfo now
became LOG_INFO messages, it was not that correct
to print them regardless the log level.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Based on xemul@ patches.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This is a cleanup patch.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This function simply allocates shared memory. Name it so
and move it closer to the variables it referes on.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Don't forget to close opened file in case of error.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
map_files format defined as %lx-%lx in
kernel and while there should not be a
problem if it's written in %p-%p, still
better to be on a safe side and follow
kernel's notation.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
This patch tries to introduce lazy and hidden pid_dir support,
meaning one don't have to worry about pid_dir but the optimization
is still there.
The patch relies on the fact that we work with many /proc/pid files for
one pid, then for another pid and so on, i.e. not in a random manner.
The idea is when we call open_proc() with a new pid for the first time,
the appropriate /proc/PID directory is opened and its fd is stored.
Next call to open_proc() with the same PID only need to check that
the PID is not changed. In case PID is changed, we close the old one
and open/store a new one.
Now the code using open_proc() and friends:
- does not need to carry proc_pid around, pid is enough
- does not need to call open_pid_proc()
The only thing that can't be done in that "lazy" mode is closing the last
PID fd, thus close_pid_proc().
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
This patch introduces the following changes:
1) writing of shmid value into vma_area->fd instead of
waiting for shared memory region is open by parent,
reopen it and dump fd.
2) new syscall support: sys_shmat
3) use sys_shmat() to map memory region in restorer's
mapping function if vma flag VMA_AREA_SYSVIPC is set.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>