Otherwise a task can start to handle a signal and registers can be
changed.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we try to execute a parasite code, a signal can be started
handling, so we need to update a task registers, which will be saved in
a core file.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently it's always stopped, but it will be changed, when a parasite
will be executed as a daemon.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be used for executed parasite as a daemon.
What we have previously -- the stack has been preallocated in parasite
blob itself and bootstrap procedure calculated the value needed for %rsp.
With this patch applied we provide every thread own stack as:
- find out how many threads are present
- calculate the summary size of all stacks
- when we ask dumpee to provide us memory area needed to run
parasite code, we pass summary size needed for everything
- when parasite code is asked to run we calculate %rsp needed
taking into account the thread number (ie offsets) and then
setup proper %rsp via ptrace call, instead of calculating it
in bootstrap parasite code
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now we restore thread registers immediately after a command,
but when we will execute a parasite, it will be impossible.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have three arrays for thread related data: item->threads,
parasite_ctl->thread and tid_state in parasite.
With this patch a thread will have the same index in all arrays.
The zero index is used for a thread leader.
In this case we don't need to search thread_state in parasite.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When parasite daemon mode will be implemented we get deprived of ability
to fetch registers at the late moment of dumping as we were, thus just
bind CoreEntry to pstree item and allocate CoreEntry'ies for every
thread found, once process tree is in seized state.
Then immediately fill CoreEntry'ies with registers. We use prctl
opcode for that but fetch a complete set of registers including
FPU state, and convert them into protobuf format.
Zombie tasks remains untouched, we allocate CoreEntry for them
right at moment of dumping becuase we don't need registers there
to be written on disk.
This way get_task_regs no longer need parasite_ctl argument
and it's zapped.
Still parasite_ctl has own copy of general registers set but
this is because we need them to be in cpu native format unlike
ones kept in CoreEntry.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Before this patch sigframes were constructed in restorer. We are going
to construct sigframes for parasites. Both parasite and restorer should
be as thing as posible.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The first one fills sigframe and the second one restores another
registers.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
struct thread_restore_args contains many pointers on different objects,
only a few of them are really required.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of opencoded mark injection provide
a helper for easier grepability.
[xemul: Go ahead and remove the INIT_VDSO_MARK at all]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The [vdso] mark in procfs output is not reliable,
so since we know which prot it should has, escape
obvious mishints.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if we have created vdso proxy the rt-vdso should
not be dumped because it will be re-created on next restore
anyway. Thus with help of parasite service routine find
the rt-vdso and tear it off from VMAs list.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if we need vdso proxy there is a need
to recognize it somehow on further checkpoint
action. But such vdso won't be recognized by
the kernel and [vdso] mark won't appear in
procfs output. Thus we put own mark on it.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need it in parasite code to detect run time vdso area.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When a task being restored we may meet two situations
- vdso in image doesn't match the runtime vdso provided
by a kernel. For this case we need to patch dumpee
vdso redirecting calls to runtime vdso, thus dumpee
vdso become a proxy.
- vdso in image does match the runtime vdso, in this
case we simply remap runtime vdso to address where
dumpee vdso lives. Plain remapping here is quite
important and allows us to save vdso pfn which will
be used in parasite code later.
Note after this patch the restored task may have two
vdso in memory. Proper dumping of such situation will
be addressed in future patches.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Runtime vdso need to be kept in some safe place when all
self-vmas are unmapped. So we reserve space for it in restorer
blob area and then remap it into. It's quite important to do
a remap here rather than data copy because otherwise pfn
of vdso disappear and in future we won't be able to detect
vdso are on dumping stage.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
During criu startup we need to fill symbol table of own
run-time vdso provided by the kernel. We will need this
data for vdso proxy.
Because this functions are not used in restorer code,
we move them out of PIE (since PIE code must remain
as small as possible).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's quite minimal at moment and provides only two helpers
- vdso_redirect_calls, to patch vdso area redirectling
calls to some new place.
- vdso_fill_symtable, to parse vma area as vdso library
and fill symbols table with offsets and names.
Because these routines will be needed in both regular criu
code and restorer code -- we compile it in pie format.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Will need to extend it to support vdso-pie code which
used in both -- pie code and plain executable code.
I know it's ugly and I must invent some more elegant
way, but need some solution at moment to be able to
compile existing code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
With the patch zdtm will check not only periodic timers, but also
one-shot ones.
Set one-shot timer to expire in a very long interval (INT_MAX in the
test), and check its remaining value after checkpoint/restore. The
following relationship must hold:
initial_value - remaining_value = time_passed
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's optional. It's going to be used in jenkins.
time bash -x test//zdtm.sh static/maps04
real 0m40.220s
user 0m0.096s
sys 0m12.822s
time bash -x test//zdtm.sh -t static/maps04
real 0m9.904s
user 0m0.074s
sys 0m1.630s
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Here are a few bugs which hide each other.
* memcmp(&newset, &oldset, sizeof(newset) returns 0 is masks are equal.
* sigprocmask return sigset_t and it contains extra bits for the future,
so we need to initialize all this bits otherwise they will contain
random data.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There will be a couple of more builtin helpers needed
in pie code soon. Thus to unify approach do
- rename asm/memcpy_64.h to asm/string.h
- introduce include/asm-generic/string.h file
where all helpers are implemented if optimized
variant is not yet provided
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
RHEL6 is shipped with libc 2.12 where no prlimit
helper provided (which was pushed into libc itself
since 2.13). So we implement own proxy for convenience.
Note, if libc does support prlimit itself we simply
don't use any proxy and call it directly.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This constants are system wide, so move them to mem.h
header for reuse sake.
[ xemul: It was kerndat.h in the patch ]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
==22653== Syscall param read(buf) points to unaddressable byte(s)
==22653== at 0x50480B0: __read_nocancel (in /usr/lib64/libpthread-2.17.so)
==22653== by 0x40CF7C: parasite_dump_pages_seized (mem.c:244)
==22653== by 0x41681D: cr_dump_tasks (cr-dump.c:1533)
==22653== by 0x40448C: main (crtools.c:309)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When a kernel didn't show vma flags, we set MAP_GROWSDOWN for stack
vmas, but it's not reliable. E.g. thread stacks are mapped without
MAP_GROWSDOWN.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The size of vma can be changed after parsing flags. For example we need
to add a guard page for vma with MAP_GROWSDOWN.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's being reported that some systems (as Ubuntu 13.04) already
have struct tcp_repair_opt definition in their system headers.
| sk-tcp.c:25:8: error: redefinition of struct tcp_repair_opt
| sk-tcp.c:31:2: error: redeclaration of enumerator TCP_NO_QUEUE
So add a facility for compile time testing for reported entities
to be present on a system. For this we generate include/config.h
where all tested entries will lay and source code need to include
it only in places where really needed.
Reported-by: Vasily Averin <vvs@parallels.com>
Acked-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some features need to be tested on the system
where the project is compiled, so instead of
drowning into autoconf hell lets try to handle
all this with make facility.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
page-server creates a listen socket and only then goes into the
background, so we can be sure, that page-server is ready for work after
detaching.
v2: call daemon() in a proper place and reuse the option -d
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Just open /proc/self/ns/net and check if
it remains the same on restore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Based on work done by Cyrill Corcunov (many thanks for that).
In this commit we implement c/r for files which have opened
/proc/$pid/ns/$ids entries.
The idea is rather simple one
Checkpoint
==========
- Check if the file name is the one of known to be ns ref
- If match then write protobuf entry
Restore
=======
- Read all ns entries from the image
- When criu tries to open one we lookup over process
tree to figure out which PID should be used in path
and then just open it
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>