This actually fixes a bug -- memory for shmem info was
not allocated dynamically, thus we were limited in the
amount of shmems to be restored.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we need differetn types of memory allocation.
Here's an engine that tries to generalize them all. The
main difference is in how the buffer with objects is being
grown up.
There are 3 types of memory allocations:
1. shared memory -- objects, that will be used by all criu
children, but will not reach the restorer
2. shared remapable -- the same, but restorer would need
access to them, i.e. -- buffer with objects will get
remapped into restorer
3. private -- the same, but allocatedby each task for itself
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Depending on BITS_PER_LONG userspace representation of dev_t
may vary, so we need to choose proper encoding.
Signed-off-by: Igor Sukhih <igor@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch introduces the routines ptrace_get_gpregs() and ptrace_set_gpregs()
that wrap the ptrace interface to get and set CPU registers respectively.
The motivation is to make the CRIU code be compatible with architectures that
don't support the PTRACE_GETREGS and PTRACE_SETREGS ptrace calls ---
the requests PTRACE_GETREGSET and PTRACE_SETREGSET are implemented instead.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if checkpoint is failed or -R option passed
we need to remove link remap files created during
dump procedure.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This one keeps registers and sigmask for running thread. Will
be used for simpler parasite management.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case if we need to use vdso proxy the memory area
which holds restorer also has a place for vdso proxy
code itself, so on final pass we should not unmap it,
otherwise any call to vdso function will cause sigsegv.
IOW, the memory before final "cleanup" pass of restorer
might look as
+-----------+---------+ +-------------+------+
| bootstrap | rt-vdso | ... | application | vdso |
+-----------+---------+ +-------------+------+
^ |
`-------------------------+
and we have redirected "vdso" code to jump to "rt-vdso".
After final pass the memory must look as
+---------+ +-------------+------+
| rt-vdso | ... | application | vdso |
+---------+ +-------------+------+
^ |
`-------------------------+
I noticed this problem during container migration
testing, the container itself was suspended on 2.6.32
OpenVZ kernel with apache running inside, and any attempt
to connect to apache caused apache to crash.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There's ... a number of places where we want to do something
with /proc/self/fd/%d path. Each time we guess buffer size
that is enough for this. Make standard constant for this and
save some space on stack and drop args for some functions.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Don't carry it around in a static global variable. Would
be useful for pidns leaks (processes entered one) scan.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The maximal size which may be used in the kernel for sending TCP data
on restore is varies depending on how many memory installed on the
system, moreover the memory allocated for "read queue" is bigger than
used for "write queue". Thus when we checkpointed a big slab of data
we need to figure out which size is allowed for sending data on restore.
For this we read /proc/sys/net/ipv4/tcp_[wmem|rmem] on restore and calculate
the size needed, then we simply chop data to segements and send it
in a loop.
Typical output on restore is something like
| (00.013001) 30110: TCP queue memory limits are 2097152:3145728
https://bugzilla.openvz.org/show_bug.cgi?id=2751
[xemul: moved stuff to kerndat.c]
Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
By default just use the iptables-save and iptables-restore commands.
User may define CR_IPTABLES variable, in this case the "sh -c $CR_IPTABLES"
would be called.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Lets use one default log filename. User can set if in request, if needed.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's not enough to check only uids on dump and restore -- we need to
check e-ids and s-ids now (and caps in the future).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: remove redundant functions and variables.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
[xemul: Simplified !log_file case and renumbered .proto fields]
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now we have an ability to launch the C/R service from root
and execure dump requests from unpriviledged users. Not to be bad
guys, we deny dumping tasks belonging to user, that cannot be
"watched" (traced, read /proc, etc.) by the dumper.
In the future we will use this "engine" when launched with suid
bit, and (probably) will have more sophisticated policy.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to replace pid on id in names of image files. The id is
uniq for each namespace, so it's more convient, if image files are
opened per namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to replace pid on id in names of image files. The id is
uniq for each namespace, so it's more convient, if image files are
opened per namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to replace pid on id in names of image files. The id is
uniq for each namespace, so it's more convient, if image files are
opened per namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to replace pid on id in names of image files. The id is
uniq for each namespace, so it's more convient, if image files are
opened per namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently all values of constants should be continuous,
because cr_fdset_open is used for opening images for all namespaces.
The next patches will rework this code and image files will be opened
per namespace, then all these ugly settings of one constant to another
will be removed.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We can restore task's pgid which is not equal to its pid,
only when the respective group leader is alive. To make
restore reliable we wait for all group leaders to restore
using separate restore stage.
It's better to optimize this -- each task has a pointer on
its group leader and waits for one to become such.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We restore chroot before doing this, so if we might need to
open one, we may have no access to the /proc/... paths.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All process VMA-s are in "premmaped area". All restorer stuff are in
bootstap "area", so we have two areas.
So we don't need to unmap extra VMA-s one by one. We can call munmap
three times for the region before the first area, for the hole between
areas and for the region after the second area.
The old scheme didn't work, because the list of VMA-s can be changed
after collecting. It can be due to memory allocations by libc or due to
increased stack.
v2: improve readability at the expense of beautiness
v3: print return code of munmap in error messages
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds a new parasite command, which unmaps the parasite blob.
This command never returns and the criu process traps the target process
on the exit from the munmap syscall.
v2: rename the function for unmaping a parasite blob to not intersects
with criu's functions.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds a function for removing the restorer blob. This function
never returns and the process must be trapped on the exit from the
munmap syscall.
v2: * release parasite_ctl sturcture and use the new interface of
parasite_prep_ctl
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be used in cr-restore.c for stopping threads on the exit from
sigreturn.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The munmap syscall must be executed from a process memory. The code can
be injected in memory and then removed. But we can avoid all these
actions, if the code will be in the blob and a process will be trapped
on the exit from the munmap syscall.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
All processes must be started by PTRACE_SYSCALL. The function calls wait
in a loop and if a process on the exit from the required syscall, it
is stopped, otherwise it will be reexecuted by PTRACE_SYSCALL.
The function doesn't know, which processes should be trapped, so
you should care, that wait() will not catch someone else.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds nothing new, just splits the existant function.
Currently a parasite stopped on sigreturn for unmaping a parasite blob.
The same scheme will be used for restorer blob and this function will be
used to stop on exit from the munmap syscall.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>