Because the MmEntry has a "repeated" field, we
copy aux vector explicitly and reserve space for
it in task args.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
1. Mountinfo should be collected after we have forked into new namespace (strictly
speaking this is so)
2. When restoring a mnt ns we can reuse the collected mntinfos rather than reading
them again.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
As described in the previous patch, process group leaders are restored in
the first phase, then all other processes restore their pgid.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A pgid leader should become such before any other task tries
to enter this group (with setpgid). Thus we introduce yet
another global sync point -- before it all pgid leaders call
setpgid, after it all the others do.
v2: wait until helpers have restored pgid
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Checkpoint and restore of fifo is similar to
pipes c/r except the pipe end-points are named
files.
Because the fifo has a name we use the regular files
facility for fifo path c/r.
Still there is a trick used to "open" a fifo:
the opening procedure might sleep if the fifo's peer
is not yet opened, so before doing a real open
we do a fake open (with the O_RDWR flag)
which prevents us from sleeping even if the peer
is not yet ready. Also we need a writable fifo
end to restore the queued data.
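The fake-open trick can be sketched roughly as below. This is an illustration only, not the crtools code: the helper name open_fifo_end is hypothetical, and it relies on the Linux behaviour that opening a fifo with O_RDWR does not block.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open one end of a fifo without sleeping.
 * The O_RDWR descriptor acts as both reader and writer, so the
 * real open that follows always finds a peer.
 */
static int open_fifo_end(const char *path, int flags)
{
    int fake_fd, fd;

    fake_fd = open(path, O_RDWR);   /* fake open, does not block on Linux */
    if (fake_fd < 0)
        return -1;

    fd = open(path, flags);         /* real open, a peer already exists */

    close(fake_fd);
    return fd;
}
```

A real restorer would likely keep the O_RDWR descriptor around a bit longer, since it is also the writable end needed to push queued data back into the fifo.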
v2:
- add open/priv members to reg_file_info
- make open_fifo_fd to use open_fe_fd
- comment on pipe_id
- make sure the fifo data is not restored twice
v3:
- drop useless fixme comment and add sane one
v4:
- Use restore_data flag to escape data restore duplication
- Use S_ISREG for file contents copying
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We need our own proc for restoring the mount namespace, since the proc should
be umounted and mounted back during namespace restore and I don't want
to introduce a special kludge for this.
One more notice -- the temp proc is mounted _after_ namespaces recreation
for the same reason (it will be umounted by prepare_mnt_ns).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These are declared in files-reg.h, so get rid of
them and add files-reg.h inclusion where needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Don't fail if a root non-init task has another sid: it is
inherited from the parent and can't be restored, which is
expected behaviour when a subtree is dumped.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's a sign that a parent changed its sid after forking a child.
We should know the sid with which a process was born, because in a process
chain more than one process might change SID.
v2: fix names of variables
v3: prevent rewriting of born_sid
v4: Abort the restorer with an error message if a born_sid can't be determined.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* Create helpers for processes which have been reparented to init.
* Insert helpers in a process tree.
* Helpers will exit after constructing a process tree.
v2: fix variable names and check errors
v3: add comments in code
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
They will be used for restoring sid. For example, if a session
group leader is absent, a helper process is created with this id
and it will die after restoring all other tasks.
Before this patch restore failed if anyone exited.
Now we should skip helpers which exited successfully. It's a bit tricky.
All children are collected in sigchld_handler, but we have a point
where we want to wait for all helpers. For that waitpid is used and ECHILD
is ignored, because it means that a helper exited and has already been
waited for in sigchld_handler.
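A minimal sketch of waiting for a helper while tolerating ECHILD might look like this. The function name wait_helper is hypothetical and the real code may be structured differently.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: wait for one helper process.
 * Returns 0 if the helper exited with status 0 or was already
 * reaped elsewhere (for example by a SIGCHLD handler), -1 otherwise.
 */
static int wait_helper(pid_t pid)
{
    int status;

    while (waitpid(pid, &status, 0) < 0) {
        if (errno == EINTR)
            continue;       /* interrupted by a signal, retry */
        if (errno == ECHILD)
            return 0;       /* already collected by someone else */
        return -1;
    }

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

Note that in the ECHILD case the exit status is no longer available here, so whoever reaped the helper first has to be the one checking it.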
v2: check that me isn't NULL in the sig handler
v3: move code about waiting helpers in a separate function
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be used for allocating PIDs for helper tasks
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
to require dumping pid namespace. Dump and restore will fail if
a tree doesn't contain an init process.
A pid namespace will be created implicitly if an init process is in the tree.
v2: fix comments from Pavel
v3: Restore of pidns should be approved by user
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Unless it's very critical for speed, we should
not use the unsafe sprintf helper: we're a root-granted
program and must be as safe as possible.
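As an illustration of the bounded alternative, here is a small sketch using snprintf with a truncation check. The helper name proc_path and the path layout are assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "/proc/<pid>/<name>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int proc_path(char *buf, size_t size, int pid, const char *name)
{
    int n = snprintf(buf, size, "/proc/%d/%s", pid, name);

    /* snprintf returns the length it wanted to write */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```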
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Create a tmp directory and mount proc from the target pid ns.
This proc will show pid-s from the target pid ns.
crtools uses map_files for restoring shared mappings.
The tmp directory is removed after restore.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: rework this by using openat() and service fds for proc root.
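The openat() approach can be sketched as follows, assuming a directory descriptor for the proc root has been opened once and kept around. The helper name open_proc_file is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open "<pid>/<name>" relative to an already
 * opened proc root directory descriptor, so no absolute path to
 * the mount point is needed.
 */
static int open_proc_file(int proc_root_fd, int pid, const char *name)
{
    char path[64];
    int n = snprintf(path, sizeof(path), "%d/%s", pid, name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return openat(proc_root_fd, path, O_RDONLY);
}
```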
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A pid namespace is created if a pid of the first task is 1.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add struct pid and use it everywhere. This struct contains
two fields: pid and real_pid.
real_pid is a pid outside of the target pid namespace.
pid is in the target pid namespace
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: Synchronize the argument type of __alloc_pstree_item and the
values you put into it. I.e. int-int or bool-bool, not bool-int.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Taking into account the way the dump saves pstrees in the image.
If pstree.img isn't edited, a slow path should not be executed at all.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
because they describe a process TREE.
It's useful when we dump tasks from another pid namespace,
because the real pid is obtained from the parasite. In the previous version
we needed to update the pid in two places: one in a pstree_item and
one in a children array.
A process tree will be necessary to restore sid and pgid,
because we should add fake tasks to the tree, for example if
a session leader is absent.
v2: fix rollback actions
v3: fix comments from Pavel Emelyanov
* add macros for_each_pstree_item
* fix a few bugs
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The global variable me wasn't initialized when we tried to use it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the option --log-pid is set, each process will have its own log file.
Otherwise PID is added to each log message.
A message can't be bigger than one page minus some bytes for pid.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- open_mount is cleaned up
- byte-stream hex conversion remains untouched since
strtol is flipping numbers to LE manner
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2:
- Pass initial counter value to eventfd call
(can't pass flags here since they are obtained
with fcntl and must be restored same way or
restore will fail)
- Use rst_file_params for flags and owner restore
- Use eventfd.[ch] instead of eventfs.[ch]
- Move show funcs to eventfd.c
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dump finishes with an error we should unlock all previously
locked connections.
When restoring we should collect connections and unlock them
all at the end.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of generating offsets from early compiled
object files (one day the offsets obtained from
there might change during the linkage stage) it is better
to get them from the final stage where all object
files involved are linked into a complete binary blob.
It happened that at an early stage we indeed were using
only a single file per parasite and restorer, but at present
there are a couple of files involved (and there will be more in
the future) so we need a safe approach.
Also note the symbols being exported are prefixed with
"__export_". This is the easier approach for now. Putting
such symbols into a separate section requires way
more effort to handle.
The main reason for having two files (Elf object
and binary blob) is to get a 1:1 mapping between
symbol definitions and their positions in the binary
target.
The addresses of the exported symbols are obtained
from the object file and used as offsets in the binary
target.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Don't re-read fdinfo image 4 times on restore, just use those collected
on me pstree_entry instance.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This complicates the syscall transition to asm code,
so pass these constants in callers.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On some systems PAGE_SIZE is declared as sysconf(_SC_PAGESIZE) in <sys/user.h>.
This is a non-constant expression, so it cannot be used in type declarations.
This breaks compilation with a very non-obvious error message:
CC parasite-syscall.o
In file included from parasite-syscall.c:30:0:
./include/parasite.h:90:8: error: variably modified ‘fds’ at file scope
crtools doesn't use anything from <sys/user.h>, so we can drop its usage.
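To illustrate the difference, a rough sketch follows. The names EXAMPLE_PAGE_SIZE, struct example_args and runtime_page_size are made up for the example, and the value 4096 is an assumption, not something taken from crtools.

```c
#include <unistd.h>

/* A compile-time constant is fine as a file-scope array size. */
#define EXAMPLE_PAGE_SIZE 4096  /* assumed value, for illustration only */

struct example_args {
    char fds[EXAMPLE_PAGE_SIZE];
};

/*
 * By contrast, a definition such as
 *     #define PAGE_SIZE sysconf(_SC_PAGESIZE)
 * turns "char fds[PAGE_SIZE]" into a variably modified type,
 * which is not allowed at file scope. The runtime value is still
 * available through an ordinary function call:
 */
static long runtime_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```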
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Command below was executed several times:
sed 's/\(pr_.*[^%,x,X]\)\(\%[0-9,l,L]*x\)/\10x\2/g' -i *.c
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Until we have kernel support.
[ xemul: MySQL uses runaway pgid and sid and we cannot restore it
gracefully with the existing API :( But MySQL seems not to care about
pgid and sid change after restore, so ignore this for a while ]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A completely unlinked file is one with the n_link count being zero.
Such files only allow us to read their contents and carry them with us.
In order to dump this thing I introduce the "path remap" technology.
For reg file a remapping entry is dumped which describes, that at
restore stage before opening a regfile->path this path should be
linked to some other name and then (after open) unlinked.
For completely unlinked files the remap path would be a path to
a "ghost" file, i.e. a file which is created only at the time of
restore and which is removed completely at the end of it.
Partially unlinked files (i.e. those having n_link != 0, but a
path by which we see them in someone's fd is not accessible) should
be handled in another way.
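The restore-side remap for a completely unlinked file, as described above, could be sketched like this. It is a simplified illustration and not the actual implementation: the helper name open_remapped is hypothetical, and creating and finally removing the ghost file itself is left out.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` through a remap. The ghost file
 * is linked to `path` only for the duration of open(), then the
 * name is unlinked again and only the descriptor remains.
 */
static int open_remapped(const char *ghost_path, const char *path, int flags)
{
    int fd;

    if (link(ghost_path, path) < 0)
        return -1;

    fd = open(path, flags);

    unlink(path);   /* the name must not survive the restore */
    return fd;
}
```

Once the ghost file is removed at the end of restore, the descriptor again refers to a file with no links, matching the state at dump time.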
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is a prerequisite for terminals handling and just a good
practice to save and restore everything we can :)
Not all combinations are supported. All the problems we still
have come from the inability to attach to a group/session with
an ID that no task owns as its PID.
This can be worked around by fork()-ing this pid temporarily,
but we'd rather think in the direction of modifying the kernel
to give us a direct syscall for this (oh my...)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I store them on _entry since sids can only be inherited or
set to current's pid. Thus the best we can do is restore sids
at fork time, and so save them in the image we use to fork.
Maybe when we submit patches that will give us ability to set
arbitrary pgid and sid we'll change this, but this is in the
future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>