mirror of https://github.com/checkpoint-restore/criu synced 2025-08-27 04:18:27 +00:00

1696 Commits

Author SHA1 Message Date
Cyrill Gorcunov
87273ccdb8 cpuinfo: x86 -- Add dump and validation of cpuinfo image, v2
If a user requested criu to dump the cpuinfo image, then we
write one on dump and verify it on restore. At the moment
we require all cpu feature bits to match the destination
cpu for the sake of simplicity, but in the future we need a deps
engine that would filter out bits and test whether the cpu we're
restoring on is more capable than the one we were dumping on,
allowing the restore procedure to proceed.

v2: Updated to new img format
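
The two policies described here can be sketched as below. This is only an illustration: the word count NCAPINTS_SKETCH and both function names are assumptions, not criu code, and the "more capable" check is reduced to a plain superset test rather than a real dependency engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: CPU feature bits packed into a few 32-bit words. */
#define NCAPINTS_SKETCH 4

/* Strict policy: every feature word of the dump must equal the host's. */
bool cpu_features_equal(const uint32_t *dumped, const uint32_t *host)
{
	assert(dumped && host);
	for (size_t i = 0; i < NCAPINTS_SKETCH; i++)
		if (dumped[i] != host[i])
			return false;
	return true;
}

/* Relaxed policy: the host must have every bit that was set at dump time. */
bool cpu_features_superset(const uint32_t *dumped, const uint32_t *host)
{
	assert(dumped && host);
	for (size_t i = 0; i < NCAPINTS_SKETCH; i++)
		if (dumped[i] & ~host[i])
			return false;
	return true;
}
```

With the strict check a restore on a newer cpu is refused; the superset check would let it proceed.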

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:26:57 +04:00
Cyrill Gorcunov
e07b4a0e7a cpuinfo: x86 -- Add protobuf entry
At the moment only x86 is covered, ARM needs own handler.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:26:56 +04:00
Cyrill Gorcunov
ff1a751a89 opt: cpu-cap -- Introduce "none" and "cpuinfo" arguments
They will serve to choose capability level when migrating
images between various hardware nodes.

Note that this commit introduces only the bare functionality;
the real implementation is in the next patches.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:25:56 +04:00
Cyrill Gorcunov
3914b180d3 cpuinfo: Drop cpu_set_feature from exporting
It's redundant, should be cpu local.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-03 13:23:34 +04:00
Cyrill Gorcunov
ae96d21a07 bfd: Use ERR_PTR and such instead of BREADERR
No need to invent new error codes here, simply
use ERR_PTR/IS_ERR_OR_NULL and such.
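
For readers unfamiliar with the idiom, here is a minimal sketch of ERR_PTR-style error encoding, modelled on the Linux kernel helpers. The bodies below are an approximation written for this note, and next_line_sketch() is a hypothetical reader, not the bfd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)error;
}

/* Recover the errno value from an encoded pointer. */
long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical reader: a line, NULL at end of input, or an encoded error. */
char *next_line_sketch(int outcome)
{
	static char line[] = "VmSize: 4 kB";

	if (outcome > 0)
		return line;
	if (outcome == 0)
		return NULL;
	return ERR_PTR(-EIO);
}
```

A caller can then tell "no more lines" (NULL) from "read failed" (IS_ERR) without a separate error code.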

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-02 14:56:39 +04:00
Pavel Emelyanov
c57c2cfa64 predump: Collect mnt and net namespaces properly
On pre-dump we collect only two namespaces -- the mnt one
for criu and mnt one again for root task.

This is not correct. We need all mount namespaces to make
the irmap generation work properly and we need all net
namespaces to have parasite sockets created.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-02 14:30:31 +04:00
Pavel Emelyanov
8ad653c732 pstree: Store task's netns on pstree-item
Will be needed for parasite sockets.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:35:11 +04:00
Pavel Emelyanov
3f38145163 pstree: Introduce item's dump info
Empty for now, will be filled soon.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:34:53 +04:00
Pavel Emelyanov
c443b03e10 rst: Rework the rst_info referencing
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:34:38 +04:00
Pavel Emelyanov
3c7d01f6a7 net: Pre-create nl diag sk
The setns() syscall (called by switch_ns()) can be extremely
slow. If we call it two or more times from the same task the
kernel will synchronously go on a very slow routine called
synchronize_rcu() trying to put a reference on old namespaces.

To avoid doing this more than once I propose to create all
per-ns sockets in one place with one setns call. In this
patch the nl diag socket used to collect other sockets
is created this way.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:34:29 +04:00
Pavel Emelyanov
7327ffe6a7 ns: Introduce collect_net_namespaces
And move sockets collection there.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:33:56 +04:00
Pavel Emelyanov
01f6f890c2 ns: Introduce collect_namespaces routine
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-10-01 13:33:42 +04:00
Pavel Emelyanov
b476879239 irmap: Get root mntfd before releasing tasks on predump
We have a use-after-free in predump code:

First free_pstree() is called in pre_dump_tasks(), then we
go to irmap_predump_run(), which may call lookup_irmap(),
which, in turn, dereferences the root_item to get the root
mount ns fd.

But the problem is bigger than that. After we've released the
tasks (done before freeing pstree on predump) we can no longer
access them by PIDs, so keeping the root-item after irmap
scan is not a fix.

Fix is to get the root fd before releasing the tasks and using
one in irmap scanner.

Caught recently on iterative inotify_irmap test.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-10-01 09:37:04 +04:00
Pavel
8ac80915e0 ns: Factor out namespace switching call
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-30 21:54:11 +04:00
Pavel Emelyanov
b90ae65c4c img: Prepare to use bfd engine
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:53 +04:00
Pavel Emelyanov
67bbc7ea0b bfd: Rename fields
For reads and writes the names pos and bleft will
have a strange meaning, so rename them into something more
appropriate.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:51 +04:00
Pavel Emelyanov
166c58d5bb img: Mark unbufferred images
We have some images that store raw data together with
the pb objects (and one that just stores raw data) and
use custom access to this. E.g. pipe-data images splice
data into them and sk-queue one lseeks the image for
queue packets.

For those, mixing buffered mode with raw access may lead
to trouble. Explicitly mark such images, so that the
buffering (next patches) handles such images carefully.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:15 +04:00
Pavel Emelyanov
295090c1ea img: Introduce the struct cr_img
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plain file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.

This will also help if (when?) we will want to do more
complex changes with images, e.g. store them all in one
file or send them directly to the network.

For now the cr_img just contains one int _fd variable.

This patch changes the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)
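
A minimal sketch of such a wrapper, assuming only what the message states (a struct holding one int _fd). The helper names img_from_fd(), img_raw_fd() and close_image_sketch() are picked for illustration and are not claimed to match the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* As described above: for now the image is just a wrapped descriptor. */
struct cr_img {
	int _fd;
};

/* Illustrative helper: wrap an already-open descriptor. */
struct cr_img *img_from_fd(int fd)
{
	struct cr_img *img = malloc(sizeof(*img));

	if (img)
		img->_fd = fd;
	return img;
}

/* Illustrative helper: raw access for code that still needs the fd. */
int img_raw_fd(const struct cr_img *img)
{
	assert(img);
	return img->_fd;
}

/* Close the descriptor and free the wrapper. */
void close_image_sketch(struct cr_img *img)
{
	if (!img)
		return;
	if (img->_fd >= 0)
		close(img->_fd);
	free(img);
}
```

Keeping the fd behind accessors is what later allows a buffer to be added to the struct without touching the callers again.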

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:13 +04:00
Pavel Emelyanov
5f2a7ac27b img: Rename fdset -> imgset
Since we're going to switch from int-fd-s to class-image
soon the fdset name will not fit into the new terminology.

This patch is

 sed -e 's/fdset/imgset/g' -i *
 sed -e 's/imgset_fd/img_from_set/g' -i *
 git mv include/fdset.h include/imgset.h

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:10 +04:00
Pavel Emelyanov
1cb690ddc9 img: Move images IO helpers into .c file
This is to simplify the change from int fd to more
generic image class data-type.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:08 +04:00
Pavel Emelyanov
5eb39aad4d bfd: Multiple buffers management (v2)
I plan to re-use the bfd engine for images buffering. Right
now this engine uses one buffer that gets reused by all
bfdopen()-s. This works for current usage (one-by-one proc
files access), but for images we'll need more buffers.

So this patch just puts buffers in a list and organizes a
stupid R-R with refill on it.

v2:
  Check for buffer allocation errors
  Print buffer mem pointer in debug

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-09-29 15:37:14 +04:00
Pavel Emelyanov
e651a6eba4 filemap: Get vma mnt_id early
We have a, well, issue with how we calculate the vma's mnt_id.

Right now we get one via the criu-side file descriptor that it got by
opening the /proc/pid/map_files/ link. The problem is that these
descriptors are 'merged' or 'borrowed' by adjacent vmas from
previous ones. Thus, getting the mnt_id value for each of them
makes no sense -- these files are the same.

So move this mnt_id getting earlier into vma parsing code. This
brings a potential problem -- if we have two adjacent vmas
mapping the same inode (dev:ino pair) but living in different
mount namespaces -- this check would produce wrong result.
"Wrong" from the perspective that on restore correct file would
be opened from wrong namespace.

I propose to live with it, since this is not worse than the
--evasive-devices option, it's _very_ unlikely, but saves a lot
of openings.

Note, that in case app switched mount namespace and then mapped
some new library (with dlopen) things would work correctly -- new
vmas will likely be not adjacent and for different dev:ino.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:20:55 +04:00
Pavel Emelyanov
f84d19e09a vma: Add comments about some dump fields of vma_area
We have non-obvious handling of vm_file_fd/vm_socket_id
pair and the vma->file_borrowed.

Comment these in the structure.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:20:20 +04:00
Pavel Emelyanov
cf8c9ae870 vma: Reshuffle the struct vma_area
We have some fields, that are dump-only and some that
are restore only (quite a lot of them actually).

Reshuffle them on the vma_area to explicitly show which
one is which. And rename some of them for easier grep.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-29 13:19:55 +04:00
Pavel Emelyanov
53771adcaa bfd: File-descriptors based buffered read
This sounds strange, but we kinda need one. Here's the
justification for that.

We heavily open /proc/pid/foo files. To speed things up we
do pid_dir = open("/proc/pid") then openat(pid_dir, foo).
This really saves time on big trees, up to 10%.
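
The open-once-then-openat pattern looks roughly like this. The function names are made up for this sketch; only open() and openat() are the real calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/<pid> once; returns a directory fd or -1. */
int open_pid_dir(pid_t pid)
{
	char path[32];

	snprintf(path, sizeof(path), "/proc/%d", (int)pid);
	return open(path, O_RDONLY | O_DIRECTORY);
}

/* Open one file relative to the cached directory fd. */
int open_pid_file(int pid_dir, const char *name)
{
	assert(name);
	return openat(pid_dir, name, O_RDONLY);
}
```

Each later lookup resolves a single path component relative to pid_dir instead of walking /proc/<pid>/ from the root again.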

Sometimes we need line-by-line scan of these files, and for
that we currently use the fdopen() call. It takes a file
descriptor (obtained with openat from above) and wraps one
into a FILE*.

The problem with the latter is that fdopen _always_ mmap()s
a buffer for reads and this buffer always (!) gets unmapped
back on fclose(). This pair of mmap() + munmap() eats time
on big trees, up to 10% in my experiments with p.haul tests.

The situation is made even worse by the fact that each fgets
on the file results in a new page allocated in the kernel
(since the mapping is new). And also this fgets copies data,
which is not a big deal, but for e.g. the smaps file this results
in ~8K bytes being just copied around.

Having said that, here's a small but fast way of reading a
descriptor line-by-line using big buffer for reducing the
amount of read()s.
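
A stripped-down sketch of the idea follows. It is not the bfd code: the names (struct linebuf, lb_init, lb_readline), the fixed in-struct buffer and its size are assumptions, and EINTR handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define LB_SIZE 4096

struct linebuf {
	int fd;
	size_t pos;  /* offset of the first unread byte in buf */
	size_t left; /* number of unread bytes */
	char buf[LB_SIZE + 1];
};

void lb_init(struct linebuf *lb, int fd)
{
	lb->fd = fd;
	lb->pos = 0;
	lb->left = 0;
}

/*
 * Return the next line with '\n' replaced by '\0'; the pointer refers
 * into the buffer and is valid until the next call. Returns NULL on
 * EOF, on read error, or when a line does not fit into the buffer.
 */
char *lb_readline(struct linebuf *lb)
{
	assert(lb);
	for (;;) {
		char *start = lb->buf + lb->pos;
		char *nl = memchr(start, '\n', lb->left);
		ssize_t r;

		if (nl) {
			size_t n = (size_t)(nl - start) + 1;

			*nl = '\0';
			lb->pos += n;
			lb->left -= n;
			return start;
		}
		if (lb->left == LB_SIZE)
			return NULL; /* line longer than the buffer */

		/* Move the partial tail to the front, then refill. */
		memmove(lb->buf, start, lb->left);
		lb->pos = 0;
		r = read(lb->fd, lb->buf + lb->left, LB_SIZE - lb->left);
		if (r < 0)
			return NULL;
		if (r == 0) {
			if (!lb->left)
				return NULL;
			/* Final line without a trailing newline. */
			lb->buf[lb->left] = '\0';
			lb->left = 0;
			return lb->buf;
		}
		lb->left += (size_t)r;
	}
}
```

Lines are handed out in place, so nothing is copied per line and one read() can serve many lines.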

After all per-task fopen_proc()-s get reworked on this engine
(next 4 patches) the results on p.haul test would be

        Syscall     Calls      Time (% of time)
Now:
           mmap:      463  0.012033 (3.2%)
         munmap:      447  0.014473 (3.9%)
Patched:
         munmap:       57  0.002106 (0.6%)
           mmap:       74  0.002286 (0.7%)

The amount of read()s and open()s doesn't change since FILE*
also uses page-sized buffer for reading.

Also this eliminates some amount of lseek()s and fstat()s
the fdopen() does every time to catch up with file position
and to determine what sort of buffering it should use (for
terminals it's \n-driven, for files it's not).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-23 20:48:38 +04:00
Pavel
867bcd2196 mnt: Shorten the mntns dumping loop
We currently have all mountinfo-s from all mnt namespaces collected
in one big list. On dump we scan through it to find the namespaces
we need to dump.

This can be optimized by walking the list of namespaces instead.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2014-09-23 20:37:32 +04:00
Pavel Emelyanov
ab50f6ac18 ptrace: Factor out pie stopping code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrey Vagin <avagin@parallels.com>
2014-09-23 20:36:10 +04:00
Andrew Vagin
13fc78b907 ptrace: say to parasite_stop_on_syscall where is we now
On restore parasite_stop_on_syscall() can be called after PTRACE_SYSCALL
and after a breakpoint. parasite_stop_on_syscall() must be called only
after PTRACE_SYSCALL, so all tests where is one process stuck.

Reported-by: Mr Jenkins
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-22 12:49:45 +04:00
Andrey Vagin
248fc31531 restore: use breakpoints instead of tracing syscalls
Currently CRIU traces syscalls to catch the moment when sigreturn() is
called. Now we trace recv(cmd), close(logfd), close(cmdfd), sigreturn().

We can reduce the number of steps by using hw breakpoints. A breakpoint is
set before sigreturn, so we will need to trace only it.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-19 17:57:18 +04:00
Andrey Vagin
0b1b81512b dump: use breakpoints instead of tracing syscalls (v2)
Currently CRIU traces syscalls to catch the moment when sigreturn() is
called. Now we trace recv(cmd), close(logfd), close(cmdfd), sigreturn().

We can reduce the number of steps by using hw breakpoints. A breakpoint is
set before sigreturn, so we will need to trace only it.

v2: In the first version a breakpoint was set after sigreturn. In this
case we have a problem with signals. If a process has pending signals,
it will start to process them after exiting from sigreturn(), but before
returning to userspace. So the breakpoint will not be triggered.

Finally, here are a few numbers on how we catch sigreturn.
Before this patch criu executes 36 syscalls and gets 12 signals.
With this patch criu executes 18 syscalls and gets 5 signals.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-19 17:56:25 +04:00
Tycho Andersen
f020bef776 remap: add a dead pid /proc remap
If a file like /proc/20/mountinfo is open, but 20 is a zombie (or doesn't exist
any more), we can't read this file at all, so a link remap won't work. Instead,
we add a new remap, called the dead process remap, which forks a TASK_HELPER as
that dead pid so that the restore task can open the new /proc/20/mountinfo
instead.

This commit also adds a new stage CR_STATE_RESTORE_SHARED. Since new
TASK_HELPERS are added when loading the shared resource images, we need to wait
to start forking tasks until after these resources are loaded.

v2: fix a mutex bug

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-19 17:42:48 +04:00
Tycho Andersen
c09ba04c48 restore: TASK_HELPERs live until RESTORE stage ends
In order to use TASK_HELPERS to open files from dead processes, they should
persist until criu is done restoring the filesystem, which happens in the
RESTORE stage. To do this, we need to pass each helper's PIDs to the restorer
blob, so that it can wait() on them when the restore stage is done.

This commit is in preparation for the remap_dead_pid commits.

v2: wait() on helpers after restore stage is over
v3: add CR_STATE_RESTORE_FS stage
v4: CR_STATE_RESTORE_FS waits for nr_tasks + nr_helpers, not nr_threads
v5: ditch CR_STATE_RESTORE_FS in favor of passing helpers to restorer blob

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-19 17:42:46 +04:00
Cyrill Gorcunov
d36c4058bc plugin: Explicit assign plugin hooks
So it won't depend on the order in declaration.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-19 17:39:06 +04:00
Ruslan Kuprieiev
ada4664429 security: change CR_FD_PERM from rw-rw-r-- to rw-r--r--
This makes only root able to modify images by default.
When using criu with the suid bit set, the group of the images is set
to the user's group, which is not safe, considering the current CR_FD_PERM.

Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-18 20:16:50 +04:00
Pavel Emelyanov
53957fadc3 restore: Introduce the --restore-sibling option
We have a slight mess with how criu restores root task.
Right now we have the following options.

1) CLI
	a) Usually
	task calling criu
	 `- criu
	     `- root restored task

	b) when --restore-detached AND root has pdeath_sig

	task calling criu
	 `- criu
	 `- root restored task

2) Library/SWRK
	task using lib/swrk
	 `- criu
	 `- root restored task

3) Standalone service
	a) Usually
	service
	 `- service sub task
	     `- root restored task

	b) when root has pdeath_sig
	criu service
	 `- criu sub task
	 `- root restored task

It would be better if CRIU always restored the root task as sibling,
but we have 3 constraints:

First, the case 1.a is kept for zdtm to run tests in pid namespaces
on 3.11, which in turn doesn't allow CLONE_PARENT | CLONE_NEWPID.

Second, CLI w/o --restore-detached waits for the restored task to die and
this behavior can be "expected" already.

Third, in case of standalone service tasks shouldn't become service's
children.

And I have one "plan". The p.haul project while live migrating tasks
on destination node starts a service, which uses library/swrk mode. In
this case the restored processes become p.haul service's kids which is
also not great.

That said, here's the option called --restore-child that pairs with
--restore-detached like this:

* detached AND child:

task
 `- criu restore (exits at the end)
 `- root task

The root task will become task's child.
This will be default to library/swrk.
This is what LXC needs.

* detach AND !child

task
 `- criu restore (exits at the end)
     `- root task

The root task will get re-parented to init.
This will be compatible with 1.3.
This will be default to standalone service and
to my wish with the p.haul case.

* !detach AND child

task
 `- criu restore (waits for root task to die)
 `- root task

This should be deprecated, so that criu restore doesn't mess
 with task <-> root task signalling.

* !detach AND !child

task
 `- criu restore (waits for root task to die)
     `- root task

This is how plain criu restore works now.
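
The four combinations above reduce to two independent properties, which the small table-like sketch below makes explicit. The struct, its field names and pick_restore_layout() are hypothetical; they only summarize the trees drawn in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary type; the field names are not from criu. */
struct restore_layout {
	bool root_is_sibling; /* root task is a child of the caller, next to criu */
	bool criu_waits;      /* criu stays alive until the root task dies */
};

/* "child" decides the parent of the root task, "detach" decides whether criu exits. */
struct restore_layout pick_restore_layout(bool detach, bool child)
{
	struct restore_layout l = {
		.root_is_sibling = child,
		.criu_waits = !detach,
	};

	return l;
}
```

For "detach AND !child" this gives a root task under criu with criu exiting, which is the case where the root task ends up re-parented to init.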

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
2014-09-10 18:30:30 +04:00
Pavel Emelyanov
b47b0201f3 page-server: Don't setup options in parent task
When service starts page server all the preparations (log, wdir, img dir, etc.)
happen in parent task, then we fork page server.

This is OK for now, but when we will serve several requests per connection, all
these resources would be leaked in parent.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:49:54 +04:00
Pavel Emelyanov
76017ec5a4 scripts: Use numeric action val in RPC notifications
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:48:27 +04:00
Pavel Emelyanov
17d44de9af scripts: Use numeric script names
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:48:26 +04:00
Pavel Emelyanov
069bdd9674 scripts: Move scripts code into separate sources
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-05 13:48:21 +04:00
Cyrill Gorcunov
d039868f99 log: Add pr_quelled helper
If we need to check if the current loglevel will suppress
our messages (say you need to run pr_debug in a cycle)
we can use this helper to eliminate unneeded calls.

Like
  if (!pr_quelled(LOG_DEBUG)) {
    ... do something specific to LOG_DEBUG ...
  }
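
A self-contained sketch of how such a helper could work. The level numbering, current_loglevel and dump_table_debug() are assumptions made for this example; only the pr_quelled() name and its usage pattern come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed numbering: a larger value means a more verbose level. */
enum { LOG_ERROR = 1, LOG_WARN = 2, LOG_INFO = 3, LOG_DEBUG = 4 };

static unsigned int current_loglevel = LOG_WARN;

void log_set_loglevel(unsigned int level)
{
	current_loglevel = level;
}

/* True when a message at @level would be suppressed. */
bool pr_quelled(unsigned int level)
{
	return level > current_loglevel;
}

/* Skip the whole loop when nothing would be printed; returns lines printed. */
int dump_table_debug(const int *vals, int n)
{
	int i, printed = 0;

	assert(vals || n == 0);
	if (pr_quelled(LOG_DEBUG))
		return 0;
	for (i = 0; i < n; i++) {
		fprintf(stderr, "val[%d] = %d\n", i, vals[i]);
		printed++;
	}
	return printed;
}
```

The check costs one comparison, while the loop it guards may format thousands of strings that would be thrown away.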

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:56:54 +04:00
Andrey Vagin
c40eff85dc eventpoll: merge eventpoll tfd into eventpoll image
All marks are collected in a list and then they are written in
the eventpoll image as a repeated field.

This image merge reduces the number of image files criu
generates and may simplify the fix of the above-mentioned issue.

v2: save the original order of entries
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:51:40 +04:00
Andrey Vagin
c4a8dd17bc fsnotify: merge fanotify mark image into fanotify image (v3)
All marks are collected in a list and then they are written in
the fanotify image as a repeated field.

This image merge reduces the number of image files criu
generates and may simplify the fix of the above-mentioned issue.

v2: don't leak fe.mark_entry
v3: save the original order of marks
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:51:39 +04:00
Andrey Vagin
a10907a1dd fsnotify: merge inotify wd image into inotify image (v4)
All watch descriptors are collected in a list and then
they are written in inotify image as a repeated field.

This image merge reduces the number of image files criu
generates and may simplify the fix of the above-mentioned issue.
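
Collecting entries in a list and then emitting them as one repeated field, in their original order, can be sketched like this. The types and names (wd_entry, wd_list, wd_list_flatten) are invented for the example; a protobuf-c repeated field is essentially the count-plus-array pair produced at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct wd_entry {
	int wd;
	struct wd_entry *next;
};

struct wd_list {
	struct wd_entry *head;
	struct wd_entry **tail; /* points at the link to fill next */
	size_t n;
};

void wd_list_init(struct wd_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->n = 0;
}

/* Append at the tail so the original order of watches is kept. */
int wd_list_add(struct wd_list *l, int wd)
{
	struct wd_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->wd = wd;
	e->next = NULL;
	*l->tail = e;
	l->tail = &e->next;
	l->n++;
	return 0;
}

/* Flatten into an array (caller frees it) and release the list nodes. */
int *wd_list_flatten(struct wd_list *l, size_t *n_out)
{
	struct wd_entry *e, *next;
	size_t i = 0;
	int *arr;

	assert(l && n_out);
	arr = malloc((l->n ? l->n : 1) * sizeof(*arr));
	if (!arr)
		return NULL; /* the list is left intact */
	for (e = l->head; e; e = next) {
		next = e->next;
		arr[i++] = e->wd;
		free(e);
	}
	*n_out = i;
	wd_list_init(l);
	return arr;
}
```

Appending at the tail rather than pushing at the head is what the "save the original order" revisions are about.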

v2: use free_inotify_wd_entry() instead of xfree in dump_one_inotify()
v3: don't leak ie.wd_entry
v4: save the original order of watchers
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:51:38 +04:00
Pavel Emelyanov
7058714fda service: Add ability to inherit page server socket
The swrk action is turning out to be a cool thing. We can
spawn criu with swrk action with some FD being open, then
ask for dump/pre-dump/page-server telling it that some
descriptor it needs is "out there".

This patch lets us specify that the page server communication
channel is already in criu's fdtable.

TODO: teach regular service to accept fd via service socket.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:50:12 +04:00
Cyrill Gorcunov
3146f58317 plugin: Rework plugins API, v2
Here we define new api to be used in plugins.

 - Plugin should provide a descriptor with help of
   CR_PLUGIN_REGISTER macro, or in case if plugin require
   no init/exit functions -- with CR_PLUGIN_REGISTER_DUMMY.

 - Plugin should define a plugin hook with help of
   CR_PLUGIN_REGISTER_HOOK macro.

 - Now init/exit functions of plugins take a @stage
   argument which tells the plugin at which stage of criu
   it's being called, dump or restore. For exit it
   also takes @ret, which lets the plugin know if
   something went wrong and it needs to clean up
   its own resources.

The idea behind this is to not limit plugin authors in the names
of functions they might need to use for a particular hook.

This new API deprecates the old plugin structure, but to keep
backward compatibility we will provide a tiny layer of
additional code to support old plugins for at least a couple
of release cycles.

For example a trivial plugin might look like

 | #include <sys/types.h>
 | #include <sys/stat.h>
 | #include <fcntl.h>
 | #include <libgen.h>
 | #include <errno.h>
 |
 | #include <sys/socket.h>
 | #include <linux/un.h>
 |
 | #include <stdio.h>
 | #include <stdlib.h>
 | #include <string.h>
 | #include <unistd.h>
 |
 | #include "criu-plugin.h"
 | #include "criu-log.h"
 |
 | static int dump_ext_file(int fd, int id)
 | {
 |	pr_info("dump_ext_file: fd %d id %d\n", fd, id);
 |	return 0;
 | }
 |
 | CR_PLUGIN_REGISTER_DUMMY("trivial")
 | CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__DUMP_EXT_FILE, dump_ext_file)

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:48:36 +04:00
Cyrill Gorcunov
b858711726 plugin: Beautify criu-plugin.h
- use custom multiline comments style
 - ending #endif comment

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-03 20:46:40 +04:00
Pavel Emelyanov
53537f52c8 locks: Don't dump locks in per-task manner (v3)
We have a problem with file locks (bug #2512) -- the /proc/locks
file shows the ID of lock creator, not the owner. Thus, if the
creator died, but holder is still alive, criu fails to dump the
lock held by latter task.

The proposal is to find who _might_ hold the lock by checking
for dev:inode pairs on lock vs file descriptors being dumped.
If the creator of the lock is still alive, then he will take
the priority.

One thing to note about flocks -- these belong to file entries,
not to tasks. Thus, when we meet one, we should check whether
the flock is really held by the task's FD by trying to set yet
another one. In case of success the lock really belongs to the fd
we dump; if it doesn't, the trylock should fail.

At the very end -- walk the list of locks and dump them all at
once, which is possible by merge of per-task file-locks images
into one global one.
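
The trylock probe for flocks can be sketched as below. It assumes an exclusive flock is already known to exist on the file (for example from /proc/locks); without that precondition a successful probe would simply take the lock. fd_holds_flock() and the demo function are names invented for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * 1  - the open file behind @fd owns the lock (re-locking succeeds),
 * 0  - some other open file owns it (trylock would block),
 * -1 - the probe itself failed.
 */
int fd_holds_flock(int fd)
{
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		return 1;
	return errno == EWOULDBLOCK ? 0 : -1;
}

/* Usage demo: returns 0 when the probe behaves as described. */
int flock_probe_demo(void)
{
	char path[] = "/tmp/flock-probe-XXXXXX";
	int holder, other, shared, ok;

	holder = mkstemp(path);
	if (holder < 0)
		return -1;
	other = open(path, O_RDWR); /* separate open file */
	shared = dup(holder);       /* same open file as holder */

	ok = other >= 0 && shared >= 0 &&
	     flock(holder, LOCK_EX) == 0 &&
	     fd_holds_flock(holder) == 1 &&
	     fd_holds_flock(shared) == 1 &&
	     fd_holds_flock(other) == 0;

	unlink(path);
	close(holder);
	if (other >= 0)
		close(other);
	if (shared >= 0)
		close(shared);
	return ok ? 0 : -1;
}
```

The dup()-ed descriptor passes the probe because it shares the open file with the holder, which matches the point that flocks belong to file entries rather than tasks.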

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 17:44:46 +04:00
Pavel Emelyanov
efac9ed8b3 locks: Parse lock type earlier
Same reason as for previous patch.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 17:44:39 +04:00
Pavel Emelyanov
0095b40a29 locks: Parse lock kind earlier
Currently we keep the lock type (posix/flock) till the
time we dump it, then "decode" it into binary value.
I will need the easy-to-check one early, so parse the
kind in proc_parse.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:39:09 +04:00
Andrey Vagin
961655dc02 util: add a function to check output data in a file descriptor
We can't dump netlink sockets, inotify or fanotify if they have queued
data, so let's add a function to check this.
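
One way to write such a check is a zero-timeout poll(), sketched below. The function name and return convention are assumptions for this example and are not claimed to match the helper added by the patch.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if @fd has data queued for reading, 0 if not, -1 on error. */
int fd_has_data(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, 0); /* timeout 0: never block */

	if (ret < 0)
		return -1;
	if (pfd.revents & (POLLERR | POLLNVAL))
		return -1;
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

On a pipe, for instance, this reports 0 while the pipe is empty and 1 once a byte has been written to the other end.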

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-09-02 16:25:50 +04:00