Here we define new api to be used in plugins.
- Plugin should provide a descriptor with help of
CR_PLUGIN_REGISTER macro, or in case if plugin require
no init/exit functions -- with CR_PLUGIN_REGISTER_DUMMY.
- Plugin should define a plugin hook with help of
CR_PLUGIN_REGISTER_HOOK macro.
- Now init/exit functions of plugins takes @stage
argument which tells plugin which stage of criu
it's been called on dump/restore. For exit it
also takes @ret which allows plugin to know if
something went wrong and it needs to cleanup
own resources.
The idea behind is to not limit plugins authors with names
of functions they might need to use for particular hook.
Such new API deprecates olds plugins structure but to keep
backward compatibility we will provide a tiny layer of
additional code to support old plugins for at least a couple
of release cycles.
For example a trivial plugin might look like
| #include <sys/types.h>
| #include <sys/stat.h>
| #include <fcntl.h>
| #include <libgen.h>
| #include <errno.h>
|
| #include <sys/socket.h>
| #include <linux/un.h>
|
| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <unistd.h>
|
| #include "criu-plugin.h"
| #include "criu-log.h"
|
| static int dump_ext_file(int fd, int id)
| {
| pr_info("dump_ext_file: fd %d id %d\n", fd, id);
| return 0;
| }
|
| CR_PLUGIN_REGISTER_DUMMY("trivial")
| CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__DUMP_EXT_FILE, dump_ext_file)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now we supports sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.
v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is responsibility of this command.
All service FDs are closed before we call execvp(). Standad output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.
Two directions were researched in order to integrate CRIU and LXC:
1. We tell to CRIU, that after restoring container is should execve()
lxc properly explaining to it that there's a new container hanging
around.
2. We make LXC set himself as child subreaper, then fork() criu and ask
it to detach (-d) from restore container afterwards. Being a subreaper,
it should get the container's init into his child list after it.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Well, when we create external link with ip utility we cannot
specify its index. Thus we will not be able to move external
device to namespace and ask criu to restore link params.
However, RTM_SETLINK can happily work with the link name only.
And since we do have one in the images, we can omit setting
the index in the requrest.
TODO: Send patch with index specification to iproute2.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we meet a link we cannot dump we call plugin to check
whether it's the link, that should be treated as external.
Note, that on restore we don't call any plugins, but
consider the setup-namespace script to move the respective
link into the namespace. Links are not hierarchical and
can be moved between namespaces easily, so it's OK to
delegate the link creation to the script.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For devices, that are available in netns we have a special
routine, that just restored link params.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently if a network namespace is dumped and something fails, sockets
remain in repair mode. It's because cpt_unlock_tcp_connections is
executed only if network namespace is not dumped.
cpt_unlock_tcp_connections disables repair mode for sockets and drops
netfilters. netfilters are not used in case of network namespaces.
v2: don't execute network-unlock scripts, if network namespace are not
dumped.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
By default just use the iptables-save and iptables-restore commands.
User may define CR_IPTABLES variable, in this case the "sh -c $CR_IPTABLES"
would be called.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We are going to replace pid on id in names of image files. The id is
uniq for each namespace, so it's more convient, if image files are
opened per namespace.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The current scheme is racy. It use open_detache_mount in a current
name-space. If a mount namespace is created by someone else between
mount and umount(detach) in open_detache_mount, the mount will be
propagated in the new mntns, then it is detached in a current ns and
rmdir fails, because it's still mounted in athother mntns.
This patch creates a new mount namespace for mounting sysfs.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have generic do_pb_show() call and tons of show_foo
routines, that just call one with proper args. Compact
the code by putting the args into array and calling
the do_pb_show() in one place.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This thing is pretty straightforward -- on netns creation
populate it with tun-s, after this collect tun files, open
and attach them with regular fd-s engine.
One tricky thing -- when populating namespace with tun links
make them all persistent and drop this flag (if required)
later, when the first alive opened appears.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The major issue with dump is -- some info id get via netlink,
some via sysfs and some (!) via opened and attached tun file.
But the latter cannot be created, if there's another one attached
(or the mq device is full with threads).
Thus we have to dump this info via existing tun file and keep one
in memory till the link dump code takes place.
Opposite situation is also possible -- we can have a persistent
unattached device. In this case we have to attach to it, dump
things and detach back.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
TUN devices are created with ioctl, but their parameters (e.g.
flags with state, mtu, etc.) are to be restored with generic
RTM_SETLINK message.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some information about network devices may hide in sysfs, thus
it's required to have one at hands while dumping the netns.
Create the detached mount for that.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some (most) network devices would like to have NetDeviceEntry with
more fields, than currently present (and enough for lo and veth).
Prepare for that by allowing them to define their own callback that
would fill the resor of the pb entry and call write_netdev_img().
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Kernel has more and more links with rtnl-ops, which report
a string kind of the device, which is handy for debugging.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This will be needed for fast parsing of procfs ns references.
[ xemul: Add user_ns_desc here ]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's no longer required to use this option -- two currently
supported cases (tasks on host and tasks in containers) can
be detected automatically. Keep this option for future.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These are structs that (now) tie together ns string
and the CLONE_ flag. It's nice to have one (some code
becomes simpler) and will help us with auto-namespaces
detection.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This reverts commit ef3771d566dacb8ee9fe71b744d56f08674fe3db.
With new SO_BINDTODEVICE getting API it's not required.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Many image files opened by open_image_ro weren't closed before return, fix
them all in this patch.
Signed-off-by: Huang Qiang <h.huangqiang@huawei.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It will be required to support socket bound to devices.
When restoring w/o net namespaces -- collect existing devices.
When restoring with them -- collect what is received from image.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
One function is used on restoring and one is used on dumping,
so each function has own prefix rst or cpt.
The both functions have the same effect, so the main part of the names
is same and it describes "unlock_tcp_connections".
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Restore must not fail after unlocking connections.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When restoring a container crtools create veth pair inside it and then
pushed one end to the namespaces crtools live in (outside). To facilitate
the subsequent management of the otter end of the veth pair this option
is added -- one can specifu a name by which the respective end would be
visible. E.g.: --veth-pair eth0=veth101.0
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>