When restoring a venet device we need to restore its
index as well, which is actually possible with the new iproute2
package, but the problem is that the index itself lies
inside the image file. We could use the crit tool to extract
it, but this would slow down the procedure significantly (we
would need to run python to parse the image, or pass
the index into the environment from inside of
CRIU itself).
So let's do a trick and simply create the venet device
inside the container by criu itself (thankfully we support
creating venet via the netlink interface now).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If the netns image is absent, the NetnsEntry entry will not be initialized.
Currently restore from old images crashes:
Core was generated by `criu swrk 3'.
Program terminated with signal SIGSEGV, Segmentation fault.
$0 0x0000000000427d80 in netns_entry.free_unpacked ()
(gdb) bt
$0 0x0000000000427d80 in netns_entry.free_unpacked ()
$1 0x0000000000436d07 in prepare_net_ns ()
$2 0x0000000000457c78 in prepare_namespace ()
$3 0x0000000000432917 in restore_task_with_children ()
$4 0x00007fc86acfccfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
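One common way to avoid this kind of crash is to keep the entry pointer NULL until the image has really been unpacked and to free it only in that case. The sketch below illustrates the pattern with stand-in names (netns_entry_stub, prepare_net_ns_stub); these are hypothetical and the actual change in criu may look different.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the protobuf-generated NetnsEntry. */
struct netns_entry_stub {
	int placeholder;
};

static int nr_freed; /* counts frees so the behaviour can be observed */

/* Stand-in for the generated free_unpacked() helper: must not get NULL. */
static void netns_entry_stub_free(struct netns_entry_stub *e)
{
	assert(e != NULL);
	free(e);
	nr_freed++;
}

/*
 * Hypothetical restore step: the entry stays NULL when the image is
 * absent, and the cleanup path frees it only if it was unpacked.
 */
static int prepare_net_ns_stub(int have_image)
{
	struct netns_entry_stub *netns = NULL;

	if (have_image) {
		netns = calloc(1, sizeof(*netns));
		if (!netns)
			return -1;
	}

	/* ... restore links, routes and so on ... */

	if (netns)
		netns_entry_stub_free(netns);
	return 0;
}
```

With have_image set to 0 the free helper is never reached, which is the old-image case from the backtrace above.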
v2: remove debugging code
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
ICMP entries are missing on the 3.10 kernel
(which is the PCS7 default one), so we should
simply skip them on dump and restore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some entries might be missing, and that should not cause
CRIU to stop dumping when we know the entries are safe
to ignore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We might miss an entry in the "ri ? ri - 1" expression when ri = 1.
Let's use the known array size instead.
For some reason it didn't trigger in my tests earlier.
Reported-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We no longer need to populate ext_ns->mnt.mntinfo_list until
resolve_external_mounts(). We can rely on find_ext_ns_id() which
does collect_mntinfo() on demand.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We don't need this check, as just after it we have:
	if (ret)
		break;
which safely unpacks nde and closes nlsk and img in case of error.
PS: sorry for the inconvenience caused by my patchset
Signed-off-by: Pavel Tikhomirov <ptikhomirov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Because namespace default options are set before device creation,
devices will gain the default options (except lo, which is created with the ns).
Signed-off-by: Pavel Tikhomirov <ptikhomirov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the rest of this series we need to walk all the namespaces to autodetect
which mounts are master/shared/private bind mounts, so we need the information
from criu's namespace in the case when the namespaces are not the same.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
sockets.c: In function ‘preload_socket_modules’:
sockets.c:153:36: error: ‘NETLINK_SOCK_DIAG’ undeclared (first use in this function)
sockets.c:153:36: note: each undeclared identifier is reported only once for each function it appears in
Reported-by: Mr Travis
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the next patch we will need to care about the exact error reported by
the kernel, so add the error callback for this.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When restoring a pair of veth devices that had one end inside a namespace
or container and the other end outside, CRIU creates a new veth pair,
puts one end in the namespace/container, and names the other end from
what's specified in the --veth-pair IN=OUT command line option.
This patch allows for appending a bridge name to the OUT string in the
form of OUT@<BRIDGE-NAME> in order for CRIU to move the outside veth to
the named bridge. For example, --veth-pair eth0=veth1@br0 tells CRIU
to name the peer of eth0 veth1 and move it to bridge br0.
This is a simple and handy extension of the --veth-pair option that
obviates the need for an action script although one can still do the same
(and possibly more) if they prefer to use action scripts.
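As an illustration, the IN=OUT@<BRIDGE-NAME> form could be split as in the minimal sketch below. The struct and function names (veth_pair_spec, parse_veth_pair) and the name length limit are hypothetical and are not taken from the criu sources.

```c
#include <assert.h>
#include <string.h>

#define VETH_NAME_MAX 64 /* hypothetical limit for this sketch */

/* Hypothetical container for the parsed option; not a criu type. */
struct veth_pair_spec {
	char in[VETH_NAME_MAX];     /* name inside the namespace */
	char out[VETH_NAME_MAX];    /* name of the outside peer */
	char bridge[VETH_NAME_MAX]; /* optional bridge, "" if absent */
};

/* Copy len bytes of src into dst; reject empty or too long names. */
static int copy_name(char *dst, const char *src, size_t len)
{
	if (len == 0 || len >= VETH_NAME_MAX)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/* Parse "IN=OUT" or "IN=OUT@BRIDGE". Returns 0 on success, -1 on bad input. */
static int parse_veth_pair(const char *arg, struct veth_pair_spec *spec)
{
	const char *eq, *at;

	assert(arg != NULL && spec != NULL);
	memset(spec, 0, sizeof(*spec));

	eq = strchr(arg, '=');
	if (!eq)
		return -1;
	if (copy_name(spec->in, arg, (size_t)(eq - arg)))
		return -1;

	at = strchr(eq + 1, '@');
	if (!at)
		return copy_name(spec->out, eq + 1, strlen(eq + 1));

	if (copy_name(spec->out, eq + 1, (size_t)(at - eq - 1)))
		return -1;
	return copy_name(spec->bridge, at + 1, strlen(at + 1));
}
```

For "eth0=veth1@br0" this fills in eth0, veth1 and br0; for "eth0=veth1" the bridge field stays empty, which matches the old behaviour of the option.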
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On pre-dump we collect only two namespaces -- the mnt one
for criu and mnt one again for root task.
This is not correct. We need all mount namespaces to make
the irmap generation work properly and we need all net
namespaces to have parasite sockets created.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now that we have the netns on the pstree-item, we have a place
to pre-create the daemon socket in the needed namespace.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The setns() syscall (called by switch_ns()) can be extremely
slow. If we call it two or more times from the same task, the
kernel will synchronously go into a very slow routine called
synchronize_rcu() while trying to put a reference on the old namespaces.
To avoid doing this more than once I propose to create all
per-ns sockets in one place with one setns call. In this
patch the nl diag socket used to collect other sockets
is created this way.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Right now we don't support multiple net namespaces,
but some day we will. Other than this, we have logic
to distinguish cases with no namespaces vs one namespace,
so this walking already makes sense.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plain file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.
This will also help if (when?) we want to do more
complex changes with images, e.g. store them all in one
file or send them directly to the network.
For now the cr_img just contains one int _fd variable.
This patch changes the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)
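A minimal sketch of what such a wrapper class can look like is given below. Only struct cr_img and its single _fd member come from the description above; the helpers (img_open_path, img_raw_fd, img_close) are hypothetical names used for illustration and are not the functions this patch adds.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* As described above: for now the image object only wraps a descriptor. */
struct cr_img {
	int _fd;
};

/* Hypothetical helper: open a path and wrap the descriptor. */
static struct cr_img *img_open_path(const char *path, int flags)
{
	struct cr_img *img;

	assert(path != NULL);
	img = malloc(sizeof(*img));
	if (!img)
		return NULL;

	img->_fd = open(path, flags, 0600);
	if (img->_fd < 0) {
		free(img);
		return NULL;
	}
	return img;
}

/* Hypothetical accessor: callers use this instead of touching _fd. */
static int img_raw_fd(const struct cr_img *img)
{
	return img->_fd;
}

/* Hypothetical helper: close the descriptor and free the wrapper. */
static void img_close(struct cr_img *img)
{
	if (!img)
		return;
	close(img->_fd);
	free(img);
}
```

Keeping the descriptor behind an accessor is what would later allow a buffer to be added to the struct without touching every caller again.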
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Since we're going to switch from int-fd-s to class-image
soon the fdset name will not fit into the new terminology.
This patch is
sed -e 's/fdset/imgset/g' -i *
sed -e 's/imgset_fd/img_from_set/g' -i *
git mv include/fdset.h include/imgset.h
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Here we define a new API to be used in plugins.
- A plugin should provide a descriptor with the help of the
CR_PLUGIN_REGISTER macro or, if the plugin requires
no init/exit functions, with CR_PLUGIN_REGISTER_DUMMY.
- A plugin should define a plugin hook with the help of the
CR_PLUGIN_REGISTER_HOOK macro.
- The init/exit functions of plugins now take a @stage
argument which tells the plugin at which stage of criu
it is being called: dump or restore. The exit function
also takes @ret, which lets the plugin know whether
something went wrong and it needs to clean up
its own resources.
The idea behind this is to not limit plugin authors in the names
of functions they might need to use for a particular hook.
This new API deprecates the old plugin structure, but to keep
backward compatibility we will provide a tiny layer of
additional code to support old plugins for at least a couple
of release cycles.
For example, a trivial plugin might look like
| #include <sys/types.h>
| #include <sys/stat.h>
| #include <fcntl.h>
| #include <libgen.h>
| #include <errno.h>
|
| #include <sys/socket.h>
| #include <linux/un.h>
|
| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <unistd.h>
|
| #include "criu-plugin.h"
| #include "criu-log.h"
|
| static int dump_ext_file(int fd, int id)
| {
| 	pr_info("dump_ext_file: fd %d id %d\n", fd, id);
| 	return 0;
| }
|
| CR_PLUGIN_REGISTER_DUMMY("trivial")
| CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__DUMP_EXT_FILE, dump_ext_file)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Now we support sub-mntns, so root_ns_mask sounds more correct than
current_ns_mask.
v2: typo fix
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is the responsibility of this command.
All service FDs are closed before we call execvp(). Standard output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring Linux Containers (LXC) and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.
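The core of the idea can be sketched as follows: once the tree is restored, the restoring process replaces its own image with the user's command, so the command inherits the restored tasks as children. The helper name run_exec_cmd is hypothetical, and the real code also has to close service FDs and set up the log redirection described above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the current process image with the
 * user's command. On success execvp() never returns, so reaching the
 * lines after it means the exec failed.
 */
static int run_exec_cmd(char *const cmd_argv[])
{
	assert(cmd_argv != NULL && cmd_argv[0] != NULL);

	execvp(cmd_argv[0], cmd_argv);
	perror("execvp");
	return -1;
}
```

Because the PID does not change across execvp(), the restored tasks keep the same parent, which is now running the supervising command.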
Two directions were researched in order to integrate CRIU and LXC:
1. We tell CRIU that after restoring the container it should execve()
lxc, properly explaining to it that there's a new container hanging
around.
2. We make LXC set itself as a child subreaper, then fork() criu and ask
it to detach (-d) from the restored container afterwards. Being a subreaper,
LXC should get the container's init into its child list after that.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then the criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restored trees to any other task in the system. Calling execve from the service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Well, when we create an external link with the ip utility we cannot
specify its index. Thus we will not be able to move the external
device to the namespace and ask criu to restore link params.
However, RTM_SETLINK can happily work with the link name only.
And since we do have one in the images, we can omit setting
the index in the request.
TODO: Send patch with index specification to iproute2.
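A request of this shape can be sketched as below: ifi_index is left at 0 and the link is identified by an IFLA_IFNAME attribute only. The struct and function names (setlink_req, build_setlink_by_name) are hypothetical, and the sketch only builds the message without sending it.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define LINK_NAME_MAX 16 /* same value as IFNAMSIZ, includes the NUL */

/* Hypothetical request layout for this sketch; not criu's own struct. */
struct setlink_req {
	struct nlmsghdr h;
	struct ifinfomsg i;
	char attrs[64];
};

/* Fill an RTM_SETLINK request that names the link instead of indexing it. */
static int build_setlink_by_name(struct setlink_req *req, const char *name)
{
	struct rtattr *rta;
	size_t len;

	assert(req != NULL && name != NULL);
	len = strlen(name) + 1;
	if (len > LINK_NAME_MAX)
		return -1;

	memset(req, 0, sizeof(*req));
	req->h.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->h.nlmsg_type = RTM_SETLINK;
	req->h.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->i.ifi_family = AF_UNSPEC;
	req->i.ifi_index = 0; /* no index: the link is looked up by name */

	/* Append the IFLA_IFNAME attribute right after the aligned header. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->h.nlmsg_len));
	rta->rta_type = IFLA_IFNAME;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), name, len);
	req->h.nlmsg_len = NLMSG_ALIGN(req->h.nlmsg_len) + RTA_ALIGN(rta->rta_len);
	return 0;
}
```

Other link parameters from the image would be appended as further attributes in the same way before the request is sent over a NETLINK_ROUTE socket.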
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we meet a link we cannot dump, we call a plugin to check
whether it's a link that should be treated as external.
Note that on restore we don't call any plugins, but
expect the setup-namespace script to move the respective
link into the namespace. Links are not hierarchical and
can be moved between namespaces easily, so it's OK to
delegate the link creation to the script.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For devices that are available in the netns we have a special
routine that just restores link params.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently, if a network namespace is dumped and something fails, sockets
remain in repair mode. This is because cpt_unlock_tcp_connections is
executed only if the network namespace is not dumped.
cpt_unlock_tcp_connections disables repair mode for sockets and drops
netfilters. Netfilters are not used in the case of network namespaces.
v2: don't execute network-unlock scripts if network namespaces are not
dumped.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
By default we just use the iptables-save and iptables-restore commands.
The user may define the CR_IPTABLES variable; in this case "sh -c $CR_IPTABLES"
will be called instead.
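The selection logic can be sketched roughly as follows. The helper name iptables_argv is hypothetical, and two details are assumptions of this sketch rather than statements about criu: an empty CR_IPTABLES is treated as unset, and the default tool is run directly without a shell.

```c
#define _POSIX_C_SOURCE 200809L /* callers typically pair this with setenv() */

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: fill args[] for execvp().
 * def is the built-in default, e.g. "iptables-save" or "iptables-restore".
 */
static void iptables_argv(const char *def, const char *args[4])
{
	const char *cmd;

	assert(def != NULL && args != NULL);
	memset(args, 0, 4 * sizeof(args[0]));

	cmd = getenv("CR_IPTABLES");
	if (cmd && cmd[0] != '\0') {
		/* User override: hand the whole string to the shell. */
		args[0] = "sh";
		args[1] = "-c";
		args[2] = cmd;
		return;
	}

	/* Assumption: without an override the default tool is run as is. */
	args[0] = def;
}
```

Going through "sh -c" lets the user put options or a whole pipeline into CR_IPTABLES without criu having to parse it.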
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>