2
0
mirror of https://github.com/checkpoint-restore/criu synced 2025-08-26 20:07:28 +00:00

1696 Commits

Author SHA1 Message Date
Laurent Dufour
303b875892 arch/ppc64: Add PowerPC 64 LE support
This patch initiates the ppc64le architecture support in CRIU.

Note that ppc64 (Big Endian) architecture is not yet supported since there
are still several issues to address with this architecture. However, in the
long term, the two architectures should be addressed using the almost the
same code, so sharing the ppc64 directory.

Major ppc64 issues:

Loader is not involved when the parasite code is loaded. So no relocation
is done for the parasite code. As a consequence r2 must be set manually
when entering the parasite code, and GOT is not filled.

Furthermore, the r2 fixup code at the services's global address which has
not been fixed by the loader should not be run. Branching at local address,
as the assembly code does is jumping over it.

On the long term, relocation should be done when loading the parasite code.

We are introducing 2 trampolines for the 2 entry points of the restorer
blob.  These entry points are dealing with r2. These ppc64 specific entry
points are overwritting the standard one in sigreturn_restore() from
cr-restore.c.  Instead of using #ifdef, we may introduce a per arch wrapper
here.

CRIU needs 2 kernel patches to be run powerpc which are not yet upstream:
 - Tracking the vDSO remapping
 - Enabling the kcmp system call on powerpc

Feature not yet supported:
- Altivec registers C/R
- VSX registers C/R
- TM support
- all lot of things I missed..

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-30 09:57:49 +03:00
Andrey Vagin
25267e5b30 lock: parse the lock field in fdinfo if it's avaliable (v2)
/proc/locks can contain a wrong pid for a lock and we always need to
check this fact.  Starting with the 4.1 kernel, locks are reported
in fdinfo.

v2: rebase to the curret master
    skip note_file_lock()

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-27 14:53:24 +03:00
Andrey Vagin
b9c14a09b0 kerndat: check the lock field in fdinfo (v2)
Starting with the 4.1 kernel, fdinfo contains information about file
locks.

v2: s/has_lock/has_fdinfo_lock/
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-27 14:53:22 +03:00
Christopher Covington
cefe22bdac Use run-time page size where it matters
In AArch64, pages may be 4K or 64K depending on kernel configuration.
The GNU C Library documentation suggests [1], "the correct interface
to query about the page size is sysconf". Introduce one new
architecture-specific function-like macro, page_size(), that on x86
and AArch32 remains a constant so as to minimally affect performance,
but on AArch64 is sysconf(_SC_PAGESIZE) for correctness.

1. https://www.gnu.org/software/libc/manual/html_node/Query-Memory-Parameters.html

To minimize churn, the PAGE_SIZE macro is left as a build-time
estimation of what the run-time page size might be.

This fixes the following errors for CRIU on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y, allowing dump of
`setsid sleep < /dev/null &> /dev/null` to succeed.

Error (kerndat.c:48): Can't stat self map_files: No such file or directory

Error (util.c:668): Can't read pme for pid 90: No such file or directory

Error (parasite-syscall.c:1135): Can't open 89/map_files/0x3ffb7da0000-0x3ffb7dac000 on procfs: No such file or directory

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-22 15:39:05 +03:00
Andrey Vagin
a1ca6efa50 service: allocate buffers for messages dinamically (v2)
Currently we use a static buffer, but it is too small.

Error (cr-service.c:58): Failed unpacking request: Success
Error (cr-service.c:694): Can't recv request: Success
data too short after length-prefix of 1217

v2: use recv instead on recvmsg

Reported-by: Ross Boucher <rboucher@gmail.com>
Cc: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-21 16:09:09 +03:00
Oleg Nesterov
745f845fa8 revert 246367e4e483 "add walk_all flag to walk_namespaces"
We no longer need to populate ext_ns->mnt.mntinfo_list until
resolve_external_mounts(). We can rely on find_ext_ns_id() which
does collect_mntinfo() on demand.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-14 22:34:40 +03:00
Pavel Emelyanov
f5ea330ce1 img: Introduce v1.1 images (v2)
These images have common magic in front of per-image one. With
this we have 3 "types" of images -- inventory (head), other
images, service files. The latter would be stats (not an image,
just happen to be in PB format) and irmap cache (not an image
again, just auxiliary thing which is in PB for convenience).

Since inventory file is the first one we read on restore it's
OK to set the global "new images" flag there. Dump (write) is
always in new format.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2015-04-14 15:18:32 +03:00
Tycho Andersen
fcae4f3954 mnt: add --enable-external-masters option
This option enables external (slave) bind mounts to be resolved.

v2: don't always assume that when the master id matches, the mounts match

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:54:51 +03:00
Tycho Andersen
0afffc9dc1 mnt: add --enable-external-sharing flag
With this flag, external shared bind mounts are attempted to be resolved
automatically.

v2: don't always assume when the sharing matches that the mount matches

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:54:12 +03:00
Tycho Andersen
aebfabb5ad mnt: add --ext-mount-map auto option
When this option is specified, if an external (private) bind mount is not
specified by --ext-mount-map KEY:VAL then it is attempted to be resolved
automatically.

v2: introduce find_best_external_match, which looks for the best match based on
    sharing/slave ids; don't try to resolve fsroot_mounted() mountpoints
v3: get rid of really_collect_self_mounts
v4: get rid of fsroot_mounted() check when autodetecting external mounts

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:52:14 +03:00
Oleg Nesterov
e2c38245c6 introduce --enable-fs cli option
Finally add --enable-fs option to specify the comma separated list of
filesystem names which should be treated as FSTYPE_AUTO.

Note: obviously this option is not safe, use at your own risk. "dump"
will always succeed if the mntpoint is auto, but "restore" can fail or
do something wrong if mount(src, mountpoint, flags, options) can not
actually "just work" as FSTYPE_AUTO logic expects.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:35:43 +03:00
Pavel Tikhomirov
2b49efeaf3 add netns protobuf entry and image, also add conf to net device entry
Signed-off-by: Pavel Tikhomirov <ptikhomirov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-09 18:59:17 +03:00
Tycho Andersen
246367e4e4 add walk_all flag to walk_namespaces
In the rest of this series we need to walk all the namespaces to autodetect
which mounts are master/shared/private bind mounts, so we need the information
from criu's namespace in the case when the namespaces are not the same.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-09 12:53:19 +03:00
Cyrill Gorcunov
391d589482 options: Use union for @daemon and @restore_detach
They both are using 'd' option in different context
though, lets give them two names.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-06 18:06:16 +03:00
Oleg Nesterov
eb518936d8 introduce --skip-mnt cli option
Which obviously can be used to "ignore" the mounts we do not want or
need to dump. The user should know what he does.

Note: this patch changes parse_mountinfo() to check should_skip_mount().
This is because imo we want to filter out the unwanted mounts asap, af
if they do not exist. This increases the chances the dumping will fail
if something else depends on this mount. Say, another mountpoint or an
opened file.

Perhaps it makes sense to teach should_skip_mount() to use fnmatch()
and/or look at the optional "(fs|mnt)=" prefix to skip by fsname too.

To me it would be better to force the user of this option to understand
what it does. Say, if "dump" fails because the child mount can't find
the skipped parent, he should add another --skip-mnt option or do not
dump. Otherwise, if we do this automagically the user can probably be
surpised, he might even miss the fact that we skip more than he asked.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-03 17:56:05 +03:00
Oleg Nesterov
9fee3dc817 pass "bool for_dump" argument down to collect_mntinfo() and parse_mountinfo()
Preparation.

1. Add the new "bool for_dump" arg to collect/parse_mntinfo().

2. Introduce "struct collect_mntns_arg" to pass the additional
   "bool for_dump" field to collect_mntinfo() and change it to
   pass this boolean to collect_mntinfo()->parse_mountinfo() path.

3. Change other callers of collect_mntinfo() to pass "false".

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-03 17:55:18 +03:00
Cyrill Gorcunov
243e58f0e3 tty: Implement support of current tty
Opening current tty is tricky: first slave peer should
be opened and session restored, and only then we can
open /dev/tty. So that I made rst_info to carry
additional list @tty_ctty where all current ttys
get gathered and opened after slave peers were
brought to live.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-02 20:20:08 +03:00
Cyrill Gorcunov
25abdf3ac4 tty: Rework tty_driver structure
- rename @t to @type and use protobuf constants here instead
 - for special features use @subtype just like kernel does
 - get rid of TTY_TYPE_ constants, we don't need them
 - drop @flags, we don't need it anymore

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-02 20:20:01 +03:00
Cyrill Gorcunov
9ce0254c04 vma: Unify private VMAs testing
We have two helpers for VMA type testing: privately_dump_vma() and vma_priv(). They
work with different types but basically do the same: check if we should dump VMA into
the image and restore it back then.

Lets unify they both into common vma_entry_is_private() helper and vma_area_is_private()
for working with vma_area type.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-01 12:36:46 +03:00
Andrey Vagin
b23268e492 service: add ability to set inherit file descriptors (v3)
This is required to use criu swrk in libcontainer.

v2: remove useless function declaration
    allow to set inherit_fd only for swrk
v3: check swrk out of loop

Cc: Saied Kazemi <saied@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-30 13:09:25 +03:00
Andrey Vagin
a66217a253 image: open lazy images in img_raw_fd()
Lazy images are opened on a first attempt of using.

00:01:18.534 Test: zdtm/live/static/pipe00, Result: FAIL
00:01:18.537 ==================================== ERROR ====================================
00:01:18.538 Test: zdtm/live/static/pipe00, Namespace: 1
00:01:18.538 Dump log   : /var/lib/jenkins/jobs/CRIU/workspace/test/dump/ns/static/pipe00/13536/1/dump.log
00:01:18.540 --------------------------------- grep Error ---------------------------------
00:01:18.543 (00.026666) Error (include/image.h:153): BUG at include/image.h:153
00:01:18.543 (00.050663) Error (namespaces.c:801): Namespaces dumping finished with error 134
00:01:18.543 (00.050918) Error (cr-dump.c:1979): Dumping FAILED.
00:01:18.545 ------------------------------------- END -------------------------------------
00:01:18.548 ================================= ERROR OVER =================================

Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-16 17:17:04 +03:00
Pavel Emelyanov
8ce37e676a img: Don't create empty images
Currently on dump we generate too many image files, effectively
all the stuff from the GLOB set is created. The thing is that
sometimes some of created images can be empty (just contain the
magic number at the head). Thos images are useless and just
waste the space.

When applied after the "empty images" set, this introduces the
lazy images -- when we call open_image() the actual file is
only created (and the magic number is written into it) when the
very first object goes into it.

For example for the simplest test we have, then static/env00
one, the created image files are

   core-7290.img
   creds-7290.img
   fdinfo-2.img
   fs-7290.img
   ids-7290.img
   inventory.img
   mm-7290.img
   pagemap-7290.img
   pages-1.img
   pstree.img
   reg-files.img
   sigacts-7290.img

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-16 15:58:32 +03:00
Pavel Emelyanov
7ede4697cf bfd: Don't leak image-open flags into bfdopen
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-16 15:58:14 +03:00
Pavel Emelyanov
f7f76d6ba6 img: Introduce empty images
When an image of a certian type is not found, CRIU sometimes
fails, sometimes ignores this fact. I propose to ignore this
fact always and treat absent images and those containing no
objects inside (i.e. -- empty). If the latter code flow will
_need_ objects, then criu will fail later.

Why object will be explicitly required? For example, due to
restoring code reading the image with pb_read_one, w/o the
_eof suffix thus required the object to be in the image.

Another example is objects dependencies. E.g. fdinfo objects
require various files objects. So missing image files will
result in non-resolved searches later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-13 14:42:54 +03:00
Pavel Emelyanov
45a0cc4234 page-read: Explicitly mark ENOENT with return code
When page-read fails to open the pagemap image it reports error.
One place (stacked page-reads) need to handle the absent images
case gracefully, so fix the return codes to make this check
work.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-13 14:42:11 +03:00
Pavel Emelyanov
e29c9daec2 img: Remove O_OPT and COLLECT_OPTIONAL
Current code doesn't make any difference between OPT and no-OPT
except for the message is printed or not in the open_image().
So this particular change changes nothing but the availability of
this message.

In the next patches I wil introduce "empty images" to deal with
the ENOENT situation in a more graceful manner.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-13 14:42:01 +03:00
Cyrill Gorcunov
19948472d9 tty: Rename tty_type to tty_driver
There are too many "type" in code.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-10 21:16:22 +03:00
Cyrill Gorcunov
652fbf3bd1 tty: Drop redundant constants
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-10 21:16:10 +03:00
Pavel Emelyanov
f32f4ffa76 img: Open images for dump in O_WRONLY mode
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-09 22:21:15 +03:00
Pavel Emelyanov
618c17b6f8 img: Simplify the open_image() macro
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-09 22:21:08 +03:00
Pavel Emelyanov
dceb6633c7 page-read: Introduce custom flags for opening
Instead of open flags and boolean is_shmem argument.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-04 17:50:32 +03:00
Cyrill Gorcunov
3bd6d9d7b0 image: Add comments about VMA_AREA constants and drop FORCE_READ flag
Force-read came from very first dev version of CRIU (even before 1.0 release)
and never been used actually in image.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-04 17:48:47 +03:00
Pavel Emelyanov
057f00ce92 tty: Make tty type be object rather than integer
The plan is to replace tons of if (type == TTY_TYPE_FOO) checks
with type->something dereferences.

To do this, start with replacing int type with struct tty_type *
in relevant places and fixing compilation.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-04 17:47:04 +03:00
Pavel Emelyanov
a7601d6a50 tty: Move tty_type() and is_pty() to tty.c
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-03-04 17:46:16 +03:00
Cyrill Gorcunov
bec5a023d1 tty: Fix mistyping of /dev/tty
/dev/tty stands for current terminal which we don't yet
implemented a support for.

This is a bugfix for upcoming stable version, the proper
support of /dev/tty is gonna be implemented separately.

Reported-by: Saied Kazemi <saied@google.com>
CC: Andrew Vagin <avagin@parallels.com>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-20 00:11:38 +03:00
Saied Kazemi
1b4e9058e8 Do not call listen() when SO_REUSEADDR is off
For an established TCP connection, the send queue is restored in two
steps: in step (1), we retransmit the data that was sent before but not
yet acknowledged, and in step (2), we transmit the data that was never
sent outside before.  The TCP_REPAIR option is disabled before step (2)
and re-enabled after step (2) (without this patch).

If the amount of data to be sent in step (2) is large, the TCP_REPAIR
flag on the socket can remain off for some time (O(milliseconds)).  If a
listen() is called on another socket bound to the same port during this
time window, it fails. This is because -- turning TCP_REPAIR off clears
the SO_REUSEADDR flag on the socket.

This patch adds a mutex (reuseaddr_lock) per port number, so that a
listen() on a port number does not happen while SO_REUSEADDR for another
socket on the same port is off.

Thanks to Amey Deshpande <ameyd@google.com> for debugging.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-16 13:18:32 +03:00
Andrey Vagin
3f23bde548 criu: print correct errno messages from pr_perror()
"%m" can't be used to print strerror(errno), because print_on_level()
calls gettimeofday() which can overwrite errno.

For example:
13486 connect(4, {sa_family=AF_INET, sin_port=htons(8880), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENETUNREACH (Network is unreachable)
13486 gettimeofday({1423756664, 717423}, NULL) = 0
13486 open("/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
13486 write(2, "15:57:44.717:     4: ERR: socket_udp.c:73: Can't connect (errno = 101 (Permission denied))\n", 91) = 91

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-13 15:14:44 +03:00
Pavel Emelyanov
9a392dff3a reg-files: Do not try to linkat with wrong user
We link files to each other at restore time to restore
unlinked paths. Kernel has strange secutiry restrictions
about linkat we use. If the fsuid of the caller doesn't
equals the uid of the file and the file is not "safe"
one, then only global CAP_CHOWN will be allowed to link().

This brings problems in user namespaces -- uns root is
not allowed to linkat any file, unlike global root.

Fortunately, we can change the fsuid temporarily and
still linkat the file we want. Hopefully this hack will
go away some day soon, when the kernel will have saner
checks for linkat capabilities.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2015-02-13 16:11:38 +04:00
Pavel Emelyanov
b8556e8084 usernsd: The way to restore priviledged stuff in userns
We have collected a good set of calls that cannot be done inside
user namespaces, but we need to [1]. Some of them has already
being addressed, like prctl mm bits restore, but some are not.

I'm pretty sceptical about the ability to relax the security
checks on quite a lot of them (e.g. open-by-handle is indeed a
very dangerous operation if allowed to unpriviledged user), so
we need some way to call those things even in user namespaces.

The good news about it its that all the calls I've found operate
on file descriptors this way or another. So if we had a process,
that lived outside of user namespace, we could ask one to do the
high priority operation we need and exchange the affected file
descriptor via unix socket.

So the usernsd is the one doing exactly this. It starts before we
create the user namespace and accepts requests via unix socket.
Clients (the processes we restore) send him the functions they
want to call, the descriptor they want to operate on and the
arguments blob. Optionally, they can request some file descriptor
back after the call.

In non usernamespace case the daemon is not started and the calls
are done right in the requestor's process environment.

In the next patch there's an example of how to use this daemon
to do the priviledged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
a socket.

[1] http://criu.org/UserNamespace

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
2015-02-13 16:11:38 +04:00
Ruslan Kuprieiev
09c3f5d0c7 security: add cr_fchown
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-10 16:54:31 +03:00
Ruslan Kuprieiev
df301b7eb7 security: create separate security.h header
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-02-10 16:53:54 +03:00
Pavel Emelyanov
1bbc994ccf sysctl: Remove dead CTL_PRINT|_SHOW code
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-27 16:18:27 +03:00
Andrey Vagin
4dbc3f093a sockets: define NETLINK_SOCK_DIAG in sockets.h
sockets.c: In function ‘preload_socket_modules’:
sockets.c:153:36: error: ‘NETLINK_SOCK_DIAG’ undeclared (first use in this function)
sockets.c:153:36: note: each undeclared identifier is reported only once for each function it appears in

Reported-by: Mr Travis
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-23 15:40:02 +03:00
Pavel Emelyanov
0749ef23e9 check/zdtm: Introduce fine-grained feature testing
Right now we state that CRIU works on 3.11 and above kernels and, at the
same time, have support for a couple of new features like aio, tun, timerfd
etc. available in later kernels. Since these new features do not break
generic operations we do not require them in the kernel strictly.

However, in the zdtm tests it's very important to know exactly what can
and what cannot be tested. Right now this is done in a tough manner -- if
the kernel is not 3.11 or criu check fails for _any_ reason we treat the
kernel as being "bad" and throw out a set of tests.

I propose to test some individual features and form the list of tests
in a more fine-grained manner.

This patch only fixes the AIO, mnt_id, tun and posix-timers tests. Next
I will add checks and fixes for user-namespaces tests.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
2015-01-22 18:55:34 +03:00
Pavel Emelyanov
674df19a34 nlk: Add error callback to do_rtnl_req
In the next patch we will need to care about the exact error reported by
the kernel, so add the error callback for this.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-22 18:54:37 +03:00
Saied Kazemi
296129295a Allow the veth-pair option to specify a bridge
When restoring a pair of veth devices that had one end inside a namespace
or container and the other end outside, CRIU creates a new veth pair,
puts one end in the namespace/container, and names the other end from
what's specified in the --veth-pair IN=OUT command line option.

This patch allows for appending a bridge name to the OUT string in the
form of OUT@<BRIDGE-NAME> in order for CRIU to move the outside veth to
the named bridge.  For example, --veth-pair eth0=veth1@br0 tells CRIU
to name the peer of eth0 veth1 and move it to bridge br0.

This is a simple and handy extension of the --veth-pair option that
obviates the need for an action script although one can still do the same
(and possibly more) if they prefer to use action scripts.

Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-12 14:54:18 +03:00
Pavel Emelyanov
a1b1959dd1 shmem: Turn shmem-info into shared objects from shremap ones
We have a nasty issue with it. Current code allocates these
entries in shremap area one by one. We do NOT allocate any
OTHER entries in this region, but if we will this array will
be spoiled.

Fortunately we no longer need shmem-infos as plain array,
neither we need one in restorer. So just turn this into plain
shared objects and collect them in a list.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-12 14:47:24 +03:00
Pavel Emelyanov
b246ccb181 shmem: Move some code to shmem.c file
The struct and find routine used to be use by restorer code. Now
the former fully uses vmas and fd opened, so we can move the code
into .c file not to spoil global namespace.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-12 14:47:17 +03:00
Pavel Emelyanov
455f9b564e fd: Factor out inheriting FDs code
We have two places where we lookup the inherited-fd list
by name and dup() the descriptor found. I propose to factor
out this piece in a single inherited_fd() call. When
we will want to support inheritance for sockets or any
other files we'll simply add the inherited_fd() call
there.

I'm also thinking about moving the call to inherited_fd
into generic level, but the open_path() routine doesn't
allow to do it in a simple manner.

Also we have not yet finished issue with files-vs-inodes
mapping. Keeping all the logic in one function should
make the solution simpler.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-12 14:46:51 +03:00
Pavel Emelyanov
8f691c40d5 fd: Mark inherit_fd_lookup_fd static
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-01-12 14:46:42 +03:00