We are going to parse fdinfo for getting mnt_id,
so we can take there pos and flags and don't call
fcntl and lseek for that.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For all other tasks only unsed service descriptors will be closed.
This change allows to have file descriptors, which may be used for
restoring namespaces. All non-server descriptors must be closed before
restoring files.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the files arch/$ARCH/include/asm/int.h to
include/asm-generic/int.h and makes the types {u,s}{8,16,32}
be aliases of the fixed sized integer types [u]int{8,16,32}_t.
This makes it possible to use single set of integer typedefs
in all architectural ports.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The devpts instance was mounted w/o the newinstance option if,
the device number is equal to the root /dev/pts.
I think this condition is strong enough to not mount devpts in a
temporary place.
v2: move the host.bla-bla-bla in kerndat.c
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The --exec-cmd option specifies a command that will be execvp()-ed on successful
restore. This way the command specified here will become the parent process of
the restored process tree.
Waiting for the restored processes to finish is responsibility of this command.
All service FDs are closed before we call execvp(). Standad output and error of
the command are redirected to the log file when we are restoring through the RPC
service.
This option will be used when restoring LinuX Containers and it seems helpful
for perf or other use cases when restored processes must be supervised by a
parent.
Two directions were researched in order to integrate CRIU and LXC:
1. We tell to CRIU, that after restoring container is should execve()
lxc properly explaining to it that there's a new container hanging
around.
2. We make LXC set himself as child subreaper, then fork() criu and ask
it to detach (-d) from restore container afterwards. Being a subreaper,
it should get the container's init into his child list after it.
The main reason for choosing the first option is that the second one can't work
with the RPC service. If we call restore via the service then criu service will
be the top-most task in the hierarchy and will not be able to reparent the
restore trees to any other task in the system. Calling execve from service
worker sub-task (and daemonizing it) should solve this.
Signed-off-by: Deyan Doychev <deyandoichev@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch splits the file arch/x86/vdso-pie.c into machine-dependent
and machine-independent parts by moving the routines vdso_fill_symtable(),
vdso_proxify(), and vdso_remap() to the file pie/vdso.c.
The ARM version of the routines is moved to the source pie/vdso-stub.c
to provide the vDSO proxy stub implementation for architectures
that don't provide the vDSO.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch moves the enum VDSO_SYMBOL_* and macros VDSO_SYMBOL_*_NAME
to the x86 specific header since different architectures export
different symbols from their vDSOs.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Looks-good-to: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Supported machine architectures provide TLS stogares of different sizes:
the size of the TLS storage in x86-64 is 24 bytes, ARM --- 4 bytes
and upcoming AArch64 --- 8 bytes. This means every supported architecture
needs a specific type to store the value of the TLS register.
This patch reworks the insterface of the routines arch_get_tls()
and restore_tls() passing them the TLS storage by pointer
rather than by value to simplify the TLS stub for x86.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No all distros (Rpi) provide O_PATH definition,
so include fcntl.h here thus we don't hit compilation
problem like
| CC image.o
| image.c: In function ‘open_image_at’:
| image.c:187:29: error: ‘O_PATH’ undeclared (first use in this function)
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On Thu, Mar 13, 2014 at 02:30:50PM +0400, Cyrill Gorcunov wrote:
>
> This image is deprecated now so move it out of
> _CR_FD_TASK thus we won't be even generating it
> on the dump.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Updated
>From cb9c3953beac7d42de80635e7a6e537cc867c479 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Thu, 13 Mar 2014 14:24:50 +0400
Subject: [PATCH 7/7] rlimit: Move CR_FD_RLIMIT out of _CR_FD_TASK
This image is deprecated now so move it out of
_CR_FD_TASK thus we won't be even generating it
on the dump.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows us to distinguish the situation where image
to be opened is missing but optional, thus no error message
should be printed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's going to be used for restoring namespaces. For example we need to
enumirate the ns_ids list for restoring mount namespaces.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The implementation of bit operations for ARM isn't actually
architecture-specific so it would rather be shared with
the upcoming port for AArch64 that won't provide optimized
implementation of bit operations.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
when decide that data is no longer needed, there are two cases:
-if data neighbours previous block of "no needed" data, extend bunch
block(it holds begining and size of concequent "no needed" data) by
length of curent block and go next.
-if data not neighbours bunch block(or bunch block size will be bigger
than MAX_BUNCH_SIZE), than we punch bunch block and set bunch block
to curent block.
in the end make cleanup to punch last bunch block.
changes in v1:
punch_hole takes whole page_read
make restriction more precise
Signed-off-by: Tikhomirov Pavel <snorcht@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When migrating container with copying its FS, the inode numbers
and thus their handles wil change. This will make the restore of
inotify/fanotify fail, since they do it via fhandles.
We've already faced the problems with fsnotifies on NFS -- they
don't work there. To address this an irmap cache is created on
pre-dump, so to resolve the issue with changed inodes during
migration, we can force the irmap cache build.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This option will serve to manage CPU capabilities
to be matched/ignored on restore procedure. At the
moment we introduce 'fpu','all' capability arguments.
By default 'all' is set.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Not all distros provide magic numbers we might need
during build procedure, thus provide own definitions
in one known place.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise if the mark is set up on link we end
with -ELOOP error trying to open it. Thus, use
O_PATH pointing the kernel that we're not going
to read/write this descriptor.
Repored-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
At the moment we are using 4K pages all the time,
so instead of copying code over all archs we're
supporting -- add asm-generic/page.h header.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have #define BUG() BUG_ON(true) here, where 'true'
is defined in stdbool header, so to be able to include
bug.h on its own -- include needed header inplace.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These are needed to compile project on CentOS 6.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Pavel reported that in case if there a big number
of small vmas present in the dumpee we're reading
/proc/pid/pagemap too frequently.
To speedup this procedue we inroduce pagemap cache.
The interface is:
- pmc_init/pmc_fini for cache initialization and freeing
- pmc_get_map to retrieve specific PMEs array for VMA area
v2:
- Move internal constants to pagemap-cache.c
- Make PAGEMAP_LEN to accept virtual address/size
- Don't adjust low bound in caching mode to save a couple of code bytes
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want to write into empty image files, so we
unlink them before dumping into. Let's O_TRUNC
it instead.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
PIPE_MAX_SIZE is calculated according with the kernel code.
PPB_IOV_BATCH has been taken from my mind.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The problem is that vmsplice() to a big pipe fails very often.
The kernel allocates a linear chunk of memory for pipe buffer
descriptos, but a big allocation in kernel can fail.
So we need to restrict maximal capacity of pipes. But the number of
pipes is restricted too, so we need to split dumping memory on chunks.
In this patch we calculates the pipe size for which vmsplice() will not
fail.
v2: s/batch/chunk and a few other small fixes
v3: Remove callbacks from page_pipes and reuse pipes
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The makefile includes need to be moved for everything to be
defined properly when the configure tests run.
The Ubuntu 12.04 x86_64 GCC and Linaro's 13.08 and newer AArch64
GCC's have the if_packet.h kernel header, but as of 13.12,
the Linaro AArch32 GCC does not.
Change-Id: I363c43fdb81b028f99aac77e15bff9462c87af4b
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
CONFIG_HAS_PEEKSIGINFO_ARGS is used there.
Cc: Christopher Covington <cov@codeaurora.org>
Reported-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Same for previous patch with vmas -- we do it on collect and
on real open. Just put the pointer on fifo_info structure.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We do it first -- on collect, second -- on restore. The
2nd lookup is excessive, we can put fd pointer on vm_area
at lookup and reuse one later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we only need to know currnet task mappings' start and end
to find where to put the restorer blob. And since the smaps file in
/proc/pid is up to 3 times slower, than the maps one, it makes
perfect sense just to parse the latter one.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If a file mmaped or pointed by exe link is unlinked, we will
generate a ghost file for it. On restore the ghost file will
be created with the users counter 1 and the very first open
(e.g. for mmap) will unlink the file.
Handle this by bumping up user counter for every mapping
pointing on the file.
This appeared after previous patches that packed the reg-files
image. Before it each vma and exe link created separate entry
in the reg-files image.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For unlinked opened and mmaped files we'd need to
care about remaps, for this the callback with both
file_desc and fdinfo_list_entry will be required.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we specify log level to none (0) the result is LOG_INFO (2).
Acked-by: Andrew Vagin <avagin@parallels.com>
Acked-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When writing VMAs we perform too many small writes into vma-.img files.
This can be easily fixed by moving the vma-s into mm-s, all the more
so they cannot be splitted from each other.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>