Currently, libcriu connects to the CRIU service
by itself, asking the user only for a path to the socket.
But in some cases users need to provide an fd instead of
a path. For example, a task may have no access to the
criu socket because of strict security measures, but
be able to inherit an fd from a parent that has access
to the criu socket.
v2, use union for addr and fd
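A minimal sketch of the idea, not libcriu's actual code: the connection
target is either a socket path or an already-open fd, kept in a union and
selected by a type tag. All names here (sketch_service, SKETCH_COMM_SK,
SKETCH_COMM_FD) are hypothetical, and SOCK_SEQPACKET is an assumption about
the service socket type.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical names; this only illustrates "union for addr and fd". */
enum sketch_comm_type { SKETCH_COMM_SK, SKETCH_COMM_FD };

struct sketch_service {
	enum sketch_comm_type type;
	union {
		const char *address;	/* valid when type == SKETCH_COMM_SK */
		int fd;			/* valid when type == SKETCH_COMM_FD */
	} u;
};

/* Return a connected fd, or -1 on error. */
static int sketch_service_connect(const struct sketch_service *s)
{
	struct sockaddr_un addr;
	int sk;

	/* The caller inherited a connected fd: use it as is. */
	if (s->type == SKETCH_COMM_FD)
		return s->u.fd;

	if (strlen(s->u.address) >= sizeof(addr.sun_path))
		return -1;

	/* Assumption: the service listens on a SOCK_SEQPACKET unix socket. */
	sk = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (sk < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, s->u.address);

	if (connect(sk, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(sk);
		return -1;
	}

	return sk;
}
```

With the fd variant the function never touches the filesystem, which is the
point for a task that cannot reach the socket path itself.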
Signed-off-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
criu_opts contains rpc options and notify callback,
so we can keep all options in just one structure.
This will allow us to easily extend libcriu functionality
and yet keep all options in one place.
We're also not hiding the rpc opts structure anymore, so
it is pretty clear where a power user should put their own
CriuOpts instance if they would like to do that.
Signed-off-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
8ffbe754bd moved the rst_mem_lock() call, but didn't move the
corresponding LSM allocations, so we do that here.
One unfortunate thing is that we have to split this into two steps: first
we have to read the creds to figure out exactly how much memory to
allocate for the lsm string. Since prepare_creds() wants to write directly
to the task_restore_args struct and that can't be allocated until after we
lock the restore memory, we break it up into two steps.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We need to wait for listen() as well as bind() for internal unix sockets, or we
can race like this:
(00.135950) 1: Opening standalone socket (id 0xb ino 0x9422f peer 0)
(00.135974) 353: Error (sk-unix.c:701): Can't connect 0x947c4 socket: Connection refused
(00.136390) 1: Error (cr-restore.c:1228): 353 exited, status=1
(00.136407) 1: Putting 0x9422f into listen state
(where 0x9422f is the peer for 0x947c4)
This race was pretty rare for me, but I've run 1000 tests and it didn't
happen so hopefully this patch fixes it :)
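The window can be reproduced in isolation. The sketch below is not CRIU
code; the helper names and the abstract socket name are made up for
illustration. On Linux, connecting to a unix stream socket that is bound
but not yet listening fails with ECONNREFUSED, and the same connect
succeeds once listen() has been called.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect a fresh stream socket to addr; return 0 or -errno. */
static int sketch_try_connect(const struct sockaddr_un *addr, socklen_t len)
{
	int sk, ret = 0;

	sk = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sk < 0)
		return -errno;
	if (connect(sk, (const struct sockaddr *)addr, len) < 0)
		ret = -errno;
	close(sk);
	return ret;
}

/*
 * Record the result of connect() after bind() only (*before) and
 * after bind() + listen() (*after). Returns 0 if the setup worked.
 */
static int sketch_bind_vs_listen(int *before, int *after)
{
	struct sockaddr_un addr;
	socklen_t len;
	int srv;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* Abstract address (leading NUL byte), so no file is left behind. */
	snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
		 "sketch-listen-race-%d", (int)getpid());
	len = offsetof(struct sockaddr_un, sun_path) + 1 +
	      strlen(addr.sun_path + 1);

	srv = socket(AF_UNIX, SOCK_STREAM, 0);
	if (srv < 0)
		return -1;
	if (bind(srv, (struct sockaddr *)&addr, len) < 0) {
		close(srv);
		return -1;
	}

	*before = sketch_try_connect(&addr, len);	/* bound, not listening */

	if (listen(srv, 1) < 0) {
		close(srv);
		return -1;
	}

	*after = sketch_try_connect(&addr, len);	/* now listening */

	close(srv);
	return 0;
}
```

This is why waiting for bind() alone is not enough: the peer has to be
held back until the socket is actually in the listen state.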
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In commit c7d646afb3 we introduced cgroup restore
modes, but when the option is passed via the RPC code it is
simply either true or false, which erroneously maps to the
CG_MODE_PROPS or CG_MODE_IGNORE modes.
Let's map @true to CG_MODE_SOFT to preserve backward
compatibility and enhance this option in the future via
a separate option.
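The mapping can be sketched as below. Only the CG_MODE_* names come from
this message; the enum values, the function name and the choice of
CG_MODE_IGNORE for @false are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical values; only the names appear in the commit message. */
enum sketch_cg_mode {
	CG_MODE_IGNORE,
	CG_MODE_PROPS,
	CG_MODE_SOFT,
};

/*
 * Translate the boolean RPC option explicitly instead of letting
 * true/false be reinterpreted as raw enum values.
 */
static enum sketch_cg_mode sketch_rpc_cg_mode(bool manage_cgroups)
{
	/* Assumption: false keeps meaning "do not manage cgroups". */
	return manage_cgroups ? CG_MODE_SOFT : CG_MODE_IGNORE;
}
```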
Reported-by: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's a rudiment: close_old_fds() closes all extra descriptors.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* reopen a pipe descriptor via /proc/self/fd/X
* give another end of a pipe to "criu restore"
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When we restore an inherited pipe, we have only one end and we
don't know whether it's the write or the read one. So we need to call
reopen_pipe each time.
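A minimal sketch of reopening a pipe through /proc/self/fd, assuming /proc
is mounted. The helper names are hypothetical and this is not the body of
CRIU's reopen_pipe; it only shows that, given one end of a pipe, the other
direction can be obtained by opening the /proc link with the wanted access
mode.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open a new descriptor on the pipe behind fd with the given access mode. */
static int sketch_reopen_pipe(int fd, int flags)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	return open(path, flags);
}

/* Keep only the read end of a pipe, then get a write end back from it. */
static int sketch_demo_reopen(void)
{
	int p[2], wr, ret = -1;
	char c = 0;

	if (pipe(p) < 0)
		return -1;
	close(p[1]);	/* pretend only one end was inherited */

	wr = sketch_reopen_pipe(p[0], O_WRONLY);
	if (wr >= 0) {
		if (write(wr, "x", 1) == 1 &&
		    read(p[0], &c, 1) == 1 && c == 'x')
			ret = 0;
		close(wr);
	}

	close(p[0]);
	return ret;
}
```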
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise we can get an error like this:
1: \t\tCreate transport fd /crtools-fd-1-5
...
1: Found id pipe:[122747] (fd 8) in inherit fd list
1: File pipe:[122747] will be restored from fd 9 duped from inherit
1: Error (util.c:131): fd 5 already in use (called at files.c:872)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
rst_tcp_repair_sockets is filled when all sockets are collected,
but the repair mode should be disabled only for sockets which
are restored in a current process.
https://bugzilla.openvz.org/show_bug.cgi?id=3281
v2: add a comment
v3: typo fix
Fixes: 73e303c8e2 ('rst: Fix rst_tcp_sock memory management')
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We don't use this any more (and the test was deleted in a previous patch),
so let's get rid of this too.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This combination was forbidden in 3.12
commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
"fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
and then it was permitted again in 3.13:
commit 1f7f4dde5c945f41a7abc2285be43d918029ecc5
fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)
Cc: Adrian Reber <adrian@lisas.de>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: use the test list instead of the file for telling zdtm.sh the test will
fail
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll use this in the next patch when testing the creds comparison for
threads.
v2: use an explicit list in zdtm.sh instead of a file in the test dir
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After we got the total remappable rst memory size, we can no
longer allocate from it, otherwise the bootstrap area will not
be large enough.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's similar to the previous patch with tcp mem -- no need to
realloc big arrays and then memcpy data between them. It's
enough just to walk timerfd objects at the very end.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the current scheme we grow an array with realloc()-s, then
memcpy() the result into rst_mem. I propose to get rid
of the realloc-s (we already have objects for the data we
need to keep) and memcpy-s (and put objects directly
into rst_mem at the end).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Calling rst_mem_alloc() in a loop with increasing size causes
n^2 memory growth :) since _alloc is not _realloc.
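The arithmetic can be shown with a toy bump allocator (hypothetical names,
not the rst_mem implementation). If every call hands out fresh space and
the requested size grows by one element per iteration, n iterations consume
1 + 2 + ... + n = n(n+1)/2 elements instead of n.

```c
#include <stddef.h>

/* Toy bump allocator: each call consumes new space, nothing is reused. */
struct sketch_arena {
	size_t used;
};

static void sketch_alloc(struct sketch_arena *a, size_t size)
{
	a->used += size;
}

/* Anti-pattern: ask for the whole, ever larger array on each iteration. */
static size_t sketch_grow_in_loop(size_t n, size_t elem)
{
	struct sketch_arena a = { 0 };
	size_t i;

	for (i = 1; i <= n; i++)
		sketch_alloc(&a, i * elem);

	return a.used;	/* elem * n * (n + 1) / 2 */
}

/* Allocate once, when the final number of elements is known. */
static size_t sketch_alloc_once(size_t n, size_t elem)
{
	struct sketch_arena a = { 0 };

	sketch_alloc(&a, n * elem);
	return a.used;	/* elem * n */
}
```

For 100 one-byte elements the loop variant consumes 5050 bytes, the
single allocation 100.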
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The help message of CRIU has grown in size and is truncated because the
size of the private buffer in log.c is too small. This patch increases
the size of the buffer.
[ The "bad" message is the --help output one ]
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The kill syscall queues a signal, but doesn't wait until it is
handled.
We need to wait for processes if we kill them. The user doesn't
expect to find processes after dump in this case.
PTRACE_DETACH returns errors for dead tasks, so we don't need to do it
in these cases.
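A sketch of the kill-then-wait pattern. For simplicity it uses a forked
child and waitpid(); the function name is hypothetical and the dump code
deals with traced tasks rather than plain children, so this only shows
that the task is known to be gone after the wait, not after the kill.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill a child and make sure it is really gone before returning. */
static int sketch_kill_and_wait(void)
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		for (;;)
			pause();	/* child: sleep until killed */
	}

	/* kill() only queues the signal; the child may still exist here. */
	if (kill(pid, SIGKILL) < 0)
		return -1;

	/* Only after the wait is the child known to be dead and reaped. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
		return -1;

	/* Nothing is left to wait for. */
	if (waitpid(pid, NULL, 0) != -1 || errno != ECHILD)
		return -1;

	return 0;
}
```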
Cc: Nikita Spiridonov <nspiridonov@odin.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The ability to have your own options structure is quite nice
and allows much more flexible use of libcriu in cases when you
want to have a bunch of instances of options structures.
This patch also allows users to use raw CriuOpts structure
modified in any suitable way, whether by libcriu's criu_local_set
methods or by using protobuf-c directly.
It is also worth noting that backward compatibility in API and ABI
is preserved.
Signed-off-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
| CID 96750 (#1 of 1): Resource leak (RESOURCE_LEAK)
| 163. leaked_storage: Variable sec_hdrs going out of scope leaks the storage it points to.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In test_msg() a buffer is allocated on the stack to build the output message.
This buffer's size was defined using the PAGE_SIZE constant defined in the
zdtmtst.h file.
On some systems like ppc64, the page size is large (64K), leading to a massive
stack allocation, which may be too large in the case of an alternate stack like
the one used in the sigaltstack test.
This fix defines a 2048-character buffer for test_msg, and exposes a
constant to allocate the stack accordingly in the sigaltstack test.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Calls to setsockopt(PACKET_RX_RING/PACKET_TX_RING) depend on the
system's page size.
Using the sysconf() page size makes these tests work on ppc64, where the
page size is 64K.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since the page size may differ from one architecture or system to
another, it should not be hard-coded to 4096.
As a consequence, several tests are failing on ppc64 due to a wrong page
size value.
This fix relies on sysconf to get the current page size.
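A minimal sketch of querying the page size at run time instead of assuming
4096. The helper names are hypothetical; sysconf(_SC_PAGESIZE) is the
standard call.

```c
#include <stddef.h>
#include <unistd.h>

/* Run-time page size, or 0 if it cannot be determined. */
static size_t sketch_page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	return ps > 0 ? (size_t)ps : 0;
}

/* Round len up to a whole number of pages (page size is a power of two). */
static size_t sketch_round_up_to_page(size_t len)
{
	size_t ps = sketch_page_size();

	if (!ps)
		return 0;
	return (len + ps - 1) & ~(ps - 1);
}
```

On a 4K system sketch_round_up_to_page(1) gives 4096, on a 64K ppc64
system 65536; code that hard-codes 4096 gets the second case wrong.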
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
linux/seccomp.h may not be available, and the seccomp mode might not be
listed in /proc/pid/status, so let's not assume those two things are
present.
v2: add a seccomp.h with all the constants we use from linux/seccomp.h
v3: don't do a compile time check for PTRACE_O_SUSPEND_SECCOMP, just let
ptrace return EINVAL for it; also add a checkskip to skip the
seccomp_strict test if PTRACE_O_SUSPEND_SECCOMP or linux/seccomp.h
aren't present.
v4: use criu check --feature instead of checkskip to check whether the
kernel supports seccomp_suspend
Reported-by: Mr. Jenkins
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: actually set ret = -1 on failure
v3: add a --feature option for suspend_seccomp (and make this patch 1,
since the tests depend on it now)
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When freeing the vma entries, don't call close on vm_file_fd when dealing
with a VMA AIO entry, since vm_file_fd then holds aio_nr_req, which shares
the same union.
I hit this issue when running the test aio00 on ppc64. There the value of
the VMA's aio_nr_req field matched the value of the service file
descriptor IMG_FD_OFF. This led to an obscure checkpoint error.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The initial support of the SYS V shared memory on ppc64 is broken. The call
to shmat done in the restore blob has no chance to work correctly.
This patch fixes the sys_shmat call.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Using collections.OrderedDict allows us to keep fields in the
same order as they appear in corresponding proto files, which
helps to improve readability. In non-pretty mode we still use
regular dict.
Signed-off-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since we don't support dumping per-thread creds, let's at least fail to
dump if the creds don't match.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Note that we don't add the test into the list of tests to run, because it will
fail without the associated kernel patch.
v2: spin lock until seccomp strict is set on the child
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Unfortunately, SECCOMP_MODE_FILTER is not currently exposed to userspace,
so we can't checkpoint that. In any case, this is what we need to do for
SECCOMP_MODE_STRICT, so let's do it.
This patch works by first disabling seccomp for any processes that are going
to have seccomp filters restored, then restoring the process (including the
seccomp filters), and finally resuming the seccomp filters before detaching
from the process.
v2 changes:
* update for kernel patch v2
* use protobuf enum for seccomp type
* don't parse /proc/pid/status twice
v3 changes:
* get rid of extra CR_STAGE_SECCOMP_SUSPEND stage
* only suspend seccomp in finalize_restore(), just before the unmap
* restore the (same) seccomp state in threads too; also add a note about
how this is slightly wrong, and that we should at least check for a
mismatch
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For testing purposes we need to be able to disable use of
the piegen utility. So let's add a PIEGEN make option,
so that one can run "PIEGEN=no make" to build criu
without piegen at all.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Eric wants to restrict permissions for proc mounts in a non-root userns
in accordance with proc mounts in the root userns.
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Fri May 8 23:49:47 2015 -0500
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
Ignore an existing mount if the locked readonly, nodev or atime
attributes are less permissive than the desired attributes
of the new mount.
...
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Reasoning: some systems have /sys/fs/cgroup stuff mounted as read-only
and we have to either remount it rw or create our own set. The former
doesn't look sane as this rw remounting is also done by systemd, so
let's return back to manual cgyard construction.
This reverts commit 860df95f85.
Conflicts:
cgroup.c
include/cr_options.h
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of keeping around multiple fds that point to various places in
/proc, let's just use /proc and openat() things relative to it.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>