In 0a7c5fd1bd8d1e49e273b51ff39af473d6c68cbc we replaced the BSD
implementation of strlcat and strlcpy with our own.
The checks and the predefined macros are no longer needed.
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
There are several changes in glibc 2.36 that make sys/mount.h header
incompatible with kernel headers:
https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E
This patch removes conflicting includes for `<linux/mount.h>` and
updates the content of `criu/include/linux/mount.h` to match
`/usr/include/sys/mount.h`. In addition, inline definitions of sys_*()
functions have been moved from "linux/mount.h" to "syscall.h" to
avoid conflicts with `uapi/compel/plugins/std/syscall.h` and
`<unistd.h>`. The include for `<linux/aio_abi.h>` has been replaced
with local include to avoid conflicts with `<sys/mount.h>`.
Fixes: #1949
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
1. For some reason, Mariner distribution headers
do not correctly define the __GLIBC_HAVE_KERNEL_RSEQ
compile-time constant. It remains undefined,
but the header files in fact provide the corresponding
rseq type declarations, which leads to a conflict.
2. Another issue is that they use uint*_t types
instead of the __u* types used in the original rseq.h.
This leads to compile time issues like this:
format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'uint64_t' {aka 'long unsigned int'}
and we can't even replace %llx with PRIx64 because it would break
compilation on other distros (like Fedora) with an analogous error:
error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long long unsigned int’}
Let's use our own copy of struct rseq, identical to the kernel one;
it's safe because this structure is part of the Linux kernel ABI.
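The format mismatch can be illustrated with a minimal sketch. The struct below is a hypothetical, shortened stand-in (`rseq_sketch` is not the real kernel layout, and `format_rseq_cs` is an illustrative helper). It assumes the 64-bit field is declared as `unsigned long long`, which is what `__u64` resolves to on x86_64, so `%llx` stays valid regardless of how libc defines `uint64_t`:

```c
#include <stdio.h>

/*
 * Hypothetical, shortened stand-in for a locally defined struct rseq.
 * The 64-bit field is unsigned long long (like the kernel's __u64 on
 * x86_64), not uint64_t, which glibc defines as unsigned long there.
 */
struct rseq_sketch {
	unsigned int cpu_id_start;
	unsigned int cpu_id;
	unsigned long long rseq_cs;
	unsigned int flags;
};

/* Formats the rseq_cs field with %llx, which matches its declared type. */
static int format_rseq_cs(const struct rseq_sketch *rs, char *buf, size_t len)
{
	return snprintf(buf, len, "rseq_cs: %llx", rs->rseq_cs);
}
```

With a single local definition the same format string compiles cleanly on every distribution, because the field type no longer depends on which header happened to provide the struct.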
Fixes: #1934
Reported-by: Nikola Bojanic
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
See "man memfd_create" for more information on what memfd is.
This adds support for open memfd files that are not memory mapped.
* We add a new kind of file: MEMFD.
* We add two image types MEMFD_FILE, and MEMFD_INODE.
MEMFD_FILE contains usual file information (e.g., position).
MEMFD_INODE contains the memfd name, and a shmid identifier
referring to the content.
* We reuse the shmem facilities for dumping memfd content as it
would be easier to support incremental checkpoints in the future.
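For readers unfamiliar with memfd, here is a minimal sketch of the kind of state involved: a name, some content, and a file position. The helper names `make_memfd` and `memfd_position` are hypothetical and not part of CRIU; the sketch assumes a kernel and libc headers that provide `SYS_memfd_create`:

```c
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Creates an anonymous memfd with the given name and writes `data`
 * into it. The file position is left at the end of the written data.
 * Returns the fd, or -1 on failure.
 */
static int make_memfd(const char *name, const char *data)
{
	size_t len = strlen(data);
	int fd = syscall(SYS_memfd_create, name, 0);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Returns the current file position, the per-file state mentioned above. */
static off_t memfd_position(int fd)
{
	return lseek(fd, 0, SEEK_CUR);
}
```

The position belongs to the open file, while the name and content belong to the underlying inode, which mirrors the MEMFD_FILE / MEMFD_INODE split described above.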
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Starting with CentOS 8, nft is used instead of iptables. CRIU has never
supported nft rules, so after c/r all rules are flushed.
Co-developed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Distributions have started to ship GCC configured to compile
-pie and -fPIC code by default for security reasons.
CONFIG_COMPAT was unfriendly to -pie because of the R_X86_64_32S
relocation in call32.S helper:
LINK criu/criu
/usr/bin/ld: criu/arch/x86/crtools.built-in.o: relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
make[1]: *** [criu/Makefile:92: criu/criu] Error 1
make: *** [Makefile:225: criu] Error 2
Use %rip-relative addressing to avoid ld errors for shared binary linking.
Puff, all needs to be done with bare hands!
Now CONFIG_COMPAT can be used with -pie binaries and everything should
also work with the Debian toolchain (#315).
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add linux/userfaultfd.h to criu sources. This header is a part
of the kernel API and I see nothing wrong with having it in the repo.
Why we want to do this:
* to check that criu works correctly if a kernel doesn't
support userfaultfd.
* to check compilation of the userfaultfd part in travis-ci.
v2: remove UFFD from FEATURES_LIST
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is a first try to include userfaultfd with criu. Right now it
still requires a "normal" checkpoint. After checkpointing the
application it can be restored with the help of userfaultfd.
All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
being handled by userfaultfd.
As soon as the process is restored it blocks on the first memory access
and waits for pages being transferred by userfaultfd.
To handle the required pages a new criu command has been added. For a
userfaultfd supported restore the first step is to start the
'lazy-pages' server:
criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket
This is part 1 of the userfaultfd integration which provides the
'lazy-pages' server implementation.
v2:
* provide option '--lazy-pages' to enable uffd style restore
* use send_fd()/recv_fd() provided by criu (instead of own
implementation)
* do not install the uffd as service_fd
* use named constants for MAP_ANONYMOUS
* do not restore memory pages and then later mark them as uffd
handled
* remove function find_pages() to search in pages-<id>.img;
now using criu functions to find the necessary pages;
for each new page search the pages-<id>.img file is opened
* only check the UFFDIO_API once
* trying to protect uffd code by CONFIG_UFFD;
use make UFFD=1 to compile criu with this patch
v3:
* renamed the server mode from 'uffd' -> 'lazy-pages'
* switched client and server roles transferring the UFFD FD
* the criu part running in lazy-pages server mode is now
waiting for connections
* the criu restore process connects to the lazy-pages server
to pass the UFFD FD
* before UFFD copies anything else, the VDSO pages are copied,
as copying unused VDSO pages fails once the process is running;
this was necessary to be able to copy all pages.
* if there are no more UFFD messages for 5 seconds the lazy-pages
server switches into copy mode to copy all remaining pages, which
have not been requested yet, into the restored process
* check the UFFDIO_API at the correct place
* close UFFD FD in the restorer to remove open UFFD FD in the
restored process
v4:
* removed unnecessary madvise() calls ; it seemed necessary when
first running tests with uffd; it actually is not necessary
* auto-detect if build-system provides linux/userfaultfd.h
header
* simplify unix domain socket setup and communication.
* use --address to specify the location of the used
unix domain socket
v5:
* split the userfaultfd patch in multiple smaller patches
* introduced vma_can_be_lazy() function to check if a page
can be handled by uffd
* moved uffd related code from cr-restore.c to uffd.c
* handle failure to register a memory page of the restored process
with userfaultfd
v6:
* get PID of to be restored process from the 'criu restore' process;
first the PID is transferred and then the UFFD
* code has been re-ordered to be better prepared for lazy-restore
from remote host
* compile test for UFFD availability only once
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
I was adapting CRIU with ia32 support for building with Koji,
and found that Koji can't build x86_64 packages with
i686 libs installed.
While at it, I found that i686 libraries requirement is
no longer valid since I've deleted the second parasite.
Drop the feature test for i686 libs and add a test for gcc instead.
That will effectively test whether gcc can compile 32-bit code
and catch the bug with Debian's gcc (#315).
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I'll wrap all compatible code in this CONFIG_COMPAT define.
As I'll also wrap compatible parasite generation in it,
it is a makefile variable as well, rather than just a C define.
The test itself consists of including stubs-32.h, which is
a glibc6-i686 presence test, and is compiled with the -m32 option,
which is a test for gcc-multilib.
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If userns_restore_one_link() is called outside of usernsd,
it switches into the criu namespace and switches back before exiting.
v2: get rid of the include of linux/net_namespace.h in criu/include/net.h,
as well as the associated defines and feature checks
travis-ci: success for net: simplify restore of macvlan-s (rev2)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This header was only introduced in 2015, so we need to build without it.
travis-ci: success for series starting with [v10,01/11] net: pass the struct nlattrs to dump() functions
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It's a new option to get/set window parameters.
v2: don't do this check for unprivileged users, because TCP_REPAIR
is protected by CAP_NET_ADMIN.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It isn't required, __NR_memfd_create is defined in syscall-codes.h.
With this patch, we can run all tests in travis-ci, because the
kernel there supports memfd, but __NR_memfd_create isn't defined
in the installed headers.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Previously, at an early stage, we tried to work
around the lack of a prlimit libc call by simply
providing our own implementation, but it
causes problems on some libc configurations, so
simply use rlimit64 and the __NR_prlimit64 syscall
directly.
The kernel must support the __NR_prlimit64 syscall
and provide the rlimit64 structure as well.
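A minimal sketch of calling the syscall directly, without going through a libc wrapper. The struct `sketch_rlimit64` and the helper `sketch_get_rlimit` are hypothetical names; the layout (two 64-bit values, current and maximum) is assumed to match what the kernel expects for prlimit64:

```c
#include <stdint.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local two-field layout assumed to match the kernel's rlimit64. */
struct sketch_rlimit64 {
	uint64_t cur;
	uint64_t max;
};

/*
 * Reads one resource limit of `pid` (0 means the calling process)
 * through the raw syscall. Passing NULL as the new limit leaves the
 * current setting untouched. Returns 0 on success, -1 on error.
 */
static int sketch_get_rlimit(pid_t pid, int resource,
			     struct sketch_rlimit64 *out)
{
	return syscall(__NR_prlimit64, pid, resource, NULL, out);
}
```

For example, `sketch_get_rlimit(0, RLIMIT_NOFILE, &lim)` would fill `lim` with the open-file limits of the caller.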
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We define our own SYS_memfd_create in case it's missing
in libc, since we need it for user-namespace restore.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To add a new feature test - add it to FEATURES_LIST.
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Just hit a situation inside a VM where a pretty new
kernel with memfd had been installed (and as a result
__NR_memfd_create, shipped with the kernel headers,
was provided as well) but libc was old and had no
SYS_memfd_create defined. Thus we got an error,
because we use exactly the SYS_ number for calls.
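One way to cope with such a header mismatch is a guarded fallback define. This is only a sketch of the pattern, not necessarily the exact code in the tree; the wrapper name `sketch_memfd_create` is hypothetical:

```c
#include <sys/syscall.h>
#include <unistd.h>

/*
 * If libc is too old to know SYS_memfd_create but the kernel headers
 * provide __NR_memfd_create, reuse the kernel number. If neither is
 * defined, the build still fails, which is the honest outcome.
 */
#ifndef SYS_memfd_create
# ifdef __NR_memfd_create
#  define SYS_memfd_create __NR_memfd_create
# endif
#endif

/* Thin wrapper around the raw syscall; returns an fd or -1. */
static int sketch_memfd_create(const char *name, unsigned int flags)
{
	return syscall(SYS_memfd_create, name, flags);
}
```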
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
linux/seccomp.h may not be available, and the seccomp mode might not be
listed in /proc/pid/status, so let's not assume those two things are
present.
v2: add a seccomp.h with all the constants we use from linux/seccomp.h
v3: don't do a compile time check for PTRACE_O_SUSPEND_SECCOMP, just let
ptrace return EINVAL for it; also add a checkskip to skip the
seccomp_strict test if PTRACE_O_SUSPEND_SECCOMP or linux/seccomp.h
aren't present.
v4: use criu check --feature instead of checkskip to check whether the
kernel supports seccomp_suspend
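Tolerating a missing field in /proc/pid/status could look roughly like the sketch below. The function name `sketch_parse_seccomp_mode` is hypothetical, and treating an absent "Seccomp:" line as mode 0 (disabled) is an assumption made for this example:

```c
#include <stdio.h>
#include <string.h>

/*
 * Scans status-style text line by line for a "Seccomp:" entry.
 * Returns the parsed mode, or 0 when no such line exists, so that
 * kernels which do not report seccomp are not treated as an error.
 */
static int sketch_parse_seccomp_mode(const char *status_text)
{
	const char *line = status_text;
	int mode;

	while (line && *line) {
		if (sscanf(line, "Seccomp: %d", &mode) == 1)
			return mode;
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}
```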
Reported-by: Mr. Jenkins
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Unfortunately, SECCOMP_MODE_FILTER is not currently exposed to userspace,
so we can't checkpoint that. In any case, this is what we need to do for
SECCOMP_MODE_STRICT, so let's do it.
This patch works by first disabling seccomp for any processes that are going
to have seccomp filters restored, then restoring the process (including the
seccomp filters), and finally resuming the seccomp filters before detaching
from the process.
v2 changes:
* update for kernel patch v2
* use protobuf enum for seccomp type
* don't parse /proc/pid/status twice
v3 changes:
* get rid of extra CR_STAGE_SECCOMP_SUSPEND stage
* only suspend seccomp in finalize_restore(), just before the unmap
* restore the (same) seccomp state in threads too; also add a note about
how this is slightly wrong, and that we should at least check for a
mismatch
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Check for setproctitle_init, as old versions of libbsd don't have one.
Reported-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Acked-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently we check PTRACE_PEEKSIGINFO and if it's defined in a system
header, we suppose that ptrace_peeksiginfo_args is defined there too.
But due to a bug in glibc, this check doesn't work. Now we have F20,
where ptrace_peeksiginfo_args is defined in sys/ptrace and F21 where
it isn't defined.
commit 9341dde4d56ca71b61b47c8b87a06e6d5813ed0e
Author: Mike Frysinger <vapier@gentoo.org>
Date: Sun Jan 5 16:07:13 2014 -0500
ptrace.h: add __ prefix to ptrace_peeksiginfo_args
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need it for btrfs subvolumes handling.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Same as kernel provides, adopted from Linux sources.
strlcpy is similar to strncpy but _always_ adds \0
at the end of string even if destination is shorter.
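A minimal sketch of the semantics described above, under the hypothetical name `sketch_strlcpy` to avoid clashing with any libc-provided strlcpy. It is written for illustration and is not claimed to be the exact code that was added:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies src into dest, truncating if needed, and always terminates
 * dest with '\0' as long as size is non-zero. Returns strlen(src),
 * so a return value >= size tells the caller truncation happened.
 */
static size_t sketch_strlcpy(char *dest, const char *src, size_t size)
{
	size_t ret = strlen(src);

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);
		dest[len] = '\0';
	}
	return ret;
}
```

Unlike strncpy, a too-small destination still ends up as a valid C string, and the caller can detect the truncation from the return value.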
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's being reported that some systems (such as Ubuntu 13.04) already
have struct tcp_repair_opt definition in their system headers.
| sk-tcp.c:25:8: error: redefinition of struct tcp_repair_opt
| sk-tcp.c:31:2: error: redeclaration of enumerator TCP_NO_QUEUE
So add a facility for compile-time testing of whether the reported
entities are present on a system. For this we generate include/config.h,
where the results of all tests are recorded; source code needs to include
it only in places where it is really needed.
Reported-by: Vasily Averin <vvs@parallels.com>
Acked-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>