When an error occurs we need to close a file descriptor and unmap a region.
Use a separate label for each cleanup.
Fix CID 182644 (#1-2 of 2): Use after close (USE_AFTER_FREE)
8. pass_closed_arg: Passing closed handle f.fd as an argument to bclose
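For reference, a minimal sketch of the separate-label cleanup pattern; the function name and the "work" check are illustrative, not the CRIU code being fixed:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

static int do_work(const char *path, size_t len)
{
    void *mem;
    int fd, ret = -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    mem = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mem == MAP_FAILED)
        goto err_close;             /* only the fd is live here */

    if (((char *)mem)[0] != '#')    /* stand-in for the real work */
        goto err_unmap;             /* both the mapping and the fd are live */
    ret = 0;

err_unmap:
    munmap(mem, len);
err_close:
    close(fd);
    return ret;
}

Each label undoes exactly one earlier step, so every error path releases only what was actually acquired.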
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
It's not really interesting and just pollutes the log
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The intention was to make sure that only one packet is sent at a time,
and thus that read has to return exactly the size of one packet.
But it doesn't work as expected, because the size of autofs_v5_packet_union
differs between 32-bit and 64-bit architectures.
This is a bug, but it's hidden so deeply that no one really notices,
because of the following two aspects:
1) The autofs pipe has the O_DIRECT flag, which means excess bytes are discarded
upon read.
2) No one tries to read more than one packet size at a time.
So let's fix the test instead and not try to read more bytes than
expected.
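This is not the actual test change, but the size mismatch can be illustrated by printing both sizes on a 32-bit and a 64-bit build:

#include <stdio.h>
#include <linux/auto_fs4.h>

int main(void)
{
    /* The union packs several v5 packet types; its size and padding follow
     * the ABI, so 32-bit and 64-bit builds disagree on it, while the kernel
     * writes at most one packet per read. */
    printf("sizeof(struct autofs_v5_packet)      = %zu\n",
           sizeof(struct autofs_v5_packet));
    printf("sizeof(union autofs_v5_packet_union) = %zu\n",
           sizeof(union autofs_v5_packet_union));
    return 0;
}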
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We don't need to look up the mount info element, because
we already have it at hand.
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This fixes a merge conflict between the vdso patch set and the s390 patch set.
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We can save some syscalls for *each* dumpee if we don't
open()/seek()/read()/close() /proc/pid/pagemap for every
dumpee, and don't even use the parasite to parse the symtable if
pagemap is unavailable.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If it is preserved, we can omit checking pagemap for the dumpee or
filling the symtable in the parasite.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Just map vdso at the restorer's parking zone; there is no need to
search for it in CRIU and remap it to the park zone.
That will save some open()/read()/close() syscalls
spent on parsing the maps file.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If there is no vdso in the images, we don't need to map it.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Test for previously fixed bugs in vdso-trampoline insertion:
- unmapping the original vvar (which went unnoticed)
- leaving rt-vvar behind after each C/R cycle and the resulting pollution
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's hard to stop when you've begun.
No functional change is expected.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As rt-vvar can be placed lower (by address) than rt-vdso,
it will likely appear earlier in vma_area_list.
That means that by the moment we've found rt-vdso,
we have already passed rt-vvar and the rt_vvar_marked pointer
will not be initialized.
Search for rt-vvar during the second traversal of the vma list,
so we will always find it if it's present.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I think it's better if we still restore the original vvar,
as some code may expect the vma to be there or may dereference
a pointer into it.
E.g., if we checkpointed a task while it was on a vdso page,
it'll dereference a pointer to vvar. Better keep it in the vmas.
The original code deleted it because it was proxy_vvar_marked,
which I think is a misnaming problem.
Having the two vvar addresses named rt_ and orig_ describes what to
do with them on dump.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need to place @rt_vvar_addr into the vdso mark, as we otherwise don't know
the position of the rt-vvar that has to be dropped on the following dumps.
I've renamed proxy_*_addr to orig_*_addr, as it is more
descriptive.
We need orig_*_addr to mark those VMAs accordingly,
so the restorer knows what to do with them. Otherwise, it'll
think they are just regular vmas.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Previously, arch_prctl(ARCH_MAP_VDSO_32) was only used under
CONFIG_COMPAT to map a compatible vdso blob for a 32-bit restoree.
But we can make it more generic:
omit the mremap() that moves vdso to the rt-vdso zone in the restorer
and afterwards to the needed position in the restoree,
and also omit reading /proc/self/maps to find the vdso/vvar
addresses (to park them afterwards in the restorer).
TL;DR: with this kdat feature we can get rid of a bunch of mremap()'s
for each restoree and of parsing /proc/self/maps in vdso_init_restore().
The API is present since kernel v4.9.
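A rough sketch of how such a kdat probe could look on x86; the ARCH_MAP_VDSO_32 fallback value and the errno interpretation are assumptions here, not CRIU's exact code:

#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_MAP_VDSO_32
#define ARCH_MAP_VDSO_32 0x2002 /* assumed fallback, normally from <asm/prctl.h> */
#endif

/* Returns 1 if arch_prctl(ARCH_MAP_VDSO_*) looks supported, 0 otherwise. */
static int can_map_vdso(void)
{
    /*
     * Ask for an obviously unusable target address: a kernel that knows the
     * option rejects the request (e.g. because a vdso is already mapped),
     * while a pre-v4.9 kernel rejects the option itself with EINVAL.
     */
    if (syscall(SYS_arch_prctl, ARCH_MAP_VDSO_32, 1) == 0)
        return 1;
    return errno != EINVAL;
}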
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Found with fault-injected jump trampolines in the vdso
that rt-vdso got unmapped in the ia32 tests.
I had fixed it previously, but lost the fix while
debugging the vdso cleanup.
Fixes: commit 8544895a528b ("ia32/restorer: move 32-bit pie unmap to x86")
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I'll need to modify it: make it small and ready for changes.
No functional change is expected.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make it more readable and change-ready.
No functional change is expected.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This just splits the oversized, overgrown
parasite_fixup_vdso() and separates it logically.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Generalize addr-to-pfn conversion into one function.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Expand vaddr_to_pfn() so it can read and translate to a PFN
from an already opened fd. This can be used as an optimization
when one needs to translate a couple of addresses, and
also to read another task's PFN.
I'll use it in the next patch for reading another task's
vdso PFN.
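A minimal sketch of such a lookup from an already opened /proc/<pid>/pagemap fd; the function name is illustrative, while the entry layout follows the kernel's documented pagemap format:

#include <stdint.h>
#include <unistd.h>

#define PME_PRESENT     (1ULL << 63)
#define PME_PFRAME(x)   ((x) & ((1ULL << 55) - 1))  /* bits 0-54 hold the PFN */

/* Translate @vaddr to a PFN using an already opened pagemap fd. */
static int pfn_from_pagemap_fd(int fd, unsigned long vaddr, uint64_t *pfn)
{
    long psize = sysconf(_SC_PAGESIZE);
    uint64_t pme;
    off_t off = (off_t)(vaddr / psize) * sizeof(pme);

    if (pread(fd, &pme, sizeof(pme), off) != sizeof(pme))
        return -1;
    if (!(pme & PME_PRESENT))
        return -1;

    *pfn = PME_PFRAME(pme);
    return 0;
}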
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Let's hide those kernel details in an enum.
Further in this patch set I'll add a kdat test for the presence
of the "[vdso]" hint after mremap(), so we will skip any checking
on kernels > v3.16 and also not init vdso_pfn
(skipping the parsing of the self pagemap).
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On s390 we don't use vvar. Therefore vvar_size is not initialized
and the value remains VVAR_BAD_SIZE.
Fix the BUG() statement to also cover this case.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In file included from criu/include/util-vdso.h:23:0,
from criu/include/kerndat.h:8,
from criu/lsm.c:8:
criu/arch/aarch64/include/asm/vdso.h:17:35: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'aarch_vdso_symbol1'
static const char* __maybe_unused aarch_vdso_symbol1 = "__kernel_clock_getres";
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We don't need to parse the vdso symtable on each restore - it's boot-persistent,
so move it into the criu.kdat file on tmpfs.
That will also remove the syscalls made to fill the compat vdso symtable
by the compat vdso helper.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On dump we only need vdso_pfn to be filled,
on restore we need filled symtables.
That means we can omit filling the symtables on dump,
which also means there is no need to call the compat_vdso_helper
process with its fork(), pipe(), read(), write() and so on syscalls.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As ASLR randomizes {vdso,vvar}_start between criu launches,
vdso_parse_maps() has to be called on each launch:
- on restore to know the {vdso,vvar}_start position for later parking
in the restorer's save zone
- on checkpointing to get the vdso's pfn for pre-v3.16 kernels,
which lose the "[vdso]" hint in the maps file.
But the vdso_fill_symtable() call may be omitted if the symtable is
inside the kdat file.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Preparation for saving vdso_symtable in kdat, which will
allow skipping the parsing of the native and compat symtables - that
at least saves us from running the compat helper on each criu launch.
As {vvar,vdso}_start are randomized with ASLR, there is no
point in saving them into kdat. We'll still need to reread
them from /proc/self/maps for a while.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I plan to keep the boot-persistent vdso_symtable inside kdat,
moving the {vvar,vdso}_start addresses out into a new structure.
As the order of vdso/vvar is preserved across a single boot, store
it inside vdso_symtable.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make a function which decides whether we need to insert
jump trampolines, or whether the blobs are equal and just a remap
is enough.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The ordering check for vvar/vdso blobs was introduced with
commit b00bdb2dbc31 ("vdso: x86 -- Test VMAs order in vdso_proxify").
Let's simplify it to a more readable version.
As we've already checked above that the dumpee's vvar/vdso blob sizes match
the sizes of rt-vvar/rt-vdso, kill that xor here.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that task without vvar & vdso blobs is restored without them.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Let's just unmap criu's vdso & vvar if the restoree doesn't
have them.
This could be triggered, e.g., by migrating a task from a vdso-disabled
kernel to a vdso-enabled one.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The plan is to keep the boot-persistent vdso properties in the symtable,
to omit parsing them on each invocation of criu.
As the sizes of vdso/vvar are stable on the same kernel,
move them into the symtable, substituting the end addresses.
Begin/end addresses are randomized by ASLR, so there is no point
in storing them in kdat.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The right order for all four of our archs is:
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
unsigned long, tls,
int __user *, child_tidptr)
See the Linux kernel for details.
Note that this is just a fix; it's not connected with the second patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The check_features RPC function required that both known fields be
present. Without those fields it exited with an error. If RPC users
were not specifying all parameters, it would fail. It should, however,
be possible to check only a subset of options.
Each supported option is checked separately anyway in the forked criu,
which does the actual check.
Removing the check also enables RPC clients with older protobuf
definitions to use the feature check.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The recent changes to the user address space limits in the Linux kernel break
the assumption that TASK_SIZE is 128TB. For now, the maximal task size on
ppc64le is 512TB and we need to detect it at runtime for compatibility with
older kernels.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Even with 2 parallel jobs maps04 takes too much time with
--remote-lazy-pages. Let's skip it for now.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If there were fork()s during lazy restore, we create a shallow copy of the
parent process' page-read. Since all the copies reference the same open
files, we cannot close the page-read until we finish the restore of all the
processes that share it.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Running zdtm/static/maps04 with --remote-lazy-pages in parallel with 3
other tests takes too much time on the Jenkins builder. Let's try running
with --parallel 2.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When there is a stale kdat cache file, its contents are read into memory
and we end up requesting random userfaultfd features. Explicitly setting
kdat.uffd_features to zero before querying the kernel resolves the issue.
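A hedged sketch of a zero-initialized query; the function name is illustrative, and direct use of SYS_userfaultfd is an assumption for a standalone example:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

static int query_uffd_features(unsigned long long *features)
{
    struct uffdio_api api;
    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (uffd < 0)
        return -1;

    memset(&api, 0, sizeof(api));   /* start from zero, not a stale cached value */
    api.api = UFFD_API;
    if (ioctl(uffd, UFFDIO_API, &api)) {
        close(uffd);
        return -1;
    }

    *features = api.features;       /* the kernel reports what it actually supports */
    close(uffd);
    return 0;
}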
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Without non-cooperative userfaultfd some programs may fail during lazy
restore because they perform operations that cannot be handled by the
lazy-pages daemon.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
kerndat_init() is now called before the jump to the action handler. This
allows us to use kdat directly without calling the corresponding
kerndat_*() methods.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>