The kernel doesn't have an interface to get the send queue of UDP
sockets, so currently we can't dump them and criu dump has to fail in
such cases.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we block all sockets with a non-zero idiag_wqueue, but a
non-zero value doesn't mean that the CORK option is enabled for the
socket: a packet can still be in the network stack and be accounted in
idiag_wqueue.
https://github.com/checkpoint-restore/criu/issues/409
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is probably a valid use case now, because there is no way to commit
a container while it is being checkpointed.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
All static tests have to stop any activity before C/R.
./tempfs_subns --pidfile=tempfs_subns.pid --outfile=tempfs_subns.out --dirname=tempfs_subns.test
Run criu dump
Unable to kill 128: [Errno 3] No such process
Run criu restore
7: Old mounts lost: []
7: New mounts appeared: [('/rootfs/criu/test', '/'), ('/', '/proc'), ('/', '/dev/pts')]
:
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Done with the following command, except for several false positives:
find -type f -name "*.c" -not -path "./test/*" -exec sed -i 's/\(\<pr_err.*[^\][^n]\)\("[,)]\)/\1\\n\2/g' {} \;
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we use an additional pipe to steal data from a pipe, but we
don't check that all the data has been stolen, and the additional pipe
may be smaller than the original one.
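A minimal sketch of the check, assuming the steal pipe is filled with tee(2) (the text does not say which syscall criu uses). The steal pipe is first grown to at least the size of the source pipe with F_SETPIPE_SZ; after the copy, the number of queued bytes in both pipes is compared with FIONREAD. The helper name steal_pipe_data() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy all data queued in src_rd (the read end of a
 * pipe) into the steal pipe steal[] without consuming it from src_rd.
 * Assumes the steal pipe starts empty. Returns the number of bytes
 * stolen, or -1 if not all the data could be copied.
 */
ssize_t steal_pipe_data(int src_rd, int steal[2])
{
	int src_size, queued, stolen;
	ssize_t ret;

	/* The steal pipe must be at least as large as the source pipe. */
	src_size = fcntl(src_rd, F_GETPIPE_SZ);
	if (src_size < 0)
		return -1;
	if (fcntl(steal[1], F_GETPIPE_SZ) < src_size &&
	    fcntl(steal[1], F_SETPIPE_SZ, src_size) < 0)
		return -1;

	/* How many bytes are waiting in the source pipe? */
	if (ioctl(src_rd, FIONREAD, &queued) < 0)
		return -1;
	if (queued == 0)
		return 0;

	/* tee() duplicates pipe data; src_rd still holds it afterwards. */
	ret = tee(src_rd, steal[1], queued, SPLICE_F_NONBLOCK);
	if (ret < 0)
		return -1;

	/* The missing check: did we really steal everything? */
	if (ioctl(steal[0], FIONREAD, &stolen) < 0 || stolen != queued)
		return -1;
	return ret;
}
```

Comparing the queued byte counts catches both a short tee() and a steal pipe that turned out to be too small.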
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
a) As we shmalloced rpath, it cannot be xfreed.
b) As we shmalloc variables in collect_remap_ghost(), far away from
open_remap_ghost() where we want to free them, there is no guarantee
that our shmalloc was the last one, so we can't use shfree_last().
Fixes commit 0c675a5e9d40 ("files: remove link_remaps when everything
has been restored")
When create_ghost() fails for some reason, this produces a segfault for me.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reading and writing large buffers may result in a short read/write. In
cases where we expect the entire buffer to be transferred, use
{read,write}_data rather than plain read/write syscalls.
Reported-by: Mr Jenkins
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Some tests expect that all the data will be handled in a single invocation
of a read/write system call. But a short read/write is possible on a
loaded system, and this is not an error.
Add helper functions that will reliably read/write the entire buffer.
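A minimal sketch of such helpers, assuming they simply retry until the whole buffer has been transferred. The names read_data()/write_data() come from the text above, but the signatures, the EINTR retry and the treatment of an early EOF as an error are assumptions.

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling read() until len bytes arrive; a short read is not an error. */
int read_data(int fd, void *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		ssize_t ret = read(fd, (char *)buf + off, len - off);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0) /* EOF before the whole buffer was read */
			return -1;
		off += ret;
	}
	return 0;
}

/* Same for write(): resume from where the short write stopped. */
int write_data(int fd, const void *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		ssize_t ret = write(fd, (const char *)buf + off, len - off);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += ret;
	}
	return 0;
}
```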
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we use the "map_files/%p-%p" format, but it should actually
be "map_files/%lx-%lx".
The kernel used to accept both formats, but Alexey Dobriyan recently
fixed the kernel, and now it accepts only the second format.
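For illustration (the helper name is hypothetical): entries in /proc/<pid>/map_files are named by hexadecimal start-end addresses without a 0x prefix, which is exactly what %lx produces, whereas glibc's %p prepends 0x.

```c
#include <stdio.h>

/* Build the map_files entry name for a VMA [start, end). */
int map_files_path(char *buf, size_t size, unsigned long start, unsigned long end)
{
	/* "%lx" yields "400000-401000"; "%p" would yield "0x400000-0x401000". */
	return snprintf(buf, size, "map_files/%lx-%lx", start, end);
}
```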
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If we start the background memory fetch before restore is completely
finished, we may try to write to memory areas which have not yet been
remapped to their proper place and are not registered with userfaultfd.
Add synchronization between restore and the lazy-pages daemon so that
lazy-pages will only handle #PFs until all the tasks are restored.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The remote page read has nothing to do if the page-server on the source has
closed the connection. Just report an error and abort.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, when we poll a file descriptor, we only process EPOLLIN events,
and if a connection is closed the receiving side has no means to deal with
it.
Add a callback for EPOLL{RD}HUP events and a default implementation for
these events that removes the file descriptor from epoll and closes it.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
A bit more readable, and it will be easier to distinguish from the upcoming
hevent^Whangup_event.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The generic epoll_wait wrapper should not make any assumptions about the
timeout. It's up to the lazy-pages daemon to make (future) policy decisions.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If an IPv6 socket has an IPv4-mapped address, it is used to handle an IPv4
connection, so we have to use IPv4 iptables rules to block this
connection.
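A minimal sketch of the check, using the standard IN6_IS_ADDR_V4MAPPED macro from <netinet/in.h>. The helper name and the idea of returning the name of the tool are illustrative assumptions, not how criu is structured.

```c
#include <netinet/in.h>

/*
 * Pick the iptables flavour for a socket's address. An AF_INET6 socket
 * using ::ffff:a.b.c.d actually carries IPv4 traffic, so it has to be
 * blocked with IPv4 rules. addr6 is ignored for AF_INET.
 */
const char *iptables_tool_for(int family, const struct in6_addr *addr6)
{
	if (family == AF_INET)
		return "iptables";
	if (IN6_IS_ADDR_V4MAPPED(addr6))
		return "iptables";
	return "ip6tables";
}
```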
Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In addition to writing the CRIU version to the log file, this also writes
the current kernel version to the log file:
(00.000008) Version: 3.5 (gitid v3.5-511-ga8cc6cf)
(00.000303) Running on node01 Linux 3.10.0-513.el7.x86_64 #1 SMP Tue Feb 29 06:78:90 EST 2017 x86_64
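A minimal sketch of how such a line can be produced with uname(2). The field order (nodename, sysname, release, version, machine) is inferred from the sample log line above, and the function name is hypothetical.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Format "Running on <node> <sys> <release> <version> <machine>". */
int format_kernel_info(char *buf, size_t size)
{
	struct utsname u;

	if (uname(&u) < 0)
		return -1;
	return snprintf(buf, size, "Running on %s %s %s %s %s",
			u.nodename, u.sysname, u.release, u.version, u.machine);
}
```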
v2:
- small changes as suggested by Dmitry (thanks)
Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Leases can be set only on regular files. Thus, as an optimization, we can
skip attempts to find associated leases in 'correct_file_leases_type'
for other fd types.
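A sketch of the shortcut, assuming the fd type is taken from fstat(2); the function name is hypothetical. Everything that is not S_ISREG can skip the lease lookup.

```c
#include <sys/stat.h>

/*
 * Leases (fcntl F_SETLEASE) can only be taken on regular files, so for
 * any other fd type the lease lookup can be skipped entirely.
 */
int fd_may_have_lease(const struct stat *st)
{
	return S_ISREG(st->st_mode);
}
```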
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-- check children's errors in file_leases03
-- test C/R of a lease transferred to a child process
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CRIU creates a distinct lock record for each file descriptor on the same
OFD. The patch removes these duplicates. To do so, it adds a new field to
struct file_lock, which stores the pid of the fd on which the lock was found.
'owner pid' is not actually helpful, because the original fd, on which the
lock was set, may already be closed.
It also removes the workarounds that did the same thing, but only for file leases.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
If we receive only part of the IOV from the page-server we recalculate the
IOV so it will point to the area we still have to fetch. During the split,
the IOV covering the remaining area may remain marked as 'queued' and we'll
never retry fetching it.
Marking the IOV as not queued will ensure its pages will be requested
again.
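A sketch of the fix with a hypothetical lazy_iov structure: only the 'queued' flag is named in the text, while the structure, its start/len fields and the helper name are illustrative assumptions. After a partial reply the IOV is shrunk to the area still missing and its queued flag is cleared, so the next request pass picks it up again.

```c
#include <stdbool.h>

/* Hypothetical descriptor of a range of pages still to be fetched. */
struct lazy_iov {
	unsigned long start; /* first address not yet received */
	unsigned long len;   /* bytes still missing */
	bool queued;         /* a request for this range is in flight */
};

/*
 * Account for `received` bytes that arrived for `iov`.
 * Returns true if the whole range has now been received.
 */
bool lazy_iov_received(struct lazy_iov *iov, unsigned long received)
{
	/*
	 * The in-flight request is finished either way. Leaving 'queued'
	 * set after a partial reply is the bug: nobody would ever request
	 * the rest of the range again.
	 */
	iov->queued = false;

	if (received >= iov->len) {
		iov->len = 0;
		return true;
	}

	/* Partial reply: point the IOV at the area still to be fetched. */
	iov->start += received;
	iov->len -= received;
	return false;
}
```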
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since commit e609267f681062b4370e528a50f635222e0c2330 ("page-pipe: allow to
share pipes between page pipe buffers") the assumption that we will receive
the exact amount of pages we've requested with PS_IOV_GET does not always
hold.
When we serve page data from the images using 'page-server
--lazy-page', the IOVs seen by the pagemap may cross page-pipe buffer
boundaries, and read_page_pipe will clamp the pages in the response to those
boundaries.
Adjust page_server_read so it will not try to receive more pages than the
page-server is going to send.
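A sketch of the adjustment, under the assumption that the server clamps a request at the end of the page-pipe buffer that holds its first page. The function, its parameters and the 4 KiB page size are hypothetical. The receiving side should expect min(requested pages, pages left in that buffer) rather than the full request.

```c
#define LAZY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Number of pages the page-server will actually send for a request of
 * nr_pages starting at addr, when the page-pipe buffer holding addr ends
 * at buf_end (exclusive). Assumes addr < buf_end.
 */
unsigned long pages_to_receive(unsigned long addr, unsigned long nr_pages,
			       unsigned long buf_end)
{
	unsigned long left = (buf_end - addr) / LAZY_PAGE_SIZE;

	return nr_pages < left ? nr_pages : left;
}
```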
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently criu creates a new pipe buffer if the previous one has a different
set of flags. In this case the pipe is not full, and we can reuse it for the
next page buffer.
We need 88 pipes to pre-dump the zdtm/static/fork test without this
patch, and we need only 17 pipes with this patch.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
v2: also move it higher up, because it is going to be used in ppb_alloc()
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
vmsplice can't splice more than UIO_MAXIOV segments at once, but we can
call it several times from a parasite.
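A minimal sketch of splitting one large vmsplice(2) request into several calls. The helper name is hypothetical, and max_segs is a parameter only so that the batching is easy to exercise; in practice it would be UIO_MAXIOV (1024 on Linux). Because vmsplice may also return a short count, the helper advances the caller's iovec array as data is consumed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/uio.h>

/*
 * Splice nr_segs iovecs into pipe_fd, at most max_segs per call.
 * Modifies iov[] in place. Returns the total bytes spliced or -1.
 */
ssize_t vmsplice_batched(int pipe_fd, struct iovec *iov, size_t nr_segs,
			 size_t max_segs)
{
	ssize_t total = 0;

	while (nr_segs > 0) {
		size_t n = nr_segs < max_segs ? nr_segs : max_segs;
		ssize_t ret = vmsplice(pipe_fd, iov, n, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		total += ret;

		/* Drop the segments that were consumed completely... */
		while (nr_segs > 0 && (size_t)ret >= iov->iov_len) {
			ret -= iov->iov_len;
			iov++;
			nr_segs--;
		}
		/* ...and trim a partially consumed one. */
		if (ret > 0) {
			iov->iov_base = (char *)iov->iov_base + ret;
			iov->iov_len -= ret;
		}
	}
	return total;
}
```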
v2: s/nr/nr_segs/
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The original idea was to set --empty net for criu dump and criu restore,
but before cde33dcb0639 ("empty-ns: Don't C/R iptables too (v2)"),
criu restore worked without --empty net and we didn't notice that
docker doesn't set this option on restore.
After a small brainstorm, we decided that it is better to remove
this requirement. Docker has to set this option, but with these changes,
the docker issue will be less urgent.
https://github.com/checkpoint-restore/criu/issues/393
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Install the latest version of Docker, start a container and C/R it a few times.
v2: call make to install criu
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Test cases:
0. Basic non-breaking read/write leases.
1. Multiple read leases and OFDs with no lease for the same file.
2. Breaking leases.
3. Multiple fds (dup + inherited) for single lease (mutual OFD).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Information about locks in /proc/<pid>/fdinfo is available only since
kernel v4.1. This patch adds logic to *note_file_lock* to match leases
and OFDs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Restoring a breaking lease is done in 2 steps:
1. restore the lease in the state it was in before the break
2. break it by opening the associated file.
The patch sets the type of broken leases to the 'target lease type',
because procfs always returns 'READ' in this case.
Also, it adds an 'updated' field to the lock structure. It's used to remove
all duplicated records for a single lease from the image that weren't
corrected by 'correct_lease_type'.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Leases in the breaking state are not supported; in that case criu will
report an error during dump. Lock information must also be available in
/proc/<pid>/fdinfo (since kernel 4.1).
Before taking out a new lease, criu changes the process fsuid to match the
file uid (see fcntl F_SETLEASE).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Otherwise we get errors like this:
/usr/include/sys/socket.h:315:5: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_un *'
int bind (int, const struct sockaddr *, socklen_t);
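The fix is the usual explicit cast to the generic struct sockaddr *. A minimal sketch, binding an AF_UNIX socket to an abstract address (leading NUL byte, Linux-specific) so that no file is created; the helper name and the socket name used below are illustrative.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a datagram socket to the abstract address "\0<name>". */
int bind_abstract(const char *name)
{
	struct sockaddr_un addr;
	socklen_t len;
	int sk = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (sk < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* sun_path[0] stays '\0': the abstract namespace. */
	strncpy(addr.sun_path + 1, name, sizeof(addr.sun_path) - 2);
	len = offsetof(struct sockaddr_un, sun_path) + 1 + strlen(addr.sun_path + 1);

	/* bind() takes a generic struct sockaddr *, hence the cast. */
	if (bind(sk, (struct sockaddr *)&addr, len) < 0) {
		close(sk);
		return -1;
	}
	return sk;
}
```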
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>