Return the results of the work separately: the new fd goes in
a parameter, and the status goes in the return value.
In the next patches we will use the return value "1" to indicate
that the open callback should be called once again, and that
the restore of this fle has not finished yet. So, we need to be able
to tell a file descriptor with number 1 from an "again" request.
We do not use a negative value like -2 for this purpose,
because we want to allow fles to be served out before
they are completely restored. So, if a fle is successfully opened,
but it needs one more call of open to complete its restore,
then we return 1 and set new_fd to a non-negative value.
See "files: Kill struct file_desc_ops::post_open" for the details.
Also, export open_pipe()
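A minimal sketch of the calling convention described above, assuming
a hypothetical demo_file type and demo_open()/demo_restore() helpers
(none of these names come from the patch). It only illustrates why
the status and the descriptor are kept apart: a descriptor numbered 1
cannot be confused with the "again" status.

```c
#include <assert.h>

/* Hypothetical model of a file that needs two open() calls to be restored. */
struct demo_file {
	int calls;	/* how many times the open callback has run */
	int fd;		/* descriptor handed out by the callback */
};

/*
 * The status goes in the return value, the descriptor in *new_fd:
 *   0 - restore has finished
 *   1 - call again later; *new_fd is already valid (>= 0)
 *  -1 - error
 */
static int demo_open(struct demo_file *f, int *new_fd)
{
	if (f->fd < 0)
		return -1;
	*new_fd = f->fd;
	if (f->calls++ == 0)
		return 1;	/* opened, but one more call is needed */
	return 0;
}

/* Caller side: repeat the callback while it asks for "again". */
static int demo_restore(struct demo_file *f, int *new_fd)
{
	int ret;

	*new_fd = -1;
	do {
		ret = demo_open(f, new_fd);
	} while (ret == 1);
	return ret;
}
```

Here a file whose descriptor happens to be 1 still reports "again"
through the return value only, so the two never collide.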
v5: Use 0 and -1 for successful return and error.
v6: Rebase on new criu
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is needed to make receiving a fle non-blocking.
We will sleep on task_st futex instead of this.
v5: Do not set event in send_fd_to_self()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Currently, it's just an additional check. But later this will be used
in the following way: return value "1" means that the fle is not ready
for restore, and the caller should call this method again later.
See "[PATCH] files: Kill struct file_desc_ops::post_open" for the details.
v5: Use "1" for return
v2: Use generic FLE_OPEN and FLE_RESTORED to determine if a fle is ready
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
At this point we assume it's the first call of open(),
so the state must be FLE_INITIALIZED.
v6: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Timerfd's post_open state does not depend on other objects,
so it may be safely merged into the open stage.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
A step toward making file opening use a single futex.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is needed to wait on the task_st futex while the port has users.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Use the task_st futex notifier instead of a per-socket one.
A step toward making file opening use a single futex.
v2: Use internal bound and listen states instead of generic
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is needed to wait for a peer using the task_st futex.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Add fle open stages. Set a stage after every operation.
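A rough sketch of what per-fle stages could look like. The stage names
FLE_INITIALIZED, FLE_OPEN and FLE_RESTORED appear elsewhere in this
series; their numeric order, the demo_fle type and demo_set_stage()
are assumptions made only for this illustration.

```c
#include <assert.h>

/* Stage names are from the series; their ordering here is an assumption. */
enum demo_fle_stage {
	FLE_INITIALIZED,
	FLE_OPEN,
	FLE_RESTORED,
};

/* Hypothetical stand-in for the real fle structure. */
struct demo_fle {
	enum demo_fle_stage stage;
};

/* Record the stage reached after an operation; never move backwards. */
static int demo_set_stage(struct demo_fle *fle, enum demo_fle_stage next)
{
	if (next < fle->stage)
		return -1;
	fle->stage = next;
	return 0;
}
```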
v2: Do not merge filetype specific state with generic.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The idea is similar to the kernel's wake_up() and wait_event().
One task needs some event. It checks that the event has not
happened yet (fle hasn't been received, unix peer hasn't been bound,
etc.) and calls wait_fds_event(). The other task produces the event
(sends a fle, binds the peer to a name, etc.) and calls set_fds_event().
So, while there is no event, the first task sleeps,
and the second wakes it up later:
	Task A:	clear_fds_event();
		if (!socket_bound)
			wait_fds_event(); /* sleep */
	Task B:	bind_socket();
		set_fds_event(); /* wake up */
See the next patches for usage details.
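The clear/check/wait pattern can be modelled with a single atomic word.
This is only a sketch: the value of FDS_EVENT_BIT is an assumption, the
global task_st stands in for the per-task futex word, and sched_yield()
replaces the real futex sleep and wake-up.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define FDS_EVENT_BIT	0			/* bit number: assumed value */
#define FDS_EVENT	(1u << FDS_EVENT_BIT)

static atomic_uint task_st;	/* stand-in for the per-task futex word */

/* Forget any event seen so far; done before re-checking the condition. */
static void clear_fds_event(void)
{
	atomic_fetch_and(&task_st, ~FDS_EVENT);
}

/* Producer side: publish the event. */
static void set_fds_event(void)
{
	atomic_fetch_or(&task_st, FDS_EVENT);
	/* a real implementation would wake futex waiters here */
}

/* Consumer side: return once the event bit is set. */
static void wait_fds_event(void)
{
	/* a real implementation would sleep on the futex instead of yielding */
	while (!(atomic_load(&task_st) & FDS_EVENT))
		sched_yield();
}
```

Clearing the bit before checking the condition is what closes the race:
an event set after the clear is still visible to the wait.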
v5: Use bit operations.
Split clear_fds_event from wait function.
v2: Do not wait until the foreign transport sock is ready,
as that is guaranteed: we create it before CR_STATE_FORKING.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
(was "files: Wait transport_fd before sending a fd to peer"
and "pstree: Add task_st futex")
We are going to move to a single per-task bit field
for notifications about file opening events. Introduce
pstree_item::task_st for that.
v5: Add FDS_EVENT_BIT description.
v2: Do not wait until a peer's socket is created,
as that is guaranteed: we create it before CR_STATE_FORKING.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
TTY masters and slaves have no post_open stage,
so these two blocks may safely have their stages merged together.
The third is eventpoll, but the two above do not depend
on it (their .post_open does not depend on eventpoll's .open).
Unix sockets would have been, but this isn't implemented yet.
So, we may safely execute all stages for different
file types separately.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Since the "receive" stage is used only for slave fds,
and nothing depends on the receiving of slave fds being finished,
we may move its functionality into the "post_open" stage.
This just makes slave fds be received a little bit later.
In other words, only masters have post_open stage,
and only slaves have receive stage. So, in the case of
A and B files:
A->open
B->open
A->recv
A->post_open
B->recv
B->post_open
A->post_open can't depend on B->recv. This follows
from an analysis of the post_open methods of all file types.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Since we are going to get rid of stages altogether, kill this function
and call post_open_fd() unconditionally. It can handle the case
when file_desc_ops::post_open is NULL.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Since transport socket is per-process, we do not need
fd parameter in this function anymore.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Since the transport fd is per-task, this method is not needed anymore.
Kill it.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Move reopen_fd_as() from receive_fd() to this function.
Note that recv_fd_from_peer() has other callers, and
all of them are OK with receiving real fds (before,
they received arbitrary fds, and they are OK with any fds).
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Don't wait for the "prepare" stage of every peer's fd. Just
send everything to the peer's global transport socket, and
the peer will find the appropriate fd it needs at the moment.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To move to a single transport socket scheme, we should be able
to receive a fd which is not needed at the moment, but which will
be used in the future. So, we receive such a future fd, and then
continue to wait for the fd we really need now.
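A simplified model of this "stash what arrived early" logic. The name
keep_fd_for_future() is borrowed from the version notes of this patch,
but its signature here, the demo_msg/demo_stash types, and the array
standing in for the transport socket queue are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 8

/* One received message: an identifier plus the fd it carries. */
struct demo_msg {
	const void *id;
	int fd;
};

/* Fds that arrived before anybody asked for them. */
struct demo_stash {
	struct demo_msg kept[DEMO_MAX];
	size_t n;
};

/* Remember an fd that is not needed right now. */
static int keep_fd_for_future(struct demo_stash *s, struct demo_msg m)
{
	if (s->n == DEMO_MAX)
		return -1;
	s->kept[s->n++] = m;
	return 0;
}

/* Take a previously stashed fd out of the stash, or return -1. */
static int take_kept_fd(struct demo_stash *s, const void *id)
{
	for (size_t i = 0; i < s->n; i++) {
		if (s->kept[i].id == id) {
			int fd = s->kept[i].fd;

			s->kept[i] = s->kept[--s->n];
			return fd;
		}
	}
	return -1;
}

/* Read the queue from *pos until the wanted fd shows up; stash the rest. */
static int demo_receive(struct demo_stash *s, const struct demo_msg *queue,
			size_t len, size_t *pos, const void *wanted)
{
	int fd = take_kept_fd(s, wanted);

	if (fd >= 0)
		return fd;
	while (*pos < len) {
		struct demo_msg m = queue[(*pos)++];

		if (m.id == wanted)
			return m.fd;
		if (keep_fd_for_future(s, m) < 0)
			return -1;
	}
	return -1;
}
```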
v3: Delete excess BUG_ON().
Rename main patch function to keep_fd_for_future().
Rename second function to task_fle(), and make it
have "task" argument.
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
No functional changes
v3: Also do real_pid futex initialization
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
No functional changes
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
No functional changes
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To move to a single transport socket scheme, we need to be able to tell
fds in the receive queue apart. Add a fle pointer as an identifier for that.
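For illustration, this is roughly how an fd can travel over a unix
socket together with an opaque identifier, using the standard
SCM_RIGHTS control message. The helper names demo_send_fd() and
demo_recv_fd() are hypothetical, and the sketch assumes the tag is
meaningful to both ends (for example, because it points into memory
shared by both tasks).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one fd, aligned for struct cmsghdr. */
union demo_cbuf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send fd with the tag as regular payload; returns 0 or -1. */
static int demo_send_fd(int sock, int fd, const void *tag)
{
	union demo_cbuf u;
	struct iovec iov = { .iov_base = &tag, .iov_len = sizeof(tag) };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(tag) ? 0 : -1;
}

/* Receive one fd and its tag; returns the new fd or -1. */
static int demo_recv_fd(int sock, const void **tag)
{
	union demo_cbuf u;
	struct iovec iov = { .iov_base = tag, .iov_len = sizeof(*tag) };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*tag))
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver compares the tag with the identifier it is waiting for and
decides whether to use the fd now or keep it for later.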
v2: Rebase on compel
travis-ci: success for Rework file opening scheme to make it asynchronous (rev5)
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
System call sys_futex() requires that (from futex(2)):
"On all platforms, futexes are four-byte integers
that must be aligned on a four-byte boundary".
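One way to state this requirement in code is an explicit alignment
attribute on the futex type plus a compile-time check. The type name
demo_futex_t and the helper below are hypothetical; the actual change
made by the patch is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical futex wrapper with the alignment spelled out. */
typedef struct {
	uint32_t raw;
} __attribute__((aligned(sizeof(uint32_t)))) demo_futex_t;

/* Fail the build if the type could ever sit on a non-4-byte boundary. */
_Static_assert(_Alignof(demo_futex_t) >= 4, "futex must be 4-byte aligned");
_Static_assert(sizeof(demo_futex_t) == 4, "futex must be a 4-byte integer");

/* Runtime check for a concrete object. */
static int demo_futex_is_aligned(const demo_futex_t *f)
{
	return ((uintptr_t)f % 4) == 0;
}
```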
travis-ci: success for locks: Mask futexes aligned
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
1. LOGROTATEDIR is not used since commit f4e9a1d
("make: don't install service and logrotate configs").
2. SYSTEMDUNITDIR is not used since commit 10d5e9a
("criu: scripts: remove criu service files").
3. install-tree target was *never* used, makes no sense
to keep it.
While at it, also
- sort the variables in "export" statement to match
the order of appearance in Makefile;
- don't export DESTDIR (it is exported by default as
it always comes from the make command line);
- remove unused variable from INSTALL.md.
travis-ci: success for Makefile.install fixes
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Commit 6a51c7e ("make: Allow to install in custom dirs") replaced
all := assignments with ?=, effectively disabling the LIBDIR guessing
logic (as once a variable is assigned, further ?= make no sense).
That commit description says that setting PREFIX from make command line
didn't work. I can't find the original bug report but according to
GNU make documentation (see [1], [2]) as well as to the best of my knowledge,
any variable set in Makefile can be overridden from the command line,
unless "override VAR = value" is used in the Makefile.
The result of this patch is that LIBDIR is correctly set for distros such as
Fedora and Debian, so "make install" works more correctly. Surely, any
variable can still be overridden from the command line.
I have also checked the build of Fedora package from criu.spec with this
change -- it works fine.
Now, I am not sure why it was not working for the original bug reporter.
The only hypothesis I have is he tried to do something like
PREFIX=/usr make
instead of
make PREFIX=/usr
If this was the case, it was not a bug but wrong usage.
While at it, fix LIBDIR description in INSTALL.md.
[1] https://www.gnu.org/software/make/manual/html_node/Overriding.html
[2] https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
travis-ci: success for Makefile.install fixes
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
These variables don't need to end with a slash.
This helps the next patch.
travis-ci: success for More polishing for compel cli
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We are going to share this file between subprojects,
so let's minimize deps on headers; only syscalls
are left here for non-libc compiling.
travis-ci: success for Common headers
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The C compiler might generate calls to memcpy, memset, memcmp, and memmove
as it sees fit (so far we haven't seen memmove being required). That
means we need to provide our own versions of them for code which is not
linked to a libc.
We already have a solution for that in commit bdf6051
("pie: provide memcpy/memcmp/memset for noglibc case")
but we faced another problem: the compiler tried to optimize
our builtin_memset() by inserting calls to memset(), which
is just an alias in our case, and so it led to infinite recursion.
This was worked around in commit 8ea0ba7 ("string.h: fix memset
over-optimization with clang") but it's not clear that was a proper
fix.
This patch is considered to be the real solution. As we don't have
any other implementations of memset/memcpy/memcmp in non-libc case,
we can call ours without any prefixes and avoid using weak aliases.
Implementation notes:
1. mem*() functions code had to be moved from .h to .c for the functions
to be compatible with their prototypes declared in /usr/include/string.h
(i.e. "extern").
2. FORTIFY_SOURCE needed to be disabled for code not linked to libc,
because otherwise memcpy() may be replaced with a macro that expands
to __memcpy_chk() which of course can't be resolved during linking.
https://travis-ci.org/kolyshkin/criu/builds/198415449
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Now changes in top-Makefile, middle-Makefile will result in a correct
recompiling, as it's expected:
[criu]$ touch criu/Makefile
[criu]$ make
<...>
DEP criu/arch/x86/sigframe.d
DEP criu/arch/x86/sigaction_compat.d
DEP criu/arch/x86/crtools.d
DEP criu/arch/x86/cpu.d
DEP criu/arch/x86/call32.d
CC criu/arch/x86/call32.o
CC criu/arch/x86/cpu.o
CC criu/arch/x86/crtools.o
<...>
travis-ci: success for Fix rebuild on Makefile changes
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
We found a weird case of parasite code dying of SIGSEGV when clang
is used as a compiler (see [1] for more details).
Apparently, it was caused by clang optimizing our builtin_memset()
by inserting a call to memset(). It is a valid compiler optimization,
aside from the fact that in our code memset() is defined as a weak
alias to builtin_memset(), which of course led to infinite recursion
and stack growth.
This might be a bug in compiler, but there are ways to avoid it:
1. Rewrite builtin_memset() in asm (note it needs to be done
for every architecture supported).
2. Disable compiler optimizations for this code (say, by using -O0).
3. Declare the pointer inside builtin_memset() as volatile.
The last approach looks more appealing -- mostly for being simple.
[1] https://github.com/xemul/criu/issues/279
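A minimal sketch of the third approach, assuming a byte-wise loop. The
function is named demo_builtin_memset() to make clear it is not the
actual code from the tree. Stores through a volatile pointer must be
emitted one by one, so the compiler cannot replace the loop with
a call to memset().

```c
#include <assert.h>
#include <stddef.h>

/* Fill n bytes at s with c without letting the loop become a memset() call. */
static void *demo_builtin_memset(void *s, int c, size_t n)
{
	/* volatile: every store is an observable side effect */
	volatile unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}
```

The price is that the compiler may not vectorize or merge the stores
either, which is acceptable for small buffers.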
travis-ci: success for string.h: fix memset over-optimization with clang
Cc: Andrei Vagin <avagin@virtuozzo.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Add a link from task's pid to pstree_item.
Threads have this link set to NULL.
travis-ci: success for Make pstree_item::pid allocated dynamically
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The expression rb_entry(node, struct pstree_item, pid.node)
may create a false impression that we dereference pstree_item
for threads too, which would be a BUG, but that's not so, because
we are only interested in its ->pid field.
But anyway, avoid pstree_item and iterate over struct pid,
which is more readable.
travis-ci: success for Make pstree_item::pid allocated dynamically
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This will be used to pass MSG_DONTWAIT in next patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Replace the "-1" return with errno codes.
ENOMSG and EBADFD were chosen so as not to clash with
standard recvmsg() errors (described in its man page).
This patch is needed as preparation for making recv_msg()
able to be non-blocking and to return EAGAIN and EWOULDBLOCK
when there is no data.
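A small sketch of the "negative errno" convention, assuming
a hypothetical demo_recv_exact() helper built on plain recv(). Which
condition maps to ENOMSG is an assumption made for this example, not
taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Receive exactly len bytes.
 * Returns 0 on success, or a negative errno value:
 *   -EAGAIN / -EWOULDBLOCK  no data yet (with MSG_DONTWAIT)
 *   -ENOMSG                 message of unexpected size (assumed mapping)
 *   other                   whatever recv() reported
 */
static int demo_recv_exact(int sock, void *buf, size_t len, int flags)
{
	ssize_t n = recv(sock, buf, len, flags);

	if (n < 0)
		return -errno;
	if ((size_t)n != len)
		return -ENOMSG;
	return 0;
}
```

With distinct codes the caller can tell "try again later" from
a malformed message, which a bare -1 could not express.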
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
cr_page_server() returns a pid if it is executed as a daemon;
otherwise it returns an error code.
crtools returns 0 only if cr_page_server() returns a positive value,
which is obviously wrong.
travis-ci: success for crtools: close a signal descriptor after passing a preparation stage (rev6)
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This patch adds the --signal-fd FD option to specify a file descriptor.
CRIU will write '\0' to this descriptor and close it after passing
a preparation stage.
It is an alternative to daemonizing a criu process after the preparation
stage. It's impossible to get the exit code if a process has daemonized.
The introduced way allows waiting for the preparation stage and getting
the exit code. It can easily be used from shell and other scripting languages.
v3: fix a help message
v4: Here is a sequence of actions how it can be used:
* open a pipe
* run a service with the pipe[1] as status_fd
* read(pipe[0]) to wait for the moment when the service is ready to
accept connections
* do the work which requires the service
* wait for the service process to get its exit status, to be sure that
everything is okay
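The sequence above can be sketched as follows. The child here is only
a stand-in for the real service: demo_service() and demo_run() are
hypothetical names, and no criu command line is involved.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the service: report readiness, then exit with `code`. */
static void demo_service(int status_fd, int code)
{
	char zero = '\0';

	/* ... the preparation stage would happen here ... */
	if (write(status_fd, &zero, 1) != 1)
		_exit(127);
	close(status_fd);
	/* ... the service would do its real work here ... */
	_exit(code);
}

/* Start the service, wait until it is ready, return its exit code or -1. */
static int demo_run(int code)
{
	int p[2], status;
	char c = 1;
	pid_t pid;

	if (pipe(p) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	if (pid == 0) {
		close(p[0]);
		demo_service(p[1], code);
	}
	close(p[1]);

	/* Blocks until the child writes '\0'; EOF means it died too early. */
	if (read(p[0], &c, 1) != 1 || c != '\0') {
		close(p[0]);
		waitpid(pid, &status, 0);
		return -1;
	}
	close(p[0]);

	/* ... do the work which requires the service ... */

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Unlike daemonizing, the child stays a direct child of the caller, so
waitpid() can still collect its exit status.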
travis-ci: success for crtools: close a signal descriptor after passing a preparation stage (rev6)
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
All kernel options and features that depend on the kernel version
should be checked with `criu check`.
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This is needed in cases when kernel doesn't support OFD locks.
(OFD locks were added in 2014).
travis-ci: success for zdtm: Add checkskip scripts for OFD locks
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Some distros put 'ip' util in /bin directory.
travis-ci: success for tests: add '/bin/ip' to deps in addition to '/sbin/ip'
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
'info' array is off-by-one, nla_parse_nested() requires destination
array (i.e. 'info') to have maxtype+1 (i.e. IFLA_INFO_MAX+1) elements:
ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef823e3f8
WRITE of size 48 at 0x7ffef823e3f8 thread T0
#0 0x7f9ab7a3915b in __asan_memset (/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libasan.so.2+0x8d15b)
#1 0x7f9ab6d4e553 in nla_parse (/usr/lib64/libnl-3.so.200+0xa553)
#2 0x4acfb7 in dump_one_netdev criu/net.c:445
#3 0x4adb60 in dump_one_ethernet criu/net.c:594
#4 0x4adb60 in dump_one_link criu/net.c:665
#5 0x48af69 in nlmsg_receive criu/libnetlink.c:45
#6 0x48af69 in do_rtnl_req criu/libnetlink.c:119
#7 0x4b0e86 in dump_links criu/net.c:878
#8 0x4b0e86 in dump_net_ns criu/net.c:1651
#9 0x4a760d in do_dump_namespaces criu/namespaces.c:985
#10 0x4a760d in dump_namespaces criu/namespaces.c:1045
#11 0x451ef7 in cr_dump_tasks criu/cr-dump.c:1799
#12 0x424588 in main criu/crtools.c:736
#13 0x7f9ab67b171f in __libc_start_main (/lib64/libc.so.6+0x2071f)
#14 0x4253d8 in _start (/criu/criu/criu+0x4253d8)
Address 0x7ffef823e3f8 is located in stack of thread T0 at offset 264 in frame
#0 0x4ac9ef in dump_one_netdev criu/net.c:364
This frame has 5 object(s):
[32, 168) 'netdev'
[224, 264) 'info' <== Memory access at offset 264 overflows this variable
[320, 1040) 'req'
[1088, 3368) 'path'
[3424, 3625) 'stable_secret'
Increase 'info' size to fix this.
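The sizing rule can be shown without libnl. In this sketch demo_parse()
only models the part of the parser's contract quoted above (it clears
slots 0..maxtype), and DEMO_INFO_MAX is a hypothetical constant playing
the role of IFLA_INFO_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_INFO_MAX 4	/* hypothetical; plays the role of IFLA_INFO_MAX */

/* Models the parser: it clears tb[0] .. tb[maxtype], i.e. maxtype + 1 slots. */
static void demo_parse(const void *tb[], int maxtype)
{
	memset(tb, 0, sizeof(tb[0]) * ((size_t)maxtype + 1));
}
```

A caller therefore has to declare the table as
"const void *info[DEMO_INFO_MAX + 1]"; with only DEMO_INFO_MAX elements
the last slot written by the parser lies past the end of the array.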
Fixes: b705dcc34d ("net: pass the struct nlattrs to dump() functions")
travis-ci: success for net: fix stack out-of-bounds access in dump_one_netdev()
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>