We close it in sigreturn_restore() for unification with other
service fds, so kill the second close() from here.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The API is as simple as
srv := MakePhaulServer(config)
cln := MakePhaulClient(local, remote, config)
cln.Migrate()
* config is the PhaulConfig struct that contains pid to migrate,
memory transfer channel (file descriptor) that phaul can use
to send/receive memory and path to existing directory where
phaul can put intermediate files and images.
* local is PhaulLocal interface with (for now) the single method
- DumpCopyRestore(): method that phaul calls when it's time
to do engine-specific dump, images copy and restore on
the destination side.
A few words about the latter: we've learned that different
engines have their own ways to call CRIU to dump a container,
so phaul, instead of dumping one on its own, lets the caller
do it. To keep up with the pre-dump stuff, the client should
not forget to do three things:
- set the TrackMem option to true
- set the ParentImg to the passed value
- set the Ps (page server) channel with 'config.Memfd'
The criu object is passed here as well, so that the caller can
call Dump() on it (once we have keep_open support in libcriu
this will help to avoid additional criu execve).
The method also should handle the PostDump notification and
do images-copy and restore in it. Not sure how to wrap this
into phaul better.
* remote is PhaulRemote interface whose method should be called
on the dst side on the PhaulServer object using whatever RPC
the caller finds acceptable.
As a demonstration the src/test/main.go example is attached. To
see how it goes 'make' it, then start the 'piggie $outfile'
proggie and run 'test $pid' command. The piggie will be, well,
live migrated locally :) i.e. will appear as a process with
different pid (it lives in a pid namespace).
Changes since v2:
* Reworked the API onto local/remote/config scheme
* Added ability to configure the directory for images
* Re-used server side Criu object for final restore
Changes since v1:
* Supported keep_open-s for pre-dumps
* Added code comments about interface
* Simplified the example code
Further plans for this are
- move py p.haul to use this compiled library
- add post-copy (lazy pages) support (with Mike's help)
- add image-cache and image-proxy (with Rodrigo's help)
- add API/framework for FS migration
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
To suppress protobuf's warning:
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546]
> No syntax specified for the proto file: remote-image.proto.
> Please use 'syntax = "proto2";' or 'syntax = "proto3";'
> to specify a syntax version. (Defaulted to proto2 syntax.)
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We don't need the task in that function. This also allows
deleting the fake task in read_ns_with_hookups().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Namespaces are read in read_ns_with_hookups(),
before tasks are read. So, root_item is NULL,
and NS_ROOT is not set for appropriate namespaces.
This patch fixes NS_ROOT after tasks are read. Also
it adds uts, ipc and cgroup namespaces for uniformity.
v2: Use macro MARK_ROOT_NS()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
While reading the ns file, we add namespaces with fake pid -1.
Allow overriding it later with the real pid of a process.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It improves readability when they are all in one place
and all visible at once.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Child process is created to set NS_OTHER user_ns,
before creation of a net_ns.
So, this CLONE_NEWNET is useless, and the created
net_ns is lost right after we do unshare() in create_net_ns().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Everything is prepared for nested user namespaces support.
The only remaining thing to do is to enter the dumped
user namespace's parent before the dump.
We use CLONE_VM for child tasks, so they may populate
user_ns maps in parent memory without any tricks.
v3: Check for WIFEXITED(). Fixed stack size.
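A minimal sketch of the CLONE_VM pattern described above, not the
code from this patch: the helper runs a function in a child that
shares the parent's address space, so whatever the child stores is
directly visible to the parent, and the parent checks WIFEXITED()
before trusting the result. The names run_vm_child(), store_answer()
and HELPER_STACK_SIZE are hypothetical, and no namespace flags are
passed here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define HELPER_STACK_SIZE (64 * 1024)	/* hypothetical size, not CRIU's value */

/* Example child body: stores a value through the pointer it is given. */
static int store_answer(void *arg)
{
	*(int *)arg = 42;
	return 0;
}

/*
 * Run fn(arg) in a CLONE_VM child and wait for it.
 * Returns 0 only if the child exited normally with status 0.
 */
static int run_vm_child(int (*fn)(void *), void *arg)
{
	int status, ret = -1;
	pid_t pid;
	char *stack;

	/* mmap() returns page-aligned memory, so the stack top is aligned too. */
	stack = mmap(NULL, HELPER_STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* The stack grows down on common architectures: pass its top. */
	pid = clone(fn, stack + HELPER_STACK_SIZE, CLONE_VM | SIGCHLD, arg);
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
		ret = 0;

	munmap(stack, HELPER_STACK_SIZE);
	return ret;
}
```

Calling run_vm_child(store_answer, &value) leaves the child's write
in the parent's own variable, which is the property the message
relies on for populating the user_ns maps.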
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
xids are saved according to NS_ROOT, while in pie
we may set them in their target user_ns. So, let's
convert them. See the comment in the code for why
we save them in NS_ROOT.
Also, small cleanup: use creds instead of args->creds
for caps.
v4: Use target_userns_gid() to convert gids.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
They may not be mapped in the target user_ns, so dump their
values in NS_ROOT. For backward compatibility we can't
collect their values from "/proc/[pid]/status",
because that is supported on the most recent kernels only.
So, choose this dump file format (dumping values in NS_ROOT),
and we will be ready for the future.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When the target process is in a user_ns where we do not have permissions,
we need to use the usernsd helper to get its fds.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Restore the task's user_ns, keeping in mind that we are born in the parent's user_ns.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add a field pstree_item::user_ns, which shows the task's current
user_ns, and introduce helpers to set it.
v3: Rebase on fdstore
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As access to /proc/[pid]/fd/[i] of a task from parent's
user_ns is prohibited, introduce a helper, doing that
via usernsd.
Also, remove BUG_ON() in usernsd, as now it may be used
without input fd parameter.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since a net ns may refer not only to root_user_ns,
set appropriate user_ns before its creation.
v3: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create the user namespaces hierarchy from the criu main task.
Open the namespaces' fds, so that they are visible to everybody
in fdstore.
Why we do it this way:
1) User namespaces are not correlated with the task
hierarchy. A parent task may have a user namespace
of a deeper level than a child task. So, we
can't restore the user namespaces just by
passing CLONE_NEWUSER in fork_with_pid().
2) CLONE_FS tasks require the user_ns to be set at the
moment of clone(), so we have to restore the target user_ns
in the vicinity of create_children_and_session() in this case.
v3: Check for WIFEXITED(). Aligned stack.
Use fdstore to keep ns fd.
Create tree from root_item.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CR_PROC_FD_OFF is needed for accessing foreign tasks'
fds, and will be used in the future.
TRANSPORT_FD_OFF is for uniformity.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, it's used by criu from CRIU_NS only.
So, in fact open_proc_rw() leads to opening of
a fd in CRIU_NS /proc (open_pid_proc() just
opens "/proc" dir, when PROC_FD_OFF is not set).
Make write_id_map() use CR_PROC_FD_OFF, which
exists, and does not confuse a user.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The patch introduces a generic way of dumping all the namespaces
(currently, only user ns entries are dumped).
The handler for old user ns images remains in place.
v4: Rebase on generic parent_id and userns_id.
v3: On restore, keep in mind that the parent ns may not have been
read yet at the moment we search for it.
Set correct user ns id to d_ns.
Reflect the fact, that parent_id is moved to pid and user ext.
Read ns ids before tasks.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
New image format, generic for all namespaces.
Currently, it's for pid, net and user ns.
v4: Rename ns-hookup to ns.
Make user_ns and parent generic.
v3: Move parent_id to pid and user ext
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Old type images do not have pointer to user_ns.
Set them manually.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
root_item may have NS_OTHER user_ns, so do not set it directly.
This will be used in next patches.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Split prepare_userns() in two functions.
Also, this commit fixes the problem, which existed before my patchset.
We do not populate userns_entry on restore, though it's needed and used
at least by the chain prepare_mnt_ns()->sb_opt_cb()->userns_uid().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add primitives for converting xids from NS_ROOT to custom
NS_OTHER, and vice versa.
v4: Fixed erratum in root_userns_gid()
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make it possible to convert uid and gid from a user_ns to its
representation in its (grand) parent user_ns.
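A minimal sketch of such a conversion, assuming each namespace keeps
its map as extents in the /proc/[pid]/uid_map layout (first id inside
the ns, first id in the parent, length). The types struct id_extent
and struct user_ns and the helpers id_to_parent() and id_to_ancestor()
are hypothetical and are not the names used in CRIU.

```c
#include <stddef.h>

#define INVALID_ID ((unsigned int)-1)

/* One line of a uid_map/gid_map: ids [first, first + count) map to lower_first... */
struct id_extent {
	unsigned int first;		/* first id inside the namespace */
	unsigned int lower_first;	/* first id in the parent namespace */
	unsigned int count;		/* length of the range */
};

/* Hypothetical namespace node carrying a parent pointer and its map. */
struct user_ns {
	const struct user_ns *parent;
	const struct id_extent *map;
	size_t nr_extents;
};

/* Convert an id from ns to its representation in ns->parent. */
static unsigned int id_to_parent(const struct user_ns *ns, unsigned int id)
{
	size_t i;

	for (i = 0; i < ns->nr_extents; i++) {
		const struct id_extent *e = &ns->map[i];

		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID;	/* id is not mapped in the parent */
}

/* Walk the parent chain until the requested (grand) parent is reached. */
static unsigned int id_to_ancestor(const struct user_ns *ns,
				   const struct user_ns *ancestor,
				   unsigned int id)
{
	while (ns != ancestor) {
		if (!ns || id == INVALID_ID)
			return INVALID_ID;
		id = id_to_parent(ns, id);
		ns = ns->parent;
	}
	return id;
}
```

For example, with a child map "0 1000 10" and a parent map
"0 100000 65536", id 5 in the child is 1005 in the parent and
101005 in the grandparent.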
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Make ns the only argument of dump_user_ns(). As the only ns
it may be called for is root_item's ns, the logic
after this patch remains the same as it was before.
Also make dump_user_ns() static.
In addition, pass ns to check_user_ns().
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Discover relationships between namespaces
and populate appropriate fields in ns_id
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Introduce ns_id::parent and assign a pointer to parent
for every ns except NS_CRIU and NS_ROOT.
Also populate user_ns for pid_ns.
v5: Remove excess check on on->parent.
v4: Set "ret = -1" on one of the error paths.
Add comment about user_ns finding.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Check that UID and GID in an unshared userns remain the same
v5: Use custom UID and GID.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Create two children, and unshare() user_ns in one of them (C1).
The second child creates one more process, which switches to C1's
namespace and unshares.
v4: Handle the case when readlink returns a PATH_MAX-length string.
Print full wait status instead of WEXITSTATUS().
v3: Unshare net ns in grand child
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The current patch brings the implementation of the image proxy and image cache.
These components are necessary to perform in-memory live migration of processes
using CRIU. The image proxy receives images from CRIU Dump/Pre-Dump (through
UNIX sockets) and forwards them to the image cache (through a TCP socket). The
image cache caches images in memory and sends them to CRIU Restore (through
UNIX sockets) when requested.
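The forwarding step of the proxy can be pictured with a small relay
loop. This is only a sketch under the assumption that both ends are
plain file descriptors (a UNIX socket on one side, a TCP socket on the
other); the function name forward_all() is hypothetical and the real
image proxy is more involved.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy everything from in_fd to out_fd until EOF.
 * Returns the number of bytes forwarded, or -1 on error.
 */
static ssize_t forward_all(int in_fd, int out_fd)
{
	char buf[4096];
	ssize_t total = 0;

	for (;;) {
		ssize_t off = 0;
		ssize_t n = read(in_fd, buf, sizeof(buf));

		if (n == 0)
			return total;	/* sender closed its end */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* write() may be short, so loop until the chunk is out. */
		while (off < n) {
			ssize_t w = write(out_fd, buf + off, n - off);

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```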
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>