Currently we assume that all threads share the same seccomp chain,
and at checkpoint time we test the seccomp modes to verify that
this assumption holds, refusing to dump otherwise.
However, the kernel tracks seccomp filter chains per thread,
and we have now run into applications (such as java) where per-thread
chains are actively used. Thus we need to add support for handling
filters on a per-thread basis.
In this somewhat intrusive patch the restore engine is reworked
to treat each thread separately. Here is what is done:
- The core image file is modified to keep seccomp filters
inside thread_core_entry. For backward compatibility the
former seccomp_mode and seccomp_filter members in
task_core_entry are renamed to have the old_ prefix, and
on restore we test whether we're dealing with old images.
Since per-thread dump is not yet implemented, the
dumping procedure continues to operate on the old_ members.
- In the pie restorer code, the memory containing filters is
addressed from inside the thread_restore_args structure, which
now contains the seccomp mode itself and the chain attributes
(number of filters, etc).
Per-thread data is read in the seccomp_prepare_threads
helper -- we take one pstree_item and walk over every thread
inside it to allocate pie memory and pin the data there.
Because of PIE specifics, before jumping into the pie code
we have to relocate this memory to a new place, and
seccomp_rst_reloc serves this purpose.
In the restorer itself we check whether thread_restore_args
gives us an enabled seccomp mode (strict or filter) and call
restore_seccomp_filter if needed.
- To unify names we start using the seccomp_ prefix for all related
stuff involved in this change (prepare_seccomp_filters is renamed
to seccomp_read_image because it only reads the image and nothing
more, and the image handler is renamed to seccomp_img_entry instead
of the too short 'se').
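The relocation step mentioned above can be illustrated roughly as
follows. This is only a sketch of the general pointer-rebasing idea,
not the actual seccomp_rst_reloc code: the struct layout and the names
thread_args_sketch and reloc_sketch are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for thread_restore_args. */
struct thread_args_sketch {
	int	seccomp_mode;	/* disabled, strict or filter */
	void	*filters;	/* points into the pinned memory block */
	size_t	filters_n;	/* number of filters in the chain */
};

/*
 * Rebase every per-thread pointer after the block that used to live
 * at old_base has been copied to new_base (assumed layout-preserving).
 */
static void reloc_sketch(struct thread_args_sketch *args, size_t nr_threads,
			 const void *old_base, void *new_base)
{
	size_t i;

	for (i = 0; i < nr_threads; i++) {
		ptrdiff_t off;

		if (!args[i].filters)
			continue; /* thread has no filters, nothing to fix */

		/* Keep the offset inside the block, swap the base address. */
		off = (const char *)args[i].filters - (const char *)old_base;
		args[i].filters = (char *)new_base + off;
	}
}
```

Threads without filters keep a NULL pointer, so only threads that
actually carry a chain are touched.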
With this change we can start collecting and
dumping seccomp filters per thread, which will be
done in the next patch.
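The old-image check on restore could look roughly like the sketch
below. It assumes optional fields are exposed with has_ flags; the
struct names, the pick_seccomp_mode helper and the SK_ constants are
hypothetical and only mirror the member names mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode values for this sketch only. */
enum {
	SK_SECCOMP_DISABLED	= 0,
	SK_SECCOMP_STRICT	= 1,
	SK_SECCOMP_FILTER	= 2,
};

/* Hypothetical, simplified view of the per-thread image entry. */
struct thread_core_sketch {
	bool		has_seccomp_mode;
	uint32_t	seccomp_mode;
};

/* Hypothetical, simplified view of the per-task image entry. */
struct task_core_sketch {
	bool		has_old_seccomp_mode;
	uint32_t	old_seccomp_mode;
};

/*
 * Prefer per-thread data; fall back to the old_ member when the
 * image was written before per-thread support existed.
 */
static uint32_t pick_seccomp_mode(const struct thread_core_sketch *thread,
				  const struct task_core_sketch *task)
{
	if (thread->has_seccomp_mode)
		return thread->seccomp_mode;
	if (task->has_old_seccomp_mode)
		return task->old_seccomp_mode; /* old image */
	return SK_SECCOMP_DISABLED;
}
```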
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Right now they all sit in a separate file. Since we
don't support CLONE_SIGHAND (and don't plan to) it's
much better to have them in core, all the more so
since by the time we dump/restore sigacts, the core
entry is already at hand.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There are several places in image files where we store
integers that actually stand for some string, e.g.
socket families, states and types, and task states.
So here's the (criu).dict option for such fields, which
helps to convert the numbers into strings and back.
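The two-way conversion can be pictured as a small lookup table. The
sketch below only illustrates the idea and is not how the option is
implemented; the table contents, the string names and the helpers
dict_num_to_name and dict_name_to_num are assumptions.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

struct dict_entry {
	int		num;
	const char	*name;
};

/* Illustrative subset of socket families; the names are assumptions. */
static const struct dict_entry sk_family_dict[] = {
	{ AF_UNIX,	"UNIX"	},
	{ AF_INET,	"INET"	},
	{ AF_INET6,	"INET6"	},
};

#define DICT_SIZE (sizeof(sk_family_dict) / sizeof(sk_family_dict[0]))

/* Number -> string; returns NULL when the number is not in the dict. */
static const char *dict_num_to_name(int num)
{
	size_t i;

	for (i = 0; i < DICT_SIZE; i++)
		if (sk_family_dict[i].num == num)
			return sk_family_dict[i].name;
	return NULL;
}

/* String -> number; returns 0 on success, -1 on an unknown name. */
static int dict_name_to_num(const char *name, int *num)
{
	size_t i;

	for (i = 0; i < DICT_SIZE; i++) {
		if (strcmp(sk_family_dict[i].name, name) == 0) {
			*num = sk_family_dict[i].num;
			return 0;
		}
	}
	return -1;
}
```

Unknown values are reported rather than guessed, so a number without
a dictionary entry can still be shown as a plain integer.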
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
cgroup namespaces are about to be merged into the kernel (indeed, they
went into and out of 4.5 over minor issues), and will be carried as a
patchset in the ubuntu 16.04 kernel. Here's an attempt at c/r.
There are essentially three key steps:
* on dump, in parse_task_cgroup, we should ask the task what cgroups it
thinks it is in (unless it has the same cgroup ns id as its parent, then we
should just take the prefixes from the parent's set), and set the prefix on
the cg set
* add a new restore step, prepare_cgroup_namespace(), which happens in
prepare_task_cgroup() that does an unshare() if necessary
* when restoring, in move_in_cgroup, if we're going to restore via usernsd,
leave the full path. If not, use (cgset->path + len(cgset->cgns_prefix)) as
the path, since we will have already moved into the cgns_prefix and unshared.
Another observation here is that we can support nesting, since these are
restored hierarchically by nature.
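The path selection in the third step can be sketched as below. This is
a minimal illustration under assumptions: cgns_visible_path and its
parameters are hypothetical names, and the example paths are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pick the path to move into. When restoring via usernsd the full
 * path is kept; otherwise the task already sits in the cgns prefix
 * and has unshared, so the prefix is skipped.
 */
static const char *cgns_visible_path(const char *path, size_t prefix_len,
				     int via_usernsd)
{
	if (via_usernsd)
		return path;

	/* The prefix must not be longer than the path itself. */
	assert(prefix_len <= strlen(path));
	return path + prefix_len;
}
```

For example, with path "/lxc/c1/app" and a prefix of length 7
("/lxc/c1"), the helper yields "/app" in the non-usernsd case.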
v2: * store cgns prefix length instead of full prefix in images
* set has_cgroup_ns_id conditionally
* drop unused argument to move_in_cgroup
* add extra comments about what is happening when unsharing() on
restore
* add extra comments about what is happening when computing the actual
cgns prefix
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
But keep @protobuf as a symlink: we have
this path encoded in sources. It will be
removed over time.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>