speedup and reduce memory usage of dfa generation
A variety of changes to improve dfa generation
- By switching to Nodevec instead of Node sets we can reduce memory usage slightly and reduce code
- By using charsets for chars we reduce code and increase chances of node merging/reduction which reduces memory usage slightly
- By merging charsets we reduce the number of nodes
Signed-off-by: John Johansen <john.johansen@canonical.com>
MR: https://gitlab.com/apparmor/apparmor/-/merge_requests/1066
Approved-by: John Johansen <john@jjmx.net>
Merged-by: John Johansen <john@jjmx.net>
In preparation for more flags (not all of the backend dfa based),
rework the optimization and dump flag handling which has been exclusively
around the dfa up to this point.
- split dfa control and dump flags into separate fields. This gives more
room for new flags in the existing DFA set
- rename DFA_DUMP, and DFA_CONTROL to CONTROL_DFA and DUMP_DFA as
this will provide more uniform naming for none dfa flags
- group dump and control flags into a structure so they can be passed
together.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The inverse character set lists the characters it doesn't match. If
the inverse character set contains an oob then that is NOT considered
a match. So length should be one.
However because of oobs are handle not containing an oob doesn't mean
there is a match either. Currently the only way to match an oob is
via a positive express (no inverse matches are possible).
Signed-off-by: John Johansen <john.johansen@canonical.com>
The dynamic_cast operator is slow as it needs to look at RTTI
information and even does some string comparisons, especially in deep
hierarchies like the one for Node. Profiling with callgrind showed
that dynamic_cast can eat a huge portion of the running time, as it
takes most of the time that is spent in the simplify_tree()
function. For some complex profiles, the number of calls to
dynamic_cast can be in the range of millions.
This commit replaces the use of dynamic_cast in the Node hierarchy
with a method called is_type(), which returns true if the pointer can
be casted to the specified type. It works by looking at a Node object
field that is an integer with bits set for each type up in the
hierarchy. Therefore, dynamic_cast is replaced by a simple bits
operation.
This change can reduce the compilation times for some profiles more
that 50%, especially in arm/arm64 arch. This opens the door to maybe
avoid "-O no-expr-simplify" in the snapd daemon, as now that option
would make the compilation slower in almost all cases.
This is the example profile used in some of my tests, with this change
the run-time is around 1/3 of what it was before on an x86 laptop:
profile "test" (attach_disconnected,mediate_deleted) {
dbus send
bus={fcitx,session}
path=/inputcontext_[0-9]*
interface=org.fcitx.Fcitx.InputContext
member="{Close,Destroy,Enable}IC"
peer=(label=unconfined),
dbus send
bus={fcitx,session}
path=/inputcontext_[0-9]*
interface=org.fcitx.Fcitx.InputContext
member=Reset
peer=(label=unconfined),
dbus receive
bus=fcitx
peer=(label=unconfined),
dbus receive
bus=session
interface=org.fcitx.Fcitx.*
peer=(label=unconfined),
dbus send
bus={fcitx,session}
path=/inputcontext_[0-9]*
interface=org.fcitx.Fcitx.InputContext
member="Focus{In,Out}"
peer=(label=unconfined),
dbus send
bus={fcitx,session}
path=/inputcontext_[0-9]*
interface=org.fcitx.Fcitx.InputContext
member="{CommitPreedit,Set*}"
peer=(label=unconfined),
dbus send
bus={fcitx,session}
path=/inputcontext_[0-9]*
interface=org.fcitx.Fcitx.InputContext
member="{MouseEvent,ProcessKeyEvent}"
peer=(label=unconfined),
dbus send
bus={fcitx,session}
path=/inputcontext_[0-9]*
interface=org.freedesktop.DBus.Properties
member=GetAll
peer=(label=unconfined),
dbus (send)
bus=session
path=/org/a11y/bus
interface=org.a11y.Bus
member=GetAddress
peer=(label=unconfined),
dbus (send)
bus=session
path=/org/a11y/bus
interface=org.freedesktop.DBus.Properties
member=Get{,All}
peer=(label=unconfined),
dbus (receive, send)
bus=accessibility
path=/org/a11y/atspi/**
peer=(label=unconfined),
dbus (send)
bus=system
path=/org/freedesktop/Accounts
interface=org.freedesktop.DBus.Introspectable
member=Introspect
peer=(label=unconfined),
dbus (send)
bus=system
path=/org/freedesktop/Accounts
interface=org.freedesktop.Accounts
member=FindUserById
peer=(label=unconfined),
dbus (receive, send)
bus=system
path=/org/freedesktop/Accounts/User[0-9]*
interface=org.freedesktop.DBus.Properties
member={Get,PropertiesChanged}
peer=(label=unconfined),
dbus (send)
bus=session
interface=org.gtk.Actions
member=Changed
peer=(name=org.freedesktop.DBus, label=unconfined),
dbus (receive)
bus=session
interface=org.gtk.Actions
member={Activate,DescribeAll,SetState}
peer=(label=unconfined),
dbus (receive)
bus=session
interface=org.gtk.Menus
member={Start,End}
peer=(label=unconfined),
dbus (send)
bus=session
interface=org.gtk.Menus
member=Changed
peer=(name=org.freedesktop.DBus, label=unconfined),
dbus (send)
bus=session
path="/com/ubuntu/MenuRegistrar"
interface="com.ubuntu.MenuRegistrar"
member="{Register,Unregister}{App,Surface}Menu"
peer=(label=unconfined),
}
xattrs can contain NULL characters in their values which means we can
not user regular NULL transitions to separate values. To fix this
use out of band transition instead.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Currently the NULL character is used as an out of band transition
for string/path elements. This works for them as the NULL character
is not valid for this data. However this does not work for binary
data that can contain a NULL character.
So far we have only dealt with fixed length fields of binary data
making the NULL separator either unnecessary.
However binary data like in the xattr match and mount data field are
variable length and can contain NULL characters. To deal with this
add the ability to specify out of band transitions, that can only
be triggered by code not input data.
The out of band transition can be used to separate variable length
data fields just as the NULL transition has been used to separate
variable length strings.
In the compressed hfa out of band transitions are expressed as a
negative offset from the states base. This leaves us room to expand
the character match range in the future if desired and on average
makes the range between the out of band transition and the input
transitions smaller than would be had if the out of band transition
had been stored after the valid input transitions.
Out of band transitions in the dfa will not break old kernels
that don't know about them, but they won't be able to trigger
the out of band transition match. So they should not be used unless
the kernel indicates that it supports them.
It should be noted that this patch only adds support for a single
out of band transition. If multiple out of band transitions are
required. It is trivial to extend.
- Add a tag indicating support in the kernel
- add a oob max range field to the dfa header so the kernel knows
what the max range that needs verifying is.
- extend oob generation fns to generate oob based on value instead
of a fixed -1.
Signed-off-by: John Johansen <john.johansen@canonical.com>
As a step in preparing for out of band transitions and double walk
transitions rework the backend from using a char index to a class
with an larger range than char.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The length of a xmatch is used to prioritize multiple profiles that
match the same path, with the intent that the more specific match wins.
Currently, the length of a xmatch is computed by the position of the
first regex character.
While trying to work around issues with no_new_privs by combining
profiles, we noticed that the xmatch length computation doesn't work as
expected for multiple regexs. Consider the following two profiles:
profile all /** { }
profile bins /{,usr/,usr/local/}bin/** { }
xmatch_len is currently computed as "1" for both profiles, even though
"bins" is clearly more specific.
When determining the length of a regex, compute the smallest possible
match and use that for xmatch priority instead of the position of the
first regex character.
Elaborate in class comment of firstpos, lastpos, followpos, and nullable
fields beyond just referencing the Dragon book. Also add the section of
the book these are explained in.
The shared node type will be used in the future to add new capabilities
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Steve Beattie <steve@nxnw.org>
We need to rework permission type mapping to nodesets, which means we
need to move the nodeset computations earlier in the dfa creation
processes, instead of a post step of follow(), so move the nodeset
into expr-tree
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Steve Beattie <steve@nxnw.org>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Make the accept information dump output be in hexidecimal like the
other dumps so its easier to reference between them.
Signed-off-by: John Johansen <john.johansen@canonical.com>
This is patch tries to reduce the number of dynamic_cast<>s needed
during normalization by pushing the operations of normalize_tree()
into the expr-tree classes themselves rather than perform it as
an external function. This eliminates the need for dynamic_cast<>
checks on the current object under inspection and reduces the number
of checks needing to be performed on child Nodes as well.
In non-strict benchmarking, doing the dynamic_cast<> reduction
for just the tree normalization operation resulted in a ~10-15%
improvement in overall time on a couple of different hosts (amd64,
armel), as measured against apparmor_parser -Q. Valgrind's callgrind
tool indicated a reduction in the number of calls to dynamic_cast<>
on the tst/simple_tests/vars/dbus_vars_9.sd test profile from ~19
million calls to ~12 million.
In comparisons with dumped expr trees over both the entire
tst/simple_tests/ tree and from 1000 randomly generated profiles via
stress.rb, the generated trees were identical.
Patch history:
v1: initial version of patch
v2: update patch to take into account the infinite loop fix in
trunk rev 1975 and refresh against current code.
v3: no change
Signed-off-by: Steve Beattie <steve@nxnw.org>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Acked-by: John Johansen <john.johansen@canonical.com>
non-accepting, and have the proto-state use them.
To reduce memory overhead each set gains its own "cache" that make sure
there is only a single instance of each NodeSet generated. And since
we have a cache abstraction, move relavent stats into it.
Also refactor code slightly to make caches and work_queue etc, DFA member
variables instead of passing them as parameters.
The split + caching results in a small reduction in memory use as the
cost of ProtoState + Caching is less than the redundancy that is eliminated.
However this results in a small decrease in performance.
Sorry I know this really should have been split into multiple patches
but the patch evolved and I got lazy and decided to just not bother
splitting it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Create a new ProtoState class that will encapsulate the split, but for
this patch it will just contain what was done previously with NodeSet
Signed-off-by: John Johansen <john.johansen@canonical.com>
Split hfa into hfa and compressed_hfa files. The hfa portion focuses on
creating an manipulating hfas, while compressed_hfa is used for creating
compressed hfas that can be used/reused at run time with much less memory
usage than the full blown hfa.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
Start of splitting regexp.y into logical components instead of the mess
it is today. Split out the expr-tree and parsing components from regexp.y
int expr-tree.x and parse.y and since regexp.y no longer does parsing
rename it to hfa.cc
Some code cleanups snuck their way into this patch and since I am to
lazy to redo it, I have left them in.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>