From 54be548d2bc9bce726c36a7cc4217e0925a8ccd1 Mon Sep 17 00:00:00 2001 From: Steve Beattie Date: Tue, 7 Nov 2017 08:14:25 -0800 Subject: [PATCH] CompilerImprovements: initial markdown conversion --- CompilerImprovements.md | 96 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 CompilerImprovements.md diff --git a/CompilerImprovements.md b/CompilerImprovements.md new file mode 100644 index 0000000..097208e --- /dev/null +++ b/CompilerImprovements.md @@ -0,0 +1,96 @@ +This pages tracks improvements made to the AppArmor compiler +performance over each release. + +Areas for improvement +===================== + +1. lexer - Front end duplicate include elimination + - per profile cache which files have already been included so they don't need to be processed multiple times + + + +2. Include parse caching + - cache include files post parsing so that they don't need to be re-parsed + + + +3. Rule duplicate removal (only done on files) + - almost unneeded if fix for 1 is done + - requires: to full replace (so code can be dropped) need 1 and 4 + +4. aare -> pcre convertion + - add native parsing of aare to the background eliminating this step + +5. early tree node merging + - as part of tree creation instead of blindly creating new nodes, merge into existing tree as parsing + - requires: 3 + +6. tree simplification + - store alt nodes is sorted vector + - store cat nodes in vector + - do tree factoring on vectors, does not require tree rewrites or left/right manipulations + - merge char nodes into charset nodes + - replace char nodes with charset node to remove duplicate code + +7. dfa creation - currently creates sets and removes duplicates which means a lot of creation and freeing + - push anode and nnode split into expr tree for nullable, firstpos, lastpos and follow calculations + - removes need to split sets, reducing allocation of set just to split it apart and lookup its constituent anode and nnode sets + - push anode and nnode caching into expr tree so it can be used by nullable, firstpos, lastpos + - add nnode vec merging, and move nnode to nodevec all the way through + - eliminates need to create nnode set to be immediately converted to nnode vec + - early accept node merging + - Instead of creating accept node sets do accept node merging and use single accept node to represent merged state + - nodevec merging cache + - merge operation cache provides duplicate elimination so that many node sets don't need to be recreated and freed after being lookuped at the cache level + - Eliminate nnode cache? + - its possible that between the nodevec merge cache and final combined nnode, anode cache the dedicated nnode cache is no longer benifical + +8. comb compression + - change to sliding window algorithm, to reduce set of slot comparisons done + - Note: diff compression also reduces the number of slot comparisons being done by reducing the number of transition + +9. Fix diff compression + - Current diff compression has a bug that needs to be found and fixed + +10. file cache + - cache files that have already been read so they don't need to be reread + - requires multiple profiles be processed by one compiler call to be effective (especially if duplicate include elimination is in effect) + +11. parallel compile + - compile individual profiles in parallel - separating the compile at the profile level should be fairly easy + - requires removal of global state + - compile profile components in parallel (see partial compiles) + - once partial compiles are in place it should be easy to compile these components separately + - pipeline profile construction. + - It is questionable whether a fine grained threading is worth while as the locking overhead is probably rather high but the different stages of the dfa constructions could be pipelined. + - tree optimization -> DFA construction -> minimization -> diff-encode -> comb compression + - since each stage is separate but dependent, separate threads could work on a given thread + +12. partial compiles - shared/precompiled + - share and potentially cache partial compile components + - includes are a natural boundary to precompile + - Requires dfa set operations + +Improvements per Release +======================== + +- 2.1 DFA introduced +- 2.3 tree factoring +- 2.4 minimization +- 2.5 +- 2.6 group info on state, singleton epsnode, remove ref count, +- 2.7 split dfa code - no major improvements +- 2.8 diff encoding +- 2.9 remove node hashing, fixing full minimization +- 2.10 early accept node grouping, and removal of accept alt node trees + +Performance Tests +================= + +small | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8 + +medium | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8 + +large | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8 + +huge | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8