are computed and stored in a map, that is not cleaned up. This means that the labeling
is retained across different dfas.
Move the labeling into expr node as this takes less memory than using a map and will
also separates node labeling so its per dfa instead of global. In addition this means
the labeling is cleanedup/freed when the expr tree is freed without any extra work.
each expression tree node and then used as input to create the dfa states.
Currently they are not being freed until the nodes are destroyed, but the information
is no longer needed once the dfa has been created. Cleaning them up early reduces
peak memory usage.
* a non-include related syntax error (errors/modefail.sd)
* multiple successful includes followed by a failed include
(errors/multi_include.sd)
It also fixes two issues with the parser's line counting:
* the count began at 0 (demonstrated by the first testcase's error
being reporting on one line less than it should be), and
* an extra line increment when includes were detected (demonstrated
by the second testcase's error being reported at a line beyond the
correct linenumber.
The existing testcases did not catch these because they were all
based on the first include in the file failing and so the start of
the count from 0 counteracted the extra counted line.
tests. I don't like the solution because it exposes a data structure
definition outside of the only file that should know it's layout.
Also, fixed the Makefile to fail the build when one of the unit test
programs fails. :-(
Instead of updating the profile name, allow a profile to have multiple
alternate names. Aliases are now added as alternate names and matched
through the xmatch dfa.
Alias was broken because it when an alias was made the old path was completely
removed and there was no way to specify it. Update it so aliases just add
an new duplicate rule instead.
imediately after the current partition being considered, instead of
at the back of the parition list. This does two things, it makes it
more likely the data is in cache, and it also in general results in
more partitions being created in a single pass.
not an algorithmic improvement. It does the same basic algorithm of
test until it can insert the data, but instead of only tracking the
first free entry (and recomputing it each pass). It tracks all
free entries reducing the number of comparisons done and the table
grows in size.
This may actually result in a small loss on small tables, but is a win
for larger tables.
Update the hash calculation to guarentee that states with a different
number of transition entries will be placed in seperate partitions.
This will allow for a better character transition based state comparison.
Add basic Hopcroft based dfa minimization. It currently does a simple
straight state comparison that can be quadratic in time to split partitions.
This is offset however by using hashing to setup the initial partitions so
that the number of states within a partition are relative few.
The hashing of states for initial partition setup is linear in time. This
means the closer the initial partition set is to the final set, the closer
the algorithm is to completing in a linear time. The hashing works as
follows: For each state we know the number of transitions that are not
the default transition. For each of of these we hash the set of letters
it can transition on using a simple djb2 hash algorithm. This creates
a unique hash based on the number of transitions and the input it can
transition on. If a state does not have the same hash we know it can not
the same as another because it either has a different number of transitions
or or transitions on a different set.
To further distiguish states, the number of transitions of each transitions
target state are added into the hash. This serves to further distiguish
states as a transition to a state with a different number of transitions
can not possibly be reduced to an equivalent state.
A further distinction of states is made for accepting states in that
we know each state with a unique set of accept permissions must be in
its own partition to ensure the unique accept permissions are in the
final dfa.
The unreachable state removal is a basic walk of the dfa from the start
state marking all states that are reached. It then sweeps any state not
reached away. This does not do dead state removal where a non accepting
state gets into a loop that will never result in an accepting state.