5. Regex to NFA to DFA conversions
Thompson's algorithm converts a regular expression to an NFA with epsilon-transitions. Subset construction then converts an NFA to an equivalent DFA.
Thompson's algorithm applies the following rules recursively:
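As an illustration, the recursive structure can be sketched in Python. This is a minimal sketch of one common textbook presentation of Thompson's construction and may differ in detail from the rules used in the course (for example, in whether concatenation merges states or links them with an epsilon-transition). The tuple-based regex representation, the `EPS` label, and the names `thompson`, `eps_closure`, and `accepts` are assumptions made for this example.

```python
from itertools import count

EPS = None  # label for epsilon-transitions (a convention of this sketch)


def thompson(regex):
    """Build an epsilon-NFA (start, accept, delta) from a regex AST.

    Hypothetical AST: ("eps",), ("sym", c), ("cat", e1, e2),
    ("alt", e1, e2), ("star", e).
    """
    fresh = count()  # supplies fresh state numbers
    delta = {}       # (state, label) -> set of successor states

    def edge(src, label, dst):
        delta.setdefault((src, label), set()).add(dst)

    def build(e):
        kind = e[0]
        if kind in ("eps", "sym"):
            # Base cases: two states joined by one labelled edge.
            start, accept = next(fresh), next(fresh)
            edge(start, EPS if kind == "eps" else e[1], accept)
            return start, accept
        if kind == "cat":
            # Link the accept state of e1 to the start state of e2.
            s1, a1 = build(e[1])
            s2, a2 = build(e[2])
            edge(a1, EPS, s2)
            return s1, a2
        if kind == "alt":
            # New start branches into both operands; both rejoin at a new accept.
            s1, a1 = build(e[1])
            s2, a2 = build(e[2])
            start, accept = next(fresh), next(fresh)
            edge(start, EPS, s1)
            edge(start, EPS, s2)
            edge(a1, EPS, accept)
            edge(a2, EPS, accept)
            return start, accept
        if kind == "star":
            # Either skip the operand entirely, or run it and optionally repeat.
            s1, a1 = build(e[1])
            start, accept = next(fresh), next(fresh)
            edge(start, EPS, s1)
            edge(start, EPS, accept)
            edge(a1, EPS, s1)
            edge(a1, EPS, accept)
            return start, accept
        raise ValueError(f"unknown regex node: {kind!r}")

    start, accept = build(regex)
    return start, accept, delta


def eps_closure(states, delta):
    """All states reachable from `states` using only epsilon-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(nfa, word):
    """Simulate the epsilon-NFA on `word` by tracking the set of live states."""
    start, accept, delta = nfa
    current = eps_closure({start}, delta)
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(moved, delta)
    return accept in current


# (0 ∪ 1)*1 : words over {0, 1} that end in 1
ends_in_one = ("cat", ("star", ("alt", ("sym", "0"), ("sym", "1"))), ("sym", "1"))
nfa = thompson(ends_in_one)
```

Every fragment built here has a start state with no incoming edges and an accept state with no outgoing edges, which is what makes it safe to glue fragments together with epsilon-transitions.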
Exercises
5.1. Convert one of the following regular expressions to an NFA using Thompson's algorithm and then to a DFA using subset construction.
- $ (1 \cup \varepsilon)(0^*1)^*0^* $
- $ (0\cup 01) \cup 1(10^*)^* \cup \varepsilon $
- $ (((00 \cup 11)^*1)^*0)^* $
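To check a hand-worked answer, subset construction can be sketched as below. This is a minimal sketch, not a reference implementation: the dictionary encoding of the transition relation, the `EPS` label, the function names, and the small example NFA are all assumptions made for illustration. The example NFA is not derived from any of the expressions above.

```python
from collections import deque

EPS = None  # label for epsilon-transitions (a convention of this sketch)


def eps_closure(states, delta):
    """Frozen set of states reachable from `states` via epsilon-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def subset_construction(start, accepting, delta, alphabet):
    """Return (dfa_start, dfa_accepting, dfa_delta); DFA states are frozensets."""
    d_start = eps_closure({start}, delta)
    d_delta, d_accepting = {}, set()
    queue, seen = deque([d_start]), {d_start}
    while queue:
        subset = queue.popleft()
        if subset & accepting:
            d_accepting.add(subset)  # accepting if it contains an accepting NFA state
        for a in alphabet:
            moved = set()
            for q in subset:
                moved |= delta.get((q, a), set())
            target = eps_closure(moved, delta)  # may be the empty (sink) subset
            d_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_start, d_accepting, d_delta


def run_dfa(dfa, word):
    """Run the constructed DFA on `word` and report acceptance."""
    state, accepting, d_delta = dfa
    for ch in word:
        state = d_delta[(state, ch)]
    return state in accepting


# Hypothetical example NFA: 0 --eps--> 1, 1 --0--> 1, 1 --1--> {1, 2}; state 2 accepts.
example_delta = {(0, EPS): {1}, (1, "0"): {1}, (1, "1"): {1, 2}}
dfa = subset_construction(0, {2}, example_delta, "01")
dfa_states = {subset for (subset, _symbol) in dfa[2]}
```

Only subsets reachable from the initial subset are generated, so the resulting DFA is usually much smaller than the full power set of NFA states.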
5.2. What would happen if the construction step for $ e^* $ in Thompson's algorithm were defined as follows? What would go wrong? Find a counterexample word for each of them.
5.3.1. Which regular expressions can be converted to NFAs without using epsilon-transitions (and without any NFA to DFA conversion)? Start with simple cases, such as concatenations of symbols and unions of symbols, then try more complex ones. Can this approach be turned into a recursive algorithm like Thompson's algorithm?
5.3.2. Consider an NFA as an interface that provides the following methods:
- `initState(): State`: returns the initial state of the NFA.
- `endState(): State`: returns a distinguished state of the NFA.
- `generatesEpsilon(): bool`: returns whether or not the initial state is a final state.
- `toggleEpsilon(final: bool): NFA`: returns the same NFA, but makes `initState()` final or non-final based on the parameter `final`.
- `merge(nfa2: NFA): NFA`: returns an NFA constructed from this one and `nfa2`, where this NFA's `endState()` and `nfa2.initState()` are merged together into a single state; the new end state will be `nfa2.endState()` unless it has been merged, in which case it will be the new merged state.
- `adjoin(nfa2: NFA): NFA`: returns an NFA constructed from this one and `nfa2` by merging the two `initState()`s together and the two `endState()`s together into 2 new states that will be the init state and end state of the new NFA.
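For reference, the interface can be written down as a typed skeleton. This is only a sketch of the signatures listed above, using Python's `typing.Protocol`; the empty `State` placeholder type is an assumption, and no behaviour is implemented, since designing the algorithm on top of these operations is the point of the exercise.

```python
from __future__ import annotations

from typing import Protocol


class State(Protocol):
    """Opaque handle for an NFA state (hypothetical placeholder type)."""


class NFA(Protocol):
    """Signatures only; method names are kept exactly as in the exercise."""

    def initState(self) -> State:
        """Initial state of the NFA."""
        ...

    def endState(self) -> State:
        """A distinguished state of the NFA."""
        ...

    def generatesEpsilon(self) -> bool:
        """Whether the initial state is a final state."""
        ...

    def toggleEpsilon(self, final: bool) -> NFA:
        """Same NFA, with initState() made final or non-final."""
        ...

    def merge(self, nfa2: NFA) -> NFA:
        """Merge this NFA's endState() with nfa2.initState()."""
        ...

    def adjoin(self, nfa2: NFA) -> NFA:
        """Merge both initState()s together and both endState()s together."""
        ...
```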
Using only these operations to create NFAs from other NFAs, try to find a recursive algorithm to convert a regular expression to an NFA without epsilon-transitions. Discuss base cases, construction steps and justify them. It may be useful to consider some invariants of the algorithm (one such invariant is, of course, the lack of epsilon-transitions, which explains the names of the `generatesEpsilon` and `toggleEpsilon` methods).