5. Regex to NFA to DFA conversions

Thompson's algorithm converts a regular expression into an equivalent NFA with epsilon-transitions. Subset construction then converts an NFA into an equivalent DFA.
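As an illustration of the second step, here is a minimal sketch of subset construction in Python. The representation is an assumption made for this example only: the NFA is a dictionary `delta` mapping `(state, label)` to a set of states, with `None` as the label of an epsilon-transition. The names `epsilon_closure`, `subset_construction` and `dfa_accepts` are hypothetical helpers, not part of any library.

```python
from collections import deque

def epsilon_closure(states, delta):
    """All states reachable from `states` using only epsilon-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, None), ()):  # None labels an epsilon-transition
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(start, finals, delta, alphabet):
    """Build a DFA whose states are sets (frozensets) of NFA states."""
    d_start = epsilon_closure({start}, delta)
    d_delta, queue, seen = {}, deque([d_start]), {d_start}
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # States reachable from S by reading a, then closed under epsilon.
            moved = {r for q in S for r in delta.get((q, a), ())}
            T = epsilon_closure(moved, delta)
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # A DFA state is final if it contains at least one final NFA state.
    d_finals = {S for S in seen if S & finals}
    return d_start, d_finals, d_delta

def dfa_accepts(dfa, word):
    state, finals, delta = dfa
    for ch in word:
        state = delta[(state, ch)]
    return state in finals

# Hand-built NFA for 0*1: state 0 loops on '0', has an epsilon-move to 1,
# and state 1 reads '1' to reach the final state 2.
nfa_delta = {(0, "0"): {0}, (0, None): {1}, (1, "1"): {2}}
dfa = subset_construction(0, {2}, nfa_delta, "01")
print(dfa_accepts(dfa, "001"))  # True
print(dfa_accepts(dfa, "10"))   # False
```

The resulting DFA is complete: the empty set of NFA states acts as the trap state, so every (state, symbol) pair has a successor.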

Thompson's algorithm applies the following rules recursively:
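One common textbook formulation of these rules has two base cases (the empty word and a single symbol) and three construction steps (concatenation, union and star). The sketch below follows that formulation, which may differ in detail from the variant used in this course. The regex encoding as nested tuples, the `EPS` marker and the function names `thompson` and `accepts` are all assumptions made for this example.

```python
from itertools import count

EPS = None  # hypothetical marker for an epsilon-transition

def thompson(node, trans, fresh):
    """Return (start, accept) of the fragment for `node`; edges are appended to `trans`."""
    kind = node[0]
    if kind in ("eps", "sym"):  # base cases: the empty word or one symbol
        s, f = next(fresh), next(fresh)
        trans.append((s, EPS if kind == "eps" else node[1], f))
        return s, f
    if kind == "cat":  # e1 e2: link accept of e1 to start of e2
        s1, f1 = thompson(node[1], trans, fresh)
        s2, f2 = thompson(node[2], trans, fresh)
        trans.append((f1, EPS, s2))
        return s1, f2
    if kind == "alt":  # e1 ∪ e2: new start and accept around both fragments
        s1, f1 = thompson(node[1], trans, fresh)
        s2, f2 = thompson(node[2], trans, fresh)
        s, f = next(fresh), next(fresh)
        trans.extend([(s, EPS, s1), (s, EPS, s2), (f1, EPS, f), (f2, EPS, f)])
        return s, f
    if kind == "star":  # e*: allow skipping e and repeating e
        s1, f1 = thompson(node[1], trans, fresh)
        s, f = next(fresh), next(fresh)
        trans.extend([(s, EPS, s1), (f1, EPS, s1), (f1, EPS, f), (s, EPS, f)])
        return s, f
    raise ValueError(f"unknown node kind: {kind!r}")

def accepts(node, word):
    """Build the NFA for `node` and simulate it on `word`."""
    trans = []
    start, accept = thompson(node, trans, count())

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in trans:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({d for s, l, d in trans if s in current and l == ch})
    return accept in current

# (0 ∪ 1)*1, written as a nested tuple
regex = ("cat", ("star", ("alt", ("sym", "0"), ("sym", "1"))), ("sym", "1"))
print(accepts(regex, "0101"))  # True
print(accepts(regex, "10"))    # False
```

Each fragment has exactly one start state and one accept state, which is what lets the construction steps be composed recursively.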

Exercises

5.1. Convert one of the following regular expressions to an NFA using Thompson's algorithm and then to a DFA using subset construction.

  • $ (1 \cup \varepsilon)(0^*1)^*0^* $
  • $ (0\cup 01) \cup 1(10^*)^* \cup \varepsilon $
  • $ (((00 \cup 11)^*1)^*0)^* $

5.2. What would happen if the construction step for $ e^* $ in Thompson's algorithm were defined as follows? What would go wrong? Find a counterexample word for each of them.

5.3.1. Which regular expressions can be converted to NFAs without using epsilon-transitions (and without involving any NFA-to-DFA conversion)? Start with simple cases, such as concatenations of symbols and unions of symbols, then try more complex ones. Can this approach be turned into a recursive algorithm like Thompson's algorithm?

5.3.2. Consider an NFA as an interface that provides the following methods:

  • initState(): State returns the initial state of the NFA
  • endState(): State returns a distinguished state of the NFA
  • generatesEpsilon(): bool returns whether or not the initial state is a final state
  • toggleEpsilon(final: bool): NFA returns the same NFA, but makes initState() final or non-final based on the parameter final
  • merge(nfa2: NFA): NFA returns an NFA constructed from this one and nfa2, where this NFA's endState() and nfa2's initState() are merged into a single state; the new end state will be nfa2.endState() unless that state has been merged, in which case it will be the new merged state.
  • adjoin(nfa2: NFA): NFA returns an NFA constructed from this one and nfa2 by merging the two initState()s together and the two endState()s together, giving two new states that will be the init state and end state of the new NFA.
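The interface above can be written down as a typed skeleton, for instance as a Python `Protocol`. This only restates the signatures listed above; the `State` type is left opaque because the exercise does not fix its representation, and no method bodies are implied.

```python
from __future__ import annotations
from typing import Protocol

class State(Protocol):
    """Opaque handle for an NFA state; its representation is left open."""

class NFA(Protocol):
    def initState(self) -> State: ...
    def endState(self) -> State: ...
    def generatesEpsilon(self) -> bool: ...
    def toggleEpsilon(self, final: bool) -> NFA: ...
    def merge(self, nfa2: NFA) -> NFA: ...
    def adjoin(self, nfa2: NFA) -> NFA: ...
```

Any concrete class that provides these six methods satisfies the protocol, so an algorithm written against `NFA` does not depend on how states and transitions are stored.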

Using only these operations to create NFAs from other NFAs, try to find a recursive algorithm that converts a regular expression to an NFA without epsilon-transitions. Discuss the base cases and the construction steps, and justify them. It may be useful to consider some invariants of the algorithm (one such invariant is, of course, the lack of epsilon-transitions, which explains the names of the generatesEpsilon and toggleEpsilon methods).