====== 6. Dfa to Regex conversions ======

==== 6.1. State elimination ====

Consider the following DFAs:

^ **DFA1** ^ **DFA2** ^
|{{ :lfa:screenshot_2021-11-04_at_15.33.10.png?400 |}}| {{ :lfa:2022:lfa2022_lab5_ex2_4_cerinta.png?300 |}} |

Convert the given DFAs to a Regex (using the state-elimination strategy). Hint: is it easier to apply the conversion on another DFA?

It is recommended to apply the state elimination algorithm on the minimal DFA.

{{ :lfa:2022:lfa2022_lab6_mindfa2_1.png?300 |}}

**Step 1:** If there are multiple final states, make them non-final, create a new final state and add an ε-transition from each of them to it. If a final state has any transitions going out of it, create a new final state and add an ε-transition to it. If the initial state has any transitions going into it, create a new initial state and add an ε-transition from it to the old initial state.

{{ :lfa:2022:lfa2022_lab6_mindfa2_2.png?300 |}}

**Step 2:** Pick a state that is neither initial nor final. For each way to reach one of its direct successors from one of its direct predecessors, add a new transition. Then remove the state.

  - Pick **state 1**:
    - in:
      * 0 --(ε)--> 1
      * 2 --(0)--> 1
    - out:
      * 1 --(ε)--> 4
      * 1 --(1)--> 2
    - loop:
      * 1 --(0)--> 1
    - Add new transitions:
      * 2 --(00*)--> 4
      * 2 --(00*1)--> 2
      * 0 --(0*)--> 4
      * 0 --(0*1)--> 2
      * {{ :lfa:2022:lfa2022_lab6_mindfa2_3.png?300 |}}
  - Pick **state 3**:
    - Because state 3 has no direct successor, it can simply be removed without adding new transitions.
  - Pick **state 2**:
    - in:
      * 0 --(0*1)--> 2
    - out:
      * 2 --(ε U 00*)--> 4
    - loop:
      * 2 --(00*1)--> 2
    - Add new transitions:
      * 0 --(0*1(00*1)*(ε U 00*))--> 4
      * This transition can be written more simply: 0 --( (0*1)(0+1)*0*)--> 4 (here 0+ means one or more 0s)
      * Together with the transition 0 --(0*)--> 4 added when eliminating state 1, the resulting regex is 0* U (0*1)(0+1)*0*
      * {{ :lfa:2022:lfa2022_lab6_mindfa2_4.png?300 |}}

**Step 3:** Repeat step 2 and stop when only the initial state and the final state are left.
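The elimination rule from Step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (''star'', ''union'', ''eliminate'') are hypothetical, and the transition table is a reconstruction of the automaton after Step 1 from the in/out/loop lists above, where state 3 is assumed to be a sink reached from state 2 on 1. The result is compared with the hand-derived regex (the direct transition 0 --(0*)--> 4 together with the path through state 2) on all binary words up to length 8.

```python
import re
from itertools import product

def star(r):
    """Kleene star of a regex string; a missing or ε loop contributes nothing."""
    return f"({r})*" if r else ""

def union(a, b):
    """Union of two regex strings; None means 'no transition yet'."""
    return b if a is None else f"({a}|{b})"

def eliminate(edges, state):
    """Remove `state`, rerouting every predecessor -> successor path through it."""
    loop = star(edges.get((state, state)))
    ins = [(p, r) for (p, q), r in edges.items() if q == state and p != state]
    outs = [(q, r) for (p, q), r in edges.items() if p == state and q != state]
    kept = {k: r for k, r in edges.items() if state not in k}
    for p, r_in in ins:
        for q, r_out in outs:
            # in-label, then any number of loops, then out-label
            kept[(p, q)] = union(kept.get((p, q)), r_in + loop + r_out)
    return kept

# Assumed automaton after Step 1 ("" stands for ε; 0 is initial, 4 is final).
edges = {
    (0, 1): "", (1, 1): "0", (1, 2): "1", (1, 4): "",
    (2, 1): "0", (2, 4): "", (2, 3): "1", (3, 3): "(0|1)",
}
for state in (1, 3, 2):          # same elimination order as in the text
    edges = eliminate(edges, state)
regex = edges[(0, 4)]

# Compare with the hand-derived result on all words of length <= 8.
handwritten = "0*|0*1(00*1)*0*"
for n in range(9):
    for w in map("".join, product("01", repeat=n)):
        assert bool(re.fullmatch(regex, w)) == bool(re.fullmatch(handwritten, w))
print(regex)
```

The generated regex is more heavily parenthesised than the hand-written one, since the sketch never simplifies; the bounded check only gives evidence that the two agree, not a proof.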
  - first, we apply the minimisation algorithm
    * {{ :lfa:2022:lab6_1_dfa1_1.png?500 |}}
  - we add the new final and initial states
    * {{ :lfa:2022:lab6_1_dfa1_2.png?500 |}}
  - we eliminate the state 8
    - as it has no outgoing transitions, we can just remove it
      * {{ :lfa:2022:lab6_1_dfa1_3.png?500 |}}
  - we eliminate the state 0,1
    - in:
      * init --(ε)--> 0,1
    - out:
      * 0,1 --(ε)--> fin
      * 0,1 --(1)--> 2
    - loop:
      * 0,1 --(0)--> 0,1
    - we add:
      * init --(0*)--> fin
      * init --(0*1)--> 2
      * {{ :lfa:2022:lab6_1_dfa1_4.png?500 |}}
  - we eliminate the state 2
    - in:
      * init --(0*1)--> 2
      * 3 --(1)--> 2
    - out:
      * 2 --(ε)--> fin
      * 2 --(0)--> 3
      * 2 --(1)--> 4
    - we add:
      * init --(0*1)--> fin
      * init --(0*10)--> 3
      * init --(0*11)--> 4
      * 3 --(1)--> fin
      * 3 --(10)--> 3
      * 3 --(11)--> 4
      * {{ :lfa:2022:lab6_1_dfa1_5.png?500 |}}
  - we eliminate the state 4
    - in:
      * init --(0*11)--> 4
      * 3 --(11)--> 4
      * 6,7 --(1)--> 4
    - out:
      * 4 --(0)--> 6,7
    - we add:
      * init --(0*110)--> 6,7
      * 3 --(110)--> 6,7
      * 6,7 --(10)--> 6,7
      * {{ :lfa:2022:lab6_1_dfa1_6.png?500 |}}
  - we eliminate the state 6,7
    - in:
      * init --(0*110)--> 6,7
      * 3 --(110)--> 6,7
      * 5 --(1)--> 6,7
    - out:
      * 6,7 --(0)--> 5
      * 6,7 --(ε)--> fin
    - loop:
      * 6,7 --(10)--> 6,7
    - we add:
      * init --(0*110(10)*0)--> 5
      * init --(0*110(10)*)--> fin
      * 3 --(110(10)*0)--> 5
      * 3 --(110(10)*)--> fin
      * 5 --(1(10)*0)--> 5
      * 5 --(1(10)*)--> fin
      * {{ :lfa:2022:lab6_1_dfa1_7.png?500 |}}
  - we eliminate the state 3
    - in:
      * init --(0*10)--> 3
    - out:
      * 3 --(ε|1|110(10)*)--> fin
      * 3 --(0|110(10)*0)--> 5
    - loop:
      * 3 --(10)--> 3
    - we add:
      * init --(0*10(10)*(ε|1|110(10)*) )--> fin
      * init --(0*10(10)*(0|110(10)*0) )--> 5
      * {{ :lfa:2022:lab6_1_dfa1_8.png?500 |}}
  - we eliminate the state 5
    - in:
      * init --(0*110(10)*0|0*10(10)*(0|110(10)*0) )--> 5
    - out:
      * 5 --(1(10)*)--> fin
    - loop:
      * 5 --(1(10)*0)--> 5
    - we add:
      * init --( (0*110(10)*0|0*10(10)*(0|110(10)*0) ) (1(10)*0)*1(10)*)--> fin
  - the final regex is the union of all init --> fin transitions: '' 0*|0*1|0*110(10)*|0*10(10)*(ε|1|110(10)*)|(0*110(10)*0|0*10(10)*(0|110(10)*0))(1(10)*0)*1(10)*''

==== 6.2. Brzozowski's algebraic method ====

[[https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)|Janusz Brzozowski]] worked out a very elegant (and more computationally efficient) way to convert Dfa's to Regexes. It relies on an observation called **Arden's Lemma**:

=== Arden's Lemma ===

**Proposition (Arden's Lemma).** Let $math[X, A] and $math[B] be languages, such that $math[X = A\cdot X \cup B]. Then $math[X = A^*B].

  * Note that we can apply Arden's lemma in various settings, for instance, let $math[e_A, e_B] be regexes such that $math[X = L(e_A)\cdot X \cup L(e_B)]. Then $math[X = L(e_A^*e_B)].
  * Strictly speaking, the conclusion requires $math[\epsilon \notin A]; otherwise $math[A^*B] is only the smallest solution of the equation. In the conversion below this condition always holds, because every coefficient starts with an alphabet symbol.

=== Dfa to regex conversion ===

For each state $math[q], build an equation of the form: $math[q = c_1 q_1 \cup c_2 q_2 \cup \ldots \cup c_n q_n], such that: $math[\delta(q,c_i) = q_i]. Here $math[c_i\in\Sigma], thus $math[q_i] are the $math[c_i]-successors of $math[q]. Additionally, if $math[q] is a final state, add an $math[\epsilon]: $math[q = c_1 q_1 \cup c_2 q_2 \cup \ldots \cup c_n q_n \cup \epsilon].

^ ^ ^
| **Example** | {{ :lfa:2022:screenshot_2022-11-09_at_09.35.36.png?200 |}} |
| Consider the Dfa from the adjacent figure. The equations that we get are: | ::: |
| $math[q_1 = A q_1 \cup B q_2] | ::: |
| $math[q_2 = (A\cup B)q_2 \cup \epsilon] | ::: |

=== Equations ===

Equations are expressions containing **regexes** together with **state variables** (like $math[q_1]), which are the unknowns. An equation of the form $math[q = e\cdot q'] signifies that $math[L(A_q) = L(e) L(A_{q'})], where $math[L(A_q), L(A_{q'})] are the languages accepted by our Dfa, **starting from states $math[q] and $math[q'], respectively**.

=== Reducing the system of equations ===

We can choose **any** equation **except** the one corresponding to the initial state and eliminate it: we solve it for its own variable by exploiting **Arden's Lemma**, then substitute the solution into the remaining equations. The equation of the initial state is solved last, and its solution is the regex of the Dfa.

  * the solution to any equation of the form $math[q = e\cdot q \cup e'] is $math[q = e^*e'].
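Arden's Lemma can be sanity-checked on finite approximations of languages. The sketch below is an illustration, not a proof: the helper names ''concat'' and ''star'' and the sample languages are assumptions made for this example. It represents a language by its set of words of length at most ''n'', verifies that $math[A^*B] satisfies $math[X = A\cdot X \cup B], and also shows that a second solution appears once $math[\epsilon \in A].

```python
from itertools import product

def concat(L1, L2, n):
    """Concatenation of two languages, truncated to words of length <= n."""
    return {u + v for u in L1 for v in L2 if len(u + v) <= n}

def star(L, n):
    """Kleene star of L, truncated to words of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        # extend only the words discovered in the previous round
        frontier = concat(frontier, L, n) - result
        result |= frontier
    return result

n = 8
A, B = {"0", "11"}, {"1", "00"}

# X = A*B is a solution of X = A.X U B (up to length n).
X = concat(star(A, n), B, n)
assert X == concat(A, X, n) | B

# If ε is in A, the solution is no longer unique: the set of all words
# also satisfies the equation, although it differs from A*B.
A_eps, B1 = {"", "0"}, {"1"}
all_words = {"".join(w) for k in range(n + 1) for w in product("01", repeat=k)}
assert all_words == concat(A_eps, all_words, n) | B1
assert all_words != concat(star(A_eps, n), B1, n)
```

Truncating at length ''n'' is safe here because every word of length at most ''n'' in a concatenation is built from factors that are themselves at most ''n'' long.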
**Example** Going back to the previous system of equations, we can find the solution to $math[q_2], which is $math[(A \cup B)^*]. Next, we substitute the solution for $math[q_2] into the equation for $math[q_1], which yields:

  * $math[q_1 = A q_1 \cup B(A\cup B)^*].
  * We apply Arden's Lemma one more time and obtain: $math[q_1 = A^*B(A \cup B)^*].

=== Another example ===

| The initial set of equations is: | {{ :lfa:2022:screenshot_2022-11-09_at_09.36.24.png?200 |}}|
| $math[q_1 = A q_1 \cup B q_2] | ::: |
| $math[q_2 = A q_2 \cup B q_1 \cup \epsilon] | ::: |

  * We reduce the system by removing the second equation:
    * $math[q_2 = A^*(Bq_1\cup \epsilon)]
  * We substitute $math[q_2] into the first equation:
    * $math[q_1 = A q_1 \cup BA^*(B q_1 \cup \epsilon) = (A \cup BA^*B) q_1 \cup BA^*]
  * and we apply Arden's lemma again: $math[q_1 = (A \cup BA^*B)^*BA^*]

=== Exercise ===

6.2.1. Apply Brzozowski's method to find a regex for the following DFA:

^ {{ :lfa:2022:screenshot_2022-11-09_at_15.46.23.png?200 |}} ^

q0 = ε | Aq1 | Bq2 \\
q1 = ε | Aq1 | Bq2 \\
q2 = Bq2 | Aq0 => q2 = B*Aq0

Replace q2: \\
q0 = ε | Aq1 | BB*Aq0 \\
q1 = ε | Aq1 | BB*Aq0 => q1 = A*(ε | BB*Aq0)

Replace q1 (A+ abbreviates AA*, B+ abbreviates BB*): \\
q0 = ε | AA*(ε | BB*Aq0) | BB*Aq0 \\
=> q0 = ε | A+ | (A+B+A | B+A)q0 \\
=> q0 = A* | (A+B+A | B+A)q0 \\
=> q0 = (A+B+A | B+A)*A*

From q1 = A*(ε | BB*Aq0) \\
=> q1 = A*(ε | BB*A(A+B+A | B+A)*A*) \\
=> q1 = A* | A*BB*A(A+B+A | B+A)*A*

From q2 = B*Aq0 \\
=> q2 = B*A(A+B+A | B+A)*A*
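The reduction used in the exercise can be mechanised. The following is a minimal sketch, assuming equations are stored as dictionaries from variable name to regex coefficient, with the hypothetical key ''CONST'' holding the variable-free term; the function names (''arden'', ''substitute'', ''solve'') and the transition table, read off the three equations above, are illustrative choices. The regex it produces is compared both with a direct simulation of the DFA and with the hand-derived answer on all words up to length 8.

```python
import re
from itertools import product

CONST = None  # key of the constant (variable-free) term of an equation

def star(r):
    return f"({r})*" if r else ""

def union(a, b):
    return b if a is None else f"({a}|{b})"

def arden(eq, q):
    """Solve q = e q | rest as q = e* rest; returns the terms of the solution."""
    eq = dict(eq)
    loop = star(eq.pop(q, None))
    return {var: loop + r for var, r in eq.items()}

def substitute(eq, q, solution):
    """Replace variable q in eq by its solution, merging coefficients."""
    eq = dict(eq)
    coeff = eq.pop(q, None)
    if coeff is not None:
        for var, r in solution.items():
            eq[var] = union(eq.get(var), coeff + r)
    return eq

def solve(equations, order, initial):
    """Eliminate the states in `order`, then solve the initial equation."""
    eqs = dict(equations)
    for q in order:
        sol = arden(eqs.pop(q), q)
        eqs = {s: substitute(eq, q, sol) for s, eq in eqs.items()}
    return arden(eqs[initial], initial)[CONST]

# The system from the exercise ("" stands for ε).
equations = {
    "q0": {CONST: "", "q1": "A", "q2": "B"},
    "q1": {CONST: "", "q1": "A", "q2": "B"},
    "q2": {"q2": "B", "q0": "A"},
}
regex = solve(equations, order=["q2", "q1"], initial="q0")

# DFA read off the same equations; q0 and q1 are final (they contain ε).
delta = {("q0", "A"): "q1", ("q0", "B"): "q2", ("q1", "A"): "q1",
         ("q1", "B"): "q2", ("q2", "A"): "q0", ("q2", "B"): "q2"}

def accepts(word):
    state = "q0"
    for c in word:
        state = delta[(state, c)]
    return state in {"q0", "q1"}

for n in range(9):
    for w in map("".join, product("AB", repeat=n)):
        assert bool(re.fullmatch(regex, w)) == accepts(w)
        assert bool(re.fullmatch("(A+B+A|B+A)*A*", w)) == accepts(w)
```

Changing ''order'' to ''["q1", "q2"]'' gives a differently shaped but equivalent regex, which matches the remark that any non-initial equation may be eliminated first.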