====== Regular languages ======

===== Definition =====

In the previous lectures, we introduced **regular expressions**, **NFAs** and **DFAs** as finite representations of languages, and established the following links between them:
  * $math[E \rightarrow NA] - each regular expression $math[e] can be transformed into an **NFA** $math[M] such that $math[L(e) = L(M)].
  * $math[NA \rightarrow DA] - each **NFA** $math[M] can be transformed into a **DFA** $math[M'] such that $math[L(M) = L(M')].

We introduce the following:
  * A language $math[L] is called **regular** if it can be generated by a regular expression, i.e. $math[L=L(E)] for some regular expression $math[E]. Denote by $math[LR\subset 2^{\Sigma^*}] **the set of regular languages**.
  * Denote by $math[L(NFA)\subset 2^{\Sigma^*}] the set of languages which can be accepted by NFAs.
  * Denote by $math[L(DFA)\subset 2^{\Sigma^*}] the set of languages which can be accepted by DFAs.

It is both formally and practically important to understand the limits of regular expressions and automata (of different types) in capturing languages. We have already seen that the set of regular expressions is countable, while the set of languages is not. Can automata capture more languages than regular expressions?

Our lecture has so far proven the following:
  * $math[LR \subseteq L(NFA) \subseteq L(DFA)]

To this we add the following observation:
  * $math[L(DFA) \subseteq L(NFA)]. If a language can be accepted by a DFA, it can also (trivially) be accepted by an NFA, since NFAs generalize DFAs.

Therefore, we have shown that **NFAs and DFAs accept the same languages**, i.e. $math[L(NFA) = L(DFA)]. In other words, if a language $math[L] is accepted by some DFA $math[M] ($math[L=L(M)]), then it is also accepted by some NFA, and vice versa.

It remains to establish the relationship between $math[LR] and $math[L(DFA)] (or equivalently $math[L(NFA)]).
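
The inclusion $math[L(DFA) \subseteq L(NFA)] can be illustrated with a minimal Python sketch. The encoding is an assumption made only for this example and is not taken from the lecture: a DFA is a dictionary from (state, symbol) to a state, an NFA maps the same keys to a set of states, and the names ''dfa_to_nfa'', ''dfa_accepts'' and ''nfa_accepts'' are hypothetical helpers.

```python
from typing import Dict, Set, Tuple

# Assumed encodings (for illustration only):
#   DFA transitions: (state, symbol) -> state
#   NFA transitions: (state, symbol) -> set of states
DfaDelta = Dict[Tuple[int, str], int]
NfaDelta = Dict[Tuple[int, str], Set[int]]


def dfa_to_nfa(delta: DfaDelta) -> NfaDelta:
    """View a DFA as an NFA: every target state becomes a singleton set."""
    return {key: {target} for key, target in delta.items()}


def dfa_accepts(delta: DfaDelta, start: int, finals: Set[int], word: str) -> bool:
    """Run the DFA on the word; a missing transition rejects."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in finals


def nfa_accepts(delta: NfaDelta, start: int, finals: Set[int], word: str) -> bool:
    """Run the NFA (without epsilon-transitions) by tracking the set of reachable states."""
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), set())}
    return bool(current & finals)


# Example DFA over {a, b}: accepts words with an even number of 'a'.
even_a: DfaDelta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
even_a_as_nfa = dfa_to_nfa(even_a)
```

On any word, ''dfa_accepts(even_a, 1, {1}, w)'' and ''nfa_accepts(even_a_as_nfa, 1, {1}, w)'' give the same answer, because the set of reachable states of the NFA is always the singleton containing the current state of the DFA.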
===== Equivalence between Regular Expressions and Automata =====

$justtheorem Let $math[M] be a DFA. There exists a **regular expression** $math[E] such that $math[L(E)=L(M)]. $end

To prove the theorem, we rely on:
  * a //naming scheme// for states. We assume a state $math[q_i] is identified by its **index** $math[i]. **The indexes in our proof start with 1**, hence $math[1] is the initial state. How states are ordered, or their //kind// (final/non-final), is unimportant; however, we use the same ordering throughout the proof;
  * a //naming scheme// for //partial// regular expressions: we denote by $math[R_{ij}^{(k)}] the **regular expression** whose **language** is the set of **words** that label a **path** from state $math[i] to state $math[j], where the path does not pass through intermediate states of **index larger than $math[k]** (the endpoints $math[i] and $math[j] themselves are not restricted).

We prove the following:

$justprop Given a DFA $math[M] with $math[n] states, for all states $math[i,j] of $math[M] and every bound $math[0 \leq k \leq n], there exists a **regular expression** $math[R_{ij}^{(k)}] which satisfies the above conditions. $end

$proof The proof is by induction **over $math[k]**.

**Basis case:** $math[k=0]. No intermediate state is allowed. If $math[i\neq j], then every path described by $math[R^{(0)}_{ij}] must consist of **exactly** one transition:
  * $math[R^{(0)}_{ij} = \emptyset] if no transition exists from $math[i] to $math[j];
  * $math[R^{(0)}_{ij} = c_1 \cup \ldots \cup c_m] if **one or more** transitions exist from $math[i] to $math[j], on symbols $math[c_1] to $math[c_m].

If $math[i = j], then:
  * a path from $math[i] to $math[i] may contain **zero** transitions, which contributes $math[\epsilon];
  * a path from $math[i] to $math[i] may contain one transition (a self-loop), and the construction follows the above rules, yielding some regular expression $math[E_0].

We combine the two situations into a single one: $math[R^{(0)}_{ii} = \epsilon \cup E_0], where $math[E_0] is constructed as above.

**Induction step**: By the //induction hypothesis//, we assume there exist regular expressions $math[R^{(k-1)}_{ij}] that satisfy our designated constraints in $math[M].
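
Before carrying out the step, the basis case can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the DFA is encoded as a dictionary from (state, symbol) to a state, regular expressions are plain strings written with the symbols ∅, ε and ∪, and ''base_regex'' is a hypothetical helper name.

```python
from typing import Dict, Tuple

EMPTY = "∅"    # regular expression for the empty language
EPSILON = "ε"  # regular expression for the empty word


def base_regex(delta: Dict[Tuple[int, str], int], i: int, j: int) -> str:
    """Return R^(0)_ij as a string: paths from i to j with no intermediate state."""
    # Symbols c labelling a direct transition from i to j, in sorted order.
    symbols = sorted(c for (state, c), target in delta.items()
                     if state == i and target == j)
    # For i == j the zero-transition path contributes epsilon.
    parts = ([EPSILON] if i == j else []) + symbols
    return " ∪ ".join(parts) if parts else EMPTY


# Example DFA over {a, b} with states 1 and 2; every transition leads to state 2.
example_delta = {(1, "a"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
```

For this example, ''base_regex(example_delta, 1, 2)'' yields ''a ∪ b'', ''base_regex(example_delta, 2, 1)'' yields ''∅'', ''base_regex(example_delta, 1, 1)'' yields ''ε'' and ''base_regex(example_delta, 2, 2)'' yields ''ε ∪ a ∪ b'', matching the four situations of the basis case.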
We build $math[R^{(k)}_{ij}] for each possible pair of states $math[i,j] in $math[M]:
  - a path from $math[i] to $math[j] may pass only through states whose index is **smaller than $math[k]**. In this case: $math[R^{(k)}_{ij} = R^{(k-1)}_{ij}]
  - a path from $math[i] to $math[j] **passes through $math[k] one or more times**. This path can be decomposed into the following pieces:
    * a path from $math[i] to $math[k] which only visits states $math[