Regular languages

Definition

In the previous lectures, we have introduced regular expressions, NFAs and DFAs as finite representations for languages, and showed the following links between them.

We introduce the following:

It is both formally and practically important to understand the limits of regular expressions and automata (of different types) in capturing languages.

We have already seen that the set of regular expressions is countable, while the set of languages is not. Can automata capture more languages than regular expressions? Our lecture has so far proven the following:

To this we add the following observation:

Therefore, we have shown that NFAs and DFAs accept the same languages, i.e. $ L(NFA) = L(DFA)$ . In other words, if a language $ L$ is accepted by some DFA $ M$ ($ L=L(M)$ ), then it can also be accepted by some NFA, and vice-versa.
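The direction from NFAs to DFAs can be illustrated with a minimal sketch of the subset construction. It assumes an NFA without $ \epsilon$ -transitions; the function names (`nfa_to_dfa`, `run_dfa`) and the example automaton are hypothetical and chosen only for illustration.

```python
def nfa_to_dfa(nfa_delta, start, finals, alphabet):
    """Subset construction for an NFA without epsilon-transitions (assumption).

    nfa_delta maps (state, symbol) to a set of successor states.
    Returns (dfa_delta, dfa_start, dfa_finals); DFA states are frozensets
    of NFA states.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_finals = set()
    todo = [dfa_start]
    seen = {dfa_start}
    while todo:
        current = todo.pop()
        # A subset is final if it contains at least one final NFA state.
        if current & finals:
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                q
                for state in current
                for q in nfa_delta.get((state, symbol), set())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return dfa_delta, dfa_start, dfa_finals


def run_dfa(dfa_delta, dfa_start, dfa_finals, word):
    """Simulate the constructed DFA on a word."""
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_finals


# Hypothetical NFA over {a, b}: accepts words whose second-to-last symbol is 'a'.
nfa_delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
dfa = nfa_to_dfa(nfa_delta, 0, {2}, "ab")
print(run_dfa(*dfa, "ab"), run_dfa(*dfa, "ba"))
```

The converse direction needs no construction: every DFA can be read as an NFA whose transition sets all have exactly one element.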

It remains to establish the relationship between $ LR$ and $ L(DFA)$ (or equivalently $ L(NFA)$ ).

Equivalence between Regular Expressions and Automata

Let $ M$ be a DFA. There exists a regular expression $ E$ , such that $ L(E)=L(M)$ . </blockquote>

To prove the theorem, we rely on:

We prove the following:

Proposition:

Given DFA $ M$ , for all states $ i,j,k$ of $ M$ , there exists a regular expression $ R_{ij}^{(k)}$ , which satisfies the above conditions.

Proof:

The proof is by induction over $ k$ . Basis case: $ k=0$ . If $ i\neq j$ , then every path described by $ R^{(0)}_{ij}$ consists of exactly one transition:

  • $ R^{(0)}_{ij} = \emptyset$ if no transition exists between $ i$ and $ j$
  • $ R^{(0)}_{ij} = c_1 \cup \ldots \cup c_m$ if one or more transitions exist between $ i$ and $ j$ , on symbols $ c_1$ to $ c_m$ .

If $ i = j$ , then:

  • $ R^{(0)}_{ii}$ may contain zero transitions, hence $ R^{(0)}_{ii} = \epsilon$
  • $ R^{(0)}_{ii}$ may contain one transition, and the construction follows the above rules, yielding some regular expression $ E_0$ .

We combine the two situations in a single one: $ R^{(0)}_{ii} = \epsilon \cup E_0$ , where $ E_0$ is constructed as above.

Induction step: By induction hypothesis, we assume there exist regular expressions $ R^{(k-1)}_{ij}$ that satisfy our designated constraints, in $ M$ .

We build $ R^{(k)}_{ij}$ , for each possible pair of states $ i,j$ in $ M$ .

  1. a path from $ i$ to $ j$ may pass only through states whose index is smaller than $ k$ . In this case: $ R^{(k)}_{ij} = R^{(k-1)}_{ij}$
  2. a path from $ i$ to $ j$ passes through $ k$ one or more times. This path can be decomposed into the following parts:
    • a path from $ i$ to $ k$ which only visits states $ <k$ , identified by the regular expression $ R_{ik}^{(k-1)}$
    • zero or more paths from $ k$ to $ k$ which only visit states $ <k$ , each identified by: $ R_{kk}^{(k-1)}$
    • a path from $ k$ to $ j$ which only visits states $ <k$ , identified by: $ R_{kj}^{(k-1)}$ .

The induction hypothesis guarantees that all regular expressions involved in the above construction can be properly built. Hence, we assemble $ R_{ij}^{(k)}$ by combining the two aforementioned cases:

$ \displaystyle R_{ij}^{(k)} = R_{ij}^{(k-1)} \cup R_{ik}^{(k-1)}(R_{kk}^{(k-1)})^*R_{kj}^{(k-1)}$

The proof of our theorem consists in building the regular expression:

$ \displaystyle E = \bigcup_{i\in F}R_{1i}^{(n)}$

where $ n$ is the total number of states in $ M$ , $ F$ is the set of final states, and state $ 1$ is the initial state. According to our proposition, $ E$ describes all paths that start in the initial state, end in a final state, and may visit any of the other states.
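The construction in the proof can be sketched in code as follows. This is a minimal illustration, not an optimized implementation: it assumes states are numbered $ 1$ to $ n$ with state $ 1$ initial, and it represents regular expressions as Python `re` pattern strings, with `None` standing for $ \emptyset$ and the empty string for $ \epsilon$ . The helper names (`union`, `concat`, `star`, `dfa_to_regex`, `dfa_accepts`) and the example DFA are hypothetical.

```python
import re


def union(a, b):
    """Pattern for L(a) ∪ L(b); None stands for the empty language."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    return f"(?:{a}|{b})"


def concat(a, b):
    """Pattern for L(a)L(b); concatenating with the empty language gives it back."""
    if a is None or b is None:
        return None
    return a + b


def star(a):
    """Pattern for L(a)*; both the empty language and epsilon star to epsilon."""
    if a is None or a == "":
        return ""
    return f"(?:{a})*"


def dfa_to_regex(n, delta, finals):
    """States are 1..n, state 1 is initial; delta maps (state, symbol) -> state."""
    # Basis case k = 0: single transitions, plus epsilon when i == j.
    R = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            expr = "" if i == j else None
            for (src, symbol), dst in sorted(delta.items()):
                if src == i and dst == j:
                    expr = union(expr, re.escape(symbol))
            R[i, j] = expr
    # Induction step: additionally allow state k as an intermediate state.
    for k in range(1, n + 1):
        R = {
            (i, j): union(
                R[i, j],
                concat(concat(R[i, k], star(R[k, k])), R[k, j]),
            )
            for i in range(1, n + 1)
            for j in range(1, n + 1)
        }
    # E is the union of R_{1f}^{(n)} over all final states f.
    result = None
    for f in sorted(finals):
        result = union(result, R[1, f])
    return result


def dfa_accepts(delta, finals, word):
    """Simulate the DFA from state 1, for cross-checking the expression."""
    state = 1
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical DFA over {a, b}: accepts words with an even number of a's.
delta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
finals = {1}
regex = dfa_to_regex(2, delta, finals)
print(regex)
print(re.fullmatch(regex, "abab") is not None)
```

The resulting expression is correct but far from minimal: each step copies subexpressions, so the pattern size can grow exponentially with the number of states.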

Regular languages

We have completed an extensive investigation into languages defined via:

and established that these three instruments for defining languages are equivalent. An important observation is that languages in general support two kinds of definitions:

Generators and acceptors are always useful when working with any kind of particular language.

When is a language regular?

We already know that a language is regular iff it can be defined via a regular expression, or an automaton of either kind. However, what intrinsic feature do regular languages capture?

Interesting questions regarding languages arise:

  1. When is a language regular?
  2. When is a language not regular?

We can answer question 1 by constructing a regular expression, NFA or DFA to capture the language. However, in practice, there are a few tools which serve this purpose better:

Closure properties of languages

Although $ LR \subseteq L(DFA)$ has already been proven in the previous two lectures, there is another way of establishing this, which has further applications. This second approach relies on closure properties of languages.

Generally, a set $ A \subseteq X$ is closed under a transformation ($ T:X\rightarrow X$ ) or an operation ($ O:X\times X \rightarrow X$ ) iff applying the transformation or operation to members $ a$ , $ b$ of $ A$ (i.e. computing $ T(a)$ or $ O(a,b)$ ) always yields an element of $ A$ .

Here, the set at hand is $ L(DFA)$ , and the transformations are:

and the operations are:

If $ L\in L(DFA)$ , then $ L^*\in L(DFA)$ and also $ \overline{L}\in L(DFA)$ . By $ \overline{L}$ , we refer to the complement of the language $ L$ with respect to $ \Sigma^*$ : $ \overline{L}=\Sigma^* \setminus L$ .

If $ L_1,L_2\in L(DFA)$ then the languages $ L_1 \cup L_2$ , $ L_1L_2$ (language concatenation) and $ L_1 \cap L_2$ are also members of $ L(DFA)$ . </blockquote>
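Three of these closure properties (complement, union and intersection) can be illustrated with a short sketch. It assumes complete DFAs over a shared alphabet, represented as tuples `(states, alphabet, delta, start, finals)`. This representation, the function names and the two example automata are hypothetical, and concatenation and Kleene star are not covered here.

```python
def complement(dfa):
    """Swap final and non-final states; requires a complete DFA (assumption)."""
    states, alphabet, delta, start, finals = dfa
    return states, alphabet, delta, start, states - finals


def product_dfa(dfa1, dfa2, combine):
    """Run both DFAs in lockstep; combine decides which state pairs are final."""
    s1, alphabet, d1, q1, f1 = dfa1
    s2, _, d2, q2, f2 = dfa2
    states = {(p, q) for p in s1 for q in s2}
    delta = {
        ((p, q), c): (d1[p, c], d2[q, c])
        for (p, q) in states
        for c in alphabet
    }
    finals = {(p, q) for (p, q) in states if combine(p in f1, q in f2)}
    return states, alphabet, delta, (q1, q2), finals


def accepts(dfa, word):
    """Simulate a DFA on a word."""
    _, _, delta, state, finals = dfa
    for c in word:
        state = delta[state, c]
    return state in finals


# Hypothetical examples over {a, b}.
even_a = ({0, 1}, "ab", {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({0, 1}, "ab", {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

both = product_dfa(even_a, ends_b, lambda x, y: x and y)    # intersection
either = product_dfa(even_a, ends_b, lambda x, y: x or y)   # union
odd_a = complement(even_a)                                  # complement

print(accepts(both, "aab"), accepts(either, "a"), accepts(odd_a, "a"))
```

Union and intersection share the same product automaton and differ only in the choice of final states, which is why a single `combine` parameter suffices.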