Pumping Lemma

Motivation

Consider the language defined via BNF (Backus-Naur Form) as follows:

which describes simple arithmetic expressions with parentheses. Let us ignore the construction rules for atoms (let $ M_a$ be a DFA with a unique final state which accepts valid atoms).

Consider the following NFA built (with respect to $ M_a$ ) to accept words of the above language:

One can easily observe that the above automaton can only accept words of the following forms:

There are several possible fixes to our construction:

However, no finite number of added states (and transitions) can accommodate an arbitrary number of parenthesis nestings (e.g. 1 + (2 + … ))))). Each automaton which we know how to build accepts nestings only up to some fixed depth, hence it can only describe arithmetic expressions with parenthesis nesting of up to some $ k\in\mathbb{N}$ .

This happens because our language is not regular.

The pumping lemma

The pumping lemma captures one particular trait of Regular languages:

The previous language is a good counter-example: while arithmetic expressions do capture a 'repeating pattern' - identifying it requires counting: we must keep track of the number $ k$ of open parentheses and make sure that precisely $ k$ parentheses are eventually closed.
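The counting described above can be sketched in a few lines. This is only an illustration: the function name `max_nesting_depth` is hypothetical, and the sketch checks parentheses only, not the full grammar of arithmetic expressions. The point is that `depth` may grow without bound, which is exactly what a fixed, finite set of states cannot record.

```python
def max_nesting_depth(expr: str) -> int:
    """Return the deepest parenthesis nesting in expr, or -1 if unbalanced."""
    depth = 0      # number of currently open parentheses (the k from the text)
    deepest = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' with no matching '('
                return -1
    # every opened parenthesis must eventually be closed
    return deepest if depth == 0 else -1


print(max_nesting_depth("a+(a+(a))"))  # 2
print(max_nesting_depth("a+(a"))       # -1
```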

Put more formally, the repeating pattern found in words of a Regular Language looks as follows.

Let $ L$ be a regular language. Then there exists $ n$ (dependent on $ L$ ) such that every word $ w\in L$ with $ \mid w \mid \geq n$ can be written as $ w = xyz$ , where $ y\neq\epsilon$ , $ \mid xy\mid \leq n$ , and $ xy^kz\in L$ for every $ k\geq 0$ .

Informally, the Pumping lemma tells us that all 'large-enough' words of a regular language must have the form shown in the following image:

Note that arithmetic expressions cannot be broken down in this way.

What does 'large-enough' mean?

Suppose $ L$ is a finite language. Then there exists a DFA accepting $ L$ which has the structure of a tree: each branch of the tree ending in a leaf describes one possible word of the language. The lemma holds trivially here: $ n$ can be any number larger than the length of the longest word, so no word of $ L$ satisfies $ \mid w \mid \geq n$ and the condition is vacuously true.

In general, suppose $ M$ is a DFA which accepts $ L$ . 'Large-enough' words are those with $ \mid w \mid \geq \mid K\mid$ , where $ \mid K\mid$ is the number of states of $ M$ . For such a word, the accepting path through $ M$ visits $ \mid w\mid + 1 > \mid K\mid$ states, so it must visit some state at least twice.
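The repeated state is what produces the decomposition $ xyz$ . The following is a minimal sketch, assuming a DFA is stored as a tuple mirroring $ (K,\Sigma,\delta,q_0,F)$ with $ \delta$ as a dictionary; the example automaton `AB_STAR` and the helper names `accepts` and `pump_split` are hypothetical and chosen only for illustration.

```python
# Hypothetical example DFA (K, Sigma, delta, q0, F) accepting (ab)*; state 2 is a sink.
AB_STAR = (
    {0, 1, 2},
    {"a", "b"},
    {(0, "a"): 1, (0, "b"): 2,
     (1, "a"): 2, (1, "b"): 0,
     (2, "a"): 2, (2, "b"): 2},
    0,
    {0},
)


def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in a final state."""
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


def pump_split(dfa, word):
    """Return (x, y, z) with word == x + y + z, where y labels the first loop
    in the run of the DFA on word; return None if no state is repeated."""
    _, _, delta, state, _ = dfa
    first_visit = {state: 0}   # state -> number of symbols read at first visit
    for pos, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in first_visit:
            i = first_visit[state]
            return word[:i], word[i:pos], word[pos:]
        first_visit[state] = pos
    return None


x, y, z = pump_split(AB_STAR, "abab")
print((x, y, z))                                               # ('', 'ab', 'ab')
print(all(accepts(AB_STAR, x + y * k + z) for k in range(5)))  # True
```

Because the split stops at the first repeated state, $ xy$ is never longer than the number of states, which is where the bound $ \mid xy\mid \leq n$ comes from.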

The pumping lemma in action

The Pumping Lemma is used as a technical instrument for proving that a language $ L$ is not regular. The proof scheme is as follows:

Consider the language $ L = \{0^i1^i \mid i > 1\}$ consisting of sequences of zeros followed by the same number of ones. Suppose the language is regular.

  1. The pumping lemma tells us that there exists some $ n$ such that every word of $ L$ of length at least $ n$ has the aforementioned properties;
    • To violate the pumping lemma, we must show that for any $ n$ , some word of $ L$ of length at least $ n$ lacks these properties.
  2. The pumping lemma tells us that for any word $ w\in L$ such that $ \mid w \mid \geq n$ , certain properties hold;
    • To violate the pumping lemma, let us choose $ w_n$ (with respect to any $ n$ ) to be $ 0^n1^n$ .
  3. The pumping lemma tells us that any large-enough word, including $ 0^n1^n$ , can be split into $ xyz$ where $ y\neq \epsilon$ and $ \mid xy\mid \leq n$ . Since the first $ n$ symbols of $ 0^n1^n$ are all zeros, $ x$ and $ y$ consist only of zeros. Thus, even if we do not know $ x,y,z$ , we can argue that: $ x=0^{i}$ , $ y=0^{j}$ with $ j\neq 0$ and $ i+j\leq n$ , and $ z=0^{n-i-j}1^n$ .
  4. Finally, the pumping lemma tells us that any word $ xy^kz$ , with $ k\geq 0$ , is also in $ L$ .
    • To violate it, choose $ k=0$ . The word $ xy^0z=xz=0^{i}0^{n-i-j}1^n=0^{n-j}1^{n}$ has strictly fewer zeros than ones (since $ j\neq 0$ ). The pumping lemma tells us that this word should be in $ L$ ; however, it obviously is not. Contradiction. The language $ L$ cannot be regular.
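The argument above can be sanity-checked for small values of $ n$ by enumerating every admissible split of $ 0^n1^n$ and pumping down with $ k=0$ . This is a sketch under the definition of $ L$ given above; the helper names are hypothetical, and a finite check like this only illustrates the proof, it does not replace it.

```python
def in_language(word: str) -> bool:
    """Membership in L = {0^i 1^i | i > 1}."""
    half, remainder = divmod(len(word), 2)
    return remainder == 0 and half > 1 and word == "0" * half + "1" * half


def splits(word: str, n: int):
    """Yield every (x, y, z) with word == x + y + z, y nonempty and |xy| <= n."""
    for end_xy in range(1, min(n, len(word)) + 1):
        for end_x in range(end_xy):
            yield word[:end_x], word[end_x:end_xy], word[end_xy:]


def every_split_fails(n: int) -> bool:
    """True if x + z falls outside L for every admissible split of 0^n 1^n."""
    word = "0" * n + "1" * n
    return all(not in_language(x + z) for x, _y, z in splits(word, n))


print(all(every_split_fails(n) for n in range(2, 8)))  # True
```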

The Pumping Lemma recipe

To use the Pumping lemma in order to prove that a language $ L$ is not regular:

Thus, the Pumping lemma is contradicted.

The language of arithmetic expressions is not regular

For simplicity assume atoms can only be the one-letter word a, hence the alphabet of the language is $ \{a,+,(,)\}$ .

Closure properties of regular languages

Union

Proposition (union):

Let $ A,B$ be two regular languages. The language $ A\cup B$ is regular.

Proof:

Let $ E_A$ and $ E_B$ be the regular expressions generating $ A$ and $ B$ respectively. The regular expression $ E_A\cup E_B$ generates the language $ A\cup B$ .
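As a small illustration, the same construction can be tried with Python's `re` module, where union is written `|`. This is a sketch only: Python patterns support more than regular expressions in the formal sense, so the example restricts itself to the plain operators, and the helper names `union` and `generates` are hypothetical.

```python
import re


def union(pattern_a: str, pattern_b: str) -> str:
    """Build a pattern for the union of the two languages."""
    # Non-capturing groups keep each operand intact around the '|'.
    return f"(?:{pattern_a})|(?:{pattern_b})"


def generates(pattern: str, word: str) -> bool:
    """True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None


E_A = "a*"         # words consisting only of a's
E_B = "(?:ab)+"    # one or more repetitions of ab
E_UNION = union(E_A, E_B)

for w in ["", "aaa", "abab", "ba"]:
    in_either = generates(E_A, w) or generates(E_B, w)
    print(repr(w), generates(E_UNION, w) == in_either)  # True on every line
```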

Concatenation

Proposition (concatenation):

Let $ A,B$ be two regular languages. The language $ AB$ is regular.

The proof follows the same idea as that of union.

Complement

Proposition (complement):

Let $ A$ be a regular language. The language $ \overline{A}=\Sigma^* \setminus A$ is regular.

Proof:

Let $ M_A = (K,\Sigma,\delta,q_0,F)$ be a DFA which accepts $ A$ . We build the DFA $ \overline{M_A} = (K,\Sigma,\delta,q_0,K\setminus F)$ . $ \overline{M_A}$ only differs from $ M_A$ in the accepting (or final) states: each final state in $ M_A$ is non-final in $ \overline{M_A}$ and vice-versa. It follows immediately that, for all words $ w$ , $ w$ is accepted by $ M_A$ iff $ w$ is not accepted by $ \overline{M_A}$ . Thus, $ \overline{M_A}$ accepts exactly the words not in $ A$ , i.e. the language $ \overline{A}$ .
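The construction amounts to swapping final and non-final states. Below is a minimal sketch, assuming a DFA is stored as a tuple mirroring $ (K,\Sigma,\delta,q_0,F)$ and that $ \delta$ is total (defined for every state and symbol); the example automaton `EVEN_A` and the function names are hypothetical.

```python
# Hypothetical example DFA: words over {a, b} with an even number of a's.
EVEN_A = (
    {0, 1},
    {"a", "b"},
    {(0, "a"): 1, (0, "b"): 0,
     (1, "a"): 0, (1, "b"): 1},
    0,
    {0},
)


def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in a final state."""
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


def complement(dfa):
    """Same states, alphabet, transitions and start state; finals become K \\ F."""
    states, alphabet, delta, start, finals = dfa
    return states, alphabet, delta, start, states - finals


ODD_A = complement(EVEN_A)
for w in ["", "a", "ab", "aab", "baba"]:
    print(repr(w), accepts(EVEN_A, w), accepts(ODD_A, w))  # always opposite values
```

The totality assumption matters: if some transition were missing, a word that gets stuck would be rejected by both automata.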

Intersection

Proposition (intersection):

Let $ A,B$ be two regular languages. The language $ A\cap B$ is regular.

Proof:

The language $ A\cap B$ can be defined as $ \overline{\overline{A}\cup\overline{B}}$ . By union and complement closure, we have that $ \overline{A}$ , $ \overline{B}$ , $ \overline{A}\cup\overline{B}$ and finally $ \overline{\overline{A}\cup\overline{B}}$ are regular languages.

An alternative and more useful proof is to construct, starting from DFAs $ M_A=(K_A,\Sigma,\delta_A,q_A,F_A)$ and $ M_B=(K_B,\Sigma,\delta_B,q_B,F_B)$ which accept languages $ A$ and $ B$ , respectively, a DFA for $ A\cap B$ . The construction is called the product automaton (written $ M_A\times M_B$ ), and is as follows:

It is easy to prove by induction that for each word $ w$ such that $ (q_A,w)\vdash_{M_A}^*(p,\epsilon)$ and $ (q_B,w)\vdash_{M_B}^*(r,\epsilon)$ , with $ p\in F_A$ and $ r\in F_B$ , we also have in the product automaton $ ((q_A,q_B),w)\vdash^*_{M_A\times M_B}((p,r),\epsilon)$ .
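A minimal sketch of the usual product construction follows, assuming both DFAs are total, share the alphabet, and are stored as tuples mirroring $ (K,\Sigma,\delta,q_0,F)$ . The two example automata and the function names are hypothetical. States of the product are pairs $ (p,r)$ , both components advance on the same symbol, and a pair is final when both components are final.

```python
from itertools import product

# Hypothetical example DFAs over {a, b}.
EVEN_A = ({0, 1}, {"a", "b"},
          {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
          0, {0})                      # even number of a's
ENDS_B = ({0, 1}, {"a", "b"},
          {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1},
          0, {1})                      # last symbol is b


def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in a final state."""
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


def product_dfa(dfa_a, dfa_b):
    """Build a DFA that simulates dfa_a and dfa_b in lockstep."""
    states_a, alphabet, delta_a, start_a, finals_a = dfa_a
    states_b, _, delta_b, start_b, finals_b = dfa_b
    states = set(product(states_a, states_b))
    delta = {
        ((p, r), symbol): (delta_a[(p, symbol)], delta_b[(r, symbol)])
        for (p, r) in states
        for symbol in alphabet
    }
    finals = set(product(finals_a, finals_b))   # both components must accept
    return states, alphabet, delta, (start_a, start_b), finals


BOTH = product_dfa(EVEN_A, ENDS_B)
print(accepts(BOTH, "aab"))  # True: two a's, ends in b
print(accepts(BOTH, "ab"))   # False: odd number of a's
```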

Difference

Proposition (difference):

Let $ A,B$ be two regular languages. The language $ A\setminus B$ is regular.

Proof:

The language $ A\setminus B$ can be defined as $ A\cap\overline{B}$ , which is regular via the intersection and complement closure properties.

Closure

Proposition (closure):

Let $ A$ be a regular language. The language $ A^*$ is regular.

The proof is similar to that for union and concatenation.

Reversal

Proposition (reversal):

Let $ A$ be a regular language. The language $ A^R$ is regular.