10. Context-Free Languages & Lexers
- 10.1. Context-Free Grammar to PDA conversion
- 10.2. Lexer Spec

10. Context-Free Languages & Lexers

10.1. Context-Free Grammar to PDA conversion

For each context-free grammar G:

- describe L(G);
- algorithmically construct a PDA that accepts the same language;
- run the PDA on the given input;
- decide whether the grammar is ambiguous; if it is, write an unambiguous grammar that generates the same language.

10.1.1 input: aaaabb
$ S \leftarrow aS | aSb | \epsilon $

Solution

The initial stack symbol of the PDA is the grammar's start symbol, S.
The PDA has a single state q and accepts by empty stack.
For each production $ A \leftarrow \gamma $, add a transition $ q \xrightarrow{\epsilon,\ A/\gamma} q $, and for each terminal c, add $ q \xrightarrow{c,\ c/\epsilon} q $.

Thus, our PDA has the following transitions looping on state q:

$ \epsilon, S/aS $
$ \epsilon, S/aSb $
$ \epsilon, S/\epsilon $
$ a, a/\epsilon $
$ b, b/\epsilon $
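The construction above is mechanical, so it can be sketched in a few lines of Python. This is a minimal illustration, not course code: it assumes a grammar is a dict from one-letter nonterminals to lists of right-hand sides, with "" standing for $\epsilon$, and the name `cfg_to_pda` is hypothetical.

```python
# Sketch of the CFG -> single-state PDA construction (hypothetical helper).
# Assumed grammar format: {"S": ["aS", "aSb", ""]}, "" meaning epsilon;
# every symbol that is not a key of the dict is a terminal.


def cfg_to_pda(grammar):
    """Return the transitions looping on q as (input, pop, push) triples ("" = epsilon)."""
    transitions = []
    # One epsilon-transition per production A -> gamma: pop A, push gamma.
    for nonterminal, bodies in grammar.items():
        for body in bodies:
            transitions.append(("", nonterminal, body))
    # One matching transition per terminal c: read c, pop c, push nothing.
    terminals = sorted({s for bodies in grammar.values()
                        for body in bodies for s in body if s not in grammar})
    for c in terminals:
        transitions.append((c, c, ""))
    return transitions


if __name__ == "__main__":
    for inp, pop, push in cfg_to_pda({"S": ["aS", "aSb", ""]}):
        print(f"{inp or 'eps'}, {pop}/{push or 'eps'}")
```

For grammar 10.1.1 it prints the five transitions listed above, in the same order.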

Input: aaaabb
(aaaabb, q, S) ⇒ (aaaabb, q, aS) ⇒ (aaabb, q, S) ⇒ (aaabb, q, aSb) ⇒ (aabb, q, Sb) ⇒ (aabb, q, aSbb) ⇒ (abb, q, Sbb) ⇒ (abb, q, aSbb) ⇒ (bb, q, Sbb) ⇒ (bb, q, bb) ⇒ (b, q, b) ⇒ ($\epsilon$, q, $\epsilon$)
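A run like this one can be found mechanically by searching over configurations (remaining input, stack). The sketch below is an illustration under stated assumptions, not the lecture's algorithm: `run_pda` is a hypothetical name, grammars use the same dict format (one-letter nonterminals, "" for $\epsilon$), the stack is a string with its top on the left as in the traces, and the pruning rule is enough for the three grammars in this section but is not a general termination guarantee.

```python
from collections import Counter, deque


def run_pda(grammar, start, word):
    """Breadth-first search for an accepting run (input read, stack empty).

    Returns the list of (remaining input, stack) configurations, or None.
    """
    initial = (word, start)
    parent = {initial: None}           # predecessor links; doubles as the visited set
    queue = deque([initial])
    while queue:
        rest, stack = queue.popleft()
        if rest == "" and stack == "":
            run, config = [], (rest, stack)
            while config is not None:   # walk back to the initial configuration
                run.append(config)
                config = parent[config]
            return run[::-1]
        if stack == "":
            continue                    # stack empty but input left: dead end
        top, below = stack[0], stack[1:]
        if top in grammar:
            # epsilon, A/gamma: replace the nonterminal on top by a right-hand side
            successors = [(rest, body + below) for body in grammar[top]]
        elif rest[:1] == top:
            # c, c/epsilon: read c and pop the matching terminal
            successors = [(rest[1:], below)]
        else:
            successors = []             # terminal on top does not match the input
        for nxt in successors:
            # Prune: every terminal on the stack still needs its own input symbol.
            needed = Counter(s for s in nxt[1] if s not in grammar)
            if needed - Counter(nxt[0]):
                continue
            if nxt not in parent:
                parent[nxt] = (rest, stack)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    run = run_pda({"S": ["aS", "aSb", ""]}, "S", "aaaabb")
    print(" => ".join(f"({r or 'eps'}, q, {s or 'eps'})" for r, s in run))
```

Since this grammar is ambiguous, several accepting runs exist; the one returned may order the $ S \leftarrow aS $ and $ S \leftarrow aSb $ steps differently from the trace above.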

Is the grammar ambiguous? Yes, because the word aaabb has two different leftmost derivations:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbb ⇒ aaabb
S ⇒ aS ⇒ aaSb ⇒ aaaSbb ⇒ aaabb

The accepted language is $ L(G) = \{a^{m}b^{n} | m \ge n \ge 0\} $

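Repaired grammar (S first generates the unmatched a's, then A generates the matched $ a^{n}b^{n} $ part, so every word has a single leftmost derivation):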
$ S \leftarrow aS | A \\ A \leftarrow aAb | \epsilon $
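Ambiguity can also be checked by brute force on a single word: count its distinct leftmost derivations. The hypothetical helper below does that with a memoized search over sentential forms, under the same dict-grammar assumption ("" for $\epsilon$). It prunes forms whose terminal prefix disagrees with the word or that already hold too many terminals; this suffices for the grammars here, but not for grammars with rule cycles that emit no terminal.

```python
from functools import lru_cache


def count_leftmost_derivations(grammar, start, word):
    """Count distinct leftmost derivations start =>* word (more than 1 means ambiguous).

    grammar: dict from one-letter nonterminals to lists of right-hand sides ("" = epsilon).
    Assumes every cycle of rules emits at least one terminal, as in this section.
    """
    @lru_cache(maxsize=None)
    def count(form):
        # Index of the leftmost nonterminal in the sentential form, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            return 1 if form == word else 0
        # Prune: terminals left of idx are final, so they must match the word.
        if form[:idx] != word[:idx]:
            return 0
        # Prune: terminals are never erased, so a form cannot hold more than len(word).
        if sum(s not in grammar for s in form) > len(word):
            return 0
        # Expand the leftmost nonterminal in every possible way and add up the counts.
        return sum(count(form[:idx] + body + form[idx + 1:])
                   for body in grammar[form[idx]])

    return count(start)


if __name__ == "__main__":
    original = {"S": ["aS", "aSb", ""]}
    repaired = {"S": ["aS", "A"], "A": ["aAb", ""]}
    print(count_leftmost_derivations(original, "S", "aaabb"))  # 3
    print(count_leftmost_derivations(repaired, "S", "aaabb"))  # 1
```

For aaabb the original grammar has 3 leftmost derivations (the single $ S \leftarrow aS $ step can come first, second or third), while the repaired grammar has exactly 1.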

10.1.2 input: xayxcayatabcazz

$ S \leftarrow a | xAz | SbS | cS \\ A \leftarrow SyS | SyStS $

Solution

The PDA has the following transitions looping on state q:

$ \epsilon, S/a $
$ \epsilon, S/xAz $
$ \epsilon, S/SbS $
$ \epsilon, S/cS $
$ \epsilon, A/SyS $
$ \epsilon, A/SyStS $
$ a, a/\epsilon $
$ b, b/\epsilon $
$ c, c/\epsilon $
$ x, x/\epsilon $
$ y, y/\epsilon $
$ z, z/\epsilon $
$ t, t/\epsilon $

Input: xayxcayatabcazz
(xayxcayatabcazz, q, S) ⇒ (xayxcayatabcazz, q, xAz) ⇒ (ayxcayatabcazz, q, Az) ⇒ (ayxcayatabcazz, q, SySz) ⇒ (ayxcayatabcazz, q, aySz) ⇒ (yxcayatabcazz, q, ySz) ⇒ (xcayatabcazz, q, Sz) ⇒ (xcayatabcazz, q, xAzz) ⇒ (cayatabcazz, q, Azz) ⇒ (cayatabcazz, q, SyStSzz) ⇒ (cayatabcazz, q, cSyStSzz) ⇒ (ayatabcazz, q, SyStSzz) ⇒ (ayatabcazz, q, ayStSzz) ⇒ (yatabcazz, q, yStSzz) ⇒ (atabcazz, q, StSzz) ⇒ (atabcazz, q, atSzz) ⇒ (tabcazz, q, tSzz) ⇒ (abcazz, q, Szz) ⇒ (abcazz, q, SbSzz) ⇒ (abcazz, q, abSzz) ⇒ (bcazz, q, bSzz) ⇒ (cazz, q, Szz) ⇒ (cazz, q, cSzz) ⇒ (azz, q, Szz) ⇒ (azz, q, azz) ⇒ (zz, q, zz) ⇒ (z, q, z) ⇒ ($\epsilon$, q, $\epsilon$)

Is the grammar ambiguous? Yes, because the word ababa has two different leftmost derivations:
S ⇒ SbS ⇒ abS ⇒ abSbS ⇒ ababS ⇒ ababa
S ⇒ SbS ⇒ SbSbS ⇒ abSbS ⇒ ababS ⇒ ababa

The language is hard to describe directly in this form, so let's relabel the terminals: a → bool; b → and; c → not; x → if; y → then; z → fi; t → else.
The grammar becomes: $ S \leftarrow \text{bool} | \text{if } A \text{ fi} | S \text{ and } S | \text{not } S \\ A \leftarrow S \text{ then } S | S \text{ then } S \text{ else } S $
The generated language is the language of boolean expressions (where 'bool' stands for a variable or a literal) built with the operations 'and', 'not', 'if-then' and 'if-then-else'.
Why is it ambiguous? The 'and'/b operator has no defined associativity, and there is no precedence rule between 'and'/b and 'not'/c.
To fix the grammar we adopt the following conventions: 'and' is right-associative, so ababa == ab(aba), and 'not' binds tighter than 'and', so caba == (ca)ba.
Repaired grammar:
$ S \leftarrow TbS | T \\ T \leftarrow cT | xAz | a \\ A \leftarrow SyS | SyStS $
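In the repaired grammar, one symbol of lookahead (after factoring out the common prefix S y S of the two A rules) is enough to pick every rule, so it can be parsed by recursive descent. The sketch below is illustrative only: the names `parse` and `ParseError` and the tuple tree format are assumptions, the trees use the relabeled operator names, and because of $ S \leftarrow TbS $ a chain of b's groups to the right.

```python
class ParseError(Exception):
    """Raised when the word is not generated by the repaired grammar."""


def parse(word):
    """Recursive-descent parser for S -> TbS | T, T -> cT | xAz | a, A -> SyS | SyStS."""
    pos = 0

    def peek():
        return word[pos] if pos < len(word) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise ParseError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def parse_s():
        left = parse_t()
        if peek() == "b":              # S -> T b S
            expect("b")
            return ("and", left, parse_s())
        return left                    # S -> T

    def parse_t():
        if peek() == "c":              # T -> c T
            expect("c")
            return ("not", parse_t())
        if peek() == "x":              # T -> x A z
            expect("x")
            cond = parse_s()
            expect("y")
            then = parse_s()
            if peek() == "t":          # A -> S y S t S
                expect("t")
                other = parse_s()
                expect("z")
                return ("if", cond, then, other)
            expect("z")                # A -> S y S
            return ("if", cond, then)
        expect("a")                    # T -> a
        return "a"

    tree = parse_s()
    if pos != len(word):
        raise ParseError(f"unexpected {word[pos]!r} at position {pos}")
    return tree
```

For example, `parse("caba")` returns `("and", ("not", "a"), "a")`, i.e. (ca)ba, and `parse("ababa")` returns `("and", "a", ("and", "a", "a"))`.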

10.1.3 input: aaabbbbbccc

$ S \leftarrow ABC \\ A \leftarrow aA | \epsilon \\ B \leftarrow bbB | b \\ C \leftarrow cC | c $

Solution

The PDA has the following transitions looping on state q:

$ \epsilon, S/ABC $
$ \epsilon, A/aA $
$ \epsilon, A/\epsilon $
$ \epsilon, B/bbB $
$ \epsilon, B/b $
$ \epsilon, C/cC $
$ \epsilon, C/c $
$ a, a/\epsilon $
$ b, b/\epsilon $
$ c, c/\epsilon $

Input: aaabbbbbccc
(aaabbbbbccc, q, S) ⇒ (aaabbbbbccc, q, ABC) ⇒ (aaabbbbbccc, q, aABC) ⇒ (aabbbbbccc, q, ABC) ⇒ (aabbbbbccc, q, aABC) ⇒ (abbbbbccc, q, ABC) ⇒ (abbbbbccc, q, aABC) ⇒ (bbbbbccc, q, ABC) ⇒ (bbbbbccc, q, BC) ⇒ (bbbbbccc, q, bbBC) ⇒ (bbbbccc, q, bBC) ⇒ (bbbccc, q, BC) ⇒ (bbbccc, q, bbBC) ⇒ (bbccc, q, bBC) ⇒ (bccc, q, BC) ⇒ (bccc, q, bC) ⇒ (ccc, q, C) ⇒ (ccc, q, cC) ⇒ (cc, q, C) ⇒ (cc, q, cC) ⇒ (c, q, C) ⇒ (c, q, c) ⇒ ($\epsilon$, q, $\epsilon$)

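Is the grammar ambiguous? No. In $ S \leftarrow ABC $, A generates only a's, B only b's and C only c's, so the split of the word is forced; then A must apply $ A \leftarrow aA $ once per a, B must apply $ B \leftarrow bbB $ once per extra pair of b's, and C must apply $ C \leftarrow cC $ once per extra c. Every word therefore has exactly one leftmost derivation.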

The accepted language is $ L(G) = \{a^{m}b^{2n + 1}c^{p+1} | m,n,p \ge 0\} $
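This language is regular, so membership can be tested with an ordinary regular expression. The pattern and the `exponents` helper below are our own translation of the set above (a sketch, not part of the lab); recovering m, n and p from the match mirrors the argument that each word fixes how many times each recursive rule is applied.

```python
import re

# Our own translation of L(G) = { a^m b^(2n+1) c^(p+1) | m, n, p >= 0 }:
# any number of a's, then an odd number of b's, then at least one c.
PATTERN = re.compile(r"(a*)(b(?:bb)*)(c+)")


def exponents(word):
    """Return (m, n, p) if word is in L(G), otherwise None."""
    match = PATTERN.fullmatch(word)
    if match is None:
        return None
    a_part, b_part, c_part = match.groups()
    # |a_part| = m, |b_part| = 2n + 1, |c_part| = p + 1
    return len(a_part), (len(b_part) - 1) // 2, len(c_part) - 1


if __name__ == "__main__":
    print(exponents("aaabbbbbccc"))  # (3, 2, 2)
```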

10.2. Lexer Spec

Given the following specs, construct the lexer DFA as presented in Lecture 14:

PAIRS: $ (10 | 01)* $
ONES: $ 1+ $
NO_CONSEC_ONE: $ (1 | \epsilon)(01 | 0)* $

Separate the following input strings into lexemes:

010101

The entire string is matched by both PAIRS and NO_CONSEC_ONE; since PAIRS is defined first, it wins the tie.
PAIRS “010101”

1010101011

The longest match from the start is “101010101”, by NO_CONSEC_ONE (PAIRS only reaches “10101010”). The remaining string, “1”, is matched by both ONES and NO_CONSEC_ONE, and ONES wins because it is defined first.
NO_CONSEC_ONE “101010101”
ONES “1”

01110101001

PAIRS “01”
ONES “11”
NO_CONSEC_ONE “0101001”

01010111111001010

PAIRS “010101”
ONES “11111”
NO_CONSEC_ONE “001010”

1101101001111100001010011001

ONES “11”
PAIRS “01101001”
ONES “1111”
NO_CONSEC_ONE “0000101001”
PAIRS “1001”
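The answers in this section follow two rules: take the longest non-empty prefix matched by any token, and break ties in favour of the token defined first. The sketch below applies exactly those rules with Python's `re` module instead of building the combined lexer DFA from the lecture, so it is a swapped-in technique useful only for checking lexemes; `TOKENS` and `lex` are hypothetical names, and `1?` is written for $(1 | \epsilon)$.

```python
import re

# Token specs in definition order; the order is also the tie-breaking priority.
TOKENS = [
    ("PAIRS", re.compile(r"(?:10|01)*")),
    ("ONES", re.compile(r"1+")),
    ("NO_CONSEC_ONE", re.compile(r"1?(?:01|0)*")),  # 1? stands for (1 | epsilon)
]


def lex(text):
    """Split text into (token, lexeme) pairs by maximal munch with definition-order priority."""
    lexemes, pos = [], 0
    while pos < len(text):
        best = None  # (length, token name) of the best match so far
        for name, regex in TOKENS:
            # Try the longest candidate first; empty matches are never considered.
            for end in range(len(text), pos, -1):
                if regex.fullmatch(text, pos, end):
                    # Only a strictly longer match replaces an earlier token.
                    if best is None or end - pos > best[0]:
                        best = (end - pos, name)
                    break
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        length, name = best
        lexemes.append((name, text[pos:pos + length]))
        pos += length
    return lexemes


if __name__ == "__main__":
    for line in ["010101", "1010101011"]:
        print(lex(line))
```

Running `lex` on the five inputs reproduces the lexemes listed in the answers.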