====== CFG - PDA equivalence ======

We prove that the languages generated by context-free grammars coincide with those accepted by PDAs.
The proof has two steps, one for each direction.

===== CFG to PDA =====

Given a context-free grammar $math[G=(V,\Sigma,R,S)], we build a PDA $math[M=(K,\Sigma,\Gamma,\Delta,q_0,Z_0,F)] which accepts precisely $math[L(G)]. The main idea is:
   * for each derivation step $math[\alpha A \beta \Rightarrow_G \alpha \gamma \beta] in $math[G], we will execute one transition in $math[M].

Consider the following example:

$math[S\rightarrow AB]

$math[A\rightarrow aB \mid bA]

$math[B\rightarrow bB\mid \epsilon]

As well as the derivation:

$math[S\Rightarrow AB \Rightarrow bAB \Rightarrow baBB \Rightarrow babBB \Rightarrow babbBB \Rightarrow babbB \Rightarrow babb]

which illustrates $math[babb\in L(G)] for our example. For this derivation, we should have the configuration sequence:

$math[(?,babb,Z_0) \vdash (?,babb,SZ_0) \vdash (?,babb,ABZ_0) \vdash (?,babb,bABZ_0) \vdash]

$math[(?,abb,ABZ_0) \vdash (?,abb,aBBZ_0)\vdash(?,bb,BBZ_0)\vdash(?,bb,bBBZ_0) \vdash]

$math[(?,b,BBZ_0) \vdash (?,b,bBBZ_0) \vdash (?,\epsilon,BBZ_0)\vdash(?,\epsilon,BZ_0) \vdash]

$math[(?,\epsilon,Z_0)]

Ideas:
   * each possible derivation step in $math[G] (not necessarily part of a successful derivation) should correspond to a transition in $math[M];
   * we use the stack to hold the **non-terminals** which are due to be expanded. For this to work, we must consider **left-most derivations only**;
   * whenever the top of the stack is a terminal symbol coinciding with the next input symbol, we pop it and consume that input symbol;
   * whenever the top of the stack is a non-terminal, we pop it and (non-deterministically) push the right-hand side of one of its productions.

==== Construction ====

  * $math[K = \{q_0,p\}]
  * $math[\Gamma = V\cup\{Z_0\}]
  * build transition $math[(q_0,\epsilon,\epsilon,p,S)] which puts the start symbol $math[S] on the stack;
  * for each production $math[A\rightarrow\gamma] with $math[\gamma\in V^*], we build transition $math[(p,\epsilon,A,p,\gamma)], which replaces $math[A] with $math[\gamma] on the stack, without consuming the input
  * for each symbol $math[a\in\Sigma], build transition $math[(p,a,a,p,\epsilon)], which pops a symbol off the stack, once it is read at input;
  * $math[F=\{p\}]
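
The construction can be sketched in Python as follows. This is only an illustration under a few assumptions that are not part of the text above: symbols are single characters, the empty string stands for $math[\epsilon], ''Z'' stands for $math[Z_0], and a word counts as accepted when $math[M] is in state $math[p] with the input consumed and only $math[Z_0] left on the stack (the last configuration of the example sequence). The names ''cfg_to_pda'' and ''accepts'' are hypothetical, and the search is only guaranteed to stop for grammars without left recursion, such as the example grammar.

```python
from collections import deque

EPS = ""  # the empty string stands for epsilon


def cfg_to_pda(rules, terminals, start):
    """Build the transitions (state, input, pop, next_state, push) of M."""
    delta = [("q0", EPS, EPS, "p", start)]          # put S on the stack
    for head, bodies in rules.items():
        for body in bodies:
            delta.append(("p", EPS, head, "p", body))  # expand A -> gamma
    for a in terminals:
        delta.append(("p", a, a, "p", EPS))            # match terminal a
    return delta


def accepts(delta, word, bottom="Z"):
    """Breadth-first search over configurations (state, input, stack).

    Assumption: acceptance means state p, empty input, stack == bottom.
    May not terminate for left-recursive grammars.
    """
    start = ("q0", word, bottom)
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if state == "p" and rest == EPS and stack == bottom:
            return True
        for q, a, pop, r, push in delta:
            if q == state and rest.startswith(a) and stack.startswith(pop):
                nxt = (r, rest[len(a):], push + stack[len(pop):])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# The example grammar: S -> AB, A -> aB | bA, B -> bB | epsilon
RULES = {"S": ["AB"], "A": ["aB", "bA"], "B": ["bB", EPS]}
DELTA = cfg_to_pda(RULES, "ab", "S")
print(accepts(DELTA, "babb"))
```

The stack is kept as a string whose first character is the top, so a transition simply replaces a prefix of the stack, which mirrors the configurations written in the example.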
 
==== Proof ====

To prove that $math[L(M)=L(G)], where $math[M] is built from $math[G] following the above rules, we must observe that $math[M] does not simulate **all possible** derivations, but only those which are **left-most** (as shown by our example). The reason is that $math[M] //eats// symbols as it encounters them, and can only do this from left to right.

Thus, we need to establish:

$prop
Given a CFG $math[G] and a word $math[w] such that $math[S\Rightarrow^*_G w], then there exists a **sequence of derivations where the first non-terminal to the left is always expanded first**, which derives the word $math[w].
$end

In other words, if we can derive $math[w] in $math[G], then we can also derive it via **left-most** derivations only. We omit the proof of this proposition.

We first show that $math[L(G) \subseteq L(M)].

$justprop
If $math[S\Rightarrow_G^* \alpha\beta], using left-most derivations only, and with $math[\alpha\in\Sigma^*] and $math[\beta\in(V\setminus\Sigma)V^*\cup\{\epsilon\}], then $math[(p,\alpha,S)\vdash^*_M(p,\epsilon,\beta)].
$end

Our inclusion follows for $math[\beta=\epsilon].

$proof
The proof is by induction over the length of the derivation.

**Basis:** zero-length derivation.

$math[S\Rightarrow^*S] in zero steps. Then $math[(p,\epsilon,S)\vdash^*_M(p,\epsilon,S)], by reflexivity of $math[\vdash_M^*].

**Induction step:** 
Suppose $math[S\Rightarrow^* \alpha\beta] in $math[n+1] steps. Then $math[S\Rightarrow^* uv \Rightarrow\alpha\beta], where $math[S\Rightarrow^* uv] takes $math[n] steps. Also, $math[\alpha] and $math[u] contain terminal symbols only, $math[v] starts with a non-terminal, and $math[\beta] either starts with a non-terminal or is empty.

Let us look at the last derivation step $math[uv\Rightarrow\alpha\beta]. Since $math[v] must start with a non-terminal, $math[v] is a word of the form $math[Av']. Then $math[A] is the first non-terminal, hence a production $math[A\rightarrow\gamma] must exist in $math[G]. Moreover, we can safely assume that $math[\gamma=xBy], where $math[x\in\Sigma^*] and $math[B] is a non-terminal (the reasoning is similar if no non-terminal in $math[\gamma] exists). Therefore, our derivation step actually has the following structure:

$math[uAv' \Rightarrow uxByv'] where $math[\alpha=ux] and $math[\beta=Byv'].

Since $math[S\Rightarrow^* uAv'] in $math[n] steps, by the induction hypothesis, $math[(p,u,S)\vdash^*_M(p,\epsilon,Av')].

Let us start from configuration $math[(p,\alpha,S) = (p,ux,S)]. The induction hypothesis entails that we can //eat// the $math[u] portion of the word: $math[(p,ux,S)\vdash^*_M(p,x,Av')]. By construction of $math[M], we can replace $math[A] by $math[xBy] on the stack without consuming the input: $math[(p,x,Av')\vdash_M(p,x,xByv')]. We have just simulated the production $math[A\rightarrow xBy]. By construction of $math[M], we can also //eat// each symbol from $math[x], while removing it from the stack: $math[(p,x,xByv')\vdash^*_M(p,\epsilon,Byv')]. The word $math[Byv'] is actually $math[\beta] (i.e. a word starting with a non-terminal).

The proof is finished.

$end

Next, we show $math[L(M) \subseteq L(G)] via the following proposition:

$justprop
If $math[(p,\alpha,S)\vdash^*_M(p,\epsilon,\beta)], where $math[\alpha\in\Sigma^*] and $math[\beta\in V^*], then $math[S\Rightarrow_G^*\alpha\beta].
$end

Notice that this implication is not precisely the //converse// of the previous one: $math[\beta] need not start with a non-terminal. The proof is similar to the one above. We leave it as an exercise.

===== PDA to CFG =====

To construct a grammar from a PDA, we need to view a sequence of transitions of the PDA as follows:
  * it is a sequence of //pop//-events, which take place while parts of the input are being consumed;
  * a //pop//-event of symbol $math[A] is a sequence of pushes and pops (which do not affect $math[A], or the symbols //under// it), which ultimately ends with the popping of $math[A].

With this in mind, we shall construct **non-terminals** in a grammar as triples:
  * $math[\langle qXr \rangle] where $math[q,r] are states of the PDA and $math[X] is a stack symbol.
  * such a non-terminal models a **sequence of transitions** where the PDA goes from state $math[q] to state $math[r], while the //pop//-event $math[X] occurs (i.e. a sequence of push-pops which do not touch $math[X] occurs, and it ends with $math[X] being removed). The idea is that $math[\langle qXr \rangle \Rightarrow^* w] iff $math[(q,w,X)\vdash^*_M (r,\epsilon,\epsilon)]. That is, $math[w] can be derived in our grammar from non-terminal $math[\langle qXr\rangle] precisely when $math[w] is consumed starting from state $math[q] with $math[X] on the stack and ending up in state $math[r] with $math[X] removed.

==== Construction ====

We shall require the following conditions on the PDA at hand:
  * it should have a **unique** final state. Moreover, in this final state, we //pop// the empty-stack symbol $math[Z_0], so that accepting a word coincides with emptying the stack;
  * each transition replaces the top of the stack either by two symbols $math[Y_1Y_2] (a push) or by $math[\epsilon] (a pop):
    * if the PDA performs a more complicated combination of push-pops, we can add intermediate transitions which obey the above rule;
    * if the PDA does not touch the stack, we push a dummy symbol and subsequently pop it;


It is easy to take any PDA and transform it into an equivalent one which obeys the two conditions above.

We construct $math[G=(V,\Sigma,R,S)] as follows:

  * $math[V=\{\langle qXr \rangle \mid q,r\in K,X\in\Gamma \}\cup\{S\}\cup\Sigma], where $math[S] is a new start symbol; some non-terminals from $math[V] may end up being unused;
  * we build production $math[S\rightarrow\langle q_0Z_0p\rangle] where $math[p] is the unique final state. This non-terminal models the sequence of transitions going from the initial state to the final state, while the empty-stack symbol $math[Z_0] is popped. This sequence of transitions marks the acceptance of a word.
  * If $math[\Delta] contains $math[(q,a,X,r,Y_1Y_2)], then we build $math[\mid K\mid^2] productions of the form:
    * $math[\langle qXr_2 \rangle \rightarrow a\langle rY_1r_1\rangle\langle r_1Y_2r_2\rangle] for all $math[r_1,r_2\in K]
    * in other words, in order to obtain a stack with everything unchanged 'below' $math[X], starting from $math[q] and ending up in some $math[r_2], we must eat symbol $math[a], then from $math[r] we must pop $math[Y_1], then $math[Y_2]. We do not know what state will be reached after popping $math[Y_1], so we consider all possible states. The same holds for $math[Y_2].
  * If $math[\Delta] contains $math[(q,a,X,r,\epsilon)], then we build the production:
    * $math[ \langle qXr \rangle \rightarrow a] where $math[a] could be the empty string;

=== Example ===

Consider the following PDA, which accepts $math[L=\{0^n1^n\mid n\geq 1\}].

^ Current state ^ Input ^ Stack top  ^ Next state ^ Stack op ^
| $math[q_0]    | $math[0] | $math[Z_0] | $math[q_0] | $math[XZ_0] |
| $math[q_0]    | $math[0] | $math[X] | $math[q_0] | $math[XX] |
| $math[q_0]    | $math[1] | $math[X] | $math[q_1] | $math[\epsilon] |
| $math[q_1]    | $math[1] | $math[X] | $math[q_1] | $math[\epsilon] |
| $math[q_1]    | $math[\epsilon] | $math[Z_0] | $math[q_1] | $math[\epsilon] |

Note that the final transition was not necessary for accepting $math[L], but was required by our construction. The final state is $math[q_1].

We first build production:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle] which captures the event that we pop the empty stack-symbol starting from the initial state, and ending in the final state (word-acceptance).

The first transition generates the following template production:

$math[\langle q_0 Z_0 r_2 \rangle \rightarrow 0 \langle q_0 X r_1\rangle\langle r_1Z_0 r_2\rangle] with $math[r_1,r_2\in \{q_0,q_1\}]. We have thus defined four productions.

Similarly, the second transition defines:

$math[\langle q_0 X r_2 \rangle \rightarrow 0 \langle q_0 X r_1\rangle\langle r_1X r_2\rangle] with $math[r_1,r_2\in \{q_0,q_1\}],
which gives another four productions.

The third and fourth transitions yield the productions:

$math[\langle q_0 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

And the final transition:

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]

The complete set of productions is:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_0\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_0\rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_1\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_1\rangle]

$math[\langle q_0 X q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0X q_0\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_0\rangle]

$math[\langle q_0 X q_1 \rangle  \rightarrow 1 \mid 0 \langle q_0 X q_0\rangle\langle q_0X q_1\rangle   \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_1\rangle]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]
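
The triple construction is mechanical, so the list above can be cross-checked with a short Python sketch. The representation is an assumption made for illustration only: a non-terminal $math[\langle qXr\rangle] is a tuple ''(q, X, r)'', a transition is a tuple ''(q, a, X, r, push)'' with ''""'' for $math[\epsilon], and the stack top of the last table row is read as $math[Z_0]. The function name ''pda_to_cfg'' is hypothetical.

```python
from itertools import product


def pda_to_cfg(states, delta, q0, z0, final):
    """Triple construction: non-terminals are tuples (q, X, r).

    Each transition is (q, a, X, r, push), where a is "" for epsilon and
    push is a tuple holding either two stack symbols or none.
    A production is a pair (head, body), with body a list of symbols.
    """
    prods = [("S", [(q0, z0, final)])]
    for q, a, x, r, push in delta:
        prefix = [a] if a else []          # epsilon contributes no symbol
        if len(push) == 2:                 # push: one production per (r1, r2)
            y1, y2 = push
            for r1, r2 in product(states, repeat=2):
                prods.append(((q, x, r2), prefix + [(r, y1, r1), (r1, y2, r2)]))
        elif len(push) == 0:               # pop: a single production
            prods.append(((q, x, r), prefix))
        else:
            raise ValueError("expected a push of two symbols or a pop")
    return prods


# The five transitions of the example PDA.
DELTA = [
    ("q0", "0", "Z0", "q0", ("X", "Z0")),
    ("q0", "0", "X", "q0", ("X", "X")),
    ("q0", "1", "X", "q1", ()),
    ("q1", "1", "X", "q1", ()),
    ("q1", "", "Z0", "q1", ()),
]
PRODS = pda_to_cfg(["q0", "q1"], DELTA, "q0", "Z0", "q1")
for head, body in PRODS:
    print(head, "->", body)
```

Each push transition contributes $math[\mid K\mid^2 = 4] productions and each pop transition contributes one, which together with the start production gives the twelve alternatives listed above.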

To make sense of this grammar, note that non-terminals $math[\langle q_1 X q_0\rangle] and $math[\langle q_1 Z_0 q_0 \rangle] do not appear on the left-hand side of any production, hence any derivation that uses a rule containing them will get stuck.

We eliminate such rules. The result is:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_0\rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_1\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_1\rangle]

$math[\langle q_0 X q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0X q_0\rangle]

$math[\langle q_0 X q_1 \rangle  \rightarrow 1 \mid 0 \langle q_0 X q_0\rangle\langle q_0X q_1\rangle   \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_1\rangle]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]

Next, we observe that the rules having $math[\langle q_0 Z_0 q_0 \rangle] and $math[\langle q_0 X q_0\rangle] continue to generate non-terminals, ad-infinitum. Derivations including them will never produce actual words. For this reason, we ignore such rules:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_1\rangle]

$math[\langle q_0 X q_1 \rangle  \rightarrow 1 \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_1\rangle]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]
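
Both elimination steps amount to keeping only those non-terminals that can derive some terminal word. A minimal Python sketch of this pruning is given below; it assumes non-terminals are written as plain strings such as ''q0Xq1'', and the helper names ''generating'' and ''prune'' are hypothetical.

```python
def generating(prods, terminals):
    """Non-terminals that derive at least one terminal word (fixpoint)."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head not in gen and all(s in terminals or s in gen for s in body):
                gen.add(head)
                changed = True
    return gen


def prune(prods, terminals):
    """Keep only productions whose body symbols are all useful."""
    gen = generating(prods, terminals)
    return [(h, b) for h, b in prods
            if all(s in terminals or s in gen for s in b)]


# The complete set of twelve productions of the example; () is epsilon.
PRODS = [
    ("S", ("q0Z0q1",)),
    ("q0Z0q0", ("0", "q0Xq0", "q0Z0q0")),
    ("q0Z0q0", ("0", "q0Xq1", "q1Z0q0")),
    ("q0Z0q1", ("0", "q0Xq0", "q0Z0q1")),
    ("q0Z0q1", ("0", "q0Xq1", "q1Z0q1")),
    ("q0Xq0", ("0", "q0Xq0", "q0Xq0")),
    ("q0Xq0", ("0", "q0Xq1", "q1Xq0")),
    ("q0Xq1", ("1",)),
    ("q0Xq1", ("0", "q0Xq0", "q0Xq1")),
    ("q0Xq1", ("0", "q0Xq1", "q1Xq1")),
    ("q1Xq1", ("1",)),
    ("q1Z0q1", ()),
]
PRUNED = prune(PRODS, {"0", "1"})
for head, body in PRUNED:
    print(head, "->", " ".join(body) or "epsilon")
```

On the example, the six surviving productions are the ones shown in the last listing above: the non-terminals without productions and the ones that only ever regenerate themselves are discarded in a single pass.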

Also, we can remove the final two productions, and replace the occurrences of their left-hand-side non-terminals $math[\langle q_1 X q_1 \rangle] and $math[\langle q_1 Z_0 q_1 \rangle] with $math[1] and $math[\epsilon], respectively, which yields:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_1\rangle]

$math[\langle q_0 X q_1 \rangle  \rightarrow 1 \mid 0 \langle q_0 X q_1\rangle 1]

Finally, we can merge the first two productions, and write $math[A] instead of $math[\langle q_0 X q_1 \rangle], which produces an easy-to-read grammar:

$math[S \rightarrow 0 A]

$math[A \rightarrow 1 \mid 0 A 1]
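
As a sanity check, the words of this final grammar can be enumerated up to a length bound. The sketch below is an illustration under stated assumptions: upper-case letters are non-terminals, every other character is a terminal, and the function name ''words'' is hypothetical. It expands the left-most non-terminal only, and it relies on the fact that terminals never disappear from a sentential form, which is enough to make the search stop for this grammar.

```python
def words(rules, start, max_len):
    """Terminal words of length <= max_len, via left-most derivations.

    Assumption: upper-case characters are non-terminals.
    """
    found, seen, frontier = set(), {start}, [start]
    while frontier:
        form = frontier.pop()
        # position of the left-most non-terminal, if any
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            found.add(form)
            continue
        for body in rules[form[i]]:
            new = form[:i] + body + form[i + 1:]
            # forms with too many terminals can never shrink back
            n_terminals = sum(not c.isupper() for c in new)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return found


RULES = {"S": ["0A"], "A": ["1", "0A1"]}
print(sorted(words(RULES, "S", 6), key=len))
```

For the bound 6, the enumeration finds exactly the words $math[0^n1^n] with $math[1\leq n\leq 3].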