====== CFG - PDA equivalence ======

We prove that the languages generated by context-free grammars coincide with those accepted by PDAs. The proof is done in two steps.

===== CFG to PDA =====

Given a context-free grammar $math[G=(V,\Sigma,R,S)], we build a PDA $math[M=(K,\Sigma,\Gamma,\Delta,q_0,Z_0,F)] which accepts precisely $math[L(G)]. The main idea is:
  * for each derivation step $math[\alpha A \beta \Rightarrow_G \alpha \gamma \beta] in $math[G], we will execute one transition in $math[M].

Consider the following example:

$math[S\rightarrow AB]

$math[A\rightarrow aB \mid bA]

$math[B\rightarrow bB\mid \epsilon]

As well as the derivation:

$math[S\Rightarrow AB \Rightarrow bAB \Rightarrow baBB \Rightarrow babBB \Rightarrow babbBB \Rightarrow babbB \Rightarrow babb]

which shows that $math[babb\in L(G)] for our example. For this derivation, we should have the configuration sequence:

$math[(?,babb,Z_0) \vdash (?,babb,SZ_0) \vdash (?,babb,ABZ_0) \vdash (?,babb,bABZ_0) \vdash]

$math[(?,abb,ABZ_0) \vdash (?,abb,aBBZ_0)\vdash(?,bb,BBZ_0)\vdash(?,bb,bBBZ_0) \vdash]

$math[(?,b,BBZ_0) \vdash (?,b,bBBZ_0) \vdash (?,\epsilon,BBZ_0)\vdash(?,\epsilon,BZ_0) \vdash]

$math[(?,\epsilon,Z_0)]

Ideas:
  * each possible derivation step in $math[G] (not necessarily part of a successful derivation) should correspond to a transition in $math[M];
  * we use the stack to hold the **non-terminals** which are due to be expanded. For this to work, we must consider **left-most derivations only**;
  * whenever the top of the stack is a terminal symbol coinciding with the next input symbol, we pop it and consume that input symbol;
  * whenever the top of the stack is a non-terminal, we pop it and (non-deterministically) push the right-hand side of one of its productions.
==== Construction ====

  * $math[K = \{q_0,p\}]
  * $math[\Gamma = V\cup\{Z_0\}]
  * build transition $math[(q_0,\epsilon,\epsilon,p,S)], which puts the start symbol $math[S] on the stack;
  * for each production $math[A\rightarrow\gamma] with $math[\gamma\in V^*], build transition $math[(p,\epsilon,A,p,\gamma)], which replaces $math[A] with $math[\gamma] on the stack without consuming the input;
  * for each symbol $math[a\in\Sigma], build transition $math[(p,a,a,p,\epsilon)], which pops a symbol off the stack once it is read from the input;
  * $math[F=\{p\}]

==== Proof ====

To prove that $math[L(M)=L(G)], where $math[M] is built from $math[G] following the above rules, we must observe that $math[M] does not simulate **all possible** derivations, but only those which are **left-most** (as shown by our example). The reason is that $math[M] //eats// symbols as it encounters them, and can only do this from left to right. Thus, we need to establish:

$prop Given a CFG $math[G] and a word $math[w] such that $math[S\Rightarrow^*_G w], there exists a **sequence of derivation steps where the left-most non-terminal is always expanded first**, which derives the word $math[w]. $end

In other words, if we can derive $math[w] in $math[G], then we can also derive it via **left-most** derivations only. We omit the proof of this proposition.

We first show that $math[L(G) \subseteq L(M)].

$justprop If $math[S\Rightarrow_G^* \alpha\beta], using left-most derivations only, and with $math[\alpha\in\Sigma^*] and $math[\beta\in(V\setminus\Sigma)V^*\cup\{\epsilon\}], then $math[(p,\alpha,S)\vdash^*_M(p,\epsilon,\beta)]. $end

Our inclusion follows for $math[\beta=\epsilon].

$proof The proof is by induction over the length of the derivation.

**Basis:** zero-length derivation. $math[S\Rightarrow^*S] in zero steps. Then $math[(p,\epsilon,S)\vdash^*_M(p,\epsilon,S)], by reflexivity of $math[\vdash_M^*].

**Induction step:** Suppose $math[S\Rightarrow^* \alpha\beta] in $math[n+1] steps.
Then $math[S\Rightarrow^* uv \Rightarrow\alpha\beta]. Also, $math[\alpha] and $math[u] contain terminal symbols only, while $math[v] starts with a non-terminal and $math[\beta] either starts with a non-terminal or is empty.

Let us look at the last derivation step $math[uv\Rightarrow\alpha\beta]. Since $math[v] must start with a non-terminal, $math[v] is a word of the form $math[Av']. Then $math[A] is the left-most non-terminal, hence a production $math[A\rightarrow\gamma] must exist in $math[G]. Moreover, we can safely assume that $math[\gamma=xBy], where $math[x\in\Sigma^*] and $math[B] is a non-terminal (the reasoning is similar if $math[\gamma] contains no non-terminal). Therefore, our derivation step actually has the following structure:

$math[uAv' \Rightarrow uxByv']

where $math[\alpha=ux] and $math[\beta=Byv'].

Since $math[S\Rightarrow^* uAv'] in $math[n] steps, by the induction hypothesis, $math[(p,u,S)\vdash^*_M(p,\epsilon,Av')].

Let us start from configuration $math[(p,\alpha,S) = (p,ux,S)]. The induction hypothesis entails that we can //eat// the $math[u] portion of the word: $math[(p,ux,S)\vdash^*_M(p,x,Av')].

By construction of $math[M], we can replace $math[A] by $math[xBy] on the stack without consuming the input: $math[(p,x,Av')\vdash_M(p,x,xByv')]. We have just simulated the production $math[A\rightarrow xBy].

By construction of $math[M], we can also //eat// each symbol of $math[x] while removing it from the stack: $math[(p,x,xByv')\vdash^*_M(p,\epsilon,Byv')]. The word $math[Byv'] is actually $math[\beta] (i.e. a word starting with a non-terminal). The proof is finished. $end

Next, we show $math[L(M) \subseteq L(G)] via the following proposition:

$justprop If $math[(p,\alpha,S)\vdash^*_M(p,\epsilon,\beta)], where $math[\alpha\in\Sigma^*] and $math[\beta\in V^*], then $math[S\Rightarrow_G^*\alpha\beta]. $end

Notice that this implication is not precisely the //converse// of the previous one: $math[\beta] need not start with a non-terminal. The proof is similar to the above. We leave it as an exercise.
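The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proof: every grammar symbol is a single character, every non-terminal has at least one production, the stack is a string whose first character is the top, and a word is accepted when the PDA is in state ''p'' with both the input and the stack empty. The names ''cfg_to_pda'', ''accepts'' and ''max_steps'' are hypothetical. The step bound is there because the search may not terminate on left-recursive grammars.

```python
from collections import deque

def cfg_to_pda(productions, start):
    """Build the transitions of the two-state PDA from a CFG.

    A transition is (state, input, pop, next_state, push); "" stands for epsilon.
    Assumption: symbols are single characters, non-terminals are the dict keys.
    """
    terminals = {s for bodies in productions.values() for body in bodies
                 for s in body if s not in productions}
    delta = [("q0", "", "", "p", start)]                  # push the start symbol
    for lhs, bodies in productions.items():               # expand a non-terminal
        delta += [("p", "", lhs, "p", body) for body in bodies]
    delta += [("p", a, a, "p", "") for a in sorted(terminals)]  # match a terminal
    return delta

def accepts(delta, word, max_steps=100_000):
    """Breadth-first search over configurations (state, remaining input, stack)."""
    start = ("q0", word, "")
    seen, queue = {start}, deque([start])
    while queue and max_steps > 0:
        max_steps -= 1                                    # hypothetical safety bound
        state, rest, stack = queue.popleft()
        if state == "p" and not rest and not stack:
            return True
        for q, a, pop, r, push in delta:
            if q == state and rest.startswith(a) and stack.startswith(pop):
                nxt = (r, rest[len(a):], push + stack[len(pop):])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# The example grammar: S -> AB, A -> aB | bA, B -> bB | epsilon
grammar = {"S": ["AB"], "A": ["aB", "bA"], "B": ["bB", ""]}
delta = cfg_to_pda(grammar, "S")
print(accepts(delta, "babb"))
```

Because only the top of the stack is ever expanded, each run of ''accepts'' follows a left-most derivation, which is exactly the restriction the proof relies on.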
===== PDA to CFG =====

To construct a grammar from a PDA, we need to view the sequence of transitions of a PDA as follows:
  * it is a sequence of //pop//-events, which take place while parts of the input are being consumed;
  * a //pop//-event of symbol $math[A] is a sequence of pushes and pops (which do not affect $math[A], or the symbols //under// it), which ultimately ends with the popping of $math[A].

With this in mind, we shall construct **non-terminals** in a grammar as triples:
  * $math[\langle qXr \rangle] where $math[q,r] are states of the PDA and $math[X] is a stack symbol;
  * such a non-terminal models a **sequence of transitions** where the PDA goes from state $math[q] to state $math[r], while the //pop//-event $math[X] occurs (i.e. a sequence of push-pops which do not touch $math[X] occurs, and it ends with $math[X] being removed).

The idea is that $math[\langle qXr \rangle \Rightarrow^* w] iff $math[(q,w,X)\vdash^*_M (r,\epsilon,\epsilon)]. That is, $math[w] can be derived in our grammar from non-terminal $math[\langle qXr\rangle] precisely when $math[w] is consumed starting from state $math[q] with $math[X] on the stack and ending up in state $math[r] with $math[X] removed.

==== Construction ====

We shall require the following conditions on the PDA at hand:
  * it should have a **unique** final state. Moreover, in this final state, we //pop// the empty-stack symbol $math[Z_0];
  * each transition performs a stack operation of type $math[Y_1Y_2] (e.g. a push) or $math[\epsilon] (a pop):
    * if the PDA performs a more complicated combination of push-pops, we can add intermediate transitions which obey the above rule;
    * if the PDA does not touch the stack, we push a dummy symbol and subsequently pop it.

It is easy to take any PDA and transform it into an equivalent one which obeys the two conditions above.
We construct $math[G=(V,\Sigma,R,S)] as follows:
  * $math[V=\{S\}\cup\{\langle qXr \rangle \mid q,r\in K,X\in\Gamma \}\cup\Sigma]; some non-terminals from $math[V] may end up being unused;
  * we build production $math[S\rightarrow\langle q_0Z_0p\rangle] where $math[p\in F]. This non-terminal models the sequence of transitions going from the initial state to the final state, while the empty-stack symbol $math[Z_0] is popped. This sequence of transitions marks the acceptance of a word.
  * If $math[\Delta] contains $math[(q,a,X,r,Y_1Y_2)], then we build $math[\mid K\mid^2] productions of the form:
    * $math[\langle qXr_2 \rangle \rightarrow a\langle rY_1r_1\rangle\langle r_1Y_2r_2\rangle] for all $math[r_1,r_2\in K]
    * in other words, in order to obtain a stack with everything unchanged 'below' $math[X], starting from $math[q] and ending up in some $math[r_2], we must eat symbol $math[a], then from $math[r] we must pop $math[Y_1], then $math[Y_2]. We do not know what state will be reached after popping $math[Y_1], so we consider all possible states. The same holds for $math[Y_2].
  * If $math[\Delta] contains $math[(q,a,X,r,\epsilon)], then we build the production:
    * $math[\langle qXr \rangle \rightarrow a] where $math[a] could be the empty string.

=== Example ===

Consider the following PDA, which accepts $math[L=\{0^n1^n\mid n\geq 1\}].

^ Current state ^ Input ^ Stack top ^ Next state ^ Stack op ^
| $math[q_0] | $math[0] | $math[Z_0] | $math[q_0] | $math[XZ_0] |
| $math[q_0] | $math[0] | $math[X] | $math[q_0] | $math[XX] |
| $math[q_0] | $math[1] | $math[X] | $math[q_1] | $math[\epsilon] |
| $math[q_1] | $math[1] | $math[X] | $math[q_1] | $math[\epsilon] |
| $math[q_1] | $math[\epsilon] | $math[Z_0] | $math[q_1] | $math[\epsilon] |

Note that the final transition was not necessary for accepting $math[L], but was required by our construction. The final state is $math[q_1].
We first build the production:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

which captures the event that we pop the empty-stack symbol starting from the initial state and ending in the final state (word acceptance).

The first transition generates the following template production:

$math[\langle q_0 Z_0 r_2 \rangle \rightarrow 0 \langle q_0 X r_1\rangle\langle r_1Z_0 r_2\rangle]

with $math[r_1,r_2\in \{q_0,q_1\}]. We have thus defined four productions. Similarly, the second transition defines:

$math[\langle q_0 X r_2 \rangle \rightarrow 0 \langle q_0 X r_1\rangle\langle r_1X r_2\rangle]

which gives another four productions. The third and fourth transitions yield the productions:

$math[\langle q_0 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

And the final transition:

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]

The complete set of productions is:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_0\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_0\rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_1\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_1\rangle]

$math[\langle q_0 X q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0X q_0\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_0\rangle]

$math[\langle q_0 X q_1 \rangle \rightarrow 1 \mid 0 \langle q_0 X q_0\rangle\langle q_0X q_1\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_1\rangle]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]

To make sense of this grammar, note that the non-terminals $math[\langle q_1 X q_0\rangle] and $math[\langle q_1 Z_0 q_0 \rangle] do not appear on the left-hand side of any production, hence any derivation that uses a rule containing them will get stuck. We eliminate such rules.
The result is:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_0\rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0Z_0 q_1\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_1\rangle]

$math[\langle q_0 X q_0 \rangle \rightarrow 0 \langle q_0 X q_0\rangle\langle q_0X q_0\rangle]

$math[\langle q_0 X q_1 \rangle \rightarrow 1 \mid 0 \langle q_0 X q_0\rangle\langle q_0X q_1\rangle \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_1\rangle]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]

Next, we observe that every rule for $math[\langle q_0 Z_0 q_0 \rangle] and $math[\langle q_0 X q_0\rangle] reintroduces one of these two non-terminals, ad infinitum. Derivations including them will never produce actual words. For this reason, we remove all rules in which they occur:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_1\rangle\langle q_1Z_0 q_1\rangle]

$math[\langle q_0 X q_1 \rangle \rightarrow 1 \mid 0 \langle q_0 X q_1\rangle\langle q_1X q_1\rangle]

$math[\langle q_1 X q_1 \rangle \rightarrow 1]

$math[\langle q_1 Z_0 q_1 \rangle \rightarrow \epsilon]

Also, we can remove the final two productions and replace each occurrence of their left-hand side non-terminals with $math[1] (resp. $math[\epsilon]), which yields:

$math[S \rightarrow \langle q_0 Z_0 q_1 \rangle]

$math[\langle q_0 Z_0 q_1 \rangle \rightarrow 0 \langle q_0 X q_1\rangle]

$math[\langle q_0 X q_1 \rangle \rightarrow 1 \mid 0 \langle q_0 X q_1\rangle 1]

Finally, we can merge the first two productions and write $math[A] instead of $math[\langle q_0 X q_1 \rangle], which produces an easy-to-read grammar:

$math[S \rightarrow 0 A]

$math[A \rightarrow 1 \mid 0 A 1]
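The worked example can be replayed mechanically. The Python sketch below is an illustration under assumptions, not a reference implementation: non-terminals $math[\langle qXr\rangle] are encoded as tuples ''(q, X, r)'', the stack symbol $math[Z_0] is written ''"Z"'', terminals are plain strings, and the function names ''pda_to_cfg'', ''prune'' and ''words'' are hypothetical. The function ''prune'' removes every production containing a non-terminal that cannot derive a terminal word, which covers both elimination steps performed by hand above.

```python
from itertools import product

def pda_to_cfg(states, delta, q0, z0, final):
    """Triple construction: one rule set per non-terminal (q, X, r)."""
    rules = {"S": [[(q0, z0, final)]]}
    for q, a, X, r, push in delta:
        head = [a] if a else []                      # "" stands for epsilon
        if len(push) == 2:                           # push transition: |K|^2 rules
            y1, y2 = push
            for r1, r2 in product(states, repeat=2):
                rules.setdefault((q, X, r2), []).append(
                    head + [(r, y1, r1), (r1, y2, r2)])
        else:                                        # pop transition: one rule
            rules.setdefault((q, X, r), []).append(head)
    return rules

def prune(rules):
    """Drop productions using non-terminals that derive no terminal word."""
    def ok(body, good):
        return all(isinstance(s, str) or s in good for s in body)
    good, changed = set(), True
    while changed:                                   # fixpoint of "generating"
        changed = False
        for lhs, bodies in rules.items():
            if lhs not in good and any(ok(b, good) for b in bodies):
                good.add(lhs)
                changed = True
    return {lhs: [b for b in bodies if ok(b, good)]
            for lhs, bodies in rules.items() if lhs in good}

def words(rules, max_len):
    """Terminal words with at most max_len symbols, via left-most derivations."""
    found, todo = set(), [("S",)]
    while todo:
        form = todo.pop()
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:                              # no non-terminal left
            found.add("".join(form))
            continue
        for body in rules[form[idx]]:
            new = form[:idx] + tuple(body) + form[idx + 1:]
            if sum(s not in rules for s in new) <= max_len:
                todo.append(new)
    return found

# The example PDA; () is the pop operation, ("X", "Z") pushes X on top of Z_0.
delta = [("q0", "0", "Z", "q0", ("X", "Z")),
         ("q0", "0", "X", "q0", ("X", "X")),
         ("q0", "1", "X", "q1", ()),
         ("q1", "1", "X", "q1", ()),
         ("q1", "",  "Z", "q1", ())]
raw = pda_to_cfg(["q0", "q1"], delta, "q0", "Z", "q1")
pruned = prune(raw)
print(sorted(words(pruned, 6), key=len))
```

The enumeration in ''words'' relies on every cycle of the pruned grammar adding at least one terminal, which holds for this example but is an assumption in general.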