====== 10. Writing a parser for a CF language ======
===== 10.1. A basic functional structure for a parser =====
Consider the following language encoding expressions:
* $math[S \leftarrow M \mid M + S]
* $math[M \leftarrow A \mid A * M]
* $math[A \leftarrow 0 \mid 1 \mid (S)]
10.1.1. Implement an AST for expressions.
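One possible AST, given only as a sketch: it assumes one node class per operator plus a constant leaf. Parentheses $math[(S)] need no node of their own, because the shape of the tree already records the grouping. The class names ''Const'', ''Plus'', ''Mult'' and the ''evaluate'' helper are illustrative choices, not part of the exercise.
<code python>
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: int  # 0 or 1, as produced by the rule A <- 0 | 1


@dataclass
class Plus:
    left: "Expr"   # the M in S <- M + S
    right: "Expr"  # the S in S <- M + S


@dataclass
class Mult:
    left: "Expr"   # the A in M <- A * M
    right: "Expr"  # the M in M <- A * M


# An expression is any of the three node kinds.
Expr = Union[Const, Plus, Mult]


def evaluate(e: Expr) -> int:
    # Walk the tree recursively, which shows that the AST is usable.
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Plus):
        return evaluate(e.left) + evaluate(e.right)
    return evaluate(e.left) * evaluate(e.right)
</code>
For example, ''1+1*0'' corresponds to ''Plus(Const(1), Mult(Const(1), Const(0)))'', which evaluates to 1.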
* A **parser** is a function which takes a string and has **two** tasks:
- return the **rest of the string that remains to be parsed**, or ''None'' if parsing fails. Examples:
* ''parse_whitespace(" lfa") = "lfa"''
* ''parse_whitespace("lfa") = None''
- push the parsed value onto **a global stack** whenever that value is part of the AST being built.
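A minimal sketch of ''parse_whitespace'' that matches the two examples. It assumes that the parser consumes **one or more** leading whitespace characters and fails when there are none. Whitespace is not part of the AST, so nothing is pushed onto the stack.
<code python>
def parse_whitespace(w):
    # Count the leading whitespace characters (spaces, tabs, newlines, ...).
    i = 0
    while i < len(w) and w[i].isspace():
        i += 1
    if i == 0:
        return None  # no whitespace at the start: parsing fails
    return w[i:]     # the rest of the string still to be parsed
</code>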
Another example, a parser for a single digit:
<code python>
stack = []  # global stack holding the parsed values (the pieces of the AST)

def parse_digit(w):
    if len(w) == 0:
        return None  # parsing fails
    if w[0].isdigit():
        stack.append(w[0])  # add the parsed digit to the stack
        return w[1:]  # return the rest of the word
    else:
        return None  # if the character is not a digit, the parsing fails
</code>
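The following self-contained sketch shows how parsers are chained: the rest returned by one call becomes the input of the next call, while the parsed digits pile up on the global stack. It redefines ''parse_digit'' with ''str.isdigit'' so that it runs on its own.
<code python>
stack = []  # global stack, as in the text

def parse_digit(w):
    # Consume one digit and push it; return the rest, or None on failure.
    if len(w) == 0:
        return None
    if w[0].isdigit():
        stack.append(w[0])
        return w[1:]
    return None

# Chaining: the rest returned by one parser is fed to the next one.
rest = parse_digit("10+1")  # consumes "1"; rest is "0+1", stack is ["1"]
rest = parse_digit(rest)    # consumes "0"; rest is "+1", stack is ["1", "0"]
failed = parse_digit(rest)  # "+" is not a digit: returns None, stack unchanged
</code>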
10.1.3. Implement a function ''parse_plus'' which parses the character '+' (if the first character is '+', it consumes it, otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters.
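A possible shape for the hint, as a sketch: a general (hypothetical) helper ''parse_char'' consumes one given character, and ''parse_plus'' is a one-line specialisation of it. The ''+'' itself is not pushed onto the stack, since the kind of node is already known from which parser succeeded.
<code python>
def parse_char(c, w):
    # Hypothetical general helper: if w starts with the character c,
    # consume it and return the rest; otherwise parsing fails.
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None

def parse_plus(w):
    # parse_plus is just parse_char specialised to '+'.
    return parse_char("+", w)
</code>
The same helper can then parse ''*'', ''('' and '')'' with calls such as ''parse_char("*", w)''.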
10.1.4. We can build **more complex parsers** from simpler ones. The key idea is to **try** one alternative and, if parsing fails, fall back to another one.
Complete the following implementation of the function ''parse_multiplication''. For now its operands are single digits, and ''parse_times'' is an assumed parser for the character '*', written like ''parse_plus'' from 10.1.3:
<code python>
def parse_multiplication(w):
    if len(w) == 0:
        return None
    w1 = parse_digit(w)  # parse a digit
    if w1 is not None:
        # we have parsed a digit, now we try to parse '*'
        # (parse_times is hypothetical: a '*' parser written like parse_plus)
        w2 = parse_times(w1)
        if w2 is not None:
            # we have successfully parsed a '*'
            w3 = parse_multiplication(w2)
            if w3 is not None:
                # we have parsed a digit followed by '*' and by another multiplication expression
                # what are the contents of the stack right now?
                # how should the stack be modified?
                return w3  # TODO: update the stack before returning
            else:
                return None  # '*' must be followed by another multiplication expression
        else:
            # parsing a '*' has failed, so we just return the rest of the string w1
            return w1
    else:
        return None  # parsing a digit failed
</code>
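A sketch of the stack update asked about in the comments, using tuples as AST nodes for brevity. When ''parse_multiplication'' succeeds on a digit, a ''*'' and a nested multiplication, the stack ends with the left digit followed by the value built by the recursive call (the right operand is on top). The helper ''reduce_binary'' is a hypothetical name.
<code python>
def reduce_binary(stack, op):
    # Hypothetical helper: replace the two topmost stack entries by one
    # AST node. The right operand was pushed last, so it is popped first.
    right = stack.pop()
    left = stack.pop()
    stack.append((op, left, right))

# Parsing "1*0" pushes "1" (the digit) and then "0" (pushed by the
# recursive call on "0"), so before the update the stack is ["1", "0"].
stack = ["1", "0"]
reduce_binary(stack, "*")
</code>
Because the inner call reduces first, ''1*0*1'' ends up as the right-nested node ''("*", "1", ("*", "0", "1"))''.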
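10.1.5. Following the same structure, write complete parsers for all three nonterminals $math[S], $math[M] and $math[A] of the expression grammar, so that a successful parse leaves the AST on the stack.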
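One possible solution sketch, with one function per nonterminal and tuples as AST nodes (''("+", l, r)'', ''("*", l, r)'' and the integers 0 and 1). The names ''parse_sum'', ''parse_mult'', ''parse_atom'', ''parse_char'', ''reduce_binary'' and ''parse_expression'' are illustrative. The top-level function additionally requires that the whole input is consumed.
<code python>
stack = []  # global stack of AST pieces

def parse_char(c, w):
    # Consume the character c, or fail with None.
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None

def reduce_binary(op):
    # Replace the two topmost entries by one node (right operand on top).
    right = stack.pop()
    left = stack.pop()
    stack.append((op, left, right))

def parse_atom(w):
    # A <- 0 | 1 | ( S )
    if len(w) > 0 and w[0] in "01":
        stack.append(int(w[0]))
        return w[1:]
    w1 = parse_char("(", w)
    if w1 is None:
        return None
    w2 = parse_sum(w1)
    if w2 is None:
        return None
    return parse_char(")", w2)

def parse_mult(w):
    # M <- A | A * M
    w1 = parse_atom(w)
    if w1 is None:
        return None
    w2 = parse_char("*", w1)
    if w2 is None:
        return w1      # alternative M <- A
    w3 = parse_mult(w2)
    if w3 is None:
        return None    # '*' must be followed by an M
    reduce_binary("*")
    return w3

def parse_sum(w):
    # S <- M | M + S
    w1 = parse_mult(w)
    if w1 is None:
        return None
    w2 = parse_char("+", w1)
    if w2 is None:
        return w1      # alternative S <- M
    w3 = parse_sum(w2)
    if w3 is None:
        return None    # '+' must be followed by an S
    reduce_binary("+")
    return w3

def parse_expression(w):
    # Top level: start from an empty stack and require that nothing is left.
    stack.clear()
    rest = parse_sum(w)
    if rest is None or rest != "":
        return None
    return stack.pop()
</code>
Usage: ''parse_expression("1+0*1")'' returns ''("+", 1, ("*", 0, 1))'', so ''*'' binds tighter than ''+'' simply because $math[M] sits below $math[S] in the grammar.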
===== 10.2. Writing a parser for regular expressions =====
10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the usual way, using the symbols ''('', '')'', ''*'', ''|'' and any alphanumeric character. Whitespace may occur anywhere within the expression.//
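A sketch of one possible (ambiguous) grammar. It assumes that whitespace is discarded before parsing, so it does not appear in the productions, and that $math[c] stands for any single alphanumeric character; $math[\texttt{|}] is the union symbol, while $math[\mid] separates alternatives:
* $math[R \leftarrow R \,\texttt{|}\, R \mid R\,R \mid R\,* \mid (R) \mid c]
This grammar is ambiguous: for example, $math[a \,\texttt{|}\, b\,c] has one parse tree that groups it as $math[(a \,\texttt{|}\, b)\,c] and another that groups it as $math[a \,\texttt{|}\, (b\,c)].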
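10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regular expressions:
* Make sure to take operator precedence into account: the Kleene star binds tighter than concatenation, which binds tighter than union (''|'').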
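A sketch of one unambiguous grammar, under the same assumptions as before (whitespace discarded, $math[c] an alphanumeric character). The nonterminal names are chosen here for illustration: $math[E] for union, $math[T] for concatenation, $math[F] for a starred factor, $math[P] for a possibly empty run of stars, and $math[B] for a basic expression:
* $math[E \leftarrow T \mid T \,\texttt{|}\, E]
* $math[T \leftarrow F \mid F\,T]
* $math[F \leftarrow B\,P]
* $math[P \leftarrow \varepsilon \mid *\,P]
* $math[B \leftarrow c \mid (E)]
As with $math[S], $math[M] and $math[A] in section 10.1, each precedence level has its own nonterminal. Right recursion (rather than, e.g., $math[F \leftarrow F\,*]) keeps the grammar usable by recursive-descent parsers written in the style of section 10.1.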
10.2.3. Write a parser for regular expressions.
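A sketch of such a parser in the stack-based style of section 10.1. It assumes the precedence star > concatenation > union and a grammar of the form $math[E \leftarrow T \mid T \,\texttt{|}\, E], $math[T \leftarrow F \mid F\,T], $math[F \leftarrow] $math[B] followed by zero or more stars, $math[B \leftarrow c \mid (E)]. AST nodes are tuples such as ''("char", "a")'', ''("star", r)'', ''("concat", l, r)'' and ''("union", l, r)''; all function names are illustrative.
<code python>
stack = []  # global stack of AST nodes, in the style of section 10.1

def skip_ws(w):
    # Whitespace may occur anywhere, so it is dropped before every token.
    return w.lstrip()

def parse_char(c, w):
    # Consume the character c (after optional whitespace), or fail with None.
    w = skip_ws(w)
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None

def reduce_binary(op):
    # Replace the two topmost nodes by one binary node (right operand on top).
    right = stack.pop()
    left = stack.pop()
    stack.append((op, left, right))

def parse_base(w):
    # B <- c | ( E ), where c is an alphanumeric character
    w = skip_ws(w)
    if len(w) > 0 and w[0].isalnum():
        stack.append(("char", w[0]))
        return w[1:]
    w1 = parse_char("(", w)
    if w1 is None:
        return None
    w2 = parse_union(w1)
    if w2 is None:
        return None
    return parse_char(")", w2)

def parse_factor(w):
    # F <- B followed by zero or more '*'
    w1 = parse_base(w)
    if w1 is None:
        return None
    w2 = parse_char("*", w1)
    while w2 is not None:
        stack.append(("star", stack.pop()))  # wrap the node on top
        w1 = w2
        w2 = parse_char("*", w1)
    return w1

def parse_concat(w):
    # T <- F | F T
    w1 = parse_factor(w)
    if w1 is None:
        return None
    w2 = parse_concat(w1)
    if w2 is None:
        return w1  # a single factor
    reduce_binary("concat")
    return w2

def parse_union(w):
    # E <- T | T '|' E
    w1 = parse_concat(w)
    if w1 is None:
        return None
    w2 = parse_char("|", w1)
    if w2 is None:
        return w1
    w3 = parse_union(w2)
    if w3 is None:
        return None  # '|' must be followed by another alternative
    reduce_binary("union")
    return w3

def parse_regex(w):
    # Top level: the whole input (up to trailing whitespace) must be consumed.
    stack.clear()
    rest = parse_union(w)
    if rest is None or skip_ws(rest) != "" or len(stack) != 1:
        return None
    return stack.pop()
</code>
A failed sub-parse may leave stray nodes on the stack, which is why ''parse_regex'' clears the stack first and accepts the result only if exactly one node is left and the input is fully consumed.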