10. Writing a parser for a CF language

Consider the following grammar for arithmetic expressions:

  • $ S \leftarrow M \mid M + S$
  • $ M \leftarrow A \mid A * M$
  • $ A \leftarrow 0 \mid 1 \mid (S)$
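
As a quick check of how the grammar encodes precedence, here is a worked leftmost derivation of the string 1 + 0 * 1. It uses $\Rightarrow$ for a single derivation step, a notational assumption, since the grammar above only fixes the production arrows.

```latex
\begin{align*}
S &\Rightarrow M + S \Rightarrow A + S \Rightarrow 1 + S \\
  &\Rightarrow 1 + M \Rightarrow 1 + A * M \Rightarrow 1 + 0 * M \\
  &\Rightarrow 1 + 0 * A \Rightarrow 1 + 0 * 1
\end{align*}
```

The subexpression 0 * 1 is generated entirely from a single $M$, so multiplication binds tighter than addition.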

10.1.1. Implement an AST for expressions.

  • A parser is a function which takes a string and has two tasks:
    1. It returns the rest of the string that remains to be parsed, or None if parsing failed. Examples:
      • parse_whitespace(" lfa") = "lfa"
      • parse_whitespace("lfa") = None
    2. It adds the parsed value to a global stack whenever the value is part of the AST to be built.
  • We can build more complex parsers from simpler ones. The key idea is to try one alternative and, if parsing fails, to try a different one.
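
The two ideas above can be illustrated with a minimal sketch. It assumes that parse_whitespace consumes exactly one leading whitespace character (the examples would also fit a version that consumes all of them). The names parse_letter and parse_either are hypothetical helpers introduced only for this illustration.

```python
stack = []  # global stack of parsed values, as described above

def parse_whitespace(w):
    # Assumption: consumes exactly one leading whitespace character.
    # Whitespace is not part of the AST, so nothing is pushed on the stack.
    if len(w) > 0 and w[0].isspace():
        return w[1:]
    return None

def parse_letter(w):
    # Hypothetical parser: consumes one letter and records it on the stack.
    if len(w) > 0 and w[0].isalpha():
        stack.append(w[0])
        return w[1:]
    return None

def parse_either(first, second, w):
    # Try `first`; if it fails, undo any stack changes it made and try `second`.
    saved = len(stack)
    rest = first(w)
    if rest is not None:
        return rest
    del stack[saved:]
    return second(w)

print(parse_whitespace(" lfa"))   # lfa
print(parse_whitespace("lfa"))    # None
result = parse_either(parse_whitespace, parse_letter, "lfa")
print(result)                     # fa
print(stack)                      # ['l']
```

Restoring the stack before trying the second alternative is one possible design choice: it keeps a failed attempt from leaving partial values behind.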

Consider the framework shown below:

stack = []  # global stack holding the values parsed so far

def parse_digit(w):
    if len(w) == 0:
        return None         # parsing fails on empty input

    if w[0].isdigit():
        stack.append(w[0])  # add the parsed digit to the stack
        return w[1:]        # return the rest of the word
    else:
        return None         # if the character is not a digit, the parsing fails
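
To see how the remaining input is threaded from one parser to the next, here is a small sketch that runs a digit parser twice. parse_digit is restated in condensed form so that the snippet runs on its own, and parse_two_digits is a hypothetical helper, not part of the framework.

```python
stack = []  # stand-in for the global stack of the framework

def parse_digit(w):
    # Same behaviour as the framework function, restated so this sketch is self-contained.
    if len(w) == 0 or not w[0].isdigit():
        return None
    stack.append(w[0])
    return w[1:]

def parse_two_digits(w):
    # Hypothetical helper: parse one digit, then parse another from what is left.
    w1 = parse_digit(w)
    if w1 is None:
        return None
    return parse_digit(w1)

rest = parse_two_digits("10+1")
print(rest)    # +1
print(stack)   # ['1', '0']
```

Note that if the second call fails, the first digit stays on the stack. A composite parser has to decide how to deal with such leftovers.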

10.1.3. Implement a function parse_plus which parses the character '+' (if the first character is '+', it consumes it, otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters.

10.1.4. Complete the following implementation of the function parse_multiplication:

def parse_multiplication(w):
    if len(w) == 0:
        return None

    w1 = parse_digit(w)   # parse a digit

    if w1 is not None:
        # we have parsed a digit, now we try to parse '+':
        w2 = parse_plus(w1)
        if w2 is not None:
            # we have parsed a '+'
            w3 = parse_multiplication(w2)
            if w3 is not None:
                # we have parsed a digit followed by '+' and by another multiplication expression
                # what are the contents of the stack right now?
                # how should the stack be modified?
                pass  # TODO: to be completed
    else:
        return None  # parsing a digit failed
 

10.1.5. Following the same structure, write a complete implementation for expression parsers.

10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: a regular expression is built in the usual way from the symbols (, ), *, | and any alphanumeric character. Whitespace may occur anywhere within the expression.

10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes:

  • Make sure to take precedence into account

10.2.3. Write a parser for regular expressions.