10. Writing a parser for a CF language

10.1. A basic functional structure for a parser

Consider the following grammar, which describes arithmetic expressions:

$ S \leftarrow M \mid M + S$
$ M \leftarrow A \mid A * M$
$ A \leftarrow 0 \mid 1 \mid (S)$
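
As a quick check of how the grammar encodes precedence, here is a sketch of a leftmost derivation of $1 + 0 * 1$ (the sample expression is chosen only for illustration):

$ S \Rightarrow M + S \Rightarrow A + S \Rightarrow 1 + S \Rightarrow 1 + M \Rightarrow 1 + A * M \Rightarrow 1 + 0 * M \Rightarrow 1 + 0 * A \Rightarrow 1 + 0 * 1$

The product $0 * 1$ is derived from a single $M$, so $*$ binds tighter than $+$.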

10.1.1. Implement an AST for expressions.

A parser is a function which takes a string and has two tasks:
1. It returns the rest of the string that remains to be parsed, or an error (None) if parsing failed. Examples:
  - parse_whitespace(" lfa") = "lfa"
  - parse_whitespace("lfa") = None
2. It adds the parsed value to a global stack whenever the value is part of the AST to be built.

We can build more complex parsers from simpler ones. The key idea is to try one alternative and, if parsing fails, to try a different one.
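
The two tasks and the "try another alternative" idea can be sketched as follows. This is only an illustration under some assumptions: parse_zero, parse_one and parse_bit are hypothetical names that do not appear in the lab, and parse_whitespace is assumed to consume only space characters.

```python
stack = []  # global stack of parsed values, as described above

def parse_whitespace(w):
    """Consume leading spaces; fail (None) if there is nothing to consume."""
    rest = w.lstrip(" ")
    return rest if rest != w else None

def parse_zero(w):
    """Parse the single character '0' and push it on the stack."""
    if w.startswith("0"):
        stack.append("0")
        return w[1:]
    return None

def parse_one(w):
    """Parse the single character '1' and push it on the stack."""
    if w.startswith("1"):
        stack.append("1")
        return w[1:]
    return None

def parse_bit(w):
    """Try parse_zero first; if it fails, fall back to parse_one."""
    rest = parse_zero(w)
    if rest is None:      # compare with None: "" is a valid (empty) rest
        rest = parse_one(w)
    return rest
```

Note that a successful parse may return the empty string, so failure should be tested with `is None` rather than by truthiness.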

Consider the framework shown below:

stack = []  # global stack holding the parsed values

def parse_digit(w):
    if len(w) == 0:
        return None         # parsing fails on empty input

    if w[0].isdigit():      # str.isdigit: accept digits only
        stack.append(w[0])  # add the parsed digit to the stack
        return w[1:]        # return the rest of the word
    else:
        return None         # if the character is not a digit, the parsing fails
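
To see how the returned rest and the global stack evolve, here is a small usage sketch. It repeats a compact version of parse_digit (assuming str.isdigit is the intended check) so that it can run on its own.

```python
stack = []

def parse_digit(w):
    # compact variant of the framework function, assuming str.isdigit is intended
    if len(w) > 0 and w[0].isdigit():
        stack.append(w[0])
        return w[1:]
    return None

rest = parse_digit("10")       # rest == "0", stack == ["1"]
rest = parse_digit(rest)       # rest == "",  stack == ["1", "0"]
failed = parse_digit(rest)     # nothing left to parse: None, stack unchanged
```

Each call receives the rest returned by the previous one, which is how larger parsers thread the input through smaller ones.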

10.1.3. Implement a function parse_plus which parses the character '+' (if the first character is '+', it consumes it, otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters.
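
One possible shape for the general function mentioned in the hint is sketched below. The names parse_char and parse_star are assumptions, not part of the lab, and the sketch assumes that operator characters do not need to be pushed on the stack, because the calling parser already knows which AST node it is building.

```python
def parse_char(c, w):
    """Consume the single character c from the front of w, or fail with None."""
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None

def parse_plus(w):
    return parse_char("+", w)

def parse_star(w):
    return parse_char("*", w)
```

If the operator should be recorded in the AST directly, parse_char could push c on the stack before returning; which variant is preferable depends on how the AST nodes are built.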

10.1.4. Complete the following implementation of the function parse_multiplication:

def parse_multiplication(w):
    if len(w) == 0:
        return None

    w1 = parse_digit(w)   # parse a digit

    if w1 is not None:
        # we have parsed a digit, now we try to parse '*'
        # (parse_star is assumed to be written like parse_plus from 10.1.3, but for '*'):
        w2 = parse_star(w1)
        if w2 is not None:
            # we have parsed a '*'
            w3 = parse_multiplication(w2)
            if w3 is not None:
                # we have parsed a digit followed by * and by another multiplication expression
                # what are the contents of the stack right now?
                # how should the stack be modified?
                pass  # to be completed
    else:
        return None  # parsing a digit failed

10.1.5. Following the same structure, write a complete implementation for expression parsers.

10.2. Writing a parser for regular expressions

10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: a regular expression is built in the usual way, using the symbols (, ), *, | and any alphanumeric character. Whitespace may occur freely within the expression.

10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes. Make sure to take operator precedence into account.
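
As a reminder of the conventional precedence (star binds tightest, then concatenation, then union), the expression a|bc* is read as a|(b(c*)). The sketch below uses Python's built-in re module, which follows the same convention, purely as an illustration; the sample strings are arbitrary and the module is not part of the lab.

```python
import re

samples = ["a", "bcc", "ac", "bcbc"]

# a|bc* is read as a|(b(c*)): either "a", or "b" followed by any number of "c"
accepted = [s for s in samples if re.fullmatch(r"a|bc*", s)]

# explicit parentheses change the language:
union_first = re.fullmatch(r"(a|b)c*", "ac") is not None    # True
star_on_group = re.fullmatch(r"(bc)*", "bcbc") is not None  # True

print(accepted)  # ['a', 'bcc']
```

An unambiguous grammar should produce exactly one parse tree for a|bc*, namely the one matching the first reading.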

10.2.3. Write a parser for regular expressions.
