10. Writing a parser for a CF language

10.1. A basic functional structure for a parser

Consider the following language encoding expressions:

10.1.1. Implement an AST for expressions.
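
As a starting point, one possible shape for such an AST is sketched below. The node names `Digit`, `Plus` and `Mult`, as well as the helper `evaluate`, are assumptions made for illustration (the grammar itself is not reproduced here); they are chosen to match the digit, '+' and multiplication parsers used later in this section.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Digit:
    value: int          # a single digit, e.g. Digit(3)


@dataclass
class Plus:
    left: "Expr"        # left operand
    right: "Expr"       # right operand


@dataclass
class Mult:
    left: "Expr"
    right: "Expr"


# An expression is any of the node types above (hypothetical naming).
Expr = Union[Digit, Plus, Mult]


def evaluate(e):
    """Compute the integer value of an expression tree."""
    if isinstance(e, Digit):
        return e.value
    if isinstance(e, Plus):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Mult):
        return evaluate(e.left) * evaluate(e.right)
    raise TypeError("not an expression node")
```

With these definitions, the expression 1+2*3 could be represented as `Plus(Digit(1), Mult(Digit(2), Digit(3)))`.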

Another example:

stack = []  # holds the symbols parsed so far
 
def parse_digit(w):
    if len(w) == 0:
        return None         # empty input: parsing fails
 
    if w[0].isdigit():
        stack.append(w[0])  # add the parsed digit to the stack
        return w[1:]        # return the rest of the word
    else:
        return None         # if the character is not a digit, the parsing fails

10.1.3. Implement a function parse_plus which parses the character '+' (if the first character is '+', it consumes it, otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters.
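
The hint can be read as follows: a single helper takes the expected character as a parameter, and each concrete parser is a thin wrapper around it. The sketch below is one possible shape, not the only one; the name `parse_char` is an assumption, and the helper deliberately leaves the stack untouched.

```python
def parse_char(c, w):
    """Consume the character c from the front of w.

    Returns the rest of the word on success and None on failure,
    following the same convention as parse_digit.
    """
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None


def parse_plus(w):
    # '+' is just one instance of the general helper
    return parse_char('+', w)
```

The same helper can then be reused for other single characters, for instance `parse_char('*', w)`.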

10.1.4. We can build more complex parsers from simpler ones. The key idea is to try one way of parsing the input and, if it fails, to fall back to a different alternative. Complete the following implementation of the function parse_multiplication:

def parse_multiplication(w):
    if len(w) == 0:
        return None
 
    w1 = parse_digit(w)   # parse a digit
 
    if w1 is not None:
        # we have parsed a digit, now we try to parse '+':
        w2 = parse_plus(w1)
        if w2 is not None:
            # we have successfully parsed a '+'
            w3 = parse_multiplication(w2)
            if w3 is not None:
                # we have parsed a digit followed by + and by another multiplication expression
                # what are the contents of the stack right now?
                # how should the stack be modified?
                pass  # to be completed
        else:
            # parsing a '+' has failed, so we just return the rest of the string w1
            return w1
    else:
        return None  # parsing a digit failed
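
When an alternative fails halfway through, it may already have pushed symbols onto the stack. The sketch below illustrates one way to handle this: remember the stack height before trying an alternative and cut the stack back to that height on failure. The helpers `parse_seq`, `parse_either` and `digit_plus_digit` are hypothetical names introduced only for this illustration, and `parse_digit` and `parse_char` are restated so that the block runs on its own.

```python
stack = []


def parse_digit(w):
    if len(w) > 0 and w[0].isdigit():
        stack.append(w[0])
        return w[1:]
    return None


def parse_char(c, w):
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None


def parse_seq(parsers, w):
    """Run the parsers one after another; fail as soon as one fails."""
    for p in parsers:
        w = p(w)
        if w is None:
            return None
    return w


def parse_either(p1, p2, w):
    """Try p1; if it fails, undo its stack changes and try p2 on the same input."""
    mark = len(stack)      # stack height before the first attempt
    rest = p1(w)
    if rest is not None:
        return rest
    del stack[mark:]       # discard whatever p1 pushed before failing
    return p2(w)


def digit_plus_digit(w):
    # digit '+' digit, built from the simpler parsers
    return parse_seq([parse_digit, lambda v: parse_char('+', v), parse_digit], w)
```

For the input "7", `parse_either(digit_plus_digit, parse_digit, "7")` first pushes '7', fails on the missing '+', restores the stack, and then succeeds with the second alternative, leaving exactly one '7' on the stack.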
 

10.1.5. Following the same structure, write a complete implementation for expression parsers.

10.2. Writing a parser for regular expressions

10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: a regular expression is built in the usual way, using the symbols (, ), *, | and any alphanumeric character. Whitespace may occur anywhere within the expression.
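
Since whitespace carries no meaning, one option is to remove it before parsing, so that the grammar does not need to mention it at all. The sketch below shows such a preprocessing step; the function name `tokenize` is an assumption, and it relies on Python's `str.isalnum`, which also accepts non-ASCII letters and digits.

```python
def tokenize(text):
    """Split a regex into one-character tokens, dropping all whitespace."""
    tokens = []
    for ch in text:
        if ch.isspace():
            continue                      # whitespace is not significant
        if ch in "()*|" or ch.isalnum():
            tokens.append(ch)             # operator, parenthesis or plain symbol
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return tokens
```

For example, `tokenize("(a | b) * c")` yields the tokens (, a, |, b, ), *, c in that order.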

10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regular expressions.

10.2.3. Write a parser for regular expressions.