Differences

lfa:lab10-lexer [2021/12/14 15:49]
pdmatei
lfa:lab10-lexer [2021/12/14 16:31] (current)
pdmatei
Line 1: Line 1:
 ====== 10. Writing a parser for a CF language ====== ====== 10. Writing a parser for a CF language ======
  
- +===== 10.1. A basic functional structure for a parser =====
-===== 10.1. The grammar ===== +
- +
-10.1.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols (,),*,| and any other alpha-numeric caracter. Free spaces may occur freely within the expression//​. +
- +
-10.1.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes: +
-   * Make sure to take precedence into account +
- +
-===== 10.2. A basic functional structure for a parser =====+
  
 Consider the following language encoding expressions:​ Consider the following language encoding expressions:​
Line 16: Line 8:
   * $math[A \leftarrow 0 \mid 1 \mid (S)]   * $math[A \leftarrow 0 \mid 1 \mid (S)]
  
-10.2.1. Implement an AST for expressions.+10.1.1. Implement an AST for expressions.
  
-10.2.2. Implement a parser for expressions. Consider the following guidelines:​ +  ​* A **parser** is a function which takes a string and has **two** tasks: ​
-  ​* A **parser** is a function which takes a string and has two tasks: ​+
       - returns the **rest of the string to be parsed**, or an error if parsing failed. Examples:       - returns the **rest of the string to be parsed**, or an error if parsing failed. Examples:
           * ''parse_whitespace(" lfa") = "lfa"''           * ''parse_whitespace(" lfa") = "lfa"''
           * ''parse_whitespace("lfa") = None''           * ''parse_whitespace("lfa") = None''
-      -  +      - adds the parsed value to **a global stack** whenever the value is part of the AST to be built.
  
 +Another example:
 +<code python>
 +stack = []  # global stack collecting the parsed values that make up the AST
  
 +def parse_digit(w):
 +    if len(w) == 0:
 +        return None         # parsing fails
 +
 +    if w[0].isdigit():
 +        stack.append(w[0])  # add the parsed digit to the stack
 +        return w[1:]        # return the rest of the word
 +    else:
 +        return None         # if the character is not a digit, the parsing fails
 +</code>
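To make the two tasks concrete, the short trace below calls ''parse_digit'' once on a string that starts with a digit and once on a string that does not. It is only a sketch: it redefines the function so that it runs on its own, and it assumes the digit test is Python's ''str.isdigit()''.

```python
stack = []  # global stack of parsed values, as in the example above


def parse_digit(w):
    # Same behaviour as the example above; str.isdigit() is assumed as the digit test.
    if len(w) == 0:
        return None
    if w[0].isdigit():
        stack.append(w[0])
        return w[1:]
    return None


rest = parse_digit("1+0")    # succeeds: consumes "1" and pushes it on the stack
print(rest, stack)           # +0 ['1']

failed = parse_digit("+0")   # fails: "+" is not a digit, the stack is left untouched
print(failed, stack)         # None ['1']
```

Note that a failed call returns ''None'' and does not touch the stack, so the caller can safely try another parser on the same input.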
  
 +10.1.3. Implement a function ''parse_plus'' which parses the character '+' (if the first character is '+', it consumes it; otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters.
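One possible shape for the "more general function" mentioned in the hint is sketched below. The name ''parse_char'' and its argument order are assumptions made for illustration, and the sketch assumes that the operator character itself does not need to be pushed on the stack.

```python
def parse_char(c, w):
    """Consume the character c from the front of w.

    Returns the rest of the string, or None if w is empty or starts with
    a different character. (Hypothetical helper, not part of the lab text.)
    """
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None


def parse_plus(w):
    # parse_plus is just parse_char specialised to '+'
    return parse_char('+', w)


print(parse_plus("+1"))  # 1
print(parse_plus("1+"))  # None
```

The same helper can then be reused for other single-character tokens such as '(' or ')'.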
 +
 +10.1.4. We can build **more complex parsers** from simpler ones. The key is to **try** to parse one alternative and, if that fails, to fall back to a different alternative on the same input.
 +Complete the following implementation of the function ''parse_multiplication'':
 +
 +<code python>
 +def parse_multiplication(w):
 +    if len(w) == 0:
 +        return None
 +
 +    w1 = parse_digit(w)   # parse a digit
 +
 +    if w1 is not None:
 +        # we have parsed a digit, now we try to parse '+':
 +        w2 = parse_plus(w1)
 +        if w2 is not None:
 +            # we have successfully parsed a '+'
 +            w3 = parse_multiplication(w2)
 +            if w3 is not None:
 +                # we have parsed a digit followed by + and by another multiplication expression
 +                # what are the contents of the stack right now?
 +                # how should the stack be modified?
 +                pass  # placeholder so the skeleton is valid Python; to be completed
 +        else:
 +            # parsing a '+' has failed, so we just return the rest of the string w1
 +            return w1
 +    else:
 +        return None  # parsing a digit failed
 +</code>
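The "try one alternative, then another" idea can also be packaged as a small helper. The sketch below is not part of the lab text: ''parse_either'', ''parse_and_push'' and ''parse_one_plus'' are hypothetical names, and the sketch assumes that a failed alternative should leave the global stack exactly as it found it.

```python
stack = []  # global stack of parsed values


def parse_and_push(c, w):
    # Hypothetical helper: consume character c and push it on the stack.
    if len(w) > 0 and w[0] == c:
        stack.append(c)
        return w[1:]
    return None


def parse_one(w):
    return parse_and_push('1', w)


def parse_one_plus(w):
    # Parses '1' followed by '+'. If '+' is missing, it fails *after*
    # '1' has already been pushed on the stack.
    w1 = parse_one(w)
    if w1 is None:
        return None
    return parse_and_push('+', w1)


def parse_either(parsers, w):
    """Try each parser on w in order and return the first successful result."""
    for p in parsers:
        saved = len(stack)    # remember the stack height before trying p
        rest = p(w)
        if rest is not None:
            return rest
        del stack[saved:]     # p failed: undo anything it pushed
    return None


rest = parse_either([parse_one_plus, parse_one], "1")
print(repr(rest), stack)      # '' ['1']
```

Without the ''del stack[saved:]'' line, the failed first alternative would leave a stray '1' behind and the stack would end up as ''['1', '1']''.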
 +
 +10.1.5. Following the same structure, write a complete implementation for expression parsers.
 +
 +
 +===== 10.2. Writing a parser for regular expressions =====
 +
 +10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols (,),*,| and any other alpha-numeric character. Free spaces may occur freely within the expression//​.
 +
 +10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes:
 +   * Make sure to take precedence into account
  
 +10.2.3. Write a parser for regular expressions.