Differences

This shows you the differences between two versions of the page.

lfa:lab10-lexer [2021/12/14 15:42]
pdmatei
lfa:lab10-lexer [2021/12/14 16:31] (current)
pdmatei
Line 1: Line 1:
 ====== 10. Writing a parser for a CF language ======
  
 +===== 10.1. A basic functional structure for a parser =====
  
-===== 10.1. The grammar =====
 +Consider the following language encoding expressions:
 +  * $math[S \leftarrow M \mid M + S] 
 +  * $math[M \leftarrow A \mid A * M] 
 +  * $math[A \leftarrow 0 \mid 1 \mid (S)]
  
-10.1.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols (,),*,| and any other alpha-numeric caracter. Free spaces may occur freely within the expression//.
 +10.1.1. Implement an AST for expressions.
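One possible shape for such an AST is sketched below. The class names ''Num'', ''Add'' and ''Mul'' and the helper ''evaluate'' are assumptions made for illustration; the lab does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class Num:
    """A leaf node holding a single digit (hypothetical name)."""
    value: int


@dataclass
class Add:
    """A node for 'left + right' (hypothetical name)."""
    left: object
    right: object


@dataclass
class Mul:
    """A node for 'left * right' (hypothetical name)."""
    left: object
    right: object


def evaluate(e):
    """Compute the integer value of an expression tree."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Mul):
        return evaluate(e.left) * evaluate(e.right)
    raise TypeError("not an expression node")


# The expression 1 + 0 * 1, with '*' binding tighter than '+'
example = Add(Num(1), Mul(Num(0), Num(1)))
```

Using dataclasses gives structural equality for free, which makes it easy to compare the tree produced by a parser against an expected tree.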
  
-10.1.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes:
-   * Make sure to take precedence into account
 +  * A **parser** is a function which takes a string and has **two** tasks:
 +      - it returns the **rest of the string to be parsed**, or an error if parsing failed. Examples:
 +          * ''parse_whitespace(" lfa") = "lfa"''
 +          * ''parse_whitespace("lfa") = None''
 +      - it adds the parsed value to **a global stack** whenever the value is part of the AST to be built.
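A minimal sketch of ''parse_whitespace'' that is consistent with the two examples above. It assumes that at least one whitespace character must be present for parsing to succeed, and that whitespace is never part of the AST, so nothing is pushed on the stack.

```python
def parse_whitespace(w):
    """Consume one or more leading whitespace characters.

    Returns the rest of the string, or None if w does not
    start with whitespace (assumption: empty input also fails).
    """
    rest = w.lstrip()
    if len(rest) == len(w):
        return None   # nothing was consumed, so parsing fails
    return rest
```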
  
-===== 10.2. The implementation =====+Another example: 
 +<code python>​ 
 +stack = []  # global stack holding the parsed values
  
-10.2.1. Write a parser for the following language:
-  * $math[S \leftarrow M \mid M + S]
-  * $math[M \leftarrow A \mid A * M]
-  * $math[A \leftarrow (S)]
 +def parse_digit(w):
 +    if len(w) == 0:
 +       return None         # parsing fails
 +
 +    if w[0].isdigit():
 +       stack.append(w[0]) ​ # add the parsed digit to the stack 
 +       ​return w[1:]        # return the rest of the word 
 +    else: 
 +       ​return None         # if the character is not a digit, the parsing fails 
 +</​code>​ 
 + 
 +10.1.3. Implement a function ''parse_plus'' which parses the character '+': if the first character is '+', the function consumes it; otherwise it fails. Hint: write a more general function first, which you can then reuse to parse other characters.
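One way to follow the hint is sketched here. The helper name ''parse_char'' and its argument order are assumptions; any function that takes the expected character as a parameter would serve the same purpose. Since '+' is not itself a value of the AST, this sketch does not touch the stack.

```python
def parse_char(c, w):
    """Consume the single character c from the front of w.

    Returns the rest of the string on success and None on failure.
    (hypothetical helper, not named in the lab text)
    """
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None


def parse_plus(w):
    """Parse the character '+' by reusing the general helper."""
    return parse_char("+", w)
```

The same helper can then be reused for '*', '(' and ')'.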
 + 
 +10.1.4. We can build **more complex parsers** from simpler ones. The key is to **try** to parse an expression and, if parsing fails, to try a different alternative.
 +Complete the following implementation of the function ''​parse_multiplication'':​ 
 + 
 +<code python>​ 
 +def parse_multiplication(w):​ 
 +    if len(w) == 0:
 +       ​return None 
 +     
 +    w1 = parse_digit(w) ​  # parse a digit 
 +     
 +    if w1 != None: 
 +        # we have parsed a digit, now we try to parse '​+':​ 
 +        w2 = parse_plus(w1) 
 +        if w2 != None: 
 +            # we have successfully parsed a '​+'​ 
 +            w3 = parse_multiplication(w2) 
 +            if w3 != None: 
 +                # we have parsed a digit followed by + and by another multiplication expression 
 +                # what are the contents of the stack right now? 
 +                # how should the stack be modified? 
 +        else: 
 +            # parsing a '​+'​ has failed, so we just return the rest of the string w1 
 +            return w1 
 +    else:  
 +       ​return None # parsing a digit failed 
 +     
 +</​code>​ 
 + 
 +10.1.5. Following the same structure, write a complete implementation for expression parsers. 
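The sketch below shows how the pieces could fit together for the whole grammar. It is only one possible solution under stated assumptions: AST nodes are plain tuples such as ''("num", 1)'' or ''("add", left, right)'', the names ''parse_char'', ''parse_atom'', ''parse_mul'', ''parse_sum'' and ''parse'' are hypothetical, and any digit is accepted as an atom, as ''parse_digit'' does.

```python
stack = []  # global stack of AST nodes built so far


def parse_char(c, w):
    """Consume the character c, or fail with None (hypothetical helper)."""
    if len(w) > 0 and w[0] == c:
        return w[1:]
    return None


def parse_atom(w):
    """A <- digit | (S)"""
    if len(w) == 0:
        return None
    if w[0].isdigit():
        stack.append(("num", int(w[0])))
        return w[1:]
    w1 = parse_char("(", w)
    if w1 is None:
        return None
    w2 = parse_sum(w1)            # on success the AST of S is on the stack
    if w2 is None:
        return None
    return parse_char(")", w2)


def parse_mul(w):
    """M <- A | A * M"""
    w1 = parse_atom(w)
    if w1 is None:
        return None
    w2 = parse_char("*", w1)
    if w2 is None:
        return w1                 # no '*': M is just the atom
    w3 = parse_mul(w2)
    if w3 is None:
        return None
    right = stack.pop()           # the two operands are on top of the stack
    left = stack.pop()
    stack.append(("mul", left, right))
    return w3


def parse_sum(w):
    """S <- M | M + S"""
    w1 = parse_mul(w)
    if w1 is None:
        return None
    w2 = parse_char("+", w1)
    if w2 is None:
        return w1                 # no '+': S is just the product
    w3 = parse_sum(w2)
    if w3 is None:
        return None
    right = stack.pop()
    left = stack.pop()
    stack.append(("add", left, right))
    return w3


def parse(w):
    """Parse a whole string; return its AST, or None if parsing fails."""
    stack.clear()                 # drop leftovers from earlier failed parses
    rest = parse_sum(w)
    if rest != "":
        return None               # failure, or unconsumed input remains
    return stack.pop()
```

Each successful parser leaves exactly one node on the stack, so a binary operator can pop its two operands and push the combined node. A failed parse may leave stray nodes behind, which is why ''parse'' clears the stack before it starts.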
 + 
 + 
 +===== 10.2. Writing a parser for regular expressions ===== 
 + 
 +10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols ​(,),*,| and any other alpha-numeric character. Free spaces may occur freely within the expression//​. 
 + 
 +10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes: 
 +   * Make sure to take precedence into account
  
 +10.2.3. Write a parser for regular expressions.