Differences
This shows you the differences between two versions of the page.
| | lfa:lab10-lexer [2021/12/14 15:48] pdmatei | | lfa:lab10-lexer [2021/12/14 16:31] (current) pdmatei |
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== 10. Writing a parser for a CF language ====== | ====== 10. Writing a parser for a CF language ====== | ||
| - | + | ===== 10.1. A basic functional structure for a parser ===== | |
| - | ===== 10.1. The grammar ===== | + | |
| - | + | ||
| - | 10.1.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols (,),*,| and any other alpha-numeric caracter. Free spaces may occur freely within the expression//. | + | |
| - | + | ||
| - | 10.1.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes: | + | |
| - | * Make sure to take precedence into account | + | |
| - | + | ||
| - | ===== 10.2. A basic functional structure for a parser ===== | + | |
| Consider the following language encoding expressions: | Consider the following language encoding expressions: | ||
| Line 16: | Line 8: | ||
| * $math[A \leftarrow 0 \mid 1 \mid (S)] | * $math[A \leftarrow 0 \mid 1 \mid (S)] | ||
| - | 10.2.1. Implement an AST for expressions. | + | 10.1.1. Implement an AST for expressions. |
| - | 10.2.2. Implement a parser for expressions. Consider the following guidelines: | + | |
| - | * A **parser** is a function which takes a string and has two tasks: | + | * A **parser** is a function which takes a string and has **two** tasks: |
| - | * returns the **rest of the string to be parsed**, or an error if parsing failed. Examples: | + | - returns the **rest of the string to be parsed**, or an error if parsing failed. Examples: |
| * ''parse_whitespace(" lfa") = "lfa"'' | * ''parse_whitespace(" lfa") = "lfa"'' | ||
| * ''parse_whitespace("lfa") = None'' | * ''parse_whitespace("lfa") = None'' | ||
| + | - adds the parsed value to **a global stack** whenever the value is part of the AST to be built. | ||
| + | Another example: | ||
| + | <code python> | ||
| + | stack = [] # global stack holding the parsed values | ||
| + | def parse_digit(w): | ||
| + | if len(w) == 0: | ||
| + | return None # parsing fails | ||
| + | | ||
| + | if w[0].isdigit(): # the first character is a digit | ||
| + | stack.append(w[0]) # add the parsed digit to the stack | ||
| + | return w[1:] # return the rest of the word | ||
| + | else: | ||
| + | return None # if the character is not a digit, the parsing fails | ||
| + | </code> | ||
| + | 10.1.3. Implement a function ''parse_plus'' which parses the character '+' (if the first character is '+', it consumes it, otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters. | ||
| + | |||
| + | 10.1.4. We can build **more complex parsers** from simpler ones. The key is to **try** one alternative and, if parsing fails, fall back to a different one. | ||
| + | Complete the following implementation of the function ''parse_multiplication'': | ||
| + | |||
| + | <code python> | ||
| + | def parse_multiplication(w): | ||
| + | if len(w) == 0: | ||
| + | return None | ||
| + | | ||
| + | w1 = parse_digit(w) # parse a digit | ||
| + | | ||
| + | if w1 is not None: | ||
| + | # we have parsed a digit, now we try to parse '+': | ||
| + | w2 = parse_plus(w1) | ||
| + | if w2 is not None: | ||
| + | # we have successfully parsed a '+' | ||
| + | w3 = parse_multiplication(w2) | ||
| + | if w3 is not None: | ||
| + | # we have parsed a digit followed by + and by another multiplication expression | ||
| + | # what are the contents of the stack right now? | ||
| + | # how should the stack be modified? | ||
| + | else: | ||
| + | # parsing a '+' has failed, so we just return the rest of the string w1 | ||
| + | return w1 | ||
| + | else: | ||
| + | return None # parsing a digit failed | ||
| + | | ||
| + | </code> | ||
| + | |||
| + | 10.1.5. Following the same structure, write a complete implementation for expression parsers. | ||
| + | |||
| + | |||
| + | ===== 10.2. Writing a parser for regular expressions ===== | ||
| + | |||
| + | 10.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the usual way, using the symbols (,),*,| and any alphanumeric character. Whitespace may occur anywhere within the expression//. | ||
| + | |||
| + | 10.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes: | ||
| + | * Make sure to take precedence into account | ||
| + | 10.2.3. Write a parser for regular expressions. | ||
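
The parser convention from section 10.1 (return the rest of the string, or ''None'' on failure, and push parsed values onto a global stack) together with the hint from 10.1.3 could be sketched roughly as follows. This is only one possible shape, not the lab's reference solution: the names ''parse_char'' and ''parse_open'' are hypothetical, and ''parse_whitespace'' is assumed to consume one or more leading spaces, which is consistent with the two examples given above.

```python
stack = []  # global stack of parsed values, following the lab's convention


def parse_char(c):
    """Build a parser that consumes exactly the character c.

    Hypothetical helper illustrating the 'more general function' hint.
    """
    def parser(w):
        if len(w) > 0 and w[0] == c:
            return w[1:]  # success: return the rest of the word
        return None       # failure: empty input or a different character
    return parser


# Specific character parsers are obtained by reusing the general one.
parse_plus = parse_char('+')
parse_open = parse_char('(')  # hypothetical name, for the '(' of the grammar


def parse_whitespace(w):
    """Consume leading spaces; fail if there are none (assumed behaviour)."""
    rest = w.lstrip(' ')
    return rest if len(rest) < len(w) else None


def parse_digit(w):
    """Consume one digit and push it onto the global stack."""
    if len(w) > 0 and w[0].isdigit():
        stack.append(w[0])  # the digit is part of the AST, so it is recorded
        return w[1:]
    return None


if __name__ == "__main__":
    rest = parse_digit("1+0")  # rest == "+0", stack == ['1']
    rest = parse_plus(rest)    # rest == "0", '+' is consumed but not pushed
    rest = parse_digit(rest)   # rest == "", stack == ['1', '0']
    print(rest == "", stack)
```

Returning a closure from ''parse_char'' keeps every parser in the same "string in, rest of string or ''None'' out" shape, so more complex parsers can chain them without special cases. Whether an operator such as '+' should also be pushed onto the stack is a design decision left to the exercise.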