--- lfa:lab10-lexer [2021/12/14 15:42]
pdmatei
+++ lfa:lab10-lexer [2021/12/14 16:31] (current)
pdmatei
@@ Line 1: / Line 1: @@
 ====== 10. Writing a parser for a CF language ======
+===== 10.1. A basic functional structure for a parser =====
-===== 10.1. The grammar =====
+Consider the following grammar, which describes arithmetic expressions:
+  * $math[S \leftarrow M \mid M + S]
+  * $math[M \leftarrow A \mid A * M]
+  * $math[A \leftarrow 0 \mid 1 \mid (S)]
-.1.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols (,),*,| and any other alpha-numeric caracter. Free spaces may occur freely within the expression//.
+.1.1. Implement an AST for expressions.
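One possible shape for such an AST is sketched below. The class names ''Val'', ''Add'' and ''Mul'' and the helper ''evaluate'' are illustrative assumptions, not names prescribed by the lab.

```python
from dataclasses import dataclass


@dataclass
class Val:
    value: int      # a leaf: the digit 0 or 1


@dataclass
class Add:
    left: object    # any of Val, Add, Mul
    right: object


@dataclass
class Mul:
    left: object
    right: object


def evaluate(e):
    # Hypothetical helper: folds an expression tree into an integer.
    if isinstance(e, Val):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Mul):
        return evaluate(e.left) * evaluate(e.right)
    raise TypeError("not an expression node")


# The expression 1 + 1 * 0, with * binding tighter than +
example = Add(Val(1), Mul(Val(1), Val(0)))
```

Keeping one class per grammar construct means the parser only has to decide which node to build after each successful parse step.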
-.1.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes:
+  * A **parser** is a function which takes a string and has **two** tasks:
-   * Make sure to take precedence into account
+      - returns the **rest of the string to be parsed**, or ''None'' if parsing failed. Examples:
+          * ''parse_whitespace(" lfa") = "lfa"''
+          * ''parse_whitespace("lfa") = None''
+      - adds the parsed value to **a global stack** whenever the value is part of the AST to be built.
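A minimal sketch of ''parse_whitespace'' that matches the two examples above. It assumes that a run of one or more whitespace characters is consumed at once and that whitespace is not part of the AST, so nothing is pushed on the stack.

```python
def parse_whitespace(w):
    # Count the whitespace characters at the front of w.
    i = 0
    while i < len(w) and w[i].isspace():
        i += 1
    if i == 0:
        return None     # no leading whitespace: parsing fails
    return w[i:]        # the rest of the string, still to be parsed
```

For instance, ''parse_whitespace("  lfa")'' also yields ''"lfa"'' under this assumption, since both spaces are skipped.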
-===== 10.2. The implementation =====
+Another example:
+<code python>
+stack = []  # global stack holding the values parsed so far
-.2.1. Write a parser for the following language:
+def parse_digit(w):
-  * $math[S \leftarrow M \mid M + S]
+    if len(w) == 0:
-  * $math[M \leftarrow A \mid A * M]
+       return None         # parsing fails
-  * $math[A \leftarrow 0 | 1 | (S)]
+    if w[0].isdigit():
+       stack.append(w[0])  # add the parsed digit to the stack
+       return w[1:]        # return the rest of the word
+    else:
+       return None         # if the character is not a digit, the parsing fails
+</code>
+.1.3. Implement a function ''parse_plus'' which parses the character '+' (if the first character is '+', it consumes it, otherwise it fails). Hint: use a more general function which you can then reuse to parse other characters.
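The hint can be read as follows, as a sketch only: ''parse_char'' is a hypothetical name for the general function, and the choice not to push the operator on the stack is an assumption.

```python
def parse_char(c, w):
    # General helper (hypothetical name): consume the single character c.
    if len(w) > 0 and w[0] == c:
        return w[1:]    # c was consumed, return the rest of the word
    return None         # empty input or a different character: parsing fails


def parse_plus(w):
    return parse_char('+', w)
```

The same helper can then be reused for the other symbols of the grammar, for example ''parse_char('*', w)'' or ''parse_char('(', w)''.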
+.1.4. We can build **more complex parsers** from simpler ones. The key is to **try** one alternative and, if parsing fails, fall back to a different one.
+Complete the following implementation of the function ''parse_multiplication'':
+<code python>
+def parse_multiplication(w):
+    if len(w) == 0:
+        return None
+    w1 = parse_digit(w)   # parse a digit
+    if w1 is not None:
+        # we have parsed a digit, now we try to parse '+':
+        w2 = parse_plus(w1)
+        if w2 is not None:
+            # we have successfully parsed a '+'
+            w3 = parse_multiplication(w2)
+            if w3 is not None:
+                # we have parsed a digit followed by + and by another multiplication expression
+                # what are the contents of the stack right now?
+                # how should the stack be modified?
+                pass  # TODO: update the stack, then return the rest of the string
+        else:
+            # parsing a '+' has failed, so we just return the rest of the string w1
+            return w1
+    else:
+        return None  # parsing a digit failed
+</code>
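The "try, then fall back" idea can also be written once and reused. The sketch below is not part of the lab: ''parse_seq'' and ''parse_either'' are hypothetical helpers, and ''parse_char'' here pushes the consumed character on the stack purely to make the rollback visible.

```python
stack = []  # global stack, as in the lab text


def parse_char(c, w):
    # Hypothetical helper: consume the character c and push it on the stack.
    if len(w) > 0 and w[0] == c:
        stack.append(c)
        return w[1:]
    return None


def parse_seq(p1, p2, w):
    # Run p1, then run p2 on whatever p1 left over.
    rest = p1(w)
    if rest is None:
        return None
    return p2(rest)


def parse_either(p1, p2, w):
    # Try p1; if it fails, undo its pushes and try p2 on the same input.
    mark = len(stack)
    rest = p1(w)
    if rest is not None:
        return rest
    del stack[mark:]
    return p2(w)


def zero(w):
    return parse_char('0', w)


def plus(w):
    return parse_char('+', w)


def zero_plus(w):
    return parse_seq(zero, plus, w)


# "0*1" starts with '0' but is not followed by '+', so the first
# alternative fails halfway and the second one is tried from scratch.
rest = parse_either(zero_plus, zero, "0*1")
```

Remembering the stack height before the first attempt matters because a failed alternative may already have pushed values; without the rollback, the ''0'' would end up on the stack twice.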
+.1.5. Following the same structure, write a complete implementation for expression parsers.
+===== 10.2. Writing a parser for regular expressions =====
+.2.1. Write a grammar which accurately describes regular expressions. Consider the following definition: //A regular expression is built in the normal way, using the symbols (,),*,| and any other alpha-numeric character. Free spaces may occur freely within the expression//.
+.2.2. Starting from the solution to the previous exercise, write an unambiguous grammar for regexes:
+   * Make sure to take precedence into account
+.2.3. Write a parser for regular expressions.