lfa:lab03-lexers [2021/10/25 11:46] (current) stefan.stancu
----
**3.1.1.** Suppose $math[A_1] is a DFA and w=''aabaaabb'' is a word. Find the **longest prefix** of w which is accepted by $math[A_1].
{{ :lfa:lexer-a1.png?300 |}}
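Answers to exercises of this kind can be checked mechanically. Below is a sketch in Python, assuming a DFA encoded as a transition dictionary; the automaton used here is a toy example (words over {a, b} ending in ''b''), **not** the $math[A_1] from the figure:

```python
def longest_accepted_prefix(delta, start, finals, word):
    """Simulate the DFA over `word`, remembering the last position
    at which the current state was accepting."""
    state = start
    best = 0 if start in finals else -1   # -1: no prefix accepted yet
    for i, ch in enumerate(word):
        state = delta.get((state, ch))
        if state is None:                 # stuck: no longer prefix can match
            break
        if state in finals:
            best = i + 1
    return word[:best] if best >= 0 else None

# Toy DFA (an assumption, not A_1): accepts words ending in 'b'.
delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 1}
print(longest_accepted_prefix(delta, 0, {1}, "aabaaabb"))  # -> aabaaabb
```

The key point the exercise illustrates is that the DFA must keep running past an accepting state: a longer accepted prefix may still lie ahead, so the simulation only falls back to the last accepting position once it gets stuck.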
----

----
**3.1.2.** Split the following word $math[w]=''ababbaabbabaab'' using $math[A_2] as the unique token.
{{ :lfa:lexer-a2.png?300 |}}
**3.1.3.** Given DFAs $math[A_3], $math[A_4] and $math[A_5], use them to split the word $math[w]=''abaaabbabaaaab'' into lexemes.
^^^^
| {{ :lfa:lexer-a3.png?300 |}} | {{ :lfa:lexer-a4.png?200 |}} | {{ :lfa:lexer-a5.png?200 |}} |
===== 3.2. Priority and longest match =====
When two or more DFAs match the same (longest) prefix, the first one (in the order of their priority) is selected. An interesting question is whether maximal matching may be replaced by priorities. The following exercise illustrates why this is not the case.
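The selection rule above can be sketched directly. This is a Python sketch under the assumption that each DFA is a ''(delta, start, finals)'' triple; the two toy DFAs below are illustrations, not the lab's automata:

```python
def longest_match(dfa, word):
    """Length of the longest prefix of `word` accepted by `dfa`, or -1."""
    delta, start, finals = dfa
    state = start
    last = 0 if start in finals else -1
    for i, ch in enumerate(word):
        state = delta.get((state, ch))
        if state is None:
            break
        if state in finals:
            last = i + 1
    return last

def pick(dfas, word):
    """Among `dfas` (listed in priority order), pick the one with the
    strictly longest match; on equal lengths the earlier DFA wins."""
    best_len, best_idx = 0, None
    for idx, dfa in enumerate(dfas):
        n = longest_match(dfa, word)
        if n > best_len:          # '>' (not '>=') implements the tie-break
            best_len, best_idx = n, idx
    return best_idx, best_len

# Toy DFAs: both accept "ab"; the first also accepts any longer word over {a, b}.
any_ab = ({(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 1}, 0, {1})  # [ab]+
just_ab = ({(0, 'a'): 1, (1, 'b'): 2}, 0, {2})                           # exactly "ab"
print(pick([just_ab, any_ab], "ab"))   # equal lengths: priority picks DFA 0
```

Note that maximal match is applied first, and priority only breaks ties between equally long matches: on ''aba'', ''any_ab'' wins with length 3 even if ''just_ab'' has higher priority.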
----
Let:
  * $math[A] be a DFA which matches lowercase character sequences (''[a-z]+''), ending with a whitespace (e.g. ''aba '')
  * while $math[B] matches "''def ''" (the four-letter sequence). Let $math[w]="''def deffunction ''".
Suppose:
  * $math[A] has higher priority than $math[B]. How will the string be split? (Which are the lexemes?)
  * $math[B] has higher priority than $math[A]. What will the splitting look like?
  * finally, let us return to the **maximal match** principle. How should the DFAs $math[A] and $math[B] be ordered (w.r.t. priority) so that our word is split in the correct way (assuming a Python syntax)?
===== 3.3. Implementation =====
**3.3.1.** Implement a three-DFA lexer with DFAs $math[A_3], $math[A_4] and $math[A_5]. You can use the code from last lab to directly instantiate the three DFAs. The input should be a word, and the output should be a string of the form ''<token_1>:<lexeme_1> ... <token_n>:<lexeme_n>'', where ''<token_i>'' is the DFA's id (from 3 to 5) and ''<lexeme_i>'' is the matched lexeme.
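A possible shape for such a lexer is sketched below. This is a hedged sketch, not the reference solution: the DFA class from the previous lab is not reproduced here, so each DFA is encoded as a plain ''(delta, start, finals)'' triple, and the two toy token DFAs (''a+'' and ''b+'') merely stand in for $math[A_3]-$math[A_5]:

```python
def lex(dfas, word):
    """Maximal-munch lexing: at each position, take the longest match
    among the DFAs (ties broken by iteration order of `dfas`) and emit
    '<token>:<lexeme>' pairs.  `dfas` maps a token id to a
    (delta, start, finals) triple."""
    pieces, pos = [], 0
    while pos < len(word):
        best_len, best_tok = 0, None
        for tok, (delta, start, finals) in dfas.items():
            state, last = start, -1
            for i in range(pos, len(word)):
                state = delta.get((state, word[i]))
                if state is None:
                    break
                if state in finals:
                    last = i + 1 - pos
            if last > best_len:
                best_len, best_tok = last, tok
        if best_tok is None:                  # no DFA matches here: lexing error
            raise ValueError(f"no match at position {pos}")
        pieces.append(f"{best_tok}:{word[pos:pos + best_len]}")
        pos += best_len
    return " ".join(pieces)

# Two stand-in token DFAs (assumptions, not A_3..A_5 from the figures):
# token 3 matches a+, token 4 matches b+.
dfa_a = ({(0, 'a'): 1, (1, 'a'): 1}, 0, {1})
dfa_b = ({(0, 'b'): 1, (1, 'b'): 1}, 0, {1})
print(lex({3: dfa_a, 4: dfa_b}, "aabba"))  # -> 3:aa 4:bb 3:a
```

When adapting this to the lab's DFA class, only the inner matching loop should need to change (replace the dictionary lookup with the class's step/transition method).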