====== Lab ======


===== Lab-01 - Regular expressions =====


Key insights:
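  * there are more languages than regular expressions: over a given alphabet, the set of languages is uncountable, while the set of regular expressions is countable
  * consequently, some languages cannot be described by any regular expression
  * regular expressions are unambiguous and FINITE representations of (possibly infinite) languages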


Objectives:
  * Understand what a language and a regular expression are, and the relation between them;
  * Write several regular expressions for designated languages
  * Identify languages described by some regular expressions (e.g. ?!)

Resources (tentative):
  * http://www.idt.mdh.se/kurser/cd5560/10_01/examination/KOMPENDIER/Regular/kompendium_eng.pdf

=== Exercises ===
**I. What is the regular expression for the following languages:**

  * $math[\Sigma=\{0, 1\}], $math[L=\{011\}]
//Solution:// $math[E=011] \\
//Obs:// By definition, the fully parenthesised expression is $math[E=((01)1)], but we omit parentheses when they are not needed and use a precedence rule (Kleene Star > Concatenation > Union) to keep the number of parentheses in regular expressions as small as possible.
  * $math[\Sigma=\{a, b\}], $math[L=\{a, b\}]
//Solution:// $math[E=a \cup b] \\
//Obs:// By definition the correct expression is $math[E=(a \cup b)], but we can remove parentheses for the same reason as above.
  * $math[\Sigma=\{0, 1\}], $math[L=\{\epsilon, 0, 1, 00, 01, 10, 11, 000, ...\}]
//Solution:// $math[E=(0 \cup 1)^{*}] \\
  * $math[\Sigma=\{0, 1\}], $math[L=\{010010101000010, 010010101000011\}]
//Solution:// $math[E=01001010100001(0 \cup 1)] \\
  * $math[\Sigma=\{0, 1\}], $math[L=\{w \in \Sigma^{*} \mid w \text{ ends with } 0\}] \\
//Solution:// $math[E=(0 \cup 1)^{*}0] \\
  * $math[\Sigma=\{0, 1\}], $math[L=\{w \in \Sigma^{*} \mid w=w_101 \lor w=1w_1, w_1 \in \Sigma^{*}\}] \\
//Solution:// $math[E=(0 \cup 1)^{*}01 \cup 1(0 \cup 1)^{*}] \\
  * $math[\Sigma=\{x, y\}], $math[L=\{w \in \Sigma^{*} \mid \#_x(w) = 2\}] \\
//Solution:// $math[E=y^{*}xy^{*}xy^{*}] \\
//Obs:// $math[\#_x(w)] denotes the number of occurrences of $math[x] in $math[w].
  * $math[\Sigma=\{a, b\}], $math[L=\{w \in \Sigma^{*} \mid \#_a(w) \text{ is even}\}] \\
//Solution:// $math[E=b^{*}(ab^{*}ab^{*})^{*}] \\
  * $math[\Sigma=\{x, y\}], $math[L=\{w \in \Sigma^{*} \mid \#_x(w) \ge 1\}] \\
//Solution:// $math[E=(x \cup y)^{*}x(x \cup y)^{*}] \\
  * $math[\Sigma=\{a, b, c\}], $math[L=\{w \in \Sigma^{*} \mid \#_a(w) \ge 1 \land \#_c(w) \ge 1\}] \\
//Solution:// $math[E=((a \cup b \cup c)^{*}a(a \cup b \cup c)^{*}c(a \cup b \cup c)^{*}) \cup ((a \cup b \cup c)^{*}c(a \cup b \cup c)^{*}a(a \cup b \cup c)^{*})] \\
  * $math[\Sigma=\{a, b\}], $math[L=\{w \in \Sigma^{*} \mid w \text{ does not contain } ba\}] \\
//Solution:// $math[E=a^{*}b^{*}] \\
  * $math[\Sigma=\{a, b\}], $math[L=\{w \in \Sigma^{*} \mid \#_a(w) + \#_b(w) = 0\}] \\
//Solution:// $math[E=\epsilon] \\
  * $math[\Sigma=\{a, b\}], $math[L=\{w \in \Sigma^{*} \mid \#_a(w) + \#_b(w) < 0\}] \\
//Solution:// $math[E=\emptyset] \\
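
A quick way to sanity-check a proposed solution is to compare it with the set-builder definition on all short words. The sketch below is not part of the lab material: ''RegexCheck'' and ''firstMismatch'' are hypothetical names, and it assumes the syntax of Java's ''java.util.regex'', where union is written ''|'' instead of $math[\cup]. Agreement up to a chosen length is evidence, not a proof.

<code java>
import java.util.ArrayDeque;
import java.util.function.Predicate;
import java.util.regex.Pattern;

class RegexCheck {
    /**
     * Enumerates all words over `alphabet` of length at most `maxLen`, shortest first,
     * and returns the first word on which the regex and the language predicate disagree
     * (or null if they agree on every enumerated word).
     */
    static String firstMismatch(String regex, String alphabet, int maxLen,
                                Predicate<String> inLanguage) {
        Pattern p = Pattern.compile(regex);
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("");
        while (!queue.isEmpty()) {
            String w = queue.poll();
            // matches() requires the whole word to match, as in the formal definition.
            if (p.matcher(w).matches() != inLanguage.test(w)) {
                return w;
            }
            if (w.length() < maxLen) {
                for (char c : alphabet.toCharArray()) {
                    queue.add(w + c);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Words ending with 0.
        System.out.println(firstMismatch("(0|1)*0", "01", 8, w -> w.endsWith("0")));
        // Words with exactly two occurrences of x.
        System.out.println(firstMismatch("y*xy*xy*", "xy", 8,
                w -> w.chars().filter(c -> c == 'x').count() == 2));
        // Words which do not contain ba.
        System.out.println(firstMismatch("a*b*", "ab", 8, w -> !w.contains("ba")));
        // A deliberately wrong candidate for the same language: reports the word b.
        System.out.println(firstMismatch("a*", "ab", 8, w -> !w.contains("ba")));
    }
}
</code>

The first three checks find no disagreement, while the last one returns a counterexample, which is usually the fastest way to see what a candidate expression is missing.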

===== Lab x - JFlex =====

==== Installing JFlex ====

A complete, platform-dependent set of installation instructions can be found [[http://jflex.de/installing.html| here]]. In a nutshell, JFlex comes as a binary app ''jflex''.

==== The structure of a flex file ====

Consider the following simple JFlex file:
<code java>
import java.util.*;

%%

%class HelloLexer
%standalone

%{
  public Integer words = 0;
%}

LineTerminator = \r|\n|\r\n

%%   

[a-zA-Z]+ { words+=1; }
{LineTerminator} { /* do nothing*/ }
</code>

Suppose the above file is called ''Hello.flex''. Running the command ''jflex Hello.flex'' will generate a Java class which implements a lexer.

Each JFlex file (such as the one above) contains 5 sections:
  * the first section, which ends at the first occurrence of ''\%\% '', contains declarations which will be added at the beginning of the Java class file.
  * the second section, right after ''%%'' and until ''%{'', contains a sequence of options for JFlex. Here, we use two options:
      * ''%class HelloLexer'' tells JFlex that the generated lexer class should be named ''HelloLexer''
      * ''%standalone'' tells JFlex to generate a lexer which prints each unmatched piece of input to standard output and continues scanning.
      * More details regarding possible options can be found in the [[http://jflex.de/manual.pdf|JFlex docs]].
  * the third section, delimited by ''%{'' and ''%}'', contains declarations which will be copied into the body of the generated lexer class. Here we declare a public variable ''words''.
  * the fourth section contains regular expression **declarations**. Here, we have declared ''LineTerminator'' to be the regular expression ''\r | \n | \r\n''. Declarations can be used to build more complicated regular expressions from simpler ones, and can also be referred to in the fifth section of the flex file.
  * the fifth section contains rules and actions: a rule specifies a regular expression to be scanned for, as well as the action to be taken when a word satisfying the regexp is found:
    * the rule ''[a-zA-Z]+ { words+=1; }'' states that whenever ''[a-zA-Z]+'' (a regexp defined inline) is matched by a word, ''words+=1;'' should be executed;
    * the rule ''{LineTerminator} { /* do nothing*/ }'' refers to the regexp declared above (note the curly braces around its name); here no action is executed;
    * JFlex will always scan for the **longest** input word which satisfies a regexp. When that word satisfies more than one regexp, the **first** such regexp in the flex file is matched.
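
The longest-match and first-rule policy can be imitated in plain Java. The following is only a sketch: it uses ''java.util.regex'' instead of the automaton which JFlex generates, and the class ''MaximalMunch'', the rule names and the rule set are all hypothetical. The patterns were chosen so that greedy matching coincides with the longest match.

<code java>
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaximalMunch {
    // Hypothetical rules, listed in priority order (earlier rules win ties).
    static final String[] NAMES = {"KEYWORD", "IDENT", "NUMBER", "SPACE"};
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[a-zA-Z]+"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("\\s+")
    };

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                // Strict '>' keeps the earlier rule when two rules match the same length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                // No rule applies: report one character and move on.
                out.add("UNMATCHED:" + input.charAt(pos));
                pos++;
            } else {
                out.add(NAMES[best] + ":" + input.substring(pos, pos + bestLen));
                pos += bestLen;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "if" is a KEYWORD (tie with IDENT, first rule wins);
        // "iffy" is an IDENT (longest match beats the KEYWORD rule).
        System.out.println(tokenize("if iffy 42"));
    }
}
</code>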

==== Compiling a Hello World project ====

After performing:
<code>
jflex Hello.flex
</code>

we obtain ''HelloLexer.java'' which contains the ''HelloLexer'' public class implementing our lexer. We can easily include this class in our project, e.g.:

<code java>
import java.io.*;
import java.util.*;

public class Hello {
  public static void main (String[] args) throws IOException {
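    // The lexer reads from a java.io.Reader; the file name is the first command-line argument.
    HelloLexer l = new HelloLexer(new FileReader(args[0]));

    // Scan the whole input; the rule actions update l.words along the way.
    l.yylex();

    // Print the number of words found.
    System.out.println(l.words);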
  }
}
</code>
  * Note that the lexer constructor receives a Java ''Reader'' as input (other options are possible, see the docs), and we take the name of the file to be scanned from the first command-line argument.
  * Each lexer implements the method ''yylex'' which starts the scanning process.

After compiling:
<code>
javac HelloLexer.java Hello.java
</code>

and running the program on a text file (''input.txt'' below is only a placeholder name):

<code>
java Hello input.txt
</code>

we obtain:
<code>
 
 
 6
</code>
on standard output.

Recall that the option ''standalone'' tells the lexer to print unmatched input. In our example, the unmatched input consists of whitespace characters.
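
To see where the printed whitespace comes from, the behaviour of the lexer can be imitated by hand. This is a rough sketch in plain Java, not the code which JFlex generates; ''HelloSim'' and ''run'' are hypothetical names, and the result is returned as a string instead of being printed.

<code java>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HelloSim {
    static final Pattern WORD = Pattern.compile("[a-zA-Z]+");
    static final Pattern LINE_TERMINATOR = Pattern.compile("\\r\\n|\\r|\\n");

    /** Returns the echoed (unmatched) characters followed by the number of words. */
    static String run(String input) {
        StringBuilder echoed = new StringBuilder();
        int words = 0;
        int pos = 0;
        while (pos < input.length()) {
            Matcher w = WORD.matcher(input).region(pos, input.length());
            Matcher t = LINE_TERMINATOR.matcher(input).region(pos, input.length());
            if (w.lookingAt()) {
                words++;              // action of the first rule
                pos = w.end();
            } else if (t.lookingAt()) {
                pos = t.end();        // second rule: do nothing
            } else {
                // No rule matches: echo the character, as a standalone lexer would.
                echoed.append(input.charAt(pos));
                pos++;
            }
        }
        return echoed.toString() + words;
    }

    public static void main(String[] args) {
        // Five words; the three spaces between them are echoed before the count.
        System.out.println(run("two words\nand three more\n"));
    }
}
</code>

Any character which is neither a letter nor a line terminator (digits, punctuation) would be echoed in the same way as the spaces.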

==== Application - parsing lists ====

Consider the following BNF grammar which describes lists:
<code>
<integer> ::= [0-9]+
<op> ::= "++" | ":"
<element> ::= <integer> | <op> | <list>
<sequence> ::= <element> | <element> " " <sequence> 
<list> ::= "()" | "(" <sequence> ")"
</code>

The following are examples of lists:
<code>
(1 2 3)
(1 (2 3) 4 ())
(1 (++ (: 2 (3)) (4 5)) 6)
</code>
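
The terminals of this grammar (parentheses, operators, integers) are regular, while the nesting is not. The sketch below illustrates the split in plain Java: a regex-based tokenizer, followed by a counter which plays the role of a PDA stack holding one symbol per open parenthesis. It is an illustration only, not the JFlex solution asked for in the lab; ''ListTokens'', ''tokenize'' and ''balanced'' are hypothetical names, and the sketch tolerates any number of spaces between tokens, which is more permissive than the grammar.

<code java>
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ListTokens {
    // One alternative per terminal: "(", ")", "++", ":" and integers.
    static final Pattern TOKEN = Pattern.compile("\\(|\\)|\\+\\+|:|[0-9]+");

    /** Splits the input into tokens; spaces are skipped, anything else is an error. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') {
                pos++;
                continue;
            }
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at position " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    /** Checks that the tokens form exactly one list with properly nested parentheses. */
    static boolean balanced(List<String> tokens) {
        int depth = 0; // number of currently open parentheses
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("(")) {
                depth++;
            } else if (t.equals(")")) {
                depth--;
                if (depth < 0) return false;   // ")" without a matching "("
            } else if (depth == 0) {
                return false;                  // integer or operator outside any list
            }
            if (depth == 0 && i < tokens.size() - 1) {
                return false;                  // outermost list closed before the end
            }
        }
        return depth == 0 && !tokens.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(balanced(tokenize("(1 (2 3) 4 ())"))); // true
        System.out.println(balanced(tokenize("(1 (2 3) 4")));     // false
    }
}
</code>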

Your task is to:
  * correctly parse such lists:
    * write a JFlex file to implement the lexer:
      * Since the language of lists is context-free (and not regular), in order to parse a list you need to keep track of the opened/closed parentheses.
      * Start by writing a PDA (on paper) which accepts correctly-formed lists. Treat each regular expression you defined (for numbers and operators) as a single symbol;
      * Implement the PDA (strategy) in the lexer file;
  * given a correctly-formed list, write a procedure which evaluates list operations (in the standard way). For instance, ''(1 (++ (: 2 (3)) (4 5)) 6)'' evaluates to ''(1 (2 3 4 5) 6)''
  * write a procedure which checks if a list is **semantically valid**. What type of checks do you need to implement?
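
As a starting point for the evaluation task, here is a minimal sketch in plain Java. It rests on assumptions which the text above does not spell out: an operator applies only when it is the first element of a list with exactly two operands, ''++'' concatenates two lists, and '':'' prepends an element to a list. The class ''ListEval'' and its methods are hypothetical names, and the simple string-splitting tokenizer stands in for the JFlex lexer.

<code java>
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

class ListEval {
    private final String[] tokens;
    private int pos = 0;

    private ListEval(String input) {
        // Surround parentheses with spaces, then split on whitespace.
        tokens = input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Parses one element: an Integer, an operator (String) or a nested List. */
    private Object parseElement() {
        if (pos >= tokens.length) throw new IllegalArgumentException("unexpected end of input");
        String t = tokens[pos++];
        if (t.equals("++") || t.equals(":")) return t;
        if (!t.equals("(")) {
            if (!t.matches("[0-9]+")) throw new IllegalArgumentException("unexpected token: " + t);
            return Integer.valueOf(t);
        }
        List<Object> items = new ArrayList<>();
        while (pos < tokens.length && !tokens[pos].equals(")")) items.add(parseElement());
        if (pos >= tokens.length) throw new IllegalArgumentException("missing )");
        pos++; // consume ")"
        return items;
    }

    /** Evaluates nested operations bottom-up; the exceptions are simple semantic checks. */
    static Object eval(Object node) {
        if (!(node instanceof List)) return node;
        List<Object> vals = new ArrayList<>();
        for (Object child : (List<?>) node) vals.add(eval(child));
        if (vals.isEmpty() || !(vals.get(0) instanceof String)) return vals; // plain list
        if (vals.size() != 3 || !(vals.get(2) instanceof List))
            throw new IllegalArgumentException("operator needs two operands, the second one a list");
        List<Object> result = new ArrayList<>();
        if (vals.get(0).equals("++")) {
            if (!(vals.get(1) instanceof List))
                throw new IllegalArgumentException("++ needs two lists");
            result.addAll((List<?>) vals.get(1));
        } else {
            result.add(vals.get(1)); // ":" prepends a single element
        }
        result.addAll((List<?>) vals.get(2));
        return result;
    }

    /** Prints a value back in the list syntax. */
    static String show(Object v) {
        if (!(v instanceof List)) return v.toString();
        StringJoiner sj = new StringJoiner(" ", "(", ")");
        for (Object o : (List<?>) v) sj.add(show(o));
        return sj.toString();
    }

    static String evaluate(String input) {
        ListEval p = new ListEval(input);
        Object tree = p.parseElement();
        if (p.pos != p.tokens.length) throw new IllegalArgumentException("trailing input");
        return show(eval(tree));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("(1 (++ (: 2 (3)) (4 5)) 6)")); // (1 (2 3 4 5) 6)
    }
}
</code>

The exceptions thrown by ''eval'' hint at the kind of semantic checks the last task asks about, for instance an operator with the wrong number or type of operands. Whether an operator may appear in a non-leading position is left open here.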