Lab

Key insights:

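  • there are more languages than regular expressions: over an alphabet there are uncountably many languages, but only countably many regular expressions
  • as a direct consequence of the above, some languages cannot be described by any regular expression
  • reg.exps. are unambiguous, FINITE representations of (possibly infinite) languages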
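The first insight can be made precise by a counting argument, sketched below for a finite, non-empty alphabet $ \Sigma$. Every regular expression is a finite word over the finite alphabet $ \Gamma$ (the symbols of $ \Sigma$ plus $ \cup$, $ {}^{*}$, parentheses, $ \epsilon$ and $ \emptyset$), so there are only countably many of them. A language, on the other hand, is any subset of the countably infinite set $ \Sigma^{*}$, and by Cantor's theorem there are uncountably many such subsets:

```latex
\begin{align*}
  \bigl|\{E \mid E \text{ is a regular expression over } \Sigma\}\bigr|
    &\le |\Gamma^{*}| = \aleph_0,
    \qquad \Gamma = \Sigma \cup \{\cup, {}^{*}, (, ), \epsilon, \emptyset\} \\
  \bigl|\{L \mid L \subseteq \Sigma^{*}\}\bigr|
    &= |\mathcal{P}(\Sigma^{*})| = 2^{\aleph_0} > \aleph_0
\end{align*}
```

Since each regular expression denotes exactly one language, the expressions cannot cover all languages.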

Objectives:

  • Understand what a language and a regular expression are, and the relation between them;
  • Write several regular expressions for designated languages
  • Identify languages described by some regular expressions (e.g. ?!)

Resources (tentative):

Exercises

I. Write a regular expression for each of the following languages:

  • $ \Sigma=\{0, 1\}$ , $ L=\{011\}$

Solution: $ E=011$
Obs: Strictly by definition, the expression is $ E=((01)1)$; in practice, we omit parentheses that are not needed, relying on the precedence rule Kleene star > concatenation > union to keep regular expressions as short as possible.
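The same precedence rule (Kleene star > concatenation > union) is used by Java's java.util.regex, where | plays the role of $ \cup$. The minimal sketch below (the class and method names are our own) checks where the implicit parentheses go:

```java
import java.util.regex.Pattern;

// Sketch: java.util.regex applies star > concatenation > union,
// the same precedence rule as above, with | written for the union.
class RegexPrecedence {
    // true iff the whole word w belongs to the language of regex
    static boolean inLanguage(String regex, String w) {
        return Pattern.matches(regex, w);
    }

    public static void main(String[] args) {
        // 01* is read as 0(1*): star binds tighter than concatenation
        System.out.println(inLanguage("01*", "0111"));   // true
        System.out.println(inLanguage("01*", "0101"));   // false
        // repeating the block 01 needs explicit parentheses
        System.out.println(inLanguage("(01)*", "0101")); // true
        // 0|11 is read as 0 | (11): concatenation binds tighter than union
        System.out.println(inLanguage("0|11", "11"));    // true
        System.out.println(inLanguage("0|11", "01"));    // false
    }
}
```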

  • $ \Sigma=\{a, b\}$ , $ L=\{a, b\}$

Solution: $ E=a \cup b$
Obs: By definition the correct expression is $ E=(a \cup b)$ , but we can remove parentheses for the same reason as above.

  • $ \Sigma=\{0, 1\}$ , $ L=\{\epsilon, 0, 1, 00, 01, 10, 11, 000, \ldots\}$

Solution: $ E=(0 \cup 1)^{*}$

  • $ \Sigma=\{0, 1\}$ , $ L=\{010010101000010, 010010101000011\}$

Solution: $ E=01001010100001(0 \cup 1)$

  • $ \Sigma=\{0, 1\}$ , $ L=\{w \in \Sigma^{*} \mid w \text{ ends with } 0\}$

Solution: $ E=(0 \cup 1)^{*}0$

  • $ \Sigma=\{0, 1\}$ , $ L=\{w \in \Sigma^{*} \mid w=w_101 \lor w=1w_1, w_1 \in \Sigma^{*}\}$

Solution: $ E=(0 \cup 1)^{*}01 \cup 1(0 \cup 1)^{*}$
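A quick way to test a candidate solution is to compare it with the set-builder definition on every word up to some length. The sketch below does this with java.util.regex (| for $ \cup$); the helper names and the length bound 10 are arbitrary choices, and agreement on short words is evidence, not a proof, of correctness:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Sketch: compare a candidate regular expression with the set-builder
// definition of a language on every word up to a given length.
class BruteForceCheck {
    // all words over alphabet of length 0..maxLen, shortest first ("" included)
    static List<String> wordsUpTo(String alphabet, int maxLen) {
        List<String> words = new ArrayList<>();
        words.add("");
        int start = 0;                            // first word of the previous length
        for (int len = 1; len <= maxLen; len++) {
            int end = words.size();
            for (int i = start; i < end; i++) {
                for (char c : alphabet.toCharArray()) {
                    words.add(words.get(i) + c);  // extend each word by one letter
                }
            }
            start = end;
        }
        return words;
    }

    // first word on which the regex and the definition disagree, or null
    static String firstCounterexample(String regex, Predicate<String> inL,
                                      String alphabet, int maxLen) {
        Pattern p = Pattern.compile(regex);
        for (String w : wordsUpTo(alphabet, maxLen)) {
            if (p.matcher(w).matches() != inL.test(w)) {
                return w;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // L = { w | w ends with 01 or w starts with 1 }
        Predicate<String> inL = w -> w.endsWith("01") || w.startsWith("1");
        System.out.println(firstCounterexample("(0|1)*01|1(0|1)*", inL, "01", 10)); // null
        System.out.println(firstCounterexample("(0|1)*01|1(0|1)", inL, "01", 10));  // 1
    }
}
```

The second call shows why the star on $ 1(0 \cup 1)^{*}$ matters: without it, the word 1 belongs to $ L$ but is rejected by the expression.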

  • $ \Sigma=\{x, y\}$ , $ L=\{w \in \Sigma^{*} \mid \#_x(w) = 2\}$

Solution: $ E=y^{*}xy^{*}xy^{*}$
Obs: $ \#_x(w)$ denotes the number of occurrences of $ x$ in $ w$.

  • $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) \,\vdots\, 2\}$

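Solution: $ E=b^{*}(ab^{*}ab^{*})^{*}$
Obs: $ n \,\vdots\, 2$ reads "$ n$ is divisible by 2", so $ L$ contains exactly the words with an even number of occurrences of $ a$, including words with no $ a$ at all, such as $ \epsilon$, $ b$ or $ bb$; the leading $ b^{*}$ is needed for words such as $ b$ and $ bb$.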
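The language of words with an even number of $ a$ is also accepted by a two-state DFA that tracks the parity of the $ a$ read so far. The sketch below (names are our own) implements that DFA and compares it with the expression $ b^{*}(ab^{*}ab^{*})^{*}$ on all words of length at most 12:

```java
import java.util.regex.Pattern;

// Sketch: a two-state DFA for L = { w in {a,b}* | #a(w) is even },
// compared with the regular expression b*(ab*ab*)* on all short words.
class EvenAs {
    // the state is the parity of a's read so far; "even" is both start and accepting state
    static boolean dfaAccepts(String w) {
        boolean even = true;
        for (char c : w.toCharArray()) {
            if (c == 'a') {
                even = !even;   // reading a flips the parity, reading b keeps it
            }
        }
        return even;
    }

    static boolean regexAccepts(String w) {
        return Pattern.matches("b*(ab*ab*)*", w);
    }

    public static void main(String[] args) {
        int mismatches = 0;
        // each bit pattern of length len encodes one word (bit 0 -> a, bit 1 -> b)
        for (int len = 0; len <= 12; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < len; i++) {
                    sb.append(((bits >> i) & 1) == 0 ? 'a' : 'b');
                }
                String w = sb.toString();
                if (dfaAccepts(w) != regexAccepts(w)) {
                    mismatches++;
                }
            }
        }
        System.out.println(mismatches);                          // 0
        // without the leading b*, words such as "b" (no a at all) are lost
        System.out.println(Pattern.matches("(b*ab*ab*)*", "b"));  // false
    }
}
```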

  • $ \Sigma=\{x, y\}$ , $ L=\{w \in \Sigma^{*} \mid \#_x(w) \ge 1\}$

Solution: $ E=(x \cup y)^{*}x(x \cup y)^{*}$

  • $ \Sigma=\{a, b, c\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) \ge 1 \land \#_c(w) \ge 1\}$

Solution: $ E=((a \cup b \cup c)^{*}a(a \cup b \cup c)^{*}c(a \cup b \cup c)^{*}) \cup ((a \cup b \cup c)^{*}c(a \cup b \cup c)^{*}a(a \cup b \cup c)^{*})$

  • $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid w \text{ does not contain } ba\}$

Solution: $ E=a^{*}b^{*}$

  • $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) + \#_b(w) = 0\}$

Solution: $ E=\epsilon$

  • $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) + \#_b(w) < 0\}$

Solution: $ E=\emptyset$
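The difference between $ \epsilon$ (denoting $ \{\epsilon\}$, the language containing only the empty word) and $ \emptyset$ (the language with no words at all) can also be observed in Java. java.util.regex has no dedicated symbol for $ \emptyset$; the sketch below assumes the common idiom (?!), an empty negative lookahead, which never matches:

```java
import java.util.regex.Pattern;

// Sketch: the empty pattern plays the role of epsilon, and the empty
// negative lookahead (?!) is used as a stand-in for the empty set.
class EpsilonVsEmptySet {
    static boolean epsilonMatches(String w) {
        return Pattern.matches("", w);      // L(epsilon) = { "" }
    }

    static boolean emptySetMatches(String w) {
        return Pattern.matches("(?!)", w);  // L(emptyset) = {}: never matches
    }

    public static void main(String[] args) {
        System.out.println(epsilonMatches(""));   // true
        System.out.println(epsilonMatches("a"));  // false
        System.out.println(emptySetMatches(""));  // false
        System.out.println(emptySetMatches("a")); // false
    }
}
```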

Installing JFlex

Complete, platform-dependent installation instructions can be found in the JFlex manual. In a nutshell, JFlex comes as a command-line program, jflex, which reads a .flex specification and generates the Java source code of a lexer.

The structure of a flex file

Consider the following simple JFlex file:

import java.util.*;
 
%%
 
%class HelloLexer
%standalone
 
%{
  public int words = 0; // number of words matched so far
%}
 
LineTerminator = \r|\n|\r\n
 
%%   
 
[a-zA-Z]+ { words+=1; }
{LineTerminator} { /* do nothing*/ }

Suppose the above file is called Hello.flex. Running the command jflex Hello.flex will generate a Java class which implements a lexer.

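JFlex formally splits each specification file (such as the above) into three sections separated by %% (user code, options and declarations, lexical rules); for this lab, it is convenient to split the middle one further, giving 5 sections: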

  • the first section, which ends at the first occurrence of %%, contains user code (typically package and import statements), which is copied verbatim to the beginning of the generated Java file.
  • the second section, right after %% and until %{, contains a sequence of options for jflex. Here, we use two options:
    • %class HelloLexer tells jflex that the generated lexer class should be named HelloLexer (and written to HelloLexer.java);
    • %standalone tells the generated lexer to print unmatched input to standard output and continue scanning; it also adds a main method to the class, which scans the files given on the command line.
    • More details regarding possible options can be found in the JFlex docs.
  • the third section, enclosed between %{ and %}, contains Java code which is copied verbatim into the body of the generated lexer class. Here we declare a public field words, which counts the words read so far.
  • the fourth section contains regular expression definitions (macros). Here, we define LineTerminator to be the regular expression \r | \n | \r\n. Macros can be used to build more complex regular expressions from simpler ones, and can also be referenced in the fifth section of the flex file.
  • the fifth section, after the second %%, contains rules and actions: each rule pairs a regular expression with an action (Java code) which is executed whenever a word matching that regular expression is found:
    • the rule [a-zA-Z]+ { words+=1; } states that whenever a word matches [a-zA-Z]+ (a regexp written inline), words+=1; is executed;
    • the rule {LineTerminator} { /* do nothing*/ } refers to the macro defined above (note the braces); here no action is executed;
    • JFlex always matches the longest prefix of the remaining input that satisfies some regexp. When several regexps match a prefix of that maximal length, the rule appearing first in the flex file is chosen.
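The matching policy described above (longest match first, then earliest rule) can be sketched in plain Java. This is only an illustration, not how JFlex works internally (JFlex compiles all rules into a single DFA); the rule names and patterns are hypothetical, and the sketch assumes each pattern's greedy match is its longest match, which holds for the simple patterns used here but not for arbitrary Java regexes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the "longest match, then earliest rule" policy. Not JFlex's
// implementation: JFlex compiles all rules into one DFA instead.
class LongestMatch {
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // hypothetical rules: a keyword listed before a general identifier rule
    static List<Rule> exampleRules() {
        return List.of(new Rule("KEYWORD", "if"),
                       new Rule("ID", "[a-z]+"),
                       new Rule("WS", " +"));
    }

    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : rules) {
                Matcher m = r.pattern.matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at pos; only a strictly longer
                // match replaces the current best, so ties go to the earlier rule
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                // no rule matches here: report one character and move on
                tokens.add("UNMATCHED(" + input.charAt(pos) + ")");
                pos++;
            } else {
                tokens.add(best.name + "(" + input.substring(pos, bestEnd) + ")");
                pos = bestEnd;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if iffy", exampleRules()));
        // [KEYWORD(if), WS( ), ID(iffy)]
        System.out.println(tokenize("if 42", exampleRules()));
        // [KEYWORD(if), WS( ), UNMATCHED(4), UNMATCHED(2)]
    }
}
```

On the input if iffy, the first word if is matched equally long by KEYWORD and ID, so the earlier rule wins; for iffy, ID matches a longer prefix and wins even though it comes second.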

Compiling a Hello World project

After performing:

jflex Hello.flex

we obtain HelloLexer.java, which contains the HelloLexer class implementing our lexer (by default the generated class is package-private; the %public option makes it public). We can easily include this class in our project, e.g.:

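import java.io.*;

public class Hello {
  public static void main (String[] args) throws IOException {
    // the name of the file to scan is the first command-line argument
    HelloLexer l = new HelloLexer(new FileReader(args[0]));

    // a single call scans the whole file (no action returns a token)
    l.yylex();

    // words is the public field declared in the %{ ... %} section
    System.out.println(l.words);
  }
}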
  • Note that the lexer constructor receives a java.io.Reader as input (other options are possible, see the docs), and we take the name of the file to be scanned from the command line (args[0]).
  • Each lexer implements the method yylex(), which starts the scanning process; since none of our actions returns, a single call scans the input until end of file.

After compiling:

javac HelloLexer.java Hello.java

and running:

java Hello <input-file>

we obtain:

 6

at standard output.

Recall that the option %standalone makes the lexer print unmatched input to standard output. In our example, the unmatched input consists of the whitespace between words, which is why blanks appear before the word count.
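To see where the blanks in front of the count come from, here is a hand-written imitation of what the generated HelloLexer does. It is a sketch for illustration, not the code JFlex actually generates; the class name HelloImitation and the sample input are hypothetical:

```java
// Hand-written imitation of the behaviour of the generated HelloLexer, for
// illustration only (hypothetical class; JFlex generates different code).
class HelloImitation {
    int words = 0;                                    // same role as the words field
    final StringBuilder echoed = new StringBuilder(); // what %standalone would print

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    void scan(String input) {
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (isLetter(c)) {
                // rule [a-zA-Z]+: consume the longest run of letters, count one word
                while (pos < input.length() && isLetter(input.charAt(pos))) {
                    pos++;
                }
                words += 1;
            } else if (c == '\r' || c == '\n') {
                // rule {LineTerminator}: \r\n is one (longest) match; no action
                boolean crlf = c == '\r' && pos + 1 < input.length()
                        && input.charAt(pos + 1) == '\n';
                pos += crlf ? 2 : 1;
            } else {
                // no rule matches: %standalone echoes the character
                echoed.append(c);
                pos++;
            }
        }
    }

    public static void main(String[] args) {
        HelloImitation l = new HelloImitation();
        l.scan("Hello world\nthis is a test\n");   // hypothetical input file contents
        System.out.print(l.echoed);                // four spaces
        System.out.println(l.words);               // 6
    }
}
```

On this input, the four spaces between words are echoed, the newlines are consumed by the {LineTerminator} rule, and the program prints four spaces followed by 6 on a single line.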

Application - parsing lists

Consider the following BNF grammar which describes lists:

<integer> ::= [0-9]+
<op> ::= "++" | ":"
<element> ::= <integer> | <op> | <list>
<sequence> ::= <element> | <element> " " <sequence> 
<list> ::= "()" | "(" <sequence> ")"

The following are examples of lists:

(1 2 3)
(1 (2 3) 4 ())
(1 (++ (: 2 (3)) (4 5)) 6)
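Before encoding the PDA in the lexer file, it can help to prototype the strategy in plain Java and run it on the examples above. The sketch below is one possible design, not the required solution: the token kinds and states are our own, and since there is only one kind of parenthesis, the PDA stack only ever holds copies of one symbol, so the sketch stores just its height (depth):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a PDA-style recognizer for the list grammar above (one possible design).
class ListRecognizer {
    enum Tok { LP, RP, INT, OP, SP }
    enum State { START, AFTER_OPEN, AFTER_ELEM, AFTER_SPACE }

    // one capturing group per token kind: ( ) integer operator space
    private static final Pattern TOKEN =
            Pattern.compile("(\\()|(\\))|([0-9]+)|(\\+\\+|:)|( )");

    // token kinds of s, or null if some character starts no token
    static List<Tok> tokenize(String s) {
        List<Tok> toks = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int pos = 0;
        while (pos < s.length()) {
            m.region(pos, s.length());
            if (!m.lookingAt()) {
                return null;
            }
            if (m.group(1) != null) toks.add(Tok.LP);
            else if (m.group(2) != null) toks.add(Tok.RP);
            else if (m.group(3) != null) toks.add(Tok.INT);
            else if (m.group(4) != null) toks.add(Tok.OP);
            else toks.add(Tok.SP);
            pos = m.end();
        }
        return toks;
    }

    static boolean accepts(String s) {
        List<Tok> toks = tokenize(s);
        if (toks == null) {
            return false;
        }
        int depth = 0;                       // stack height: number of unclosed "("
        State state = State.START;
        for (Tok t : toks) {
            if (state != State.START && depth == 0) {
                return false;                // input continues after the outermost list
            }
            switch (state) {
                case START:                  // a list must start with "("
                    if (t != Tok.LP) return false;
                    depth++;
                    state = State.AFTER_OPEN;
                    break;
                case AFTER_OPEN:             // after "(": element or ")" (empty list)
                    if (t == Tok.LP) depth++;
                    else if (t == Tok.RP) { depth--; state = State.AFTER_ELEM; }
                    else if (t == Tok.INT || t == Tok.OP) state = State.AFTER_ELEM;
                    else return false;
                    break;
                case AFTER_SPACE:            // after a separator: an element is required
                    if (t == Tok.LP) { depth++; state = State.AFTER_OPEN; }
                    else if (t == Tok.INT || t == Tok.OP) state = State.AFTER_ELEM;
                    else return false;
                    break;
                case AFTER_ELEM:             // after an element: separator or ")"
                    if (t == Tok.SP) state = State.AFTER_SPACE;
                    else if (t == Tok.RP) depth--;
                    else return false;
                    break;
            }
        }
        return state == State.AFTER_ELEM && depth == 0;
    }

    public static void main(String[] args) {
        String[] inputs = {"(1 2 3)", "(1 (2 3) 4 ())", "(1 (++ (: 2 (3)) (4 5)) 6)",
                           "(1 (2 3)", "(1  2)"};
        for (String s : inputs) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```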

Your task is to:

  • correctly parse such lists:
    • write a JFlex file to implement the lexer:
      • Since the language of lists is context-free (but not regular), parsing a list requires keeping track of the opened/closed parentheses, which no regular expression alone can do.
      • Start by writing a PDA (on paper) which accepts correctly-formed lists. Treat each regular expression you defined (for numbers and operators) as a single symbol;
      • Implement the PDA (strategy) in the lexer file;
  • given a correctly-formed list, write a procedure which evaluates list operations (in the standard way); for instance, (1 (++ (: 2 (3)) (4 5)) 6) evaluates to (1 (2 3 4 5) 6);
  • write a procedure which checks if a list is semantically valid. What type of checks do you need to implement?
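One possible shape for the evaluation procedure is sketched below. It assumes (our reading of "the standard way") that (: x l) prepends the element x to the list l, that (++ l1 l2) concatenates two lists, that arguments are evaluated before the operator is applied, and that the input has already been accepted by the parser. The names ListEval and evaluate are hypothetical; the thrown exceptions hint at some of the semantic checks asked for above, but do not cover them all (for example, an operator in a non-first position is not reported):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an evaluator for well-formed lists. Assumed semantics (ours):
// (: x l) prepends x to the list l, (++ l1 l2) concatenates two lists,
// and arguments are evaluated before the operator is applied.
class ListEval {
    private final String src;
    private int pos = 0;

    private ListEval(String src) {
        this.src = src;
    }

    // element ::= integer | op | list  (input assumed already validated)
    private Object parseElement() {
        if (src.charAt(pos) == '(') {
            return parseList();
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) != ' ' && src.charAt(pos) != ')') {
            pos++;
        }
        String tok = src.substring(start, pos);
        if (Character.isDigit(tok.charAt(0))) {
            return Integer.valueOf(tok);     // integer
        }
        return tok;                          // operator: "++" or ":"
    }

    private List<Object> parseList() {
        List<Object> items = new ArrayList<>();
        pos++;                               // skip "("
        while (src.charAt(pos) != ')') {
            if (src.charAt(pos) == ' ') {
                pos++;                       // skip the separator
            }
            items.add(parseElement());
        }
        pos++;                               // skip ")"
        return items;
    }

    @SuppressWarnings("unchecked")
    static Object eval(Object x) {
        if (!(x instanceof List)) {
            return x;                        // integers and operators evaluate to themselves
        }
        List<Object> vals = new ArrayList<>();
        for (Object e : (List<Object>) x) {
            vals.add(eval(e));               // evaluate inner lists first
        }
        if (vals.isEmpty() || !(vals.get(0) instanceof String)) {
            return vals;                     // a plain list, not an operation
        }
        String op = (String) vals.get(0);
        if (vals.size() != 3) {
            throw new IllegalArgumentException(op + " expects exactly 2 arguments");
        }
        Object a = vals.get(1);
        Object b = vals.get(2);
        List<Object> result = new ArrayList<>();
        if (op.equals(":") && b instanceof List) {
            result.add(a);
            result.addAll((List<Object>) b);
            return result;
        }
        if (op.equals("++") && a instanceof List && b instanceof List) {
            result.addAll((List<Object>) a);
            result.addAll((List<Object>) b);
            return result;
        }
        throw new IllegalArgumentException("ill-typed arguments for " + op);
    }

    static String show(Object x) {
        if (!(x instanceof List)) {
            return x.toString();
        }
        List<?> items = (List<?>) x;
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(show(items.get(i)));
        }
        return sb.append(')').toString();
    }

    static String evaluate(String src) {
        return show(eval(new ListEval(src).parseElement()));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("(1 (++ (: 2 (3)) (4 5)) 6)")); // (1 (2 3 4 5) 6)
        System.out.println(evaluate("(1 (2 3) 4 ())"));            // (1 (2 3) 4 ())
    }
}
```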