Lab
Lab-01 - Regular expressions
Key insights:
- there are more languages than regular expressions
- as a direct consequence, some languages cannot be described by any regular expression
- regular expressions are unambiguous and FINITE representations of languages
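The first insight rests on a counting argument, sketched here for a non-empty finite alphabet $ \Sigma$ . The symbol $ \mathcal{R}$ (the set of all regular expressions over $ \Sigma$ ) is notation introduced only for this sketch:

$$ \mathcal{R} \subseteq \left(\Sigma \cup \{\emptyset, \epsilon, \cup, {}^{*}, (, )\}\right)^{*} \;\Longrightarrow\; |\mathcal{R}| \le \aleph_0 < 2^{\aleph_0} = |\mathcal{P}(\Sigma^{*})| $$

Every regular expression is a finite string over a finite set of symbols, so there are only countably many of them. A language is an arbitrary subset of the countably infinite set $ \Sigma^{*}$ , and by Cantor's theorem there are uncountably many such subsets. Hence at least one language is not described by any regular expression.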
Objectives:
- Understand what a language and a regular expression are, and the relation between them;
- Write several regular expressions for designated languages
- Identify languages described by some regular expressions (e.g. ?!)
Resources (tentative):
Exercises
I. What is the regular expression for the following languages:
- $ \Sigma=\{0, 1\}$ , $ L=\{011\}$
Solution: $ E=011$
Obs: By definition the correct expression is $ E=((01)1)$ , but we omit parentheses when they are not needed: a precedence rule (Kleene Star > Concatenation > Union) lets us reduce the number of parentheses in regular expressions as much as possible.
- $ \Sigma=\{a, b\}$ , $ L=\{a, b\}$
Solution: $ E=a \cup b$
Obs: By definition the correct expression is $ E=(a \cup b)$ , but we can remove parentheses for the same reason as above.
- $ \Sigma=\{0, 1\}$ , $ L=\{\epsilon, 0, 1, 00, 01, 10, 11, 000, \ldots\}$
Solution: $ E=(0 \cup 1)^{*}$
- $ \Sigma=\{0, 1\}$ , $ L=\{010010101000010, 010010101000011\}$
Solution: $ E=01001010100001(0 \cup 1)$
- $ \Sigma=\{0, 1\}$ , $ L=\{w \in \Sigma^{*} \mid w \text{ ends with } 0\}$
Solution: $ E=(0 \cup 1)^{*}0$
- $ \Sigma=\{0, 1\}$ , $ L=\{w \in \Sigma^{*} \mid w=w_101 \lor w=1w_1, w_1 \in \Sigma^{*}\}$
Solution: $ E=(0 \cup 1)^{*}01 \cup 1(0 \cup 1)^{*}$
- $ \Sigma=\{x, y\}$ , $ L=\{w \in \Sigma^{*} \mid \#_x(w) = 2\}$
Solution: $ E=y^{*}xy^{*}xy^{*}$
Obs: $ \#_x(w)$ denotes number of $ x$ in $ w$ .
- $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) \text{ is divisible by } 2\}$
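Solution: $ E=b^{*}(ab^{*}ab^{*})^{*}$
Obs: The leading $ b^{*}$ is needed so that words with no $ a$ at all, such as $ b$ , are also accepted.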
- $ \Sigma=\{x, y\}$ , $ L=\{w \in \Sigma^{*} \mid \#_x(w) \ge 1\}$
Solution: $ E=(x \cup y)^{*}x(x \cup y)^{*}$
- $ \Sigma=\{a, b, c\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) \ge 1 \land \#_c(w) \ge 1\}$
Solution: $ E=((a \cup b \cup c)^{*}a(a \cup b \cup c)^{*}c(a \cup b \cup c)^{*}) \cup ((a \cup b \cup c)^{*}c(a \cup b \cup c)^{*}a(a \cup b \cup c)^{*})$
- $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid w \text{ does not contain } ba\}$
Solution: $ E=a^{*}b^{*}$
- $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) + \#_b(w) = 0\}$
Solution: $ E=\epsilon$
- $ \Sigma=\{a, b\}$ , $ L=\{w \in \Sigma^{*} \mid \#_a(w) + \#_b(w) < 0\}$
Solution: $ E=\emptyset$
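A proposed solution can be sanity-checked by comparing it with the language definition on every word up to some length. The sketch below does this with java.util.regex, where union is written | instead of $ \cup$ . The class RegexCheck and the method firstMismatch are hypothetical names used only for this illustration, and agreement up to a bounded length is evidence, not a proof of correctness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class RegexCheck {
    // Enumerates all words over `alphabet` of length 0..maxLen (shortest first)
    // and returns the first word on which the regex and the predicate disagree,
    // or null if they agree on every enumerated word.
    static String firstMismatch(String alphabet, String regex,
                                Predicate<String> inLanguage, int maxLen) {
        Pattern p = Pattern.compile(regex);
        List<String> current = new ArrayList<>();
        current.add("");
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String w : current) {
                // matches() requires the whole word to match the expression
                if (p.matcher(w).matches() != inLanguage.test(w)) {
                    return w;
                }
                for (char c : alphabet.toCharArray()) {
                    next.add(w + c);
                }
            }
            current = next;
        }
        return null;
    }

    public static void main(String[] args) {
        // words that end with 0
        System.out.println(firstMismatch("01", "(0|1)*0", w -> w.endsWith("0"), 8));
        // words with exactly two occurrences of x
        System.out.println(firstMismatch("xy", "y*xy*xy*",
                w -> w.chars().filter(c -> c == 'x').count() == 2, 8));
        // words that do not contain ba
        System.out.println(firstMismatch("ab", "a*b*", w -> !w.contains("ba"), 8));
    }
}
```

When the expression is wrong, the returned word is a shortest counterexample, which usually points directly at the missing or superfluous case.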
Lab x - JFlex
Installing JFlex
A complete, platform-dependent set of installation instructions can be found in the JFlex docs. In a nutshell, JFlex comes as a binary app, jflex.
The structure of a flex file
Consider the following simple JFlex file:
    import java.util.*;

    %%

    %class HelloLexer
    %standalone

    %{
        public Integer words = 0;
    %}

    LineTerminator = \r|\n|\r\n

    %%

    [a-zA-Z]+        { words+=1; }
    {LineTerminator} { /* do nothing*/ }
Suppose the above file is called Hello.flex. Running the command jflex Hello.flex will generate a Java class which implements a lexer.
Each JFlex file (such as the one above) contains 5 sections:
- the first section, which ends at the first occurrence of %%, contains declarations which will be added at the beginning of the Java class file.
- the second section, right after %% and until %{, contains a sequence of options for jflex. Here, we use two options:
  - %class HelloLexer tells jflex that the generated lexer class should be named HelloLexer;
  - %standalone tells jflex to print unmatched input to standard output and continue scanning.
  - More details regarding possible options can be found in the JFlex docs.
- the third section, delimited by %{ and %}, contains declarations which will be copied into the generated lexer class. Here we declare a public variable words.
- the fourth section contains regular expression declarations. Here, we have declared LineTerminator to be the regular expression \r | \n | \r\n. Declarations can be used to build more complicated regexps from simple ones, and can also be used in the fifth section of the flex file.
- the fifth section contains rules and actions: a rule specifies a regular expression to be scanned for, as well as the action to be taken when a word satisfying the regexp is found:
  - the rule [a-zA-Z]+ { words+=1; } states that whenever [a-zA-Z]+ (a regexp defined inline) is matched by a word, words+=1; should be executed;
  - the rule {LineTerminator} { } refers to the regexp declared above (note the braces); here no action is executed;
  - JFlex will always scan for the longest input word which satisfies a regexp. When a word satisfies more than one regexp, the first one from the flex file will be matched.
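The longest-match and first-rule-wins policy described in the last item can be imitated in plain Java. The following is only a sketch of the policy, not of how JFlex is implemented (JFlex generates an automaton and does not try rules one by one). The class LongestMatchDemo, the rule table and the rule names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // Hypothetical rule table: {name, regexp}. Order matters: earlier rules win ties.
    static final String[][] RULES = {
        {"KEYWORD", "if"},
        {"IDENT",   "[a-z]+"},
        {"NUMBER",  "[0-9]+"},
        {"SPACE",   "[ ]+"},
    };

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (String[] rule : RULES) {
                Matcher m = Pattern.compile(rule[1]).matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region;
                // the strict '>' keeps the earlier rule when lengths are equal
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule[0];
                }
            }
            if (bestName == null) {
                // no rule applies: report the character and move on
                out.add("UNMATCHED:" + input.charAt(pos));
                pos++;
            } else {
                out.add(bestName + ":" + input.substring(pos, pos + bestLen));
                pos += bestLen;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [KEYWORD:if, SPACE: , IDENT:iffy, SPACE: , NUMBER:42]
        System.out.println(tokenize("if iffy 42"));
    }
}
```

On the input "if iffy 42", the word if is matched by both KEYWORD and IDENT with the same length, so the first rule wins, while iffy is matched as a whole by IDENT because that match is longer than the two characters KEYWORD could consume.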
Compiling a Hello World project
After performing:
jflex Hello.flex
we obtain HelloLexer.java, which contains the HelloLexer public class implementing our lexer. We can easily include this class in our project, e.g.:
    import java.io.*;
    import java.util.*;

    public class Hello {
        public static void main (String[] args) throws IOException {
            // the file to scan is given as the first command-line argument
            HelloLexer l = new HelloLexer(new FileReader(args[0]));
            // run the scanner over the whole input
            l.yylex();
            System.out.println(l.words);
        }
    }
- Note that the lexer constructor receives a Java Reader as input (other options are possible, see the docs), and we take the name of the file to be scanned from the first command-line argument.
- Each lexer implements the method yylex, which starts the scanning process.
After compiling:
javac HelloLexer.java Hello.java
and running (with the name of the file to scan passed as the first argument):
java Hello <input-file>
we obtain:
6
at standard output.
Recall that the option standalone tells the lexer to print unmatched words. In our example, those unmatched words are whitespaces.
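To predict what HelloLexer should print for a given input without running JFlex, its two rules can be mimicked with java.util.regex. This is a rough sketch under assumptions: the class HelloSketch and the sample text are hypothetical and are not the input file used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloSketch {
    // Counts maximal runs of letters, as the rule [a-zA-Z]+ { words+=1; } does.
    static int countWords(String text) {
        Matcher m = Pattern.compile("[a-zA-Z]+").matcher(text);
        int words = 0;
        while (m.find()) {
            words++;
        }
        return words;
    }

    // Returns what is left after removing everything matched by the two rules:
    // this is the text that the standalone option would echo to standard output.
    static String unmatched(String text) {
        return text.replaceAll("[a-zA-Z]+|\\r\\n|\\r|\\n", "");
    }

    public static void main(String[] args) {
        String sample = "one two three\nfour five six\n"; // hypothetical input
        System.out.println("[" + unmatched(sample) + "]"); // four spaces between the brackets
        System.out.println(countWords(sample));            // 6
    }
}
```

The word count comes from the first rule only; line terminators are consumed silently by the second rule, and the spaces between words match no rule, which is why they show up in the output.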
Application - parsing lists
Consider the following BNF grammar which describes lists:
    <integer>  ::= [0-9]+
    <op>       ::= "++" | ":"
    <element>  ::= <integer> | <op> | <list>
    <sequence> ::= <element> | <element> " " <sequence>
    <list>     ::= | "()" | "(" <sequence> ")"
The following are examples of lists:
    (1 2 3)
    (1 (2 3) 4 ())
    (1 (++ (: 2 (3)) (4 5)) 6)
Your task is to:
- correctly parse such lists:
  - write a JFlex file to implement the lexer;
  - since the language describing lists is context-free, in order to parse a list you need to keep track of the opened/closed parentheses;
  - start by writing a PDA (on paper) which accepts correctly-formed lists. Treat each regular expression you defined (for numbers and operators) as a single symbol;
  - implement the PDA (strategy) in the lexer file;
- given a correctly-defined list, write a procedure which evaluates list operations (in the standard way). For instance, (1 (++ (: 2 (3)) (4 5)) 6) evaluates to (1 (2 3 4 5) 6);
- write a procedure which checks if a list is semantically valid. What type of checks do you need to implement?
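As a starting point for keeping track of opened and closed parentheses, the sketch below uses a depth counter, which plays the role of a PDA stack holding a single kind of symbol. It is written in plain Java rather than JFlex, it checks only that the tokens are known and the parentheses are balanced, and it does not enforce the full grammar (for example, it accepts input outside any list). The class ParenCheck and its token pattern are hypothetical and are derived from the grammar above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenCheck {
    // Tokens assumed from the grammar: integers, "++", ":", parentheses, single spaces.
    private static final Pattern TOKEN =
            Pattern.compile("[0-9]+|\\+\\+|:|\\(|\\)| ");

    static boolean balanced(String input) {
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        int depth = 0; // number of currently open parentheses
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                return false; // symbol that is not part of any token
            }
            String tok = m.group();
            if (tok.equals("(")) {
                depth++;              // push
            } else if (tok.equals(")")) {
                if (--depth < 0) {
                    return false;     // pop on an empty stack
                }
            }
            pos = m.end();
        }
        return depth == 0;            // accept only if every "(" was closed
    }

    public static void main(String[] args) {
        System.out.println(balanced("(1 (++ (: 2 (3)) (4 5)) 6)")); // true
        System.out.println(balanced("(1 (2 3)"));                   // false
    }
}
```

In a JFlex solution the same counter could live in the %{ ... %} section and be updated from the actions of the rules for "(" and ")". A complete PDA for the grammar also has to track what is allowed to follow each symbol.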