
An introduction to JFlex

What is missing in the Haskell analyser?

Our Haskell analyser is generic (independent of the language being scanned). However, it lacks certain features which are essential for production development.

We will now briefly illustrate a tool which incorporates these features. The tool is JFlex, and it requires programmers to write their code in Java. (An alternative for Haskell, called Alex, also exists.)

JFlex receives as input a spec file *.flex, which, informally, contains a list of regular expressions to be searched, as well as actions (Java code) to be executed once a regular expression is matched. After compiling the flex file, JFlex outputs a Java class, which implements a scanner. The Java class can be included in a larger project with extended functionality.

Installing JFlex

A complete, platform-dependent set of installation instructions can be found here. In a nutshell, JFlex comes as a binary app jflex.

The structure of a flex file

Consider the following simple JFlex file:

import java.util.*;
 
%%
 
%class HelloLexer
%standalone
 
%{
  public Integer words = 0;
%}
 
LineTerminator = \r|\n|\r\n
 
%%   
 
[a-zA-Z]+ { words+=1; }
{LineTerminator} { /* do nothing*/ }

Suppose the above file is called Hello.flex. Running the command jflex Hello.flex will generate a Java class which implements a lexer.

Each JFlex file (such as the above) contains 5 sections.

Compiling a Hello World project

After performing:

jflex Hello.flex

we obtain HelloLexer.java which contains the HelloLexer public class implementing our lexer. We can easily include this class in our project, e.g.:

import java.io.*;
import java.util.*;
 
public class Hello {
  public static void main (String[] args) throws IOException {
    HelloLexer l = new HelloLexer(new FileReader(args[0]));
 
    l.yylex();
 
    System.out.println(l.words);
 
 
  }
}

After compiling:

javac HelloLexer.java Hello.java

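and running the program on an input file (the lexer reads the file named by the first command-line argument):

java Hello <input-file>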

we obtain:

 
 

 6

at standard output.

Recall that the option standalone tells the lexer to print any unmatched text to standard output. In our example, the unmatched text is the whitespace between words.
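To see where the echoed whitespace and the final count come from, the behaviour of Hello.flex can be imitated by hand. The following is a minimal sketch, not the code JFlex generates: the class HelloSketch, its Result holder and the sample input are hypothetical names introduced for illustration, and java.util.regex stands in for the generated automaton.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written imitation of Hello.flex (not JFlex output).
final class HelloSketch {
    // The two rules of Hello.flex. "\r\n" is listed first because Java
    // alternation tries alternatives in order rather than by longest match.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");
    private static final Pattern LINE_TERMINATOR = Pattern.compile("\\r\\n|\\r|\\n");

    // What a scan produces: the word count and the echoed unmatched text.
    static final class Result {
        final int words;
        final String unmatched;
        Result(int words, String unmatched) {
            this.words = words;
            this.unmatched = unmatched;
        }
    }

    static Result scan(String input) {
        int words = 0;
        StringBuilder unmatched = new StringBuilder();
        Matcher word = WORD.matcher(input);
        Matcher eol = LINE_TERMINATOR.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            if (word.region(pos, input.length()).lookingAt()) {
                words += 1;                           // action of the word rule
                pos = word.end();
            } else if (eol.region(pos, input.length()).lookingAt()) {
                pos = eol.end();                      // line terminators: do nothing
            } else {
                unmatched.append(input.charAt(pos));  // standalone-style echo
                pos += 1;
            }
        }
        return new Result(words, unmatched.toString());
    }

    public static void main(String[] args) {
        Result r = scan("one two three\nfour five six\n");
        System.out.print(r.unmatched);  // the four spaces between words
        System.out.println(r.words);    // 6
    }
}
```

On this sample input, the four spaces match no rule and are echoed, the line terminators are consumed silently, and the six words are counted.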

Application - parsing expressions

Consider the following BNF grammar which describes a different variant of IMP arithmetic expressions:

<val> ::= [0-9]+
<var> ::= [A-Z][a-z]*[0-9]*
<op> ::= "+" | "MOD"
<atom> ::= <val> | <var>
<expr> ::= <atom> | <atom> <op> <expr>

The following are examples of expressions:

X + 1
A + 1 MOD Bx + Cy0
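Since the grammar only ever chains atoms with operators, membership can be checked with a single regular expression. The sketch below is one way to do so in Java. The class name ExprGrammarCheck is hypothetical, and the grammar does not mention whitespace, so allowing optional whitespace around operators is an assumption made here.

```java
import java.util.regex.Pattern;

// Hypothetical checker for the expression grammar above.
final class ExprGrammarCheck {
    // <atom> ::= <val> | <var>, transcribed from the grammar.
    private static final String ATOM = "(?:[0-9]+|[A-Z][a-z]*[0-9]*)";
    // <op> ::= "+" | "MOD"
    private static final String OP = "(?:\\+|MOD)";
    // <expr> ::= <atom> | <atom> <op> <expr>, i.e. atoms separated by operators.
    // Assumption: optional whitespace may surround each operator.
    private static final Pattern EXPR =
        Pattern.compile(ATOM + "(?:\\s*" + OP + "\\s*" + ATOM + ")*");

    static boolean isExpr(String s) {
        return EXPR.matcher(s.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isExpr("A + 1 MOD Bx + Cy0")); // true
        System.out.println(isExpr("A +"));                // false: operator without right operand
    }
}
```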

We start this exercise by first identifying the regular expressions we are interested in. The flex file is given below:

import java.util.*;

%%

%class ExprLexer
%standalone

%{
      public Expr crtexpr = null;
      public String crtop = null;
%}

LineTerminator = \r|\n|\r\n
WS             = (" "|\t)+

op             = "+"|"MOD"
alfastream     = [a-zA-Z]+
digitstream    = [0-9]+
var            = [A-Z]{alfastream}?{digitstream}?
val            = {digitstream}

%%   

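{op}  { /* remember the operator; it is applied when the next atom is scanned */ crtop = yytext(); }
{var} { if (crtop == null) crtexpr = new Var(yytext()); else crtexpr = new Binary(crtop,crtexpr,new Var(yytext())); }
{val} { if (crtop == null) crtexpr = new Val(yytext()); else crtexpr = new Binary(crtop,crtexpr,new Val(yytext())); }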

{WS} {}

{LineTerminator} { /* do nothing*/ }

Note that we have opted to define the regular expression var in terms of other regular expressions. The regexp e? should be read as zero or one occurrence of e. Also note that the text MOD matches both op and var (a capital letter followed by an alfastream). When two rules match text of the same length, JFlex picks the rule that appears first in the rules section; that is why the rule for {op} must be listed before the rule for {var}.
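JFlex resolves such ambiguities by preferring the longest match and, among matches of equal length, the rule listed first in the rules section. The following sketch imitates that policy with java.util.regex in order to show the effect of rule order on MOD. The class RulePriorityDemo and its helper methods are hypothetical, and the var pattern is the macro from the flex file expanded by hand. For these particular patterns, Java's greedy match coincides with the longest match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical imitation of "longest match, then first rule wins".
final class RulePriorityDemo {
    // Builds an ordered rule table from name/regex pairs.
    static Map<String, Pattern> rules(String... nameRegexPairs) {
        Map<String, Pattern> table = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameRegexPairs.length; i += 2) {
            table.put(nameRegexPairs[i], Pattern.compile(nameRegexPairs[i + 1]));
        }
        return table;
    }

    static List<String> tokenize(Map<String, Pattern> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strictly longer matches replace the current best, so on a
                // tie the rule listed earlier keeps its place.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                pos += 1;  // no rule matches: skip the character (e.g. a space)
                continue;
            }
            tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String op = "\\+|MOD";
        String var = "[A-Z][a-zA-Z]*[0-9]*";
        System.out.println(tokenize(rules("op", op, "var", var), "A MOD Bx")); // [var:A, op:MOD, var:Bx]
        System.out.println(tokenize(rules("var", var, "op", op), "A MOD Bx")); // [var:A, var:MOD, var:Bx]
        System.out.println(tokenize(rules("op", op, "var", var), "MODx"));     // [var:MODx]
    }
}
```

The last call shows that rule order only breaks ties: MODx is scanned as a single variable, because the longer match wins regardless of order.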

Once the regular expressions are defined, we proceed to the program logic:

Whenever a variable is parsed, we add it as a new expression, if no operator has been previously scanned. Otherwise, we use the existing operator to create a new (sub) expression.
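The effect of these actions can be traced without running JFlex at all. The sketch below replays them by hand on a sequence of already-separated tokens. The class ActionTrace and its methods onVar, onOp and build are hypothetical names, and the nested Var and Binary classes are simplified stand-ins for the data structures used by the lexer.

```java
// Hypothetical replay of the lexer actions on a token sequence.
final class ActionTrace {
    interface Expr {}

    static final class Var implements Expr {
        private final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return "{" + name + "}"; }
    }

    static final class Binary implements Expr {
        private final String op;
        private final Expr l, r;
        Binary(String op, Expr l, Expr r) { this.op = op; this.l = l; this.r = r; }
        @Override public String toString() { return l + "<" + op + ">" + r; }
    }

    // Same state as the class-code block of the flex file.
    private Expr crtexpr = null;
    private String crtop = null;

    // Action of the variable rule.
    void onVar(String text) {
        Expr v = new Var(text);
        crtexpr = (crtop == null) ? v : new Binary(crtop, crtexpr, v);
    }

    // Action of the operator rule.
    void onOp(String text) { crtop = text; }

    // Feeds the tokens to the actions in order and returns the final expression.
    static Expr build(String... tokens) {
        ActionTrace t = new ActionTrace();
        for (String tok : tokens) {
            if (tok.equals("+") || tok.equals("MOD")) t.onOp(tok); else t.onVar(tok);
        }
        return t.crtexpr;
    }

    public static void main(String[] args) {
        System.out.println(build("A", "+", "Bx", "MOD", "Cy0")); // {A}<+>{Bx}<MOD>{Cy0}
    }
}
```

Each new atom becomes the right operand of a Binary whose left operand is everything scanned so far, so the resulting tree is nested to the left even though the grammar is written with right recursion.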

Note that the above program doesn't handle malformed inputs well. Can you identify such cases?

Finally, we add the data structures required to hold parsed expressions, as well as the main test class:

import java.io.*;
import java.util.*;
 
interface Expr {}
 
abstract class Atom implements Expr {
  private String name;
  public Atom (String name) {this.name = name;}
  @Override
  public String toString () {return "{"+this.name+"}";}
}
 
class Binary implements Expr {
  private Expr l,r;
  private String op;
  public Binary (String op, Expr l, Expr r) {this.op = op; this.l = l; this.r = r;}
  @Override
  public String toString () {return l.toString()+"<"+op+">"+r.toString();} 
}
 
 
class Var extends Atom {
  public Var (String s) {super(s);}
}
class Val extends Atom {
    public Val (String s) {super(s);}
}
 
public class Test {
 
  public static void main (String[] args) throws IOException {
    ExprLexer l = new ExprLexer(new FileReader(args[0]));
 
    l.yylex();
 
    System.out.println(l.crtexpr);
 
  }
}