Preliminaries

Formal Languages is a branch of theoretical Computer Science with applications in numerous areas, among them:

  1. compilers and compiler design
  2. complexity theory
  3. program verification

In the first part of the lecture, we shall focus on compiler design as a case study for introducing Formal Languages. Later on, we shall explore the usage of Formal Languages for complexity theory and program verification.

Compilers and interpreters

A compiler is a program that takes as input a character stream (usually a text file) containing a program, and outputs code in a target language, typically assembly or machine code appropriate for the computer architecture at hand.

The compiler's input program is written in a high-level programming language (e.g. C/C++, Java, Scala, Haskell). The compiler itself may also be written in C/C++, Java, Scala or Haskell, usually relying on lexer and parser generators - tools designed specifically for compiler development:

  • for C/C++ - Flex (a lexer generator) and Bison (a parser generator).
  • for Java - ANTLR.
  • for Haskell - Happy.

In contrast, an interpreter takes a program as input, possibly together with an input for that program, and runs the program on that input. The interpreter's output is the program's output.
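The difference can be sketched on a toy language of postfix arithmetic (e.g. "2 3 + 4 *"). Everything in this sketch is an illustrative assumption: the toy language itself, the function names interpret and compile_to_stack_code, and the instructions PUSH, ADD and MUL, which target an imaginary stack machine rather than a real assembly language.

```python
def interpret(source):
    """Interpreter: run the postfix program directly and return its result."""
    stack = []
    for word in source.split():
        if word.isdigit():
            stack.append(int(word))
        elif word in ("+", "*"):
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if word == "+" else left * right)
        else:
            raise ValueError(f"unknown symbol: {word!r}")
    if len(stack) != 1:
        raise ValueError("program does not reduce to a single value")
    return stack[0]


def compile_to_stack_code(source):
    """Compiler: translate the program, without running it, into
    instructions for a hypothetical stack machine."""
    code = []
    for word in source.split():
        if word.isdigit():
            code.append(f"PUSH {word}")
        elif word == "+":
            code.append("ADD")
        elif word == "*":
            code.append("MUL")
        else:
            raise ValueError(f"unknown symbol: {word!r}")
    return code
```

Both functions read the same input, but interpret produces the program's output, whereas compile_to_stack_code produces another program that still has to be executed.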

Parsing

Although functionally different, compilers and interpreters share the parsing phase. During parsing, a stream of symbols is converted to a set of data structures (the Abstract Syntax Tree is one of them). The roles of parsing are:

  • to make sure the program is syntactically correct.
  • to produce the data structures used for interpreting the input or translating it to machine code.
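Both roles can be illustrated with a minimal hand-written parser for arithmetic expressions built from numbers, "+", "*" and parentheses. This is only a sketch under stated assumptions: the grammar, the tuple-based tree representation and the names Parser, parse and evaluate are hypothetical choices made for illustration, not the output of any parser generator.

```python
import re


class Parser:
    """Recursive-descent parser for: expr -> term ('+' term)*,
    term -> factor ('*' factor)*, factor -> NUMBER | '(' expr ')'."""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[+*()]", text)
        # Reject input containing characters that are not part of any token.
        if "".join(self.tokens) != "".join(text.split()):
            raise SyntaxError("unexpected character in input")
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is not None and token.isdigit():
            return ("num", int(token))
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token: {token!r}")


def parse(text):
    """Role 1: check syntax. Returns the syntax tree or raises SyntaxError."""
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return tree


def evaluate(node):
    """Role 2: the tree is what a later phase (here, an interpreter) works on."""
    if node[0] == "num":
        return node[1]
    left, right = evaluate(node[1]), evaluate(node[2])
    return left + right if node[0] == "+" else left * right
```

For example, parse("2 + 3 * 4") builds a tree in which the multiplication sits below the addition, so evaluate yields 14, while parse("(2 + 3") is rejected with a SyntaxError.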

Formal Languages are especially helpful for the parsing phase.

Formal Languages provide an indispensable tool for automating most steps in parser development, so that a parser need not be written from scratch but can instead be produced by a parser generator.

In the first part of the lecture, we shall study:

  • regular expressions and automata: a means for defining (regular expressions) and recognizing (automata) tokens.
  • grammars and push-down automata: a means for defining (grammars) and recognizing (push-down automata) syntactic relations.
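As a first taste of the token-level part, the sketch below defines tokens with regular expressions and splits an input string into them. The token names (NUMBER, IDENT, OP, SKIP) and the function tokenize are assumptions made for illustration. Python's re module stands in for the automaton that a lexer generator would build; it is not itself implemented as one.

```python
import re

# Each token class is defined by a regular expression (hypothetical token set).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not a token itself
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_name, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("x = 42 + y1") yields the identifier x, the operator =, the number 42, the operator + and the identifier y1, in that order.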

In the second part of the lecture, we shall examine Computer Science areas other than parsing in which the above concepts are used.

  • One such area is Computability and Complexity Theory: we use the concept of language to model problems, and classify problem hardness by examining which means of computation (e.g. automata versus push-down automata) can accept the language of a problem. This part of the lecture can be seen as an extension of the Algorithms and Complexity Theory lecture.
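The contrast between the two computation means can be sketched on two small languages over the alphabet {a, b}. The first, words with an even number of a's, is accepted by a finite automaton with two states. The second, words of the form a^n b^n, cannot be accepted by any finite automaton, but a single stack suffices. The function names dfa_accepts and pda_accepts and the choice of languages are assumptions made for illustration; pda_accepts imitates a push-down automaton with an ordinary list rather than implementing the formal definition.

```python
def dfa_accepts(word):
    """Finite automaton for: words over {a, b} with an even number of a's."""
    transitions = {
        ("even", "a"): "odd",
        ("even", "b"): "even",
        ("odd", "a"): "even",
        ("odd", "b"): "odd",
    }
    state = "even"  # initial state, also the only accepting state
    for symbol in word:
        if (state, symbol) not in transitions:
            return False  # symbol outside the alphabet
        state = transitions[(state, symbol)]
    return state == "even"


def pda_accepts(word):
    """Stack-based recognizer for { a^n b^n | n >= 0 }."""
    stack = []
    seen_b = False  # finite control: are we already in the block of b's?
    for symbol in word:
        if symbol == "a" and not seen_b:
            stack.append("A")  # remember one unmatched 'a'
        elif symbol == "b" and stack:
            seen_b = True
            stack.pop()  # match this 'b' against a remembered 'a'
        else:
            return False
    return not stack  # accept only if every 'a' was matched
```

The finite automaton needs only a fixed amount of memory (its current state), whereas the second recognizer needs a stack whose size grows with the input. This difference in required resources is what the classification of languages, and hence of problems, is based on.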