The umbrella term "Formal Languages and Automata" refers to a collection of tools that are inherently abstractions, designed to make us write better and faster compilers. At their very beginning, compilers were heavy-weight pieces of software that had tens of thousands of lines of code and took up to three years to write (as was the case with the compiler for ALGOL - the ALGOrithmic Language). A considerable part of that weight was carried by **parsers**, tools that were responsible for reading the program at hand. Historically, compilation has always been done in stages, and most compilers tend to stick to the following stages:
- **lexical stage**: In this stage, the input is split into **lexemes**, chunks or words with a particular interpretation. For instance, in ''int x = 0'', ''int'' might be a keyword, ''x'' might be an identifier, ''='' might be an operator, and so forth (a toy lexer along these lines is sketched after this list). Whitespace may be skipped, or it may be an intrinsic part of the language syntax, as is the case in Haskell and Python, where indentation governs program structure.
- **syntactic stage**: In this stage, most parsers will build an Abstract Syntax Tree (AST) which describes the relations between tokens. For instance, the program fragment ''int x = 0'' may be interpreted as a //definition// which consists of the assignment of the expression ''0'' to the variable ''x''. This stage is also responsible for making sure that the program is syntactically correct.
- **semantic checks**: Most of these checks are related to typing, which may be more relaxed, as in dynamic languages such as Racket or Python, or more rigid, as in most OO languages or Haskell.
- **optimisation** and **code-generation**: During these stages, machine code will be generated, as well as reorganised or rewritten in order to increase efficiency.
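
As a rough illustration of the first two stages, here is a minimal sketch in Python (the token categories and the AST shape are invented for this example, not taken from any particular compiler) that lexes ''int x = 0'' and builds a small AST for the resulting definition:

<code python>
import re

# Hypothetical token categories, chosen only for this example.
TOKEN_SPEC = [
    ("KEYWORD", r"\bint\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("ASSIGN",  r"="),
    ("SKIP",    r"\s+"),   # whitespace is skipped here; in Haskell/Python it is meaningful
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def lex(source):
    """Lexical stage: split the input into (category, lexeme) pairs."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

def parse_definition(tokens):
    """Syntactic stage (sketch): recognise 'KEYWORD IDENT = NUMBER' and
    build a small AST for the definition."""
    kw, name, eq, value = tokens
    assert (kw[0], name[0], eq[0], value[0]) == \
        ("KEYWORD", "IDENT", "ASSIGN", "NUMBER"), "syntactically incorrect"
    return ("Definition", kw[1], ("Assign", name[1], ("Num", int(value[1]))))

tokens = list(lex("int x = 0"))
print(tokens)
# [('KEYWORD', 'int'), ('IDENT', 'x'), ('ASSIGN', '='), ('NUMBER', '0')]
print(parse_definition(tokens))
# ('Definition', 'int', ('Assign', 'x', ('Num', 0)))
</code>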
Finally, note that some languages (and many modern ones) do not fit perfectly into the previous description. Java is such an example. On the one hand, it is compiled, because bytecode is generated during the process; the bytecode is then further translated to machine code by the JVM. But JIT (Just-In-Time) compilation makes the setting more complex and more similar to interpretation.
Historically, writing parsers was challenging and took a lot of time. Nowadays, writing parsers from scratch is rarely done in practice. This process has been replaced by powerful abstractions, which allow us to specify what type of lexemes we should search for in the lexical phase, and what kind of program structure we should look for during the syntactic phase. The former are the well-known **regular expressions**, while the latter are, more often than not, **context-free grammars** (the sketch below hints at why the extra expressive power is needed).
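
To get a feel for why the syntactic phase needs a more expressive formalism than the lexical one, consider nesting: no regular expression can describe arbitrarily deep balanced parentheses, but a context-free grammar such as ''S -> ( S ) S | epsilon'' can. Below is a toy Python sketch (the grammar and the function names are illustrative, not part of any standard API); the recursive-descent recognizer mirrors the grammar's two productions:

<code python>
import re

# A regular expression suffices for a single lexeme, e.g. an identifier:
IDENT = re.compile(r"[A-Za-z_]\w*")
print(bool(IDENT.fullmatch("x0")))   # True

# Arbitrary nesting is beyond regular expressions. The context-free grammar
#   S -> ( S ) S | epsilon
# generates exactly the balanced strings of parentheses.
def parse_S(s, i=0):
    """Consume one S starting at index i; return the index reached."""
    if i < len(s) and s[i] == "(":            # production S -> ( S ) S
        j = parse_S(s, i + 1)                 # inner S
        if j < len(s) and s[j] == ")":
            return parse_S(s, j + 1)          # trailing S
        raise SyntaxError("expected ')'")
    return i                                  # production S -> epsilon

def balanced(s):
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False

print(balanced("(()())"))   # True
print(balanced("(()"))      # False
</code>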
These abstractions are central to our lecture.