The beginning

In the early 1940s, some of the first programmable machines started to appear in research labs such as the Harvard Computation Lab or Bell Labs. These machines were glacially slow by today's standards, and they were also very difficult to program and to operate. Programming required intimate knowledge of the machine, and often meant resorting to hacks in order to make computations run faster; speed was always critical. During this time, when the US was at war, most of these machines were used to compute missile trajectories, damage coverage from bombs, and other war-related computations. Machines such as the Mark I were often used 24 hours per day, 7 days a week. The human effort of keeping such machines functional was split between programmers, mostly responsible for developing programs, and operators, who would ensure the machine remained functional throughout the computation. They would repair or replace relays and other components. On some machines, such as the ENIAC, operators were required to step in during a computation for it to continue (e.g. flip switches or make certain electrical connections).

Speed

In around 80 years, computers went from 3 (numerical) operations per second on the Mark I to roughly 1 billion instructions per second on chips such as Apple's A15 Bionic. This means that one second of computation on your average mobile phone would have taken around 10.5 years to complete on the Mark I. The success of today's machines can be largely attributed to major advancements in hardware, with the emergence of silicon technology. The success of today's software, however, is a testament to our ability to build abstractions that make programming easier.
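A quick sanity check of the 10.5-year figure, using only the two speeds quoted above:

$$\frac{10^9 \text{ instructions}}{3 \text{ instructions/second}} \approx 3.3 \times 10^{8} \text{ seconds} \approx 10.5 \text{ years}$$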

Complexity

One of the first abstractions used in programming grew out of the observation that many programs reused groups of instructions (e.g. those performing a more complex task, such as numerical differentiation). Programmers therefore started to keep notebooks with such groups, so that they could reuse them in different programs. At first, reuse simply meant copying those instructions from the notebooks. Soon, notebooks started being shared between programmers, and at that point the subroutine was effectively invented. One of the earliest forms of compilers, linkers, automated the copying process: programs would be loaded and linked to the routines they required in order to run. The subroutine call is thus an abstraction for a piece of code that should be executed at that point. Linkers made the programming process far easier and allowed programmers to focus on the program logic, removing the task of copying, or rewriting from scratch, large pieces of code. The next big step was the observation that programmers think and express themselves better in natural language: many of the punch cards used in early computers bear hand-written comments meant to make programs easier to follow. It was Grace Hopper's intuition that programmers should program in something closer to natural language instead of machine code, and this led to the development of COBOL (COmmon Business-Oriented Language), a syntax-friendly language still in use today, more than 60 years after its creation, and of FORTRAN (FORmula TRANslator).

Abstractions

Programming languages in the modern sense of the word are abstractions that we use to hide, or abstract away, the often complex and messy details of machine hardware. They allow us to write ever more sophisticated programs and applications. These abstractions are powered by compilers or interpreters: the former translate our code into machine language, while the latter directly execute it. In an attempt to generalise, we shall say that an abstraction is a tool that allows us to take a high-level description of something and translate it into something operational: a sequence of steps that can be executed to achieve the desired goal. In this sense, the high-level description is given in an appropriate syntax, while the translation process assigns it a semantics - an intended, executable meaning.
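To make the syntax/semantics distinction concrete, here is a minimal sketch in Haskell: a data type plays the role of the syntax (a high-level description), and an evaluator assigns it a semantics (an executable meaning). The names Expr and eval are illustrative, not taken from any particular language.

```haskell
-- Syntax: a high-level description of arithmetic expressions.
data Expr = Lit Int
          | Add Expr Expr
          | Mul Expr Expr

-- Semantics: an executable meaning assigned to each description.
eval :: Expr -> Int
eval (Lit n)     = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2

main :: IO ()
main = print (eval (Add (Lit 1) (Mul (Lit 2) (Lit 3))))  -- prints 7
```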

The information technology of today is largely powered by abstractions. We note only two examples. First, relational database systems are powered by SQL, which allows us to express and sequence various operations to be performed on data, using almost-human sentences such as SELECT * FROM user_table WHERE id = 0.
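Operationally, such a query denotes something like "keep the rows of user_table whose id equals 0". A minimal sketch of that meaning, with an illustrative Row type and table (not a real schema):

```haskell
-- One row of the hypothetical user_table.
data Row = Row { rowId :: Int, userName :: String } deriving Show

userTable :: [Row]
userTable = [Row 0 "ada", Row 1 "grace"]

-- The operational reading of: SELECT * FROM user_table WHERE id = k
selectWhereId :: Int -> [Row] -> [Row]
selectWhereId k = filter ((== k) . rowId)

main :: IO ()
main = print (selectWhereId 0 userTable)
-- [Row {rowId = 0, userName = "ada"}]
```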

Second, many modern datacenter topologies use software-defined networking (SDN). In effect, this means we can use software abstractions to govern how data is switched or routed in a topology, without changing wiring or individually updating configuration files on distinct machines in order to achieve some desired behaviour.

The umbrella term “Formal Languages and Automata” refers to a collection of tools that are, at heart, abstractions designed to help us write better and faster compilers. In their early days, compilers were heavyweight pieces of software, spanning tens of thousands of lines of code and taking up to three years to write (as was the case with the compiler for ALGOL - the ALGOrithmic Language). A considerable part of that weight was carried by parsers, the tools responsible for reading the program at hand. Historically, compilation has always been done in stages, and most compilers stick to the following ones (a small sketch of the first two follows the list):

- lexical stage: the input is split into lexemes, chunks or words with a particular interpretation. For instance, in int x = 0, int might be a keyword, x an identifier, = an operator, and so forth. Whitespace may be skipped, or it may be an intrinsic part of the language syntax, as in Haskell (where layout governs program structure) or Python (where indentation delimits blocks).
- syntactic stage: most parsers build an Abstract Syntax Tree (AST) which describes the relations between tokens. For instance, the fragment int x = 0 may be interpreted as a definition which binds the variable x to the expression 0. This stage is also responsible for making sure the program is syntactically correct.
- semantic checks: most of these checks are related to typing, which may be more relaxed, as in dynamic languages such as Racket or Python, or more rigid, as in most OO languages or Haskell.
- optimisation and code generation: during these stages, machine code is generated, then reorganised or rewritten in order to increase efficiency.
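As promised above, here is a minimal sketch of the first two stages for the fragment int x = 0, in Haskell. All names (Token, lexer, Decl, parse) are illustrative; real lexers and parsers are considerably more general.

```haskell
-- Tokens produced by the lexical stage.
data Token = TKeyword String | TIdent String | TOp Char | TInt Int
  deriving Show

-- Lexical stage: split the (whitespace-separated) input into tokens.
lexer :: String -> [Token]
lexer = map classify . words
  where
    classify w
      | w == "int"                = TKeyword w
      | w == "="                  = TOp '='
      | all (`elem` ['0'..'9']) w = TInt (read w)
      | otherwise                 = TIdent w

-- Syntactic stage: build a (tiny) AST for declarations like "int x = 0".
data Decl = Decl String String Int   -- type keyword, variable, initial value
  deriving Show

parse :: [Token] -> Maybe Decl
parse [TKeyword t, TIdent x, TOp '=', TInt n] = Just (Decl t x n)
parse _                                       = Nothing

main :: IO ()
main = print (parse (lexer "int x = 0"))
-- Just (Decl "int" "x" 0)
```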

The first two stages, lexical and syntactic, are usually the responsibility of the parser, which is typically decoupled from the rest of the compiler.