====== Background ======

Most of the concepts presented in the following lectures were developed during the thirties, ironically, almost 30 years **before** the development of the computer, and **by mathematicians and logicians**. So why was the subject of computation interesting, in the absence of a computer?

Mathematicians had long been struggling with proofs of apparently simple statements such as **Goldbach's Conjecture** or **Fermat's Last Theorem**. It was commonly believed that, at some point, a bright mathematician would eventually prove or disprove such statements. The issue of **provability** itself came under the scope of interest, for people such as Hilbert or Gödel. This led mathematicians to examine //mathematical systems//, i.e. objects which specify:
  * a set of **axioms**, i.e. statements which are taken to be true without prior proof;
  * a set of **//instruments of proof//** (e.g. modus ponens), which can be used to //derive// new true statements (starting from the axioms).

One example of a mathematical system is **Euclidean Geometry**, consisting of five axioms and some methods of proof. Euclidean Geometry is powerful in the sense that many mathematical statements (e.g. the Pythagorean theorem) can be proven starting from the axioms and applying the methods of proof. However, there are areas where Euclidean Geometry does not work: for instance, in Albert Einstein's relativity theory, space is not //Euclidean// (consisting of three coordinates only). For such areas, a new mathematical system is necessary.

Mathematicians from the thirties were also aware of paradoxes, e.g. Russell's paradox (is the set of all sets which do not contain themselves a valid construction?)
and thus became preoccupied with finding a mathematical system where:
  * paradoxes are not possible (all statements which can be proved in the system are actually true, hence the system is **sound**);
  * all true statements can be found by a finite number of applications of the instruments of proof (hence the system is **complete**).

Gödel is famous for showing that this quest is impossible. He proved that:
  * any mathematical system powerful enough to capture arithmetic is **incomplete**: some true statements cannot be proven;
  * no consistent mathematical system powerful enough to capture arithmetic can prove its own **consistency**. A set of axioms is **consistent** if there is no statement which can be proved **and** disproved at the same time, from the axioms.

Turing took Gödel's result one step further. Imagine the mathematician working in a mathematical system as a //robot// or //computer// which mechanically performs proof steps starting from the axioms. You feed a statement to the machine; it performs a sequence of proof steps, and then outputs a yes/no answer, i.e. whether or not the statement is true. Turing devised such a machine and showed that //mechanical// proof is actually computation. Consistent with Gödel's results, he also showed that, for certain statements, **no machine** would be able to provide an answer: it would instead enter an endless loop. Thus, the concept of **undecidability** was born. Undecidability is essentially the recognition that certain problems **cannot be effectively solved**, no matter how many resources (in terms of time and space) are allocated.
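The diagonal argument behind undecidability can be sketched in modern programming notation. The sketch below is illustrative only, and not part of the original development: the decider ''halts'' is hypothetical, and the point is precisely that no implementation of it can be correct on the machine built from it.

```python
# Sketch of Turing's diagonal argument, written in Python for concreteness.
# `halts(f, x)` is HYPOTHETICAL: a total, always-correct halting decider
# cannot exist; this construction shows why.

def make_paradox(halts):
    """Given a claimed decider halts(f, x), build its counterexample."""
    def paradox(x):
        if halts(x, x):
            while True:      # the decider says we halt -> loop forever
                pass
        return 0             # the decider says we loop -> halt immediately
    return paradox

# Any concrete `halts` is wrong on the machine built from it. For example,
# a decider that always answers "loops" is refuted by direct execution:
def always_says_loops(f, x):
    return False

p = make_paradox(always_says_loops)
print(p(p))  # halts and prints 0, contradicting the decider's claim
```

Whichever answer the hypothetical decider gives on ''paradox'' applied to itself, the construction does the opposite, so no such decider exists.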
====== Preliminaries ======

In the last lecture, we examined some problems which:
  * cannot be solved entirely (the best algorithm only terminates if the answer to the problem is //yes//);
  * cannot be solved in sub-exponential time.

Both are **impossibility results**: they make a **universal** claim of the form: $math[\forall] algorithms $math[A]: $math[A] cannot solve problem $math[P] better than $math[\ldots]

A difficulty in proving such a result is that:
  * **algorithms are usually specified in an ad-hoc mathematical notation**, or directly in code. In the latter case, the code can belong to conceptually different programming languages such as C or Java (imperative languages) or Lisp, Scheme (functional languages);
  * **problems are heterogeneous**: they involve very different mathematical objects: **graphs**, **numbers**, **words**, etc.

A common denominator is necessary for both **problems** and **algorithms**.

====== Part I - What is a Problem ? ======

===== A general definition =====

$def[Problem instance]
//A// problem instance //is a mathematical object of which we ask a question and expect an answer.//
$end

$def[Problem]
//An (abstract)// problem //is a mapping $math[P:I \rightarrow O] where $math[I] is a set of problem instances of which we ask the same question, and $math[O] is a set of answers. $math[P] assigns to each problem instance $math[i \in I] the answer $math[P(i)\in O].//
$end

$example[problem]
For ''min-Vertex Cover'', $math[I] is the set of undirected graphs and $math[O] is the set of natural numbers.
$end

It is often the case that the answers we seek are also mathematical objects. For instance, the ''min-Vertex Cover'' problem must be answered by a number: the size of a minimum cover. However, many other problems prompt //yes/no// answers. Whenever $math[O = \{0, 1\}] we say that $math[P] is a //decision problem//.
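The abstract mapping $math[P:I \rightarrow O] can be made concrete. Below is a naive brute-force sketch of ''min-Vertex Cover'' as a function from instances (undirected graphs) to answers (natural numbers); the representation of graphs as Python sets and edge lists is our own choice, not part of the definition.

```python
from itertools import combinations

def min_vertex_cover(vertices, edges):
    """P(i): map an undirected graph to the size of its minimum vertex cover.

    Brute force: try candidate covers of increasing size k, and return the
    first k for which some k-subset of vertices touches every edge."""
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            s = set(cover)
            if all(u in s or v in s for (u, v) in edges):
                return k

# A triangle needs 2 vertices to cover all 3 edges:
print(min_vertex_cover({1, 2, 3}, [(1, 2), (2, 3), (1, 3)]))  # 2
```

Note that the function is total on finite graphs but takes exponential time in the worst case, which foreshadows the hardness results discussed above.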
  * ''PCP'' and ''Unique Decoding'' are decision problems
  * ''min-Vertex Cover'' is not a decision problem

''min-Vertex Cover'' is very similar to the following decision problem:

$def[k-Vertex Cover]
Let $math[G=(V,E)] be an undirected graph and $math[k] be a natural number. Does there exist a vertex cover of $math[G] of size $math[k]?
$end

The hardness of ''min-Vertex Cover'' and ''k-Vertex Cover'' is related: solving the former requires at most $math[n] attempts at solving ''k-Vertex Cover'', with $math[k] ranging from $math[1] to $math[n]. In general, any problem which is not a decision problem has a closely related decision problem of comparable difficulty. In what follows, **we shall discuss decision problems only**.

===== 'Leveling out' problem instances =====

The last definitions **do not solve the heterogeneity issue**: depending on the problem at hand, $math[I] can be any set of mathematical objects. Let us make a few observations:
  - each $math[i \in I] must be, in some sense, finite. For instance, the graphs ((Graphs can also be infinite, however we do not consider them here)) are finite objects;
  - $math[I] must be //countable// (but not necessarily finite). For instance, the problem $math[P : \mathbb{R} \times \mathbb{R} \rightarrow \{0, 1\}], where $math[P(x, y)] returns $math[1] iff $math[x] and $math[y] are equal, makes no sense from the point of view of computability theory. Note that the set of real numbers $math[\mathbb{R}] is **not countable**. Assume we would like to answer $math[P(\pi, \sqrt{2})]. Simply storing $math[\pi] and $math[\sqrt{2}] takes infinite space, which is impossible on machines, and also contradicts point 1. Also, there are pairs of irrational numbers whose equality is an open question.

Therefore:
  * since any possible set of problem instances $math[I] is countable, we can **assign to each** $math[i\in I] an **encoding** - **no matter how the objects in $math[I] are constructed**.

$def[Encoding problem instances]
//Let// $math[\Sigma] //be a finite set which we call an// alphabet.
//A// one-letter //word is a member of// $math[\Sigma]. //A// two-letter //word is any member of// $math[\Sigma \times \Sigma = \Sigma^2]. //For instance, if// $math[\Sigma = \{a, b, \ldots\}]//, then// $math[(a, a) \in \Sigma^2] //is a two-letter word. An// i-letter //word is a member of// $math[\Sigma^i]. We denote by:

$math[\Sigma^* = \{\epsilon\} \cup \Sigma \cup \Sigma^2 \cup \ldots \cup \Sigma^i \cup \ldots]

//the set of finite words which can be formed over// $math[\Sigma]. $math[\epsilon] //is a special word which we call the// empty word. //Instead of writing, e.g.// $math[(a, b, b, a, a)] //for a 5-letter word, we simply write// abbaa. //Concatenation of two words is defined as usual.//
$end

$remark[Words]
  * each problem instance $math[i] can be **uniquely** represented as a finite word $math[enc(i) \in \Sigma^*], for some alphabet $math[\Sigma];
  * therefore, if $math[I] is infinite, then $math[I \simeq \Sigma^*] ($math[I] is isomorphic with $math[\Sigma^*]).
$end

Having made the above remarks, we can move on to a refinement of the problem definition:

$def[Problem]
A problem is a function $math[f:\Sigma^*\rightarrow\Sigma^*]. A **decision** problem is a function $math[f:\Sigma^*\rightarrow\{0,1\}].
$end

  * The above definition is closer in spirit to **programming**: for programmers, it is natural to encode arrays, graphs and other, more complicated, structures using a universal alphabet (ASCII).
  * We will also see that $math[\Sigma] need not be ASCII, and that the choice of alphabet is of little importance.

The above definition achieves the desired //levelling-out//. However, we can also re-state the definition in another manner, which will be more convenient later on. We start by observing the following:

$justprop
//For any finite alphabet// $math[\Sigma]//,// $math[\Sigma^*] //is countably infinite.//
$end

$proof
We show $math[\Sigma^* \simeq \mathbb{N}]. We build a bijective function $math[h] which assigns to each word a unique natural number.
We assign $math[0] to $math[\epsilon]. Assume $math[\mid \Sigma \mid = n]. We assign to each one-letter word one of the numbers from $math[1] to $math[n]. Next, we assign to each $math[k]-letter word ($math[k \geq 2]) of the form $math[w = w^\prime x] the number $math[n \cdot h(w^\prime) + h(x)]. If $math[n = 2], we recognise the //bijective base-2// numbering: words are enumerated in increasing order of length, and lexicographically within each length.
$end

Hence, we have the following diagram:

$math[i \in I \leftrightarrow enc(i) \in \Sigma^* \leftrightarrow h(enc(i)) \in \mathbb{N}]

The same observation can be derived directly from the fact that $math[I] is **countable**. For instance, if $math[I] is the set of undirected graphs, we can construct an enumeration of all graphs: each graph is assigned a natural number - its //label//.
  * Although it seems counter-intuitive to talk about, e.g., $math[25] as the 25th graph in the enumeration of all graphs, for the purposes of Complexity Theory it is quite helpful.

$def[Problem]
//A problem is a function// $math[f : \mathbb{N} \rightarrow \mathbb{N}]//. If some $math[n] encodes a problem input, then $math[f(n)] encodes its answer.//

//A decision problem is a function// $math[f : \mathbb{N} \rightarrow \{0, 1\}].
$end

To conclude:
  * when trying to solve concrete problems, the encoding issue is fundamental;
  * from the perspective of **Complexity Theory**, **how** the encoding is done is inessential ((There are exponentially-inefficient ways of encoding objects, which we shall discuss in a later lecture)), and a problem instance can be viewed, without //"loss of information"//, as a natural number.

====== Part II - A model of computation ======

Algorithms are usually described as pseudo-code, intended as an abstraction over concrete programming language operations. The level of abstraction is usually not specified rigorously, but decided in an ad-hoc manner by the algorithm designer/programmer.
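Incidentally, concrete code can make such constructions precise where pseudo-code would be ambiguous. As an aside, the word-to-number bijection $math[h] built in the proof of Part I can be implemented directly; the sketch below fixes an arbitrary alphabet and letter order, which the proof leaves open.

```python
def h(word, alphabet):
    """Bijectively map a word over `alphabet` to a natural number.

    h(epsilon) = 0; one-letter words get 1..n; h(w'x) = n*h(w') + h(x),
    exactly as in the proof that Sigma* is countably infinite."""
    n = len(alphabet)
    code = {c: i + 1 for i, c in enumerate(alphabet)}  # letters -> 1..n
    value = 0
    for c in word:
        value = n * value + code[c]
    return value

def h_inverse(value, alphabet):
    """Recover the unique word mapped to `value` (h is a bijection)."""
    n = len(alphabet)
    word = []
    while value > 0:
        value, r = divmod(value - 1, n)  # digits range over 1..n, hence -1
        word.append(alphabet[r])
    return "".join(reversed(word))

print(h("0111", "01"))                    # 22
print(h_inverse(h("abbaa", "ab"), "ab"))  # abbaa
```

The round trip through ''h_inverse'' confirms that no two words share a number, which is the content of the proposition.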
Pseudo-code is often **dependent on some future implementation**, and only abstracts syntactic elements (of a fixed programming language), possibly including data initialisation and subsequent handling. Pseudo-code can be easily implemented in different languages **only to the extent** to which the languages share the same programming principles.

As before, we require a means of **leveling out** different programming styles and programming languages, in order to come up with a uniform, straightforward and simple definition of an algorithm. The key observation is that programming languages offer instructions with **partially overlapping functionality** (e.g. ''if'' and ''switch'', ''for'' and ''while'') which make a programmer's life easier, but which can be removed without compromising the language's **expressive power**: the same algorithms (or algorithmic ideas) can still be implemented, perhaps by writing more code.

One direction for making a programming language as concise as possible leads to an **assembly language** of some form. The formal definition of an algorithm which we introduce can be seen as an abstract assembly language, where all technical aspects (e.g. the machine/processor architecture) are put aside. We call such a //programming language// the **Turing Machine**.

===== The Turing Machine =====

$def[Deterministic Turing Machine]
A **Deterministic Turing Machine** (abbreviated DTM, or simply TM) is a tuple $math[M = (K, F, \Sigma, \delta, s_0)] where:
  * $math[\Sigma = \{a, b, c, \ldots\}] is a finite set of symbols which we call the **alphabet**;
  * $math[K] is a set of **states**, and $math[F \subseteq K] is a set of **accepting/final states**;
  * $math[\delta:K\times\Sigma\rightarrow K\times\Sigma\times\{L,H,R\}] is a (possibly partial) transition function which assigns to each state $math[s\in K] and symbol $math[c\in\Sigma] the triple $math[\delta(s,c)=(s^\prime,c^\prime,pos)];
  * $math[s_0\in K] is the **initial state**.
The Turing Machine has a tape which contains infinitely many cells in both directions; each tape cell holds a symbol from $math[\Sigma]. The Turing Machine has a **tape head**, which is able to read the symbol in the current cell. Also, the Turing Machine is always in a given state. Initially (before the machine has started), the state is $math[s_0].

From a given state $math[s], the Turing Machine reads the symbol $math[c] from the current cell, and performs a transition. The transition is given by $math[\delta(s, c) = (s^\prime, c^\prime, pos)]. Performing the transition means that the ''TM'' moves from state $math[s] to $math[s^\prime], overwrites the symbol $math[c] with $math[c^\prime] on the current tape cell and:
  * if $math[pos = L], moves the tape head to the next cell to the left;
  * if $math[pos = R], moves the tape head to the next cell to the right;
  * if $math[pos = H], leaves the tape head on the current cell.

The Turing Machine performs transitions according to $math[\delta]. Whenever the ''TM'' reaches an accepting/final state, we say it **halts**. If the ''TM'' reaches a non-accepting state from which no transition is possible, we say it //gets stuck/hangs//.
  * the **input** of a Turing Machine is a finite word which is initially written on its otherwise-empty tape;
  * the **output** of a ''TM'' is the contents of the tape (not including empty cells) after the Machine has halted.

We also write $math[M(w)] to refer to the output of $math[M], given input $math[w].
$end

$example[Turing Machine]
//Consider the alphabet $math[\Sigma = \{\#, >, 0, 1\}], the set of states $math[K = \{s_0, s_1, s_2\}], the set of final states $math[F = \{s_2\}] and the transition function://

$math[\delta(s_0, 0) = (s_0, 0, R) \quad \quad \delta(s_0, 1) = (s_0, 1, R)]

$math[\delta(s_0, \#) = (s_1, \#, L) \quad \quad \delta(s_1, 1) = (s_1, 0, L)]

$math[\delta(s_1, 0) = (s_2, 1, H) \quad \quad \delta(s_1, >) = (s_2, 1, H)]

The Turing Machine $math[M = (K, F, \Sigma, \delta, s_0)] reads a number encoded in binary on the tape, and increments it by $math[1]. The symbol $math[\#] encodes the empty tape cell //(we shall use $math[\#] to refer to the empty cell throughout the text)//. Initially, the tape head is positioned at the most significant bit of the number. The Machine first goes over all bits, from left to right. When the first empty cell is detected, the machine goes into state $math[s_1] and starts flipping $math[1]s to $math[0]s, until the first $math[0] (or the initial position, marked by $math[>]) is detected. Finally, the machine places $math[1] on the current cell, and enters its final state.

{{aafigura_2.3.1.jpg?nolink&400 |}}

The behaviour of the transition function can be represented more intuitively as in the figure above. Each node represents a state, and each edge a transition. The label on each edge is of the form $math[c/c^\prime,pos], where $math[c] is the symbol read from the current tape cell, $math[c^\prime] is the symbol written on the current tape cell and $math[pos] is a tape head position. The label should be read as: the machine replaces $math[c] with $math[c^\prime] on the current tape cell and moves in the direction indicated by $math[pos].

Let us consider that, initially, the tape contains $math[>0111], the representation of the number $math[7]. The evolution of the tape is shown below. Each line shows the ''TM'' configuration at step $math[i], that is, the tape and current state after transition $math[i].
For convenience, we have chosen to show only two empty cells in each direction. Also, the underline indicates the position of the tape head.

^ Transition no ^ Tape ^ Current state ^
| 0 | ##>__0__111## | $math[s_0] |
| 1 | ##>0__1__11## | $math[s_0] |
| 2 | ##>01__1__1## | $math[s_0] |
| 3 | ##>011__1__## | $math[s_0] |
| 4 | ##>0111__#__# | $math[s_0] |
| 5 | ##>011__1__## | $math[s_1] |
| 6 | ##>01__1__0## | $math[s_1] |
| 7 | ##>0__1__00## | $math[s_1] |
| 8 | ##>__0__000## | $math[s_1] |
| 9 | ##>__1__000## | $math[s_2] |
$end

$justexercise
Are there any (conceptual) differences between a Turing Machine and an assembly language?
$end

{{## The Turing Machine is resource unbound ##}}

===== The Turing Machine: Program or Programming language ? =====

  * A Turing Machine $math[M] specifies **instructions** via the transition function $math[\delta]. Thus, $math[M] is quite similar to a **program**.
  * At the same time, the ''TM'' definition illustrates an **interpretation procedure** which tells us how to perform the computation specified by a Turing Machine. But what //machine// performs this interpretation?

Turing's seminal result shows that the interpretation of **any** Turing Machine can be performed by a very special Turing Machine which he calls **the Universal Turing Machine** (abbreviated henceforth as UTM):
  * the **input** of the UTM is a Turing Machine (any TM); hence, to preserve our formalism, we need a mechanism for encoding Turing Machines as **words**;
  * the **output** of the UTM is the **word computed by the TM given as input**.

In the following proposition, we show how Turing Machines can be encoded as words:

$proposition[TMs as words]
Any Turing Machine $math[M = (K, F, \Sigma, \delta, s_0)] can be encoded as a word over $math[\Sigma]. We write $math[enc(M)] to refer to this word.
$end

$proof
Intuitively, we encode states and positions as integers $math[n \in \mathbb{N}], transitions as tuples of integers, etc.,
and subsequently //"convert"// each integer to its word counterpart in $math[\Sigma^*], cf. Proposition 1.2.2.

Let $math[NonFin = K \setminus (F \cup \{s_0\})] be the set of non-final states, excluding the initial one. We encode each state in $math[NonFin] as an integer in $math[\{1, 2, \ldots, \mid NonFin \mid\}], and each final state as an integer in $math[\{\mid NonFin \mid +1, \ldots, \mid NonFin \mid + \mid F \mid\}]. We encode the initial state $math[s_0] as $math[\mid NonFin \mid + \mid F \mid + 1], and $math[L,H,R] as $math[\mid NonFin \mid + \mid F \mid + i], with $math[i \in \{2,3,4\}], respectively. Each of the above integers is represented as a word of $math[\lceil log_{\mid \Sigma \mid} {(\mid NonFin \mid + \mid F \mid + 4)} \rceil] symbols over $math[\Sigma].

Each transition $math[\delta(s, c) = (s^\prime, c^\prime, pos)] is encoded as:

$math[enc(s)\#c\#enc(s^\prime)\#c^\prime\#enc(pos)]

where $math[enc(\cdot)] is the encoding described above. The entire $math[\delta] is encoded as a sequence of encoded transitions, separated by $math[\#]. The encoding of $math[M] is:

$math[enc(M) = enc(\mid NonFin\mid)\#enc(\mid F\mid)\#enc(\delta)]
$end

Thus, $math[enc(M)] is a word, which can be fed to another Turing Machine. The latter should have the ability to execute (or simulate) $math[M]. This is indeed possible:

$prop[The Universal Turing Machine]
There exists a ''TM'' $math[U] which, for any ''TM'' $math[M] and every word $math[w \in \Sigma^*], takes $math[enc(M)] and $math[w] as input and outputs $math[1] whenever $math[M(w) = 1] and $math[0] whenever $math[M(w) = 0]. We call $math[U] the **Universal Turing Machine** and say that $math[U] simulates $math[M].
$end

$proof
Let $math[M] be a ''TM'' and $math[w = c_1c_2 \ldots c_n] be a word over the alphabet of $math[M]. We build the Universal Turing Machine $math[U] as follows:
  * The input of $math[U] is $math[enc(M)\#enc(s_0)\#c_1\#c_2 \ldots c_n].
Note that $math[enc(s_0)] encodes the initial state of $math[M], while $math[c_1] is the first symbol of $math[w]. The portion of the tape $math[enc(s_0)\#c_1\#c_2 \ldots c_n] will be used to store the current configuration of $math[M], namely the current state of $math[M] (initially $math[s_0]), the contents of $math[M]'s tape, and $math[M]'s current head position. More generally, this portion of the tape is of the form $math[enc(s_i)\#u\#v], with $math[u, v \in \Sigma^*] and $math[s_i] being the current state of $math[M]. The last symbol of $math[u] is the current symbol, while $math[v] is the word to the right of the head. Initially, the current symbol is the first one, namely $math[c_1].
  * $math[U] scans the current state of $math[M], then the current symbol, and finally moves over the portion of $math[enc(M)] where transitions are encoded. Once the matching transition is found, $math[U] executes it:
    - $math[U] replaces the stored state with the new one, according to the transition;
    - $math[U] replaces the current symbol, according to the transition;
    - $math[U] moves the marked head position, according to $math[pos] from the transition.
  * $math[U] repeats this process until an accepting state of $math[M] is reached, or until no transition can be performed.
$end

Propositions 1.3.2 and 1.3.1 show that ''TM''s have the capability to characterise both algorithms, as well as the computational framework to execute them. One question remains: what can ''TM''s actually compute? Can they be used to sort vectors, solve SAT, etc.? The answer, which is positive, is given by the Church-Turing Thesis, discussed below.

===== Turing Machines and functions =====

A Turing Machine can naturally be viewed as a function $math[M : \Sigma^* \rightarrow \Sigma^*]. But, according to our above definition, any function $math[\Sigma^* \rightarrow \Sigma^*] is a **problem**. Can a Turing Machine, therefore, be viewed as a problem?
  * Problems (i.e. functions $math[\Sigma^* \rightarrow \Sigma^*]) are just mappings from words to words. There is **no known law** which governs such a mapping, and it may be that such a **law** cannot be found (what **kind of law**, anyway?).
  * Turing Machines are mappings from words to words **which are governed by a law**: the **law** has already been described in the Turing Machine definition.

So, in this respect, Turing Machines can be viewed as **computable** functions, not just **arbitrary** ones.

===== Limitations of the Turing Machine =====

A natural question to ask is why the Turing Machine is defined in this way, and whether we can find alternative, more powerful definitions. The scientific answer to this question is **we do not know** (a partial answer is given by the Church-Turing Thesis). The Turing Machine is widely adopted as a model of computation (for the reasons shown below), and this //design decision// is at the foundation of Complexity Theory. All results delivered by Complexity Theory (including the computability and complexity limits of PCP and min-Vertex Cover, respectively) rely on the assumption that no other, more powerful model of computation exists. **Quantum Computing** is an area of research which explores the existence of more powerful models. However, existing research has yet to deliver such a model.

===== Conjecture (The Church-Turing Thesis) =====

In what follows, we examine the **Church-Turing Thesis**, which is an **argument** for using the Turing Machine as a computational model:

//Any problem which can be solved by the Turing Machine is "universally solvable".//

The term //"universally solvable"// cannot be given a precise mathematical definition. We only know solvability w.r.t. a given model of computation (abstract or concrete).
  * It has been shown that the Turing Machine can solve any problem which known programming languages can solve (more precisely, all programming languages are Turing-complete, i.e.
they can solve everything the ''TM'' can solve. The converse may not be true. Why is that?).
  * There exist other models of computation, namely **the lambda calculus**, **while-programs** and **normal (Markov) algorithms**. All known models are **at most** as powerful as the Turing Machine; in fact, the models listed above compute **precisely the same functions** as the Turing Machine.

===== References =====

  - Alan Turing, [[https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf|On computable numbers, with an application to the Entscheidungsproblem]]