Background

Most of the concepts presented in the following lectures were developed in the 1930s, ironically almost 30 years before the development of the computer, by mathematicians and logicians.

So why was the subject of computation interesting, in the absence of a computer?

Mathematicians had long been struggling with proofs of apparently simple statements such as Goldbach's Conjecture, or Fermat's Last Theorem. It was commonly believed that, at some point, a bright mathematician would eventually prove or disprove such statements.

The issue of provability itself became an object of interest for people such as Hilbert and Gödel. This led mathematicians to examine mathematical systems, i.e. objects which specify:

  • a set of axioms, i.e. statements which are taken to be true without prior proof.
  • a set of instruments of proof (e.g. modus ponens), which can be used to derive new true statements (starting from the axioms).

One example of a mathematical system is Euclidean Geometry, consisting of five axioms and some methods of proof. Euclidean Geometry is powerful in the sense that many mathematical statements (e.g. Pythagoras' theorem) can be proven starting from the axioms and applying the methods of proof.

However, there are areas where Euclidean Geometry does not work: for instance, in Albert Einstein's theory of relativity, space is not Euclidean (i.e. it cannot be described by three coordinates only). For such areas, a new mathematical system is necessary.

Mathematicians of the thirties were also aware of paradoxes, e.g. Russell's paradox (is the set containing exactly those sets which do not contain themselves a valid construction?), and thus became preoccupied with finding a mathematical system where:

  • paradoxes were not possible (all statements which can be proved in the system are actually true - hence the system is sound)
  • all true statements could be derived by a finite number of applications of the instruments of proof (hence the system is complete)

Gödel is famous for showing that this quest is impossible. He proved that:

  • any mathematical system powerful enough to capture arithmetic is incomplete: some true statements cannot be proven.
  • any mathematical system powerful enough to capture arithmetic cannot prove its own consistency, unless it is in fact inconsistent. A set of axioms is consistent if there is no statement which can be both proved and disproved from the axioms.

Turing took Gödel's result one step further. Imagine the mathematician working in a mathematical system as a robot or computer which mechanically performs proof steps starting from the axioms. You feed a statement to the machine - it performs a sequence of proof steps, and then outputs a yes/no answer, i.e. whether or not the statement is true.

Turing devised such a machine and showed that mechanical proof is actually computation. Consistent with Gödel's results, he also showed that, for certain statements, no machine would be able to provide an answer: it would instead enter an endless loop. Thus, the concept of undecidability was born. Undecidability is essentially the recognition that certain problems cannot be effectively solved, no matter how many resources (in terms of time and space) are allocated.

Preliminaries

In the last lecture, we examined some problems which:

  • cannot be solved entirely (the best algorithm terminates only if the answer to the problem is yes)
  • cannot be solved in sub-exponential time

Both are impossibility results: they make a universal claim of the form:

$ \forall$ algorithms $ A$ : $ A$ cannot solve problem $ P$ better than $ \ldots$

A difficulty in proving such a result is that:

  • algorithms are usually specified in an ad-hoc mathematical notation, or directly in code. In the latter case, the code can belong to conceptually different programming languages such as C or Java (imperative languages) or Lisp, Scheme (functional languages).
  • problems are heterogeneous: they involve very different mathematical objects: graphs, numbers, words, etc.

A common denominator is necessary for both problems and algorithms.

Part I - What is a Problem?

Definition (Problem instance):

A problem instance is a mathematical object of which we ask a question and expect an answer.

Definition (Problem):

An (abstract) problem is a mapping $ P:I \rightarrow O$ where $ I$ is a set of problem instances of which we ask the same question, and $ O$ is a set of answers. $ P$ assigns to each problem instance $ i \in I$ the answer $ P(i)\in O$.

Example (problem):

For min-Vertex Cover, $ I$ is the set of undirected graphs and $ O$ is the set of natural numbers.

It is often the case that the answers we seek are also mathematical objects. For instance, the min-Vertex Cover problem must be answered by a number - the size of the minimum cover. However, many other problems prompt yes/no answers.

Whenever $ O = \{0, 1\}$ we say that $ P$ is a decision problem.

  • PCP and Unique Decoding are decision problems
  • min-Vertex Cover is not a decision problem

min-Vertex Cover is very similar to the following decision problem:

Definition (k-Vertex Cover):

Let $ G=(V,E)$ be an undirected graph and $ k$ a natural number. Does there exist a cover of $ G$ of size $ k$ ?

The hardness of min-Vertex-Cover and k-Vertex-Cover is related: the former can be solved by at most $ n$ attempts at the latter, with $ k$ ranging from $ 1$ to $ n$ , as the sketch below illustrates.
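As a minimal sketch (in Python; the brute-force decision procedure and the function names are our own illustration, not an efficient algorithm):

    from itertools import combinations

    def k_vertex_cover(vertices, edges, k):
        # Decision problem: does the graph have a vertex cover of size k?
        # Brute force: try every k-subset of vertices.
        for subset in combinations(vertices, k):
            chosen = set(subset)
            if all(u in chosen or v in chosen for (u, v) in edges):
                return True
        return False

    def min_vertex_cover(vertices, edges):
        # The optimisation problem is answered by at most n calls to the
        # decision problem, with k ranging from 1 to n.
        for k in range(1, len(vertices) + 1):
            if k_vertex_cover(vertices, edges, k):
                return k
        return 0

For instance, min_vertex_cover(['a', 'b', 'c'], [('a', 'b'), ('b', 'c')]) returns 1, since the single vertex b covers both edges.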

In general, any problem which is not a decision problem has a closely related decision problem of comparable difficulty.

In what follows, we shall discuss decision problems only.

The last definitions do not solve the heterogeneity issue: depending on the problem at hand, $ I$ can be any set of mathematical objects.

Let us make a few observations:

  1. each $ i \in I$ must be, in some sense, finite. For instance, the graphs 1) are finite objects
  2. $ I$ must be countable (but not necessarily finite).

For instance, the problem $ P : \mathbb{R} \times \mathbb{R} \rightarrow \{0, 1\}$ , where $ P(x, y)$ returns $ 1$ iff $ x$ and $ y$ are equal, makes no sense from the point of view of computability theory. Note that the set of real numbers $ \mathbb{R}$ is not countable. Assume we would like to answer $ P(\pi, \sqrt{2})$ . Simply storing $ \pi$ and $ \sqrt{2}$ would take infinite space, which is impossible on a machine, and also violates observation 1 above. Moreover, there are pairs of irrational numbers whose equality is not known.

Therefore:

  • since any possible set of problem instances $ I$ is countable, we can assign to each $ i\in I$ an encoding - no matter how the objects in $ I$ are constructed.

Definition (Encoding problem instances):

Let $ \Sigma$ be a finite set which we call an alphabet. A one-letter word is a member of $ \Sigma$ . A two-letter word is any member of $ \Sigma \times \Sigma = \Sigma^2$ . For instance, if $ \Sigma = \{a, b, \ldots\}$ , then $ (a, a) \in \Sigma^2$ is a two-letter word. An $ i$ -letter word is a member of $ \Sigma^i$ . We denote by:

$ \Sigma^* = \{\epsilon\} \cup \Sigma \cup \Sigma^2 \cup \ldots \cup \Sigma^i \cup \ldots$

the set of finite words which can be formed over $ \Sigma$ . $ \epsilon$ is a special word which we call the empty word. Instead of writing, e.g., $ (a, b, b, a, a)$ for a 5-letter word, we simply write $ abbaa$ . Concatenation of two words is defined as usual.

Remark (Words):

  • each problem instance $ i$ can be uniquely represented as a finite word $ enc(i) \in \Sigma^*$ , for a suitable alphabet $ \Sigma$
  • therefore, if $ I$ is (countably) infinite, then $ I \simeq \Sigma^*$ ($ I$ is isomorphic with $ \Sigma^*$ )

Having made the above remarks, we can move on to a refinement of the problem definition:

Definition (Problem):

A problem is a function $ f:\Sigma^*\rightarrow\Sigma^*$ . A decision problem is a function $ f:\Sigma^*\rightarrow\{0,1\}$ .

  • The above definition is closer in spirit to programming: for programmers, it is natural to encode arrays, graphs, and other more complicated structures using a universal alphabet (ASCII).
  • We will also see that $ \Sigma$ need not be ASCII, and that the alphabet choice is less important.

The above definition achieves the desired levelling-out. However, we can also re-state the definition in another manner which will be more convenient later on. We start by observing the following:

Proposition:

For any finite alphabet $ \Sigma$ , $ \Sigma^*$ is countably infinite.

Proof:

We show $ \Sigma^* \simeq \mathbb{N}$ by building a bijective function $ h$ which assigns to each word a unique natural number. Assume $ \mid \Sigma \mid = n$ . We assign $ 0$ to $ \epsilon$ , and the numbers from $ 1$ to $ n$ to the one-letter words. Next, to each $ k$ -letter word ($ k \geq 2$ ) of the form $ w = w^\prime x$ , where $ x$ is the last letter, we assign the number $ h(w) = n \cdot h(w^\prime) + h(x)$ . If $ n = 2$ , we recognise a variant of binary notation (bijective base-2): each word is assigned a unique natural number, and every natural number corresponds to exactly one word. The sketch below illustrates this ranking.
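A minimal sketch of $ h$ and its inverse (in Python; the function names are our own):

    def h(word, alphabet):
        # Rank a word: epsilon -> 0, one-letter words -> 1..n, and
        # h(w'x) = n * h(w') + h(x), i.e. bijective base-n numeration.
        n = len(alphabet)
        rank = {c: i + 1 for i, c in enumerate(alphabet)}
        value = 0
        for c in word:
            value = n * value + rank[c]
        return value

    def h_inverse(value, alphabet):
        # Recover the word from its rank, witnessing that h is bijective.
        n = len(alphabet)
        letters = []
        while value > 0:
            value, r = divmod(value - 1, n)
            letters.append(alphabet[r])
        return "".join(reversed(letters))

For instance, with alphabet "01", the words ε, 0, 1, 00, 01, 10, ... are ranked 0, 1, 2, 3, 4, 5, ..., and h_inverse recovers each word from its rank.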

Hence, we have the following diagram:

$ i \in I \leftrightarrow enc(i) \in \Sigma^* \leftrightarrow h(enc(i)) \in \mathbb{N}$

The same observation can be derived directly from the fact that $ I$ is countable. For instance, if $ I$ is the set of undirected graphs, we can construct an enumeration for all graphs: each graph is assigned a natural number - its label.

  • Although it seems counter-intuitive to talk about, e.g., $ 25$ as the 25th graph in the enumeration of all graphs, for the purposes of Complexity Theory it is quite helpful.

Definition (Problem):

A problem is a function $ f : \mathbb{N} \rightarrow \mathbb{N}$ . If some $ n$ encodes a problem input, then $ f(n)$ encodes its answer. A decision problem is a function $ f : \mathbb{N} \rightarrow \{0, 1\}$ .

To conclude:

  • when trying to solve concrete problems, the encoding issue is fundamental;
  • from the perspective of Complexity Theory, how the encoding is done is inessential 2): each problem instance can be mapped, without “loss of information”, to a natural number.

Part II - A model of computation

Algorithms are usually described as pseudo-code, intended as an abstraction over concrete programming language operations. The level of abstraction is usually not specified rigorously, and is decided in an ad-hoc manner by the algorithm designer/programmer.

Pseudo-code is often dependent on some future implementation, and only abstracts syntactic elements (of a fixed programming language), possibly including data initialisation and subsequent handling.

Pseudo-code can be easily implemented in different languages only to the extent to which the languages share the same programming principles.

As before, we require a means for leveling out different programming styles and programming languages, in order to come up with a uniform, straightforward and simple definition for an algorithm.

The key observation is that programming languages offer instructions with partially overlapping functionality (e.g. if and switch, for and while) which make a programmer's life easier, but which can be removed without compromising the language's expressive power - the same algorithms (or algorithmic ideas) can still be implemented, perhaps by writing more code.

One direction for making a programming language as concise as possible leads to an assembly language of some form.

The formal definition for an algorithm which we introduce can be seen as an abstract assembly language, where all technical aspects (e.g. the machine/processor architecture) are put aside. We call such a programming language the Turing Machine.

Definition (Deterministic Turing Machine):

A Deterministic Turing Machine (abbreviated DTM, or simply TM) is a tuple $ M = (K, F, \Sigma, \delta, s_0)$ where:

  • $ \Sigma = \{a, b, c, \ldots\}$ is a finite set of symbols which we call the alphabet;
  • $ K$ is a set of states, and $ F \subseteq K$ is a set of accepting/final states;
  • $ \delta:K\times\Sigma\rightarrow K\times\Sigma\times\{L,H,R\}$ is a (possibly partial) transition function which assigns to each state $ s\in K$ and symbol $ c\in\Sigma$ the triple $ \delta(s,c)=(s^\prime,c^\prime,pos)$ ;
  • $ s_0\in K$ is an initial state.

The Turing Machine has a tape which contains infinitely many cells in both directions; each tape cell holds a symbol from $ \Sigma$ . The Turing Machine has a tape head, which is able to read the symbol from the current cell. Also, the Turing Machine is always in a given state. Initially (before the machine has started), the state is $ s_0$ . From a given state $ s$ , the Turing Machine reads the symbol $ c$ from the current cell and performs a transition. The transition is given by $ \delta(s, c) = (s^\prime , c^\prime , pos)$ . Performing the transition means that the TM moves from state $ s$ to $ s^\prime$ , overwrites the symbol $ c$ with $ c^\prime$ on the tape cell and:

  • if $ pos = L$ , moves the tape head on the next cell to the left
  • if $ pos = R$ , moves the tape head on the next cell to the right
  • if $ pos = H$ , leaves tape head on the current cell

The Turing Machine will perform transitions according to $ \delta$ .

Whenever the TM reaches an accepting/final state, we say it halts. If a TM reaches a non-accepting state where no other transitions are possible, we say it gets stuck/hangs.

  • the input of a Turing Machine is a finite word which is contained in its otherwise empty tape;
  • the output of a TM is the contents of the tape (not including empty cells) after the Machine has halted. We also write $ M(w)$ to refer to the output of $ M$ , given input $ w$ .
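The interpretation procedure just described can itself be written as a short program. Below is a minimal simulator sketch in Python (the dictionary representation of $ \delta$ , the step budget and all other names are our own choices, not part of the formal definition):

    def simulate(delta, s0, final, tape, head=0, max_steps=10000):
        # delta: dict mapping (state, symbol) -> (state', symbol', pos)
        # tape:  the input word; '#' stands for the empty cell
        cells = {i: c for i, c in enumerate(tape)}  # grows in both directions
        state = s0
        for _ in range(max_steps):
            if state in final:                      # the machine halts
                output = "".join(cells[i] for i in sorted(cells))
                return output.strip('#')
            symbol = cells.get(head, '#')
            if (state, symbol) not in delta:        # the machine gets stuck/hangs
                return None
            state, cells[head], pos = delta[(state, symbol)]
            head += {'L': -1, 'H': 0, 'R': 1}[pos]
        return None  # step budget exhausted: the TM may loop forever

The step budget is only a convenience for experimentation: a real Turing Machine may run forever, which is precisely the phenomenon behind undecidability.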

Example (Turing Machine):

Consider the alphabet $ \Sigma = \{\#, >, 0, 1\}$ , the set of states $ K = \{s_0, s_1, s_2\}$ , the set of final states $ F = \{s_2\}$ and the transition function:

$ \delta(s_0, 0) = (s_0, 0 ,R) \quad \quad \delta(s_0, 1) = (s_0, 1, R)$

$ \delta(s_0, \#) = (s_1, \# ,L) \quad \quad \delta(s_1, 1) = (s_1, 0, L)$

$ \delta(s_1, 0) = (s_2, 1 ,H) \quad \quad \delta(s_1, >) = (s_2, 1, H)$

The Turing Machine $ M = (K, F, \Sigma, \delta, s_0)$ reads a number encoded in binary on the tape and increments it by $ 1$ . The symbol $ \#$ encodes the empty tape cell (we shall use $ \#$ to refer to the empty cell throughout the text). Initially, the tape head is positioned at the most significant bit of the number. The Machine first goes over all bits, from left to right. When the first empty cell is detected, the machine goes into state $ s_1$ and starts flipping $ 1$ s to $ 0$ s, until the first $ 0$ (or the initial position, marked by $ >$ ) is detected. Finally, the machine places $ 1$ on the current cell and enters its final state.

The behaviour of the transition function can be represented more intuitively as a state-transition diagram: each node represents a state, and each edge a transition. The label on each edge is of the form $ c/c^\prime,pos$ , where $ c$ is the symbol read from the current tape cell, $ c^\prime$ is the symbol written on the current tape cell and $ pos$ is a tape head position. The label should be read as: the machine replaces $ c$ with $ c^\prime$ on the current tape cell and moves in the direction indicated by $ pos$ .

Let us consider that, initially, the tape contains $ >0111$ - the representation of the number $ 7$ . The evolution of the tape is shown below. Each line shows the TM configuration at step $ i$ , that is, the tape and current state after transition $ i$ . For convenience, we show only two empty cells in each direction. The bracketed cell indicates the position of the tape head.

Transition no.   Tape          Current state
0                ##>[0]111##   $ s_0$
1                ##>0[1]11##   $ s_0$
2                ##>01[1]1##   $ s_0$
3                ##>011[1]##   $ s_0$
4                ##>0111[#]#   $ s_0$
5                ##>011[1]##   $ s_1$
6                ##>01[1]0##   $ s_1$
7                ##>0[1]00##   $ s_1$
8                ##>[0]000##   $ s_1$
9                ##>[1]000##   $ s_2$
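The final tape of this trace can be reproduced with the simulator sketched earlier (the dictionary encoding of $ \delta$ is, again, our own illustration):

    delta = {
        ('s0', '0'): ('s0', '0', 'R'), ('s0', '1'): ('s0', '1', 'R'),
        ('s0', '#'): ('s1', '#', 'L'), ('s1', '1'): ('s1', '0', 'L'),
        ('s1', '0'): ('s2', '1', 'H'), ('s1', '>'): ('s2', '1', 'H'),
    }

    # The head starts on the most significant bit, just after '>'.
    print(simulate(delta, 's0', {'s2'}, '>0111', head=1))  # prints >1000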

Are there any (conceptual) differences between a Turing Machine and an assembly language?

  • A Turing Machine $ M$ specifies instructions via the transition function $ \delta$ . Thus, $ M$ is quite similar to a program.
  • At the same time, the TM definition illustrates an interpretation procedure which tells us how to perform the computation specified by a Turing Machine. But what machine is performing this interpretation?

Turing's seminal result shows that the interpretation of any Turing Machine can be performed by a very special Turing Machine which he calls The Universal Turing Machine (abbreviated henceforth as UTM):

  • The input of the UTM is a Turing Machine (any TM); hence, to preserve our formalism, we need a mechanism to encode Turing Machines as words;
  • The output of the UTM is the word computed by the TM given as input

In the following Proposition, we show how Turing Machines can be encoded as words:

Proposition (Encoding TMs as words):

Any Turing Machine $ M = (K, F, \Sigma, \delta, s_0)$ can be encoded as a word over $ \Sigma$ . We write $ enc(M)$ to refer to this word.

Proof:

Intuitively, we encode states and positions as integers $ n \in \mathbb{N}$ , transitions as tuples of integers, etc., and subsequently “convert” each integer to its word counterpart in $ \Sigma^*$ , cf. Proposition 1.2.2.

Let $ NonFin = (K \setminus F) \setminus \{s_0\}$ be the set of non-final states, excluding the initial one. We encode each state in $ NonFin$ as an integer in $ \{1, 2, \ldots, \mid NonFin \mid\}$ and each final state as an integer in $ \{\mid NonFin \mid +1, \ldots, \mid NonFin \mid + \mid F \mid \}$ . We encode the initial state $ s_0$ as $ \mid NonFin \mid + \mid F \mid + 1$ , and $ L,H,R$ as $ \mid NonFin \mid + \mid F \mid + i$ with $ i \in \{2,3,4\}$ . Each of these integers is represented as a word of length $ \lceil \log_{\mid \Sigma \mid} (\mid NonFin \mid + \mid F \mid + 4) \rceil$ over $ \Sigma$ .

Each transition $ \delta(s, c) = (s^\prime, c^\prime, pos)$ is encoded as:

$ enc(s)\#c\#enc(s^\prime )\#c^\prime \#enc(pos)$

where $ enc(\cdot)$ is the encoding described above. The entire $ \delta$ is encoded as a sequence of encoded transitions, separated by $ \#$ . The encoding of $ M$ is

$ enc(M) = enc(\mid NonFin\mid)\#enc(\mid F\mid)\#enc(\delta)$
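A sketch of this scheme (in Python; for readability we render each integer in decimal rather than as a fixed-width word over $ \Sigma$ , and the helper names are our own):

    def enc(M):
        # M = (K, F, sigma, delta, s0), with delta a dict mapping
        # (state, symbol) -> (state', symbol', pos).
        K, F, sigma, delta, s0 = M
        nonfin = sorted(s for s in K if s not in F and s != s0)
        code = {s: i + 1 for i, s in enumerate(nonfin)}  # 1 .. |NonFin|
        code.update({s: len(nonfin) + 1 + i for i, s in enumerate(sorted(F))})
        code[s0] = len(nonfin) + len(F) + 1
        pos_code = {'L': code[s0] + 1, 'H': code[s0] + 2, 'R': code[s0] + 3}
        parts = [str(len(nonfin)), str(len(F))]
        for (s, c), (s2, c2, pos) in delta.items():
            parts.append('#'.join([str(code[s]), c, str(code[s2]),
                                   c2, str(pos_code[pos])]))
        return '#'.join(parts)

Note that, as in the scheme above, $ \#$ serves both as a separator and, potentially, as an ordinary tape symbol; a fixed-width rendering of the integers (as in the proof) is what keeps the word decodable.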

Thus, $ enc(M)$ is a word, which can be fed to another Turing Machine. The latter should have the ability to execute (or to simulate) $ M$ . This is indeed possible:

Proposition (The Universal Turing Machine):

There exists a TM $ U$ which, for any TM $ M$ and every word $ w \in \Sigma^*$ , takes $ enc(M)$ and $ w$ as input and outputs $ 1$ whenever $ M(w) = 1$ and $ 0$ whenever $ M(w) = 0$ . We call $ U$ the Universal Turing Machine and say that $ U$ simulates $ M$ .

Proof:

Let $ M$ be a TM and $ w = c_1c_2 \ldots c_n$ be a word which is built from the alphabet of $ M$ . We build the Universal Turing Machine $ U$ as follows:

  • The input of $ U$ is $ enc(M)\#enc(s_0)\#c_1\#c_2 \ldots c_n$ . Note that $ enc(s_0)$ encodes the initial state of $ M$ , while $ c_1$ is the first symbol of $ w$ . The portion of the tape $ enc(s_0)\#c_1\#c_2 \ldots c_n$ will be used to mark the current configuration of $ M$ , namely the current state of $ M$ (initially $ s_0$ ), the contents of $ M$ 's tape, and $ M$ 's current head position. More generally, this portion of the tape is of the form $ enc(s_i)\#u\#v$ , with $ u, v \in \Sigma^*$ and $ s_i$ being the current state of $ M$ . The last symbol of $ u$ marks the current symbol, while $ v$ is the word to the right of the head. Initially, the current symbol is the first one, namely $ c_1$ .
  • $ U$ will scan the current state of $ M$ , then the current symbol, and will then move over the portion of $ enc(M)$ where transitions are encoded. Once the matching transition is found, $ U$ will execute it:
    1. $ U$ will replace the recorded state with the new state, according to the transition;
    2. $ U$ will replace the current symbol, according to the transition;
    3. $ U$ will move the marker of the current symbol, according to $ pos$ from the transition;
  • $ U$ will repeat this process until an accepting state of $ M$ is detected, or until no transition can be performed.
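The bookkeeping on the configuration portion $ enc(s_i)\#u\#v$ can be sketched as follows (a Python illustration of a single simulated step, assuming $ \delta$ has already been decoded into a dictionary; a real UTM performs the same manipulation directly on its tape):

    def utm_step(delta, state, u, v):
        # One step on a configuration (state, u, v): the last symbol of u
        # is the current symbol, v is the word to the right of the head.
        symbol = u[-1]
        if (state, symbol) not in delta:
            return None                          # M gets stuck
        state, written, pos = delta[(state, symbol)]
        u = u[:-1] + written                     # overwrite the current cell
        if pos == 'R':                           # head moves right
            u, v = u + (v[:1] or '#'), v[1:]
        elif pos == 'L':                         # head moves left
            u, v = u[:-1] or '#', u[-1] + v
        return state, u, v

For instance, with the increment machine's $ \delta$ from earlier, utm_step(delta, 's1', '>0111', '') yields ('s1', '>011', '0'): the trailing $ 1$ is flipped to $ 0$ and the head moves left.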

Propositions 1.3.2 and 1.3.1 show that TMs can characterise both algorithms and the computational framework for executing them. One question remains: what can TMs actually compute? Can they be used to sort vectors, solve SAT, etc.? The answer, which is positive, is given by the Church-Turing Thesis, examined at the end of this section.

A Turing Machine can be naturally viewed as a function $ M : \Sigma^* \rightarrow \Sigma^*$ . But, according to our earlier definition, any function $ \Sigma^* \rightarrow \Sigma^*$ is a problem. Can a Turing Machine therefore be viewed as a problem?

  • Problems (i.e. functions $ \Sigma^* \rightarrow \Sigma^*$ ) are just mappings from words to words. There is no known law which governs such a mapping, and it may be that such a law cannot be found (what kind of law would that be, anyway?)
  • Turing Machines are mappings from words to words which are governed by a law: the law has already been spelled out in the Turing Machine definition. So, in this respect, Turing Machines can be viewed as computable functions, not just arbitrary ones.

A natural question to ask is why the Turing Machine is defined in this way, and whether we can find alternative, more powerful definitions. The honest scientific answer is: we do not know (a partial answer is given by the Church-Turing thesis).

The Turing Machine is widely adopted as a model of computation (for reasons shown below), and this design decision is at the foundation of Complexity Theory. All results delivered by Complexity Theory (including the computability and complexity limits of PCP and min-Vertex Cover, respectively), rely on the assumption that no other, more powerful model of computation exists.

Quantum Computing is an area of research which explores the existence of more powerful models. However, existing research has yet to deliver such a model.

In what follows, we examine the Church-Turing Thesis which is an argument for using the Turing Machine as a computational model:

Any problem which can be solved by the Turing Machine is “universally solvable”.

The term “universally solvable” cannot be given a precise mathematical definition. We only know solvability w.r.t. a given model of computation (abstract or concrete).

  • It has been shown that the Turing Machine can solve any problem which known programming languages can solve (more precisely, all programming languages are Turing-complete, i.e. they can solve everything the TM can solve; the converse may not be true - why is that?).
  • There exist other models of computation, namely: the lambda-calculus, while-programs, normal (Markov) algorithms. All known models are at most equivalent to the Turing Machine; those listed above compute precisely the same functions as the Turing Machine.

1) Graphs can also be infinite; however, we do not consider them here
2) There are exponentially-inefficient ways of encoding objects, which we shall discuss in a later lecture