====== - Complexity Theory ====== ===== - Measuring time and space ===== In Computability Theory, we have classified problems (e.g. in classes $math[R] and $math[RE]) based on the Turing Machine's ability to decide/accept them. In order to classify problems based on hardness, we need to account for the //number of steps (time)// and //tape cells (space)// which are employed by a Turing Machine. The amount of resources (time/space) spent by a Turing Machine $math[M] may be expressed as functions: $math[\mathcal{T}_M, \mathcal{S}_M : \Sigma^* \rightarrow \mathbb{N}] where $math[\mathcal{T}_M(w)] (resp. $math[\mathcal{S}_M(w)]) is the number of steps performed (resp. tape cells used) by $math[M], when running on input $math[w]. This definition suffers from unnecessary overhead, which makes time and space analysis difficult. We formulate some examples to illustrate why this is the case: $algorithm[$math[Alg(n)]] $math[\mathbf{while} \mbox{ } n \lt 100] $math[\quad n=n+1] $math[\mathbf{return} \mbox{ } 1] $end We note that $math[Alg] runs 100 steps for $math[n=0], while only one step for $math[n \geq 100]. However, in practice, it is often considered that each input is as likely to occur as any other (This is not often the case. There are numerous algorithms which rely on some probability that the input is of some particular type. For instance, efficient SAT solvers rely on a particular ordering of variables, when interpretations are generated and verified. On certain orderings and certain formulae, the algorithm runs in polynomial time. The key to the efficiency of SAT solvers is the programmer's //estimate// of an efficient ordering, based on some expectation about the input formula. The algorithm may be exponentially costly for some formulae, but run in //close-to-polynomial// time for most of the inputs. ). For this reason, we shall adopt the following convention: ($math[\star]) //We always consider the most// expensive/unfavourable //case, given inputs of a certain type.// In our previous example, we consider the running time of $math[Alg] as being 100, since this is the most expensive case. Consider the following example: $algorithm[$math[Sum(v,n)]] $math[s=0,i=0] $math[\mathbf{while} \mbox{ } i \lt n] $math[\quad s=s+v \lbrack i \rbrack] $math[\quad i=i+1] $math[\mathbf{return} \mbox{ } s] $end Unlike $math[Alg], $math[Sum] does not have a universal upper limit on its running time. The number of steps $math[Sum] executes depends on the number of elements of $math[v], namely $math[n], and is equal to $math[2n+3], if we count each variable initialisation and the return statement as computing steps. Thus, we observe that ($math[\star \star]) //The running time (resp. consumed space) of an algorithm will grow as the size of the input grows.// We can now merge ($math[\star]) and ($math[\star \star]) into a definition: $def[Running time of a TM] //The// running time //of a Turing Machine $math[M] is given by $math[\mathcal{T}_M : \mathbb{N} \rightarrow \mathbb{N}] iff:// $math[\forall \omega \in \Sigma^*] : the number of transitions performed by $math[M] on input $math[\omega] is at most $math[\mathcal{T}_M(\mid \omega \mid)]. $end ===== - Remark (Consumed space for a TM) ===== //A naive definition for the consumed space of a Turing Machine would state that $math[\mathcal{S}_M(\mid \omega \mid)] is the number of tape cells which $math[M] employs. This definition is imprecise. Consider the Turing Machine which receives a binary word as input and computes whether the represented number is a power of $math[2].
Aside from// reading //its input, the machine consumes no space. Thus, we might refine our definition into: "$math[\mathcal{S}_M(\mid \omega \mid)] is the number of tape// writes //which $math[M] performs". This definition is also imprecise. Consider the binary counter Turing Machine from the former chapter. It performs a number of writes proportional to the number of consecutive //1//'s found at the end of the string. However, the counter does not use additional space; it merely processes the input in place.// //Thus, the consumed space $math[\mathcal{S}_M(\mid \omega \mid)] is// the number of written cells **excluding the input**. //Consider a Turing Machine which receives $math[n] numbers encoded as binary words, each having at most 4 bits, and which computes the sum of the numbers, modulo $math[2^4]. Apart from reading the $math[4n] bits and the $math[n-1] word separators, the machine employs another// 4 //cells to hold a temporary sum. Thus, the consumed space for this machine is $math[4] //(We can also build another machine which simply uses the first number to hold the temporary sum, and thus uses no additional space. )//.// //A formal definition for the consumed space of a TM is outside the scope of this course, since it involves multi-tape Turing Machines. The basic idea is to separate the// input //from the rest of the space used for computation.// //Thus, when assessing the consumed space of an algorithm, we shall **never account for the space consumed by the input**.// Recall that, as in Computability Theory, our primary agenda is to produce a classification of problems. To this end, it makes sense to first introduce a //classification of Turing Machine running times//. ===== Asymptotic notations ===== ===== - Remark (Running times vs. arbitrary functions) ===== //In the previous section, we have defined running times of a Turing Machine as functions $math[\mathcal{T} : \mathbb{N} \rightarrow \mathbb{N}], and we have seen that they are often monotonically increasing ($math[n \leq m \Longrightarrow \mathcal{T}(n) \leq \mathcal{T}(m)]). While monotonicity is common among the running times of conventional algorithms, it is not hard to find examples (more-or-less realistic) where it does not hold. For instance, an algorithm may simply return, if its input exceeds a given size. Thus, we shall not, in general, assume that running times are monotonic.// //Furthermore, we shall extend our classification to arbitrary functions $math[f : \mathbb{R} \rightarrow \mathbb{R}], since there is no technical reason to consider only functions over naturals. In support of this, we shall also add that asymptotic notations are useful in other fields outside complexity theory, where the assumption that functions are defined over the natural numbers only is not justified.// $def[$math[\Theta] (theta) notation] //Let $math[g : \mathbb{R} \rightarrow \mathbb{R}]. Then $math[\Theta(g(n))] is the class of functions:// $math[\Theta(g(n)) = \left \{ f : \mathbb{R} \rightarrow \mathbb{R} \left\lvert \begin{array}{ll} \exists c_1,c_2 \in \mathbb{R}^+ \\ \exists n_0 \in \mathbb{N} \end{array}, \forall n \geq n_0, c_1g(n) \leq f(n) \leq c_2g(n) \right. \right \}] $end Thus, $math[\Theta(f(n))] is the class of all functions with the //same asymptotic growth// as $math[f(n)]. We can easily observe that, for all continuous $math[g,f \in Hom(\mathbb{R},\mathbb{R})] such that $math[g \in \Theta(f(n))], we have $math[\lim_{n\to\infty} \frac{f(n)}{g(n)} = c], where $math[c \neq 0], provided the limit exists.
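As a quick sanity check of the definition (the constants below are one possible choice among many, used only for illustration), take $math[f(n) = 2n^2+n+1] and $math[g(n)=n^2]. Choosing $math[c_1=2], $math[c_2=4] and $math[n_0=1], we have, for all $math[n \geq 1]: $math[2n^2 \leq 2n^2+n+1 \leq 4n^2], hence $math[2n^2+n+1 \in \Theta(n^2)].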
There are infinitely many classes $math[\Theta(f(n))], one for each function $math[f]. However, if $math[g(n) \in \Theta(f(n))], then $math[\Theta(g(n)) = \Theta(f(n))]. It makes sense to consider classes which describe functions with //smaller/larger// asymptotic growth: $def[$math[O, \Omega] notations] //Let $math[g : \mathbb{R} \rightarrow \mathbb{R}]. Then:// $math[O(g(n)) = \left\{ f:\mathbb{R} \rightarrow \mathbb{R} \left\lvert \begin{array}{ll} \exists c \in \mathbb{R}^+ \\ \exists n_0 \in \mathbb{N} \end{array}, \forall n \geq n_0, 0 \leq f(n) \leq c*g(n) \right. \right\}] $math[\Omega(g(n)) = \left\{ f:\mathbb{R} \rightarrow \mathbb{R} \left\lvert \begin{array}{ll} \exists c \in \mathbb{R}^+ \\ \exists n_0 \in \mathbb{N} \end{array}, \forall n \geq n_0, 0 \leq c*g(n) \leq f(n) \right. \right\}] $end Note that $math[g \in O(f(n)) \Longrightarrow O(g(n)) \subseteq O(f(n))], while $math[g \in \Omega(f(n)) \Longrightarrow \Omega(g(n)) \subseteq \Omega(f(n))]. Finally, $math[\Omega(f(n)) \cap O(f(n)) = \Theta(f(n))]. Each of the above propositions can be easily proved using the respective definitions of the notations. $math[O] and $math[\Omega] offer //relaxed// bounds for asymptotic function growth. Thus, $math[g \in O(f(n))] should be read as: //The function $math[g] grows asymptotically at most as much as $math[f]//. It makes sense to also consider //strict// bounds: $def[$math[o,\omega] notations] $math[o(g(n)) = \left\{ f:\mathbb{R} \rightarrow \mathbb{R} \left\lvert \begin{array}{ll} \forall c \in \mathbb{R}^+ \\ \exists n_0 \in \mathbb{N} \end{array}, \forall n \geq n_0, 0 \leq f(n) \leq c*g(n) \right. \right\}] $math[\omega(g(n)) = \left\{ f:\mathbb{R} \rightarrow \mathbb{R} \left\lvert \begin{array}{ll} \forall c \in \mathbb{R}^+ \\ \exists n_0 \in \mathbb{N} \end{array}, \forall n \geq n_0, 0 \leq c*g(n) \leq f(n) \right. \right\}] $end Thus, $math[g \in o(f(n))] should be read: //$math[g] grows asymptotically strictly slower than $math[f]//. We have $math[o(f(n)) \cap \omega(f(n)) = \emptyset], $math[O(f(n)) \cap \Omega(f(n)) = \Theta(f(n))] and $math[\omega(f(n)) \cup \Theta(f(n)) \subseteq \Omega(f(n))]. ==== - Exercise ==== //If $math[f(n) \in \Omega(n^2)] and $math[g(n) \in O(n^3)] then $math[\displaystyle \frac{f(n)}{g(n)} \in \ldots]// //If $math[f(n) \in o(n^2)] and $math[g(n) \in \Theta(n^3)] then $math[f(n) \cdot g(n) \in \ldots]// //If $math[f(n) \in \Theta(n^3)] and $math[g(n) \in o(n^2)] then $math[\displaystyle \frac{f(n)}{g(n)} \in \ldots]// ==== - Exercise ==== //Prove or disprove the following implications:// $math[f(n)=O(\log n) \Rightarrow 2^{f(n)}=O(n)] $math[f(n)=O(n^2)] //and// $math[g(n)=O(n) \Rightarrow f(g(n))=O(n^3)] $math[f(n)=O(n)] //and// $math[g(n)=1+\sqrt{f(n)} \Rightarrow g(n)=\Omega(\log n)] ===== Syntactic sugars ===== This section follows closely Lecture 2 from [[http://ocw.cs.pub.ro/ppcarte/doku.php?id=aa:intro:bibliography|[1]]]. Quite often, asymptotic notations are used to refer to arbitrary functions having certain properties related to their order of growth. For instance, in: $math[\lceil f(x) \rceil = f(x) + O(1)] applying "rounding" to $math[f(x)] may be expressed as the original $math[f(x)] to which we add a function bounded by a constant. Similarly: $math[\displaystyle \frac{1}{1-x} =1+x+x^2+x^3+\Omega(x^4)], for $math[-1 < x < 1]. The above notation allows us to "formally disregard" the terms from the expansion, by replacing them with an asymptotic notation which characterises their order of growth.
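To see why the first identity above holds, note that $math[0 \leq \lceil f(x) \rceil - f(x) < 1] for every $math[x]; the difference between the two sides is therefore a function bounded by a constant, i.e. a member of $math[O(1)].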
One should make a distinction between the usage of asymptotic notations in **arithmetic expressions**, such as the ones previously illustrated, and **equations**. Consider the following example: $math[f(x)=O(1/x)] which should be read: //there exists a function $math[h \in O(1/x)] such that $math[f(x)=h(x)]//. Similarly: $math[f(x)=O(\log x)+O(1/x)] should be read: //there exist functions $math[h \in O(1/x)] and $math[w \in O(\log x)] such that $math[f(x)=w(x)+h(x)]//. In equations such as: $math[O(x)=O(\log x)+O(1/x)] the equality is not symmetric, and should be read from left to right: //for any function $math[f \in O(x)], there exist functions $math[h \in O(1/x)] and $math[w \in O(\log x)] such that $math[f(x)=w(x)+h(x)]//. In order to avoid mistakes, the following algorithmic rule should be applied. When reading an equation of the form: $math[left = right] * each occurrence of an asymptotic notation in //left// should be replaced by a **universally quantified** function belonging to the corresponding class. * each occurrence of an asymptotic notation in //right// should be replaced by an **existentially quantified** function from the corresponding class. ===== - Running time in complexity theory ===== Using asymptotic notations, we can distinguish between running times (of algorithms) with different asymptotic growths. Experience has shown that it is infeasible to develop a theory which uses asymptotic notations in order to classify problems based on their difficulty. Thus, in complexity theory, we make an even stronger assumption: //the exponent of a polynomial function is unimportant//. Recall that, with asymptotic notations, we do not differentiate between $math[n^2] and $math[2n^2+n+1] (and denote either of the two by $math[\Theta(n^2)]). In Complexity Theory, we do not distinguish between, e.g. $math[n^2] and $math[n^3], and thus we write $math[n^{O(1)}], thereby referring to a polynomial of arbitrary degree. Before introducing a classification of problems, there is a question which must be addressed: //How does the encoding of a problem instance affect the running time of the subsequent algorithm?// To see why this issue is important, consider the encoding of numbers using a single digit (e.g. $math[IIIIIIII] encodes the number 8). A Turing Machine $math[M] which starts with the tape: ^ $math[>] ^ //I// ^ //I// ^ //I// ^ //I// ^ //I// ^ //I// ^ $math[\#] ^ $math[\#] ^ and increments the represented number by shifting the head to the first empty cell, where it places $math[I], will perform a number of steps which is linear with respect to the size of the input. Thus, the running time of $math[M] is $math[O(n)] where $math[n= \lvert w \rvert], and $math[w] is $math[M]'s input. The Turing Machine which uses the binary alphabet, and encodes numbers as binary words, will also run in linear time w.r.t. the size of the input, but in this case, there is an //exponential gap// between the two representations. The representation of a natural $math[x] consumes $math[n=\lceil \log x \rceil] cells in the second machine, and $math[x] cells in the first. Note that $math[x=2^n]. This is one of the rare cases [[http://ocw.cs.pub.ro/ppcarte/doku.php?id=aa:intro:bibliography|[2]]] where a bad choice of a representation may lead to an exponential increase in the number of steps. In what follows, we assume problem instances are encoded in some default way: e.g. graphs are represented as adjacency matrices or as adjacency lists, etc.
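To make the gap concrete, here is a minimal sketch in Python (the function names are our own, purely illustrative) comparing the number of tape cells consumed by the unary and binary encodings of a number:

<code python>
def unary_length(x: int) -> int:
    # the unary encoding "II...I" of x uses exactly x cells
    return x

def binary_length(x: int) -> int:
    # the binary encoding uses about log2(x) cells (the bit length of x)
    return max(1, x.bit_length())

for x in (8, 1024, 10**6):
    print(x, unary_length(x), binary_length(x))
</code>

For $math[x=10^6], the unary word is already a million cells long, while the binary word has only 20 bits; this is the exponential gap referred to above.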
When appropriate representations are chosen, the computational gap between them is //at most polynomial//. As an example, let us compare a matrix representation of a graph with that of adjacency lists. Assume the graph is directed, contains $math[n] nodes and only one edge $math[(u,v)]. The matrix representation will consume $math[n^2] positions (out of which $math[n^2-1] are equal to $math[0]), while the list representation will consume only one position (corresponding to the unique element in the adjacency list of $math[u]). However, the gap between the two representations is polynomial. Thus, from the point of view of Complexity Theory, it is irrelevant whether we choose matrices or adjacency lists to represent graphs. This observation is highlighted by the following Proposition: ==== - Proposition (The encoding does not matter) ==== //Let $math[f : \mathbb{N} \rightarrow \{0,1\}] be a problem which is decided by a Turing Machine $math[M] in time $math[T]. $math[M] is defined over the alphabet $math[\Sigma]. Then, there exists a Turing Machine $math[M^\prime], defined over alphabet $math[\Sigma^\prime = \{0,1,\#,>\}], which decides $math[f] and runs in time $math[O(\log \lvert \Sigma \rvert)*T(n)].// //Proof:(sketch)// We build $math[M^\prime = (K^\prime, F^\prime, \Sigma^\prime, \delta^\prime, s_0^\prime)], from $math[M] as follows: * $math[\Sigma^\prime = \{0,1,\#,>\}]. We encode each symbol different from $math[\#] (the empty cell) and $math[>] (the marker symbol of the beginning of the input), as a binary word $math[w] with $math[\lvert w \rvert = \lceil \log \lvert \Sigma \rvert \rceil]. We use $math[k] to refer to the length $math[\lvert w \rvert] of the word $math[w]. We write $math[enc_{\Sigma^\prime}(x)], with $math[x \in \Sigma], to refer to the encoding of symbol $math[x \in \Sigma]. * For each state $math[s \in K], we build $math[2^{k+1}-1] states $math[q_1, \cdots q_{2^{k+1}-1} \in K^\prime], organized as a full binary tree of height $math[k]. The purpose of the tree is to recognize the word $math[enc_{\Sigma^\prime}(x)] of length $math[k] from the tape. Thus, the unique state at the root of the tree, namely $math[q_1], is responsible for recognising the first bit. If it is $math[0], $math[M^\prime] will transition to $math[q_2] and if it is $math[1], to $math[q_3]. $math[q_2] and $math[q_3] must each recognize the second bit. After their transitions, we shall be in one of the states $math[q_4] to $math[q_7], which give us information about the first two bits of the word. The states from level $math[i] recognize the first $math[i] bits of the encoded symbol $math[enc_{\Sigma^\prime}(x)]. The states from the last level are $math[2^k] in number, and recognize the last bit of the encoded symbol $math[enc_{\Sigma^\prime}(x)]. Thus, each of the $math[2^k] leaf-states in the tree corresponds to one possible symbol $math[x \in \Sigma] which is encoded as $math[enc_{\Sigma^\prime}(x)]. We connect all $math[2^{k+1}-1] states by transitions, as described above. * For each transition $math[\delta(s,x)=(s^\prime,x^\prime,pos)] of $math[M], the machine $math[M^\prime] must: (i) recognize $math[x], (ii) overwrite it with $math[x^\prime], (iii) move according to $math[pos] and go to state $math[s^\prime]. Thus: * (i) is done by the procedure described at the above point. * for (ii), we use $math[k] states to go back $math[k] cells to the beginning of $math[enc_{\Sigma^\prime}(x)] and write $math[enc_{\Sigma^\prime}(x^\prime)], cell by cell.
Finally, we connect the state corresponding to $math[enc_{\Sigma^\prime}(x)] from the tree, to the first of the above-described $math[k] states. * for (iii), if $math[pos = L/R], we use another $math[k] states to go either left or right. If $math[pos = H], we need not use these states. Finally, we need to make a transition to the root of the state tree corresponding to $math[s^\prime]. For each transition $math[\delta(s,x)=(s^\prime, x^\prime, pos)] of $math[M], $math[M^\prime] performs $math[k] transitions for reading the encoded symbol $math[x], $math[k] transitions for writing $math[x^\prime] and possibly $math[k] transitions for moving the tape head. Thus, for every input $math[w], the number of transitions performed by $math[M^\prime] is at most $math[3k*T(\lvert w \rvert)]. Hence, the running time of $math[M^\prime] is $math[O(k)*T(n)]. The proof of Proposition 1.4.1 shows that any Turing Machine using an arbitrary alphabet $math[\Sigma] can be transformed into one using the binary alphabet. The overhead of this transformation is logarithmic in the alphabet size: $math[O(\lceil \log \lvert \Sigma \rvert \rceil)]. Thus, if the original Turing Machine runs in some polynomial time $math[T(n)], then the transformed TM will run in $math[O(\lceil \log \lvert \Sigma \rvert \rceil)*T(n)] time, which is bounded by a polynomial. Similarly, if the original TM runs in super-polynomial time, the transformed TM will also run in super-polynomial time. In what follows, we shall assume all Turing Machines are using the binary alphabet $math[\Sigma_b=\{0,1,\#,>\}]. ===== - Complexity classes ===== In the previous section, we have stated that, in complexity theory, we shall make no distinction between polynomial running times with different asymptotic growths (e.g. between $math[n^2] and $math[n^3]). With this in mind, we construct a classification of problems. First, we say that: //$math[f] is decidable in time $math[T(n)]// iff there exists a Turing Machine $math[M] which decides $math[f], and whose running time is $math[T]. We interchangeably use the terms //decidable// and //solvable//, since, in this chapter, there is no ambiguity on what "//solvability//" means, and it cannot be mistaken for acceptability. All problems considered in this section are members of $math[R]. The following definition characterizes problems decidable within a given time bound: $math[DTIME(T(n))=\{ f : \mathbb{N} \rightarrow \{0,1\} \mid f \mbox{ is decidable in time } O(T(n))\}] Note that, unlike asymptotic notations, $math[DTIME(T(n))] is a class of problems, not of running times. Also, note that our characterization does not provide a strict upper bound. Hence, e.g. $math[DTIME(n) \subseteq DTIME(n^2)]. In words: a problem which is decidable in linear time is also decidable in quadratic time. Next, we introduce the class: $math[PTIME= \displaystyle \bigcup\limits_{d \in \mathbb{N}} DTIME(n^d)] $math[PTIME] is often abbreviated $math[P]. It is the class of all problems which are decidable in polynomial time. Also, note that if some problem $math[f] is decidable in $math[\log (n)] time, then $math[f \in DTIME(\log (n)) \subseteq DTIME(n) \subseteq P]. Hence, even problems which are solvable in sub-linear time belong to the class $math[P]. Further on, we introduce the class: $math[EXPTIME = \displaystyle \bigcup\limits_{d \in \mathbb{N}} DTIME\left(2^{n^d}\right)] which contains all the problems which are decidable in exponential time.
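As a small worked example of how these classes are used (the particular bounds below are ours, chosen only for illustration): a problem decidable by some TM in at most $math[5n^2+3n] steps belongs to $math[DTIME(n^2) \subseteq P], since $math[5n^2+3n \in O(n^2)]; a problem decidable in at most $math[n \cdot 2^n] steps belongs to $math[DTIME(2^{n^2}) \subseteq EXPTIME], since $math[n \cdot 2^n \leq 2^{2n} \leq 2^{n^2}] for $math[n \geq 2].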
Naturally, we have: $math[P \subseteq EXPTIME \subseteq R] There are two interesting questions which can be raised at this point: - Is the inclusion $math[P \subseteq EXPTIME] strict? - Are all problems in $math[EXPTIME \setminus P] (Later in this chapter, we shall see that $math[EXPTIME \setminus P] is a set whose members are currently unknown.) "//equal//", in terms of difficulty? In the following section, we shall focus on the latter question: ===== Nondeterminism and Nondeterministic Turing Machine ===== We recall the problem $math[SAT], which takes as input a boolean formula $math[\psi] in Conjunctive Normal Form (CNF). More precisely: $math[\psi = C_1 \wedge C_2 \wedge \ldots \wedge C_n] where, for each $math[i:1 \leq i \leq n] we have $math[C_i = L_{i1} \vee L_{i2} \vee \ldots \vee L_{im_i}] and, for each $math[j:1 \leq j \leq m_i] we have $math[L_{ij} = x] or $math[L_{ij}=\neg x], where $math[x] is a variable. Recall that $math[SAT] can be solved in exponential time, hence $math[SAT \in EXPTIME]. The major source of complexity lies in generating all possible interpretations, on each of which a verification is subsequently performed. In order to answer question 2 from the previous section, suppose that $math[SAT] were solvable in polynomial time. Could we find problems (possibly related to $math[SAT]) which would still require exponential time under our assumption? The answer is yes: Let $math[\gamma] be the formula: $math[\gamma = \forall x_1 \forall x_2 \ldots \forall x_k \psi] where $math[\psi] is a formula in CNF containing variables $math[x_1,\ldots,x_k] (possibly among others), and $math[k \in \mathbb{N}] is arbitrarily fixed. Checking if $math[\gamma] is satisfiable is the problem $math[\forall SAT]. An algorithm for $math[\forall SAT] must build //all combinations of 0/1 values for each $math[x_i] with $math[i:1 \leq i \leq k]// and, for each one, must solve an instance of the $math[SAT] problem. In total, we have $math[2^k] combinations, and since $math[k] is part of the input, the algorithm runs in exponential time, //even assuming that we have an algorithm for $math[SAT] which runs in polynomial time//. In order to study the difficulty of problems, in Complexity Theory we generalise the above-presented approach, in order to determine //degrees of hardness//. To do so, we need a general version of our assumption: "$math[SAT] //is solvable in polynomial time//". In other words, we need a theoretical tool to make exponential search (seem) polynomial. This tool will have no relation to reality, and no implications for real problem solving. It should not be understood as a technique for deciding hard problems. Its sole purpose is theoretical: it allows us to abstract one source of complexity (that of $math[SAT] in our example) in order to explore others (e.g. $math[\forall SAT]). The tool that we mentioned is the //Nondeterministic Turing Machine//: $def[Non-deterministic TM] //A non-deterministic Turing Machine ($math[NTM] for short) is a tuple $math[M=(K,F,\Sigma,\delta,s_0)] over alphabet $math[\Sigma] with $math[K], $math[\Sigma] and $math[s_0] defined as before, $math[F=\{s_{yes},s_{no}\}] and $math[\delta \subseteq K \times \Sigma \times K \times \Sigma \times \{L,H,R\}] is a **transition relation**.// //A $math[NTM]// **terminates** //iff it reaches a final state, hence, a state in $math[F].
A $math[NTM] $math[M]// **decides** //a function $math[f:\mathbb{N} \rightarrow \{0,1\}] iff $math[f(n_w)=0 \Longrightarrow M(w)] reaches state $math[s_{no}] **on all possible sequences of transitions** and $math[f(n_w)=1 \Longrightarrow M(w)] reaches state $math[s_{yes}] **on at least one sequence of transitions** (where $math[n_w] is the natural number encoded by the word $math[w]).// //We say the **running time** of a $math[NTM] $math[M] is $math[T] iff, for all $math[w \in \Sigma^*], all sequences of transitions of $math[M(w)] contain at most $math[T(\mid w\mid)] steps.// $end We start with a few technical observations. First note that the $math[NTM] is specifically tailored for decision problems. It has only two final states, which correspond to //yes/no// answers. Also, the machine does not produce an output, and the usage of the tape is merely for internal computations. In essence, these "design choices" for the $math[NTM] are purely for convenience, and alternatives are possible. Whereas the conventional Turing Machine assigned, for each combination of state and symbol, a unique next state, written symbol and head movement, a nondeterministic machine assigns a //collection// of such elements. The //current configuration// of a conventional Turing Machine was characterized by the current state, the contents of the tape, and the head position. A configuration of the nondeterministic machine corresponds to a //set// of conventional $math[TM] configurations. The intuition is that the $math[NTM] can simultaneously process a set of conventional configurations, in one single step. While the execution of a Turing Machine can be represented as a //sequence//, that of the $math[NTM] can be represented as a //tree//. A path in the tree corresponds to one sequence of transitions which the $math[NTM] performs. Now, notice the conditions under which a $math[NTM] decides a function: if at least one sequence of transitions leads to $math[s_{yes}], we can interpret the answer of the $math[NTM] as //yes//. Conversely, if **all** sequences of transitions lead to $math[s_{no}], then the machine returns //no//. Finally, when accounting for the running time of a $math[NTM], we do not count all performed transitions (as might seem reasonable), but only the //length of the longest transition sequence// performed by the machine. The intuition is that all members of the current configuration are processed //in parallel//, during a single step. We will illustrate the $math[NTM] in the following example: $example[Nondeterministic Turing Machine] //We build the $math[NTM] $math[M_{SAT}] which solves the $math[SAT] problem discussed previously. First, we assume the existence of $math[M_{chk}(I,\psi)] which takes an interpretation and a formula, both encoded as a single string, and checks if $math[I \models \psi]. $math[M_{chk}] is a conventional $math[TM], thus upon termination it leaves $math[0] or $math[1] on the tape.// * //Step 1: $math[M_{SAT}] computes the number of variables from $math[\psi] (henceforth referred to as $math[n]), and prepends to the encoding of $math[\psi] the encoding of $math[n] in unary (as a sequence of $math[1]'s). This step takes $math[O(n)] transitions.// * //Step 2: During the previous step, $math[M_{SAT}] has created a context for generating interpretations. In this step, $math[M_{SAT}] goes over each cell from the encoding of $math[n], and non-deterministically places $math[0] or $math[1] in that cell. Thus, after 1 such transition, there are 2 possible conventional configurations.
In the first, bit $math[0] is placed on the first cell of the encoding of $math[n]. In the second, bit $math[1] is placed in the same position. After $math[i] transitions, we have $math[2^i] possible configurations. At the end of this step, we have $math[2^n] possible configurations, and in each one, we have a binary word of length $math[n] at the beginning of the tape, which corresponds to one possible interpretation. All sequences of transitions have the same length, thus the execution of this part of $math[M_{SAT}] takes $math[O(n)].// * //Step 3: At the end of each sequence illustrated above, we run $math[M_{chk}(I,\psi)], where $math[I] and $math[\psi] are already conveniently on the tape. This step takes $math[O(n*m)], where $math[m] is the maximal number of literals in $math[\psi].// //If we add up the running times of the three steps, we obtain $math[O(n*m)].// $end An important issue is whether the $math[NTM] has more expressive power than the conventional $math[TM]: ==== - Proposition ==== //Every function which is decidable by an $math[NTM] in polynomial running time is also decidable by a $math[TM] which runs in exponential time.// The proof is left as an exercise. Intuitively, we can simulate a $math[NTM] by performing a backtracking procedure with a classic $math[TM]. The proposition shows that the $math[NTM] only offers a gain in speed and not expressive power. It solves precisely the same problems as the conventional $math[TM]. ==== Convention for describing $math[NTM] in pseudocode ==== In the previous chapters, we often resorted to traditional pseudocode in order to describe algorithms, that is, Turing Machines. It is occasionally useful to be able to do the same thing for $math[NTM]s. With this in mind, we introduce some notational conventions. The instruction: $math[v = choice(A)] where $math[v] is a variable and $math[A] is a set of values, behaves as follows: * the current (non-deterministic) configuration of the $math[NTM] shall contain $math[\mid A \mid] conventional configurations. * each conventional configuration corresponds to a distinct value $math[a \in A], and it should be interpreted that $math[v=a], in that particular configuration. * the running time of the instruction is $math[O(1)]. We also note that it is not possible to achieve some form of "//communication//" between conventional configurations. Thus, it is intuitive to think that the processing (execution of a transition) of a conventional configuration is done independently of all other conventional configurations. We add two additional instructions: **success** and **fail**. They correspond to transitions into states $math[s_{yes}] and $math[s_{no}], respectively. We illustrate $math[NTM] pseudocode by re-writing the $math[SAT] algorithm described above. We adopt the same representational conventions from the first Chapter, and also re-use the procedure **CHECK**. $example[Pseudocode] $math[SolveSAT(\varphi)] //Let $math[n] be the number of variables in $math[\varphi].// //Let $math[I] be a vector with $math[n] components which are initialised with $math[0].// $math[\mathbf{for} \mbox{ } i=\overline{0,n-1} \mbox{ :}] $math[\quad I \lbrack i \rbrack = choice(\{0,1\})] $math[\mathbf{if} \mbox{ } CHECK(\varphi,I)=0] $math[\quad fail] $math[\mathbf{else} \mbox{ } success] $end As illustrated in the previous example, the $math[NTM] has the power of taming the complexity which results from the search over an exponential number of candidates (in our example, interpretations).
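To see what the nondeterministic pseudocode above hides, here is a minimal Python sketch of the backtracking simulation mentioned in the proposition above (left there as an exercise); the formula representation and the helper names are our own illustrative choices, not part of the course's pseudocode. Every possible outcome of the $math[choice(\{0,1\})] instruction is enumerated explicitly, which is exactly where the exponential cost reappears:

<code python>
from itertools import product

def check(clauses, interp):
    # CHECK: does the 0/1 assignment `interp` satisfy the CNF formula?
    # A clause is a list of literals; literal +i means x_i, -i means NOT x_i.
    return all(
        any(interp[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
        for clause in clauses
    )

def solve_sat_deterministic(clauses, n):
    # Deterministic simulation of SolveSAT: all 2^n sequences of choices
    # are enumerated one by one by a conventional machine.
    for interp in product((0, 1), repeat=n):
        if check(clauses, interp):
            return True   # some branch reaches `success`
    return False          # all branches reach `fail`

# (x1 OR NOT x2) AND (x2 OR x3) is satisfiable, e.g. by x1=1, x2=0, x3=1
print(solve_sat_deterministic([[1, -2], [2, 3]], 3))
</code>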
Note that, in the $math[NTM], the main remaining source of complexity is the verification procedure $math[M_{chk}(I,\psi)]. By using the $math[NTM], we have isolated the source of complexity of $math[SAT], which is the exponential search over possible interpretations. We have seen that there are other types of exponential explosion. Thus, we are in a position to refine our classification, by introducing new complexity classes: $math[NTIME(T(n))=\{ f:\mathbb{N} \rightarrow \{0,1\} \mid f \mbox{ is decidable by a } NTM \mbox{ in time } O(T(n))\}] and $math[NPTIME= \displaystyle \bigcup\limits_{d \in \mathbb{N}} NTIME(n^d)] We usually abbreviate the class $math[NPTIME] by $math[NP]. Note that $math[SAT \in NP]; however, it seems unlikely that $math[\forall SAT \in NP]. We shall discuss the issue in the next section. Let us relate $math[NP] with our other classes. First, note that $math[NP \subseteq EXPTIME]. This result is essentially given by Proposition 1.5.1: every problem solved in polynomial time by a $math[NTM] can be solved in exponential time by a $math[TM]. Also, we trivially have: $math[P \subseteq NP]. The concise argument is that the $math[NTM] is a generalization of the $math[TM], "minus" some technical details. Thus: $math[P \subseteq NP \subseteq EXPTIME \subseteq R] The fundamental property of problems from $math[NP] is that "//solution candidates can be// verified //in polynomial time//". An analogy with solving crosswords is possible. Generating all possible solutions (by "lexicographically" generating all possible words) is obviously exponential. But since //verifying// whether a solution is correct can be done in polynomial time w.r.t. the size of the crossword, the problem is in $math[NP]. We shall soon see that all algorithms which solve hard problems from $math[NP] can be split up into an exponential "generating" procedure, and a polynomial "verification" procedure. ===== - Hard and complete problems for a class ===== Recall that proving $math[f \notin R] for a given problem $math[f] was done using contraposition. Essentially, the proof relied on finding a Turing reduction $math[f^* \leq_T f] from a problem $math[f^*] for which $math[f^* \notin R] is known. First, we note that $math[R] could be easily replaced with any complexity class $math[X]. Thus, a more general proof scheme could be described as: $math[f^* \notin X, f^* \leq_X f \Longrightarrow f \notin X] where $math[f^*] and $math[X] are given in advance. We observe that we have replaced $math[\leq_T] by $math[\leq_X], in order to highlight that the reduction $math[\leq_X] depends on the choice of $math[X]. We shall later see that we cannot always use Turing Reductions, and there are $math[X]'s for which more restrictions on the reduction must be in place. Returning to the proof scheme, we make a second note: to prove $math[f \notin R], we must first find some $math[f^* \notin R]. But if this very fact is shown by the same technique, then we need another problem $math[f^{**} \notin R] and so on. This situation is similar to: "//Which came first, the hen or the egg?//". For the case of $math[R], this issue was settled by the proof $math[f_h \notin R] using diagonalization. This introduced an initial undecidable problem, and the aforementioned technique could be employed. Now, let us turn our attention to $math[X=NP]. First, let us examine the properties which the reduction should have, in order to make our technique work in this particular case. Let $math[f^* \notin NP]. We need to show that $math[f \notin NP].
The first part consists of assuming $math[f \in NP]. Next, we build a reduction $math[T:I_{f^*} \rightarrow I_f] where $math[I_{f^*}] and $math[I_f] are the sets of inputs of the problems in the subscript (In order to make explicit the direction of the transformation, we ignore the fact that both $math[I_{f^*}] and $math[I_f] are (subsets of) naturals.). We recall that the reduction needs to satisfy: $math[\forall i \in I_{f^*} : f^*(i)=1 \Longleftrightarrow f(T(i))=1] In words: "//we can solve all instances $math[i] of $math[I_{f^*}] by solving instances $math[T(i)] of $math[I_f]//". This is done as follows: - receive $math[i \in I_{f^*}] - compute $math[T(i) \in I_f] - run the $math[NTM] $math[M_f(T(i))] ($math[M_f] must exist since $math[f \in NP]) and return its answer. After a careful look, we observe that the conclusion $math[f^* \in NP] (which is our objective, in order to complete the contrapositive argument) is not immediate. If, for instance, the computation of $math[T(i)] takes exponential time, then the proof does not work: the $math[NTM] which performs steps 1, 2 and 3 runs in (non-deterministic) exponential time. Thus, the restriction that $math[T] is computable is insufficient. We further require that $math[T] be computable in **polynomial time**. We write $math[f^* \leq_p f] iff there exists a polynomial-time transformation which allows solving $math[f^*] via $math[f], as illustrated by the scheme above. We now turn our attention to the "//hen and egg//" issue. We need an initial problem, which is known **not** to be in $math[NP]. Unfortunately, such a problem is not known, although there have been major (unsuccessful) efforts in trying to find one. The same holds for the class $math[P]. Hence, the issue: $math[P \subsetneq NP] is currently open, and it is generally believed that it is true, but all proof attempts have failed. The good news is that our effort in classifying problems is not fruitless. Transformations: $math[f^* \leq_p f \quad f^* \leq_T f] establish a relation between problems, which is denoted as //hardness//. No matter the reduction type (polynomial or Turing), $math[f] is **at least as hard** as $math[f^*]. With the machine $math[M_f] (together with computing a transformation) we can solve all instances of $math[f^*]. It may be possible that $math[T] is bijective, and that its inverse is also computable in polynomial time: hence each input of $math[f^*] is uniquely mapped to an input of $math[f] and vice-versa. Then, $math[f] and $math[f^*] are //equally hard//. However, this is not generally the case, hence the term "//at least as hard//". We can naturally extend hardness to complexity classes: $def[1.6.1] //A problem $math[f] is called $math[NP]-hard iff for all $math[f^\prime \in NP], $math[f^\prime \leq_p f].// $end Thus, a problem is hard w.r.t. a class iff it is at least as hard as any problem in the class at hand. Note that hardness can be defined w.r.t. any complexity class and not just $math[NP], provided that the appropriate type of transformation is employed. $def[1.6.2] //A problem $math[f] is called $math[NP]-complete iff it is $math[NP]-hard and $math[f \in NP].// $end Informally, $math[NP]-complete problems are the hardest problems in $math[NP]. In the more general case, complete problems w.r.t. a class are the hardest of that class. It is believed that if $math[f] is $math[NP]-complete, then $math[f \notin P]; however, this is not a proven fact.
Thus, instead of trying to disprove membership of a class ($math[f \notin P]), in complexity theory we prove completeness for the immediately larger (or a greater) class ($math[f] is $math[NP]-complete). The intuition is that class membership provides an upper bound, while hardness provides a lower bound for the difficulty of a problem. We illustrate this by an abstract example. Recall: $math[P \subseteq NP \subseteq EXPTIME] Let $math[f] be a problem. Suppose we find an algorithm for $math[f] which runs in exponential time; however, we cannot find one which runs in polynomial time on a $math[NTM]. At this point, we have $math[f \in EXPTIME]. Suppose we know $math[f] is $math[P]-hard, thus, $math[f] can be used to solve any problem in $math[P]. We now know that $math[f] can be solved exponentially and it is unlikely that $math[f] can be solved in sub-polynomial (e.g. logarithmic) time. Thus, the likely variants are: $math[f] may be solved in polynomial time, (i) by a conventional $math[TM] ($math[f \in P]) or (ii) by a $math[NTM] ($math[f \in NP]), or (iii) in exponential time, again by a conventional $math[TM] ($math[f \in EXPTIME]). In the best case $math[f] is polynomially solvable. In the worst case, it is exponentially solvable. Suppose now we also find that $math[f] is $math[NP]-hard. We cannot rule out $math[f \in P] by a proof, but Complexity Theory predicts that such a membership is not likely. Hence, the feasible variants remain (ii) and (iii). Finally, if we manage to improve our exponential algorithm for $math[f] and turn it into a non-deterministic polynomial algorithm, then $math[f \in NP] and, hence, it is $math[NP]-complete. Case (iii) of course remains true, but it does not carry useful information. At this point, we have an exact characterisation of the difficulty of $math[f]. ==== Proving $math[NP]-completeness ==== For a problem $math[f] to be $math[NP]-complete, it must satisfy two conditions. The first, $math[f \in NP], is shown by finding a $math[NTM] which decides $math[f] in polynomial time. For the second part ($math[f] is $math[NP]-hard), we can employ precisely the "reduction-finding" technique illustrated at the beginning of this section. ==== - Proposition ==== //A problem $math[f] is $math[NP]-hard iff there exists a problem $math[g] which is $math[NP]-hard, such that $math[g \leq_p f].// //Proof:// Suppose $math[g \leq_p f] and $math[g] is $math[NP]-hard. Hence, for all $math[h \in NP], $math[h \leq_p g]. By transitivity, we also have that $math[h \leq_p f]. It follows that $math[f] is $math[NP]-hard. For the converse direction, it is sufficient to take $math[g=f] and use the reflexivity of $math[\leq_p] (stated below). In the former proof we have made use of the transitivity of $math[\leq_p], without showing it. We now state several properties of $math[\leq_p], including transitivity, and leave the proofs as exercises. ==== - Proposition ==== //$math[\leq_p] is reflexive and transitive.// ==== - Proposition ==== //The set of $math[NP]-hard problems is closed under $math[\leq_p].// ==== - Proposition ==== //The set of $math[NP]-complete problems forms an equivalence class under the equivalence relation induced by $math[\leq_p].// ==== - Proposition ==== //Assume $math[f] is $math[NP]-complete. If $math[f \in P], then $math[P=NP].// The former proposition, whose proof follows immediately from the underlying definitions, makes the case for the common belief that $math[P \neq NP]. If some efficient algorithm can be found for some $math[NP]-complete problem, then **all** problems in $math[NP] can be solved in polynomial time.
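Before moving on, here is a minimal Python sketch of the reduction scheme used above (receive an instance, apply a polynomial-time transformation $math[T], call a decision procedure for the target problem). The graph representation, the function names and the brute-force solver are our own illustrative choices; the transformation itself, complementing the graph, is the classical polynomial-time reduction between the independent-set and clique problems:

<code python>
from itertools import combinations

def complement(n, edges):
    # Polynomial-time transformation T: complement the edge set of an
    # n-vertex graph. An independent set in G is a clique in the complement.
    all_pairs = set(combinations(range(n), 2))
    return all_pairs - {tuple(sorted(e)) for e in edges}

def has_clique_of_size(n, edges, k):
    # Solver for the target problem (here brute force, hence exponential;
    # if CLIQUE were in P, this step would run in polynomial time).
    return any(
        all(tuple(sorted(p)) in edges for p in combinations(group, 2))
        for group in combinations(range(n), k)
    )

def has_independent_set_of_size(n, edges, k):
    # The scheme: receive the instance, compute T(instance), run the
    # decision procedure for the target problem and return its answer.
    return has_clique_of_size(n, complement(n, edges), k)

# A path on 4 vertices has an independent set of size 2, e.g. {0, 2}
print(has_independent_set_of_size(4, [(0, 1), (1, 2), (2, 3)], 2))
</code>

Since the transformation is computable in polynomial time, any polynomial-time procedure for the target problem would immediately yield one for the source problem, which is precisely the content of the last proposition above.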
The $math[P=NP] issue can also be given another intuitive interpretation: "//The verification of a solution candidate is as difficult as generating it//" or, alternatively: "//Verifying a given proof $math[P] of a statement $math[A] is as difficult as finding a proof of $math[A]//". Finally, to better understand the implications of $math[P=NP], consider several facts which would arguably be true, in case the former equality holds: * We can provide a solution to the astronaut's problem (see the first chapter). * Partial program correctness can be checked efficiently. Techniques such as model checking can be applied to a wide range of applications (including operating system kernels). Bugs are all but eliminated. Windows bluescreens no longer happen. * Generation of exponentially many training sets would make tasks such as voice recognition, computer vision and natural language processing computationally easy. * Mathematical proofs (of, say, 100 pages) can be generated efficiently. Computers can be used to find proofs for some open problems. * We can perform the exponential search required to find passwords, or to break encryption keys, in polynomial time. Internet privacy is no longer possible using encryption (e.g. using SSH). Internet commerce and banking are no longer possible. Safe communication is no longer possible (at all levels). Any computer-controlled facility (public, military, etc.) which is connected to the Internet has considerable potential of being compromised. ==== - Remark (Practical applications of reductions) ==== //As illustrated before, reductions of the type $math[\leq_p] are a theoretical tool which is useful for proving $math[NP]-hardness. Reductions also have practical applications. For instance, most $math[NP]-complete problems are solved in practice by employing $math[SAT] solvers, which, as discussed in the former chapters, may be quite fast in the general case. Thus, a specific problem instance is cast (via an appropriate transformation) into a formula $math[\varphi], such that $math[\varphi] is satisfiable iff the answer to the instance is// yes. ==== $math[SAT]. The first $math[NP]-complete problem. ==== We observe that the "//hen and egg//" issue still holds in our scenario. To apply our technique, we need an initial $math[NP]-hard problem in the first place. This problem is provided by Cook's Theorem, which proves that $math[SAT] is $math[NP]-complete. The technique for the proof relies on building, for each $math[NTM \mbox{ } M] running on an input $math[w] (and hence, for each problem in $math[NP] and each of its instances), a formula $math[\varphi_{M,w}] such that it is satisfiable iff there exists a sequence in the computation tree of $math[M(w)] leading to **success**.
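To close, here is a minimal sketch of the practical use of reductions mentioned in the remark above: casting an instance of a problem (graph 3-colouring, chosen here purely as an illustration) into a CNF formula that could be handed to any off-the-shelf $math[SAT] solver. The variable numbering scheme and helper names are our own choices:

<code python>
def coloring_to_cnf(n, edges, colors=3):
    # Variable var(v, c) is true iff vertex v receives colour c.
    def var(v, c):
        return v * colors + c + 1   # DIMACS-style variables, numbered from 1

    clauses = []
    for v in range(n):
        # each vertex receives at least one colour ...
        clauses.append([var(v, c) for c in range(colors)])
        # ... and at most one colour
        for c1 in range(colors):
            for c2 in range(c1 + 1, colors):
                clauses.append([-var(v, c1), -var(v, c2)])
    for (u, v) in edges:
        # adjacent vertices never share a colour
        for c in range(colors):
            clauses.append([-var(u, c), -var(v, c)])
    return clauses

# Triangle graph: the resulting formula is satisfiable iff the triangle is
# 3-colourable (it is), so a SAT solver would answer "yes" on this instance.
cnf = coloring_to_cnf(3, [(0, 1), (1, 2), (0, 2)])
print(len(cnf), "clauses, e.g.", cnf[0])
</code>

The transformation runs in time polynomial in the size of the graph, which is exactly the requirement placed on $math[\leq_p] above.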