====== Abstract Data Types - Intro ======

===== An overview of correctness =====

Consider the following list of position papers:
  - [[http://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf | Goto statement considered harmful]]
  - [[http://web.cs.iastate.edu/~hridesh/teaching/362/07/01/papers/p50-liskov.pdf | Programming with abstract data types]]
  - [[https://medium.com/@brianwill/object-oriented-programming-a-personal-disaster-1b044c2383ab#.3q0l88tde | Object Oriented Programming - A personal disaster]]
  - [[http://edge.cs.drexel.edu/regli/Classes/Lisp_papers/worse-is-better.pdf | Lisp: Good News, Bad News, How to Win Big]]

These papers and blog posts each take a strong view (the views neither necessarily overlap nor oppose one another) on **how programs should be developed in the right way**. There is still no consensus on a **correct/healthy** way of writing programs. However, as Donald Knuth describes in his farsighted essay
  * [[http://www.paulgraham.com/knuth.html | Computer Programming as an Art]]
a well-written program will always be **recognised** as such and **appreciated** by a skilful programmer.

**Correctness** of a program refers to the following properties:
  * the program always **returns the desired/intended result** as long as the **input** is **valid** (e.g. a program computing the factorial will actually return $math[n!] as long as the input is a positive integer). This informal definition is known as **partial correctness**;
  * the program furthermore **terminates** for all inputs, which is known as **total correctness**.

Although not obvious right away, there is a **strong link** between:
  * **program correctness** and
  * **program development**

This link is emphasised by the following remarks:
  * //Testing proves the **presence**, not the **absence**, of **bugs**// (E. Dijkstra)
  * It is infeasible to perform **exhaustive testing** for non-trivial software programs. Exhaustive testing means considering all possible inputs of the program at hand.
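Returning to the two notions of correctness defined above, the following minimal sketch contrasts them on the factorial example. The function names ''fact_partial'' and ''fact_total'' are hypothetical, and the sketch assumes that ''int'' has at least 32 bits, so that 12! is the largest factorial it can hold.

```c
#include <assert.h>

/* Returns n! for every valid input (0 <= n <= 12).
   For a negative n the counter moves away from 0, so the loop does not
   stop normally and no result is guaranteed: the function is correct
   only on valid inputs. */
int fact_partial(int n){
    int r = 1;
    while (n != 0){
        r *= n;
        n--;
    }
    return r;
}

/* Same result on valid inputs, but invalid inputs are rejected up front,
   so the function terminates for every possible argument.
   The error value -1 is a convention chosen for this sketch. */
int fact_total(int n){
    if (n < 0 || n > 12)
        return -1;
    int r = 1;
    for (int i = 2; i <= n; i++)
        r *= i;
    return r;
}
```

Testing either function on a handful of valid inputs gives the same answers; only reasoning about the code reveals how the two differ on the inputs that were not tried.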
We recall another of Dijkstra's quotes:
  * //[If] it is not only the programmer's task to produce correct programs but also to demonstrate its correctness in a convincing manner, then the above remarks have a profound influence on the programmer's activity: the object he has to produce must be usefully structured// from [[http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF | Notes on Structured Programming]].

Almost 50 years after Dijkstra's essay, one active research area is the development of programs in such a way that their correctness can be **verified automatically**.

===== Correctness at AA =====

We shall adopt Dijkstra's viewpoint regarding program correctness. To this end, we shall investigate a **tool** which allows us to **structure programs** in such a way that we can efficiently **reason** about their **correctness**. The reasoning process we shall look at is mental, that is, carried out by hand rather than by a machine. However, (semi-)automated program reasoning techniques exist.

The tool is **Abstract Data Types (ADTs)**. In a nutshell:
  * An ADT allows one to create an **interface** between, on one side, the actual type representation (of e.g. a list, tree, etc.) together with its specific operations (e.g. concatenation, tree traversal) and, on the other side, **how the type and operations are being used by other programmers**. The message here is that programmers who employ such types do not need (and should not have) information regarding **implementation**, but only about **behaviour**;
  * ADTs allow specifying operator **behaviours** (via axioms);
  * ADTs allow **making proofs** regarding these behaviours, which allow us to draw conclusions on the correctness of programs which use ADTs;
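  * ADTs are a **programming discipline** which is currently supported in languages such as **Scala** and which is regarded as particularly useful in a wide range of applications.

===== A motivating example =====

Consider the following program:

<code c>
#include <stdlib.h>

#define INDEX_OUT_OF_BOUNDS 1

/* a poor man's exception: the last error code that was raised */
int EXCEPTION = 0;
void throw (int e){ EXCEPTION = e; }

struct AList {
    int* v;        /* storage */
    int sz, len;   /* sz = number of stored elements, len = capacity of v */
};
typedef struct AList AList;

AList Empty(){
    AList l;
    l.v = 0;       /* no storage allocated yet */
    l.sz = 0;
    l.len = 0;
    return l;
}

/* NOTE: the source listing is damaged between the loop header of copy and
   the bounds check of get. The body of copy, the whole of add and the
   header of get are reconstructed from the descriptions that follow the
   listing; they are an assumption, not the original code. */

/* copies src[s_start .. end-1] to dst, starting at dst[d_start] */
void copy (int* src, int s_start, int end, int* dst, int d_start){
    int i;
    for (i = s_start; i < end; i++)
        dst[d_start + i - s_start] = src[i];
}

/* adds e at the end; a full array is replaced by one of double capacity */
AList add (AList l, int e){
    if (l.sz == l.len){
        int new_len = (l.len == 0) ? 1 : 2 * l.len;
        int* w = malloc(new_len * sizeof(int));
        copy(l.v, 0, l.sz, w, 0);
        l.v = w;
        l.len = new_len;
    }
    l.v[l.sz] = e;
    l.sz++;
    return l;
}

int get (AList l, int pos){
    if (pos >= l.sz || pos < 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return 0;
    }
    return l.v[pos];
}

AList ins (AList l, int pos, int e){
    if (pos > l.sz || pos < 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return l;
    }
    if (pos == l.sz)
        return add(l,e);
    else {
        /* store e at pos, then re-insert the displaced element one step to the right */
        int temp = l.v[pos];
        l.v[pos] = e;
        return ins(l,pos+1,temp);
    }
}
</code>

It contains the functions:
  * ''Empty'' - returns an empty list
  * ''copy'' - used for copying elements from one array to another
  * ''add'' - adds an element to a given list (**at the end**)
  * ''get'' - returns the value from a given position
  * ''ins'' - inserts an element at a given position in the list

The ''AList'' (short for ArrayList) represents a list as an array. Whenever the array becomes full, the capacity of the array **doubles**.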
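To get a feeling for what the doubling policy costs, the sketch below counts how many elements are copied in total while n elements are appended one by one. ''copies_with_doubling'' is a hypothetical helper, and the starting step from capacity 0 to capacity 1 is an assumption; the text above only states that the capacity doubles.

```c
#include <assert.h>

/* Simulates n appends to an initially empty array list and returns the
   total number of element copies caused by capacity growth.
   Assumption: the capacity goes 0 -> 1 and doubles from then on. */
long copies_with_doubling(int n){
    long copies = 0;
    int size = 0, capacity = 0;
    for (int k = 0; k < n; k++){
        if (size == capacity){
            copies += size;   /* every stored element moves to the new array */
            capacity = (capacity == 0) ? 1 : 2 * capacity;
        }
        size++;
    }
    return copies;
}
```

Under these assumptions the total number of copies stays below 2n, so although a single ''add'' may copy the whole array, the copying work averages out to a constant number of element moves per ''add''.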
Now consider the following code:

<code c>
#include <stdlib.h>

#define INDEX_OUT_OF_BOUNDS 1

int EXCEPTION = 0;
void throw (int e){ EXCEPTION = e; }

struct LList {
    struct LList* next;
    int val;
};
typedef struct LList* LList;

/* the empty list is the null pointer */
LList Empty(){ return 0; }

LList add (LList l, int e){
    LList n = malloc(sizeof(struct LList));
    n->val = e;
    n->next = l;
    return n;
}

int get (LList l, int pos){
    if (pos < 0 || l == 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return 0;
    }
    if (pos == 0)
        return l->val;
    else
        return get(l->next,pos-1);
}

void ins (LList l, int pos, int e){
    if (pos < 0 || l == 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return;
    }
    if (pos == 0){
        /* link the new node in right after l */
        LList n = malloc(sizeof(struct LList));
        n->val = e;
        n->next = l->next;
        l->next = n;
    }
    else
        ins (l->next,pos-1,e);
}
</code>

The ''LList'' (abbreviating LinkedList) offers precisely the same functions as the ''AList''. The sole difference in behaviour here is that ''add'' will add an element **at the beginning** of the list.

Of course, the implementations are conceptually different. The efficiency of each list implementation is also different. For instance:
  * inserting into an ''LList'' takes constant time once the insertion point has been reached, while inserting into an ''AList'' takes **linear time** w.r.t. the size of the list, in the worst case (we shall refine this analysis later);
  * accessing a given position in an ''LList'' takes **linear time**, while in an ''AList'' it takes constant time.

Leaving efficiency aside, the behaviour of both lists is the same, and any program is expected to **behave in the same way irrespective of the type of list implementation** which is deployed.

With this in mind, we develop the following **List abstraction**. For convenience we (temporarily) use C code to describe this abstraction:

<code c>
List Empty();
List cons (int e, List l);
int head (List l);
List tail (List l);
int isEmpty(List l);
</code>

We can group the above function declarations in two categories:
  * **constructors** (''cons'' and ''Empty''). Using **any combination of these functions** we can **create any** list;
  * **observers** (''head'', ''tail'', ''isEmpty'').
Using any combination of these functions we can **inspect any** element of the list.

Furthermore, **any operation defined on lists can be expressed using a combination of these functions**:

<code c>
#include <stdio.h>

List add (List l, int e){ return cons(e,l); }

int get (List l, int pos){
    if (pos == 0)
        return head(l);
    return get(tail(l),pos-1);
}

List ins (List l, int pos, int e){
    if (pos == 0)
        return cons(e,l);
    return cons(head(l),ins(tail(l),pos-1,e));
}

/* prints the elements from head to end, followed by [] */
void show (List l){
    if (isEmpty(l))
        printf("[]\n");
    else {
        printf("%i ",head(l));
        show(tail(l));
    }
}
</code>

In the previous code, we have shown implementations of the functions ''add'', ''ins'', ''get'' together with the display function ''show''. These implementations are **independent** of how the list type is represented:
  * the code can be **reused directly**;
  * however, in C we cannot deploy exactly the same code, since we do not have an abstraction mechanism such as classes and inheritance. Hence, the **same code** needs to be copy-pasted for each implementation.

Note that we have defined list operations without actually implementing our list abstraction. The implementations follow for ''AList'':

<code c>
struct AList {
    int* v;
    int sz, len;
};
typedef struct AList AList;

AList Empty(){
    AList l;
    l.v = 0;
    l.sz = 0;
    l.len = 0;
    return l;
}

void copy (int* src, int s_start, int end, int* dst, int d_start){
    int i;
    /* the source listing breaks off inside this loop header; the loop is
       reconstructed from the signature of copy */
    for (i = s_start; i < end; i++)
        dst[d_start + i - s_start] = src[i];
}

/* the AList versions of cons, isEmpty, head and tail are missing from the source listing */
</code>

as well as for ''LList'':

<code c>
#include <stdlib.h>

struct LList {
    struct LList* next;
    int val;
};
typedef struct LList* LList;

LList Empty(){ return 0; }

LList cons(int e, LList l){
    LList n = malloc(sizeof(struct LList));
    n->val = e;
    n->next = l;
    return n;
}

int isEmpty(LList l){ return l==0; }

/* head and tail expect a non-empty list */
int head(LList l){ return l->val; }

LList tail(LList l){ return l->next; }
</code>

Let us recap:
  * we have introduced an **abstraction for Lists** which consists of:
    * functions which define **list construction** (''cons'', ''Empty'')
    * functions which define **list inspection** (''head'', ''tail'', ''isEmpty'')
  * we have **separated** List implementation from list functionality.
All other functionality is defined w.r.t. the above functions; no other implementation information is used.
  * This allows us to reason about list-manipulating programs **independently** of how lists are implemented.
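As an illustration of this last point, here is a minimal sketch of how an array-backed list might provide the five interface functions, with ''get'' and ''ins'' reused word for word from the generic code. The representation is an assumption made for this example and is not taken from the original ''AList'' code: the elements are stored in reverse, and ''cons'' copies them into a fresh array so that existing lists are never modified. Allocation failures and freeing memory are ignored to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical array-backed representation: the elements are stored in
   reverse, so the head of the list is the LAST used cell, v[sz-1]. */
typedef struct { int* v; int sz; } AList;
typedef AList List;   /* the generic code below only ever says "List" */

List Empty(void){ List l = { 0, 0 }; return l; }

int isEmpty(List l){ return l.sz == 0; }

/* Builds a fresh array of sz+1 cells; takes linear time per call. */
List cons(int e, List l){
    List r;
    r.sz = l.sz + 1;
    r.v = malloc(r.sz * sizeof(int));
    for (int i = 0; i < l.sz; i++)
        r.v[i] = l.v[i];
    r.v[l.sz] = e;
    return r;
}

/* Precondition for head and tail: the list is not empty. */
int head(List l){ return l.v[l.sz - 1]; }

/* Shares storage with l; this is safe here because no function ever
   writes into an array that an existing list refers to. */
List tail(List l){ List r = { l.v, l.sz - 1 }; return r; }

/* Generic code, written only against the five functions above. */
int get(List l, int pos){
    if (pos == 0)
        return head(l);
    return get(tail(l), pos - 1);
}

List ins(List l, int pos, int e){
    if (pos == 0)
        return cons(e, l);
    return cons(head(l), ins(tail(l), pos - 1, e));
}
```

Replacing the first half of the sketch by the ''LList'' functions (and the ''typedef'' by one naming ''LList'') leaves ''get'' and ''ins'' untouched, which is exactly the separation described in the recap.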