====== Abstract Data Types - Intro ======
===== An overview of correctness =====
Consider the following list of position papers:
- [[http://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf | Goto statement considered harmful]]
- [[http://web.cs.iastate.edu/~hridesh/teaching/362/07/01/papers/p50-liskov.pdf | Programming with abstract data types]]
- [[https://medium.com/@brianwill/object-oriented-programming-a-personal-disaster-1b044c2383ab#.3q0l88tde | Object Oriented Programming - A personal disaster]]
- [[http://edge.cs.drexel.edu/regli/Classes/Lisp_papers/worse-is-better.pdf | Lisp: Good News, Bad News, How to Win Big]]
Each of these papers and blog posts takes a strong stance (the stances neither necessarily overlap nor oppose one another) on **how programs should be developed in the right way**.
There is still no consensus on a **correct/healthy** way of writing programs. However, as Donald Knuth describes in his farsighted essay:
* [[http://www.paulgraham.com/knuth.html | Computer Programming as an Art]]
a well-written program will always be **recognised** as such and **appreciated** by a skilful programmer.
**Correctness** of a program refers to the following properties:
* whenever the program terminates on a **valid input**, it **returns the desired/intended result** (e.g. a program computing the factorial will actually return $math[n!] as long as the input $math[n] is a non-negative integer). This informal definition is known as **partial correctness**;
* the program furthermore **terminates** for all valid inputs, which, together with partial correctness, is known as **total correctness**.
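As a small illustration of the two notions, consider the following sketch of the factorial example. The function name ''fact'', the choice of ''unsigned long long'' and the upper bound of 20 on valid inputs (beyond which the result no longer fits in 64 bits) are assumptions made for this illustration only.

```c
#include <assert.h>

/* Precondition (valid input): 0 <= n <= 20, so that n! fits in 64 bits. */
unsigned long long fact(int n) {
    unsigned long long r = 1;
    int i;
    assert(n >= 0 && n <= 20);   /* reject invalid inputs loudly */
    /* Loop invariant: before the iteration for a given i, r == (i-1)! */
    for (i = 1; i <= n; i++)
        r *= (unsigned long long)i;
    /* On exit i == n+1, hence r == n! (partial correctness).
       The loop runs exactly n times, hence it terminates; together with
       the invariant this gives total correctness on valid inputs. */
    return r;
}
```

The invariant argues that the returned value is the right one, while the bounded loop argues that a value is returned at all.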
Although not obvious right away, there is a **strong link** between:
* **program correctness** and
* **program development**
This link is emphasised by the following remarks:
* //Testing proves the **presence** not the **absence** of **bugs**// (E. Dijkstra)
* It is infeasible to perform **exhaustive testing** for non-trivial software programs. Exhaustive testing means considering all possible inputs of the program at hand.
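To get a feeling for the numbers, assume (hypothetically) a function which takes two 32-bit integers as input, and a test harness able to run $math[10^9] tests per second. A rough estimate of the time needed for exhaustive testing is then:

```latex
\[
  2^{32} \cdot 2^{32} = 2^{64} \approx 1.8 \times 10^{19} \text{ inputs},
  \qquad
  \frac{1.8 \times 10^{19} \text{ inputs}}{10^{9} \text{ inputs/s}}
  \approx 1.8 \times 10^{10} \text{ s} \approx 585 \text{ years}.
\]
```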
We recall another of Dijkstra's quotes:
* //If we take the position that it is not only the programmer's task to produce a correct program but also to demonstrate its correctness in a convincing manner, then the above remarks have a profound influence on the programmer's activity: the object he has to produce must be usefully structured//
from [[http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF | Notes on Structured Programming]].
Almost 50 years after Dijkstra's essay, an active research area is concerned with techniques for developing programs in such a way that their correctness can be **verified automatically**.
===== Correctness at AA =====
We shall adopt Dijkstra's viewpoint regarding program correctness. To this end, we shall investigate a **tool** which allows us to **structure programs** in such a way that we can efficiently **reason** about their **correctness**. The reasoning process we shall look at is carried out by hand (it is mental, not performed automatically by a machine). However, (semi-)automated program reasoning techniques exist.
The tool is **Abstract Data Types (ADTs)**. In a nutshell:
* An ADT creates an **interface** between the actual type representation (e.g. of a list or a tree) together with its specific operations (e.g. concatenation, tree traversal) on the one hand, and **how the type and operations are used by other programmers** on the other. The message here is that programmers who employ such types do not need (and should not have) information regarding the **implementation**, but only about the **behaviour**;
* ADTs allow specifying operator **behaviours** (via axioms);
* ADTs allow **making proofs** regarding these behaviours, which in turn allow us to draw conclusions about the correctness of programs which use ADTs;
* ADTs are a **programming discipline** which is currently supported in languages such as **Scala** and which is regarded as particularly useful in a wide range of applications.
===== A motivating example =====
Consider the following program:
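#include <stdlib.h>
#define INDEX_OUT_OF_BOUNDS 1
int EXCEPTION = 0;
/* record an error code; callers inspect EXCEPTION afterwards */
void throw (int e){
    EXCEPTION = e;
}
struct AList {
    int* v;       /* the underlying array */
    int sz, len;  /* sz = number of stored elements, len = capacity of v */
};
typedef struct AList AList;
AList Empty(){
    AList l;
    l.v = 0;
    l.sz = 0;
    l.len = 0;
    return l;
}
/* copy src[s_start..end-1] into dst, starting at dst[d_start] */
void copy (int* src, int s_start, int end, int* dst, int d_start){
    int i;
    for (i = s_start; i < end; i++)
        dst[d_start + i - s_start] = src[i];
}
/* add e at the end; when the array is full, its capacity doubles */
AList add (AList l, int e){
    if (l.sz == l.len){
        int len = (l.len == 0) ? 1 : 2 * l.len;
        int* v = malloc(len * sizeof(int));
        copy(l.v,0,l.sz,v,0);
        l.v = v;
        l.len = len;
    }
    l.v[l.sz] = e;
    l.sz++;
    return l;
}
/* return the element at position pos */
int get (AList l, int pos){
    if (pos >= l.sz || pos < 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return 0;
    }
    return l.v[pos];
}
/* insert e at position pos, shifting the subsequent elements to the right */
AList ins (AList l, int pos, int e){
    if (pos > l.sz || pos < 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return l;
    }
    if (pos == l.sz)
        return add(l,e);
    else
    {
        int temp = l.v[pos];
        l.v[pos] = e;
        return ins(l,pos+1,temp);
    }
}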
It contains the methods:
* ''Empty'' - returns an empty list
* ''copy'' - used for copying elements from one array to another
* ''add'' - adds an element to a given list (**at the end**)
* ''get'' - returns the value from a given position
* ''ins'' - inserts an element at a given position in the list
The ''AList'' (short for ArrayList) represents a list as an array. Whenever the array becomes full, the capacity of the array **doubles**.
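The doubling strategy keeps the total copying work low. The sketch below is an illustration under assumptions, not part of the list code: it only tracks the size and the capacity of a hypothetical array list and counts how many element copies ''n'' successive additions at the end would trigger. The function name ''copies_for'' is hypothetical.

```c
/* Count the element copies caused by n successive add-at-end operations
   when the capacity doubles each time the array is full.
   Only sizes are simulated; no actual array is allocated. */
long copies_for(long n) {
    long sz = 0, len = 0, copies = 0, k;
    for (k = 0; k < n; k++) {
        if (sz == len) {                     /* array full: grow it */
            len = (len == 0) ? 1 : 2 * len;
            copies += sz;                    /* each stored element moves once */
        }
        sz++;                                /* store the new element */
    }
    return copies;
}
```

For instance, ''copies_for(1000)'' yields 1023, i.e. fewer than two copies per added element on average, even though a single addition may copy the whole array.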
Now consider the following code:
#include <stdlib.h>
#define INDEX_OUT_OF_BOUNDS 1
int EXCEPTION = 0;
/* record an error code; callers inspect EXCEPTION afterwards */
void throw (int e){
    EXCEPTION = e;
}
struct LList {
    struct LList* next;
    int val;
};
typedef struct LList* LList;
/* the empty list is the null pointer */
LList Empty(){
    return 0;
}
/* add e at the beginning of the list */
LList add (LList l, int e){
    LList n = malloc(sizeof(struct LList));
    n->val = e;
    n->next = l;
    return n;
}
/* return the element at position pos */
int get (LList l, int pos){
    if (pos < 0 || l == 0){
        throw(INDEX_OUT_OF_BOUNDS);
        return 0;
    }
    if (pos == 0)
        return l->val;
    else
        return get(l->next,pos-1);
}
/* insert e at position pos; the (possibly new) first node is returned */
LList ins (LList l, int pos, int e){
    if (pos < 0 || (pos > 0 && l == 0)){
        throw(INDEX_OUT_OF_BOUNDS);
        return l;
    }
    if (pos == 0)
        return add(l,e);
    l->next = ins(l->next,pos-1,e);
    return l;
}
The ''LList'' (abbreviating LinkedList) contains precisely the same methods as the ''AList''. The sole difference in behaviour is that ''add'' adds an element **at the beginning** of the list. Of course, the two implementations are conceptually different, and so is their efficiency. For instance:
* inserting into a ''LList'' takes constant time once the insertion point has been reached, while inserting into an ''AList'' takes **linear time** w.r.t. the size of the list in the worst case, since the subsequent elements must be shifted. (We shall refine this analysis later).
* accessing a given position in a ''LList'' takes **linear time**, while in an ''AList'' it takes constant time.
Leaving efficiency aside, the behaviour of both lists is the same, and any program is expected to **behave in the same way irrespective of the type of list implementation** which is deployed.
With this in mind, we develop the following **List abstraction**. For convenience we (temporarily) use C code to describe this abstraction:
List Empty();
List cons (int e, List l);
int head (List l);
List tail (List l);
int isEmpty(List l);
We can group the above function definitions in two categories:
* **constructors** (''cons'' and ''Empty''). By combining these functions we can **create any** list;
* **observers** (''head'', ''tail'', ''isEmpty''). By combining these functions we can **inspect any** element of the list.
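The behaviour expected from these functions can be stated as axioms which relate the observers to the constructors, for instance $math[head(cons(e,l)) = e]. The sketch below checks four such axioms on given values. It assumes a linked representation for ''List'', and the helper name ''check_axioms'' is hypothetical. Checking a few samples is, of course, testing and not a proof.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed representation for this sketch: a singly linked node. */
struct Node { int val; struct Node* next; };
typedef struct Node* List;

List Empty(void) { return 0; }
List cons(int e, List l) {
    List n = malloc(sizeof(struct Node));
    n->val = e;
    n->next = l;
    return n;
}
int head(List l) { return l->val; }
List tail(List l) { return l->next; }
int isEmpty(List l) { return l == 0; }

/* Axioms, for every int e and every List l:
     isEmpty(Empty())    is true
     isEmpty(cons(e,l))  is false
     head(cons(e,l))     == e
     tail(cons(e,l))     == l                          */
int check_axioms(int e, List l) {
    assert(isEmpty(Empty()));
    assert(!isEmpty(cons(e, l)));
    assert(head(cons(e, l)) == e);
    assert(tail(cons(e, l)) == l);   /* pointer equality in this representation */
    return 1;
}
```

Note that no axiom is given for ''head(Empty())'' or ''tail(Empty())'': in this sketch their behaviour is simply left undefined.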
Furthermore, **any operation defined on lists can be expressed using a combination of these functions**:
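#include <stdio.h>
/* add e at the beginning of the list */
List add (List l, int e){
    return cons(e,l);
}
/* return the element at position pos (pos must be a valid position) */
int get (List l, int pos){
    if (pos == 0)
        return head(l);
    return get(tail(l),pos-1);
}
/* return a list in which e sits at position pos */
List ins (List l, int pos, int e){
    if (pos == 0)
        return cons(e,l);
    return cons(head(l),ins(tail(l),pos-1,e));
}
/* print the elements, followed by [] for the empty list */
void show (List l){
    if (isEmpty(l))
        printf("[]\n");
    else{
        printf("%i ",head(l));
        show(tail(l));
    }
}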
In the previous code, we have shown implementations of the functions ''add'', ''ins'', ''get'' together with the display function ''show''. These implementations are **independent** of how the list type is represented:
* the code can be **reused directly**
* however, in C we cannot deploy exactly the same code, since we do not have an abstraction mechanism such as classes and inheritance. Hence, the **same code** needs to be copy-pasted for each implementation.
Note that we have defined list operations without actually implementing our list abstractions. The implementations follow for ''AList'':
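struct AList {
    int* v;       /* the underlying array */
    int sz, len;  /* sz = number of stored elements, len = capacity of v */
};
typedef struct AList AList;
AList Empty(){
    AList l;
    l.v = 0;
    l.sz = 0;
    l.len = 0;
    return l;
}
/* copy src[s_start..end-1] into dst, starting at dst[d_start] */
void copy (int* src, int s_start, int end, int* dst, int d_start){
    int i;
    for (i = s_start; i < end; i++)
        dst[d_start + i - s_start] = src[i];
}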
as well as for ''LList'':
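#include <stdlib.h>
struct LList {
    struct LList* next;
    int val;
};
typedef struct LList* LList;
/* the empty list is the null pointer */
LList Empty(){
    return 0;
}
/* build a new list whose first element is e and whose rest is l */
LList cons(int e, LList l){
    LList n = malloc(sizeof(struct LList));
    n->val = e;
    n->next = l;
    return n;
}
int isEmpty(LList l){
    return l==0;
}
/* head and tail must only be called on non-empty lists */
int head(LList l){
    return l->val;
}
LList tail(LList l){
    return l->next;
}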
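For comparison, one possible array-based realisation of the same five functions is sketched below. It is an illustration under assumptions rather than the ''AList'' code of this page: the elements are stored in reverse order (the head sits at index ''sz-1''), ''cons'' copies the whole array (giving up the capacity-doubling strategy in order to leave existing lists unmodified), and memory is never freed. The generic ''get'' and ''ins'' are then compiled unchanged against it.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed array representation: elements are stored in reverse order,
   so the head of the list sits at index sz-1. */
struct AList { int* v; int sz; };
typedef struct AList List;

List Empty(void) { List l; l.v = 0; l.sz = 0; return l; }
int isEmpty(List l) { return l.sz == 0; }
int head(List l) { return l.v[l.sz - 1]; }

/* tail shares the array and simply "forgets" the last slot */
List tail(List l) { l.sz--; return l; }

/* cons copies into a fresh array, so existing lists are never modified */
List cons(int e, List l) {
    List r;
    int i;
    r.v = malloc((l.sz + 1) * sizeof(int));
    for (i = 0; i < l.sz; i++)
        r.v[i] = l.v[i];
    r.v[l.sz] = e;
    r.sz = l.sz + 1;
    return r;
}

/* generic code, written only in terms of the five functions above */
int get(List l, int pos) {
    if (pos == 0)
        return head(l);
    return get(tail(l), pos - 1);
}
List ins(List l, int pos, int e) {
    if (pos == 0)
        return cons(e, l);
    return cons(head(l), ins(tail(l), pos - 1, e));
}
```

Only the first half of the sketch knows about arrays; ''get'' and ''ins'' would read exactly the same on top of the linked representation.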
Let us recap:
* we have introduced an **abstraction for Lists** which consists of:
* functions which define **list construction** (''cons'', ''Empty'')
* functions which define **list inspection** (''head'', ''tail'', ''isEmpty'')
* we have **separated** List implementation from list functionality. All other functionality is defined w.r.t. the above functions. No other implementation info is used.
* This allows us to reason about list-manipulating programs **independently** of how lists are implemented.
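As a first taste of such reasoning, assume the usual list axioms $math[head(cons(e,l)) = e] and $math[tail(cons(e,l)) = l] (they are an assumption here, not something established above). Unfolding the definition of ''get'' then yields, for any element $math[e], any list $math[l] and any position $math[n \geq 0]:

```latex
\begin{align*}
  get(cons(e,l),\, 0)   &= head(cons(e,l)) = e \\
  get(cons(e,l),\, n+1) &= get(tail(cons(e,l)),\, n) = get(l,\, n)
\end{align*}
```

Since only the axioms and the definition of ''get'' were used, both equalities hold for any implementation which satisfies the axioms.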