====== Abstract Data Types - Intro ======

===== An overview of correctness =====

Consider the following list of position papers:
  - [[http://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf | Goto statement considered harmful]]
  - [[http://web.cs.iastate.edu/~hridesh/teaching/362/07/01/papers/p50-liskov.pdf | Programming with abstract data types]]
  - [[https://medium.com/@brianwill/object-oriented-programming-a-personal-disaster-1b044c2383ab#.3q0l88tde | Object Oriented Programming - A personal disaster]]
  - [[http://edge.cs.drexel.edu/regli/Classes/Lisp_papers/worse-is-better.pdf | Lisp: Good News, Bad News, How to Win Big]]

Each of these papers/blog posts takes a strong stance on **how programs should be developed in the right way**. The stances do not necessarily overlap, nor are they necessarily in opposition. There is still no consensus on a **correct/healthy** way of writing programs. However, as Donald Knuth points out in his farsighted essay [[http://www.paulgraham.com/knuth.html | Computer Programming as an Art]], a well-written program will always be **recognised** as such and **appreciated** by a skilful programmer.

**Correctness** of a program refers to the following properties:
  * whenever the program receives a **valid input** and terminates, it **returns the desired/intended result** (e.g. a program computing the factorial actually returns $math[n!] whenever its input is a natural number). This informal definition is known as **partial correctness**;
  * if, furthermore, the program **terminates** on all valid inputs, it is said to be **totally correct** (**total correctness**).

Although this is not obvious right away, there is a **strong link** between:
  * **program correctness** and
  * **program development**.

This link is emphasised by the following remarks:
  * //Testing proves the **presence** not the **absence** of **bugs**// (E. Dijkstra)
  * It is infeasible to perform **exhaustive testing** for non-trivial software programs. Exhaustive testing means running the program at hand on all of its possible inputs.
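To make the two notions of correctness concrete, here is a minimal C sketch of the factorial example. The name ''fact'' and the precondition ''n <= 20'' are our own assumptions. The bound only ensures that $math[n!] fits into an ''unsigned long long''. Partial correctness follows from the loop invariant recorded in the comments. Termination, and hence total correctness, follows because the counter ''i'' strictly increases and the loop stops as soon as ''i > n''.

```c
/* Computes n! for a natural number n.
   Assumed precondition (our notion of a "valid" input): n <= 20,
   because 21! no longer fits into an unsigned long long. */
unsigned long long fact(unsigned int n){
	unsigned long long r = 1;
	unsigned int i;
	/* Invariant: at the start of every iteration, r == (i - 1)! */
	for (i = 2; i <= n; i++)
		r *= i;
	/* For n >= 1 the loop exits with i == n + 1, so r == n! by the invariant;
	   for n == 0 the body never runs and r == 1 == 0!. */
	return r;
}
```

For instance, ''fact(5)'' returns 120. Inputs above 20 violate the assumed precondition: unsigned arithmetic then silently wraps around. The function is therefore correct only with respect to this notion of valid input.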
We recall another of Dijkstra's quotes, from [[http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF | Notes on Structured Programming]]:
  * //[...] it is not only the programmer's task to produce correct programs but also to demonstrate its correctness in a convincing manner, then the above remarks have a profound influence on the programmer's activity: the object he has to produce must be usefully structured//

Almost 50 years after Dijkstra's essay, developing programs in such a way that their correctness can be **verified automatically** remains an active research area.

===== Correctness at AA =====

We shall adopt Dijkstra's viewpoint on program correctness. To this end, we shall investigate a **tool** which allows us to **structure programs** in such a way that we can efficiently **reason** about their **correctness**. The reasoning we shall carry out is done by hand: it is not automatic, i.e. not machine-implemented. However, (semi-)automated program reasoning techniques do exist.

The tool is **Abstract Data Types (ADTs)**. In a nutshell:
  * An ADT provides an **interface** which separates the actual type representation (of e.g. a list, a tree, etc.) and its specific operations (e.g. concatenation, tree traversal) from **how the type and its operations are used by other programmers**. The message here is that programmers who use such types do not need (and should not have) information about the **implementation**, but only about the **behaviour**;
  * ADTs allow specifying the **behaviour** of operations (via axioms);
  * ADTs allow **making proofs** about these behaviours, which in turn allow us to draw conclusions about the correctness of programs which use ADTs;
  * ADTs are a **programming discipline** which is supported in languages such as **Scala** and which is regarded as particularly useful in a wide range of applications.

===== A motivating example =====

Consider the following program:


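#include <stdlib.h>

#define INDEX_OUT_OF_BOUNDS 1

int EXCEPTION = 0;
/* emulates an exception by recording an error code */
void throw (int e){
	EXCEPTION = e;
}

struct AList {
	int* v;       /* the array holding the elements */
	int sz, len;  /* sz: number of elements, len: capacity of v */
};

typedef struct AList AList;

AList Empty(){
	AList l;
	l.v = 0;
	l.sz = 0;
	l.len = 0;
	return l;
}

/* copies src[s_start .. end-1] into dst, starting at position d_start */
void copy (int* src, int s_start, int end, int* dst, int d_start){
	int i;
	for (i = s_start; i < end; i++)
		dst[d_start + i - s_start] = src[i];
}

/* adds e at the end of l; when the array is full, its capacity doubles */
AList add (AList l, int e){
	if (l.sz == l.len){
		int new_len = (l.len == 0) ? 1 : 2 * l.len;
		int* nv = malloc(new_len * sizeof(int));
		copy(l.v, 0, l.sz, nv, 0);
		free(l.v);   /* the old array is released: continue with the returned list */
		l.v = nv;
		l.len = new_len;
	}
	l.v[l.sz] = e;
	l.sz++;
	return l;
}

int get (AList l, int pos){
	if (pos >= l.sz || pos < 0){
		throw(INDEX_OUT_OF_BOUNDS);
		return 0;
	}
	return l.v[pos];
}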

AList ins (AList l, int pos, int e){
	if (pos > l.sz || pos < 0){
		throw(INDEX_OUT_OF_BOUNDS);
		return l;
	}
	if (pos == l.sz)
		return add(l,e);
	else
	{
		int temp = l.v[pos];
		l.v[pos] = e;
		return ins(l,pos+1,temp);
	}
}

It contains the functions:
  * ''Empty'' - returns an empty list
  * ''copy'' - copies elements from one array to another
  * ''add'' - adds an element to a given list (**at the end**)
  * ''get'' - returns the value at a given position
  * ''ins'' - inserts an element at a given position in the list

The ''AList'' (short for ArrayList) represents a list as an array. ''v'' stores the elements, ''sz'' is the number of stored elements, and ''len'' is the capacity of ''v''. Whenever the array becomes full, its capacity **doubles**. Now consider the following code:


#include <stdlib.h>

#define INDEX_OUT_OF_BOUNDS 1

int EXCEPTION = 0;
void throw (int e){
	EXCEPTION = e;
}

struct LList {
	struct LList* next;
	int val;
};

typedef struct LList* LList;

LList Empty(){
	return 0;
}
LList add (LList l, int e){
	LList n = malloc(sizeof(struct LList));
	n->val = e;

	n->next = l;
	return n;
}
int get (LList l, int pos){
	if (pos < 0 || l == 0){
		throw(INDEX_OUT_OF_BOUNDS);
		return 0;
	}
	if (pos == 0)
		return l->val;
	else
		return get(l->next,pos-1);
}
/* inserts e at position pos (0 = front) and returns the resulting list */
LList ins (LList l, int pos, int e){
	if (pos < 0 || (pos > 0 && l == 0)){
		throw(INDEX_OUT_OF_BOUNDS);
		return l;
	}

	if (pos == 0)
		return add(l, e);   /* new node in front of l */
	l->next = ins(l->next, pos-1, e);
	return l;
}
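The doubling policy of ''AList'' is what keeps appends cheap on average. The following minimal sketch mimics ''add'' and counts how many elements all resizes copy together. The type ''DynArr'', its ''copies'' counter and the function ''dyn_push'' are hypothetical and are introduced only for this illustration. Under this policy, appending $math[n] elements copies fewer than $math[2n] elements in total.

```c
#include <stdlib.h>

/* Hypothetical growable array, used only to count the cost of resizing. */
typedef struct {
	int* v;       /* elements */
	int sz;       /* number of stored elements */
	int len;      /* capacity of v */
	long copies;  /* elements copied by all resizes so far */
} DynArr;

/* Appends e at the end; when the array is full, its capacity doubles
   (from 0 it becomes 1). Returns 0 on success, -1 if malloc fails. */
int dyn_push(DynArr* a, int e){
	if (a->sz == a->len){
		int new_len = (a->len == 0) ? 1 : 2 * a->len;
		int* nv = malloc(new_len * sizeof(int));
		int i;
		if (nv == 0)
			return -1;
		for (i = 0; i < a->sz; i++)
			nv[i] = a->v[i];
		a->copies += a->sz;
		free(a->v);
		a->v = nv;
		a->len = new_len;
	}
	a->v[a->sz] = e;
	a->sz++;
	return 0;
}
```

Starting from ''DynArr a = {0, 0, 0, 0};'' and pushing 1000 elements, the capacity goes through 1, 2, 4, ..., 1024. The resizes copy 1 + 2 + ... + 512 = 1023 elements, i.e. about one copy per append on average.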

The ''LList'' (abbreviating LinkedList) contains precisely the same functions as the ''AList''. The sole difference in behaviour is that ''add'' adds an element **at the beginning** of the list. Of course, the two implementations are conceptually different, and so is their efficiency. For instance:
  * once the insertion position has been reached, inserting into an ''LList'' takes constant time. Inserting into an ''AList'' takes **linear time** w.r.t. the size of the list in the worst case, since the subsequent elements must be shifted (we shall refine this analysis later);
  * accessing a given position takes **linear time** in an ''LList'', but constant time in an ''AList''.

Leaving efficiency (and the position at which ''add'' inserts) aside, both lists behave in the same way. Any program is therefore expected to **behave in the same way irrespective of the list implementation** which is deployed. With this in mind, we develop the following **List abstraction**. For convenience, we (temporarily) use C code to describe this abstraction:


List Empty();
List cons (int e, List l);
int head (List l);
List tail (List l);
int isEmpty(List l);

We can group the above functions into two categories:
  * **constructors** (''cons'' and ''Empty''): using **combinations of these functions** we can **create any** list;
  * **observers** (''head'', ''tail'', ''isEmpty''): using combinations of these functions we can **inspect any** element of the list.

Furthermore, **any operation defined on lists can be expressed using a combination of these functions**:


List add (List l, int e){
	return cons(e,l);
}

int get (List l, int pos){
	if (pos == 0) 
		return head(l);
	return get(tail(l),pos-1);
}

List ins (List l, int pos, int e){
	if (pos == 0)
		return cons(e,l);
	return cons(head(l),ins(tail(l),pos-1,e));
}

void show (List l){
	if (isEmpty(l))
		printf("[]\n");
	else{
		printf("%i ",head(l));
		show(tail(l));
	}
}

In the previous code, we have shown implementations of the functions ''add'', ''ins'' and ''get'', together with the display function ''show''. These implementations are **independent** of how the list type is implemented:
  * in principle, the code can be **reused directly** for any list implementation;
  * however, in C we cannot deploy exactly the same code for several implementations at once, since C lacks an abstraction mechanism such as classes and inheritance. Hence, the **same code** needs to be copy-pasted for each implementation.

Note that we have defined these list operations without actually implementing the list abstraction. The implementation for ''AList'' follows:


struct AList {
	int* v;
	int sz, len;
};

typedef struct AList AList;

AList Empty(){
	AList l;
	l.v = 0;
	l.sz = 0;
	l.len = 0;
	return l;
}
void copy (int* src, int s_start, int end, int* dst, int d_start){
	int i;
	for (i = s_start; i < end; i++)
		dst[d_start + i - s_start] = src[i];
}

as well as for ''LList'':


#include <stdlib.h>

struct LList {
	struct LList* next;
	int val;
};

typedef struct LList* LList;

LList Empty(){
	return 0;
}
LList cons(int e, LList l){
	LList n = malloc(sizeof(struct LList));
	n->val = e;

	n->next = l;
	return n;
}
int isEmpty(LList l){
	return l==0;
}
int head(LList l){
	return l->val;
}
LList tail(LList l){
	return l->next;
}
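For comparison with the ''LList'' version, here is one possible way of realising ''cons'', ''head'', ''tail'' and ''isEmpty'' on top of the ''AList'' representation. It is a minimal sketch under our own assumptions, not the course's implementation. For simplicity, ''cons'' and ''tail'' build fresh arrays in linear time, so the argument list is never modified. Memory is never freed, and ''malloc'' failures are not checked.

```c
#include <stdlib.h>

struct AList {
	int* v;
	int sz, len;   /* sz: number of elements, len: capacity of v */
};

typedef struct AList AList;

/* copies src[s_start .. end-1] into dst, starting at position d_start */
void copy (int* src, int s_start, int end, int* dst, int d_start){
	int i;
	for (i = s_start; i < end; i++)
		dst[d_start + i - s_start] = src[i];
}

AList Empty(){
	AList l;
	l.v = 0;
	l.sz = 0;
	l.len = 0;
	return l;
}

int isEmpty(AList l){
	return l.sz == 0;
}

/* builds a fresh array holding e followed by the elements of l */
AList cons(int e, AList l){
	AList n;
	n.sz = l.sz + 1;
	n.len = n.sz;
	n.v = malloc(n.len * sizeof(int));
	n.v[0] = e;
	copy(l.v, 0, l.sz, n.v, 1);
	return n;
}

/* precondition: !isEmpty(l) */
int head(AList l){
	return l.v[0];
}

/* precondition: !isEmpty(l); returns a fresh copy of all but the first element */
AList tail(AList l){
	AList t;
	t.sz = l.sz - 1;
	t.len = t.sz;
	t.v = malloc((t.sz > 0 ? t.sz : 1) * sizeof(int));
	copy(l.v, 1, l.sz, t.v, 0);
	return t;
}
```

With these definitions, ''cons(1, cons(2, cons(3, Empty())))'' stores ''[1, 2, 3]'' in ''v''. Taking ''tail'' three times yields a list for which ''isEmpty'' holds.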


Let us recap:
  * we have introduced an **abstraction for Lists** which consists of:
    * functions which define **list construction** (''cons'', ''Empty'')
    * functions which define **list inspection** (''head'', ''tail'', ''isEmpty'')
  * we have **separated** the list implementation from list functionality: all other functionality is defined solely in terms of the above functions, and no other implementation details are used;
  * this allows us to reason about list-manipulating programs **independently** of how lists are implemented.
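As a small illustration of this separation, the sketch below compiles the generic ''add'', ''get'' and ''ins'' unchanged against the ''LList'' implementation. The pointer type is named ''List'' so that the generic code compiles as written. The sketch also checks, on sample values, the kind of axioms an ADT specification could state for lists. The particular set used here is our own illustration, not taken from the text: ''isEmpty(Empty())'', ''!isEmpty(cons(e,l))'', ''head(cons(e,l)) == e'' and ''tail(cons(e,l))'' equal to ''l''. The helpers ''equal'' and ''axioms_hold'' are hypothetical.

```c
#include <stdlib.h>

/* LList realisation of the List abstraction */
struct LList {
	struct LList* next;
	int val;
};

typedef struct LList* List;

List Empty(){
	return 0;
}
List cons(int e, List l){
	List n = malloc(sizeof(struct LList));
	n->val = e;
	n->next = l;
	return n;
}
int isEmpty(List l){
	return l == 0;
}
int head(List l){
	return l->val;
}
List tail(List l){
	return l->next;
}

/* generic operations: they use nothing but the five functions above */
List add(List l, int e){
	return cons(e, l);
}
int get(List l, int pos){
	if (pos == 0)
		return head(l);
	return get(tail(l), pos - 1);
}
List ins(List l, int pos, int e){
	if (pos == 0)
		return cons(e, l);
	return cons(head(l), ins(tail(l), pos - 1, e));
}

/* hypothetical helper: structural equality, again via observers only */
int equal(List a, List b){
	if (isEmpty(a) || isEmpty(b))
		return isEmpty(a) && isEmpty(b);
	return head(a) == head(b) && equal(tail(a), tail(b));
}

/* hypothetical helper: checks the four list axioms for one e and one l */
int axioms_hold(int e, List l){
	List c = cons(e, l);
	return isEmpty(Empty())
		&& !isEmpty(c)
		&& head(c) == e
		&& equal(tail(c), l);
}
```

Since ''equal'' and ''axioms_hold'' only use the constructors and observers, the same checks would apply unchanged to any other implementation of the abstraction, such as an array-based one.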