Abstract Data Types - Intro

Consider the following list of position papers:

Each of these papers/blog posts takes a strong view on the right way to develop programs. The views do not necessarily overlap, nor are they necessarily in opposition.

There is still no consensus regarding a correct/healthy way of writing programs. However, as Donald Knuth describes in his farsighted essay:

a well-written program will always be recognised as such and appreciated by a skilful programmer.

Correctness of a program refers to the property that:

  • whenever the program terminates on a valid input, it returns the desired/intended result (e.g. a program computing the factorial actually returns $n!$ whenever the input $n$ is a positive integer). This informal definition is known as partial correctness
  • the program furthermore terminates for all valid inputs - which is known as total correctness
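As a small illustration of the two notions, consider the sketch below. The names fact_rec and fact_total, the error value -1 and the upper bound 20 are assumptions made for this example only.

```c
/* Partially correct: whenever it returns for an input n >= 0 (and the
 * result fits in a long long), the result is n!. For n < 0 the recursion
 * never reaches its base case, so the function is not totally correct
 * if negative integers count as inputs. */
long long fact_rec(int n) {
    if (n == 0)
        return 1;
    return n * fact_rec(n - 1);
}

/* Totally correct with respect to the specification "return n! for
 * 0 <= n <= 20 and -1 otherwise": the loop runs at most 19 times.
 * The bound 20 is the largest n for which n! fits in 64 bits. */
long long fact_total(int n) {
    long long r = 1;
    int i;
    if (n < 0 || n > 20)
        return -1;              /* invalid input is rejected up front */
    for (i = 2; i <= n; i++)
        r *= i;
    return r;
}
```

Both functions agree on valid inputs; they differ only in what happens outside the intended domain, which is exactly where the termination requirement matters.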

Although not obvious right away, there is a strong link between:

  • program correctness and
  • program development

This link is emphasised by the following remarks:

  • Testing proves the presence, not the absence, of bugs (E. Dijkstra)
  • It is infeasible to perform exhaustive testing for non-trivial software programs. Exhaustive testing means considering all possible inputs of the program at hand.
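A back-of-the-envelope calculation makes the second remark concrete. The sketch below assumes a hypothetical test rate (one billion tests per second) and a function taking two 32-bit integers; the helper name years_to_test_all is likewise an assumption.

```c
/* Years needed to run one test per input for a function whose input is
 * `bits` bits wide, at `tests_per_second` tests per second. */
double years_to_test_all(int bits, double tests_per_second) {
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    double inputs = 1.0;
    int i;
    for (i = 0; i < bits; i++)
        inputs *= 2.0;          /* 2^bits distinct inputs */
    return inputs / tests_per_second / seconds_per_year;
}
```

Under these assumptions, years_to_test_all(64, 1e9) evaluates to roughly 584 years, for a function as small as the addition of two machine integers.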

We recall another of Dijkstra's quotes:

  • It is not only the programmer's task to produce correct programs but also to demonstrate its correctness in a convincing manner, then the above remarks have a profound influence on the programmer's activity: the object he has to produce must be usefully structured

from Notes on Structured Programming.

Almost 50 years after Dijkstra's essay, an active research area is concerned with techniques for developing programs in such a way that their correctness can be verified automatically.

We shall adopt Dijkstra's viewpoint regarding program correctness. To this end, we shall investigate a tool which allows us to structure programs in such a way that we can efficiently reason about their correctness. The reasoning process we shall look at is mental, i.e. carried out by the programmer rather than automatically by a machine. However, (semi-)automated program reasoning techniques exist.

The tool is Abstract Data Types (ADTs). In a nutshell:

  • An ADT allows one to create an interface between, on one side, the actual type representation (of e.g. a list, tree, etc.) together with its specific operations (e.g. concatenation, tree traversal) and, on the other side, the way the type and operations are used by other programmers. The message here is that programmers who employ such types do not need (and should not have) information regarding the implementation, but only about the behaviour;
  • ADTs allow specifying operator behaviours (via axioms)
  • ADTs allow making proofs regarding these behaviours, which allow us to draw conclusions on the correctness of programs which use ADTs.
  • ADTs are a programming discipline which is currently supported in languages such as Scala and which is regarded as particularly useful in a wide range of applications

Consider the following program:

#include <stdlib.h>            // malloc

#define INDEX_OUT_OF_BOUNDS 1
 
int EXCEPTION = 0;
void throw (int e){
	EXCEPTION = e;
}
 
struct AList {
	int* v;
	int sz, len;
};
 
typedef struct AList AList;
 
AList Empty(){
	AList l;
	l.v = 0;                       // no array allocated yet; add tests for this
	l.sz = 0;
	l.len = 0;
	return l;
}
 
void copy (int* src, int s_start, int end, int* dst, int d_start){
	int i;
	for (i = s_start; i<end; i++)
		dst[d_start++] = src[i];
}
 
AList add (AList l, int e){
	if (l.v == 0){                 // is v allocated? (No...)
		l.v = malloc(sizeof(int));
		l.sz = 1;
		l.len = 1;
		l.v[0] = e;
		return l;
	}
	if (l.sz == l.len){
		int* vp = malloc(l.len*2*sizeof(int));
		copy(l.v,0,l.len,vp,0);
		l.len *= 2;
		l.v = vp;
	}
	l.v[l.sz] = e;
	l.sz++;
	return l;
}
 
int get (AList l, int pos){
	if (pos >= l.sz || pos < 0){
		throw(INDEX_OUT_OF_BOUNDS);
		return 0;
	}
	return l.v[pos];
}
 
AList ins (AList l, int pos, int e){
	if (pos > l.sz || pos < 0){
		throw(INDEX_OUT_OF_BOUNDS);
		return l;
	}
	if (pos == l.sz)
		return add(l,e);
	else
	{
		int temp = l.v[pos];
		l.v[pos] = e;
		return ins(l,pos+1,temp);
	}
}

It contains the functions:

  • Empty - returns an empty list
  • copy - used for copying elements from one array to another
  • add - adds an element to a given list (at the end)
  • get - returns the value from a given position
  • ins - inserts an element at a given position in the list

The AList (short for ArrayList) represents a list as an array. Whenever the array becomes full, the capacity of the array doubles.
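To see what the doubling policy costs, the following sketch replays the bookkeeping of add without allocating any memory and counts how many elements copy would move. The function copies_for and its parameters are hypothetical names introduced for this illustration.

```c
/* Simulates n calls to add on an initially empty AList and returns the
 * total number of elements that copy would move. Only sz (elements in
 * use) and len (capacity) are tracked. */
long copies_for(long n, long *final_len) {
    long sz = 0, len = 0, copied = 0, k;
    for (k = 0; k < n; k++) {
        if (len == 0) {
            len = 1;            /* first add: allocate a single slot */
        } else if (sz == len) {
            copied += len;      /* array full: copy everything, then double */
            len *= 2;
        }
        sz++;
    }
    *final_len = len;
    return copied;
}
```

For n = 1000 the simulation reports 1023 copied elements and a final capacity of 1024. The total copying work therefore stays below 2n, even though a single add that triggers a resize costs time linear in the current size.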

Now consider the following code:

#include <stdlib.h>            // malloc

#define INDEX_OUT_OF_BOUNDS 1
 
int EXCEPTION = 0;
void throw (int e){
	EXCEPTION = e;
}
 
struct LList {
	struct LList* next;
	int val;
};
 
typedef struct LList* LList;
 
LList Empty(){
	return 0;
}
LList add (LList l, int e){
	LList n = malloc(sizeof(struct LList));
	n->val = e;
 
	n->next = l;
	return n;
}
int get (LList l, int pos){
	if (pos < 0 || l == 0){
		throw(INDEX_OUT_OF_BOUNDS);
		return 0;
	}
	if (pos == 0)
		return l->val;
	else
		return get(l->next,pos-1);
}
LList ins (LList l, int pos, int e){
	if (pos < 0 || (l == 0 && pos > 0)){
		throw(INDEX_OUT_OF_BOUNDS);
		return l;
	}
	if (pos == 0)                      // the new node takes position pos
		return add(l,e);
	l->next = ins (l->next,pos-1,e);   // relink: the rest may have a new first node
	return l;
}

The LList (short for LinkedList) provides precisely the same functions as the AList. The sole difference in behaviour is that add places the new element at the beginning of the list rather than at the end. Of course, the implementations are conceptually different, and so is their efficiency. For instance:

  • inserting into an LList takes constant time once the insertion point has been reached, while inserting into an AList takes linear time w.r.t. the size of the list in the worst case, since the elements after the insertion point must be shifted. (We shall refine this analysis later).
  • accessing a given position in an LList takes linear time, while in an AList it takes constant time.
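The linear cost of positional access in a linked list can be observed directly by counting visited nodes. The sketch below is self-contained and uses hypothetical names (Node, push, countdown, get_counting) rather than the LList code above; an array access such as v[pos] would touch one element regardless of pos.

```c
#include <stdlib.h>

struct Node { struct Node *next; int val; };

/* Prepends e, as add does for LList. */
struct Node *push(struct Node *l, int e) {
    struct Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->val = e;
    n->next = l;
    return n;
}

/* Builds the list n-1, n-2, ..., 0. */
struct Node *countdown(int n) {
    struct Node *l = NULL;
    int i;
    for (i = 0; i < n; i++)
        l = push(l, i);
    return l;
}

/* Like get, but also reports how many nodes were visited.
 * Assumes 0 <= pos < length of l. */
int get_counting(struct Node *l, int pos, int *visited) {
    *visited = 1;
    while (pos > 0) {
        l = l->next;
        pos--;
        (*visited)++;
    }
    return l->val;
}
```

On a list of 100 elements, reading position 0 visits one node while reading position 99 visits all 100.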

Leaving efficiency aside, the behaviour of both lists is the same, and any program is expected to behave in the same way irrespective of the type of list implementation which is deployed.

With this in mind, we develop the following List abstraction. For convenience we (temporarily) use C code to describe this abstraction:

List Empty();
List cons (int e, List l);
int head (List l);
List tail (List l);
int isEmpty(List l);

We can group the above function declarations in two categories:

  • constructors (cons and Empty). Using any combination of these functions we can create any list;
  • observers (head, tail, isEmpty). Using any combination of these functions we can inspect any element of the list;
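The split between constructors and observers can be tried out with a minimal sketch. The representation struct Cell below is only a stand-in so that the example compiles, and sum is a hypothetical client function; neither is part of the abstraction itself.

```c
#include <stdlib.h>

/* Stand-in representation (assumption made for this sketch). */
struct Cell { int val; struct Cell *next; };
typedef struct Cell *List;

/* Constructors */
List Empty(void) { return NULL; }
List cons(int e, List l) {
    List n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->val = e;
    n->next = l;
    return n;
}

/* Observers */
int head(List l) { return l->val; }
List tail(List l) { return l->next; }
int isEmpty(List l) { return l == NULL; }

/* Client code: walks the list using observers only and never
 * mentions struct Cell. */
int sum(List l) {
    int s = 0;
    while (!isEmpty(l)) {
        s += head(l);
        l = tail(l);
    }
    return s;
}
```

The list 1, 2, 3 is built as cons(1, cons(2, cons(3, Empty()))), and sum applied to it yields 6.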

Furthermore, any operation defined on lists can be expressed using a combination of these functions:

List add (List l, int e){
	return cons(e,l);
}
 
int get (List l, int pos){
	if (pos == 0) 
		return head(l);
	return get(tail(l),pos-1);
}
 
List ins (List l, int pos, int e){
	if (pos == 0)
		return cons(e,l);
	return cons(head(l),ins(tail(l),pos-1,e));
}
 
void show (List l){
	if (isEmpty(l))
		printf("[]\n");
	else{
		printf("%i ",head(l));
		show(tail(l));
	}
}

In the previous code, we have shown implementations of the functions add, ins and get, together with the display function show. These implementations are independent of how the list type is represented:

  • the code can be reused directly
  • however, in C we cannot deploy exactly the same code, since we do not have an abstraction mechanism such as classes and inheritance. Hence, the same code needs to be copy-pasted for each implementation.
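One C idiom that limits the copy-pasting is an opaque pointer type: client code is compiled against declarations only, and the representation is chosen when linking. The sketch below shows the idea in a single file. The name ListRep and the three-part layout are assumptions, and the idiom requires the list to be handled through a pointer, which differs from the by-value AList shown earlier.

```c
#include <stdlib.h>

/* --- what a header would expose: no layout information --- */
typedef struct ListRep *List;   /* incomplete type */
List Empty(void);
List cons(int e, List l);
int head(List l);
List tail(List l);
int isEmpty(List l);

/* --- client code: written once, against the declarations only --- */
int get(List l, int pos) {
    if (pos == 0)
        return head(l);
    return get(tail(l), pos - 1);
}

/* --- one possible implementation, normally in a separate .c file --- */
struct ListRep { int val; struct ListRep *next; };

List Empty(void) { return NULL; }
List cons(int e, List l) {
    List n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->val = e;
    n->next = l;
    return n;
}
int head(List l) { return l->val; }
List tail(List l) { return l->next; }
int isEmpty(List l) { return l == NULL; }
```

Here get is defined before struct ListRep is completed, so it cannot depend on the representation. Only one implementation can be linked into a given program, which is weaker than what classes and inheritance offer.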

Note that we have defined list operations without actually implementing our list abstractions. The implementations follow for AList:

struct AList {
	int* v;
	int sz, len;
};
 
typedef struct AList AList;
 
AList Empty(){
	AList l;
	l.v = 0;                       // no array allocated yet; cons tests for this
	l.sz = 0;
	l.len = 0;
	return l;
}
void copy (int* src, int s_start, int end, int* dst, int d_start){
	int i;
	for (i = s_start; i<end; i++)
		dst[d_start++] = src[i];
}
AList cons (int e, AList l){
	if (l.v == 0){
		l.v = malloc(sizeof(int));
		l.sz = 1;
		l.len = 1;
		l.v[0] = e;
		return l;
	}
	if (l.sz == l.len){
		int* vp = malloc(l.len*2*sizeof(int));
		copy(l.v,0,l.len,vp,0);
		l.len *= 2;
		l.v = vp;
	}
	l.v[l.sz] = e;
	l.sz++;
	return l;
}
int head (AList l){
	return l.v[l.sz-1];
}
AList tail (AList l){
	l.sz --;
	return l; 
}
int isEmpty(AList l){
	return l.sz == 0;
}

as well as for LList:

struct LList {
	struct LList* next;
	int val;
};
 
typedef struct LList* LList;
 
LList Empty(){
	return 0;
}
LList cons(int e, LList l){
	LList n = malloc(sizeof(struct LList));
	n->val = e;
 
	n->next = l;
	return n;
}
int isEmpty(LList l){
	return l==0;
}
int head(LList l){
	return l->val;
}
LList tail(LList l){
	return l->next;
}

Let us recap:

  • we have introduced an abstraction for Lists which consists of:
    • functions which define list construction (cons, Empty)
    • functions which define list inspection (head, tail, isEmpty)
  • we have separated list implementation from list functionality. All other functionality is defined w.r.t. the above functions. No other implementation information is used.
  • This allows us to reason about list-manipulating programs independently of how lists are implemented.
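As a first taste of such reasoning, the expected behaviour of the five functions can be written down as equations and checked on sample values. The four equations in the comment below are the usual list axioms; the text has not stated them, so they are an assumption here, as are the helper names same and axioms_hold and the stand-in representation.

```c
#include <stdlib.h>

/* Stand-in representation (assumption made for this sketch). */
struct Cell { int val; struct Cell *next; };
typedef struct Cell *List;

List Empty(void) { return NULL; }
List cons(int e, List l) {
    List n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->val = e;
    n->next = l;
    return n;
}
int head(List l) { return l->val; }
List tail(List l) { return l->next; }
int isEmpty(List l) { return l == NULL; }

/* Element-wise equality, written with observers only. */
int same(List a, List b) {
    while (!isEmpty(a) && !isEmpty(b)) {
        if (head(a) != head(b))
            return 0;
        a = tail(a);
        b = tail(b);
    }
    return isEmpty(a) && isEmpty(b);
}

/* Checks, for one sample (e, l), the equations
 *   isEmpty(Empty())    = true
 *   isEmpty(cons(e, l)) = false
 *   head(cons(e, l))    = e
 *   tail(cons(e, l))    = l
 * and returns 1 if all of them hold. */
int axioms_hold(int e, List l) {
    List c = cons(e, l);
    return isEmpty(Empty()) && !isEmpty(c)
        && head(c) == e && same(tail(c), l);
}
```

Checking samples is still testing, not proof. The point of stating the equations is that a proof about add, get or ins can rely on them alone, whichever implementation sits underneath.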