

Amortised analysis

Recall our FIFO implementation relying on two lists, from the previous lecture. This implementation has several advantages:

  • it offers constant costs for insertion, removal (dequeue), retrieval;
  • the implementation allows lock-free insertion and removal: imagine one thread attempts to insert, while another attempts to retrieve an element. The order in which each thread executes its operation is irrelevant, and there are no consistency issues. Moreover, the operations can be parallelised since they operate on different lists.
  • it is the straightforward FIFO implementation in pure functional languages, where all objects are immutable (cannot be modified).

Let us return to the first point above: the cost of removing an element is not constant in the general case. If the right list contains only one element, removal triggers the copying of all elements from the left list to the right list, for a total cost of $\Theta(n)$, where $n$ is the size of the FIFO.
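This two-list implementation can be sketched as follows (a minimal sketch in Python; the method names and the normalise-when-the-right-list-is-empty convention are illustrative assumptions, following the description from the previous lecture):

```python
class Fifo:
    """Two-list FIFO: enqueue pushes onto the left list, dequeue pops
    from the right list; when the right list is empty, normalisation
    moves (and thereby reverses) the left list into the right one."""

    def __init__(self):
        self.left = []    # newest elements, in insertion order
        self.right = []   # oldest elements, in reversed order

    def enqueue(self, x):
        self.left.append(x)          # cost 1

    def _normalise(self):
        # triggered only when the right list is empty: Theta(n) worst case
        while self.left:
            self.right.append(self.left.pop())

    def dequeue(self):
        if not self.right:
            self._normalise()
        return self.right.pop()      # cost 1 after normalisation

    def top(self):
        if not self.right:
            self._normalise()
        return self.right[-1]
```

For example, enqueueing 1, 2, 3 and then dequeueing yields the elements back in insertion order, with the first dequeue paying for the normalisation.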

A worst-case analysis yields the cost of removal to be $O(n)$. In this lecture, we show that this bound, while correct, is not tight, and is unfair to the FIFO implementation. We start with a few observations:

  • not all dequeue operations can have cost $\Theta(n)$: after a normalisation, each subsequent dequeue has cost $1$, until the right list becomes empty again.
  • thus, instead of reasoning about single operations, we consider a sequence:

$ S = op_1, \ldots, op_n $

of $ n$ operations. We consider that each $ op_i$ can be any of: $ enqueue$ , $ dequeue$ or $ top$ . In what follows, we study three methods for determining:

  • the total cost of the sequence and
  • the average cost per operation, in a worst-case analysis (which makes no limiting assumption on the structure of the FIFO).

We make two simplifying assumptions:

  • before performing the sequence $S$, the FIFO is empty (the analysis is similar if the FIFO already contains some elements);
  • $S$ contains no $top$ operations: they always have constant cost, hence $cost(S_1) \geq cost(S_2)$ for any two sequences $S_1$ and $S_2$ of the same size such that $S_2$ contains $top$ operations.

We observe that:

$ cost(S)= cost(ins_l) + cost(del_l) + cost(ins_r) + cost(del_r)$

where $ins_x$ and $del_x$ are the total costs of all insertions (resp. removals) on list $x$. We assume each individual insertion into, or removal from, a list has cost 1.

  • In $ S$ , at most $ n$ elements may be inserted, where $ n$ is the total number of operations. Hence: $ cost(ins_l) \leq n$ .
  • By the implementation, $cost(del_l) = cost(ins_r)$: normalisation removes each element from $l$ and inserts it into $r$. Also, $cost(del_l) \leq cost(ins_l)$: normalisation removes at most as many elements from $l$ as were inserted into it.
  • Finally, $cost(del_r) \leq cost(ins_r)$: we cannot remove more elements from $r$ than we have inserted; removing more would mean executing invalid $dequeue$ operations on an empty FIFO.

Hence:

$ cost(S) \leq n + n + n + n = 4n$

The average cost per operation is $\frac{cost(S)}{n}$, hence it is at most $4$. This analysis shows that we may safely charge each $enqueue$ or $dequeue$ operation an individual cost of $4$, which is, on average, an upper bound on its real cost.
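The $4n$ bound can be checked empirically by replaying a sequence of operations on the two lists and counting every individual list insertion and removal (a sketch; the operation encoding `'enq'`/`'deq'` and the counter `ops` are illustrative):

```python
def run(sequence):
    """Execute a sequence of 'enq'/'deq' operations on a two-list
    FIFO, counting every individual list insertion and removal."""
    left, right, ops = [], [], 0
    for op in sequence:
        if op == 'enq':
            left.append(0); ops += 1                 # ins_l
        else:  # 'deq' (assumed valid: FIFO is non-empty)
            if not right:                            # normalisation
                while left:
                    left.pop(); right.append(0)      # del_l + ins_r
                    ops += 2
            right.pop(); ops += 1                    # del_r
    return ops

# any valid sequence of n operations costs at most 4n list operations
seq = ['enq'] * 5 + ['deq'] * 5 + ['enq', 'deq'] * 3
assert run(seq) <= 4 * len(seq)
```

Note that a single `'enq'` followed by a `'deq'` costs exactly 4 list operations (insert into $l$, remove from $l$, insert into $r$, remove from $r$), which is where the per-operation constant comes from.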

Therefore, in our algorithm which employs a FIFO, the average cost per FIFO operation is constant.

Remarks: The aggregate method generally tries to find an (asymptotically) tight bound on the cost of a sequence of operations, by aggregating costs. In our example, aggregation meant estimating the number of operations on lists $ l$ and $ r$ , instead of explicitly counting element insertions, moves and removals.

As before, suppose $S$ contains only $enqueue$ and $dequeue$ operations. In the banking method, we imagine that our data-structure (the FIFO) is a bank. We (over)estimate the cost of each operation type, in such a way that:

  • each cheap operation (e.g. $ enqueue$ ) adds credit to the bank;
  • each expensive operation (e.g. $dequeue$ with normalisation) takes credit from the bank.

We call this estimated cost the amortised cost (usually denoted as $\hat{c}$).

The golden rule of the banking method is that: no expensive operation can take more credit than the bank has available.

For the FIFO, we estimate:

  • $\hat{c}_{enq} = 3$, since each inserted element may subsequently be removed from $l$ and inserted into $r$; we charge extra to amortise this potential cost;
  • $\hat{c}_{deq} = 1$, which represents the deletion from $r$.

To validate this estimation, we verify the golden rule.

Let $e_i = \hat{c_i} - c_i$, where $\hat{c_i}$ is the amortised cost of the $i$th operation, and $c_i$ is its real cost.

  • if $ e_i \geq 0$ then $ e_i$ is a surplus added to the bank
  • if $ e_i < 0$ then $ e_i$ is credit taken from the bank.

The golden rule is formally expressed as follows:

$ \displaystyle \forall S: \sum_{ith\;op\;in\;S} e_i \geq 0$

The quantification $\forall S$ means that the condition must hold after any sequence of operations; the sum captures the total credit in the bank at the end of executing sequence $S$.

The rule is generally presented in the form:

$ \displaystyle \forall S: \sum_{ith\;op\;in\;S} \hat{c}_i \geq \sum_{ith\;op\;in\;S} c_i$

which states that the sum of the amortised costs in any sequence of operations must be an upper bound on the sum of the real costs. We verify this inequality for our estimation of the FIFO amortised costs. We need to check that:

$ 3*\#enq + \#deq \geq cost(ins_l) + cost(copy) + cost(del_r)$

where $\#enq$ (resp. $\#deq$) is the number of $enqueue$ (resp. $dequeue$) operations. As in the aggregate method, we have reformulated the real cost in terms of list operations. As before, we observe that:

  • $ \#enq = cost(ins_l)$
  • $2*\#enq \geq cost(copy)$ (each inserted element is removed from $l$ and inserted into $r$ at most once)
  • $ \#deq = cost(del_r)$

which concludes our analysis.
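The golden rule can also be checked mechanically, by replaying a sequence while paying the amortised costs $\hat{c}_{enq} = 3$ and $\hat{c}_{deq} = 1$ into the bank and the real list-operation costs out of it (a sketch; the function name and operation encoding are illustrative):

```python
def check_credit(sequence):
    """Replay a sequence on the two-list FIFO, paying amortised costs
    (3 per enqueue, 1 per dequeue) into the bank and real costs out of
    it; the golden rule demands the bank never goes negative."""
    left, right, bank = [], [], 0
    for op in sequence:
        if op == 'enq':
            real = 1                     # ins_l
            left.append(0)
            bank += 3 - real
        else:  # 'deq' (assumed valid)
            real = 1                     # del_r
            if not right:                # normalisation
                real += 2 * len(left)    # del_l + ins_r per element
                while left:
                    right.append(left.pop())
            right.pop()
            bank += 1 - real
        assert bank >= 0                 # golden rule
    return bank

check_credit(['enq'] * 4 + ['deq'] * 4 + ['enq', 'deq'] * 2)
```

A single enqueue leaves a surplus of 2 in the bank, which is consumed exactly when its element is later moved from $l$ to $r$.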

Remarks:

  • there is no unique good choice of amortised costs. In our example, an equally good choice would have been $\hat{c}_{enq} = 4$ and $\hat{c}_{deq} = 0$, which would have yielded an inequality similar to that from the aggregate method.
  • The essential objective of amortised analysis is to prove that the cost per operation is constant on average (in the general case, asymptotically lower than the bound given by a worst-case, single-operation analysis).

The potential method is conceptually similar to the banking method; however, instead of estimating amortised costs directly, we estimate a potential function which models how the credit in the bank changes. More precisely:

  • after a cheap operation, the potential of the data-structure grows;
  • after an expensive operation, the accumulated potential of the data-structure is consumed.

The golden rule of the potential method is that the difference between the potential of the data-structure in the current state and that in the initial state can never be negative.

Let $ S = op_1, \ldots, op_n$ and denote by $ F_i$ the state (contents) of the FIFO after the $ ith$ operation. Also, denote by $ \Phi(F_i)$ the potential of the FIFO after the $ ith$ operation. $ \Phi(F_0)$ is the potential of the FIFO in the initial state. The golden rule is expressed as:

$ \forall S: \Phi(F_n) - \Phi(F_0) \geq 0$

We estimate the potential function to be:

$ \Phi(F_n) = 2 * size(l)$

where $size(l)$ is the size of the left list. The golden rule is easily verified in this particular case, since $\Phi(F_0) = 0$ and $\Phi$ is always non-negative.

Having found the potential function, we can determine the amortised cost via the following general formula:

$ \hat{c_i} = c_i + \Phi(F_i) - \Phi(F_{i-1})$

which states that the amortised cost of an operation is its real cost plus the difference in potential between states $F_{i-1}$ and $F_i$ (this difference may be positive or negative).

Hence:

$ \hat{c}_{enq} = 1 + 2*size(l_i) - 2*size(l_{i-1}) = 1 + 2 = 3$

where $ size(l_i)$ is the size of the left list after the $ ith$ operation.

For dequeue, we consider two cases:

  • no normalisation takes place: $\hat{c}_{deq} = 1 + 2*size(l_i) - 2*size(l_{i-1}) = 1 + 0 = 1$, since a dequeue without normalisation does not touch the left list;
  • normalisation takes place: the real cost is $1 + 2*size(l_{i-1})$ (each element of $l$ is removed and inserted into $r$, then one element is deleted from $r$), and the left list becomes empty, hence $\hat{c}_{deq} = 1 + 2*size(l_{i-1}) + 0 - 2*size(l_{i-1}) = 1$.
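Both cases can be verified mechanically, by computing $\hat{c_i} = c_i + \Phi(F_i) - \Phi(F_{i-1})$ with $\Phi(F) = 2 * size(l)$ for every operation of a sequence (a sketch; the function name and operation encoding are illustrative):

```python
def amortised_costs(sequence):
    """Replay a sequence on the two-list FIFO, computing each amortised
    cost as c_i + Phi(F_i) - Phi(F_{i-1}), where Phi(F) = 2 * size(l)."""
    left, right, result = [], [], []
    for op in sequence:
        phi_before = 2 * len(left)
        if op == 'enq':
            left.append(0)
            real = 1                    # ins_l
        else:  # 'deq' (assumed valid)
            real = 1                    # del_r
            if not right:
                real += 2 * len(left)   # normalisation: del_l + ins_r
                while left:
                    right.append(left.pop())
            right.pop()
        result.append(real + 2 * len(left) - phi_before)
    return result

# every enqueue has amortised cost 3, every dequeue amortised cost 1,
# regardless of whether normalisation takes place
assert amortised_costs(['enq', 'enq', 'deq', 'deq']) == [3, 3, 1, 1]
```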

Remarks:

  • the golden rule of the potential method is quite permissive, and there is no general rule-of-thumb for identifying potential functions; moreover, there is no guarantee that useful amortised costs can be derived from every (valid) choice of $\Phi$.
  • the banking and potential methods can be applied selectively, depending on whether it is easier to spot an amortised cost per operation or a potential function.
  • It is not generally guaranteed that the two methods yield the same amortised costs; however, this is unimportant, as long as the costs found are asymptotically the same.

Application - ArrayList

Consider the array list implementation illustrated in the previous lectures. The cost of an insert ($cons$) operation is $\Theta(n)$ if the array holding the list is at full capacity, where $n$ is the number of elements in the array.

We analyse the cost of a sequence of $ins$ operations performed on an array list. Let $size(L)$ denote the capacity of the holding array, and $elems(L)$ the number of elements inserted into the array list.

We recall that:

  • if $size(L) = elems(L)$, an insert operation has cost $size(L)+1$ (copying all elements, plus the actual insert);
  • if $size(L) > elems(L)$, an insert operation has cost $1$.

Let $ S$ be a sequence of $ ins$ operations. We aggregate the actual insertion vs copy costs. This is illustrated in the table below, for a sequence of 9 operations:

Operation no   1   2   3   4   5   6   7   8   9
Total cost     1   2   3   1   5   1   1   1   9
Copy cost      0   1   2   0   4   0   0   0   8
Ins cost       1   1   1   1   1   1   1   1   1

In the general case: $ cost(S) = ins\_cost(S) + copy\_cost(S)$

To compute $ copy\_cost(S)$ , we observe that, if $ k$ is the number of copy operations after a sequence of $ n$ operations, then:

$ 2^{k-1} < n \leq 2^{k}$ hence: $ k-1 < log (n) \leq k$ and thus $ k = \lceil\log{n}\rceil$ is the number of copy operations.

$ \displaystyle cost(S) = n + \sum_{i=0}^{\lceil\log{n}\rceil - 1} 2^i \leq n + \sum_{i=0}^{\log{n}} 2^i = 3n - 1$

Thus, the average cost per operation is constant: $ \frac{cost(S)}{n} = \frac{\Theta(n)}{n} = \Theta(1)$
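The aggregate computation can be reproduced with a short simulation of the doubling array (a sketch; the initial capacity of 1 is an assumption matching the table above):

```python
def insert_costs(n):
    """Real cost of each of n inserts into a doubling array list:
    size(L)+1 when the array is full (copy all elements, then insert),
    and 1 otherwise."""
    size, elems, costs = 1, 0, []
    for _ in range(n):
        if elems == size:
            costs.append(size + 1)   # copy `size` elements + the insert
            size *= 2                # double the capacity
        else:
            costs.append(1)
        elems += 1
    return costs

# matches the table above, and the aggregate bound cost(S) <= 3n
assert insert_costs(9) == [1, 2, 3, 1, 5, 1, 1, 1, 9]
assert sum(insert_costs(9)) <= 3 * 9
```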

We estimate:

  • $\hat{c}_{ins} = 3$: each inserted element pays $1$ for its own insertion and puts credit aside for: (i) its own copy, and (ii) the copy of another element.

We illustrate this choice via an example. Suppose we have a half-full array, and the current credit is zero:

credit = 0:

* *

after an insertion, $1$ was paid for the insertion itself, and credit = 2:

* * *

after another insertion, credit = 4:

* * * *

Now the array is full, and we have enough credit to pay for the copying of all four elements. After another insertion, the copy of the four elements is paid from the accumulated credit, while the new element pays $1$ for its own insertion and sets $2$ aside, hence credit = 2; the array (now of doubled capacity) becomes again roughly half-full:

* * * * *
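The credit accounting above can be replayed for a whole sequence of inserts, checking the golden rule at every step (a sketch; the initial capacity of 1 is an assumption):

```python
def bank_balance(n):
    """Pay amortised cost 3 per insert into the bank and the real costs
    out of it; the balance must never become negative."""
    size, elems, bank = 1, 0, 0
    for _ in range(n):
        real = size + 1 if elems == size else 1
        if elems == size:
            size *= 2               # the insert triggers a doubling
        bank += 3 - real
        assert bank >= 0            # golden rule of the banking method
        elems += 1
    return bank

bank_balance(9)
```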

We verify the golden rule of the banking method:

$ \displaystyle \forall S: \sum_{ith\;op\;in\;S} \hat{c}_i \geq \sum_{ith\;op\;in\;S} c_i$

which yields:

$ \displaystyle \forall S: 3n \geq \sum_{ith\;op\;in\;S} c_i$

which has already been established via the aggregate method, where we showed $cost(S) \leq 3n - 1$.