Dictionaries and hashing

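Python dictionaries are implemented as hash tables. The description below uses a simplified model with buckets of key-value pairs (CPython's dict actually resolves collisions with open addressing, but the idea is the same). In short, the statement: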

val = d[x]

will:

  • apply a function $hash$ to the object x to obtain a bucket $b$ ($hash(x)=b$);
  • search that bucket: buckets are just collections of key-value pairs (often implemented as arrays), so we look for the key x in bucket $b$, in time linear in the size of the bucket, and return its corresponding value (or raise a KeyError if x is not there).
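The two steps above can be sketched as a toy hash table with separate chaining. This is a minimal illustration under our own assumptions: the class name `ChainedHashTable`, the fixed number of buckets, reducing the hash with `% n_buckets`, and using Python lists as buckets are all hypothetical choices, not how CPython's `dict` is built.

```python
class ChainedHashTable:
    """Toy hash table (hypothetical, for illustration only):
    a list of buckets, each bucket a list of (key, value) pairs."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Step 1: hash the key and reduce the hash to a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # otherwise add a new pair to the bucket

    def __getitem__(self, key):
        # Step 2: linear search inside the bucket, comparing keys with ==.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


t = ChainedHashTable()
t["apple"] = 3
t[42] = "answer"
print(t["apple"], t[42])   # 3 answer
```

Note that only the keys in one bucket are compared, which is why the quality of the hash function matters so much.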

Good dispersion functions:

  • If the function $hash$ is a good dispersion function, it spreads the keys evenly over a large range (co-domain), so we have many buckets, each holding very few pairs.
  • A very bad dispersion function is the constant function: it puts all pairs in a single bucket, so every lookup becomes a linear scan, exactly like searching an unsorted array.
  • The best possible dispersion function would give each pair its own bucket, so a lookup never has to look past a single pair.
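To see the difference between a good and a very bad dispersion function, we can count how many keys land in each bucket. The helper `bucket_sizes` is hypothetical, and mapping a hash value to a bucket with `% n_buckets` is an assumption made for the illustration.

```python
def bucket_sizes(keys, hash_fn, n_buckets):
    """Return how many keys fall into each of n_buckets buckets under hash_fn."""
    sizes = [0] * n_buckets
    for key in keys:
        sizes[hash_fn(key) % n_buckets] += 1
    return sizes

keys = range(100)
print(bucket_sizes(keys, hash, 8))         # [13, 13, 13, 13, 12, 12, 12, 12]
print(bucket_sizes(keys, lambda k: 0, 8))  # [100, 0, 0, 0, 0, 0, 0, 0]
```

With the built-in hash (small integers hash to themselves) a lookup scans at most 13 pairs; with the constant function it may scan all 100.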

Efficient dispersion functions:

  • An efficient dispersion function is easily computable (finding the proper bucket value is fast)
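As an example of a hash function that is cheap to compute, here is a sketch of djb2, a classic string hash attributed to Dan Bernstein. It is not the function Python uses for `str`; it only shows that a single pass over the characters, with one multiply-add per character, is enough to produce a bucket index. The 32-bit mask and the table size of 16 are assumptions for the example.

```python
def djb2(s):
    """djb2 string hash: h = h * 33 + c for every character c, starting from 5381."""
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    return h

n_buckets = 16
for word in ["apple", "banana", "cherry"]:
    print(word, djb2(word) % n_buckets)   # bucket index for each word
```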

We often use integers and strings as dictionary keys in Python. These types are immutable: once an object is created, its value cannot change during the execution of the program. As a consequence:

  • the hash function always maps a given key to the same bucket, so a pair stored under that key can always be found again.
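A quick check of this property: an immutable key hashes to the same value every time within a run, and an equal key built independently hashes identically, so a lookup with it reaches the stored pair. (String hashes are randomized between separate runs of Python, so the actual numbers change from run to run; only their consistency within one run matters.)

```python
key = "hello"
d = {key: 1}

print(hash(key) == hash(key))            # the same key always hashes the same
other = "".join(["hel", "lo"])           # an equal string, built separately
print(other == key, hash(other) == hash(key))
print(d[other])                          # found: equal keys lead to the same bucket

point = (3, 4)                           # tuples of immutable values are hashable too
d[point] = "tuple key"
print(d[(3, 4)])
```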

If we use mutable objects (e.g. lists) as keys in Python, we get a TypeError: unhashable type: 'list'. The following example shows why mutable objects cannot be used as keys:

l = []

d = {}
d[l] = 1   # hypothetically: the pair (l, 1) would be stored in bucket hash(l)
           # (in reality Python raises TypeError: unhashable type: 'list' right here)

l.append(0)

print(d[l]) # to find the value we would compute hash(l) again, but l has changed since the pair was stored,
            # so the hash function could no longer be relied on to lead to the bucket where (l, 1) is stored
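In practice Python stops at the assignment: the TypeError is raised as soon as the list is used as a key. A common workaround, sketched below, is to convert the list into a tuple (an immutable snapshot of its current contents) and use that as the key; later changes to the list then do not affect the stored key.

```python
l = []
d = {}

try:
    d[l] = 1
except TypeError as err:
    print("TypeError:", err)     # lists are unhashable

d[tuple(l)] = 1                  # use an immutable snapshot of the list instead
l.append(0)                      # changing the list does not touch the key ()
print(d[()])                     # 1
print(tuple(l) in d)             # False: (0,) is a different key
```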

Sets, as well as other data types whose values can change (such as lists and dictionaries), cannot be used as keys in Python. If a set-like key is needed, we can use frozenset, the immutable counterpart of set.
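A short illustration: a set is rejected as a key, while a frozenset with the same elements is accepted, and any equal frozenset finds the stored value regardless of the order in which its elements were written.

```python
d = {}
s = {1, 2, 3}

try:
    d[s] = "set"
except TypeError as err:
    print("TypeError:", err)        # sets are mutable, hence unhashable

fs = frozenset(s)                   # immutable copy of the set
d[fs] = "frozen"
print(d[frozenset({3, 2, 1})])      # frozen: equal frozensets hash equally
```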

So far, we have not said what the hash function actually is. For built-in data types, Python provides efficient hash functions, which can be called through the built-in hash(). We can also define hash functions for the classes we create:

class O:
    def __init__(self, x):
        self.x = x

    # __hash__ is the special method Python calls to hash objects of type 'O'.
    # (Instances of user-defined classes are already hashable by identity;
    # defining __hash__ replaces that default with our own function.)
    def __hash__(self):
        return 0  # a constant hash is very inefficient (every key lands in the same bucket),
                  # but it simply illustrates the syntax for defining your own hash function

d = {}
ob = O(45)
d[ob] = "hello"
d[O(1)] = "kitty"  # the pairs (O(45), "hello") and (O(1), "kitty") share the same bucket, the one for hash value 0
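One detail the example above leaves out: a dictionary first finds the bucket with `__hash__` and then compares keys with `==`. Since `O` does not define `__eq__`, two `O(45)` objects are never equal (the default equality is identity), so `d[O(45)]` raises a KeyError even though `d[ob]` works. The sketch below, using a hypothetical `Point` class, shows the usual pattern of defining both methods so that equal objects have equal hashes, here by hashing a tuple of the attributes.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # two points are equal when their coordinates are equal
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # equal points must hash equally: reuse the built-in hash of a tuple
        return hash((self.x, self.y))

d = {Point(1, 2): "a"}
print(d[Point(1, 2)])      # a: a new but equal Point finds the stored value
print(Point(2, 1) in d)    # False: different coordinates, different key
```

Because the hash is derived from x and y, such objects should be treated as immutable once they are used as keys; changing p.x afterwards would reproduce the problem shown earlier with lists.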