====== Dictionaries and hashing ======

===== A review of hashtables =====

Python dictionaries are [[https://en.wikipedia.org/wiki/Hash_table|hashtable]] implementations. Conceptually, the instruction:
<code python>
val = d[x]
</code>
will:
  * apply a function $math[hash] to the object ''x'', to obtain a //bucket// $math[b] ($math[hash(x)=b])
  * buckets are just collections of key-value pairs (often implemented as arrays). Next, we search (in linear time) for the key ''x'' in bucket $math[b], and return its corresponding value

Good dispersion functions:
  * If the function $math[hash] is a **good** dispersion function, then its range (co-domain) will be large, which means that we have a lot of buckets with very few pairs inside them.
  * A very **bad** dispersion function is the constant function, which places all pairs inside a single bucket; every lookup then degrades to a linear search through one array.
  * The **best possible** dispersion function would have one bucket per pair.

Efficient dispersion functions:
  * An efficient dispersion function is cheap to compute (finding the proper bucket value is fast).

===== Dictionaries =====

We often use integers and strings as **keys** when relying on dictionaries in Python. These types are **immutable**, which means that their values **cannot be modified after they are created**. As a consequence:
  * **the hash function always yields the same bucket value for a given key**.

If we use **mutable** objects (e.g. lists) as keys in Python, we get a type error: ''TypeError: unhashable type: 'list'''. The following (hypothetical) example shows why mutable objects cannot be used as keys:
<code python>
l = []
d = {}
d[l] = 1       # suppose the pair (l,1) were stored in the bucket hash(l)
               # (in practice, Python rejects this line with the TypeError above)
l.append(0)
print(d[l])    # we use hash(l) to obtain the value assigned to l, but l was changed?
               # how would the hash function always compute the same bucket in which l was stored?
</code>

Sets, as well as other datatypes whose value can change, cannot be used as keys in Python.
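The behaviour described above can be checked directly. The sketch below is only an illustration: the helper ''try_as_key'' is a hypothetical name introduced here, and the exact wording of the error message may differ between Python versions.

```python
def try_as_key(candidate):
    """Return None if candidate can be used as a dict key, else the error message.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        {candidate: 1}          # building the dict forces a call to hash(candidate)
    except TypeError as err:
        return str(err)
    return None


list_error = try_as_key([])       # lists are mutable, hence unhashable
set_error = try_as_key(set())     # sets are mutable, hence unhashable
int_error = try_as_key(42)        # integers are immutable
str_error = try_as_key("key")     # strings are immutable

print(list_error)
print(set_error)
print(int_error, str_error)
```

Only the mutable candidates produce an error message; the integer and the string are accepted as keys.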
If a set is needed as a key, one option is ''frozenset'', which is the immutable alternative to sets in Python.

===== Writing your own hash function =====

So far, we have not pointed out what the hash function actually is. For predefined datatypes, Python implements efficient hash functions. However, we can also define hash functions for objects that we create:
<code python>
class O:
    def __init__(self, x):
        self.x = x

    # this special method is the hash function used for objects of type O.
    # Now we can use them as keys in a dictionary
    def __hash__(self):
        return 0
    # this implementation is not really efficient, but it simply illustrates
    # the syntax for defining a hash function of your own

d = {}
ob = O(45)
d[ob] = "hello"
d[O(1)] = "kitty"
# the pairs O(45),"hello" and O(1),"kitty" will share the same bucket (namely 0)
</code>
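A constant hash works, but it puts every pair in one bucket. A more practical variant, sketched below under the assumption that two objects should count as the same key when their ''x'' fields are equal, derives the hash from that field and defines ''%%__eq__%%'' to match. The class name ''Item'' is hypothetical and not part of the example above.

```python
class Item:
    """Hypothetical class: two Items are the same key when their x values match."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        # Equal objects must produce equal hashes, so we hash the
        # same field that __eq__ compares.
        return hash(self.x)


d = {}
d[Item(45)] = "hello"
d[Item(1)] = "kitty"

# A fresh object with the same x lands in the same bucket and compares equal,
# so it finds the stored pair.
found = d[Item(45)]
print(found)

# frozenset is hashable, so it can be used as a key directly.
d[frozenset({1, 2})] = "set-like key"
```

The ''x'' field should not be modified while an ''Item'' is stored as a key, for the same reason that lists are rejected as keys: the hash would change and the pair could no longer be found.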