Differences

This shows you the differences between two versions of the page.

lfa:dictionaries [2021/09/28 12:56]
pdmatei created
lfa:dictionaries [2021/09/28 13:21] (current)
pdmatei
Line 1: Line 1:
 ====== Dictionaries and hashing ======
  
-Python dictionaries are [[https://en.wikipedia.org/wiki/Hash_table|hashtable]] implementations.
 +===== A review of hash tables =====
 + 
 +Python dictionaries are [[https://​en.wikipedia.org/​wiki/​Hash_table|hashtable]] implementations. ​In short, the instruction:​ 
 +<code python>​ 
 +val = d[x] 
 +</​code>​ 
 +will: 
 +  * apply a function $math[hash] to the object ''x'' to obtain a //bucket// $math[b] ($math[hash(x)=b]);
 +  * search (in time linear in the size of the bucket) for the key ''x'' in bucket $math[b], and return its corresponding value. Buckets are just collections of key-value pairs (often implemented as arrays).
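
The two steps above can be sketched as a toy hash table. This is only an illustration of the bucket model described here: the class ''ToyHashTable'' and its methods ''put'' and ''get'' are hypothetical names, and CPython's actual ''dict'' is laid out differently internally.

```python
class ToyHashTable:
    """A toy hash table: a fixed list of buckets, each a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # step 1: the hash of the key selects a bucket
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        # step 2: linear search inside the selected bucket
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


table = ToyHashTable()
table.put("a", 1)
table.put(10, "ten")
table.put("a", 2)        # overwrites the value stored for "a"
print(table.get("a"))    # 2
print(table.get(10))     # ten
```

The cost of ''get'' depends on how many pairs share the selected bucket, which is why the quality of the hash function matters.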
 + 
 +Good dispersion functions:​ 
 +  * If the function $math[hash] is a **good** dispersion function, then it spreads keys evenly over a large range (co-domain), which means that we have a lot of buckets with very few pairs inside them.
 +  * A very **bad** dispersion function is the constant function: it puts all pairs inside a single bucket, so lookup performs exactly like a linear search through an array.
 +  * The **best possible** dispersion function would assign each pair its own bucket.
 + 
 +Efficient dispersion functions:​ 
 +  * An efficient dispersion function is easily computable (finding the proper bucket value is fast) 
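
To make the difference between a good and a bad dispersion function concrete, the sketch below counts how many keys land in each bucket. The helper ''bucket_sizes'' and the choice of 100 integer keys and 10 buckets are assumptions made only for this illustration.

```python
from collections import Counter


def bucket_sizes(keys, hash_fn, num_buckets):
    """Count how many keys fall into each bucket under hash_fn."""
    return Counter(hash_fn(k) % num_buckets for k in keys)


keys = range(100)
constant = bucket_sizes(keys, lambda k: 0, 10)  # bad: every key goes to bucket 0
spread = bucket_sizes(keys, lambda k: k, 10)    # good for these keys: even spread

print(max(constant.values()))  # 100 (one bucket holds everything)
print(max(spread.values()))    # 10  (each of the 10 buckets holds 10 keys)
```

Both functions are equally cheap to compute, so they are equally efficient; only the first one is a bad dispersion function.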
 + 
 +===== Dictionaries ===== 
 + 
 +We often use integers and strings as **keys** when relying on dictionaries in Python. These types are **immutable**, which means that their values **cannot be modified once created**. As a consequence:
 +  * **we can use a hash function to always get the same bucket value**. 
 + 
 +If we use **mutable** objects (e.g. lists) as keys in Python, we get a type-error: ''​TypeError:​ unhashable type: '​list'''​. The following example shows why mutable objects cannot be used as keys: 
 +<code python>​ 
 +l = [] 
 + 
 +d = {} 
 +d[l] = 1   # in Python this raises TypeError; suppose instead that the pair (l,1) were stored in the bucket hash(l)
 + 
 +l.append(0)
 + 
 +print(d[l]) # we would use hash(l) to find the value assigned to l, but l has changed:
 +            # a hash computed from the contents of l may now point to a different bucket than the one where (l,1) was stored
 +</​code>​ 
 + 
 +Sets, as well as other datatypes whose values can change, cannot be used as keys in Python. If a set is needed as a key, one option is ''frozenset'', which is the immutable alternative to sets in Python.
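
A short sketch of immutable alternatives used as keys: ''frozenset'' in place of a set, and a tuple in place of a list. The exact wording of the error message may differ between Python versions, so the example only records that a ''TypeError'' was raised.

```python
d = {}
d[frozenset({1, 2})] = "a frozenset key"
d[(1, 2)] = "a tuple key"

# equal frozensets have equal hashes, whatever order they were built in
print(d[frozenset([2, 1])])  # a frozenset key
print(d[(1, 2)])             # a tuple key

set_rejected = False
try:
    d[{1, 2}] = "a set key"  # a plain set is mutable, hence unhashable
except TypeError:
    set_rejected = True

print(set_rejected)          # True
```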
 + 
 +===== Writing your own hash function ===== 
 + 
 +So far, we have not pointed out what the hash function actually is. For predefined datatypes, Python implements efficient hash functions. However, we can also define hash functions for objects that we create:
 + 
 +<code python>​ 
 +class O:
 +    def __init__(self, x):
 +        self.x = x
 +    # this special method is the hash function for objects of type O; it is used whenever such objects serve as keys in a dictionary
 +    def __hash__(self):
 +        return 0  # not an efficient implementation (all objects share one bucket); it only illustrates the syntax for defining your own hash function
 + 
 +d = {}
 +ob = O(45)
 +d[ob] = "hello"
 +d[O(1)] = "kitty"  # the pairs (O(45), "hello") and (O(1), "kitty") share the same bucket (hash value 0)
 + 
 + 
 +</​code>​ 
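
A more useful hash function usually derives the hash from the object's fields and is paired with ''%%__eq__%%'', so that two equal objects are found under the same key. The class ''Point'' below is a hypothetical example, not part of the original notes; it assumes its fields are not modified while the object is used as a key.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # two points are equal when their coordinates are equal
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # reuse the built-in tuple hash; equal points get equal hashes
        return hash((self.x, self.y))


d = {Point(1, 2): "hello"}
print(d[Point(1, 2)])    # hello (a different but equal object finds the pair)
print(Point(3, 4) in d)  # False
```

Delegating to the hash of a tuple of fields keeps the function cheap to compute while spreading distinct objects over many buckets.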
 + 
 +