Differences

This shows you the differences between two versions of the page.

lfa:dictionaries [2021/09/28 12:58]
pdmatei
lfa:dictionaries [2021/09/28 13:21] (current)
pdmatei
Line 1: Line 1:
 ====== Dictionaries and hashing ======
 +
 +===== A review of hashtables =====
  
 Python dictionaries are [[https://en.wikipedia.org/wiki/Hash_table|hashtable]] implementations. In short, the instruction:
Line 7: Line 9:
 will:
   * apply a function $math[hash] to the object ''x'' to obtain a //bucket// $math[b] ($math[hash(x)=b])
 +  * search for the key ''x'' in bucket $math[b] (in time linear in the size of the bucket) and return its corresponding value. Buckets are just collections of key-value pairs (often implemented as arrays).
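
The two steps above can be sketched with a toy hashtable. This is a simplified model under stated assumptions, not the way Python's built-in ''dict'' is implemented: the names ''NUM_BUCKETS'', ''bucket_index'', ''put'' and ''get'' are made up for illustration, and each bucket is a plain list of pairs.

```python
# A toy hashtable: a fixed number of buckets, each a list of (key, value) pairs.
# NUM_BUCKETS, bucket_index, put and get are illustrative names only.
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_index(key):
    # Step 1: apply the hash function and reduce it to a valid bucket index.
    return hash(key) % NUM_BUCKETS

def put(key, value):
    bucket = buckets[bucket_index(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)  # key already present: overwrite its value
            return
    bucket.append((key, value))

def get(key):
    # Step 2: linear search, but only inside the bucket selected by the hash.
    for k, v in buckets[bucket_index(key)]:
        if k == key:
            return v
    raise KeyError(key)

put(1, "one")
put(9, "nine")  # hash(9) % 8 == 1, so 9 shares a bucket with 1
put(1, "uno")   # overwrites the value stored for key 1
print(get(1), get(9))
```

Keys ''1'' and ''9'' collide in this sketch, so the lookup has to compare keys inside the bucket before returning a value.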
 +
 +Good dispersion functions:
 +  * If the function $math[hash] is a **good** dispersion function, then it spreads the keys over a large range of values, which means that we have many buckets, each holding very few pairs.
 +  * A very **bad** dispersion function is the constant function: every lookup degenerates into a linear search through an array (a single bucket with all pairs inside it).
 +  * The **best possible** dispersion function would place each pair in its own bucket.
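
A small experiment can make the difference visible. The sketch below assumes 100 integer keys and 10 buckets, and compares a constant function with a simple modulo function; ''constant_hash'' and ''modulo_hash'' are hypothetical names chosen for this illustration.

```python
from collections import Counter

NUM_BUCKETS = 10
keys = range(100)

def constant_hash(key):
    # Bad dispersion: every key is sent to bucket 0.
    return 0

def modulo_hash(key):
    # Better dispersion for these keys: they are spread evenly over all buckets.
    return key % NUM_BUCKETS

# Count how many keys end up in each bucket.
constant_sizes = Counter(constant_hash(k) for k in keys)
modulo_sizes = Counter(modulo_hash(k) for k in keys)

print(max(constant_sizes.values()))  # 100: one bucket holds every pair
print(max(modulo_sizes.values()))    # 10: no bucket holds more than 10 pairs
```

With the constant function, a lookup may inspect all 100 pairs; with the modulo function, at most 10.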
 +
 +Efficient dispersion functions:
 +  * An efficient dispersion function is cheap to compute (finding the right bucket for a key is fast)
 +
 +===== Dictionaries =====
 +
 +We often use integers and strings as **keys** when relying on dictionaries in Python. These types are **immutable**, which means that their values **cannot be modified after they are created**. As a consequence:
 +  * **the hash function always returns the same bucket for a given key**.
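
As a quick check, the built-in ''hash'' returns equal results for equal immutable values within a single run of a program (string hashes may differ from one run to the next, which does not affect a running dictionary). The variable names below are illustrative.

```python
key_int = 42
key_str = "kitty"

# Equal immutable values always produce equal hashes within one run.
assert hash(key_int) == hash(42)
assert hash(key_str) == hash("kit" + "ty")

d = {key_int: "a", key_str: "b"}

# A separately built but equal key reaches the same bucket and finds the pair.
print(d[42], d["kit" + "ty"])  # a b
```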
 +
 +If we use **mutable** objects (e.g. lists) as keys in Python, we get a type error: ''TypeError: unhashable type: 'list'''. The following example shows why mutable objects cannot be used as keys:
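 +<code python>
 +# Thought experiment: Python actually rejects this program at the line d[l] = 1
 +l = []
 +
 +d = {}
 +d[l] = 1     # in reality: TypeError. Suppose instead that the pair (l, 1) were stored in bucket hash(l)
 +
 +l.append(0)  # l has changed, so a hash computed from its contents would change as well
 +
 +print(d[l])  # hash(l) is used to find the value assigned to l, but l was changed:
 +             # the hash may now point to a different bucket than the one in which (l, 1) was stored
 +</code>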
 +
 +Sets, as well as other datatypes whose values can change, cannot be used as keys in Python. If a set-like key is needed, one option is ''frozenset'', the immutable alternative to sets in Python.
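
The following sketch contrasts the two: a ''set'' is rejected as a key, while a ''frozenset'' with the same elements is accepted. The exact wording of the error message depends on the Python version, so it is only stored here, not compared.

```python
d = {}

try:
    d[{1, 2}] = "mutable set"   # sets are mutable, hence unhashable
except TypeError as e:
    error_message = str(e)      # wording varies between Python versions

key = frozenset({1, 2})         # immutable, hence hashable
d[key] = "frozen"

# An equal frozenset, built separately, finds the same pair.
print(d[frozenset({2, 1})])     # frozen
```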
 +
 +===== Writing your own hash function =====
 +
 +So far, we have not said what the hash function actually is. For predefined datatypes, Python provides efficient hash functions. However, we can also define hash functions for objects that we create:
 +
 +<code python>
 +class O:
 +    def __init__(self, x):
 +        self.x = x
 +
 +    # __hash__ is the special method that serves as the hash function for objects of type O
 +    # (it overrides the default, identity-based one) when they are used as keys in a dictionary
 +    def __hash__(self):
 +        # a constant hash is a poor choice (all keys collide); it only illustrates the syntax
 +        return 0
 +
 +d = {}
 +ob = O(45)
 +d[ob] = "hello"
 +d[O(1)] = "kitty"  # the pairs (O(45), "hello") and (O(1), "kitty") share the same hash value (namely 0)
 +</code>
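
A more useful hash function is usually derived from the object's contents and paired with a matching ''%%__eq__%%'', so that equal objects land in the same bucket and are recognised as the same key. The class ''Point'' below is a hypothetical example, not part of the original text; it assumes that the attributes are not modified while the object is used as a key.

```python
class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Two points are equal when their coordinates are equal.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Reuse Python's tuple hash, so equal points get equal hashes.
        return hash((self.x, self.y))

d = {Point(1, 2): "hello"}

# A different object with the same contents finds the stored pair.
print(d[Point(1, 2)])  # hello
```

Delegating to the hash of a tuple of fields keeps the two methods consistent: whenever ''%%__eq__%%'' reports two objects as equal, their hashes are equal too.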
 +
 +
 +