Python hash table

Reposted from: hanahimi's hash table study notes on cnblogs
https://www.cnblogs.com/hanahimi/p/4765265.html

Python data structures and algorithms: hash tables

Hash table study notes

Translated with reference to the online edition of "Think Complexity", corresponding chapter: http://greenteapress.com/complexity/html/thinkcomplexity004.html

With a hash table, lookups are very fast, taking constant time, and the elements do not need to be kept in sorted order.

Python's built-in dictionary type is implemented with a hash table.

 

To explain how a hash table works, we will try to implement a hash table structure without using a dictionary.

We need to define a data structure that maps keys to values, supporting the following two operations:

add(k, v):

  Add a new item that maps from key k to value v.

  With a Python dictionary, d, this operation is written d[k] = v.

get(target):

  Look up and return the value that corresponds to key target.

  With a Python dictionary, d, this operation is written d[target] or d.get(target).
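
With a plain Python dict, the two operations look like this (a trivial illustration added here for concreteness; the key names are made up):

d = {}
d["apple"] = 3          # add("apple", 3)
print(d.get("apple"))   # get("apple") -> 3
d["missing"]            # looking up an absent key raises KeyError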

 

A simple implementation uses a linear list, storing the key-value mapping as tuples:

class LinearMap(object):
    """ Linear list structure """

    def __init__(self):
        self.items = []

    def add(self, k, v):    # append a new item to the list
        self.items.append((k, v))

    def get(self, k):       # look up the key by linear search
        for key, val in self.items:
            if key == k:    # if the key exists, return its value
                return val
        raise KeyError      # otherwise raise KeyError

We could have add keep the items list sorted by key and have get use binary search, which takes O(log n) time. However, inserting a new element into a sorted list is a linear operation, so this approach is not ideal, and we still have not reached constant lookup time.
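
As a sketch of that sorted-list idea (the SortedLinearMap class and its use of the standard bisect module are my own additions, not part of the original post):

import bisect

class SortedLinearMap(object):
    """ Keeps keys sorted so get can use binary search """

    def __init__(self):
        self.keys = []
        self.vals = []

    def add(self, k, v):
        i = bisect.bisect_left(self.keys, k)  # O(log n) to find the spot
        self.keys.insert(i, k)                # but the insert itself is O(n)
        self.vals.insert(i, v)

    def get(self, k):
        i = bisect.bisect_left(self.keys, k)  # O(log n) binary search
        if i < len(self.keys) and self.keys[i] == k:
            return self.vals[i]
        raise KeyError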

 

We can improve on this by splitting the overall lookup table into several smaller lists, say 100 sub-lists. For each key we compute a hash value with the hash function, then use it to determine which sub-list to add to or search in. Compared with searching the whole list from scratch, the time is greatly reduced. Although get still grows linearly, the BetterMap class brings us one step closer to a hash table:

class BetterMap(object):
    """ A faster lookup table built from LinearMap objects as sub-lists """

    def __init__(self, n=100):
        self.maps = []  # the overall table
        for i in range(n):  # create n empty sub-lists
            self.maps.append(LinearMap())

    def find_map(self, k):  # compute an index with the hash function
        index = hash(k) % len(self.maps)
        return self.maps[index]  # return a reference to the indexed sub-list

    # find the right sub-list (a LinearMap object) for adding and lookup
    def add(self, k, v):
        m = self.find_map(k)
        m.add(k, v)

    def get(self, k):
        m = self.find_map(k)
        return m.get(k)

Let's test it:

if __name__ == "__main__":
    table = BetterMap()
    pricedata = [("Hohner257", 257),
                 ("SW1664", 280),
                 ("SCX64", 1090),
                 ("SCX48", 830),
                 ("Super64", 2238),
                 ("CX12", 1130),
                 ("Hohner270", 620),
                 ("F64C", 9720),
                 ("S48", 1988)]

    for item, price in pricedata:
        table.add(k=item, v=price)

    print(table.get("CX12"))
    # >>> 1130
    print(table.get("QIMEI1248"))
    # >>> raises KeyError

Because different keys generally have different hash values, they tend to get different indices after the modulo operation, so the items are spread across the sub-lists.

With n = 100, a BetterMap lookup is about 100 times faster than a LinearMap lookup.
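
As a rough, unscientific check of that claim (the build helper and the probe keys are my own additions; exact numbers will vary by machine and by run, since Python 3 randomizes string hashes):

import random
import timeit

def build(map_cls, n=10000):
    m = map_cls()
    for i in range(n):
        m.add("key%d" % i, i)
    return m

lm = build(LinearMap)
bm = build(BetterMap)   # default of 100 sub-lists

probe = ["key%d" % random.randrange(10000) for _ in range(100)]
print(timeit.timeit(lambda: [lm.get(k) for k in probe], number=10))
print(timeit.timeit(lambda: [bm.get(k) for k in probe], number=10))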

 

Obviously, BetterMap's search speed is limited by the parameter n, and since the length of each LinearMap is not fixed, searching within a sub-list is still linear. If we could bound the maximum length of each sub-list, the search within a single sub-list would have a fixed upper limit, and the LinearMap.get method would become constant time. To achieve this, we only need to keep track of the number of elements: whenever the number of elements per LinearMap exceeds a threshold, we rearrange the entire hash table, adding more LinearMaps. This guarantees that the lookup operation stays constant.

The following is the hash table implementation:

class HashMap(object):
    def __init__(self):
        # start with a table of capacity 2 (two sub-lists)
        self.maps = BetterMap(2)
        self.num = 0  # number of items in the table

    def get(self, k):
        return self.maps.get(k)

    def add(self, k, v):
        # when the item count reaches the number of sub-lists, rehash,
        # expanding the table to twice the current number of items
        if self.num == len(self.maps.maps):
            self.resize()

        # add the new item to the (possibly resized) table
        self.maps.add(k, v)
        self.num += 1

    def resize(self):
        """ Rehash into a new table; note that resizing takes linear time """
        # build a new table with 2 * (number of items) sub-lists
        new_maps = BetterMap(self.num * 2)
        for m in self.maps.maps:  # walk over each old sub-list
            for k, v in m.items:  # copy its items into the new table
                new_maps.add(k, v)
        self.maps = new_maps  # make the new table current
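
It is used just like the BetterMap test above (this usage snippet is my own addition and assumes the pricedata list from that test is in scope):

hm = HashMap()
for item, price in pricedata:
    hm.add(item, price)

print(hm.get("Super64"))
# >>> 2238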

The key part is the add method: it checks the number of elements against the size of the BetterMap; if they are equal, then the average number of elements per LinearMap is 1, and it calls the resize method.

resize creates a new table with double the number of sub-lists, then rehashes the elements of the old table into the new one.

Resizing is linear, which may not sound great, since we asked for a hash table with constant-time operations. But keep in mind that rehashing is infrequent, so add is constant time in most cases and only occasionally linear. Since the total work of adding n elements is proportional to n, the average time per add is constant!

Suppose we add 32 elements. The cost accounting goes as follows:

1. The initial capacity is 2, so the first two adds need no rehash; the total so far is 2 units of time.

2. The 3rd add triggers a resize to 4 sub-lists; rehashing 2 items costs 2 units, so the 3rd add costs 3.

3. The 4th add costs 1 unit; the total so far is 6.

4. The 5th add triggers a resize to 8 sub-lists; rehashing 4 items costs 4 units, so the 5th add costs 5.

5. The 6th through 8th adds cost 1 unit each, 3 in total; the running total is 6 + 5 + 3 = 14.

6. The 9th add triggers a resize to 16 sub-lists; rehashing 8 items costs 8 units, so the 9th add costs 9.

7. The 10th through 16th adds cost 7 units in total; the running total is 14 + 9 + 7 = 30.

After 32 adds the total is 62 units of time, and a pattern emerges: after adding n elements, when n is a power of 2, the total cost is 2n - 2 units, so the average cost per add is strictly less than 2 units.

A power of 2 is the most favorable value of n; as n grows past it, the average rises slightly, but the important point is that we achieve O(1) average time per add.
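
We can check this accounting with a small instrumented subclass (my own addition for illustration, not part of the original post); it counts one unit of work per item stored, including the items copied during a resize:

class CountingHashMap(HashMap):
    """ Hypothetical instrumentation: counts units of work per add """

    def __init__(self):
        HashMap.__init__(self)
        self.work = 0

    def add(self, k, v):
        if self.num == len(self.maps.maps):
            self.work += self.num  # a resize copies every existing item
        HashMap.add(self, k, v)
        self.work += 1             # the insert itself

m = CountingHashMap()
for i in range(32):
    m.add(i, i)
print(m.work)
# >>> 62, i.e. 2 * 32 - 2, matching the analysis above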

 


Origin: blog.csdn.net/qq_23869697/article/details/90489497