Data structures and algorithms: implementing a hash table in Python

Hash table

To locate an element quickly, we give each element a "logical subscript" (an index) and then access the element directly at that position. A hash table implements exactly this idea: it uses a hash function to compute where an element should be stored. For a given element, the hash function must return the same index every time, and that index must stay within the length of the underlying array.
Suppose we have an array t with m = 13 slots. We can define a simple hash function h(key) = key % m; the modulo operation guarantees that h(key) is always a valid index between 0 and m - 1. Now insert the following elements: 756, 431, 142, 579, 226, 903, 338.
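A quick sketch of the arithmetic, assuming the same h(key) = key % 13: computing each key's index shows where it lands, and that 756 and 431 both map to slot 2, which is the collision case described next.

```python
M = 13  # table size from the example

def h(key):
    """Simple modulo hash: always returns an index in range(M)."""
    return key % M

keys = [756, 431, 142, 579, 226, 903, 338]
indices = {key: h(key) for key in keys}
for key, index in indices.items():
    print(f"h({key}) = {index}")
# 756 → 2, 431 → 2, 142 → 12, 579 → 7, 226 → 5, 903 → 6, 338 → 0
```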
Hash collisions: when two keys produce the same index, a collision occurs. The chaining method resolves it by having the slot at that index keep a link to each colliding value, forming a chain (typically a linked list). The drawback is that when collisions are frequent, the chains grow long and lookup is no longer O(1).
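Chaining can be sketched in a few lines. This is a minimal illustration, assuming Python lists stand in for the linked lists; the class name ChainedHashTable is hypothetical. With the keys from the example, 756 and 431 share the chain at index 2.

```python
class ChainedHashTable:
    """Hypothetical hash table that resolves collisions by chaining."""

    def __init__(self, size=13):
        # Each slot holds a list (the chain) of (key, value) pairs
        self._buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        chain = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: overwrite
                return
        chain.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        # Lookup scans the whole chain, so it slows down as chains grow
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return default


table = ChainedHashTable()
for key in [756, 431, 142, 579, 226, 903, 338]:
    table.put(key, str(key))
print(table._buckets[2])  # [(756, '756'), (431, '431')]
```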
There is also open addressing. Its basic idea is that when a slot is occupied, the table follows a fixed rule to find another slot. Depending on how the next slot is chosen, there are three variants:
Linear probing: when the slot is occupied, try the following slots (index + 1, index + 2, ...) until a free one is found.
Quadratic probing: when the slot is occupied, use the square of the probe count i as the offset (index + 1², index + 2², ...).
Double hashing: use a second hash function to compute the step size between probes.
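To make the three rules concrete, here is a sketch listing the first few slots each method would try in a 13-slot table. The function names are hypothetical, and the second hash h2(key) = 1 + key % 11 is an assumption chosen for illustration (a common textbook form, since the step must never be 0).

```python
M = 13  # table size

def linear_probe(start, attempts):
    # start, start+1, start+2, ... (mod M)
    return [(start + i) % M for i in range(attempts)]

def quadratic_probe(start, attempts):
    # start, start+1, start+4, start+9, ... (mod M)
    return [(start + i * i) % M for i in range(attempts)]

def double_hash_probe(key, attempts):
    # Hypothetical second hash: a step size between 1 and 11
    step = 1 + key % 11
    start = key % M
    return [(start + i * step) % M for i in range(attempts)]

print(linear_probe(2, 5))         # [2, 3, 4, 5, 6]
print(quadratic_probe(2, 5))      # [2, 3, 6, 11, 5]
print(double_hash_probe(431, 5))  # [2, 5, 8, 11, 1]
```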
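Python's dict is often described as using the second method: on a collision, the probe moves away from the original position by an offset that grows with the probe count i (i² in textbook quadratic probing). CPython's actual recurrence is index = (5 * index + 1 + perturb) mod size, where perturb starts as the key's hash value and is shifted right by 5 bits at each probe; the implementation below uses the simplified form index = (index * 5 + 1) % size.
Load factor: the ratio of the number of occupied slots to the total number of slots.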
When space runs low, we use the load factor to decide when to grow the table. For example, after inserting 8 elements into a table with 13 slots, the load factor is 8/13 ≈ 0.62. A common rule is that once the load factor reaches 0.8, the table allocates a larger array and rehashes the existing elements into it.
How much new space is allocated depends on the specific underlying implementation. After the new array is allocated, the data from every non-empty slot of the old table is rehashed into the new table; empty slots are not copied.
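The load-factor check and the grow-then-rehash step can be sketched as follows, assuming the same 0.8 threshold and a doubling growth policy (the growth factor is an assumption; real implementations choose their own). The function names are hypothetical, keys are plain integers, and linear probing is used on collision to keep it short.

```python
THRESHOLD = 0.8  # grow once the load factor reaches this value

def load_factor(used, total):
    """Fraction of slots that are occupied."""
    return used / total

def place(slots, key):
    """Store key at key % size, using linear probing on collision."""
    index = key % len(slots)
    while slots[index] is not None:
        index = (index + 1) % len(slots)
    slots[index] = key

def grow_and_rehash(old_slots):
    """Allocate a table twice as large and re-insert every occupied slot."""
    new_slots = [None] * (len(old_slots) * 2)
    for key in old_slots:
        if key is not None:  # empty slots are not copied
            place(new_slots, key)
    return new_slots

slots, used = [None] * 5, 0
for key in [756, 431, 142, 579]:
    place(slots, key)
    used += 1
    if load_factor(used, len(slots)) >= THRESHOLD:
        slots = grow_and_rehash(slots)

print(round(load_factor(8, 13), 2))  # 0.62, the example from the text
print(slots)  # [None, 431, 142, None, None, None, 756, None, None, 579]
```

Note that 756 moves from slot 1 to slot 6 after growing: the index depends on the table size, so every key has to be rehashed rather than copied to the same position.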
Simulating CPython: implementing a hash table in Python
We implement the three most commonly used operations: add, get, and remove.
A slot can be in one of three states:
Never used: HashTable.UNUSED
Used but since removed: HashTable.EMPTY
In use: a Slot node holding a key and a value
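Why the removed state is needed: with open addressing, a lookup stops at the first never-used slot, so resetting a removed slot to "never used" would cut the probe chain and hide keys stored after it. The sketch below shows this with linear probing in a 13-slot table; the markers and the find function are hypothetical stand-ins for HashTable.UNUSED, HashTable.EMPTY and the lookup.

```python
UNUSED = None     # never used: a lookup can stop here
EMPTY = object()  # used but removed (a "tombstone"): a lookup must continue
M = 13

def find(slots, key):
    """Return the index of key via linear probing, or None if absent."""
    index = key % M
    for _ in range(M):
        slot = slots[index]
        if slot is UNUSED:
            return None
        if slot is not EMPTY and slot == key:
            return index
        index = (index + 1) % M
    return None

slots = [UNUSED] * M
slots[2] = 756  # 756 % 13 == 2
slots[3] = 431  # 431 % 13 == 2 as well, so it was probed into slot 3

broken = list(slots)
broken[2] = UNUSED          # remove 756 by marking the slot "never used"
print(find(broken, 431))    # None: the probe stops at slot 2

correct = list(slots)
correct[2] = EMPTY          # remove 756 with a tombstone
print(find(correct, 431))   # 3: the probe skips slot 2 and keeps going
```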

class Array(object):
    """Fixed-size array backed by a Python list."""

    def __init__(self, size=32, init=None):
        self._size = size
        # A list of `size` copies of the initial value
        self._items = [init] * size

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __len__(self):
        return self._size

    def clear(self, value=None):
        # Reset every position to `value`
        for i in range(len(self._items)):
            self._items[i] = value

    def __iter__(self):
        for item in self._items:
            yield item

# Define a slot
class Slot(object):
    """
    A slot in the hash table's array.
    Note that a slot has three states (see if you can work out why). Compared
    with chaining, removing a key is more complicated with quadratic probing:
    1. Never used (HashTable.UNUSED): the slot has never held a key or been part
       of a collision, so a lookup can stop probing as soon as it reaches one.
    2. Used but removed (HashTable.EMPTY): keys further along the probe chain
       may still be present, so a lookup must keep probing.
    3. In use: the slot is a Slot node holding a key and a value.
    """

    def __init__(self, key, value):
        self.key, self.value = key, value


# Hash table implementation
class HashTable(object):
    UNUSED = None             # never used
    EMPTY = Slot(None, None)  # used but since removed

    def __init__(self):
        # Backing array of 8 slots (a power of two), all initially unused
        self._table = Array(8, HashTable.UNUSED)
        # Number of key/value pairs currently stored
        self.length = 0

    # Property: current load factor (stored items / total slots)
    @property
    def _load_factor(self):
        return self.length / float(len(self._table))

    # Number of key/value pairs in the table
    def __len__(self):
        return self.length

    # Hash function: map a key to an index in the array
    def _hash(self, key):
        # hash() returns the key's hash value; abs() makes it non-negative
        return abs(hash(key)) % len(self._table)

    # Return the index of the slot holding key, or None if it is absent
    def _find_key(self, key):
        # Start at the slot given by the hash function
        index = self._hash(key)
        _len = len(self._table)
        # Probe at most _len slots: with a power-of-two size, index * 5 + 1
        # visits every slot once, so the loop ends even if no slot is UNUSED
        for _ in range(_len):
            slot = self._table[index]
            # A never-used slot ends the probe chain: the key is not present
            if slot is HashTable.UNUSED:
                return None
            # A live slot holding this key: found it
            if slot is not HashTable.EMPTY and slot.key == key:
                return index
            # Removed slot or a different key: move to the next probe position
            index = (index * 5 + 1) % _len
        return None

    # Can a new key be stored in this slot?
    def _slot_can_insert(self, index):
        return (self._table[index] is HashTable.EMPTY or
                self._table[index] is HashTable.UNUSED)

    # Find a free slot (never used or removed) for key
    def _find_slot_for_insert(self, key):
        index = self._hash(key)
        _len = len(self._table)
        # Keep probing until a slot that accepts a new key is found
        while not self._slot_can_insert(index):
            index = (index * 5 + 1) % _len
        return index

    # Support `key in table` without scanning the whole array
    def __contains__(self, key):
        return self._find_key(key) is not None

    # Add a key/value pair; return True if the key is new, False if updated
    def add(self, key, value):
        index = self._find_key(key)
        if index is not None:
            # Key already present: overwrite its value
            self._table[index].value = value
            return False
        # New key: store a new Slot in a free position
        index = self._find_slot_for_insert(key)
        self._table[index] = Slot(key, value)
        self.length += 1
        # Grow the table once the load factor reaches 0.8
        if self._load_factor >= 0.8:
            self._rehash()
        return True

    # Double the array size and re-insert every live slot
    def _rehash(self):
        old_table = self._table
        newsize = len(self._table) * 2
        # New array with all slots unused (removed slots are dropped here)
        self._table = Array(newsize, HashTable.UNUSED)
        self.length = 0
        for slot in old_table:
            # Copy only slots that actually hold a key
            if slot is not HashTable.UNUSED and slot is not HashTable.EMPTY:
                index = self._find_slot_for_insert(slot.key)
                self._table[index] = slot
                self.length += 1

    # Look up a value, returning default if the key is absent
    def get(self, key, default=None):
        index = self._find_key(key)
        if index is None:
            return default
        return self._table[index].value

    # Remove a key and return its value; raise KeyError if it is absent
    def remove(self, key):
        index = self._find_key(key)
        if index is None:
            raise KeyError(key)
        value = self._table[index].value
        self.length -= 1
        # Mark the slot as removed (not unused) so later probes continue past it
        self._table[index] = HashTable.EMPTY
        return value

    # Iterate over the keys in the table
    def __iter__(self):
        for slot in self._table:
            # Skip never-used and removed slots
            if slot is not HashTable.EMPTY and slot is not HashTable.UNUSED:
                yield slot.key
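The probing step index = (index * 5 + 1) % _len used in the code above works well because the table size is always a power of two (8, then 16, 32, ...). With a power-of-two size this recurrence visits every slot exactly once before repeating, so a probe can always reach a free slot; with other sizes it can get stuck. The sketch below checks both cases; probe_sequence is a hypothetical helper, not part of the class.

```python
def probe_sequence(start, size):
    """Hypothetical helper: the first `size` indices visited by
    index = (index * 5 + 1) % size, starting from `start`."""
    index = start
    sequence = []
    for _ in range(size):
        sequence.append(index)
        index = (index * 5 + 1) % size
    return sequence

# Power-of-two sizes, as used by HashTable: every slot is reached exactly once
for size in (8, 16, 32):
    for start in range(size):
        assert sorted(probe_sequence(start, size)) == list(range(size))
print(probe_sequence(3, 8))  # [3, 0, 1, 6, 7, 4, 5, 2]

# A size of 13: starting at slot 3 the probe never leaves slot 3,
# because (3 * 5 + 1) % 13 == 3
print(set(probe_sequence(3, 13)))  # {3}
```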


Origin blog.csdn.net/weixin_44865158/article/details/100777969