Data structures: hash table

First, take a look at the picture below, which shows one implementation of a hash table (it is very ugly 0.0).
[Picture: an implementation of a hash table]

Hash : The idea of hashing is to distribute entries (key/value pairs) across an array of buckets. Given a key, an algorithm computes an index that indicates where the entry is located. This is usually done in two steps:

hash = hashfunc(key)        // compute the hash value for the key
index = hash % array_size   // the modulo "%" keeps index within 0..array_size-1, i.e. always inside the array (assuming hash is non-negative)
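As a minimal sketch of these two steps in Java: the class name IndexDemo and the method indexFor are made up for illustration, and hashCode() stands in for hashfunc. One detail is an assumption on top of the pseudocode above: Java's "%" can return a negative result for a negative hash, so the sketch uses Math.floorMod to keep the index inside the array.

```java
class IndexDemo {
    // Map a key to a bucket index in the range 0..arraySize-1.
    // indexFor is a hypothetical helper, not part of any library.
    static int indexFor(Object key, int arraySize) {
        int hash = key.hashCode();              // step 1: hash = hashfunc(key)
        return Math.floorMod(hash, arraySize);  // step 2: index = hash % array_size, never negative
    }

    public static void main(String[] args) {
        // The hashCode() of an Integer is its own value.
        System.out.println(indexFor(17, 16));   // prints 1
        System.out.println(indexFor(33, 16));   // prints 1
        System.out.println(indexFor(-7, 16));   // prints 9, whereas -7 % 16 would give -7
    }
}
```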

Load factor : Load factor = number of entries already inserted / number of buckets (the capacity of the array). For a fixed number of buckets, lookup time grows as the number of entries grows, so the desired constant time is not achieved. To keep lookups constant time, the table can be expanded once the load factor reaches a chosen threshold.
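The load factor check can be sketched as follows. The names LoadFactorDemo, loadFactor and needsExpansion are hypothetical, and the rule "expand when the load factor exceeds the threshold" is one reasonable reading of the text. The threshold 0.75 used in the example is the documented default load factor of java.util.HashMap.

```java
class LoadFactorDemo {
    // Load factor = entries stored / number of buckets.
    static double loadFactor(int entryCount, int bucketCount) {
        return (double) entryCount / bucketCount;
    }

    // True when the table should grow; the threshold is a tunable parameter.
    static boolean needsExpansion(int entryCount, int bucketCount, double threshold) {
        return loadFactor(entryCount, bucketCount) > threshold;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(12, 16));            // prints 0.75
        System.out.println(needsExpansion(12, 16, 0.75));  // prints false
        System.out.println(needsExpansion(13, 16, 0.75));  // prints true
    }
}
```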

Expansion : All entries are rehashed and array_size grows (usually doubling), so the range of index becomes wider and it becomes less likely that entries with different hash values end up at the same index. For example, 17 and 33 modulo 16 give 1 and 1, while 17 and 33 modulo 32 give 17 and 1, so the entries are spread more evenly across the buckets, right? What if two entries still land on the same index? Only the chain address method (also known as separate chaining) is introduced here.

System.out.println(17 % 32);   // prints 17
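To make the 17 and 33 example concrete, here is a small sketch that redistributes the same keys into 16 and then 32 buckets. ResizeDemo and rehash are illustrative names, and the keys are used directly as their own hash values, which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class ResizeDemo {
    // Distribute the keys into newSize buckets by recomputing index = key % newSize.
    static List<List<Integer>> rehash(int[] keys, int newSize) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < newSize; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(Math.floorMod(key, newSize)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {17, 33};
        List<List<Integer>> small = rehash(keys, 16);
        List<List<Integer>> large = rehash(keys, 32);
        System.out.println(small.get(1));   // prints [17, 33]: both keys collide
        System.out.println(large.get(17));  // prints [17]
        System.out.println(large.get(1));   // prints [33]
    }
}
```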

Chain address method : This is what the picture above shows: each bucket holds a linked list of the entries that map to its index. The time of a hash table operation is the time to find the bucket (a constant) plus the time of the list operation. The bucket can hold either a reference (pointer) to the first entry or the first entry itself. HashMap is close to the latter, since each array slot leads directly to the first entry of its chain. The benefit is that in most cases the number of pointer traversals is reduced by 1, which improves cache efficiency when accessing the table (linked-list nodes are scattered in memory, so traversing them benefits far less from the CPU cache than walking a contiguous array). The disadvantage usually cited online is that an empty bucket occupies the same space as a bucket holding one entry. That does not quite match HashMap, though: HashMap allocates the entry array, but the array elements are references that default to null, so an empty bucket holds null rather than an initialized entry. So don't take that claim too literally 0.0
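A minimal sketch of the chain address method follows. It is not the HashMap source: ChainedMap, Entry and indexFor are hypothetical names, the capacity is fixed (no expansion), null keys are not supported, and new entries are placed at the head of the chain purely for brevity. It does show the point made above that an empty bucket is just null.

```java
class ChainedMap<K, V> {
    // One entry in a bucket's singly linked list.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Each slot refers to the first entry of its chain; an empty bucket is null.
    private final Entry<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = (Entry<K, V>[]) new Entry[capacity];
    }

    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Insert or overwrite; returns the previous value, or null if the key is new.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i] = new Entry<>(key, value, buckets[i]); // new head of the chain
        return null;
    }

    // Find the bucket (constant time), then walk its list.
    V get(K key) {
        for (Entry<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMap<Integer, String> map = new ChainedMap<>(16);
        map.put(17, "a");
        map.put(33, "b");                 // collides with 17: both map to index 1
        System.out.println(map.get(17));  // prints a
        System.out.println(map.get(33));  // prints b
        System.out.println(map.get(1));   // prints null: index 1 is occupied, but not by key 1
    }
}
```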

Clock cycle (a bit off topic 0.0) : The clock cycle is the smallest unit of time in which the CPU works, and 1 / clock cycle = operating frequency. Accessing the CPU cache takes fewer clock cycles than accessing main memory, so the same task takes a different amount of time depending on whether the data is in the CPU cache or in memory. This is one of the reasons why accessing an array is faster than accessing a linked list! For details, see "Array, Linked List for Memory and CPU Access Cache Mechanism".
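The relation 1 / clock cycle = operating frequency can be turned into a small calculation. In the sketch below, ClockDemo and its methods are hypothetical names, and the cycle counts (4 for a cache access, 200 for a memory access) are made-up placeholders chosen only to show the arithmetic, not measurements of any real CPU.

```java
class ClockDemo {
    // Clock period in nanoseconds for a frequency in GHz (period = 1 / frequency).
    static double periodNanos(double frequencyGHz) {
        return 1.0 / frequencyGHz;
    }

    // Time in nanoseconds for an access that takes the given number of cycles.
    static double accessNanos(int cycles, double frequencyGHz) {
        return cycles * periodNanos(frequencyGHz);
    }

    public static void main(String[] args) {
        double ghz = 2.0;                               // assumed operating frequency
        System.out.println(periodNanos(ghz));           // prints 0.5
        System.out.println(accessNanos(4, ghz));        // prints 2.0   (placeholder cache cost)
        System.out.println(accessNanos(200, ghz));      // prints 100.0 (placeholder memory cost)
    }
}
```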

Origin blog.csdn.net/weixin_44463178/article/details/108904607