How to resolve hash collisions

There are mainly the following four ways:

① Open addressing

       The basic idea: when the hash address p = H(key) of a key collides, generate another hash address p1 based on p; if p1 still collides, generate another hash address p2 based on p, and so on, until a hash address pi that does not collide is found, and store the element there.
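To make the idea concrete, here is a minimal sketch of open addressing with linear probing; the class and method names are invented for this example, and it assumes the table never fills up completely:

public class LinearProbingTable {
    private final String[] keys;
    private final int[] values;
    private final boolean[] used;

    public LinearProbingTable(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    // On a collision at slot p, probe p+1, p+2, ... until a free or matching slot is found.
    public void put(String key, int value) {
        int p = hash(key);
        while (used[p] && !keys[p].equals(key)) {
            p = (p + 1) % keys.length;          // next candidate address p1, p2, ...
        }
        keys[p] = key;
        values[p] = value;
        used[p] = true;
    }

    public Integer get(String key) {
        int p = hash(key);
        while (used[p]) {
            if (keys[p].equals(key)) return values[p];
            p = (p + 1) % keys.length;          // keep probing along the same sequence
        }
        return null;                            // hit an empty slot: key is not present
    }
}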

② Re-hashing

      This method constructs several different hash functions at the same time: Hi = RHi(key), i = 1, 2, ..., k. When the hash address H1 = RH1(key) collides, compute H2 = RH2(key), and so on, until no collision occurs. This method makes clustering less likely, but it increases the computation time. A small sketch follows below.
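As an illustration, a minimal re-hashing sketch might look like the following; the family of hash functions RH_i is supplied by the caller, and all names here are hypothetical:

import java.util.List;
import java.util.function.ToIntFunction;

public class RehashTable {
    private final String[] slots;
    private final List<ToIntFunction<String>> hashFunctions;  // RH1, RH2, ..., RHk

    public RehashTable(int capacity, List<ToIntFunction<String>> hashFunctions) {
        this.slots = new String[capacity];
        this.hashFunctions = hashFunctions;
    }

    // Tries RH1(key); on a collision, falls through to RH2(key), and so on.
    // Returns the slot the key was stored in, or -1 if every RH_i collided.
    public int insert(String key) {
        for (ToIntFunction<String> rh : hashFunctions) {
            int i = Math.floorMod(rh.applyAsInt(key), slots.length);
            if (slots[i] == null || slots[i].equals(key)) {
                slots[i] = key;
                return i;
            }
        }
        return -1;  // all k hash functions produced occupied slots
    }
}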

③ Chaining (the zipper method)

      The basic idea of this approach is that all elements whose hash address is i are placed in a singly linked list called a synonym chain, and the head pointer of that list is stored in slot i of the hash table; lookups, insertions and deletions therefore take place mainly within the synonym chain. The chaining (zipper) method is well suited to cases where insertions and deletions are frequent. A bare-bones sketch follows below.
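Here is a bare-bones sketch of the zipper method, with each table slot holding the head of its synonym chain; the names are invented for this example:

public class ChainedHashTable {
    private static class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] buckets;

    public ChainedHashTable(int capacity) {
        buckets = new Node[capacity];
    }

    private int index(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(String key, int value) {
        int i = index(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; }   // key already in the chain: update
        }
        buckets[i] = new Node(key, value, buckets[i]);            // otherwise prepend to the synonym chain
    }

    public Integer get(String key) {
        for (Node n = buckets[index(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    public void remove(String key) {
        int i = index(key);
        Node prev = null;
        for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {                              // deletion is just unlinking the node
                if (prev == null) buckets[i] = n.next; else prev.next = n.next;
                return;
            }
        }
    }
}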

※ Advantages of the zipper method:

      Handling collisions with chaining is simple and causes no clustering, i.e. non-synonyms never collide with each other, so the average search length is shorter; since the space for the nodes of each list is allocated dynamically, chaining is better suited to situations where the table length cannot be determined before the table is built; and in a hash table built with chaining, deleting a node is easy to implement: simply remove the corresponding node from its list.

※ Disadvantages of the zipper method:

      The pointers require extra space, so when the nodes themselves are small, open addressing is more space-efficient; and if the space saved on pointers is instead used to enlarge the hash table, the load factor becomes smaller, which in turn reduces collisions under open addressing and thus speeds up the average search.

Java's HashMap uses the zipper method to handle collisions. A HashMap has an initial capacity, which defaults to 16:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

To reduce the probability of collisions, when the number of elements in the HashMap reaches an expansion threshold, a resize is triggered and all elements are rehashed into the enlarged container, which is a very time-consuming operation.

This threshold is determined by the container's current capacity and the load factor: DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR, i.e. by default 16 × 0.75 = 12 elements, at which point the resize is triggered.

So when using a hash container, try to estimate your data volume and set an appropriate initial capacity, as in the sketch below. For the concrete implementation, study the HashMap source code yourself.
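A rough usage sketch, assuming you know the expected number of entries in advance: size the map up front so the entry count stays below capacity × load factor and no resize/rehash is ever triggered.

import java.util.HashMap;
import java.util.Map;

public class Sizing {
    public static void main(String[] args) {
        int expectedSize = 1000;
        // Choose a capacity so that expectedSize / 0.75 fits; HashMap itself
        // rounds the requested capacity up to the next power of two.
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(initialCapacity);
        // ... fill the map without ever hitting the resize threshold
        map.put("example", 1);
    }
}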

One more piece of background before returning to the question: why is the default load factor 0.75? With random hash codes, the number of nodes falling into a hash bucket follows a Poisson distribution, and the HashMap source comment also lists the probability of a bucket holding a given number of elements. From that table, the probability of a bucket reaching 8 elements is already vanishingly small; in other words, with 0.75 as the load factor, a chain longer than 8 at any collision position is almost impossible.
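This claim can be checked with a few lines of arithmetic. The sketch below uses a Poisson parameter of about 0.5, the average bucket occupancy mentioned in the JDK 8 HashMap source comment for the default 0.75 load factor:

public class BucketPoisson {
    public static void main(String[] args) {
        double lambda = 0.5;                                 // average entries per bucket
        double p = Math.exp(-lambda);                        // P(bucket holds 0 entries)
        for (int k = 0; k <= 8; k++) {
            System.out.printf("P(bucket holds %d entries) = %.8f%n", k, p);
            p = p * lambda / (k + 1);                        // Poisson recurrence: P(k+1) = P(k) * lambda / (k+1)
        }
        // P(8) comes out around 0.00000006, which is why a chain longer than 8
        // is treated as practically impossible.
    }
}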

④ Establishing a common overflow area

       The basic idea of this method: split the hash table into two parts, a base table and an overflow table; any element that conflicts with the base table is placed into the overflow table.
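A minimal sketch of the idea, with all names invented for this example: the first element hashing to a slot goes into the base table, and every later colliding element goes into one shared overflow table.

import java.util.ArrayList;
import java.util.List;

public class OverflowAreaTable {
    private final String[] baseTable;
    private final List<String> overflow = new ArrayList<>();  // one overflow area shared by all slots

    public OverflowAreaTable(int capacity) {
        baseTable = new String[capacity];
    }

    private int index(String key) {
        return (key.hashCode() & 0x7fffffff) % baseTable.length;
    }

    public void insert(String key) {
        int i = index(key);
        if (baseTable[i] == null || baseTable[i].equals(key)) {
            baseTable[i] = key;               // no conflict: store in the base table
        } else {
            overflow.add(key);                // conflict: append to the overflow table
        }
    }

    public boolean contains(String key) {
        int i = index(key);
        return key.equals(baseTable[i]) || overflow.contains(key);
    }
}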


Source: www.cnblogs.com/zhudingtop/p/11456101.html