HashMap: Implementation Principles and Source Code Analysis

 A hash table is a very important data structure with a rich set of application scenarios. The core of many caching technologies (such as memcached) is a large hash table maintained in memory, and the implementation principle of HashMap comes up frequently in interview questions, so its importance is evident. This article explains the implementation principle of HashMap, the corresponding implementation in the Java collections framework, and then analyzes the JDK 7 HashMap source code.

Table of Contents

  I. What is a hash table?

  II. How HashMap is implemented

  III. Why must the HashMap array length be a power of 2?

  IV. When overriding equals, also override hashCode

  V. Summary

I. What is a hash table?

  Before discussing hash tables, let's first review how other data structures perform on basic operations such as insertion and lookup.

  Array: stores data in a contiguous block of memory. Lookup by a given index is O(1). Lookup by a given value requires traversing the array and comparing the given key with each element, which is O(n); for a sorted array, binary search, interpolation search, Fibonacci search, and similar methods reduce the lookup complexity to O(log n). General insertion and deletion involve moving array elements, so their average complexity is also O(n).

  Linked list: insertion and deletion (once the target position has been found) only require adjusting the references between nodes, so they are O(1). Lookup requires traversing the list and comparing node by node, which is O(n).

  Binary tree: for a reasonably balanced, ordered binary tree, insertion, lookup, and deletion all have an average complexity of O(log n).

  Hash table: compared with the structures above, insertion, deletion, and lookup in a hash table perform very well. Ignoring hash collisions, a single positioning step is enough, so the time complexity is O(1). Next, let's look at how a hash table achieves this constant-time O(1) behavior.

  There are only two physical storage structures for data: sequential storage and linked storage (stacks, queues, trees, graphs, and so on are abstractions at the logical level; when mapped into memory they take one of these two physical forms). As mentioned above, looking up an element in an array by index takes a single positioning step. Hash tables exploit exactly this property: the backbone of a hash table is an array.

  For example, to add or look up an element, we map the element's key to a position in the array through a function, and then complete the operation with a single access by array index.

        storage location = f(key)

  The function f is generally called the hash function, and the quality of its design directly affects the quality of the hash table. For example, to insert into the hash table, we compute f(key) and store the element at the resulting array position.

  Lookup works the same way: use the hash function to compute the actual storage position, then fetch the element from that position in the array.
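
  The idea of storage location = f(key) can be sketched in a few lines. This is a toy illustration, not HashMap's code: the class name DirectSlotDemo, the table size of 16, and the choice of f (hashCode() reduced modulo the table length) are assumptions made for the example, and collisions are deliberately ignored.

```java
// Toy illustration of "storage location = f(key)"; collisions are ignored on purpose.
class DirectSlotDemo {
    static final String[] slots = new String[16]; // backbone array (size is an assumption)

    // Hypothetical hash function f: maps any key to an index in [0, slots.length)
    static int f(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    static void insert(Object key, String value) {
        slots[f(key)] = value;  // one computation, one array write
    }

    static String find(Object key) {
        return slots[f(key)];   // one computation, one array read
    }

    public static void main(String[] args) {
        insert(42, "answer");
        System.out.println(f(42));    // Integer 42 hashes to 42, and 42 mod 16 is 10
        System.out.println(find(42)); // answer
    }
}
```

  Both operations cost one hash computation plus one array access, which is where the O(1) behavior comes from.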

  Hash collision

  However, nothing is perfect. What if two different elements get the same storage position from the hash function? In other words, when we hash an element, obtain a storage position, and try to insert, we may find that the position is already occupied by another element. This is called a hash collision (or hash conflict). As mentioned earlier, the design of the hash function is critical: a good hash function keeps the computation simple and distributes the hash addresses as evenly as possible. But an array is a contiguous block of memory of fixed length, and no hash function, however good, can guarantee that the resulting positions never conflict. So how are hash collisions resolved? There are several approaches: open addressing (on a conflict, keep probing for an unoccupied position), rehashing with another hash function, and separate chaining. HashMap uses separate chaining, that is, an array plus linked lists.
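
  Separate chaining can be sketched as follows. This is a minimal illustration under stated assumptions, not the JDK implementation: the class ChainedTable, its fixed size of 8 buckets, and the String-to-int signature are invented for the example, and there is no resizing.

```java
// Minimal separate-chaining table (array + singly linked lists); illustrative only.
class ChainedTable {
    static final class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets = new Node[8]; // fixed size, no resizing (assumption)

    private int indexOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Inserts or overwrites; colliding keys are chained in the same bucket.
    void put(String key, int value) {
        int i = indexOf(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;
                return;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]); // head insertion
    }

    // Returns the value, or null when the key is absent.
    Integer get(String key) {
        for (Node n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable();
        // "Aa" and "BB" have the same String.hashCode() (2112), so they collide.
        t.put("Aa", 1);
        t.put("BB", 2);
        System.out.println(t.get("Aa") + " " + t.get("BB")); // 1 2
    }
}
```

  Both colliding keys remain retrievable because the bucket holds a list that is searched with equals.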

II. How HashMap is implemented

 The backbone of a HashMap is an Entry array. Entry is the basic unit of a HashMap; each Entry holds one key-value pair.

    // The backbone of HashMap is an Entry array whose initial value is the empty array {}.
    // The length of the backbone array must be a power of 2; the reason is analyzed in detail later.
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

 Entry is a static nested class of HashMap. The code is as follows:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next; // reference to the next Entry in the bucket (singly linked list)
        int hash;        // result of applying hash() to the key's hashCode, cached here to avoid recomputation

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        // ... remaining methods omitted
    }

 The overall structure of a HashMap is therefore as follows.

  In short, a HashMap consists of an array plus linked lists. The array is the body of the HashMap, and the linked lists exist mainly to resolve hash collisions. If the target array position holds no list (the next field of the entry there is null), lookup and insertion are fast and need only a single addressing step. If the target position holds a list, insertion is O(n): the list is traversed first, and the existing entry is overwritten if the key is present, otherwise a new entry is added. Lookup also has to traverse the list and compare entries one by one using the key's equals method. For performance, therefore, the fewer linked lists a HashMap contains, the better.

Several other important fields

    // Number of key-value pairs actually stored
    transient int size;
    // Resize threshold. While table == {}, this field holds the initial capacity (16 by default);
    // once memory has been allocated for the table, it is normally capacity * loadFactor.
    // HashMap consults threshold when deciding whether to resize, as discussed in detail later.
    int threshold;
    // Load factor: how full the table may get before it is resized; the default is 0.75
    final float loadFactor;
    // Used for fail-fast behavior. HashMap is not thread-safe, so if the structure of the map
    // changes (for example via put or remove, possibly from another thread) while it is being
    // iterated, the iterator throws a ConcurrentModificationException.
    transient int modCount;
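
The fail-fast behavior that modCount supports can be observed from ordinary client code. The sketch below is only an illustration: the class FailFastDemo and its method name are made up for the example, and it runs against whatever java.util.HashMap the installed JDK provides, whose internals may differ from the JDK 7 source discussed here.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    // Returns true if a structural change during iteration triggers the fail-fast check.
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // structural modification: adds a new entry
            }
        } catch (ConcurrentModificationException e) {
            return true; // the iterator noticed that modCount changed
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating());
    }
}
```

Overwriting the value of an existing key is not a structural modification, so it would not trigger the exception.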

HashMap has four constructors. If the caller does not pass initialCapacity and loadFactor, the constructor uses the default values.

initialCapacity defaults to 16 and loadFactor defaults to 0.75.

Let's look at one of them:

    public HashMap(int initialCapacity, float loadFactor) {
        // Validate the requested initial capacity; it is capped at MAXIMUM_CAPACITY = 1 << 30 (2^30)
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init(); // init() is empty in HashMap itself; subclasses such as LinkedHashMap provide an implementation
    }
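
The four constructors and the argument validation can be exercised as in the following sketch. The class ConstructorDemo and its helper methods are hypothetical names chosen for this example; only the public java.util.HashMap constructors are used.

```java
import java.util.HashMap;
import java.util.Map;

class ConstructorDemo {
    // Returns true if HashMap rejects a negative initial capacity.
    static boolean rejectsNegativeCapacity() {
        try {
            new HashMap<String, Integer>(-1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Returns true if HashMap rejects a NaN load factor.
    static boolean rejectsNaNLoadFactor() {
        try {
            new HashMap<String, Integer>(16, Float.NaN);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();         // defaults: capacity 16, load factor 0.75
        Map<String, Integer> b = new HashMap<>(64);       // explicit initial capacity
        Map<String, Integer> c = new HashMap<>(64, 0.5f); // explicit capacity and load factor
        a.put("x", 1);
        Map<String, Integer> d = new HashMap<>(a);        // constructor that takes a Map
        System.out.println(a.size() + " " + b.size() + " " + c.size() + " " + d.get("x"));
        System.out.println(rejectsNegativeCapacity() + " " + rejectsNaNLoadFactor());
    }
}
```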

  As the code above shows, the regular constructors do not allocate memory for the table array (the constructor that takes a Map argument is the exception); the table array is only built when the first put is executed.

  Next, let's look at how put is implemented:

    public V put(K key, V value) {
        // If table is the empty array {}, inflate it (allocate its real storage). The argument is
        // threshold, which at this point still holds initialCapacity, by default 1 << 4 (2^4 = 16).
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // A null key is stored in table[0] or in the collision chain of table[0]
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);                 // mix the key's hashCode further for a more uniform spread
        int i = indexFor(hash, table.length); // actual position in the table
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            // If an entry for this key already exists, overwrite it:
            // replace the old value with the new one and return the old value
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++; // structural change: lets iterators fail fast when the map is modified during iteration
        addEntry(hash, key, value, i); // add a new entry
        return null;
    }
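
The return values described in the comments above (null for a new key, the old value on overwrite, and support for a null key) can be checked with a short sketch. PutDemo and run() are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Builds a small map and summarizes what put returned at each step.
    static String run() {
        Map<String, String> map = new HashMap<>();
        String first = map.put("k", "v1");   // no previous mapping: returns null
        String second = map.put("k", "v2");  // overwrite: returns the old value "v1"
        map.put(null, "for-null-key");       // a null key is allowed
        return first + " " + second + " " + map.get(null) + " " + map.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // null v1 for-null-key 2
    }
}
```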

 Let's look at the inflateTable method:

    private void inflateTable(int toSize) {
        int capacity = roundUpToPowerOf2(toSize); // capacity must be a power of 2
        // threshold is the smaller of capacity * loadFactor and MAXIMUM_CAPACITY + 1;
        // capacity * loadFactor cannot exceed MAXIMUM_CAPACITY unless loadFactor is greater than 1
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

  inflateTable allocates the storage for the backbone table array. roundUpToPowerOf2(toSize) ensures that capacity is the smallest power of 2 greater than or equal to toSize: for toSize = 13, capacity = 16; for toSize = 16, capacity = 16; for toSize = 17, capacity = 32.

 
 private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }
 

This step in roundUpToPowerOf2 guarantees that the array length is a power of 2. Integer.highestOneBit returns the value in which only the highest-order (leftmost) one bit of its argument is kept and all other bits are set to 0.
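
To make the rounding concrete, the expression can be copied into a standalone class and run on the sample sizes mentioned earlier. RoundUpDemo is a name made up for this sketch; the method body is the same expression as in the JDK 7 code quoted above.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the JDK 7 method quoted above.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 13, 16, 17}) {
            System.out.println(n + " -> " + roundUpToPowerOf2(n));
        }
        // 13: (13 - 1) << 1 = 24 = 0b11000,  highest one bit = 16
        // 16: (16 - 1) << 1 = 30 = 0b11110,  highest one bit = 16
        // 17: (17 - 1) << 1 = 32 = 0b100000, highest one bit = 32
    }
}
```

Subtracting 1 before the shift is what keeps an exact power of 2, such as 16, from being doubled.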

hash function

    // This function applies a series of XOR and unsigned-shift operations to the key's hashCode,
    // mixing its bits further so that the resulting storage positions are distributed as evenly as possible.
    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
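
The effect of the shift-and-XOR steps can be seen on two hash codes that differ only in their high bits. In this sketch, SpreadDemo and spread are hypothetical names, and hashSeed is assumed to be 0 so that only the mixing steps remain.

```java
class SpreadDemo {
    // The shift-and-XOR steps of JDK 7's hash(), with hashSeed assumed to be 0.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // identical low 16 bits, different high bits
        int mask = 16 - 1;            // length - 1 for a table of 16
        System.out.println((a & mask) + " " + (b & mask));                 // 0 0: same bucket
        System.out.println((spread(a) & mask) + " " + (spread(b) & mask)); // 1 2: different buckets
    }
}
```

Without mixing, both values would land in bucket 0 of a 16-slot table, because the index uses only the low 4 bits.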

The value computed by the hash function above is then processed by indexFor to obtain the actual storage position:

    /**
     * Returns the array index.
     */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

h & (length-1) guarantees that the resulting index lies within the bounds of the array. For example, with the default capacity of 16, length-1 = 15; for h = 18, the calculation in binary is:

        1  0  0  1  0
    &   0  1  1  1  1
    __________________
        0  0  0  1  0    = 2

  The resulting index is 2. Some implementations use a modulo operation here, which also keeps the index within the array bounds, but bitwise operations are cheaper for the computer (HashMap uses bitwise operations extensively).
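
  The worked example and the comparison with modulo can be checked directly. IndexForDemo is a name made up for this sketch; Math.floorMod is used for the comparison because Java's % operator can return negative results for negative hashes.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1); // length is assumed to be a power of two
    }

    public static void main(String[] args) {
        System.out.println(indexFor(18, 16));      // 2, as in the worked example
        System.out.println(indexFor(-7, 16));      // 9: stays in range even for a negative hash
        System.out.println(Math.floorMod(-7, 16)); // 9: the same non-negative remainder
    }
}
```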

Therefore, the final storage position is determined as follows: key -> hashCode() -> hash() -> indexFor() -> index in the table.

Now look at the implementation of addEntry:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // Resize when size has reached the threshold and a hash collision is about to occur in this bucket
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

  As the code above shows, the array is resized when size has reached the threshold and a hash collision occurs at the target bucket. Resizing creates a new array twice the length of the previous one and transfers all entries from the current Entry array to it, so it is a relatively expensive operation.
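
  As a small arithmetic illustration, the threshold for each capacity can be computed with the formula used in inflateTable. ThresholdDemo and its threshold method are names invented for this sketch, and 0.75 is the default load factor mentioned earlier.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same formula as in the inflateTable code quoted earlier.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        // Each doubling of the capacity doubles the threshold as well: 12, 24, 48, 96
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
    }
}
```

  With the defaults, the first resize can therefore happen once the map already holds 12 entries and a new key collides with an occupied bucket.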

III. Why must the HashMap array length be a power of 2?

Let's continue with the resize method mentioned above:

 void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

When the array is resized, its length changes, and because the storage position is index = h & (length-1), the index may change as well and must be recomputed. Let's look at the transfer method:

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        // Walk every bucket and every linked list, recompute each entry's index, and move the
        // old data into the new array (the array holds references, so only references are copied)
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                // Point the current entry's next at the bucket in the new array. newTable[i] may be
                // empty or may already hold a chain; either way the entry is inserted at the head.
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

  This method walks the linked lists of the old array entry by entry and moves each entry into the new, larger array. As described earlier, the array index is computed by applying the perturbing hash function to the key's hashCode and then ANDing the result with length-1.

  The array length of a HashMap is always a power of 2. For example, 16 is 10000 in binary, so length-1 is 15, or 01111. After resizing, the length is 32, or 100000 in binary, and length-1 is 31, or 011111. The low bits of length-1 are all 1 in both cases, and the only difference after resizing is one additional 1 on the left. So when computing h & (length-1), as long as the bit of h that corresponds to this extra leftmost bit is 0, the index in the new array equals the index in the old array, which greatly reduces how much of the previously well-distributed data has to change position. (This is the author's personal understanding.)
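
  This observation can be verified with a small experiment: when a power-of-2 table doubles, every hash either keeps its index or moves up by exactly the old capacity. ResizeIndexDemo and staysOrShifts are hypothetical names for this sketch, which assumes the hash value itself is unchanged by the resize (no rehash).

```java
class ResizeIndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True when every hash in [0, limit) either keeps its index or moves by oldCapacity.
    static boolean staysOrShifts(int oldCapacity, int limit) {
        int newCapacity = oldCapacity * 2;
        for (int h = 0; h < limit; h++) {
            int oldIndex = indexFor(h, oldCapacity);
            int newIndex = indexFor(h, newCapacity);
            boolean same = newIndex == oldIndex;                  // extra bit of h is 0
            boolean shifted = newIndex == oldIndex + oldCapacity; // extra bit of h is 1
            if (!same && !shifted) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(5, 16) + " -> " + indexFor(5, 32));   // 5 -> 5
        System.out.println(indexFor(21, 16) + " -> " + indexFor(21, 32)); // 5 -> 21
        System.out.println(staysOrShifts(16, 1 << 16));                   // true
    }
}
```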

  

 Furthermore, because the array length is a power of 2, the low bits of length-1 are all 1, which makes the resulting array indexes more uniform.

  In the & operation, the high bits of h do not affect the result (the hash function's many bit operations exist precisely to scatter the low bits more thoroughly); only the low bits matter. If the low bits of length-1 are all 1, every change in the low part of h changes the result. For example, with length-1 = 11111 in binary, the only low-bit pattern of h that yields index = 21 is 10101. This is why the array length is designed to be a power of 2.

  If the length is not a power of 2, the low bits of length-1 are not all 1. Then the low part of h that yields a given index, such as index = 21, is no longer unique, so the chance of a hash collision grows. In addition, any bit position where length-1 is 0 can never be 1 in the index, so the array slots that require that bit are never used and are wasted.
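
  The wasted slots can be counted directly. The sketch below compares a length of 16 with a hypothetical length of 15, which HashMap itself never uses; WastedSlotsDemo and reachableSlots are names invented for the example.

```java
import java.util.TreeSet;

class WastedSlotsDemo {
    // Counts how many distinct indexes h & (length - 1) can produce for hashes 0..9999.
    static int reachableSlots(int length) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 10_000; h++) {
            seen.add(h & (length - 1));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(reachableSlots(16)); // 16: mask 1111 reaches every slot
        System.out.println(reachableSlots(15)); // 8: mask 1110 never yields an odd index
    }
}
```

  With length 15, almost half of the array can never be reached, and all keys crowd into the remaining 8 slots.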

get method

    public V get(Object key) {
        // If the key is null, look it up directly in table[0]
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

The get method returns the value that corresponds to the given key. If the key is null, it is looked up directly in table[0]. Let's look at the getEntry method:

    final Entry<K,V> getEntry(Object key) {

        if (size == 0) {
            return null;
        }
        // Compute the hash value from the key's hashCode
        int hash = (key == null) ? 0 : hash(key);
        // indexFor (hash & (length - 1)) gives the final array index; then traverse the list
        // and use equals to find the matching entry
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

  As you can see, get is relatively simple: key (hashCode) -> hash -> indexFor -> final index. It finds the corresponding table[i], and if there is a linked list there, it traverses the list and uses the key's equals method to find the matching entry. Some readers argue that, once the array position has been located, the e.hash == hash check during traversal is unnecessary and equals alone would be enough. But consider a key class that overrides equals without overriding hashCode, and suppose such a key happens to land on the same array position. Judged by equals alone, the keys might be considered equal even though their hashCodes differ; under the hashCode contract of Object such an entry should not be returned, and null should be returned instead. The example in the next section illustrates this further. The int comparison is also cheaper than a call to equals, so it skips most non-matching entries quickly.

IV. When overriding equals, also override hashCode

  That concludes the analysis of the HashMap source code. Finally, let's discuss a common question. Many references state that "when overriding equals, also override hashCode". Let's use a small example to see what goes wrong if equals is overridden and hashCode is not:

    import java.util.HashMap;

    /**
     * Created by chengxiao on 2016/11/15.
     */
    public class MyTest {
        private static class Person {
            int idCard;
            String name;

            public Person(int idCard, String name) {
                this.idCard = idCard;
                this.name = name;
            }

            @Override
            public boolean equals(Object o) {
                if (this == o) {
                    return true;
                }
                if (o == null || getClass() != o.getClass()) {
                    return false;
                }
                Person person = (Person) o;
                // Two objects are considered equal when their idCard values match
                return this.idCard == person.idCard;
            }

        }

        public static void main(String[] args) {
            HashMap<Person, String> map = new HashMap<Person, String>();
            Person person = new Person(1234, "serrucho");
            // Put the entry into the HashMap
            map.put(person, "Dragon");
            // Get it back; logically this should print "Dragon"
            System.out.println("The results: " + map.get(new Person(1234, "Xiao Feng")));
        }
    }

Actual output:

The results: null

  With some understanding of how HashMap works, this result is not hard to explain. Although the keys used for put and get are logically equal (equals returns true), hashCode was not overridden. So on put the path is key (hashcode1) -> hash -> indexFor -> final index, and on get it is key (hashcode2) -> hash -> indexFor -> final index. Because hashcode1 does not equal hashcode2, the lookup does not land on the same array position and returns the logically wrong value null. (The two keys may happen to land on the same array position, but the lookup also checks whether the entry's hash value is equal, as mentioned above for the get method.)

  Therefore, when overriding equals, be sure to override hashCode as well, and make sure that two objects that are equal according to equals return the same integer from hashCode. Two objects that are not equal according to equals may still have the same hashCode (although such hash collisions should be avoided where possible).
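
  One possible fix for the example is sketched below. It is not the only option: the class name FixedPersonDemo and the lookup helper are invented for this sketch, and Integer.hashCode(idCard) is just one way to derive the hash from the same field that equals compares.

```java
import java.util.HashMap;
import java.util.Map;

class FixedPersonDemo {
    static final class Person {
        final int idCard;
        final String name;

        Person(int idCard, String name) {
            this.idCard = idCard;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return idCard == ((Person) o).idCard;
        }

        // Derived from the same field that equals compares, so equal objects hash alike.
        @Override
        public int hashCode() {
            return Integer.hashCode(idCard);
        }
    }

    static String lookup() {
        Map<Person, String> map = new HashMap<>();
        map.put(new Person(1234, "serrucho"), "Dragon");
        return map.get(new Person(1234, "Xiao Feng"));
    }

    public static void main(String[] args) {
        System.out.println("The results: " + lookup()); // The results: Dragon
    }
}
```

  Because both Person objects now return the same hashCode, get locates the same bucket and equals confirms the match.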

V. Summary

  This article described how HashMap is implemented and analyzed it further with the source code, including the reasons behind some design details, and closed with a brief explanation of why hashCode must be overridden whenever equals is overridden. I hope this article is helpful. Discussion and corrections are welcome; thank you for your support!


Origin www.cnblogs.com/ysd139856/p/12537015.html