Java collections: HashMap source code, implementation principle, and underlying structure


        The collection classes in the java.util package include some of the most commonly used classes in Java. The most commonly used collection types are List and Map. Implementations of List include ArrayList and Vector, which are resizable lists well suited to building, storing, and manipulating lists of objects of any type. List is suited to accessing elements by numeric index.

        Map provides a more general way of storing elements. The Map collection classes store pairs of elements (called "keys" and "values"), where each key maps to a value. Conceptually, you can think of a List as a Map with numeric keys. In fact, apart from both being defined in java.util, the two have no direct connection.

        The Map interface has many implementation classes, of which HashMap is one of the most important. This article focuses on HashMap.

        HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

        HashMap combines the advantages of ArrayList and LinkedList. Although HashMap is not faster than either List implementation for every operation, its basic operations (get and put) have stable performance.

 

        Let's start with the member variables to understand HashMap and the concepts above.

1. Member variables of HashMap:

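    /**
     * Default initial capacity (must be a power of 2)
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * Maximum capacity (1 << 30 = 1073741824). A larger requested capacity is
     * capped to this value; every capacity must be a power of 2 no greater than it.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * Default load factor
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The internal table, resized as necessary; its length is always a power of 2
     */
    transient Entry[] table;

    /**
     * Number of key-value mappings added to the map
     */
    transient int size;

    /**
     * Resize threshold (capacity * load factor); the table is resized when size reaches it
     */
    int threshold;

    /**
     * Load factor of the internal table
     */
    final float loadFactor;

    /**
     * Modification count: roughly the number of times this map instance has been
     * modified, including additions, removals, and so on
     */
    transient volatile int modCount;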

The code uses a shift operation (1 << 30). If you are not familiar with Java's bit-shift operators (>>, <<, and >>>), it is worth reviewing them first.
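Here is a minimal sketch of the three shift operators, with values you can check by hand. ShiftDemo and its helper methods are hypothetical names used only for illustration:

```java
public class ShiftDemo {
    // 1 << 30 shifts 1 left by 30 bits, i.e. 2 to the 30th power (MAXIMUM_CAPACITY)
    static int maximumCapacity() {
        return 1 << 30;
    }

    // capacity << 1 doubles a value, which is what "capacity <<= 1" does in the constructor
    static int doubled(int capacity) {
        return capacity << 1;
    }

    // >> is an arithmetic shift: the sign bit is copied in from the left
    static int arithmeticShiftRight(int value, int bits) {
        return value >> bits;
    }

    // >>> is a logical shift: zeros are shifted in from the left
    static int logicalShiftRight(int value, int bits) {
        return value >>> bits;
    }

    public static void main(String[] args) {
        System.out.println(maximumCapacity());            // 1073741824
        System.out.println(doubled(16));                  // 32
        System.out.println(arithmeticShiftRight(-16, 1)); // -8
        System.out.println(logicalShiftRight(-16, 28));   // 15
    }
}
```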

        Internally, HashMap is implemented as an array of Entry objects named table, where each Entry holds a key and its corresponding value. The default length of the table array is 16. We can also specify a different capacity at construction time, but the actual table length is always a power of 2.

        From studying ArrayList, we know that ArrayList is also backed by an array. When an added element would exceed the array's capacity, ArrayList "expands" its internal array so that the new element can be added.

        HashMap has a similar concept. Unlike ArrayList, however, HashMap does not wait until the array is full before expanding; it decides based on the load factor.

        For example, the default length of the table array in a HashMap instance is 16 and the load factor is 0.75. Once the number of entries exceeds 12 (16 * 0.75), the table is expanded (its length is doubled).

        Therefore, the capacity and load factor directly affect whether the table array is expanded and when to expand, which in turn affects the performance of the HashMap instance.
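The sketch below simulates how capacity grows as distinct keys are added, to show when expansion happens. It assumes the resize check from the JDK 6-era addEntry, `if (size++ >= threshold) resize(2 * table.length);`, which this article does not show. ResizeSimulation and capacityAfter are hypothetical names, and the simulation only counts insertions rather than building a real table:

```java
public class ResizeSimulation {
    // Returns the table capacity after `insertions` distinct keys have been added,
    // modeling the assumed JDK 6 rule: if (size++ >= threshold) resize(2 * table.length)
    static int capacityAfter(int insertions, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int n = 0; n < insertions; n++) {
            if (size++ >= threshold) {
                capacity *= 2;                             // resize doubles the table
                threshold = (int) (capacity * loadFactor); // recompute the threshold
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32
        System.out.println(capacityAfter(25, 16, 0.75f)); // 64
    }
}
```

With the defaults, the 13th insertion is the first that triggers a resize, and the next one happens at the 25th (threshold 24 for capacity 32).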

        At construction time, we can specify the initial capacity of a HashMap instance. Suppose the specified size is not a power of 2, as in:

Map map=new HashMap(131);  

What is the length of the table in the HashMap after the initialization is completed? The answer is: 256

        The reason becomes clear as soon as you look at the source code of the HashMap constructor:

public HashMap(int initialCapacity, float loadFactor) {  
    if (initialCapacity < 0)  
        throw new IllegalArgumentException("Illegal initial capacity: "  
                + initialCapacity);  
    if (initialCapacity > MAXIMUM_CAPACITY)  
        initialCapacity = MAXIMUM_CAPACITY;  
    if (loadFactor <= 0 || Float.isNaN(loadFactor))  
        throw new IllegalArgumentException("Illegal load factor: "  
                + loadFactor);  
  
    // Find a power of 2 >= initialCapacity  
    int capacity = 1;  
    while (capacity < initialCapacity)  
        capacity <<= 1;  
  
    this.loadFactor = loadFactor;  
    threshold = (int) (capacity * loadFactor);  
    table = new Entry[capacity];  
    init();  
} 

The key lies in these two lines:

while (capacity < initialCapacity)  
    capacity <<= 1; 

        As long as capacity (which starts at 1) is less than initialCapacity (the specified size), the loop keeps shifting capacity left by one bit, which is equivalent to capacity = capacity * 2, until capacity is greater than or equal to the specified size. If the specified size is exactly a power of 2, the two values end up equal and the loop stops; otherwise capacity ends up as the smallest power of 2 that is larger than the specified size.

        Therefore, since we specified 131 above and the smallest power of 2 greater than 131 is 256, the table is initialized with a length of 256.
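The rounding rule can be checked on its own. This sketch copies the constructor's loop into a hypothetical helper, roundUpToPowerOfTwo. It assumes the requested capacity has already been capped at MAXIMUM_CAPACITY, as the constructor does; without that cap the shift could overflow:

```java
public class CapacityRounding {
    // Same loop as the constructor: start at 1 and double until capacity >= initialCapacity.
    // Assumes initialCapacity <= 1 << 30 (the constructor caps it first).
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = {1, 16, 17, 131};
        for (int r : requested) {
            System.out.println(r + " -> " + roundUpToPowerOfTwo(r));
        }
        // prints 1 -> 1, 16 -> 16, 17 -> 32, 131 -> 256 (one per line)
    }
}
```

Later JDKs compute the same result with bit manipulation in a method named tableSizeFor instead of a loop.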

2. Entry element

        Similar to LinkedList, HashMap uses an inner class, Entry, to store the actual element information. Here is the source code of Entry (some code omitted):

    static class Entry<K, V> implements Map.Entry<K, V> {  
        final K key;  
        V value;  
        Entry<K, V> next;  
        final int hash;  
    }  

        Entry has four fields: key is the key, value is the value, next points to the next node in the chain, and hash is the hash value. By following next from node to node, HashMap can traverse the chain and find the value stored under a given key.
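Traversal through next can be sketched with a simplified stand-in for Entry. Node, find, and EntryChainDemo are hypothetical. The sketch stores key.hashCode() directly as the hash (the real class applies an extra hash function) and does not handle null keys:

```java
public class EntryChainDemo {
    // Simplified stand-in for HashMap.Entry: the same four fields, no Map.Entry methods
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        final int hash;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Follows next from the head until a node with the same hash and an equal key is found
    static <K, V> V find(Node<K, V> head, K key) {
        int hash = key.hashCode();
        for (Node<K, V> e = head; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                return e.value;
            }
        }
        return null; // reached the end of the chain without a match
    }

    public static void main(String[] args) {
        // Build the chain c -> b -> a by always linking the new node in front
        Node<String, Integer> chain = new Node<>("a".hashCode(), "a", 1, null);
        chain = new Node<>("b".hashCode(), "b", 2, chain);
        chain = new Node<>("c".hashCode(), "c", 3, chain);
        System.out.println(find(chain, "a")); // 1
        System.out.println(find(chain, "z")); // null
    }
}
```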

3. Setting elements in a HashMap

        Map associates the specified value with the specified key in the Map instance through the put method. If the instance already contains a mapping for this key, the old value is replaced.

        An example is as follows:

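    import java.util.HashMap;
    import java.util.Map;

    public class PutExample {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();
            map.put("user1", "小明");
            map.put("user2", "小强");
            map.put("user3", "小红");
            System.out.println("user1:" + map.get("user1"));
            System.out.println("user2:" + map.get("user2"));
            System.out.println("user3:" + map.get("user3"));
            // Putting an existing key again replaces its old value
            map.put("user2", "小龙");
            System.out.println("user2:" + map.get("user2"));
        }
    }
    // Output:
    // user1:小明
    // user2:小强
    // user3:小红
    // user2:小龙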

        First, a HashMap instance named map is created. In the version of the source shown in this article, the constructor immediately initializes the table array as an empty array of length DEFAULT_INITIAL_CAPACITY = 16 (later JDKs allocate the table lazily, on the first put).

        Then the put method is called to store a key-value pair. If the Map already contains a mapping for the specified key, the new value overwrites the original one.

        Like the add method of List implementations, HashMap's put method is where elements are stored, and it performs a series of checks and operations along the way. Once put is thoroughly understood, the internal structure and mechanism of HashMap become much clearer.

        When HashMap performs a put operation, it follows these steps:

        1) Check whether the key is null; if so, call the dedicated method for null keys.

        2) Calculate the hash value of the key.

        3) Use the hash value and the length of the table array to calculate the array index at which the element should be placed.

        4) Traverse the chain of Entry elements at that index. If an Entry with the same key as the specified key is found, replace its value and return the old value.

        5) If no such Entry is found, add a new element to the front of the chain at this index.

        [Figure: the put operation process, from the official website]

The following is the source code of the put method, with my annotations added to aid understanding:

    /**
     * Associates the specified value with the specified key
     */
    public V put(K key, V value) {
        // 1. First check whether the key is null
        if (key == null)
            // If it is null, call putForNullKey
            return putForNullKey(value);
        // 2. Compute the hash value of the key
        int hash = hash(key.hashCode());
        // 3. From the hash and the table length, compute the table index this key belongs at
        int i = indexFor(hash, table.length);
        // 4. Walk every Entry at this index; if the key already exists, overwrite that Entry's value
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        // 5. If the key does not exist, add a new Entry at this index as the head of its chain
        addEntry(hash, key, value, i);
        return null;
    }
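Here is a deliberately simplified, hypothetical map, MiniHashMap, that walks through the five put steps. It is a sketch under stated assumptions, not the JDK source: the table is fixed at 16 buckets and never resized, key.hashCode() is used directly as the hash, modCount is omitted, and null keys get hash 0 instead of going through a separate putForNullKey method:

```java
public class MiniHashMap<K, V> {
    // Simplified chain node with the same four fields as HashMap.Entry
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed 16 buckets: this sketch never resizes
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    // Bucket index from the hash; correct only because table.length is a power of 2
    private static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public V put(K key, V value) {
        // Steps 1-2: a null key gets hash 0 here (the JDK delegates to putForNullKey)
        int hash = (key == null) ? 0 : key.hashCode();
        // Step 3: choose the bucket
        int i = indexFor(hash, table.length);
        // Step 4: if the key is already in this chain, replace its value and return the old one
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        // Step 5: otherwise link a new Entry in as the head of the chain
        table[i] = new Entry<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        MiniHashMap<String, String> map = new MiniHashMap<>();
        System.out.println(map.put("user2", "小强")); // null: new key
        System.out.println(map.put("user2", "小龙")); // 小强: the old value is returned
        System.out.println(map.size());              // 1
    }
}
```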

4. Internal structure of HashMap

        Having walked through the put method, we now have a basic understanding of how HashMap works internally. Let's summarize the process of initializing a HashMap and adding elements (using the default values as an example):

        (1) Initialize the HashMap instance and its internal table array:

this.loadFactor = DEFAULT_LOAD_FACTOR;//0.75f  
threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);//16*0.75=12  
table = new Entry[DEFAULT_INITIAL_CAPACITY];//16 

At this point, the table is initialized and created with a length of 16.

        (2) On the first put, the HashMap instance contains no elements yet, so the traversal finds nothing and put calls the addEntry method directly:

Entry<K,V> e = table[bucketIndex];  
table[bucketIndex] = new Entry<K,V>(hash, key, value, e); 

        First, the existing Entry at the index (bucketIndex) is fetched. Since nothing has been stored in the table yet, e is null at this point.

        Then, create a new Entry instance whose next property points to e, and assign this instance to table[bucketIndex].

        (3) When updating the value of a key that already exists in the HashMap instance:

    for (Entry<K,V> e = table[i]; e != null; e = e.next) {  
        Object k;  
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {  
            V oldValue = e.value;  
            e.value = value;  
            e.recordAccess(this);  
            return oldValue;  
        }  
    }  

        If the key is already in the HashMap instance, put only needs to traverse the chain to find its Entry, update the value, and return, so updating an existing key never calls addEntry.

        (4) At this point, the internal structure of the HashMap instance is shown in the following figure:

        [Figure: internal structure of HashMap, a table array whose slots hold chains of Entry nodes]

        By storing elements this way, HashMap combines the advantages of ArrayList and LinkedList. A single operation is not necessarily faster than on either of them, but storage and retrieval performance stays stable and does not fluctuate severely.

5. Getting elements from a HashMap

        Once you understand how HashMap stores elements, retrieval is easy to follow: compute the array index from the specified key, traverse the Entry chain at that index, and return the matching value.

        The following is the source code of the get method, which is roughly the same as the basic flow of the put method:

    /**
     * Returns the value mapped to the specified key
     */
    public V get(Object key) {
        // 1. Check whether the key is null
        if (key == null)
            return getForNullKey();
        // 2. Compute the hash value of the key
        int hash = hash(key.hashCode());
        // 3. Traverse the Entry chain at the computed table index
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            // 4. If found, return that Entry's value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        // 5. If not found, return null
        return null;
    }
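The get source above returns null in two situations: when no Entry matches, and when the matching Entry's value is itself null (HashMap permits null values). A small sketch using the real java.util.HashMap shows how containsKey tells the two apart. The helper lookup is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

public class GetExample {
    // Distinguishes "key absent" from "key mapped to null", which get alone cannot do
    static String lookup(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return "value:" + value;
        }
        return map.containsKey(key) ? "mapped to null" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("user1", "小明");
        map.put("user4", null);        // a key whose value is null
        map.put(null, "nullKeyValue"); // null key: the source above routes it through getForNullKey
        System.out.println(lookup(map, "user1")); // value:小明
        System.out.println(lookup(map, "user4")); // mapped to null
        System.out.println(lookup(map, "user9")); // absent
        System.out.println(map.get(null));        // nullKeyValue
    }
}
```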

6. Removing elements from a HashMap

        HashMap implements the remove method of the Map interface, so the added elements can be removed through the remove method:

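    import java.util.HashMap;
    import java.util.Map;

    public class RemoveExample {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();
            map.put("user1", "小明");
            map.put("user2", "小强");
            map.put("user3", "小红");
            map.remove("user2");
            System.out.println("user1:" + map.get("user1"));
            System.out.println("user2:" + map.get("user2"));
            System.out.println("user3:" + map.get("user3"));
        }
    }
    // Output:
    // user1:小明
    // user2:null
    // user3:小红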

        When the remove method is actively called, the node element will be removed according to the specified key.

        Here is the source code of the remove method:

    /**
     * Removes the mapping for the specified key
     */
    public V remove(Object key) {
        Entry<K, V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }

    /**
     * Removes and returns the Entry for the specified key
     */
    final Entry<K, V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key.hashCode());
        int i = indexFor(hash, table.length);
        Entry<K, V> prev = table[i];
        Entry<K, V> e = prev;

        while (e != null) {
            Entry<K, V> next = e.next;
            Object k;
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    // Removing the head: the bucket now starts at next
                    table[i] = next;
                else
                    // Removing a later node: the previous node skips over it
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }

        The remove method delegates to removeEntryForKey, which loops through all Entry nodes at the computed index. If the key is found, it points the previous node's next at the node after the match (or, if the match is the head of the chain, stores that next node in table[i]), which removes the Entry from the chain.

        Note that HashMap's remove never shrinks the table, so removing elements does not trigger a costly resize.
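The unlinking that removeEntryForKey performs can be sketched on its own. ChainRemoval and its Node class are hypothetical and simplified: string keys, no hash field, and no size or modCount bookkeeping. Here remove returns the new head of the chain, which plays the role of table[i]:

```java
public class ChainRemoval {
    // Hypothetical minimal chain node (key and value only)
    static final class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Removes the first node whose key equals `key` and returns the (possibly new) head
    static Node remove(Node head, String key) {
        Node prev = head;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if (key.equals(e.key)) {
                if (prev == e) {
                    return next;  // removed the head: like table[i] = next
                }
                prev.next = next; // bypass e: like prev.next = next
                return head;
            }
            prev = e;
            e = next;
        }
        return head;              // key not found: chain unchanged
    }

    // Joins the keys of a chain, e.g. "c,b,a"
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("c", "3", new Node("b", "2", new Node("a", "1", null)));
        head = remove(head, "b");
        System.out.println(keys(head)); // c,a
        head = remove(head, "c");
        System.out.println(keys(head)); // a
    }
}
```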

7. Traversing a HashMap

        Sometimes the user of a Map knows which keys it contains and can retrieve every element directly with get(key), but code written that way is one-off and cannot be reused.

        A HashMap is usually traversed in one of the following ways:

        1) The entrySet() method returns a Set of all the entries in the HashMap instance; iterating over it yields every Entry element:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class EntrySetExample {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();
            map.put("user1", "小明");
            map.put("user2", "小强");
            map.put("user3", "小红");
            Iterator<Map.Entry<String, String>> iter = map.entrySet().iterator();
            while (iter.hasNext()) {
                Map.Entry<String, String> entry = iter.next();
                String key = entry.getKey();
                String value = entry.getValue();
                System.out.println("key:" + key + ";value:" + value);
                // Remove one entry and update another while iterating
                if (key.equals("user1")) {
                    iter.remove();
                } else if (key.equals("user2")) {
                    entry.setValue("小海");
                }
            }
            System.out.println(map.get("user1"));
            System.out.println(map.get("user2"));
            System.out.println(map.get("user3"));
        }
    }
    // Output (iteration order is not guaranteed and may differ between JDK versions):
    // key:user2;value:小强
    // key:user1;value:小明
    // key:user3;value:小红
    // null
    // 小海
    // 小红

        This approach is simple, needs little code, is efficient, and lets you manipulate entries directly, which makes it one of the most commonly used.
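On Java 8 or later (an assumption, since the source discussed in this article is older), the same remove-and-update pass can be written without an explicit Iterator, using Collection.removeIf on the entry set and Map.replaceAll. The class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetJava8 {
    static Map<String, String> process(Map<String, String> map) {
        // Remove the entry whose key is "user1"
        map.entrySet().removeIf(entry -> entry.getKey().equals("user1"));
        // Update the value for "user2" and keep every other value unchanged
        map.replaceAll((key, value) -> key.equals("user2") ? "小海" : value);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("user1", "小明");
        map.put("user2", "小强");
        map.put("user3", "小红");
        process(map);
        System.out.println(map.get("user1")); // null
        System.out.println(map.get("user2")); // 小海
        System.out.println(map.get("user3")); // 小红
    }
}
```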

        2) Map also provides the keySet method, which returns a Set of all the keys; you can iterate over that Set and obtain each corresponding value with get:

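    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class KeySetExample {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();
            map.put("user1", "小明");
            map.put("user2", "小强");
            map.put("user3", "小红");
            Iterator<String> iter = map.keySet().iterator();
            while (iter.hasNext()) {
                String key = iter.next();
                String value = map.get(key);
                System.out.println("key:" + key + ";value:" + value);
                // Remove an element through the key-set iterator
                if (key.equals("user1")) {
                    iter.remove();
                }
            }
            System.out.println(map.get("user1"));
            System.out.println(map.get("user2"));
            System.out.println(map.get("user3"));
        }
    }
    // Output (iteration order is not guaranteed and may differ between JDK versions):
    // key:user2;value:小强
    // key:user1;value:小明
    // key:user3;value:小红
    // null
    // 小强
    // 小红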

        This approach first iterates over the keys and then fetches each value with get, which computes the hash and searches the bucket again for every key. If you only need to operate on a few individual entries, the cost is acceptable, but for retrieving or modifying entries at scale it is less efficient than the first approach. Choose between the two according to the situation; neither is always better.
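The difference is easiest to see side by side. In this sketch (hypothetical class and method names), both methods visit every mapping, but the keySet version calls get once per key. That repeats the hash computation and bucket search, while the entrySet version reads each value straight from its Entry:

```java
import java.util.HashMap;
import java.util.Map;

public class KeySetVsEntrySet {
    // One pass over the entries: key and value come from the same Entry
    static int sumViaEntrySet(Map<String, Integer> map) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    // One pass over the keys plus a get(key) lookup for each key
    static int sumViaKeySet(Map<String, Integer> map) {
        int total = 0;
        for (String key : map.keySet()) {
            total += map.get(key);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("user1", 90);
        scores.put("user2", 85);
        scores.put("user3", 77);
        System.out.println(sumViaEntrySet(scores)); // 252
        System.out.println(sumViaKeySet(scores));   // 252
    }
}
```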

        3) Return all values directly through the values method:

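    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class ValuesExample {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();
            map.put("user1", "小明");
            map.put("user2", "小强");
            map.put("user3", "小红");
            // Convert the values to an array
            String[] names = map.values().toArray(new String[map.size()]);
            for (String name : names) {
                System.out.println(name);
            }
            // Iterate over the values collection
            Collection<String> nameArray = map.values();
            Iterator<String> iter = nameArray.iterator();
            while (iter.hasNext()) {
                String name = iter.next();
                System.out.println(name);
            }
        }
    }
    // Output (each loop prints all three values; the order is not guaranteed):
    // 小强
    // 小明
    // 小红
    // 小强
    // 小明
    // 小红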

       This approach is simple and clear and suits cases where you just need all the values, for example to display them directly. You can iterate over them or convert them to an array.
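One detail worth knowing is that the collection returned by values() is a view backed by the map, not a copy. Removing a value from it therefore removes the corresponding mapping. A small sketch (ValuesView and snapshot are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ValuesView {
    // Copies the current values into an independent list (in the map's iteration order)
    static List<String> snapshot(Map<String, String> map) {
        return new ArrayList<>(map.values());
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("user1", "小明");
        map.put("user2", "小强");
        map.put("user3", "小红");
        Collection<String> values = map.values();
        // values() is backed by the map: removing a value removes its whole mapping
        values.remove("小强");
        System.out.println(map.containsKey("user2")); // false
        System.out.println(snapshot(map).size());     // 2
    }
}
```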

        At this point, the internal implementation principle and underlying structure of HashMap should be fairly clear.

Summary:

1. Implementation principle

  • HashMap is based on the principle of hashing. We use put(key, value) to store objects in HashMap, and use get(key) to get objects from HashMap.
  • When we pass the key and value to the put(key, value) method, it first calls the key.hashCode() method, and the returned hashCode value is used to find the bucket location to store the Entry object.
  • Map provides some commonly used methods, such as keySet() and entrySet().
    keySet() returns a Set of the keys in the Map; entrySet() also returns a Set, whose elements are of type Map.Entry.
  • "If two keys have the same hashcode, how is the value object retrieved?" Answer: When get(key) is called, HashMap uses the key's hashcode to find the bucket location and then looks for the value object in that bucket.
  • "What if two value objects are stored in the same bucket?" Answer: The linked list in that bucket is traversed until the right value object is found.
  • "Since there is no value object to compare against, how do you know you have found the right one?" Answer: After locating the bucket, HashMap calls the key's equals() method on the keys in the linked list to find the correct node, and finally returns the value object you are looking for.
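The Q&A above can be checked with two strings that are known to collide. "Aa" and "BB" both have String.hashCode() 2112, so they always land in the same bucket, and only equals() distinguishes them. CollisionDemo and buildColliding are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // "Aa" and "BB" are different strings with the same hashCode (2112),
    // so they share a bucket and equals() is what tells them apart
    static Map<String, String> buildColliding() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        Map<String, String> map = buildColliding();
        System.out.println(map.get("Aa")); // first
        System.out.println(map.get("BB")); // second
        System.out.println(map.size());    // 2
    }
}
```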

A complete answer:

  • HashMap is based on the hashing principle; we store and retrieve objects with the put(key, value) and get(key) methods.
  • When storing an object, we pass the key-value pair to put(key, value), which calls the key's hashCode() method to compute the hashcode and then finds the bucket location in which to store the value object.
  • When retrieving an object, HashMap locates the bucket the same way, uses the key's equals() method to find the correct key-value pair, and then returns the value object.
  • HashMap uses linked lists to resolve collisions: when a collision occurs, the new object is stored as another node in that bucket's linked list.
  • Each linked-list node stores a key-value pair (the key object and the value object).
  • What happens when two different key objects have the same hashcode? They are stored in the linked list at the same bucket location, and the key's equals() method is used to find the right key-value pair.

Because of HashMap's many benefits, I have used it as a cache in my applications. Java is widely used in the financial sector, and for performance reasons we often use HashMap and ConcurrentHashMap.

2. The underlying data structure

The bottom layer of HashMap is implemented mainly with an array and linked lists. Its lookups are fast mainly because it determines each element's storage location by computing a hash code.

  • In HashMap, the hash value is computed mainly from the key's hashCode. Keys with the same hashCode always produce the same hash value.
  • When many objects are stored, different objects may produce the same hash value, which leads to a so-called hash collision.
  • Anyone who has studied data structures knows there are many ways to resolve hash collisions. HashMap resolves them with linked lists.
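The put and get source shown earlier call indexFor, whose body this article does not show; in the JDK 6/7 source it is `h & (length-1)`. Taking that as an assumption, this sketch shows how different hash values end up in the same bucket of a 16-slot table. BucketIndex is a hypothetical class name:

```java
public class BucketIndex {
    // Keeps only the low bits of the hash; valid only when length is a power of 2
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int[] hashes = {5, 21, 37, 100};
        for (int h : hashes) {
            // For non-negative h this matches h % length
            System.out.println(h + " -> bucket " + indexFor(h, length)
                    + " (h % 16 = " + (h % length) + ")");
        }
        // 5, 21 and 37 all map to bucket 5 (a collision); 100 maps to bucket 4
    }
}
```

Because the mask never produces a negative index, even a negative hash such as -1 lands in a valid bucket (15), whereas `-1 % 16` would be -1.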

Supplementary knowledge:

  • HashMap is an implementation of the Map interface based on a hash table.
  • This implementation provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)
  • This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
  • Note that HashMap is not thread-safe. If you need a thread-safe map, you can wrap a HashMap with the static method synchronizedMap of the Collections class:
    Map map = Collections.synchronizedMap(new HashMap());
  • HashMap combines the advantages of the two implementations of ArrayList and LinkedList. Although HashMap does not have higher performance in certain operations than the two implementations of List, it has stable performance in basic operations (get and put).
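Here is a sketch of the synchronizedMap approach. It assumes Java 8+ for the lambdas, and the class and method names are hypothetical. Each individual call on the wrapper is synchronized, but the Collections documentation requires callers to synchronize on the wrapper manually while iterating over it:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapDemo {
    // Two threads each put `perThread` distinct keys into one synchronized wrapper;
    // returns how many keys the map ends up holding
    static int concurrentPuts(int perThread) {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<String, Integer>());
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                map.put("a" + i, i);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                map.put("b" + i, i);
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Single calls are synchronized, but iteration must hold the wrapper's lock
        int count = 0;
        synchronized (map) {
            for (String key : map.keySet()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(concurrentPuts(10_000)); // 20000
    }
}
```

For heavily concurrent code, ConcurrentHashMap is usually preferred, because the wrapper funnels every call through a single lock.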
