Understanding Java Collections from Shallow to Deep (5): Map Collections

HashMap

The previous articles introduced the implementation classes under the Collection interface. Today we introduce two important implementation classes under the Map interface: HashMap and TreeMap.

HashMap is a hash table that stores key-value mappings.
Since we are introducing HashMap, we will also introduce Hashtable and compare the two. Both HashMap and Hashtable are classic implementation classes of the Map interface, and the relationship between them closely parallels the relationship between ArrayList and Vector introduced earlier. Hashtable is a legacy Map implementation (you can tell from its name that the "t" is not capitalized; that is not a typo), so its methods are cumbersome and do not follow the conventions of the Map interface. But Hashtable also has advantages that HashMap lacks. Below we compare the two.

The difference between HashMap and Hashtable

1. Hashtable is a thread-safe Map implementation, while HashMap is not, so HashMap performs better than Hashtable; but if multiple threads access the same Map object, it is better to use the Hashtable implementation class.

2. Hashtable does not allow null as a key or value; attempting to put a null key or value into a Hashtable throws a NullPointerException. HashMap, by contrast, allows null as both key and value.
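The null-handling difference can be sketched as follows (class name is illustrative, not from the article):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "null key is allowed");  // OK in HashMap
        hashMap.put("k", null);                    // null value is allowed too
        System.out.println(hashMap.get(null));     // prints "null key is allowed"

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "value");          // Hashtable rejects null keys
        } catch (NullPointerException e) {
            System.out.println("Hashtable threw NullPointerException");
        }
    }
}
```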

HashMap's criteria for key and value equality
In the previous articles we analyzed the element-equality criteria of the other collections. HashMap is no exception; the difference is that it has two kinds of elements, keys and values, each with its own equality criterion.

The criterion for key equality

Similar to HashSet, HashMap and Hashtable consider two keys equal when the two keys return true when compared via the equals() method and their hashCode() values are also equal.

Note: A class used as a key must implement the hashCode() and equals() methods, and the two should be consistent: if equals() returns true, the hashCode() values should be equal. Refer to the earlier introduction of Set on this point.

The criterion for value equality

HashMap and Hashtable consider two values equal as long as the two objects return true when compared via the equals() method.

Note: The keys of a HashMap form a collection whose elements cannot repeat, while the values form a collection whose elements can repeat.

The following program demonstrates the criteria for HashMap to judge that key and value are equal.

import java.util.HashMap;

public class A {

    public int count;

    public A(int count) {
        this.count = count;
    }

    // Compute the hashCode from the count value
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + count;
        return result;
    }

    // Two A objects are equal when their count values are equal
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        A other = (A) obj;
        return count == other.count;
    }
}

public class B {

    public int count;

    public B(int count) {
        this.count = count;
    }

    // Two B objects are equal when their count values are equal
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        B other = (B) obj;
        return count == other.count;
    }
}

public class HashMapTest {

    public static void main(String[] args) {
        HashMap map = new HashMap();
        map.put(new A(1000), "Set collection");
        map.put(new A(2000), "List collection");
        map.put(new A(3000), new B(1000));

        // equals() returning true is enough for the values to be considered equal
        boolean isContainValue = map.containsValue(new B(1000));
        System.out.println(isContainValue);

        // Different objects, but both equals() and hashCode() results match
        boolean isContainKey = map.containsKey(new A(1000));
        System.out.println(isContainKey);

        // equals() and hashCode() results do not satisfy the key-equality condition
        System.out.println(map.containsKey(new A(4000)));
    }
}

Output:

true
true
false

Note: If a key added to a HashMap is a mutable object, modifying its member variables after it has been added to the collection may change its hashCode() value and equals() comparison results, making the key impossible to look up. Normally, do not modify keys.
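A minimal sketch of this pitfall (the Key class here is my own illustration, not from the article):

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    // Illustrative mutable key class: hashCode() and equals() both
    // depend on the mutable count field
    static class Key {
        int count;
        Key(int count) { this.count = count; }
        @Override public int hashCode() { return 31 + count; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).count == count;
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "value");

        key.count = 2; // mutating the key changes its hashCode()

        // The entry was stored in the bucket for the old hash, so neither
        // the key's old state nor its new state can find it
        System.out.println(map.containsKey(new Key(1))); // false
        System.out.println(map.containsKey(key));        // false
        System.out.println(map.size()); // 1: the entry still exists, but is unreachable by key
    }
}
```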

The essence of HashMap
Let's understand HashMap from the perspective of source code.

HashMap constructor

// Default constructor
HashMap()

// Constructor specifying the initial capacity
HashMap(int capacity)

// Constructor specifying the initial capacity and the load factor
HashMap(int capacity, float loadFactor)

// Constructor containing a "sub-Map"
HashMap(Map<? extends K, ? extends V> map)

From the constructors we can see two important parameters: capacity and loadFactor.
Capacity is the capacity of the hash table; the initial capacity is the capacity when the table is created (the default is DEFAULT_INITIAL_CAPACITY = 1 << 4, i.e. 16).
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is resized (that is, its internal data structure is rebuilt) so that it has approximately twice the number of buckets.
Usually the default load factor is 0.75 (DEFAULT_LOAD_FACTOR = 0.75f), which is a compromise between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost (reflected in most HashMap operations, including get and put). When setting the initial capacity, take the expected number of entries and the load factor into account, so as to minimize the number of resize operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
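The rule of thumb above can be sketched as follows (a pre-sizing sketch; the class name and entry count are my own):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;

        // Pre-size so that expectedEntries stays below capacity * loadFactor,
        // avoiding intermediate resize (rehash) operations during population
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);
        Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size()); // 1000
    }
}
```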

Node type
HashMap is a hash table implemented with the "zipper" (separate chaining) method. It includes several important member variables: table, size, threshold, and loadFactor.

table is an array of type Node[], and a Node is in fact a singly-linked-list node. All the key-value pairs of the hash table are stored in this Node array.

size is the size of the HashMap, which is the number of key-value pairs saved by the HashMap.

threshold is the resize threshold of the HashMap, used to decide whether its capacity needs to be adjusted. threshold = capacity * load factor; when the amount of data stored in the HashMap reaches the threshold, the capacity is doubled.

loadFactor is the load factor.

To understand HashMap, you must first understand the Node-based "zipper" (chaining) method.

Java has two bottom-level data storage structures: the array and the linked list. An array occupies contiguous space, so addressing is fast, but deleting or adding elements requires shifting large amounts of data; queries are fast, insertions and deletions slow. A linked list is the opposite: its space is not contiguous, so addressing is harder, but adding or deleting an element only requires adjusting pointers; queries are slow, insertions and deletions fast. Is there a structure that combines arrays and linked lists and plays to the strengths of both? Yes: the hash table. A hash table offers fast (constant-time) queries and relatively fast insertion and deletion, so it is well suited to environments with massive amounts of data. A hash table is usually implemented with the chaining method, which we can picture as "an array of linked lists", as shown in the figure below:


From the figure we can see that the hash table is composed of an array plus linked lists. In an array of length 16, each element stores the head node of a linked list. By what rule are elements placed into the array?
In general, the slot is obtained from hash(key), the hash of the element's key. Entries whose hash(key) values map to the same slot are all stored in that slot's linked list. Internally this is implemented with a Node array.

So each array element represents a linked list, and what the nodes of one list have in common is that their hash(key) maps to the same slot.
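As a hedged sketch of the slot computation (the method and class names are illustrative, not from the JDK; OpenJDK's HashMap additionally spreads the high bits of the hash before masking): when the table length is a power of two, the bucket index can be computed with a bitmask.

```java
public class BucketIndexDemo {
    // For a power-of-two table length, hash & (length - 1) is
    // equivalent to hash % length, but cheaper
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16;
        // Keys whose hashes share the same low bits land in the same
        // bucket and are chained into one linked list
        System.out.println(bucketIndex(1000, tableLength)); // 8
        System.out.println(bucketIndex(1016, tableLength)); // 8 (same bucket)
        System.out.println(bucketIndex(1001, tableLength)); // 9 (different bucket)
    }
}
```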

Let's take a look at the basic element Node of the linked list.

static class Node<K,V> implements Map.Entry<K,V> {

    final int hash;
    final K key;
    V value;
    // Reference to the next node in the chain
    Node<K,V> next;

    // Constructor: takes the hash value, key, value, and next node
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Two Nodes are equal if both their keys and their values are equal;
    // otherwise they are not
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

On top of this structure, the collection's add, delete, update, and query operations are implemented. Due to the limited space of this article, their source code is not covered here.

HashMap traversal method
1. Traverse the key-value pairs of HashMap

Step 1: Obtain the Set of the HashMap's key-value pairs via entrySet().
Step 2: Traverse the Set obtained in step 1 with an Iterator.

2. Traverse the keys of HashMap

Step 1: Obtain the Set of the HashMap's keys via keySet().
Step 2: Traverse the Set obtained in step 1 with an Iterator.

3. Traverse the values of HashMap

Step 1: Obtain the collection of the HashMap's values via values().
Step 2: Traverse the collection obtained in step 1 with an Iterator.
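The three traversal approaches can be sketched as follows (class name and sample entries are my own):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class TraverseDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("Set", 1);
        map.put("List", 2);
        map.put("Map", 3);

        // 1. Traverse key-value pairs via entrySet()
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }

        // 2. Traverse keys via keySet()
        for (String key : map.keySet()) {
            System.out.println(key);
        }

        // 3. Traverse values via values()
        for (Integer value : map.values()) {
            System.out.println(value);
        }
    }
}
```

Note that a HashMap makes no guarantee about the order in which the entries are visited.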

The LinkedHashMap implementation class
Just as HashSet has the LinkedHashSet subclass, HashMap has the LinkedHashMap subclass; LinkedHashMap uses a doubly linked list to maintain the order of its key-value pairs.
LinkedHashMap needs to maintain the insertion order of its elements, so its performance is slightly lower than HashMap's; but because it maintains the internal order with a linked list, it performs better when iterating over all the elements of the Map. When the elements of a LinkedHashMap are iterated, they are output in the order in which the key-value pairs were added.
Essentially, LinkedHashMap = hash table + circular doubly linked list.
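The insertion-order guarantee can be demonstrated with a short sketch (class name and keys are my own):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("third", 3);
        linked.put("first", 1);
        linked.put("second", 2);
        // Iteration follows insertion order, not key order or hash order
        System.out.println(linked.keySet()); // [third, first, second]
    }
}
```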

TreeMap
TreeMap is an implementation class of the SortedMap interface. A TreeMap is an ordered key-value collection implemented as a red-black tree; each key-value pair is a node of the red-black tree.

TreeMap sorting method
TreeMap has two sorting methods, the same as TreeSet.

Natural ordering: all keys of the TreeMap must implement the Comparable interface, and all keys should be objects of the same class; otherwise a ClassCastException is thrown.

Custom ordering: when creating the TreeMap, pass in a Comparator object, which is responsible for sorting all the keys in the TreeMap.
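The two ordering modes can be sketched as follows (class name and sample keys are my own):

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapSortDemo {
    public static void main(String[] args) {
        // Natural ordering: String implements Comparable
        TreeMap<String, Integer> natural = new TreeMap<>();
        natural.put("banana", 2);
        natural.put("apple", 1);
        natural.put("cherry", 3);
        System.out.println(natural.firstKey()); // apple

        // Custom ordering: a Comparator passed at construction
        TreeMap<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
        reversed.putAll(natural);
        System.out.println(reversed.firstKey()); // cherry
    }
}
```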

TreeMap's criteria for key and value equality
These are similar to TreeSet's element-equality criterion. TreeMap considers two keys equal when they return 0 when compared via the compareTo() method.

TreeMap considers two values equal when the two values return true when compared via the equals() method.

Note: If you use a custom class as the key of a TreeMap and want the TreeMap to work well, the class's equals() and compareTo() methods should return consistent results: when two keys compare as true via equals(), they should return 0 via compareTo(). If the two methods are inconsistent, the TreeMap will violate the rules of the Map interface.

In addition, similar to TreeSet, TreeMap also adds some new methods according to the sorting characteristics, which are consistent with those in TreeSet. You can refer to the previous article.
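As a sketch of some of those sorting-related methods (class name and sample entries are my own), TreeMap exposes navigation operations that HashMap does not:

```java
import java.util.TreeMap;

public class TreeMapNavDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(10, "ten");
        map.put(20, "twenty");
        map.put(30, "thirty");

        System.out.println(map.firstKey());   // 10 (smallest key)
        System.out.println(map.lastKey());    // 30 (largest key)
        System.out.println(map.headMap(20));  // {10=ten} (keys < 20)
        System.out.println(map.tailMap(20));  // {20=twenty, 30=thirty} (keys >= 20)
        System.out.println(map.floorKey(25)); // 20 (greatest key <= 25)
        System.out.println(map.higherKey(20));// 30 (least key > 20)
    }
}
```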

The essence of TreeMap
Red-black tree
RB Tree, short for Red-Black Tree, is a special binary search tree. Each node of a red-black tree has a storage bit indicating the node's color, which can be red or black.

The characteristics of red-black tree:
(1) Each node is either black or red.
(2) The root node is black.
(3) Each leaf node (NIL) is black. [Note: The leaf node here refers to the leaf node that is empty (NIL or NULL)! ]
(4) If a node is red, its child nodes must be black.
(5) All paths from a node to its descendant leaf nodes contain the same number of black nodes.

Note:
(01) The leaf nodes in property (3) are only the empty (NIL or null) nodes.
(02) Property (5) ensures that no path is more than twice as long as any other path; therefore, a red-black tree is relatively close to balanced.

The time complexity of the red-black tree is: O(log n)
For more on the insert, delete, update, and lookup operations of red-black trees, you can refer to this article.
It can be said that TreeMap's insert, delete, update, and lookup operations are all performed on top of a red-black tree.

TreeMap traversal method
Traverse the key-value pairs of TreeMap

Step 1: Obtain the Set of the TreeMap's key-value pairs via entrySet().
Step 2: Traverse the Set obtained in step 1 with an Iterator.

Traverse the keys of TreeMap

Step 1: Obtain the Set of the TreeMap's keys via keySet().
Step 2: Traverse the Set obtained in step 1 with an Iterator.

Traverse the values of TreeMap

Step 1: Obtain the collection of the TreeMap's values via values().
Step 2: Traverse the collection obtained in step 1 with an Iterator.

Performance Analysis and Applicable Scenarios of Map Implementation Class
The implementation mechanisms of HashMap and Hashtable are almost identical, but HashMap performs better than Hashtable.
LinkedHashMap is a bit slower than HashMap because it needs to maintain a doubly linked list.
TreeMap is slower than HashMap and Hashtable (especially for inserting and deleting key-value pairs), because TreeMap manages its key-value pairs with a red-black tree underneath.
Applicable scenarios:
For general application scenarios, prefer HashMap, because it is designed for fast lookup.
If a specific sorted order is required, consider TreeMap.
If only insertion order is needed, consider LinkedHashMap.


Origin blog.csdn.net/weixin_45817985/article/details/130685791