Java Collection Framework: HashMap

Original address: http://www.importnew.com/18604.html

Java Collection Framework Overview

The Java Collections Framework comes up constantly in work, study, and interviews, so it needs little introduction. I recently read through its source code and gathered related material in order to write this series on the framework. Partly this is a summary for my own future reference; partly I hope readers will point out any shortcomings, which I will correct promptly.
[Figure: class diagram of the Java Collections Framework]
I took this diagram from the Internet; it illustrates the framework well, so it is included here for reference.
In the class diagram, the boxes with solid borders are implementation classes, such as ArrayList, LinkedList, and HashMap; the boxes with dashed borders are abstract classes, such as AbstractCollection, AbstractList, and AbstractMap; and the boxes with dotted borders are interfaces, such as Collection, Iterator, and List.
One common feature is that all of these collections can hand out an Iterator, an interface for traversing the elements of a collection. (Collection classes implement Iterable and return an Iterator from iterator(); HashMap provides iterators through its keySet(), values(), and entrySet() views.) Iterator has three main methods: hasNext(), next(), and remove(). One of its sub-interfaces, ListIterator, adds more methods, including add(), previous(), and hasPrevious(). With a plain Iterator, elements can only be traversed forward, and an element that has been passed cannot be visited again. Unordered collections such as HashSet and HashMap typically offer only this interface. Collections with ordered elements, such as ArrayList, generally also offer a ListIterator, which can traverse in both directions: next() returns the next element and previous() returns the previous one.
This article covers HashMap in detail; later articles in the series will follow in due course.
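To make the difference between the two iterator interfaces concrete, here is a minimal sketch. The class name IteratorDemo and its helper methods are illustrative only and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;
import java.util.Set;

class IteratorDemo {
    // Walks a list forward to the end and then backward, using ListIterator.
    static List<String> forwardThenBackward(List<String> list) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            visited.add(it.next());
        }
        while (it.hasPrevious()) {
            visited.add(it.previous());
        }
        return visited;
    }

    // Counts elements with a plain Iterator, which can only move forward.
    static int countForward(Iterable<String> source) {
        int count = 0;
        Iterator<String> it = source.iterator();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(forwardThenBackward(list)); // [a, b, c, c, b, a]
        Set<String> set = new HashSet<>(list);
        System.out.println(countForward(set));         // 3
    }
}
```

A HashSet only exposes iterator(), so the backward walk is not available for it; a List exposes both iterator() and listIterator().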

HashMap definition

Unless otherwise stated, this article describes the JDK 7 implementation.

package java.util;
import java.io.*;
public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable{
}
HashMap overview

Working principle: objects are stored and retrieved through put and get, using a hash of the key. When we pass a key/value pair to put, it calls the key's hashCode() and computes a hash from it to find the bucket location, then stores the entry there. HashMap automatically adjusts its capacity according to how full it is: once the number of entries exceeds capacity * load factor, it resizes to twice the original capacity. When we pass a key to get, it again calls hashCode() to compute the hash and find the bucket location, then calls equals() to pick out the matching key-value pair. If a collision occurs, HashMap chains the colliding entries together in a linked list. In Java 8, if the number of colliding entries in one bucket exceeds a limit (8 by default), the linked list is replaced with a red-black tree to speed up lookups.
The hash bucket array is implemented as an Entry[] array; the array index is obtained by taking the key's hash value modulo the size of the bucket array.

static final Entry<?,?>[] EMPTY_TABLE = {};
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
HashMap is an implementation of the Map interface. It allows null keys and values, is not synchronized, and does not guarantee any order (such as insertion order), nor that the order stays the same over time. HashMap stores Entry(hash, key, value, next) objects.
An entry with key == null is stored in table[0], the first bucket, with a hash value of 0. HashMap handles key-value pairs with key == null separately, through these two methods:

private V getForNullKey();
private V putForNullKey(V value);
There are two very important parameters in HashMap, Capacity and Load factor.
The default value of Capacity is 16:
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
The default value of the load factor is 0.75:
static final float DEFAULT_LOAD_FACTOR = 0.75f;
Simply put, Capacity is the number of buckets, and Load factor is the maximum ratio of entries to buckets before the table is resized. If iteration performance matters a lot, do not set the Capacity too high or the load factor too low, because iteration time grows with the number of buckets as well as the number of entries. When the number of entries exceeds capacity * load factor, the bucket array is resized to twice its current size.
You can set the initial capacity, but HashMap rounds it up to a power of 2. For example, an initial value of 17 is not a power of 2, so the capacity becomes 32; an initial value of 15 becomes 16. The implementation is as follows:
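As a quick illustration of the capacity * load factor rule, the sketch below computes the resize threshold for a few capacities. The class ThresholdDemo is hypothetical; the formula follows the description in the text, capped in the way JDK 7 caps it.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Number of entries at which a resize is considered: capacity * load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```

With the defaults (16 buckets, load factor 0.75), the table is therefore a candidate for resizing once it holds 12 entries.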

private static int roundUpToPowerOf2(int number) {
       // assert number >= 0 : "number must be non-negative";
       return number >= MAXIMUM_CAPACITY
               ? MAXIMUM_CAPACITY
               : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
   }
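To see the rounding in action, the method can be copied into a small standalone class. RoundUpDemo is an illustrative name; the expression itself is the one quoted above.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the JDK 7 method quoted above, copied so it can be called directly.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        // Prints 1 -> 1, 15 -> 16, 16 -> 16, 17 -> 32, 100 -> 128
        for (int requested : new int[] {1, 15, 16, 17, 100}) {
            System.out.println(requested + " -> " + roundUpToPowerOf2(requested));
        }
    }
}
```

Subtracting 1 before shifting is what keeps an exact power of 2 such as 16 from being doubled to 32.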
HashMap has a member variable, modCount, which is used to implement the fail-fast mechanism. Fail-fast means that if the structure of the map is modified during iteration by anything other than the iterator itself (for example, by another thread), the iterator detects the change and throws a ConcurrentModificationException immediately, rather than letting the iteration run to completion on inconsistent data.
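Fail-fast behavior does not require a second thread to observe. The sketch below triggers it from a single thread by changing the map inside a loop, and contrasts that with removal through the iterator. FailFastDemo and its methods are illustrative names.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a new key through the map while iterating; returns true if the iterator noticed.
    static boolean modifyDuringIteration() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("added-while-iterating-" + key, 0); // structural change, bumps modCount
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes odd values through the iterator itself, which is allowed; returns the remaining size.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringIteration()); // true
        System.out.println(removeViaIterator());     // 1
    }
}
```

The check is best-effort: it is meant for detecting bugs, not as a substitute for synchronization.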

Traversal of HashMap

The following example shows how to traverse a Map. For details, see "How to Traverse Map Objects in Java".


Map<String,Integer> map = new HashMap<>();
        map.put("s1", 1);
        map.put("s2", 2);
        map.put("s3", 3);
        map.put("s4", 4);
        map.put("s5", 5);
        map.put(null, 9);
        map.put("s6", 6);
        map.put("s7", 7);
        map.put("s8", 8);
        // Iterate over the entry set; the order is not the insertion order.
        for(Map.Entry<String,Integer> entry:map.entrySet())
        {
            System.out.println(entry.getKey()+":"+entry.getValue());
        }
Output:

null:9
s2:2
s1:1
s7:7
s8:8
s5:5
s6:6
s3:3
s4:4

The storage structure of HashMap is shown in the figure (drawn in PowerPoint, so it is fairly simple):

[Figure: HashMap storage structure]

The general idea of the put function is as follows:

1. First check whether the table (the bucket array, defined in the class as transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;) is empty. If it is, allocate the table, making sure its size is a power of 2.
2. If the key is null, handle it specially: call putForNullKey(V value), which uses a hash value of 0, stores the entry in table[0], and returns.
3. If the key is not null, compute the hash value of the key.
4. Then compute the key's index in the table.
5. Traverse the linked list at table[index]. If an entry in the list has the same hash value as the key and equals() also returns true, replace the old value (oldValue) and return it; this keeps keys unique.
6. Otherwise, check the threshold before inserting. If the number of entries in the table has reached the threshold (and the target bucket is not empty), resize the table to twice its size. Note the ordering: a list that was old1->old2->old3 before the resize has its entries transferred in reverse, giving old3->old2->old1. An entry's index may differ before and after the resize. Entries with identical hash values always stay in the same list, but entries that merely shared a bucket may be split between two buckets after the resize.
7. Insert the new entry. Note the insertion order: if the original list at table[index] is old1->old2->old3, then after inserting a new value it is new1->old1->old2->old3.

public V put(K key, V value) {
      if ( table == EMPTY_TABLE) {
          inflateTable(threshold);
      }
      if (key == null)
          return putForNullKey(value);
      int hash = hash(key);
      int i = indexFor(hash, table.length);
      for (Entry<K,V> e = table[i]; e != null; e = e.next) {
          Object k;
          if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
              V oldValue = e.value;
              e.value = value;
              e.recordAccess(this);
              return oldValue;
          }
      }

      modCount++;
      addEntry(hash, key, value, i);
      return null;
  }
  void addEntry(int hash, K key, V value, int bucketIndex) {
      if ((size >= threshold) && (null != table[bucketIndex])) {
          resize(2 * table.length);
          hash = (null != key) ? hash(key) : 0;
          bucketIndex = indexFor(hash, table.length);
      }

      createEntry(hash, key, value, bucketIndex);
  }
Note: JDK 8 adds a threshold, TREEIFY_THRESHOLD, which defaults to 8. When the number of entries in one bucket exceeds it, the bucket is stored as a red-black tree instead of a singly linked list to speed up key lookups.

The general idea of the get function is:

1. Check whether the key is null. If so, handle it specially (look it up in the linked list at table[0]).
2. Otherwise, compute the hash value and from it the index in the table.
3. Search the linked list at table[index] for the entry that matches on both the hash value and equals(), and return its value.

public V get(Object key) {
       if (key == null)
           return getForNullKey();
       Entry<K,V> entry = getEntry(key);

       return null == entry ? null : entry.getValue();
   }
   final Entry<K,V> getEntry(Object key) {
       if (size == 0) {
           return null;
       }

       int hash = (key == null) ? 0 : hash(key);
       for (Entry<K,V> e = table[indexFor(hash, table.length)];
            e != null;
            e = e.next) {
           Object k;
           if (e.hash == hash &&
               ((k = e.key) == key || (key != null && key.equals(k))))
               return e;
       }
       return null;
   }
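The put and get steps described above can be condensed into a minimal sketch of a chained hash table. TinyChainedMap and its Node class are hypothetical and deliberately incomplete: the table is fixed at 16 buckets, there is no resize, and the key's hashCode() is used directly without the supplemental hash. It is not the JDK implementation.

```java
class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked list, analogous to Entry(hash, key, value, next).
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    private static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Returns the previous value for the key, or null if the key was absent.
    V put(K key, V value) {
        int hash = (key == null) ? 0 : key.hashCode(); // a null key goes to bucket 0
        int i = indexFor(hash, table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                V old = e.value;
                e.value = value; // existing key: replace the value
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // new key: insert at the head
        size++;
        return null;
    }

    V get(Object key) {
        int hash = (key == null) ? 0 : key.hashCode();
        for (Node<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        System.out.println(map.put("key", 1)); // null
        System.out.println(map.put("key", 2)); // 1
        System.out.println(map.put(null, 9));  // null
        System.out.println(map.get("key"));    // 2
        System.out.println(map.size());        // 2
    }
}
```

Comparing the stored hash before calling equals() is a cheap filter: equals() only runs for entries whose hash already matches.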
The hash and indexFor methods:

final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
    // Declared in java.lang.Object; shown here for reference:
    public native int hashCode();
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }
Still unclear? Let's work through a concrete example.
Suppose the key is "key". Its hashCode is 106079 in decimal, which in binary is:
0000 0000 0000 0001 1001 1110 0101 1111
When executing h ^= (h >>> 20) ^ (h >>> 12);
h>>>20:
0000 0000 0000 0000 0000 0000 0000 0000
h>>>12:
0000 0000 0000 0000 0000 0000 0001 1001
h ^= (h >>> 20) ^ (h >>> 12):
0000 0000 0000 0001 1001 1110 0100 0110
When executing h ^ (h >>> 7) ^ (h >>> 4):
h>>>7:
0000 0000 0000 0000 0000 0011 0011 1100
h>>>4:
0000 0000 0000 0000 0001 1001 1110 0100
h ^ (h >>> 7) ^ (h >>> 4):
0000 0000 0000 0001 1000 0100 1001 1110 (99486 in decimal)
Assuming Capacity is the default 16, then
h&(length-1):
0000 0000 0000 0001 1000 0100 1001 1110 &
0000 0000 0000 0000 0000 0000 0000 1111 =
0000 0000 0000 0000 0000 0000 0000 1110 = 14
We can confirm this in the Variables view of the Eclipse debugger:

[Figure: Eclipse debugger Variables view]
You can see that key="key" is indeed in table[14].
Computing the index with the bitwise operation h & (length-1) is faster than a modulo operation, but it is only valid when the array size is always a power of 2.
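The walk-through above can be reproduced in a few lines. This sketch assumes hashSeed is 0, so the String-specific branch of hash() is skipped; Jdk7HashDemo is an illustrative class name.

```java
class Jdk7HashDemo {
    // The JDK 7 supplemental hash quoted above, with hashSeed assumed to be 0.
    static int hash(Object k) {
        int h = 0;
        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int hash = hash("key");
        System.out.println("key".hashCode());             // 106079
        System.out.println(hash);                         // 99486
        System.out.println(Integer.toBinaryString(hash)); // 11000010010011110
        System.out.println(indexFor(hash, 16));           // 14
    }
}
```

For a non-negative hash and a power-of-2 length, h & (length - 1) gives the same result as h % length, which is why the text describes the index as a modulo.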

In JDK 8 the hash function is implemented as: (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
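For comparison, here is the JDK 8 expression applied to the same key. Jdk8HashDemo is an illustrative wrapper; only the one-line expression comes from the text.

```java
class Jdk8HashDemo {
    // XORs the high 16 bits of hashCode() into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int hash = hash("key");
        System.out.println(hash);            // 106078
        System.out.println((16 - 1) & hash); // 14
    }
}
```

The index is taken from the low bits only, so folding in the high bits lets them influence the bucket choice even in small tables.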

Use a custom object as a key

You can use any object as a key, as long as it follows the contract of the equals() and hashCode() methods and does not change after it has been inserted into the Map. An immutable custom object satisfies this condition, because it cannot be changed after it is created.
For example:

package collections;

public class StringOther
{
    private String name;

    public StringOther(String name)
    {
        this.name = name;
    }

    @Override
    public int hashCode()
    {
        return name.hashCode();
    }

    @Override
    public boolean equals(Object obj)
    {
        if(obj == this)
            return true;
        if(!(obj instanceof StringOther))
            return false;
        StringOther so = (StringOther)obj;
        return so.getName().equals(name);
    }

    public String getName()
    {
        return this.name;
    }

    @Override
    public String toString()
    {
        return "["+this.name+":"+this.hashCode()+"]";
    }
}
Test code:

Map<StringOther,String> maps = new HashMap<>(16);
     StringOther so1 = new StringOther("so1");
     StringOther so2 = new StringOther("so2");
     maps.put(so1,"1");
     maps.put(so2,"2");

     System.out.println( maps);
Output: {[so1:114005]=1, [so2:114006]=2}
Note that when overriding the equals method, it must be written as:

@Override public boolean equals(Object obj)
Note the @Override annotation and the parameter type Object obj. If the method is written as public boolean equals(StringOther obj), it overloads equals rather than overriding it, so HashMap will never call it; with the @Override annotation present, the compiler rejects this mistake.
Every class that overrides equals must also override hashCode. Otherwise it violates the general contract of Object.hashCode, and the class will not work properly with hash-based collections such as HashMap, HashSet, and Hashtable.
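The effect of the wrong parameter type is easy to demonstrate. In this sketch, OverloadedKey and OverriddenKey are hypothetical classes that differ only in the signature of equals.

```java
import java.util.HashMap;
import java.util.Map;

class EqualsOverloadDemo {
    // Hypothetical key that overloads equals (parameter type is the class itself).
    static final class OverloadedKey {
        private final String name;

        OverloadedKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }

        // No @Override here: this is a new method, and HashMap never calls it.
        public boolean equals(OverloadedKey other) {
            return other != null && other.name.equals(name);
        }
    }

    // Hypothetical key that overrides equals(Object) correctly.
    static final class OverriddenKey {
        private final String name;

        OverriddenKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == this) return true;
            if (!(obj instanceof OverriddenKey)) return false;
            return ((OverriddenKey) obj).name.equals(name);
        }
    }

    static String lookupOverloaded() {
        Map<OverloadedKey, String> map = new HashMap<>();
        map.put(new OverloadedKey("so1"), "1");
        return map.get(new OverloadedKey("so1")); // falls back to Object.equals (identity)
    }

    static String lookupOverridden() {
        Map<OverriddenKey, String> map = new HashMap<>();
        map.put(new OverriddenKey("so1"), "1");
        return map.get(new OverriddenKey("so1"));
    }

    public static void main(String[] args) {
        System.out.println(lookupOverloaded()); // null
        System.out.println(lookupOverridden()); // 1
    }
}
```

Both keys land in the same bucket because hashCode() matches, but only the overridden equals(Object) lets HashMap recognize the second instance as the same key.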

Serialization

Attentive readers may have noticed that HashMap implements the Serializable interface, yet its table field (transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;) is declared transient. Instead, HashMap implements the writeObject() and readObject() methods itself.

Recall that variables declared transient are not part of an object's persistent state. If you are unfamiliar with the details of Java serialization, see "JAVA Serialization".

So why is this? There are two reasons:

1. HashMap's put and get are based on hashCode, and Object.hashCode is a native method. The hashCode computed for the same key may differ between Java environments, so the table indexes could change after deserialization and the original table could not be restored as-is.
2. HashMap starts at a default size and only expands after reaching the threshold, so many buckets are empty (null). Serializing these empty slots would waste resources.
Because Java's built-in serialization has clear drawbacks (large output and low efficiency), I suggest using other formats for serialization, such as JSON or protobuf. For details, see "Analysis of Several Java Serialization Tools".
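Even though the table field is transient, a HashMap still survives a serialization round trip, because writeObject() and readObject() write and re-insert the individual entries. A minimal sketch using Java's built-in serialization follows; SerializationDemo is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class SerializationDemo {
    // Serializes the map to a byte array and reads it back.
    @SuppressWarnings("unchecked")
    static Map<String, Integer> roundTrip(HashMap<String, Integer> original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Map<String, Integer>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("s1", 1);
        map.put(null, 9);
        Map<String, Integer> copy = roundTrip(map);
        System.out.println(copy.equals(map)); // true
    }
}
```

The copy is rebuilt by inserting each entry again on the reading side, so its buckets are computed in the receiving JVM.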

Differences from Hashtable

Since Hashtable is gradually being retired, it does not get a separate article; this section only contrasts it with HashMap.
Both HashMap and Hashtable implement the Map interface, but before deciding which one to use, you need to understand the differences between them.

HashMap is almost equivalent to Hashtable, except that HashMap is not synchronized and accepts null: HashMap allows null keys and values, while Hashtable allows neither and throws NullPointerException.
HashMap is not synchronized and Hashtable is synchronized, which means Hashtable is thread-safe and multiple threads can share one Hashtable; without correct external synchronization, multiple threads cannot safely share a HashMap. Java 5 introduced ConcurrentHashMap, which is a replacement for Hashtable and scales better.
Another difference is that HashMap's Iterator is fail-fast, while Hashtable's Enumeration is not. If another thread changes the structure of a HashMap (adds or removes elements) during iteration, ConcurrentModificationException is thrown; removing elements through the iterator's own remove() method does not throw it. This behavior is not guaranteed, however; it is provided only on a best-effort basis. This is also the difference between Enumeration and Iterator.
Because Hashtable is synchronized, it is slower than HashMap in a single-threaded environment. If you do not need synchronization, HashMap is the better choice.
HashMap does not guarantee that the order of elements in the Map will stay the same over time.
Hashtable and HashMap also differ in the initial size and growth of their internal arrays. The default size of Hashtable's array is 11, and it grows to old*2+1. The default size of HashMap's array is 16, and it is always a power of 2.
HashMap can be synchronized with the following statement: Map m = Collections.synchronizedMap(hashMap);
In summary, Hashtable and HashMap differ mainly in thread safety and speed. Use Hashtable only if you need complete thread safety, and if you are on Java 5 or above, use ConcurrentHashMap instead.
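Two of the differences listed above, null handling and the synchronized wrapper, can be checked directly. HashtableComparisonDemo and acceptsNullKey are illustrative names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class HashtableComparisonDemo {
    // Returns true if the map stores a null key, false if it rejects it with a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<String, String>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<String, String>())); // false

        // Wrap a HashMap so that every method call is synchronized on the wrapper.
        Map<String, String> synced = Collections.synchronizedMap(new HashMap<String, String>());
        synced.put("k", "v");
        System.out.println(synced.get("k")); // v
    }
}
```

Iterating over a synchronizedMap still requires manual synchronization on the wrapper, which is one reason ConcurrentHashMap is usually the better fit for concurrent code.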

HashMap FAQ summary

What is the working principle of HashMap? (See the overview at the beginning.)
Do you know how get and put work? What are the roles of equals() and hashCode()?
Answer: The key's hashCode() is hashed and the index is computed as (n - 1) & hash to find the bucket. If a collision occurs, key.equals() is used to find the matching node in the linked list or tree.
Why are String and wrapper classes such as Integer suitable as keys?
String and wrapper classes such as Integer are well suited to being HashMap keys, and String is the most commonly used. String is immutable and final, and it already overrides equals() and hashCode(); the wrapper classes share these properties. Immutability is necessary because the key must not change once its hashCode() has been used: if the key returns a different hash code at get time than it did at put time, you will not be able to find the object in the HashMap. Immutability also has other advantages, such as thread safety. If you can keep the hashCode constant just by declaring a field final, then do so. Because equals() and hashCode() are used to retrieve the object, it is very important that the key class overrides both correctly. If two unequal objects return different hash codes, collisions are less likely, which improves HashMap performance.
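The following sketch shows what goes wrong when a key is mutated after insertion. MutableKey is a hypothetical class whose hashCode() depends on a non-final field.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical mutable key: hashCode() depends on a field that can change.
    static final class MutableKey {
        String name;

        MutableKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof MutableKey && ((MutableKey) obj).name.equals(name);
        }
    }

    // Inserts a key, mutates it, then looks it up again with the very same object.
    static String lookupAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("before");
        map.put(key, "value");
        key.name = "after"; // the hash stored in the entry no longer matches hashCode()
        return map.get(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null
    }
}
```

The entry is still inside the map, but it can no longer be reached through get(), because the hash recorded at put time belongs to the old state of the key.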
