HashMap interview essentials

HashMap overview

HashMap is an implementation of the Map interface. It allows null keys and null values. HashMap can be seen as an enhanced version of Hashtable. It is not a thread-safe container; if you need a thread-safe Map, consider using ConcurrentHashMap. HashMap is unordered: it does not guarantee the order of the key-value pairs it stores.
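The null handling mentioned above can be tried out with a small sketch. The class name NullKeyDemo and the sample keys are made up for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that uses null both as a key and as a value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");    // a null key is accepted
        map.put("k", null);        // a null value is accepted too
        map.put(null, "replaced"); // same key again: the old value is overwritten
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());           // 2
        System.out.println(map.get(null));        // replaced
        System.out.println(map.containsKey("k")); // true, although get("k") returns null
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.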

The underlying data structure of HashMap is an array combined with linked lists. Each slot of the array is called a bucket. The time required to iterate over a HashMap is proportional to the number of buckets in the instance plus the number of key-value mappings.

HashMap has two important parameters: initial capacity and load factor. The initial capacity is the number of buckets in the hash table when it is created. The load factor is a measure of how full the hash table is allowed to get. When the number of entries exceeds the product of the load factor and the current capacity, the hash table is rehashed, that is, its internal data structure is rebuilt.

Note that HashMap is not thread-safe. If multiple threads access a HashMap at the same time and at least one of them modifies its structure, the HashMap must be synchronized externally. You can use Collections.synchronizedMap(new HashMap()) to create a thread-safe Map.

The iterators of HashMap are fail-fast. Apart from the iterator's own remove method, any structural modification made through another method, such as the map's remove, can trigger the fail-fast mechanism, so prefer the iterator's own remove method. If the structure of the map is modified after the iterator has been created, a ConcurrentModificationException is thrown.
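A minimal sketch of the fail-fast behaviour, comparing removal through the iterator with removal through the map. The class and method names are hypothetical; only standard java.util calls are used.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes every entry through the iterator itself, which is allowed.
    static int removeViaIterator(Map<String, Integer> map) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        return map.size();
    }

    // Removes through the map while an iterator is active.
    // Returns true if the iterator noticed the modification.
    static boolean removeViaMapFails(Map<String, Integer> map) {
        try {
            Iterator<String> it = map.keySet().iterator();
            String first = it.next();
            map.remove(first); // structural change the iterator does not know about
            it.next();         // the modification count no longer matches
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeViaIterator(sample())); // 0
        System.out.println(removeViaMapFails(sample())); // true
    }
}
```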

Important attributes

Initial capacity (16)

The default initial capacity of HashMap is defined by the DEFAULT_INITIAL_CAPACITY constant.

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

Maximum capacity

static final int MAXIMUM_CAPACITY = 1 << 30;

Default load factor

static final float DEFAULT_LOAD_FACTOR = 0.75f;

The principle of the expansion mechanism is that when the number of entries stored in the HashMap exceeds capacity * load factor, the capacity of the HashMap is doubled.
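As a worked example of this rule, the sketch below computes the resize point for the first few capacities. The helper name threshold is an assumption for illustration; it mirrors the rule stated above rather than the exact JDK code.

```java
class ThresholdDemo {
    // Entry count above which a table of the given capacity would be expanded.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        for (int i = 0; i < 3; i++) {
            System.out.println(capacity + " buckets: expand once size exceeds "
                    + threshold(capacity, loadFactor));
            capacity <<= 1; // each expansion doubles the capacity
        }
    }
}
```

With the defaults this gives 12 for 16 buckets, 24 for 32 buckets, and 48 for 64 buckets.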

Modifications

HashMap uses modCount to record the number of structural modifications. It mainly supports the fail-fast mechanism for concurrent modification of a HashMap.

Expansion threshold

HashMap uses threshold to represent the expansion threshold, that is, the value of capacity * load factor.

Load factor

loadFactor represents the load factor, which indicates how densely the HashMap is filled.

HashMap data structure

In JDK 1.7, HashMap is implemented with an array plus linked lists: a linked list is used to handle collisions, and entries that map to the same bucket are stored in one linked list hanging off the array. But when a bucket holds many elements, that is, when many elements have equal hash values, searching the list sequentially by key is inefficient.

Therefore, compared with JDK 1.7, JDK 1.8 optimizes the underlying structure. When the number of elements in a bucket exceeds 8, the linked list is converted into a red-black tree (provided the table has at least 64 buckets; otherwise the table is resized instead). The purpose is to improve lookup efficiency.

Node class

A Node stores one entry of the HashMap; it implements the Map.Entry interface. Let's first look at the definition of the internal Entry interface in Map.

Map.Entry

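// A map entry (key-value pair). The Map.entrySet() method returns a collection view
// of the map whose elements are of this class. The only way to obtain a reference to
// a map entry is from the iterator of this collection view. These Map.Entry objects
// are valid only for the duration of the iteration.
interface Entry<K,V> {
    K getKey();
    V getValue();
    V setValue(V value);
    boolean equals(Object o);
    int hashCode();
}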

A Node stores four fields: the hash value, the key, the value, and a reference to the next Node.

// hash value
final int hash;
// key
final K key;
// value
V value;
// reference to the next Node
Node<K,V> next;

Because each Node holds a reference to the next one, the nodes in a bucket form a chain of entries. When a new Node instance is constructed, these four values are passed in as arguments.

Node(int hash, K key, V value, Node<K,V> next) {
    this.hash = hash;
    this.key = key;
    this.value = value;
    this.next = next;
}

KeySet inner class

The KeySet class extends the AbstractSet abstract class. KeySet instances are created by the keySet() method of HashMap, and the class is designed to operate on the keys in the HashMap.

// Returns a set view that contains the keys of the map.
public Set<K> keySet() {
    // keySet refers to the keySet field in AbstractMap
    Set<K> ks = keySet;
    if (ks == null) {
        // If ks is null, create a KeySet object
        // and assign it to ks.
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}
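Since keySet() returns a cached view rather than a copy, changes made through the view show up in the map. A small sketch of that behaviour follows; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class KeySetViewDemo {
    // Two calls return the same cached view object, as the keySet() code suggests.
    static boolean sameView(Map<String, Integer> map) {
        return map.keySet() == map.keySet();
    }

    // Removing a key through the view removes the mapping from the map itself.
    static int sizeAfterViewRemove() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Set<String> keys = map.keySet();
        keys.remove("a");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sameView(new HashMap<>())); // true
        System.out.println(sizeAfterViewRemove());     // 1
    }
}
```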

Values inner class

The Values class is created in much the same way as the KeySet class, but KeySet is designed to operate on the keys in the Map, while Values is designed to operate on the values of the key-value pairs.

public Collection<V> values() {
    // values refers to the values field in AbstractMap
    Collection<V> vs = values;
    if (vs == null) {
        vs = new Values();
        values = vs;
    }
    return vs;
}

EntrySet inner class

EntrySet is the inner class that operates on key-value pairs.

// Returns a set view that contains the key-value pairs of the map.
public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}

HashMap put process

First, the hash method computes the hash code of the key, and the position in the array is determined from that hash. If there is no Node at that position, the new Node is put there directly. If the position already holds a Node, the bucket's linked list is walked and its length is checked against 8. If the length does not exceed 8, the new Node is linked into the list: JDK 1.7 inserts at the head, while JDK 1.8 and later insert at the tail. If the length of the linked list exceeds 8, the bucket is treeified: the linked list is converted into a red-black tree and the Node is stored in the tree.

// JDK 1.8
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If table is null or has zero length, call resize() once
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket is empty, insert directly; (n - 1) & hash is the actual index in the table
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Otherwise the bucket is occupied
    else {
        Node<K,V> e; K k;
        // Compare the first node's hash and key with the key being inserted
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Different key, and the bucket already holds a tree
        else if (p instanceof TreeNode)
            // Insert through the red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Different key and not a tree: walk the list until p.next == null
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Append at the tail
                    p.next = newNode(hash, key, value, null);
                    // After appending, if the node count reaches the threshold,
                    // call treeifyBin(), which checks again
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same hash and key: leave the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Move p to the next node
                p = e;
            }
        }
        // The map already contained the key: return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Increment the structural modification count
    ++modCount;
    // The number of entries exceeds the threshold: resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
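The core of the put process (locate the bucket, overwrite an existing key, otherwise append at the tail) can be sketched in a stripped-down form. TinyChainedMap is a hypothetical teaching class, not the JDK implementation: it has a fixed table of 16 buckets and leaves out hash spreading, resizing, and treeification.

```java
import java.util.Objects;

class TinyChainedMap<K, V> {
    // Simplified node with the same four fields described for HashMap's Node.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // never resized
    private int size;

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int hash = Objects.hashCode(key);  // 0 for a null key
        int i = (table.length - 1) & hash; // bucket index
        Node<K, V> p = table[i];
        if (p == null) {                   // empty bucket: insert directly
            table[i] = new Node<>(hash, key, value, null);
            size++;
            return null;
        }
        while (true) {
            if (p.hash == hash && Objects.equals(p.key, key)) {
                V old = p.value;           // existing key: overwrite the value
                p.value = value;
                return old;
            }
            if (p.next == null) {          // reached the tail: append
                p.next = new Node<>(hash, key, value, null);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    int size() {
        return size;
    }
}
```

For example, putting "Aa" and then "BB" (two strings with the same hashCode) chains both nodes in one bucket, and putting "Aa" again returns the old value without changing the size.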

How to traverse HashMap

The base class for HashMap traversal is HashIterator, an abstract class inside HashMap. Its structure is relatively simple, with only three methods: hasNext, remove, and nextNode. The nextNode method is used by three concrete iterators:

  • KeyIterator, which traverses the keys

  • ValueIterator, which traverses the values

  • EntryIterator, which traverses the entries

Their traversal order is the same; all of them traverse by calling the nextNode method of HashIterator.

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}
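From the caller's side, these three iterators are reached through keySet(), values(), and entrySet(). A short usage sketch follows; the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

class TraversalDemo {
    // Iterates the keys and looks each value up again.
    static int sumByKeys(Map<String, Integer> map) {
        int sum = 0;
        for (String key : map.keySet()) {
            sum += map.get(key);
        }
        return sum;
    }

    // Iterates the values only.
    static int sumByValues(Map<String, Integer> map) {
        int sum = 0;
        for (int value : map.values()) {
            sum += value;
        }
        return sum;
    }

    // Iterates the entries, which gives key and value without a second lookup.
    static int sumByEntries(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(sumByKeys(map));    // 6
        System.out.println(sumByValues(map));  // 6
        System.out.println(sumByEntries(map)); // 6
    }
}
```

When both key and value are needed, iterating entrySet() avoids the extra get call per key.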

Traversal in HashIterator

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // used for the fail-fast check
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        if (t != null && size > 0) { // advance to first entry
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    public final void remove() {...}
}

next and current represent the next Node and the current Node respectively. During initialization, HashIterator advances to the first non-empty bucket so that next points to the first Node.

HashMap is not thread-safe

HashMap is not a thread-safe container, and this shows up when multiple threads put into a HashMap concurrently. Suppose there are two threads A and B. A wants to insert a key-value pair into the HashMap, but its time slice runs out just after it has determined the bucket position and before it performs the put. B then runs and performs the same operation, except that B inserts its key-value pair successfully. If A and B insert into the same position (bucket), thread A overwrites B's record when it resumes, which causes data inconsistency. In addition, when a HashMap is expanded concurrently in JDK 1.7, the resize method can link nodes into a cycle, which causes an endless loop and drives CPU usage up.

How HashMap handles hash collisions

HashMap is implemented at the bottom with buckets plus linked lists. The bucket determines where an element is inserted, and the bucket is chosen by the hash method. When several elements hash to the same bucket, HashMap places all of those Node elements in that bucket and links them into a list. This way of handling hash collisions is called separate chaining (the chain address method).

Other ways to deal with hash collisions include open addressing, rehashing, and a public overflow area.
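A collision is easy to provoke with the strings "Aa" and "BB", whose hashCode values are both 2112. The sketch below, with a made-up class name, shows that both keys are still stored and found separately, because equals tells apart keys that share a bucket.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1); // hashCode 2112
        map.put("BB", 2); // hashCode 2112 as well, so it lands in the same bucket
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```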

How does HashMap get elements

First, it checks whether the table is empty, then computes the position of the specified key from its hash. It then checks whether the first element of the bucket is null. If it is not null and it matches the key, that record is returned directly. If it does not match, the next element is checked: if it is null, null is returned directly. If it is not null, the method checks whether the first element is a TreeNode instance. If so, the element is fetched with TreeNode.getTreeNode; otherwise the list is walked in a loop until a match is found or the next element is null.
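The linked-list part of this lookup can be sketched as a walk over one bucket's chain. ChainLookupDemo and its Node class are hypothetical simplifications; the red-black tree branch is left out.

```java
import java.util.Objects;

class ChainLookupDemo {
    // Simplified node: hash, key, value, and a link to the next node.
    static final class Node<K, V> {
        final int hash;
        final K key;
        final V value;
        final Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain starting at the bucket's first node.
    // Returns the matching value, or null when the chain ends without a match.
    static <K, V> V find(Node<K, V> first, int hash, Object key) {
        for (Node<K, V> e = first; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" share the hash code 2112, so they would sit in one chain.
        Node<String, Integer> second = new Node<>("BB".hashCode(), "BB", 2, null);
        Node<String, Integer> first = new Node<>("Aa".hashCode(), "Aa", 1, second);
        System.out.println(find(first, "BB".hashCode(), "BB")); // 2
        System.out.println(find(first, "Ab".hashCode(), "Ab")); // null
    }
}
```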

What is the difference between HashMap and Hashtable

Similarities

Both HashMap and Hashtable are implemented on top of hash tables, and each element stored in them is a key-value pair. Both implement the Map, Cloneable, and Serializable interfaces.

Differences

  • Parent class: HashMap extends the AbstractMap class, while Hashtable extends the Dictionary class.

  • Null handling: HashMap allows null keys and null values; Hashtable allows neither. HashMap treats a null key like an ordinary key, so there can be at most one null key.

  • Thread safety: HashMap is not thread-safe. If several threads modify the structure of a HashMap at the same time, for example by adding or deleting entries, the operations must be synchronized. Merely changing the value of an existing key is not a structural modification. You can construct a thread-safe Map with Collections.synchronizedMap or use ConcurrentHashMap. Hashtable itself is thread-safe.

  • Performance: both HashMap and Hashtable resolve collisions with singly linked lists, and HashMap can achieve constant-time put and get operations. Hashtable's put and get methods are synchronized, so it is considerably less efficient.

  • Initial capacity: the default initial capacity of Hashtable is 11, and each expansion changes the capacity to 2n+1, where n is the previous capacity. The default initial capacity of HashMap is 16, and each expansion doubles it. If an initial capacity is given at creation, Hashtable uses that size directly, while HashMap rounds it up to a power of two.
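The null-handling difference in the list above is easy to check. In this sketch the helper names are hypothetical, and the two methods simply report whether a put with null succeeds.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullSupportDemo {
    // Returns true if the map accepts a null key, false if it throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Returns true if the map accepts a null value, false if it throws NullPointerException.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("k", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));     // true
        System.out.println(acceptsNullValue(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>()));   // false
        System.out.println(acceptsNullValue(new Hashtable<>())); // false
    }
}
```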

The difference between HashMap and HashSet

HashSet extends the AbstractSet abstract class and implements the Set, Cloneable, and java.io.Serializable interfaces. HashSet does not allow duplicate values. Under the hood a HashSet is backed by a HashMap, and all operations on a HashSet are actually operations on that HashMap. Therefore HashSet does not guarantee the order of its elements either.
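A brief sketch of the no-duplicates behaviour follows; the class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetDemo {
    // Adds all items to a HashSet and returns how many distinct ones remain.
    static int distinctCount(String... items) {
        Set<String> set = new HashSet<>();
        for (String item : items) {
            set.add(item); // add returns false when the element is already present
        }
        return set.size();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("a"));                 // true
        System.out.println(set.add("a"));                 // false, duplicate ignored
        System.out.println(distinctCount("a", "b", "a")); // 2
    }
}
```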

How does HashMap expand?

There are two very important variables in HashMap: loadFactor and threshold. loadFactor is the load factor, and threshold is the size at which the next expansion happens, with threshold = loadFactor * array length. When the number of entries exceeds threshold, the array length is doubled: the map is resized and the existing entries are redistributed into the new bucket array.

Why the length of HashMap is a power of 2

The bucket index is computed as (n - 1) & hash, where n is the length of the table. Because the length of a HashMap is always a power of 2, hash % n == (n - 1) & hash for non-negative hash values, so the cheap bit operation yields the same index as taking the remainder. If the length were not a power of 2 this would no longer hold; you can try an example yourself.

For example, when the length is 9, 3 & (9 - 1) = 0 and 2 & (9 - 1) = 0, so both land in bucket 0 and collide.

This would increase the chance of collisions in the HashMap.
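The claim can be checked numerically. The sketch below compares the mask with the remainder for a length of 16 and counts how many buckets the mask can reach for a length of 9. The method names are assumptions for illustration; Math.floorMod is used so that negative hash values also give a non-negative remainder.

```java
import java.util.HashSet;
import java.util.Set;

class IndexDemo {
    static int indexByMask(int hash, int length) {
        return (length - 1) & hash;
    }

    static int indexByMod(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    // Counts the distinct bucket indexes the mask produces for hashes 0..999.
    static int reachableBuckets(int length) {
        Set<Integer> seen = new HashSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            seen.add(indexByMask(hash, length));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(indexByMask(37, 16) == indexByMod(37, 16)); // true: both are 5
        System.out.println(reachableBuckets(16)); // 16, every bucket is used
        System.out.println(reachableBuckets(9));  // 2, only buckets 0 and 8 are used
    }
}
```

With a length of 9 the mask is 8 (binary 1000), so only one bit of the hash decides the bucket and seven of the nine buckets stay empty.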

What are the implementations of HashMap thread safety

Because HashMap is not a thread-safe container, it is recommended to use ConcurrentHashMap in concurrent scenarios, or to wrap a HashMap in a thread-safe container through the Collections utility class, for example:

Collections.synchronizedMap(new HashMap());

You can also use Hashtable, which is also a thread-safe container based on key-value storage. HashMap and Hashtable are often compared because their data structures are the same.
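A small sketch of the thread-safe options in use: several threads increment one shared counter through merge, which is performed atomically by both ConcurrentHashMap and the wrapper returned by Collections.synchronizedMap. The class name, the key "hits", and the thread counts are arbitrary choices for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SafeCounterDemo {
    // Starts the given number of threads, each adding 1 to the "hits" counter perThread times.
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait until every worker has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 1000));                    // 4000
        System.out.println(countWith(Collections.synchronizedMap(new HashMap<>()), 4, 1000)); // 4000
    }
}
```

Passing a plain HashMap to the same method may lose updates, which is exactly the inconsistency the thread-safe variants prevent.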

Origin blog.csdn.net/feikillyou/article/details/112725411