Get to know HashMap

HashMap is Java's hash table implementation of the Map interface. It is the data type Java programmers use most often for key-value mappings, and it is also an interviewer's favorite. Getting to know HashMap well is very important.

I have read many articles about it, so I used my commuting time to read the code itself.

Fields and constants

Let's look at the more important fields and constants of HashMap:

/**
* The default initial capacity - MUST be a power of two.
* Note: the default initial capacity is 16.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* Note: a larger value passed to HashMap(int initialCapacity, float loadFactor)
* is clamped to this maximum.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
* The load factor used when none specified in constructor.
* Note: the default load factor is 0.75 (3/4).
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
* The bin count threshold for using a tree rather than list for a
* bin.  Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
* Note: the threshold at which a linked list is converted into a tree.
*/
static final int TREEIFY_THRESHOLD = 8;

 /**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 * Note: the threshold at which a tree is converted back into a linked list.
 */
static final int UNTREEIFY_THRESHOLD = 6;

 /**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * Note: below this capacity, a crowded bin triggers a resize instead of
 * treeification.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash).  This field is used to make iterators on Collection-views of
* the HashMap fail-fast.  (See ConcurrentModificationException).
* Note: this is Java's fail-fast marker.
*/
transient int modCount;

With the parameters above in mind, first look at the data structure of HashMap:

The size of the array, i.e. the number of buckets in the hash table, defaults to 16 and corresponds to the capacity of HashMap above;

The total number of nodes in the linked lists and trees is the number of entries HashMap actually stores (its size);

The capacity multiplied by the load factor is HashMap's resize threshold.

The initial capacity and the load factor can be set through constructor parameters.
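
As a small illustration of the threshold rule above, here is a minimal sketch. The class name and the `threshold` helper are made up for this example; they only redo the multiplication and do not read HashMap's internal field.

```java
import java.util.HashMap;
import java.util.Map;

class ThresholdSketch {
    // Resize threshold as described above: capacity multiplied by load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Defaults: capacity 16, load factor 0.75.
        System.out.println(threshold(16, 0.75f)); // 12

        // Both values can also be passed to the constructor.
        Map<String, Integer> custom = new HashMap<>(64, 0.5f);
        custom.put("a", 1);
        System.out.println(threshold(64, 0.5f)); // 32
    }
}
```

With the defaults, the table is therefore expected to grow when the 13th entry is added.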

Ideally, every slot of the array holds exactly one node, which gives the best use of both space and time. In practice this is impossible, and when hash(key) values collide, a linked list holds the colliding entries.

TREEIFY_THRESHOLD, UNTREEIFY_THRESHOLD, and MIN_TREEIFY_CAPACITY decide when HashMap stores the entries of a bin in a linked list and when in a tree.

Interviewers love to ask

  1. 16 is a power of 2, but so are 8 and 32. Why choose 16?

    There is no particular reason. It is an empirical value: the authors considered an initial capacity of 16 enough for common use.

  2. Why is TREEIFY_THRESHOLD 8, the point at which a list is converted into a tree?

    The comments in the Java source answer this question:

    Because TreeNodes are about twice the size of regular nodes, we
    use them only when bins contain enough nodes to warrant use
    (see TREEIFY_THRESHOLD). And when they become too small (due to
    removal or resizing) they are converted back to plain bins.  In
    usages with well-distributed user hashCodes, tree bins are
    rarely used.  Ideally, under random hashCodes, the frequency of
    nodes in bins follows a Poisson distribution
    (http://en.wikipedia.org/wiki/Poisson_distribution) with a
    parameter of about 0.5 on average for the default resizing
    threshold of 0.75, although with a large variance because of
    resizing granularity. Ignoring variance, the expected
    occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
    factorial(k)). The first values are:
       0:    0.60653066
       1:    0.30326533
       2:    0.07581633
       3:    0.01263606
       4:    0.00157952
       5:    0.00015795
       6:    0.00001316
       7:    0.00000094
       8:    0.00000006
       more: less than 1 in ten million
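    In short, a TreeNode is about twice the size of a regular node, so trees are used only when a bin holds at least TREEIFY_THRESHOLD nodes, and bins that become too small are converted back. With well-distributed hashCodes and the default load factor of 0.75, bin sizes follow a Poisson distribution with a parameter of about 0.5. As the table shows, a bin reaches size 8 with a probability of only about 6 in 100 million.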
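
The table in the source comment can be recomputed from the formula it quotes. The following is a minimal sketch; the class and method names are made up for this example.

```java
class PoissonBinSizes {
    // exp(-0.5) * pow(0.5, k) / factorial(k), as quoted in the source comment.
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per bin size, in the same layout as the comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```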

Constructor

  1. With no special requirements, we generally use
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
  this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

Constructs a HashMap with the default capacity and the default load factor. Note that at this point the table inside the HashMap has not actually been allocated yet.

  2. When the expected storage size is known, we use
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param  initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
  this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

Constructs a HashMap with the specified capacity and the default load factor. Specifying the capacity avoids the poor performance caused by too many resizes, discussed below.

  3. For deeper customization, you can use
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param  initialCapacity the initial capacity
* @param  loadFactor      the load factor
* @throws IllegalArgumentException if the initial capacity is negative
*         or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
  if (initialCapacity < 0)
    throw new IllegalArgumentException("Illegal initial capacity: " +
                                       initialCapacity);
  if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;
  if (loadFactor <= 0 || Float.isNaN(loadFactor))
    throw new IllegalArgumentException("Illegal load factor: " +
                                       loadFactor);
  this.loadFactor = loadFactor;
  this.threshold = tableSizeFor(initialCapacity);
}

Constructs a HashMap with the specified capacity and the specified load factor. Pay particular attention when using it (taken from the source comments):

As a general rule, the default load factor (.75) offers a good
tradeoff between time and space costs.  Higher values decrease the
space overhead but increase the lookup cost (reflected in most of
the operations of the <tt>HashMap</tt> class, including
<tt>get</tt> and <tt>put</tt>). The expected number of entries in
the map and its load factor should be taken into account when
setting its initial capacity, so as to minimize the number of
rehash operations.  If the initial capacity is greater than the
maximum number of entries divided by the load factor, no rehash
operations will ever occur.
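In other words, the default load factor of 0.75 is a good compromise between time and space. A higher load factor saves space but makes lookups more expensive (this affects most HashMap operations, including get and put). Consider the expected number of entries and the load factor when choosing the initial capacity, so that rehashing is kept to a minimum: if the initial capacity is greater than the expected number of entries divided by the load factor, no rehash occurs.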
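
The sizing advice above can be sketched as follows. `capacityFor` is a hypothetical helper, and `tableSizeFor` here is a standalone re-implementation in the style of the JDK 8 method that the constructor calls, written for illustration rather than copied from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap, found by smearing the highest set bit
    // of (cap - 1) into all lower bits and then adding one.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical helper: an initial capacity greater than
    // expectedEntries / loadFactor, so that no rehash is needed.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int initial = capacityFor(100, 0.75f);
        Map<Integer, String> map = new HashMap<>(initial);
        for (int i = 0; i < 100; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(initial + " -> " + tableSizeFor(initial)); // 134 -> 256
    }
}
```

For 100 expected entries this requests 134, which is rounded up to a table of 256 buckets with a threshold of 192, so the 100 insertions fit without a resize.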

Hashing process

A hash table is backed by an array and relies on the array's random access by index to keep lookups efficient. The method that maps a key to an array index is called the hash function, and the value the function computes is called the hash value.

Now take a look at the hash function in HashMap:

/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower.  Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.)  So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
  int h;
  // the key may be null
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

From the comment we can conclude:

  1. The high 16 bits are XORed into the low 16 bits, which keeps the information of the high bits and indirectly reduces collisions.

When the hashCodes of two keys differ only in the high bits, they collide easily. This is mainly a consequence of how HashMap computes the bucket index.

  2. Because common hashCodes are already reasonably distributed, and because trees keep lookups efficient when collisions are severe, a relatively simple hash computation is used.

The index logic is tab[i = (n - 1) & hash], i.e. the hash modulo the table length, written this way to make the computation cheaper.

When length = 2^n, X % length = X & (length - 1) for every non-negative X.
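
The equivalence between the remainder and the bit mask can be checked with a short sketch. The class and method names are made up for this example.

```java
class MaskVersusModulo {
    static int byModulo(int x, int length) {
        return x % length;
    }

    static int byMask(int x, int length) {
        return x & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of two
        boolean allEqual = true;
        for (int x = 0; x < 1000; x++) {
            allEqual &= byModulo(x, length) == byMask(x, length);
        }
        System.out.println(allEqual); // true

        // For a negative x the two differ: % keeps the sign, while the
        // mask always yields a valid index in [0, length).
        System.out.println(byModulo(-3, length)); // -3
        System.out.println(byMask(-3, length));   // 13
    }
}
```

The mask form has the extra benefit that a negative hash still maps to a valid bucket.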

Below is the actual hash process, in which the high bits are XORed into the low bits.

Next, look at the index logic.

Here you can see that when the HashMap is small, such as the default 16 buckets, only a few low bits take part in the index. When the differences between keys are concentrated in the higher bits, hashes with equal low bits collide, for example (2, 18, 34) or (6, 22, 38), arithmetic sequences whose common difference is a multiple of 16. For this reason the hash XORs the high 16 bits into the low 16 bits, so that the high bits also influence the index.
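
The effect of the XOR can be seen with two hash codes that differ only in the high 16 bits. This is a minimal sketch: `spread` repeats the expression from hash(Object) above, and `index` repeats the (n - 1) & hash step; the class name and the sample values are chosen for this example.

```java
class HashSpreadSketch {
    // Same expression as in hash(Object): XOR the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of n buckets (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without spreading, both land in bucket 0 of a 16-bucket table.
        System.out.println(index(a, 16) + " " + index(b, 16)); // 0 0
        // With spreading, the high bits separate them.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16)); // 1 2
    }
}
```

Note that small values such as 2, 18 and 34 have no high bits to mix in, so they still share bucket 2 in a 16-bucket table until the table grows.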

Interviewers love to ask

  1. What changed in HashMap between Java 7 and Java 8?

    • the hash function: Java 7 XORs five times, Java 8 once;

    • entry storage: Java 7 uses linked lists, Java 8 uses linked lists and red-black trees;
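
The difference between the two hash functions can be sketched side by side. This is an illustration written from memory of the Java 7 source, not a copy of it: the hashSeed and the special handling of String keys are left out, and the first of the five XORs (mixing in key.hashCode()) is assumed to have happened before `java7Style` is called.

```java
class HashSpreadComparison {
    // Supplemental shifts in the style of Java 7: four more XORs
    // after the hashCode has been mixed in.
    static int java7Style(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Java 8: a single XOR of the high half into the low half.
    static int java8Style(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = "HashMap".hashCode();
        System.out.println(Integer.toHexString(java7Style(h)));
        System.out.println(Integer.toHexString(java8Style(h)));
    }
}
```

Java 8 can afford the cheaper mixing because badly colliding bins are converted to trees.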

Manipulation Functions

put

/**
* Associates k with v.
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
*         <tt>null</tt> if there was no mapping for <tt>key</tt>.
*         (A <tt>null</tt> return can also indicate that the map
*         previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
  return putVal(hash(key), key, value, false, true);
}

/**
* Implements Map.put and related methods.
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
  Node<K,V>[] tab; Node<K,V> p; int n, i;
  // the table is uninitialized or has length 0: resize (this allocates it)
  if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
  // the bucket is empty: create a new node and place it there
  if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
  else { // otherwise the bucket already holds a node p
    Node<K,V> e; K k;
    // the first node has the same hash and an equal key
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
      e = p; // remember the first node in e
    else if (p instanceof TreeNode)
      // the bin is a tree: insert into the tree
      e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    else {
      // not a tree, so the bin is a linked list
      for (int binCount = 0; ; ++binCount) {
        // reached the end: append to the tail of the list
        if ((e = p.next) == null) {
          p.next = newNode(hash, key, value, null);
          // the node count reached the threshold: treeifyBin() decides
          // whether to convert to a red-black tree or to resize instead
          if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
          break;
        }
        // a node in the list has the same hash and an equal key
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
          break;
        p = e;
      }
    }
    if (e != null) { // existing mapping for key
      // a node with the same hash and key as the inserted element exists
      V oldValue = e.value;
      // update the value unless onlyIfAbsent is set and the old value is non-null
      if (!onlyIfAbsent || oldValue == null)
        e.value = value;
      afterNodeAccess(e); // callback
      return oldValue;
    }
  }
  // fail-fast marker
  ++modCount;
  if (++size > threshold)
    // the size has exceeded the threshold: resize
    resize();
  afterNodeInsertion(evict); // callback
  return null;
}
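
From the caller's side, the return value of putVal and the onlyIfAbsent flag show up as follows. This is a minimal usage sketch against the public API; the class name and the `run` helper are made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PutReturnSketch {
    // Returns {first put, second put, putIfAbsent, final value} for one key.
    static Integer[] run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);         // no previous mapping
        Integer second = map.put("a", 2);        // replaces 1 and returns it
        Integer third = map.putIfAbsent("a", 3); // key present: value is kept
        return new Integer[] { first, second, third, map.get("a") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [null, 1, 2, 2]
    }
}
```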

resize

/**
* Initializes the table or doubles its capacity.
* Initializes or doubles table size.  If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
  Node<K,V>[] oldTab = table;
  int oldCap = (oldTab == null) ? 0 : oldTab.length;
  int oldThr = threshold;
  int newCap, newThr = 0;
  // the old capacity is greater than 0, i.e. the table has already been allocated
  if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) {
      // the old capacity has reached the maximum 1<<30: set the threshold
      // to Integer.MAX_VALUE (2^31-1) and stop growing
      threshold = Integer.MAX_VALUE;
      return oldTab;
    }
    /**
    * If twice the old capacity is below the maximum capacity and the old
    * capacity is at least the default 16, double the threshold as well:
    * threshold = loadFactor * capacity, the capacity doubles and loadFactor
    * is unchanged, so the threshold doubles too.
    */
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
      newThr = oldThr << 1; // double threshold
  }
  /**
  * The constructor HashMap(int initialCapacity, float loadFactor) contains
  * this.threshold = tableSizeFor(initialCapacity), so the initial capacity
  * is temporarily stored in threshold. Here it becomes the new capacity.
  * This branch runs when a constructor with an initial capacity was called
  * and no element has been added yet.
  */
  else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
  /**
  * The default constructor was called and no initial capacity was set, so
  * the default DEFAULT_INITIAL_CAPACITY (16) is used and the threshold is
  * 16 * 0.75.
  */
  else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
  }
  // make sure the threshold is not 0: in the second case above
  // (oldThr > 0), newThr has not been computed yet
  if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
  }
  threshold = newThr;
  @SuppressWarnings({"rawtypes","unchecked"})
  Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
  table = newTab;
  if (oldTab != null) {
    // move the data from the old table into the new, larger table
    for (int j = 0; j < oldCap; ++j) {
      Node<K,V> e;
      if ((e = oldTab[j]) != null) {
        oldTab[j] = null;
        // the bucket held a single node
        if (e.next == null)
          newTab[e.hash & (newCap - 1)] = e;
        // more nodes follow and the bin is a tree: split (rehash) the tree
        else if (e instanceof TreeNode)
          ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
        else { // preserve order
          // more nodes follow and the bin is a linked list: split (rehash) the list
          Node<K,V> loHead = null, loTail = null;
          Node<K,V> hiHead = null, hiTail = null;
          Node<K,V> next;
          do {
            next = e.next;
            if ((e.hash & oldCap) == 0) {
              if (loTail == null)
                loHead = e;
              else
                loTail.next = e;
              loTail = e;
            }
            else {
              if (hiTail == null)
                hiHead = e;
              else
                hiTail.next = e;
              hiTail = e;
            }
          } while ((e = next) != null);
          if (loTail != null) {
            loTail.next = null;
            newTab[j] = loHead;
          }
          if (hiTail != null) {
            hiTail.next = null;
            newTab[j + oldCap] = hiHead;
          }
        }
      }
    }
  }
  return newTab;
}

Note that resize involves rehashing, but not every element needs to move.

After doubling, the index is still computed as tab[i = (n - 1) & hash], but n has become 2n, so the mask grows from n - 1 to 2n - 1 and exactly one more high bit of the hash takes part. HashMap therefore tests just that bit to decide whether an entry moves:

// linked list
if ((e.hash & oldCap) == 0) {}
// tree
if ((e.hash & bit) == 0) {}
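
The rule that an entry either stays at index j or moves to j + oldCap can be checked against a full recomputation of the index. This is a minimal sketch; the class and method names are made up for this example.

```java
class ResizeSplitSketch {
    // New index after doubling, decided only by the bit tested in resize().
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        boolean consistent = true;
        for (int hash = 0; hash < 1000; hash++) {
            // Recomputing the index with the doubled table must agree.
            consistent &= newIndex(hash, oldCap) == (hash & (2 * oldCap - 1));
        }
        System.out.println(consistent);           // true
        System.out.println(newIndex(5, oldCap));  // 5  (stays)
        System.out.println(newIndex(21, oldCap)); // 21 (moves from 5 to 5 + 16)
    }
}
```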

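Interviewers love to ask

  1. Why can inserting into a HashMap end in an infinite loop?

    JDK 8 uses head and tail pointers to keep the order of the list the same as before the resize, so concurrent put calls no longer cause an infinite loop;

    In JDK 7, concurrent rehashing during resize could produce a circular linked list.

  2. Is HashMap thread-safe? If not, how can thread safety be achieved?

    The comments in HashMap already state that it is not. Safety can be achieved with a lock or by wrapping the map with Collections.synchronizedMap(new HashMap(...)); for performance reasons, ConcurrentHashMap should be used.

  3. What is the difference between HashMap and Hashtable?

    HashMap is not thread-safe, and both key and value may be null. An entry whose key is null always goes into the bin at table[0].

    Hashtable is thread-safe (its methods are synchronized). Neither the key (calling hashCode on a null key throws a NullPointerException) nor the value (checked explicitly) may be null.

    HashMap extends AbstractMap and Hashtable extends the abstract class Dictionary; both implement the Map interface.
    The initial capacity of HashMap is 16 and that of Hashtable is 11; the default load factor of both is 0.75.

    When HashMap grows, the capacity doubles (capacity * 2); when Hashtable grows, the capacity doubles plus one (capacity * 2 + 1).

    Both HashMap and Hashtable are built on an array plus linked lists (HashMap in Java 8 adds red-black trees).
    The two compute the bucket index differently: Hashtable takes the key's hashCode modulo the table length directly.

  4. What is fail-fast?

    It is a mechanism in the Java collections: if a collection is structurally modified (elements added or removed) while it is being traversed with an iterator, other than through the iterator itself, a ConcurrentModificationException is thrown.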
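
The fail-fast behaviour driven by modCount can be reproduced with a short sketch. The class and method names are made up for this example; it contrasts a modification made behind the iterator's back with removal through the iterator itself.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastSketch {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds an entry while an iterator is open, then advances the iterator.
    static boolean modificationDuringIterationFails() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("d", 4); // structural modification: modCount changes
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator keeps its expected modCount in sync.
    static int removeOddValuesThroughIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modificationDuringIterationFails()); // true
        System.out.println(removeOddValuesThroughIterator());   // 1
    }
}
```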

That is all for now; I need some time to digest this myself.


Origin www.cnblogs.com/ranyabu/p/12151694.html