[JDK] JDK source code analysis - HashMap (2)

The previous article, "JDK source code analysis - HashMap (1)", analyzed the internal structure of HashMap and the implementation principles of its main methods. However, interviews often touch on a number of other questions as well, so this article briefly analyzes some of the common ones.

 

Here, again, is the internal structure diagram of HashMap (JDK 1.8):

 

FAQ

 

Q1: Is HashMap thread-safe? Why?

 

First of all, HashMap is not thread-safe. Many people probably know this, and the HashMap source code documentation states it as well. But why is it unsafe, and where does that show? The following two cases give a brief analysis (not necessarily comprehensive; for reference only).

 

case 1:

Thread T1 performs put/remove or other structural modification operations while thread T2 is iterating; in this case a ConcurrentModificationException will be thrown.

Sample code (taking put as an example):

private static void test() {
  Map<Integer, Integer> map = new HashMap<>();
    
  Thread t1 = new Thread(() -> {
    for (int i = 0; i < 5000; i++) {
      map.put(i, i);
    }
  });

  Thread t2 = new Thread(() -> {
    for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
      System.out.println(entry);
    }
  });

  t1.start();
  t2.start();
}
// Execution result:
// throws java.util.ConcurrentModificationException

The cause lies here:

if (modCount != expectedModCount)
    throw new ConcurrentModificationException();

HashMap's iterators and collection views perform this comparison. Its purpose is to determine whether another thread has structurally modified the HashMap; if so, the exception is thrown.

 

PS: If you read the HashMap source code carefully, you will find that every structurally modifying method contains the following line of code:

++modCount;

This field records the number of structural modifications.
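The fail-fast check does not even require two threads: a single thread that structurally modifies the map while iterating it trips the same modCount comparison. A minimal sketch:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    static boolean triggersCme() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        try {
            for (Integer key : map.keySet()) {
                // Structural modification during iteration bumps modCount,
                // so the iterator's next() sees modCount != expectedModCount.
                map.remove(key);
            }
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast check fired
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + triggersCme()); // prints true
    }
}
```

The safe way to remove during iteration is Iterator.remove(), which updates expectedModCount itself.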

 

case 2:

Threads T1 and T2 perform put/remove or other structural modification operations at the same time. Taking the put method as an example, entries can be overwritten and lost.

Sample code:

private static void test() throws InterruptedException {
  Map<Integer, Integer> map = new HashMap<>();

  Thread t1 = new Thread(() -> {
    for (int i = 0; i < 5000; i++) {
      map.put(i, i);
    }
  });

  Thread t2 = new Thread(() -> {
    for (int i = 5000; i < 10000; i++) {
      map.put(i, i);
    }
  });

  t1.start();
  t2.start();
  
  // requires: import java.util.concurrent.TimeUnit;
  TimeUnit.SECONDS.sleep(20);
  System.out.println(map);
  System.out.println("size: " + map.size());
}
// Output:
// {8192=8192, 8193=8193, 8194=8194, 8195=8195, ...
// size: 9666
// PS: this is the result of one particular run; results vary between runs, but size is almost never 10000.

The problem here lies in the put method; from the earlier analysis of HashMap's internal put implementation, the cause (two threads writing to the same bin and overwriting each other's entries) can be identified, so it is not repeated here.

 

To say it again: HashMap is designed for high efficiency in the single-threaded case. Understanding why it is not thread-safe deepens your understanding of it and tells you in which situations it is unsuitable. If thread safety is required, consider using ConcurrentHashMap instead.
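As a sketch of the thread-safe alternative, replacing HashMap with ConcurrentHashMap in the case-2 example reliably yields size 10000 (join() is used here instead of sleeping, to wait for both writers deterministically):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();

        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 5000; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 5000; i < 10000; i++) map.put(i, i);
        });

        t1.start();
        t2.start();
        t1.join(); // wait for both writers to finish
        t2.join();

        System.out.println("size: " + map.size()); // always 10000
    }
}
```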

 

Q2: Why are the thresholds for converting between the linked list and the red-black tree 8 and 6?

 

First, why do the linked list and the red-black tree exist at all? Ideally, each bin in the HashMap holds only one node, so query efficiency is the highest possible, O(1). Chaining nodes into a list, and further converting the list into a red-black tree, is the response to severe hash collisions; the purpose is to keep HashMap efficient even in such extreme cases.

 

As for why the threshold is 8, part of HashMap's implementation notes reads as follows:

/* Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins.  In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used.  Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 */

Because a TreeNode is about twice the size of a regular node (Node), a bin is converted to a tree only when it contains enough nodes (i.e. reaches the treeify threshold TREEIFY_THRESHOLD); when the number of nodes in a bin drops (due to removal or resizing), the red-black tree is converted back into a linked list.

 

With well-distributed hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the number of nodes in a bin follows a Poisson distribution. From the values listed above, the probability of a bin's list reaching length 8 is 0.00000006, less than one in ten million, which is why the treeify threshold is set to 8.
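The table in the implementation notes can be reproduced directly from the stated formula exp(-0.5) * pow(0.5, k) / factorial(k); a quick sketch:

```java
public class PoissonTable {
    // P(a bin holds exactly k nodes) under the Poisson(0.5) model
    // from HashMap's implementation notes
    static double poisson(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(k));
        }
        // k = 8 gives ~0.00000006, i.e. less than 1 in ten million
    }
}
```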

 

The two conversion thresholds are defined as follows:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

As for why the threshold for converting a red-black tree back into a linked list is 6: a common explanation online is that it avoids frequent conversions. For example, if the tree-to-list threshold were also 8, a HashMap that keeps inserting and deleting elements so that a bin's size hovers around 8 would convert between tree and list constantly, which would be inefficient.

 

This explanation seems reasonable; you can explore it further on your own.

 

Q3: Why is the load factor 0.75?

 

The JDK 1.7 documentation describes it as follows:

/* As a general rule, the default load factor (.75) offers a good tradeoff
 * between time and space costs.  Higher values decrease the space overhead
 * but increase the lookup cost (reflected in most of the operations of the
 * <tt>HashMap</tt> class, including <tt>get</tt> and <tt>put</tt>). 
 */

The following article also analyzes this question:

Why is HashMap's loadFactor 0.75?

https://www.jianshu.com/p/64f6de3ffcc1

Perhaps there is no need to get to the bottom of this question. Simply put, 0.75 is a tradeoff between time and space costs.

 

Q4: Why is the capacity a power of 2?

 

When length is a power of 2, the bitwise operation n & (length - 1) gives the same result as the modulo operation n % length (for non-negative n). In a computer, a bitwise AND is far cheaper than a modulo, so this design improves the efficiency of computing bucket indexes.
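A quick sketch verifying the equivalence. Note that it holds only when length is a power of two, which is exactly why HashMap enforces power-of-two capacities:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // power of two, like HashMap's default capacity
        for (int hash : new int[]{0, 5, 15, 16, 31, 12345}) {
            int byMask = hash & (length - 1); // length - 1 = 0b1111, keeps the low bits
            int byMod  = hash % length;
            System.out.println(hash + " -> " + byMask + " == " + byMod);
        }
        // With a non-power-of-two length (e.g. 10), the two operations differ:
        System.out.println((12345 & 9) + " vs " + (12345 % 10));
    }
}
```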

 

Q5: What types of objects are generally used as keys? Why?

 

Immutable types such as String and Integer are generally used. An immutable class is inherently thread-safe, its hash code cannot change while the key sits in the map, and both hashCode and equals are properly overridden.
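A sketch of why immutability matters (MutableKey is a hypothetical class made up for illustration): if a key's state changes after insertion, its hashCode changes, and the entry can no longer be found, because lookups now probe a different bin.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    // Hypothetical mutable key, for illustration only
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 42; // mutating the key changes its hashCode

        // Lookup now hashes to a different bin: the entry is effectively lost
        System.out.println(map.get(key));               // null
        System.out.println(map.get(new MutableKey(1))); // also null (equals fails)
    }
}
```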

 

Q6: How do you measure the quality of a hash algorithm? How is the hashCode of the String class implemented?

 

There are many ways to design a hash method. Although the more uniformly a hash distributes values the better, HashMap's primary goal is speed, so its hash algorithm is designed to be as fast as possible. The hashCode method of the String class is as follows:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
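As a quick check of the 31 * h + c recurrence above, the hash of "abc" can be computed by hand: h = ((0 * 31 + 'a') * 31 + 'b') * 31 + 'c' = 96354, which matches String.hashCode:

```java
public class StringHashDemo {
    // Re-implementation of the same recurrence used by String.hashCode: h = 31 * h + c
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc"));      // 96354
        System.out.println("abc".hashCode()); // 96354
        // 31 is an odd prime, and 31 * h can be computed as (h << 5) - h,
        // which the JIT optimizes into a shift and a subtraction
    }
}
```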

PS: The answers above are the result of my own searching and reflection; they are for reference only and not necessarily entirely accurate (stay skeptical). There are many more HashMap-related questions; they are not all enumerated here.

 

 

Reference links:

https://www.jianshu.com/p/7af5bb1b57e2

 

Related Reading:

JDK source code analysis - HashMap (1)

 

Stay hungry, stay foolish.

PS: This article was first published on the WeChat official account [WriteOnRead].


Origin www.cnblogs.com/jaxer/p/11280037.html