The differences and implementation principles of each collection

The difference between HashMap and Hashtable
1. The main difference between the two is that Hashtable is thread-safe, while HashMap is not.
2. HashMap allows null as a key, while Hashtable does not allow null as a key (when HashMap uses null as a key, the entry is always stored in the first slot of the table array).
3. The initial capacity of HashMap is 16 and that of Hashtable is 11; the default load factor of both is 0.75.
4. When HashMap resizes, the capacity is doubled, i.e. capacity * 2; Hashtable grows to capacity * 2 + 1.
5. The two calculate the hash differently. Hashtable uses the key's hashCode directly and takes it modulo the table array length. HashMap rehashes the key's hashCode a second time to obtain a better-distributed hash value, and then maps it onto the table array length, as sketched below.
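A minimal sketch of the two index computations, adapted from the JDK sources (the wrapper class and method names here are illustrative, not JDK API):

```java
// Simplified index computations, adapted from the JDK sources.
final class IndexDemo {
    // Hashtable: the key's hashCode modulo the table length
    // (with the sign bit masked off first).
    static int hashtableIndex(Object key, int tableLength) {
        int hash = key.hashCode();
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    // HashMap (JDK 8): XOR the high 16 bits of hashCode into the low
    // 16 bits, then mask with (length - 1); the table length is
    // always a power of two, so the mask replaces a modulo.
    static int hashMapIndex(Object key, int tableLength) {
        int h = key.hashCode();
        return (tableLength - 1) & (h ^ (h >>> 16));
    }
}
```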


In addition to HashMap and Hashtable, there is also a hash-based set, HashSet.
The difference is that HashSet is not a key-value structure; it only stores unique elements. It is effectively a simplified HashMap that contains only the keys of a HashMap.

The underlying structure of HashSet is a hash table. HashSet uses the element's hashCode() and equals() methods, inherited from the Object superclass, to determine whether two objects are the same. Comparing hashCode() first avoids the cost of calling equals() against every element. So when we define our own classes, we can override these two methods inherited from Object, so that the set judges whether two elements are the same according to our intent, as in the sketch below.
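A hypothetical example (the Point class is invented for illustration): with equals() and hashCode() overridden, HashSet treats two points with equal coordinates as the same element:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// A hypothetical Point class overriding equals() and hashCode() so that
// HashSet treats two points with the same coordinates as the same element.
final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // equal objects must have equal hash codes
    }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // duplicate by our definition, not added
        System.out.println(set.size()); // prints 1
    }
}
```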

The bottom layer of the TreeSet collection is a binary tree (a red-black tree). It not only disallows duplicate elements, but also sorts them for us: after we store elements in a TreeSet, they are kept in natural order. If we want the elements sorted according to our own rules, we let the element class implement the Comparable interface and implement its compareTo() method. Returning 0 means the elements are the same (the duplicate is not added); otherwise the sign of the return value determines the ordering, as in the sketch below. TreeSet is not synchronized.
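A hypothetical example (the Person class is invented for illustration): TreeSet orders elements by compareTo() and drops an element whose compareTo() returns 0 against an existing one:

```java
import java.util.TreeSet;

// A hypothetical Person class: TreeSet orders elements by compareTo()
// and treats a return value of 0 as "same element" (so it is not added).
final class Person implements Comparable<Person> {
    final String name;
    final int age;

    Person(String name, int age) { this.name = name; this.age = age; }

    @Override
    public int compareTo(Person other) {
        int byAge = Integer.compare(this.age, other.age); // primary order: age
        return byAge != 0 ? byAge : this.name.compareTo(other.name);
    }

    public static void main(String[] args) {
        TreeSet<Person> set = new TreeSet<>();
        set.add(new Person("Tom", 30));
        set.add(new Person("Ann", 25));
        set.add(new Person("Ann", 25)); // compareTo returns 0 -> ignored
        System.out.println(set.size()); // prints 2, ordered by age then name
    }
}
```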

 

The implementation principle of HashMap and Hashtable

The underlying implementations of HashMap and Hashtable are both an array + linked list structure (plus a red-black tree in JDK 8). When the linked list in a bucket reaches a certain threshold (the number of nodes with the same hash reaches 8), the list is converted into a red-black tree by treeifyBin(); red-black tree operations have O(log N) time complexity. This treeification is the biggest difference between the HashMap implementations in JDK 7 and JDK 8.
In JDK 8 the entry class's name changed from Entry to Node, because it is tied to the red-black tree implementation TreeNode.
When adding, deleting, or getting an element, the hash is calculated first, and the index (the subscript into the table array) is derived from the hash and table.length; the corresponding operation is then performed on that bucket.

Take put as an example: calculate the hash, derive the index from the hash and table.length, and put the key-value pair at table[index]. If table[index] already holds elements, a linked list forms at that position: the newly added element is placed at the head of the bucket and the existing elements are linked through the entry's next pointer (this head insertion is the JDK 7 behavior; JDK 8 appends at the tail). In this way hash conflicts are resolved with a linked list. When the number of elements reaches the threshold (capacity * factor), the map is resized and the table array length becomes table.length * 2.
The get process is to calculate the hash first, derive the index from the hash and table.length, and then traverse the linked list at table[index] until the key is found, returning its value.
(If two objects have the same hashCode, they land in the same bucket; key.equals() is then called on each node of the linked list until the correct node is found and the value object you are looking for is returned.)
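A minimal sketch of that get() traversal on a linked-list bin (JDK 8 additionally handles treeified bins; the Node class here mirrors the shape of the JDK's internal entry):

```java
// Minimal sketch of HashMap's get() on a linked-list bin.
final class GetSketch {
    static final class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    static <K, V> V get(Node<K, V>[] table, Object key, int hash) {
        int index = (table.length - 1) & hash;           // bucket subscript
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            // cheap hash comparison first, then equals() to confirm the key
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;                           // key found
            }
        }
        return null;                                      // not present
    }
}
```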

 

ArrayList: the underlying data structure is an array, so queries are fast while insertions and deletions (which shift elements) are slow. The initial capacity is 10.
When the number of elements exceeds the array's capacity, a new array is created that is 50% larger than the original, the data of the original array is copied into the new array, and the new element is added there.
The unused slack in the backing array can waste space.
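The growth step, simplified from the JDK 8 source of ArrayList.grow():

```java
import java.util.Arrays;

// ArrayList's grow(), simplified from the JDK 8 source: the new capacity
// is the old capacity plus half of it (50% growth), then the elements
// are copied into the new array.
final class GrowSketch {
    private Object[] elementData = new Object[10]; // default capacity 10

    void grow(int minCapacity) {
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1); // 1.5x growth
        if (newCapacity < minCapacity) newCapacity = minCapacity;
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}
```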

 

LinkedList: the underlying data structure of LinkedList is a doubly linked list (in older JDKs a circular list whose header node stores no data; since JDK 7 a plain list with first and last pointers). Insertions and deletions are fast, and indexed queries are somewhat slower. It can be cloned, supports serialization, and is not synchronized. Indexed access walks the list with a counter:
for example, when we call get(int location), "location" is first compared with half the length of the doubly linked list; if location falls in the first half, the search walks from the head of the list up to position location, otherwise it walks from the tail back to position location. This halves the average traversal, but it is not a binary search, since a linked list has no random access.
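The lookup, essentially as in the JDK source of LinkedList.node(int):

```java
// LinkedList's node(int index), essentially as in the JDK source: walk
// from whichever end of the doubly linked list is closer to the index.
final class NodeLookupSketch<E> {
    static final class Node<T> {
        T item; Node<T> next, prev;
    }
    Node<E> first, last;
    int size;

    Node<E> node(int index) {
        if (index < (size >> 1)) {                  // closer to the head
            Node<E> x = first;
            for (int i = 0; i < index; i++) x = x.next;
            return x;
        } else {                                    // closer to the tail
            Node<E> x = last;
            for (int i = size - 1; i > index; i--) x = x.prev;
            return x;
        }
    }
}
```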


Vector: the bottom layer is also an array structure; Vector is thread-synchronized, whereas ArrayList is not. When Vector grows, its capacity increases by 100% (it doubles), which can waste memory.
 


ConcurrentHashMap implementation principle (JDK 1.7)
ConcurrentHashMap allows multiple modification operations to proceed concurrently (16 by default). The key lies in lock striping (lock separation).
It uses multiple locks to control modifications to different segments of the hash table. Each segment is in effect a small hashtable with its own lock, so as long as concurrent modifications fall on different segments, they can proceed in parallel.
At the bottom layer, ConcurrentHashMap treats each key-value pair as a whole, a HashEntry object, and the Segment that the key maps to handles it.
Unlike HashMap, ConcurrentHashMap uses multiple sub-hash-tables, the segments (Segment). ConcurrentHashMap fully allows any number of read operations to proceed concurrently, and reads do not require locks.

Reference: https://my.oschina.net/hosee/blog/639352
Some methods need to span segments, such as size() and containsValue(); they may need to lock the entire table rather than just one segment. This requires locking all segments in sequence and, after the operation completes, releasing the locks of all segments in sequence. "In sequence" is important here, otherwise deadlock is very likely; because the order in which locks are acquired is fixed, no deadlock can occur. Inside ConcurrentHashMap the segments array is declared final, but declaring the array final does not make the array's members final: that guarantee has to come from the implementation itself.

Under each Segment there is an array of HashEntry linked lists. For a key, three hash operations are needed to finally locate the element's position.
The three hashes are:
1. For a key, first perform a hash operation to get the hash value h1, that is, h1 = hash1(key);
2. Hash the high bits of h1 a second time to get the hash value h2, that is, h2 = hash2(high bits of h1); h2 determines which Segment the element is placed in;
3. Perform a third hash on h1 to get the hash value h3, that is, h3 = hash3(h1); h3 determines which HashEntry bucket the element is placed in.
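A sketch of steps 2 and 3, simplified from the JDK 7 source: the high bits of the rehashed value select the Segment, and the low bits select the HashEntry bucket (the wrapper class is illustrative):

```java
// How JDK 7's ConcurrentHashMap picks a Segment and a bucket, simplified.
final class LocateSketch {
    static int segmentIndex(int hash, int segmentShift, int segmentMask) {
        return (hash >>> segmentShift) & segmentMask; // high bits -> Segment
    }

    static int bucketIndex(int hash, int tableLength) {
        return (tableLength - 1) & hash;              // low bits -> HashEntry list
    }
}
```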

concurrencyLevel represents the concurrency level. This value is used to determine the number of segments: the number of segments is the first power of 2 that is greater than or equal to concurrencyLevel.
Initialization process:
1. Validate the parameters.
2. Compute the number of segments.
3. Loop to find the first power of two, ssize, that is greater than or equal to concurrencyLevel; this is the size of the Segment array. Record the total number of left shifts, sshift, and set segmentShift = 32 - sshift.
   segmentMask equals ssize - 1, so every binary bit of segmentMask is 1; the purpose is to determine the index of the segment by &-ing the key's hash value with this mask.
4. Check whether the given capacity exceeds the maximum allowed value; if it does, cap it at that value. The maximum capacity is static final int MAXIMUM_CAPACITY = 1 << 30;.
5. Then calculate how many elements each segment should hold on average; this value c is initialCapacity / ssize, rounded up. For example, if the initial capacity is 15 and there are 4 segments, each segment must hold an average of 4 elements.
6. Finally, create one Segment instance and place it as the first element of the Segment array. A sketch of this sizing logic follows the list.
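The sizing logic (steps 3 and 5), simplified from the JDK 7 constructor; the wrapper class and the print are illustrative only:

```java
// The sizing logic of the JDK 7 ConcurrentHashMap constructor, simplified:
// find the smallest power of two >= concurrencyLevel, derive the shift and
// mask used to pick a segment, and divide the capacity among the segments.
final class SizingSketch {
    static void size(int initialCapacity, int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;                        // segment count: power of two
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        int segmentShift = 32 - sshift;       // selects the high hash bits
        int segmentMask = ssize - 1;          // low bits all set to 1
        int c = initialCapacity / ssize;      // per-segment capacity,
        if (c * ssize < initialCapacity) ++c; // rounded up
        System.out.printf("ssize=%d shift=%d mask=%d c=%d%n",
                ssize, segmentShift, segmentMask, c);
    }

    public static void main(String[] args) {
        size(15, 4); // ssize=4, shift=30, mask=3, c=4 (matches the example)
    }
}
```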


The put operation must take a lock. The steps are as follows:
1. Check whether the value is null; if it is, throw an exception directly.
2. Hash the key twice to determine which segment the data goes into.
3. Delegate the put to that Segment object; within the segment the steps are basically the same as HashMap's put (get the HashEntry index through an & operation, then set), as sketched below.
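A minimal sketch of that per-segment put, assuming a stripped-down Segment. Each Segment extends ReentrantLock, as in JDK 7; resizing, the tryLock() spin in scanAndLockForPut(), and volatile array access are omitted here:

```java
import java.util.concurrent.locks.ReentrantLock;

// A minimal sketch of Segment.put() in JDK 7: writes lock one segment,
// not the whole map.
final class SegmentSketch<K, V> extends ReentrantLock {
    static final class HashEntry<K, V> {
        final int hash; final K key; volatile V value; volatile HashEntry<K, V> next;
        HashEntry(int hash, K key, V value, HashEntry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    HashEntry<K, V>[] table = new HashEntry[16];

    V put(K key, int hash, V value) {
        lock();                                    // lock only this segment
        try {
            int index = (table.length - 1) & hash;
            HashEntry<K, V> first = table[index];
            for (HashEntry<K, V> e = first; e != null; e = e.next) {
                if (e.hash == hash && key.equals(e.key)) {
                    V old = e.value;               // existing key: replace value
                    e.value = value;
                    return old;
                }
            }
            table[index] = new HashEntry<>(hash, key, value, first); // head insert
            return null;
        } finally {
            unlock();
        }
    }
}
```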

The get operation does not need a lock (only if a value reads as null is readValueUnderLock called, and only that step locks); data safety is ensured through volatile and final fields.
1. As with put, first hash the key twice to determine which segment the data should be fetched from.
2. Use Unsafe to get the corresponding Segment with a volatile read, then obtain the position of the HashEntry linked list through an & operation, and traverse the whole list from its head (a list is needed because hashes may collide). If the corresponding key is found, return its value; if the list is traversed without finding the key, the key is not contained in the map, and null is returned.
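A sketch of that lock-free read path, which could sit alongside put() in the SegmentSketch above. The real JDK 7 code reads array elements with Unsafe.getObjectVolatile(); a plain read stands in here:

```java
// Lock-free get() sketch for the SegmentSketch class above: no lock is
// taken on the read path; safety comes from the volatile/final fields.
V get(Object key, int hash) {
    HashEntry<K, V>[] tab = table;             // volatile read in the JDK
    for (HashEntry<K, V> e = tab[(tab.length - 1) & hash]; e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            return e.value;                     // the value field is volatile
        }
    }
    return null;                                // not present in this segment
}
```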

size/containsValue:
First allow up to 3 attempts without locking any segment: traverse all segments and accumulate each segment's size to get the size of the whole map. If two consecutive passes observe the same total number of updates (modCount) across all segments, no update occurred during the calculation, and that value is returned directly. If the update counts keep changing during the unlocked attempts, the subsequent calculation first locks all the segments, then traverses all the segments to compute the map size, and finally unlocks all the segments.
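A sketch of that strategy with a stripped-down Segment holding only the fields size() needs (the Seg class and field names are illustrative; in the JDK, RETRIES_BEFORE_LOCK bounds the optimistic passes):

```java
import java.util.concurrent.locks.ReentrantLock;

// The size() strategy of JDK 7's ConcurrentHashMap, sketched: two unlocked
// passes that see the same total modCount prove no concurrent update
// happened; otherwise all segments are locked, in order, and recounted.
final class SizeSketch {
    static final class Seg extends ReentrantLock {
        volatile int count;     // entries in this segment
        volatile int modCount;  // structural modifications so far
    }

    static final int RETRIES_BEFORE_LOCK = 2;

    static int size(Seg[] segments) {
        int size = 0;
        int lastModSum = -1;
        for (int retry = 0; retry <= RETRIES_BEFORE_LOCK; retry++) {
            int modSum = 0;
            size = 0;
            for (Seg s : segments) {             // optimistic, unlocked pass
                modSum += s.modCount;
                size += s.count;
            }
            if (modSum == lastModSum) return size; // stable: no updates seen
            lastModSum = modSum;
        }
        for (Seg s : segments) s.lock();         // fall back: lock in order
        try {
            size = 0;
            for (Seg s : segments) size += s.count;
            return size;
        } finally {
            for (Seg s : segments) s.unlock();   // unlock in order
        }
    }
}
```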

In ConcurrentHashMap, neither keys nor values may be null; in HashMap the key may be null; in Hashtable the key may not be null.
ConcurrentHashMap is a thread-safe class, but that alone does not guarantee that compound operations built on it are thread-safe!
ConcurrentHashMap's get operation does not need a lock; the put operation does.

 

 


ConcurrentHashMap implementation principle (JDK 1.8)

sizeCtl:
A negative value means that initialization or resizing is in progress:
-1 means the table is being initialized; -N means that N - 1 threads are performing the resize.
A positive value or 0 means the hash table has not been initialized yet; the value then indicates the initial size or the size of the next resize, similar to a resize threshold. As you can see later, its value is always 0.75 times the current ConcurrentHashMap capacity, which corresponds to the load factor.


Node is the core internal class. It wraps a key-value pair; all data inserted into ConcurrentHashMap is wrapped in Nodes. It is very similar to the definition in HashMap, but with some differences: the value and next fields are volatile (like JDK 7's Segment/HashEntry); it does not allow calling setValue() to change the value field directly; and it adds a find() method to assist map.get().
When a linked list grows too long it is converted to TreeNodes. But unlike HashMap, the bucket is not turned into a red-black tree directly; the nodes are wrapped as TreeNodes and placed inside a TreeBin object, and TreeBin completes the red-black tree wrapping. Moreover, this TreeNode inherits from ConcurrentHashMap's Node class, not from HashMap's LinkedHashMap.Entry<K,V> class, which means TreeNode carries a next pointer; the purpose of this is to make traversal based on TreeBin easier.
ForwardingNode:
A node class used to connect two tables during a resize. It contains a nextTable pointer that points to the next table. The node's key, value, and next are all null, and its hash value is -1. The find() method defined here queries the node from nextTable instead of using itself as the head node.


For ConcurrentHashMap, calling the constructor only sets some parameters. The initialization of the table itself happens when elements are inserted into the ConcurrentHashMap, for example when put, computeIfAbsent, compute, merge, and similar methods are called; the trigger is the check table == null.
The initialization method relies mainly on the key field sizeCtl. If this value is < 0, another thread is initializing, and this thread abandons the operation. This also shows that ConcurrentHashMap's initialization can only be completed by one thread. The thread that obtains the right to initialize uses CAS to set sizeCtl to -1 to keep other threads out. After the array is initialized, sizeCtl is changed to 0.75 * n, as sketched below.
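A self-contained sketch of JDK 8's initTable(): one thread wins a CAS on sizeCtl (here an AtomicInteger stands in for the Unsafe-based CAS in the JDK), holds -1 while allocating, and finally publishes 0.75 * n:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of ConcurrentHashMap.initTable() in JDK 8: only the thread that
// wins the CAS on sizeCtl allocates the table; losers yield and retry.
final class InitTableSketch {
    volatile Object[] table;
    final AtomicInteger sizeCtl = new AtomicInteger(0);

    Object[] initTable() {
        Object[] tab;
        while ((tab = table) == null || tab.length == 0) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                          // someone else is initializing
            } else if (sizeCtl.compareAndSet(sc, -1)) {  // win the race
                try {
                    if ((tab = table) == null || tab.length == 0) {
                        int n = (sc > 0) ? sc : 16;      // default capacity 16
                        table = tab = new Object[n];
                        sc = n - (n >>> 2);              // 0.75 * n
                    }
                } finally {
                    sizeCtl.set(sc);                     // publish the threshold
                }
                break;
            }
        }
        return tab;
    }
}
```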

To be added
