The difference between HashMap and HashSet: study notes

Source: CSDN
The two most basic data structures in Java are the array and the linked list. The difference between them:
arrays are fast to read (for example, by indexing in a for loop) but inflexible to store into, because the array length is fixed;
linked lists are easy to store into, but not fast to read.
The hash table (hash map) appeared precisely to fix the linked list's weakness of slow lookup.

1. Why use HashMap?

  • HashMap is a hash bucket structure (an array of linked lists); it stores key-value mappings
  • HashMap combines the array and linked list data structures, so for queries and modifications it inherits the array's fast indexed lookup and the linked list's convenient insertion and modification
  • HashMap is not synchronized, so HashMap is fast
  • HashMap can accept null keys and values, but Hashtable cannot (Hashtable calls hashCode()/equals() directly on the key, which fails for null; HashMap is a later API that handles the null key as a special case). A quick demo follows this list.
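A quick, hypothetical demo of the null-key difference (class and variable names are made up for illustration):

import java.util.HashMap;
import java.util.Hashtable;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "ok");                    // allowed: HashMap treats the null key as a special case
        System.out.println(map.get(null));      // prints "ok"

        Hashtable<String, String> table = new Hashtable<>();
        try {
            table.put(null, "boom");            // Hashtable calls hashCode() on the key, so this throws
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}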

2. What is the working principle of HashMap?

  • HashMap is based on the principle of hashing. We use put(key, value) to store objects in a HashMap and get(key) to retrieve them. When we pass a key and value to put(), the key's hashCode() method is called first, and the returned hash code is used to find the bucket location in the Map's array where the Node object will be stored. The key point is that HashMap stores the key object and the value object together in the bucket, as a Node (HashMap's implementation of Map.Entry).
  • The following is a simple simulation of HashMap's initialization and data structure:
Node[] table = new Node[16]; // initialize the hash bucket array (the table)

class Node {
    int hash;     // hash value
    Object key;   // key
    Object value; // value
    Node next;    // points to the next node in the chain (collisions are resolved by chaining)
}
  • The following is the specific put process (JDK 1.8 version); a code sketch follows this list
    1. Compute the hash value of the Key, then calculate the array index from it

    2. If there is no collision, put the node directly into the bucket (a collision means two keys hash to the same bucket index)

    3. If there is a collision, append the node to the end of that bucket's linked list

    4. If the length of the linked list exceeds the threshold (TREEIFY_THRESHOLD == 8), the linked list is converted to a red-black tree; if the length later drops below 6, the red-black tree is converted back to a linked list

    5. If a node with the same key already exists, replace the old value

    6. If the map is too full (size exceeds capacity 16 * load factor 0.75), it needs to be resized (the table is doubled and the entries are redistributed)
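Below is a minimal sketch of steps 1, 2, 3 and 5, written against the simplified Node class shown earlier (an illustration only, not the JDK source; it assumes a non-null key, and treeification and resizing are left out):

// Hypothetical simplified put(): locate the bucket, replace on an equal key, otherwise append.
static Object put(Node[] table, Object key, Object value) {
    int hash = key.hashCode();                  // step 1: hash of the key
    int index = (table.length - 1) & hash;      // step 1: index (the table length is a power of two)
    Node head = table[index];
    if (head == null) {                         // step 2: no collision, place the node directly
        table[index] = newNode(hash, key, value);
        return null;
    }
    Node node = head;
    while (true) {                              // walk the chain in this bucket
        if (node.hash == hash && key.equals(node.key)) {
            Object old = node.value;            // step 5: same key, replace the old value
            node.value = value;
            return old;
        }
        if (node.next == null) break;
        node = node.next;
    }
    node.next = newNode(hash, key, value);      // step 3: collision, append to the end of the chain
    return null;
}

static Node newNode(int hash, Object key, Object value) {
    Node n = new Node();
    n.hash = hash; n.key = key; n.value = value;
    return n;
}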

  • The following is the specific get process (consider the special case: if two keys have the same hashcode, how is the correct value object found?)
    When we call get(), HashMap uses the key's hashcode to find the bucket location. After finding the bucket, it calls the key's equals() method against each node in that bucket's linked list to find the correct node, and finally returns the value object you are looking for. A sketch follows below.
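A matching sketch of the lookup over the same simplified Node[] table (illustrative only):

// Hypothetical simplified get(): the hash selects the bucket, equals() selects the node within it.
static Object get(Node[] table, Object key) {
    int hash = key.hashCode();
    int index = (table.length - 1) & hash;      // same hash -> same bucket as put()
    for (Node node = table[index]; node != null; node = node.next) {
        if (node.hash == hash && key.equals(node.key)) {
            return node.value;                  // equals() distinguishes keys that share a hashcode
        }
    }
    return null;                                // key not present
}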

3. Is there any way to reduce collisions?

  • The perturbation (hash-spreading) function can reduce collisions. The principle is that if two unequal objects return different hashcodes, the chance of a collision is smaller; fewer collisions mean shorter linked lists, so equals() does not have to be called as often, which improves HashMap's performance. (The perturbation is the algorithm inside HashMap's hash() method; its purpose is to make different objects more likely to produce different bucket indexes. A sketch follows this list.)
  • Using immutable objects declared final, with appropriate equals() and hashCode() methods, also reduces collisions. Immutability makes it possible to cache the hashcode of each key, which speeds up lookups overall. Wrapper classes such as String and Integer are very good choices for keys. Why are they suitable as keys? Because String is final and its equals() and hashCode() methods have been properly overridden. Immutability is necessary so that the key cannot change after the hashcode is calculated: if a key returned a different hashcode when it was put in than when it is looked up, the object you want could not be found in the HashMap.
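For reference, the JDK 1.8 perturbation mixes the high 16 bits of the hashcode into the low 16 bits, so that the low-order bits used for the index carry more information (reproduced here from memory of the JDK source; verify against your JDK version):

// JDK 1.8-style hash spreading: XOR the high half of the hashcode into the low half.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}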

4. What if the size of HashMap exceeds the capacity defined by load factor?

The default load factor is 0.75, which means that when the number of entries in the map exceeds 75% of the capacity, a bucket array twice the size of the original is created (as with other collections such as ArrayList) and the map is rebuilt by placing the original entries into the new bucket array. This process is called rehashing, because the hash is used again to find each entry's new bucket location. After doubling, an entry can end up in only one of two places: either its original index, or <original index + original capacity>. A sketch of why follows.
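A small sketch of why an entry can only land in those two places after the table doubles (assumes power-of-two capacities, as HashMap uses):

// The new index differs from the old one only by the single bit (hash & oldCap).
static int newIndex(int hash, int oldCap) {
    int oldIndex = hash & (oldCap - 1);
    if ((hash & oldCap) == 0) {
        return oldIndex;               // extra bit is 0: the entry stays at its original index
    } else {
        return oldIndex + oldCap;      // extra bit is 1: it moves to original index + original capacity
    }
}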

5. Is there any problem with resizing HashMap?

  • When a HashMap is resized there is indeed a race condition: if two threads both decide the HashMap needs resizing, they will try to resize it at the same time. While resizing, the order of the elements stored in a linked list gets reversed, because when moving nodes to a new bucket position HashMap does not append them at the tail but inserts them at the head, to avoid traversing to the end of the list (this describes the pre-JDK 1.8 behavior). If a race condition occurs, the list can end up containing a cycle, and traversal becomes an endless loop. (HashMap should not be used in a multi-threaded environment.)
  • Why does multithreading lead to an infinite loop, and how does it happen?
    The capacity of a HashMap is limited. After many insertions the map becomes saturated and the probability of keys mapping to conflicting positions increases, so the HashMap needs to extend its length, that is, perform a resize. Resizing has two steps: 1. Expansion: create a new, empty Entry array twice the length of the original. 2. Rehash: traverse the original Entry array and re-hash all the entries into the new array. When two threads perform these steps concurrently, the head insertion described above can link nodes into a cycle, after which a lookup in that bucket never terminates. A single-threaded sketch of the two steps follows.
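A rough, single-threaded sketch of those two steps over the simplified Node[] table from earlier (illustrative only, not the JDK source):

// Expansion + rehash: allocate a table twice as large and redistribute every node.
static Node[] resize(Node[] oldTable) {
    Node[] newTable = new Node[oldTable.length * 2];        // step 1: double the capacity
    for (Node head : oldTable) {                            // step 2: traverse every old bucket
        while (head != null) {
            Node next = head.next;                          // remember the rest of the old chain
            int index = (newTable.length - 1) & head.hash;  // recompute the index in the new table
            head.next = newTable[index];                    // head insertion into the new bucket
            newTable[index] = head;
            head = next;
        }
    }
    return newTable;
}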

6. Hashtable

  • Array + linked list storage
  • Default capacity: 11 (a prime number works well here)
  • put:
  • Index calculation: (key.hashCode() & 0x7FFFFFFF) % table.length (see the sketch after this list)
  • If the key is found in the linked list, replace the old value; if not found, continue
  • When the total number of elements exceeds capacity * load factor, the capacity is expanded (to 2 * old capacity + 1) and all entries are rehashed
    - New elements are added to the head of the linked list
  • The methods that modify Hashtable's internal shared data are declared synchronized to ensure thread safety.
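A small sketch of the index calculation quoted above (the mask 0x7FFFFFFF clears the sign bit so the remainder is never negative; illustrative only):

// Hashtable-style index: strip the sign bit, then take the remainder modulo the table length.
static int hashtableIndex(Object key, int tableLength) {
    int hash = key.hashCode();
    return (hash & 0x7FFFFFFF) % tableLength;   // always in the range [0, tableLength)
}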

7. The difference between HashMap and Hashtable

  • The default capacity is different (16 vs 11), and the expansion rule is different
  • Thread safety: Hashtable is thread-safe, HashMap is not
  • Efficiency: Hashtable is slower because of its locking

8. The difference between HashMap and HashSet

HashSet is implemented through HashMap. A HashMap entry is composed of a Key and a Value; when implementing HashSet, the Value of the HashMap is kept as a constant, which is equivalent to processing only the Key objects of the HashMap (see the sketch below).
The bottom layer of HashMap is an array in which each slot corresponds to a linked list. This structure is called a "linked-list hash" data structure: a combination of an array and linked lists, also known simply as a hash table.
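A simplified sketch of that idea, mirroring (but not copying) how java.util.HashSet delegates to an internal HashMap; the class name SimpleHashSet is made up for illustration:

import java.util.HashMap;

// Minimal HashSet-like wrapper: elements become keys, the value is one shared dummy constant.
class SimpleHashSet<E> {
    private static final Object PRESENT = new Object();     // the constant value stored for every key
    private final HashMap<E, Object> map = new HashMap<>();

    boolean add(E e)      { return map.put(e, PRESENT) == null; }  // false if an equal element exists
    boolean contains(E e) { return map.containsKey(e); }
    boolean remove(E e)   { return map.remove(e) == PRESENT; }
    int size()            { return map.size(); }
}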

  1. The process of storing an object in a HashMap is as follows
    ①. Call the hashCode() method on the Key, which returns an int value: the key's hashcode;

    ②. Use this hashcode to compute an index into the hash table and find the corresponding position. If the content of that position is null, pack the Key and Value into an Entry node and put it at that position;

    ③. If the content of that position is not empty, search the linked list stored at that index and use the equals() method to look for an Entry with the same Key; if one is found, replace the old Value with the current Value;

    ④. If no Entry with an equal Key is found, push the linked list at that position back (each Entry holds a reference to the next element) and put the new Entry at the head of the list (head insertion is the pre-JDK 1.8 behavior);

  2. The process of storing objects in HashSet
    When adding an element to a HashSet, the HashSet first calls the element's hashCode() method to get the element's hash value.

The hash value is then processed (for example, by bit shifting) to calculate the element's storage location in the hash table.

Case 1: If the calculated storage location currently holds no element, the element is stored directly at that location.

Case 2: If the calculated storage location already holds other elements, the element's equals() method is called to compare it with the element(s) at that location.

If equals() returns true, the two are regarded as duplicate elements and the new element is not added. If equals() returns false, the element is added. A short usage example follows.
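A tiny usage example of that behavior: duplicates, as defined by hashCode() and equals(), are silently rejected.

import java.util.HashSet;

public class HashSetDemo {
    public static void main(String[] args) {
        HashSet<String> set = new HashSet<>();
        System.out.println(set.add("java"));    // true: the element was added
        System.out.println(set.add("java"));    // false: an equal element is already present
        System.out.println(set.size());         // 1
    }
}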

Original post: blog.csdn.net/qq_40084325/article/details/109662297