In-depth understanding of how HashMap works

Preface

First, let's restate what the equal sign (==), equals and hashCode really mean. (I remember someone saying that equals judges the content of an object while hashCode judges whether objects are equal, which is not accurate.)

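equals : Whether two references denote logically equal objects. The default Object.equals only checks whether they are the same object instance, but classes such as String override it to compare content. For example, with String s = new String("test"), s.equals(s) compares an instance with itself, while s.equals(new String("test")) compares two different instances and still returns true;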

Equal sign (==) : For object references, compares whether both refer to the same object instance (informally, the same memory address or object identity); in other words, it determines whether the two are physically the same object;

hashCode : I think it can be understood like this: it is not the memory address of the object, but an int descriptor of the object instance produced by a hash algorithm, which a hash table maps to a storage location. In short, it is the hash code of the object instance.
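The three definitions above can be checked with a small sketch. The class name EqualityDemo and the helper sameInstance are made up for illustration; only String, == and the standard equals()/hashCode() methods come from the JDK.

```java
class EqualityDemo {
    /** True only when both references point at the very same instance. */
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = new String("test");
        String b = new String("test");

        System.out.println(sameInstance(a, b));           // false: two distinct instances
        System.out.println(a.equals(b));                  // true: same character content
        System.out.println(a.hashCode() == b.hashCode()); // true: equal objects share a hash code
    }
}
```

Note the direction of the last line: equal objects must have equal hash codes, but equal hash codes do not imply equal objects.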

Why hashCode is needed can be explained from a common requirement of Java collections:

Two of the main collection types in Java are List and Set. The elements of a List are ordered and may be repeated; the elements of a Set are unordered and may not be repeated. This raises a serious question: to ensure that elements are not repeated, how do we judge whether two elements are duplicates? The answer is the Object.equals method. However, if every existing element is checked each time one is added, then with many elements the number of comparisons becomes very high. That is, if the collection already has 1000 elements, adding the 1001st element calls the equals method 1000 times. This obviously reduces efficiency considerably.
  
So Java adopts the principle of the hash table. A hash algorithm (also called a hash function) maps data directly to an address according to a specific algorithm. Put simply, the value returned by the hashCode method can be thought of as an image of the object's storage location.
   
In this way, when a new element is to be added to the collection, the hashCode method of the element is called first to locate the position where it should be stored. If there is no element at this position, the new element is stored there directly without any further comparison. If there is already an element at this position, its equals method is called to compare it with the new element: if they are equal, the new element is not stored; if they are not equal, a collision has occurred. The hash table has a specific way of resolving collisions, and the new element is eventually saved in an appropriate position. In this way, the actual number of calls to equals is greatly reduced, typically to only one or two.
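To make the saving concrete, here is a rough sketch that counts equals() calls. The Item class and the counter are hypothetical helpers written for this example, and the hash codes are deliberately distinct so that collisions are rare.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EqualsCallCounter {
    static int equalsCalls = 0;

    /** Hypothetical element type that counts how often equals() runs. */
    static final class Item {
        final int id;
        Item(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Item && ((Item) o).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    /** equals() calls needed to check a new element against a 1000-element list. */
    static int callsForListScan() {
        List<Item> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) list.add(new Item(i));
        equalsCalls = 0;
        list.contains(new Item(1000)); // linear scan over every element
        return equalsCalls;
    }

    /** equals() calls needed to add a new element to a 1000-element HashSet. */
    static int callsForHashSetAdd() {
        Set<Item> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) set.add(new Item(i));
        equalsCalls = 0;
        set.add(new Item(1000)); // jumps straight to one bucket
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("list scan:   " + callsForListScan());
        System.out.println("HashSet add: " + callsForHashSetAdd());
    }
}
```

The list scan needs one equals() call per existing element, while the hash-based add only compares against whatever already sits in the target bucket.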

A brief summary for understanding HashMap in depth:

The data structure of HashMap is based on an array and linked lists. (Elements are stored in an array; if several elements hash to the same slot, a linked list is built at that slot and each further element is placed on the next node of the list. Since Java 8, a long list in one bucket is converted into a red-black tree.)

The structure of a HashMap looks roughly like this
  element 0-->[hashCode=0, key.value=x1 data]
  element 1-->[hashCode=1, key.value=y1 data]
  . . . . . .
  element n-->[hashCode=n, key.value=z1 data]

Suppose no element with hashCode=1 is added, but there are two elements with hashCode=0; the structure then becomes
  element 0-->[hashCode=0, key.value=x1 data].next-->[hashCode=0, key.value=x2 data]
  element 1-->[null]
  ......
  element n-->[hashCode=n, key.value=z1 data]
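The array-plus-linked-list layout drawn above can be sketched in a few lines of Java. This is a simplified illustration, not the JDK implementation: SimpleChainedMap, Node and indexFor are names invented here, the table has a fixed size of 16, and there is no resizing.

```java
import java.util.Objects;

/** Simplified illustration of separate chaining; not the JDK's HashMap. */
class SimpleChainedMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    /** Maps a key's hash code to an array slot. */
    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return (h & 0x7fffffff) % buckets.length;
    }

    /** Stores the pair and returns the previous value for the key, or null. */
    V put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) {            // empty slot: no comparison needed
            buckets[i] = new Node<>(key, value);
            return null;
        }
        Node<K, V> n = buckets[i];
        while (true) {
            if (Objects.equals(n.key, key)) { // same key: overwrite the value
                V old = n.value;
                n.value = value;
                return old;
            }
            if (n.next == null) {             // collision: append to the chain
                n.next = new Node<>(key, value);
                return null;
            }
            n = n.next;
        }
    }

    /** Walks the chain in the key's bucket and compares keys with equals(). */
    V get(Object key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }
}
```

Each node keeps the key next to the value, which is what lets get() tell apart two entries that landed in the same slot.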

Both put and get first call the hashCode method to locate the bucket for the key, and then call equals when there is a collision (this is why hashCode and equals were reviewed at the beginning)!
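As a rough illustration of how a hash code turns into a bucket position, the sketch below mirrors the bit-spreading step used by OpenJDK 8's HashMap and the usual power-of-two masking. The names BucketIndexDemo, spread and indexFor are assumptions of this example, and other JDK versions may compute the index differently.

```java
class BucketIndexDemo {
    /** XORs the high 16 bits into the low 16 bits, as OpenJDK 8's HashMap does. */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(Object key, int tableLength) {
        return spread(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + indexFor(key, 16));
        }
    }
}
```

Both put and get run the same computation, so a key always leads back to the bucket it was stored in, as long as its hash code does not change.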

HashMap is based on the hashing principle: we store and retrieve objects through the put() and get() methods. When we pass a key-value pair to put(), it calls the hashCode() method of the key object to compute the hashcode, and then finds the bucket location in which to store the entry. When getting an object, it finds the bucket the same way and then uses the equals() method of the key object to find the correct key-value pair and return the value object. HashMap uses a linked list to resolve collisions: when a collision occurs, the new entry is stored in the next node of the linked list. HashMap stores a key-value pair object in each linked list node.

What happens when two different key objects have the same hashcode? They will be stored in a linked list in the same bucket location. The equals() method of the key object is used to find key-value pairs.

A simple question

"Have you ever used a HashMap?" "What is a HashMap? Why do you use it?"

Almost everyone will answer "yes" and then mention some features of HashMap: HashMap accepts null keys and values while Hashtable does not; HashMap is not synchronized; HashMap is fast; HashMap stores key-value pairs, and so on. This shows that you have used HashMap and are fairly familiar with it. But then the interviewer takes a sharp turn and starts asking trickier questions about the more basic details of HashMap. The interviewer may ask the following:

"Do you know how HashMap works?" "Do you know how HashMap's get() method works?"

"HashMap is based on the principle of hashing. We use put(key, value) to store objects in a HashMap and get(key) to retrieve them. When we pass the key and value to the put() method, it first calls hashCode() on the key, and the returned hashCode is used to find the bucket location where the Entry object is stored." The key point here is to mention that HashMap stores both the key object and the value object in the bucket, as a Map.Entry. This helps in understanding the retrieval logic. If you don't realize this, or mistakenly think that only values are stored in buckets, you won't be able to explain how objects are retrieved from a HashMap. This answer is quite correct and shows that the interviewee does know how hashing and HashMap work. But this is only the beginning of the story: when the interviewer adds some practical scenarios that Java programmers encounter every day, wrong answers become frequent. The next question may be about collision detection in HashMap and how collisions are resolved:

"What happens when two objects have the same hashcode?" 

From here, the real confusion begins.

Some interviewees will answer that because the hashcode is the same, the two objects are equal, so the HashMap will throw an exception or will not store them. The interviewer may then remind them that there are both equals() and hashCode() methods, and tell them that even if the hashcode is the same, the two objects may not be equal. Some interviewees give up at this point, while others can move on and answer: "Because the hashcode is the same, their bucket positions are the same and a 'collision' occurs. Because HashMap uses a linked list to store objects, this Entry (the Map.Entry object that contains the key-value pair) will be stored in the linked list." This answer is very reasonable. Although there are many ways to handle collisions, this method is the simplest and is exactly what HashMap does. But the story is not over, and the interviewer will continue to ask:

"How do you get the value object if two keys have the same hashcode?"

The interviewee will answer: when we call the get() method, HashMap uses the hashcode of the key object to find the bucket location and then gets the value object. The interviewer reminds them that two value objects may be stored in the same bucket, and they answer: the linked list is traversed until the value object is found. The interviewer then asks: since you have no value object to compare against, how can you be sure you have found the right one? Unless the interviewee knows that HashMap stores key-value pairs in the linked list, they cannot answer this question.

Some interviewees who remember this important point will say that after finding the bucket location, HashMap calls the key's equals() method to find the correct node in the linked list, and so finally finds the value object it is looking for. Perfect answer!
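A quick way to try this out with the real java.util.HashMap is to use two strings that are known to share a hash code, "Aa" and "BB". The class name CollisionDemo and the helper build are just scaffolding for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    /** Builds a map whose two keys have the same hash code. */
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: both are 2112

        Map<String, String> map = build();
        System.out.println(map.size());    // 2: neither entry replaced the other
        System.out.println(map.get("Aa")); // first
        System.out.println(map.get("BB")); // second
    }
}
```

No exception is thrown and nothing is overwritten: both entries live in the same bucket, and equals() on the key picks the right one.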

In many cases, interviewees make mistakes at this stage because they confuse the hashCode() and equals() methods: hashCode() has come up repeatedly by this point, while equals() only appears when the value object is retrieved. Some good developers will point out that using immutable, final objects with properly implemented equals() and hashCode() methods reduces collisions and improves efficiency. Immutability makes it possible to cache the hashcode of each key, which speeds up the whole lookup. Using String, or wrapper classes such as Integer, as keys is a very good choice.

If you thought this was the end of it, you'd be surprised to hear the following question.

"What if the size of the HashMap exceeds the capacity defined by the load factor?"

You won't be able to answer this question unless you really know how HashMap works. The default load factor is 0.75. That is, when the number of entries exceeds 75% of the number of buckets, then, much as other collection classes (such as ArrayList) grow their backing array, a bucket array twice the size of the original is created to resize the map, and the existing entries are moved into the new bucket array. This process is called rehashing because the hash is used again to find each entry's new bucket location.

If you can answer this question, here comes the next one:
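The arithmetic behind this can be sketched as follows. The helper names threshold and indexFor are hypothetical, the table length is assumed to be a power of two, and the hash value 21 is an arbitrary example.

```java
class ResizeDemo {
    /** Number of entries above which a table of this capacity is resized. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** Bucket index for a power-of-two capacity (hash assumed already spread). */
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16; // default initial capacity of HashMap
        System.out.println(threshold(capacity, 0.75f)); // 12

        int hash = 21; // binary 10101
        System.out.println(indexFor(hash, capacity));     // 5
        System.out.println(indexFor(hash, capacity * 2)); // 21
    }
}
```

With the default settings the table doubles once it holds more than 12 entries, and an entry either stays at its old index or moves up by exactly the old capacity (here from 5 to 5 + 16 = 21), depending on one extra bit of its hash.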

"Do you understand what's wrong with resizing HashMap?"

You may not be able to answer, and the interviewer will remind you that in the case of multi-threading, there may be a race condition.

When resizing a HashMap, there is indeed a race condition: if two threads both find that the HashMap needs to be resized, they will try to resize it at the same time. During resizing in Java 7 and earlier, the order of the elements stored in a linked list is reversed, because when moving to the new bucket position HashMap places each element at the head of the list rather than the tail, which avoids traversing to the tail. If a race condition occurs, the result can be an infinite loop. (Java 8 and later keep the relative order during a resize, but HashMap is still not thread-safe.) At this point you can ask the interviewer back: why use HashMap in a multi-threaded environment at all? :)

Supplementary questions
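The order reversal is easy to reproduce in a single thread. The sketch below is a simplified model of head insertion during a transfer, not the JDK source; Node, transfer and keys are names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertionDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    /** Moves every node of the old chain to the head of a new chain. */
    static Node transfer(Node oldHead) {
        Node newHead = null;
        for (Node e = oldHead; e != null; ) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // link this node in front of the new chain
            newHead = e;
            e = next;
        }
        return newHead;
    }

    /** Collects the keys of a chain in traversal order. */
    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.key);
        return out;
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B", new Node("C", null)));
        System.out.println(keys(transfer(chain))); // [C, B, A]
    }
}
```

If two threads ran such a transfer over the same nodes at once, one of them could be working with a stale next pointer while the other has already reversed the links, which is how a cycle can appear in the chain.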

Why are String and wrapper classes like Integer suitable as keys?

String and wrapper classes such as Integer are well suited as HashMap keys, and String is the most commonly used, because String is immutable and final and already overrides equals() and hashCode(). The wrapper classes share these properties. Immutability is necessary because the value used to compute hashCode() must not change: if a key returns a different hashcode when it is looked up than when it was put, you will not be able to find the object you want in the HashMap. Immutability also has other advantages such as thread safety. If you can keep the hashCode constant just by declaring a field final, then do so. Because equals() and hashCode() are both used when retrieving an object, it is very important for the key class to override these two methods correctly. If two unequal objects return different hashcodes, the chance of collision is smaller, which improves the performance of HashMap.
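Here is a small sketch of what goes wrong with a mutable key. It uses an ArrayList as the key purely for illustration, since its hashCode() depends on its contents; MutableKeyDemo and lookups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    /** Returns the lookup result before and after the key is mutated. */
    static String[] lookups() {
        List<String> key = new ArrayList<>();
        key.add("a");

        Map<List<String>, String> map = new HashMap<>();
        map.put(key, "value");
        String before = map.get(key);

        key.add("b"); // the key's hashCode() is now different
        String after = map.get(key);

        return new String[] {before, after};
    }

    public static void main(String[] args) {
        String[] r = lookups();
        System.out.println(r[0]); // value
        System.out.println(r[1]); // null: the entry is still in the map but unreachable
    }
}
```

The entry was filed under the old hash code, so a lookup with the mutated key no longer matches it.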

Can we use custom objects as keys?

This is an extension of the previous question. Of course you can use any object as a key, as long as it obeys the contract of the equals() and hashCode() methods and does not change after it has been inserted into the Map. If the custom object is immutable, it already satisfies the conditions for being a key, because it cannot be changed after it is created.
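A minimal sketch of such a key class might look like this. GridPoint is a made-up example type; the pattern is final fields plus equals() and hashCode() derived from the same fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical immutable key: all fields are final and set once. */
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    @Override public int hashCode() {
        return Objects.hash(x, y); // uses the same fields as equals()
    }
}

class CustomKeyDemo {
    /** Stores with one instance and looks up with a different but equal one. */
    static String lookup() {
        Map<GridPoint, String> map = new HashMap<>();
        map.put(new GridPoint(1, 2), "home");
        return map.get(new GridPoint(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // home
    }
}
```

If hashCode() were left out, the two GridPoint instances would normally get different identity hash codes and the lookup would return null.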

Can we use ConcurrentHashMap instead of Hashtable?

This is another very popular interview question, because more and more people use ConcurrentHashMap. We know that Hashtable is synchronized, but ConcurrentHashMap performs better under concurrency because it locks only part of the map (how much depends on the concurrency level) instead of the whole table. ConcurrentHashMap can certainly replace Hashtable, but Hashtable provides stronger thread safety, since every operation locks the entire table.

I personally like this question because of its depth and the number of concepts it touches indirectly. Let's look at which knowledge points these questions cover:
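As a rough sketch of ConcurrentHashMap in use, the example below lets several threads increment one shared counter through merge(), which ConcurrentHashMap performs atomically. The class and method names, the key "hits" and the thread counts are arbitrary choices for the demo.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    /** Increments one shared counter from several threads and returns the total. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```

The same loop over a plain HashMap could lose updates or corrupt the table, because nothing coordinates the threads.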

  • The concept of hashing
  • Methods of resolving collisions in HashMap
  • The application of equals() and hashCode() and their importance in HashMap
  • Benefits of Immutable Objects
  • Race conditions when HashMap is used by multiple threads
  • Resizing a HashMap

Summary

How HashMap works

HashMap is based on the hashing principle: we store and retrieve objects through the put() and get() methods. When we pass a key-value pair to put(), it calls the hashCode() method of the key object to compute the hashcode, and then finds the bucket location in which to store the entry. When getting an object, it finds the bucket the same way and then uses the equals() method of the key object to find the correct key-value pair and return the value object. HashMap uses a linked list to resolve collisions: when a collision occurs, the new entry is stored in the next node of the linked list. HashMap stores a key-value pair object in each linked list node.

What happens when two different key objects have the same hashcode? They will be stored in a linked list in the same bucket location. The equals() method of the key object is used to find key-value pairs.
