In-depth analysis of the Java Map collection

An overview of the Java collection classes
To understand the powerful features of Java, it is necessary to master the collections framework.

1. The inheritance hierarchy of the collection classes, introduced and compared:

Collection<--List<--Vector
Collection<--List<--ArrayList
Collection<--List<--LinkedList
Collection<--Set<--HashSet
Collection<--Set<--HashSet<--LinkedHashSet
Collection<--Set<--SortedSet<--TreeSet

Besides Set and List, the Queue interface also extends Collection.

Both Vector and ArrayList are backed by an array, so they inherit the limits of arrays and cannot outperform a plain array. The main differences between them are that Vector is synchronized (thread-safe), which makes ArrayList faster in single-threaded code, and that they grow by different amounts. LinkedList is not array-based, so those limits do not apply to it. Each of its nodes holds (1) the element itself and (2) links to the neighboring nodes (in Java's LinkedList, both the previous and the next node). Adding or removing an element therefore only requires relinking the neighboring nodes, whereas the two array-based lists have to shift a large amount of data on insertion and deletion. Lists store their elements linearly: the elements keep the positions they were given, from a first element to a last, but they are not sorted.
A few more words about the two array-based lists, Vector and ArrayList. [Data growth] When the number of stored elements exceeds the current capacity, Vector doubles its capacity by default, while ArrayList grows by 50%. [Synchronization] Vector's methods are synchronized, so it is thread-safe but relatively slow. [Usage] In both ArrayList and Vector, reading an element by index takes O(1), and adding or removing an element at the end takes O(1) (amortized, since an append occasionally triggers a resize). Adding or removing at position i, however, means shifting every element from position i onward, which takes O(n-i). LinkedList is the opposite: once the node has been located, adding or removing takes O(1), but reaching position i requires walking the links, which takes O(i).
The fundamental difference between the Set and List implementations is the backing structure: HashSet and TreeSet are built on a Map (HashMap and TreeMap respectively), while ArrayList and Vector are built on an array.
HashSet: it stores its elements as the keys of an internal HashMap. Because keys in a HashMap cannot repeat, a Set cannot contain duplicates. If an equal element is added again, the set is left unchanged and add() returns false. LinkedHashSet, a subclass of HashSet, additionally keeps a linked list through its entries so that iteration follows insertion order. TreeSet implements SortedSet and differs from HashSet in that it is backed by a sorted map (TreeMap), so its elements are kept in sorted order. HashMap allows one null key and is not synchronized, unlike Hashtable. (Hashtable allows neither null keys nor null values, is synchronized, and is rarely used now.)
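The behavior of the three Set implementations described above can be checked with a small sketch. The class name SetBehaviorSketch and the helper fill are illustrative names, not part of the original text or of the JDK.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class SetBehaviorSketch {
    // Hypothetical helper: adds the same three words to any Set and returns it.
    static Set<String> fill(Set<String> set) {
        set.add("pear");
        set.add("apple");
        set.add("fig");
        return set;
    }

    public static void main(String[] args) {
        Set<String> hash = fill(new HashSet<>());
        // add() returns false for a duplicate and leaves the set unchanged.
        System.out.println(hash.add("apple"));            // false
        System.out.println(hash.size());                  // 3

        // LinkedHashSet iterates in insertion order.
        System.out.println(fill(new LinkedHashSet<>()));  // [pear, apple, fig]

        // TreeSet iterates in sorted (natural) order.
        System.out.println(fill(new TreeSet<>()));        // [apple, fig, pear]
    }
}
```

The iteration order of the plain HashSet is deliberately not printed, since it is not guaranteed.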

2. The three interfaces Set, Map and List

(1) Set: a Set contains no duplicate elements and, in general, does not guarantee any order. Commonly used implementations are HashSet and TreeSet; TreeSet implements the SortedSet interface, so its elements are kept sorted.
(2) List: a List is an ordered collection that allows duplicates. Elements are kept in the order in which they were inserted, so you have precise control over where each element is inserted or removed. LinkedList, ArrayList and Vector all implement the List interface.
(3) Map: a Map is a structure that maps keys to values and stores key-value pairs, where keys are unique and cannot repeat, while values can. Many classes implement Map: HashMap, TreeMap, LinkedHashMap, etc. Although they implement the same interface, their performance characteristics are not the same. HashMap is implemented with a hash table and uses the hashCode of the key for fast lookup; LinkedHashMap uses a linked list to maintain insertion order; TreeMap is implemented with a red-black tree and keeps its entries sorted by key.

3. What is the difference between ArrayList, Vector and LinkedList?

ArrayList and LinkedList are used through the same List API, but they perform differently. LinkedList suits workloads with many insertions and deletions and few lookups, while ArrayList suits the opposite.
All three classes implement the List interface; their elements are ordered and may repeat. All are in the java.util package, and all can grow and shrink dynamically as elements are added and removed.
Brief description: First of all, ArrayList and Vector: both are implemented on top of an Object[] array. When created, they allocate contiguous storage for their data. So: 1) they support access by index, so looking up data by index is fast; 2) because storage is sequential, existing data has to be moved when inserting, so insertion is slower. Both ArrayList and Vector have an initial capacity, and when more elements are added than the capacity allows, they expand their storage automatically. By default Vector grows to 2 times its size, while ArrayList grows to 1.5 times.
Difference: The biggest difference between the two is synchronization. The methods of ArrayList are not synchronized, so ArrayList is not thread-safe, while most methods of Vector are directly or indirectly synchronized, so Vector is thread-safe. Because it avoids the cost of locking, ArrayList performs better than Vector.
LinkedList is implemented as a doubly linked list. Accessing an element by index means walking the list from one end, so random access is inefficient. However, because inserting an element does not require moving data, insertion is more efficient. LinkedList is also not thread-safe.
Selection of List: 1) For access by index, or for adding and deleting elements at the end of the collection, ArrayList and Vector are more efficient; 2) For inserting and deleting elements at specific positions, LinkedList is more efficient, because it doesn't need to move elements. When a container is shared by multiple threads, Vector is the safer choice because it is thread-safe.
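The selection rules above can be illustrated with a minimal sketch. The class and method names (ListChoiceSketch, sumByIndex, prependAll) are made up for this example, and Collections.synchronizedList is shown only as one alternative to Vector when a list is shared between threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class ListChoiceSketch {
    // Index-heavy work: an ArrayList reads any position directly from its array.
    static int sumByIndex(List<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Insert-at-head work: a LinkedList only relinks nodes, nothing is shifted.
    static LinkedList<Integer> prependAll(int n) {
        LinkedList<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.addFirst(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        numbers.add(1, 99); // shifts 2, 3 and 4 one slot to the right
        System.out.println(numbers);             // [1, 99, 2, 3, 4]
        System.out.println(sumByIndex(numbers)); // 109
        System.out.println(prependAll(3));       // [2, 1, 0]

        // A synchronized wrapper around an ArrayList, as an alternative to Vector.
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        shared.add(7);
        System.out.println(shared);              // [7]
    }
}
```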

4. What is the difference between HashMap, Hashtable, TreeMap and WeakHashMap?

(1) Both HashMap and Hashtable use hashing for lookup, so they have many similarities.
Differences: 1) HashMap can be seen as a lightweight, thread-unsafe counterpart of Hashtable. Both implement the Map interface; HashMap allows a null key, Hashtable does not. 2) Hashtable is thread-safe, while HashMap is not synchronized. So when a HashMap is shared between threads, developers must provide additional synchronization themselves.
(2) WeakHashMap is very similar to HashMap. The difference is that WeakHashMap holds its keys through weak references. That is, if a key in the WeakHashMap is no longer strongly referenced from outside the map, the garbage collector may reclaim it, and the entry is then removed. HashMap holds its keys through strong references: a key can be garbage-collected only after its entry has been removed from the map.
(3) Map selection: HashMap does not keep its key-value pairs in any guaranteed order. 1) For inserting, deleting and locating elements in a Map, HashMap is the best choice; 2) TreeMap implements the SortedMap interface and keeps its records sorted by key, so entries are retrieved in sorted key order; use TreeMap if you need natural or custom ordering; 3) If you need to output the data in the same order as it was inserted, use LinkedHashMap (a subclass of HashMap).
Adding data to a HashMap: if the key already exists, the value associated with that key is overwritten.
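A short sketch of the Map selection rules and of overwriting on put(). MapChoiceSketch and fill are illustrative names only; the reverse-order comparator stands in for any custom ordering.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class MapChoiceSketch {
    // Hypothetical helper: puts the same three entries into any Map and returns it.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = fill(new HashMap<>());
        // put() on an existing key overwrites the value and returns the old one.
        System.out.println(hash.put("apple", 10)); // 1
        System.out.println(hash.get("apple"));     // 10

        // LinkedHashMap: insertion order.
        System.out.println(fill(new LinkedHashMap<>())); // {pear=3, apple=1, fig=2}

        // TreeMap: natural key order, or a custom order given by a Comparator.
        System.out.println(fill(new TreeMap<>()));       // {apple=1, fig=2, pear=3}
        Map<String, Integer> reversed = new TreeMap<>(Comparator.<String>reverseOrder());
        System.out.println(fill(reversed));              // {pear=3, fig=2, apple=1}
    }
}
```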

5. Map collection

Implementation class: HashMap, Hashtable, LinkedHashMap and TreeMap
HashMap
HashMap is the most commonly used Map. It stores data according to the hashCode of the key, so a value can be fetched directly from its key with fast access. When traversing, the order in which entries are returned is not guaranteed. Because keys cannot repeat, HashMap allows at most one null key, and any number of null values. It is not synchronized.
Hashtable
Similar to HashMap, Hashtable is its thread-safe counterpart. Its methods are synchronized, that is, only one thread can access a Hashtable at any time, which also makes Hashtable slower. It inherits from the Dictionary class. Unlike HashMap, it does not allow a null key or a null value.
ConcurrentHashMap (how does it achieve thread safety?)
It achieves thread safety through lock splitting. In Java 7 and earlier, ConcurrentHashMap internally uses segments (Segment) to represent the different parts of the table; each segment is in effect a small hash table with its own lock. As long as modifications occur on different segments, they can proceed concurrently. (Since Java 8 the segments have been replaced by finer-grained locking on individual buckets, combined with CAS operations.)
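What thread safety means for ConcurrentHashMap in practice can be sketched as follows. The class ConcurrentCountSketch and the method countWith are hypothetical names for this example; the sketch shows that concurrent updates are not lost, not how the locking works internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountSketch {
    // Starts `threads` workers; each increments the same key `perThread` times.
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge() is performed atomically by ConcurrentHashMap.
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 10_000)); // 40000
    }
}
```

Passing a plain HashMap to countWith instead is not safe: updates can be lost or the map can be corrupted, which is why no expected output is given for that case.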
LinkedHashMap
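LinkedHashMap preserves the insertion order of its entries: when traversing a LinkedHashMap with an Iterator, the entry inserted first is returned first. Maintaining the linked list adds a small overhead compared with HashMap, while iteration time depends on the number of entries rather than on the capacity. Otherwise it has all the characteristics of HashMap.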
TreeMap
TreeMap implements the SortedMap interface and keeps its entries sorted by key. The default is ascending key order (natural order), and a custom Comparator can also be specified. When an Iterator is used to traverse a TreeMap, the entries come out sorted. With natural ordering, null keys are not allowed. It is not synchronized.
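The null-key rules of the implementations in this section can be verified with a small probe. NullKeySketch and acceptsNullKey are illustrative names, and the TreeMap case assumes natural ordering (no Comparator supplied).

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class NullKeySketch {
    // Returns true if the map accepts a null key,
    // false if put() rejects it with a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));       // true
        System.out.println(acceptsNullKey(new LinkedHashMap<>())); // true
        System.out.println(acceptsNullKey(new Hashtable<>()));     // false
        System.out.println(acceptsNullKey(new TreeMap<>()));       // false
    }
}
```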

6. Summary of the main implementation class differences

Vector and ArrayList
Vector is synchronized and therefore thread-safe, while ArrayList is unsynchronized and not thread-safe. If thread safety is not a concern, ArrayList is generally more efficient.
If the number of elements exceeds the current length of the backing array, Vector grows by 100% of the current array length and ArrayList by 50%. For collections that hold a large amount of data, Vector's larger growth step means fewer resizes, at the cost of more unused space.
Looking up data at a specified position takes the same time in Vector and ArrayList, so either works if you mostly access data by index. If you frequently insert or remove at a given position, which forces the following elements to shift, consider LinkedList instead, because it changes only the affected node and leaves the other elements in place.
ArrayList and Vector store data in arrays. The array usually has more slots than elements actually stored, to leave room for adding and inserting. Elements can be accessed directly by index, but inserting involves memory operations such as shifting array elements, so indexing is fast and inserting is slow. Vector uses synchronized methods (thread safety), so it performs worse than ArrayList. LinkedList uses a doubly linked list. Accessing an element by index requires traversing forward or backward, but inserting only requires updating the links of the nodes before and after the new item, so insertion is faster.
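Vector's doubling can be observed directly, because Vector exposes its capacity through capacity(). The sketch below uses an illustrative class name (VectorGrowthSketch) and assumes the default capacityIncrement of zero. ArrayList has no public capacity accessor, so its 1.5x growth cannot be checked the same way.

```java
import java.util.Vector;

class VectorGrowthSketch {
    // Adds one element more than the initial capacity and reports the new capacity.
    static int capacityAfterOverflow(int initialCapacity) {
        Vector<Integer> vector = new Vector<>(initialCapacity);
        for (int i = 0; i <= initialCapacity; i++) {
            vector.add(i);
        }
        return vector.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterOverflow(4));  // 8
        System.out.println(capacityAfterOverflow(10)); // 20
    }
}
```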

ArrayList and LinkedList
ArrayList is a data structure based on a dynamic array, and LinkedList is a data structure based on a linked list.
For random access with get and set, ArrayList is better than LinkedList, because LinkedList has to follow the links node by node.
For add and remove, LinkedList has the advantage, because ArrayList has to shift data. This depends on the actual situation, though. If only a single element is inserted or deleted, ArrayList is faster than LinkedList. If many elements are inserted and deleted at arbitrary positions, LinkedList can be much faster, provided the positions are reached through an iterator rather than by index, because each insertion into an ArrayList shifts the element at the insertion point and everything after it.

HashMap and TreeMap
HashMap finds its contents quickly through the hashCode of the key, while all elements in a TreeMap are kept in a fixed sorted order. If you need an ordered result, use TreeMap (the order of elements in a HashMap is not fixed).
For inserting, deleting and locating elements in a Map, HashMap is the best choice. If you want to iterate over the keys in natural or custom order, TreeMap is better. Using a HashMap requires that the key class has well-defined implementations of hashCode() and equals().
Two maps that contain the same mappings inserted in a different order may iterate in a different order, but that does not affect equals() or hashCode().
Do the same test on both:
for HashMap, two maps with the same mappings inserted in different order are equal (equals returns true) and have the same hashCode();
for TreeMap the result is also true, because Map.equals() compares the mappings themselves, not the order in which they are arranged.
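Whether insertion order affects equals() and hashCode() is easy to check directly. Under the Map.equals contract, two maps are equal when they contain the same mappings, and hashCode() is the sum of the entry hash codes, so both are independent of order. A minimal check, with illustrative names MapEqualsSketch and fill:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapEqualsSketch {
    // Hypothetical helper: maps each word to its length, inserting in the order given.
    static Map<String, Integer> fill(Map<String, Integer> map, String... words) {
        for (String word : words) {
            map.put(word, word.length());
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = fill(new HashMap<>(), "one", "three", "seven");
        Map<String, Integer> b = fill(new HashMap<>(), "seven", "three", "one");
        Map<String, Integer> tree = fill(new TreeMap<>(), "three", "one", "seven");

        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(tree));               // true
        System.out.println(tree.equals(a));               // true
    }
}
```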

7. Frequently Asked Questions in Interviews

(1) What is the difference between Iterator and ListIterator?
Iterator: it can only traverse a collection forward, and it can remove the element it last returned. ListIterator: it extends Iterator, can traverse a List in both directions, and also supports adding and replacing elements.
(2) What are HashMap and Map?
Map is an interface, part of the Java collection framework, used to store key-value pairs, and HashMap is a class that implements Map with a hash algorithm.
(3) What is the difference between HashMap and Hashtable?
Comparing Hashtable with HashMap: both store and retrieve data as key-value pairs. Hashtable is one of the original collection classes (also known as legacy classes). HashMap was added in Java 2 (version 1.2) as part of the new collections framework. The differences between them are:
　　● HashMap and Hashtable are roughly equivalent, except that HashMap is unsynchronized and permits nulls (HashMap allows null as a key and as a value, while Hashtable does not).
　　● HashMap does not guarantee that the order of its mappings stays constant over time. Its subclass LinkedHashMap iterates in a predictable order (insertion order by default), so a HashMap can easily be swapped for a LinkedHashMap when that is needed; with Hashtable this is not so easy.
　　● HashMap is not synchronized, and Hashtable is synchronized.
　　● The iterators of HashMap are fail-fast, while the Enumerations returned by Hashtable's keys() and elements() methods are not, so this is a design consideration.
(4) What does synchronization mean in the context of Hashtable?
　　Synchronization means that only one thread can modify the hash table at a time. Any thread needs to acquire the object lock before executing the update operation of the hashtable, and other threads wait for the release of the lock.
(5) How can a HashMap be synchronized?
　　A HashMap can be wrapped in a synchronized view with Map m = Collections.synchronizedMap(hashMap).
(6) When to use Hashtable and when to use HashMap?
　　The basic difference is that Hashtable is synchronized and HashMap is not, so whenever multiple threads may access the same instance, Hashtable should be used; otherwise use HashMap. Non-thread-safe data structures give better performance.
　　If there is a possibility that you will later need to get key-value pairs in order, HashMap is a good choice, because it has the subclass LinkedHashMap. If you want to iterate in a predictable order (insertion order by default), you can easily replace HashMap with LinkedHashMap. With Hashtable, this is not so simple. And if multiple threads need to access a HashMap, Collections.synchronizedMap() can be used instead. In general, HashMap is more flexible.
(7) Why is the Vector class considered obsolete or unofficially deprecated? Or why should we use ArrayList instead of Vector all the time?
　　You should use ArrayList instead of Vector because ArrayList is unsynchronized by default, while Vector synchronizes every single method, which is almost never what you want. Usually what you want to synchronize is an entire sequence of operations. Synchronizing individual operations is not enough to be safe (if you iterate over a Vector, you still have to take a lock to prevent other threads from changing the collection at the same time), and it is also slower. There is also the overhead of locking even when you don't need it, which makes synchronizing access by default a poor choice. You can always use Collections.synchronizedList to decorate a collection.
　　In fact, Vector combines two things, a "resizable array" collection and synchronization of every operation, which is another design flaw. Vector also has some legacy methods for enumeration and element access that differ from the List interface, and programmers tend to use them when they appear in existing code. Although an Enumeration is faster, it does not check whether the collection is modified during iteration, which can cause problems. Despite all this, Oracle has never announced that it will deprecate Vector.
(8) The difference between HashMap and ConcurrentHashMap
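Besides thread safety itself, one observable difference is how iteration reacts to modification. HashMap iterators are fail-fast, as noted in question (3), whereas ConcurrentHashMap iterators are weakly consistent and never throw ConcurrentModificationException. The sketch below is only an illustration with made-up names (IterationSketch, removeWhileIterating); fail-fast detection is documented as best-effort, so the HashMap result should not be relied on for program correctness.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IterationSketch {
    // Tries to remove every entry while iterating over the key set.
    // Returns true if the loop completed, false if the iterator failed fast.
    static boolean removeWhileIterating(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change behind the iterator's back
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating(new HashMap<>()));           // false
        System.out.println(removeWhileIterating(new ConcurrentHashMap<>())); // true
    }
}
```

To remove entries from a HashMap during traversal, use the iterator's own remove() method instead of calling remove() on the map.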

8. Ways to traverse a Map

There are 4 ways to traverse a map; let's look at them in turn.

The first one is the most common usage; it gives you the key and the value at the same time. Disadvantage: if the map reference is null, a NullPointerException is thrown, so the map must be checked for null before each traversal.

public static void forEachMap(Map<String,String> map) {         
    for ( Map.Entry<String,String> entry : map.entrySet()) {    
        System.out.println(entry.getKey()+entry.getValue());    
    }                                                           
}

Let's take a look at the second traversal method, which traverses only the keys or only the values. When you need just one of the two, it is slightly more efficient than the first method and the code is a little more concise. This method also requires checking that the map is not null.

 public static void forEachMap2(Map<String,String> map){     
     for (String str :map.keySet()){                         
         System.out.println(str);                            
     }                                                       
     for (String str :map.values()){                         
         System.out.println(str);                            
     }                                                       
 }

The third way is to use an iterator.

  /**
   * Traverse with an iterator.
   * @param map the map to traverse
   */
  public static void forEachMap3(Map<String, String> map) {
      Iterator<Map.Entry<String, String>> iterator = map.entrySet().iterator();
      while (iterator.hasNext()) {
          Map.Entry<String, String> entry = iterator.next();
          System.out.println(entry.getKey() + entry.getValue());
      }
  }

  /**
   * Traverse with an iterator, but without generics.
   *
   * @param map the map to traverse
   */
  public static void forEachMap4(Map<String, String> map) {
      Iterator iterator = map.entrySet().iterator();
      while (iterator.hasNext()) {
          Map.Entry entry = (Map.Entry) iterator.next();
          // The cast is needed because, without it, key and value are both
          // plain Objects and cannot be joined with +.
          System.out.println((String) entry.getKey() + entry.getValue());
      }
  }

The two iterator versions behave the same; the only difference is whether generics are used.
The fourth way is to get the keys of the map first and then look up each value by key. This is relatively inefficient, because every get() repeats a lookup that entrySet() avoids, and is generally not recommended.

for (Integer key : map.keySet()) {  
    Integer value = map.get(key);  
    System.out.println("Key = " + key + ", Value = " + value);  
}
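The traversal variants above can be combined into a small null-safe helper, which also addresses the null check mentioned for the first method. This is a sketch with illustrative names (TraversalSketch, describe, describeWithForEach); Map.forEach, available since Java 8, is included as an additional option that is not among the four methods listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TraversalSketch {
    // Joins all entries as "key=value;" using entrySet().
    // A null map is treated like an empty one instead of throwing.
    static String describe(Map<String, String> map) {
        if (map == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : map.entrySet()) {
            out.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return out.toString();
    }

    // Same result using Map.forEach (Java 8 and later).
    static String describeWithForEach(Map<String, String> map) {
        if (map == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        map.forEach((key, value) -> out.append(key).append('=').append(value).append(';'));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");
        System.out.println(describe(map));            // k1=v1;k2=v2;
        System.out.println(describeWithForEach(map)); // k1=v1;k2=v2;
        System.out.println(describe(null).isEmpty()); // true
    }
}
```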