Java collection class summary (including frequently asked interview questions)

A summary of Java collections and the collection-related questions that often come up in interviews. With these points in mind, collection questions in interviews for back-end and big data positions should pose little difficulty.

Java collection class summary


  • Collection

    • List

      ordered list; may have repeating elements

      • ArrayList

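        • Backed internally by an array, which occupies a contiguous block of memory

        • Random access by index is fast. Appending at the end is cheap on average, but inserting or deleting in the middle is slow because every following element has to be shifted, and a full array has to be copied into a larger one

        • Grows to about 1.5 times the old capacity: JDK 7 and later compute oldCapacity + (oldCapacity >> 1), while JDK 6 used oldCapacity * 3 / 2 + 1. The no-argument constructor gives a default capacity of 10; when more elements are added than the array can hold, Arrays.copyOf copies them into a larger array.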

      • LinkedList

        • Backed internally by a doubly linked list whose nodes are not contiguous in memory (each node holds references to its previous and next node; older JDK versions closed the list into a ring through a header node)

        • Adding and deleting elements is fast once the position is known, because only references need to be modified, but lookup by index is slower than in ArrayList because the list has to be walked

      • Vector

        • Internally implemented by an array

        • Thread-safe (its methods are synchronized), so it can be used in a multi-threaded environment, but it is slower than ArrayList

        • By default the capacity doubles on expansion

    • Set

      No guaranteed ordering in general (although iterating over the same unmodified Set yields the same order each time); no duplicate elements.

      • HashSet

        • unordered

        • allows one null element

      • LinkedHashSet

        • The order of insertion is preserved, and elements are returned in that order when iterating.

      • TreeSet

        • Backed by a red-black tree (a self-balancing binary search tree, via TreeMap)

        • TreeSet implements the Set and SortedSet interfaces

        • It stores elements in sorted order, which is not the order in which they were inserted but the order defined by their natural ordering or by a Comparator.

        • Cannot store null when natural ordering is used (add(null) throws NullPointerException)

    • Queue

      First in, first out (FIFO). A common implementation class of Queue: LinkedList

  • Map

    Keys cannot be repeated (putting an existing key overwrites its value); values can be repeated.

    • HashMap

      • unordered

      • Allows null keys and values

      • Expansion mechanism: initial capacity = 16, newsize = oldsize * 2. Expansion is triggered when the number of entries in the Map exceeds 75% of the length of the Entry array. The capacity is always a power of 2, so the bucket index can be computed with the cheap bit operation hash & (length - 1), which also spreads hashes evenly across the buckets.

      • Map does not extend the Iterable interface, so a HashMap cannot be used directly in a foreach loop. Its keySet(), values(), and entrySet() methods return collections that can be.

    • LinkedHashMap

      • Element insertion order is preserved. Returns in insertion order when iterating.

      • Suitable for inserting and deleting elements in Map

    • TreeMap

      • Suitable for traversing keys in natural order (by comparing the keys themselves) or in a custom order

      • Implemented based on red-black tree. TreeMap has no tuning options because the tree is always in balance.

    • Hashtable

      • Null keys and values are not allowed

      • Thread safety: achieved by locking the entire Hashtable on every synchronized operation, which makes it less efficient than HashMap

      • Underlying implementation: array + linked list

      • Expansion mechanism: initial size=11, newsize = oldsize * 2 + 1
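
The ordering and traversal points in the outline above can be sketched as follows. This is a minimal illustration, not part of the original summary: the class name MapTraversalDemo and its helper methods are made up for the example. It walks a map through keySet() and entrySet(), and contrasts the insertion order of LinkedHashMap with the sorted key order of TreeMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class MapTraversalDemo {
    // Joins the keys in iteration order, walking keySet() with a foreach loop.
    public static String keysInOrder(Map<String, Integer> map) {
        StringJoiner joiner = new StringJoiner(",");
        for (String key : map.keySet()) {
            joiner.add(key);
        }
        return joiner.toString();
    }

    // Sums the values by walking entrySet(), which avoids a second lookup per key.
    public static int sumValues(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> inserted = new LinkedHashMap<>();
        inserted.put("pear", 3);
        inserted.put("apple", 1);
        inserted.put("fig", 2);
        Map<String, Integer> sorted = new TreeMap<>(inserted);

        System.out.println(keysInOrder(inserted)); // insertion order: pear,apple,fig
        System.out.println(keysInOrder(sorted));   // natural key order: apple,fig,pear
        System.out.println(sumValues(inserted));
    }
}
```

A plain HashMap would accept the same calls, but the order printed by keysInOrder would then not be guaranteed.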

Frequently asked collection interview questions

- Compare the pros and cons of collections and arrays

  • An array type is a collection of data with the same data type; a collection is a container for multiple objects, and multiple objects of different data types can be organized together.
  • Arrays are not dynamic. Once an array is created, its capacity is fixed and cannot be modified. In order to add new elements, it is necessary to create an array with a larger capacity and copy the data to the new array; the collection is dynamic. Collections allow dynamic addition and removal of elements. When the number of elements exceeds the capacity of the collection, the collection will automatically expand.
  • Arrays are a low-level data structure supported directly by most languages, and they generally offer the best performance.
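
A small sketch of the difference, assuming a hypothetical helper named append: growing an array means allocating a larger one and copying, whereas an ArrayList does this internally when elements are added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayVersusList {
    // An array cannot grow, so appending means copying into a larger array.
    public static int[] append(int[] source, int value) {
        int[] larger = Arrays.copyOf(source, source.length + 1);
        larger[source.length] = value;
        return larger;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        numbers = append(numbers, 4);
        System.out.println(Arrays.toString(numbers));

        // A collection grows on its own when elements are added.
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        list.add(4);
        list.remove(Integer.valueOf(2)); // removes the element 2, not index 2
        System.out.println(list);
    }
}
```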

- List and explain commonly used methods of the Arrays class

Arrays is a utility class for arrays.
sort() sorts the array
binarySearch() searches with binary search; if the key is found it returns the index, otherwise it returns a negative number. The array must be sorted beforehand.
copyOf(T[], length) copies the array into a new array of the specified length, truncating or padding as needed
equals() checks whether two arrays contain equal elements in the same order
fill(T[], key) sets every element of the array to key
asList() wraps the array in a fixed-size List
toString() converts the array into a string
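
The methods listed above can be tried out with a short sketch like the following. The class name ArraysDemo and the helper sortedIndexOf are invented for the illustration; the Arrays calls themselves are the standard java.util.Arrays methods.

```java
import java.util.Arrays;
import java.util.List;

public class ArraysDemo {
    // Returns the index of key in a sorted copy of data, or a negative number if absent.
    public static int sortedIndexOf(int[] data, int key) {
        int[] copy = Arrays.copyOf(data, data.length); // leave the caller's array untouched
        Arrays.sort(copy);
        return Arrays.binarySearch(copy, key);
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1};
        System.out.println(sortedIndexOf(data, 9));      // index in the sorted copy
        System.out.println(sortedIndexOf(data, 4) < 0);  // absent keys give a negative result

        int[] padded = Arrays.copyOf(data, 6);           // extra slots are filled with 0
        System.out.println(Arrays.toString(padded));

        int[] filled = new int[3];
        Arrays.fill(filled, 7);
        System.out.println(Arrays.equals(filled, new int[] {7, 7, 7}));

        List<String> names = Arrays.asList("a", "b");    // fixed-size view of the array
        System.out.println(names);
    }
}
```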

- List and explain commonly used methods of the Collections class

sort() sorts a list
shuffle() shuffles the order of the elements in a list
addAll() adds all of the specified elements to a collection
max() returns the maximum element of a collection
min() returns the minimum element of a collection
copy() copies the elements of one list into another list
fill() replaces all the elements of a list with the specified element
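
As a rough illustration of these Collections methods, the sketch below uses a made-up class CollectionsDemo with a helper sortedWith. One detail worth noting is that copy() expects the destination list to already be at least as long as the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollectionsDemo {
    // Returns a sorted copy of the input plus the extra elements.
    public static List<Integer> sortedWith(List<Integer> input, Integer... extra) {
        List<Integer> result = new ArrayList<>(input);
        Collections.addAll(result, extra);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = sortedWith(Arrays.asList(4, 1, 3), 2, 5);
        System.out.println(numbers);
        System.out.println(Collections.max(numbers) + " " + Collections.min(numbers));

        // copy() requires the destination to be at least as long as the source.
        List<Integer> destination = new ArrayList<>(Collections.nCopies(numbers.size(), 0));
        Collections.copy(destination, numbers);
        System.out.println(destination.equals(numbers));

        Collections.fill(destination, 9);
        System.out.println(destination);

        Collections.shuffle(numbers); // order is now random
        System.out.println(numbers.size());
    }
}
```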

- List, Set, Map difference

Collections in Java fall into three categories: List, Set, and Map, all in the java.util package. List, Set, and Map are all interfaces, and each has its own implementation classes. The implementation classes of List mainly include ArrayList and LinkedList, those of Set mainly include HashSet and TreeSet, and those of Map mainly include HashMap and TreeMap.

List
Storage: ordered and repeatable
Access: for, foreach, iterator
The objects in a List are ordered by index position and may contain duplicates. Objects can be retrieved by their index position in the collection, for example with list.get(i).

Set
Storage: unordered, no duplicates
Access: foreach, iterator
The objects in a Set are not ordered in any specific way, and there are no duplicate objects. Some implementation classes can nevertheless keep the objects sorted, and the ordering can be customized by implementing the java.util.Comparator interface.

Map
Storage: one-to-one "key=value" mappings; keys are unordered and cannot be repeated, while values can be repeated.
Access: take the keys of the map as a Set with keySet(), iterate over it, and use map.get(key) to get each value. Alternatively, iterate over the Entry objects returned by entrySet().
Each element in a Map consists of a key object and a value object that appear as a pair.
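
The different handling of duplicates can be seen in a few lines. This is only a sketch; the class DuplicateHandlingDemo and its method are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateHandlingDemo {
    // Adds every word to a List, a Set, and a Map (word -> length) and reports the sizes.
    public static int[] sizesAfterAdding(String... words) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();
        Map<String, Integer> map = new HashMap<>();
        for (String word : words) {
            list.add(word);               // keeps duplicates
            set.add(word);                // ignores duplicates
            map.put(word, word.length()); // overwrites the value of an existing key
        }
        return new int[] {list.size(), set.size(), map.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterAdding("a", "b", "a");
        System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]); // 3 2 2
    }
}
```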

- The difference between Collection and Collections

Collection is the root interface of the collection classes; the interfaces that extend it mainly include List and Set.
Collections is a utility class for collections that provides static methods for operations such as sorting, searching, and creating thread-safe (synchronized) wrappers.

- Tell the difference between ArrayList, LinkedList, and Vector

ArrayList is the most commonly used List implementation class. It is implemented internally with an array, which allows fast random access to elements. The drawback of an array is that there can be no gaps between elements: when the array is too small, a larger one has to be allocated and the existing data copied into it, and when elements are inserted into or deleted from the middle of an ArrayList, the following elements have to be shifted, which is relatively expensive. It is therefore well suited to random lookup and traversal, and less suited to insertion and deletion. On expansion the capacity grows to 1.5 times the original.
Vector, like ArrayList, is implemented with an array, is ordered, and allows elements to be accessed directly by index. The difference is that it is synchronized: only one thread can write to a Vector at a time, which avoids the inconsistencies caused by concurrent writes. Synchronization is costly, however, so accessing a Vector is slower than accessing an ArrayList. On expansion the capacity doubles by default.
LinkedList stores its data in a linked list, which makes it well suited to dynamic insertion and deletion, while random access by index is relatively slow. It also provides methods that are not defined in the List interface for operating on the head and tail elements, so it can be used as a stack, a queue, or a double-ended queue.
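
The head and tail operations of LinkedList come from the Deque interface. The following sketch (the class LinkedListDequeDemo and the reverse helper are made up for illustration) uses it first as a stack and then as a queue.

```java
import java.util.Deque;
import java.util.LinkedList;

public class LinkedListDequeDemo {
    // Pushes the characters onto a stack and pops them, which reverses the text.
    public static String reverse(String text) {
        Deque<Character> stack = new LinkedList<>();
        for (char c : text.toCharArray()) {
            stack.push(c);                // adds at the head
        }
        StringBuilder reversed = new StringBuilder();
        while (!stack.isEmpty()) {
            reversed.append(stack.pop()); // removes from the head
        }
        return reversed.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc"));

        LinkedList<String> queue = new LinkedList<>();
        queue.offer("first");                 // adds at the tail
        queue.offer("second");
        queue.addFirst("urgent");             // head operation not defined in List
        System.out.println(queue.poll());     // takes from the head
        System.out.println(queue.peekLast()); // looks at the tail
    }
}
```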

- The difference between HashMap and Hashtable

  • HashMap allows null keys and null values; Hashtable allows neither;

  • HashMap is not thread-safe; Hashtable is thread-safe.
    In general, HashMap is used.
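
The null-key difference is easy to check directly. In this sketch the class NullKeyDemo and the helper acceptsNullKey are illustrative names; Hashtable.put is documented to throw NullPointerException for a null key or value.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    // Returns true if the map accepts a null key, false if it throws NullPointerException.
    public static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<String, String>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<String, String>())); // false
    }
}
```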

- The elements in a Set cannot be repeated. What is used to decide whether an element is a duplicate, == or equals()? What's the difference?

A Set decides whether an element is a duplicate with equals(), not with ==. A HashSet first uses hashCode() to find the bucket and then equals() to compare, and a TreeSet uses compareTo() or a Comparator.
The equals method (defined in Object and overridden by classes such as String) is used to check whether two objects are equal, that is, whether their contents are equal.
== behaves differently for references and for primitive data types:
when comparing primitive data types, the result is true if the two values are equal;
when comparing references, the result is true if both references point to the same object in memory.
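
A short sketch of the difference, using a made-up class SetDuplicateDemo: two distinct String objects with the same contents are different under == but equal under equals(), so a HashSet keeps only one of them.

```java
import java.util.HashSet;
import java.util.Set;

public class SetDuplicateDemo {
    // Counts the distinct elements the way a HashSet sees them (hashCode() then equals()).
    public static int distinctCount(String... values) {
        Set<String> set = new HashSet<>();
        for (String value : values) {
            set.add(value); // returns false and changes nothing for a duplicate
        }
        return set.size();
    }

    public static void main(String[] args) {
        String a = new String("java");
        String b = new String("java");
        System.out.println(a == b);       // false: two distinct objects
        System.out.println(a.equals(b));  // true: same contents
        System.out.println(distinctCount(a, b));
    }
}
```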

- How can objects be sorted, and in how many ways?

Put the objects into a List and call Collections.sort(); the class must implement the Comparable interface, or a Comparator must be passed to sort(). Alternatively, put the objects into a TreeSet, which keeps its elements sorted.
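
The approaches can be sketched side by side. The Person class and the method names below are hypothetical and exist only for this example; note that a TreeSet also drops elements that compare as equal, which a sorted List does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SortingDemo {
    // Hypothetical class for illustration; its natural ordering is by age.
    public static class Person implements Comparable<Person> {
        final String name;
        final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int compareTo(Person other) {
            return Integer.compare(age, other.age);
        }
    }

    // Way 1: Collections.sort with the natural ordering (Comparable).
    public static String namesByAge(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        Collections.sort(copy);
        return names(copy);
    }

    // Way 2: Collections.sort with an explicit Comparator.
    public static String namesAlphabetically(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        Collections.sort(copy, new Comparator<Person>() {
            @Override
            public int compare(Person x, Person y) {
                return x.name.compareTo(y.name);
            }
        });
        return names(copy);
    }

    // Way 3: a TreeSet keeps its elements sorted by their natural ordering.
    public static String namesViaTreeSet(List<Person> people) {
        return names(new TreeSet<Person>(people));
    }

    private static String names(Iterable<Person> people) {
        StringBuilder sb = new StringBuilder();
        for (Person p : people) {
            sb.append(p.name).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Carol", 30));
        people.add(new Person("Alice", 41));
        people.add(new Person("Bob", 25));
        System.out.println(namesByAge(people));
        System.out.println(namesAlphabetically(people));
        System.out.println(namesViaTreeSet(people));
    }
}
```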

- Remove duplicate elements in the Vector collection

Use the Vector.contains() method to check whether each element is already in a new collection, and add it only if it is not. Each contains() call scans the whole collection, so this is only suitable for small amounts of data.
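
A sketch of this approach follows, together with one possible alternative that is not from the original text: passing the elements through a LinkedHashSet, which removes duplicates while keeping first-seen order. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Vector;

public class VectorDedupDemo {
    // The approach from the text: contains() scans the result each time, so it is O(n^2).
    public static <T> Vector<T> dedupWithContains(Vector<T> source) {
        Vector<T> result = new Vector<>();
        for (T item : source) {
            if (!result.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Alternative (an assumption, not from the text): a LinkedHashSet drops
    // duplicates and keeps first-seen order.
    public static <T> Vector<T> dedupWithSet(Vector<T> source) {
        return new Vector<T>(new LinkedHashSet<T>(source));
    }

    public static void main(String[] args) {
        Vector<String> names = new Vector<>(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println(dedupWithContains(names));
        System.out.println(dedupWithSet(names));
    }
}
```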

- List conversion

// Convert a List into a comma-separated string
List<String> cities = Arrays.asList("Milan", "London", "New York");
String str = String.join(",", cities);
// Convert a comma-separated string back into a List
List<String> cityList = Arrays.asList(str.split(","));

+ HashMap principle and other related issues


- HashMap data structure

An Entry[] array plus linked lists: each array slot (bucket) holds the head of a singly linked list of entries.

- Override the hashCode and equals methods

The Object class comes with hashCode and equals methods, so all its subclasses have these two methods. The default implementations, however, are often unsuitable. The hashCode method should distribute the keys evenly over the hash table and so avoid hash collisions as far as possible. The equals method should not simply compare the memory addresses of the objects; it should be implemented according to what equality means for the class. Fortunately, if String is used as the key type, there is no need to worry about these two methods, because the String class already overrides both of them.

Java specifies the following for the equals method and the hashCode method:

  1. If two objects are equal (the equals method returns true), their hashCode values must be the same; two objects for which equals() returns false may still have equal hashCode() values (a hash collision).
  2. If two objects have the same hashCode, they are not necessarily equal. If their hashCode() values differ, however, equals() must return false.
  3. When the equals method is overridden, the hashCode method usually has to be overridden as well to maintain the general contract of hashCode, which states that equal objects must have equal hash codes.
  4. == in Java (for reference types) compares the addresses of two objects in the JVM.

/** Native method (JNI), implemented in the JVM's underlying language */
public native int hashCode();

/** By default the same as ==: compares the object references directly */
public boolean equals(Object obj) {
    return (this == obj);
}

// The String class overrides equals to compare the string contents; here is the source:
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String) anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            // Compare the characters one by one
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

An overridden equals must satisfy several conditions:

  • Reflexivity: x.equals(x) shall return true for any non-null reference value x.

  • Symmetry: For any non-null reference values x and y, x.equals(y) shall return true if and only if y.equals(x) returns true.

  • Transitivity: For any non-null reference values x, y, and z, if x.equals(y) returns true, and y.equals(z) returns true, then x.equals(z) shall return true.

  • Consistency: Multiple calls to x.equals(y) always return true or always return false for any non-null reference values x and y, provided no information used in equals comparisons on the objects has been modified.

  • Non-nullity: x.equals(null) shall return false for any non-null reference value x.
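
One common way to satisfy these conditions is sketched below. The Point class is a hypothetical value class invented for the example, and Objects.hash is just one convenient way to derive a hash code from the same fields that equals compares.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EqualsHashCodeDemo {
    // Hypothetical immutable value class used only for illustration.
    public static final class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;               // reflexivity
            }
            if (!(obj instanceof Point)) {
                return false;              // also covers obj == null
            }
            Point other = (Point) obj;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);     // equal points produce equal hash codes
        }
    }

    public static void main(String[] args) {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "start");
        // A different but equal Point instance finds the same entry.
        System.out.println(labels.get(new Point(1, 2)));
    }
}
```

If hashCode were left at the Object default, the lookup in main would most likely miss, because the two Point instances would usually land in different buckets.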

- HashMap put principle

All elements are managed through a hash table. When put is called to store a value, HashMap first calls the hashCode method of the key K to obtain its hash code and uses it to quickly locate a storage position, which can be called bucketIndex. Different keys can produce the same bucketIndex; this is called a collision. When a collision occurs, the elements already stored at bucketIndex are fetched and compared with the new key using equals, so equals is the method that runs when hash codes collide. HashMap thus decides whether K already exists through hashCode and equals:

  1. If there is no element at the bucketIndex position, the new key-value pair <K, V> is stored there.
  2. If an element with an equal key already exists there, the old V value is replaced with the new V value and the old V value is returned.
  3. If there are elements at the bucketIndex position but none has an equal key (same bucket, but equals returns false), the new element is also placed at this position. In JDK 7 it is added to the head of the linked list, with its next reference pointing to the original element; JDK 8 appends it to the tail instead.
// Source code
public V put(K key, V value) {
    // Handle a null key; HashMap allows null keys and values
    if (key == null)
        return putForNullKey(value);
    // Get the hash code of the key
    int hash = hash(key);
    // Compute bucketIndex from the hash code
    int i = indexFor(hash, table.length);
    // Walk the singly linked list at bucketIndex to check whether the key already exists
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // Same hash code and same (or equal) key object
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Replace the old value with the new one and return the old value
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    // The key does not exist yet: add a new entry
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

- HashMap get principle

  1. When each bucket of the HashMap holds only a single Entry, that is, when no Entry chain has formed through next references, the HashMap performs best: to retrieve the value for a key, the system only needs to compute the key's hashCode() return value, derive the index of the key in the table array from it, take the Entry at that index, and return the value corresponding to the key.

  2. In the case of a hash conflict, a single bucket stores not one Entry but an Entry chain, and the system must traverse the entries in order until it finds the one it is looking for.

// Source code
public V get(Object key) {
    // If the key is null, call getForNullKey to fetch the corresponding value
    if (key == null)
        return getForNullKey();
    // Compute the hash from the key's hashCode value
    int hash = hash(key.hashCode());
    // Start at the entry stored at the computed index of the table array
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         // Move on to the next Entry in the chain
         e = e.next) {
        Object k;
        // If this Entry's key matches the key being searched for
        if (e.hash == hash && ((k = e.key) == key
            || key.equals(k)))
            return e.value;
    }
    return null;
}
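
To make the put and get steps concrete, here is a deliberately simplified chained hash table. It is a teaching sketch, not the JDK implementation: the class SimpleHashMap and its Node type are invented, the table has a fixed size of 16, and resizing, modCount, and the JDK's extra hash mixing are left out.

```java
public class SimpleHashMap<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed power-of-two table; resizing is left out to keep the sketch short.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    // Works because table.length is a power of two.
    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    private static boolean sameKey(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    // Returns the previous value for the key, or null if there was none.
    public V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && sameKey(key, e.key)) {
                V old = e.value;
                e.value = value;   // existing key: replace the value
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // new key: insert at the head
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && sameKey(key, e.key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleHashMap<String, Integer> map = new SimpleHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // "Aa" and "BB" have the same hashCode, so they share a bucket
        System.out.println(map.get("Aa") + " " + map.get("BB"));
        System.out.println(map.put("Aa", 10)); // returns the old value
    }
}
```

The strings "Aa" and "BB" are used because they have identical hash codes, which forces the collision path through equals.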

- load factor

When a HashMap is created it has a default load factor of 0.75, which is a trade-off between time and space costs:

  • Increasing the load factor reduces the memory occupied by the hash table (that is, the Entry array) but increases the time cost of lookups, and lookup is the most frequent operation (both get() and put() perform a lookup). If the program cares more about space and memory is tight, the load factor can be increased appropriately.
  • Decreasing the load factor improves lookup performance but increases the memory occupied by the hash table. If the program cares more about time and memory is plentiful, the load factor can be decreased appropriately.

If you know from the beginning that the HashMap will hold many key-value pairs, use a larger initial capacity when creating it. If the number of entries never exceeds the threshold (capacity * load factor), the HashMap never needs to call resize() to reallocate the table array, which gives better performance. Setting the initial capacity too high wastes space, however (the system has to create an Entry array of length capacity), so the initial capacity should be chosen with care.
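
The arithmetic behind the threshold can be sketched as follows. The helper names threshold and initialCapacityFor are made up for the example; the only JDK API used is the HashMap(int initialCapacity, float loadFactor) constructor.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapSizingDemo {
    // Number of entries a table can hold before it is resized.
    public static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Smallest initial capacity that keeps expectedEntries within capacity * loadFactor.
    public static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));            // default table: 12 entries
        int capacity = initialCapacityFor(1000, 0.75f);
        System.out.println(capacity);
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<String, Integer> presized = new HashMap<>(capacity, 0.75f);
        presized.put("example", 1);
        System.out.println(presized.size());
    }
}
```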

- What happens if the size of the HashMap exceeds the capacity defined by the load factor?

The default load factor is 0.75. That is, when the number of entries exceeds 75% of the number of buckets, the map grows, like other collection classes such as ArrayList: it creates a bucket array twice the size of the original and moves the existing entries into the new bucket array. This process is called rehashing because the hash is used again to find each entry's new bucket position.

- Do you understand what can go wrong when resizing a HashMap?

With multiple threads, a race condition may occur.
When a HashMap is resized there is indeed a race condition: if two threads both find that the HashMap needs to be resized, they will try to resize it at the same time. During resizing (in JDK 7 and earlier), the order of the elements stored in a linked list is reversed, because when moving entries to a new bucket position HashMap puts them at the head of the linked list rather than at the tail, to avoid traversing to the tail. If the race condition occurs, the list can end up with a cycle, which leads to an infinite loop. At this point you can ask the interviewer why anyone would use HashMap in a multi-threaded environment in the first place.

When dealing with collisions, JDK 7 uses a linked list, while JDK 8 converts a bucket's linked list into a red-black tree once it grows long enough (8 or more entries, provided the table has at least 64 buckets), which greatly improves lookup efficiency in long buckets.
So an interviewer may ask how to deal with a list that is too long. The usual answer is that the algorithm generating the hash codes is not good enough, which leads to the follow-up question of what a good hash code algorithm looks like; if you mention red-black trees, expect questions about red-black trees. This one topic can therefore fill a large part of an interview.

- Why are String and wrapper classes like Integer suitable as keys?

String and wrapper classes such as Integer are well suited as HashMap keys, and String is the most commonly used. String is immutable and final, and it overrides the equals() and hashCode() methods; the wrapper classes share these properties. Immutability is necessary because the key's hashCode() must not change after it is computed: if a key returns a different hash code when it is put in than when it is looked up, the object cannot be found in the HashMap. Immutability has other advantages as well, such as thread safety. If declaring a field final is enough to guarantee that the hashCode stays constant, do so. Because equals() and hashCode() are used to retrieve the object, it is very important that the key class overrides both methods correctly. If unequal objects return different hash codes, collisions become less likely, which improves the performance of HashMap.
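
The following sketch shows what goes wrong with a mutable key. MutableKey is a hypothetical class written for the example; its hash code depends on a field that is changed after the key has been inserted.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    // Hypothetical mutable key whose hash code depends on a field that can change.
    public static class MutableKey {
        int id;

        public MutableKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof MutableKey && ((MutableKey) obj).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Puts a key, mutates it, then reports whether the map can still find it.
    public static boolean foundAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2; // the hash code changes while the entry keeps its old stored hash
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false: the entry can no longer be found
    }
}
```

The entry is still inside the map, but a lookup with the mutated key compares against the hash recorded at insertion time and misses.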

- Can we use custom objects as keys?

This is an extension of the previous question. Of course, you may use any object as a key, as long as it obeys the definition rules of the equals() and hashCode() methods, and it will not change after the object is inserted into the Map. If the custom object is immutable, then it already qualifies as a key because it cannot be changed after it is created.

- Can we use ConcurrentHashMap instead of Hashtable?

This is another very popular interview question, because more and more people use ConcurrentHashMap. Hashtable is synchronized, but ConcurrentHashMap performs better under concurrency because it locks only part of the map, according to its concurrency level. ConcurrentHashMap can certainly replace Hashtable, but Hashtable provides stronger thread safety.

Underlying implementation: array + linked list

Expansion (JDK 7): expansion happens within a segment (it is triggered when the elements in a segment exceed 75% of the length of that segment's Entry array; the rest of the Map is not expanded). Whether expansion is required is checked before insertion, which avoids unnecessary expansion.

Thread safety: Hashtable's synchronized applies to the entire hash table, so each operation locks the whole table and gives one thread exclusive access. ConcurrentHashMap allows multiple modification operations to proceed concurrently; the key is lock separation (lock striping), in which different parts of the table are guarded by different locks.
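
As a small usage sketch (assuming Java 8 or later for merge and lambdas; the class ConcurrentCounterDemo and the key "hits" are invented for the example), several threads can update the same ConcurrentHashMap entry without external locking, because merge applies each update atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCounterDemo {
    // Several threads increment the same key; merge() performs each update atomically.
    public static int countWithThreads(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker before reading the result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000));
    }
}
```

Doing the same with a plain HashMap and a get-then-put increment could lose updates, since the two steps are not atomic.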

- Initialize HashMap with anonymous inner class

Map<String, String> map = new HashMap<String, String>() {
    // Instance initializer block of the anonymous subclass
    {
        put("name", "张三");
        put("age", "24");
        put("sex", "man");
    }
};
System.out.println(map.get("name"));

Origin blog.csdn.net/u011886447/article/details/104890543