Java Core Technology Interview Essentials (Lecture 8) | What is the difference between Vector, ArrayList and LinkedList?

In day-to-day work, being able to manage and manipulate data efficiently is very important. Programming languages differ in the data structures they support out of the box. In C, the first language I learned, I had to implement many basic data structures myself, which made managing and operating on data cumbersome. Java is much more convenient by comparison: for common scenarios it provides a powerful collection framework, which greatly improves developer productivity.

Today's question is about the collection framework: what are the differences between Vector, ArrayList and LinkedList?


Typical answer

All three implement the List interface of the collection framework, the so-called ordered collection, so their functionality is broadly similar. For example, they all support positional access, insertion and deletion, and all provide iterators for traversing their contents. Because of differences in their design, however, they differ considerably in behavior, performance and thread safety.

Vector is a thread-safe dynamic array provided since the early days of Java. If thread safety is not required, it is not recommended, since synchronization carries extra overhead. Vector stores its data in an internal object array whose capacity grows automatically as needed: when the array is full, a new array is created and the existing data is copied into it.

ArrayList is the more widely used dynamic array implementation. It is not thread-safe, so it avoids the synchronization overhead and generally performs better. Like Vector, ArrayList adjusts its capacity as needed, but the growth logic differs: Vector doubles its capacity when it expands, while ArrayList grows by 50%.
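The two growth rules can be checked with a little arithmetic. The sketch below is only an illustration: `GrowthDemo` and its helper names are made up for this example, the ArrayList helper merely reproduces the "grow by 50%" rule (ArrayList does not expose its capacity), and the exact growth policy is an implementation detail of the JDK.

```java
import java.util.Vector;

class GrowthDemo {
    // Reproduces the "grow by 50%" rule described above (illustrative arithmetic only).
    static int nextArrayListCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Vector exposes capacity(), so its growth can be observed directly.
    static int vectorCapacityAfter(int initialCapacity, int elements) {
        Vector<Integer> v = new Vector<>(initialCapacity);
        for (int i = 0; i < elements; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        System.out.println(nextArrayListCapacity(10)); // 15
        System.out.println(nextArrayListCapacity(15)); // 22
        System.out.println(vectorCapacityAfter(4, 5)); // 8: the full array of 4 was doubled
    }
}
```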

LinkedList, as the name suggests, is the doubly linked list provided by Java, so it does not need to adjust its capacity the way the two above do. It is not thread-safe either.

Key point analysis

This has been a classic interview question for as long as I have been working with Java. My answer above covers the basic design and implementation of the three.

In general, you can also supplement the answer with the scenarios each container type suits:

  • Vector and ArrayList are dynamic arrays whose elements are stored sequentially in an array, so they are well suited to random access. Except for insertion and deletion at the tail, however, modification tends to perform poorly. For example, inserting an element in the middle requires shifting all subsequent elements.
  • LinkedList is much more efficient at inserting and deleting nodes, but random access is slower than with dynamic arrays.

Therefore, in application development, if you can estimate in advance whether the workload is dominated by insertion and deletion or by random access, you can choose accordingly. This is also the most common angle in interviews: given a scenario, choose a suitable data structure. Make sure you are clear about this typical choice.
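One way to see this distinction in the API itself is the `RandomAccess` marker interface, which list implementations use to advertise fast positional access. The following is a minimal sketch; the class and helper names are chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;
import java.util.Vector;

class AccessPatternDemo {
    // True when the list advertises fast (generally constant-time) positional access.
    static boolean supportsFastRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    // Inserts at the head: an array-backed list shifts every element,
    // while a linked list only relinks neighboring nodes.
    static List<String> insertAtHead(List<String> list, String value) {
        list.add(0, value);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(supportsFastRandomAccess(new ArrayList<String>()));  // true
        System.out.println(supportsFastRandomAccess(new Vector<String>()));     // true
        System.out.println(supportsFastRandomAccess(new LinkedList<String>())); // false

        List<String> array = insertAtHead(new ArrayList<>(Arrays.asList("b", "c")), "a");
        List<String> linked = insertAtHead(new LinkedList<>(Arrays.asList("b", "c")), "a");
        System.out.println(array.equals(linked)); // true: same contents, different cost
    }
}
```

Both lists end up with the same contents; what differs is the work done internally, which is why the expected access pattern should drive the choice.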

Looking at the Java collection framework as a whole, I think there are several aspects to master:

  • Have at least an overall picture of the design structure of the Java collection framework.
  • For the main container types Java provides (collections and Map), understand or master the corresponding data structures and algorithms, and think about specific technology choices.
  • Extend the question to areas such as performance and concurrency.
  • Follow the evolution and development of the collection framework.

Since this is a Java column, I will try to stay focused on Java; otherwise merely listing the data structures involved in the collections would take up a great deal of space. That does not mean they are unimportant: data structures and algorithms are fundamental skills and are frequently tested, and some companies are well known (even "notorious") for probing these areas. Taking sorting algorithms as a typical example, you need to be familiar with at least:

  • Internal sorting: master at least the basic algorithms such as merge sort, exchange sorts (bubble sort, quick sort), selection sort and insertion sort.
  • External sorting: know how memory and external storage are combined to process very large data sets, or at least understand the process and the ideas.

Examining algorithms is not just about whether you can write a simple implementation. Interviewers like to dig deeper: which sorts are unstable (quick sort, heap sort), and what stability actually means; what the best and worst cases of each sort are for different data sets; how to optimize further from a particular angle (for example space usage: if the business scenario requires minimal auxiliary space, heap sort is better than merge sort from this angle); and so on. Moving from basic understanding to further thinking, the interviewer usually observes how the candidate reasons and communicates when working through a problem.
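To make "stable" concrete: a stable sort keeps elements that compare as equal in their original relative order. The sketch below relies on the documented stability of `List.sort`; the `"name:score"` entry format and the class name are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class StableSortDemo {
    // Sorts "name:score" entries by score only; the name plays no part in the comparison.
    static List<String> sortByScore(List<String> entries) {
        List<String> copy = new ArrayList<>(entries);
        copy.sort(Comparator.comparingInt((String e) -> Integer.parseInt(e.split(":")[1])));
        return copy;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("amy:2", "bob:1", "cat:2", "dan:1");
        // bob stays ahead of dan, and amy ahead of cat, exactly as in the input.
        System.out.println(sortByScore(input)); // [bob:1, dan:1, amy:2, cat:2]
    }
}
```

With an unstable algorithm, `dan:1` could legitimately come out before `bob:1`, which matters whenever a list is sorted by one key after having been ordered by another.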

The above is just an example. I recommend studying related books such as "Introduction to Algorithms" and "Programming Pearls", or related tutorials. For specific fields, such as recommendation systems, it is best to consult experts in the field. Purely from an interview perspective, many friends recommend algorithm websites such as LeetCode for review and preparation, but frankly I have not worked through those problem sets myself. This is a matter of opinion. When hiring, I prefer to examine what the candidate is best at, so as not to hire someone who is merely good at interviews.

Knowledge expansion

Let's first understand the overall design of the collection framework. To give an intuitive impression, I drew a brief class diagram. Note that, to avoid confusion, I did not include the thread-safe containers under java.util.concurrent, nor did I list the Map containers. Although we usually regard Map as conceptually part of the collection framework, it is not a true collection (Collection).

Therefore, today I focus mainly on the collection framework in the narrow sense; the rest will be covered later in the column.

Looking at the Java collection framework, the Collection interface is the root of all collections, and it is extended into three main kinds of collection:

  • List, the ordered collection we discussed most above, which provides convenient access, insertion and deletion operations.
  • Set, which does not allow duplicate elements. This is the most obvious difference from List: no two elements exist for which equals returns true. In daily development there are many occasions where element uniqueness must be guaranteed.
  • Queue/Deque, Java's implementation of the standard queue structures. In addition to the basic collection functions, they support specific behaviors such as first-in-first-out (FIFO, First-In-First-Out) or last-in-first-out (LIFO, Last-In-First-Out). BlockingQueue is not included here because it is mainly used in concurrent programming and therefore lives in the concurrent package.

The common logic of each kind of collection is abstracted into a corresponding abstract class; for example, AbstractList concentrates the shared parts of the various List operations. These collections are not completely isolated from one another: LinkedList, for example, is both a List and a Deque.
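As a small illustration of that last point, the same LinkedList class can be driven through the List view (positional access) and through the Deque view (FIFO or LIFO). The class and method names below are for illustration only.

```java
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

class ListAndDequeDemo {
    // FIFO: elements leave in the order they arrived.
    static String fifoOrder() {
        Deque<String> queue = new LinkedList<>();
        queue.offerLast("a");
        queue.offerLast("b");
        queue.offerLast("c");
        return queue.pollFirst() + queue.pollFirst() + queue.pollFirst();
    }

    // LIFO: the same class used as a stack.
    static String lifoOrder() {
        Deque<String> stack = new LinkedList<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        return stack.pop() + stack.pop() + stack.pop();
    }

    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<>();
        list.add("x");
        System.out.println(list instanceof List);  // true
        System.out.println(list instanceof Deque); // true
        System.out.println(list.get(0));           // x (List-style positional access)
        System.out.println(fifoOrder());           // abc
        System.out.println(lifoOrder());           // cba
    }
}
```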

If you read more of the source code, you will find that TreeSet is by default implemented on top of TreeMap. The Java class library creates a dummy object, PRESENT, to serve as the value, and every inserted element is actually put into the TreeMap as a key. In the same way, HashSet is implemented on top of HashMap. So they turn out to be Map classes in disguise!
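The idea can be sketched in a few lines. `MapBackedSet` below is a hypothetical, heavily simplified class written for this article, not the JDK source; it only shows how a shared dummy value turns a Map into a Set.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSet<E> {
    // Shared placeholder stored as the value for every key.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // put() returns the previous value, so null means the element is new.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    // remove() returns the old value, which is PRESENT only if the key existed.
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("a")); // false: duplicates are rejected by the map
        System.out.println(set.size());   // 1
    }
}
```

Swapping the HashMap for a TreeMap in this sketch would give sorted iteration, which is the same relationship the text describes between TreeSet and TreeMap.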

As mentioned earlier, for the various concrete collection implementations we should at least understand their basic characteristics and typical usage scenarios. Take several implementations of Set as examples:

  • TreeSet supports access in natural order, but operations such as add, remove and contains are relatively less efficient (log(n) time).
  • HashSet uses a hash algorithm. Ideally, if the hash distributes elements well, it provides constant-time add, remove and contains operations, but it does not guarantee any order.
  • LinkedHashSet internally maintains a doubly linked list that records insertion order, so it can be traversed in insertion order. It also guarantees constant-time add, remove and contains operations, though these are slightly slower than in HashSet because of the overhead of maintaining the linked list.
  • When traversing elements, the performance of HashSet is affected by its own capacity, so unless it is really necessary, do not initialize the backing HashMap with too large a capacity. For LinkedHashSet, thanks to its internal linked list, traversal performance depends only on the number of elements.
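The ordering differences are easy to observe. The following is a minimal sketch with illustrative names; note that no particular order is asserted for HashSet, because none is guaranteed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // Adds the same elements (with one duplicate) and returns them in iteration order.
    static List<String> iterationOrder(Set<String> target) {
        target.addAll(Arrays.asList("pear", "apple", "fig", "apple"));
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder(new TreeSet<>()));        // [apple, fig, pear]
        System.out.println(iterationOrder(new LinkedHashSet<>()));  // [pear, apple, fig]
        System.out.println(iterationOrder(new HashSet<>()).size()); // 3, order unspecified
    }
}
```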

Apart from Vector, the collection classes introduced today are not thread-safe. The thread-safe containers in java.util.concurrent will be introduced later in the column. This does not mean, however, that these collections cannot be used in concurrent programming at all. The Collections utility class provides a series of synchronized wrapper methods, such as:

static <T> List<T> synchronizedList(List<T> list)

We can use similar methods to achieve basic thread-safe collections:

List<String> list = Collections.synchronizedList(new ArrayList<>());

The implementation essentially wraps every basic method, such as get, set and add, in synchronized. It is simple and crude, but also very practical. Note that the thread-safe collections created by these methods still follow fail-fast behavior during iteration: when an unexpected concurrent modification occurs, ConcurrentModificationException is thrown as early as possible to avoid unpredictable behavior.
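The fail-fast behavior can be reproduced even in a single thread by modifying the list while iterating over it. The sketch below uses illustrative names; it also shows the pattern recommended by the `Collections.synchronizedList` documentation, which is to hold the list's lock manually for the whole traversal, since the wrapper only synchronizes individual method calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

class SynchronizedListDemo {
    // Structurally modifies the list while iterating; returns true if the iterator fails fast.
    static boolean failsFast() {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        try {
            for (Integer i : list) {
                list.remove(i); // remove(Object): a modification made outside the iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iteration spans many method calls, so the caller locks the wrapper for the whole loop.
    static int safeSum(List<Integer> syncList) {
        int sum = 0;
        synchronized (syncList) {
            for (int value : syncList) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(failsFast()); // true
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        System.out.println(safeSum(shared)); // 6
    }
}
```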

Another frequently examined topic is the default sorting algorithm provided by Java: which algorithm is used and what the design ideas are.

This question is something of a trap, because you have to distinguish between Arrays.sort() and Collections.sort() (which calls Arrays.sort() underneath), what the data type is, and how large the data set is (for very small data sets a complex sort is unnecessary, and Java goes straight to binary insertion sort).

  • For primitive data types, the so-called Dual-Pivot QuickSort is currently used, an improved quick sort algorithm. Earlier versions used a more traditional quick sort. You can read the source code for details.
  • For object data types, TimSort is currently used. It is also an optimized sorting algorithm, combining merge sort with binary insertion sort (binarySort). TimSort was not invented for Java. Simply put, its idea is to find the already sorted partitions in the data set (called runs) and then merge these partitions to produce the sorted result.

In addition, Java 8 introduced a parallel sorting algorithm (available directly through the parallelSort method) to make full use of the computing power of modern multi-core processors. The underlying implementation is based on the fork-join framework (which will be introduced in more detail later in the column). When the data set is relatively small, the difference is not noticeable, and parallel sorting may even be slower; but when the data set grows to tens of thousands or millions of elements, the improvement is very large, depending on the processor and the system environment.
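The three entry points mentioned above look like this in use. This is a minimal sketch with illustrative names; it checks only that the results agree and makes no claim about timing, which depends on the data and the machine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SortApiDemo {
    static int[] randomInts(int n, long seed) {
        Random random = new Random(seed);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = random.nextInt();
        }
        return data;
    }

    // Sequential and parallel sorts must produce the same result.
    static boolean parallelMatchesSequential(int n) {
        int[] a = randomInts(n, 42L);
        int[] b = a.clone();
        Arrays.sort(a);
        Arrays.parallelSort(b);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        int[] primitives = {5, 3, 9, 1};
        Arrays.sort(primitives); // primitive overload
        System.out.println(Arrays.toString(primitives)); // [1, 3, 5, 9]

        List<String> words = new ArrayList<>(Arrays.asList("pear", "fig", "apple"));
        Collections.sort(words); // object sort, natural order
        System.out.println(words); // [apple, fig, pear]

        System.out.println(parallelMatchesSequential(100_000)); // true
    }
}
```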

The sorting algorithms are still being improved. Recently, the author of the Dual-Pivot QuickSort implementation submitted a further improvement that has been researched for many years and is currently in the review and verification stage. According to the author's performance comparisons, the new version sorts random data 10% to 20% faster than the merge-sort-based implementation, and on data sets with other characteristics it is even several times faster. If you are interested, you can refer to the code and introduction here:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-January/051000.html

Java 8 added Lambda and Stream support to the platform, and the collection framework was extensively enhanced accordingly, for example with methods that create a stream or parallelStream from a collection, so functional-style code is very convenient to write.
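A minimal sketch of what this looks like in practice (the class and method names are chosen for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamDemo {
    // Keeps words longer than three characters and upper-cases them.
    static List<String> upperCaseLongWords(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // The same pipeline style, evaluated on a parallel stream.
    static int parallelSum(List<Integer> numbers) {
        return numbers.parallelStream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseLongWords(Arrays.asList("java", "is", "concise"))); // [JAVA, CONCISE]
        System.out.println(parallelSum(Arrays.asList(1, 2, 3, 4))); // 10
    }
}
```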

Reading the Java source code, you will find that the design and implementation of these APIs is quite distinctive: they are not implemented in abstract classes but as default methods in interfaces such as Collection! This is a language-level feature new in Java 8 that allows interfaces to provide default method implementations. In theory, most of the methods we used to implement in utility classes like Collections could be moved to the corresponding interfaces. I will go through the evolution of the Java language's basic object-oriented mechanisms in the object-oriented topic.
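The language feature itself can be shown with a tiny example. `Greeter` below is a hypothetical interface invented for this sketch; `removeIf` is a real default method declared on `Collection` (ArrayList supplies its own optimized override).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DefaultMethodDemo {
    // Hypothetical interface, used only to illustrate default methods.
    interface Greeter {
        String name();

        // Implementations inherit this body without having to override it.
        default String greet() {
            return "Hello, " + name();
        }
    }

    static String greetAs(String name) {
        Greeter greeter = () -> name; // only name() is abstract, so a lambda is enough
        return greeter.greet();
    }

    static List<Integer> dropEven(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        copy.removeIf(n -> n % 2 == 0); // declared as a default method on Collection
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(greetAs("Java"));                    // Hello, Java
        System.out.println(dropEven(Arrays.asList(1, 2, 3, 4))); // [1, 3]
    }
}
```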

In Java 9, the standard class library added a series of static factory methods, such as List.of() and Set.of(), which greatly reduce the amount of code needed to build small container instances. Practical experience in the industry shows that quite a few collection instances have very limited capacity and are never modified during their life cycle. With the original class library, however, we might have to write:

ArrayList<String> list = new ArrayList<>();
list.add("Hello");
list.add("World");

With the new static factory methods, one line of code is enough, and immutability is guaranteed.

List<String> simpleList = List.of("Hello", "world");

Furthermore, instances created through the various of() static factory methods also apply some of our so-called best practices. For example, they are immutable, which meets our thread-safety requirements, and because they never need to grow they are more compact in space.
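The immutability is enforced at run time, and the factory methods also reject null elements. This is a minimal sketch with illustrative names; it requires Java 9 or later.

```java
import java.util.List;

class ImmutableFactoryDemo {
    // Any attempt to modify the list is rejected.
    static boolean rejectsAdd() {
        List<String> list = List.of("Hello", "World");
        try {
            list.add("!");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Null elements are not allowed at construction time.
    static boolean rejectsNull() {
        try {
            List.of("Hello", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd());  // true
        System.out.println(rejectsNull()); // true
    }
}
```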

If we look at the source code of the of() methods, we find something particularly interesting: Java already supports variable-length parameters (varargs), yet the official class library still provides a series of overloads with fixed parameter counts, which looks rather inelegant. Why? The reason is performance. The JVM incurs noticeable extra overhead when handling varargs (note from reader Running Snail: Java handles varargs by creating a new array). If you need to implement performance-sensitive APIs, you can follow the same approach.
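The overload pattern can be imitated as follows. `ArityDemo.describe` is a hypothetical method written for this sketch; it only shows that the compiler prefers a matching fixed-arity overload and falls back to the varargs version, which receives its arguments wrapped in an array, when none matches.

```java
class ArityDemo {
    // Fixed-arity overloads: no array is created at the call site.
    static String describe(String a) {
        return "fixed-1";
    }

    static String describe(String a, String b) {
        return "fixed-2";
    }

    // Varargs fallback: the arguments arrive wrapped in a String[].
    static String describe(String... rest) {
        return "varargs-" + rest.length;
    }

    public static void main(String[] args) {
        System.out.println(describe("a"));           // fixed-1
        System.out.println(describe("a", "b"));      // fixed-2
        System.out.println(describe("a", "b", "c")); // varargs-3
    }
}
```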

Today I started with Vector, ArrayList and LinkedList, analyzed their differences in design and implementation and the scenarios each suits, then briefly summarized the collection framework and introduced its various improvements, from basic algorithms to API design and implementation. I hope this is helpful for your daily development and API design.

One exercise per lesson

Do you have a clear picture of what we discussed today? Here is a question to think about, starting from an application scenario: suppose you need to implement a task scheduling system for a cloud computing platform and want the tasks of VIP customers to be processed first. What data structure or standard collection type could you use? Furthermore, what data structure are most similar scenarios based on?


Other classic answers

The following is netizen Lei Pili's father's answer to the exercise:

For this question a priority queue naturally comes to mind, but VIPs may be further divided into levels, and VIPs of the same level should be treated equally. So besides ordering the queue directly by VIP level, the priority rule also has to make sure that customers of the same level are not blocked by a single customer submitting a large number of tasks. Data structures are indeed the foundation, although in a real version of this scenario the data to be scheduled would probably be kept in Redis.

The following answer comes from the netizen Sun Xiaogang:

Select the first one. Regarding read and write efficiency, I feel the statement is incomplete, or at least should not be put so absolutely.

1. Not every insertion or deletion allocates new memory. When no new allocation is needed, the operation is very efficient.

2. Deleting at the tail does not allocate new memory either; it simply removes the last object.

I used to accept the conventional description of ArrayList too: fast random access, poor insertion and deletion. Only after reading the source code did I realize it is not that absolute.
The direct consequence of this generalization is that people choose LinkedList in scenarios where ArrayList would actually be the better fit.

The following answer comes from the netizen behind the official account Technology Sleeplessly:

Vector, ArrayList and LinkedList are all linear data structures, but they differ in implementation and in the scenarios they suit.

1 Underlying implementation
ArrayList is implemented with an array; LinkedList is implemented with a doubly linked list; Vector is implemented with an array.

2 Read and write mechanism
ArrayList: when inserting an element would exceed the current capacity of the array, the array must be expanded. Expansion calls the underlying System.arraycopy() method to copy the array contents. Deleting elements does not shrink the array (call trimToSize() if you need to reduce the capacity). Looking up an element requires traversing the array and comparing non-null elements with equals().

LinkedList: inserting an element creates a new Entry object (named Node in newer JDK versions) and updates the references of the neighboring elements; searching requires traversing the list; deleting means traversing the list to find the element and then unlinking it.
Vector differs from ArrayList in this respect only in how capacity grows on insertion. By default, Vector creates an Object array of size 10 with capacityIncrement set to 0. When the array is too small for an insertion, it grows to the current size + capacityIncrement if capacityIncrement is greater than 0; if capacityIncrement <= 0, it grows to twice the current size.
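Because Vector exposes `capacity()`, the default size, the effect of `capacityIncrement` and `trimToSize()` can all be observed directly. This is a minimal sketch with illustrative class and helper names.

```java
import java.util.Vector;

class VectorCapacityDemo {
    // Adds the given number of elements and reports the resulting capacity.
    static int capacityAfterAdds(Vector<Integer> v, int adds) {
        for (int i = 0; i < adds; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    // Fills a default Vector with three elements, then shrinks the array to fit.
    static int trimmedCapacity() {
        Vector<Integer> v = new Vector<>();
        capacityAfterAdds(v, 3);
        v.trimToSize();
        return v.capacity();
    }

    public static void main(String[] args) {
        System.out.println(new Vector<Integer>().capacity());         // 10 by default
        System.out.println(capacityAfterAdds(new Vector<>(4, 3), 5)); // 7 = 4 + capacityIncrement
        System.out.println(trimmedCapacity());                        // 3
    }
}
```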

3 Read and write efficiency

Adding and removing elements in an ArrayList can change the memory allocated to the array, so insertion and deletion are slow, but retrieval is fast.

Because LinkedList stores its data in a linked list, adding and removing elements is faster, but retrieval is slower.

4 Thread safety

ArrayList and LinkedList are not thread-safe; Vector is a thread-safe counterpart of ArrayList implemented with synchronized.

Note: in single-threaded code, prefer ArrayList, since Vector pays a performance cost for synchronization. Even in a multi-threaded environment, we can use the synchronizedList(List list) method provided by the Collections class to obtain a thread-safe synchronized List.

Answer to the question

Use PriorityBlockingQueue or Disruptor to implement a scheduling system that uses task priority as its scheduling strategy.
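A minimal sketch of the PriorityBlockingQueue approach follows. The `Task` type, its `vipLevel` and `sequence` fields, and the scheduler class are all hypothetical names invented for this example. A sequence number breaks ties so that tasks of the same level keep their submission order, since a priority queue gives no ordering guarantee for equal priorities by itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class VipSchedulerDemo {
    // Hypothetical task type: a higher vipLevel means more urgent.
    static final class Task {
        final String name;
        final int vipLevel;
        final long sequence; // submission order, used to break ties

        Task(String name, int vipLevel, long sequence) {
            this.name = name;
            this.vipLevel = vipLevel;
            this.sequence = sequence;
        }
    }

    // Highest VIP level first; within a level, earliest submission first.
    private static final Comparator<Task> ORDER =
            Comparator.comparingInt((Task t) -> t.vipLevel)
                    .reversed()
                    .thenComparingLong(t -> t.sequence);

    private final AtomicLong counter = new AtomicLong();
    private final PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>(11, ORDER);

    void submit(String name, int vipLevel) {
        queue.put(new Task(name, vipLevel, counter.getAndIncrement()));
    }

    // Drains the queue in scheduling order; a real worker thread would block on take() instead.
    List<String> drainNames() {
        List<String> names = new ArrayList<>();
        Task task;
        while ((task = queue.poll()) != null) {
            names.add(task.name);
        }
        return names;
    }

    public static void main(String[] args) {
        VipSchedulerDemo scheduler = new VipSchedulerDemo();
        scheduler.submit("free-1", 0);
        scheduler.submit("vip-1", 2);
        scheduler.submit("free-2", 0);
        scheduler.submit("vip-2", 2);
        System.out.println(scheduler.drainNames()); // [vip-1, vip-2, free-1, free-2]
    }
}
```

This sketch ignores fairness between customers of the same level and durability of the queue, both of which the reader answers in this section raise as concerns for a real system.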

The following is netizen Joshua's answer to the exercise:

Since this is a Java topic, use PriorityBlockingQueue.
In a real scenario you would certainly also consider a highly available, durable solution.
In fact, I think the bank counter is a good reference. Three counters are open at the same time, each with its own queue, and the tellers are the consumer threads. One counter serves VIPs first and serves ordinary customers when no VIP is waiting. To achieve this, either use a dispatcher, or keep ordinary customers out of the VIP queue and let the VIP counter steal work from the other queues when it is idle.

The following is netizen linco_66's answer to the exercise:

Since the tasks to be processed have an ordering relationship, the first thing that comes to mind is a priority queue. Use PriorityQueue, give VIP users the highest priority and process them first. Borrowing from the scheduling algorithms of operating systems, we can also design various fair priority rules for the other users and combine them with PriorityQueue: ordering by arrival in the queue, ordering by the time the task needs (the shortest-job-first algorithm in operating systems), or ordering by highest response ratio ((waiting time + required service time) / required service time).
Most similar scenarios are built on queue-based data structures. In terms of actual tools, message queues (MQ) are a very direct example: you can use a message queue to smooth out peaks in user requests, so that the front end responds quickly while the back end does the actual processing.
A further optimization is to use the advantages of a distributed system and route VIP user requests to servers with more computing power, which also helps achieve high availability.

The following answer comes from netizen jackyz:

Collection: a kind of container, used to store, retrieve and manipulate objects.

1. Disadvantages of arrays
① The length of an array is fixed
② An array provides no method for querying the number of elements actually in use

2. Characteristics of collections
① The length of a collection is variable
② A collection can store objects of any type
③ A collection can store only objects

3. Collection framework
java.util.Collection: the root interface of the collection hierarchy
    |--- java.util.List: ordered, duplicates allowed.
        |--- ArrayList: stores elements in an array. Choose it when lookups dominate.
        |--- LinkedList: stores elements in a linked list. Choose it when insertions and deletions dominate.
        |--- Vector:
    |--- java.util.Set: unordered, no duplicates allowed.
        |--- HashSet: the typical implementation class of the Set interface.
            Whether an element already exists is decided by first comparing hashCode values; if the hashCode already exists, the contents are then compared with equals().
            If the hashCode value does not exist, the element is stored directly.

            Note: overrides of hashCode and equals must be consistent with each other!
            |--- LinkedHashSet: compared with HashSet, it additionally maintains the order of elements with a linked list. Traversal is more efficient than HashSet; insertion and deletion are less efficient than HashSet.
        |--- TreeSet: has its own sorting methods
            |-- Natural sorting (Comparable):
                ① The class of the objects added to the TreeSet must implement the Comparable interface
                ② Implement the compareTo(Object o) method
            |-- Custom sorting (Comparator)
                ① Create a class that implements the Comparator interface
                ② Implement the compare(Object o1, Object o2) method
                ③ Pass an instance of the implementing class to the TreeSet constructor
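The two TreeSet sorting modes can be sketched as follows. `Person` is a hypothetical element type invented for this example, and the generic forms `Comparable<Person>` and `Comparator<Person>` are used instead of the raw `Object` signatures listed above. Note that TreeSet treats elements that compare as equal as duplicates, so the sample data uses distinct ages and names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderingDemo {
    // Hypothetical element type whose natural order is by age.
    static final class Person implements Comparable<Person> {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int compareTo(Person other) {
            return Integer.compare(age, other.age);
        }
    }

    private static List<String> fillAndList(TreeSet<Person> set) {
        set.add(new Person("Cid", 35));
        set.add(new Person("Ann", 30));
        set.add(new Person("Bob", 25));
        List<String> names = new ArrayList<>();
        for (Person p : set) {
            names.add(p.name);
        }
        return names;
    }

    // Natural sorting: the set uses Person.compareTo (by age).
    static List<String> naturalOrderNames() {
        return fillAndList(new TreeSet<>());
    }

    // Custom sorting: the Comparator passed to the constructor takes over (by name).
    static List<String> byNameOrder() {
        Comparator<Person> byName = Comparator.comparing((Person p) -> p.name);
        return fillAndList(new TreeSet<>(byName));
    }

    public static void main(String[] args) {
        System.out.println(naturalOrderNames()); // [Bob, Ann, Cid]
        System.out.println(byNameOrder());       // [Ann, Bob, Cid]
    }
}
```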

 

 

 

 


Origin blog.csdn.net/qq_39331713/article/details/114093929