Java Containers: Learning Collections

I think collections are among the most important fundamentals in Java, and they must be mastered. In my internship and autumn-recruitment interviews, whenever the interview covered Java, collections always came up.

As a newcomer, what I care about most is how a technology is actually used at work. In other words: which Java collections are commonly used at work, and in what scenarios?

I have put together a PDF on getting started with Java collections and each commonly used class, so I won't paste that material here; refer to the PDF if you need it. I think you will find it useful.

List collection

The two most common classes under List are ArrayList and LinkedList.

At work, I reach for ArrayList without a second thought. I asked two colleagues, "Have you used LinkedList in your projects?" Both said no.

As we all know, ArrayList is backed by an array and LinkedList by a linked list. The textbook summary is that an array is fast to traverse, while a linked list is fast at adding and removing elements.

Why use ArrayList rather than LinkedList at work? The reasons are simple:

  • At work, traversal is needed far more often than insertion and deletion. Even when elements are added, they are usually appended at the end, and appending to an ArrayList is amortized O(1).

  • ArrayList insertions and deletions are not as slow as you might expect. They rely on array copying (copyOf() and the native System.arraycopy() underneath), which is well optimized, and modern CPUs handle block memory copies efficiently. For lists of ordinary size, ArrayList insertions and deletions are often faster than LinkedList's.

So in development, when I need a collection to hold elements, ArrayList is the first thing that comes to mind.
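A minimal sketch of that everyday usage: append at the end, then traverse. The names ArrayListDemo, collectIds and sum are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayListDemo {

    // Hypothetical helper: appends the ids 1..n at the end of an ArrayList.
    static List<Integer> collectIds(int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            ids.add(i); // append at the end: amortized O(1)
        }
        return ids;
    }

    // Traversal walks the backing array from front to back.
    static long sum(List<Integer> ids) {
        long total = 0;
        for (int id : ids) {
            total += id;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ids = collectIds(5);
        System.out.println(ids + " sum=" + sum(ids));
    }
}
```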

So where is LinkedList used? Mostly in algorithm problems. You can treat LinkedList as a first-in, first-out queue, since LinkedList itself implements the Queue interface.
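As a small sketch of using LinkedList through the Queue interface (the class name LinkedListQueueDemo and the drain helper are assumptions for illustration):

```java
import java.util.LinkedList;
import java.util.Queue;

class LinkedListQueueDemo {

    // Removes elements from the head until the queue is empty
    // and returns them concatenated in the order they came out.
    static String drain(Queue<String> queue) {
        StringBuilder order = new StringBuilder();
        String head;
        while ((head = queue.poll()) != null) { // poll() returns null when the queue is empty
            order.append(head);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        Queue<String> queue = new LinkedList<>();
        queue.offer("a"); // offer() adds at the tail
        queue.offer("b");
        queue.offer("c");
        System.out.println(drain(queue)); // elements leave in insertion order
    }
}
```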

If thread safety is a concern, take a look at CopyOnWriteArrayList. It is not used much in actual development, but its underlying idea, copy-on-write, is worth understanding; the same idea is used in Linux and in file systems.
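To make the copy-on-write idea concrete, here is a small sketch. It relies on the documented behavior that a CopyOnWriteArrayList iterator works on a snapshot of the array taken when iteration starts; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteDemo {

    // Appends to the list while iterating over it and returns
    // how many elements the iteration itself visited.
    static int iterateWhileAdding(List<String> list) {
        int seen = 0;
        for (String s : list) {
            list.add(s + "-copy"); // each write copies the backing array
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> list = new CopyOnWriteArrayList<>(List.of("a", "b"));
        int seen = iterateWhileAdding(list);
        // The loop only visits the snapshot; the new elements show up afterwards.
        System.out.println("seen=" + seen + ", list=" + list);
    }
}
```

The same loop on a plain ArrayList would throw ConcurrentModificationException. The price is that every write copies the whole array, which is why this class suits read-heavy data.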

Set collection

The three most common classes under Set are HashSet, TreeSet, and LinkedHashSet.

Both List and Set are collections. Generally speaking, if the elements of the collection must be unique, think of using a Set.

For example, suppose we want to send a batch of messages to users. To reduce the chance of sending the same content to a user twice in one run, we store each user's userId/phone in a Set.

Naturally, the upstream source of the batch should already ensure there are no duplicate userId/phone values; the Set is only a last line of defense against repeated sending.
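A rough sketch of that safeguard, assuming the userIds arrive as a list of strings. The names DedupDemo and sendOnce are hypothetical, and the actual sending is left as a comment.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupDemo {

    // Returns how many messages would be sent once duplicates are skipped.
    static int sendOnce(List<String> userIds) {
        Set<String> alreadySent = new HashSet<>();
        int sentCount = 0;
        for (String userId : userIds) {
            // add() returns false when the element is already in the set
            if (alreadySent.add(userId)) {
                sentCount++; // a real system would send the message here
            }
        }
        return sentCount;
    }

    public static void main(String[] args) {
        System.out.println(sendOnce(List.of("u1", "u2", "u1", "u3")));
    }
}
```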

The one we use most in development is HashSet. TreeSet is a sortable Set, but when we need ordered data it usually comes from the database already sorted (we more often just write order by id desc). In development we rarely care about the insertion order of elements either, so LinkedHashSet is generally not used.

If thread safety is a concern, consider CopyOnWriteArraySet, although it is rarely used (it is a thread-safe Set whose underlying implementation is CopyOnWriteArrayList).

TreeSet and LinkedHashSet are more likely to come up when practicing algorithm problems.
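The ordering difference between these Set classes can be sketched as follows (SetOrderDemo and its two helpers are illustrative names). HashSet is left out because it makes no ordering guarantee at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrderDemo {

    // TreeSet removes duplicates and keeps elements in natural (sorted) order.
    static List<Integer> sortedUnique(List<Integer> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // LinkedHashSet removes duplicates and keeps first-insertion order.
    static List<Integer> insertionOrderUnique(List<Integer> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(3, 1, 2, 3);
        System.out.println(sortedUnique(input));         // [1, 2, 3]
        System.out.println(insertionOrderUnique(input)); // [3, 1, 2]
    }
}
```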

Map collection

The Map collection also has three most common implementations: HashMap, LinkedHashMap, and TreeMap.

If thread safety is a concern, think of ConcurrentHashMap. You should also have some understanding of Hashtable, because it is asked about so often in interviews.

HashMap is also used a lot in actual development: whenever the data is key-value structured, we generally use HashMap. LinkedHashMap and TreeMap are not used much, for the same reasons as LinkedHashSet and TreeSet.

ConcurrentHashMap is also used a lot in actual development, often for local caching. We don't want to request the data over the network every time, so we cache it locally and listen for changes to the data. When the data changes, we update the corresponding value in the ConcurrentHashMap.
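A minimal sketch of such a local cache. This is an illustration, not the project code: LocalCache, the loader function (which stands in for the network request) and onChange (which stands in for whatever listens for data changes) are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LocalCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the remote call that fetches a value by key.
    private final Function<String, String> loader;

    LocalCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Loads the value on first access only; later calls are served locally.
    String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Called by the (assumed) change listener when the data changes.
    void onChange(String key, String newValue) {
        cache.put(key, newValue);
    }

    public static void main(String[] args) {
        LocalCache cache = new LocalCache(key -> "value-for-" + key);
        System.out.println(cache.get("user:1"));
        cache.onChange("user:1", "updated");
        System.out.println(cache.get("user:1"));
    }
}
```

computeIfAbsent runs the loader at most once per key even when several threads ask for the same key at the same time, which is the reason for preferring it over a separate containsKey check followed by put.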

Queue

I don't know whether you have studied the producer-consumer pattern. You may be asked to write it during an autumn-recruitment interview. The simplest approach is a blocking queue. The version below builds the blocking behavior by hand on a shared Vector with wait() and notifyAll():

Producer:

import java.util.Random;
import java.util.Vector;
import java.util.concurrent.atomic.AtomicInteger;

public class Producer implements Runnable {

    // true: the producer keeps running; false: the producer stops
    private volatile boolean isRunning = true;

    // Shared buffer
    private final Vector<Integer> sharedQueue;

    // Maximum number of elements allowed in the shared buffer
    private final int maxSize;

    // Source of produced data, shared by all producers
    private static final AtomicInteger count = new AtomicInteger();

    public Producer(Vector<Integer> sharedQueue, int maxSize) {
        this.sharedQueue = sharedQueue;
        this.maxSize = maxSize;
    }

    @Override
    public void run() {
        int data;
        Random r = new Random();

        System.out.println("start producer id = " + Thread.currentThread().getId());
        try {
            while (isRunning) {
                // Simulate latency
                Thread.sleep(r.nextInt(1000));

                // Check and add under one lock, so another producer
                // cannot fill the queue between the check and add()
                synchronized (sharedQueue) {
                    // Block while the queue is full
                    while (sharedQueue.size() >= maxSize) {
                        System.out.println("Queue is full, producer " + Thread.currentThread().getId()
                                + " is waiting, size:" + sharedQueue.size());
                        sharedQueue.wait();
                    }

                    // Produce data
                    data = count.incrementAndGet();
                    sharedQueue.add(data);

                    System.out.println("producer create data:" + data + ", size:" + sharedQueue.size());
                    sharedQueue.notifyAll();
                }
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
            // Restore the interrupt flag (interrupted() would clear it instead)
            Thread.currentThread().interrupt();
        }
    }

    public void stop() {
        isRunning = false;
    }
}

Consumer:

import java.util.Random;
import java.util.Vector;

public class Consumer implements Runnable {

    // Shared buffer
    private final Vector<Integer> sharedQueue;

    public Consumer(Vector<Integer> sharedQueue) {
        this.sharedQueue = sharedQueue;
    }

    @Override
    public void run() {

        Random r = new Random();

        System.out.println("start consumer id = " + Thread.currentThread().getId());
        try {
            while (true) {
                // Simulate latency
                Thread.sleep(r.nextInt(1000));

                // Check and remove under one lock, so another consumer
                // cannot empty the queue between the check and remove(0)
                synchronized (sharedQueue) {
                    // Block while the queue is empty
                    while (sharedQueue.isEmpty()) {
                        System.out.println("Queue is empty, consumer " + Thread.currentThread().getId()
                                + " is waiting, size:" + sharedQueue.size());
                        sharedQueue.wait();
                    }

                    // Consume the oldest element
                    System.out.println("consumer consume data:" + sharedQueue.remove(0) + ", size:" + sharedQueue.size());
                    sharedQueue.notifyAll();
                }
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
            Thread.currentThread().interrupt();
        }
    }
}

Main method test:

import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Test2 {

    public static void main(String[] args) throws InterruptedException {

        // 1. Build the in-memory buffer
        Vector<Integer> sharedQueue = new Vector<>();
        int size = 4;

        // 2. Create the thread pool and the tasks
        ExecutorService service = Executors.newCachedThreadPool();
        Producer prodThread1 = new Producer(sharedQueue, size);
        Producer prodThread2 = new Producer(sharedQueue, size);
        Producer prodThread3 = new Producer(sharedQueue, size);
        Consumer consThread1 = new Consumer(sharedQueue);
        Consumer consThread2 = new Consumer(sharedQueue);
        Consumer consThread3 = new Consumer(sharedQueue);
        service.execute(prodThread1);
        service.execute(prodThread2);
        service.execute(prodThread3);
        service.execute(consThread1);
        service.execute(consThread2);
        service.execute(consThread3);

        // 3. Sleep for a while, then ask the producers to stop (end their loops)
        Thread.sleep(10 * 1000);
        prodThread1.stop();
        prodThread2.stop();
        prodThread3.stop();

        // 4. Sleep a little longer before shutting down the thread pool
        Thread.sleep(3000);

        // 5. shutdown() stops accepting new tasks and lets running tasks finish.
        //    The consumers loop forever, so the program will not exit;
        //    shutdownNow() would interrupt them instead.
        service.shutdown();
    }
}

My projects also use blocking queues a lot (probably a matter of personal coding style), in much the same way as the producer-consumer implementation above.
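For comparison, here is a minimal sketch of the same pattern on top of java.util.concurrent.ArrayBlockingQueue, where put() and take() do the waiting, so no explicit synchronized, wait() or notifyAll() is needed. It is a sketch rather than project code; the class name BlockingQueueDemo and the run helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockingQueueDemo {

    // Runs one producer and one consumer over a bounded queue
    // and returns the values in the order they were consumed.
    static List<Integer> run(int capacity, int items) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        List<Integer> consumed = new ArrayList<>(); // written only by the consumer thread

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    consumed.add(queue.take()); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join(), the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(2, 10));
    }
}
```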

An example from a real scenario:

  • To send a push message, an operations colleague first selects a group of users in the user-profile system, then fills in the corresponding group ID and the sending time.

  • I use scheduled jobs and RPC to fetch the group information, then traverse HDFS to get every userId in the group.

  • Each traversed userId is put into a blocking queue, and multiple threads take data from the queue in a while(true) loop.

What is the benefit? When I take userIds from the queue there is a limit: either a specified time has elapsed or the number taken reaches BatchSize. This lets me combine different userIds that receive the same content into a single Task.

Originally 100 userIds meant 100 tasks. Now I put 100 userIds into one task (possible because the content sent is the same). This greatly reduces the concurrency passed downstream.
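The "time limit or BatchSize" rule could be sketched roughly like this. It is an illustration under assumptions, not the project code: BatchCollector, nextBatch, batchSize and maxWaitMillis are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchCollector {

    // Takes userIds from the queue until batchSize is reached
    // or maxWaitMillis has elapsed, whichever comes first.
    static List<String> nextBatch(BlockingQueue<String> queue, int batchSize, long maxWaitMillis) {
        List<String> batch = new ArrayList<>(batchSize);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        try {
            while (batch.size() < batchSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // time limit reached
                }
                // Waits at most the remaining time for the next userId
                String userId = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (userId == null) {
                    break; // nothing arrived before the deadline
                }
                batch.add(userId);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 5; i++) {
            queue.add("u" + i);
        }
        System.out.println(nextBatch(queue, 3, 50)); // full batch
        System.out.println(nextBatch(queue, 3, 50)); // partial batch after the wait
    }
}
```

Each returned batch would then become one Task for the downstream system.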

When to consider thread safety

When should you consider thread-safe collection classes? When there is a thread-safety problem, of course. And when is code not thread-safe? Most commonly, when the object being operated on is stateful.

Although we hear about thread safety all the time, in business development there are very few places where we programmers have to deal with it ourselves. For example, did you add synchronized or a Lock when writing a Servlet? Probably not.

That is because the objects we operate on are usually stateless. When no shared variable is accessed by multiple threads, there is naturally no thread-safety issue.

A SpringMVC controller is a singleton, but it manipulates data inside methods. Each thread that enters a method gets its own stack frame, and the data in that stack frame is private to the thread. As long as no shared variables are introduced, there are no thread-safety issues.

The above is just a simplified SpringMVC example, meant to aid understanding.
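The difference between stateless and stateful can be sketched in plain Java, without Spring. HandlerDemo and its nested classes are hypothetical names, and the parallel stream simply stands in for many request threads sharing one instance.

```java
import java.util.stream.IntStream;

class HandlerDemo {

    // Stateless: no fields, so concurrent calls cannot interfere with each other.
    static class StatelessHandler {
        int handle(int input) {
            int doubled = input * 2; // local variable, lives in the calling thread's stack frame
            return doubled;
        }
    }

    // Stateful: the field is shared by every thread that uses this instance.
    static class StatefulHandler {
        private int last; // shared mutable state

        int handle(int input) {
            last = input;    // another thread may overwrite this...
            return last * 2; // ...before it is read back here
        }
    }

    // Calls one shared StatelessHandler from many threads and checks every result.
    static boolean statelessIsConsistent() {
        StatelessHandler handler = new StatelessHandler();
        return IntStream.range(0, 10_000)
                .parallel()
                .allMatch(i -> handler.handle(i) == i * 2);
    }

    public static void main(String[] args) {
        System.out.println("stateless consistent: " + statelessIsConsistent());
    }
}
```

Running the same check against a shared StatefulHandler may fail intermittently, which is exactly the case where synchronization or a thread-safe class is needed.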

In one sentence: whenever multiple threads operate on a shared variable, consider whether you need a thread-safe collection class.

I'll go into more detail when I write a summary of Java multithreading.

Finally

I still want to emphasize that although only a handful of Java collection classes are commonly used at work, they still deserve focused study.

If you have studied the source code, you will probably specify an initial size when you create a collection (even though we know it can grow dynamically).
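A small sketch of what specifying the size looks like, assuming the number of elements n is known up front (PresizeDemo and its helpers are illustrative names). For HashMap the constructor argument is a capacity rather than an element count, and the table resizes once the size exceeds capacity times the load factor (0.75 by default), so the sketch asks for n / 0.75 + 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PresizeDemo {

    // The backing array is allocated once with room for n elements.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    // Requests enough capacity that n entries stay under the resize threshold.
    static Map<Integer, Integer> squareMap(int n) {
        Map<Integer, Integer> result = new HashMap<>((int) (n / 0.75f) + 1);
        for (int i = 0; i < n; i++) {
            result.put(i, i * i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(4));
        System.out.println(squareMap(4));
    }
}
```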

If you are going to interviews, Java collections are indispensable: they are a topic that is always asked about, and once you have studied them those questions are free points.

Origin blog.csdn.net/qq_39331713/article/details/113885729