A chat about an interface's maximum concurrent processing capacity

Preface

Living in the Internet era of 2023, against the backdrop of an ever more competitive domestic Internet industry, I believe everyone has come across the "three highs" of Internet system design while interviewing, studying, or searching online: high concurrency, high performance, and high availability. This article focuses on high concurrency and high performance; in essence, high performance exists to pave the way for high concurrency. One part of high-concurrency design corresponds to the subject of this article: the maximum number of concurrent requests an interface can handle.

Concurrency and parallelism

Before I start talking about concurrency, I need to review some fundamentals with you.
What is concurrency? What is parallelism?

Concurrency

The CPUs of early computers were single-core. A CPU could execute only one process/thread at a time; when multiple processes/threads were waiting in the system, the CPU had to finish one before starting the next.
While a computer runs, many instructions involve I/O operations, and I/O is time-consuming, far slower than the CPU. This often left the CPU sitting idle, waiting for an I/O operation to complete before it could execute subsequent instructions.
To improve CPU utilization and reduce this waiting, people proposed the idea of concurrent CPU scheduling.
So-called concurrency uses a scheduling algorithm to allocate CPU time reasonably across multiple tasks: when a task performs an I/O operation, the CPU switches to another task, and when that I/O completes (or the new task itself hits an I/O operation), the CPU returns to the original task and continues execution.

Although the CPU executes only one task at any instant, by granting CPU time to different tasks at the right moments, multiple tasks appear to be executing together. The CPU is extremely fast and switches between tasks are extremely short, so the user cannot perceive them at all, and concurrent execution looks just like the real thing.
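
To make this concrete, here is a small Java sketch (the class and task names are made up for illustration): two tasks each block on simulated I/O, and because the waits overlap, the total wall time is close to one wait rather than the sum, even if only a single core is available.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrencyDemo {
    static void task(String name) {
        try {
            System.out.println(name + " starts, then blocks on I/O");
            Thread.sleep(1000); // simulated I/O wait; the CPU is free meanwhile
            System.out.println(name + " resumes after I/O");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> task("Task A"));
        pool.submit(() -> task("Task B"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // Roughly 1000 ms rather than 2000 ms: the two I/O waits overlap.
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}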

Parallelism

Concurrency was conceived for single-core CPUs, while parallelism presumes multi-core CPUs. Unlike a single-core CPU, a multi-core CPU can truly "execute multiple tasks simultaneously."

A multi-core CPU integrates multiple computing cores (Cores), each of which is equivalent to a simple CPU. If you don't care about the details, you can think of them as multiple independent CPUs installed in one computer.

Each core of a multi-core CPU can perform a task independently, and multiple cores will not interfere with each other. Multiple tasks executed on different cores are truly running at the same time. This state is called parallelism.
For example, when a dual-core CPU executes two tasks, each core handles one of them, which is more efficient than a single-core CPU constantly switching between the two.
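
As a rough Java sketch of the difference (again with made-up names): two purely CPU-bound tasks submitted to two threads finish in roughly the time of one task on a multi-core machine, because each core crunches one of them.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelismDemo {
    // Pure computation with no I/O, so a core stays busy the whole time.
    static long burnCpu() {
        long sum = 0;
        for (long i = 0; i < 2_000_000_000L; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Cores: " + Runtime.getRuntime().availableProcessors());
        long start = System.currentTimeMillis();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Long> a = pool.submit(ParallelismDemo::burnCpu);
        Future<Long> b = pool.submit(ParallelismDemo::burnCpu);
        a.get();
        b.get();
        pool.shutdown();
        // On two or more cores this takes about as long as one task;
        // on a true single-core machine it would take about twice as long.
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}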

Think for a moment

So what do the concepts of concurrency and parallelism have to do with the maximum number of concurrent requests our interface can handle? Don't worry, this is all groundwork, meant to help you digest the content that follows.
Here I will summarize the concepts of concurrency and parallelism:

Concurrency: within the same time period, multiple tasks all make progress, but not necessarily at the same instant. The tasks take turns using the CPU's time slices, giving the illusion that they run "at the same time." What concurrency really means is that although multiple programs or processes may be in flight, only one process or thread is actually using the CPU at any given moment.
Parallelism: multiple tasks executing at the same instant. This usually requires multiple processors or a multi-core CPU, because only then can each processor or core execute a task at the same moment. For example, when a music player and IDEA run simultaneously, if the computer has two or more CPU cores, the two applications can truly run in parallel.

Note the emphasis above: concurrency concerns the same time period, i.e. a range of time. For example, one person eating three steamed buns takes 1 minute per bun; however big his appetite, he can eat only one bun at a time, so finishing three buns takes 3 minutes.
Parallelism, by contrast, emphasizes the same instant: three people eating three steamed buns at the same time, each person working on one bun, so three buns take only 1 minute.
Now that you have read this far, let me ask you a question: how many steamed buns can 3 people eat, at most, in 3 minutes?
I believe everyone can tell me.
Understanding an interface's maximum concurrent processing capacity
In the Java world, how should we understand concurrency and parallelism? It's still just like eating steamed buns. Let me give you an analogy, and I believe you will understand.
For example, suppose I have an interface whose RT (response time) is 50 milliseconds and a single thread processes client requests. That thread can then handle 20 client requests per 1000 milliseconds.
Here, 20 is the maximum number of requests a single thread can process within 1 second.
If two threads process client requests, the interface's maximum concurrency becomes 40; with three threads it becomes 60; and so on.
From this we can derive a formula: number of threads × maximum requests per thread = the interface's maximum concurrent processing capacity.
According to the formula, we reach a conclusion: the interface's maximum concurrent processing capacity can be raised either by increasing the number of threads or by reducing the interface's response time.
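
As a quick back-of-the-envelope sketch of that formula (the numbers below are the article's example, not measurements):

public class ThroughputEstimate {
    public static void main(String[] args) {
        int rtMillis = 50;                       // interface response time (RT)
        int threads = 3;                         // threads processing requests
        int perThread = 1000 / rtMillis;         // 20 requests per thread per second
        int maxPerSecond = threads * perThread;  // 3 * 20 = 60 requests per second
        System.out.printf("RT=%dms, threads=%d -> max ~%d requests/second%n",
                rtMillis, threads, maxPerSecond);
    }
}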
Let's talk about Tomcat's concurrency configuration
Since the release of Spring Boot, Tomcat has been its default embedded web container. Building on the content above, this section walks through the key Tomcat parameters in Spring Boot and how to use them to control request volume under high concurrency. The Spring Boot configuration file is as follows.

Spring Boot version: 3.1.5

server:
  port: 8080                # port the server listens on
  tomcat:
    threads:
      max: 200              # maximum number of worker threads
      min-spare: 10         # minimum number of idle threads kept alive
    accept-count: 100       # queue length for connections beyond max-connections
    max-connections: 8192   # maximum connections accepted and processed at once
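
To experiment with these settings, here is a minimal sketch of a test endpoint (the class name and path are made up for illustration). It sleeps for 50 ms to mimic the RT from the earlier example, so each Tomcat worker thread can serve at most about 20 requests per second:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DemoController {

    @GetMapping("/demo")
    public String demo() throws InterruptedException {
        Thread.sleep(50); // simulated downstream I/O; RT ≈ 50 ms
        return "ok";
    }
}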


server.port

This parameter is used to specify the port number that the server listens on. By configuring different port numbers, multiple services can run in parallel on the same host.

To make this easier to understand, I'll use the metaphor of running a restaurant. Specifying the server IP + startup port is like telling customers where our restaurant is, then waiting for them to come and eat.

server.tomcat.threads.max
This parameter defines the maximum number of worker threads for the Tomcat server. Under high concurrency you can raise the server's maximum processing capacity by increasing the maximum number of threads, but mind the server's hardware resource limits.

threads.max is like the maximum number of waiters the restaurant needs at its busiest. If one waiter can serve only 1 guest at a time, then 200 waiters can serve 200 guests at the same time; note that this is parallel service.

server.tomcat.threads.min-spare
This parameter sets the minimum number of idle threads the server keeps alive, ensuring threads are available as requests arrive and avoiding the delay of creating new ones.

threads.min-spare is like the minimum number of waiters the restaurant keeps on duty when business is slow.

server.tomcat.max-connections
This parameter specifies the maximum number of client connections the server will accept and process at one time. Connections beyond this limit wait in the accept-count queue.

max-connections is like the maximum number of guests the restaurant's space can hold. Note that this is capacity, not service: a guest who gets inside is not necessarily served right away, and often still has to wait a while, until a waiter finishes with a previous customer and is free to serve the newcomer.

server.tomcat.accept-count
This parameter defines the maximum queue length for incoming connection requests once the connection limit has been reached. Under high concurrency you can control how many connection requests may wait by adjusting this parameter appropriately.

accept-count is like the number of people allowed to queue at the restaurant's entrance. Think of going out to eat: when the restaurant is full and there are no seats, you line up outside. Now, what happens when even the queue is full?
At that point the restaurant tells anyone else who walks up hoping to join the line: "Don't bother queuing, we're full today, please eat somewhere else!"
Correspondingly, when the number of client connections exceeds max-connections + accept-count, Tomcat directly rejects new client connections.
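
For reference, the same limits can also be set programmatically rather than in application.yml. Below is a sketch, assuming Spring Boot 3.x with the default embedded Tomcat; with these values, 200 requests are served in parallel, up to 8192 connections are held open, 100 more may queue at the door, and anything beyond that is refused.

import org.apache.coyote.AbstractProtocol;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatConfig {

    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatCustomizer() {
        return factory -> factory.addConnectorCustomizers(connector -> {
            // The protocol handler owns the thread pool and connection limits.
            if (connector.getProtocolHandler() instanceof AbstractProtocol<?> protocol) {
                protocol.setMaxThreads(200);      // server.tomcat.threads.max
                protocol.setMinSpareThreads(10);  // server.tomcat.threads.min-spare
                protocol.setMaxConnections(8192); // server.tomcat.max-connections
                protocol.setAcceptCount(100);     // server.tomcat.accept-count
            }
        });
    }
}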

Origin blog.csdn.net/u011397981/article/details/134724073