Java thread evolution

Author: How big is the
Link: https://www.zhihu.com/question/24322387/answer/142210426
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.

1. Background

1.1. Evolution of the Java threading model

1.1.1. Single-threaded era
Going back more than ten years, mainstream CPUs were still single-core (apart from commercial high-performance minicomputers), and CPU core frequency was one of a machine's most important indicators. In the Java world, single-threaded programming was the norm at the time: for CPU-intensive applications, frequent multi-thread cooperation and time-slice preemption would actually degrade performance.

1.1.2. Multithreading
As hardware performance improved, CPU core counts kept growing, and 32- or 64-core servers became standard. Multi-threaded concurrent programming can make full use of multi-core CPUs and improve a system's processing efficiency and concurrent performance. From around 2005, as multi-core processors gradually became widespread, multi-threaded concurrent programming in Java grew popular. The mainstream commercial JDK at the time was 1.4, and users created new threads with new Thread(). Since JDK 1.4 provided no thread-management facility such as a thread pool, the synchronization, cooperation, creation and destruction of threads all had to be implemented by users themselves. Because creating and destroying threads are relatively heavyweight operations, this primitive style of multi-threaded programming was neither efficient nor performant.

1.1.3. Thread pools
To improve the efficiency and performance of Java multi-threaded programming and reduce the difficulty of development, JDK 1.5 introduced the java.util.concurrent package, which provides thread pools, thread-safe containers, atomic classes and other new class libraries. These greatly improved the efficiency of Java multi-threaded programming and lowered the barrier to entry. Since JDK 1.5, thread-pool-based concurrent programming has been the mainstream of Java multi-core programming.
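To make the contrast concrete, here is a minimal sketch (not from the original article) of the two styles: the pre-JDK-1.5 new Thread() approach versus a JDK 1.5 thread pool from java.util.concurrent. The pool size and task are arbitrary, and lambda syntax is modern Java used only for brevity.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadEvolution {
        public static void main(String[] args) {
            Runnable task = () ->
                    System.out.println("handled by " + Thread.currentThread().getName());

            // JDK 1.4 style: one new thread per task, created and destroyed each time (heavyweight).
            new Thread(task).start();

            // JDK 1.5+ style: a fixed pool reuses a small set of threads across many tasks.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 10; i++) {
                pool.execute(task);
            }
            pool.shutdown();
        }
    }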
1.2. The Reactor model
Most network frameworks written in C++ or Java are designed and developed around the Reactor pattern. The Reactor pattern is event-driven and is especially suitable for handling massive numbers of I/O events.

1.2.1. Single-threaded model
In the single-threaded Reactor model, all I/O operations are performed on the same NIO thread. That thread's responsibilities are:
1) As a NIO server, accept TCP connections from clients;
2) As a NIO client, initiate TCP connections to servers;
3) Read request or response messages from the communication peer;
4) Send request or response messages to the communication peer.

Figure 1-1 Reactor single-threaded model

Since the Reactor pattern uses asynchronous non-blocking I/O, no I/O operation causes blocking, so in theory one thread can handle all I/O-related work independently. Architecturally, a single NIO thread can indeed fulfill these responsibilities: the Acceptor class receives the client's TCP connection request, and after the link is established the corresponding ByteBuffer is dispatched via Dispatch to the designated Handler for message decoding. User threads can then encode messages and send them to clients through the NIO thread.

The single-threaded model can serve some small-capacity application scenarios, but it is unsuitable for high-load, high-concurrency applications, mainly for the following reasons:
1) One NIO thread processing hundreds or thousands of links simultaneously cannot keep up: even with the NIO thread's CPU at 100%, a single thread cannot satisfy the encoding, decoding, reading and sending of massive messages;
2) Once the NIO thread is overloaded, its processing slows down, causing large numbers of client connections to time out; timeouts often trigger retransmission, which further increases the NIO thread's load and eventually produces large message backlogs and processing timeouts, making the thread the system's performance bottleneck;
3) Reliability: if the NIO thread unexpectedly dies or enters an infinite loop, the entire system's communication module becomes unavailable, unable to receive or process external messages, resulting in node failure.
To solve these problems, the Reactor multi-threading model evolved; it is introduced after the sketch below.
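As an illustration, here is a minimal hand-written sketch (not from the original article) of the single-threaded model in plain Java NIO: one thread owns the Selector and performs accept, read and write itself. The class name, port and echo behavior are illustrative.

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;

    // One NIO thread plays Acceptor and I/O handler for every connection.
    public class SingleThreadReactor implements Runnable {
        private final Selector selector;
        private final ServerSocketChannel serverChannel;

        public SingleThreadReactor(int port) throws Exception {
            selector = Selector.open();
            serverChannel = ServerSocketChannel.open();
            serverChannel.socket().bind(new InetSocketAddress(port));
            serverChannel.configureBlocking(false);
            serverChannel.register(selector, SelectionKey.OP_ACCEPT); // Acceptor role
        }

        @Override
        public void run() {
            try {
                while (!Thread.interrupted()) {
                    selector.select();
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {              // new TCP connection
                            SocketChannel ch = serverChannel.accept();
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) {         // read + decode + handle
                            SocketChannel ch = (SocketChannel) key.channel();
                            ByteBuffer buf = ByteBuffer.allocate(1024);
                            if (ch.read(buf) < 0) { ch.close(); continue; }
                            buf.flip();
                            ch.write(buf);  // echo on the same thread; a real server would decode here
                        }
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) throws Exception {
            new Thread(new SingleThreadReactor(8080), "reactor-single").start();
        }
    }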
1.2.2. Multi-threading model
The biggest difference between the Reactor multi-threading model and the single-threaded model is that a pool of NIO threads handles the I/O operations. Its schematic is as follows:

Figure 1-2 Reactor multi-threading model

The characteristics of the Reactor multi-threading model are:
1) A dedicated NIO thread, the Acceptor thread, listens on the server port and accepts clients' TCP connection requests;
2) Network I/O operations (read, write, and so on) are handled by a NIO thread pool, which can be a standard JDK thread pool consisting of a task queue and N available threads; these NIO threads are responsible for reading, decoding, encoding and sending messages;
3) One NIO thread can handle N links simultaneously, but one link corresponds to only one NIO thread, which prevents concurrent operation problems.
In most scenarios the Reactor multi-threading model meets the performance requirements. In a few special scenarios, however, a single NIO thread that both listens for and processes all client connections can run into performance problems, for example with millions of concurrent client connections, or when the server must perform security authentication during the client handshake and the authentication itself is very expensive. In such scenarios a single Acceptor thread may lack the necessary performance. To solve this, a third Reactor threading model arose: the master-slave Reactor multi-threading model.

1.2.3. Master-slave multi-threading model
The master-slave Reactor threading model is characterized by the server no longer using a single NIO thread to accept client connections, but an independent NIO thread pool. After the Acceptor finishes processing a client's TCP connection request (possibly including access authentication and so on), it registers the newly created SocketChannel with an I/O thread in the I/O thread pool (the sub-reactor thread pool), which then takes charge of reading, writing and codec work on that SocketChannel. The Acceptor thread pool is used only for client login, handshake and security authentication; once the link is established, it is registered with an I/O thread of the back-end subReactor pool, which performs all subsequent I/O operations. The threading model is shown in the following figure:

Figure 1-3 Master-slave Reactor multi-threading model

The master-slave NIO threading model solves the problem that a single server-side listening thread cannot effectively handle all client connections. Its workflow is summarized as follows:
1) A Reactor thread is randomly selected from the main thread pool as the Acceptor thread; it binds the listening port and receives client connections;
2) After receiving a client connection request, the Acceptor thread creates a new SocketChannel and registers it with another Reactor thread in the main thread pool, which is responsible for access authentication, IP black/white list filtering, handshake and similar operations;
3) After step 2 completes, the business-layer link is formally established; the SocketChannel is removed from the multiplexer of the main pool's Reactor thread and re-registered with a thread of the sub thread pool, which handles I/O read and write operations.
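To illustrate the master-slave idea, here is a simplified hand-written sketch (not Netty's implementation): a boss thread accepts connections, and each accepted SocketChannel is handed round-robin to one of N worker event loops that own all subsequent I/O for it. All names and the port are illustrative, and access authentication is reduced to a comment.

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicInteger;

    public class MasterSlaveReactor {
        static class Worker implements Runnable {       // one sub-reactor event loop
            final Selector selector;
            final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();

            Worker() throws Exception { selector = Selector.open(); }

            void register(SocketChannel ch) {
                pending.add(ch);
                selector.wakeup();                      // leave select() so registration is safe
            }

            @Override
            public void run() {
                try {
                    while (true) {
                        selector.select();
                        SocketChannel newCh;
                        while ((newCh = pending.poll()) != null) {
                            newCh.configureBlocking(false);
                            newCh.register(selector, SelectionKey.OP_READ);
                        }
                        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                        while (it.hasNext()) {
                            SelectionKey key = it.next();
                            it.remove();
                            SocketChannel c = (SocketChannel) key.channel();
                            ByteBuffer buf = ByteBuffer.allocate(1024);
                            if (c.read(buf) < 0) { c.close(); continue; }
                            buf.flip();
                            c.write(buf);               // all I/O for this link stays on this worker
                        }
                    }
                } catch (Exception e) { e.printStackTrace(); }
            }
        }

        public static void main(String[] args) throws Exception {
            Worker[] workers = new Worker[4];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Worker();
                new Thread(workers[i], "sub-reactor-" + i).start();
            }
            AtomicInteger next = new AtomicInteger();
            ServerSocketChannel server = ServerSocketChannel.open();   // main reactor (Acceptor)
            server.socket().bind(new InetSocketAddress(8080));
            while (true) {
                SocketChannel ch = server.accept();                    // blocking accept in the boss
                // authentication / IP black-white list checks would run here
                workers[next.getAndIncrement() % workers.length].register(ch);
            }
        }
    }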
2. Netty's threading model

2.1. Classification of Netty's threading model
Netty's threading model is in fact similar to the three Reactor threading models introduced in section 1.2. The following sections introduce it through the thread processing flows of the Netty server and client.

2.1.1. Server-side threading model
A popular approach on the server side is to separate the server-side listening thread from the I/O threads, similar to the Reactor multi-threading model. Its working principle is as follows:

Figure 2-1 Netty server-side thread workflow

Combined with Netty's source code, the workflow of creating threads on the server side is as follows.

Step 1: the server creation operation is initiated from a user thread. It happens at startup, so the main function or a startup class is usually responsible for it, and creation is performed by a business thread. Two EventLoopGroups are instantiated when the server is created. An EventLoopGroup is really an EventLoop thread group, responsible for managing the allocation and release of EventLoops. The number of threads an EventLoopGroup manages can be set through its constructor; if unset, it defaults to the value of -Dio.netty.eventLoopThreads, and if that system property is not specified either, it defaults to the number of available CPU cores × 2. The bossGroup is effectively the Acceptor thread pool, responsible for handling clients' TCP connection requests; if the system listens on only one server port, it is recommended to set the bossGroup's thread count to 1. The workerGroup is the thread group that actually performs I/O reads and writes; it is set through ServerBootstrap's group method and used for subsequent Channel binding.

Step 2: the Acceptor thread binds the listening port and starts the NIO server. The relevant code is as follows:

Figure 2-3 Selecting an Acceptor thread from the bossGroup to listen on the server

Here group() returns the bossGroup, and its next method obtains an available thread from the thread group:

Figure 2-4 Obtaining an available thread with next()

After the Acceptor creates the server channel, it registers it with the multiplexer Selector to receive clients' TCP connections. The core code is as follows:

Figure 2-5 Registering the ServerSocketChannel with the Selector

Step 3: when a client connection is detected, a client SocketChannel is created and re-registered with an I/O thread of the workerGroup. First, how the Acceptor handles client access:

Figure 2-6 The Acceptor handling client access

A read or connection event invokes unsafe's read() method; for NioServerSocketChannel this calls NioMessageUnsafe's read() method:

Figure 2-7 NioServerSocketChannel's read() method

This ultimately calls NioServerSocketChannel's doReadMessages method:

Figure 2-8 Creating the client-connection SocketChannel

Here childEventLoopGroup is the workerGroup from before; an I/O thread is selected from it to handle the reading and writing of network messages.

Step 4: once the I/O thread is chosen, the SocketChannel is registered with the multiplexer and the READ operation is monitored:

Figure 2-9 Listening for network read events

Step 5: network I/O read and write events are processed. The core code is as follows:

Figure 2-10 Handling read and write events
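For reference, a minimal Netty 4 server bootstrap showing the two EventLoopGroups described above; the echo handler and port are illustrative placeholders, not the article's code.

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class NettyServer {
        public static void main(String[] args) throws Exception {
            EventLoopGroup bossGroup = new NioEventLoopGroup(1);  // Acceptor pool: 1 thread per listening port
            EventLoopGroup workerGroup = new NioEventLoopGroup(); // I/O threads, defaults to CPU cores x 2
            try {
                ServerBootstrap b = new ServerBootstrap();
                b.group(bossGroup, workerGroup)
                 .channel(NioServerSocketChannel.class)
                 .childHandler(new ChannelInitializer<SocketChannel>() {
                     @Override
                     protected void initChannel(SocketChannel ch) {
                         ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                             @Override
                             public void channelRead(ChannelHandlerContext ctx, Object msg) {
                                 ctx.writeAndFlush(msg); // echo, executed on the worker's NioEventLoop
                             }
                         });
                     }
                 });
                ChannelFuture f = b.bind(8080).sync();  // the Acceptor binds the listening port
                f.channel().closeFuture().sync();
            } finally {
                bossGroup.shutdownGracefully();
                workerGroup.shutdownGracefully();
            }
        }
    }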
2.1.2. Client threading model
Compared with the server side, Netty's client threading model is simpler. Its working principle is as follows:

Figure 2-11 Netty client thread model

Step 1: a user thread initiates the client connection. Sample code:

Figure 2-12 Netty client creation example

Notice that, compared with the server, the client needs to create only one EventLoopGroup, because it needs neither a separate thread to listen for client connections nor a separate client thread to connect to the server. Netty is an asynchronous event-driven NIO framework: its connect and all other I/O operations are asynchronous, so no separate connection thread is needed. The relevant code is as follows:

Figure 2-13 Binding the client connection thread

Here group() is the previously supplied EventLoopGroup; an available I/O thread (EventLoop) is obtained from it and then set as a parameter on the newly created NioSocketChannel.

Step 2: the connect operation is initiated and the connection result is checked:

Figure 2-14 Initiating the connection and checking the result

If the connection has not yet completed, the connect network operation bit SelectionKey.OP_CONNECT is monitored; if the connection succeeds, pipeline().fireChannelActive() is called and the interest bit is changed to READ.

Step 3: NioEventLoop's multiplexer polls the result of the connect operation:

Figure 2-15 The Selector polls to determine the connection result

If the connection succeeded, the interest bit is reset to READ:

Figure 2-16 Judging the result of the connect operation
Figure 2-17 Setting the operation bit to READ

Step 4: the NioEventLoop thread performs I/O reads and writes, just as on the server side.

To summarize, the client threading model works as follows: a user thread initializes the client resources and initiates the connect operation; if the connection succeeds immediately, the SocketChannel is registered with a NioEventLoop thread of the I/O thread group and the read operation bit is monitored; if it does not succeed immediately, the SocketChannel is still registered with a NioEventLoop thread of the I/O thread group, but the connect operation bit is monitored; once the connection completes, the interest bit is changed to READ, with no thread switch required.
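A corresponding minimal client bootstrap, assuming Netty 4: note the single EventLoopGroup and the asynchronous connect whose result is delivered through a ChannelFuture. The host, port and empty handler are placeholders.

    import io.netty.bootstrap.Bootstrap;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.nio.NioSocketChannel;

    public class NettyClient {
        public static void main(String[] args) throws Exception {
            EventLoopGroup group = new NioEventLoopGroup(); // one group: no separate Acceptor or connect thread
            try {
                Bootstrap b = new Bootstrap();
                b.group(group)
                 .channel(NioSocketChannel.class)
                 .handler(new ChannelInboundHandlerAdapter());
                ChannelFuture f = b.connect("127.0.0.1", 8080); // asynchronous connect
                f.addListener((ChannelFutureListener) future -> {
                    if (future.isSuccess()) {
                        System.out.println("connected");        // interest bit now effectively READ
                    } else {
                        future.cause().printStackTrace();
                    }
                });
                f.channel().closeFuture().sync();
            } finally {
                group.shutdownGracefully();
            }
        }
    }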
2.2. The Reactor thread: NioEventLoop

2.2.1. Introduction to NioEventLoop
NioEventLoop is Netty's Reactor thread. Its responsibilities are as follows:
1) As the server-side Acceptor thread, it handles clients' connection requests;
2) As the client-side Connector thread, it registers the connect operation bit and judges the result of the asynchronous connect;
3) As an I/O thread, it monitors the read operation bit and reads messages from the SocketChannel;
4) As an I/O thread, it writes messages to the SocketChannel and sends them to the peer; if only half a packet is written, it automatically registers for the write event and continues sending the remaining half-packet until all data has been sent;
5) As a timed-task thread, it executes scheduled tasks, such as link idle detection and sending heartbeat messages;
6) As an executor, it can run ordinary task threads (Runnable).

The server-side and client-side threading model sections described in detail how NioEventLoop handles network I/O events; here is a brief look at how it handles timed tasks and executes ordinary Runnables. First, NioEventLoop inherits from SingleThreadEventExecutor, which means it is effectively a thread pool with exactly one thread. The class hierarchy is as follows:

Figure 2-18 NioEventLoop inheritance hierarchy
Figure 2-19 Thread pool and task queue definitions

For users this means you can call NioEventLoop's execute(Runnable task) method directly to run a custom task. The implementation is as follows:

Figure 2-20 Executing a user-defined task
Figure 2-21 NioEventLoop implements ScheduledExecutorService

By calling SingleThreadEventExecutor's schedule family of methods, NioEventLoop can also execute Netty's or user-defined timed tasks. The interface is defined as follows:

Figure 2-22 NioEventLoop's timed-task execution interface
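A short sketch of using these two capabilities through Netty 4's public API; the heartbeat payload and delay are made-up examples, and the write assumes a suitable codec in the pipeline.

    import java.util.concurrent.TimeUnit;
    import io.netty.channel.Channel;

    public final class EventLoopTasks {
        private EventLoopTasks() {}

        public static void runOnEventLoop(Channel ch) {
            // Runs a custom task on the channel's own NioEventLoop thread,
            // so it is serialized with that channel's I/O processing.
            ch.eventLoop().execute(() ->
                    System.out.println("running inside " + Thread.currentThread().getName()));

            // NioEventLoop also implements ScheduledExecutorService, so the same
            // thread can run timed tasks such as heartbeats or idle detection.
            ch.eventLoop().schedule(
                    () -> ch.writeAndFlush("PING"), // assumes a string codec in the pipeline
                    30, TimeUnit.SECONDS);
        }
    }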
2.3. NioEventLoop design principles

2.3.1. Serialized design avoids thread contention
When a system runs, frequent thread context switching brings extra performance loss. Moreover, when multiple threads execute one business flow concurrently, business developers must stay constantly vigilant about thread safety: which data might be modified concurrently, and how should it be protected? This lowers development efficiency and also costs performance.

To solve these problems, Netty adopts a serialized design: from reading the message, through decoding, to the execution of the subsequent Handlers, the I/O thread NioEventLoop is responsible throughout. The whole flow therefore involves no thread context switching, and the data never faces the risk of concurrent modification. Users do not even need to know Netty's threading details, which is indeed an excellent design. Its working principle is as follows:

Figure 2-23 NioEventLoop executing the ChannelHandler chain serially

One NioEventLoop aggregates one multiplexer (Selector), so it can handle hundreds or thousands of client connections. Netty's strategy is that whenever a new client connects, the next available NioEventLoop is taken from the NioEventLoop thread group in sequence, wrapping back to index 0 when the upper bound of the array is reached; this basically guarantees load balance across the NioEventLoops. A client connection is registered with only one NioEventLoop, which prevents multiple I/O threads from operating on it concurrently.

Through this serialized design, Netty lowers users' development difficulty and improves processing performance. The thread group realizes horizontal, parallel execution of multiple serialized threads with no intersection between them, which makes full use of multiple cores while avoiding the extra cost of thread context switching and concurrency protection.

2.3.2. Timed tasks and the timing-wheel algorithm
Many Netty functions depend on timed tasks; two typical ones are client connection timeout control and link idle detection. A commonly used design is to aggregate the JDK's scheduled-task thread pool, ScheduledExecutorService, inside NioEventLoop and execute timed tasks through it. From a performance standpoint this is not optimal, for three reasons:
1) aggregating an independent timed-task thread pool inside the I/O thread reintroduces thread context switching during processing, which breaks Netty's serialized design;
2) it creates multi-threaded concurrent access, because the timed Task and the I/O thread NioEventLoop may read and modify the same data simultaneously;
3) the JDK's ScheduledExecutorService itself leaves room for performance optimization.

Operating systems and protocol stacks faced these problems first. In the TCP stack, for example, reliable transmission depends on the timeout retransmission mechanism, so every packet sent over TCP needs a timer to schedule its timeout event. Such timeouts can be massive, and creating one timer per timeout is unreasonable in terms of both performance and resource consumption. George Varghese and Tony Lauck's 1996 paper "Hashed and Hierarchical Timing Wheels: Data Structures to Efficiently Implement a Timer Facility" proposes a timing-wheel approach for managing and maintaining huge numbers of timer schedules, and Netty's timed-task scheduling is based on this timing-wheel algorithm. Let's look at Netty's implementation.

A timing wheel is a data structure whose main body is a circular list, where each element of the list is a structure called a slot. Its schematic is as follows:

Figure 2-24 Working principle of the timing wheel

The timing wheel works like a clock: the arrow (pointer) in the figure rotates in a fixed direction at a fixed frequency, and each step is called a tick. A timing wheel is thus described by three important parameters: ticksPerWheel (the number of ticks per revolution), tickDuration (the duration of one tick) and timeUnit (the time unit). For example, ticksPerWheel=60, tickDuration=1, timeUnit=seconds is exactly analogous to the movement of a clock's second hand.

Netty's implementation in detail: execution of the timing wheel is driven and checked by NioEventLoop. First it checks whether there are timed tasks or ordinary tasks in the task queue; if so, it executes them cyclically according to the configured ratio:

Figure 2-25 Executing the task queue

If there is no task to be executed, it calls the Selector's select method and waits; the wait time is the delay of the earliest-expiring timed task in the timed-task queue:

Figure 2-26 Calculating the wait delay

The task with the smallest delay is popped from the scheduled-task queue and its timeout is computed:

Figure 2-27 Obtaining the timeout from the scheduled-task queue

Execution of timed tasks: after each tick, the scheduled-task list is scanned and the timed-out tasks are moved to the ordinary task queue to await execution:

Figure 2-28 Detecting and moving timed-out tasks

After detection and transfer complete, the timed tasks are executed:

Figure 2-29 Executing timed tasks

To ensure that executing timed tasks does not starve the processing of I/O events, Netty provides an I/O execution ratio for users to set; users can set the share of execution time allocated to I/O to prevent I/O processing from timing out or backing up because of massive timed tasks. Because obtaining the system's nanosecond time is itself a time-consuming operation, Netty checks whether the execution-time limit has been reached only after every 64 timed tasks; if it has, it exits, and any tasks not yet executed are processed at the next Selector poll, giving I/O events a chance to be handled:

Figure 2-30 Checking the execution-time limit
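Netty also ships a standalone timing-wheel implementation, io.netty.util.HashedWheelTimer, whose constructor exposes exactly the wheel parameters described above. A minimal usage sketch (the tick settings, delay and task are illustrative):

    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import io.netty.util.HashedWheelTimer;
    import io.netty.util.Timeout;
    import io.netty.util.Timer;

    public class WheelTimerExample {
        public static void main(String[] args) {
            // tickDuration = 1 second, ticksPerWheel = 60: analogous to a clock's second hand.
            Timer timer = new HashedWheelTimer(
                    Executors.defaultThreadFactory(), 1, TimeUnit.SECONDS, 60);

            Timeout timeout = timer.newTimeout(
                    t -> System.out.println("link idle check fired"),
                    5, TimeUnit.SECONDS);

            // timeout.cancel() would remove the task before it fires;
            // timer.stop() releases the worker thread when the timer is no longer needed.
        }
    }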
2.3.3. Focus rather than expand
Netty is an asynchronous high-performance NIO framework; it is not a container for running business logic, so it does not need to, and should not, provide business containers and business threads. The reasonable design is for Netty to be responsible only for providing and managing NIO threads, while users integrate other business-layer thread models themselves; as long as the layers are cleanly divided, integration and extension by users become easier.

Unfortunately, in the Netty 3 series, Netty provided an ExecutionHandler similar to Mina's asynchronous Filter. It aggregates a JDK thread pool (java.util.concurrent.Executor) and executes the subsequent Handlers asynchronously. ExecutionHandler was meant to solve the problem that some user Handlers may take an unpredictable amount of time to execute, causing the I/O thread to be blocked or to hang unexpectedly. Judged purely on the need, the requirement is reasonable, but it is not a function Netty itself should provide. The reasons are summarized as follows:
1) It breaks the serialized design concept Netty insists on: thread switching occurs in the course of message reception and processing, and a new thread pool is introduced, which breaks the design principle of Netty's own architecture and is really an architectural compromise;
2) Potential thread-safety problems: if the asynchronous Handler also operates on the user Handlers in front of it, and those Handlers have no thread-safety protection, hidden and fatal thread-safety problems result;
3) User development complexity: introducing ExecutionHandler breaks the ChannelPipeline's original serial execution mode, so users must understand Netty's underlying implementation details and worry about thread safety and similar issues, which costs more than it gains.
For these reasons, later Netty versions removed ExecutionHandler entirely and provided no similar functionality classes, focusing instead on Netty's own I/O thread, NioEventLoop. This was undoubtedly a big improvement: Netty again concentrated on the I/O threads themselves rather than providing user-facing business thread models.

2.4. Best practices for Netty thread development

2.4.1. Execute simple, time-bounded services directly in the ChannelHandler
If a service is simple, its execution time is short and controllable, and it does not need to wait for other resources, it is recommended to execute it directly in the business ChannelHandler without starting a separate business thread or thread pool. This avoids thread context switching and introduces no thread-concurrency problems.

2.4.2. Hand complex or time-uncontrollable services to a back-end business thread pool
For complex services, or services whose execution time cannot be controlled, it is not recommended to start threads or thread pools directly inside the business ChannelHandler. Instead, encapsulate the different services as Tasks and deliver them to a back-end business thread pool for unified processing. Note also that too many business ChannelHandlers cause development-efficiency and maintainability problems: do not treat Netty as a business container. For most complex business products you still need to integrate or develop your own business container and layer it cleanly on top of Netty's architecture.

2.4.3. Business threads should avoid operating on ChannelHandlers directly
Both I/O threads and business threads may operate on a ChannelHandler; because business logic usually follows a multi-threaded model, a ChannelHandler may be operated on by multiple threads. To avoid concurrency problems as much as possible, follow Netty's own practice: encapsulate the operation as an independent task and have the NioEventLoop execute it uniformly, rather than letting business threads operate directly, as in the sketch below. The relevant code is as follows:

Figure 2-31 Encapsulating operations as tasks to prevent concurrent multi-threaded access

If you have confirmed that the concurrently accessed data or the concurrent operations are safe, then no extra work is needed; judge and handle this flexibly according to the specific business scenario.
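Following the practice in 2.4.3, here is a minimal sketch modeled on the inEventLoop() pattern Netty itself uses internally; the close operation is just an example of work a business thread might want to perform on a channel.

    import io.netty.channel.Channel;

    public final class ChannelOps {
        private ChannelOps() {}

        // Business threads should not touch a ChannelHandler's state directly.
        // Wrap the operation in a task and let the channel's NioEventLoop run it.
        public static void closeSafely(Channel ch) {
            if (ch.eventLoop().inEventLoop()) {
                ch.close();                        // already on the I/O thread: run directly
            } else {
                ch.eventLoop().execute(ch::close); // otherwise hand it over as a task
            }
        }
    }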

3. Summary
Although Netty's threading model is not complicated, using Netty properly to develop high-performance, high-concurrency business products remains a challenging task. Only by fully understanding Netty's threading model and design principles can high-quality products be developed.
