A reference Java high-concurrency asynchronous application case

The Taikang Online WeChat official account is a platform of Taikang Online Property Insurance Co., Ltd. Through continuous innovation it aims to improve customers' understanding and experience of insurance, and to apply big data technology to design products precisely and serve customers well. The account has more than 10 million followers. In daily operations it uses red-envelope rewards, card and coupon sharing, message notifications, WeChat sharing and similar means, combined with good content, good activities, good products and corresponding precision marketing, to increase user stickiness and activity.

In daily operations, the official account notifies customers by sending marketing or educational messages to users. Experience shows that traffic starts to climb about 10 minutes after a WeChat message is sent, peaks at around 30 minutes, and drops off significantly after an hour. During this window the system is under heavy load.

In the system's design and evolution, many scenarios are implemented asynchronously. On the one hand, this shortens the processing time of the main flow; on the other hand, the asynchronous queue shaves traffic peaks to a certain extent. This article focuses on asynchronous optimization practices within a single JVM and does not cover the distributed case; during asynchronous execution, a remote service cluster can still be called to decompose certain tasks.

Deployment diagram

 

The entire system is deployed on a public cloud. On each virtual machine, one Nginx and four Tomcat instances are deployed. The cloud LB forwards client requests to Nginx, and Nginx then distributes them across the Tomcat application servers using random load balancing.

Multiple application servers expose REST services externally, and asynchronous queues are used inside each Tomcat. A separate control server runs the compensation tasks and management functions for the asynchronous tasks. Between Tomcat and Redis, a multi-level cache is used to reduce both the pressure on Redis and the dependence on it.

Why deploy Nginx on the virtual machines to forward requests to Tomcat, instead of having the LB forward directly to Tomcat? Because the number of back-end IPs an LB can support is limited.

Typical User Scenario

During the operation of the official account, typical events include:

  • Sending SMS verification codes

  • SMS notifications for successful purchases or lucky draws

  • Issuing coupons or card vouchers

  • Distributing WeChat red envelopes

  • WeChat message notifications

  • Order workflow processing

  • Timed batch processing (such as data synchronization)

  • Workflow-style asynchronous tasks (compensation for unfinished asynchronous tasks)

The following details why each of these scenarios can be made asynchronous:

  1. SMS verification codes sent in various scenarios (user registration, product purchase, etc.) can be sent asynchronously: on the one hand, the timeliness requirement is not that strict; on the other hand, if the user does not receive the code within a certain time, they can simply request it again.

  2. SMS or email notifications after a successful purchase or lucky draw can also be sent asynchronously. Because users' interests are involved, this must be treated with caution: first persist the data to the database or a log (mind information security: do not store sensitive information in plaintext; encrypt it if it must be stored), then put the task into the asynchronous queue for execution.

    On the other hand, consider what happens when the application service stops unexpectedly: data that was saved but never sent would have no compensation mechanism. Since this is uncommon, and to reduce the coupling and complexity of the asynchronous programs themselves, we deploy an asynchronous task compensation program on a separate server that scans for unfinished tasks and replays them (replays must be handled rigorously, e.g. idempotently).

  3. The distribution of coupons and card vouchers is similar to the purchase and lucky-draw notifications: it can be delayed until after the peak of the current activity, so it can be done asynchronously.

  4. WeChat red envelopes require interaction with WeChat, and WeChat itself notifies the customer of the red envelope, so an asynchronous approach can be used. Anything involving funds or gifts must be designed carefully, including the ability to stop and restart the asynchronous tasks easily.

  5. WeChat message notifications also involve interaction with WeChat, which delivers the notification once the call succeeds, so they can be asynchronous. This is similar to the SMS verification code case.

  6. Order workflow processing can be done asynchronously because the subsequent steps can be modeled as a simple workflow. There are several open-source frameworks you can refer to.

  7. Data synchronization and asynchronous task compensation are delayed processing by nature, so they can be handled asynchronously. In practice they can be driven periodically by timed tasks, for example with cron4j. They fit the whole-part-whole task processing model described below.
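As a concrete illustration of the compensation idea in point 7, here is a minimal JDK-only sketch. The in-memory queue stands in for the real database table of unfinished tasks, and all names are illustrative:

```java
import java.util.Arrays;
import java.util.concurrent.*;

// Minimal sketch of task compensation: periodically scan for tasks that
// were persisted but never finished, and replay them. The queue below
// is an in-memory stand-in for the real database table.
public class CompensationDemo {
    // pending tasks that a crashed worker left behind (hypothetical data)
    static final BlockingQueue<String> unfinished =
            new LinkedBlockingQueue<>(Arrays.asList("task-1", "task-2"));

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scanner = Executors.newSingleThreadScheduledExecutor();
        // scan once per second; in production this would be a cron-style job
        scanner.scheduleAtFixedRate(() -> {
            String task;
            while ((task = unfinished.poll()) != null) {
                // replay must be idempotent: sending twice must be safe
                System.out.println("replaying " + task);
            }
        }, 0, 1, TimeUnit.SECONDS);

        Thread.sleep(500);   // let at least one scan run
        scanner.shutdown();
    }
}
```

A real implementation would mark each task as done in the database after a successful replay, so the next scan skips it.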

The internal models behind this "asynchrony everywhere" are analyzed in detail below.

Asynchronous everywhere

The following figure shows 4 typical asynchronous queue models (images from the Internet):

 

  • One producer produces data and one consumer consumes it; generally used for background business-logic processing.

  • One producer produces data and multiple consumers consume it (two cases: each message is consumed by every consumer separately, or the consumers form a group and each message is consumed by exactly one of them).

  • Multiple producers produce data and a single consumer consumes it; useful where traffic must be limited or a single resource processes requests in a queue.

  • Multiple producers produce data and multiple consumers consume it (the same two cases: each message goes to every consumer, or the consumers form a group and each message is consumed by exactly one of them).

 


The whole-part-whole task model: one thread fetches a batch of data and puts it into a queue (e.g. a SELECT); multiple threads execute the business logic in parallel; the results are then written back by a single thread (e.g. the UPDATE operation, which avoids database lock contention).
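The model above can be sketched with plain JDK classes. The batch, the squaring "business logic" and the summing write-back below are illustrative stand-ins for the real SELECT, processing, and UPDATE steps:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the whole-part-whole model: one thread fetches a batch
// (the SELECT), a pool of workers processes items in parallel, and a
// single thread writes the merged result back (the UPDATE), so only
// one writer contends for database locks.
public class FanOutFanInDemo {
    public static void main(String[] args) throws Exception {
        // step 1: a single "reader" produces the batch (stand-in for SELECT)
        List<Integer> batch = Arrays.asList(1, 2, 3, 4, 5);

        // step 2: fan out the business logic to a worker pool
        ExecutorService workers = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int item : batch) {
            futures.add(workers.submit(() -> item * item)); // business logic
        }

        // step 3: a single thread collects and "writes back" the result
        int sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.get();
        }
        System.out.println("merged result = " + sum); // stand-in for UPDATE
        workers.shutdown();
    }
}
```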

Below, several common models are analyzed technically, along with how to choose a framework in practice.

  1. Thread pool using blocking queue

  2. Use a queue that batches by fixed size or fixed time

  3. Using Disruptor

  4. Use MQ or Kafka

Using a thread pool for asynchrony (supports multiple producers and multiple consumers)

 

Features: the thread pool that ships with the JDK can be used to implement asynchrony; programming is simple and reference material is plentiful. It is recommended as the first choice in low-concurrency scenarios.
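A minimal sketch of this approach using only the JDK. The bounded queue plus CallerRunsPolicy is one reasonable configuration, not the only one, and the SMS-sending task is illustrative:

```java
import java.util.concurrent.*;

// The JDK thread pool is itself a multi-producer, multi-consumer queue:
// callers submit() from any thread (producers) and pool threads drain
// the internal BlockingQueue (consumers).
public class ThreadPoolAsyncDemo {
    public static void main(String[] args) throws Exception {
        // bounded queue + CallerRunsPolicy so a traffic spike degrades
        // gracefully instead of throwing RejectedExecutionException
        ExecutorService pool = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            final int id = i;
            pool.submit(() -> {                // e.g. send an SMS asynchronously
                System.out.println("sent sms " + id);
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
    }
}
```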

Using Guava Queues (supports multiple producers and a single consumer)

 

Features: an asynchronous batching queue that processes data in batches once the queue reaches a specified length or a specified time has elapsed. Suitable for scenarios with relaxed response-time requirements that can tolerate some data loss, such as batch-saving short text data.
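To make the size-or-time batching behaviour concrete, here is a JDK-only sketch of the idea. The `drain` helper imitates what Guava's `Queues.drain` provides; its name and parameters are illustrative, not Guava's API:

```java
import java.util.*;
import java.util.concurrent.*;

// JDK-only sketch of "flush when the batch is full OR the timeout
// expires", the behaviour Guava's Queues.drain provides.
public class BatchQueueDemo {
    static <E> List<E> drain(BlockingQueue<E> q, int maxBatch, long timeoutMs)
            throws InterruptedException {
        List<E> batch = new ArrayList<>(maxBatch);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (batch.size() < maxBatch) {
            long remaining = deadline - System.nanoTime();
            E e = q.poll(remaining, TimeUnit.NANOSECONDS);
            if (e == null) break;              // timeout: flush what we have
            batch.add(e);
        }
        return batch;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("a"); q.add("b"); q.add("c");
        // flushes immediately once 3 elements are available
        System.out.println(drain(q, 3, 200));
        // empty queue: returns after the 50 ms timeout with an empty batch
        System.out.println(drain(q, 3, 50));
    }
}
```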

After a period of research, we found that Disruptor was more suitable for our needs.

First, let's look at Disruptor's performance.

[Figure: the four asynchronous queue models above, as supported by Disruptor]

This figure covers the asynchronous queue models enumerated above, so it is quite representative. Disruptor achieves its high performance by using a lock-free ring buffer; the detailed principles are easy to find online and are not covered here. Disruptor supports all of the typical scenarios above, and its workflow mechanism can be used flexibly to simplify programming.

  • Getting started with English articles: https://github.com/LMAX-Exchange/disruptor/wiki/Getting-Started

  • Chinese demo link: http://my.oschina.net/u/2273085/blog/507735?p=1

  • Concurrency framework Disruptor related translation: http://ifeve.com/disruptor/

Here are the official benchmark results.

 

The following shows several uses of Disruptor at the code level.

Using Disruptor (single producer multiple consumers)

 

Disruptor provides multiple WaitStrategy implementations, each with different performance characteristics, advantages and disadvantages. Selecting an appropriate strategy for the CPU characteristics of the actual runtime environment, combined with matching JVM configuration parameters, yields different degrees of performance improvement. Examples include BlockingWaitStrategy, SleepingWaitStrategy and YieldingWaitStrategy, where:

  • BlockingWaitStrategy is the slowest strategy, but it consumes the least CPU and gives the most consistent performance across deployment environments;

  • SleepingWaitStrategy performs similarly to BlockingWaitStrategy with similar CPU consumption, but it has the least impact on the producer thread and suits scenarios such as asynchronous logging;

  • YieldingWaitStrategy has the best performance and suits low-latency systems. It is recommended when extremely high performance is required and the number of event handler threads is smaller than the number of logical CPU cores, for example with hyper-threading enabled.

We currently use the BlockingWaitStrategy.
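A minimal single-producer sketch with the BlockingWaitStrategy discussed above, assuming Disruptor 3.x is on the classpath; the event class and handler bodies are illustrative:

```java
import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

// Single producer, multiple broadcast consumers: both handlers see
// every event published to the ring buffer.
public class DisruptorSingleProducerDemo {
    static class LongEvent { long value; }

    public static void main(String[] args) {
        Disruptor<LongEvent> disruptor = new Disruptor<>(
                LongEvent::new,
                1024,                          // ring buffer size, a power of 2
                Executors.defaultThreadFactory(),
                ProducerType.SINGLE,           // single producer: skips CAS on claim
                new BlockingWaitStrategy());

        // two handlers, each sees every event (broadcast consumers)
        disruptor.handleEventsWith(
                (event, seq, endOfBatch) -> System.out.println("A got " + event.value),
                (event, seq, endOfBatch) -> System.out.println("B got " + event.value));
        disruptor.start();

        RingBuffer<LongEvent> ring = disruptor.getRingBuffer();
        long seq = ring.next();                // claim a slot
        try {
            ring.get(seq).value = 42;          // fill the event in place
        } finally {
            ring.publish(seq);                 // make it visible to consumers
        }
        disruptor.shutdown();                  // waits for events to drain
    }
}
```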

Using Disruptor (multi-producer, multi-consumer)

 

 

Using Disruptor (multi-producer, consumer worker pool)

 

In this example, a thread-pool-like consumer group (worker pool) processes the data: each event is handled by exactly one of the workers.
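A sketch of such a consumer group, assuming Disruptor 3.x: with `handleEventsWithWorkerPool` each event goes to exactly one worker, unlike `handleEventsWith`, where every handler sees every event. Names are illustrative:

```java
import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.WorkHandler;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

// Multiple producers feed the ring buffer; three workers compete for
// events like a thread pool, each event handled exactly once.
public class DisruptorWorkerPoolDemo {
    static class MsgEvent { String text; }

    public static void main(String[] args) {
        Disruptor<MsgEvent> disruptor = new Disruptor<>(
                MsgEvent::new, 1024,
                Executors.defaultThreadFactory(),
                ProducerType.MULTI,            // several producer threads allowed
                new BlockingWaitStrategy());

        WorkHandler<MsgEvent> worker = e -> System.out.println("handled " + e.text);
        // three workers compete for events, like a thread pool
        disruptor.handleEventsWithWorkerPool(worker, worker, worker);
        disruptor.start();

        for (int i = 0; i < 3; i++) {
            final int id = i;
            // publishEvent claims, fills and publishes a slot in one call
            disruptor.getRingBuffer().publishEvent((e, seq) -> e.text = "msg-" + id);
        }
        disruptor.shutdown();
    }
}
```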

Multi-step workflow pattern with Disruptor

 
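A sketch of the multi-step workflow, assuming Disruptor 3.x: `handleEventsWith(...).then(...)` builds a dependency graph, so the final step only sees an event after the earlier steps have processed it. The order-processing step names are hypothetical:

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executors;

// Workflow: deductStock and sendCoupon run in parallel on each event;
// notifyUser runs only after both have handled it.
public class DisruptorWorkflowDemo {
    static class OrderEvent { String id; }

    public static void main(String[] args) {
        Disruptor<OrderEvent> disruptor = new Disruptor<>(
                OrderEvent::new, 1024, Executors.defaultThreadFactory());

        EventHandler<OrderEvent> deductStock =
                (e, seq, end) -> System.out.println("stock deducted for " + e.id);
        EventHandler<OrderEvent> sendCoupon =
                (e, seq, end) -> System.out.println("coupon sent for " + e.id);
        EventHandler<OrderEvent> notifyUser =
                (e, seq, end) -> System.out.println("user notified for " + e.id);

        // the .then() step waits for both parallel steps per event
        disruptor.handleEventsWith(deductStock, sendCoupon).then(notifyUser);
        disruptor.start();

        disruptor.getRingBuffer().publishEvent((e, seq) -> e.id = "order-1");
        disruptor.shutdown();                  // waits for the graph to drain
    }
}
```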

Troubles after going asynchronous

Trouble 1: risk of data loss

Solution: write to a log or the database first, then put the task into the asynchronous queue.

Trouble 2: increased pressure on downstream systems

Solution: apply rate limiting and circuit breaking to protect the other systems.

Trouble 3: the asynchronous task is never executed after the data is saved

Solution: use the asynchronous task compensation approach: periodically fetch unfinished records from the database, put them into the queue for execution, and update the status flag after execution.

Trouble 4: how to set the queue length and the number of consumers

Solution: obtain the queue length from an actual stress test, or use queuing-theory formulas to get an initial value and then verify it with a stress test.
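As an example of the queuing-theory starting point, Little's law (L = λ × W) gives a back-of-the-envelope estimate. The arrival rate, service time and backlog tolerance below are hypothetical and should be replaced by measured values from the stress test:

```java
// Back-of-the-envelope consumer sizing with Little's law (L = λ × W).
// All numbers are hypothetical inputs, not measurements.
public class QueueSizingDemo {
    public static void main(String[] args) {
        double arrivalRate = 200.0;   // λ: tasks per second at peak
        double serviceTime = 0.05;    // W: seconds to process one task

        // minimum consumers so the queue does not grow without bound
        double consumers = Math.ceil(arrivalRate * serviceTime);
        // queue length if we tolerate a 2-second backlog during bursts
        double queueLen = arrivalRate * 2.0;

        System.out.println("consumers >= " + (int) consumers);
        System.out.println("queue length ~ " + (int) queueLen);
    }
}
```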

Finally, some lessons from the project:

    • Act within your means: make technology choices based on business characteristics, and avoid asynchrony if the business volume is small. Know what to take on and what to leave alone.

    • Let the data speak: when going asynchronous, the necessary stress tests must be performed.

    • Find the system's critical points first: optimize performance within a single system, then optimize globally by decomposing the overall system.

    • Choose a framework based on the characteristics of the team and the project.

