High Availability Service Layer

What is the service layer

The service layer handles the business logic of a website and is the core of any large business website. For example, the following three business systems are typical service layers, each aggregating a set of basic service functions:

  • User center: mainly responsible for user registration, login, and access to user information
  • Trading center: mainly includes forward order generation, reverse orders, order query, amount calculation, and so on
  • Payment center: mainly includes order payment, cashier, reconciliation, and other functions

Overall structure

In the early stage of a business, development is driven mainly by business needs, and products are generally built with an “ALL IN ONE” (monolithic) structure. This stage can be summed up as “rough and fast”. As the product grows, the following problems appear:

  • Oversized files: a single code file can run to more than 2,000 lines
  • Severe coupling: unrelated services pile up directly in the Service layer
  • High maintenance cost: after an employee leaves, no one understands the business logic inside
  • One change affects everything: changing a small piece of business logic requires repackaging and releasing all dependent packages

These problems are mainly solved by splitting the system apart.

Specifically, the system is split vertically, with each business domain as a unit. The advantages of splitting are clear:

  • Each business gets its own independent module
  • Businesses are completely decoupled
  • Businesses do not affect each other
  • Each module is developed, launched, operated, and maintained independently
  • Efficiency is higher

Stateless design

The business logic service layer is generally designed as a stateless service. Stateless means that a service module only processes business logic and does not keep the context information of business requests itself. As a result, stateless servers are equal peers and independent of each other.

Failover is easy only when the service is stateless. Failover usually means that when one application server cannot serve user requests, those requests are redistributed across the other application servers for processing.
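
To make the link between statelessness and failover concrete, here is a minimal sketch. The names AppServer, dispatch, and shared_store are made up for illustration, and the plain dictionary only stands in for whatever external store a real system would use. Because each server keeps no user data of its own, the second server can continue a user's cart after the first one goes down.

```python
class AppServer:
    """A stateless application server: it keeps no per-user data itself."""

    def __init__(self, name, session_store):
        self.name = name
        self.session_store = session_store  # shared, external state (assumption)
        self.healthy = True

    def add_to_cart(self, user_id, item):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        cart = self.session_store.setdefault(user_id, [])
        cart.append(item)
        return list(cart)


def dispatch(servers, user_id, item):
    """Try each server in turn and fail over when one cannot serve the request."""
    for server in servers:
        try:
            return server.add_to_cart(user_id, item)
        except ConnectionError:
            continue  # this server is unavailable, try the next peer
    raise RuntimeError("no application server available")


shared_store = {}  # stand-in for an external store, not a real client
servers = [AppServer("app-1", shared_store), AppServer("app-2", shared_store)]

first = dispatch(servers, "u1", "book")   # served by app-1
servers[0].healthy = False                # app-1 goes down
second = dispatch(servers, "u1", "pen")   # app-2 takes over with the same cart
```

If the cart lived inside app-1's own memory instead, the second request would have started from an empty cart.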

Timeout setting

Calls between website services generally involve a calling service and a called service. A timeout setting means that the calling service sets a maximum waiting time (Timeout) when it calls the called service. Once the calling service detects a timeout, it enters its timeout handling logic.

  1. When calling service A calls called service B, it sets the timeout to 3 seconds. Service B may fail to respond in time because it is down, the network is poor, there is a bug in the program, and so on.
  2. After waiting 3 seconds, service A triggers its timeout logic and no longer waits for the response from service B.
  3. The timeout logic of service A depends on the situation. For example, it can retry, send the request to another peer instance of service B, or simply give up and end the call.

The advantage of a timeout setting is that when one service is unavailable, the failure does not cascade into an avalanche across the entire system.
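
The three steps can be sketched with Python's standard concurrent.futures module. This is only an illustration under stated assumptions: call_with_timeout, slow_b, and healthy_b are hypothetical names, the called services are simulated with local functions, and the timeout is shortened from 3 seconds to 0.1 seconds so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as CallTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, fallbacks=()):
    """Call func, then each fallback peer in turn, waiting at most timeout_s for each."""
    for target in (func, *fallbacks):
        future = _pool.submit(target)
        try:
            return future.result(timeout=timeout_s)
        except CallTimeout:
            continue  # stop waiting for this callee and move on
    return None  # every attempt timed out: give up and end the call


def slow_b():
    """Simulated instance of service B that responds too slowly."""
    time.sleep(0.5)
    return "B1 ok"


def healthy_b():
    """Simulated peer instance of service B that responds immediately."""
    return "B2 ok"


result = call_with_timeout(slow_b, 0.1, fallbacks=[healthy_b])
gave_up = call_with_timeout(slow_b, 0.1)
```

Note that the abandoned call keeps running in its worker thread until it finishes. The caller simply stops waiting for it, which matches step 2 above.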

Asynchronous call

Generally, there are two types of request calls: synchronous and asynchronous. A synchronous request is like making a phone call and requires a real-time response, while an asynchronous request is like sending an email and does not require an immediate response.

Each type of call has its own advantages and disadvantages, and the choice mainly depends on the business scenario. For example, under high-concurrency performance requirements, asynchronous calls have a clear advantage over synchronous calls. It is like the fact that a person cannot hold several phone calls at the same time but can send many emails.

So when should we use asynchronous calls?

It mainly depends on the business scenario. If the business can tolerate delayed processing, the call can be made asynchronous.

So how do we implement asynchronous calls?

Usually a queue is used to implement delayed processing. For example, when an order center calls a distribution center, the business can accept a delay, so the call can go through a queue.

What are the main functions of the message queue?

  • Asynchronous processing - increases throughput
  • Peak shaving and valley filling - improve system stability
  • System Decoupling - Business Boundary Isolation
  • Data synchronization - eventual consistency guarantee

So what kinds of queues are there? It mainly depends on the scope of the business:

  • Inside the application - a thread pool, such as the BlockingQueue used by a Java thread pool, buffers and processes work at the task level
  • Outside the application - application-level queues such as RabbitMQ and ActiveMQ help isolate business boundaries and increase throughput
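
As an illustration of the in-application case, the following sketch uses Python's standard queue and threading modules in place of a Java thread pool. The order center and distribution center are reduced to two hypothetical functions, place_order and distribution_worker. The caller gets its response as soon as the order is queued, and the distribution work happens later on a worker thread.

```python
import queue
import threading

tasks = queue.Queue(maxsize=100)  # bounded buffer between the two "centers"
delivered = []


def distribution_worker():
    """Consume queued orders one by one; a None item tells the worker to stop."""
    while True:
        order_id = tasks.get()
        if order_id is None:
            tasks.task_done()
            break
        delivered.append(f"dispatched {order_id}")
        tasks.task_done()


def place_order(order_id):
    """Return immediately after queuing; distribution is processed later."""
    tasks.put(order_id)
    return f"order {order_id} accepted"


worker = threading.Thread(target=distribution_worker, daemon=True)
worker.start()

responses = [place_order(i) for i in (1, 2, 3)]
tasks.put(None)  # stop signal for the worker
tasks.join()     # wait here only so the demo can inspect the results
```

The bounded queue also gives a simple form of peak shaving: when the buffer is full, put blocks the producer instead of overloading the worker.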

Technically, message queues are generally divided into two models, Pull and Push:

  • Pull model: consumers actively request messages from the message queue
  • Push model: the message queue actively pushes messages to consumers

In the Pull model, the consumer controls its own consumption speed and does not have to worry about receiving more messages than it can process; it only needs to maintain its offset in the queue. The Pull model is therefore more appropriate when consumer capacity is limited and producers push messages to the queue at an uneven rate.

The Push model is better suited to situations with high real-time requirements: as soon as a producer's message reaches the message queue, the queue pushes it to the consumer. However, this model demands much more of the consumer's capacity. If the queue pushes a message that the consumer cannot process and the consumer throws an exception, the message re-enters the queue and can block consumption.

The more mature queues in the Internet industry mainly use the Pull model, for example Kafka, RabbitMQ (which supports both models), and RocketMQ.
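
The Pull model and its offset can be sketched in a few lines. MessageLog and PullConsumer are hypothetical classes written for this example and do not mirror the API of any of the products named above. The point is that the consumer decides how many messages to fetch per poll and remembers its own position.

```python
class MessageLog:
    """Append-only list of messages, standing in for one queue partition."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)

    def fetch(self, offset, max_count):
        """Return up to max_count messages starting at offset."""
        return self._messages[offset:offset + max_count]


class PullConsumer:
    """Consumer that pulls at its own pace and tracks its own offset."""

    def __init__(self, log, batch_size):
        self.log = log
        self.batch_size = batch_size
        self.offset = 0
        self.processed = []

    def poll(self):
        batch = self.log.fetch(self.offset, self.batch_size)
        for message in batch:
            self.processed.append(message.upper())  # stand-in for real work
        self.offset += len(batch)  # advance only past what was processed
        return len(batch)


log = MessageLog()
for message in ["a", "b", "c", "d", "e"]:
    log.append(message)  # the producer may write in bursts

consumer = PullConsumer(log, batch_size=2)
counts = [consumer.poll() for _ in range(4)]
```

Five messages arrive at once, but the consumer still works through them two at a time, and the last poll returns nothing because the offset has reached the end of the log.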

Idempotent design

What is an idempotent design?

Put simply, it means that making a request once has the same effect as making it multiple times. In mathematical terms, f(x) = f(f(x)).

So why do we need idempotent design? Mainly because today's systems are distributed, and a call in a distributed system generally ends in one of three states: success, failure, or timeout.

Success and failure are easy to handle because the state is clear and unambiguous. After a timeout, however, the caller does not know whether the request succeeded or failed.

What should we do when this happens? The usual approach is to retry and request the interface again. If the interface is a GET operation, that is fine, because requesting it multiple times has the same effect. If it is a write operation such as POST or PUT, however, a retry can cause data inconsistency or even overwrite existing data.

For example, suppose a payment on the checkout page appears to fail because of a network timeout, so the user pays again. After the second payment succeeds, the user finds that the account balance has been deducted twice, and is understandably upset.

The key to the problem is that after the network times out, nobody knows the state of the payment: did it succeed or fail? Idempotent design is therefore necessary, especially in industries with strict data requirements such as e-commerce, finance, and banking.

How do we usually solve this situation?

  1. The requester generates a unique ID, which can carry business meaning (such as an order number or payment serial number), and sends the unique ID with the request.
  2. When the receiver gets the request, it first uses the unique ID to check whether it already has a corresponding record. If it does, it returns the result of the previous request directly. If not, it performs the operation and, once the operation is complete, records it in the corresponding table.
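
The two steps can be sketched as follows. PaymentService and its fields are hypothetical, and the in-memory dictionary only stands in for the record table. In a real system the lookup and the write would need to be atomic, for example through a unique constraint on the ID, which this single-threaded sketch does not show.

```python
class PaymentService:
    """Receiver that deduplicates requests by their unique ID."""

    def __init__(self, balances):
        self.balances = balances
        self.results = {}  # unique ID -> stored result (stand-in for a table)

    def pay(self, request_id, account, amount):
        # Step 2: look up the unique ID first and replay the stored result.
        if request_id in self.results:
            return self.results[request_id]

        if self.balances[account] < amount:
            result = {"status": "failed", "balance": self.balances[account]}
        else:
            self.balances[account] -= amount
            result = {"status": "paid", "balance": self.balances[account]}

        # Record the outcome so a retry with the same ID changes nothing.
        self.results[request_id] = result
        return result


service = PaymentService({"alice": 100})

# Step 1: the requester uses the order number as the unique ID.
first = service.pay("order-1001", "alice", 30)
retry = service.pay("order-1001", "alice", 30)  # resent after a timeout
```

The retry returns the stored result, and the balance is deducted only once.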

Service downgrade

Service downgrade mainly addresses insufficient resources and excessive traffic. For example, during peak periods such as Double Eleven and 618, e-commerce platforms stop providing access to some services in order to reduce the load on the system.

What are the ways to downgrade?

  • Delayed service: for example, when WeChat gave out red envelopes during the Spring Festival Gala, users appeared to have grabbed a red envelope, but the account balance did not increase until a few days later. WeChat was using a delayed service internally, recording the transactions through queues to keep its services stable.
  • Functional degradation: suspending relatively unimportant functions is a very effective way to free up system resources. For example, turn off related-article recommendations, user comments, and so on, and restore them after the peak has passed.
  • Reduced data consistency: during a big promotion, the page does not display the real inventory count, only whether the item is in stock or out of stock.
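
The last two approaches can be illustrated with a small sketch. The flag names and the render_product function are assumptions made for this example only. When the flags are switched off, the page drops the recommendation block and shows a coarse stock state instead of the exact count.

```python
NORMAL = {"recommendations": True, "exact_stock": True}
PEAK = {"recommendations": False, "exact_stock": False}  # degraded settings


def render_product(product, flags):
    """Build the page data for one product under the given degradation flags."""
    page = {"name": product["name"]}

    if flags["exact_stock"]:
        page["stock"] = f"{product['stock']} left"
    else:
        # Reduced consistency: show only whether the item is in stock.
        page["stock"] = "in stock" if product["stock"] > 0 else "out of stock"

    if flags["recommendations"]:
        # Functional degradation skips this relatively unimportant feature.
        page["related"] = product["related"]

    return page


product = {"name": "kettle", "stock": 17, "related": ["teapot", "mug"]}
normal_page = render_product(product, NORMAL)
peak_page = render_product(product, PEAK)
```

Restoring the service after the peak is then just a matter of switching the flags back.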

Those are the ways of downgrading. What should we pay attention to when carrying out a downgrade?

  1. Clearly define the degradation levels: for example, within a unit of time, throughput exceeds X, response time exceeds Y seconds, or the number of failures exceeds Z. These thresholds need to be determined through stress testing during preparation.
  2. Sort out business priorities: before downgrading, first determine which businesses are must-have, which are nice-to-have, and which are dispensable.
  3. Downgrade switch: if there is a configuration center (such as Ctrip Apollo or Baidu Disconf), the downgrade can be triggered directly from its admin console. If the company has no configuration center, it can wrap the switch in an API interface, but that API must be idempotent and should carry a simple signature to provide a basic level of security.
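
One possible way to combine the three points is sketched below. Everything in it is an assumption made for illustration: the threshold values, the tier labels, the feature names, and the rule that the degradation level equals the number of thresholds exceeded. Real values would come from stress testing, and the switch would normally live in a configuration center rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_throughput: float  # X, requests per unit time
    max_response_s: float  # Y, seconds
    max_failures: int      # Z, failures per unit time


# Business tiers decided before any downgrade (illustrative labels).
TIERS = {"payment": "core", "comments": "secondary", "recommendations": "dispensable"}


def degradation_level(metrics, limits):
    """Count how many thresholds are exceeded: 0 means no downgrade."""
    return sum([
        metrics["throughput"] > limits.max_throughput,
        metrics["response_s"] > limits.max_response_s,
        metrics["failures"] > limits.max_failures,
    ])


def enabled_features(level):
    """Switch off lower tiers first as the degradation level rises."""
    if level == 0:
        keep = {"core", "secondary", "dispensable"}
    elif level == 1:
        keep = {"core", "secondary"}
    else:
        keep = {"core"}
    return sorted(name for name, tier in TIERS.items() if tier in keep)


limits = Thresholds(max_throughput=1000, max_response_s=2.0, max_failures=50)
calm = degradation_level({"throughput": 400, "response_s": 0.3, "failures": 2}, limits)
busy = degradation_level({"throughput": 1500, "response_s": 0.8, "failures": 2}, limits)
overloaded = degradation_level({"throughput": 1500, "response_s": 3.5, "failures": 80}, limits)
```

Because enabled_features depends only on the level, applying the same level twice gives the same result, which is the idempotent behavior the switch needs.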

Summary

To summarize the main content shared today:

  • Overall architecture: split vertically by business domain to reduce dependencies between projects, so that each can be developed, launched, operated, and maintained separately
  • Stateless design: user state data must not be stored in the application service; stateful services are hard to scale and create single points of failure
  • Timeout setting: when one service is unavailable, there is no chain reaction across the entire system
  • Asynchronous call: change synchronous calls to asynchronous calls to limit the impact of remote call failures or timeouts on the system
  • Service degradation: sacrifice non-core business to keep core business highly available

The next issue will mainly talk about the distributed cache layer of the high-availability architecture, so stay tuned.

Origin http://43.154.161.224:23101/article/api/json?id=326488895&siteId=291194637