The smart fuse of the advertising business system - "smart flow control"

Besides the retry mechanism designed for the AB link ("double sending"), the core of the ADX system has another mechanism, "intelligent flow control", which keeps the service robust and prevents a failure in one microservice from cascading through the architecture.

Mechanisms like these act as fuses: each plays an important role in cutting off excess traffic and letting the service heal itself.

Intelligent flow control

Once the delivery engine has the features of the current traffic, it requests the services of the different DSPs in real time to obtain the latest ad candidates. Among the DSPs connected to ADX, the carrying capacity of each service differs and changes over time.

If we fed all of the platform's traffic directly to a DSP, the DSP could not handle it and its service might be brought down outright. Even if it does not collapse, a large number of timeouts and unanswered requests would raise the latency and failure rate of the ADX service's outbound calls as a whole, causing a business incident.

Regular traffic control

Note: For the full-link flow chart, see "Advertising Business System Details" in the series on the three top complex businesses: advertising, recommendation, and search.

As shown in the figure, when the delivery engine sends its 2w (20,000) of traffic to each DSP concurrently, rule-based traffic control intervenes so that the traffic each DSP receives stays within the upper limit of its load.

Each instance of the distributed delivery engine estimates the engine's current total traffic (2w) from its load-balancing weight (1/4) and its own current traffic (5k): 5k / (1/4) = 2w. If the estimate exceeds the DSP's threshold, the instance runs an adaptation calculation to obtain the volume it may send, 2.5k in this example.
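The per-instance calculation above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the function names are hypothetical, and the DSP limit of 10,000 is an assumption inferred from the 2.5k result (2.5k × 4 instances), since the article does not state it.

```python
def estimate_total_traffic(instance_traffic: float, lb_weight: float) -> float:
    """Estimate the whole engine's traffic from one instance's share."""
    return instance_traffic / lb_weight


def instance_quota(instance_traffic: float, lb_weight: float, dsp_limit: float) -> float:
    """Return how much traffic this instance may send to the DSP."""
    total = estimate_total_traffic(instance_traffic, lb_weight)
    if total <= dsp_limit:
        # The DSP can absorb everything, so no intervention is needed.
        return instance_traffic
    # Otherwise the instance takes its weighted share of the DSP limit.
    return dsp_limit * lb_weight


if __name__ == "__main__":
    # 5k on one instance with weight 1/4 implies about 2w (20,000) in total.
    print(estimate_total_traffic(5_000, 0.25))
    # Assumed DSP limit of 10,000: the instance may send 2.5k.
    print(instance_quota(5_000, 0.25, 10_000))
```

Because every instance derives the total from its own traffic and weight, no coordination between instances is needed to stay under the DSP limit.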

Data source calculation

Intelligent flow control

The threshold in regular traffic control is a value estimated by the DSP's engineers, and in production the real capacity often deviates from it. For example, a major event reduces service stability, a node's hardware fails, or a feature iteration misbehaves...

How can we determine the carrying capacity of a DSP service flexibly and dynamically in production, and adapt the traffic sent to it in real time? This is a key issue for ADX.

Here we introduce a pattern that is common in system architecture: mount mode.

Function mount

In mount mode, a core dynamic-threshold script is mounted alongside the service and shares its threshold data with it, which in turn drives the traffic control strategy.

We mount, outside the service, a piece of data calculation logic based on Prometheus monitoring data. It polls each DSP's failure rate and timeout rate over the last 2 minutes at a granularity of 30s. Thresholds on these metrics decide whether the traffic exceeds the DSP's carrying capacity, for example timeout rate > 5%, or average latency > 100ms, or even timeout rate > 5% && average latency > 100ms...
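The threshold decision can be sketched as below. This is only an illustration under assumptions: the `Sample` type and `is_overloaded` function are hypothetical names, the samples are passed in directly rather than fetched from Prometheus, and averaging the four 30s samples of the 2-minute window is our own choice of aggregation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One 30s data point for a DSP (hypothetical structure)."""
    timeout_rate: float    # fraction of requests that timed out, 0.0 to 1.0
    avg_latency_ms: float  # average response time in milliseconds


def is_overloaded(samples, max_timeout_rate=0.05, max_latency_ms=100.0,
                  require_both=False):
    """Decide whether traffic exceeds the DSP's carrying capacity."""
    # Assumption: aggregate the window by averaging its samples.
    timeout_bad = mean(s.timeout_rate for s in samples) > max_timeout_rate
    latency_bad = mean(s.avg_latency_ms for s in samples) > max_latency_ms
    if require_both:
        return timeout_bad and latency_bad  # timeout > 5% && latency > 100ms
    return timeout_bad or latency_bad       # either condition alone


if __name__ == "__main__":
    window = [Sample(0.02, 80), Sample(0.04, 90), Sample(0.10, 95), Sample(0.08, 90)]
    print(is_overloaded(window))
    print(is_overloaded(window, require_both=True))
```

The `require_both` flag switches between the "or" and "&&" variants of the rule mentioned above; which one fits depends on how sensitive the DSP is to latency alone.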

On top of this benchmark we build a mechanism that lowers and raises the threshold. When traffic exceeds the DSP's carrying capacity, the weight is reduced and the threshold is lowered by 10%; otherwise the weight is raised in 2% steps until the threshold is back at its initial value.
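A minimal sketch of this step-size strategy follows. The function name is hypothetical, and two details are our assumptions because the text leaves them open: the 10% reduction is taken from the current threshold, and each 2% recovery step is 2% of the initial threshold.

```python
def next_threshold(current: int, initial: int, overloaded: bool,
                   down_pct: int = 10, up_pct: int = 2) -> int:
    """Return the threshold to use for the next polling round."""
    if overloaded:
        # Assumption: cut 10% off the current threshold.
        return current * (100 - down_pct) // 100
    # Assumption: recover by 2% of the initial threshold per round,
    # never rising above the initial threshold.
    return min(initial, current + initial * up_pct // 100)


if __name__ == "__main__":
    threshold = next_threshold(10_000, 10_000, overloaded=True)
    print(threshold)  # one reduction step
    while threshold < 10_000:
        threshold = next_threshold(threshold, 10_000, overloaded=False)
        print(threshold)  # gradual recovery
```

Cutting quickly and recovering slowly keeps pressure off a struggling DSP while avoiding a sudden return of full traffic; both percentages are parameters, so the smoothness of the transition can be tuned.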

Threshold storage

When the mounted logic finds the data in an abnormal state, it calculates the real-time threshold according to the "step size strategy" above and places it in MC. The delivery engine takes the value in MC first as its traffic control threshold; if there is no abnormal state, the threshold stored in Redis prevails.
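The lookup order can be sketched as follows. Plain dictionaries stand in for MC and Redis so the example runs without any client library; the function names, the `dsp_a` key, and the rule of clearing the MC entry once the threshold has recovered to the baseline are all assumptions made for illustration.

```python
def publish_threshold(dsp_id: str, dynamic: int, baseline: int, mc: dict) -> None:
    """Called by the mounted script after each step-size calculation."""
    if dynamic < baseline:
        mc[dsp_id] = dynamic   # abnormal state: share the real-time threshold
    else:
        mc.pop(dsp_id, None)   # assumption: recovered, so drop the override


def effective_threshold(dsp_id: str, mc: dict, redis_store: dict) -> int:
    """Called by the delivery engine: MC first, Redis as the default."""
    dynamic = mc.get(dsp_id)
    if dynamic is not None:
        return dynamic
    return redis_store[dsp_id]


if __name__ == "__main__":
    mc, redis_store = {}, {"dsp_a": 10_000}
    print(effective_threshold("dsp_a", mc, redis_store))  # baseline from Redis
    publish_threshold("dsp_a", 9_000, 10_000, mc)
    print(effective_threshold("dsp_a", mc, redis_store))  # override from MC
```

Keeping the override and the baseline in separate stores means the engine falls back to the configured threshold whenever no override is present.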

Architecture strengths and weaknesses

The dynamic threshold script and the flow control strategy run independently yet cooperate, and together they form an intelligent flow control system that addresses the problem raised at the beginning of this article. In the raise/lower mechanism, the step sizes can be tuned freely to control how smooth the transition is.

  • The mounted architecture maximizes the decoupling and flexibility of the policy;
  • The disadvantage is equally obvious: it relies on external scripts and third-party components, so when a dependency fails, the flow control function as a whole is lost. [Generally, real-time monitoring is built for these dependencies, with alerts to notify the owners]

For a complex ADX system, the design of intelligent flow control reflects to a large extent how autonomous and intelligent the service is. It is a valuable part of the architecture of a large-scale business system.

Service construction and deployment

In the ADX system, the entire link involves nearly a hundred large and small microservices.

Good service construction and flexible, agile deployment capabilities are the cornerstones of delivering value quickly in the advertising business...

See the follow-up article!

Recommended reading:
Advertising, recommendation, and search, the three top complex businesses: "Advertising Business System Details"
Advertising Business System, Linking the Past and the Future: "Message Center"
Advertising Business System, Data Transfer Station: "Log Center - Real-time Service Monitoring"
Advertising Business System, Data Bridge: "Log Center - Exposure Data Transfer and Settlement"
Advertising Business System, Core Channel: "Log Center - s2s Monitoring and Reporting"
Advertising Business System, Auxiliary Decision-making: "AB Experimental Platform"
Advertising Business System, Framework Precipitation: "Data Consumption Service Framework"
Advertising Business System, Smart Fuse: "Smart Flow Control"
Advertising Business System, Agile Delivery: "Deployment Based on Docker Containers"
Advertising Business System, Business Connection: "PDB - Advertisement Delivery [quantity and price]"


Origin blog.csdn.net/qq_34417408/article/details/128674484