How Ctrip uses big data for real-time risk control

 

About the author: Yu Wei, senior development manager in the Risk Control Department of the Ctrip Technology Center. He joined Ctrip in 2010, has worked on the development of the clearing platform and the risk control system, and has studied system architecture and streaming data processing in depth.

 

As the leading OTA, Ctrip faces serious fraud risks every day: stolen bank cards, account theft, malicious abuse of marketing campaigns, and malicious grabbing of resources.

 

Ctrip uses its self-developed risk control system to identify and prevent these risks. Built from scratch and refined over five years of continuous exploration, the system now covers every stage: before, during, and after the event. It has evolved from the original "simple rules + DB" design into today's intelligent risk control system, which can support 10x transaction growth and is built on a rules engine, real-time model computation, stream processing, M/R, big data, data mining, and machine learning, with real-time and near-real-time risk decision-making and data analysis capabilities.

 

 

One, Aegis system architecture

 

 

Figure 1

 

The system has three main modules (risk control engine, data services, and data computation), plus auxiliary systems.

 

Risk control engine: handles risk control requests, including data preprocessing and the execution of the rules engine and models. The data the engine needs is provided by the data services module.

Data services: real-time traffic statistics, risk profiles, behavior and device data, the external data access proxy, and RiskGraph. The data in the data services layer is provided by the data computation layer.

Data computation: risk profile computation, RiskSession, device fingerprinting, and both real-time and non-real-time traffic computation.

 

The source data required for computation includes: risk control Event data (order data, payment data), UBT data collected by each system, device fingerprints, log data, and so on.

 

In addition, the risk control platform has comprehensive monitoring and early warning systems, a manual review platform, and a reporting system.

 

 

Two, Aegis system architecture

 

 


Figure 2

 

 

Three, the rules engine

 

 

The rules engine has three major functions. The first is the adaptation layer.

 

Ctrip has many lines of business, each with its own characteristics. To make data easier to process once it enters the risk control system (Aegis), a front-end adapter module converts each business's data, according to configuration, into the standardized internal format that the risk control system uses.
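
The adaptation step can be sketched in a few lines of Python. This is only an illustration under assumptions: the business names, field names, and the FIELD_MAPPINGS table are hypothetical and are not taken from Aegis.

```python
from typing import Any, Dict

# Hypothetical per-business configuration: source field -> standardized field.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "flight": {"orderNo": "order_id", "amt": "amount", "uid": "user_id"},
    "hotel": {"order_id": "order_id", "totalPrice": "amount", "userId": "user_id"},
}


def adapt(business: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one business line's raw event into the standardized shape."""
    mapping = FIELD_MAPPINGS[business]
    event: Dict[str, Any] = {"business": business}
    for source_field, standard_field in mapping.items():
        if source_field in raw:
            event[standard_field] = raw[source_field]
    return event


# Two differently shaped inputs end up with the same field names.
flight_event = adapt("flight", {"orderNo": "F1", "amt": 1200, "uid": "u1"})
hotel_event = adapt("hotel", {"order_id": "H9", "totalPrice": 560, "userId": "u1"})
```

Keeping the mapping in configuration rather than in code means a new business line can be onboarded without touching the downstream rules and models.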

 

After data adaptation is complete, the risk control system must merge the data.

 

For example, during a payment risk check, the payment BU sends over only the payment information (payment amount, payment method, order number, and so on), not the order information. We must therefore quickly look up the order from the payment information and merge the two before rules and models can use them. The interval between a user creating an order and initiating payment can be anywhere from seconds to days. When the interval is very short, the order data to be merged may not have been processed yet, so order data must be processed and persisted very quickly. The second requirement is to find the order data quickly: using the RiskGraph built from order information, we can locate the required order details quickly and accurately.

 

After the data merge, we prepare the variables and tag data that the rules and models need. For this, the preprocessing module relies on the data services layer, which is explained later. To improve performance, we prioritize variables and tag data, fetching first those needed by key rules and models.

 

Fraud comes in waves, so the risk control system must respond promptly: when fraud is discovered, rules must go online quickly to block similar fraud. Rules therefore need to be deployed quickly and accurately, and the rule staff should be able to write rules and put them online themselves. Rules must also be effectively separated from the engine that executes them, so that an unreasonable rule cannot affect the whole engine. The rules engine has to meet all of these conditions.

 

We finally chose Drools. First, it is open source; second, rules can be written in Java, which makes it easy to learn; third, it is functional enough.

 

By using the Drools rules engine, Ctrip's real-time risk control engine can put rules online efficiently. It is highly flexible and configurable, and because rules use Java syntax, the rule staff can write rules themselves and bring them online quickly.

 

Because each risk control Event request requires executing hundreds of rules and models, the risk control engine introduced rule execution path optimization. It combines parallel and serial execution, distinguishes dependent from non-dependent rules, and adds a short-circuit mechanism, so that the run time of thousands of rules is kept within 100 ms.

Figure 3
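
The following Python sketch shows one way such an execution path could work; it is not the Drools-based implementation. It assumes rules are plain functions, groups them into stages by their dependencies, runs each stage in parallel, runs stages serially, and stops at the first decisive verdict. The rule names, the "REJECT"/"REVIEW" verdicts, and the thread pool size are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Set

# A rule reads the event plus results of earlier stages; returns a verdict or None.
Rule = Callable[[dict, Dict[str, Optional[str]]], Optional[str]]


def build_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group rules into stages; each stage depends only on earlier stages."""
    remaining = {name: set(d) for name, d in deps.items()}
    done: Set[str] = set()
    stages: List[List[str]] = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic rule dependency")
        stages.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return stages


def run_rules(event: dict, rules: Dict[str, Rule],
              deps: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    results: Dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        for stage in build_stages(deps):      # stages run serially
            snapshot = dict(results)          # same-stage rules see earlier stages only
            verdicts = list(pool.map(lambda n: rules[n](event, snapshot), stage))
            results.update(zip(stage, verdicts))
            if "REJECT" in verdicts:          # short-circuit: skip later stages
                break
    return results


rules: Dict[str, Rule] = {
    "blacklist": lambda e, r: "REJECT" if e["user_id"] == "bad" else None,
    "amount": lambda e, r: "REVIEW" if e["amount"] > 5000 else None,
    "combo": lambda e, r: "REJECT" if r["amount"] == "REVIEW" and e["new_device"] else None,
}
deps = {"blacklist": set(), "amount": set(), "combo": {"amount"}}

stages = build_stages(deps)
results_bad = run_rules({"user_id": "bad", "amount": 100, "new_device": False}, rules, deps)
results_risky = run_rules({"user_id": "u1", "amount": 9000, "new_device": True}, rules, deps)
```

In the first call the blacklist rule rejects in stage one, so the dependent "combo" rule never runs. In the second call both stages execute because nothing decisive happens until the last stage.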

 

Rules are very flexible and fast to develop and deploy, but the coverage of a single rule is relatively low. Raising coverage takes a large number of rules, and maintaining them then becomes expensive. This is where models come in: a model can achieve relatively high coverage and its logic can be very complex, but it has to be trained offline. Ctrip's risk control system therefore uses both, so that the strengths of rules and models complement each other.

 

The models currently used in the risk control system are mainly Logistic Regression and Random Forest. Our experience with the two so far: LR works well when the training variables discriminate well, and gives better results after feature engineering. RF is more effective when the variables have weak linear discriminating power. RF therefore accounts for a larger share of use.

 

 

Four, the data services layer

 

The main function of the data services layer is to provide data. The risk control engine's preprocessing needs a large number of variables and tags, and all of them are supplied by the data services layer. The most important goal of this layer is fast response. It therefore mainly uses Redis as a data cache, and for important, high-frequency data Redis is used directly as the persistence layer.

 

The core idea of the data services layer is to make full use of memory (local memory and Redis):

1, local memory (large amounts of fixed data such as IP location, city information, etc.)

2, full use of the high-performance cache Redis

 

Because the real-time traffic data service and the risk profile data service store their data directly in Redis, their performance meets the requirements of the rules engine. Here we focus on the data access proxy service.

 

The key idea of the data access proxy service is to call the third-party service before the rules need the data and save the result in Redis, so that a rule request can read directly from Redis. Because the data is preloaded, its hit rate and freshness are very important. Take user-dimension data as an example: by analyzing user logs, the risk control system can detect which users have logged in, browsed, or booked, and preload those users' data from the external services into Redis. When rules and models read user-dimension external data, they read Redis first and access the external service only on a miss.
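
A minimal sketch of this preloading idea, assuming a plain dictionary as a stand-in for Redis and a caller-supplied function as the external service. The class and method names (PreloadingProxy, on_user_activity) are hypothetical.

```python
from typing import Callable, Dict


class PreloadingProxy:
    """Warm a cache from user activity so rule requests rarely call out."""

    def __init__(self, fetch_external: Callable[[str], dict]):
        self._fetch_external = fetch_external
        self._cache: Dict[str, dict] = {}  # stand-in for Redis
        self.external_calls = 0

    def _load(self, user_id: str) -> dict:
        self.external_calls += 1
        data = self._fetch_external(user_id)
        self._cache[user_id] = data
        return data

    def on_user_activity(self, user_id: str) -> None:
        """Called when log analysis sees a login, browse, or booking action."""
        if user_id not in self._cache:
            self._load(user_id)

    def get(self, user_id: str) -> dict:
        """Called by rules and models: cache first, external service on a miss."""
        cached = self._cache.get(user_id)
        if cached is not None:
            return cached
        return self._load(user_id)


proxy = PreloadingProxy(lambda uid: {"user_id": uid, "level": "gold"})
proxy.on_user_activity("u1")           # preload while the user is still browsing
profile = proxy.get("u1")              # served from the cache
calls_after_hit = proxy.external_calls
proxy.get("u2")                        # never seen before: falls back to the service
calls_after_miss = proxy.external_calls
```

The external call is paid during browsing rather than inside the latency-critical risk check; only users who were never observed beforehand take the slow path.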

 

In some scenarios we also introduce DB persistence. When certain user information changes, the public service sends a Message to Hermes. We subscribe to these messages, and when we learn that the information has changed, we proactively call the external service and load the data into Redis. Because the risk control system is notified of these changes, it is safe to persist the data to the DB; the data also carries a TTL parameter to ensure its freshness. In this scenario the system checks Redis first, then the DB on a miss, and accesses the external service only when qualifying data exists in neither place. This optimizes both performance and storage space.
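
The lookup order described above (Redis, then DB, then the external service, with a TTL on stored data) could look roughly like the sketch below. Dictionaries stand in for Redis and the DB, and the class name, the injected clock, and the on_change_message hook are assumptions made for illustration; the real Hermes subscription is not shown.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[float, dict]  # (stored_at, value)


class TieredStore:
    def __init__(self, fetch_external: Callable[[str], dict],
                 ttl_seconds: float, clock: Callable[[], float] = time.time):
        self._fetch = fetch_external
        self._ttl = ttl_seconds
        self._clock = clock
        self._redis: Dict[str, Entry] = {}  # stand-in for Redis
        self._db: Dict[str, Entry] = {}     # stand-in for the DB

    def _fresh(self, entry: Optional[Entry]) -> Optional[dict]:
        """Return the value only if it exists and is younger than the TTL."""
        if entry is None:
            return None
        stored_at, value = entry
        return value if self._clock() - stored_at < self._ttl else None

    def _refresh(self, key: str) -> dict:
        value = self._fetch(key)
        entry = (self._clock(), value)
        self._redis[key] = entry
        self._db[key] = entry
        return value

    def on_change_message(self, key: str) -> None:
        """Hypothetical hook for a change message: reload from the service."""
        self._refresh(key)

    def evict_from_redis(self, key: str) -> None:
        self._redis.pop(key, None)

    def get(self, key: str) -> Tuple[dict, str]:
        value = self._fresh(self._redis.get(key))
        if value is not None:
            return value, "redis"
        entry = self._db.get(key)
        value = self._fresh(entry)
        if value is not None:
            self._redis[key] = entry  # repopulate the cache from the DB
            return value, "db"
        return self._refresh(key), "external"


now = [0.0]
calls = []
store = TieredStore(lambda k: calls.append(k) or {"user_id": k},
                    ttl_seconds=60, clock=lambda: now[0])
first = store.get("u1")[1]    # nothing stored yet
second = store.get("u1")[1]   # cache hit
store.evict_from_redis("u1")
third = store.get("u1")[1]    # cache miss, DB hit
now[0] = 120.0
fourth = store.get("u1")[1]   # both copies older than the TTL
```

Injecting the clock keeps the TTL behaviour testable without sleeping.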

 

 

Five, Chloro system

 

 

Chloro is the core data analysis service of the risk control system; the data served by the data services layer is computed by Chloro.

 

The main analysis dimensions include: user risk profiles, user social relationship networks, transaction risk behavior feature models, and supplier risk models.

 


Figure 4

 

As the figure shows, the data comes mainly from Hermes, Hadoop, and the various risk control Event data sent by the front end. Listeners receive the various types of data, which then enter CountServer and the Real-Time Process system. Data related to RiskSession first enters the Sessionizer, a module that quickly reconstructs sessions, grouping events into a session by key, before they are submitted to the real-time processing system. When Real-Time Process and CountServer have finished, the data is split into two parts, the processing results and a copy of the raw data. Both are submitted to the Data Dispatcher, Chloro's internal data routing system. The results go directly to RiskProfile for use by the engine and the models. The raw data is written to the Hadoop cluster.

 

Batch Process uses the big data processing capability of the Hadoop cluster for offline data processing. When Batch Process finishes, it sends the results to the Data Dispatcher, which routes them.

 

Batch Process can also analyze data across Rsessions.

 

 


Figure 5

 

RiskSession definition: a quantification and characterization of user behavior. When anyone accesses Ctrip from any device, the first event marks the start of an Rsession; when 30 minutes pass after their last event with no further activity, we consider the Rsession ended.
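
The 30-minute rule in this definition can be illustrated with a small sketch. It assumes events arrive as (user key, Unix timestamp) pairs and that a gap of exactly 30 minutes already closes the session; the source does not specify that boundary case, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

SESSION_GAP_SECONDS = 30 * 60  # a session ends after 30 minutes without events


def sessionize(events: List[Tuple[str, int]]) -> Dict[str, List[List[int]]]:
    """Group (user_key, timestamp) events into per-user sessions."""
    sessions: Dict[str, List[List[int]]] = {}
    for user_key, ts in sorted(events):
        user_sessions = sessions.setdefault(user_key, [])
        last_seen = user_sessions[-1][-1] if user_sessions else None
        if last_seen is not None and ts - last_seen < SESSION_GAP_SECONDS:
            user_sessions[-1].append(ts)   # still inside the current session
        else:
            user_sessions.append([ts])     # first event, or the gap was too long
    return sessions


events = [("u1", 0), ("u1", 600), ("u2", 100), ("u1", 2401)]
sessions = sessionize(events)
```

Here the third event of "u1" arrives 1801 seconds after the previous one, so it opens a second session.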

 

The risk control system determines whether events belong to the same user by comparing user information (Uid, phone number, email) and device information (Fp (Fingerprint), clientId, vid, v, deviceId), and judges the similarity of behavior from behavior information (browsing tracks and historical tracks).

 

For example, a user places an order on the PC and then completes the payment in the mobile app. To Chloro this is a single conversation, which we call a risk control Session. Defining RiskSession lets the risk control system quantify and characterize user behavior. A RiskSession effectively serves as a container for user behavior. It also works across platforms, which makes it easier to analyze user characteristics.

 

 


Figure 6

 

Risk Graph was developed around the characteristics of Ctrip's risk control system and uses HBase as its storage medium. Take a user node as an example: its value is the key of the HBase user table, and each column holds one of the user's features. An HBase table is then created in turn for each of those features, which yields a graph-like architecture built on HBase.

 

The core idea of this system is to create an index for each dimension of the data and then look up content by index value. The risk control system has so far created fast indexes on a dozen or so dimensions.
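
The index-per-dimension idea can be shown with an in-memory sketch. It does not use HBase; dictionaries stand in for the tables, and the class name and the dimension names (user_id, device_id, order_id) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class DimensionIndex:
    """Keep one inverted index per dimension so any value finds its records."""

    def __init__(self, dimensions: Iterable[str]):
        self._dimensions = tuple(dimensions)
        self._records: Dict[str, dict] = {}  # record id -> record
        self._index: Dict[str, Dict[str, Set[str]]] = {
            d: defaultdict(set) for d in self._dimensions
        }

    def put(self, record_id: str, record: dict) -> None:
        # Simplification: updating a record does not remove stale index entries.
        self._records[record_id] = record
        for dimension in self._dimensions:
            value = record.get(dimension)
            if value is not None:
                self._index[dimension][value].add(record_id)

    def find(self, dimension: str, value: str) -> List[dict]:
        ids = self._index[dimension].get(value, set())
        return [self._records[i] for i in sorted(ids)]


graph = DimensionIndex(["user_id", "device_id", "order_id"])
graph.put("O1", {"order_id": "O1", "user_id": "u1", "device_id": "d1"})
graph.put("O2", {"order_id": "O2", "user_id": "u1", "device_id": "d2"})

by_user = graph.find("user_id", "u1")     # both orders share the user
by_order = graph.find("order_id", "O2")   # direct lookup by order number
by_unknown = graph.find("device_id", "zzz")
```

A payment event that carries only an order number can reach the order details through the order_id index, and from there reach related records through the other dimensions.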

 

 

Six, Aegis other subsystems

 

 


Figure 7

 

Aegis also has a configuration system, in which users can configure rules, execution paths, standardization, tags, variable definitions, data-cleaning business logic, and so on. The monitoring system is also very important: risk control development follows a design principle of ubiquitous monitoring, so that any small change in the system is detected right away.

 

 

Seven, outlook

 

 

In version 3.0, Ctrip risk control introduced the rules engine and made extensive use of open source in the Chloro-based big data processing architecture; combined with models, this achieved very good results. In 4.0, work will continue in machine learning, artificial intelligence, and behavioral features to further improve the system's ability to identify risk. The team will keep embracing open source, and the next step is to introduce Spark and similar technologies to improve the risk control system's data processing.

 

 

 


Origin blog.csdn.net/sinat_26811377/article/details/104576864