Practical Insights | Causal Inference in Practice in Ctrip's Train Ticket Business

About the Author

Seven, a data analyst, focuses on user growth, data science and other fields.

1. Background

As a travel platform, Ctrip's business is closely tied to user needs, so it is particularly important to understand and identify the causal effect of each strategy or system on conversion and revenue. Doing so requires controlling for the other factors that affect the dependent variable, but these factors are usually complex and difficult to measure. When causal relationships are hard to identify, two questions become important: how to model and analyze strategies at both the micro and macro level with more scientific methods, and how to systematically evaluate the long-term impact of each strategy.

In the Train Ticket business group (BG), we currently face five types of problems that require exploring cause and effect: evaluation of product feature iterations, valuation of virtual products, precision marketing and operations, evaluation of incremental effects when no AB experiment is available, and evaluation of the impact of changes in the external environment.

We usually have several ways to solve these problems: 

  • Design sound AB experiments for product changes and compute metrics properly to measure the impact of product features and iterations;

  • Run causal inference on observational data, that is, extract causal relationships from existing experimental and non-experimental data;

  • Build counterfactual reasoning by combining machine learning algorithms, data, and experiments to answer questions about long-term effects.

The core idea of the above three methods is causal inference.

This article uses practical problems from Ctrip's train ticket business to introduce some of our work on causal inference. It has two main parts. First, we introduce the basic ideas and theoretical framework of causal inference, so that readers can understand from a macro perspective what causal inference tools are available. Second, we explain cases in which we used causal inference methods and tools to solve core business problems. There are three main scenarios:

  • Causal inference problems encountered in user operation scenarios;

  • Specific cases of causal inference in virtual value assessment scenarios;

  • Evaluation of the effects of other scenarios where AB experiments cannot be performed.

Finally, we summarize the framework for choosing among these tools that we have accumulated through practice.

2. Basic ideas and theoretical framework of causal inference

2.1 Basic idea

Causal relationships must first be distinguished from correlations, which are very common in daily life. For example, we find that people outside the hospital are healthier than people inside it. This shows that being in a hospital is correlated with physical health. But can we conclude that the hospital is the cause of poor health? Obviously not. As long as A and B often occur together, A and B are correlated, but that does not mean there is a causal relationship between them. Causality means that A caused B to occur, so where there is causality there is correlation, but the converse does not hold (Figure 2-1).

Therefore, the core of causal inference is to examine the causal relationships in data given that associations exist in the data, that is, to separate causation from association and to correctly estimate the size of the causal effect.

Figure 2-1 Correlation and causality
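The hospital example can be made concrete with a small simulation. The sketch below is illustrative only: the data-generating process (illness severity driving both admission and health, and a treatment benefit of +10 points) is an assumption made up for this example, not something taken from Ctrip data. It shows a naive comparison pointing the wrong way, and a comparison within severity strata recovering the assumed effect.

```python
import random

random.seed(0)

# Hypothetical data-generating process (an assumption for illustration):
# illness severity drives both hospital admission and poor health,
# while hospital treatment itself improves health by +10 points.
TRUE_EFFECT = 10.0
rows = []
for _ in range(20000):
    severe = random.random() < 0.3
    in_hospital = random.random() < (0.8 if severe else 0.05)
    health = 80.0 - 40.0 * severe + TRUE_EFFECT * in_hospital + random.gauss(0, 5)
    rows.append((severe, in_hospital, health))


def mean(values):
    return sum(values) / len(values)


def diff_in_means(data):
    """Mean health of hospital patients minus mean health of everyone else."""
    treated = [h for _, t, h in data if t]
    control = [h for _, t, h in data if not t]
    return mean(treated) - mean(control)


# Naive comparison: pure association, confounded by severity.
naive = diff_in_means(rows)

# Adjusted comparison: compare within each severity stratum, then
# average the stratum differences weighted by stratum size.
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive difference:    {naive:.1f}")
print(f"adjusted difference: {adjusted:.1f}")
```

The naive difference is negative (hospital patients look less healthy), while the stratified difference is close to the assumed positive effect. Stratifying works here only because the single confounder is observed; real problems rarely offer that luxury.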

2.2 Theoretical framework

In causal inference, there are two frameworks:

Rubin's potential outcome model (a counterfactual framework). Its core is to find a suitable control group. We usually want to measure the difference in outcomes between users who are affected by a treatment and users who are not. For any single user we can observe only one of the two states, affected or not affected, so we need a suitable control group to estimate the effect we cannot observe. We usually construct experiments for identification, such as the AB experiments common in the Internet industry, or use appropriate methods to find a control group from observational data. For observational data, there are two ideas:

  • Constructing similar groups (Matching): This idea assumes that, among the samples not affected by the strategy, some are homogeneous with the samples that were affected. If we can find these similar samples and use them as a virtual control group, we can control for exogenous factors. The classic method of this kind is propensity score matching (PSM).

  • Constructing a counterfactual (Synthetic Control): This idea treats the impact of a strategy as the difference between the observed metric after the strategy is implemented and the metric in a parallel world in which the strategy was not implemented. Therefore, as long as we can model the metric level in that hypothetical world where the strategy never took effect, we can evaluate the return of the strategy. Typical methods include the synthetic control method (SCM) and Causal Impact.

Pearl's causal graph model uses a directed graph to describe the causal relationships between variables. Causal relationships are derived by computing conditional distributions in the causal graph, and the directed graph tells us which conditional distributions to use to eliminate estimation bias. The core is likewise to estimate the distribution of interest and remove the bias introduced by other variables.

These two frameworks are complementary ways of reasoning about counterfactuals. Both aim to compute the effect on the outcome in the presence of confounding and intervention variables, and both require assumptions about the causal relationships and control for the variables that introduce bias. The difference is that the Rubin framework mainly estimates the difference in expected outcomes before and after the intervention, while the Pearl framework estimates the difference in distributions before and after the intervention. The Rubin framework addresses estimation and statistical inference of causal effects, while the Pearl framework leans more toward identifying causal relationships. Figure 2-2 shows common uses of the two frameworks.

Figure 2-2 Causal inference toolbox

3. Practical cases

As the business grows, exploring causal relationships and evaluating them accurately receives more and more attention. A growing number of business scenarios and evaluation problems need to be optimized and solved with causal inference, such as how to reduce marketing costs and how to scientifically evaluate the value of membership. Based on these problems, we carried out exploratory research on causal inference and eventually applied it to several key business problems.

3.1 User operation scenario—UPLIFT model

  • Model introduction : Find strategy-sensitive groups, that is, groups that respond to a given intervention. These are the users in the upper right corner of Figure 3-1.

Figure 3-1 Schematic diagram of UPLIFT model

  • Business background : At this stage, user operations reach a large number of users, and text messages cost money. The UPLIFT model is used to find the people who are sensitive to text messages, so that refined, targeted operations can help the operations team save costs and further improve ROI.

  • Model application :

Modeling methods: S-Learner, T-Learner, X-Learner, etc.

Evaluation method: QINI curve, etc.

  • Application results : The model has been tested under multiple strategies and launched on the operation platform. Targeting the top 10% of users by model score brings an increment of (number of people * 10% * 0.011), as shown in Figure 3-2.

Figure 3-2 UPLIFT model result display
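To illustrate the T-Learner and the Qini curve named above, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the three user segments, their conversion rates and uplifts, and the deliberately trivial base learner (conversion rate per segment) that stands in for whatever model is used in production. It is not the implementation running on the operation platform.

```python
import random

random.seed(42)

# Hypothetical randomized SMS experiment. Base conversion rates and true
# uplifts per segment are made-up illustration values.
BASE = {"dormant": 0.05, "new": 0.10, "loyal": 0.30}
UPLIFT = {"dormant": 0.08, "new": 0.02, "loyal": 0.00}


def simulate(n):
    """Return rows of (segment, treated, converted)."""
    rows = []
    for _ in range(n):
        seg = random.choice(list(BASE))
        treated = random.random() < 0.5  # randomized SMS send
        p = BASE[seg] + (UPLIFT[seg] if treated else 0.0)
        rows.append((seg, treated, random.random() < p))
    return rows


def fit_rate_model(rows):
    """Trivial base learner: conversion rate per segment."""
    hits, counts = {}, {}
    for seg, _, y in rows:
        hits[seg] = hits.get(seg, 0) + y
        counts[seg] = counts.get(seg, 0) + 1
    return {seg: hits[seg] / counts[seg] for seg in counts}


def t_learner(rows):
    """T-Learner: one model fitted on treated users, one on control users."""
    mu1 = fit_rate_model([r for r in rows if r[1]])
    mu0 = fit_rate_model([r for r in rows if not r[1]])
    return lambda seg: mu1[seg] - mu0[seg]  # predicted uplift


def qini_curve(rows, score):
    """Cumulative incremental conversions when targeting by descending score."""
    ranked = sorted(rows, key=lambda r: score(r[0]), reverse=True)
    n_t = n_c = y_t = y_c = 0
    curve = []
    for _, treated, y in ranked:
        if treated:
            n_t, y_t = n_t + 1, y_t + y
        else:
            n_c, y_c = n_c + 1, y_c + y
        # Qini: treated conversions minus control conversions rescaled
        # to the number of treated users seen so far.
        curve.append(y_t - y_c * n_t / n_c if n_c else 0.0)
    return curve


train, test = simulate(30000), simulate(30000)
uplift = t_learner(train)
curve = qini_curve(test, uplift)

top = len(curve) // 3  # roughly the size of one segment
print({seg: round(uplift(seg), 3) for seg in BASE})
print("Qini at top third:", round(curve[top - 1], 1))
print("Qini at 100%:     ", round(curve[-1], 1))
```

If the model ranks users well, the Qini value at the top third sits well above one third of the final value, which is what random targeting would give. An S-Learner would instead fit a single model with the treatment flag as a feature and score each user with the flag switched on and off.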

3.2 Virtual value assessment scenario—propensity score matching

  • Model introduction : Find similar groups of people in observational data by computing propensity scores, that is, find people in the non-intervention group who are similar to those in the intervention group, as shown in Figure 3-3.

Figure 3-3 PSM concept diagram

  • Business background : In the user growth business, some business lines exist in virtual form, such as enterprise WeChat, official accounts, and memberships. The business side wants to evaluate the incremental value these virtual forms bring in order to guide cost investment.

  • Caliber (measurement definition) iteration :

A. Caliber 1.0 :

Experimental group: users in the enterprise WeChat environment.

Control group: users from the overall user base who are not in the enterprise WeChat environment.

Conclusion: Users in the enterprise WeChat environment are xx% more valuable than users not in the enterprise WeChat environment.

This conclusion is clearly wrong. The two groups, users inside and outside the enterprise WeChat environment, are inherently unbalanced: users who take the initiative to enter the enterprise WeChat environment, or who can be guided into it, are generally more loyal and more active. In other words, there is a serious sample self-selection problem, and the conclusion suffers from confounding bias.

B. Caliber 2.0 : Strict logical control, holding first-order time, user type, and follow time fixed, as shown in Figure 3-4.

Experimental group: active users whose first order was between 2020.1 and 2020.7 and who followed the official account between 2020.7 and 2020.12.

Control group: active users whose first order was between 2020.1 and 2020.7 but who have not followed the official account so far.

Figure 3-4 Caliber 2.0 display

This is more accurate than the original caliber, but under such strict logical control the number of eligible users shrinks significantly, so the incremental value of all users in the official account environment cannot be measured accurately.

C. Caliber 3.0 : Use a PSM model to find similar groups of people, as shown in Figure 3-5.

Experimental group: users who have joined the enterprise WeChat environment and have been retained for 180 days.

Control group: for each user, on the day that user joins the enterprise WeChat environment, PSM is used to match a similar user from the overall user base, without replacement, and the matched user is put into the control group.

Figure 3-5 Problem-solving idea map
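The matching step in Caliber 3.0 can be sketched as follows. This is a simplified illustration under stated assumptions, not the production pipeline: the two standardized user features, the self-selection mechanism, the hand-written logistic regression, and the caliper of 0.05 are all hypothetical, and the sketch ignores the time dimension (matching on the day a user joins).

```python
import math
import random

random.seed(7)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical users with two standardized features (for example past orders
# and active days). More active users are more likely to join (self-selection).
users = []
for _ in range(2000):
    x = (random.gauss(0, 1), random.gauss(0, 1))
    joined = random.random() < sigmoid(-2.0 + 1.2 * x[0] + 0.8 * x[1])
    users.append((x, joined))


def fit_logistic(data, lr=0.5, epochs=200):
    """Batch gradient descent for P(joined | x); returns (bias, weights)."""
    b, w = 0.0, [0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        gb, gw = 0.0, [0.0, 0.0]
        for x, t in data:
            err = sigmoid(b + w[0] * x[0] + w[1] * x[1]) - t
            gb += err
            gw[0] += err * x[0]
            gw[1] += err * x[1]
        b -= lr * gb / n
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
    return b, w


# Step 1: propensity score = estimated probability of joining.
b, w = fit_logistic(users)
score = [sigmoid(b + w[0] * x[0] + w[1] * x[1]) for x, _ in users]

treated = [i for i, (_, t) in enumerate(users) if t]
pool = {i for i, (_, t) in enumerate(users) if not t}

# Step 2: greedy 1:1 nearest-neighbour matching without replacement.
CALIPER = 0.05  # hypothetical maximum allowed score gap
pairs = []
for i in treated:
    if not pool:
        break
    j = min(pool, key=lambda c: abs(score[c] - score[i]))
    if abs(score[j] - score[i]) <= CALIPER:
        pairs.append((i, j))
        pool.remove(j)  # each control user is used at most once

gap = sum(abs(score[i] - score[j]) for i, j in pairs) / len(pairs)
print(f"matched {len(pairs)} of {len(treated)} treated users, mean score gap {gap:.4f}")
```

Treated users with no control user inside the caliper are dropped, which trades sample size for match quality. Greedy matching also depends on the order in which treated users are processed; optimal matching avoids that at higher cost.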

  • Result display : In Figure 3-6, the upper left panel shows the original propensity scores of the experimental and control groups, and the lower right panel shows the scores of the two groups after matching. The propensity scores of the matched groups overlap closely, so we consider the two groups highly homogeneous. The lower left panel compares the groups before and after matching: before matching the difference between the groups is very large, while after matching it is held within a reasonable range.

Figure 3-6 PSM model result diagram
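One common way to quantify the before/after balance described above is the standardized mean difference (SMD) of each covariate. The article does not say which balance statistic Figure 3-6 uses, so treat this as one possible check rather than the authors' method. The covariate values below are made up for illustration.

```python
import math


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""

    def mean(values):
        return sum(values) / len(values)

    def var(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    pooled_sd = math.sqrt((var(treated) + var(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate (for example orders in the past year).
treated = [5, 6, 7, 8, 9, 6, 7, 8]
control_before = [1, 2, 3, 2, 4, 3, 2, 5]  # all unjoined users
control_after = [5, 6, 7, 7, 9, 6, 8, 8]   # matched control users only

smd_before = standardized_mean_difference(treated, control_before)
smd_after = standardized_mean_difference(treated, control_after)
print(f"SMD before matching: {smd_before:.2f}")
print(f"SMD after matching:  {smd_after:.2f}")
```

A frequently used rule of thumb treats an absolute SMD below 0.1 as acceptable balance; the check should be repeated for every covariate used in the propensity model.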

3.3 Experimental design scenario—synthetic control method (SCM)

  • Model introduction : Suppose we implement an intervention or policy in City A and cannot find a good control area for City A. With the synthetic control method, we can take an appropriate linear combination of several other large cities to construct a "Synthetic City A" that closely resembles City A, and then compare "Real City A" with "Synthetic City A".

  • Business background : The hotel product team wants to explore user price elasticity, that is, how conversion rate changes when pricing is adjusted, which would normally be studied with AB experiments. To obtain relatively scientific conclusions to support decision-making without doing anything non-compliant, we use the synthetic control method (SCM) to find a reasonable control group: we look for provinces comparable to the experimental provinces and synthesize from them a virtual control group with similar data characteristics, then compare the two before and after the price adjustment.

  • Plan details :

Experimental group: City A.

Control group: a virtual group (B * 0.584 + C * 0.223 + D * 0.183 + E * 0.01).

The fit is shown in Figure 3-7; the experimental group and the control group track each other well.

  • Data indicators : overall conversion rate, CR (hotel order submission UV / list page UV).

Figure 3-7 SCM model result diagram
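Once the donor weights are known, evaluating a synthetic control is simple arithmetic. The sketch below reuses the weights quoted above, but every conversion-rate series in it is invented for illustration, as is the split into five pre-adjustment days and three post-adjustment days. It does not show how the weights are fitted, which is usually done by constrained least squares on the pre-period.

```python
# Donor weights quoted in the article for the synthetic control group.
WEIGHTS = {"B": 0.584, "C": 0.223, "D": 0.183, "E": 0.01}

# Hypothetical daily conversion rates, made up for illustration.
# Days 0-4 are before the price adjustment, days 5-7 are after it.
donors = {
    "B": [0.100, 0.102, 0.101, 0.103, 0.102, 0.101, 0.102, 0.103],
    "C": [0.080, 0.081, 0.080, 0.082, 0.081, 0.080, 0.081, 0.082],
    "D": [0.120, 0.121, 0.122, 0.121, 0.123, 0.122, 0.121, 0.122],
    "E": [0.090, 0.090, 0.091, 0.092, 0.091, 0.090, 0.091, 0.092],
}
city_a = [0.099, 0.101, 0.100, 0.101, 0.101, 0.094, 0.095, 0.096]
PRE_DAYS = 5


def synthetic_series(donor_series, weights):
    """Weighted combination of donor series, one value per day."""
    days = len(next(iter(donor_series.values())))
    return [sum(weights[k] * donor_series[k][d] for k in weights) for d in range(days)]


synth = synthetic_series(donors, WEIGHTS)
gaps = [real - fake for real, fake in zip(city_a, synth)]

# Pre-period fit: how closely "Synthetic City A" tracks "Real City A".
pre_rmse = (sum(g * g for g in gaps[:PRE_DAYS]) / PRE_DAYS) ** 0.5

# Post-period effect: average gap after the price adjustment.
post_gaps = gaps[PRE_DAYS:]
post_effect = sum(post_gaps) / len(post_gaps)

print(f"pre-period RMSE: {pre_rmse:.5f}")
print(f"estimated effect on CR: {post_effect:+.4f}")
```

The estimated effect is only credible when the post-period gap is large relative to the pre-period RMSE. Placebo runs, which treat each donor as if it had been adjusted, are a common way to judge that.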

3.4 Policy intervention scenario—Regression Discontinuity (RDD)

  • Model introduction : Use observation data very close to the critical point of intervention to construct the experimental group and the control group.

  • Business background : The official account pushes an article every Thursday. The push reminder was changed from a strong reminder to a weak reminder, and we need to evaluate the impact of this change on the official account's reach conversion rate.

  • Data processing :

Centering: The follow time is centered so that the critical point is 0, and the number of hours from the critical point is used as the relative follow time.

Data grouping: Users are sorted by follow time and grouped, with about 100 people per group, and the average relative follow time of each group is taken.

Key indicators:

a. Treatment variable D: reminder method (0: strong reminder, 1: weak reminder).

b. Outcome variable Y: 3-day payment conversion rate, 7-day payment conversion rate.

c. Running variable X: the average relative follow time of each group.

  • Scheme design :

Take users who followed the official account, as shown in Figure 3-8.

Experimental group: users who newly followed the official account in the three days before the last Thursday before the revision.

Control group: users who newly followed the official account in the three days before the first Thursday after the revision.

Figure 3-8 Regression discontinuity concept diagram

  • Data fitting :

Changing from a strong reminder to a weak reminder significantly reduces both the 3-day conversion rate and the 7-day conversion rate (P value is less than 0.01), as shown in Figure 3-9.

Figure 3-9 Regression discontinuity result graph
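The data processing and fitting steps of this section (centering, grouping about 100 users per bin, then estimating the jump at the critical point) can be sketched as a sharp regression discontinuity with a straight line fitted on each side. The user data, the cutoff hour, and the assumed drop of 5 percentage points are all hypothetical, and the sketch computes only the point estimate, not the p-value reported above.

```python
import random

random.seed(3)

CUTOFF_HOUR = 72.0  # hypothetical: the revision takes effect 72 hours into the window

# Hypothetical users: (follow_time_hours, converted_within_3_days).
users = []
for _ in range(40000):
    t = random.uniform(0, 144)
    weak = t >= CUTOFF_HOUR  # D = 1 (weak reminder) after the revision
    p = 0.20 + 0.0002 * (t - CUTOFF_HOUR) - (0.05 if weak else 0.0)
    users.append((t, random.random() < p))

# 1) Centre the running variable so the critical point sits at 0.
centred = sorted((t - CUTOFF_HOUR, y) for t, y in users)


# 2) Group about 100 users per bin on each side; keep mean X and mean Y.
#    A final incomplete bin is dropped.
def bin_side(rows, size=100):
    bins = []
    for k in range(0, len(rows) - size + 1, size):
        chunk = rows[k:k + size]
        mean_x = sum(x for x, _ in chunk) / size
        mean_y = sum(y for _, y in chunk) / size
        bins.append((mean_x, mean_y))
    return bins


left = bin_side([r for r in centred if r[0] < 0])
right = bin_side([r for r in centred if r[0] >= 0])


# 3) Fit a straight line on each side and compare the intercepts at X = 0.
def ols(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)


jump = ols(right)[0] - ols(left)[0]
print(f"estimated jump in 3-day conversion at the cutoff: {jump:+.3f}")
```

In practice the estimate is usually restricted to a bandwidth around the critical point, and its sensitivity to the bandwidth and to the functional form is checked.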

4. Summary of the use of causal inference

4.1 Use of causal inference

Causal inference is divided into two parts: causal identification (discovery) and causal effect estimation.

  • WHEN: When a perfect randomized experiment cannot be designed, causal effects are measured from observational data (approximating a randomized experiment).

  • WHAT: The essence is to strip away the impact of external variables that we don’t care about on the results, so as to accurately estimate the single impact of the strategic factors we care about on the results.

  • HOW: Choosing an evaluation method essentially comes down to two steps: identify the usage scenario to select an appropriate causal inference method, then apply that method to real business data to solve the problem.

4.2 Identifying usage scenarios

Through practical summary, the common usage scenarios of causal inference methods are as follows (Figure 4-1):

1) Scenario 1: Non-experimental scenario strategy effect evaluation

  • Problem identification: The goal is to estimate the group-level effect (ATE), but AB experiments cannot be performed.

  • Core idea: artificially create a virtual control group and compare it with the strategy's online data to estimate the true effect of the strategy.

  • How to use: PSM / SCM / Causal Impact / DID.

  • Common scenarios:

a. The impact of the opening of a new airport in Beijing on our orders.

b. The impact of a sudden change in the reminder method of the WeChat official account on our users' conversion rate.

c. Studying the impact of a policy: for example, if one region passes a law raising the minimum wage from US$4.25 to US$5.05 per hour while an adjacent region keeps it unchanged, does the number of jobs increase?
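Of the methods listed for this scenario, difference-in-differences (DID) is the easiest to show as arithmetic: the change in the affected group minus the change in the comparison group over the same period. The numbers below are invented for illustration and do not come from any of the scenarios above.

```python
# Hypothetical average daily orders (made-up numbers).
outcomes = {
    ("treated", "before"): 100.0,
    ("treated", "after"): 108.0,
    ("control", "before"): 90.0,
    ("control", "after"): 95.0,
}


def diff_in_diff(o):
    """(treated after - treated before) - (control after - control before)."""
    treated_change = o[("treated", "after")] - o[("treated", "before")]
    control_change = o[("control", "after")] - o[("control", "before")]
    return treated_change - control_change


did_estimate = diff_in_diff(outcomes)
print(f"treated change: {outcomes[('treated', 'after')] - outcomes[('treated', 'before')]:+.1f}")
print(f"control change: {outcomes[('control', 'after')] - outcomes[('control', 'before')]:+.1f}")
print(f"DID estimate:   {did_estimate:+.1f}")
```

The estimate is only valid under the parallel-trends assumption: without the intervention, the two groups would have changed by the same amount.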

2) Scenario 2: Exploring positively affected users in experimental scenarios

  • Problem identification: Explore the heterogeneous treatment effect (HTE) of an intervention (strategy) on different users, that is, which user segments are more sensitive to the strategy and more likely to be affected, and how large the effect is, so as to better attribute results and understand different user groups. The traditional approach is multi-dimensional analysis, which is inefficient and error-prone.

  • Core idea: Find the group of people who are most sensitive to a given intervention.

  • How to use: Causal tree / causal forest.

  • Common scenarios: Usually, analysis is done in combination with experiments.

a. Select those users with significant experimental results during the experiment, analyze their characteristics, find sensitive users, and help us make the next iteration.

b. A business ran a product optimization experiment, but the consumption metrics in the experiment performed poorly overall. Taking average APP usage time as an example, we want to find the user groups for which the experiment still produced a positive effect.

3) Scenario 3: Research on strategy-sensitive groups

  • Problem identification: Find the real groups sensitive to intervention (strategy) and invest budget/resources into these groups.

  • Core idea: Attribute the desired outcome (such as order conversion) and find the people for whom a given intervention caused that outcome.

  • How to use: Uplift Model.

  • Common scenarios: user marketing scenarios, saving costs and improving ROI.

a. The company has a budget for sending coupons to increase purchase rates. Which users on the platform should receive them?

4) Scenario 4: Analysis of causal impact indicators

  • Problem identification: Analyze the impact of one or more indicators on the results or find factors that have a causal impact on the results and evaluate the impact.

  • Core idea: Build a causal model on historical observational data that handles multicollinearity and nonlinear relationships between the independent and dependent variables. Causal inference often runs into confounding variables. For example, we want to analyze the impact of live broadcast recommendation diversity (indicator D) on user activity (indicator Y), but many variables X are related to both D and Y. The traditional way to handle this is to run a linear regression on Y and read the fitted coefficients as the effects, or to fit XGBoost and look at SHAP values. However, these traditional methods rely on many strong assumptions, such as the absence of multicollinearity, and estimates obtained under such assumptions may not be reasonable. In this scenario, traditional indicator impact analysis does not meet business needs, and Double Machine Learning (DML) offers a way forward.

  • Common scenarios:

a. Estimate the causal effect between the price of ice cream and its sales volume.

b. The impact of installing Douyin on Kuaishou usage time.

c. Explore which potential user behaviors or content have a positive causal impact on user activity, and measure the causal effects.
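The Double Machine Learning idea mentioned in this scenario can be sketched with its "partialling-out" form: predict D from X and Y from X on held-out folds, then regress the Y residuals on the D residuals. The sketch below is a toy under stated assumptions: the simulated data, the true effect of 2.0, and the use of a one-variable least-squares line as the nuisance model (standing in for any machine learning model) are all hypothetical.

```python
import random

random.seed(11)

# Hypothetical data: X confounds both the treatment D and the outcome Y.
TRUE_THETA = 2.0
data = []
for _ in range(4000):
    x = random.gauss(0, 1)
    d = 0.8 * x + random.gauss(0, 1)
    y = TRUE_THETA * d + 3.0 * x + random.gauss(0, 1)
    data.append((x, d, y))


def fit_line(xs, ys):
    """Least-squares line ys ~ a + b * xs; stands in for any ML nuisance model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # (intercept, slope)


# Naive estimate: regress Y on D and ignore X (biased by confounding).
naive = fit_line([d for _, d, _ in data], [y for _, _, y in data])[1]

# DML with 2-fold cross-fitting: nuisance models are fitted on one fold
# and used to compute residuals on the other fold.
folds = [data[0::2], data[1::2]]
res_d, res_y = [], []
for k in (0, 1):
    train, hold = folds[1 - k], folds[k]
    xs = [x for x, _, _ in train]
    a_d, b_d = fit_line(xs, [d for _, d, _ in train])  # model of E[D | X]
    a_y, b_y = fit_line(xs, [y for _, _, y in train])  # model of E[Y | X]
    for x, d, y in hold:
        res_d.append(d - (a_d + b_d * x))
        res_y.append(y - (a_y + b_y * x))

# Final stage: regress outcome residuals on treatment residuals.
theta = sum(u * v for u, v in zip(res_d, res_y)) / sum(u * u for u in res_d)

print(f"naive slope:  {naive:.2f}")
print(f"DML estimate: {theta:.2f} (true effect in this simulation: {TRUE_THETA})")
```

The naive slope absorbs part of the effect of X, while the residual-on-residual estimate lands close to the simulated effect. With real data the nuisance models would be flexible learners, and the result still depends on all relevant confounders being included in X.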

Figure 4-1 General framework for causal inference

 “Ctrip Technology” public account

Origin blog.csdn.net/ctrip_tech/article/details/131237972